How to remove duplicate items from a list (it is not important whether an element of the list is in capital or small letters [treat them as the same])

Hello, @hasanmoghrabi5, and welcome to Python Software Foundation Discourse!

When you describe a programming task or algorithm, it is important for the description to be clear and unambiguous. In the following list, how should we decide which items among the duplicates to keep?:

words = ['spam', 'EGGS', 'SPAM', 'Ham', 'Spam', 'SPAM', 'eggs', 'Cabbage', 'HAM']

Should we keep the first or the final occurrence of duplicate items, or should another criterion govern which one we keep? Also, should we alter the original list, or instead create a new one with the duplicates removed?

def dedup(input_list):
    """
    Deduplicate input_list.

    Keep the first duplicate.
    Preserve the original order.
    O(n) time.
    """
    seen_set = set()
    output_list = []
    for element in input_list:
        lowered = element.lower()
        if lowered in seen_set:
            continue
        else:
            seen_set.add(lowered)
            output_list.append(element)

    return output_list

HTH.

Please don’t do this.

The problem is clearly homework or a tutorial exercise, and the OP will
only benefit from solving the problem themselves.

We usually try to:

  • elicit from the OP what part of the problem is giving them trouble
  • ask for example code they’ve tried, and its failure output
  • point out bugs and suggest better approaches

Just posting a code snippet doesn’t really help them improve.

To reuse a tired metaphor, you’re giving the OP a fish instead of
teaching them to fish.

Cheers,
Cameron Simpson cs@cskk.id.au

That sounds dangerously like a “royal we”.

Shrug. It isn’t intended that way. - Cameron

No, it is we the community don’t approve of doing people’s homework for them.

If it is homework, students can get into serious academic trouble for plagiarism.

It’s sometimes difficult to tell what’s formal homework, what’s self-directed learning, and what’s just some coder asking for help, but the safest tactic is, if in doubt, err on the side of caution. We (the community) can always come back with a more extensive answer later on.

@Hasanmoghrabi5, you will learn best by trying to develop your own solution to this problem. If you decide to incorporate any of the techniques from the offered solution or from any other source into yours, make sure you fully understand whatever techniques you use. For example, why do you think a set, rather than a list, was used to keep track of the items that have been encountered in the original list, and why were those items added to the set in lowercase, rather than always in their original form?

See Set Types — set, frozenset.

Note that the above reference states:

A set object is an unordered collection of distinct hashable objects.

What is the significance of the fact that the contained objects must be hashable?
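As a hint toward that question, here is a minimal sketch of what happens when you try to put a hashable object (a string) and an unhashable one (a list) into a set. The names `seen` and `message` are only for illustration.

```python
seen = set()

# Strings are immutable and hashable, so they can be set members.
seen.add("spam")

# Lists are mutable and therefore not hashable; adding one raises TypeError.
try:
    seen.add(["spam"])
except TypeError as error:
    message = str(error)

print(seen)
print(message)
```

A set uses each member's hash to locate it quickly, which is part of why membership tests on a set are fast.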

It is better to use the str.casefold() method rather than str.lower(), as it better handles non-ASCII characters which should compare equal.

It’s still not perfect for non-English strings, e.g. it doesn’t work for Turkish, but casefold is better.
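A small sketch of the difference, using the German "ß" and the Turkish dotless "ı" as illustrative examples of my own choosing:

```python
# German: "Straße" and "STRASSE" should compare equal ignoring case.
lower_matches = "Straße".lower() == "STRASSE".lower()
casefold_matches = "Straße".casefold() == "STRASSE".casefold()

# Turkish: dotless "ı" is the lowercase form of "I" in Turkish,
# but casefold() does not apply that language-specific rule.
turkish_matches = "ı".casefold() == "I".casefold()

print(lower_matches)     # False: "straße" != "strasse"
print(casefold_matches)  # True: both fold to "strasse"
print(turkish_matches)   # False: "ı" != "i"
```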

@hasanmoghrabi5, please don’t assume that the solution that was posted here is optimal, even if you discover that it works. You can view it initially with the perspective that it might or might not be a good approach.

You can develop your own expertise on the topic by performing searches to find out how others have approached similar tasks. Study a diversity of solutions, make sure you understand each of them, then write your own code to solve the problem.

Depending upon the circumstances under which you are learning, it is possible that someone may ask you to demonstrate that you have developed your own solution to this exercise. Be able to explain all the code that you write. You are welcome and encouraged to post your code here as you develop it, which will enable us to offer advice.

Forgive me, but you seem to be implying that I’m not part of “the” community.

comp.lang.python used to be so helpful. Python’s a great language, but in significant part it was that spirit of volunteering ideas and solutions that attracted me to it.

Thank you very much, bro. The problem is that I’m new to the world of programming and have not learned everything about Python yet (for example, we haven’t learned “set”). On the other hand, I did write a solution, but a small problem remains in my code, which is why I want to compare my code with other people’s code. Thank you for your attention, and you seem really worried about me…

doesn’t work for me

Using a set is not the only way to keep track of the items that have been encountered in the original list. It was used in the example because it is an efficient way to do it.

The best thing to do at this time is for you to post the code that you have thus far, so we can offer effective help. Please read the following before you post the code:

After reading the material at the above links, you will know how to post code in the proper format.

ok ,thank you my friend

Rather than post screen shots or other images of code, please copy your code as text, then paste it into your post. After that, format the code by placing lines (fences) of three back ticks before and after the code, as follows:

```
# This is example code.
print("Hello, World!")
```

Note that there is also a </> button at the top of the editing window that will format code, after that code has been selected.

The code you posted returns a new list rather than the original. But be careful when modifying a list as you iterate through it. Removing items from a list as you iterate from its beginning to its end can cause problems due to shifting of the indexes or positions of items as the iteration approaches them. Have you learned about list slicing? You can iterate through a list in reverse, safely removing the item at the end of the list if it is duplicated anywhere in the preceding slice of the list that begins at index 0.
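To illustrate the reverse-iteration idea, here is a rough sketch, not necessarily the best approach. The function name `dedup_in_place` is hypothetical, and I am assuming that the first occurrence of each item should be kept.

```python
def dedup_in_place(items):
    """Remove case-insensitive duplicates from items, modifying it in place."""
    # Walk from the last index down to 1. Deleting items[index] only shifts
    # items after it, and those have already been examined.
    for index in range(len(items) - 1, 0, -1):
        key = items[index].casefold()
        # The preceding slice starts at index 0 and stops before index.
        earlier_keys = [item.casefold() for item in items[:index]]
        if key in earlier_keys:
            del items[index]


words = ['spam', 'EGGS', 'SPAM', 'Ham', 'Spam', 'SPAM', 'eggs', 'Cabbage', 'HAM']
dedup_in_place(words)
print(words)  # ['spam', 'EGGS', 'Ham', 'Cabbage']
```

Each step rebuilds a slice and searches it, so this does much more work than the set-based version on long lists.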

Remember to use the str.casefold() method or the str.lower() method as you perform comparisons, otherwise, your result will contain what you consider to be duplicates that differ in case (capital and small letters).

As an alternative to modifying the original list during the iteration, you can assemble the result in a new list, then afterwards, replace the contents of the original list with the contents of the new list.
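A sketch of that alternative, assuming again that the first occurrence is kept. The function name `dedup_replace_contents` is made up for this example, and it tracks what has been seen in a plain list rather than a set.

```python
def dedup_replace_contents(items):
    """Build a deduplicated list, then copy it back into items."""
    seen_keys = []  # a plain list; each membership test scans the whole list
    result = []
    for item in items:
        key = item.casefold()
        if key not in seen_keys:
            seen_keys.append(key)
            result.append(item)
    # Slice assignment replaces the contents of the same list object,
    # so other names that refer to that list see the change too.
    items[:] = result


words = ['spam', 'EGGS', 'SPAM', 'Ham', 'Spam', 'SPAM', 'eggs', 'Cabbage', 'HAM']
alias = words
dedup_replace_contents(words)
print(words)           # ['spam', 'EGGS', 'Ham', 'Cabbage']
print(alias is words)  # True
```

Writing `items = result` inside the function would only rebind the local name and leave the caller's list unchanged, which is why the slice assignment is used here.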

Completing this exercise without using a set may be inefficient, but it can be done successfully.

Feel free to post new code, formatted of course, as you progress through this exercise.

Thank you bro
