How to Dedupe Items for a Unique List

A common problem when dealing with lists is deduping or removing duplicate items. For a completely unique list there are a few ways to accomplish this in Python.

The Fastest Way to Dedupe

>>> yourList = list(set(yourList))

All we're doing is leveraging two of Python's built-in functions: set() and list(). The set() function converts your list into a set; by definition, a set holds only unique entries, so any duplicate items in your original list are removed automatically.

Since you probably still want a list, we convert the set back with the list() function, which builds a new list from the set's elements.
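As a small illustration of the set round trip, here is a sketch using made-up sample data. It also shows one limitation worth knowing: every item has to be hashable, so a list of lists cannot be deduped this way.

```python
# Hypothetical sample data; any list of hashable items works.
your_list = ["b", "a", "b", "c", "a"]

unique_list = list(set(your_list))

# Set iteration order is arbitrary, so sort before printing or comparing.
print(sorted(unique_list))

# Unhashable items such as nested lists cannot be placed in a set.
try:
    set([[1, 2], [1, 2]])
except TypeError as exc:
    print("cannot dedupe with set():", exc)
```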

Preserving Order

Sometimes you'll want to preserve the order of a list. Sets are unordered, so converting to a set and back may or may not preserve the original order of your list. To solve this, we write a slightly longer script:

>>> yourList = [0,1,2,2,3,5,5,5,7,9,9]
>>> uniqueList = []
>>> for value in yourList:
...     if value not in uniqueList:
...         uniqueList.append(value)
...
>>> uniqueList
[0, 1, 2, 3, 5, 7, 9]

Here we create a new, empty container (uniqueList) before we process the original list (yourList). Then we use a for loop to go through every value in the original list. If we haven't seen the value before, we add it to the new list. If we have seen it before, we disregard it and move on to the next value.

In the end, you're left with a deduped list in the same order as the original.
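If you are on Python 3.7 or later, where dictionaries keep insertion order, a shorter alternative is possible. The sketch below is not the approach described above, and the function name dedupe_keep_order is only an illustrative choice. Like set(), it assumes every item is hashable.

```python
def dedupe_keep_order(items):
    """Return a new list with duplicates removed, keeping first occurrences."""
    # Dict keys are unique and, from Python 3.7 on, keep insertion order.
    return list(dict.fromkeys(items))


print(dedupe_keep_order([0, 1, 2, 2, 3, 5, 5, 5, 7, 9, 9]))
```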

Comments

And here's how to make it fast

http://www.peterbe.com/plog/uniqifiers-benchmark

Martin

The second approach is not very efficient. It would be more efficient to add a uniqueSet variable and use that for the 'not in' test.
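One way to read this suggestion is sketched below. The names dedupe_with_seen_set and seen are illustrative, not from the post; the idea is to keep the ordered result in a list while doing the membership test against a set, which is a hash lookup rather than a scan of the whole list. It assumes the items are hashable.

```python
def dedupe_with_seen_set(items):
    """Order-preserving dedupe that uses a set for the 'not in' test."""
    seen = set()   # fast membership checks
    unique = []    # keeps the original order
    for value in items:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


print(dedupe_with_seen_set([0, 1, 2, 2, 3, 5, 5, 5, 7, 9, 9]))
```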

Itertools

There's a recipe in the stdlib's itertools documentation to do this: http://docs.python.org/library/itertools.html#recipes (look at the unique_everseen() function at the end).
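For readers who don't want to follow the link, here is a simplified generator in the same spirit. It is a sketch, not a copy of the documented recipe, and the name unique_in_order is made up for this example. The optional key argument lets you decide what counts as a duplicate.

```python
def unique_in_order(iterable, key=None):
    """Yield each element the first time it (or its key) is seen."""
    seen = set()
    for element in iterable:
        marker = element if key is None else key(element)
        if marker not in seen:
            seen.add(marker)
            yield element


print(list(unique_in_order("AAAABBBCCDAABBB")))
print(list(unique_in_order("ABBCcAD", key=str.lower)))
```

Because it is a generator, it works on any iterable and produces items lazily; wrap it in list() when you need a list.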

Masklinn

The second approach is not very efficient. It would be more efficient to add a uniqueSet variable and use that for the 'not in' test.

Though for small enough lists, the overhead of creating the set may outweigh the cost of walking the list on every 'in' test.

This is definitely the kind of case that requires testing with production data in order to know which solution to select.
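A rough way to run that kind of test is sketched below with the standard timeit module. The two function names, the sample lists, and the repeat counts are placeholders; substitute your own data, since the timings depend on list size, the share of duplicates, and the machine.

```python
import timeit


def dedupe_list_scan(items):
    """Membership test against the result list itself."""
    unique = []
    for value in items:
        if value not in unique:
            unique.append(value)
    return unique


def dedupe_seen_set(items):
    """Membership test against a helper set."""
    seen = set()
    unique = []
    for value in items:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


def compare(items, repeats):
    """Print the total time each approach takes for `repeats` runs."""
    for func in (dedupe_list_scan, dedupe_seen_set):
        seconds = timeit.timeit(lambda: func(items), number=repeats)
        print(f"{func.__name__}: {seconds:.4f}s "
              f"({repeats} runs, {len(items)} items)")


# Placeholder data: one short list and one longer list.
compare([i % 5 for i in range(10)], repeats=2000)
compare([i % 200 for i in range(2000)], repeats=20)
```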

a two liner (not for very large sequences)

def uniqfy(seq):
    """Returns List Without Duplicates Preserving the Original Order

    Removes the second and following duplicates in a sequence without
    altering the original order (in contrast to the builtin set type
    where ordering is not defined).

    @param seq sequence that may contain duplicates or not
    @return list without duplicates preserving the original order

    Usage
    -----

    >>> uniqfy([1, 2, 1, 1, 2, 3])
    [1, 2, 3]

    """
    uniq = set(seq)
    # uniq.remove() returns None, so 'not uniq.remove(item)' is always True;
    # it only serves to drop the item once its first occurrence is kept.
    return [item for item in seq if item in uniq and not uniq.remove(item)]

Great suggestions

Thanks for all the links and comments! I'll definitely be utilizing some of the suggestions when I think about making unique lists in the future!
