Given a Python list, I want to remove consecutive 'duplicates'. However, the duplicate value is an attribute of the list item (in this example, the tuple's first element).

Input:

[(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

Desired Output:

[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]

Cannot use set or dict, because order is important.
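For illustration (this sketch is not part of the original question): dicts keep insertion order since Python 3.7, but deduplicating through a dict keyed on the first element still doesn't give the desired output. It collapses every duplicate key, not just consecutive ones. The name first_per_key is arbitrary.

```python
data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# setdefault keeps the first item seen for each key; dict order is insertion order (3.7+)
first_per_key = {}
for item in data:
    first_per_key.setdefault(item[0], item)

print(list(first_per_key.values()))
# [(1, 'a'), (2, 'b'), (3, 'd')]  the later (2, 'e') is lost
```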

Cannot use list comprehension [x for x in somelist if not determine(x)], because the check depends on the predecessor.

What I want is something like:

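mylist = [...]

# iterate backwards so deleting an item doesn't shift the indices still to be visited,
# and stop at index 1 so mylist[i-1] never wraps around to the last element
for i in range(len(mylist) - 1, 0, -1):
    if mylist[i-1].attr == mylist[i].attr:
        del mylist[i]  # list.remove() deletes by value, not by index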

What is the preferred way to solve this in Python?

You can use itertools.groupby (demonstration with more data):

from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (3, 'a')]

[next(group) for key, group in groupby(data, key=itemgetter(0))]

Output:

[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]
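Since the question talks about an attribute (mylist[i].attr), the same pattern should carry over to objects by swapping itemgetter for operator.attrgetter. A minimal sketch, assuming a hypothetical Item namedtuple whose key field plays the role of .attr:

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical record type standing in for the OP's objects; 'key' plays the role of .attr
Item = namedtuple('Item', ['key', 'label'])

items = [Item(1, 'a'), Item(2, 'b'), Item(2, 'b'), Item(2, 'c'), Item(3, 'd'), Item(2, 'e')]

# groupby starts a new group whenever attrgetter('key') changes between neighbours;
# next(group) keeps the first item of each run
deduped = [next(group) for _, group in groupby(items, key=attrgetter('key'))]
print(deduped)
```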

For completeness, an iterative approach based on other answers:

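result = []

# compare each item with its successor; keep it if the keys differ
for first, second in zip(data, data[1:]):
    if first[0] != second[0]:
        result.append(first)

# the last item has no successor, so it always ends a run and must be added separately
if data:
    result.append(data[-1])

result

Output:

[(1, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]

Note that this keeps the last item of each run of duplicates, instead of the first.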
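If the last item of each run is actually what you want, groupby can do that as well. A sketch (not from the original answer) that materialises each group and keeps its final element:

```python
from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (3, 'a')]

# list(group)[-1] consumes each run and keeps its last element
last_of_each_run = [list(group)[-1] for _, group in groupby(data, key=itemgetter(0))]
print(last_of_each_run)
```

Unlike the pairwise loop, this needs no special case for the final run.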

  • You don't need any key parameter, just take the key of each group – yatu Apr 17 at 9:05
  • @yatu The question says "the duplicate value is an attribute of the list", which means that that wouldn't work if (2, 'a') and (2, 'b') are considered equal. – gmds Apr 17 at 9:06
  • I see, yes, in that case it indeed makes sense @gmds. Hard to tell with this example, however. IMO, if that was what OP meant, a more general example would make more sense – yatu Apr 17 at 9:09

In order to remove consecutive duplicates, you could use itertools.groupby:

from itertools import groupby

l = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
# without a key, groupby compares whole tuples; each group's key is already the tuple itself
[k for k, _ in groupby(l)]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
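The comment thread on the previous answer hinges on whether a key is passed. On the question's own data the two forms differ, as this short comparison sketch shows (variable names are illustrative):

```python
from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# Without a key, whole tuples are compared, so (2, 'b') and (2, 'c') form separate runs
whole = [k for k, _ in groupby(data)]

# With key=itemgetter(0), only the first element is compared
by_first = [next(g) for _, g in groupby(data, key=itemgetter(0))]

print(whole)
print(by_first)
```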

If I am not mistaken, you only need to look up the last value added to the result.

test = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a'), (4, 'a')]

result = []

for i in test:
    # compare only the first element, since OP considers (1, 'a') and (1, 'b') duplicates
    # (to compare whole tuples instead, use: if result and i == result[-1])
    if result and i[0] == result[-1][0]:
        continue
    result.append(i)

print(result)

Output:

[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a')]
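The same "compare with the last value" idea can be packaged as a reusable generator that takes a key function, so it works for tuples, attributes, or any iterable (including generators). A minimal sketch; dedupe_consecutive is a hypothetical name, not a standard library function:

```python
from typing import Any, Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

_MISSING = object()  # sentinel meaning "no previous item seen yet"


def dedupe_consecutive(items: Iterable[T], key: Callable[[T], Any] = lambda x: x) -> Iterator[T]:
    """Yield each item whose key differs from the key of the item just before it."""
    last_key = _MISSING
    for item in items:
        k = key(item)
        # the first item is always kept; after that, compare with the predecessor's key
        if last_key is _MISSING or k != last_key:
            yield item
        last_key = k


test = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a'), (4, 'a')]
print(list(dedupe_consecutive(test, key=lambda t: t[0])))
```

Using an object() sentinel instead of None means a list that starts with None is still handled correctly.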

If you just want to stick to a list comprehension, you can use something like this:

>>> li = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]
>>> [li[i] for i in range(len(li)) if not i or li[i] != li[i-1]]
[(1, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]

Note that not i is a Pythonic shorthand for i == 0 here; it always keeps the first element, which has no predecessor.
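The guard matters because li[i-1] with i == 0 is li[-1], i.e. the last element, not an error. A small sketch showing what happens when the first and last items are equal:

```python
li = [(2, 'a'), (3, 'a'), (2, 'a')]

# Without the guard, li[0] is compared with li[-1] (negative index wraps around)
unguarded = [li[i] for i in range(len(li)) if li[i] != li[i - 1]]

# With the guard, index 0 short-circuits and is always kept
guarded = [li[i] for i in range(len(li)) if not i or li[i] != li[i - 1]]

print(unguarded)  # the first (2, 'a') is wrongly dropped
print(guarded)
```

The same wrap-around affects mylist[i-1] in the question's loop when i is 0.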

You could also use enumerate and a list comprehension:

>>> data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
>>> [v for ix, v in enumerate(data) if not ix or v[0] != data[ix-1][0]]
[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
  • Nice one, because there's no need for any imports. Also, v[0] can be replaced by any v.get_attribute(), which makes it quite universal. – Sparkofska Apr 18 at 5:59
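Following up on that comment, a sketch of the enumerate comprehension applied to objects with an attribute, as in the question's .attr pseudocode. Record and its fields are hypothetical stand-ins for the real objects:

```python
from dataclasses import dataclass


@dataclass
class Record:  # hypothetical stand-in for the OP's list items
    attr: int
    label: str


data = [Record(1, 'a'), Record(2, 'b'), Record(2, 'b'), Record(2, 'c'), Record(3, 'd'), Record(2, 'e')]

# keep the first item, then every item whose attr differs from its predecessor's attr
deduped = [v for ix, v in enumerate(data) if not ix or v.attr != data[ix - 1].attr]
print([(r.attr, r.label) for r in deduped])
```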

I'd change Henry Yik's proposal a little, making it simpler. Not sure if I am missing something.

inputList = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]
outputList = []
lastItem = None

for item in inputList:
    if item != lastItem:
        outputList.append(item)
        lastItem = item
print(outputList)
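One thing this version can miss: None doubles as "no previous item", so a list that starts with None loses that first element. A sketch comparing it with a unique object() sentinel (both function names are hypothetical):

```python
def dedupe_with_none_sentinel(items):
    out, last = [], None
    for item in items:
        if item != last:
            out.append(item)
            last = item
    return out


def dedupe_with_object_sentinel(items):
    # object() creates a fresh value that ordinary list items never compare equal to
    out, last = [], object()
    for item in items:
        if item != last:
            out.append(item)
            last = item
    return out


print(dedupe_with_none_sentinel([None, None, 1]))    # the leading None is dropped
print(dedupe_with_object_sentinel([None, None, 1]))
```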

You can easily zip the list with itself, shifted by one. Every element, except the first one, is paired with its predecessor:

>>> L = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
>>> list(zip(L[1:], L))
[((2, 'b'), (1, 'a')), ((2, 'b'), (2, 'b')), ((2, 'c'), (2, 'b')), ((3, 'd'), (2, 'c')), ((2, 'e'), (3, 'd'))]

The first element is always part of the result; then you filter the pairs on the condition and keep the first element of each remaining pair:

>>> [L[0]]+[e for e, f in zip(L[1:], L) if e[0]!=f[0]]
[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
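On Python 3.10+, itertools.pairwise produces the same neighbour pairs without slicing a copy of the list. A sketch, with a tee-based fallback for older versions; using L[:1] instead of [L[0]] also avoids an IndexError on an empty list:

```python
try:
    from itertools import pairwise  # available in Python 3.10+
except ImportError:  # older Pythons: an equivalent built from tee
    from itertools import tee

    def pairwise(iterable):
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

L = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# pairwise(L) yields (L[0], L[1]), (L[1], L[2]), ...; keep the second item of a pair
# whenever its first element differs from its predecessor's
result = L[:1] + [cur for prev, cur in pairwise(L) if cur[0] != prev[0]]
print(result)
```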

It's somewhat overkill, but you can use reduce, too:

from functools import reduce
data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
reduce(lambda rslt, t: rslt if rslt[-1][0] == t[0] else rslt + [t], data, [data[0]])
# [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
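Two caveats on the lambda: rslt + [t] builds a new list on every kept item, so the work grows quadratically in the worst case, and [data[0]] raises IndexError on an empty list. A sketch of a reducer that appends in place and starts from an empty list instead (keep_if_new_key is a hypothetical name):

```python
from functools import reduce


def keep_if_new_key(acc, t):
    # append in place and return the same list, avoiding the copy made by acc + [t]
    if not acc or acc[-1][0] != t[0]:
        acc.append(t)
    return acc


data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
result = reduce(keep_if_new_key, data, [])
print(result)
```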
