I have two lists that I want to join on a condition. Unlike a relational algebra join, if an element could be matched with several elements of the other list, only one of them should be selected, and an element that has been matched must not be reused. Also, if any element of the first list doesn’t match any remaining element of the second list, the process should fail.
Example:
    # inputs
    list1 = [{'amount': 124, 'name': 'john'},
             {'amount': 456, 'name': 'jack'},
             {'amount': 456, 'name': 'jill'},
             {'amount': 666, 'name': 'manuel'}]
    list2 = [{'amount': 124, 'color': 'red'},
             {'amount': 456, 'color': 'yellow'},
             {'amount': 456, 'color': 'on fire'},
             {'amount': 666, 'color': 'purple'}]
    keyfunc = lambda e: e['amount']

    # expected result
    [({'amount': 124, 'name': 'john'}, {'amount': 124, 'color': 'red'}),
     ({'amount': 456, 'name': 'jack'}, {'amount': 456, 'color': 'yellow'}),
     ({'amount': 456, 'name': 'jill'}, {'amount': 456, 'color': 'on fire'}),
     ({'amount': 666, 'name': 'manuel'}, {'amount': 666, 'color': 'purple'})]
I’ve written a working implementation in Python, but it seems clunky, unclear and inefficient:
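    def match(al, bl, key):
        bl = list(bl)  # work on a copy so the caller's list is not modified
        for a in al:
            found = False
            # linear scan for the first unused element with the same key
            for i, b in enumerate(bl):
                if key(a) == key(b):
                    found = True
                    yield (a, b)
                    del bl[i]  # remove it so it cannot be matched again
                    break
            if not found:
                raise Exception("Not found: %s" % a)

    result = list(match(list1, list2, key=keyfunc))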
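For comparison, here is a minimal sketch of one way to avoid the nested scan. It assumes the keys are hashable (true for the integer amounts above). The name `match_once` and the choice of `LookupError` are only illustrative. The idea is to group the second list by key into queues first, then take the oldest unused candidate for each element of the first list, which should give the same pairing order as the nested loop.

```python
from collections import defaultdict, deque


def match_once(al, bl, key):
    """Pair each element of al with one unused element of bl sharing its key.

    Hypothetical helper; assumes key(...) returns hashable values.
    """
    # Group bl by key, keeping the original order within each group.
    buckets = defaultdict(deque)
    for b in bl:
        buckets[key(b)].append(b)

    for a in al:
        # .get() avoids creating an empty bucket for an unknown key.
        candidates = buckets.get(key(a))
        if not candidates:  # missing key, or every candidate already used
            raise LookupError("Not found: %r" % (a,))
        # Taking from the left consumes the earliest unused match.
        yield a, candidates.popleft()
```

Called as `list(match_once(list1, list2, key=keyfunc))`, this makes one pass over each list instead of rescanning the second list for every element of the first. Like the original generator, it only fails once the unmatched element is reached during iteration.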