Comparing lists and returning only items that are present in a pre-vetted list - diagnosing incorrect results?

I’m picking up data from a field in an SQLite table that has been stuffed with one or more phrases delimited by ‘\\’, and turning that string into a list of items split on the delimiter. A function then compares that list with a predefined list and returns a sorted list containing only the items that match entries in the predefined list. To do this I’ve coded the following:

def vetted_items():
    ''' return a tuple of vetted items '''
    return(("Bolt", "Bolt Threader", "Bolt Cutter", "Ball Bearing"))


def dedupe_and_sort(input_string, delimiter=r'\\'):
    ''' get a list items that contains a delimited string, dedupe and sort it and pass it back '''
    distinct_items = set(x.strip() for x in input_string.split(delimiter))
    return (delimiter.join(sorted(distinct_items)))


def get_permitted_items(source, target, delimiter=r'\\'):
    ''' function to return all items in source that do appear in target '''
    return [item for item in source if item in target]


def delimited_string_to_list(input_string, delimiter=r'\\'):
    ''' convert delimited string to list and pass it back '''
    return(input_string.split(delimiter))



def list_to_delimited_string(input_list, delimiter=r'\\'):
    ''' convert list to delimited string and pass it back '''
    return delimiter.join(map(str, input_list))

string = 'Bolt\\\\Nut\\\\Bolt Cutter\\\\Bolt, Cutter\\\\Ball-Bearing\\\\Nut\\\\Bolt; Cutter\\\\Bolt\\\\Ball Bearings'

string_to_list = delimited_string_to_list(string)


print(f"Vetted items..............: '{vetted_items()}'\n")

print(f"Incoming string from table: '{string}'")

print(f"Incoming string to list...: {string_to_list}")
string_to_list.sort()
print(f"Sorted incoming list......: {string_to_list}")

vetted_input = get_permitted_items(vetted_items(),dedupe_and_sort(string))
print(f"Cleansed incoming list:...: {vetted_input}")
print(f"Cleansed string...........: '{list_to_delimited_string(vetted_input)}'")

Running the above yields the following:

Vetted items..............: '('Bolt', 'Bolt Threader', 'Bolt Cutter', 'Ball Bearing')'

Incoming string from table: 'Bolt\\Nut\\Bolt Cutter\\Bolt, Cutter\\Ball-Bearing\\Nut\\Bolt; Cutter\\Bolt\\Ball Bearings'
Incoming string to list...: ['Bolt', 'Nut', 'Bolt Cutter', 'Bolt, Cutter', 'Ball-Bearing', 'Nut', 'Bolt; Cutter', 'Bolt', 'Ball Bearings']
Sorted incoming list......: ['Ball Bearings', 'Ball-Bearing', 'Bolt', 'Bolt', 'Bolt Cutter', 'Bolt, Cutter', 'Bolt; Cutter', 'Nut', 'Nut']
Cleansed incoming list:...: ['Bolt', 'Bolt Cutter', 'Ball Bearing']
Cleansed string...........: 'Bolt\\Bolt Cutter\\Ball Bearing'

Can you see any circumstance where the input string may contain characters that would cause an incorrect result?

Some feedback on the code.

return does not need the () around its value.

Not sure why you are using map(str, ...); you can return delimiter.join(input_list) directly, since you know it is a list of strings.

If both source and target were sets, then you could use set intersection.
For example: return list(source & target)
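A minimal sketch applying the three suggestions above, assuming every item is already a plain string. vetted_items and list_to_delimited_string keep the names from the question; permitted_items is a hypothetical name for the set-based version, and it sorts the result (sets are unordered) rather than just calling list().

```python
def vetted_items():
    '''return a tuple of vetted items'''
    # The parentheses around a returned tuple literal are optional.
    return "Bolt", "Bolt Threader", "Bolt Cutter", "Ball Bearing"


def list_to_delimited_string(input_list, delimiter=r'\\'):
    '''join a list of strings with the delimiter'''
    # The items are already strings, so map(str, ...) is not needed.
    return delimiter.join(input_list)


def permitted_items(source, target):
    '''hypothetical set-based version: items present in both collections'''
    # & keeps only the elements common to both sets.
    # Sets have no order, so sort to get a stable result.
    return sorted(set(source) & set(target))


incoming = ["Bolt", "Nut", "Bolt Cutter", "Ball Bearings", "Bolt"]
print(permitted_items(incoming, vetted_items()))  # ['Bolt', 'Bolt Cutter']
print(list_to_delimited_string(permitted_items(incoming, vetted_items())))  # Bolt\\Bolt Cutter
```

Note that 'Ball Bearings' is correctly left out here, because set intersection compares whole strings.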

Thanks for the feedback. I’ll incorporate it when I get to optimising my code, but for the moment I need to figure out where my code has gone wrong, as illustrated by the following example.

If I pass it any of these strings:

string = 'Bolt Ball Bearing' 
string = 'Bolt, Ball Bearing' 
string = 'Bolt; Ball Bearing' 
string = 'Bolt&  Ball Bearing' 
string = 'Bolt & Ball Bearing' 
string = 'Bolt Ball Bearing'

i.e. if there is only a single item, with no delimiter, then:

print(f"Cleansed string...........: '{list_to_delimited_string(vetted_input)}'")

in every instance returns:

Cleansed string...........: 'Bolt\\Ball Bearing'

That result is incorrect, given that the string doesn’t match any element defined in the tuple.

I added debug prints to your code to see what was going on:

def vetted_items():
    ''' return a tuple of vetted items '''
    return(("Bolt", "Bolt Threader", "Bolt Cutter", "Ball Bearing"))


def dedupe_and_sort(input_string, delimiter=r'\\'):
    ''' get a list items that contains a delimited string, dedupe and sort it and pass it back '''
    print(f'QQQ dedupe_and_sort: input_string {input_string}')
    distinct_items = set(x.strip() for x in input_string.split(delimiter))
    print(f'QQQ dedupe_and_sort: distinct_items {distinct_items}')
    result = delimiter.join(sorted(distinct_items))
    print(f'QQQ dedupe_and_sort: result {result}')
    return result


def get_permitted_items(source, target, delimiter=r'\\'):
    ''' function to return all items in source that do appear in target '''
    print(f'QQQ get_permitted_items: source {source}')
    print(f'QQQ get_permitted_items: target {target}')
    return [item for item in source if item in target]


def delimited_string_to_list(input_string, delimiter=r'\\'):
    ''' convert delimited string to list and pass it back '''
    return(input_string.split(delimiter))



def list_to_delimited_string(input_list, delimiter=r'\\'):
    ''' convert list to delimited string and pass it back '''
    return delimiter.join(map(str, input_list))

string = 'Bolt\\\\Nut\\\\Bolt Cutter\\\\Bolt, Cutter\\\\Ball-Bearing\\\\Nut\\\\Bolt; Cutter\\\\Bolt\\\\Ball Bearings'
string = 'Bolt Ball Bearing'

string_to_list = delimited_string_to_list(string)
print(string_to_list)

print(f"Vetted items..............: '{vetted_items()}'\n")

print(f"Incoming string from table: '{string}'")

print(f"Incoming string to list...: {string_to_list}")
string_to_list.sort()
print(f"Sorted incoming list......: {string_to_list}")


vetted_input = get_permitted_items(vetted_items(),dedupe_and_sort(string))

print(f"Cleansed incoming list:...: {vetted_input}")
print(f"Cleansed string...........: '{list_to_delimited_string(vetted_input)}'")

and got this output:

% py  a.py
['Bolt Ball Bearing']
Vetted items..............: '('Bolt', 'Bolt Threader', 'Bolt Cutter', 'Ball Bearing')'

Incoming string from table: 'Bolt Ball Bearing'
Incoming string to list...: ['Bolt Ball Bearing']
Sorted incoming list......: ['Bolt Ball Bearing']
QQQ dedupe_and_sort: input_string Bolt Ball Bearing
QQQ dedupe_and_sort: distinct_items {'Bolt Ball Bearing'}
QQQ dedupe_and_sort: result Bolt Ball Bearing
QQQ get_permitted_items: source ('Bolt', 'Bolt Threader', 'Bolt Cutter', 'Ball Bearing')
QQQ get_permitted_items: target Bolt Ball Bearing
Cleansed incoming list:...: ['Bolt', 'Ball Bearing']
Cleansed string...........: 'Bolt\\Ball Bearing'

You can see that target is a string, and I suspect that you want a set or a list of strings.
The test if item in target in get_permitted_items() checks whether the item string is contained anywhere in the target string, e.g. 'Bolt' in 'Bolt Ball Bearing' and 'Ball Bearing' in 'Bolt Ball Bearing' are both True.
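A small sketch of the difference: on a str, in is a substring test, while on a list it looks for an equal element. The variable names are illustrative only. The same substring behaviour also explains why 'Ball Bearing' appeared in the first cleansed list even though the input only contained 'Ball Bearings'.

```python
vetted = ("Bolt", "Bolt Threader", "Bolt Cutter", "Ball Bearing")
target_string = "Bolt Ball Bearing"         # what dedupe_and_sort() hands back
target_list = target_string.split(r"\\")    # ['Bolt Ball Bearing']

# str: `in` means "is a substring of"; list: `in` means "equals some element".
substring_hits = [item for item in vetted if item in target_string]
element_hits = [item for item in vetted if item in target_list]
print(substring_hits)  # ['Bolt', 'Ball Bearing']
print(element_hits)    # []

# The multi-item case from the question has the same problem.
original = r"Ball Bearings\\Ball-Bearing\\Bolt"
print("Ball Bearing" in original)               # True: substring of 'Ball Bearings'
print("Ball Bearing" in original.split(r"\\"))  # False: no element is exactly 'Ball Bearing'
```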

Thanks, your feedback helped me find my error. Classic problem of blindly reusing code I wrote some time back for another purpose without clearly thinking through the problem I’m trying to solve.

I’ve rewritten get_permitted_items() as follows:

def get_permitted_list(source: list, target: tuple):
    ''' function to return all items in source that appear in target '''
    return sorted(set(source).intersection(target))

and made sure to pass a list rather than a string, so the call now looks as follows:

vetted_input = get_permitted_list(delimited_string_to_list(dedupe_and_sort(string)),vetted_items())

and it’s producing the results I was expecting.
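For reference, a self-contained sketch of the corrected pipeline run against the strings from this thread. The helpers follow the code in the posts (with map(str, ...) dropped, per the earlier feedback); cleanse() is a hypothetical wrapper around the corrected call, added only for the demonstration.

```python
def vetted_items():
    '''return a tuple of vetted items'''
    return ("Bolt", "Bolt Threader", "Bolt Cutter", "Ball Bearing")


def dedupe_and_sort(input_string, delimiter=r'\\'):
    '''dedupe and sort a delimited string, passing back a delimited string'''
    distinct_items = set(x.strip() for x in input_string.split(delimiter))
    return delimiter.join(sorted(distinct_items))


def delimited_string_to_list(input_string, delimiter=r'\\'):
    '''convert a delimited string to a list'''
    return input_string.split(delimiter)


def get_permitted_list(source: list, target: tuple):
    '''return all items in source that appear in target'''
    return sorted(set(source).intersection(target))


def list_to_delimited_string(input_list, delimiter=r'\\'):
    '''convert a list of strings to a delimited string'''
    return delimiter.join(input_list)


def cleanse(string):
    '''hypothetical wrapper around the corrected call'''
    return get_permitted_list(delimited_string_to_list(dedupe_and_sort(string)), vetted_items())


samples = [
    'Bolt\\\\Nut\\\\Bolt Cutter\\\\Bolt, Cutter\\\\Ball-Bearing\\\\Nut\\\\Bolt; Cutter\\\\Bolt\\\\Ball Bearings',
    'Bolt Ball Bearing',
    'Bolt, Ball Bearing',
    'Bolt; Ball Bearing',
    'Bolt & Ball Bearing',
]
for string in samples:
    # First sample prints 'Bolt\\Bolt Cutter'; the single-item samples print ''.
    print(f"'{list_to_delimited_string(cleanse(string))}'")
```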

Thanks again.


I’ve gone one step further, realising that I’d be better off ignoring text case when matching but returning the vetted item in its proper case, so I defined the following to replace get_permitted_list:

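def caseless_list_intersection(source: list, target: tuple):
    ''' return the items in target that appear in source, ignoring case '''
    intersection = []
    s = {x.lower() for x in source}  # lower-cased source items; a set gives fast lookups
    for t in target:
        if t.lower() in s:
            intersection.append(t)  # keep the vetted (proper-case) spelling
    return intersection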
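One possible variation, sketched under the assumption that stray whitespace around incoming items should also be ignored: str.casefold() is documented as a more aggressive lower() intended for caseless matching (for example 'ß'.casefold() is 'ss', whereas 'ß'.lower() leaves it unchanged). caseless_intersection is a hypothetical name, and like the version above it returns the vetted spelling in target order.

```python
def caseless_intersection(source, target):
    '''Return the items of target that match an item of source, ignoring case.

    The vetted spelling from target is returned, in target order.
    '''
    # Normalise incoming items once: trim whitespace (an added assumption) and casefold.
    wanted = {item.strip().casefold() for item in source}
    return [t for t in target if t.casefold() in wanted]


vetted = ("Bolt", "Bolt Threader", "Bolt Cutter", "Ball Bearing")
incoming = ["bolt", "BALL BEARING", "Nut", " bolt cutter "]
print(caseless_intersection(incoming, vetted))  # ['Bolt', 'Bolt Cutter', 'Ball Bearing']
```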