I’m reading a field from a SQLite table that holds one or more phrases delimited by ‘\\’, and splitting that string into a list of items on the delimiter. A function call then compares that list with a predefined list and returns a sorted list containing only the items that match entries in the predefined list. To do this I’ve coded the following::
def vetted_items():
    ''' return a tuple of vetted items '''
    return ("Bolt", "Bolt Threader", "Bolt Cutter", "Ball Bearing")

def dedupe_and_sort(input_string, delimiter=r'\\'):
    ''' take a delimited string, dedupe and sort its items and pass back a delimited string '''
    distinct_items = set(x.strip() for x in input_string.split(delimiter))
    return delimiter.join(sorted(distinct_items))

def get_permitted_items(source, target, delimiter=r'\\'):
    ''' function to return all items in source that appear in target '''
    return [item for item in source if item in target]

def delimited_string_to_list(input_string, delimiter=r'\\'):
    ''' convert delimited string to list and pass it back '''
    return input_string.split(delimiter)

def list_to_delimited_string(input_list, delimiter=r'\\'):
    ''' convert list to delimited string and pass it back '''
    return delimiter.join(map(str, input_list))
string = 'Bolt\\\\Nut\\\\Bolt Cutter\\\\Bolt, Cutter\\\\Ball-Bearing\\\\Nut\\\\Bolt; Cutter\\\\Bolt\\\\Ball Bearings'
string_to_list = delimited_string_to_list(string)
print(f"Vetted items..............: '{vetted_items()}'\n")
print(f"Incoming string from table: '{string}'")
print(f"Incoming string to list...: {string_to_list}")
string_to_list.sort()
print(f"Sorted incoming list......: {string_to_list}")
vetted_input = get_permitted_items(vetted_items(),dedupe_and_sort(string))
print(f"Cleansed incoming list:...: {vetted_input}")
print(f"Cleansed string...........: '{list_to_delimited_string(vetted_input)}'")
Running the above yields the following:
Vetted items..............: '('Bolt', 'Bolt Threader', 'Bolt Cutter', 'Ball Bearing')'
Incoming string from table: 'Bolt\\Nut\\Bolt Cutter\\Bolt, Cutter\\Ball-Bearing\\Nut\\Bolt; Cutter\\Bolt\\Ball Bearings'
Incoming string to list...: ['Bolt', 'Nut', 'Bolt Cutter', 'Bolt, Cutter', 'Ball-Bearing', 'Nut', 'Bolt; Cutter', 'Bolt', 'Ball Bearings']
Sorted incoming list......: ['Ball Bearings', 'Ball-Bearing', 'Bolt', 'Bolt', 'Bolt Cutter', 'Bolt, Cutter', 'Bolt; Cutter', 'Nut', 'Nut']
Cleansed incoming list:...: ['Bolt', 'Bolt Cutter', 'Ball Bearing']
Cleansed string...........: 'Bolt\\Bolt Cutter\\Ball Bearing'
Can you see any circumstance where the input string may contain characters that would cause an incorrect result?
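One thing the output above already hints at: 'Ball Bearing' lands in the cleansed list even though the incoming string only contains 'Ball-Bearing' and 'Ball Bearings'. That appears to be because dedupe_and_sort returns a delimited string, so `item in target` inside get_permitted_items is a substring test rather than a list membership test. Below is a minimal sketch of an exact-match variant for comparison. The names `get_permitted_items_exact`, `VETTED` and `DELIMITER` are hypothetical and introduced here only for illustration; they are not part of the code above.

```python
DELIMITER = r'\\'  # two literal backslashes, same delimiter as in the code above

# assumed to mirror the tuple returned by vetted_items()
VETTED = ("Bolt", "Bolt Threader", "Bolt Cutter", "Ball Bearing")


def get_permitted_items_exact(input_string, vetted=VETTED, delimiter=DELIMITER):
    ''' hypothetical variant: keep only whole items that exactly match a vetted entry '''
    # split and strip first, so the comparison is item against item, not substring in string
    distinct_items = {x.strip() for x in input_string.split(delimiter)}
    # drop empty items produced by leading, trailing or doubled delimiters
    distinct_items.discard('')
    # set intersection is an exact (case-sensitive) match; sorted() returns a list
    return sorted(distinct_items.intersection(vetted))


incoming = 'Bolt\\\\Nut\\\\Bolt Cutter\\\\Bolt, Cutter\\\\Ball-Bearing\\\\Nut\\\\Bolt; Cutter\\\\Bolt\\\\Ball Bearings'
exact = get_permitted_items_exact(incoming)
print(exact)                  # ['Bolt', 'Bolt Cutter']
print(DELIMITER.join(exact))  # Bolt\\Bolt Cutter
```

With this variant the result is sorted alphabetically rather than following the order of the vetted tuple, and 'Ball Bearings' no longer counts as a match for 'Ball Bearing'. Matching stays case-sensitive, and an item that itself ends in a backslash would still shift where the split happens, so those two cases may be worth checking against the real table data.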