New here - great community, looking forward to contributing.
Good python, or being ‘pythonic’ is a combination of code style, architecture patterns, design approach, documentation, testing and writing code in a way that leverages the performance and readability of the language and the standard library.
A good place to start the ‘pythonic’ journey is PEP 8 – Style Guide for Python Code | peps.python.org and the great tutorials that teach the unique and powerful features of python: Python For Beginners | Python.org
TL;DR
The code here is a python anti-pattern. Possibly the worst python code ever written
Coming from years of furious ANSI’ish C programing for embedded systems, becoming ‘pythonic’ was, and still is, a struggle.
Tired, under pressure and on a midnight deadline, this piece of C’ish, python vomit got the job done!
A good example of the wrong way to be pythonic and maybe useful for a giggle.
Please be kind
Background
I was leading a not-for-profit community organisation and for various reasons needed to call a general meeting with a prescribed notice period. To make dates work, the notice had to be sent by
midnight. It was 8pm.
It was decided that the recipient list needed to be broad. We had an Excel sheet with member emails (various formats), maintained inconsistently; an old mailing list in a Word document; several CC copies from MS Outlook; miscellaneous other lists; fragments updating addresses etc. In all, a large number of email addresses, in different formats, with many duplicates.
The Task
Combine all lists (manual paste to a single text file), strip noise (mailto: http: etc) parse various formats, check syntax accuracy, identify duplicates and produce a clean list for pasting into a BCC
field.
After years of C code with nothing but a cut-down ANSI C standard library it seemed that the elegance, utility and power of Python should get the job done easily and quickly.
3 hours and 50 minutes of fatigued, list indices debugging later, the email flew with a good result! This was prior to discovering a good python IDE too - was still using Eclipse C/C++ with the pydev
plugin.
The Code (Vomit)
"""
Clean up and de-dupe a heap of email addresses from different
sources with inconsistent formatting.
"""
with open('list1.txt') as f:
lines = f.readlines()
block = " ".join(lines)
estart= ['<', '(']
eend = ['>', ')']
delims = [';', ',', ' ', '\n']
class StripDetect():
def __init__(self):
self.strip_prefixes = [ 'mailto:', 'http://', 'https://' ]
self.strip = []
self. match_array = []
for d in delims:
for s in self.strip_prefixes:
self.strip.append(d+s)
self.match_array.append(0)
self.buf = ''
self.matching_count = 0
self.unique_match = False
self.matching = False
self.last_complete_i = None
def add_check(self, c):
self.buf += c
for i in range(0, len(self.strip)):
#ss = self.strip[i][:self.match_array[i]+1]
ssi = self.strip[i]
mai = self.match_array[i]
ss = ssi[:mai+1]
buf_end = self.buf[-self.match_array[i]-1:]
if ss in buf_end:
self.matching = True
self.match_array[i] += 1
else:
self.match_array[i] = 0
partials = 0
completes = 0
last_complete_i = None
if self.matching:
for i in range(0, len(self.strip)):
if self.match_array[i] > 0:
partials +=1
if self.match_array[i] == len(self.strip[i]):
completes +=1
last_complete_i = i
if completes == 1 and partials == 1:
for i in range(0, len(self.match_array)):
self.match_array[i] = 0
return len(self.strip[last_complete_i])
else: return 0
name_buf = ""
email_buf = ""
in_name = False
in_email = False
emails = []
names = []
sd = StripDetect()
name_count = 0
for i in range(0, len(block)):
c = block[i]
if sd.add_check(c) != 0:
# match - fix it
name_buf = ''
email_buf = ''
else:
if c not in delims:
if c in estart:
in_email = True
elif c in eend:
if not in_email:
raise ValueError('Closing > found before opening <')
if name_count <= 1:
in_email = False
in_name = False
emails.append(email_buf)
names.append(name_buf)
email_buf = ""
name_buf = ""
name_count = 0
else:
raise ValueError('Delimited names without assoicated email')
elif in_email:
email_buf = email_buf + c
elif in_name and c == '@':
# switch it to email
email_buf = name_buf
name_buf = ""
in_name = False
in_email = True
email_buf += c
else:
in_name = True
name_buf = name_buf + c
else:
# is delim
if in_name:
if c == ' ':
# spaces ok in names
name_buf += c
else:
in_name = False
name_count +=1
#names.append(name_buf)
#name_buf = ""
elif in_email:
#due to email detected by '@' while in name rather than <>
if name_count == 0:
in_email = False
in_name = False
emails.append(email_buf)
names.append(name_buf)
email_buf = ""
name_buf = ""
#name_count = 0
else:
raise ValueError('Token was email but name already in name buf')
unique = {}
def add(i):
unique[emails[i].lower()] = [emails[i], names[i]]
for i in range(0, len(emails)):
if emails[i].lower() in unique.keys():
if len(names[i]) > len (unique[emails[i].lower()][1]):
add(i)
else:
add(i)
sorted_list_tup = sorted(unique.items())
res = ''
for _i, v in sorted_list_tup:
e, n = v
if n == '':
recipient = f'{e}'
else:
recipient = f'{n} <{e}>'
print(recipient)