How NOT to be Pythonic

Brendan · February 3, 2023, 1:45am

New here - great community, looking forward to contributing.

Good python, or being ‘pythonic’ is a combination of code style, architecture patterns, design approach, documentation, testing and writing code in a way that leverages the performance and readability of the language and the standard library.

A good place to start the ‘pythonic’ journey is PEP 8 – Style Guide for Python Code | peps.python.org and the great tutorials that teach the unique and powerful features of python: Python For Beginners | Python.org

TL;DR

The code here is a python anti-pattern. Possibly the worst python code ever written

Coming from years of furious ANSI’ish C programing for embedded systems, becoming ‘pythonic’ was, and still is, a struggle.

Tired, under pressure and on a midnight deadline, this piece of C’ish, python vomit got the job done!

A good example of the wrong way to be pythonic and maybe useful for a giggle.

Please be kind

Background

I was leading a not-for-profit community organisation and for various reasons needed to call a general meeting with a prescribed notice period. To make dates work, the notice had to be sent by
midnight. It was 8pm.

It was decided that the recipient list needed to be broad. We had an Excel sheet with member emails (various formats), maintained inconsistently; an old mailing list in a Word document; several CC copies from MS Outlook; miscellaneous other lists; fragments updating addresses etc. In all, a large number of email addresses, in different formats, with many duplicates.

The Task

Combine all lists (manual paste to a single text file), strip noise (mailto: http: etc) parse various formats, check syntax accuracy, identify duplicates and produce a clean list for pasting into a BCC
field.

After years of C code with nothing but a cut-down ANSI C standard library it seemed that the elegance, utility and power of Python should get the job done easily and quickly.

3 hours and 50 minutes of fatigued, list indices debugging later, the email flew with a good result! This was prior to discovering a good python IDE too - was still using Eclipse C/C++ with the pydev
plugin.

The Code (Vomit)

"""
Clean up and de-dupe a heap of email addresses from different
sources with inconsistent formatting.
"""

with open('list1.txt') as f:
    lines = f.readlines()
   
block = " ".join(lines)

estart= ['<', '(']
eend = ['>', ')']
delims = [';', ',', ' ', '\n']


class StripDetect():
    def __init__(self):
        self.strip_prefixes = [ 'mailto:', 'http://', 'https://' ]
        self.strip = []
        self. match_array = []
        for d in delims:
            for s in self.strip_prefixes:
                self.strip.append(d+s)
                self.match_array.append(0)
        self.buf = ''
        
        self.matching_count = 0
        self.unique_match = False
        self.matching = False
        self.last_complete_i = None

    def add_check(self, c):
        self.buf += c
        
        for i in range(0, len(self.strip)):
            #ss = self.strip[i][:self.match_array[i]+1]
            ssi = self.strip[i]
            mai = self.match_array[i]
            ss = ssi[:mai+1]
            buf_end = self.buf[-self.match_array[i]-1:]
            if ss in buf_end:
                self.matching = True
                self.match_array[i] += 1
            else:
                self.match_array[i] = 0

        partials = 0
        completes = 0
        last_complete_i = None
        
        if self.matching:
            for i in range(0, len(self.strip)):
                if self.match_array[i] > 0:
                    partials +=1
                    if self.match_array[i] == len(self.strip[i]):
                        completes +=1
                        last_complete_i = i
                        
        if completes == 1 and partials == 1:
            for i in range(0, len(self.match_array)):
                self.match_array[i] = 0
            
            return len(self.strip[last_complete_i])
        else: return 0


name_buf = ""
email_buf = "" 

in_name = False
in_email = False

emails = []
names = []

sd = StripDetect()

name_count = 0

for i in range(0, len(block)):

    c = block[i]
    
    if sd.add_check(c) != 0:
        # match - fix it
        name_buf = ''
        email_buf = ''
    
    else:
        if c not in delims:
            if c in estart:
                in_email = True
            elif c in eend:
                if not in_email:
                    raise ValueError('Closing > found before opening <')
                if name_count <= 1:
                    in_email = False
                    in_name = False
                    emails.append(email_buf)
                    names.append(name_buf)
                    email_buf = ""
                    name_buf = ""
                    name_count = 0
                else:
                    raise ValueError('Delimited names without assoicated email')
            elif in_email:
                email_buf = email_buf + c
            elif in_name and c == '@':
                # switch it to email
                email_buf = name_buf
                name_buf = ""
                in_name = False
                in_email = True
                email_buf += c
            else: 
                in_name = True
                name_buf = name_buf + c
        else:
            # is delim
            if in_name:
                if c == ' ':
                    # spaces ok in names
                    name_buf += c
                else: 
                    in_name = False
                    name_count +=1
                    #names.append(name_buf)
                    #name_buf = ""
            elif in_email:
                #due to email detected by '@' while in name rather than <>
                if name_count == 0:
                    in_email = False
                    in_name = False
                    emails.append(email_buf)
                    names.append(name_buf)
                    email_buf = ""
                    name_buf = ""
                    #name_count = 0
                else:
                    raise ValueError('Token was email but name already in name buf')

unique = {}


def add(i):
    unique[emails[i].lower()] = [emails[i], names[i]]


for i in range(0, len(emails)):
    if emails[i].lower() in unique.keys():
        if  len(names[i]) > len (unique[emails[i].lower()][1]):
            add(i)
    else:
        add(i)
    
sorted_list_tup = sorted(unique.items())

res = ''
for _i, v in sorted_list_tup:
    e, n = v
    if n == '':
        recipient = f'{e}'
    else:
        recipient = f'{n} <{e}>'
    print(recipient)

steven.daprano · March 4, 2023, 9:44am

I’ll be honest here. I don’t see what is so bad about this code. It is certainly not even close to

I wouldn’t necessarily have written it the same way. There is some unidiomatic code, e.g. the reliance on indexing, but honestly without spending a lot of time on refactoring I’m not 100% sure that indexing is such a bad thing here.

It could do with some improvement. But its a first draft of a script written under enormous time pressure and it got the job done in time. Presumably it will never be used again, so in that sense it doesn’t need any improvement and there will never be a second draft.

Its fine for what it is.

damien_mattei · March 5, 2023, 6:11am

i agree the code is not so bad. Anyway i always thought that parsing and analysing I/O files is always a furious task

Brendan · March 6, 2023, 1:44am

you are very kind Steven! If I’d written in C (which is probably what I should have done), at least I’d be able to work out what it does. Looking back at this in python is reads like an obfusicated code competetion entry.