Building a DoublyLinkedList in Python - - append method

As @enoren5 is using this code to learn about programming and design isn’t worth pointing out issues?

Where I have seen harm is where __repr__ that returns huge strings is put into logs.

Yeah, but what I (and evidently others) am not understanding is how this is any more of an issue than the stdlib list, which does the same thing with both its __str__ and its __repr__, outputting the entire list to a string in a very similar fashion.

If anything, its much less of a problem; as the above results illustrate, the relative performance of str() to initial construction is ≈2 OoM higher for a DoublyLinkedList vs a list, with str() roughly a fourth the time to construct the list, and less than that factor slower than str() for a list in absolute terms. In other words, compared to a list that takes the same time to construct (of ≈50x greater size), str() on the DoublyLinkedList is around 13x faster in absolute terms:

%timeit create_dll(10_000)
24.9 ms ± 1.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(range(500_000))
26.5 ms ± 981 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

dll_long = create_dll(10_000)
lst_verylong = list(range(500_000))

%timeit str(dll_long)
8.83 ms ± 778 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit str(lst_verylong)
113 ms ± 2.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Of course, this disregards the actual I/O overhead to print the output to the screen or write it to a file, but this will be essentially the same for a list as a DoublyLinkedList even in absolute terms, so I don’t see how the latter is any worse in that regard.

Is there something we’re missing here?

From a design perspective, I would point out though that only __str__ is defined, not __repr__, and the former falls back to the latter but not the converse. The simplest course of action here would be to make the current __str__ here the __repr__, so both are useful.

However, it would be better still for the DoublyLinkedList constructor to be able to accept an iterable (e.g. a list) to store its elements into a DoublyLinkedList (superseding the need for the create_dll helper function), i.e. something like

class DoublyLinkedList:
    def __init__(self, initial_values=()):
        self.head = None
        self.tail = None
        self.length = 0
        for value in initial_values:
            self.add_right(value)

With that change, you can create a new DoublyLinkedList by simply calling its constructor, and easily convert other iterables to a DoublyLinkedList:

>>> dll_short = DoublyLinkedList(range(10))
>>> print(dll_short)
Doubly Linked List: [0 <-> 1 <-> 2 <-> 3 <-> 4 <-> 5 <-> 6 <-> 7 <-> 8 <-> 9]

It then follows that a reasonable __repr__ would be

    def __repr__(self):
        current = self.head
        elements = []
        while current:
            elements.append(current.value)
            current = current.next
        return f"DoublyLinkedList({elements!r})"

Which would give

>>> dll_short = DoublyLinkedList(range(10))
>>> dll_short
DoublyLinkedList([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

This has the desirable and idiomatic property for repr() that if the resulting string is evaluated (pasted into a console, etc),. it reconstructs the original object (whereas the __str__ is formatted for user-friendly “pretty” display).

Oh I see what you are getting at, yes using a list objects repr is just as good or bad, depending on context.

As @kknechtel pointed out and to re-iterate, my __str__ is for testing and debugging, not for production.

This is a good point. Logs in a production enviornment would be total chaos for a list input to the tune of one million or more nodes handled by my custom DLL script.

Since the way my __str__ (or similar __repr__) is written isn’t best practices, @barry-scott , what would you recommend as an ideal alternative?

These bencharks are excellent. Thank you. To add to your analysis, below is Big-O chart comparing operations to elements. @CAM-Gerlach, where would you position your benchmark results on the __str__ on the digram below? Would it be fair to say that while the custom DLL script (“short” clocking in at 7.56us) and built in list (also for “short” but at 2.08ns) are both O(n) (and not O(1)) just with the custom DLL script being slightly less performant than the other?

Well, if your __str__ doesn’t follow best practices in that it outputs all the elements, then neither does the built-in list either. IMO, it’s really the responsibility of the consumer to use it appropriately, and use prettyprint, manual truncation, simply not dumping the full object to logs, etc., in situations where this can potentially be an issue (which I’ve indeed spent a good bit of time dealing with myself).

I’d actually mentioned it above, but my post text got wiped in a browser crash and I missed including it the second time. No need for the fancy chart—I think you can presume that anyone answering you here understands the basics of what big-O notation means :slight_smile: — but all of the operations above appear to scale roughly linearly and in fact nearly directly with length, i.e. O(n), with list(range(n)) having a relatively high per-call overhead/y-intercept (given list(range(1)) takes essentially the same time as list(range(10)).

1 Like