Add a new token to force methods to return the previous object

1. Grammar (PEG)

I thought of this token (&) because in C it’s an address operator; it’s not the same thing, but there’s a similarity.

primary:
    | primary '.' '&'? NAME
    | primary genexp 
    | primary '(' [arguments] ')' 
    | primary '[' slices ']' 
    | atom

2. Motivation

There are classes with methods that don’t return the instance itself, instead returning None, which forces me to create a variable in certain situations.

Before:

>>> received_checksum = s.recv_exact(2)
>>> crc = CRC16(b'foo')
>>> crc.update(b'bar')
>>> calculated_checksum = crc.digest()
>>> assert received_checksum == calculated_checksum

It’s not very attractive code, is it? I was forced to create a variable in order to use the update() method.

I spend more time thinking of a good variable name than actually coding, but there are times when you don’t want to create a variable or fill your code with utility functions.

After:

>>> received_checksum = s.recv_exact(2)
>>> calculated_checksum = CRC16(b'foo').&update(b'bar').digest()
>>> assert received_checksum == calculated_checksum

3. Behavior

When the . token is followed by &, the expression should return the object that was evaluated before the getter.

This should not change the result of __getattribute__, but the expression should force the return of the evaluated object.

Internally, Python should evaluate the expression preceding the getter, complete the getter, but return just what was evaluated before the getter.
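
For anyone who wants to experiment, the behavior described above can be approximated in current Python with a small helper. This is only a sketch: `and_call` is a hypothetical name, not part of the proposal or of any library, and it takes the method name as a string because the .& syntax does not exist.

```python
from hashlib import sha1


def and_call(obj, name, *args, **kwargs):
    """Rough emulation of obj.&name(*args, **kwargs) (hypothetical helper)."""
    getattr(obj, name)(*args, **kwargs)  # normal attribute lookup and call
    return obj                           # the object evaluated before the getter


# Rough equivalent of: sha1(b'foo').&update(b'bar').hexdigest()
digest = and_call(sha1(b'foo'), 'update', b'bar').hexdigest()
print(digest)
```

The result of update() (None) is simply discarded, which is the essential difference from a plain method call.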

Examples

SHA1 (can be any hashlib class)

>>> from hashlib import sha1
>>>
>>> """
... Signature: sha1(data=b'', *, usedforsecurity=True, string=None)
... Docstring: Returns a sha1 hash object; optionally initialized with a string
... """
>>>
>>> sha1()
<sha1 _hashlib.HASH object at ...>
>>>
>>> sha1(b'foo').update(b'bar') # None
>>>
>>> sha1(b'foo').&update(b'bar')
<sha1 _hashlib.HASH object at ...>
>>>
>>> sha1(b'foo').&update(b'bar').hexdigest()
'8843d7f92416211de9ebb963ff4ce28125932878'

Queue

>>> from queue import Queue
>>>
>>> """
... Init signature: Queue(maxsize=0)
... Docstring: Create a queue object with a given maximum size.
... """
>>>
>>> # Creates a queue containing 3 items.
>>> q = Queue(); q.put(1); q.put(2); q.put(3)
>>> # Or:
>>> q = Queue()
>>> for n in (1, 2, 3): q.put(n)
>>> # Or:
>>> q = Queue().&put(1).&put(2).&put(3)
>>>
>>> # Drop two items, get the next. 
>>> x = q.get(); q.get(); x = q.get()
>>> # Or:
>>> for _ in range(3): x = q.get()
>>> # Or:
>>> x = q.&get().&get().get()

sorted() vs list.sort() or reversed() vs list.reverse()

>>> # You probably prefer using `sorted()` rather than `list.sort`, but it always creates a new list, therefore it's slower and uses more memory.
>>> sorted([3, 2, 1])
[1, 2, 3]
>>>
>>> # Here, the list itself is sorted without a copy (much faster).
>>> [3, 2, 1].&sort()
[1, 2, 3]
>>>
>>> # This result is equivalent to sorted(), but slightly slower; prefer sorted().
>>> [3, 2, 1].copy().&sort()
[1, 2, 3]

4. Considerations

  1. Obviously its use is debatable and there would be cases where someone would overuse it, but that didn’t stop the walrus operator (:=) from being added.
  2. It is (or should be) a rule that methods created solely to change the object’s state should return None instead of self to differentiate them from methods that create copies of the object and return them.
  3. I discourage the use of this token in some methods that return anything other than None, as it could cause confusion:
    >>> mylist = [1, 2, 3, 4]
    >>> mylistcopy = mylist.&copy()
    >>> assert mylist is not mylistcopy, 'are the same list'
    ---------------------------------------------------------------------------
    Traceback (most recent call last)
    ...
    AssertionError: are the same list
    
  4. Should Python be too strict to prevent the user from making mistakes? I don’t think so; perhaps this task could be assigned to linters as a bad practice. I believe that natively preventing this would make it less flexible, less free.

I think you meant
calculated_checksum = CRC16(b'foo').update(b'bar').&digest()
The idea is to let consecutive in-place-modifying methods be written as chained calls, but at the cost of introducing cryptic syntax. I think this idea has a very low chance of being accepted: Python usually prefers verbosity over sacrificing readability, and blending in-place and chained methods feels like homogenizing heterogeneous things, which might be considered bad practice.

What would typically be considered first for your case:
1: Use the digest method from the CRC16 class:
CRC16.digest(CRC16(b'foo').update(b'bar'))
2: Create a special “chainer” class to wrap everything you want into chainable methods.


These toy examples could just as easily be written:

>>> [2, 1]
[2, 1]
>>> sorted([2, 1])
[1, 2]
>>> received_checksum = s.recv_exact(2)
>>> calculated_checksum = CRC16(b'foobar').digest()
>>> assert received_checksum == calculated_checksum

Do you have examples where you would need to iteratively modify an object on a single line that could not also be simplified so easily? I see the utility of what you propose, and I’m sure it would be handy here and there, but the complex use cases I can think of would already need multiple lines, e.g.,

received_checksum = s.recv_exact(2)
crc = CRC16(initializer)
while block := s.recv(MAX_CHUNK):
    crc.update(block)
calculated_checksum = crc.digest()
assert received_checksum == calculated_checksum

Adding syntax needs a strong justification, and I don’t see one yet.


I’m not a fan of new syntax that encourages methods with in-place modification.

But on the other hand, this would further encourage in-place modifying methods to return None, which would be a gain.


I don’t think so, the expected behavior would be an AttributeError.

>>> CRC16(b'foo')
<crc.CRC16 at 0x7faae0e37e00>
>>> CRC16(b'foo').update(b'bar') # None
>>> CRC16(b'foo').update(b'bar').&digest()
---------------------------------------------------------------------------
Traceback (most recent call last)
...
AttributeError: 'NoneType' object has no attribute 'digest'

So it should be:

>>> CRC16(b'foo')
<crc.CRC16 at 0x7faae0e37e00>
>>> CRC16(b'foo').&update(b'bar')
<crc.CRC16 at 0x7faae0e37e00>
>>> CRC16(b'foo').&update(b'bar').digest()
b'\x00\x00'

Option 1:

>>> CRC16.digest(CRC16(b'foo').update(b'bar'))
---------------------------------------------------------------------------
Traceback (most recent call last)
...
AttributeError: 'NoneType' object has no attribute 'integer'

Option 2: The idea is precisely to reduce the number of utilities and solve the problem elegantly by using the token (&) as a language feature.


Why are you asking for a whole new language feature rather than just asking for update to be changed to return self?


This is kind of the entire point of the builder pattern, to have a long-lived object whose state modification spans multiple statements. If you can do it all in one expression, why not something like

CRC16.from_list([b'foo', b'bar']).digest()

To be clear, I’m advocating better APIs in place of modifying the language to accommodate existing APIs.


It’s a well established language feature at this point that methods that perform in-place modification shouldn’t return self. So that you can tell by looking at

def f(x: X)->X:
  return x.with_y("y").sorted()

that this won’t change the original x.

At least in the Python ecosystem I work in. It is genuinely useful that this convention exists. It means I can tell what code does without having to read all the definitions.

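class Builder:
    """Wrap an object so that every method call returns the wrapper itself."""
    def __init__(self, obj): self.obj = obj
    def __getattribute__(self, name):
        # Give direct access to the wrapped object.
        if name == "obj": return object.__getattribute__(self, name)
        att = getattr(self.obj, name)
        if callable(att):
            # Call the real method, discard its result, and return the wrapper.
            def caller(*a, **kw):
                att(*a, **kw)
                return self
            return caller
        return att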

Or pick whatever rules you like. Maybe only return self if the original method returned None, or maybe nominate specific methods, or whatever makes sense for your code.
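
As a sketch of the "only return self if the original method returned None" rule mentioned above, assuming a hypothetical wrapper named `Chain` (the name and the use of `__getattr__` are illustrative choices, not anything from hashlib or the standard library):

```python
from hashlib import sha1


class Chain:
    """Hypothetical wrapper: calls that return None return the wrapper instead."""

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # Only reached for names not found on Chain itself, so _obj is safe here.
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr

        def caller(*args, **kwargs):
            result = attr(*args, **kwargs)
            # In-place methods (returning None) keep the chain going;
            # any other result is handed back unchanged.
            return self if result is None else result

        return caller


digest = Chain(sha1(b'foo')).update(b'bar').hexdigest()
print(digest)
```

With this rule the final hexdigest() call leaves the chain on its own, so there is no need to unwrap the object by hand.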


I know. And now that’s been made clear, it’s fairly obvious why adding a language feature that lets people work around that (deliberate, and IMO sensible) design choice is unlikely to get anywhere.

Sorry for the straw-man question - I was probably being a bit of a smart-ass :slightly_frowning_face:


I wonder though, to what extent .& would really annul the advantages of that convention. I mean you’d instantly be able to tell that

def f(x: X)->X:
  return x.&add_y("y").sorted()

is naughty but

def f(x: X)->X:
  return x.with_y("y").&sort()

is fine.

Actually that last one made my heart stop for a moment, so maybe it’s not so nice for code readers.

There’d also be the new foot-gun that

def f(x: X)->X:
  return x.with_y("y").&sorted()

(where the effect of the .sorted method gets discarded because it doesn’t do any in-place modification)
is buggy but not error-raising and not noticeable to type checkers.
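
To make that foot-gun concrete, here is a small sketch in current Python. The class `X` and the helper `and_call` are hypothetical stand-ins: `and_call(obj, "name")` plays the role of `obj.&name()`.

```python
from dataclasses import dataclass, replace


def and_call(obj, name, *args, **kwargs):
    """Hypothetical stand-in for obj.&name(...): call, discard the result, return obj."""
    getattr(obj, name)(*args, **kwargs)
    return obj


@dataclass(frozen=True)
class X:
    items: tuple = ()

    def sorted(self):
        # Returns a sorted copy; the original instance is left untouched.
        return replace(self, items=tuple(sorted(self.items)))


x = X((3, 1, 2))
good = x.sorted()             # the sorted copy
bad = and_call(x, "sorted")   # like x.&sorted(): the copy is silently thrown away
print(good.items, bad.items)
```

Nothing raises here, and both expressions have type X, which is why a type checker would have nothing to complain about.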

All in all I’m -0, but I think it is an interesting proposal.


Ignore that.


In my real-world example, I should use update(data) because concatenating bytes is more expensive as it creates a new copy of the bytes in memory, whereas update(data) will only receive the reference and iterate through it.

>>> # Receives first 6 bytes (message header)
>>> header = s.recv_exact(6)
>>> # Gets the length of message data.
>>> data_length = header[-1]
>>> # Receives the remaining message data.
>>> data = s.recv_exact(data_length)
>>> # Receives CRC-16 from message.
>>> received_checksum = s.recv_exact(2)
>>> # Calculates the checksum.
>>> calculated_checksum = CRC16(header + data).digest()
>>> # Test checksums.
>>> assert received_checksum == calculated_checksum

I see only a few options:

  1. I continue to use update() (very easy, but ugly).
  2. I use bytearray() + recv_exact_into(buffer, size, offset) (unnecessary complexity).
  3. I create utilities (which is precisely what I want to avoid from the start).

Excellent! For my class, this is a great idea; it’s within my reach. I can use argument packing (*data), and then I don’t need to call update() or concatenate bytes (which is expensive).

def __init__(self, *data: Iterable[int]) -> None: ...

calculated_checksum = CRC16(header, data).digest()

At least for this specific case, since __init__ and update() are practically the same thing; otherwise, it wouldn’t be so simple.


Those feel like reasonable options to weigh for your code base, not justifications for adding language syntax. Personally, I would probably just use update() if I only did it once, and factor it out into a function if I did it many times:

def checksum(chunks: Iterable[bytes]) -> bytes:
    crc = CRC16()
    for chunk in chunks:
        crc.update(chunk)
    return crc.digest()

Though, it looks from your most recent post that you control the class, so it seems like you could do anything you like, including making update() return self.


I could actually make update() return self, but apparently that’s not a very good idea in methods that only change the object’s state, unless it’s a copy of the object.

Although I used my own class in the examples, this also applies to third-party classes, or even those from the native hashlib library that exhibit the same behavior.

>>> from hashlib import sha1
>>>
>>> """
... Signature: sha1(data=b'', *, usedforsecurity=True, string=None)
... Docstring: Returns a sha1 hash object; optionally initialized with a string
... """
>>>
>>> sha1()
<sha1 _hashlib.HASH object at ...>
>>>
>>> sha1(b'foo').update(b'bar') # None
>>>
>>> sha1(b'foo').&update(b'bar')
<sha1 _hashlib.HASH object at ...>
>>>
>>> sha1(b'foo').&update(b'bar').hexdigest()
'8843d7f92416211de9ebb963ff4ce28125932878'

I’d be much happier with, for example, allowing the hashlib functions to accept an iterable or file-like than adding a whole new syntax for those otherwise slightly tedious APIs.

sha1([b'foo', b'bar']).hexdigest()
# or
with open("big-file", "rb") as f:
    digest = sha1(f).hexdigest()
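
hashlib does not accept an iterable or a file object in the constructor today, so the calls above are a wish rather than a working API. A minimal sketch of how the same convenience could be had with plain functions follows; `sha1_chunks` and `sha1_file` are hypothetical helper names. For the file case, Python 3.11 and later also ship `hashlib.file_digest()`.

```python
import io
from hashlib import sha1


def sha1_chunks(chunks):
    """Hash an iterable of bytes-like chunks without concatenating them."""
    h = sha1()
    for chunk in chunks:
        h.update(chunk)
    return h


def sha1_file(f, chunk_size=64 * 1024):
    """Hash a binary file-like object by reading it in fixed-size chunks."""
    # iter(callable, sentinel) keeps calling f.read() until it returns b''.
    return sha1_chunks(iter(lambda: f.read(chunk_size), b''))


from_chunks = sha1_chunks([b'foo', b'bar']).hexdigest()
from_file = sha1_file(io.BytesIO(b'foobar')).hexdigest()
print(from_chunks, from_file)
```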

(1) the feature could work everywhere - not just a few methods; (2) it wouldn’t break backwards compatibility on any existing method.

So it makes sense to ask for it. I am not sure I like the proposed syntax, or even whether it would be a compelling enough feature.
On the one hand, it could encourage the creation of more methods that do not return the modified object as a side effect: they can focus on doing one thing.


I believe one obvious thing is that this should not introduce ambiguities.

So, if the “decorated” method call with the new syntax would return anything other than `None`, it should raise a runtime error.
(There is plainly no use in `mylist.&copy()` returning the original list, for example - this should raise.)

Maybe allow `None` or the bound instance itself, instead of strictly `None` - this could make some patterns simpler when alternating methods that already return `self` with ones that return `None`: just decorate all calls. But outside these two cases, if it does not raise an error, I think this feature would be more harmful than anything else.

Other than that, I find it an interesting idea (not a pretty syntax, but let’s bike-shed on that on another occasion).
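
The stricter rule suggested above (raise unless the call returns `None` or the instance itself) could be prototyped like this. It is a sketch only: `strict_and_call` is a hypothetical helper standing in for the `.&` syntax, and RuntimeError is just one possible choice of exception.

```python
def strict_and_call(obj, name, *args, **kwargs):
    """Hypothetical strict form of obj.&name(...)."""
    result = getattr(obj, name)(*args, **kwargs)
    # Accept only None or the instance itself; anything else is treated as misuse.
    if result is not None and result is not obj:
        raise RuntimeError(
            f"{type(obj).__name__}.{name}() returned a value; "
            "expected None or the instance itself"
        )
    return obj


nums = strict_and_call([3, 1, 2], "sort")  # fine: list.sort() returns None

try:
    strict_and_call(nums, "copy")          # list.copy() returns a new list
except RuntimeError as exc:
    error_message = str(exc)
    print(error_message)
```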


What if it returns something else, though? For example, writing to a file returns the amount written - are you unable to chain write calls for that reason?


Write/send operations often receive buffers as arguments, the size of which can be easily determined using len(buffer).

It makes sense for send(buffer) in sockets to return the number of bytes transmitted because partial data transmission can occur, and it doesn’t guarantee complete data transmission.

And write(buffer) is apparently no different; partial writes certainly don’t happen frequently in regular local files, but they can occur in special files such as Unix Sockets, TTYs, FIFO, Pipes, etc.

So this is the kind of scenario where you might not want to be left without knowing how many bytes were written.
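
As an illustration of why the return value matters, here is a sketch of a loop that handles partial writes on a file descriptor. `write_all` is a hypothetical helper, and the pipe is used only to have something to write to; for sockets, `socket.sendall()` already does this.

```python
import os


def write_all(fd, data):
    """Keep calling os.write until every byte of data has been written."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write returns how many bytes were accepted, possibly fewer than offered.
        total += os.write(fd, view[total:])
    return total


read_fd, write_fd = os.pipe()
try:
    sent = write_all(write_fd, b'header' + b'payload')
    received = os.read(read_fd, sent)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(sent, received)
```

If write() were chained with .& and its result discarded, the caller would have no way of noticing a short write.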

Should Python be too strict to prevent the user from making mistakes? I don’t think so; perhaps this task could be assigned to linters as a bad practice. I believe that natively preventing this would make it less flexible, less free.
