XOR operator between bytes


What do you think about supporting the XOR operator (^) between bytes objects?

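Equivalent Python code, added to a subclass of bytes (XorBytes is just an illustrative name):

    class XorBytes(bytes):
        def __xor__(self, other):
            """Override the ^ operator to XOR bytes element-wise."""
            # zip() stops at the shorter operand, so any extra bytes are dropped.
            return bytes(x ^ y for x, y in zip(self, other))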


Why not support all bitwise operations, then?

I'm not closely familiar with applications for this in Python. My experience with bit masking is operating on integers to set individual bits that turn register values on or off when configuring peripherals on microcontrollers, but that's in C.


To be honest, I proposed XOR because it's the only operator I've found that I need 🙂

But hey, your idea is cool; I wasn't even aware of these possibilities for bytes objects.

XOR is required in some security-related calculations, such as Miyaguchi-Preneel compression, where several messages are XORed together. I find it really useful.

How would this (any of the bitwise operators) work with bytes of unequal lengths, i.e. len(self) != len(other)?
As written, zip silently drops the excess bytes of the longer operand, but I doubt that is intentional.

I see three basic options for unequal lengths.

  • Raise an exception if the two aren’t the same length.
  • Behave like zip and truncate to the shorter length.
  • Once the shorter one runs out, just take the other’s byte values verbatim. This would be analogous to or-ing with 0, and-ing with 255, and xor-ing with 0 for those bytes, all of which are no-ops.

It's hard to say without concrete use cases, but I suspect the typical use would be between two bytes objects of equal length, in which case the second and third options would hide what is probably an error.

Indeed, those are the possibilities. Or a fourth:

  • extend the shorter with zero bytes

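This matches what int.from_bytes(…, byteorder='little') does: a shorter little-endian byte string behaves as if padded on the right with zero bytes, so the behavior would be consistent between bytes and int.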
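A small sketch of this fourth option, shown for AND because that is where it differs from the third option (for XOR and OR, zero-extension and pass-through give the same bytes). Both helper names are hypothetical; the second one goes through little-endian integers to show the consistency with int.

```python
def and_zero_extend(a: bytes, b: bytes) -> bytes:
    # Pad the shorter operand on the right with zero bytes, then AND byte by byte.
    n = max(len(a), len(b))
    return bytes(x & y for x, y in zip(a.ljust(n, b"\0"), b.ljust(n, b"\0")))


def and_via_little_endian_int(a: bytes, b: bytes) -> bytes:
    # Interpreting both operands as little-endian ints implicitly zero-extends the shorter one.
    n = max(len(a), len(b))
    value = int.from_bytes(a, "little") & int.from_bytes(b, "little")
    return value.to_bytes(n, "little")
```

With b"\xff\xff\xff" and b"\x0f", both return b"\x0f\x00\x00", whereas the pass-through option (AND-ing the tail with 255) would keep b"\x0f\xff\xff".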

My question was meant to clarify what exactly is being proposed.

Let me introduce the first use case:

As a user, I would like to use these operators in security operations where both bytes objects have the same length (plaintext or ciphertext blocks).

And for plaintext whose length is not a multiple of the cipher block length, you'd extend the plaintext, right?
So that’s zip(…, …, strict=True).

Apologies, could you rephrase that?
Basically, most crypto work is based on fixed-size blocks, so I meant that if the blocks were not the same size, I would like to be informed about it (via an exception).

So that's the first use case.
I'll try to come up with a new one soon 😃


By "fixed" I meant that in this use case both operands would be the same length, because both have the same fixed size.

About extending: I assume that if the plaintext length is not a multiple of the cipher block length, the code will extend the last plaintext block with additional bytes to match the block length before XOR-ing.
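One common way to do that extension is PKCS#7-style padding, sketched below under the assumption of 16-byte blocks; the helper names are hypothetical. After padding, every block has the same length, so a strict XOR never sees mismatched operands.

```python
BLOCK_SIZE = 16  # assumption: 16-byte cipher blocks, as in AES


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append n copies of the byte value n, where 1 <= n <= block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split already padded data into consecutive block_size chunks."""
    if len(data) % block_size:
        raise ValueError("data length is not a multiple of the block size")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


padded = pkcs7_pad(b"attack at dawn, please")  # 22 bytes -> 32 bytes
blocks = split_blocks(padded)
```

Note that already aligned input still gets a full extra block of padding, so the padding can always be removed unambiguously.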

About strict zip: it fails on unequal lengths; see PEP 618 (Add Optional Length-Checking To zip).

Does that clarify my remark?

Hello, yes, I think I get the point now, thanks!

Regarding extending: no, this use case does not extend the plaintext to fit anything. In these particular functions, the messages (plaintext) and the cipher blocks have the same length.

Regarding strict zip: yes, I get it. But as I said, the lengths are the same in this use case, so strict=True would be used in the equivalent Python code.

I would propose finding more use cases before making any decision.

It was already discussed earlier.

It would be strange to only implement ^, but not |, & and ~.

But bytes objects are collections, and the | and & operators are already defined for sets, which are also collections; some set methods accept arbitrary iterables, including bytes objects. I am afraid this could lead to errors.

I think it would be better to use functions for bitwise operations on bytes; they are less ambiguous. You could also add other useful functions: set or clear bits in a specified range, test whether all bits in a specified range are set or clear, shift bits, and so on. You can also look at existing implementations of bitarrays or bitsets.
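A sketch of what such a function-based API might look like. All names here are hypothetical, the operations require equal lengths, and the bit-range helper assumes bit 0 is the most significant bit of data[0]; a real design would have to settle that convention explicitly.

```python
def _same_length(a: bytes, b: bytes) -> int:
    if len(a) != len(b):
        raise ValueError(f"operands differ in length: {len(a)} != {len(b)}")
    return len(a)


def bitwise_and(a: bytes, b: bytes) -> bytes:
    n = _same_length(a, b)
    return (int.from_bytes(a, "big") & int.from_bytes(b, "big")).to_bytes(n, "big")


def bitwise_or(a: bytes, b: bytes) -> bytes:
    n = _same_length(a, b)
    return (int.from_bytes(a, "big") | int.from_bytes(b, "big")).to_bytes(n, "big")


def bitwise_xor(a: bytes, b: bytes) -> bytes:
    n = _same_length(a, b)
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).to_bytes(n, "big")


def bitwise_not(a: bytes) -> bytes:
    # Flip every bit; masking with 0xFF keeps each result in the 0-255 byte range.
    return bytes(~x & 0xFF for x in a)


def all_bits_set(data: bytes, start: int, stop: int) -> bool:
    """True if bits start..stop-1 are all 1 (bit 0 = MSB of data[0], assumed)."""
    width = len(data) * 8
    if not 0 <= start <= stop <= width:
        raise ValueError("bit range out of bounds")
    mask = ((1 << (stop - start)) - 1) << (width - stop)
    return (int.from_bytes(data, "big") & mask) == mask
```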

Do you mean adding methods like bytes.bitwise_xor, bytes.bitwise_or, bytes.bitwise_not, and bytes.bitwise_and?
Or do you mean adding functions to the binascii or hashlib module?

One advantage of operators is that bytearray could implement in-place operations naturally (|=, ^=, and &=). But this is not a big deal.

By the way, an int.from_bytes() + int.to_bytes() round trip is roughly 10x faster than the common idioms in this benchmark:

    # zip(..., strict=True) requires Python 3.10+; calling int.from_bytes() and
    # int.to_bytes() without a byteorder argument (default "big") requires 3.11+.
    def xor_bytes_list(a, b):
        return bytes([(aa ^ bb) for (aa, bb) in zip(a, b, strict=True)])

    def xor_bytes_generator(a, b):
        return bytes((aa ^ bb) for (aa, bb) in zip(a, b, strict=True))

    def xor_bytes_via_int(a, b):
        if len(a) != len(b):
            raise ValueError(f"a and b must have same length; {len(a)=} {len(b)=}")
        aa = int.from_bytes(a)
        bb = int.from_bytes(b)
        return (aa ^ bb).to_bytes(len(a))

    $ rye run python xor_bytes.py
    list: Mean +- std dev: 7.59 us +- 0.12 us
    generator: Mean +- std dev: 10.1 us +- 0.3 us
    int: Mean +- std dev: 685 ns +- 7 ns

FWIW, Perl has full support for bitwise operations on two strings, which is really convenient for cipher work. It handles strings of unequal length by extending the shorter string with zero bits for OR and XOR, and by truncating the longer string to the length of the shorter for AND:

Bitstrings of any size may be manipulated by the bitwise operators (~ | & ^).

If the operands to a binary bitwise op are strings of different sizes, | and ^ ops act as though the shorter operand had additional zero bits on the right, while the & op acts as though the longer operand were truncated to the length of the shorter. The granularity for such extension or truncation is one or more bytes.

    # ASCII-based examples
    print "j p \n" ^ " a h";        # prints "JAPH\n"
    print "JA" | "  ph\n";          # prints "japh\n"
    print "japh\nJunk" & '_____';   # prints "JAPH\n";
    print 'p N$' ^ " E<H\n";        # prints "Perl\n";
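For comparison, the Perl semantics described above can be sketched in Python on bytes. The perl_* names are hypothetical; OR and XOR zero-extend the shorter operand, and AND truncates to the shorter one, which is exactly what plain zip() does.

```python
def _zero_extend(a: bytes, b: bytes) -> tuple[bytes, bytes]:
    # Pad the shorter operand on the right with zero bytes.
    n = max(len(a), len(b))
    return a.ljust(n, b"\0"), b.ljust(n, b"\0")


def perl_xor(a: bytes, b: bytes) -> bytes:
    a, b = _zero_extend(a, b)
    return bytes(x ^ y for x, y in zip(a, b))


def perl_or(a: bytes, b: bytes) -> bytes:
    a, b = _zero_extend(a, b)
    return bytes(x | y for x, y in zip(a, b))


def perl_and(a: bytes, b: bytes) -> bytes:
    # zip() stops at the shorter operand, i.e. the longer one is truncated.
    return bytes(x & y for x, y in zip(a, b))
```

Applied to the Perl examples, perl_xor(b"j p \n", b" a h") gives b"JAPH\n" and perl_and(b"japh\nJunk", b"_____") gives b"JAPH\n".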