AzureAD UID to Immutable Id

Hello,
just wondering if someone has found a way to convert a UserId fetched from MS AzureAD to an AzureAD ImmutableId? Sure, we can do it with Microsoft PowerShell using the ToBase64String method, but I'd prefer to do it in a Python script. Unfortunately I haven't found out how. A regular Python base64 encode does not do it right directly.

It would help if you could share your code. What does your Python base64 code output, and how is this incorrect? Also, do you have a link to the relevant documentation? Take into account that while people here might be Python experts, the majority of us will not be AzureAD experts.

It looks like they just convert the binary value of a UUID to base64, so the following should be perfectly fine for converting between UUIDs and their base64-encoded versions in Python:

from base64 import b64decode, b64encode
from uuid import UUID

def uuid_to_immutableid(uuid_string: str) -> str:
    return b64encode(UUID(uuid_string).bytes).decode('ascii')

def immutableid_to_uuid(base64_string: str) -> str:
    return str(UUID(bytes=b64decode(base64_string)))

Demo:

>>> uuid_to_immutableid("5c78336d-77d8-df44-8007-d277dad5c569")
'XHgzbXfY30SAB9J32tXFaQ=='
>>> immutableid_to_uuid("XHgzbXfY30SAB9J32tXFaQ==")
'5c78336d-77d8-df44-8007-d277dad5c569'

This seems to fit with this published PowerShell script that does the same; the above UUID and Base64 values were taken from its documentation.

Thanks a lot! I had missed that uuid lib and tried to encode the UID string's bytes to b64, and therefore got it wrong. Your methods do it just right.
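For readers who hit the same problem, here is a minimal sketch of the difference between encoding the UUID's text and encoding its 16 binary bytes. The variable names are only illustrative, and the sketch says nothing about the byte order inside those 16 bytes.

```python
from base64 import b64encode
from uuid import UUID

uuid_string = "5c78336d-77d8-df44-8007-d277dad5c569"

# Encoding the 36-character text form: this is the mistake described above.
from_text = b64encode(uuid_string.encode("ascii")).decode("ascii")

# Encoding the 16 raw bytes that the UUID represents.
from_binary = b64encode(UUID(uuid_string).bytes).decode("ascii")

print(len(from_text), from_text)      # 48 characters
print(len(from_binary), from_binary)  # 24 characters
```

A 24-character result ending in == is what a base64-encoded 16-byte value looks like, so a 48-character result is a quick sign that the text was encoded instead.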

Sorry, I was a bit too excited. If I convert UID 7e26997b-09a9-4f6e-b406-7327e3a5c370 to an ImmutableId with PowerShell I get e5kmfqkJbk+0BnNy46XDcA==, but b64encode(UUID('7e26997b-09a9-4f6e-b406-7327e3a5c370').bytes).decode('ascii') gives me fiaZewmpT260BnMn46XDcA==. So there is still some difference between the conversions.

Then you’ll have to find us better documentation on what exactly is being done.

The PowerShell base64 value is clearly not directly based on the GUID hex value, even though there is some correlation:

>>> from base64 import b64decode
>>> from difflib import Differ
>>> from uuid import UUID
>>> from pprint import pprint
>>> def hexdiff(a, b):
...     a = [format(v, '02x') for v in a]
...     b = [format(v, '02x') for v in b]
...     return Differ().compare(a, b)
...
>>> print(b64decode("e5kmfqkJbk+0BnNy46XDcA==").hex(), UUID('7e26997b-09a9-4f6e-b406-7327e3a5c370').hex, sep="\n")
7b99267ea9096e4fb4067372e3a5c370
7e26997b09a94f6eb4067327e3a5c370
>>> pprint(list(hexdiff(b64decode("e5kmfqkJbk+0BnNy46XDcA=="), UUID('7e26997b-09a9-4f6e-b406-7327e3a5c370').bytes)))
['+ 7e',
 '+ 26',
 '+ 99',
 '  7b',
 '+ 09',
 '- 99',
 '- 26',
 '- 7e',
 '  a9',
 '- 09',
 '+ 4f',
 '  6e',
 '- 4f',
 '  b4',
 '  06',
 '  73',
 '- 72',
 '+ 27',
 '  e3',
 '  a5',
 '  c3',
 '  70']

Note that a UUID consists of various components, so for your data, we have:

| component (byte count) | input | PowerShell output |
| --- | --- | --- |
| time_low (4) | 7e 26 99 7b | 7b 99 26 7e |
| time_mid (2) | 09 a9 | a9 09 |
| time_hi_and_version (2) | 4f 6e | 6e 4f |
| clock_seq_hi_and_reserved (1) | b4 | b4 |
| clock_seq_low (1) | 06 | 06 |
| node (6) | 73 27 e3 a5 c3 70 | 73 72 e3 a5 c3 70 |

which then nicely explains most of the differences. I bet that if you ran the same PowerShell function a few times, you'd get different values as time progresses.

The only difference I then don't know how to apply reliably is the second byte of the node component, which here clearly changed from 27 to 72. Perhaps 72 is the Immutable ID identifier?
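A field-by-field comparison like the one above can also be produced in code. This is a rough sketch: FIELD_SIZES and split_fields are names made up for this example, and the field widths are the ones listed in the comparison.

```python
from base64 import b64decode
from uuid import UUID

# (field name, size in bytes), in the order the fields appear in a UUID.
FIELD_SIZES = [
    ("time_low", 4),
    ("time_mid", 2),
    ("time_hi_and_version", 2),
    ("clock_seq_hi_and_reserved", 1),
    ("clock_seq_low", 1),
    ("node", 6),
]

def split_fields(raw: bytes) -> dict[str, str]:
    """Cut 16 raw bytes into UUID fields, each shown as spaced hex."""
    result = {}
    offset = 0
    for name, size in FIELD_SIZES:
        result[name] = raw[offset:offset + size].hex(" ")
        offset += size
    return result

guid_fields = split_fields(UUID("7e26997b-09a9-4f6e-b406-7327e3a5c370").bytes)
ps_fields = split_fields(b64decode("e5kmfqkJbk+0BnNy46XDcA=="))

for name, _ in FIELD_SIZES:
    marker = "" if guid_fields[name] == ps_fields[name] else "  <-- differs"
    print(f"{name:26} {guid_fields[name]:18} {ps_fields[name]}{marker}")
```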

As far as I know, the PowerShell script does much the same as your Python function uuid_to_immutableid above: it converts the UID to a byte array and then base64-encodes it to a string, as seen in the code below, copied from the PowerShell script used for the conversion:

$guid = [GUID]$valuetoconvert
$bytearray = $guid.tobytearray()
$immutableID = [system.convert]::ToBase64String($bytearray)

Clearly it doesn’t. It appears to create a new GUID based on the old.

Most importantly, the PowerShell value has a 6 in the version position (6e 4f), so either we are interpreting this value entirely wrong, or this is another one of those little- vs big-endian issues that make it such a joy to work with MS data sometimes.

In fact, that’s exactly what is going on. Out of time right now, but the bytes for each time component are listed in reverse order.

I can write code for this tomorrow, perhaps.

Thanks! This is not a great hurry, but it would be nice to have it working some day. Until then I can call the PowerShell script from my Python code to get the right value out of the conversion.

In fact, the PowerShell script will always give me the same ImmutableId for the same UID.

Yes, I already figured out that it’s not a UUID1 with an increasing time value, but that the time_* fields of the UUID are encoded in little-endian order rather than network (big-endian) order. I found the Guid.ToByteArray method documentation which confirms this:

// The example displays output similar to the following:
//
//    Guid: 35918bc9-196d-40ea-9779-889d79b753f0
//    C9 8B 91 35 6D 19 EA 40 97 79 88 9D 79 B7 53 F0
//    Guid: 35918bc9-196d-40ea-9779-889d79b753f0 (Same as First Guid: True)
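As a quick sanity check of that reading of the documentation, the sketch below reverses only the three time fields by hand and leaves the last 8 bytes as they are. The function name guid_bytes_dotnet_order is invented for this example; it is my interpretation of the documented output, not code taken from Microsoft.

```python
from uuid import UUID

def guid_bytes_dotnet_order(guid_string: str) -> bytes:
    """Lay out a GUID with the three time fields in little-endian order."""
    guid = UUID(guid_string)
    return (
        guid.time_low.to_bytes(4, "little")
        + guid.time_mid.to_bytes(2, "little")
        + guid.time_hi_version.to_bytes(2, "little")
        + guid.bytes[8:]  # clock sequence and node stay in their original order
    )

raw = guid_bytes_dotnet_order("35918bc9-196d-40ea-9779-889d79b753f0")
print(raw.hex(" ").upper())
# C9 8B 91 35 6D 19 EA 40 97 79 88 9D 79 B7 53 F0
```

The printed bytes match the second line of the documentation's sample output.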

The only difference that remains then is the 72 vs 27 byte in the node value; could it be that you transposed two digits there?

No, I didn't transpose those. I just did a quick and dirty mod to your function and now it gives the same ImmutableId as the PowerShell script. It reverses the time fields in uuid_string and concatenates them into a new ms_uuid_string, which it then converts to an ImmutableId.

from base64 import b64encode
from uuid import UUID

def uuid_to_immutableid(uuid_string: str) -> str:
    # split uuid_string into its parts
    parts = uuid_string.split('-')
    # time_low: reverse the order of its 4 bytes (2 hex digits each)
    time_low = parts[0][6:] + parts[0][4:-2] + parts[0][2:-4] + parts[0][:2]
    # time_mid: swap its 2 bytes
    time_mid = parts[1][2:] + parts[1][0:2]
    # time_hi_and_version: swap its 2 bytes
    time_hi_and_version = parts[2][2:] + parts[2][0:2]

    # concat ms_uuid_string
    ms_uuid_string = time_low + "-" + time_mid + "-" + time_hi_and_version + "-" + parts[3] + "-" + parts[4]

    # convert ms_uuid_string to UUID and encode it to b64
    return b64encode(UUID(ms_uuid_string).bytes).decode('ascii')

Assuming you did transpose two digits, here is Python code that encodes the time fields as little-endian bytes:

import struct
from base64 import b64decode, b64encode
from uuid import UUID

# Microsoft encodes the fields of a GUID to little-endian bytes, with the node
# encoded as an array of 6 char.
# see https://docs.microsoft.com/en-us/dotnet/api/system.guid.tobytearray
# and https://docs.microsoft.com/en-us/dotnet/api/system.guid.-ctor
# This struct handles all but the node field.
_time_clock_fields = struct.Struct('<IHHBB')

def guid_to_immutableid(uuid_string: str) -> str:
    guid = UUID(uuid_string)
    time_clock_val = _time_clock_fields.pack(*guid.fields[:5])
    return b64encode(time_clock_val + guid.node.to_bytes(6, 'big')).decode('ascii')

def immutableid_to_guid(immutableid_string: str) -> str:
    bytes_val = b64decode(immutableid_string)
    time_fields = _time_clock_fields.unpack_from(bytes_val)
    guid = UUID(fields=(*time_fields, int.from_bytes(bytes_val[-6:], 'big')))
    return str(guid)

Revisiting the PowerShell script I found earlier, which I thought was converting two different values, I can produce the same base64 value now and round-trip back to the same GUID:

>>> guid_to_immutableid('748b2d72-706b-42f8-8b25-82fd8733860f')
'ci2LdGtw+EKLJYL9hzOGDw=='
>>> immutableid_to_guid('ci2LdGtw+EKLJYL9hzOGDw==')
'748b2d72-706b-42f8-8b25-82fd8733860f'

If you didn't, then your code doesn't produce the exact same output. You wrote that PowerShell converts 7e26997b-09a9-4f6e-b406-7327e3a5c370 to e5kmfqkJbk+0BnNy46XDcA==, but the base64 value would suggest you converted 7e26997b-09a9-4f6e-b406-7372e3a5c370, which differs in the second byte of the node value:

  • 73 27 e3 a5 c3 70 (your GUID)
  • 73 72 e3 a5 c3 70 (the encoded base64 value)

Everything suggests that 7e26997b-09a9-4f6e-b406-7327e3a5c370 would encode to e5kmfqkJbk+0BnMn46XDcA==:

  • e5kmfqkJbk+0BnNy46XDcA==
  • e5kmfqkJbk+0BnMn46XDcA==

And I just figured out we didn’t need to do any of this work. uuid.UUID() supports a bytes_le value:

from base64 import b64decode, b64encode
from uuid import UUID

def guid_to_immutableid(uuid_string: str) -> str:
    guid = UUID(uuid_string)
    return b64encode(guid.bytes_le).decode('ascii')

def immutableid_to_guid(immutableid_string: str) -> str:
    bytes_val = b64decode(immutableid_string)
    return str(UUID(bytes_le=bytes_val))

which produces the exact same output.

From the UUID.bytes_le documentation:

The UUID as a 16-byte string (with time_low, time_mid, and time_hi_version in little-endian byte order).
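To back up the claim that both approaches agree, here is a small sketch that compares the struct-based packing with bytes_le over a batch of random UUIDs. The helper name pack_manually and the sample size are arbitrary choices for this example.

```python
import struct
from uuid import UUID, uuid4

# Little-endian layout for the first five GUID fields (4 + 2 + 2 + 1 + 1 bytes).
_time_clock_fields = struct.Struct('<IHHBB')

def pack_manually(guid: UUID) -> bytes:
    """Struct-based packing: little-endian time fields, node as 6 plain bytes."""
    return _time_clock_fields.pack(*guid.fields[:5]) + guid.node.to_bytes(6, 'big')

samples = [uuid4() for _ in range(1000)]
mismatches = [g for g in samples if pack_manually(g) != g.bytes_le]
print(f"{len(mismatches)} mismatches out of {len(samples)} random UUIDs")
```

A random sample is not a proof, but together with the documentation quote it gives reasonable confidence that bytes_le is a drop-in replacement for the manual packing.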

That’s true. 7e26997b-09a9-4f6e-b406-7327e3a5c370 does turn into e5kmfqkJbk+0BnMn46XDcA==.

Sorry about the mess. There was a typo in the UUID string, just as you said.
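For anyone wanting to track down a similar typo, decoding the ImmutableId back to a GUID makes the mismatch visible right away. A minimal sketch, where the name immutableid_to_guid is just a label for this example and the two base64 values are the ones from this thread:

```python
from base64 import b64decode
from uuid import UUID

def immutableid_to_guid(immutableid_string: str) -> str:
    """Decode a base64 ImmutableId back to its GUID text form."""
    return str(UUID(bytes_le=b64decode(immutableid_string)))

reported = immutableid_to_guid("e5kmfqkJbk+0BnNy46XDcA==")  # value from PowerShell
expected = immutableid_to_guid("e5kmfqkJbk+0BnMn46XDcA==")  # value from Python

print(reported)  # 7e26997b-09a9-4f6e-b406-7372e3a5c370
print(expected)  # 7e26997b-09a9-4f6e-b406-7327e3a5c370
```

The two results differ only in the 72 versus 27 byte of the last group, which is where the transposed digits were.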

That’s great! I had just started digging into that struct stuff (because I didn’t understand how it works at all), but now I can postpone understanding it to the future.
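In case it helps later, here is a short sketch of what the '<IHHBB' format string does. The variable names are only for illustration; the format characters themselves come from Python's struct module.

```python
import struct
from uuid import UUID

# '<' = little-endian with no padding
# 'I' = unsigned 4-byte int, 'H' = unsigned 2-byte int, 'B' = unsigned 1-byte int
layout = struct.Struct('<IHHBB')

guid = UUID('7e26997b-09a9-4f6e-b406-7327e3a5c370')
first_five = guid.fields[:5]  # time_low, time_mid, time_hi_version, two clock_seq bytes

packed = layout.pack(*first_five)
print(layout.size)   # 10
print(packed.hex())  # 7b99267ea9096e4fb406

# unpack() reverses the operation and returns the original integers.
assert layout.unpack(packed) == first_five
```

Each multi-byte field comes out with its bytes reversed, while the two single-byte fields are unaffected, because byte order only matters for values wider than one byte.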