Convert a column of data from Base36 to Base10

I am using Python 3.9 with PyCharm and am a newbie. I am trying to decode a column in my DataFrame that holds Base36 data, and I need to convert it to Base10. I have scoured the internet and tried everything I came across, to no avail; every attempt produced some kind of error. Any help would be appreciated.

You shouldn’t need to scour the internet for this stuff. You will need
to know a little about base 36 and strings and ordinals.

Also, we won’t provide complete solutions for problems which look like
homework, because learning comes from writing the code. So show us some
code you’ve written and its output, and any errors. Pasted inline as
text please.

We’re happy to critique code, explain errors you don’t understand and
suggest approaches.

So: base 36.

Base ten uses the numerals 0 through 9, and bases up to 36 append the
Latin alphabet for the higher numerals: A for 10, B for 11, through to Z
for 35.
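
As a small illustration of that numeral alphabet (the name NUMERALS is just made up for this sketch), the position of each character in the string below is its numeral value:

```python
import string

# Digits first, then the upper case Latin letters: 36 numerals in total.
NUMERALS = string.digits + string.ascii_uppercase

print(len(NUMERALS))        # 36
print(NUMERALS.index("B"))  # 11
print(NUMERALS.index("Z"))  # 35
```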

Now, as it happens, the int() class constructor knows how to convert a
string in bases up to 36 already. Read here:

https://docs.python.org/3/library/functions.html#int
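
For instance, a quick check at the interactive prompt might look like this (the sample strings are just illustrations, not your data):

```python
# int() takes an optional base from 2 to 36; letters are case-insensitive.
values = [int(s, 36) for s in ("Z", "10", "b", "B")]
print(values)  # [35, 36, 11, 11]
```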

If you want to do it yourself you need to:

  • decode a string into characters - strings are sequences, so you can
    just iterate over one in a loop or turn it into a list; there’s no
    “char” type in Python, so you just get single character strings
  • convert each character (single character string) to its numeral value
    (0 for ‘0’, 11 for ‘B’ etc) by taking its ordinal (character code, if
    you like) and subtracting an offset
  • add and multiply the numeral values together to compute the final
    value

You probably want to convert the string to lower or upper case first to
save converting ‘b’ and ‘B’ separately.

Python strings are sequences of Unicode codepoints, which are the same
as ASCII for the first 128 values. The digits and Latin letters are in
that range. You don’t even need to know that! You do need to know that
the digit codes are contiguous, and so are the upper case letters and so
are the lower case letters, but each of these groups starts at a
different place.

So if you’ve got a character in the variable “c”, the character code
comes from the ord() function:

code = ord(c)

For a digit, numeral = ord(c) - ord('0')
For an uppercase letter, numeral = ord(c) - ord('A') + 10
For a lowercase letter, numeral = ord(c) - ord('a') + 10

Then compute the value by adding and multiplying the numerals.
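
Putting those steps together, a from-scratch decoder could look roughly like the sketch below. The function name base36_to_int is made up for illustration, and in practice int(text, 36) already does this for you:

```python
def base36_to_int(text):
    """Decode a base 36 string by hand (illustrative sketch only)."""
    value = 0
    # Upper-casing first means 'b' and 'B' are handled the same way.
    for c in text.upper():
        if "0" <= c <= "9":
            numeral = ord(c) - ord("0")
        elif "A" <= c <= "Z":
            numeral = ord(c) - ord("A") + 10
        else:
            raise ValueError(f"not a base 36 numeral: {c!r}")
        # Shift the running total one place to the left, then add the numeral.
        value = value * 36 + numeral
    return value


print(base36_to_int("10"))  # 36
print(base36_to_int("AK27GA0000DT") == int("AK27GA0000DT", 36))  # True
```

The multiply-then-add step is the same thing you do in base ten: 123 is ((1 * 10) + 2) * 10 + 3.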

Cheers,
Cameron Simpson cs@cskk.id.au

Cameron, this is not an assignment. I am new to Python and trying to understand how to do some things with it. I do appreciate the link and the knowledge, but I am not trying to reinvent the wheel: from what I have read, this already exists and used to be part of Python. This is for my own knowledge and betterment. Thanks.

Here is one of the things I tried and the results

def base36decode(number):
    return int(number, 36)

base36decode('Order_ID')

RESULTING IN
print(df['Order_ID'].loc[0:4])

0 AK27GA0000DT
1 BK27GA00000K
2 BK27GA00000L
3 BK27GA00000M
4 AK27GA0000DU

Here is one of the things I tried and the results

def base36decode(number):
    return int(number, 36)

That’s the int() suggestion I made in my post.

The rest was about how you’d go about doing this if you needed to make
one from scratch.

base36decode('Order_ID')

This call tries to decode the string 'Order_ID'. I doubt that is what
you want.

RESULTING IN
print(df['Order_ID'].loc[0:4])

That print() looks like code, not output.

0 AK27GA0000DT
1 BK27GA00000K
2 BK27GA00000L
3 BK27GA00000M
4 AK27GA0000DU

Should I presume that you actually want to decode “AK27GA0000DT” and
friends?

It helps if you post your complete programme, ideally trimmed to be as
small as possible, so others can see the whole thing or run it.

I’m guessing you’re using a pandas dataframe. It looks like the
expression:

df['Order_ID'].loc[0:4]

returns a special view of a column of your dataframe. Your print() call
tries to print:

str(df['Order_ID'].loc[0:4])

because print() prints str() of each of its arguments. Looks like that
produces a nice little table output for you.

I’m not a pandas user, but I would expect that view to be iterable. So
you could go:

for order_id in df['Order_ID'].loc[0:4]:
    print("Order_ID", order_id, base36decode(order_id))

and see what it produces. Note that here you’re calling
base36decode(order_id), and not base36decode('Order_ID'). The former
passes in each value from the loop, whereas the latter passes in the
fixed string 'Order_ID' (not what you want).
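
If the loop gives sensible numbers, converting the whole column could be sketched as below. This assumes df is a pandas DataFrame whose Order_ID column holds only base 36 strings (no missing values); the small frame built here and the new column name Order_ID_base10 are made up for illustration:

```python
import pandas as pd


def base36decode(number):
    return int(number, 36)


# Stand-in for your real DataFrame, using the sample values from your post.
df = pd.DataFrame({
    "Order_ID": [
        "AK27GA0000DT",
        "BK27GA00000K",
        "BK27GA00000L",
        "BK27GA00000M",
        "AK27GA0000DU",
    ]
})

# Series.map() calls the function once per value in the column.
df["Order_ID_base10"] = df["Order_ID"].map(base36decode)
print(df)
```

Series.map() is doing the same job as the for loop above, just collecting the results into a new column instead of printing them.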

Cheers,
Cameron Simpson cs@cskk.id.au