Hello,
This is as expected. It is doing what it is supposed to be doing. See definition below.
Maybe the default encoding on your system is something other than UTF-8? Have you tried explicitly telling the encode and decode functions which encoding you want them to use?
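As a quick check, a minimal sketch like the following reports which encodings Python falls back on when you do not name one. It uses only the standard `locale` and `sys` modules; the variable names are just for illustration, and the locale result will differ from system to system.

```python
import locale
import sys

# Encoding that open() and similar calls typically fall back on when
# none is given; it depends on the operating system and locale settings.
preferred = locale.getpreferredencoding(False)

# Default used by str.encode() and bytes.decode() when called without
# an argument; in Python 3 this is 'utf-8'.
default = sys.getdefaultencoding()

print('Preferred (locale) encoding:', preferred)
print('Default str/bytes encoding: ', default)
```

If the first value is not UTF-8 on your machine, that is a likely place for a mismatch to creep in.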
Here is a simple test script. Note that you start with the original string of type str. It is converted to raw bytes prior to sending. At the receiving end, it is decoded to recover the original message as type str.
strVar = 'eggs and ham'
print('\nOriginal string: ', strVar)
encode_str = strVar.encode(encoding='utf-8')
print('After encoding: ', encode_str)
decode_str = encode_str.decode(encoding='utf-8')
print('After decoding: ', decode_str)
After running this test snippet, you should get:
Original string: eggs and ham
After encoding: b'eggs and ham'
After decoding: eggs and ham
If you explicitly specify which encoding you want, there will be no ambiguity.
The janky-looking values are special characters that fall outside the regular ASCII range and have been encoded to type bytes.
print('\nSpecial character: Ä')
special = 'Ä'
sp_encoded = special.encode(encoding='utf-8')
print('After encoding: ', sp_encoded)
sp_decoded = sp_encoded.decode('utf-8')
print('After decoding: ', sp_decoded)
If you run the snippet, you get this:
Special character: Ä
After encoding: b'\xc3\x84'
After decoding: Ä
So, if you are getting byte strings of any form, you should decode them.
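To illustrate why the encoding used for decoding has to match the one used for encoding, here is a small sketch. The choice of Windows-1252 (`cp1252`) and Latin-1 as the mismatched encodings is only an assumption for the demonstration; your system may be using something else.

```python
original = 'Ä'
utf8_bytes = original.encode('utf-8')      # b'\xc3\x84'

# Decoding with the same encoding that produced the bytes recovers the text.
right = utf8_bytes.decode('utf-8')

# Decoding the same bytes as Windows-1252 does not fail, but each byte is
# read as its own character, so one character turns into two wrong ones.
wrong = utf8_bytes.decode('cp1252')

print('Correct decode:   ', right)
print('Mismatched decode:', wrong)

# The reverse mismatch is louder: Latin-1 bytes are not valid UTF-8 here.
latin_bytes = original.encode('latin-1')   # b'\xc4'
try:
    latin_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError as err:
    decode_failed = True
    print('Decode failed:', err)
```

A silent mismatch like the first case is the usual source of garbled characters, which is why naming the encoding on both ends helps.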
— Encoding —
An encoding is the set of rules for translating a string of Unicode characters to a sequence of raw bytes, and for extracting a string from a sequence of bytes. This translation back and forth between bytes and strings is described by two terms: encoding and decoding.
• We encode from string to raw bytes.
• We decode from raw bytes to strings.
Encodings really only apply when text is stored or transferred externally, in files and other mediums. Text is translated to and from an encoding-specific format only when it is transferred to or from external text files, byte strings, or APIs with specific encoding requirements. Once in memory, though, strings have no encoding.
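The point about external storage can be sketched with a temporary file. This is only an illustration: the file name demo.txt and the variable names are made up, and UTF-8 is assumed as the file encoding.

```python
import os
import tempfile

text = 'Ä eggs and ham'

# Hypothetical scratch file, created in a temporary directory that is
# removed again when the with-block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')

    # Text mode: the str is encoded to UTF-8 bytes as it is written.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Binary mode: no decoding happens, so we see the bytes as stored.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode again: the bytes are decoded back to str as they are read.
    with open(path, 'r', encoding='utf-8') as f:
        round_trip = f.read()

print('Bytes on disk: ', raw)
print('Text read back:', round_trip)
```

The encoding only comes into play at the two open() calls in text mode; the str objects themselves carry no encoding.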