I'm having trouble getting around an encoding error on Python 3.6:
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 14-15:
unexpected end of data
Any suggestions?
-Thanks
That means the data cannot be decoded: either the file is corrupted, or it is being read with the wrong encoding. UTF-16 is not that common, as far as I know; are you overriding the encoding manually? Have you tried opening the file or stream with an explicit encoding, e.g. open(..., encoding="utf8")?
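If you are not sure which encoding the file really uses, one thing you could try is looking for a byte order mark at the start of the raw bytes. This is only a rough sketch: guess_encoding_from_bom is a made-up helper name, not something from your script, and many files carry no BOM at all, in which case it simply returns None.

```python
import codecs
from typing import Optional

# Checked in this order because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(raw: bytes) -> Optional[str]:
    """Return a codec name if raw starts with a known BOM, else None.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None
```

You would read the first few bytes with open(path, "rb"), pass them to the helper, and then reopen the file in text mode with the returned encoding. If it returns None, you still have to find out the encoding some other way.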
Thank you for responding. I am using this statement, and several variations of it:
if type(content) == str:
    content = content.encode('utf-8')
This works with other Turkish files I have been working with.
So, how are you reading the files? Could you post a code example that fails along with the complete error traceback?
Thanks - This is a work script and a lot of it is packaged so I don’t have access to a lot of the code. I will pursue this with our engineering team.
Thank you so much for taking the time to help me.
The “unexpected end of data” message suggests that the bytes at positions 14-15 are the first half of a surrogate pair (a high surrogate), so the bytes at positions 16-17 must hold the second half (a low surrogate, DC00 to DFFF), but the data ends before them.
Check whether the 16-bit value at bytes 14-15 lies between D800 and DBFF (utf-16-le stores the low byte first). If so, you’ll need to grab the next two bytes to complete the character.
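Here is a small sketch of that check. It assumes the data arrives in chunks and that one chunk got cut in the middle of a surrogate pair; the sample string and the name ends_with_high_surrogate are made up for illustration and are not from the original script.

```python
import codecs
import struct


def ends_with_high_surrogate(raw: bytes) -> bool:
    """True if the last complete UTF-16-LE code unit is a high surrogate."""
    if len(raw) < 2 or len(raw) % 2:
        return False
    (unit,) = struct.unpack("<H", raw[-2:])  # "<H" = little-endian 16-bit
    return 0xD800 <= unit <= 0xDBFF


# Seven BMP characters (bytes 0-13) followed by U+1F600, which needs a
# surrogate pair: high surrogate at bytes 14-15, low surrogate at 16-17.
full = "abcdefg\U0001F600".encode("utf-16-le")
chunk = full[:16]  # cut right after the high surrogate

# An incremental decoder buffers the incomplete pair instead of raising,
# and finishes the character once the remaining bytes are fed in.
decoder = codecs.getincrementaldecoder("utf-16-le")()
text = decoder.decode(chunk)
text += decoder.decode(full[16:], final=True)
```

Calling chunk.decode("utf-16-le") directly on the truncated chunk should fail in the same way as your traceback, while the incremental decoder waits for the rest. If the data really is truncated at the source, though, no decoder can recover the missing bytes.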