New to Python...Getting Syntax Error

jamainebarnes · May 31, 2022, 2:56pm

I am getting this error below.

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

It seems to have something to do with this line in the script…

source_folder = Path(‘C:\Users\JB095808\OneDrive - Cerner Corporation\Documents\VA EHRM TO 0043\RLM Log Sample\SampleRLMLogs\Logs’)

Is my syntax incorrect?

vbrozik · May 31, 2022, 3:03pm

The backslash character \ has a special meaning in strings in Python and many other languages. It changes the meaning of the character(s) following it. You have to either escape it by doubling it (\\) or use a raw string:

r'C:\Users\JB095808\OneDrive - Cerner Corporation\Documents\VA EHRM TO 0043\RLM Log Sample\SampleRLMLogs\Logs'
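A minimal sketch of these options, using a shortened version of the path from the question. `PureWindowsPath` is an assumption made only so the comparison behaves the same on any operating system; the variable names are illustrative, and the forward-slash form is a third option not mentioned above.

```python
from pathlib import PureWindowsPath

# Option 1: double every backslash so each pair yields one literal backslash.
escaped = PureWindowsPath('C:\\Users\\JB095808')

# Option 2: a raw string, in which backslashes do not start escape sequences.
raw = PureWindowsPath(r'C:\Users\JB095808')

# Option 3: forward slashes, which pathlib also accepts in Windows paths.
forward = PureWindowsPath('C:/Users/JB095808')

print(escaped == raw == forward)  # True
```

One caveat with raw strings: they cannot end in a single backslash, so a trailing separator still needs one of the other forms.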

Also note that in your post you have incorrect quote characters around the string.

More information about string literals: 2. Lexical analysis — Python 3.10.4 documentation

sandraC · May 31, 2022, 3:38pm

> Also note that in your post you have incorrect quote characters around the string.

That is probably Discourse's doing. It reformats some characters unless you use md code blocks.

mlgtechuser · June 4, 2022, 4:51am

To extend Václav’s reply, Jamaine: if a backslash ('\') is one of the characters in a string, its default meaning is “the following character is a control character”, where a control character is a special code that makes the interpreter do something more than just display a character in a string. You may have already seen '\n' used to trigger a line feed (go to the next line) in a print() call like

print("first line \n second line \n third line")

So with source = Path('C:\Users') Python’s interpreter expects that ‘U’ is a control code. However, there is no such control code, so the error message is telling you that ‘\U’ can’t be decoded as a control code. If you happen to follow the backslash with a valid control code, this can lead to some unexpected behavior like when you run something like print("this\next week").

To use the backslash as a character in a string, you have to reassert the backslash so Python knows that ‘\\’ means “this is only a backslash, not a control code”. So your system paths in Python will look like ‘C:\\Users\\...’
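A small sketch of the difference, reusing the `"this\next week"` example from above (the variable names are illustrative):

```python
# "\n" is one newline character, so this string hides a line break
# between "this" and "ext week".
surprising = "this\next week"

# "\\" is one literal backslash, so the text survives as typed.
intended = "this\\next week"

print(len(surprising))  # 13
print(len(intended))    # 14
print(intended)         # this\next week
```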

BTW, this “escape character” is found in many programming languages. In fact, this forum platform (Discourse) uses it. Here’s a useful example: When you paste code here, you should “fence” it with three backticks ( ``` ) as the first and last lines. Since three backticks are a control code that triggers the monospace font, I had to precede the backticks in the previous sentence with a backslash to “escape out” of the default behavior in order to just print the backticks.

The ``` fence also has other useful features like preserving indents and suppressing curved quotation marks. So next time you paste code, it will look like this in your raw post text:

```python
<your code here>
```

The ‘python’ on the first line activates color coding so that the code components are easier to track. The rendered result looks like this:

source_folder = Path('C:\Users\JB095808\...')

P.S. These text format control codes are called “Markdown”. (@epicwink abbreviated it as ‘md’, in case you didn’t recognize the abbreviation.) A full treatment of the CommonMark Markdown used by Discourse (and other platforms, like WhatsApp) can be found here, and also here.

vbrozik · June 6, 2022, 8:25am

@mlgtechuser

Just a small correction: \U has a special meaning in Python. It introduces a 32-bit Unicode code point written as eight hexadecimal digits. See The String Type in the documentation. That is the reason the error message contained: (unicode error) 'unicodeescape' codec can't decode bytes …

Here is an example of how you can write the letter A in multiple ways (32-bit, 16-bit, 8-bit):

>>> '\U00000041 == \u0041 == \x41 == A'
'A == A == A == A'
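The same rule can be checked from a script without breaking the script itself. This is only a sketch: `try_literal` is a hypothetical helper, and the literals are kept as source text in raw strings so that Python parses them only when asked to.

```python
import ast

def try_literal(source):
    """Evaluate the source text of a string literal; return its value or the error text."""
    try:
        return ast.literal_eval(source)
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"

# Raw strings hold the source text, so this example file itself stays valid.
complete = r"'\U00000041'"   # \U followed by eight hexadecimal digits
truncated = r"'C:\Users'"    # \U followed by "sers", which is not hexadecimal

print(try_literal(complete))   # A
print(try_literal(truncated))  # the "truncated \UXXXXXXXX escape" error text
```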

Edit: Unrecognized escape sequences do not cause errors. The backslash is kept in the string literally:

>>> '\z'
'\\z'
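One caveat, as far as I know: recent Python versions do warn about unrecognized escapes (a DeprecationWarning since 3.6, a SyntaxWarning since 3.12), even though the string value is unchanged. A hedged sketch that records the warning instead of printing it; the file name `<example>` is just a placeholder:

```python
import warnings

# The source text of the literal '\z', held in a raw string.
source = r"'\z'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record warnings instead of printing them
    value = eval(compile(source, "<example>", "eval"))

print(len(value))      # 2: a backslash followed by "z"
print(value == "\\z")  # True
# The warning category depends on the Python version in use.
print([w.category.__name__ for w in caught])
```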

mlgtechuser · June 6, 2022, 12:25pm

Thanks for clarifying this, Václav. I tested \U before posting but haven’t used a Unicode escape before.

So the issue is that what follows the \U isn’t a Unicode value.

And this example might illustrate a non-existent escape code a little more clearly (both are informative):

>>> print("\c")
\c

eryksun · June 6, 2022, 1:08pm

The error message states “truncated \UXXXXXXXX escape” at position 2-3 in "C:\Users...". How can this be improved to make it clear that 8 hexadecimal (“X”) digits are expected after "\U" in the string literal? Does Python need error codes for its syntax errors that can be looked up online or via help()? Or references to sections of the documentation?

vbrozik · June 6, 2022, 2:15pm

@eryksun I think the ability to go from an error message to the most useful place in the documentation would be extremely useful for beginners. Error codes could probably be a solution…

Also, the ability to go from a standard library object in the REPL, IPython, vim, VS Code… to the documentation would be very useful.

steven.daprano · June 6, 2022, 3:30pm

If beginners won’t google for error messages, they certainly won’t google for error codes.

https://www.startpage.com/sp/search?query=truncated+\UXXXXXXXX+escape