Problems entering Unicode character literals using \u syntax

NathanCL · March 23, 2023, 5:32am

Hi everyone,

I’m trying to do something that I believe is not too complicated but I could not make it work. Here is the problem:

Say that we have a list L = [a,u,i]. I would like to create from L the word w = âûî. In Unicode, â can be written as a\u0302 (the letter followed by the combining circumflex accent), û as u\u0302, and so on. So what I tried to do was:

```python
w = ' '
for i in range(3):
           w.join(' L[i]\u0302')
```

Unfortunately, this doesn’t give me what I want…

Thanks in advance for any help,
Nathan

rob42 · March 23, 2023, 6:37am

What you may be looking for is this:

```python
L = ['a', 'u', 'i']
output = ''
for i in L:
    output += f"{i}\u0302"
print(output)
```

To add, here’s another way to do that:

```python
L = ['a', 'u', 'i', '']
word = '\u0302'.join(L)
print(word)
```
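A small check of why the trailing '' in that list matters: join only places the separator between items, so without it the last letter gets no accent. (The variable names here are just for illustration.)

```python
letters = ['a', 'u', 'i']

# The separator (here the combining circumflex) only goes between items,
# so the last letter is left without an accent
without_trailing = '\u0302'.join(letters)

# Appending an empty string adds one more gap, which receives the accent
with_trailing = '\u0302'.join(letters + [''])

# ascii() shows the combining character as an escape, so the difference is visible
print(ascii(without_trailing))  # 'a\u0302u\u0302i'
print(ascii(with_trailing))     # 'a\u0302u\u0302i\u0302'
```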

CAM-Gerlach · March 23, 2023, 12:00pm

To explain a bit about why this doesn’t work, and to point out several other mistakes worth watching out for (and how to avoid them in the future), I’ll start by reproducing your code, with the list L included and the indents fixed:

```python
L = ['a','u','i']
w = ' '
for i in range(3):
    w.join(' L[i]\u0302')
```

For starters, if you print(w) after running your code snippet, you’ll see it still prints a single space: your loop didn’t actually change anything. Why? In Python, strings are immutable, which means they cannot be changed in place; the only way to “modify” a string is to create a brand-new string object. Therefore, a string method like join cannot modify the string in place, but instead returns a new string. Above, you aren’t assigning that string to a new or existing variable name (or printing it, etc.), so it just gets thrown away and the original value of w remains unchanged. You can fix that by assigning the result of w.join() back to w, which de facto “updates” the string to the new value:

```python
L = ['a','u','i']
w = ' '
for i in range(3):
    w = w.join(' L[i]\u0302')
```
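The immutability point is easy to check on its own: join hands back a new string and leaves the one it was called on untouched, so the result is lost unless you bind it to a name. (The names sep and joined are just for this illustration.)

```python
sep = ' '

# The return value is discarded here, so sep is unchanged
sep.join('ab')
print(repr(sep))      # ' '

# Keep the new string by binding it to a name
joined = sep.join('ab')
print(repr(joined))   # 'a b'
print(joined is sep)  # False: join returned a brand-new object
```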

What do we have now?

>>> print(w)
print(w)
    L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂L   L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂[   L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂i   L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂]   L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂̂

Err, that doesn’t look right: we just have L [ i ] ̂ repeated over and over, rather than the result we want. Let’s check the docs for the str.join() method:

> Return a string which is the concatenation of the strings in iterable. A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.
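A quick check of that description with plain ASCII separators, so the effect is easy to see: when the iterable is itself a string, join walks it one character at a time, and whatever string join is called on becomes the separator.

```python
# Iterating a string yields its characters, so the separator lands between each one
print('-'.join('L[i]'))           # L-[-i-]

# With a list, the separator goes between whole items instead
print('-'.join(['a', 'u', 'i']))  # a-u-i

# A longer string used as the separator is inserted whole between items
print('<>'.join('abc'))           # a<>b<>c
```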

Hmm, that’s not what we want. join inserts the string it is called on (here w) between every item of the iterable. In this case the iterable is a string, so it is iterated character by character, w is inserted between each pair of characters, and the result is assigned back to w. The loop then repeats twice more, except that w is no longer a single space but the whole string built so far, which is why the output keeps growing. Instead, we have two options: we can concatenate one desired letter + accent onto the string on each iteration (as in Rob’s first example), or we can build each letter + accent once per iteration, collect them in a list, and then use join to concatenate the whole list in one go. The second approach is more efficient, so let’s try it:

```python
L = ['a','u','i']
w_list = []
for i in range(3):
    w_list.append(' L[i]\u0302')
w = ''.join(w_list)
```

Let’s see what we have now:

```
>>> print(w)
 L[i]̂ L[i]̂ L[i]̂
```

We’re getting there, but instead of the value of L[i] (i.e. a, u, i), the string contains the literal text L[i]. Why? Python has no way of knowing that you want it to treat these particular characters inside the string as Python code rather than as literal characters. We can solve that by using an f-string (placing an f in front of the opening quote) and wrapping the parts we want evaluated as code in {}:

```python
L = ['a','u','i']
w_list = []
for i in range(3):
    w_list.append(f'{L[i]}\u0302')
w = ''.join(w_list)
```

And if we check our result:

```
>>> print(w)
âûî
```

We see that it worked! However, there are still some issues with our code that we should fix. First, we already have a list, yet we iterate over a range with a hard-coded maximum index and then use that index to access the list. That’s extra work, and if you add or remove something from L, your code will stop working correctly and contain a bug. Instead, let’s just iterate over the list directly:

```python
L = ['a','u','i']
w_list = []
for c in L:
    w_list.append(f'{c}\u0302')
w = ''.join(w_list)
```

We can still do better, though. We’re creating a new list, then setting up a for loop, then performing our operation, then appending to the list. With a list comprehension, we can do that all in one go, which is shorter and more efficient:

```python
L = ['a','u','i']
w_list = [f'{c}\u0302' for c in L]
w = ''.join(w_list)
```

In fact, we don’t even need to build a list first; we can pass a generator expression directly to join, which is even shorter (though in CPython not actually more efficient, since str.join builds a list from its argument internally anyway):

```python
L = ['a','u','i']
w = ''.join(f'{c}\u0302' for c in L)
```
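Timings depend on your machine and Python version, so rather than take any efficiency claim on faith, you can measure both forms with the standard-library timeit module. The input size and run count below are arbitrary choices for illustration, and no particular timing is implied.

```python
import timeit

# A larger input than three vowels, so the timings are measurable (arbitrary size)
vowels = ['a', 'u', 'i'] * 1000

def with_list():
    # Build the full list first, then join it
    return ''.join([f'{c}\u0302' for c in vowels])

def with_generator():
    # Pass a generator expression straight to join
    return ''.join(f'{c}\u0302' for c in vowels)

# Both forms must produce exactly the same string
assert with_list() == with_generator()

for func in (with_list, with_generator):
    seconds = timeit.timeit(func, number=200)
    print(f'{func.__name__}: {seconds:.4f} s for 200 runs')
```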

Finally, we really should use more descriptive and readable variable names:

```python
vowels = ['a','u','i']
word = ''.join(f'{vowel}\u0302' for vowel in vowels)
```
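One thing worth knowing about the final result: a\u0302 is two code points (the letter plus a combining circumflex) that merely display as â, so word is six characters long and won’t compare equal to the precomposed string '\u00e2\u00fb\u00ee'. If that matters for your use case, the standard-library unicodedata.normalize with the 'NFC' form composes each letter + accent pair into a single code point where one exists:

```python
import unicodedata

vowels = ['a', 'u', 'i']
word = ''.join(f'{vowel}\u0302' for vowel in vowels)

print(len(word))  # 6: three letters plus three combining accents

# NFC composes letter + combining mark into one precomposed code point
composed = unicodedata.normalize('NFC', word)
print(len(composed))                     # 3
print(composed == '\u00e2\u00fb\u00ee')  # True: precomposed â, û, î
```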

By the way, when adding code or output to your post, please make sure you enclose it in code fencing so it is formatted correctly for others to read and copy, as I’ve done for you this time. You can do so with the </> button in the toolbar, or by typing triple backticks above and below the code in question, optionally with the language name (e.g. python) for syntax highlighting, like this:

```python
<YOUR CODE HERE>
```

NathanCL · March 24, 2023, 5:26am

Hi Rob and Gerlach,

Thank you for the code and the clear explanations, it was very helpful. I didn’t know about f-strings, very convenient. And sorry for the layout of the code, it’s my first time using this forum. Next time I’ll do it properly.

Cheers,
Nathan
