Problems entering Unicode character literals using \u syntax

Hi everyone,

I’m trying to do something that I believe is not too complicated, but I couldn’t make it work. Here is the problem:

Say that we have a list L = [a,u,i]. I would like to create from L the word w=âûî. The Unicode for â is given by a\u0302 (the letter a followed by a combining circumflex accent), for û by u\u0302, and so on. So what I tried to do was:

```python
w = ' '
for i in range(3):
           w.join(' L[i]\u0302')
```

Unfortunately, this does not give me what I want…

Thanks in advance for any help,
Nathan

What you may be looking for is this:

```python
L = ['a', 'u', 'i']
output = ''
for i in L:
    output += f"{i}\u0302"
print(output)
```

To add, here is another way to do that:

```python
L = ['a', 'u', 'i', '']
word = '\u0302'.join(L)
print(word)
```
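The empty string at the end of the list is what makes this trick work: join() only puts the separator between items, so without it the last letter would get no accent. A quick comparison (letters, without_trailing and with_trailing are just illustrative names), printed with the built-in ascii() so the invisible combining accent shows up as \u0302:

```python
letters = ['a', 'u', 'i']

# join() only puts the separator *between* items, so 'i' is left without an accent
without_trailing = '\u0302'.join(letters)
# Adding '' at the end gives join() one more gap to fill, right after 'i'
with_trailing = '\u0302'.join(letters + [''])

# ascii() escapes the combining accent as \u0302 so the difference is visible
print(ascii(without_trailing))  # 'a\u0302u\u0302i'
print(ascii(with_trailing))     # 'a\u0302u\u0302i\u0302'
```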

To explain a bit about why your version doesn’t work, and to point out several other mistakes worth looking out for (what you can learn from them and how to avoid them in the future), I’ll start by reproducing your code, with the list L included and the indents fixed:

```python
L = ['a','u','i']
w = ' '
for i in range(3):
    w.join(' L[i]\u0302')
```

For starters, if you print(w) after running your code snippet, you can see it still prints ' ': your loop didn’t actually change anything. Why? In Python, strings are immutable, which means they cannot be changed in place; the only way to “modify” a string is to create a brand new string object. Therefore, a string method like join cannot modify the string in place, but instead returns a new string. Above, you aren’t assigning that string to a new or existing variable name (or printing it, etc.), so it just gets thrown away and the original value of w remains unchanged. You can fix that issue by assigning the result of w.join() back to w, which de facto “updates” the string to the new value:

```python
L = ['a','u','i']
w = ' '
for i in range(3):
    w = w.join(' L[i]\u0302')
```
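The difference between discarding and keeping the result of join() is easier to see in miniature. In this sketch, 'abc' is an arbitrary stand-in for your string, and the last step shows that trying to change a string in place is rejected outright:

```python
w = ' '

# join() builds and returns a brand-new string; here that result is thrown away
w.join('abc')
print(repr(w))  # ' ' (unchanged)

# Rebinding the name to the returned string is what "updates" w
w = w.join('abc')
print(repr(w))  # 'a b c'

# Strings are immutable, so in-place item assignment raises TypeError
try:
    w[0] = 'x'
except TypeError as exc:
    print(exc)  # 'str' object does not support item assignment
```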

What do we have now?

>>> print(w)
print(w)
    L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂L   L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂[   L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂i   L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂]   L [ i ] ̂L  L [ i ] ̂[  L [ i ] ̂i  L [ i ] ̂]  L [ i ] ̂̂̂

Err, that doesn’t look right—we just have L [ i ] ̂ repeated over and over, rather than the result we want. Let’s check the docs for the join() method:

> Return a string which is the concatenation of the strings in iterable. A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.
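To see that behaviour concretely, here is the same loop shape in miniature. The '-' separator and the two-character string 'ab' are arbitrary stand-ins for your ' ' and ' L[i]\u0302', chosen only so the output stays readable:

```python
w = '-'
for _ in range(3):
    # The string 'ab' is iterated character by character, and the *previous*
    # value of w is used as the separator between those characters
    w = w.join('ab')
    print(w)
# a-b
# aa-bb
# aaa-bbb
```

Each pass feeds the whole previous result back in as the separator, which is why your output grows so quickly.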

Hmm, that’s not what we want. The join method inserts the string w between every item in the iterable. Here the iterable is a string, so it gets iterated character by character with w inserted between each character, and the result is assigned back to w. The loop then repeats twice more, except now w is not just a space but the whole string built so far, so the output grows with each pass. Instead, we have two options: first, we can concatenate one desired letter + accent onto the string on each iteration (as in Rob’s first example), or second, we can append each letter + accent to a list, once per iteration, and then use join to concatenate that whole list in one go. The second approach is the idiomatic one and is generally more efficient, so let’s try it:

```python
L = ['a','u','i']
w_list = []
for i in range(3):
    w_list.append(' L[i]\u0302')
w = ''.join(w_list)
```

Let’s see what we have now:

```
>>> print(w)
 L[i]̂ L[i]̂ L[i]̂
```

We’re getting there, but instead of the value of L[i] (i.e. a, u, i), the string contains a literal L[i]. Why? Well, Python has no way of knowing you want it to parse the string and treat these particular characters as Python code instead of literal characters. We can solve that by using an f-string (placing an f in front of the opening quote) and wrapping the parts we want to treat as code in {}. While we’re at it, we also drop the leading space carried over from your original string, since we don’t want it in the final word:

```python
L = ['a','u','i']
w_list = []
for i in range(3):
    w_list.append(f'{L[i]}\u0302')
w = ''.join(w_list)
```

And if we check our result:

```
>>> print(w)
âûî
```
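To make the difference between the two kinds of literal explicit, here they are side by side for the first loop iteration (i = 0 is just a stand-in for that iteration), printed with ascii() so the combining accent appears as \u0302:

```python
L = ['a', 'u', 'i']
i = 0

plain = 'L[i]\u0302'         # ordinary literal: the characters L, [, i, ] and the accent
formatted = f'{L[i]}\u0302'  # f-string: {L[i]} is evaluated first, giving 'a'

print(ascii(plain))      # 'L[i]\u0302'
print(ascii(formatted))  # 'a\u0302'
```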

We see that it worked! However, there are still some issues with our code we should fix. First, we’re creating a list, then iterating over a range with a hardcoded maximum index, and then using that index to access the list. That’s a lot of extra work, and if you add or remove something from L, the hardcoded 3 no longer matches its length: extra items are silently ignored, and a shorter list raises an IndexError. Therefore, let’s just iterate over the list directly:

```python
L = ['a','u','i']
w_list = []
for c in L:
    w_list.append(f'{c}\u0302')
w = ''.join(w_list)
```
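To see the bug that a hardcoded range(3) invites, here is the index-based version wrapped in a small function (accent_by_index is a hypothetical name used only for this demonstration) and called with a longer and a shorter list:

```python
def accent_by_index(letters):
    """Index-based version with the hardcoded range(3), for illustration only."""
    pieces = []
    for i in range(3):
        pieces.append(f'{letters[i]}\u0302')
    return ''.join(pieces)

# A longer list: the fourth letter is silently ignored
print(ascii(accent_by_index(['a', 'u', 'i', 'o'])))  # 'a\u0302u\u0302i\u0302'

# A shorter list: letters[2] does not exist, so this raises IndexError
try:
    accent_by_index(['a', 'u'])
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: list index out of range
```

Iterating over the list directly avoids both problems, because the loop runs exactly once per item, however many items there are.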

We can still do better, though. We’re creating a new list, then setting up a for loop, then performing our operation, then appending to the list. With a list comprehension, we can do that all in one go, which is shorter and more efficient:

```python
L = ['a','u','i']
w_list = [f'{c}\u0302' for c in L]
w = ''.join(w_list)
```

In fact, we don’t even need to build a list ourselves and then join it; we can pass a generator expression directly to join, which is even shorter (though in CPython it is not actually faster, since join() converts its argument to a list internally anyway):

```python
L = ['a','u','i']
w = ''.join(f'{c}\u0302' for c in L)
```
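As a quick sanity check that each refactoring step kept the same behaviour, the explicit loop, the list comprehension and the generator expression can be compared directly (from_loop, from_list and from_generator are just illustrative names):

```python
L = ['a', 'u', 'i']

# Explicit loop that appends to a list, then joins once
w_list = []
for c in L:
    w_list.append(f'{c}\u0302')
from_loop = ''.join(w_list)

# List comprehension: builds the same list in a single expression
from_list = ''.join([f'{c}\u0302' for c in L])

# Generator expression: no extra brackets needed when it is join()'s only argument
from_generator = ''.join(f'{c}\u0302' for c in L)

assert from_loop == from_list == from_generator == 'a\u0302u\u0302i\u0302'
print(from_generator)
```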

Finally, we really should use more descriptive and readable variable names:

```python
vowels = ['a','u','i']
word = ''.join(f'{vowel}\u0302' for vowel in vowels)
```
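One more thing worth knowing, since the question says â is “given by a\u0302”: that is the decomposed form (a base letter followed by U+0302 COMBINING CIRCUMFLEX ACCENT), so word is 6 code points long even though it displays as 3 characters. If you need the single precomposed characters instead, for example when comparing against text that was typed in already composed, the standard library’s unicodedata.normalize() with the 'NFC' form can combine them. A sketch, assuming precomposed output is what you want:

```python
import unicodedata

vowels = ['a', 'u', 'i']
word = ''.join(f'{vowel}\u0302' for vowel in vowels)

# NFC composes each base letter + combining accent into one code point where one exists
composed = unicodedata.normalize('NFC', word)

print(len(word), len(composed))          # 6 3
print(composed == '\u00e2\u00fb\u00ee')  # True (U+00E2 â, U+00FB û, U+00EE î)
```

Both strings look identical when printed; the difference only matters for things like len(), indexing and equality comparisons.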

By the way, when adding code or output to your post, please make sure you enclose it in code fencing so it is formatted correctly for others to read and copy, as I’ve done for you this time. You can do so with the </> button in the toolbar, or by typing triple backticks above and below the code in question, optionally with the language name (e.g. python) for syntax highlighting, like this:

```python
<YOUR CODE HERE>
```

Hi Rob and Gerlach,

Thank you for the code and the clear explanations, it was very helpful :slight_smile:. I didn’t know about f-strings; very convenient. And sorry for the layout of the code, it was my first time using this forum. Next time I’ll do it properly.

Cheers,
Nathan
