About a CSV file typed in a non-English language

I am trying to make a CSV file for Python input, but the content is typed in a non-English language (Persian). Whatever I type in Persian in the Excel file is reduced to question marks in the CSV file. I have seen that other people have asked similar questions (for a variety of different languages) on different forums, but the recommended solutions have not worked with my file.
Is there a specific solution?
Thank you!

How are you creating the CSV file from Excel? Did you choose UTF-8 encoding?


Yes, when I created the file I chose UTF-8 encoding. But when I open the file now, I see question marks. Also, when I try to read it with pandas, it gives me an error.

OK, so you probably created the file properly.

When you say ‘open the file’ what exactly do you mean? Which program are you using to open the file?

If your pandas code is not working, please copy and paste the code into a code block in your reply here so that we can see what you’re trying to do.


By Kevin P. Fleming via Discussions on Python.org at 07Jun2022 22:32:

If your pandas code is not working, please copy and paste the code into
a code block in your reply here so that we can see what you’re trying
to do.

And for clarity, it would also help to see a few of the offending lines
in the CSV file itself. Open it in a text editor (CSV files should be
just text) and display some troublesome lines here between triple
backticks:

```
some CSV lines here
```

It will help isolate whether your problems come from exporting the data
from Excel, or from importing the data into Pandas.

Cheers,
Cameron Simpson cs@cskk.id.au
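To separate an export problem from an import problem without depending on any particular editor, one can look at the raw bytes of the file. The sketch below uses a hypothetical helper, `inspect_csv_bytes`, which is not part of any library. The idea: if the Persian cells are stored in the file as literal `?` characters (byte `0x3F`) and there are no non-ASCII bytes at all, the text was already lost when Excel saved the file, and no `encoding=` argument in pandas can bring it back. If the file decodes cleanly as UTF-8 and contains non-ASCII bytes, the export is probably fine and the problem is on the reading side.

```python
import codecs
from pathlib import Path


def inspect_csv_bytes(path, sample_lines=3):
    """Hypothetical helper: report how a CSV file is actually stored on disk."""
    raw = Path(path).read_bytes()

    # Excel's "CSV UTF-8" export usually starts the file with a UTF-8 byte-order mark.
    has_bom = raw.startswith(codecs.BOM_UTF8)

    try:
        text = raw.decode("utf-8-sig")  # "utf-8-sig" also strips the BOM if present
        valid_utf8 = True
    except UnicodeDecodeError:
        text = None
        valid_utf8 = False

    report = {
        "has_utf8_bom": has_bom,
        "valid_utf8": valid_utf8,
        # Persian letters in any common encoding use bytes above 0x7F.
        "non_ascii_bytes": sum(b > 0x7F for b in raw),
        # Literal "?" bytes: many of these and no non-ASCII bytes means lost text.
        "question_marks": raw.count(b"?"),
    }
    if text is not None:
        report["first_lines"] = text.splitlines()[:sample_lines]
    return report
```

For example, `print(inspect_csv_bytes("New1.csv"))`, assuming `New1.csv` is in the current directory.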


Thank you, Kevin P. Fleming!

  • Regarding your first question: when I double-click the CSV file to open it, it shows question marks instead of the Persian words.
  • Second, here is the pandas code I'm using:

```python
import pandas as pd

result = pd.read_csv('New1.csv')
print(result.to_string())
```

The output of the code is:

```
 ?? ?? ??? ??? ???.1
0 2149984 ??? ??? ???
1 2149876 ??? ??? ???
2 2134756 ??? ??? ???
3 2149076 ??? ??? ???
4 2134567 ??? ??? ???
5 2147893 ??? ??? ???
6 2145678 ??? ??? ???
7 2189345 ??? ??? ???
8 2147393 ??? ??? ???
9 2238437 ??? ??? ???
```
Note: I'm using Google Colab, but I have also tried many times in other IDEs and had the same problem.

Thank you, Cameron Simpson!
Below are a few lines of the CSV file:
```
 ?? ?? ??? ??? ???.1
0 2149984 ??? ??? ???
1 2149876 ??? ??? ???
2 2134756 ??? ??? ???
3 2149076 ??? ??? ???
4 2134567 ??? ??? ???
5 2147893 ??? ??? ???
6 2145678 ??? ??? ???
7 2189345 ??? ??? ???
8 2147393 ??? ??? ???
9 2238437 ??? ??? ???
```
Note: the numbers are ID numbers written in English digits, and the question marks are Persian names and words.

Try:

```python
import pandas as pd
pd.read_csv('New1.csv', encoding='utf-8')
```

The pandas documentation seems a bit unclear about the default encoding used.
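Since the default is unclear, a minimal sketch is to pass the encoding explicitly and fall back through a short list of candidates. `utf-8-sig` is Python's UTF-8 codec that also skips the byte-order mark Excel tends to write. `cp1256` is the Windows code page for Arabic-script text; it is included only on the assumption that the file may have been saved with Excel's plain "CSV (Comma delimited)" option, which typically uses the system's legacy code page. The helper name `read_csv_any` and the candidate list are illustrative, not a pandas feature. No encoding can recover characters that were already written to the file as literal `?`.

```python
import pandas as pd

# Candidate encodings, tried in order (an assumption; adjust to your setup):
# - "utf-8-sig": UTF-8, skipping a leading byte-order mark if there is one
# - "cp1256": Windows code page for Arabic-script text, including much of Persian
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1256")


def read_csv_any(path, encodings=CANDIDATE_ENCODINGS, **kwargs):
    """Hypothetical helper: return (DataFrame, encoding) for the first encoding that decodes."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as exc:
            last_error = exc  # this encoding failed; try the next candidate
    raise ValueError(f"none of {encodings!r} could decode {path!r}") from last_error
```

For example, `df, used = read_csv_any("New1.csv")` and then `print(used)` shows which encoding worked.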

That said, on Google Colab your code works without changes for me with a New1.csv file containing Persian text in UTF-8.
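To check the pandas side independently of Excel, one can build a small UTF-8 CSV containing Persian text in Python and read it back. The file name `persian_sample.csv` and the two rows are made up for illustration. Writing with `utf-8-sig` adds a byte-order mark, which generally helps Excel recognise the file as UTF-8 when it is opened by double-clicking.

```python
import pandas as pd

# Made-up sample rows: an ID plus a Persian name ("Ali Rezaei", "Maryam Ahmadi").
rows = pd.DataFrame({"id": [2149984, 2149876], "name": ["علی رضایی", "مریم احمدی"]})

# Write UTF-8 with a byte-order mark so Excel is more likely to detect the encoding.
rows.to_csv("persian_sample.csv", index=False, encoding="utf-8-sig")

# Read it back with the matching encoding; the Persian text should survive intact.
result = pd.read_csv("persian_sample.csv", encoding="utf-8-sig")
print(result.to_string())
```

If this round trip prints the Persian names correctly but `New1.csv` still shows `???`, the problem most likely lies in how the original file was saved rather than in pandas.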