In Python 3 (tested on 3.9 and 3.11), I am using codecs.EncodedFile inside csv.DictWriter to write data that may contain non-ASCII characters to a CSV file.
I am using the code below:
import codecs, csv, tempfile, os
s=b'c1318'
print(s, type(s))
temp_file_dir = "/Users/chaturvedi/Documents"
file_fd, tmp_out_file_url = tempfile.mkstemp(dir=temp_file_dir, text=True)
print("file_fd=", file_fd, "tmp_out_file_url=", tmp_out_file_url)
out_file_descriptor = os.fdopen(file_fd, "wb")
print("out_file_descriptor=", out_file_descriptor)
csv_writer = csv.DictWriter(codecs.EncodedFile(out_file_descriptor, 'utf-8', 'utf-16'), [b'head'], extrasaction='ignore', dialect='excel-tab')
print("csv_writer=", csv_writer.__dict__)
csv_writer.writerow({b'head':s})
But when I run this, I get the following error:
Traceback (most recent call last):
File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/csv.py", line 162, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen codecs>", line 836, in write
File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/encodings/utf_8.py", line 24, in decode
return codecs.utf_8_decode(input, errors, True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: a bytes-like object is required, not 'str'
After some investigation, I found that the bytes data is converted to a str before it reaches the decode method of encodings/utf_8.py. That method in turn calls codecs.utf_8_decode, which expects bytes.
Below are some logs:
(python3_env) chaturvedi@Abhisheks-MacBook-Pro scripts % python encoding_test.py
b'c1318' <class 'bytes'>
file_fd= 3 tmp_out_file_url= /Users/chaturvedi/Documents/tmpvh76vaji
out_file_descriptor= <_io.BufferedWriter name=3>
csv_writer= {'fieldnames': [b'head'], 'restval': '', 'extrasaction': 'ignore', 'writer': <_csv.writer object at 0x1003f9480>}
reached writerow
rowdict= {b'head': b'c1318'}
self.writer= <_csv.writer object at 0x1003f9480> <built-in method __dir__ of _csv.writer object at 0x1003f9480>
k= b'head' <class 'bytes'>
v= b'c1318' <class 'bytes'>
self.writer= <_csv.writer object at 0x1003f9480>
reached decode
input= b'c1318'
type= <class 'str'>
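The last two log lines are consistent with csv.writer itself turning the bytes value into its str() representation before calling write() on the file object. A minimal sketch to check this in isolation, assuming a hypothetical RecordingFile stand-in (not part of my original script) that only records what write() receives:

```python
import csv


class RecordingFile:
    """Hypothetical stand-in for a file object; it only records write() arguments."""

    def __init__(self):
        self.calls = []

    def write(self, data):
        self.calls.append(data)
        return len(data)


recorder = RecordingFile()
writer = csv.writer(recorder, dialect="excel-tab")

# Pass a bytes value, as in the failing script.
writer.writerow([b"c1318"])

received = recorder.calls[0]
# On Python 3 the value arrives as a str holding the repr of the bytes object.
print(type(received), repr(received))
```

If this prints a str, then the conversion happens in the csv module before codecs.EncodedFile is involved, and EncodedFile.write() then fails because it tries to decode a str.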
I tried several tweaks, such as changing binary mode to text mode and passing str data, but nothing worked.
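For comparison, here is a minimal sketch of a text-mode variant that removes codecs.EncodedFile altogether and wraps the binary file object in io.TextIOWrapper. It assumes that the goal is a UTF-16 encoded, tab-separated file and that keys and values are str rather than bytes. The sample value and the use of the default temp directory are placeholders for illustration only:

```python
import csv
import io
import os
import tempfile

# Default temp directory is used here; the original script passes dir=... explicitly.
file_fd, tmp_path = tempfile.mkstemp()
try:
    # Wrap the binary file object so csv can write str and the wrapper encodes to UTF-16.
    # newline="" lets the csv module control line endings itself.
    with io.TextIOWrapper(os.fdopen(file_fd, "wb"), encoding="utf-16", newline="") as text_file:
        csv_writer = csv.DictWriter(
            text_file, ["head"], extrasaction="ignore", dialect="excel-tab"
        )
        # str key and str value (with one non-ASCII character) instead of bytes.
        csv_writer.writerow({"head": "c1318 \u00e9"})

    # Read the file back to verify the round trip.
    with open(tmp_path, encoding="utf-16", newline="") as check_file:
        content = check_file.read()
    print(repr(content))
finally:
    os.remove(tmp_path)
```

The design choice here is to let the text layer do the encoding, since csv.writer in Python 3 always hands str to write(); EncodedFile instead expects bytes on its write side.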
Some conversion seems to take place in between that I cannot see, since it apparently happens in frozen or compiled code (either _csv or _codecs). If I explicitly convert the input to bytes inside the decode method, it works.
Also, in Python 2 the same code works without any issues.
Please help me find a solution, or confirm whether this is a bug in Python 3.