Change class from '_io.TextIOWrapper' to 'bytes'

Hello Pythonic minds,

I have a text file named “ex23_RawBytes_ConvertedToBytes.txt”.

Its contents are just text, and when I check the type in Python it reports class ‘_io.TextIOWrapper’.

Is there any way I can convert it into bytes?

The reason I am doing this is that I have all the raw bytes stored as text, but their type is not actually ‘bytes’, which I need in order to convert them to cooked strings.

Below is the Python code I am using.

import sys
script, input_decoding, error = sys.argv


def main(RawBytes_file, decoding, errors):
    line = RawBytes_file.readline()
    
    if line:
        print_line(line, decoding, errors)
        return main(RawBytes_file, decoding, errors)


def print_line(line, decoding, errors):
    next_lang = line.strip()
    cooked_string = next_lang.decode(decoding, errors = errors)
    raw_bytes = cooked_string.encode(decoding, errors = errors)

    print(raw_bytes, "<===>", cooked_string)


languages = open("ex23_RawBytes.txt", encoding="utf-8")

main(languages, input_decoding, error)

Please find the text file link below.

ex23_RawBytes_ConvertedToBytes.txt

Please let me know if I am missing something here.

Open the file with open(path, mode="rb") without encoding.

See: the Python docs for open

(For reading and writing raw bytes use binary mode and leave encoding unspecified.)

yugalgarg@Yugals-MacBook-Pro PythonExercise % python3 ex23_ExtraChallenging.py bytes strict                                           
Traceback (most recent call last):
  File "/Users/yugalgarg/Desktop/PythonExercise/ex23_ExtraChallenging.py", line 23, in <module>
    main(languages, input_decoding, error)
  File "/Users/yugalgarg/Desktop/PythonExercise/ex23_ExtraChallenging.py", line 9, in main
    print_line(line, decoding, errors)
  File "/Users/yugalgarg/Desktop/PythonExercise/ex23_ExtraChallenging.py", line 15, in print_line
    cooked_string = next_lang.decode(decoding, errors = errors)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
LookupError: unknown encoding: bytes

This is the error I get when I run the Python script.

To read the raw content of a file do the following:

with open(filename, 'rb') as f:
    raw_contents = f.read()

To read the contents of a file as text in a specific encoding, for example UTF-8, use the following:

with open(filename, 'r', encoding='utf-8') as f:
    unicode_text_contents = f.read()

There is no such thing as a bytes encoding.
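A quick way to confirm this, as a minimal sketch: the standard-library codecs.lookup() raises the same LookupError for the name "bytes" that the traceback above shows, while a real codec name such as "utf-8" resolves. The helper name is_known_encoding is made up for illustration.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


print(is_known_encoding("utf-8"))  # a real codec name
print(is_known_encoding("bytes"))  # not a codec, so decode() rejects it
```

So the first command-line argument to the script should be an encoding name such as utf-8, not the name of a type.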

raw_bytes = b'\xe6\x96\x87\xe8\xa8\x80'

utf_string = "文言"

raw_bytes.decode()

utf_string.encode()

raw_bytes == utf_string.encode()

utf_string == raw_bytes.decode()

print(type(raw_bytes))

This is what I mean by class bytes. If you run this code, the final print of the type gives the result below:

<class 'bytes'>

This works fine, but I want to make it compatible with the code below:

import sys
script, input_decoding, error = sys.argv


def main(RawBytes_file, decoding, errors):
    line = RawBytes_file.readline()
    
    if line:
        print_line(line, decoding, errors)
        return main(RawBytes_file, decoding, errors)


def print_line(line, decoding, errors):
    next_lang = f.read()
    cooked_string = next_lang.decode(decoding, errors = errors)
    raw_bytes = cooked_string.encode(decoding, errors = errors)

    print(raw_bytes, "<===>", cooked_string)


languages = open("ex23_RawBytes_ConvertedToBytes.txt", "rb")

main(languages, input_decoding, error)

Sorry, you cannot.

You need different code to handle bytes and text that is decoded.

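Also, your code calls main recursively once per line, so it will crash with a RecursionError once the file has more lines than Python's recursion limit (1000 by default).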
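To illustrate the recursion point, here is a rough sketch that compares a recursive line reader with a plain loop. It uses an in-memory io.BytesIO object instead of a real file, and the function names count_lines_recursive and count_lines_loop are invented for this example.

```python
import io
import sys


def count_lines_recursive(f):
    """Mirror the recursive pattern: one extra stack frame per line."""
    line = f.readline()
    if line:
        return 1 + count_lines_recursive(f)
    return 0


def count_lines_loop(f):
    """Same traversal with a loop; the stack depth stays constant."""
    count = 0
    for _ in f:
        count += 1
    return count


# Build an in-memory "file" with more lines than the recursion limit allows.
n_lines = sys.getrecursionlimit() + 100
data = b"x\n" * n_lines

try:
    count_lines_recursive(io.BytesIO(data))
    recursive_ok = True
except RecursionError:
    recursive_ok = False

print("recursive version finished:", recursive_ok)
print("loop version counted:", count_lines_loop(io.BytesIO(data)))
```

Iterating over the file object directly is the usual idiom and works for files of any length.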

Just to be very clear: _io.TextIOWrapper is the return type of open(filename). It is not directly related to the type of the data in the file; it reflects the mode in which the file is being read, and it is the object from which you will read the data.

In this case open(filename) is the same as open(filename, 'rt'), which opens the file for reading “text”, so readline() will give a str. Changing to open(filename, 'rb') opens the file in binary mode, which gives an _io.BufferedReader, and readline() will give bytes.
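A small sketch of that difference, assuming a throwaway temporary file with made-up sample content rather than the file from the question:

```python
import os
import tempfile

# Write a small UTF-8 sample file (hypothetical content, for illustration only).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("文言\n".encode("utf-8"))
    path = tmp.name

try:
    # Text mode: the file object decodes for you and hands back str.
    with open(path, encoding="utf-8") as text_file:
        text_file_type = type(text_file).__name__
        text_line = text_file.readline()

    # Binary mode: no decoding happens, so you get bytes.
    with open(path, "rb") as binary_file:
        binary_file_type = type(binary_file).__name__
        binary_line = binary_file.readline()
finally:
    os.remove(path)

print(text_file_type, type(text_line).__name__)      # TextIOWrapper str
print(binary_file_type, type(binary_line).__name__)  # BufferedReader bytes
```

Only the bytes result has a decode() method; the str result has already been decoded by the file object.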

The issue is the recursive call to main (as @barry-scott pointed out).
Try this:

def check(path, encoding="utf-8", errors="strict"):
    with open(path, "rb") as f:
        for lineno, line in enumerate(f):
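            try:
                cooked = line.decode(encoding, errors)
            except Exception as exc:
                # report the failure and move on to the next line
                print(f"{lineno:04}: decode failed: {exc}")
                continue
            try:
                raw = cooked.encode(encoding, errors)
            except Exception as exc:
                # report the failure and move on to the next line
                print(f"{lineno:04}: encode failed: {exc}")
                continue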
            # you could add here:
            # assert raw == line
            # for comparison it's nicer to print one under the other:
            print(f"{lineno:04}: {raw}")
            print(f"{lineno:04}:  {repr(cooked)}") 

You can get the underlying _io.BufferedReader via languages.buffer; it will then yield bytes when read.
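As a minimal sketch of that approach, again assuming a temporary file with made-up sample content in place of the real one:

```python
import os
import tempfile

# Write a small UTF-8 sample file (hypothetical content, for illustration only).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("文言\n".encode("utf-8"))
    path = tmp.name

try:
    # Opened in text mode, as in the original script...
    with open(path, encoding="utf-8") as languages:
        # ...but read through the underlying binary buffer to get bytes.
        raw_line = languages.buffer.readline()
finally:
    os.remove(path)

cooked = raw_line.strip().decode("utf-8")
print(type(raw_line).__name__, raw_line, "<===>", cooked)
```

One caution: the text layer reads ahead, so mixing reads from languages and languages.buffer on the same file can leave the two out of step. If you only need bytes, opening with "rb" in the first place is simpler.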