Change class from '_io.TextIOWrapper' to 'bytes'

yugal68 · February 5, 2024, 1:04pm

Hello Pythonic minds,

I have a text file named “ex23_RawBytes_ConvertedToBytes.txt”.

When I checked the type in Python, the contents are just text with class ‘_io.TextIOWrapper’.

Any way I can convert them into bytes?

The reason I am doing this is that I have all the raw bytes as text, but their type is not actually ‘bytes’, which is what I need in order to decode them into cooked strings.

Below is the python code I am using.

import sys
script, input_decoding, error = sys.argv


def main(RawBytes_file, decoding, errors):
    line = RawBytes_file.readline()
    
    if line:
        print_line(line, decoding, errors)
        return main(RawBytes_file, decoding, errors)


def print_line(line, decoding, errors):
    next_lang = line.strip()
    cooked_string = next_lang.decode(decoding, errors = errors)
    raw_bytes = cooked_string.encode(decoding, errors = errors)

    print(raw_bytes, "<===>", cooked_string)


languages = open("ex23_RawBytes.txt", encoding="utf-8")

main(languages, input_decoding, error)

Please find the text file link below.

ex23_RawBytes_ConvertedToBytes.txt

Please let me know if I am missing something here.

hansgeunsmeyer · February 5, 2024, 2:49pm

Open the file with open(path, mode="rb") without encoding.

See: the Python docs for open

(For reading and writing raw bytes use binary mode and leave encoding unspecified.)

yugal68 · February 5, 2024, 3:54pm

yugalgarg@Yugals-MacBook-Pro PythonExercise % python3 ex23_ExtraChallenging.py bytes strict
Traceback (most recent call last):
  File "/Users/yugalgarg/Desktop/PythonExercise/ex23_ExtraChallenging.py", line 23, in <module>
    main(languages, input_decoding, error)
  File "/Users/yugalgarg/Desktop/PythonExercise/ex23_ExtraChallenging.py", line 9, in main
    print_line(line, decoding, errors)
  File "/Users/yugalgarg/Desktop/PythonExercise/ex23_ExtraChallenging.py", line 15, in print_line
    cooked_string = next_lang.decode(decoding, errors = errors)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
LookupError: unknown encoding: bytes

This is the error when I am running the Python script.

barry-scott · February 5, 2024, 5:29pm

To read the raw content of a file do the following:

with open(filename, 'rb') as f:
    raw_contents = f.read()

To read the contents of a file using a specific encoding, for example UTF-8, use the following:

with open(filename, 'r', encoding='utf-8') as f:
    unicode_text_contents = f.read()

There is no such thing as a bytes encoding.
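
A small sketch of why the earlier run failed with "LookupError: unknown encoding: bytes". The name passed to decode() has to be a registered codec such as "utf-8"; "bytes" is a type, not a codec. The helper name is_known_encoding is made up for this illustration.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python has a codec registered under this name.

    Hypothetical helper, not part of the original script.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


raw_bytes = b'\xe6\x96\x87\xe8\xa8\x80'

print(is_known_encoding("utf-8"))  # a real codec name
print(is_known_encoding("bytes"))  # the name that triggered the LookupError

# Decoding works once a real codec name is supplied.
cooked = raw_bytes.decode("utf-8", errors="strict")
print(cooked)
```

So the script from the question would presumably be invoked with a codec name as its first argument (for example utf-8), not with bytes.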

yugal68 · February 6, 2024, 6:15am

raw_bytes = b'\xe6\x96\x87\xe8\xa8\x80'

utf_string = "文言"

raw_bytes.decode()

utf_string.encode()

raw_bytes == utf_string.encode()

utf_string == raw_bytes.decode()

print(type(raw_bytes))

This is what I mean by class bytes. If you run this code, the final line prints the type of raw_bytes, which gives the result below:

<class 'bytes'>

yugal68 · February 6, 2024, 6:30am

This works fine, but I want to make it compatible with the code below:

import sys
script, input_decoding, error = sys.argv


def main(RawBytes_file, decoding, errors):
    line = RawBytes_file.readline()
    
    if line:
        print_line(line, decoding, errors)
        return main(RawBytes_file, decoding, errors)


def print_line(line, decoding, errors):
    next_lang = f.read()
    cooked_string = next_lang.decode(decoding, errors = errors)
    raw_bytes = cooked_string.encode(decoding, errors = errors)

    print(raw_bytes, "<===>", cooked_string)


languages = open("ex23_RawBytes_ConvertedToBytes.txt", "rb")

main(languages, input_decoding, error)

barry-scott · February 6, 2024, 9:18am

Sorry, you cannot.

You need different code to handle bytes and text that is decoded.

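Also, your code calls main recursively, once per line, so it will crash with a RecursionError on any file with more lines than Python's recursion limit allows.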
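
To illustrate the recursion problem, here is a rough sketch using an in-memory file (io.BytesIO) instead of a real one. The function names read_recursively and read_with_loop are invented for the example; the first mirrors the one-call-per-line pattern from the question, the second does the same work with a plain loop.

```python
import io
import sys


def read_recursively(raw_file):
    """Count lines with one nested call per line (the pattern in question)."""
    line = raw_file.readline()
    if line:
        return 1 + read_recursively(raw_file)
    return 0


def read_with_loop(raw_file):
    """Count lines with a loop; the call stack does not grow."""
    return sum(1 for _ in raw_file)


# Build more lines than the current recursion limit permits.
line_count = sys.getrecursionlimit() + 100
data = b"x\n" * line_count

try:
    read_recursively(io.BytesIO(data))
    recursion_failed = False
except RecursionError:
    recursion_failed = True

loop_count = read_with_loop(io.BytesIO(data))

print(recursion_failed)
print(loop_count == line_count)
```

On a short file the recursive version appears to work, which is why the problem can go unnoticed until the input grows.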

defjaf · February 6, 2024, 9:58am

Just to be very clear: _io.TextIOWrapper is the return type from open(filename). It says nothing directly about the type of the data in the file; it reflects the mode in which the file is being read, and it is the object from which you will read the data.

In this case open(filename) is the same as open(filename, 'rt'), which opens the file for reading “text”, and readline() will then give a str; changing to open(filename, 'rb') opens the file in binary mode, which gives an object of type _io.BufferedReader, and readline() will give bytes.
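
A quick way to see this for yourself, sketched with a throwaway temporary file so it does not depend on the file from the question. The sample content and variable names are assumptions made for the demonstration.

```python
import os
import tempfile

# Write a small UTF-8 sample file to experiment with.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("文言\n".encode("utf-8"))
    path = tmp.name

try:
    # Text mode: the file object is a TextIOWrapper and readline() gives str.
    with open(path, encoding="utf-8") as text_file:
        text_type = type(text_file).__name__
        text_line = text_file.readline()

    # Binary mode: the file object is a BufferedReader and readline() gives bytes.
    with open(path, "rb") as binary_file:
        binary_type = type(binary_file).__name__
        binary_line = binary_file.readline()
finally:
    os.remove(path)

print(text_type, type(text_line).__name__)
print(binary_type, type(binary_line).__name__)
```

The type of the file object and the type of what readline() returns are two separate things; only the second one is ever str or bytes.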

hansgeunsmeyer · February 6, 2024, 3:00pm

The issue is the recursive call to main (as @barry-scott pointed out).
Try this:

def check(path, encoding="utf-8", errors="strict"):
    with open(path, "rb") as f:
        for lineno, line in enumerate(f):
            try:
                cooked = line.decode(encoding, errors)
            except Exception as exc:
                print(f"{lineno:04}: could not decode: {exc}")
                continue
            try:
                raw = cooked.encode(encoding, errors)
            except Exception as exc:
                print(f"{lineno:04}: could not encode: {exc}")
                continue
            # you could add here:
            # assert raw == line
            # for comparison it's nicer to print one under the other:
            print(f"{lineno:04}: {raw}")
            print(f"{lineno:04}:  {repr(cooked)}")
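
As a side note on the errors argument, here is a minimal sketch of how the three most common handlers behave on a line that is not valid UTF-8. The sample bytes are made up for illustration: one valid character followed by 0xff, which cannot appear in UTF-8.

```python
# One valid character (文) followed by 0xff, which is never valid in UTF-8.
damaged = b"\xe6\x96\x87\xff"

# "strict" raises on the first byte it cannot decode.
try:
    damaged.decode("utf-8", "strict")
    strict_result = "decoded"
except UnicodeDecodeError as exc:
    strict_result = f"failed at byte {exc.start}"

# "replace" substitutes U+FFFD for the bad byte; "ignore" drops it.
replaced = damaged.decode("utf-8", "replace")
ignored = damaged.decode("utf-8", "ignore")

# With a lossy handler the round trip no longer reproduces the input.
round_trips = replaced.encode("utf-8") == damaged

print(strict_result)
print(repr(replaced))
print(repr(ignored))
print(round_trips)
```

This is why a check like assert raw == line is only expected to hold for lines that decode cleanly; with "replace" or "ignore" the comparison can fail even though no exception is raised.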

frostming · February 7, 2024, 8:04am

Get the underlying _io.BufferedReader via languages.buffer; then it will yield bytes when being read.
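
A short sketch of that approach, again with a temporary file standing in for the real one (the sample content is an assumption). Note that the Python docs describe buffer as not being part of the TextIOBase API, so it may be missing on some file-like objects, and mixing reads from the text layer and the buffer on the same file is probably best avoided.

```python
import os
import tempfile

# Write a small UTF-8 sample file to experiment with.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("文言\n".encode("utf-8"))
    path = tmp.name

try:
    # The file is opened in text mode, as in the original script...
    with open(path, encoding="utf-8") as languages:
        # ...but reading through .buffer bypasses the decoding layer.
        buffer_type = type(languages.buffer).__name__
        raw_line = languages.buffer.readline()
finally:
    os.remove(path)

print(buffer_type)
print(raw_line)
print(raw_line.decode("utf-8"))
```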
