Please help to adapt script from python2 to python3

Hi, could someone help me convert a script from Python 2 to Python 3? This is the script I currently have:

#!/usr/bin/env python

import sys
import signal
import collections
import subprocess

# Create two collections. One for mapper devices
# and another for block devices that map to the
# mapper devices.
dmname = ""
blockdevice = ""
blockdevices = collections.defaultdict(dict)
mapperdevices = collections.defaultdict(dict)
multipath = "/sbin/multipath -ll"
iostat = "/usr/bin/iostat -xkyz 1 1"


# Catch SIGINT so we can exit with grace.
def gexit(signum, frame):
    """
       Function to call if a SIGINT is received
    """
    sys.exit(1)


def initcollections():
    """
       Initialize the read/write metrics to 0
    """
    for mdev in mapperdevices:
        mapperdevices[mdev]['await'] = 0
        mapperdevices[mdev]['reads'] = 0
        mapperdevices[mdev]['writes'] = 0
        mapperdevices[mdev]['bytes_read'] = 0
        mapperdevices[mdev]['bytes_written'] = 0

    for bdev in blockdevices:
        blockdevices[bdev]['await'] = 0
        blockdevices[bdev]['reads'] = 0
        blockdevices[bdev]['writes'] = 0
        blockdevices[bdev]['bytes_read'] = 0
        blockdevices[bdev]['bytes_written'] = 0


def parse_devs():
    """
       Parse multipath -ll to get the list of mapper and sd devices
    """
    signal.signal(signal.SIGINT, gexit)

    try:
        subproc = subprocess.Popen(
            multipath, shell=True, stdout=subprocess.PIPE)
    except OSError:
        print("Error opening the multipath utility.")
        print("Command executed: " + multipath)
        sys.exit(1)

    # Check the string to see if it contains "dm-". If it does we located
    # a device mapper entry that we need to save. If we don't encounter a
    # dm line we need to locate the block devices that are part of the
    # mapper device we saved. These entries contain "|-" or "`-".
    for line in subproc.stdout.readlines():
        if ' dm-' in line:
            dmname = line.split()[2]
            mapperdevices[dmname]['pretty_name'] = line.split()[0]
        elif ' |-' in line or ' `-' in line:
            blockdevices[line.split()[-5]]['mapper_device'] = dmname

    # Set the counters to zero.
    initcollections()


def process_io_stats():
    """
       Iterates over iostat data and updates the device mapper array
       Iostat format:
       Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
       sda               0.00     1.00    0.00    3.00     0.00    12.00     8.00     0.00    0.00   0.00   0.00
    """

    # Iterate over the iostat output and update the counters
    while True:
        print("%-43s  %-8s  %-8s  %-9s  %-9s  %-8s" % ("Device Name", "Reads",
                                                       "Writes", "KBytesR/S",
                                                       "KBytesW/S", "Await"))

        try:
            subproc = subprocess.Popen(
                iostat, shell=True, stdout=subprocess.PIPE)
        except OSError:
            print("Error opening the iostat utility.")
            print("Command executed: " + iostat)
            sys.exit(1)

        for line in subproc.stdout.readlines():
            if "sd" in line:
                blockdev = line.split()[0]
                if blockdev in blockdevices:
                    blockdevices[blockdev]['await'] = float(line.split()[9])
                    blockdevices[blockdev]['reads'] = float(line.split()[3])
                    blockdevices[blockdev]['writes'] = float(line.split()[4])
                    blockdevices[blockdev]['bytes_read'] = float(
                        line.split()[5])
                    blockdevices[blockdev]['bytes_written'] = float(
                        line.split()[6])
            elif "dm" in line:
                mapperdev = line.split()[0]
                if mapperdev in mapperdevices:
                    mapperdevices[mapperdev]['await'] = float(line.split()[9])
                    mapperdevices[mapperdev]['reads'] = float(line.split()[3])
                    mapperdevices[mapperdev]['writes'] = float(line.split()[4])
                    mapperdevices[mapperdev]['bytes_read'] = float(
                        line.split()[5])
                    mapperdevices[mapperdev]['bytes_written'] = float(
                        line.split()[6])

        for mdev in mapperdevices:
            if mapperdevices[mdev]['reads'] > 0 or mapperdevices[mdev]['writes'] > 0:
                print("%-43s  %-8.2f  %-8.2f  %-9.2f  %-9.2f  %-8.2f" % (
                    mapperdevices[mdev]['pretty_name'],
                    mapperdevices[mdev]['reads'],
                    mapperdevices[mdev]['writes'],
                    mapperdevices[mdev]['bytes_read'],
                    mapperdevices[mdev]['bytes_written'],
                    mapperdevices[mdev]['await']))

                for bdev in blockdevices:
                    if blockdevices[bdev]["mapper_device"] == mdev:
                        print("|- %-40s  %-8.2f  %-8.2f  %-9.2f  %-9.2f  %-8.2f" % (
                            bdev, blockdevices[bdev]['reads'],
                            blockdevices[bdev]['writes'],
                            blockdevices[bdev]['bytes_read'],
                            blockdevices[bdev]['bytes_written'],
                            blockdevices[bdev]['await']))

        # Reset the counters to zero
        initcollections()
        print("")


def main():
    """
       Main code block
    """
    parse_devs()
    process_io_stats()


if __name__ == "__main__":
    main()

I get the following error:

Traceback (most recent call last):
  File "./multipath.py", line 204, in <module>
    main()
  File "./multipath.py", line 199, in main
    parse_devs()
  File "./multipath.py", line 117, in parse_devs
    if ' dm-' in line:
TypeError: a bytes-like object is required, not 'str'

I don't know how to proceed, so I would appreciate it if someone could take a look and tell me what is wrong.

Thank you very much

Start by using the 2to3 tool then see how well the converted script runs and fix the problems.

This assumes you know Python and understand its error messages.

Also, when posting code, use the pre-formatted text feature via the </> button.

I ran into a similar issue while porting a script that did something similar. The fix, which as far as I can tell 2to3 does not apply (I did try it), was to pass encoding='utf-8' in the arguments of the Popen calls.
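A minimal sketch of that change, assuming Python 3.6 or later (where Popen accepts the encoding argument). It runs a stand-in child process through sys.executable instead of multipath so that it works on any machine; the stand-in command, the sample output line, and the helper name run_lines are illustrative and not part of the original script.

```python
import subprocess
import sys


def run_lines(argv):
    """Run a command and return its stdout as a list of str lines."""
    # encoding="utf-8" makes subproc.stdout a text stream, so each line is a str.
    with subprocess.Popen(argv, stdout=subprocess.PIPE, encoding="utf-8") as subproc:
        return subproc.stdout.readlines()


# Stand-in for "/sbin/multipath -ll": a child Python that prints one made-up line.
lines = run_lines([sys.executable, "-c", "print('mpatha (3600) dm-2 VENDOR,DISK')"])
for line in lines:
    if " dm-" in line:  # str in str, so no TypeError
        print(line.split()[2])
```

In the original script the same keyword would be added to both Popen calls, the one for multipath and the one for iostat.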

Error messages can only ever tell you (with certainty) about a proximate cause. To find the ultimate cause, we need to understand three things:

  1. What does this code try to do?
  2. What do we want it to do instead?
  3. Where did the input come from?

Please commit this to memory, and/or write it down somewhere that you won’t lose it. These are the three golden keys to debugging.

In particular, we cannot reliably fix a TypeError by just figuring out which thing has the type being complained about, and adding code to coerce the type (i.e., create a different value with a suitable type, by following a mindless rule). If that could work, there would be no reason for Python to have TypeErrors at all - it could just do the coercion for you.

Programming languages don’t use errors to make your life difficult; errors exist to force you to be precise.

What does the code try to do?

The error message is pretty explicit: there is something in this code which is a 'str' (i.e., a string), but performing the operation would require “a bytes-like object” - meaning, something that is sufficiently similar to the bytes type. A string doesn’t meet this requirement; it is an unsuitable type, so this causes a TypeError.
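The error can be reproduced in isolation, without multipath at all. This is only a sketch: the sample line is made up, and it stands for what one iteration of the loop would see in Python 3.

```python
# A made-up line, as bytes, standing in for one line of process output.
line = b"mpatha (3600) dm-2 VENDOR,DISK\n"

try:
    found = " dm-" in line  # str needle, bytes haystack
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)
```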

To relate this description to the code, of course, we must look at that code. Thankfully, the stack trace also shows us where the error was reported:

    File "./multipath.py", line 117, in parse_devs
      if ' dm-' in line:

There is more to the stack trace behind that, but it won’t matter this time - it just shows how we got to the parse_devs function, but this doesn’t help solve this particular problem, and also shouldn’t be surprising.

There are two operations performed here: checking whether ' dm-' in line, and then using that result to decide if the next block of code should execute. Clearly, the issue is in the first operation - because it’s clear that a successful in operation should reliably give us a result that can be used for if (indeed, almost anything can be used for if).

Of course, ' dm-' is a string, so that’s another important clue. That must be the 'str' that was complained about. So, what we need to know is that in Python 2, where the code works, line also becomes a string; but in Python 3, it becomes a bytes object.

Why?

Because in Python 3, we stopped pretending that bytes are strings - in Python 2, bytes and str mean the same type, but in Python 3, they are different types. Now str means what unicode used to mean (and there is no more unicode). Either way, a literal like ' dm-' creates what the language calls a str. But in Python 2, it creates a mere sequence of byte values that encode the text according to some rule stated elsewhere. In Python 3, it creates an actual string, which actually represents the text. (Of course, everything in memory boils down to bytes eventually. But the point is that Python 3 creates an abstraction that is a sequence of Unicode text characters - not a sequence of special integers that range from 0 to 255 and require a single byte of memory each to store.)
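A short illustration of the difference in Python 3, using an arbitrary sample string (the text itself is not from the script):

```python
text = "dm-0 é"              # str: a sequence of Unicode characters
data = text.encode("utf-8")  # bytes: the same text encoded as UTF-8

print(type(text).__name__, len(text))  # length counted in characters
print(type(data).__name__, len(data))  # length counted in bytes
print(data[0])                         # indexing bytes gives an int
print(data.decode("utf-8") == text)    # decoding recovers the text
```

The two lengths differ because "é" is one character but takes two bytes in UTF-8.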

Okay, so we understand how a problem occurred. Now what?

What do we want it to do instead?

If the code should try to look for a sub-sequence of bytes within a longer sequence of bytes, then we should specify a sub-sequence of bytes to search for. Currently, we specify a string, which would be clearly wrong for that task. To fix the problem, we would need to understand how to specify a sequence of bytes.

On the other hand, if the code should try to look for a substring within a string - that is, to treat what we’re searching as text - then we should have text to search within, instead of a byte sequence. To fix the problem, we would need to understand how to get a string.

Of course, there are conversions we can do back and forth. But this is, again, not properly solving the problem. For example, if we decided that we want to look for bytes within bytes, we could convert the search string ' dm-' to bytes by adding more code for that purpose. But it would clearly be better to just specify the bytes in the first place, and to understand how to do that. Similarly the other way around.
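The two consistent choices look like this in code. It is a sketch with a made-up sample line; which one is right depends on what the data is supposed to be.

```python
raw_line = b"mpatha (3600) dm-2 VENDOR,DISK\n"  # hypothetical line read in binary mode
text_line = "mpatha (3600) dm-2 VENDOR,DISK\n"  # the same line read in text mode

# Bytes within bytes: write the needle as a bytes literal (note the b prefix).
found_in_bytes = b" dm-" in raw_line

# Text within text: write the needle as an ordinary str literal.
found_in_text = " dm-" in text_line

print(found_in_bytes, found_in_text)
```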

At this point, of course, I can’t read your mind about what you want to happen at this point in the code. But I can at least explain to you that there is a decision to make, and what it entails.

I can also get a very important clue from the third step in the process.

Where did the input come from?

Instead of following the stack trace (because that only works a function at a time), we need to look backwards through the current function, and reason about how line got this bytes type value (since that’s the only input that’s actually in question; the other input is a literal).

This is straightforward - the code is at the top of a for loop, and it’s using the iteration variable:

    for line in subproc.stdout.readlines():

Thus, subproc.stdout.readlines() must be a sequence of bytes objects, from which we get an individual bytes object each time through the loop.

Okay, now what? Well, clearly this is reading lines from the stdout of the subproc. One might think that extracting “lines” requires a source of text, but in fact Python is happy to create “lines” from a stream of bytes, by splitting them at the byte value that represents a newline character in ASCII (the value is ten; if you don’t understand why, you should research it independently). Thus, when we have a stdout stream that is sourcing bytes - i.e., when it’s open in binary mode - then we get a bytes sequence each time through the loop.
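The same behaviour can be seen without a subprocess by using in-memory streams from the io module. In this sketch, io.BytesIO plays the role of a binary-mode stdout and io.TextIOWrapper the role of a text-mode one; the payload is invented.

```python
import io

payload = b"mpatha dm-2\nmpathb dm-3\n"

# Binary stream: readlines() splits at the newline byte and yields bytes.
binary_lines = io.BytesIO(payload).readlines()

# Text stream layered over the same bytes: readlines() yields str.
text_lines = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8").readlines()

print(binary_lines)
print(text_lines)
print(ord("\n"))  # the byte value of the ASCII newline
```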

Okay, why is the stream open in binary mode? To answer that, we need to consider where subproc came from:

    try:
        subproc = subprocess.Popen(
            multipath, shell=True, stdout=subprocess.PIPE)
    except OSError:

If an exception was raised, the program would have aborted (via sys.exit(1)) - so if the for loop is running, we know that the subproc was created successfully. Thus, we know that subproc is a process created with subprocess.Popen, and thus subproc.stdout is the standard output stream for that process.

Near the top of the code, we see:

    multipath = "/sbin/multipath -ll"

so that is the command that we are passing.

Resolution

First, let’s circle back for a moment. The goal is to run the Linux multipath program and parse its output. This is a program designed to display information to a user in a terminal, who types the command manually. Thus, we clearly want to treat its output as text.

We get additional clues to this fact, from how line is used in the subsequent code. The code tries to .split() the line into words, breaking at whitespace. While the bytes type supports this operation for legacy backwards-compatibility reasons, it’s clearly a textual operation.
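A small sketch of why staying in bytes would cause trouble later even though .split() works. The sample line is made up; the point is that the pieces that come out of a bytes split are themselves bytes, and they never compare equal to the str names used elsewhere in the script.

```python
raw = b"mpatha (3600) dm-2 VENDOR,DISK\n"
text = raw.decode("utf-8")

raw_name = raw.split()[2]    # a bytes object
text_name = text.split()[2]  # a str

print(repr(raw_name), repr(text_name))
print(raw_name == text_name)  # bytes and str never compare equal
```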

Thus, in the erroneous code, the problem is that line should be a string and is not. (A naive reading of the error message would only give the other interpretation, i.e. that ' dm-' is the problematic part; and this would lead to an inappropriate “fix” causing more problems further down the line.)

The reason that line isn’t a string is that it’s a bytes, which comes from iterating over a sequence of bytes, which is created from an output stream of bytes, which sources bytes because it is open in binary mode.

Thus, the ultimate cause of the problem is that the stdout of the process is open in binary mode.

To fix this, we need to consult the subprocess.Popen documentation, to understand how to get a text-mode stdout from the process.

When we do this, we find that the common keyword arguments for subprocess functionality include encoding, errors, and text (an alias for universal_newlines). We need to set at least one of these in order to get a text-mode stream. If we only set text=True, Python will use (via io.TextIOWrapper) default values for the text encoding and encoding-error handling. If we want to use a specific text encoding (such as UTF-8), we should specify that using the encoding keyword parameter.
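To see what these keywords change, here is a sketch using subprocess.run, assuming Python 3.7 or later (where the text keyword exists). The child commands are stand-ins run through sys.executable, not multipath; one of them deliberately emits an invalid UTF-8 byte to show what the errors keyword does.

```python
import subprocess
import sys

# Stand-in child that writes one invalid UTF-8 byte (0xff) between two letters.
bad_child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'a\\xffb\\n')"]
# Stand-in child that prints plain ASCII.
ok_child = [sys.executable, "-c", "print('ok')"]

# No text-related keyword: stdout is bytes.
raw = subprocess.run(bad_child, stdout=subprocess.PIPE).stdout

# text=True: stdout is str, decoded with the default encoding.
default_text = subprocess.run(ok_child, stdout=subprocess.PIPE, text=True).stdout

# Explicit encoding plus an error handler: the bad byte becomes U+FFFD.
replaced = subprocess.run(
    bad_child, stdout=subprocess.PIPE, encoding="utf-8", errors="replace"
).stdout

print(type(raw).__name__, raw)
print(type(default_text).__name__, repr(default_text))
print(type(replaced).__name__, repr(replaced))
```

Without errors="replace", decoding the bad byte as UTF-8 would raise UnicodeDecodeError, so whether to set errors depends on how much you trust the output of the tool.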

Hello, thank you for the help. I can confirm that the problem was solved by adding the option encoding='utf-8'.

Thank you very much!