TypeError: cannot use a string pattern on a bytes-like object

ramiro.jaceson · July 26, 2023, 12:03pm


import subprocess
import optparse
import re

def get_arguments():
    parser = optparse.OptionParser()
    parser.add_option("-i", "--interface", dest="interface", help="Interface for changing its MAC address")
    parser.add_option("-m", "--mac", dest="new_mac", help="New MAC address")
    (options, args) = parser.parse_args()
    if not options.interface:
        parser.error("[+] Please specify an interface, use --help for more info.")
    elif not options.new_mac:
        parser.error("[+] Please specify a MAC address, use --help for more info.")
    else:
        return options

def change_mac(interface, new_mac):
    print("[+] Changing MAC address for " + interface + " to " + new_mac)
    subprocess.call(["ifconfig", interface, "down"])
    subprocess.call(["ifconfig", interface, "hw", "ether", new_mac])
    subprocess.call(["ifconfig", interface, "up"])

options = get_arguments()
change_mac(options.interface, options.new_mac)

ifconfig_result = subprocess.check_output(["ifconfig", options.interface])
print(ifconfig_result)

mac_address_search_result = re.search(r"\w\w:\w\w:\w\w:\w\w:\w\w:\w\w", ifconfig_result)
print("[+] Your current MAC address is: ", mac_address_search_result.group(0))

I wrote that in PyCharm on Kali Linux, running in VirtualBox on Windows 11. Then, in the Kali Terminal Emulator (as the root user), I typed the following command:

python mac_changer.py -i eth0 -m  00:11:22:22:22:99

The output is:

[+] Changing MAC address for eth0 to 00:11:22:22:22:99
b'eth0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500\n        inet 10.0.2.16  netmask 255.255.255.0  broadcast 10.0.2.255\n        ether 00:11:22:22:22:99  txqueuelen 1000  (Ethernet)\n        RX packets 32294  bytes 36342047 (34.6 MiB)\n        RX errors 0  dropped 0  overruns 0  frame 0\n        TX packets 20187  bytes 1888532 (1.8 MiB)\n        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0\n\n'
Traceback (most recent call last):
  File "/home/kali/PycharmProjects/mac_changer/mac_changer.py", line 30, in <module>
    mac_address_search_result = re.search(r"\w\w:\w\w:\w\w:\w\w:\w\w:\w\w", ifconfig_result)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/__init__.py", line 176, in search
    return _compile(pattern, flags).search(string)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot use a string pattern on a bytes-like object

What’s the problem here?

kknechtel · July 26, 2023, 1:19pm

In the future, it will be better if you use the title of a post to describe the problem - “can’t get the expected output” could be talking about almost anything. What you are trying shows considerable skill already; I want to encourage you not to apply labels like “beginner” - even if it were true, it would only hold you back to keep thinking that way. The computer does not know or care about your skill level, and the answer to questions doesn’t change because of your skill level.

ramirojaceson:

The output is:

[+] Changing MAC address for eth0 to 00:11:22:22:22:99
b'eth0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500\n        inet 10.0.2.16  netmask 255.255.255.0  broadcast 10.0.2.255\n        ether 00:11:22:22:22:99  txqueuelen 1000  (Ethernet)\n        RX packets 32294  bytes 36342047 (34.6 MiB)\n        RX errors 0  dropped 0  overruns 0  frame 0\n        TX packets 20187  bytes 1888532 (1.8 MiB)\n        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0\n\n'

As you can see from the b prefix, the ifconfig_result value is not a string, but instead a bytes object. It represents a raw sequence of bytes that was output by the ifconfig program. Although it looks readable on the terminal, it is not actually textual: this text is the result of applying a very primitive interpretation to the data, called ASCII.

The regular expression library offers two basically separate tools: a way to match substrings within a string, and a way to match byte sub-sequences within a byte sequence (bytes object). When you use a string for the regex pattern, it builds a regex that matches on strings, to look for matches that are also strings… When you use a bytes object for the regex pattern, it builds a regex that matches on other bytes, to look for matches that are also bytes.
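To make the two modes concrete, here is a minimal sketch using a shortened copy of the ifconfig output from the question as sample data (the variable names are just for illustration). It shows a str pattern on a str, a bytes pattern on bytes, and the error you get from mixing them:

```python
import re

# Sample data modelled on the ifconfig output above (shortened).
raw = b"ether 00:11:22:22:22:99  txqueuelen 1000  (Ethernet)"
text = raw.decode("ascii")

# A str pattern searches a str and produces a str match.
str_match = re.search(r"\w\w:\w\w:\w\w:\w\w:\w\w:\w\w", text)
print(str_match.group(0))

# A bytes pattern (note the rb prefix) searches bytes and produces a bytes match.
bytes_match = re.search(rb"\w\w:\w\w:\w\w:\w\w:\w\w:\w\w", raw)
print(bytes_match.group(0))

# Mixing a str pattern with bytes input raises the TypeError from the traceback.
try:
    re.search(r"\w\w:\w\w", raw)
except TypeError as exc:
    mixed_error = str(exc)
    print(mixed_error)
```

Either mode works for finding the MAC address; the str version is the natural choice here because the result is going to be printed as text.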

Since you want to print a result that should be a string, we should use string matching, as we already do. This means we need a string input. Therefore, we should convert the bytes that we got from ifconfig, into the corresponding string, by specifying an appropriate encoding.

It seems as though the output from ifconfig should always be interpretable with the ordinary ASCII encoding. We can test this assumption, by using that encoding, and seeing if anything ever goes wrong. It looks like this:

# replace this
# ifconfig_result = subprocess.check_output(["ifconfig", options.interface])
# by converting the `subprocess.check_output` result before assigning:
ifconfig_result = str(subprocess.check_output(["ifconfig", options.interface]), 'ascii')
# Or another way:
# ifconfig_result = subprocess.check_output(["ifconfig", options.interface]).decode('ascii')

If the encoding is wrong, we will get either wrong results or (usually) a different kind of exception: UnicodeDecodeError. We can take that opportunity to study the data some more and make a better choice for the encoding.
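As a rough illustration of what that failure looks like, the sketch below decodes a made-up byte string (not real ifconfig output) that contains one non-ASCII byte. The exception object reports where the offending byte is, which helps when studying the data:

```python
# Hypothetical output containing one byte (0xe9) outside the ASCII range.
raw = b"eth0: caf\xe9"

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # Keep a reference: the name bound by "as" is cleared after the except block.
    failure = exc
    print(exc)

# The exception records the position of the bytes that could not be decoded.
bad_byte = raw[failure.start:failure.end]
print(bad_byte)

# For inspection only: substitute U+FFFD for undecodable bytes instead of raising.
lenient = raw.decode("ascii", errors="replace")
print(lenient)
```

Using errors="replace" hides the problem rather than solving it, so it is better suited to debugging than to the final script.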

The ASCII encoding is restricted: it cannot handle any byte with values 0x80 through 0xff. There are many so-called “ascii-transparent” encodings, that will give the same result as ASCII when given ASCII-compatible data, and also interpret the other bytes in… some other way, depending on the encoding. A lot of these just directly interpret each of those bytes as some other specific character. Some, like UTF-8, can use more than one byte to represent a character, which lets them show a wider variety of characters. UTF-8 can represent every character in the Unicode standard - all the characters that computers consider to be valid elements of “text”. There are other systems as well, like “shift-jis” which supports Japanese text (but cannot handle a lot of other languages) while not using more than two bytes per character.
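A small sketch of these points, with sample byte strings chosen only for illustration: pure-ASCII data decodes identically under several encodings, while bytes at 0x80 and above are interpreted differently by each one.

```python
# Pure-ASCII data: every ASCII-transparent encoding gives the same string.
ascii_only = b"ether 00:11:22:22:22:99"
decoded = {enc: ascii_only.decode(enc) for enc in ("ascii", "utf-8", "latin-1")}
print(decoded)

# Two bytes above 0x7f: UTF-8 reads them as one character,
# Latin-1 reads each byte as its own character.
two_bytes = b"\xc3\xa9"
as_utf8 = two_bytes.decode("utf-8")
as_latin1 = two_bytes.decode("latin-1")
print(as_utf8, len(as_utf8))
print(as_latin1, len(as_latin1))

# Shift-JIS spends two bytes on each of these Japanese characters.
japanese = "日本".encode("shift_jis")
print(len(japanese))
```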

ramiro.jaceson · July 26, 2023, 2:21pm

(post deleted by author)

barry-scott · July 26, 2023, 9:13pm

Please do not post pictures of text; use preformatted text, just as you have for code.

You need to decide when you are going to decode the bytes into strings.
You could pass the encoding option to subprocess.check_output to get back a string instead of bytes.

Or you could convert to string before you print out messages.
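Here is a minimal sketch of the encoding option. It assumes a stand-in command (the Python interpreter printing a fixed line) so that it runs on any machine; in the script from the question the command would be ["ifconfig", options.interface], and "ascii" is the same encoding guess discussed earlier in the thread.

```python
import subprocess
import sys

# Stand-in for ["ifconfig", options.interface] so the sketch runs anywhere.
command = [sys.executable, "-c", "print('ether 00:11:22:22:22:99')"]

# Without an encoding, check_output returns bytes.
as_bytes = subprocess.check_output(command)
print(type(as_bytes))

# With encoding=..., the output is decoded for you and a str comes back.
as_text = subprocess.check_output(command, encoding="ascii")
print(type(as_text))
print(as_text.strip())
```

With a str result, the re.search call from the question can stay exactly as it was written.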

The reason it works with Python 2 is that Python 2 converts between bytes and Unicode as needed. If your data is only ASCII, this tends to work, but robust code still needs explicit decode/encode calls. Python 3 made this explicit.
And of course, Python 2 is end-of-life.

barry-scott · July 26, 2023, 9:16pm

On Linux, ifconfig is deprecated. Depending on how networking is set up,
you would use the ip command, or nmcli if NetworkManager is in control.
You might also be using systemd-networkd and need to use its API.
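As a rough sketch of the ip route, the helper below (build_ip_commands is a hypothetical name, not part of any library) builds the iproute2 equivalents of the three ifconfig calls in change_mac. It only constructs and prints the argument lists; actually running them would need root and a system where nothing else, such as NetworkManager, is managing the interface.

```python
def build_ip_commands(interface, new_mac):
    """Return iproute2 argument lists mirroring the ifconfig calls in change_mac."""
    return [
        ["ip", "link", "set", "dev", interface, "down"],
        ["ip", "link", "set", "dev", interface, "address", new_mac],
        ["ip", "link", "set", "dev", interface, "up"],
    ]

commands = build_ip_commands("eth0", "00:11:22:22:22:99")
for command in commands:
    # In the real script each list would be passed to subprocess.call(command).
    print(" ".join(command))
```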