serial.read(serial.inWaiting()) 'cuts up' number in individual digits?

(heads up: not too experienced with serial communication or Python for that matter)

When reading incoming serial data, numbers above 9 are printed with each digit on a separate line. From the code I can only deduce that they are read that way too. It must be me, of course :) What am I doing wrong?

I have the following Python code, using pyserial (main.py):

import serial

print(serial.__version__)

ser = serial.Serial(
    port='/dev/cu.usbmodem141102',
    baudrate = 9600,
    timeout = 1)

while True:
    bytesWaiting = ser.inWaiting()
    if(bytesWaiting != 0):
        x = ser.read(bytesWaiting)
        if x:
            print(int(x))

The code that sends the data is from makecode.microbit.org. The JavaScript version of it is as follows:

let isSending = 0
let counter = 0

serial.redirect(
  SerialPin.USB_TX,
  SerialPin.USB_RX,
  BaudRate.BaudRate9600
)

function toSerial () {
    serial.writeNumber(counter)
    basic.showNumber(counter)
    counter = counter + 1
}
input.onButtonPressed(Button.A, function () {
    isSending = 1
})
input.onButtonPressed(Button.B, function () {
    isSending = 0
})
input.onButtonPressed(Button.AB, function () {
    toSerial()
})
basic.forever(function () {
    if (isSending == 1) {
        toSerial()
        basic.pause(500)
    }
})

The output on the console (macOS) is as follows (the last 6 digits are actually the numbers 10, 11 and 12):

% python3 main.py
3.5
0
1
2
3
4
5
6
7
8
9
1
0
1
1
1
2

Thanks in advance for any tips!

Kind regards,
Manno

My guess at what is happening here:

  1. The script sending the data is doing so as characters of ASCII text ("10"), rather than bytes of binary integer numbers (10).
  2. Your Python script loops fast enough that it receives one byte (which, in ASCII, == one character) at a time
  3. Each character is then cast to an integer and printed by itself
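To make the three steps above concrete, here is a small simulation that needs no serial port. The chunk contents are my assumption about what ser.read() hands back when the loop polls faster than the bytes arrive; they are not a capture from your device, and parse_each_chunk is just a name made up for this sketch.

```python
# Simulated chunks, as ser.read() might return them when the loop polls
# faster than bytes arrive (an assumption about timing, not a real capture).
chunks = [b"9", b"1", b"0", b"1", b"1"]


def parse_each_chunk(chunks):
    """Mimic the original loop: convert every chunk to int on its own."""
    return [int(chunk) for chunk in chunks]


# The sender meant 9, 10, 11, but each digit is converted separately.
print(parse_each_chunk(chunks))

# Had the two bytes of "10" arrived in one read, they would stay together.
print(parse_each_chunk([b"10"]))
```

Whether a number stays in one piece therefore depends purely on timing, which is why the receiver cannot rely on it.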

The simplest fix for your immediate problem is to pass end="" to print(), which suppresses the newline between successive print calls so that all the digits are printed continuously. If you take this approach, you should also drop the unnecessary int cast and decode the received bytes instead, so you print exactly the characters that come in over the wire (printing the bytes object directly would show b'1' rather than 1). Passing flush=True makes the output appear immediately even though no newline is written. That gives you

            print(x.decode("ascii"), end="", flush=True)

which would print

0123456789101112

Of course, if you want to actually treat these as separate numbers rather than individual characters in an unbroken stream (which is how the data is sent on the wire), you have a bit of a problem. Your Python code has no way to tell from the data you’re sending which characters are supposed to be grouped together. It looks from your sender script like individual numbers are sent every 500 ms (assuming I understand the basic.pause() call correctly), so you could try to rely on the time since the last character to infer whether two characters belong to the same number, but that’s hacky, complex and unreliable.

Instead, I suggest if at all possible you modify the format of the data you’re sending to disambiguate this, for which there are countless options (the first two are probably what you should start with, moving to some sort of dedicated protocol later if needed, but I include some others for completeness, to give some idea of the other possibilities out there):

  • Send a line break character (\n) between numbers to separate them.
  • Send the numbers as binary integers of a certain length, and use Python’s struct module to decode them.
  • Your sender and receiver could agree to some convention, sending strings of fixed length (allowing numbers up to a certain number of digits) and padding out the rest with zeros or null bytes
  • Send the number of digits first as a single ASCII digit (which limits you to numbers of up to 9 digits), then send the digits themselves
  • Use some higher level protocol, like Modbus
  • Encode your data into a binary format like protobuf, or a text format like JSON
  • Etc., etc.

Of course, most of these approaches aren’t that robust to errors or joining mid-transmission, but that’s a problem for another day.
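As a rough sketch of the first option (a line break after every number), the receiver could buffer whatever arrives and only convert complete lines. This assumes the sender is changed to emit "\n" or "\r\n" after each number; LineFramer and feed are names I made up for illustration, not part of pyserial.

```python
class LineFramer:
    """Accumulate raw chunks and return complete newline-terminated numbers."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add a chunk of bytes; return the numbers completed by this chunk."""
        self._buffer.extend(chunk)
        numbers = []
        while True:
            newline_at = self._buffer.find(b"\n")
            if newline_at == -1:
                break  # incomplete line: keep it for the next chunk
            line = bytes(self._buffer[:newline_at]).strip()  # also drops "\r"
            del self._buffer[:newline_at + 1]
            if not line:
                continue  # ignore empty lines
            try:
                numbers.append(int(line))
            except ValueError:
                pass  # skip lines that are not a valid integer
        return numbers


# Simulated chunks that split "10" across reads (made-up data).
framer = LineFramer()
for chunk in [b"9\r\n1", b"0\r", b"\n11\r\n"]:
    for number in framer.feed(chunk):
        print(number)

# With a real port, the loop body would be something like:
#     framer.feed(ser.read(ser.in_waiting or 1))
```

If the sender always terminates lines, pyserial's ser.readline() (with a timeout set) does much the same job; the explicit buffer just makes it visible why partial reads no longer matter.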

Best of luck!

Thanks for the elaborate reply!

I took to Python purely because I wanted a quick way of testing. In the end I’m experimenting a bit with communication between microprocessors and, eventually, Unity. This is just a quick test of sending data from the microbit to the computer. The next step would be to read sensible data in Unity.

So, yes, the data is to be a stream of numbers serving as input, not one long string.

Myeah, I’m indeed not sure as to how the microbit code sends the data. It does say

serial.writeNumber()

Looking at the JS version of it, that treats it as a number. Otherwise counter += 1 would add ones to an ever growing string instead of adding the 1 mathematically.

Your understanding of basic.pause() is correct; it is similar to time.sleep().

I haven’t tried all of your tips, but on the other format:
Microbit has this too:

serial.writeValue("x", 0)

Then the Python code above, stripped of the int cast, gives this:

b'x'
b':'
b'0'
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b' '
b'\r'
b'\n'
b'0'

I was just hoping to be able to send less data to get just one number over the wire. I should get into serial communication more too, I suppose… :)

Thanks again for the elaborate answer and tips!

Thanks for the additional details; they are certainly interesting.

This looks like a custom key-value format that presumably can be natively parsed by whatever libraries Microbit provides; there may already be a Python parser, or with a spec or some examples you could probably write a basic one.
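Going only by the output you posted (a name, a colon, the value, space padding, then "\r\n") and not by any official spec, a basic parser for one such line might look like the sketch below. The function name parse_value_line and the choice to return None for malformed lines are my own assumptions.

```python
def parse_value_line(line):
    """Parse one b'name:value' line into (name, number), or None if malformed."""
    # strip() removes the space padding as well as the trailing "\r\n".
    text = line.decode("ascii", errors="replace").strip()
    name, sep, value = text.partition(":")
    if not sep or not name:
        return None  # no colon, or nothing before it
    try:
        return name, int(value)
    except ValueError:
        pass
    try:
        return name, float(value)  # fall back for non-integer values
    except ValueError:
        return None


# Rebuilt from the bytes shown in the output above: "x:0", padding, CR LF.
sample = b"x:0" + b" " * 27 + b"\r\n"
print(parse_value_line(sample))
```

This expects a complete line, so it would need to sit behind something that reassembles lines first, such as ser.readline().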

Even by the anemic standards of UART, unless you are sending on the order of a thousand characters/numbers per second or more, it isn’t worth worrying about prematurely optimizing transmission efficiency, as opposed to code simplicity, robustness and following established practices.

However, just as something to think about: whether it is more bit-efficient to send numeric data as binary integers or as text depends on the nature of the data. Assuming you’re using UTF-8 (or ASCII, Latin-1 or another encoding that encodes the digits 0-9 in 7-8 bits), if your data were mostly single-digit integers but could very occasionally range into the tens of thousands, you’d use an average of around 2 bytes per number (including a terminator byte), while a 32- or 64-bit integer would take 4 or 8 bytes per number. By contrast, if you know your numbers range fairly evenly between 0 and 255 or between -128 and 127, you could represent each one in 1 byte with an 8-bit uint/int, but a character-based representation would take an average of 3-4 bytes. I discuss this a little bit more in the Python socket docs.
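To put rough numbers on that comparison, the sketch below counts the bytes needed for a few sample values as newline-terminated ASCII text versus fixed-width binary integers packed with the struct module. The sample values and the little-endian format codes are arbitrary choices for illustration, and text_size is a made-up helper.

```python
import struct


def text_size(number):
    """Bytes needed to send a number as ASCII digits plus a newline terminator."""
    return len(f"{number}\n".encode("ascii"))


# Fixed-width binary sizes: "<B" is an unsigned 8-bit int, "<i" a signed
# 32-bit int, "<q" a signed 64-bit int (all little-endian, no padding).
for fmt in ("<B", "<i", "<q"):
    print(fmt, struct.calcsize(fmt), "byte(s) per number")

# Text size grows with the number of digits instead.
for number in (7, 200, 40000):
    print(number, "as text:", text_size(number), "byte(s)")

# Round trip of one value as a single unsigned byte.
packed = struct.pack("<B", 200)
(unpacked,) = struct.unpack("<B", packed)
print(packed, unpacked)
```

The fixed-width form also removes the framing question, since the receiver knows exactly how many bytes make up one number, as long as both sides stay in sync.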