Hello,
I have a project in which I use a protocol for communication between client and server.
For instance, when a client logs in to the system, it sends its username and password to the server, separated by a comma.
When the server receives the data, it uses the split method to separate the username from the password and then handles them.
The problem is: what if the user’s username or password includes a comma? That would mess everything up.
What is the best way to handle this?
Thanks
Hello,
you can potentially try an if/else conditional statement pair to check which way the username / password was sent:
# Comment out only one at a time to test either method
login_credentials = 'myUsername,ghud^jd@'  # comma separated
# login_credentials = 'myUsername ghud^jd@'  # no comma - space separated

if ',' in login_credentials:
    print('\nFound comma separator.')
    split_credentials = login_credentials.split(',')
else:
    print('\nNo comma separator found.')
    split_credentials = login_credentials.split()

print(split_credentials)
This is usually handled either by 1) prefixing each value with its length or 2) using quoting/escaping to unambiguously represent any characters that have special meaning in your protocol.
An example of the former is the HTTP protocol, where the Content-Length header tells the other party how many bytes to read. In HTTP the length itself is represented as text and ends with a newline, but some other protocols use a fixed number of bytes (e.g. a 32-bit integer) for the length.
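To make the length-prefix idea concrete, here is a minimal sketch that assumes a 4-byte big-endian length in front of each UTF-8 encoded field. The helper names pack_fields and unpack_fields are made up for this example and are not part of any library.

```python
import struct

def pack_fields(*fields):
    """Encode each field as a 4-byte big-endian length followed by its UTF-8 bytes."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += struct.pack(">I", len(data))  # length prefix (assumed format)
        out += data
    return bytes(out)

def unpack_fields(payload):
    """Read back the fields written by pack_fields."""
    fields = []
    offset = 0
    while offset < len(payload):
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        chunk = payload[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("truncated field")
        fields.append(chunk.decode("utf-8"))
        offset += length
    return fields

payload = pack_fields("my,user", "correct,horse,battery,staple")
print(unpack_fields(payload))
```

Because the receiver never searches for a separator, the values can contain commas or any other character.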
As for the latter approach, there are lots of examples. For example, how do you represent a " character in a string in a Python program, where it’s used to mark the end of the string? You escape it: "\"". But now how do you represent \"? You escape the escape character: "\\\"".
How do you represent a comma in a CSV file where it’s used (as in your protocol) to separate values? You quote the field in double quotes: ",". But now how do you represent ","? You escape the double quotes by doubling them: """,""".
How do you represent a & character in a URL query string value, where it’s used to separate different values? You quote it using % as an escape character followed by the hexadecimal value of the original character: %26. But now how do you represent %26? Using the same escaping approach: %2526.
Just make sure every input value has an unambiguous representation, including the ones which include your special escape characters, and can always be transformed back to the original value on the receiving side.
Or do it the easy way and use an existing implementation of a standard format, such as the json module which already handles this for you.
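If you do want to see what a hand-rolled scheme looks like, here is a rough sketch that assumes a backslash as the escape character and a comma as the separator. The function names are invented for this example; a standard format is still less work.

```python
ESCAPE = "\\"
SEPARATOR = ","

def escape_field(value):
    # Escape the escape character first, then the separator.
    return value.replace(ESCAPE, ESCAPE + ESCAPE).replace(SEPARATOR, ESCAPE + SEPARATOR)

def join_fields(fields):
    return SEPARATOR.join(escape_field(f) for f in fields)

def split_fields(line):
    fields = []
    current = []
    chars = iter(line)
    for ch in chars:
        if ch == ESCAPE:
            # Whatever follows the escape character is taken literally.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling escape character")
            current.append(nxt)
        elif ch == SEPARATOR:
            # An unescaped separator ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

message = join_fields(["my,user", "pass\\word,1"])
print(message)
print(split_fields(message))
```

Note that the receiver cannot use a plain split(",") here; it has to walk the string so that escaped commas are not treated as separators.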
Can you be certain that the username does not include a comma? Often you can mandate more about usernames than you can about passwords. If so, you could do something like this:
USER = "admin" # May not contain a comma
PASSWORD = "correct,horse,battery,staple"
combined = USER + "," + PASSWORD
# then on the server
user, password = combined.split(",", 1)
print("login: " + user)
print("Password: " + password)
The second parameter to split() is the maximum number of splits. Anything after the last permitted split is kept, commas included, in the last element of the returned list.
Technically yes, but that was just an example.
I’m also using it for sending private messages and some more.
I can’t really limit the user from sending commas in private messages or having a comma as part of his username…
using quoting/escaping to unambiguously represent any characters that have special meaning in your protocol.
Do you have an example of how I can do this?
Thanks for the response
What do you mean by which way it was sent? Do you mean checking whether it’s from server to client or client to server?
Another option could be to base64-encode the username and the password separately, then put the comma between them.
On the other end, split on the comma and then decode both parts. (Base64-encoded data won’t have a comma in it.)
See the docs for the methods themselves: base64 — Base16, Base32, Base64, Base85 Data Encodings — Python 3.13.2 documentation
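A short sketch of that idea, assuming both values are UTF-8 text and the standard base64 alphabet is used. The two function names are just for illustration.

```python
import base64

def encode_credentials(username, password):
    # The standard base64 alphabet is A-Z, a-z, 0-9, '+', '/' and '=',
    # so the encoded parts can never contain a comma.
    parts = [
        base64.b64encode(value.encode("utf-8")).decode("ascii")
        for value in (username, password)
    ]
    return ",".join(parts)

def decode_credentials(message):
    user_b64, password_b64 = message.split(",")
    return (
        base64.b64decode(user_b64).decode("utf-8"),
        base64.b64decode(password_b64).decode("utf-8"),
    )

wire = encode_credentials("my,user", "correct,horse,battery,staple")
print(wire)
print(decode_credentials(wire))
```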
Then, don’t use a comma! If you’re designing the protocol from scratch, pick something that absolutely cannot possibly be part of the user name - "\0" is probably safe - and use that as the separator. Or define everything using length-prefixed strings (sometimes called Hollerith strings in honour of a very very old Fortran design), which is a lot more hassle for a human to work with, but easy enough for a computer. Or use delimited strings in some form.
The best strategy is probably to solve this in a completely generic way, so that you can always send whatever information you need; don’t special-case users/passwords. One extremely effective method would be to JSON-encode all your messages. This is an easy way to use delimited strings (since every string has quote marks at the ends), and it handles internal delimiters (" \" \\ " is a string containing both a quote and a backslash). Pretty much every programming language and environment has an easy (and usually very high performance) way to encode and decode JSON, and it’s well known and understood, so it won’t be a surprise to people.
One possible such protocol would be to have single-line JSON messages (no whitespace), terminated with a newline. When you send a message, you JSON encode it, add a newline, and send. To receive messages, you read one line of text [1] and decode it as JSON.
When I do JSON-based protocols, I like to have them always be objects with a "cmd" (command) attribute. For example: {"cmd": "login", "user": "admin", "password": "correct,battery,horse,staple"} might do what you’re looking for. This makes it easy to debug later on, and if you end up making a large app with a lot of possible messages, you can dispatch based on the command, keeping your code clean and readable.
[1] Buffered reading is fairly easy in most systems, but ask if you’re struggling with this part.
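As a rough sketch of such a newline-terminated JSON protocol with dispatch on "cmd": the handler functions, the "pm" command and its field names are invented here for illustration, and the socket code is left out.

```python
import json

def encode_message(message):
    # json.dumps escapes any newline inside a string as \n,
    # so the encoded message always fits on a single line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")

def decode_messages(buffer):
    """Split complete lines off the buffer; return (messages, leftover bytes)."""
    *lines, rest = buffer.split(b"\n")
    return [json.loads(line) for line in lines if line], rest

def handle_login(msg):  # hypothetical handler
    return "login attempt for " + msg["user"]

def handle_pm(msg):  # hypothetical handler
    return "message to " + msg["to"] + ": " + msg["text"]

HANDLERS = {"login": handle_login, "pm": handle_pm}

def dispatch(msg):
    handler = HANDLERS.get(msg.get("cmd"))
    if handler is None:
        return "unknown command"
    return handler(msg)

data = encode_message({"cmd": "login", "user": "admin", "password": "correct,battery,horse,staple"})
data += encode_message({"cmd": "pm", "to": "bob", "text": "hi, how are you?\nbye"})
messages, leftover = decode_messages(data)
for msg in messages:
    print(dispatch(msg))
```

The leftover bytes are whatever arrived after the last newline; keep them and prepend them to the next chunk read from the socket.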
What I meant is the two potential ways that the username / password combination can be entered by the user:
You can alternatively stipulate the required length for the password assuming that it is entered after the username. Let’s say the requirement is 10 characters. Now, you can combine this with the initial method that I provided above in my first response where you get the username via the conditional statement. Then, via slicing, you can read the last 10 characters in the string like this to obtain the password:
# login_credentials = 'username,Jngjd&hd%2'
login_credentials = 'username Jngjd&hd%2'
username = login_credentials[0:8]
password = login_credentials[-10:]
print(username)
print('The password is:', password)
Output:
username
The password is: Jngjd&hd%2
Note that you can also make a requirement for the length of the username and read the first, in this case, 8 characters.
As you can see, you do have different options to go about reading the username and password.
Please don’t. Ever. Specifying an exact length requirement (rather than a minimum) is an awful thing to do to people.
(A precise length requirement might be for a hashed password, but in that case, you won’t need to worry about character set since you can just transmit it in hex.)
Expanding on the three examples I gave, here’s how the csv module does it:
>>> username = r'a,"b",\\,c'
>>> password = r'd,"e",\\,f'
>>> import csv, io
>>> buf = io.StringIO()
>>> writer = csv.writer(buf)
>>> writer.writerow([username, password])
>>> encoded = buf.getvalue()
>>> print(encoded)
"a,""b"",\\,c","d,""e"",\\,f"
And the urllib.parse module:
>>> username = r'a,"b",\\,c'
>>> password = r'd,"e",\\,f'
>>> import urllib.parse
>>> encoded = urllib.parse.quote(username) + "," + urllib.parse.quote(password)
>>> print(encoded)
a%2C%22b%22%2C%5C%5C%2Cc,d%2C%22e%22%2C%5C%5C%2Cf
And the json module:
>>> username = r'a,"b",\\,c'
>>> password = r'd,"e",\\,f'
>>> import json
>>> encoded = json.dumps({"username": username, "password": password})
>>> print(encoded)
{"username": "a,\"b\",\\\\,c", "password": "d,\"e\",\\\\,f"}
In each of these cases you can use the corresponding decoding functions to get back the original value, no matter what characters it contains.
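For completeness, here is a sketch of the decoding side for the same three encodings, using the same sample values as above. The variable names are only for this example.

```python
import csv
import io
import json
import urllib.parse

username = r'a,"b",\\,c'
password = r'd,"e",\\,f'

# csv: write one row, then read it back with csv.reader
buf = io.StringIO()
csv.writer(buf).writerow([username, password])
csv_decoded = next(csv.reader(io.StringIO(buf.getvalue())))

# urllib.parse: quote() turns ',' into %2C, so splitting on ',' is safe
url_encoded = urllib.parse.quote(username) + "," + urllib.parse.quote(password)
url_decoded = [urllib.parse.unquote(part) for part in url_encoded.split(",")]

# json: loads() reverses dumps()
json_encoded = json.dumps({"username": username, "password": password})
json_decoded = json.loads(json_encoded)

print(csv_decoded == [username, password])
print(url_decoded == [username, password])
print(json_decoded == {"username": username, "password": password})
```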