[SOLVED] - [unwanted \r in database] End of string overwrites beginning

GiMS_Pang · March 26, 2023, 12:56pm

Sorry for the long question. I’ve done very little programming on any language.
I tried using Python 3.11.2 as well as some versions of 3.10 and 3.9 with the same error.

When I try to create a text variable that contains a query to be sent to an SQL database, I get the beginning of the query messed up with the end.

Basically, I get:

q=f'select thetype,thetxt,thebin from {tbl} where theid="{oneitem}";'

That should contain select thetype,thetxt,thebin from thetable where theid="item1";
but ends up containing ";lect thetype,thetxt,thebin from thetable where theid="item1

The following code worked perfectly:

from array import *

tbl="thetable"

def toprint(t):
	thelist=t.split("\n")
	for oneitem in thelist:
		q=f'select thetype,thetxt,thebin from {tbl} where theid="{oneitem}";'
		print("todolist query:\t\t"+q)
	print("\n")
	return 0

# MAIN STARTS HERE

q=f'select thetype,thetxt,thebin from {tbl} where theid="thefirst";'
print("Main query:\t\t"+q+"\n")
thetxt="""abc
ABC
123
xyz
XYZ"""
thestatus=toprint(thetxt)

The output is:

Main query:             select thetype,thetxt,thebin from thetable where theid="thefirst";

todolist query:         select thetype,thetxt,thebin from thetable where theid="abc";
todolist query:         select thetype,thetxt,thebin from thetable where theid="ABC";
todolist query:         select thetype,thetxt,thebin from thetable where theid="123";
todolist query:         select thetype,thetxt,thebin from thetable where theid="xyz";
todolist query:         select thetype,thetxt,thebin from thetable where theid="XYZ";

But that was only a very simplified code.
Another bit of code that is less simplified and shows the error is the following:

import os
from os import path
import shutil
import sys
import random
from math import *
import locale
from datetime import date
import mariadb
from array import *

cur=""	# Database select result cursor
tbl="thetable"
firstone="thefirst"

def dbconnect():
	result=[0,"",""]
	conn=-1
	try:
		conn=mariadb.connect(user="theuser",password="thepasswd",host="127.0.0.1",port=3306,database="thedb")
	except mariadb.Error as e:
		result[0]=-1
		result[1]=e
	result[2]=conn
	return result

def dbcursor(conn):
	cur = conn.cursor()
	result=[0,"",cur]
	return result

def askquery(q):
	result=[0,"",""]
	try:
		cur.execute(q)
	except mariadb.Error as e:
		result[0]=-1
		result[1]=e
	result[2]=cur
	return result

def dbdisconnect(conn):
	conn.close()

def chkres(result):
	if result[0]==-1:
		print(result[1])
		quit()

# type functions start below

def todolist(lstxt):
	result=[0,"",""]
	if lstxt=="":
		return result
	thelist=lstxt.split("\n")
	for oneitem in thelist:
		q=f'select thetype,thetxt,thebin from {tbl} where theid="{oneitem}";'
		if q[0]==';':
			q=f'select thetype,thetxt,thebin from {tbl} where theid="{oneitem}";'
		print("todolist query:\t\t"+q)
		result=askquery(q)
		chkres(result)
		cur=result[2]
		datatuple=cur.fetchone()
		thetype=datatuple[0]
		thetxt=datatuple[1]
		thebin=datatuple[2]
		result=picktype(thetype,thetxt,thebin)
		chkres(result)
		print("\n")
	return result

def picktype(thetype,thetxt,thebin):
	if thetype=='direct':
			result[2]=thetxt
			print(thetext)
	elif thetype=='todolist':
		result=todolist(thetxt)
		chkres(result)
	else:
		result=[-1,'Unknown type '+thetype+'"']
		chkres(result)
	return result

# MAIN LOOP STARTS HERE
result=dbconnect()
chkres(result)
conn=result[2]
result=dbcursor(conn)
chkres(result)
cur=result[2]
q=f'select thetype,thetxt,thebin from {tbl} where theid="{firstone}";'
print("Main query:\t\t"+q)
result=askquery(q)
chkres(result)
cur=result[2]
startuple=cur.fetchone()
thetype=startuple[0]
thetxt=startuple[1]
thebin=startuple[2]
result=picktype(thetype,thetxt,thebin)
chkres(result)
dbdisconnect(conn)
quit()

The output is:

Main query:             select thetype,thetxt,thebin from thetable where theid="thefirst";
";dolist query:         select thetype,thetxt,thebin from thetable where theid="item1
Traceback (most recent call last):
  File "/home/theuser/./test.py", line 102, in <module>
    result=picktype(thetype,thetxt,thebin)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/theuser/./test.py", line 79, in picktype
    result=todolist(thetxt)
           ^^^^^^^^^^^^^^^^
  File "/home/theuser/./test.py", line 66, in todolist
    thetype=datatuple[0]
            ~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable

Note that the first output works and the second doesn’t, while on the shorter example both worked.
I am trying to figure out if there’s something related to how I delimit strings, if there’s some variable running out of memory, some python configuration issues or anything else, but I can’t think of any solution.
Do you know what I am doing wrong, please?
Or can you point me in the right direction, please?
Thank you in advance.

GiMS_Pang · March 26, 2023, 1:12pm

If it can help, I got the SQL code to create the table I’m using. If anyone wants to test with actual data or look at data, here it is:

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";

START TRANSACTION;

SET time_zone = "+00:00";

CREATE TABLE `thetable` (
  `theid` varchar(255) NOT NULL,
  `thetype` varchar(255) NOT NULL,
  `thetxt` longtext DEFAULT NULL,
  `thebin` longblob DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

INSERT INTO `thetable` (`theid`, `thetype`, `thetxt`, `thebin`) VALUES
('item1', 'direct', 'thevalue1', NULL),
('item2', 'todolist', 'item4\r\nitem5', NULL),
('item3', 'direct', 'thevalue3', NULL),
('item5', 'direct', 'thevalue5', NULL),
('item6', 'direct', 'thevalue6', NULL),
('thefirst', 'todolist', 'item1\r\nitem2\r\nitem3', NULL);

ALTER TABLE `thetable`
  ADD PRIMARY KEY (`theid`);
COMMIT;

GiMS_Pang · March 26, 2023, 1:14pm

Looking at the database, I can see it’s using \r\n to terminate rows, while my system uses \n only. Could this cause any issues?

GiMS_Pang · March 26, 2023, 1:19pm

SOLVED (I don’t know how to mark it as such).
The \n should be replaced by \r\n.
There are a few other errors, but they were due to quick typing to adapt a very complex source and make it as simple as possible, so now I can go back to debugging the rest of the code, should any more errors arise.
Thanks for your time, anyway. Cheers.

MRAB · March 26, 2023, 2:29pm

Building query strings like you’re doing, where you’re inserting values using string formatting, is a bad idea. It’s called “SQL injection”.

For example, using f'select thetype,thetxt,thebin from {tbl} where theid="{oneitem}";' and assuming tbl is 'thetable':

If oneitem is 'my_id', you’ll get 'select thetype,thetxt,thebin from thetable where theid="my_id";. OK so far.

If oneitem is '" or true or "', you’ll get 'select thetype,thetxt,thebin from thetable where theid="" or true or "";'. It’ll return all of the rows.

The solution is to use “SQL parametrisation” to insert the values safely.

GiMS_Pang · March 26, 2023, 2:52pm

Thank you Matthew,
I’ll look at all the security issues as soon as I need to.
Currently, that program is only for my own use on an isolated server (not connected to the net) and I fill in the database, too. Everything is done by this one-man-band (me), so I didn’t bother.
In the future, though, I may want to develop the project further and make it suitable for different type of users and more of them at the same time.
I’ll be happy to rewrite the whole code, if/when the times comes, as there are other aspects that also need upgrading to make it enterprise grade.
You’ve been very kind, as you’ve only shown a code that would return wrong results, rather than any more malicious code. Thanks for that.

MRAB · March 26, 2023, 5:05pm

It’s a good idea to start with good habits re programming.

A string-formatted query looks like this:

q = f'select thetype,thetxt,thebin from {tbl} where theid="{oneitem}";'
cur.execute(q)

whereas a parametrised query looks like this:

q = f'select thetype,thetxt,thebin from {tbl} where theid=?;'
cur.execute(q, (oneitem,))

The ? in the query is a placeholder for the value. You don’t need to worry about whether it’s a number or a string because that’ll be handled automatically.

It works only for values, not tables; the tables are normally known beforehand and unchanging.

GiMS_Pang · March 26, 2023, 5:25pm

I like that way of setting the queries. Thanks.

kknechtel · March 27, 2023, 2:41am

The string is not getting corrupted in the way you describe; instead, it looks strange because the interpolated value contains a carriage return character.

I can reproduce this much more simply in my environment:

>>> print('test\rtex')
text

What happens is that the carriage return character (represented by the \r escape sequence in the string literal) is “printed” by going back to the beginning of the current line, instead of actually displaying any text symbol. So the terminal prints t, e, s, t, then goes back to the beginning of the line, prints t (overwriting the existing t), e (overwriting the existing e), x (overwriting and replacing the s).

In your code, the result is still a “valid” SQL query, which is why you don’t get a SQL error; but the table doesn’t contain entries where the theid value ends with a carriage return, so the query doesn’t return any rows, and subsequent Python code that expects a result will fail.

However, as others noted, SQL queries should not be constructed this way. This is the source of one of the most important security vulnerabilities in existence (SQL injection), which costs real companies billions of dollars a year. Instead, use a parameterized query, as the others described.

Regardless of that, however, you will probably want to investigate where the bad data comes from. In particular, it seems that you produce oneitem values via thelist=t.split("\n"). This makes sense if you have, say, newline-separated values from a text file that was properly read in text mode (or from a string literal in your source code). However, other ways of getting the data might result in lines that end with \r\n (a carriage return followed by a newline) - because on Windows, text files are expected to actually contain those carriage return characters. (Similarly, old Macs (pre MacOS X) may use only a carriage return to separate lines.)

As a quick fix, simply .strip() the strings to remove leading and trailing whitespace (assuming that you don’t want that in your query). However, it’s worth investigating the process that creates the actual list of values for the query, to make sure it does the right thing.

Relevant Stack Overflow Q&A:

GiMS_Pang · March 27, 2023, 8:23am

Yeah, I got that. I wrote a comment about \r\n entering data instead of \n (it is due to phpmyadmin configuration or something similar).
The question is now solved, but I don’t know how to mark it that way.
Thanks.

GiMS_Pang · March 27, 2023, 8:26am

PS: I don’t use commercial operating systems, so I’ll stick with \n only.