Techniques for Thick Client Access to a Database

First post - please at least try and be gentle… :wink: Before I get to the specifics of my question, I suspect that I might need to prefix this with the acknowledgement that what I’d like to do (as expressed here) might be a bit unusual. I’m treating this as a learning exercise, not just a search for a solution to a specific problem.

It might also be relevant to mention my experience and skill level, which is moderate. I’ve got a reasonable amount of experience with PHP in writing web apps, but am still learning the basics of Python. One of the reasons I’ve turned to Python is because the PHP-GTK project seems essentially dead in the water, while Py-GTK looks to be going from strength to strength.

OK, so here’s the question…

If I’m writing in a web paradigm and I want my web app to authenticate users to my platform, my use of the web model makes this easy for me. Code on the web server runs under a local account - and I give that local account access to a back-end SQL server such as MariaDB. The server account can have broad rights to the database - and I can have a user table in that database that contains salted hashes of user-specific passwords. I do not need to register all my application users on the database.

However, if I want to replace a thin web client with a thick client, written in Py-GTK, then I need to think about how to authenticate my users to my database. As an added spin, one of the things that would be nice to achieve would be a model in which the back end database can be used with both thick and thin clients without modification.

I want my thick client users to be able to access the data, with the same IDs and Passwords that they would use via the web app. But in order to do that, I’m going to need a way of unlocking the database for the thick client.

The simplest, dirtiest and least secure way to do this would be to have some form of .ini file on each client, with an “application level” password embedded in that file in some way. Problem is, that isn’t secure, because my local user could simply scan the file for the password and from that get full rights to the application’s data.

So far, I’ve come up with one alternate method, but I’m concerned that it might be clumsy, fragile, or not scale well. This would be to use Python’s socket library and write a network daemon to be co-located on the database server. The authentication paradigm would then be as follows:-

  1. Daemon starts on database server (or an intermediate app server, if performance/scaling/separating for security makes sense).
  2. User launches thick client written in PyGTK.
  3. Thick client presents user with login screen.
  4. User submits ID and Password to thick client.
  5. Thick client submits credentials to server via a sockets call.
  6. Server side Python code, running under a local ID, authenticates to the database, retrieves user record and validates password.
  7. If the credentials match, the server side code creates a session token and passes this back to the client as the socket reply.
  8. Server side code also caches the token - i.e. in memory, with a defined TTL (Time to Live).
  9. User continues to interact with application, now submitting each server-side request prefixed with the provided token.
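To make the flow concrete, here is a minimal sketch of steps 6-8, with PBKDF2 standing in for whatever salted-hash scheme the web app already uses (all names, the iteration count, and the TTL are illustrative, not a definitive implementation):

```python
import hashlib
import hmac
import secrets
import time

SESSION_TTL = 15 * 60   # assumed session lifetime, in seconds
sessions = {}           # token -> (user_id, expiry), held in daemon memory

def verify_password(stored_hash, salt, password):
    # Salted-hash check, as in step 6; PBKDF2 stands in for whatever
    # scheme the web app already stores in its user table.
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt.encode(), 100_000
    ).hex()
    return hmac.compare_digest(candidate, stored_hash)

def login(user_id, password, user_row):
    # user_row = (stored_hash, salt) would come from the daemon's own
    # database query, run under its local server account (step 6).
    stored_hash, salt = user_row
    if not verify_password(stored_hash, salt, password):
        return None
    token = secrets.token_hex(16)                            # step 7
    sessions[token] = (user_id, time.time() + SESSION_TTL)   # step 8
    return token

def check_token(token):
    # Step 9: every subsequent request is validated against the cache.
    entry = sessions.get(token)
    if entry is None or entry[1] < time.time():
        sessions.pop(token, None)   # expired or unknown: purge and reject
        return None
    return entry[0]
```

The client never needs database credentials at all; it only ever holds the short-lived opaque token.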

Now, there are a bunch of other problems to solve here. I’d need to gracefully handle an expiring token if a user tries an end-run around session time limits. I’d need to protect against spoofing, likely by having some form of certificate-protected exchange - either by using TLS directly on the socket, or by coding that into my server-side wrapper. I can strengthen this by doing things like hashing the client IP address into my token, for example.
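For the "TLS directly on the socket" option, the standard-library ssl module can wrap the daemon's listening socket; a sketch, with placeholder certificate paths and port (deployment-specific assumptions, not tested values):

```python
import socket
import ssl

def make_tls_listener(certfile, keyfile, host="0.0.0.0", port=8765):
    # Wraps the daemon's listening socket in TLS so credentials and tokens
    # are encrypted in transit and the client can verify the server's
    # certificate (which addresses the spoofing concern).
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile, keyfile)   # paths are placeholders
    listener = socket.create_server((host, port))
    # Connections accept()ed from this wrapper behave like plain sockets,
    # so existing recv()/sendall() code does not need to change.
    return context.wrap_socket(listener, server_side=True)
```

Because the wrapped socket keeps the plain-socket API, TLS can be bolted on late without reworking the protocol logic.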

Perhaps the most interesting challenge will be crafting the protocol between client and server logic. I’ve done some really primitive prototyping and learned I can use something like

client_request = conn.recv(1024)

(where “client_request” is the variable into which I want to deliver the client command and 1024 is the maximum length of data I am expecting the server to receive) to handle client-to-server traffic, then

conn.sendall(server_response)

(where server_response is the returned data) to get a reply back to the client. The $64,000,000 question, however, is: will this form a viable foundation, or am I building on something weak and ill-advised? Is there a better way?

Lots of other questions start popping up… Things like:-

  1. Is there a practical upper limit on the transfer size?

  2. Does it make more sense to break up result sets and send them as discrete rows, with a set terminator?

  3. If I break up the result sets, how do I handle them efficiently in the server-side code?

I don’t expect to answer all these questions here and now. I’m also very conscious that people will read this and think, “Hmm… Bad idea. There’s a better way”. I’m very much interested in “better ways”, but please bear in mind I’m still at “early days” on this journey.

The reason I’m asking here and now is that I’ve reached a point in my Python education where I’m starting to experiment with small chunks of code that will contribute to this larger project - and I desperately don’t want to learn bad habits or take a wrong turn that will turn around and bite me later.

I’m happy to learn and grateful for advice. I know that the problem I’m trying to solve is narrow and specific and your answer might be, “Hmm, I wouldn’t set out to do that at all…” So if you can suggest a way of handling the authentication for a thick client in a way that’s reasonably secure, difficult to defeat on the client side… and not too difficult for a relative newbie to understand, then I’d be grateful for advice.

Thank you!

Hello Sproggit,

Your post made an interesting read, but I’m afraid I don’t think I will
be much help here, and I fear that unless you get lucky, nobody else
here will either. (I’ll be happy to be proven wrong!)

Your questions don’t seem to be Python-specific; in fact, the language
that you choose to write your application in will surely be irrelevant
to the design.

Unfortunately the community here on Discuss is, I think, relatively
small. It might help to pose your question on (say) the Python-List
mailing list, where you may find some people with the required
experience. Or even StackOverflow.

One thing that confused me, and may confuse others, is your use of the
terms “thin client” and “thick client”. To me, a thin client is a
dedicated terminal without an OS, where all the code runs on the backend
server. Think 1970s style green-screen applications running on dedicated
terminals. Any modern PC or even smart phone app would be a “thick
client”, or a browser running on such a machine. You seem to be using
the terms differently, it might help to explain what they mean to you,
in case it makes a difference.

You also talk about the web server running as a local account. I
presume you mean a local account on the server, rather than a local
account on the users’ machine.

Ultimately, I don’t think that the distinction between a “thin” and
“thick” client is very useful here. You have one sort of client,
sometimes it runs on a trusted web server, sometimes it runs on an
untrusted desktop application, but in either case, the client still has
to be authenticated to prevent arbitrary users from telling the database
to do things they shouldn’t be able to do.

But what do I know?

Firstly, many thanks for taking the time to reply; that’s much appreciated. I’m grateful for your advice to search elsewhere and will certainly look further afield.

You point out some significant weaknesses in my first post, so let’s see if I can clear that up, to help ensure that I don’t confuse readers further. Firstly, to clarify what I mean by thin client vs. thick client: where I learned the terms, “thin” client means one in which nothing apart from presentation logic sits on the client. There are two common scenarios that I can think of to use as examples. The most obvious of these would be web browsers (with a caveat - we do see some instances where developers use JavaScript to do things like input validation), but better examples might be telnet, green-screen applications and X-Window based applications. By contrast, “thick” clients are those where business logic executes on the client workstation, often with the server just providing storage logic via an ODBC or similar connection.

Your interpretation of my use of “local account” was correct - I meant “local to the server” but should have made that explicitly clear.

Obviously the most practical way for me to push my knowledge envelope is to “suck it and see” and experiment. As I stated in my original post, I’m a little nervous of this, because I might find myself in a scenario where things look like they’re working, only to hit a wall a couple of months down the road. But I think even that would be a worthy learning experience.

I’m continuing to think about how this might work and to try and figure out how to keep this as simple as possible, which is of course the first and most significant challenge. For example, if I were writing this in PHP, I would be using Objects to abstract all of my main data structures.

Back in Python, I could write a simple piece of SQL along the lines of,

"SELECT userAccountStatus, userPassword, userPasswordSalt, userLastAuthenticationDate FROM userTable WHERE userID='" + varUserName + "';"

and populate varUserName from the Python code running on my client. If I submitted this using my proposed sockets solution, I would end up with either a single returned row, or a null result set with no rows [if the user does not exist]. Handling this using the technique I propose would be pretty straightforward.

((And for anyone reading this and wondering at the selection of returned elements - if the Account Status = “L” for locked, even the right password should not grant access; if the userLastAuthenticationDate > 60 days, then we might want to check and make sure this user is still entitled to use this application; and userPassword is the salted hash of the user’s password (userPasswordSalt is the salt!) - you must always use a salt and you must never write SQL that is vulnerable to injection… instead, always get your SQL to return something you need to process with logic - which implicitly means you can sanity-check your returns in code.))

But where this gets complicated - and where I’m nervous - is what happens if I need to manage SQL result sets of multiple rows. Do I cache or queue these on the server somehow, in my server-side stub code [i.e. the code running the sockets listener]? Yes, I could do that; it would allow me to manage result sets as object collections. But on the other hand, I’m going to start to get concerned about content persistence. How does the socket code know when to “forget” stuff - i.e. purge its data caches? How does it manage memory? What if our user finishes; they close the application on their workstation without “logging out” of the back end? Does that data just remain on the server side until the daemon is recycled? With PHP, I can sort of handle all this with server-side objects. I’m following my user through their activities in the code, so I can destroy objects when they’re no longer needed; I can use language features to help.
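One common answer to the "when does the daemon forget?" question is to stamp each cached result set with a last-used time and sweep periodically; a sketch, with all names and the TTL invented here for illustration:

```python
import time

RESULT_TTL = 300   # assumed: seconds a cached result set survives untouched
result_cache = {}  # session token -> (rows, last_used)

def cache_rows(token, rows):
    result_cache[token] = (rows, time.time())

def fetch_rows(token):
    entry = result_cache.get(token)
    if entry is None:
        return None
    rows, _ = entry
    result_cache[token] = (rows, time.time())  # touching resets the clock
    return rows

def sweep(now=None):
    # Called periodically by the daemon; drops result sets belonging to
    # clients that vanished without "logging out".
    now = time.time() if now is None else now
    stale = [t for t, (_, used) in result_cache.items()
             if now - used > RESULT_TTL]
    for t in stale:
        del result_cache[t]
```

This trades a little staleness for bounded memory: an abandoned session costs at most one TTL interval of cache space.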

Well, OK, if the complexity comes from trying to manage the split of data between client and server, let’s do away with the problem and simply push everything to the workstation all the time. Networks are fast, right? I could do this, too. And this is the scenario where I think I am at risk of heading up a dead-end if I’m not careful. The lingering doubt in my head is to think about what happens if I adopt this model and then have an application that needs to pull a large amount of data.

For example, for my first “real” project, once I have this understanding nailed down, I’d like to port something I’ve written before in PHP, which is an application that presents content from NIST’s SP800 publications (NIST = National Institute of Standards and Technology; SP = Special Publications; 800 Series = Technology Security). SP800-53 covers Security Controls and the content of the NIST PDF is laid out in a way that is perfect for being converted into a relational database that could be navigated on a click-to-follow-link basis… But the “Systems and Information Integrity” control family in that Standard contains 23 discrete entries, each of which has numerous sub-sections. So if you wanted to scroll down a list of those controls looking for the one you needed, my application would need to display that list in a window on your screen, likely with descriptors or meta-data. So I’d run a query that said, in essence, "Get me a list of records where record type is ‘SI’ ". But now my code needs to handle 23 returned rows - and do so gracefully (i.e. without breaking).

SQL provides some help here - it includes the “LIMIT” clause, which would allow me to restrict the number of returned rows to something my code can handle safely. But I just need to think about how to efficiently handle the relationship between the “stub code” (i.e. the sockets listener, running on the database server) and the business logic running in a PyGTK application on a workstation.

What this example boils down to is: do I transfer all 23 rows of data to the client “up front” and let the client sort it out? Or do I cache the result set in the server code and return a portion of the set and a pointer to the client? The client can then navigate through the set by adjusting the pointer and sending that back to the server-side code with a request for more rows…

My get tells me - and hopefully by now you can tell I’m just a dangerous amateur at this - that the cleanest approach is to force all my interactions to be self-contained (“atomic”) in nature, so that each Request-Response exchange closes as “finished”. There’s a small trade-off here in that I’m likely going to need to have my client cache the user ID and Password and re-submit every time… which would force the server-side stub code to re-authenticate every time. But… if I write that logic in a callable function with a clearly-defined interface to my application logic, then I can swap that out in future if I have to.
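The atomic request/response idea could be sketched as a fully self-contained page request, assuming a hypothetical JSON wire format and named-query convention (both invented here for illustration; the server would map the query name to SQL with LIMIT/OFFSET):

```python
import json

PAGE_SIZE = 25  # assumed per-request row cap

def build_page_request(user, password, control_family, offset):
    # Every request carries everything the server needs: credentials plus
    # a window into the result set, so the server keeps no state at all
    # between calls.
    return json.dumps({
        "user": user,
        "password": password,           # re-submitted each time, as above
        "query": "controls_by_family",  # named query; server maps it to SQL
        "family": control_family,
        "limit": PAGE_SIZE,
        "offset": offset,
    }).encode()
```

The client pages through the 23 "SI" rows simply by re-sending the same request with a larger offset; nothing needs to be cached or purged server-side.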

Not the only problem I need to solve, of course, but it’s a good illustration of the sort of challenge I expect and why I’m approaching this carefully!

Anyway, many thanks for your guidance… I shall keep looking!

“My get tells me” = “My gut tells me”.

If I write code as badly as forum posts, I’m doomed. Doomed, I tell you!

… and open yourself up to SQL injection. Only ever use parametrized queries in any serious application. Or, when using Python, turn to the brilliant SQLAlchemy.
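A minimal sketch of the difference, using the built-in sqlite3 driver purely to keep the demo self-contained (MariaDB drivers use %s placeholders rather than ?, and the table here is invented):

```python
import sqlite3

# Same lookup as the string-built SQL earlier in the thread, but with a
# placeholder: the driver escapes the value, so user input can never
# change the shape of the statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userID TEXT, userPassword TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hash123')")

user_input = "alice' OR '1'='1"   # a classic injection attempt
row = conn.execute(
    "SELECT userPassword FROM users WHERE userID = ?",  # ? = placeholder
    (user_input,),
).fetchone()
print(row)   # None: the attack string matched no userID
```

With string concatenation, the same input would have rewritten the WHERE clause and matched every row; with the placeholder it is just an odd-looking userID that matches nothing.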

In Python, you can do that too. Django, Flask (to name just the two most popular) are web frameworks that enable you to write web applications and do that in a safe manner.

No. Networks are by far the slowest part of your application. There is not a large penalty if your application can fit its data in a single packet, but still, the slowest part is on the network.

To go back to your original question: A fat client solution is nice to limit the load of your server, but is inherently unsafe as the client software runs on a computer that is not under your control. There is a lot of controversy about what is best, I am on the side of “Run as much of your code as possible on the server, under your control”. The browser is a fine client, no need to have anything else.

Menno, thanks for responding.

On the SQL example, you make a fair point. I should have added a comment that the string would be assembled after sanitisation of the input and not before. Good to clarify that.

“Networks are fast, right?” Clearly self-deprecating sarcasm doesn’t translate. Since it wasn’t obvious that I was being sarcastic, thanks again for making the correction - just in case someone with less experience reads this and assumes I was being serious.

For your final conclusion, however, I think that you might have missed the point of the question. I’d like to learn how to write Python code that interacts with a remote database. I believe that your statement that client-side code is “inherently unsafe” might be a little bit of an exaggeration, because there are of course ways to harden client-side platforms in just the same way that we can use good database design (stored procedures, if we really must), WAFs and traditional packet-filtering firewalls to protect servers. I would absolutely agree that “running interpreted code on a client without adequate safeguards” is likely to be less safe than running the same logic on a server, but that is orthogonal here.

More important is that, as I am specifically trying to acquire good practices related to the subjects of “socket programming with Python” and “accessing a database using sockets”, an answer of “The browser is a fine client, no need to have anything else” might be a factually valid statement, but it is not an answer to the question I have asked.

By all means point out risks in my approach or gaps in my logic - as you have done elsewhere - but please don’t be so condescending and dismissive as to assume that I could not possibly have other, valid reasons for wanting to use this approach.

Thank you.

I am sorry if you perceived my comment as condescending and dismissive, because that was definitely not my intention.

The first bit is actually a good illustration of why I wrote:

The second bit is not exactly what I meant, but on re-reading it I can see why you thought so. The point is, your server software does not know which software is running on the client if that is not under your control. It may be your software, a bastardized version of it, or something completely different. You (rightfully) paid a lot of attention to the security of the connection, I assumed therefore you were (also) seeking advice on this part of the design.

Now if this is only to gain experience in socket programming of course my remarks are very much moot.