Request for comment: Making PEP for NOSQL databases

Hello everybody,

This is my first time writing in this section, and I hope I’m not out of place.

I would like to propose a new PEP describing the API that a NOSQL database library should have, as outlined below.

These types of databases are spreading very quickly, and I believe common interfaces between libraries are necessary. In fact, all four types of NOSQL databases share similarities in their most common operations (CRUD operations) and in database-level operations.
This approach has already been defined for SQL databases in PEP 249. Many Python libraries follow that PEP and share the same interfaces for objects, methods and functions.

Because these libraries use the functions, methods and properties named in PEP 249, they are much easier to develop and to use consistently.

I would like to create something similar for NOSQL databases, even though there are many differences between them. They can be grouped into four categories:

  • Column database, such as Apache Cassandra
  • Key/Value database, such as Redis
  • Document database, such as MongoDB
  • Graph database, such as Neo4j

These four types of databases differ in how data is queried and inserted, but they also share some common traits.

On this premise, I have created a library of interfaces that represents these common characteristics. I have also written a series of tests that simulate the behavior of existing libraries by wrapping their methods in these interfaces.

nosqlapi is an interface/ORM/utility library for writing Python libraries for NOSQL databases, so that those libraries reflect the interfaces and, therefore, the API.

The documentation covers in detail what I briefly explain below.

Abstract

This PEP introduces an API that defines the interfaces and the names of the classes, methods, properties and functions that a NOSQL Python library should have.

The API covers all four types of NOSQL databases. The PEP will also provide extended APIs for the features unique to each database type.

The goal of the API is simplicity and ease of use.

Motivation

Today’s NOSQL libraries are inconsistent in naming. For example, objects that handle database connections are variously called Database, DatabaseConn or DBClient, yet they all produce the same result: an object that lets you work directly with the database and its data.

These objects take the same arguments under different names: some use host for the server name, others hostname, and still others server. The same goes for the other arguments.

Furthermore, there is no clear distinction between the database layer and the data layer, as is the case for SQL databases.

For consistency, and for ease of development and use, we therefore need an API that unifies all of this.
Furthermore, instantiating one object that works at the database level and another that works at the data level gives a very clear separation of duties.

To this end, the API will provide a Connection object, which handles database-level operations, and a Session object, which handles the data.

Rationale

Separating tasks into two different objects has the advantage of isolating programming and execution errors.
Another advantage is that a user connecting to a database may not have permission to work directly on the data. With this separation, some information can be restricted to authorized users.

The Connection object deals directly with databases. It never touches the data inside the database you are working on.

The Session object, on the other hand, works directly with the data that a user can request, insert, modify or delete (CRUD operations). This object will also offer additional methods, such as creating, modifying and deleting users, and executing Selector and Batch objects.
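A minimal sketch of the Session contract, assuming plain `abc.ABC` abstract methods and the CRUD names used in this thread (get, insert, update, delete, find). The signatures are placeholders chosen for illustration; this is not nosqlapi’s actual code.

```python
from abc import ABC, abstractmethod
from typing import Any


class Session(ABC):
    """Hypothetical data-level interface: CRUD operations plus find()."""

    @abstractmethod
    def get(self, *args: Any, **kwargs: Any) -> Any:
        """Read data (a document, a key, a row or a node)."""

    @abstractmethod
    def insert(self, *args: Any, **kwargs: Any) -> Any:
        """Create new data."""

    @abstractmethod
    def update(self, *args: Any, **kwargs: Any) -> Any:
        """Modify existing data."""

    @abstractmethod
    def delete(self, *args: Any, **kwargs: Any) -> Any:
        """Remove data."""

    @abstractmethod
    def find(self, query: Any) -> Any:
        """Run a raw query string or a Selector object."""


class ReadOnlySession(Session):
    """Implements only get(), so it stays abstract and cannot be instantiated."""

    def get(self, *args: Any, **kwargs: Any) -> Any:
        return None
```

Calling `ReadOnlySession()` raises `TypeError`, which is one way an abstract base class could flag a driver that implements only part of the contract.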

The response to each operation can be wrapped in a Response object, which can contain the result of the operation as well as additional information.

In other languages, such as Java, a library for NoSQL databases has already been implemented: JNoSQL.

Specification

The connection to the database will be done through a Connection object. Once this object is instantiated, you will not have an immediate connection to the data, but only to the outermost layer of the server hosting the database, dealing exclusively with the various databases.

To work on the data (CRUD operations), you need to create a new object, similar to the Cursor object of relational database APIs: the Session object.

Calling the connect() method of the Connection object returns a Session object.
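A sketch of the Connection/Session split, assuming connect() takes a database name. That argument, the create_database() and databases() methods, and the InMemoryConnection backend are all hypothetical; they only show that data access starts from connect().

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Optional


class Session:
    """Data-level object; in this sketch it only records its database."""

    def __init__(self, connection: Connection, database: str) -> None:
        self.connection = connection
        self.database = database


class Connection(ABC):
    """Database-level object: manages databases, never their data."""

    def __init__(self, host: str = "localhost", port: Optional[int] = None) -> None:
        self.host = host
        self.port = port

    @abstractmethod
    def create_database(self, name: str) -> None: ...

    @abstractmethod
    def databases(self) -> list[str]: ...

    def connect(self, database: str) -> Session:
        # Only connect() hands out an object that can work on data.
        if database not in self.databases():
            raise LookupError(f"unknown database: {database!r}")
        return Session(self, database)


class InMemoryConnection(Connection):
    """Hypothetical backend that keeps databases in a dict."""

    def __init__(self, host: str = "localhost", port: Optional[int] = None) -> None:
        super().__init__(host, port)
        self._dbs: dict[str, dict] = {}

    def create_database(self, name: str) -> None:
        self._dbs.setdefault(name, {})

    def databases(self) -> list[str]:
        return sorted(self._dbs)
```

With this layout, a user holding only a Connection can list or create databases, but must go through connect() to obtain anything that touches data.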

Each operation performed by a Connection object or a Session object should return an object of type Response.

This object can be instantiated with extra information in addition to the returned data, such as the call header, or an exit code and the related exception object.
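One possible shape for Response, sketched as a dataclass. The field names (data, code, header, error) and the ok/raise_if_error helpers are assumptions for illustration, not names fixed by the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Response:
    """Hypothetical Response: returned data plus optional call metadata."""

    data: Any
    code: int = 0                                         # exit code, 0 = success
    header: dict[str, Any] = field(default_factory=dict)  # e.g. call/request header
    error: Optional[BaseException] = None                 # exception raised, if any

    @property
    def ok(self) -> bool:
        return self.error is None and self.code == 0

    def raise_if_error(self) -> None:
        # Let callers choose between inspecting the Response and raising.
        if self.error is not None:
            raise self.error
```

Keeping the exception on the object rather than raising immediately lets callers decide when to fail, which is the trade-off this design suggests.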

Data read operations (SELECTs in relational databases) can be implemented through a dedicated Selector object with a build() method, which builds the query string in the database’s dialect. The object can be passed directly to the find() method of the Session object.
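A sketch of a Selector whose build() renders a CQL-like string, plus a find() that accepts either a raw string or a Selector. The field names (selector, partition, condition, limit), the CqlSelector subclass and DemoSession are hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Selector(ABC):
    """Hypothetical read-query builder; subclasses render one dialect."""

    selector: list[str] = field(default_factory=lambda: ["*"])  # fields to read
    partition: str = ""                                        # table, collection...
    condition: list[str] = field(default_factory=list)
    limit: Optional[int] = None

    @abstractmethod
    def build(self) -> str: ...


class CqlSelector(Selector):
    """Renders a CQL-like (Cassandra-style) query, for illustration only."""

    def build(self) -> str:
        query = f"SELECT {', '.join(self.selector)} FROM {self.partition}"
        if self.condition:
            query += " WHERE " + " AND ".join(self.condition)
        if self.limit is not None:
            query += f" LIMIT {self.limit}"
        return query + ";"


class DemoSession:
    def find(self, query: str | Selector) -> str:
        # Accept either a raw query string or a Selector.
        if isinstance(query, Selector):
            query = query.build()
        return query  # a real driver would send this to the server
```

For example, `CqlSelector(selector=["name", "age"], partition="users", condition=["age > 30"], limit=10).build()` yields `SELECT name, age FROM users WHERE age > 30 LIMIT 10;`.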

In addition, some operations exist only in a single vendor’s database. These operations can be implemented as extensions of the core API classes, or through a Batch object that passes a series of instructions together with an instance of the Session class.
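A sketch of a Batch object, assuming the session exposes an execute(statement) method (a hypothetical name; the proposal does not fix it). A real driver might send the statements as one vendor-specific request, such as a CQL `BEGIN BATCH ... APPLY BATCH` block; this version simply runs them in order.

```python
from __future__ import annotations

from typing import Protocol


class SupportsExecute(Protocol):
    """Hypothetical minimal session interface assumed by this sketch."""

    def execute(self, statement: str) -> object: ...


class Batch:
    """Hypothetical Batch: queue statements, then run them through a session."""

    def __init__(self, session: SupportsExecute, batch: list[str] | None = None) -> None:
        self.session = session
        self.batch: list[str] = list(batch or [])

    def add(self, statement: str) -> Batch:
        self.batch.append(statement)
        return self

    def execute(self) -> list[object]:
        # Vendor-specific batching would go here; the sketch runs statements in order.
        return [self.session.execute(stmt) for stmt in self.batch]


class RecordingSession:
    """Stand-in session that records what it was asked to run."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def execute(self, statement: str) -> object:
        self.log.append(statement)
        return len(self.log)
```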

Backwards Compatibility

Existing libraries do not have the structure and naming required to comply with this API. The nosqlapi library provides a decorator that maps the names of existing methods onto API-compliant names.
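For illustration, here is how such a name-mapping decorator could work. This is a standalone sketch, assuming each keyword maps an existing attribute name to its API name; it is not nosqlapi’s implementation, and LegacyClient/ApiClient are made-up classes.

```python
from typing import Callable, TypeVar

T = TypeVar("T", bound=type)


def api(**names: str) -> Callable[[T], T]:
    """Hypothetical decorator: expose existing attributes under API names.

    api(database_names="databases") makes cls.databases an alias of
    cls.database_names; the original names keep working.
    """
    def decorate(cls: T) -> T:
        for existing, api_name in names.items():
            setattr(cls, api_name, getattr(cls, existing))
        return cls
    return decorate


class LegacyClient:
    """Made-up driver class with non-compliant method names."""

    def database_names(self) -> list:
        return ["admin", "test"]

    def drop_database(self, name: str) -> str:
        return f"dropped {name}"


@api(database_names="databases", drop_database="delete_database")
class ApiClient(LegacyClient):
    pass
```

Because the aliases are set on the subclass, the original driver class is left untouched.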

Potential Problems and their Solutions

This section outlines some pitfalls that can arise from using the API.

Reference Implementation

Nosqlapi documentation site: nosqlapi docs

Nosqlapi GitHub: nosqlapi repo

Nosqlapi production usage: example library

Copyright/license

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

This might be better off in the Ideas section, as I only see one prior post of a much earlier proposal there that didn’t attract any comments, and this section is mostly used for discussion of proposals that have made it to the PEP stage, or at least very close to it. I think regular users can move their own posts to different sections, but I can do it if you aren’t able to.

@CAM-Gerlach OK. The previous post in the Ideas section was incomplete; that’s why I wrote here. But if you feel it needs to be moved, I’ll do it immediately.
Thanks for the feedback!

What do APIs for these DBs in other programming languages look like? Do any other languages offer any form of uniform API structure for NoSQL databases? If those exist, learning from them and mirroring their concepts and terminology may be useful, along with covering why or why not, or that they “don’t appear to exist for languages [a,b,c,d,e,f,g] at all”, in this proposal.

Another place for ideas: IIRC there are some open source multi-cloud tenancy platforms out there to abstract the major cloud providers offerings to make running services redundantly on different providers easier. Do any of those abstract relevant nosql database APIs as part of that?

PEP 249 is “interesting” when looked at today. It is very old, and it was used to describe recommended API practices outside of CPython itself. It merely has Informational status. I don’t know if we’d take this on as a PEP today. I’m not against the concept, but it also isn’t describing a Python language feature itself. This is something that’d feel better delegated to NoSQL DB API domain experts to decide on a Draft Proposal → Informational recommendation status change for it… but who should that even be? Python db-sig seems almost devoid of traffic these days. The only interesting “recent” thread over on db-sig was a year ago, with a few people acknowledging that the 21-year-old PEP 249 is quite stale.

It’d probably help if a few of the major NoSQL database API owners were onboard with creating a common API framework for this. I suspect that audience would care about presenting the same API to multiple languages, not just Python.

For a similar example, WSGI (PEP 3333 from 2010) was/is fine as a PEP, but its spiritual successor ASGI seems to be doing fine as an external specification (see this comment).

Hi Greg, please don’t make the mistake of interpreting stability as being “stale”. The DB-API PEPs have served the Python ecosystem significantly and still do, by aligning database APIs across many different systems.

For NoSQL databases, the same could be done, and I also believe that ASGI should be imported as an informational PEP, since it’s important for the Python ecosystem as well.

These informational PEPs are maintained by SIGs or similar groups outside the Python core system, but nonetheless serve as standards for the whole community.

As for PEP 249 or WSGI not describing a Python language feature: the stdlib sqlite3 module uses the DB API and the wsgiref module implements WSGI. So these are relevant to Python core to some extent as well.

+1 on having a NoSQL informational PEP for the above reasons. It will do good in the community to have a base standard for NoSQL APIs.

I doubt that this proposal is worthwhile.

SQL has a standard and many implementations. HTTP has a standard and many implementations.
So making a standard API for them had a lot of benefit.

On the other hand, there is no NoSQL standard. Each NoSQL database has very different semantics.
So I am not sure a standardized API would bring much benefit.

Anyway, if you want to create a good standard, you need to get attention from NoSQL Python library authors and NoSQL and Python experts.

I believe some Python experts are here, but I am not sure enough NoSQL library authors and experts use this website.

Hi @methane,

You are right, there is no NoSQL standard for all databases, but there is one for each database type.

Document databases (e.g. MongoDB and CouchDB) share the same characteristics, even if they read and write data differently under the hood (the former uses its own query language and the latter uses an HTTP API).

For example, both databases can have .get(doc), .insert(doc), .update(doc) and .delete(doc) methods.

Indeed, all four types of NoSQL databases have a data query method (get, find), a data insertion method (insert, insert_many), an update method (update, update_many) and one for deleting data (delete).
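To show how small this shared surface is, here is a toy in-memory document session using exactly those method names. The storage, the `_id` handling and the return values are assumptions made for the sketch, not part of any existing driver.

```python
from __future__ import annotations

import uuid
from typing import Any


class InMemoryDocumentSession:
    """Toy document store exposing the shared CRUD names (hypothetical)."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}

    def insert(self, doc: dict[str, Any]) -> str:
        # Use the caller's _id if present, otherwise generate one.
        doc_id = str(doc.get("_id") or uuid.uuid4())
        self._docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def insert_many(self, docs: list[dict[str, Any]]) -> list[str]:
        return [self.insert(d) for d in docs]

    def get(self, doc_id: str) -> dict[str, Any] | None:
        return self._docs.get(doc_id)

    def update(self, doc_id: str, changes: dict[str, Any]) -> bool:
        if doc_id not in self._docs:
            return False
        self._docs[doc_id].update(changes)
        return True

    def update_many(self, ids: list[str], changes: dict[str, Any]) -> int:
        # Count how many documents were actually updated.
        return sum(self.update(i, changes) for i in ids)

    def delete(self, doc_id: str) -> bool:
        return self._docs.pop(doc_id, None) is not None
```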

SQL has the same operations. The only thing that unites all SQL databases is the SQL language for data queries.
But Python doesn’t care about the dialect used to query/insert/delete data in the database: you always do it with the .execute(sql) or .executemany(sql, params) (or possibly .callproc(procedure)) method.

Whoever uses an SQL library cares little about how it is built under the hood; we only care that it has execute(), executemany() and callproc() methods.
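For comparison, this is the PEP 249 surface with the stdlib sqlite3 module: the cursor’s execute() and executemany() look the same in any compliant driver, while the SQL dialect, the connect() arguments and the paramstyle vary. Note that callproc() is optional in PEP 249, and sqlite3 does not provide it.

```python
import sqlite3

# Any PEP 249 module exposes the same cursor API; only the SQL dialect
# and the connect() arguments differ between drivers.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO users VALUES (?, ?)",  # sqlite3 uses the 'qmark' paramstyle
    [("alice", 31), ("bob", 27)],
)
conn.commit()
cur.execute("SELECT name FROM users WHERE age > ?", (30,))
rows = cur.fetchall()
print(rows)  # [('alice',)]
conn.close()
```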

Hi @gpshead,

The first approach was in the Java world. There is a Java NoSQL database library developed by the Eclipse Foundation, called JNoSQL.

This library takes a different approach: it defines entity objects through class annotations (Java’s counterpart to decorators), each representing an object in the NoSQL world (a database, a table [for column databases], a key or a node).
This works great in Java because it mirrors the object-oriented paradigm.

For Python, which is not limited to object-oriented programming, I wanted to think differently, implementing interfaces similar to those defined 21 years ago for SQL (still used today in the SQL world, even by other languages) for the reasons mentioned above.

Yes, you are perfectly right. But as happened with Python’s SQL DB-API, that might be a good reason to start with Python.

I know this thread may not be the place to discuss it, but the same APIs implemented in Rust and Go have the same benefits demonstrated in the test modules of my library.
Since I have been developing in Python for a long time, I have implemented them in this language. Could this be a starting point?

Anyway, thank you very much for all the questions and criticisms, because they are constructive and make you think a lot.

Would you ask the authors of the MongoDB and CouchDB client libraries for a review?

I think that’s the best way to create the standard.
I think that is how the DB-API was built.

The DB API was built first by taking an existing standard (ODBC) and using its concepts to build a common interface logic for Python database modules.

After this initial round, we had several additional rounds of refinements and extensions, always taking a look at how various database modules implemented additions to the API and standardizing them in a collaborative way (on the DB-API SIG mailing list).

Some of the additions were turned into required standard APIs, others into optional extensions.

The main theme behind the DB-API is that it defines a concept, but leaves enough freedom for database authors to adapt it to their particular database, e.g. by not defining a fixed set of connection parameters in the standard. They are also free to add new APIs in order to expose database-specific features, but then, of course, they leave the standard and make it harder for people to switch database backends.

Overall, this strategy has worked really well and resulted in a large number of DB-API compatible database modules. The success of the API standard has made abstractions such as SQLAlchemy or the Django ORM possible.

I’ve tried in the past via IRC and Slack, posting reasons and specs, but was ignored.
Indeed, NoSQL database vendors have no reason to think about an interface standard that would also serve a potential competitor.

I’m new to contributing to NoSQL databases, mostly MongoDB. Could we try implementing such a thing as a proof of concept and see some real-world results, whether good or bad?

Apologies, I didn’t mean to imply it is no longer important. At least, I inadvertently communicated my own unawareness.

I don’t expect this to always be true. It’s more that some people may not feel rewarded for such work, hence some of the silence. But portability between systems is also its own user-attracting reward: real-world users rightfully seek to avoid lock-in and want to be able to change underlying stacks with as little fuss as possible. That is partly why SQL exists at all.

So it can be a bit of an “if you build it they will come” scenario until others find it attractive. I expect most would cooperate in the end, or wind up with widely used third party shim modules in front of their API.

Stdlib-wise, we have key/value data stores (for better or worse), and sqlite3 can obviously be used as one, so a basic NoSQL API implementation fronting those would make sense if this becomes a thing.
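As a rough illustration of that idea, here is a key/value Session fronting the stdlib dbm module. The class name and method set are hypothetical; dbm stores keys and values as bytes, so get() decodes the value back to str.

```python
from __future__ import annotations

import dbm
import os
import tempfile


class DbmKeyValueSession:
    """Hypothetical key/value Session backed by the stdlib dbm module."""

    def __init__(self, path: str) -> None:
        self._db = dbm.open(path, "c")  # create the file if it does not exist

    def insert(self, key: str, value: str) -> None:
        self._db[key] = value  # dbm encodes str to bytes

    def get(self, key: str) -> str | None:
        raw = self._db.get(key)
        return None if raw is None else raw.decode()

    def update(self, key: str, value: str) -> None:
        if key not in self._db:
            raise KeyError(key)
        self._db[key] = value

    def delete(self, key: str) -> None:
        del self._db[key]

    def close(self) -> None:
        self._db.close()


# Usage: close the store before the temporary directory is removed.
with tempfile.TemporaryDirectory() as tmp:
    store = DbmKeyValueSession(os.path.join(tmp, "store"))
    store.insert("user:1", "alice")
    store.update("user:1", "bob")
    value = store.get("user:1")
    store.delete("user:1")
    missing = store.get("user:1")
    store.close()
```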

Hi @TobiasHT,
you can try it with the MongoDB driver without changing a line of its code.
Use the api decorator from nosqlapi to map existing methods onto API-compliant names:

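from nosqlapi.common.utils import api
import pymongo

# Keys are existing MongoClient method names, values are the API-compliant names.
# pymongo.Connection was removed in PyMongo 3.0; MongoClient replaces it.
@api(list_database_names='databases', drop_database='delete_database')
class ApiConnection(pymongo.MongoClient): ...

# MongoClient(host, port): a third positional argument would be document_class, not a database name.
connection = ApiConnection('localhost', 27017)

print(hasattr(connection, 'databases'))     # True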

The same approach can be applied to any NoSQL database library.

Also, I wrote the tests for document databases by mocking MongoDB’s behavior: nosqlapi/test_docdb.py at main · MatteoGuadrini/nosqlapi · GitHub

I’m sure it’s true. In fact, the sooner you learn to use a common interface, the sooner developers and even end users will be able to benefit from the API.

I didn’t quite understand how to use a SQL database like sqlite (relational) as the basis of a NoSQL API for a database like document databases or graph databases (which have no relationship between existing objects).

Thank you for sharing the history.

I am the maintainer of PyMySQL and mysqlclient (a fork of MySQL-python), but I only maintain them because they were abandoned. So I don’t know much about how they were born.

Then, what about O-D mapper libraries?

If we cannot get support from either drivers or O-D mappers, I don’t think this standard API is worthwhile.

To build a consistent ODM (Object Data Mapping, or O-D) library, you also need a standard; that’s the reason for the proposal.
In my library, I have also written a primitive ODM that includes objects common to all four database types and objects specific to each. For example, this is the ODM part for column databases.

I think you are approaching this from the wrong angle. Standards such as the DB-API standard for relational database interfaces get adopted because they simplify learning and interacting with new database backends.

Adoption typically comes from having API-compliant interfaces to a couple of important backends. These don’t need to be supported by the main driver vendors initially. It’s good enough to have such compliant drivers maintained separately.

Because it makes their lives easier, application developers will then start using those standard compliant interfaces and over time, more and more backend drivers adopt the standard or at least provide a version of the standard API in addition to a more low level native one.

That’s how the DB API grew to become the standard it is today.

ORMs and other meta interfaces on top of the DB API came later. They are not the prerequisite for adoption.

Anyway, in order to get started, I’d suggest getting a few heavy users of NoSQL backends together and brainstorming. Once you feel that you have a working version, create standard-compliant wrappers for a few existing backends and try to get more users and vendors on board. Around that time, it’s time for an informational PEP.

Maybe we are talking about the same thing.

I mentioned “authors of O-D mappers” because they are “heavy users of NoSQL backends”, and the spec might make their lives a lot easier.

What I am against is discussing this only among Python experts and releasing the spec as a PEP. That seems like an abuse of the PEP brand.
I think we need some NoSQL experts.