I read something about “Data Oriented Programming” (DOP), which boils down to using JSON-like objects to represent data and keeping data and code separate.
I think this was popularized by Clojure. Indeed, it’s very lispy:
- Usually, one wants syntactically rich code. The compiler will then convert it to an AST (Abstract Syntax Tree).
  Lisp’s idea: Why don’t we program directly with the AST?
- In OOP, we want to hide implementation details regarding both code and data. When needed, we can serialize the objects by converting their data into a canonical format.
  Clojure’s idea: Why don’t we just use serialized data from the start?
I’m half-joking, of course.
Consider this:
{
"books": [
{
"title": "...",
"author": "...",
"publisher": "..."
},
{
"title": "...",
"author": "...",
"publisher": "..."
},
...
]
}
The main advantage is that “plain data” is fully manipulable using standard functions.
The main disadvantage is the loss of encapsulation, validation, and static types.
What I know is that every time I used “plain data” in my programs, I ended up refactoring it by introducing (data)classes and my code improved. I’ve never felt the need to go back to using “plain data”. I like auto-completion and ahead-of-time type checking too much.
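To make the trade-off concrete, here is a minimal sketch of both styles side by side. The sample values, the Book class, and the helper first_author_plain are all made up for illustration; only the field names come from the JSON above.

```python
from dataclasses import dataclass

# Plain-data version. The values are hypothetical sample data.
library = {
    "books": [
        {"title": "B", "author": "Ann", "publisher": "P1"},
        {"title": "A", "author": "Bob", "publisher": "P2"},
    ]
}

# Generic tools work out of the box on dicts and lists...
titles = sorted(book["title"] for book in library["books"])


# ...but a misspelled key is only detected when the line actually runs.
def first_author_plain(data):
    return data["books"][0]["autor"]  # typo: raises KeyError when called


# Dataclass version: fields are declared once, so editors and type
# checkers can flag a misspelled attribute before the program runs.
@dataclass(frozen=True)
class Book:
    title: str
    author: str
    publisher: str

    def __post_init__(self):
        # Validation lives next to the data it protects.
        if not self.title:
            raise ValueError("title must not be empty")


books = [Book(**raw) for raw in library["books"]]
authors = [book.author for book in books]
```

The conversion step (Book(**raw)) is the price of the second style: the data has to pass through a constructor before the rest of the program sees it.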
Even though I’ll probably never adopt DOP in my programs, this got me thinking about how we access and organize data.
Yesterday I had to sort some data with respect to custom keys:
taus.sort(key=key_func)
The introduction of a key arg was a good idea, but I also wanted to check the keys beforehand, so I had to do something like this:
keys = [get_key(tau) for tau in taus]
# inspect and alter the keys (don't ask)
...
tau_to_key = dict(zip(taus, keys))  # requires hashable, distinct taus
taus.sort(key=tau_to_key.__getitem__)
The introduction of tau_to_key and the use of __getitem__ are just noise. Wouldn’t the following be better?
taus.sort(keys=keys)
(Even better (especially if we had several args):
taus.sort(*, keys)
That "*," would indicate that only kwargs follow and that each x without an equal sign stands for x=x, where x is any valid identifier.)
One simple solution is to implement our own sort with an arg keys: Callable[[T], K] | list[K] | dict[T, K], but the general problem remains.
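A minimal sketch of such a function might look like the following. The name sort_by is hypothetical, and I have widened list and dict to Sequence and Mapping as an assumption. The list case sorts indices rather than building a dict, so it also works when elements are unhashable or repeated.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping, Sequence
from typing import TypeVar

T = TypeVar("T")
K = TypeVar("K")


def sort_by(
    items: list[T],
    keys: Callable[[T], K] | Mapping[T, K] | Sequence[K],
) -> None:
    """Sort items in place, accepting the keys in any of three shapes."""
    if callable(keys):
        # A key function, exactly like list.sort(key=...).
        items.sort(key=keys)
    elif isinstance(keys, Mapping):
        # An element -> key lookup table.
        items.sort(key=keys.__getitem__)
    else:
        # A parallel sequence: keys[i] belongs to items[i].
        if len(keys) != len(items):
            raise ValueError("keys and items must have the same length")
        # Sort the positions by their key, then reorder the items.
        # sorted() is stable, so ties keep their original order.
        order = sorted(range(len(items)), key=keys.__getitem__)
        items[:] = [items[i] for i in order]
```

With this, the earlier workaround shrinks to sort_by(taus, keys), but every other function that takes a key would need the same three branches.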
All we want is a key for each element. The exact way the pairing is expressed should be abstracted away:
- the caller shouldn’t have to put the data in a specific format;
- the writer of the function shouldn’t have to add explicit support for all the formats.
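One partial approximation in today's Python, offered as a sketch rather than a solution, is to move the format handling into a single adapter built on functools.singledispatch. The names paired, sorted_by, and max_by are hypothetical. Function writers consume (item, key) pairs and never inspect the format; new formats are registered once, outside the functions that use them.

```python
from collections.abc import Mapping, Sequence
from functools import singledispatch


@singledispatch
def paired(pairing, items):
    """Return (item, key) pairs. Fallback case: pairing is a callable."""
    if not callable(pairing):
        raise TypeError(f"unsupported pairing: {type(pairing).__name__}")
    return [(item, pairing(item)) for item in items]


@paired.register(Mapping)
def _(pairing, items):
    # Element -> key lookup table.
    return [(item, pairing[item]) for item in items]


@paired.register(Sequence)
def _(pairing, items):
    # Parallel sequence: pairing[i] belongs to items[i].
    if len(pairing) != len(items):
        raise ValueError("pairing and items must have the same length")
    return list(zip(items, pairing))


# Two consumers that know nothing about the supported formats.
def sorted_by(items, pairing):
    ordered = sorted(paired(pairing, items), key=lambda pair: pair[1])
    return [item for item, _ in ordered]


def max_by(items, pairing):
    return max(paired(pairing, items), key=lambda pair: pair[1])[0]
```

This still falls short of what I'm asking for: the adapter has to exist, somebody has to call it, and it only covers the "one key per element" shape of pairing.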
As far as I know, no one has ever tried to generalize the way data is accessed and manipulated. I feel like we’re still in the “asm era” regarding this aspect of programming.
Thoughts?