Struggling to maintain readability and testability for code with side effects

I am currently trying to write a small library that makes calls to a certain JSON RPC API. As is often the case when working with an API, the HTTP requests end up scattered everywhere and are difficult to isolate. Having an HTTP call in the middle of a function makes testing its output much more laborious, and I would like to avoid that.

Another point is typing. Specifically, maybe inspired by Go's interfaces, I want the purpose of the data to be unambiguous and to have a clear association with its related methods.

Here is some pseudo code:

input_data = {...}

# io operations 1: generate an id
id = uuid()
# statements 1: generate data to upload
payload = {"id": id, ...}
# io operations 2: make an API request
response = requests.post(..., json=payload)
# statements 2: produce a local representation
output_resources = {...}

Regarding that code, I have a few ideas on how to structure it, and I would be interested in your thoughts.

1. An impure function


input_data_ = {...}

def do_something(input_data):
    # io operations 1: generate an id
    id = uuid()
    # statements 1: generate data to upload
    payload = {"id": id, ...}
    # io operations 2: make an API request
    response = requests.post(..., json=payload)
    # statements 2: produce a local representation
    output_resources = {...}
    return output_resources


output_resources = do_something(input_data=input_data_)

This seems to be the easiest and most readable option. There is not much overhead. To associate the data with the function, I could just use Python type hints.

The main issue is that the code is cumbersome to test due to the IO at the beginning and at the end: I will have to mock the requests library. I will also have to duplicate a lot of code if I ever want to use something other than requests, e.g. httpx or an async library.
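For instance, even a minimal test of this version already needs patching (a rough sketch; the module name mylib and the asserted shapes are made up):

from unittest import mock

from mylib import do_something

def test_do_something():
    fake_response = mock.Mock()
    fake_response.json.return_value = {"status": "created"}
    # the test is now coupled to the requests library and to the exact
    # call site inside do_something
    with mock.patch("mylib.requests.post", return_value=fake_response):
        output_resources = do_something(input_data={"name": "example"})
    assert output_resources == {"status": "created"}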

2. Split the code with side effects from the core logic


# statements 1: generate data to upload
def process_input(input_data, id):
    ...
    payload = {"id": id, ...}
    request_args = {"method": "POST", "json": payload, ...}
    return request_args


# statements 2: produce a local representation
def process_response(response):
    output_resources = {...}
    return output_resources


def do_something(input_data):
    # io operations 1: generate an id
    id = uuid()
    request_args = process_input(input_data, id)
    # io operations 2: make an API request
    response = requests.request(**request_args)
    # statements 2: produce a local representation
    output_resources = process_response(response)
    return output_resources


output_resources = do_something(input_data=input_data_)

After this refactoring we have two pure functions that contain the core logic and are trivial to test. do_something() itself will be covered by integration tests, which I am happy with.
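For instance (a sketch, assuming the request_args shape from above and that process_response accepts any mapping-like response):

def test_process_input():
    # pure function: fixed inputs in, fixed dict out, nothing to mock
    request_args = process_input({"name": "example"}, id="fixed-id")
    assert request_args["method"] == "POST"
    assert request_args["json"]["id"] == "fixed-id"

def test_process_response():
    # a plain stand-in response is enough, no HTTP involved
    output_resources = process_response({"id": "fixed-id", "status": "created"})
    assert output_resources["id"] == "fixed-id"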

That being said, I am not fond of the function being split into three. It definitely breaks the reading flow, because the smaller functions don't really make much sense on their own. For smaller operations this means that a big proportion of the code is just passing arguments from one function to another.

3. Split the code with side effects from the core logic with Generators


def _do_something(input_data):
    # io operations 1: receive the generated id
    id = yield
    # statements 1: generate data to upload
    payload = {"id": id, ...}
    request_args = {"method": "POST", "json": payload, ...}
    response = yield request_args
    # statements 2: produce a local representation
    output_resources = {...}
    yield output_resources


def do_something(input_data):
    generator = _do_something(input_data)
    next(generator)  # run to the first yield
    # io operations 1: generate an id
    id = uuid()
    request_args = generator.send(id)  # send() resumes and returns the next yielded value
    # io operations 2: make an API request
    response = requests.request(**request_args)
    output_resources = generator.send(response)
    return output_resources


output_resources = do_something(input_data=input_data_)

Here I like the fact that _do_something preserves the original flow of the code, just like the 1st example, while all the IO is isolated in do_something.

What I don’t like about it is that it is somewhat difficult to understand what is happening. do_something also looks ugly.
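To be fair, the generator does buy one thing: the core logic can be unit-tested by driving it by hand, with no mocking at all (a sketch against the shapes above):

def test_do_something_generator():
    gen = _do_something({"name": "example"})
    next(gen)                               # run to the first yield
    request_args = gen.send("fixed-id")     # inject the id, get request_args back
    assert request_args["method"] == "POST"
    output_resources = gen.send({"status": "created"})  # inject a fake response
    assert output_resources is not None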

4. Passing functions with side effects as arguments


input_data_ = {...}

def _do_something(uuid_func, request_func, input_data):
    # io operations 1: generate an id
    id = uuid_func()
    # statements 1: generate data to upload
    payload = {"id": id, ...}
    # io operations 2: make an API request
    response = request_func(..., json=payload)
    # statements 2: produce a local representation
    output_resources = {...}
    return output_resources


do_something = partial(_do_something, uuid.uuid4, requests.post)
output_resources = do_something(input_data=input_data_)

Here we preserve the original flow and there is not too much boilerplate. It is also fairly easy to test.

It would have been perfect if not for static type checking completely giving up on uuid_func and request_func. This feels like asking for trouble.
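Although, I could at least pin the injected functions down with Callable annotations (a sketch; a typing.Protocol would handle fancier signatures, and a plain wrapper keeps more type information than functools.partial does):

import uuid
from typing import Callable

import requests

def _do_something(
    uuid_func: Callable[[], uuid.UUID],
    request_func: Callable[..., requests.Response],
    input_data: dict,
) -> dict:
    ...

# a plain wrapper instead of partial(): the checker sees the full signature
def do_something(input_data: dict) -> dict:
    return _do_something(uuid.uuid4, requests.post, input_data)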

5. Split the code with side effects from the core logic with Data Classes or Attrs


input_data_ = {...}


@define(frozen=True)
class Request:
    method: str
    url: str
    payload: dict

    def __call__(self):
        # io operations 2: make an API request
        return requests.request(self.method, self.url, json=self.payload)


@define(frozen=True)
class Resource:
    id: str
    data: dict
    ...

    @classmethod
    def setup(cls, input_data):
        # io operations 1: generate an id
        return cls(uuid(), input_data)

    def _prepare_request(self):
        # statements 1: generate data to upload
        payload = {"id": self.id, ...}
        return Request(method="POST", url=..., payload=payload)

    def _process_the_response(self, response):
        # statements 2: produce a local representation
        ...
        return Resource(...)

    def do_something(self):
        request = self._prepare_request()
        response = request()
        new_state = self._process_the_response(response)
        return new_state


resource = Resource.setup(input_data_)
output_resources = resource.do_something()

Now hold on to your seat, buddy, because when I see those Attrs, I pull out my cyberpunk glasses and hack directly into hyperspace. With Attrs it is easier for me to think about what I would want to do, and I immediately start adding things that… I don't really need right now.

The good part is that it is easy to relate the logic to the data. The data itself and its purpose are unambiguous. In the end, for me, these are just structs with methods.

The bad part, jokes aside, is that while it is easier to think about what I am trying to write, it is also very verbose and sparse on meaning. If I compare it to the 1st example, I am not exactly adding anything new. Remembering the PyVideo.org talk "Stop Writing Classes", I can see that at twice the amount of code, I don't get twice the benefit.

6. Deferred computation or Command Pattern-like solution


input_data_ = {...}

def do_something(id, input_data):
    # statements 1: generate data to upload
    payload = {"id": id, ...}
    deferred_action = {
        "request_args": {"method": "POST", "url": input_data["url"], "json": payload},
        "build_resource": lambda json: {...},
    }
    return deferred_action


# io operations 1: generate an id
id_ = uuid()
deferred_action_ = do_something(id=id_, input_data=input_data_)


def apply(deferred_action):
    # io operations 2: make an API request
    response = requests.request(**deferred_action["request_args"])
    # statements 2: produce a local representation
    resource = deferred_action["build_resource"](response.json())
    return resource

To make this example simpler, I moved the uuid() out.

I like this because all the key logic is still encapsulated in do_something, but the side effects live outside. I think the flow of the code makes it more readable than examples 2 through 5. Maybe the lambda could be an external function.

The only thing missing here is that the deferred action, as a dictionary, is not transparent. But I think a solution in the form of a data class would be acceptable here.
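Something like this, for instance (a sketch; the field names are just what the dict above suggests):

from dataclasses import dataclass
from typing import Callable

import requests

@dataclass(frozen=True)
class DeferredAction:
    request_args: dict
    build_resource: Callable[[dict], dict]

def apply(action: DeferredAction) -> dict:
    # io operations 2: make an API request
    response = requests.request(**action.request_args)
    # statements 2: produce a local representation
    return action.build_resource(response.json())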


Here is where I am asking for your input:

Could anyone give some constructive criticism to the points I have proposed? Suggestions such as articles or videos to read and watch are also welcome.

Thanks in advance! Happy coding!

I would suggest making a strict distinction (in your implementation) between the protocol layer and the transport layer. The layer that deals with the protocol shouldn't contain any reference to (for instance) the requests lib, and the transport layer doesn't really need to know which protocol is being used. In the code snippets that you provided you seem to be mixing those two up - I don't see essential differences between the different samples in this regard.

I don't know - I would look at some existing implementations of JSON-RPC in Python and see how other people dealt with this. I saw there are some libraries around on GitHub (I have never used them, so I cannot make recommendations).

Hi Hans,

Would the second example be the closest to implementing transport and protocol layer separation? Could you give me a short example?

Thanks!

Hi - I don't think any of the code here really separates the two, but perhaps nr. 2 comes closest. If you look at json-rpc/jsonrpc/jsonrpc1.py at master · pavlov99/json-rpc · GitHub you'll see the kind of thing I have in mind. In that implementation they only implement the protocol layer. On top of that you'd then build a server/client or plug in a transport layer (using either requests or other suitable libs).
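Roughly, the protocol layer alone looks like this (just my sketch of the JSON-RPC 2.0 envelope, not that library's actual API):

def build_rpc_request(method: str, params: dict, request_id: str) -> dict:
    # protocol layer: knows only the JSON-RPC envelope, nothing about HTTP
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}

def parse_rpc_response(body: dict) -> dict:
    # protocol layer: unwrap the envelope, surface protocol-level errors
    if "error" in body:
        raise RuntimeError(body["error"]["message"])
    return body["result"]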


Sure they do. You were able to give them sensible names, right? They have a coherent purpose, right? There’s a function to create request_args from whatever input data and an ID, and a function to convert an API response into a data structure. Those ideas are easy to understand.

Taking the actual implementations of those ideas out of line (to separate functions) is helping to understand the larger function, because it abstracts away details that are irrelevant in the larger context and isolates them to the more specific context where they are relevant. This is one of the fundamental ideas of programming. If you’re worried about jumping back and forth to read the entire process, you’re missing the point. When you read a non-fiction book, you don’t feel the need to jump back and forth to the table of contents, yet the latter is still useful. You can read the entire table of contents in order to understand what the book is about, and you don’t have to look at the chapters for that. That’s the point of having it.

… So what?


That said, I wouldn’t normally think of uuid as performing I/O, and I wouldn’t normally think of it as a separately needed input for creating the API request. If the API demands a UUID for each request, then generating that UUID is a natural part of the process of generating the request. Meanwhile, although it’s pedantically true that the UUID generator might rely on state external to the program, it likely isn’t performing a meaningful alteration to that state. (If it’s time-based, then it’s just observing state that independently constantly changes anyway; if it’s RNG-based, then you probably don’t really care about preserving the stream state anyway.)

Also, aside from all of that, you will realistically still need to mock the request call to test everything properly.
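Concretely, I'd fold the UUID into the request builder and keep it overridable only where tests need determinism (a sketch; the names are made up):

import uuid

def build_request(input_data: dict, make_id=uuid.uuid4) -> dict:
    # generating the id is part of building the request, not a separate
    # io step; make_id exists only so a test can pass a fixed stand-in
    payload = {"id": str(make_id()), "data": input_data}
    return {"method": "POST", "json": payload}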

Hey, thanks for your feedback.

I understand your point. Breaking things down can make things clearer and hide some complexity. However, as I see it, it does so at the cost of adding more complexity. Let me try to give you an example:

Let's say I have a fairly long function of 50 lines that is doing something. If I break it into 5 functions, I will roughly get 5 more lines of code per function: 2 blank lines before the signature, 1 for the signature itself, 1 return and 1 call to the function. This is without counting the effort needed to set the types and sometimes prepare the arguments. In the end we get 75 lines of code, where the original function's body is now 5 lines long; after looking at it one can imagine what it does, but doesn't really have a clue until, as you mentioned, one starts jumping around.

I get your point with the table of contents: in the end the reading of the code can still be pretty linear, with the table of contents giving a good overview and the chapters (functions) following below in the correct order.

This same argument can also be turned around, though: you don't really need a table of contents for a single article, as long as the headline is descriptive enough. In my case, the original do_something is 20 lines long. It is, in my opinion, very straightforward to read, because the variable names reflect exactly what is happening, the same way the function name does.

I would say it is just harder to maintain and justify. For example, creating do_something is easy to justify, because that code would be reused and putting it into a function keeps the code base meaningfully DRY. But the other functions would each be called from do_something only once.

Here is a section of a blog post that covers exactly what I am referring to: http://bryce.fisher-fleig.org/separate-io-from-logic/#when-separating-logic-and-io-doesn-t-make-sense

Thanks for pointing that out. I realized that my issue is not so much with IO as it is with side effects. I have updated the title accordingly.

Thanks Hans for the example,

I read the code in the JSON-RPC repo and applied it. Would the 6th example be better at separating the layers?

Hm - I don't think so (though there is nothing wrong with the code as such). The main problem I have with it is that it entangles building the protocol payload with specific actions (POST) from a totally different protocol, and with a very specific third-party library (requests) that seems irrelevant to this payload. I have a feeling we're kind of on different wavelengths here - I'm talking about protocol and transport layers, while you are talking about IO and non-IO. IO vs non-IO doesn't seem the right perspective to me when designing this code (since in a sense it's all about supporting certain kinds of IO).

I think some way of refactoring the code - using separate functions, as @kknechtel also emphasized - can only help. But then you still need to deal with the different layers (depending on what you actually want to support in your lib). What I would do - mostly to keep going - is:

  • keep your questions in the back of your mind, don't decide on them yet
  • select whichever of your alternatives just feels most natural to you
  • but refactor that code first as code that only supports the protocol (encoding/decoding of payloads plus whatever steps the protocol has), without making any references to libraries such as requests - see the sketch below
  • when you run into trouble or start writing unit tests, refer back to your initial questions and see if your new code makes sense or could use further refactoring
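To sketch the direction (all names made up, reusing the envelope helpers from my earlier sketch): the client composes the two layers, and the transport is just a callable you can swap:

from typing import Callable

import requests

def call_rpc(transport: Callable[[dict], dict], method: str, params: dict) -> dict:
    # the protocol helpers build/parse the envelope;
    # the transport callable is the only place io happens
    request = build_rpc_request(method, params, request_id="1")
    return parse_rpc_response(transport(request))

# one possible transport, kept entirely outside the protocol code
def http_transport(request: dict) -> dict:
    return requests.post("https://api.example.com/rpc", json=request).json()

# usage: call_rpc(http_transport, "some_method", {"param": 1})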

I don't know if this makes sense to you. I'm not a strong top-down programmer, but someone who is continually refactoring and tinkering with code, so this process may not suit your tastes 🙂

Hello Hans,

Thanks for the kind reply.

That makes sense to me. At the same time, I just don't know how to separate - to continue with your example - the API core logic from the fact that it is implemented by a specific method. The method ("POST", "GET", etc.) and the API action seem to be intrinsically related.

I am brushing through the json-rpc library that you shared with me and am also looking into Algolia's API client library. What do you think about the latter?

I am bringing IO vs non-IO in because I like the functional approach, as far as I understand it at least, and because I don’t have experience with managing layers separately.

I am not sure if I am missing the point, but extracting code into functions is just not something I like anymore. The main reason is that it hides complexity and obscures issues. If I have 100 lines of logic, that is usually a good indicator of something being really wrong. Of course, if the code needs to be reused, belongs as a method on certain data, or needs to be simplified for the user of the API, I will extract it.

Thanks, I am following your advice as I work on my project.

I think that's me, too. I have already rewritten the code at least a couple of times.

Thanks again.

Unfortunately, I'm not familiar with that library. I took a very quick look at the sample Python code - it seems pretty clean.