Separation Of Functions And State
The First Amendment to the Constitution of the United States of America is commonly summarised by the phrase 'separation of church and state'. A similarly important concept in software engineering is the separation of functions from state. (This is a pun on the two different meanings of 'state', as we shall see.)
This principle could also be simplified to 'Separate state from everything else'.
Interlude: Pure Functions
A pure function is a function that behaves like a mathematical function:
- The same input always results in the same output.
- The function has no side effects.
A pure function does not care about the state of the world. Nor does it care about any previous inputs. It does not know how many times it has been called. A pure function is stateless.
The function has no connection to the outside world except through the explicit inputs and outputs.
- The function does not read from anything outside of itself. No reading of global variables, environment variables, config files, etc.
- The function does not write to anything outside of itself. Thus, the function has no side effects.
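To make the contrast concrete, here is a minimal Python sketch of the same greeting written impurely and purely. The function names and the APP_LANGUAGE environment variable are hypothetical, invented only for this illustration.

```python
import os

# Module-level counter: state that lives outside any function.
call_count = 0


def greeting_impure(name):
    """Impure: reads hidden input and writes to outside state."""
    global call_count
    call_count += 1  # side effect: writes outside the function
    # Hidden input: APP_LANGUAGE is a made-up variable for this sketch.
    language = os.environ.get("APP_LANGUAGE", "en")
    prefix = "Bonjour, " if language == "fr" else "Hello, "
    return prefix + name


def greeting_pure(name, language):
    """Pure: everything it needs arrives as an argument."""
    prefix = "Bonjour, " if language == "fr" else "Hello, "
    return prefix + name


print(greeting_pure("Ada", "fr"))
```

The pure version returns the same string for the same two arguments no matter what the environment contains or how often it has been called.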
Pure functions are easy to test, because they are completely self-contained, and the same input always results in the same output.
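As a rough illustration of how little ceremony such a test needs, consider the sketch below. The apply_discount function is a hypothetical example, not something defined elsewhere in this chapter.

```python
def apply_discount(price_cents, percent):
    """Return the discounted price; the discount amount is rounded down to whole cents."""
    return price_cents - (price_cents * percent) // 100


# Each test case is just (inputs, expected output).
# There is no setup, no mocking and no teardown.
cases = [
    ((1000, 10), 900),
    ((999, 50), 500),
    ((0, 25), 0),
]

for args, expected in cases:
    assert apply_discount(*args) == expected
```

Because nothing outside the function can influence the result, the table of cases is the whole test.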
Pure functions make code easy to reason about, not least because self-containment and the lack of side effects imply an absence of coupling.
Pure functions make your code easier to read. Use them as much as possible. Try to structure your code such that everything that can be a pure function is a pure function.
State
State is a broad, abstract term in software engineering. In practice, it means something like the following.
State: Any variables that can change, especially in these two ways:
- In non-deterministic ways, for example, due to random user input.
- In hard-to-understand ways, for example, during the convoluted machinations of a complex algorithm.
Variables that change in predictable, easy-to-understand ways are also state. However, such simple state is often localised and easy to manage. The advice of this chapter, 'Separate state from everything else', still applies, of course. A good example is an Iterator, which we'll cover in a moment.
Immutable State?
Consider a data structure of immutable constants that can be accessed, especially in these two ways:
- In non-deterministic ways, for example, due to random requests from a user.
- In hard-to-understand ways, for example, during the convoluted machinations of a complex algorithm.
Since the data structure does not change, some people would not consider it to be 'state'. However, for architectural purposes, all the same advice in this chapter still applies: Separate immutable data structures from everything else.
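One possible way to apply this in Python is sketched below. The SHIPPING_RATES table and the shipping_cost function are hypothetical names chosen for the example. The standard library's types.MappingProxyType provides a read-only view of a dictionary.

```python
from types import MappingProxyType

# The immutable data lives in one place, separate from the logic.
# MappingProxyType is a read-only view, so callers cannot modify it by accident.
SHIPPING_RATES = MappingProxyType({"standard": 500, "express": 1500})


def shipping_cost(method, rates):
    """Pure: the table is an explicit input rather than a hidden global read."""
    return rates[method]


print(shipping_cost("express", SHIPPING_RATES))  # 1500
```

Passing the table in as an argument keeps the function pure and makes it trivial to test with a different table.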
Separate State; Collate State; Minimise State
Managing state is widely recognised as one of the main sources of complexity in software. Therefore:
- Minimise the use of state in the program.
- Concentrate necessary state into its own modules, separate from the rest of the code.
- Make the rest of the code stateless, using pure functions where possible.
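The three points above might look something like the following sketch, which assumes a toy shopping cart. All names here (add_item, total_cents, CartStore) are hypothetical. The only mutable data sits in one small class, and the actual logic is in pure functions.

```python
# Stateless part: pure functions that never modify their inputs.
def add_item(items, name, price_cents):
    """Return a new tuple of items; the input tuple is left untouched."""
    return items + ((name, price_cents),)


def total_cents(items):
    """Return the sum of all item prices."""
    return sum(price for _, price in items)


# Stateful part: one small class that owns all the mutable data.
class CartStore:
    def __init__(self):
        self._items = ()  # the only mutable state in this sketch

    def add(self, name, price_cents):
        self._items = add_item(self._items, name, price_cents)

    def total(self):
        return total_cents(self._items)


store = CartStore()
store.add("pen", 150)
store.add("notebook", 400)
print(store.total())  # 550
```

The class is deliberately thin: it stores the current value and delegates every decision to a pure function that can be tested on its own.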
Example: Iterator
An Iterator provides an easy way to loop through some sequential data structure, without the user having to worry about maintaining an index variable, for example. Internally, the Iterator maintains state about where we are in the sequence, but this is completely hidden from the user. So this is a great example of software abstraction:
- The abstract concept of 'iteration' is recognised as being what we really care about.
- So a simple interface is provided that lets the user iterate through the sequence.
- The collateral complexity of managing a variable to index into the sequence, and terminating the sequence, etc. is hidden away inside the iterator module.
- Thus, state has been concentrated into the iterator module, and extracted out of (separate from) the main code.
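To show where that hidden state lives, here is a minimal hand-written iterator using Python's iterator protocol (the __iter__ and __next__ methods). The class name ArrayIterator is made up for this sketch. In practice Python's built-in sequences already provide their own iterators.

```python
class ArrayIterator:
    """Walks a sequence front to back, keeping the index private."""

    def __init__(self, sequence):
        self._sequence = sequence
        self._index = 0  # the hidden state

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._sequence):
            # Termination is handled here, not by the calling code.
            raise StopIteration
        element = self._sequence[self._index]
        self._index += 1
        return element


for element in ArrayIterator(["a", "b", "c"]):
    print(element)
```

The calling loop never sees _index; it only asks for the next element until there are none left.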
Consider a classic for-loop that iterates through an array, in C++:
for (int i = 0; i < array_length; i++) {
    auto element = some_array[i];
    // Do stuff.
    // Horrifyingly, it is possible to modify the iterator variable i.
}
The irrelevant iterator variable i is exposed in the code. Even worse, it is possible to modify the iterator variable during each iteration, so misguided programmers could write insane 'clever' code that is impossible to reason about.
With the Iterator abstraction, code is much simpler and safer. Consider this Python code:
for element in some_array:
    pass  # Do stuff.
The state variable i has now vanished. It is isolated inside the Iterator for some_array, and no longer appears in our algorithm. State has been successfully separated from our function.
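The position has not disappeared; it has simply moved. The sketch below drives the iterator by hand with Python's built-in iter() and next() functions, which is roughly what the for loop does on our behalf. The list contents are arbitrary example values.

```python
some_array = [10, 20, 30]

iterator = iter(some_array)  # the iterator object now owns the position

print(next(iterator))  # 10
print(next(iterator))  # 20
print(next(iterator))  # 30

# A further next(iterator) would raise StopIteration,
# which is how a for loop knows when to stop.
```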
Warning: Don't Add Complexity Via Too Little State
It is sometimes possible to completely eliminate state by using constructions found in functional programming languages. A classic example is recursion. A for loop may be eliminated by using a function that recursively calls itself.
But this is a bad idea: Recursion is considerably harder to understand than a simple for loop with state. Our goal is not to be clever, but to write readable, maintainable code. So don't do this sort of thing. The goal is not to eliminate state at all costs, but to reduce the use of state down to the point where it maximises readability. A little bit of easy-to-understand state is vastly better than a confusingly 'clever' stateless solution borrowed from functional programming.
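For comparison, here is the same small task, summing a list of numbers, written both ways. The function names are hypothetical, and the example is only a sketch of the trade-off described above.

```python
def total_loop(numbers):
    total = 0  # a little local, easy-to-follow state
    for number in numbers:
        total += number
    return total


def total_recursive(numbers):
    # No mutable variable, but the reader must now track the call stack.
    if not numbers:
        return 0
    return numbers[0] + total_recursive(numbers[1:])


print(total_loop([1, 2, 3, 4]))       # 10
print(total_recursive([1, 2, 3, 4]))  # 10
```

Both return the same result, but the loop version keeps its single piece of state in plain view. In CPython, the recursive version will also hit the interpreter's recursion limit on long lists.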
🙠