Make Execution Flow Obvious
Four paragraphs appearing at the beginning of the chapter Distinguish Algorithm-like From Data-like Code apply here. For convenience, we repeat them verbatim here:
Time is linear. Time is just one damn thing after another.
Important corollary: The execution of a computer program is a sequence of operations, one after the other. We can say that program flow is linear.
Even more important corollary: Reading is a linear process. We read one element at a time, incrementally building an understanding of the whole piece.
Thus, to answer "what does this code do?" is to understand the sequence of operations during the execution of the code. Furthermore, our understanding of that sequence of operations is built by reading parts of the code, one after the other.
Therefore, a key principle for structuring code for readability is: make execution flow obvious.
Before exploring how to apply this principle, we shall first deal with the exception - code that does not have a meaningful execution order, which we have defined as 'data-like code'.
Data-like Code Favours A Tree Structure
Data can be thought of as just a bunch of static variables. However, the variables may be grouped in various ways, and the groups themselves may be further grouped into higher-order sets, forming a hierarchy. Thus, a tree-like structure can be imposed onto the bunch of variables. If the data is correctly abstracted in this way, it will make more logical sense, and be easier to understand.
Obviously, code whose main purpose is to represent data should be structured so as to mirror the abstract tree of variable groupings. Then, accessing the data in the rest of the program will be as intuitive and logical as possible.
Consider an example of some data in JSON format. The first example shows data in 'flat' format.
{
    "student_id": 12345678,
    "name": "Joe Aloysius Bloggs",
    "email": "j.bloggs@gmail.com",
    "phone": "07412345678",
    "street_address": "21 Jump Street",
    "town": "Crapstone",
    "county": "Devon",
    "postcode": "PL20 7PJ",
    "bank": "Barclays",
    "account_name": "J A Bloggs",
    "sort_code": "12-34-56",
    "account_number": "87654321"
}
The second example shows the same data grouped into a sensible tree structure.
{
    "student_id": 12345678,
    "name": "Joe Aloysius Bloggs",
    "contact_details": {
        "email": "j.bloggs@gmail.com",
        "phone": "07412345678",
        "postal_address": {
            "street_address": "21 Jump Street",
            "town": "Crapstone",
            "county": "Devon",
            "postcode": "PL20 7PJ"
        }
    },
    "bank_details": {
        "bank": "Barclays",
        "account_name": "J A Bloggs",
        "sort_code": "12-34-56",
        "account_number": "87654321"
    }
}
The second example, with its tree structure, is much easier for a human reader to parse: related fields sit together, and each group has a name that says what it is.
Suppose the JSON example above needs to be represented as a data structure in a computer program. There are various ways of converting JSON into, for example, a composition of objects. The details of that conversion are not important here - what is important is that the organised tree structure is preserved. Assuming that details of reading and converting the JSON are abstracted away, then (Python) code that uses the data structure could look like this:
student_json = read_student_json()
student = convert_json_to_student_object(student_json)
...
email = student.contact_details.email
...
postcode = student.contact_details.postal_address.postcode
As we can see, the tree structure is accessed in an intuitive way using the nested 'dot' syntax of object attributes.
The logical grouping and sub-grouping of all the data elements makes the data structure easy to understand and work with. So favour tree-like structures for data-like code.
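As a minimal sketch of how the tree might be preserved in Python, the grouped JSON could be mapped onto nested dataclasses. The class names below (Student, ContactDetails, PostalAddress, BankDetails) are hypothetical, the body of convert_json_to_student_object is only one possible implementation, and the phone number is assumed to be stored as a string.

```python
from dataclasses import dataclass


@dataclass
class PostalAddress:
    street_address: str
    town: str
    county: str
    postcode: str


@dataclass
class ContactDetails:
    email: str
    phone: str  # assumed to be a string, so the leading zero survives
    postal_address: PostalAddress


@dataclass
class BankDetails:
    bank: str
    account_name: str
    sort_code: str
    account_number: str


@dataclass
class Student:
    student_id: int
    name: str
    contact_details: ContactDetails
    bank_details: BankDetails


def convert_json_to_student_object(student_json: dict) -> Student:
    """Build the nested Student object from an already-parsed JSON dict."""
    contact = student_json["contact_details"]
    return Student(
        student_id=student_json["student_id"],
        name=student_json["name"],
        contact_details=ContactDetails(
            email=contact["email"],
            phone=contact["phone"],
            postal_address=PostalAddress(**contact["postal_address"]),
        ),
        bank_details=BankDetails(**student_json["bank_details"]),
    )


# Example data mirroring the grouped JSON above.
example_student_json = {
    "student_id": 12345678,
    "name": "Joe Aloysius Bloggs",
    "contact_details": {
        "email": "j.bloggs@gmail.com",
        "phone": "07412345678",
        "postal_address": {
            "street_address": "21 Jump Street",
            "town": "Crapstone",
            "county": "Devon",
            "postcode": "PL20 7PJ",
        },
    },
    "bank_details": {
        "bank": "Barclays",
        "account_name": "J A Bloggs",
        "sort_code": "12-34-56",
        "account_number": "87654321",
    },
}

student = convert_json_to_student_object(example_student_json)
postcode = student.contact_details.postal_address.postcode
```

Each dataclass corresponds to one node of the tree, so the shape of the type definitions is the shape of the data.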
A Fundamental Cause Of Badly-Structured Code
Now that we have seen how tree-like code structures can be a great idea, let us indulge in a rant about how tree-like structures can be a terrible idea.
Why do intelligent people write such obfuscated, tangled code? Here is one reason:
They are organising code as if it were nothing more than a static bunch of text statements. Of course, it is a static bunch of text statements, but crucially, it may be more than that: there is another dimension to consider. We'll come back to this point.
The intelligent but unwise programmer looks at code as a bunch of static text statements, and proceeds to 'organise' it. The programmer abstracts out an underlying logical structure, and breaks the code into separate parts. He or she carries on, further breaking each newly-created part into separate logical chunks. The final result is a large number of separate pieces, all connected together in a tree structure. The intelligent but unwise programmer is probably proud of this achievement, believing that he or she has cleverly followed the Separation Of Concerns Principle to the letter. The code is now so beautifully organised.
Now, as we have seen, this is eminently appropriate as long as the code is data-like.
But most code in an application is algorithm-like. For algorithm-like code, this approach is completely and utterly wrong. As we noted above, there is another dimension: time. More specifically: execution order.
Algorithm-like code consists of a series of actions performed in a sequence. To understand the code, one must understand the exact sequence of actions. A linear structure captures that sequence directly: reading the code from top to bottom follows the actions in the order they happen. A tree structure, on the other hand, obfuscates the sequence of actions.
Algorithm-like Code Suits Linear Structures
As the preceding sections have made clear, linear-type code structures maximise the readability of algorithm-like code. Let us consider a plan of attack for achieving well-structured code, using our skills of software abstraction to separate concerns.
Practical Advice:
- Most code follows the pattern handed down from the ancients: Take some data, do things to the data, put the data somewhere.
- Assume we are considering a non-trivial algorithm of this type, that takes up more than one screen of code to implement.
- Take the entire algorithm, and decide what the different logical elements are. A logical element does one thing: it is a logically distinct operation. This breakdown has nothing to do with the number of lines of code: some operations can be done in one line of code, others in one thousand. In other words, separate concerns.
- Create a separate function to do each logical element. Because of the clean logical separation, it should be easy to give the function an accurate and complete self-documenting name.
- Now the entry point, the algorithm in question, can be written as a 'main' or 'orchestrator' function containing a linear sequence of sub-functions.
Now, code is fractal: each logical element described above may itself be composed of distinct logical sub-elements. The same principle applies at any level: for each element, write code that clearly reveals the flow of execution order.
Because of the fractal nature, we end up with code that has a nested structure, and is therefore tree-like in some sense. However, the structure is not a 'pure' tree - there is a critical additional property: The linearity of each element means that an explicit execution flow can be traced through the structure.
The diagram below illustrates the 'nested chain' structure. The black arrows show the execution flow.
An example sketch in Python would look something like:
def main():
    config = read_config()
    data = load_data_from_file(config)
    transformed_data = operation_1(data)
    ...
    ...
    discombobulated_data = operation_4(transmogrified_data)
    ...


def operation_1(data):
    modified_data = sub_operation_1(data)
    ...
    mangled_data = sub_operation_3(altered_data)
    ...


def sub_operation_3(data):
    mutated_data = sub_sub_operation_1(data)
    ...

# Et cetera.
As noted in the section above A Fundamental Cause Of Badly-Structured Code, the point is not to create a neatly-structured tree of nodes, with each node having minimal content, resulting in a very deep tree. Therefore, resist the temptation to group the functions into higher-level functions with vague names like transform(). This incredibly widespread plague of abstraction addiction is the opposite of writing readable code.
Furthermore, as noted in the chapter Separate Input/Output From Computation, input dependencies such as reading a configuration file, reading environment variables from the operating system, loading a data file, etc., should happen at the start of the algorithm. Making dependencies obvious is an important part of making code easy to understand quickly, so declare them up front.
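To make the 'nested chain' concrete, here is a small runnable miniature. Everything in it is an assumption made for illustration: the function names, the configuration keys max_valid and units, and the choice to pass the inputs as text rather than read them from files. The orchestrator reads its inputs first, then calls one named step per line.

```python
import json
import statistics


def main(config_text, data_text):
    # Input dependencies come first.
    config = read_config(config_text)
    readings = load_readings(data_text)
    # Then the computation, one logical element per line.
    valid_readings = remove_out_of_range(readings, config["max_valid"])
    summary = summarise(valid_readings)
    return format_report(summary, config["units"])


def read_config(config_text):
    return json.loads(config_text)


def load_readings(data_text):
    return [float(line) for line in data_text.splitlines() if line.strip()]


def remove_out_of_range(readings, max_valid):
    return [reading for reading in readings if 0 <= reading <= max_valid]


def summarise(readings):
    # A nested element: itself a short linear sequence of sub-steps.
    mean = compute_mean(readings)
    spread = compute_spread(readings)
    return {"mean": mean, "spread": spread}


def compute_mean(readings):
    return statistics.mean(readings)


def compute_spread(readings):
    return max(readings) - min(readings)


def format_report(summary, units):
    return (
        f"mean={summary['mean']:.1f}{units} "
        f"spread={summary['spread']:.1f}{units}"
    )


report = main('{"max_valid": 100, "units": "C"}', "10\n20\n999\n30\n")
```

Reading main() alone is enough to learn the key steps of the algorithm; reading summarise() alone is enough to learn the steps of that element.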
The Daisy-Chain Anti-Pattern
Code with a 'daisy-chain' structure is linear, but lacks a proper 'main' or 'orchestrator' function. The code consists of a chain of function calls, where each function calls the next function in the sequence, instead of returning to an orchestrator. There are two main flaws of this structure:
- With no 'main' or orchestrator function, the overall structure of the algorithm cannot be seen anywhere. One must read all the functions in turn to build a mental picture of the key steps in the algorithm.
- The entry point will be the first function in the sequence. Since it will have a single clear concern, it should have a simple descriptive name such as read_data_file(). However, its final act is to call the next function in the sequence, thus ultimately executing the entire program. Executing the entire program is the mother of all side-effects for a simple function!
So don't do this. Instead, use the (nested) linear structures with orchestrators described in the previous section.
The example sketch above structured in this anti-pattern would look something like this:
def read_config():
    ...
    load_data_from_file(config)


def load_data_from_file(config):
    ...
    operation_1(data)


def operation_1(data):
    ...
    operation_2(data)


def operation_2(data):
    ...
    operation_3(data)
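The side-effect problem can be seen in a small runnable contrast. All names here are hypothetical. In the daisy-chain version, a function whose name promises only to read numbers actually returns the result of the whole program; in the orchestrated version, each function does what its name says, and main() shows the sequence.

```python
# Daisy-chain version: each function hands off to the next.
def read_numbers_daisy(text):
    numbers = [int(token) for token in text.split()]
    return double_numbers_daisy(numbers)  # hidden hand-off


def double_numbers_daisy(numbers):
    doubled = [2 * number for number in numbers]
    return sum_numbers_daisy(doubled)  # hidden hand-off


def sum_numbers_daisy(numbers):
    return sum(numbers)


# Orchestrated version: each function returns to main().
def read_numbers(text):
    return [int(token) for token in text.split()]


def double_numbers(numbers):
    return [2 * number for number in numbers]


def sum_numbers(numbers):
    return sum(numbers)


def main(text):
    numbers = read_numbers(text)
    doubled = double_numbers(numbers)
    return sum_numbers(doubled)


daisy_result = read_numbers_daisy("1 2 3")  # not a list of numbers
numbers_only = read_numbers("1 2 3")
orchestrated_result = main("1 2 3")
```

Both versions compute the same total, but only in the orchestrated version can read_numbers() be called, tested, or reused on its own.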
Lexical Order Should Follow Logical Order
By Lexical Order we mean the order of words in our source code.
By Logical Order we mean the logical order of operations that must be done to achieve the aim of the code. Note that this is not necessarily the order of operations that execute on the machine, as that may depend on compiler optimisations and the like. Rather, we mean the logical order of steps that makes it easiest for a human to understand what the code is trying to achieve.
For example, consider some code that does two simple steps:
- Retrieve a value from a data source (e.g. a Python dictionary).
- Pass that value into a function.
Code to achieve that is sometimes seen written something like this:
some_function(
    relevant_value=source_dictionary.get("obscurely_named_variable")
)
To be understood, the code above must be read from the inside out. For such a simple example, this is not such a problem, but the code is easier to read if the lexical order follows the logical order of operations:
well_named_variable = source_dictionary.get("obscurely_named_variable")
some_function(relevant_value=well_named_variable)
Arguably, one cost of this improved code is that it is now necessary to create an explicit 'dummy' variable to pass data to the next line. But there are two benefits to having this dummy variable:
- Debugging is much easier. A debugger can show the value of the dummy variable.
- The dummy variable can be given a self-documenting name that describes its purpose in the context of the function call. As the example indicates, the original variable name in the data source may have an obscure name that reveals nothing about its meaning in the local context.
Now, for such a simple example as the one above, inside-out code hardly matters. But for more complex function signatures, and any kind of nesting of function calls et cetera, inside-out code rapidly becomes a problem: The reader must perfectly maintain a mental stack of code operations in reverse order, or re-read up and down several times to figure out what is going on.
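A slightly larger illustration, using only Python built-ins and made-up data: the first form must be read from the innermost call outwards, while the second can be read top to bottom.

```python
names = ["Carol", "alice", "Bob", "ALICE"]

# Inside-out: the first operation performed (str.lower) is the last one read.
inside_out = ", ".join(sorted(set(map(str.lower, names))))

# Lexical order follows logical order: one step per line, top to bottom.
lowercased_names = [name.lower() for name in names]
unique_names = set(lowercased_names)
ordered_names = sorted(unique_names)
top_to_bottom = ", ".join(ordered_names)
```

Both forms produce the same string; the second also gives every intermediate value a name that a debugger can display.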
A reasonable metric of 'readability' is the extent to which code can be read in order, from top to bottom, without going back. Obviously, code in which the lexical order follows the logical order maximises this readability metric.
For simple examples, inside-out code does not matter, and could be argued to be a matter of taste. But the style must rapidly be abandoned for code of any complexity, if readability is to be maintained. For consistency, then, it makes sense to always favour code in which lexical order follows logical order.
As an aside, the inside-out style mirrors function composition in mathematical notation. For example, f(g(h(x))) means 'apply function h to x, then apply function g to the result, then finally apply function f to the final result'. Since functional programming languages are inspired by the mathematics of functions, perhaps they have tended to promote inside-out coding styles. Interestingly, modern functional programming languages tend to provide constructions with a syntax along the lines of: h() -> g() -> f(), that denote function composition in a readable left-to-right manner i.e. the lexical order follows the logical order.
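Python has no built-in pipeline operator, but the idea can be sketched with a small helper. The pipe function below is hypothetical, not part of the standard library; it is built on functools.reduce, and f, g and h are placeholder functions.

```python
from functools import reduce


def pipe(value, *functions):
    """Apply each function in turn, left to right, starting from value."""
    return reduce(lambda result, function: function(result), functions, value)


def h(x):
    return x + 1


def g(x):
    return x * 2


def f(x):
    return x - 3


inside_out = f(g(h(5)))           # read right to left: h, then g, then f
left_to_right = pipe(5, h, g, f)  # read left to right: h, then g, then f
```

The two expressions compute the same value; only the reading order differs.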
Summary
The structure of code should reflect its purpose. Is the purpose of a module of code to store data or is it to do some computation?
Note that a data-like code component may still have algorithm-like code within it, for example, non-trivial accessor methods on a complex data structure. Within those accessor methods, the advice for algorithm-like code applies (favour linear 'nested chain' structures).
Code should look like what it is. Code should look like what it does.
🙠