The Art Of Abstraction

The English word 'abstract' is derived from the Latin word abstractus, meaning 'pulled away from'. (The Latin particle abs means 'away', and one may recognise 'tract' inside words like 'traction' and 'tractor', and see some connection to pulling. A farm tractor pulls a plough.)

The first use of abstract in Middle English was as a noun to describe the summary of a longer text. This usage is still standard in papers published in scientific journals, which nearly always have an Abstract at the start of the paper that summarises the key findings. The other main usages of the word abstract are 'non-material, not concrete' and 'general, not specific'. (There is also Abstract Art, which does not depict physical, material things (and is usually bloody awful).)

From the point of view of philosophy, the art of abstraction is an essential tool in the scientific method. It is therefore also an essential tool in engineering. To begin explaining how abstraction is so useful, we shall start with this definition:

To Abstract: To pull out the essential elements of something.

A corollary is that abstraction is the art of ignoring the irrelevant. We identify what is important, and ignore the rest because it does not matter to our intellectual context.

The great power of abstraction comes from the fact that a single concrete entity can be abstracted in different ways. There are many different abstractions to be made from a single thing.

Example: Consider an apple. There are many different 'essential' things that we can identify about the apple. Which elements we consider to be essential is a choice that depends on what we care about in the relevant intellectual context. Here are some elements of an apple, each of which could be considered the 'essential' element, depending on the intellectual context:

It is coloured red.
It is approximately spherical.
It is edible.
It is a member of the plant kingdom.
It has a lifespan measured in weeks (without special cold storage).
It is traded internationally.
Its retail price contributes to the Consumer Price Index measure of inflation.
Et cetera.

By abstracting from a concrete apple in these ways, we naturally tend to think in terms of categories or sets. An apple is in the set of red things. An apple is in the set of plants. Et cetera. Categorical or set-based thinking is very natural to people with sufficient education in mathematics. In the modern world, primary school children learn the Venn diagrams of set theory. And the extension into formal Boolean algebra and propositional calculus may be encountered in the first year of university.

Categories: Abstractions In Common

As alluded to in our mention of sets above, if we abstract the same thing from two different concrete objects, we have found something that is common to both. For example, an apple is a plant, and a cactus is a plant; both belong to the set of 'plants'. Furthermore, an apple is edible, and a fish is edible; both belong to the set of 'edible things'.
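
This way of thinking maps directly onto the built-in set type of a language such as Python. The following is only a minimal sketch: the `properties` table and the helper names `members` and `in_common` are invented for illustration, and the property lists are assumptions rather than a serious taxonomy.

```python
# Each concrete thing is described by the properties we choose to
# abstract from it. The property lists are illustrative assumptions.
properties = {
    "apple": {"plant", "edible", "red"},
    "cactus": {"plant"},
    "fish": {"edible"},
}


def members(prop):
    """Return the set of things from which the given property can be abstracted."""
    return {thing for thing, props in properties.items() if prop in props}


def in_common(a, b):
    """Return the abstractions shared by two concrete things."""
    return properties[a] & properties[b]


plants = members("plant")          # the apple and the cactus
edible_things = members("edible")  # the apple and the fish
```

Asking `in_common("cactus", "fish")` yields the empty set: under the properties chosen here, the two things share no abstraction at all.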

So abstraction allows us to make connections between disparate things that are deeper than the immediate concrete connections. An example of a merely concrete connection: an apple and an orange may both be in the same fruit bowl. Strictly speaking, this connection is an abstraction ('set of things in this fruit bowl'), but it is mere association, and we would scarcely think of it as abstraction.

Hierarchies Of Abstraction

Abstraction can be applied to abstractions themselves. For example, the abstraction 'fruit' defines a set including things like apples and oranges. And the abstraction 'flowers' includes things like tulips and roses. Moreover, the sets 'fruit' and 'flowers' both sit within the set 'plants'. Qualities that define plants can be abstracted from both fruit and flowers.

Hierarchies of abstractions can be constructed: sets of sets; sets within sets.

The value of such set hierarchies is that they underpin logical inference. A classic illustration is the syllogism, formalised in Ancient Greece: "All humans are mortal. Socrates is a human. Therefore, Socrates is mortal." Various other logical rules have also been formalised.
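
The chain of inference through a hierarchy can be sketched in a few lines of Python. The `parent` table and the `is_a` function are hypothetical names invented for this illustration, and the hierarchy is simplified so that each category has only one parent.

```python
# Each entry names the category one level above it; None marks the top.
# The categories are assumptions chosen to match the examples in the text.
parent = {
    "Socrates": "human",
    "human": "mortal",
    "mortal": None,
    "apple": "fruit",
    "fruit": "plant",
    "plant": None,
}


def is_a(thing, category):
    """Walk up the hierarchy from thing; True if category is reached."""
    current = parent.get(thing)
    while current is not None:
        if current == category:
            return True
        current = parent.get(current)
    return False
```

Nothing in the table states directly that Socrates is mortal. The conclusion `is_a("Socrates", "mortal")` is derived by following two links, which is the syllogism in miniature.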

The Power Of Abstraction

Abstract reasoning lets us find hidden connections between concrete observable things, and derive new information by using logical inference. This allows us to make predictions about the world. Science is essentially the systematic application of abstract reasoning. Engineering can be viewed as a scientific approach to building things -- such as software.

'Intelligence', broadly speaking, could be described as the ability to abstract the essential elements of a situation and apply logic to them to derive new knowledge.

Abstraction lets us impose a structure on the universe that allows us to make sense of it. With careful effort, we can construct a tree-like structure of concepts and categories that best allows us to understand a problem. And we can construct other conceptual hierarchies of the same physical situation, in order to look at the problem in a different way.

Concrete-bound thinking: The Flynn Effect

An important point is that abstract reasoning is far from universal. People with the privilege of a Western-type education in the 21st century use some degree of abstract reasoning all the time, and usually take it for granted that everybody always uses it and always has. This is not true. We have hard data on this, from research on the Flynn Effect of psychology.

Flynn Effect: The discovery by New Zealand academic James Flynn that humans are getting smarter: The IQ scores of the population are increasing over time. The main cause of this seems to be that modern humans have been exposed to abstract reasoning through their education, and are thus better equipped to answer the logical reasoning questions in an IQ test. How does this compare to the reasoning of uneducated people? Consider these questions, posed by the psychologist Alexander Luria to illiterate peasants in Soviet Uzbekistan in the 1930s. They went something like this:

Question: "There are no elephants in Germany. Berlin is the capital of Germany. Are there elephants in Berlin?"
Peasant: "I don't know, I've never been to Berlin. But if Berlin is a big city, they might have elephants there."

The modern mind would see this as a failure to use set theory or logical inference.

Another type of question lists some objects, and asks for the connection between them.

Question: "Consider a dog and a rabbit. What do they have in common?"
Peasant: "You use a dog to hunt rabbits."

The modern mind would think that the 'correct' answer is "They are both mammals", and consider the peasant to be thinking at the lowest possible level of abstraction - mere association.

It would be easy to mock these peasants for their lack of sophisticated reasoning. But that would be utterly unfair. In fact, primitive peoples have an encyclopaedic knowledge of their environment, and a sharply-tuned operational intelligence that promotes their success in that (often-harsh) environment. Such people never need abstract reasoning, and see it as useless word games.

Risks And Failure Modes Of Abstract Thinking

Correct Abstractions

Scientific knowledge consists of elaborate conceptual structures that accurately map to the real world, and are internally consistent. The key distinction that makes a concept 'scientific' is that it has been tested by experiment. Abstractions are formulated, and eventually confirmed (or falsified) by experiments in the real world.

For example, the subatomic particle known as a quark started out as a purely mathematical abstraction that simplified and clarified the theory of subatomic particles. But much later, high-energy physics experiments confirmed the physical existence of quarks.

But if our abstractions cannot be tested with the rigour of science, then how do we decide if they are 'good' or not?

The answer to this question is deep, and beyond the scope of this essay. Only some rough guidelines can be suggested:

A good abstraction simplifies our understanding of the situation.
A good abstraction unifies various disparate elements that were previously seen as having no relation.
A good abstraction will tend to be generally applicable, to a wider problem space beyond the current problem. It may have predictive power.
A good set (or hierarchy) of abstractions will be internally consistent. There will be no contradictions.

Thus, abstract reasoning is actually very risky, in the following sense: It can be done very badly, leading to a false understanding of the universe, and that false understanding may persist for a very long time, because of the difficulty of testing it. Enormous amounts of time and energy may be wasted on building castles in the air, discussing how many angels may dance on the head of a pin.

Primitive people don't suffer much from this problem, because errors in concrete thinking are usually immediately visible. "Trust me, there are no bears in this cave!" Chomp. But it doesn't take much additional cultural sophistication to suffer badly from it. "We need to throw a virgin into the volcano to prevent it from erupting". "Aaaaargh!" Silence. "See, it's working."

Correct Level Of Abstraction

Given that hierarchies of abstractions may be constructed, it is critical to choose the abstraction at the right level for the problem. The correct level of abstraction is this: as concrete as possible (but no more). In other words, choose the abstraction that is the least abstract that you can get away with.

A little-publicised but absolutely fundamental rule of good prose writing (and good communication in general) is to use the lowest possible level of abstraction (but no lower). Think about it: a concrete concept is more precise, explicit, and easy to understand than a vague generality. The same applies to writing computer code.

As an aside, overly abstract language may be deliberately employed by people not wanting to commit to any specifics, or more nefariously, by people trying to give the false impression they know what they are talking about.

When a person reaches the level of intellectual maturity at which they can start using abstract language, they may take delight in using it as much as possible. This irritating phenomenon often appears in the early teens, and some people never grow out of it, wrongly believing that overly-abstract language is a sign of intelligence. At its worst, it manifests as the use of multiple high-level abstractions to triangulate on the concrete; the specific object is the intersection of several sets. For example: "Manual excavation implement" is the intersection of 'set of things operated by hand', 'set of things related to digging', and 'set of tools'; in other words, a spade. Don't be a wanker, call a spade a spade.

The point is that rational, abstract thought is difficult and unnatural. But it is an essential skill for software engineering. One needs careful, disciplined thinking to develop this skill. Make no mistake, Uzbek peasants suck at programming.

Abstract Reasoning in Software Engineering

The preceding section on the art of abstraction contained rather a lot of philosophical material, none of it directly concerned with programming. The reader may be surprised at this. But there is a motivation: The word abstraction is used all the time in software engineering, very often without much depth of understanding. This book strives to help with this.

In practice, abstraction in software engineering works like this:

The problem domain is analysed, and broken into different entities, by identifying fundamental elements and abstracting out various actions, responsibilities, etc. The relationships between the various entities give rise to a (probably hierarchical) structure.
In general, each separate entity is created as a separate module of computer code. The separate modules are connected together to reflect the relationships between the abstract entities.
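
As a small sketch of these two steps, suppose the problem domain is a fruit shop (an assumption made purely for illustration, as are the names `Product` and `Basket`). Of all the things that could be abstracted from an apple, the shop cares only about a name and a price, so that is all the code records.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Product:
    """What the shop considers essential about an item: name and price.

    Colour, shape and shelf life are ignored as irrelevant to this context.
    """

    name: str
    price_cents: int


@dataclass
class Basket:
    """A customer's selection; it is related to Product by containing products."""

    items: list[Product] = field(default_factory=list)

    def add(self, product: Product) -> None:
        self.items.append(product)

    def total_cents(self) -> int:
        return sum(product.price_cents for product in self.items)
```

A botanist's program would abstract quite different elements from the same apple, and would end up with quite different modules.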

Your computer code will live or die depending on the quality of your abstractions. Abstractions come to life as the very components and structures of your code, the architecture itself. Once the code has been written and is in use, it can be very hard to change the structure. Therefore, it is very important to spend the effort finding good abstractions, because a hastily-conceived bad abstraction may become a burden that you are permanently stuck with. The wrong abstractions can f@*$ you up.

This book is in large part a guide to abstractions that have proven useful - a guide to various elements of computer code that have turned out to be essential.

Information Hiding

In software engineering, abstraction often results in information hiding. Consider this: Some lines of code are seen to be all related to the same abstract thing, so the code is pulled out into a separate module (function or object). The module name describes the abstraction, and the details are hidden away inside the module. At the point in the code where the new module is used, the abstraction is reified as an explicit module name, and the irrelevant details are hidden.
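
Here is a minimal Python sketch of that process. The function names and the price-label format are assumptions made for illustration. The string handling has been pulled out into `parse_price_cents`, so the calling code sees only the name of the abstraction.

```python
def parse_price_cents(text):
    """Convert a price label such as "$1.20" into a whole number of cents.

    The stripping, splitting and padding are the hidden details.
    """
    cleaned = text.strip().lstrip("$")
    dollars, _, cents = cleaned.partition(".")
    # Pad "5" to "50" and an absent fraction to "00"; keep two digits only.
    return int(dollars) * 100 + int(cents.ljust(2, "0")[:2])


def basket_total_cents(price_labels):
    # At the point of use, the abstraction appears only as a name.
    return sum(parse_price_cents(label) for label in price_labels)
```

A reader of `basket_total_cents` does not need to know how a label is parsed, only that it is.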

For more discussion, see the chapter Separate Inherent And Collateral Complexity.

Not all abstractions involve information hiding. An essential aspect of the code may be pulled out into its own module, but nothing is hidden. All elements are simply accessible from a single coherent module. (See the definition of coherent in the chapter Separation Of Concerns.)

Malabstractions

Since both abstract reasoning and software engineering are difficult and unnatural (and in the case of software engineering, unwholesome), standard errors in abstraction abound. Here we examine some commonly-observed errors in abstraction (or 'malabstractions', to coin a term).

Making the wrong abstractions.
Using the wrong level of abstraction, usually too high a level of abstraction.
Abstraction Addiction 1. Making things as abstract as possible, i.e. as general as possible.
Abstraction Addiction 2. Abstracting out everything that can be. (See DRY below.)
Adding useless layers and wrappers that do not actually provide any abstraction, because they do not hide irrelevant detail, just pass calls through.
Peasant abstraction. Creating a module that is a tangle of disparate elements that are merely associated somehow.
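
The useless-wrapper error is perhaps the easiest to show in code. In the Python sketch below, all three class names are hypothetical. `StoreManager` adds a layer but hides nothing, whereas `AudioSettings` wraps the same store and does hide irrelevant detail.

```python
class KeyValueStore:
    """A hypothetical low-level component, used only for illustration."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key, default=None):
        return self._data.get(key, default)


class StoreManager:
    """Malabstraction: forwards every call and hides nothing."""

    def __init__(self, store):
        self._store = store

    def write(self, key, value):
        self._store.write(key, value)

    def read(self, key, default=None):
        return self._store.read(key, default)


class AudioSettings:
    """Hides the key name, the string encoding and the default value."""

    def __init__(self, store):
        self._store = store

    def set_volume(self, percent):
        self._store.write("audio.volume", str(percent))

    def volume(self):
        return int(self._store.read("audio.volume", "50"))
```

A caller of `StoreManager` must know exactly what a caller of `KeyValueStore` must know, so the layer is pure cost.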

DRY is bad advice

A ubiquitous piece of advice given to beginner programmers is "Don't Repeat Yourself". So beginners assiduously pull out every instance of repeated code and create an abstraction for it. Now, for a given piece of repeated code, this may be the correct thing to do. Or it may not.

There are two rules that should be followed when applying DRY:

A little repetition is better than the wrong abstraction.
A lot of repetition is still better than the wrong abstraction.

Be careful; don't blindly follow DRY. The point is not "Don't Repeat Yourself", the point is "The right abstractions massively improve your code, so search carefully for them".
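
To make the danger concrete, here is a hedged Python sketch. The fee rules, the amounts and the function names are all invented for illustration. The first two functions repeat the same arithmetic, but they describe unrelated business rules that merely happen to coincide today.

```python
# Repetition: two unrelated rules that happen to look alike today.
def shipping_fee_cents(weight_kg):
    return 500 + 100 * weight_kg


def late_fee_cents(days_late):
    return 500 + 100 * days_late


# The wrong abstraction: merged because the text matched, not the meaning.
# When late fees later gain a cap (a hypothetical change), the shared
# function grows a flag that shipping never needs.
def fee_cents(quantity, is_late_fee=False):
    fee = 500 + 100 * quantity
    if is_late_fee:
        fee = min(fee, 2000)  # assumed cap, applies to late fees only
    return fee
```

With the two separate functions, the cap would have been a one-line change to `late_fee_cents`, and nobody reading the shipping code would ever have had to think about it.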

(The opposite of DRY is WET, which is alleged to mean either "Write Everything Twice" or "We Enjoy Typing". WET is obviously terrible advice. But the joke shouldn't be taken as support for dogmatically applying DRY.)

Peasant Abstraction

Creating a separate module of code that contains a tangle of disparate elements with only some loose association with each other, and proudly calling it an 'Abstraction'. Typically, the abstraction implicitly defines "the set of all the things I might need to do this thing". As noted above, this is barely an abstraction at all - it is mere association.

Obviously, the art of abstraction should be applied to the tangle of disparate elements within the module, and to the module itself in the context of the wider code base. It is likely that the entire module itself should not exist.
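
A short Python sketch of the pattern follows. The class name `ReportStuff` and its contents are invented for illustration. The only thing its three methods have in common is that the author needed them while writing one report.

```python
class ReportStuff:
    """Everything I happened to need while writing the monthly report."""

    @staticmethod
    def parse_rows(csv_text):
        # Belongs with data input.
        return [line.split(",") for line in csv_text.strip().splitlines()]

    @staticmethod
    def format_cents(cents):
        # Belongs with money formatting.
        return f"${cents // 100}.{cents % 100:02d}"

    @staticmethod
    def is_weekend(day_name):
        # Belongs with calendar logic.
        return day_name in ("Saturday", "Sunday")
```

The comments mark where each method would go once the art of abstraction is applied. After that, nothing is left inside `ReportStuff`.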

🙠