Tuesday, November 1, 2011

My Four Natural Laws Of Programming

(work in progress...)

I wrote my first program in BASIC on a Commodore 8032 in 1984. The computer was magical and I had no idea what was actually going on inside it.
Today the magic of the machine is gone. Yet I still feel the magic of the structure and aesthetics of programming.

Recently, I realized that the more I program, the fewer rules I use to decide what's good.

There are loads of rules out there describing what good code looks like. Too many of them. Put yourself into the mind of a beginner and try to figure out how to learn them all. I try to condense my rules into a small universal set that includes many others as special cases. I like to draw an analogy to the fundamental forces of physics, which have become fewer over time as forces previously seen as distinct were unified.

Further, I believe software architecture emerges when the laws are applied continuously and the code gets refactored accordingly.

Do you also think "I know good code when I see it"? I definitely do. Programming is a very visual activity. Good code always looks good.

When I do code reviews, I find myself applying a handful of universal laws, which reliably uncover flaws without my having to know what the code is actually doing. The rest is covered by sufficient testing.

I'll write down my personal laws of programming to share them and to become aware of them myself. They lean towards object-oriented programming in some areas because that is what I do most. I also talk about how I check the laws during a code review.

#1 Redundancy

Redundancy means two things: repetition and waste.

Repeating code is an indicator of a missing abstraction. It can appear everywhere, from data access patterns that repeatedly follow the same chains to algorithmic patterns that differ only in small details which were never properly abstracted.
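
For example, a data access chain that is repeated in several methods usually wants a name of its own. A minimal sketch in Java, with all names invented for illustration:

    class Address { String city; String zip; }
    class Customer { Address address = new Address(); }
    class Order { Customer customer = new Customer(); }

    class Labels {
        // Before: both methods repeat the chain o.customer.address.
        static String shippingLabel(Order o) {
            return o.customer.address.city + ", " + o.customer.address.zip;
        }

        // After: the chain is named once and reused everywhere.
        static Address destination(Order o) {
            return o.customer.address;
        }

        static String invoiceHeader(Order o) {
            Address a = destination(o);
            return a.city + " " + a.zip;
        }
    }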

Waste code adds no value and solves no relevant problem. It does, however, introduce bugs and maintenance cost. Waste invites workarounds, because programmers do not understand why it is there. Useless layering and over-abstraction fall into this category.
Using code generators to produce waste makes them waste generators.
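
Useless layering often looks like the following sketch, a facade that only forwards calls (all names invented):

    class User { }

    class UserRepository {
        User findById(long id) { return new User(); } // the real work would happen here
    }

    // Waste: a pure pass-through layer that solves no problem.
    class UserRepositoryFacade {
        private final UserRepository repo = new UserRepository();

        // Every change must now touch two classes instead of one.
        User findById(long id) { return repo.findById(id); }
    }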

The art is to find the right abstraction balance to avoid repetition without introducing waste.

Automated code duplication checkers help to find repetition. Waste can only be identified in the context of the given architecture constraints: waste in one case may be a necessary construct somewhere else. Mechanically applying overly narrow rules is harmful here. This is where experience makes all the difference.

#2 Consistency

Consistency is about recurring patterns that do not contradict each other.

Consistent code becomes familiar quickly. It has no exceptions to rules.

Consistency has many facets. Names are chosen consistently. An API uses a consistent level of abstraction. Data access patterns are similar. Things are identified consistently, not sometimes by IDs and sometimes by object references. A class hierarchy is not designed only to be torn apart with instanceof later. Consistency shows in method signatures, in the order of parameters, and in the choice between imperative and functional style. Countless examples are possible here.
Error handling is another large field for consistency hazards.
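
A hedged sketch of what such inconsistencies look like in an API, with all names invented:

    class Account { }
    class AccountNotFoundException extends Exception { }

    // Inconsistent: two identification schemes, two error conventions.
    // Every caller has to remember which method does what.
    interface InconsistentAccountService {
        Account load(long accountId);                                  // by ID, null if absent
        void close(Account account) throws AccountNotFoundException;   // by reference, throws
    }

    // Consistent: one identification scheme, one error convention.
    interface AccountService {
        Account load(long accountId) throws AccountNotFoundException;
        void close(long accountId) throws AccountNotFoundException;
    }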

Checking consistency is something no machine can do for you (yet). As a code reader, I critically watch out for surprises that break an established pattern.

#3 Fat

By fat I am not referring to code that has no use; that is redundancy. Fat is missing structure: the code fails to provide a proper segmentation of responsibility. Too much is packed into one location.

Fat is easy to see. In methods, for example, there are big blocks of unrelated code. A method starts with one task and continues with another, and you cannot quickly tell where or why. If the programmer was diligent, he left some comments between the blocks. Classes contain too many methods, and again you cannot see the boundaries between the tasks the class has.
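
A sketch of such a method, with invented names. The comments between the blocks are the tell: each block wants to be a method of its own.

    import java.util.ArrayList;
    import java.util.List;

    class OrderTotals {
        void process(List<String> lines) {
            // parse
            List<Integer> amounts = new ArrayList<>();
            for (String line : lines) {
                amounts.add(Integer.parseInt(line.trim()));
            }

            // total
            int total = 0;
            for (int amount : amounts) {
                total += amount;
            }

            // report
            System.out.println("total = " + total);
        }

        // After extracting methods, process() would read as three calls:
        // parse(lines), total(amounts), report(total).
    }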

You can tell at a glance that something is not OK, regardless of what the code actually does. It is purely a matter of form. Automated metrics like XS (excessive structural complexity) in Structure101 or the well-known cyclomatic complexity help to find the hot spots quickly.

#4 Coupling

Coupling addresses the problem of who knows what.

This is a huge field and I think the most important aspect of architecture on large and small scales.
The fundamental question is:

Why do I access this information here and now from that source?

Bad coupling restricts the source of the information in its ability to change; it is like pinning a tense wire to it. You find coupling problems in interfaces to external systems that are too tight to stay backwards compatible, in missing encapsulation, in badly organized separation of concerns causing wild dependencies (hair balls), in methods juggling too many things at once, and in packages with little coherence. Bad coupling produces dependency cycles between packages.
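
To make the tense wire concrete, a small sketch with invented names:

    class Wallet { public int cents; }

    class Shopper { public Wallet wallet = new Wallet(); }

    class Checkout {
        // Tightly coupled: Checkout knows the internals of Wallet, which
        // pins a tense wire to it; Wallet can no longer change its shape.
        boolean canPay(Shopper s, int priceCents) {
            return s.wallet.cents >= priceCents;
        }
    }

    // Less coupled: the question is answered where the information lives.
    class EncapsulatedShopper {
        private int walletCents;
        boolean canPay(int priceCents) { return walletCents >= priceCents; }
    }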

Coupling also comes in multiple dimensions,

  • Structure
  • Time
  • Location

being the most important ones.
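
Coupling in time, for example, shows up as a required call order. A sketch with invented names:

    class NumberParser {
        private String input;

        void setInput(String s) { this.input = s; }      // must be called first

        int parse() { return Integer.parseInt(input); }  // breaks if the order is wrong

        // Decoupled in time: the required data arrives with the call itself.
        static int parseDirectly(String s) { return Integer.parseInt(s); }
    }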

There are many techniques to reduce coupling in specific situations. Knowing them and, as with abstractions, using them wisely is a matter of experience.

Manually identifying cycles in non-trivial programs is impossible. For Java, Structure101 has become an indispensable tool for me. I seem to be too dumb or lazy to understand the colored dependency matrices that IntelliJ or Sonar offer. You need the help of the machine here.


That's it. Really. The facets are numerous and need to be learned in each environment. Still, I believe it all comes down to a handful of principles. When I struggle with some code or design, I can always root the problem in the four laws. And every once in a while, the fault is my own :-)

Programming as a technique is not magical; it is a craft. Sometimes we forget that and make a big fuss about it ("Popanz und Gedöns" in German). The magic shines when stuff is done right, like in a painting by a master.