A growing number of programmers have observed that the object-oriented programming (OOP) dogma DRY (“don’t repeat yourself”) is sometimes actually harmful. A number of alternatives have been proposed, all of which involve thinking clearly and carefully. Think about the right level of abstraction at which shared implementation makes sense. Think about avoiding shared logic and ideas, rather than shared code. Think about bounded contexts, and avoid straying across them.
These are all great ideas, but they all require careful thought. Unfortunately, in many situations developers lack the ability (especially places that had massively duplicated code pre-DRY) or time (especially research settings) to think through these delicate questions.
Evolution offers a useful analogy to think through this problem in such settings. Genetic evolution enables adaptability through duplication. Quantitative traits are encoded across hundreds or thousands of parallel genetic variants. These polygenic traits are able to evolve much more rapidly than monogenic traits where a single genetic variant underlies all variation in a population. Sexual reproduction breaks up highly-correlated genetic variants (“linkage disequilibrium”) so that they can evolve independently to assist the fitness of a species. And the code for these traits is not only duplicated within a single individual’s genetic code: they are duplicated among all the individuals in a population.
Deep neural networks also evolve via a learning algorithm, such as stochastic gradient descent, and they utilize duplication to adapt their weights. In recent years, it has been discovered that wide “overparameterized networks”, with more activations than the number of samples, have better learning dynamics. This extra capacity is not strictly necessary to represent complex functions, but networks are more easily able to move their weights to such functions with this overcapacity.
In software development, when we tolerate duplicated code rather than force shared implementation, each copy can evolve in its natural direction, unencumbered by other use cases. Of course, this approach has a cost: code bloat. Fortunately, evolution offers us a solution: pruning. In biological evolution, natural selection filters out harmful genetic variants. And increasingly, pruning is used in deep learning to speed up inference of over-parameterized networks. Notably, pruning in both evolution and deep learning works very well with even simple strategies. It is often hard to find an accurate neural net, but once you have one, it’s easy to find hidden units that can be discarded.
This is a useful approach in R&D as well, both at an individual level and at a team level. When getting started on a new research project, it’s useful to copy-and-paste old code, recklessly changing it to meet one’s needs. As the project matures, one can readily identify duplicated code, and then start merging them. Similarly, it is hard for a development team to figure out and agree in advance on which code logic can rely on shared implementation. Yet, so long as all programmers have the discipline to “clean up after themselves” — to look out for old duplicated code and remove it — code quality can be maintained. Code deduplication (while it may not be fun) is actually intellectually easy, because it comes with the benefit of hindsight.