Origin of Names: Hidden Rules Coders Abide By

As parents, we spend much effort choosing a good name for a newborn: a name that is easy to pronounce and remember, and meaningful enough to carry our best wishes.

As software engineers, we also know the importance of giving good names to products and program elements, which are otherwise too abstract. However, it is never an easy task. A name may be too obscure to understand, or carry a meaning very different from the one intended. Even worse is a name that is simply wrong or misleading, which turns the outcome into a joke. Sometimes geeky programmers invent a name found in no dictionary, perhaps just to show off.

However, programs are to be read, and products are to be used. Without a proper name, a sound and memorable one, a program or product is doomed to fail. A good program is also a readable one, and its readability owes much to conventions shared among the project team members. They abide by these conventions, even the hidden ones.

Do programs written in object-oriented languages (e.g., Java, C++, Objective-C, Python) follow hidden conventions for naming their classes, objects, attributes, functions, and variables? Dr. Simon Butler studied this question across 60 open-source Java projects. First, he tokenised the identifiers into soft words by carefully splitting them at camel-case boundaries and digits. Then he analysed the parts of speech of these soft words to formulate the phrasal patterns of different types of identifiers. The positive correlation between these patterns and the structural object-oriented relationships among program elements shows that such hidden rules exist and that coders comply with them. Of course, the outliers that do not comply with the naming conventions make the programs less readable.
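To make the two steps concrete, here is a minimal sketch in Python. It is not the Nominal tool or Dr. Butler's actual tokeniser: the regular expression, the function names `split_identifier` and `phrasal_pattern`, and the tiny hand-made lexicon `TOY_LEXICON` are all assumptions made for illustration. A real analysis would use a proper part-of-speech tagger rather than a lookup table.

```python
import re

# Illustrative splitting rule (an assumption, not the study's tokeniser).
# Alternatives, tried in order:
#   1. an acronym run that is followed by a capitalised word ("XML" in "XMLParser")
#   2. a lowercase word with an optional leading capital ("get", "User")
#   3. any remaining run of capitals ("MAX")
#   4. a run of digits ("2")
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier: str) -> list[str]:
    """Split an identifier at camel-case boundaries and digits, lowercased."""
    return [token.lower() for token in _TOKEN.findall(identifier)]


# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "get": "V", "set": "V", "parse": "V", "is": "V",
    "user": "N", "name": "N", "document": "N", "xml": "N",
    "empty": "ADJ",
}


def phrasal_pattern(identifier: str) -> str:
    """Map each word of an identifier to a coarse part-of-speech tag."""
    tags = []
    for word in split_identifier(identifier):
        if word.isdigit():
            tags.append("D")  # digits get their own tag
        else:
            tags.append(TOY_LEXICON.get(word, "?"))  # "?" marks unknown words
    return " ".join(tags)


if __name__ == "__main__":
    for name in ["getUserName", "parseXMLDocument2", "isEmpty", "MAX_VALUE"]:
        print(name, split_identifier(name), phrasal_pattern(name))
```

With patterns like these collected per kind of identifier (method, field, class), one could then count which patterns dominate in a project and flag the names that deviate from them.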

As a result of this research, his Nominal tool can not only extract the hidden rules from source code, but also help project managers check that these rules are enforced.

Some people with a bit of programming experience might object that a name is simply a name: if it is bad, let's refactor it. So what? The underlying rationale for a good naming convention is that people in large-scale software development must team up to exchange ideas. Trust and certainty are important; nobody can follow anybody else if the names are too fluid. High-quality, high-value, and high-complexity software often carries what is referred to as "technical debt", which resists frequent changes and lets the experience of the engineers become the convention. So it is understandable that programmers tend to be a bit conservative about names.

Ideally, software adapts to external changes without worrying too much about names. Leaving something unnamed, or postponing its naming, is one of the virtues of early design activities. However, the evolutionary history of software projects shows that renaming is a frequent phenomenon that happens more often in the earlier stages of a project; once the product is released, the names become more stable. These observations are based on our empirical evidence.

See also:

Chinese translation
Email: y.yu@open.ac.uk Office: +44 (0) 1908 6 55562