Software projects face risks in many shapes and forms. Some are obvious and highly visible, while others are less tangible and harder to evaluate. A typical example is the departure of key project members (and, with them, their know-how), which is often a critical risk to the stability and progress of any software project.
One way to assess a project’s resilience to this risk of losing crucial knowledge is the “bus factor”, a well-known metric for how concentrated knowledge is across the different areas of a software project.
The bus factor is the minimum number of team members whose simultaneous departure from a project (say, if they all got hit by a bus) would put the project at severe risk due to a lack of knowledgeable personnel.
What is the bus factor?

According to Wikipedia, the concept is similar to the (much older) idea of “key person risk”, but also considers the irreplaceability of lost technical experts. Personnel must be both key and irreplaceable to contribute to the bus factor.
While the concept of the bus factor can seem rather morbid on the surface, it’s a powerful framework for businesses to plan actions that ensure resilience. Team members do not literally have to get "hit by a bus" for the "bus factor" to apply. Many different events can suddenly prevent a developer from contributing to a project: think job offers, parental leave, personal emergencies, or simply a loss of interest.
A low bus factor puts the company at high risk since crucial information and/or knowledge are held (exclusively) by very few contributors. If one single team member holds crucial, isolated knowledge and skills, then the bus factor is 1. This is a situation that every company should try to avoid or at least mitigate. Consider an early startup with a bus factor of 1 that loses a key developer as the company grows; the remaining team may have to start over entirely or, even worse, see the whole company forced to pivot in a new direction.
A low bus factor is one of the most common findings among early-stage ventures and startups. When first venturing into a new project, it’s natural to divide responsibilities between founders and early employees. Redundancy and replication are not (and shouldn’t be) the main priorities of a newly founded startup. Why would you invest heavily in protecting something that has yet to be proven “real”?
However, what is effective in the early stages of a company can pose significant risks later if the growing body of knowledge and learnings is not properly distributed and shared. Just as engineering efforts are made to avoid a single technical point of failure (e.g. using cloud platforms instead of self-hosted infrastructure), company managers should ensure that critical knowledge is well distributed among all members, so that the company stays resilient if key contributors (both founders and employees) become unavailable.
It's not just about the code you own

Open-source software is now powering a major part of the world’s economy. Making use of third-party open-source software has become the standard for technology companies. This is especially true for early-stage ventures, as they have limited resources and typically focus on their specific insights and innovative ideas to provide novel solutions for business problems. Or phrased differently: standing on the shoulders of known giants instead of reinventing the wheel. Most technology teams make use of third-party open-source libraries to speed up their development efforts.
So here’s the catch: many open-source projects tend to have a very low bus factor. A 2016 article examined 25 open-source projects that were popular at the time and found that ten of them had an (alarming) bus factor of 1. To ensure the sustainability of their software, companies need to carefully select the open-source software they depend on, so that they avoid inheriting an unnecessarily low bus factor wherever possible. Functionality isn’t everything when evaluating third-party solutions.
How is it calculated?

Despite the simplicity of the bus factor concept, calculating it can quickly turn into an error-prone and time-consuming process. The size and complexity of many projects can make it practically impossible to calculate the factor manually. Additionally, translating the bus factor definition into an estimation algorithm is not trivial; most existing algorithms simply derive the distribution of knowledge in the project from its Git version control history.
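As a rough illustration of how such information can be pulled from Git, the sketch below counts how many lines of a file are currently attributed to each author, based on the output of `git blame --line-porcelain`. The function names `count_lines_per_author` and `blame_file` are hypothetical helpers chosen for this example, and attributing a line only to its latest author is a simplifying assumption, not a complete bus factor algorithm.

```python
import subprocess
from collections import Counter


def count_lines_per_author(porcelain_text: str) -> Counter:
    """Count lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain_text.splitlines():
        # Every blamed line carries an "author <name>" header line.
        # Source lines themselves are prefixed with a tab, so they never match.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def blame_file(repo_path: str, file_path: str) -> Counter:
    """Run git blame on one file (assumes git is installed and repo_path is a repository)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "blame", "--line-porcelain", "--", file_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_lines_per_author(result.stdout)
```

Calling `blame_file(".", "some_file.py")` inside a repository would return a `Counter` mapping author names to the number of lines they last touched; the file name here is only a placeholder.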
Various articles and papers have been published that aim at a more precise calculation of the bus factor. An accepted scientific practice is to process the source code’s evolution history and analyse the contributions made to it. This method weighs a developer's knowledge bottom-up, that is, by calculating knowledge points for each line of code in the repository and summing them up for each individual developer.
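The bottom-up summation can be sketched as follows. The example assumes that knowledge points per line are already available as one dictionary per line, and it turns them into each developer's share of the total. The final step, which reads a bus factor off those shares as the smallest group of developers holding more than 50% of the knowledge, is an assumption made for illustration; the threshold and both function names are hypothetical.

```python
from collections import defaultdict


def knowledge_shares(line_points):
    """Sum per-line knowledge points into each developer's share of the total."""
    totals = defaultdict(float)
    for points in line_points:  # one {developer: points} dict per line of code
        for developer, value in points.items():
            totals[developer] += value
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {dev: value / grand_total for dev, value in totals.items()}


def estimate_bus_factor(shares, threshold=0.5):
    """Smallest number of top developers whose combined share exceeds `threshold`.

    The 50% default is an illustrative assumption, not a fixed rule.
    """
    covered = 0.0
    ranked = sorted(shares.values(), reverse=True)
    for count, share in enumerate(ranked, start=1):
        covered += share
        if covered > threshold:
            return count
    return len(shares)
```

With this heuristic, a repository in which a single developer holds three quarters of all knowledge points gets an estimated bus factor of 1.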
[Figure: Example distribution of overall knowledge among the top 5 developers over a period of 12 months]

Different articles propose different ways of calculating the knowledge points. Below are some methods for knowledge calculation (ordered from basic measurements to more advanced):
- Last change takes it all: the contributor who changed the line most recently is assigned 100% of the knowledge.
- Multiple changes are equally considered: for each modification, one knowledge point is added to the contributor who made it.
- Non-consecutive changes: consecutive changes by the same developer count only once, so each non-consecutive change earns the contributor one knowledge point (for example, if contributor A modified the same line of code twice in a row, it is regarded as only one change).
- Weighted non-consecutive changes: the n-th non-consecutive change earns its contributor n knowledge points, so later changes weigh more. For example, for the history A->B->B->A, developer A created the line [+1], developer B modified it next [+2, with B's two consecutive changes counted once], and then A modified it again [+3]. Knowledge of A = 4 [about 67%], knowledge of B = 2 [about 33%].

The problem with the above methods is that while they consider the sequence of contributions, they ignore an important parameter: the time of contribution. Suppose developer A contributed 10 lines of code two years ago, while developer B wrote another 10 lines just a week ago. In all methods mentioned above, both developers are credited with equal knowledge scores.
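The four sequence-based scoring rules can be sketched in a few lines each. The example assumes the history of a single line of code is given as a non-empty list of author names in chronological order; the function names are hypothetical, and the weighted variant follows the reading suggested by the A->B->B->A example, where the n-th non-consecutive change is worth n points.

```python
from collections import Counter
from itertools import groupby


def last_change_takes_all(history):
    """The most recent author of the line gets the only knowledge point."""
    return Counter({history[-1]: 1})


def every_change_counts(history):
    """One knowledge point per modification, consecutive or not."""
    return Counter(history)


def non_consecutive_changes(history):
    """Runs of consecutive edits by the same author collapse into one change."""
    return Counter(author for author, _ in groupby(history))


def weighted_non_consecutive_changes(history):
    """The n-th non-consecutive change is worth n knowledge points."""
    scores = Counter()
    for position, (author, _) in enumerate(groupby(history), start=1):
        scores[author] += position
    return scores
```

For the history `["A", "B", "B", "A"]`, the weighted variant yields 4 points for A and 2 for B, matching the worked example. None of the four functions looks at when a change was made, which is exactly the limitation described above.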
[Figure: Example distribution of knowledge in the top 5 active repositories (highest impact) among the top 2 developers of each repository]

However, if both developers are asked about their contributions, there’s a high probability that developer A cannot explain the code they wrote two years ago as precisely, and in the current context, as developer B (whose contribution was very recent). Hence, in practice, developer B should receive a higher knowledge score. The more recently a developer has contributed to the code (regardless of any other activity on that part of the code from other developers), the more that contribution should count towards the developer’s knowledge.
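One simple way to express this idea is to let the value of a contribution decay with its age. The sketch below is only an illustration under stated assumptions: it uses exponential decay with a hypothetical `half_life_days` parameter (a contribution loses half its weight every 180 days by default). It is not the TechMiners algorithm, whose details are not described here.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_weighted_knowledge(contributions, now, half_life_days=180.0):
    """Score contributions so that recent changes count more than old ones.

    contributions: iterable of (developer, lines_changed, timestamp) tuples.
    half_life_days: assumed decay parameter, chosen only for illustration.
    """
    scores = defaultdict(float)
    for developer, lines_changed, timestamp in contributions:
        age_days = (now - timestamp).total_seconds() / 86400
        # Weight halves for every half_life_days of age; future dates count as age 0.
        weight = 0.5 ** (max(age_days, 0.0) / half_life_days)
        scores[developer] += lines_changed * weight
    return dict(scores)


now = datetime(2024, 1, 1)
example = [
    ("A", 10, now - timedelta(days=730)),  # 10 lines written two years ago
    ("B", 10, now - timedelta(days=7)),    # 10 lines written a week ago
]
scores = time_weighted_knowledge(example, now)
```

In this example developer B ends up with a clearly higher score than developer A, even though both contributed the same number of lines.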
Since we found no available tool that includes the time of contribution as a factor, we at TechMiners decided to develop our own knowledge calculation tool, which considers time as an additional factor when evaluating knowledge distribution. Using our proprietary algorithm, we are able to calculate the bus factor more accurately than with the methods described above, no matter the size of the development project.