When I became the tech lead, I inherited a product code base that was over a decade old, which also meant I inherited over a decade’s worth of technical debt and “spaghetti” code. I oversaw and helped on numerous projects to pay off tech debt, but one project stood out among the others. This project removed close to 8,000 lines of duplicated code and made it much more efficient to write code in this particular area.
TL;DR: Clean code, refactoring, and paying off technical debt are essential for software development. Don’t accumulate a decade of tech debt!
It all started with a bug report about strange behavior that happened on one OS (operating system) platform but not on another. Intuitively, the C++ code should have been designed to function the same way regardless of which OS it ran on. While diving deeper into the problem with a senior engineer, we discovered that an earlier bug fix had already solved the problem on one platform. However, the fix had never been copied over to the other platform. While I was explaining that these source files had been duplicated many times and contained very similar code, my senior engineer suggested refactoring them to remove the duplication. A light bulb lit up at that moment. My senior engineer and I started planning how to refactor this spaghetti code into cleaner code.
Problem: Seven different product source files had each been copied and pasted for five OS platforms. Each of the source files contained thousands of lines of code. From my experience, my guess is that about 70% was duplicated source code, and the remaining 30% was either OS-dependent code or code that had forked. Over the years, some developers refactored parts of the code or added more code to the source files.
Solution: Merge the five copies of a source file into one and separate the differences into OS platform-specific files. Repeat for each of the seven source files.
Benefits:
- Single source of truth: there is only one copy of the code
- Cleaner code is easier to maintain
- Consistent behavior across the OS platforms
The solution is simple to say out loud but quite challenging to implement. Three-way merges can already be an intense challenge in forked source files. Imagine doing a five-way merge on different source files. There isn’t even a merge tool for doing that.
Design and Strategy
The first challenge was how to design something that separates the common code from the platform-specific code. Several plans were considered, such as using a platform-specific ‘define’, a platform-specific switch statement, and polymorphism. Two main design problems had to be solved:
- OS platform-specific headers (e.g., Windows.h).
- A lot of complicated platform-specific code that had already been written and couldn’t be removed.
A very simple example is getting the time of day, which may be implemented differently by each OS. On Mac and Linux, it’s a function called “gettimeofday”. On Windows, the function could be “GetLocalTime” or “GetSystemTime”. Any code that calls one of these OS-specific functions has to account for the different function names and the different variable types.
The ‘define’ approach would have made the code consistent but dreadful to read. The ‘switch statement’ approach would have added code bloat.
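To make the readability cost concrete, here is a minimal sketch of the ‘define’ approach applied to the time-of-day example. The `TimeOfDay` struct and the `get_utc_time_of_day` function are hypothetical names, not taken from the product code. The sketch assumes `GetSystemTime` on Windows and `gettimeofday` plus `gmtime_r` on Mac and Linux.

```cpp
#include <cassert>

#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/time.h>  // gettimeofday
#include <time.h>      // gmtime_r
#endif

// Hypothetical value type, not from the original code base.
struct TimeOfDay {
    int hour;
    int minute;
    int second;
    int millisecond;
};

// Hypothetical common function written in the 'define' style: the
// platform differences sit inline, in the middle of the shared logic.
TimeOfDay get_utc_time_of_day() {
    TimeOfDay now{};
#if defined(_WIN32)
    SYSTEMTIME st;       // Windows fills in broken-down fields directly
    GetSystemTime(&st);  // UTC; GetLocalTime would return local time
    now = {st.wHour, st.wMinute, st.wSecond, st.wMilliseconds};
#else
    timeval tv;          // Mac/Linux report seconds plus microseconds
    gettimeofday(&tv, nullptr);
    time_t seconds = tv.tv_sec;
    tm utc{};
    gmtime_r(&seconds, &utc);  // convert seconds to broken-down UTC
    now = {utc.tm_hour, utc.tm_min, utc.tm_sec,
           static_cast<int>(tv.tv_usec / 1000)};
#endif
    // Sanity check shared by every branch.
    assert(now.hour >= 0 && now.hour < 24);
    return now;
}
```

One function with two branches is still readable. The trouble starts when the same pattern repeats for five platforms across thousands of lines.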
Using the example above, the polymorphism approach would solve the different-function problem by creating a “gettime” wrapper function that calls the correct OS-dependent function.
However, polymorphism alone couldn’t solve the entire problem. A new header file (e.g., WinStuff.h) was created for each platform to contain that operating system’s specific header files. The matching cpp file (e.g., WinStuff.cpp) could then hold the implemented child classes.
Another design decision was that the base class would contain a ‘virtual’ function with an assertion, and each child class would provide the ‘override’ function. The assertion was placed in the base virtual function to quickly show an engineer that they forgot to implement the needed child function.
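The wrapper and the asserting base function described above can be sketched roughly as follows. This is an illustration under assumptions, not the product code: the class names `PlatformStuff`, `WinStuff`, and `PosixStuff`, the `CurrentPlatform` alias, the `elapsed_ms` function, and the choice to return milliseconds since the Unix epoch are all hypothetical. Everything is kept in one file so it compiles on its own, whereas the real children would live in the per-platform cpp files.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/time.h>
#endif

// Base class seen by the common code (hypothetical name).
class PlatformStuff {
public:
    virtual ~PlatformStuff() = default;

    // "gettime" wrapper: milliseconds since the Unix epoch (UTC).
    // The base version only asserts, so a platform that forgot to
    // override it fails loudly the first time it runs in a debug build.
    virtual std::int64_t gettime() const {
        assert(false && "gettime() not implemented for this platform");
        return 0;
    }
};

#if defined(_WIN32)
// Child class for Windows (would live in WinStuff.cpp).
class WinStuff : public PlatformStuff {
public:
    std::int64_t gettime() const override {
        FILETIME ft;
        GetSystemTimeAsFileTime(&ft);  // 100 ns ticks since 1601-01-01
        ULARGE_INTEGER ticks;
        ticks.LowPart = ft.dwLowDateTime;
        ticks.HighPart = ft.dwHighDateTime;
        // 116444736000000000 ticks separate 1601-01-01 from 1970-01-01.
        return static_cast<std::int64_t>(
            (ticks.QuadPart - 116444736000000000ULL) / 10000ULL);
    }
};
using CurrentPlatform = WinStuff;
#else
// Child class for Mac and Linux (hypothetical, merged for brevity).
class PosixStuff : public PlatformStuff {
public:
    std::int64_t gettime() const override {
        timeval tv;
        gettimeofday(&tv, nullptr);
        return static_cast<std::int64_t>(tv.tv_sec) * 1000 +
               tv.tv_usec / 1000;
    }
};
using CurrentPlatform = PosixStuff;
#endif

// Common code: written once, unaware of which child it was handed.
std::int64_t elapsed_ms(const PlatformStuff& platform,
                        std::int64_t start_ms) {
    return platform.gettime() - start_ms;
}
```

A caller would create a `CurrentPlatform`, record `platform.gettime()`, and later pass both to `elapsed_ms`. One trade-off to keep in mind: a pure virtual function would reject a missing override at compile time, while the asserting default reports it at run time, and only in builds where `assert` is enabled (that is, without `NDEBUG`).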
The common code would only exist in one source file and each platform would include the correct OS platform header. Windows would #include WinStuff.h, Mac would #include MacStuff.h, etc.
Implementation / Refactoring Code
The five operating systems were Android, two Windows variants (x86 and ARM), Mac, and Linux. The strategy was to merge Android and Linux first because Android is based on Linux, which meant a lot of the product source code for the two was going to be the same. All of the OS-specific code was separated out using the strategy described above. The source code for the two was nearly identical except for some header file and function name differences, so this eliminated one of the platforms quite quickly. The Android-Linux merged file was then merged with the Mac file. Interesting merge conflicts encountered during the merge were:
- Certain variables were trivially named differently on different platforms
- One file contained additional functions that were not present in another file
- Some functions were unnecessarily overloaded and not used
- Some functions had different names but the same content (e.g., foo versus bar)
- Function calls were made in a different sequence in different versions of the same function
- Portions of code within a function did not match up between different versions of the same function
- Bug fixes were made in one source file but not in the others
- Refactored code contained improvements (e.g., C++11)
- Functions that do the same thing but were implemented differently
- Comments were missing or had been added later by an engineer to clarify functionality
The minor problems such as different names were easy to resolve. The more difficult ones such as implementation differences had to involve looking at the code history, reading design docs, and understanding what the intended behavior should be. It was quite a slow and time-consuming effort to resolve those differences.
Then an unexpected problem was encountered: the source files for the two Windows variants turned out to be entirely different from each other. They contained code to deal with CPU architectural differences, which the original design did not account for. The overall design and strategy didn’t change, but additional helper functions were created to account for the CPU architectural differences.
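The post does not say what the architectural differences were, so the following is only a guess at the general shape such a helper could take. It assumes the common code needs to know which CPU it was built for, and it resolves that once from the compiler’s predefined macros. The `CpuArch` enum and both function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical enumeration of the architectures the helpers distinguish.
enum class CpuArch { X86, X64, Arm32, Arm64, Unknown };

// Resolve the target architecture at compile time from predefined
// compiler macros (MSVC spellings first, then GCC/Clang spellings).
constexpr CpuArch current_cpu_arch() {
#if defined(_M_ARM64) || defined(__aarch64__)
    return CpuArch::Arm64;
#elif defined(_M_X64) || defined(__x86_64__)
    return CpuArch::X64;
#elif defined(_M_ARM) || defined(__arm__)
    return CpuArch::Arm32;
#elif defined(_M_IX86) || defined(__i386__)
    return CpuArch::X86;
#else
    return CpuArch::Unknown;
#endif
}

// Human-readable name, e.g. for logging which variant is running.
const char* cpu_arch_name(CpuArch arch) {
    switch (arch) {
        case CpuArch::X86:     return "x86";
        case CpuArch::X64:     return "x64";
        case CpuArch::Arm32:   return "arm32";
        case CpuArch::Arm64:   return "arm64";
        case CpuArch::Unknown: return "unknown";
    }
    // Reached only if a new enumerator is added without a case above.
    assert(false && "unhandled CpuArch value");
    return "unknown";
}
```

Keeping the macro checks inside one helper means the common code can branch on `current_cpu_arch()` instead of repeating architecture `#if` blocks at every call site.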
Finally, the Android-Linux-Mac merged file was merged with the merged Windows-variants file to create the final common code source file. The same set of problems listed above occurred, but no significant new problems were found.
Reaping the Benefits
The immediate benefit was that the product’s behavior was consistent across all the OS platforms. All the scattered bug fixes were now applied correctly, and all the code that was supposed to be the same was now actually the same. The strange behaviors went away, which lowered the number of support cases coming in and also meant that engineers no longer had to spend time chasing down weird problems. The benefit that wasn’t externally visible was that the code became a lot easier to maintain and understand.
One unexpected payout was that all future implementations of new features within that segment of the code base used the new design. It unintentionally cut down the time needed to implement a feature that spans different OS platforms. Code only needed to be written once. All the OS platforms gained that new functionality at a massive discount.
This technical debt project was done in small pieces during a developer’s spare time, so it took a few years to complete. The result was the removal of close to 8,000 lines of duplicate code. The code became more stable and easier to maintain. The effort needed to add new functionality to this segment of the code base for different OS platforms was significantly reduced. It was a complex and worthwhile project to complete.