Properly Dealing with Software Bugs
Software bugs are handled by every level of an engineering department in some shape or form. The goal is not to reach zero bugs; that’s unrealistic and puts unneeded stress on the engineering team. A robust process is the best way to keep bugs manageable without significantly slowing down software development.
What is a software bug?
A software bug is behavior in the software that differs from the intended behavior. Bugs can’t be avoided entirely because software is written by humans, who are error-prone. A small bug can be a typo in some text, while a showstopper bug can make the entire software unusable.
Filing Helpful Bug Reports
When a software bug is found, it is reported to the engineering team in a bug report that describes what the bug is. The worst kind of bug report is “this doesn’t work; fix it”, which doesn’t contain enough information to investigate the issue. A lot of time can be wasted figuring out which scenario causes the bug to occur.
Great bug reports should have:
- Steps to reproduce the issue: these are essential for pinpointing the area of the code where the bug occurs and for verifying that the fix works
- Logs: logs can help pinpoint the area of the code where the bug is
- Expected behavior vs observed behavior: the reporter may be expecting a certain behavior while the software is actually functioning as intended
- Which software version it was found in: a bug found in version X.X.X may have already been fixed in the latest version
- Severity: how bad is the bug?
It’s not realistic to expect all of these from customers, who may not be able to supply logs. At the very minimum, providing the steps to reproduce and the software version greatly reduces the time spent investigating the bug.
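The checklist above can be sketched as a small data structure. This is only an illustration: the `BugReport` class, its field names, and the `missing_minimum_fields` helper are hypothetical and are not taken from any particular bug tracker.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BugReport:
    """Hypothetical bug report holding the items from the checklist."""
    title: str
    steps_to_reproduce: List[str] = field(default_factory=list)
    software_version: str = ""
    expected_behavior: str = ""
    observed_behavior: str = ""
    logs: str = ""                  # customers may not be able to supply these
    severity: Optional[str] = None  # e.g. "minor", "major", "critical"

    def missing_minimum_fields(self) -> List[str]:
        """Return the names of the bare-minimum fields that are still empty."""
        missing = []
        if not self.steps_to_reproduce:
            missing.append("steps_to_reproduce")
        if not self.software_version.strip():
            missing.append("software_version")
        return missing


# A vague report fails the minimum check; a report with steps and a version passes.
vague = BugReport(title="This doesn't work; fix it")
helpful = BugReport(
    title="Export button does nothing",
    steps_to_reproduce=["Open a report", "Click Export"],
    software_version="2.4.1",
)
```

A check like this could run when a report is submitted, so that the reporter is asked for the missing pieces before an engineer spends time guessing at the scenario.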
“Not a bug. Working as intended.”
Bug Triage — severity, prioritization, and risk analysis
Handling bugs is very similar to running a hospital emergency room, where a limited number of doctors can’t help everyone who comes in at once. There’s a triage system that ranks each patient’s severity against the other patients in the queue. As you can imagine, someone who is bleeding heavily will be placed near the front of the queue, ahead of someone who is suffering from a stomach ache.
Software companies borrow this system from hospitals because there is a never-ending supply of bugs. The key part of bug triage is prioritizing bugs based on severity, which requires a risk analysis. A typical priority system runs from 1 to 5, which translates to highest, high, medium, low, and lowest. Severity levels vary by company, but typical ones are minor, major, and critical. Other terms may be used as well, such as “show stopper”, “blocker”, or even “SNAFU”. The severity reflects how bad the issue is, measured against criteria that are important to the company.
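The 1 to 5 priority scale and the emergency room queue can be sketched with a heap. This is a minimal sketch, assuming bugs are identified by a title only; the `TriageQueue` class and its method names are hypothetical.

```python
import heapq
from itertools import count

# The 1 to 5 scale described above.
PRIORITY_LABELS = {1: "highest", 2: "high", 3: "medium", 4: "low", 5: "lowest"}


class TriageQueue:
    """Hypothetical queue that always hands out the most urgent bug first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier reports come first

    def add(self, title, priority):
        if priority not in PRIORITY_LABELS:
            raise ValueError("priority must be between 1 and 5")
        heapq.heappush(self._heap, (priority, next(self._arrival), title))

    def next_bug(self):
        """Remove and return (title, priority label) of the most urgent bug."""
        priority, _, title = heapq.heappop(self._heap)
        return title, PRIORITY_LABELS[priority]

    def __len__(self):
        return len(self._heap)
```

Bugs with the same priority come out in the order they were added, much like two patients with the same severity are seen in order of arrival.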
Some example criteria to use:
- Loss of functionality
- Data loss
- Is there a workaround?
- Customer experience
- Likelihood of running into the issue
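As an illustration, the criteria above could be turned into a simple rule for picking a severity level. The rules and the 0.5 likelihood cutoff below are assumptions made up for this sketch; every company would tune them to what matters to it. The `Impact` class and `classify_severity` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Impact:
    """Hypothetical answers to the example criteria for one bug."""
    loses_functionality: bool
    loses_data: bool
    has_workaround: bool
    hurts_customer_experience: bool
    likelihood: float  # estimated chance (0.0 to 1.0) that a user hits the bug


def classify_severity(impact: Impact) -> str:
    """Map the criteria to minor / major / critical using assumed rules."""
    if impact.loses_data:
        return "critical"
    if impact.loses_functionality and not impact.has_workaround:
        # No workaround: critical if users are likely to hit it, else major.
        return "critical" if impact.likelihood >= 0.5 else "major"
    if impact.loses_functionality:
        return "major"
    if impact.hurts_customer_experience and impact.likelihood >= 0.5:
        return "major"
    return "minor"
```

Under these assumed rules, a rarely seen typo comes out as minor, while any bug that loses data is critical regardless of how likely it is.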
After understanding the severity and the fix, the risk analysis of the fix is another important topic to discuss. Similar to taking medicine to treat an illness, it is best to understand the potential side effects or risks involved.
Some example risk analysis questions to ask:
- How likely are side effects?
- How much code is changed?
- How many code paths use this code?
- How much testing was done for the fix? Are unit tests written?
- Can the fix be pushed to the next release?
These questions probe how risky the bug fix is. Risk increases with the amount of code changed. Even if the change is confined to one function, it is riskier when many parts of the system use that function. Ideally, functions are small, single-purpose, and covered by unit tests. Changing a large multipurpose function is riskier than changing a small one. Testing reduces the risk of side effects from the fix. Sometimes it is better to push the bug fix to the beginning of the next release, where there is more time to test, than to rush it in at the end of the current one.
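One way to make the risk discussion concrete is a rough score built from the questions above. The thresholds and weights here are invented for illustration only, and `FixAssessment`, `risk_score`, and `should_defer` are hypothetical names; a real team would calibrate them against its own history.

```python
from dataclasses import dataclass


@dataclass
class FixAssessment:
    """Hypothetical answers to the risk analysis questions for one fix."""
    lines_changed: int
    call_sites: int        # how many code paths use the changed code
    has_unit_tests: bool
    manually_tested: bool


def risk_score(fix: FixAssessment) -> int:
    """Return a rough risk score from 0 (low) to 6 (high); weights are assumed."""
    score = 0
    # More changed code means more risk.
    score += 2 if fix.lines_changed > 200 else 1 if fix.lines_changed > 20 else 0
    # Widely used code is riskier to change, even if the change is small.
    score += 2 if fix.call_sites > 10 else 1 if fix.call_sites > 1 else 0
    # Testing reduces the risk of side effects.
    score += 0 if fix.has_unit_tests else 1
    score += 0 if fix.manually_tested else 1
    return score


def should_defer(fix: FixAssessment, days_until_release: int,
                 threshold: int = 3, min_days: int = 5) -> bool:
    """Suggest pushing a risky fix to the next release when time is short."""
    return risk_score(fix) >= threshold and days_until_release < min_days
```

A score like this does not replace the conversation during triage, but it gives the team a shared starting point when deciding whether a fix goes into the current release or the next one.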
Fixing Bugs — Symptom vs Root Cause
When fixing a bug, it’s important to address the actual cause of the bug rather than a side effect. A commonly asked question is “Is this a symptom or the root cause of the bug?” The question comes from “Root Cause Analysis”, whose goal is to fix the cause of the problem instead of a side effect caused by the problem. There are a few methods for reaching the root cause, which can be read about below.
Root Cause Analysis Explained: Definition, Examples, and Methods
For example, software using a lot of memory is a symptom, and treating the symptom would mean increasing the memory of the machine that runs the software. The root issue can be identified using the 5 Whys method:
- Why is the software using so much memory? Whenever it starts up, it needs to load all the data files.
- Why does it need to load all the data files? The data files contain the graphics to render the user interfaces.
- Why are the graphics taking up so much memory? All of the interfaces need to be ready for the user to use.
- Why do all of the interfaces need to be ready? In order to reduce the load time between different user views.
- Why does the load time need to be reduced between different views? The transition needs to be smooth when the user is switching views.
This particular analysis concludes that the software uses more memory in order to reduce load times. A possible fix would be to not load all the views during startup, and instead load each view as the user progresses, since some views can only be reached after other views have been traversed. However, there may not be a bug to fix here because it’s a conscious trade-off.
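The trade-off in this example can be sketched by comparing loading every view at startup with loading each view on first use. The class names and the `load_view` callable are hypothetical stand-ins for whatever actually reads the data files.

```python
class EagerViewStore:
    """Loads every view at startup: more memory, no wait when switching views."""

    def __init__(self, view_names, load_view):
        self._views = {name: load_view(name) for name in view_names}

    def get(self, name):
        return self._views[name]


class LazyViewStore:
    """Loads a view the first time it is requested: less memory at startup,
    but the first visit to each view pays the load time."""

    def __init__(self, load_view):
        self._load_view = load_view
        self._views = {}

    def get(self, name):
        if name not in self._views:
            self._views[name] = self._load_view(name)
        return self._views[name]

    def loaded_count(self):
        return len(self._views)


def fake_load_view(name):
    """Stand-in loader; a real one would read graphics from the data files."""
    return f"<graphics for {name}>"
```

With the lazy store, a user who only ever opens the home view never pays the memory cost of the others. The cost is a slower first transition into each new view, which is exactly the smoothness the original design was protecting.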
Regardless of the software engineering position, everyone has worked with bugs. All levels of management deal with bug triage, while individual contributors create or fix bugs. Having a process for dealing with bugs makes it less painful. It starts with good bug reports for investigating the issue, followed by a triage process that prioritizes which bugs need to be fixed and when, and ends with fixing root causes instead of symptoms to truly deal with the issue.