The Art of Debugging - Summer Swallow Press

Preface

I originally wrote this article for a public blog aimed at software engineers and IT professionals. Now that I have my own business and blog site, it felt appropriate to share it here, even though it doesn’t have much to do with writing or, for that matter, technology in writing.

Introduction

Debugging is a crucial part of software development, but it can be a difficult and time-consuming endeavor for programmers of all skill levels. Despite its challenges, it is an unavoidable part of the process. I have found myself to be quite proficient at debugging. However, this can also be a curse, because it often means being asked to debug not just at work but also in my personal life, such as when my elderly parents ask me to troubleshoot problems I cannot see firsthand.

While searching the internet for debugging tips, you may come across a list of steps such as isolating the source of the bug, identifying the cause, determining a solution, and testing the fix. These steps may seem straightforward, but in reality, the first two can be quite challenging. The true difficulty of debugging lies in identifying the root cause of the bug. Once the cause is identified, the solution often becomes clear, and the problem becomes more of an engineering challenge than a mystery.

In this article, we will first focus on reproducible bugs, which can be triggered consistently, and on how to tackle them effectively. We will then discuss transient bugs, which do not occur consistently.

In this article, we will not detail the mechanics of debugging, such as print statements, error logs, and debuggers, because plenty of online resources already cover those topics. Instead, we will focus on debugging strategy: how to approach a bug without making assumptions, how to apply the scientific method, and how to use the “divide and conquer” strategy. Along the way, we will offer tips on developing a debugging mindset, using the available tools effectively, and debugging efficiently, all of which will help you become a better, more productive programmer.

Assume Nothing

One of the most common pitfalls in debugging is making assumptions. When you encounter a bug, it’s easy to jump to conclusions about where it is located based on initial observations. However, focusing all of your efforts on one specific area without first testing your hypothesis can lead to frustration and wasted time if the bug turns out to be somewhere else. To avoid this, test your assumptions and make sure you are on the right track before investing too much time and energy in a particular area. In this article, we’ll explore strategies that help you stay on track and avoid making assumptions during the debugging process.

Technical support staff often start with basic questions like “Did you turn the power on?” or “Is it plugged in?” because they want to avoid making assumptions. It’s not that they think you’re stupid; it’s a systematic approach to eliminating basic issues before diving into more complex ones. This approach ensures that they start the debugging process from a neutral standpoint, uninfluenced by preconceived notions.

Here’s a personal story that illustrates the importance of avoiding assumptions in debugging. I once called my cable provider because my internet wasn’t working, and I wasn’t sure whether I had reconfigured something on my end. When I spoke with technical support, I knew they had a script, so I preemptively told them that the modem was plugged in and turned on and that the Ethernet cable was connected. As we went through the checklist, I took a moment to inspect my equipment and realized that the Ethernet cable wasn’t actually fully plugged in. As a trained electrical engineer, I was embarrassed to admit such a simple mistake.

The point of this story is that you should never assume anything. If I had taken the time to check the “stupid stuff” first, I could have saved myself some embarrassment. In the debugging process, it’s important to eliminate basic issues before moving on to more complex ones, and this applies to all areas of life, not just technical support.

Apply the Scientific Method

When we say “apply the scientific method”, what do we mean in the context of debugging? Simply put, treat the bug as if you were conducting a scientific experiment, formulating a hypothesis and devising a test to validate it. In programming, this could be as straightforward as inserting a print statement to see the program’s execution flow after an if statement, or setting a breakpoint in a debugger to inspect the value of variables. This approach to debugging is not meant to be a strict, formal process requiring documentation of experiment setup and predictions, but rather a mindset to guide your thinking.
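To make the hypothesis-and-test loop concrete, here is a minimal Python sketch. The apply_discount function and its suspected off-by-one boundary are hypothetical, invented for illustration. The idea is to state the hypothesis, predict what you should observe if it is true, and run the smallest experiment that confirms or refutes the prediction.

```python
# Hypothetical function under suspicion: orders of exactly 100
# seem not to receive the discount.
def apply_discount(total):
    if total > 100:          # suspect: should this be >= 100?
        return total * 0.9
    return total

# Hypothesis: the discount branch is skipped when total == 100.
# Prediction: if the hypothesis is true, apply_discount(100) returns 100 unchanged.
result = apply_discount(100)
print(f"apply_discount(100) -> {result}")

hypothesis_confirmed = (result == 100)
print("hypothesis confirmed" if hypothesis_confirmed else "hypothesis rejected")
```

If the prediction fails, the hypothesis is rejected and you form a new one. Either way, you have learned something without rewriting any code.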

I’d like to share another personal story that highlights the importance of applying the scientific method in debugging. I once visited my mother, who told me her garbage disposal was not working. She assumed it was stuck, so I took a look. I found that it was plugged into a power outlet through an unknown device. To isolate the problem, I unplugged the garbage disposal and plugged another device into the same outlet. When that device worked, I eliminated the outlet as the cause of the problem. Next, I plugged the garbage disposal directly into the outlet, but it still did not work. This showed that the unknown device was not the cause of the problem, or at least not the only one, so the problem had to be with the garbage disposal itself. I felt around the unit and found a circuit breaker button, which turned out to be the cause of the issue. By experimenting this way, I identified the source of the problem easily.

The moral of the story is that when debugging, it’s important to think scientifically and eliminate possibilities through experimentation.

Divide and Conquer Strategy

When it comes to debugging, a divide and conquer approach can be an effective strategy. Start by testing the highest-level units first, then break the problem down into smaller and smaller components until you isolate the source of the issue. The process resembles a binary search: you repeatedly split the code into halves, keep the half that exhibits the problem, and continue until you pinpoint the specific line or lines of code causing it.
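
The binary-search idea can be written down directly. The sketch below assumes that failures are monotonic: once an item (a build, a commit, or a stage in a pipeline) is bad, every later item is bad too. The build numbers and the build_is_bad check are hypothetical stand-ins for whatever experiment you would actually run. Git’s git bisect command applies the same idea to commit history.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def first_failing(items: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Return the index of the first item for which is_bad() is True.

    Assumes failures are monotonic: every item after the first bad one
    is also bad. Returns len(items) if no item fails.
    """
    lo, hi = 0, len(items)      # the answer always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid            # the first failure is at mid or earlier
        else:
            lo = mid + 1        # the first failure is after mid
    return lo

# Hypothetical usage: builds 1..100, where the bug first appeared in build 42.
builds = list(range(1, 101))
checks: List[int] = []

def build_is_bad(build: int) -> bool:
    checks.append(build)        # record each experiment we run
    return build >= 42

idx = first_failing(builds, build_is_bad)
print(f"first bad build: {builds[idx]} after {len(checks)} checks")
```

Each check halves the remaining search space, so 100 candidates need at most seven experiments rather than up to 100.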

When you reach the function level, you can divide the function into its top half and its bottom half and continue experimenting on whichever half misbehaves. If the line of code causing the issue is complex, consider breaking it down into simpler lines and testing those individually. By following a systematic and organized divide and conquer approach, you can quickly identify the source of the problem.
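
Here is a small sketch of breaking a complex line into simpler ones. Both functions are hypothetical. The one-liner computes the average value of orders with a nonzero quantity, but it returns the wrong answer. Splitting it into named intermediate steps gives you values you can print or inspect in a debugger, and that exposes the bug.

```python
# Hypothetical buggy one-liner: average value of orders with a nonzero quantity.
def average_order_value_oneliner(orders):
    return sum(o["price"] * o["qty"] for o in orders if o["qty"] > 0) / len(orders)

# The same computation split into named steps, each of which can be
# printed or inspected in a debugger.
def average_order_value_split(orders):
    valid = [o for o in orders if o["qty"] > 0]
    subtotals = [o["price"] * o["qty"] for o in valid]
    total = sum(subtotals)
    print(f"valid orders: {len(valid)} of {len(orders)}, total: {total}")
    # If the valid count differs from len(orders), the one-liner's divisor
    # is wrong: it still counts the filtered-out orders.
    if not valid:
        return 0.0
    return total / len(valid)

sample_orders = [{"price": 10, "qty": 2}, {"price": 5, "qty": 0}]
print(average_order_value_oneliner(sample_orders))  # 10.0 (wrong)
print(average_order_value_split(sample_orders))     # 20.0
```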

Keep in mind that sometimes, especially in early versions of code, a single bug may have multiple causes. In such cases, you can tackle one cause at a time, fixing each before moving on to the next, or you can pursue several divide and conquer searches simultaneously. Regardless of the approach, the key is to have a systematic, structured plan in place.

In my opinion, the strategies above should let you identify and resolve a reproducible bug efficiently. A fast write-build-execute cycle will speed up the process. Even without one, the approach still works, although it may take a bit longer.

The Dreaded Non-Reproducible Bug

For a programmer, encountering a transient bug can be a frustrating and challenging experience. Unlike a persistent bug, which can be reproduced consistently, a transient bug is elusive and difficult to pinpoint. Some transient bugs occur frequently enough to be captured through experimentation, but others occur so rarely that they elude detection. To make matters worse, adding diagnostic code can alter the program’s behavior enough to change the bug or make it disappear completely.
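
For transient bugs that occur often enough to catch, one practical experiment is to run the suspect operation many times and count how often it fails. Once you have a failure rate, you can test a hypothesis by checking whether a change moves that rate. The sketch below is a minimal version of this idea. The flaky_operation function is a hypothetical stand-in that fails about 2% of the time, seeded so that runs are repeatable.

```python
import random
from collections import Counter
from typing import Callable

def measure_flakiness(trial: Callable[[], None], runs: int = 1000) -> Counter:
    """Run trial() repeatedly and tally how each run ended."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            trial()
            outcomes["ok"] += 1
        except Exception as exc:     # record the failure type and keep going
            outcomes[type(exc).__name__] += 1
    return outcomes

# Hypothetical stand-in for an intermittently failing operation.
rng = random.Random(1234)

def flaky_operation() -> None:
    if rng.random() < 0.02:
        raise TimeoutError("simulated intermittent failure")

results = measure_flakiness(flaky_operation, runs=5000)
print(results)
```

Keep the measurement harness as light as possible, since heavy instrumentation can shift timing and hide the very bug you are counting.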

Unfortunately, there is no surefire solution for resolving transient bugs. However, in my experience, these bugs are typically caused by either a memory issue or a timing issue, also known as a race condition.

Memory Issue

Memory issues are often easier to address than timing issues. High-level languages like Python handle memory management automatically, which makes these issues less common. In contrast, low-level languages like C require manual memory management, which can cause problems when a program reads or writes past the end of its allocated memory block, into uninitialized memory or into another allocated block.

Although memory issues are rare in memory-managed languages like Python, they can still occur, because the language runtime or its libraries may be written in lower-level languages like C. Keeping libraries up to date can help prevent such problems.

To help find these problems, many Integrated Development Environments (IDEs) and other development frameworks provide debug memory allocation tools. These tools fill newly allocated memory with a recognizable hexadecimal pattern, such as 0xBAADF00D. If you examine a corrupted variable and find this pattern, it is a strong sign that the program read memory that was never initialized.
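
The fill-pattern idea can be modeled in a few lines of Python. This is a toy illustration only; real debug allocators do this inside the C runtime, and the DebugBuffer class here is hypothetical. Every 4-byte slot starts out holding 0xBAADF00D, so any slot that is read before it has been written stands out immediately.

```python
import struct

SENTINEL = 0xBAADF00D  # recognizable fill pattern

class DebugBuffer:
    """Toy model of a debug allocator: every 4-byte slot starts filled
    with SENTINEL so that reads of never-written slots stand out."""

    def __init__(self, n_slots: int) -> None:
        self._data = bytearray(struct.pack("<I", SENTINEL) * n_slots)

    def write(self, index: int, value: int) -> None:
        struct.pack_into("<I", self._data, index * 4, value)

    def read(self, index: int) -> int:
        (value,) = struct.unpack_from("<I", self._data, index * 4)
        return value

buf = DebugBuffer(4)
buf.write(0, 42)               # only slot 0 is ever initialized
for i in range(2):
    v = buf.read(i)
    flag = "  <- uninitialized!" if v == SENTINEL else ""
    print(f"slot {i}: 0x{v:08X}{flag}")
```

A program could legitimately store 0xBAADF00D, so the pattern is a strong hint rather than proof. That is why allocators choose values that are unlikely to occur naturally.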

In addition, a good memory profiler can help you detect and resolve memory issues. By tracking where memory is allocated and how usage changes over time, a profiler can point to the parts of the code responsible for leaks or excessive allocation, narrowing down where to look.
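
In Python, the standard library’s tracemalloc module is one such tool. The sketch below takes a snapshot before and after some suspect code, then lists the source lines whose allocations grew the most. The leaky_cache_fill function is a hypothetical example of a cache that is filled but never evicted.

```python
import tracemalloc

def leaky_cache_fill(cache: list, n: int) -> None:
    # Hypothetical bug: entries go into a long-lived list and are never evicted.
    for i in range(n):
        cache.append("x" * 1000 + str(i))

tracemalloc.start()
before = tracemalloc.take_snapshot()

cache: list = []
leaky_cache_fill(cache, 10_000)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group the differences by source line; the biggest changes are listed first.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)
```

The first entry should point at the append line inside leaky_cache_fill, which is exactly where you would start looking for the leak.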

Timing Issue

In modern programming, multithreaded code has become widespread, and as a result, race conditions have emerged as a major source of transient bugs. Unfortunately, there are no foolproof solutions to these issues, and resolving them may require restructuring your code completely. Timing issues typically occur when two threads compete for the same resource and the order in which they access it depends on how the threads happen to be scheduled. For example, suppose Thread 1 writes to a shared resource and later reads it back. On one run, Thread 2 overwrites the resource before Thread 1 reads it, and Thread 1 sees the wrong value. On another run, Thread 2 writes only after Thread 1 has already read the resource, and everything appears to work. Because the outcome depends on scheduling, the bug appears only some of the time.
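
A race like this can be provoked on purpose. In the minimal Python sketch below, several threads each read a shared balance, pause briefly, and write back the old value plus a deposit. The deposit_unsafe function is hypothetical, and the time.sleep call is there only to widen the window between read and write so that the lost updates show up reliably.

```python
import threading
import time

balance = 0  # shared resource

def deposit_unsafe(amount: int) -> None:
    global balance
    current = balance            # 1. read the shared value
    time.sleep(0.01)             # 2. widen the window so other threads interleave
    balance = current + amount   # 3. write back a possibly stale value

threads = [threading.Thread(target=deposit_unsafe, args=(10,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"expected 50, got {balance}")
```

On most runs the result is well below 50, because several threads read the same starting value and then overwrite each other’s updates. Without the sleep, the same bug would still exist but would surface far less often, which is what makes real race conditions so hard to catch.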

There are two approaches to addressing this issue. The first is to use synchronization mechanisms, such as mutex locks, to guard the contested resource against unexpected access. A lock ensures that only one thread at a time can access the resource, eliminating race conditions on it. However, if the locking scheme becomes too complex, it can lead to deadlock: two or more threads each hold a lock while waiting for a lock held by another, so none of them can proceed and none of the locks is ever released.
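
Both halves of this paragraph can be sketched with Python’s threading module. First, a mutex makes the read-modify-write on the shared balance atomic. Second, when an operation needs two locks, acquiring them in one consistent global order prevents the circular wait that causes deadlock. Lock ordering is a standard technique that I am adding here for illustration; it is not the only remedy. The Account and transfer names are hypothetical, and account names are assumed to be unique because they define the lock order.

```python
import threading
import time

# Fix 1: guard the shared balance with a mutex.
balance = 0
balance_lock = threading.Lock()

def deposit_safe(amount: int) -> None:
    global balance
    with balance_lock:              # only one thread at a time runs this block
        current = balance
        time.sleep(0.01)            # same widened window as before, now harmless
        balance = current + amount

threads = [threading.Thread(target=deposit_safe, args=(10,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"expected 50, got {balance}")

# Fix 2: when one operation needs two locks, always acquire them in the
# same global order so two threads can never each hold one and wait forever.
class Account:
    def __init__(self, name: str, funds: int) -> None:
        self.name = name            # assumed unique; defines the lock order
        self.funds = funds
        self.lock = threading.Lock()

def transfer(src: Account, dst: Account, amount: int) -> None:
    if src is dst:
        return                      # acquiring the same Lock twice would deadlock
    first, second = sorted((src, dst), key=lambda acct: acct.name)
    with first.lock, second.lock:   # acquired in order: first, then second
        src.funds -= amount
        dst.funds += amount

alice, bob = Account("alice", 1000), Account("bob", 1000)
workers = [threading.Thread(target=transfer, args=(alice, bob, 1)) for _ in range(100)]
workers += [threading.Thread(target=transfer, args=(bob, alice, 1)) for _ in range(100)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(f"alice={alice.funds}, bob={bob.funds}")
```

Without the sorted ordering, a transfer from alice to bob and a simultaneous transfer from bob to alice could each grab one lock and then wait forever for the other.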

The second approach is to adopt an asynchronous programming model, such as using promises in JavaScript. Rather than running many threads at once, such models typically run tasks on a single thread and switch between them only at well-defined points, such as when a task waits for I/O. Code between those points cannot be interrupted by another task, which removes many opportunities for race conditions, although a task that pauses partway through updating shared state can still race with others. The disadvantage of this approach is that not every language or runtime offers a convenient asynchronous model, so the code may need significant rework or even porting to another language. Additionally, the asynchronous model requires a different mindset and programming style, which can be a significant challenge for developers who are used to the traditional synchronous model.
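
Python has no promises in the JavaScript sense, but its asyncio module provides the same async/await style, so this sketch uses it instead. asyncio runs coroutines on one thread and switches between them only at await points. The two hypothetical deposit functions show both sides of this. If the read-modify-write contains no await, it cannot be interleaved. If a task awaits in the middle of the update, the lost-update race returns.

```python
import asyncio

balance = 0

async def deposit(amount: int) -> None:
    global balance
    await asyncio.sleep(0.01)        # simulated I/O; other tasks run meanwhile
    # No await between the read and the write, so no other task can
    # interleave here: the event loop switches tasks only at await points.
    balance = balance + amount

async def deposit_with_gap(amount: int) -> None:
    global balance
    current = balance
    await asyncio.sleep(0.01)        # yielding mid-update reintroduces the race
    balance = current + amount

async def main():
    global balance
    balance = 0
    await asyncio.gather(*(deposit(10) for _ in range(5)))
    safe = balance
    balance = 0
    await asyncio.gather(*(deposit_with_gap(10) for _ in range(5)))
    return safe, balance

safe_result, gap_result = asyncio.run(main())
print(f"no await mid-update: {safe_result}; await mid-update: {gap_result}")
# prints "no await mid-update: 50; await mid-update: 10"
```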

Conclusion

Debugging is a challenging but solvable task for any programmer. By following basic troubleshooting principles, such as assuming nothing, approaching the problem scientifically, and using a divide and conquer strategy, you can identify persistent bugs quickly. When dealing with transient bugs, the solution often lies in addressing memory corruption or a race condition. The approaches discussed in this article can guide your debugging efforts, making the process more efficient and successful.