Take the Red Pill and Push Errors to the Left

Matrix Red Pill vs Blue Pill.jpeg
Photo by ANIRUDH on Unsplash

How many times has the software development team said an enhancement was done, only for Quality Assurance (QA) testing to find many errors? How many times did the new code break existing functionality? This can be a frustrating cycle that will burn out engineers and cause them to leave. There is a solution that will create a better working environment, preserve your best engineers, and enhance performance in the long run, but it takes leadership commitment to temporarily slow down software delivery (or hire additional help) to build it. If your deadlines are so urgent that you feel you need to keep throwing bodies at the problem, then take the blue pill and stop reading. If you want to open your eyes to a whole new world of possibility, take the red pill and see where it takes you.

“This is your last chance. After this, there is no turning back. You take the blue pill — the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill — you stay in Wonderland, and I show you how deep the rabbit hole goes. Remember: all I’m offering is the truth. Nothing more.” Morpheus

Note

Much of the following is adapted from DevOps Foundations: Core Concepts and Fundamentals on Pluralsight.

Push Errors to the Left with a Poka-Yoke

Poka what? What kind of rabbit hole is this? Do not fear, my friend … remember you took the red pill. Poka-yoke is a Japanese term used in Lean Management that refers to anything that helps with error avoidance. Have you ever had a problem with a tool, and someone told you, “Yeah, everyone makes that mistake. You just need to do it this way.” All of those people took the blue pill: they live in a world where you need to ask about “the right way” to use a tool, and that is just the way it is. We call it “user error”. Now you may think, “What’s the big deal? I learned how to do it, and now it works every time.” Ah, but what if you and the many other users never had to waste time learning the right way? Some examples of tools that prevent errors are a multiple-choice drop-down menu, which only offers valid options, and a GFCI electrical outlet, which breaks the circuit when it detects a ground fault.
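A rough software analog of the drop-down menu is a type that only allows valid choices. The following is a minimal sketch, not a prescribed implementation; the `Environment` values and the `deploy` and `parse_environment` functions are hypothetical names invented for illustration.

```python
from enum import Enum


class Environment(Enum):
    """Hypothetical deployment targets: the only values a caller can pick."""
    DEV = "dev"
    STAGING = "staging"
    PROD = "prod"


def deploy(target: Environment) -> str:
    # Accept only the predefined choices, so a typo such as "prd"
    # cannot slip through as a free-form string.
    if not isinstance(target, Environment):
        raise TypeError(f"target must be an Environment, got {target!r}")
    return f"deploying to {target.value}"


def parse_environment(raw: str) -> Environment:
    # Looking up an Enum by value raises ValueError for unknown input,
    # so bad text is rejected at the boundary instead of deep in the system.
    return Environment(raw.strip().lower())
```

With this shape, the mistake is caught where the value enters the program, much like a drop-down that never shows an invalid option.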

The Design of Everyday Things, by Donald A. Norman, has many examples of tool designs that intuitively lead you to operate them correctly, and tool designs that lead you to make the wrong choice. The book cover has a fun bad design example.

For example, the design of a door should indicate how it works without any need for a sign that says “Push” or “Pull”. You may laugh at the following Far Side cartoon, but how many times have you pulled on a door handle only to realize you were supposed to push it? The architect should have put a door plate on the “push” side, and then you would have made the correct decision the first time. In many such cases the design is at fault, not the users.

The Far Side, by Gary Larson

When we stay in the Matrix we have someone submit a form and then tell them what errors they made on the form. When we live in the Real World we have an error check during data input. Some examples of error avoidance in software development:
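As one illustration of an error check during data input, each field can be checked as soon as it is entered rather than after the whole form is submitted. This is a minimal sketch under assumptions: the field names, the rules, and the `check_field` function are hypothetical, and the email pattern is deliberately simplistic.

```python
import re
from typing import Optional

# Hypothetical rule for a sign-up form; a real form would likely need more.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_field(name: str, value: str) -> Optional[str]:
    """Return an error message for one field, or None if the value is valid."""
    if name == "email" and not EMAIL_PATTERN.match(value):
        return "Enter an email address such as name@example.com."
    if name == "age":
        if not (value.isascii() and value.isdigit()):
            return "Age must be a whole number."
        if not 0 < int(value) < 130:
            return "Age must be between 1 and 129."
    return None


# Called on every field change, so the user sees the problem immediately.
print(check_field("age", "abc"))
print(check_field("email", "neo@example.com"))
```

The design choice is that the check returns a message instead of raising, so a user interface can show it next to the field while the user is still typing.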

Avoid Errors by Eliminating Waste

Remove non-value added steps to avoid errors and perform better. Attack waste in the software development life cycle (SDLC):

“If you aim at speed, you may get speed, but you’ll get waste. If you aim at the elimination of waste, you’ll eliminate waste and get speed.” Chris Behrens

“The number one mistake of a star engineer is optimizing a thing that shouldn’t exist.” Elon Musk

Books on Work Productivity

Reduce Time and Steps with Automated Testing

Sometimes we think it will waste time to thoroughly test our code.

“Perfect is the enemy of good enough.” Engineering Adage

And yet we know there is a risk that insufficient software testing may result in a bug that will cause rework. But we got away with it before, and we hope we can get away with it again. Then we can deliver more instead of confirming what we already “know”: the code works properly. But here is where we begin to digest the red pill: we can have sufficient testing without any additional work. How is this possible? Follow me further down the rabbit hole.

Build a system where the code review catches the bugs. No code will be error free, so the goal is to catch errors as early as possible. There is a debate about whether to write the code first or the tests first. Writing the tests first can inform how you write the code, but it can also waste time: if you change your approach during coding, the tests may no longer apply and you will need to rewrite them. Writing the code first can waste time too: when you write the tests later, you may discover corner cases you did not consider that require significant changes to the code. Your choice depends on the situation. Either way, you may have a failure of imagination: you cannot build a test case for a scenario you never thought of. You cannot anticipate all bugs, but you can learn from the past, for example by adding a regression test for each bug that does get through.
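To make the test-first option concrete, here is a minimal sketch using Python's standard `unittest` module. The `split_bill` function and its rules are hypothetical, chosen only because they have an obvious corner case; the tests appear first in the file to mirror the order in which they would be written.

```python
import unittest
from typing import List


class SplitBillTest(unittest.TestCase):
    """Written first: each test pins down one expected behavior."""

    def test_even_split(self):
        self.assertEqual(split_bill(900, 3), [300, 300, 300])

    def test_remainder_is_distributed(self):
        # Corner case: 1000 cents across 3 people leaves 1 cent over.
        self.assertEqual(split_bill(1000, 3), [334, 333, 333])

    def test_zero_people_is_rejected(self):
        with self.assertRaises(ValueError):
            split_bill(1000, 0)


def split_bill(total_cents: int, people: int) -> List[int]:
    """Written second, shaped by the tests above."""
    if people <= 0:
        raise ValueError("people must be a positive integer")
    share, remainder = divmod(total_cents, people)
    # The first `remainder` people each pay one extra cent.
    return [share + 1 if i < remainder else share for i in range(people)]
```

Saved as a module, the tests can be run with `python -m unittest`. Thinking about the remainder and the zero-people case before coding is what steers the implementation toward `divmod` and an explicit guard.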

You may eliminate steps in the process because of automation. For example, you may remove some approval stages if there is sufficient confidence in the automated testing or if it is always approved anyway.

“The best part is no part. The best process is no process. It weighs nothing. Costs nothing. Can’t go wrong.” Elon Musk

Some leaders may complain that the software release cycles already take too long and that adding automated regression testing would add development time. Yes, it would take more time at first, but automating regression testing would speed up testing for every release that follows. The SWE manager may be focused on optimizing software development time alone. The mistake is assuming that optimizing the parts will optimize the whole. Google’s best practice is for Site Reliability Engineers (SREs) to spend at least 50% of their time on engineering work such as automation.
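The trade-off between upfront cost and later savings can be estimated with simple arithmetic. The sketch below is only an illustration: the `break_even_releases` function is a hypothetical helper, and the hour figures are made-up numbers, not measurements.

```python
import math


def break_even_releases(setup_hours: float,
                        manual_hours: float,
                        automated_hours: float) -> int:
    """Number of releases after which the automation has paid for itself."""
    saved_per_release = manual_hours - automated_hours
    if saved_per_release <= 0:
        raise ValueError("automation must save time per release to break even")
    return math.ceil(setup_hours / saved_per_release)


# Hypothetical numbers: 80 hours to build the suite, 16 hours of manual
# regression testing per release, 1 hour to run and review the automated suite.
print(break_even_releases(80, 16, 1))  # 6
```

Under these assumed numbers the investment is recovered by the sixth release, and every release after that is faster than before.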

Take the Red Pill and Live in the Real World

Sometimes we fail to weigh the impact on the customer or tester when they catch a bug. It damages the relationship, reduces trust, shifts the burden of work to others, and passes the buck. Nobody likes to be dumped on. Quality issues are deprioritized to ship on time. We incorrectly think the best approach is to deliver quickly and “hope for the best”. But when we push bugs to the left, the overall cost of the bugs is lower, the performance is better, and your best engineers will stay. Take the red pill.

Where have you seen your group take good steps toward pushing errors to the left? What waste do you see in your SDLC? What resistance have you met when trying to automate?