It’s important to remember that solutions make little sense without problems and that you should avoid applying solutions in a vacuum.
Applying solutions in a vacuum – for example, reading about VPNs and then deciding to implement one for your team without a strong driving impetus – has a few downsides. Skipping the problem statement makes it difficult to measure whether the solution works or how to iterate on it. It also risks decreasing team happiness when solutions are not tethered to reality.
We avoid applying them unless needed in an effort to reduce bureaucracy and process, so that developers can concentrate on developing, designers can concentrate on designing, product owners can concentrate on prioritization, and so on.
To round this out, we’ll present some solutions that we apply to problems, along with some of the problems that they attack. Our choice of the word “attack” here is meaningful; these solutions, despite their name, do not solve the problems. Rather, they are part of a long-running process of dealing with the problems. They might solve them entirely, they might need refinement, or they might not work at all.
Acceptance criteria are one way to attack these problems:
- Unrelated results. The ticket says one thing and the dev does another. If you have another person accepting the stories, this will lead to lost time as the dev and QA go back and forth on solutions. Without QA, this leads to an app that you don’t recognize.
- The banana ticket. The developer knows how to implement the solution but doesn’t know when to stop, leading to an infinite refactoring of the entire codebase under the guise of finishing this one ticket.
To implement acceptance criteria, when writing tickets and stories for the team, provide a detailed description of what the solution might look like – a description of when the story is finished and the ticket can be accepted by the quality assurance team. “As an unconfirmed user, I cannot message anyone,” is a quick example, but they can get bigger and more descriptive.
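For instance, a ticket with acceptance criteria might read like this (the feature and the specific criteria are invented for illustration):

```
Story: As an unconfirmed user, I cannot message anyone.

Acceptance criteria:
- Visiting the new-message form while unconfirmed redirects to the
  "confirm your email" page.
- A confirmed user can still send messages.
- Messaging API requests from unconfirmed accounts are rejected.
```

The criteria tell the developer when to stop and give QA something concrete to check against.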
Retrospectives are one way to attack these problems:
- Unresolved conflict. The team is unable to communicate effectively during conflict. Anger and resentment grow instead, buried and festering inside, manifesting as passive-aggressive code review comments, poor collaboration, low morale, people quitting, or an explosion of anger.
- Fights. Instead of passive-aggressive, low-level frustration, the team could express constant anger. Code review ends in tears.
Scheduled, frequent, recurring retrospectives are a time for the team to reflect on their happiness and internal relationships. These happen rain or shine; it helps to practice communication in good times so that it becomes a normal reflex in times of need. Some teams pair them with drinks at the end of a workday; others hold them mid-day as a normal part of the workday.
Standup is one way to attack these problems:
- Isolation. People feel lonely working on remote teams or siloed projects.
- Othering. People on one team hold inhuman expectations of another team. This is expressed as increasing demands via faceless platforms like chat and ticket trackers, constant rejection of solutions, or warring teams.
Standup is a quick check-in once a day to catch everyone up on small details: what issue or project people are working on, whether they are blocked, perhaps something interesting that they learned, and general announcements. Standups are done standing in an effort to keep them short.
A ticket tracker is one way to attack these problems:
- Duplicated work. Multiple people open code reviews only to find that another review exists with that exact feature implemented. This kind of simple miscommunication can be exacerbated by large teams, microservices, distributed teams, and other communication challenges.
- Hurry up and wait. The marketing department waits until the feature is shipped, and then hurries to advertise it (meanwhile they sat around waiting).
- Surprise changes. The support team first learns that the UI has changed when complaints roll in about the redesign; the CEO hears about the removal of her favorite feature during a Q&A session at a conference.
Ticket trackers are common, though they often use different vocabulary. Trello uses cards; Trajectory uses stories; Pivotal Tracker uses bugs, chores, and stories. Jira does all of that and provides visibility into metrics – some of which are projections, and others of which are accurate.
Story format is one way to attack these problems:
- Mysterious business. The developer will happily implement a feature, lacking the understanding of how it fits into the product or how it might be used. Long term this leads to a disconnect between the code and the product – the domain-specific wording and language used throughout the app bear no relation to the reality they must model – causing frustration among users and confusion when onboarding new developers.
- Unfollow-through. The developer implements the letter of the ticket, but not the spirit, leading to situations where the feature is done but nothing links to it; the JSON API functions but sends useless data; the user can receive the password reset email but Gmail marks it as spam.
- Inflexibility. When the dev runs into complications, she pushes through instead of stepping back to weigh the time cost. The original solution is treated as the only one, and no compromises are entertained, regardless of how long it takes or how it affects the user or the company.
The story format phrases tasks in terms that the end user cares about, with an explanation of why the user might want the task done. Similarly, the job story format puts the user’s context first and the motivation right in the middle. This is all in contrast to the traditional task format, which focuses only on what the developer must change in the code, with no explanation or motivation.
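As an illustration (the feature and wording here are invented), the same task in each of the three formats might read:

```
Task:      Add a confirmed_at column and block messaging until it is set.

Story:     As a new user, I want to confirm my email address
           so that other members know my account is real.

Job story: When I sign up with a typo in my email address,
           I want to find out before I start messaging people,
           so I can fix it instead of silently losing replies.
```

The task tells the developer what to type; the story and job story explain who benefits and why, which is what keeps the code tethered to the product.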
TDD is one way to attack these problems:
- Blind fixes. The bug is fixed…or is it? No one is quite sure, but the new commit sure has a great description of how it could have fixed the bug.
- Runtime whimsy. Return values go unchecked, from `nil` to missing files to failed credit card payments, leading to errors at runtime.
- Fear of deployment. The development workflow is running efficiently until it comes time to actually merge the branch. Hesitation, followed by asking for review after review, followed by begging others for reviews because no one wants the responsibility of saying that it will work in production.
Test-driven development (TDD) uses programmatic tests to drive the design and architecture of the codebase. An incidental side effect is that the major code paths are tested, including error flows. Running the test suite exercises all parts of the application, finding regressions in paths where bugs have been found and fixed.
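A tiny red-green cycle in Python might look like this (`slugify` and its behavior are invented for illustration):

```python
# Step 1 (red): write the test first. It fails, because slugify
# does not exist yet.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    # The error flow is designed up front instead of discovered at runtime.
    assert slugify(None) == ""

# Step 2 (green): write just enough code to make the test pass.
def slugify(title):
    if title is None:
        return ""
    return title.strip().lower().replace(" ", "-")

# Step 3: run the test; it passes, and it now guards against regressions.
test_slugify()
```

The test written first shapes the function’s interface, and the `None` case shows how error flows get covered as a side effect.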
Refactoring is one way to attack these problems:
- Unworkable feature. Adding the new feature requires its own separate app or a fragile connection through the database. What could be a simple button on a web page that performs a common activity takes days, weeks, or months to implement.
- Hidden bug. You’ve traced the crash to one method, but that method was written by a developer from two generations of coworkers ago, is 127 lines long, and the commit message was, “it works.” The twenty code paths, including liberal use of `return`, obscure the source of the error.
Refactoring is the process of shuffling code around without adding any features or fixing any bugs. It is often the first step to implementing a new feature or bug fix, carving a clearer path through the system, as part of the red-green-refactor workflow.
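A small Python sketch of the idea (the function is invented): the behavior is identical before and after, but the refactored version names each step.

```python
# Before: one method mixing parsing, filtering, and formatting.
def report(raw):
    parts = raw.split(",")
    total = 0
    for p in parts:
        p = p.strip()
        if p:
            total += int(p)
    return "Total: %d" % total

# After refactoring: same behavior, smaller named steps. No features
# added, no bugs fixed -- the path through the code is just clearer.
def parse_amounts(raw):
    return [int(p) for p in (s.strip() for s in raw.split(",")) if p]

def format_total(amounts):
    return "Total: %d" % sum(amounts)

def report_refactored(raw):
    return format_total(parse_amounts(raw))

# Both versions agree on every input -- the definition of a refactoring.
assert report("1, 2, ,3") == report_refactored("1, 2, ,3")
```

A new feature (say, subtotals per category) would now have an obvious seam to slot into.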
Feature flags are one way to attack these problems:
- Market timing. The feature is implemented but the rest of the company still isn’t ready for it. The support team needs to be trained on it, the promotional announcement needs to be sent out, or the CEO needs to be convinced that it’s a good idea.
- Questionable code. An isolated chunk of code – a new file system, for example – is ready to be evaluated by willing and able participants, but is not ready for public consumption until all the initial bugs have been shaken out.
Feature flags refer to hiding parts of the app behind a toggle, shown only to specific users or enabled only by an admin. These flags typically differ from A/B testing in that they’re less about measurement and more about hiding.
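A minimal in-memory sketch (the flag names, users, and storage are invented; real systems usually back this with a database or an admin UI):

```python
# Which users can see which hidden features. The "new_filesystem"
# beta testers and the empty "redesigned_ui" rollout are hypothetical.
FLAGS = {
    "new_filesystem": {"alice", "bob"},  # beta testers only
    "redesigned_ui": set(),              # built, but hidden from everyone
}

def enabled(flag, user):
    # Unknown flags are simply off -- the safe default.
    return user in FLAGS.get(flag, set())

def browse(user):
    if enabled("new_filesystem", user):
        return "browsing with the new file system"
    return "browsing with the stable file system"
```

Shipping the code dark like this lets the merge happen early while the reveal waits for the support team, the announcement, or the CEO.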
Code review is one way to attack these problems:
- Siloed development. Developers work in isolation on specific categories of projects – one person on payment, another on API, a third on advertisement targeting – but on the same codebase. Features change around them and cruft grows without any clear communication channel between devs.
- Poor code quality. The developers learn from blog posts and web search results instead of from each other, furthering their isolation. Coding styles vary, and the same solution exists multiple times in the same repo.
Reviewing code is the process of reading a diff: comparing a new commit with what exists in the system. Often, there is a focus on maintainability, consistency, or knowledge transfer. Since it typically works on a diff, there are fewer considerations for big picture harmony.
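A review of a small, invented diff might look like this:

```
--- a/billing.py
+++ b/billing.py
@@ -10,1 +10,1 @@
-    total = price * quantity
+    total = round(price * quantity, 2)

Reviewer: Nice catch. invoice.py has the same calculation -- want to
          extract a shared helper so they can't drift apart again?
```

Note that the comment is about consistency and knowledge transfer, exactly the strengths of a diff-centered view; whether rounding belongs in the billing layer at all is a big-picture question a diff rarely raises.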
A consistent Git workflow is one way to attack these problems:
- Unexplainable code. You find a strange line of code; the test explains nothing; the commit message only says “initial commit”; the ticket tracker was replaced twice since the project started, as was the project manager. Why is this solution the right one, and why does the mysterious test enforce it?
- Bus factor one. The developer works alone, deploys a JAR, and leaves no comments. Then he or she quits and the new dev is onboarded. I hope you enjoyed this short horror story.
Git provides enough plumbing to hang yourself. Maybe that’s not the expression. Regardless, there’s no one way to use Git, and multiple right ways. From branch names to merge strategies to commit message content, it’s possible to have a unified version control process.
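What matters is writing the choices down. A team’s Git conventions might look something like this (the specifics are invented; any consistent set works):

```
Branch names:    <ticket-id>-short-description   e.g. 1234-password-reset
Merge strategy:  rebase onto the main branch, then merge with --no-ff
Commit messages: one-line summary, a blank line, then the *why* of the
                 change and a link to the ticket
```

With conventions like these, the strange line of code traces back through a merge commit to a branch name to a ticket, even after the tracker and the project manager have been replaced.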
A deploy process is one way to attack these problems:
- Expensive downtime. Each second of downtime costs more in lost sales than each second of development time costs in salary.
- External failures. The deployment depends on a set of external services providing DNS, caching and content delivery, uptime monitoring, error reporting, and so on, each of which has its own failure modes.
- Regulations. Strict compliance with the law requires that very few people have access to the database or the production servers.
Going from development to production takes a few steps, which means it can be scripted. The runnable script can live near a human-readable script, separately describing who can deploy, what steps to take when it fails, what to do about downtime, and how to announce it. Combined, the program and documentation around it make up the deployment process.
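As a sketch (the steps, hosts, and URLs are invented), a scripted deploy with a reviewable dry-run mode might look like:

```python
import subprocess

# Hypothetical deploy steps; a real list would come from the team's
# own runbook, which lives next to this script.
STEPS = [
    ["git", "push", "production", "main"],
    ["ssh", "production", "bin/migrate"],
    ["curl", "-fsS", "https://example.com/health"],
]

def deploy(dry_run=True):
    """Run each step in order, stopping at the first failure."""
    planned = []
    for step in STEPS:
        if dry_run:
            # Report what *would* run, so the process itself can be
            # reviewed and rehearsed like any other code.
            planned.append(" ".join(step))
        else:
            subprocess.run(step, check=True)
    return planned
```

Keeping `dry_run` as the default makes the script safe to read and rehearse; flipping it off is the deliberate act, and the human-readable runbook beside it covers who may do so and what to do when a step fails.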
Scaling is one way to attack these problems:
- Concurrent users. The service is suddenly more popular and needs to handle twice as many users as before. They don’t necessarily need the full service, but they at least need enough working to get their job done.
- Concurrent processing. The algorithm can be subdivided into smaller, independent problems, but each problem would take its own computer to solve.
Scaling horizontally means spinning up more servers to handle the load; this is in contrast to vertical scaling, where the CPU is sped up or the RAM is increased. The new servers come up quickly (“instantly”), run the same software, and can be spun down when no longer needed.
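The concurrent-processing case can be sketched in Python (an in-process stand-in for the idea: a thread pool plays the role of the extra servers, and the workload is invented):

```python
from concurrent.futures import ThreadPoolExecutor

def solve(chunk):
    # Each independent subproblem can run on its own worker -- or,
    # scaled horizontally, on its own server.
    return sum(x * x for x in chunk)

def solve_in_parallel(chunks, workers=3):
    # Fan the subproblems out to a pool of workers, then combine
    # the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(solve, chunks))
```

The key property is that the chunks are independent, so adding workers (or servers) divides the work without the pieces coordinating with each other.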
This is just a tiny selection of the solutions that we have seen applied to real problems over the years. Some are fancier ideas – feature flags and story format are not common to every team – and some are sacred cows, like TDD and refactoring.