I am a cofounder of RethinkDB — an open-source distributed database designed to help developers and operations teams work with unstructured data to build real-time applications.
Over the course of building RethinkDB we learned many software engineering lessons. Most of these lessons feel like rediscovered platitudes, but we still managed to mess them up. Considering how many teams are struggling with shipping software, I think these lessons are worth repeating.
Sooner or later you’ll discover you’ve hired engineers that have very different philosophies on how software should be built. Say, a C programmer and a C++ programmer. The former will want the codebase to look like the Linux kernel. The latter will want it to look like the boost MPL library. What’s a well-meaning engineering manager to do?
I’ve tried all possible ways of addressing the ensuing social issues (including doing nothing), and found that there is only one approach that works. Always, always, always fight to cultivate and enforce shared culture. It can be unpleasant early on, but it’s the only way I know of to build a team that’s both happy and productive.
The absolute best way to instill shared culture is to code review every line of code before it ever goes into mainline. (Why every line of code? Because code reviews don’t stick if you do them selectively. Take my word for it — some rules need to be absolute.)
Here’s how we do it. By default, the reviewer is always right, and the author must fix every defect (we ask the reviewers to be tough, but reasonable). If the author really disagrees with the reviewer, they can appeal to a tech lead. The tech lead’s decision is final, and cannot be appealed. In case of emergencies (e.g. a broken build), the standard code review process can be too slow, so we also allow quick reviews over the shoulder.
You can find a different way to do it, but we’ve found this to be a pretty good system.
Code review is like exercise. Painful at the beginning, and any given session might feel like a waste of time. But over the lifetime of a project, it makes all the difference. Always code review every single commit. No ifs, no buts, no exceptions.
Ability to build great software and ability to build great products are two completely orthogonal skills, and aptitude for the former does not imply aptitude for the latter. It’s best to learn early that software companies aren’t about building great software. They’re about building great products. You have to start with the product, and let great software be a means to an end.
Make sure every engineer on the team respects this principle. Whenever possible, have people that understand products run the show. Beware of people that care about a particular programming language, technology, or an obscure technical sticking point more than about making your users happy. They can derail your whole team and firing them is often your best option.
Linear increases in complexity cause superlinear increases in amount of work to be done. Be careless about complexity, and before you know it your development process will crawl to a halt.
Complexity increases the surface area of your project. In software engineering, large surface area is death. Fight, fight, fight complexity with everything you’ve got. If you can avoid adding new functionality, fight like hell to keep it out. If you lose that fight, see if you can get 80% of the functionality with 20% of the work. Usually, you can. Fight like hell to do it that way. If you lose this fight, either you’ve hit the one case in a thousand where complexity is warranted, or, more likely, you should have fought harder.
Less is more. Avoid grand designs. Keep the codebase as small as possible.
Pick the simplest, most concrete possible implementation that solves the customer’s problem. Avoid big abstractions — functions are better than lenses 99% of the time. Use simpler language features whenever possible — an if statement is better than a template specialization. Great abstractions tend to feel very concrete because they make complex tasks easy and simple to understand. That’s why functions and pipes are better abstractions than monads and arrows. Avoid grand beautiful designs — they’re never as grand and as beautiful as you’ve envisioned them.
Be careful about swinging too far to the other extreme. Once you commit your project to an architecture, it can be extremely difficult to change. If your abstractions aren’t flexible enough to solve the customer’s problems, your competition will be right at your heels.
Picking the right abstractions requires a great deal of care and experience. Be thoughtful about this problem — if technology is a differentiating factor in your product, this can mean the difference between life and death.
I’ve observed the following two aspects of software development so many times, I’ve learned to accept them as axioms:
1. Humans are terrible at time estimation for anything longer than 2-4 days of work.
2. Lack of deadlines is a sure road to complex software with bad, overengineered abstractions.
Despite smelling like a fad, sprints are a surprisingly good solution to the seemingly impossible scheduling problem. Break up all work into no more than 2-4 day chunks (usually represented as issues in a tracker). Assign these issues to people, and group them into sprints of one to two weeks. Then work like hell together to get everything in a sprint done on time.
It’s strange, but with the right people it works really, really well. (Some people can’t work in this environment — in that case your best option is to kindly ask them to switch employers).
Not all projects can be broken up in a way that allows fitting them into a single sprint. Some projects are just too big. You have to treat these projects differently.
Firstly, fight as hard as you can to not build big projects. Break them up into smaller ones, find a different approach, don’t build the feature in the first place, do everything you have to do to keep them out.
If you’ve done all you can and still must build a big project, build the absolute most minimal version that solves the customer’s problem. Do all you can to decrease the scope. Make sure there is a single engineer in charge of the project, and put structure around it. Do a peer review of the initial design. Then set up deliverable milestones, and do peer reviews for each milestone. Once the project is feature complete in the shortest possible amount of time, assign improvements as part of the standard sprint schedule.
Keep the number of concurrent big projects to absolute minimum. One big project per team of about eight to ten engineers is a good rule of thumb.
Software rewrites and perpetual refactoring are a special case of big projects, and are so prevalent that they deserve special attention. Fight like hell not to allow big refactors. Always, always, always keep refactoring within the confines of a sprint.
If you absolutely must do a big rewrite, be triple-vigilant about it. Peer review the initial idea, peer review milestones, be ruthless about cutting complexity and scope. A big rewrite will take at least three times what you’ve estimated (if you’re lucky).
Always keep a refactoring stack of at most depth one. If someone is refactoring a part of the codebase (within the confines of a sprint, or heavens forbid as a large project), make sure they don’t run off refactoring other components. Always schedule refactoring as you would any other change.
There are hundreds of books written about different testing methodologies (unit tests, integration tests, whitebox, blackbox, etc.), but we’ve found that there is a much simpler way to categorize tests: short ones and long ones.
Integrate any test that runs in less than a few seconds (including unit and integration tests) into your development process. Use a tool like Travis CI, and run the short test suite after every commit. Never merge code that breaks the short test suite into mainline — be religious about it, and make it a part of the code review process. Don’t write automated tests that take more time to maintain than the code itself, but do write automated tests.
Any test that takes longer than a minute to run is a long test. Integrate long tests into your release process. Since you can’t run long tests after every commit, they lag behind development and tend to bitrot. Maintaining long tests requires a lot of dedication and planning, and is often impractical until you have full-time QA and release management teams.
Don’t abandon all automated testing because you don’t yet have the organizational support structure to manage long tests or because you wrote tests early on that required too much maintenance overhead. Even a small test suite that integrates into the development process makes an enormous difference in quality.
As the Klingons say it, “the truth must be won on a battlefield.” Sooner or later, some of your team members will strongly disagree with some of these ideas.
Debate everything vigorously, but once the decision is made, make sure everyone is on board, regardless of their personal reservations. That means pestering people to follow the rules until the process becomes self-perpetuating, confronting people that undermine decisions they disagree with, and, in pathological cases, firing talented engineers who can’t buy into the overall engineering culture. (They may very well end up working in a very different, but equally productive culture.)
It can be very unpleasant, but it must be done. Remember — your first responsibility is to the team, not to any given individual contributor.