Case of the Slow Matchmaking Routine

The most challenging bug I’ve ever fixed was a performance issue in a matchmaking routine. Matchmaking is the process of finding players to compete against each other in a video game. An excellent matchmaking algorithm doesn’t just stick players together randomly; it tries to make the game more fun by balancing power levels and preventing anyone from waiting too long for a match.

About six weeks before a game I was working on was scheduled to be feature complete, we discovered our routine couldn’t handle our load targets. The rate at which players were being removed from the matchmaking queue started dropping during load tests. Things got bad quickly once it fell below the rate at which we inserted them. Not only would this cause a bad user experience if we didn’t fix it, but it made it impossible for us to drive enough traffic to our game servers to test that they could handle the projected load. The company wasn’t going to release a game that could crash if it was successful, so we had to fix this issue, and we had to fix it quickly.
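To see why things go bad quickly, here is a minimal sketch with made-up numbers. The rates below are assumptions for illustration, not figures from our load tests, and `queue_depth_over_time` is a hypothetical helper. Once the removal rate falls below the insertion rate, the backlog grows by the difference every second and never recovers on its own.

```python
def queue_depth_over_time(insert_rate, remove_rate, seconds, start_depth=0):
    """Return the queue depth after each second, assuming constant rates (players/second)."""
    depths = []
    depth = start_depth
    for _ in range(seconds):
        # The queue can never hold fewer than zero players.
        depth = max(0, depth + insert_rate - remove_rate)
        depths.append(depth)
    return depths


# Hypothetical rates: removal keeps up in the first case and falls behind in the second.
healthy = queue_depth_over_time(insert_rate=100, remove_rate=120, seconds=5)
overloaded = queue_depth_over_time(insert_rate=100, remove_rate=80, seconds=5)
```

With these numbers the healthy queue stays empty, while the overloaded one gains 20 waiting players every second.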

[Read More]

Regarding Test Coverage Targets

Unit tests are undeniably a good thing, but you only realize their full benefits when you have enough tests that you can make changes with confidence. If you can make a change, run your tests, and be comfortable enough to ship your changes, then you and your team can get work done much faster. More drastic changes to the shared code become feasible. Life gets better.

It makes sense then that teams want to ensure that code is sufficiently covered with tests. Nobody wants to count tests every time they review a PR, so tools are added that measure coverage automatically. It’s then a small step to set a coverage target, and suddenly you have a machine checking every PR for tests. This all makes sense to me, and it was my first instinct too. I don’t recommend this approach anymore.

[Read More]

My Architectural Report Template

As an architect I’ve been asked to answer a lot of hard questions. I used to waste time figuring out how to structure my answers, which kept me from getting into a good flow. Now I have a simple template that is easy to use, easy to read, and saves me that wasted time up front.

This template works for simple reports that are only a couple of pages, but can easily be adjusted or expanded for more complicated or much larger documents.

[Read More]

Design by Dogma Antipattern

“Always use a NoSQL database so your app can scale.”

NoSQL databases can be more scalable, but schema-on-read has other drawbacks. NoSQL databases are much less capable of transactional changes. Relationships are difficult or impossible to model. Designing schemas to be efficient is much harder, and requires more up-front knowledge about your problem. NoSQL databases are sometimes the right tool for the job, but they are not the right tool for every job.

[Read More]

Horizontal One-on-Ones and Talking Practice

When I was promoted to the role of architect it was a new role in the organization. The stakeholders I had to work with were not used to talking to an architect, and weren’t sure what I did or when I should be involved in a conversation. I started using recurring one-on-one meetings with each stakeholder separately. It worked great. It’s also made me a much better communicator.

One of the first and most important lessons I learned as an architect is that you can’t design a good architecture without a good understanding of its requirements. You can design a system in a vacuum, and it’s much easier to do it that way, but the result is far less likely to serve the organization. Gathering, validating, and documenting technical requirements is tough work, but it is an essential part of being an architect.

[Read More]

Case of the Appearing Users

A couple of years after solving The Case of The Disappearing Users, I was assigned another high-profile bug where new users were being spontaneously created. They were being generated without a name or any profile information, but were still filling up space in lists and appearing on schedules. A couple of other developers had tried fixing it but had no luck, so it was assigned to me.

I went through my usual bag of tricks: searched recent changes, searched for insert statements, tried to create empty users manually (and couldn’t). Nothing worked, and it was looking pretty hopeless.

[Read More]

Automating Non-Zero-Downtime Upgrades in Kubernetes with ArgoCD

I recently worked on a project to move a complicated legacy application onto Kubernetes. It was quite an undertaking, but in the end we were successful. One of the biggest challenges was figuring out how to automate our legacy deployment process, one where the whole application has to be stopped completely for schema upgrades to run.

The normal “Kubernetes way” to upgrade an application is by changing the Deployment resource. With its default RollingUpdate strategy it will delete a pod with the old definition, start a pod with the new definition, wait for it to be healthy, then repeat until the change is fully applied.
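As a rough illustration of that loop, here is a toy simulation in Python. It is not how Kubernetes implements it: the `rolling_update` function and the version labels are made up for this sketch, and it replaces exactly one pod at a time, whereas the real controller's pacing depends on the Deployment's `maxSurge` and `maxUnavailable` settings.

```python
def rolling_update(pods, new_version, is_healthy=lambda pod: True):
    """Replace pods one at a time; return a snapshot of all pods after each step."""
    pods = list(pods)
    history = []
    for i in range(len(pods)):
        # Delete one pod with the old definition and start its replacement.
        pods[i] = new_version
        # Wait for the new pod to be healthy before moving on.
        # (Toy behaviour: a real rollout would stall here rather than raise.)
        if not is_healthy(pods[i]):
            raise RuntimeError(f"pod {i} failed its health check; rollout halted")
        history.append(list(pods))
    return history


# Three pods running a hypothetical "v1" are upgraded to "v2".
steps = rolling_update(["v1", "v1", "v1"], "v2")
```

In every intermediate snapshot the old and new versions run side by side, which is exactly what an application that must be stopped completely for schema upgrades cannot tolerate.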

[Read More]

Importance of Alignment

Unless you work entirely alone, alignment is a big deal. When you are well aligned with your company’s goals you are a more valuable employee. When you are well aligned with your manager they can keep you on the right track and be an ally against obstacles. When you are well aligned with your peers you can keep each other focused on the most important work.

On the other hand, when you have poor alignment you can see all sorts of problems. When teams are misaligned they can undermine each other’s efforts. When you are misaligned with your manager you can find yourself being overmanaged or hung out to dry when things get rough. When you are not aligned with the goals of your organization you miss opportunities to demonstrate your skills and advance your career.

[Read More]

Polyglot Unconference 2023

I recently had the pleasure of attending the 2023 Polyglot Unconference in Vancouver, put on by the Polyglot Software Association. I’ve been attending these for years. It is my favourite local conference.

An unconference, sometimes called an open spaces conference, is a participant-driven event where attendees choose the topics of discussion and provide the content themselves. They are meant to be open and inviting, and to build interpersonal relationships. This year was no exception.

[Read More]

Sustainable Errors

Making a program work for the happy path is not always easy, but given enough time I believe pretty much anyone could do it. When a professional takes on the task, however, they will make it work for more than just the happy path, and do it with code that is easy to debug and easy for others to understand and change. Since so much of what we end up dealing with is exceptional flows, we need a concise way to deal with them. Fortunately we have the aptly named exception pattern.
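As a small illustration in Python, here is a minimal sketch of that pattern. The `ProfileNotFound` class, the `load_profile` and `display_name` functions, and the in-memory `PROFILES` table are all made-up names for this example. The happy path stays uncluttered, and the exceptional flow is handled in one place.

```python
class ProfileNotFound(Exception):
    """Raised when a requested profile does not exist (hypothetical error type)."""


# Stand-in for a real data store, for illustration only.
PROFILES = {"alice": {"name": "Alice"}}


def load_profile(user_id):
    """Return the profile for user_id, or raise ProfileNotFound."""
    try:
        return PROFILES[user_id]
    except KeyError as err:
        # Translate the low-level error into one that names the actual problem,
        # keeping the original as the cause to make debugging easier.
        raise ProfileNotFound(f"no profile for user {user_id!r}") from err


def display_name(user_id):
    """Happy path first; the exceptional flow is handled in a single place."""
    try:
        return load_profile(user_id)["name"]
    except ProfileNotFound:
        return "Unknown user"
```

Calling `display_name("alice")` follows the happy path, while an unknown ID falls through to the fallback without any error checks cluttering `load_profile`'s callers.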

[Read More]