Bug Hunt


A couple days ago I released a patch for Atom Zombie Smasher that fixed a rather nasty bug. Tracking down and resolving this bug was a fairly long ordeal, so I thought I’d share a bit on how I handle QA as a one-man team.

First of all, a bit of history. When I was working at Pandemic, we had a sizeable QA team.  As new features and levels trickled in, QA would comb over them and try their damnedest to break the game in every obscure way possible.  One of my tasks as a designer was to make the tutorial levels, so I tended to receive extra helpings of “how is that even possible?” special-case bugs.

Something particularly effective was when QA was integrated into our DLC development.  Basically, QA started QA’ing on day one of DLC development, and was seated amongst the designers & artists. This resulted in extremely fluid communication, not just from immediate face-to-face talking, but from the sheer osmosis of everyone being seated in the same aisle.  These simple changes helped make this my favorite development experience at Pandemic, in which we produced a high-quality package ahead of schedule while keeping the bug count absurdly low.

(And QA had the best usernames during multiplayer playtests. I’m looking at you, LindsayRohan and LorenaHobbit.)

I’ve since started Blendo Games, where I’m now a one-man team.  I wear the designer hat, programmer hat, artist hat - whatever hat you got, I’ve worn it.  From my previous hobby projects, I’ve learned doing QA by yourself is A) silly, and B) doomed to fail.  Once a game was released “into the wild,” it amazed me how quickly people found ways to twist and break everything.

And that got me thinking: the general public has an uncanny knack for finding the rarest of bugs, so why not leverage that?  Here’s how I implemented it:

It’s not a new idea, but it was new to me!  I gave it a trial run in Flotilla.  I was leery as to whether people would actually take the time to use the Report-a-bug, but I was taken aback: I got a lot of helpful feedback from a lot of people.

Example!  Here’s a Flotilla Report-a-bug message from a helpful stranger:

Subject: ACCEPTED CASINO OFFER BUG

Message: HI, I ACCEPTED THE CASINO’S OFFER TO WORK FOR THEM AFTER I GAMBLED AWAY MY FLEET.  UNFORTUNATELY, AFTER DOING THAT I WASN’T ABLE TO GO TO ANY OTHER PLANETS.  PRETTY SURE THIS IS A BUG BUT I’LL LET YOU DETERMINE THAT.  :)  NEEDLESS TO SAY, I’LL BE FIGHTING MY WAY OUT NEXT TIME.

It has worked well.  I get bug and crash reports, and within a few hours I’m usually able to release a patch that fixes them.  I’ve since continued to use this user-driven system for Air Forte and Atom Zombie Smasher, adding incremental improvements here and there (for one, it now uses normal sentence case, so it no longer looks like PEOPLE ARE ALWAYS SHOUTING AT ME).

Which takes me back to the beginning of all this: that one particularly nasty bug in Atom Zombie Smasher.  Since AZS was first released, I’ve been getting crash reports relating to the Zed Bait, a weapon used in the game.  The crash message’s call stack gave me a general location of where to look, but was vague as to specifics.  It’s one thing to know that a bug is happening, but I had no steps as to how to reproduce this bug on my machine.

So, I made a robot.  By that, I mean I wrote a very basic component that allowed the game to autonomously play itself.  Because my robot was never meant for public consumption, it’s plainly named “StressTester.”  The point of StressTester is to bang on one game feature over and over again for hours on end. Here’s the StressTester routine for finding the Zed Bait crash:
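As a rough illustration of the idea (this is not the actual StressTester code), a blind button-masher can be sketched in a few lines of Python. Everything here is an assumption made up for the example: FakeGame, its three actions, and the simulated crash are hypothetical stand-ins, not anything from Atom Zombie Smasher.

```python
import random


class FakeGame:
    """Hypothetical stand-in for a game; not the real Atom Zombie Smasher code."""

    ACTIONS = ("place_bait", "detonate_bait", "end_round")

    def __init__(self):
        self.bait_active = False

    def perform(self, action):
        if action == "place_bait":
            self.bait_active = True
        elif action == "detonate_bait":
            if not self.bait_active:
                # Made-up bug for illustration: detonating bait that was never placed.
                raise RuntimeError("detonated bait that does not exist")
            self.bait_active = False
        elif action == "end_round":
            self.bait_active = False


def stress_test(game, max_steps, rng):
    """Blindly perform random actions until the game raises or max_steps is reached.

    Returns (steps_taken, exception), where exception is None if nothing crashed.
    """
    for step in range(1, max_steps + 1):
        action = rng.choice(game.ACTIONS)
        try:
            game.perform(action)
        except Exception as exc:
            return step, exc
    return max_steps, None


if __name__ == "__main__":
    steps, crash = stress_test(FakeGame(), max_steps=100_000, rng=random.Random())
    print(f"stopped after {steps} steps: {crash!r}")
```

Like the real thing, this loop has no concept of what it’s doing. It just picks an action, taps it, and repeats until something breaks.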

I activated StressTester and went to bed.  Next morning, I awoke and discovered StressTester had gained self-awareness. It asked, “do you want to play a game of thermonuclear war?”

I misspoke, that didn’t happen.  I awoke next morning and found the game had successfully crashed.  I now knew exactly what line of code the game was crashing on and in a couple minutes I fixed the problem.  Done and done.

Yes, StressTester was successful in that it led me to the crash solution.  But, it fails in that I have no idea what specific steps StressTester took to get to that crash.  StressTester has no concept of what it’s doing, it just blindly taps buttons. As a result, I still have no idea how to reproduce that bug.

Nope, you need warm human beings for that, specifically folks trained for QA.  In the meantime I’ll make do with what I got!