Wilhelm’s comment was the trigger. Testing is indeed hard.
I talk a bit about my RL job, without too many specifics. At a general level, it’s a service for about 130,000 users and deals in particular with smartphones – so one of those things where an issue/outage has a rather up-close-and-personal impact.
Massive project, great team, more stress than is reasonable, less time than we need. Sound familiar?
Typically, projects in my organization are full-bake affairs, meaning all functionality is present before going live. That makes projects take a supreme amount of time, since the last 10% of functionality usually takes 90% of the overall time. One of the early decisions I took was to take an iterative approach. There are SDLC methodologies I could go into… but that can get boring quick. TL;DR: the large project was split into 6 releases. The first 4 were time-boxed – i.e. what can we do in a month each – and the last 2 were more complicated and took a bit longer. When version 1.0 launched without all the bits working, it took a lot of effort to re-train people that this was OK and planned. The end result was that feedback from 1.0 fed into 1.1, and so on for the duration of the project. Not scope creep… but refinement of the functions.
Getting those releases ready was a challenge. We had to build new testing environments and new processes, and find more people to do the work. Traditionally, all testing was internal, and builds went from Dev to Staging to Production. That wasn’t all that effective. So we added a new test environment, parallel to Staging, specifically for client testing, and modified the UAT process to essentially run public alphas.
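That promotion flow can be sketched as a small gate check – a minimal illustration, with environment names and the sign-off rule assumed for the example rather than taken from any real tooling:

```python
# Promotion flow: Dev feeds both Staging (internal QA) and a parallel
# ClientTest environment (the public-alpha UAT track). A build only
# promotes to Production once BOTH tracks have signed off.
# Environment names and structure are illustrative.
PROMOTES_TO = {
    "Dev": ["Staging", "ClientTest"],  # parallel tracks after Dev
    "Staging": ["Production"],
    "ClientTest": ["Production"],
    "Production": [],
}

def can_release(signoffs: set) -> bool:
    """Ship only when every environment feeding Production has signed off."""
    required = {env for env, targets in PROMOTES_TO.items()
                if "Production" in targets}
    return required <= signoffs  # subset check: all required sign-offs present

print(can_release({"Staging"}))                # False: client testing pending
print(can_release({"Staging", "ClientTest"}))  # True: both tracks approved
```

The point of the parallel track is visible in the check: internal QA alone no longer clears a build for release.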
Alpha vs Beta
Everyone has their own opinion, fine. Mine is that an alpha is a release that is not feature complete, while a beta is feature complete. The first is to test for success, the second to test for failure. Historically, we’d do all this internally, and when clients did do any testing, it was more of a sales show than actual testing.
We turned that around and asked each client to designate a representative for testing. Huge benefits, since these folks were generally testing things we never even thought of trying. It extended our testing window by about 20%, but the bug rate in the shipped product dropped by a ridiculous margin.
Our bug tracking system was further integrated into our release schedule. Where previously we would launch with known high-impact bugs deemed acceptable, we moved that threshold down to medium, because we were able to detect the bugs much earlier in the process. That gave us time to either fix them or apply a mitigation/workaround to smooth out the bump.
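The tightened gate amounts to a simple rule: nothing above medium severity ships unless it has a workaround in place. A minimal sketch of that rule – the severity scale, field names, and bug records are all assumptions for illustration, not our actual tracker’s schema:

```python
# Release gate: the old rule tolerated known "high" impact bugs at launch;
# the tightened rule blocks anything above "medium" unless a
# mitigation/workaround is already in place.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}
MAX_SHIPPABLE = "medium"  # the tightened threshold

def blocking_bugs(open_bugs: list) -> list:
    """Return the open bugs severe enough to hold the release."""
    limit = SEVERITY_RANK[MAX_SHIPPABLE]
    return [b for b in open_bugs
            if SEVERITY_RANK[b["severity"]] > limit
            and not b.get("workaround")]

bugs = [
    {"id": 101, "severity": "high", "workaround": True},  # mitigated: can ship
    {"id": 102, "severity": "medium"},                    # within threshold
    {"id": 103, "severity": "critical"},                  # holds the release
]
print([b["id"] for b in blocking_bugs(bugs)])  # [103]
```

Catching bugs earlier is what makes this threshold affordable: there is still time in the cycle to either fix the blocker or downgrade it with a workaround.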
Transparency
The other thing we did was publish the bug list – well, the ones that weren’t security related, at least. It was a big list at the start, and quite a few bugs stayed with us for multiple releases. Some are out of our control, as the software vendor needs to do some overhauling.
The core benefit here was that people complained a whole lot less when they knew we were aware and had some sort of plan to address it. It was a full-time job to keep this list up to date and keep communications open with the clients. We’d get a report, put it on the list, and another team would do the assessment.
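Generating the client-facing list is the easy half of that job: filter out anything security related and render what’s left. A sketch of that filter, with the record fields and report format made up for the example:

```python
# Publishing the bug list: security-related items stay internal; everything
# else is rendered into a simple client-facing report. Fields and the
# report layout are illustrative, not a real tracker's export format.
def public_bug_list(bugs: list) -> list:
    lines = []
    for b in bugs:
        if b.get("security"):
            continue  # security issues are never published
        lines.append(f"#{b['id']} [{b['status']}] {b['title']} "
                     f"(open since release {b['since_release']})")
    return lines

bugs = [
    {"id": 7, "title": "Export stalls on large files",
     "status": "vendor fix pending", "since_release": "1.1", "security": False},
    {"id": 9, "title": "Auth token exposure",
     "status": "patched", "since_release": "1.3", "security": True},
]
for line in public_bug_list(bugs):
    print(line)
```

The hard half – keeping statuses current and feeding client reports into the assessment queue – is the full-time job described above; no script removes that.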
Delays
There were quite a few times where a large release had to be postponed due to a major bug found late in the cycle. In June we found a critical error that was patched by the vendor. Our internal tests were clean on that patch, but the client testing found some serious problems. That extra set of eyes found an issue that would have drowned our support team in tickets.
The thing is, by not drowning in tickets, we were able to resolve the issue faster, since our focus was pre-emptive rather than reactive. Sure, we took it on the chin for a delay to a critical function, but it’s always better to be late and working than early and broken. And it’s even more important for the overall sanity of the team.
Overall
Solid testing is how you get a game like Spider-Man or BotW rather than Aliens: Colonial Marines. Every single one of us has seen a game that didn’t go through enough testing, yet was released anyway. In nearly all cases, that was an abject failure that cost the company dearly. For every FF14 that comes back from the brink, there are a dozen or more Hellgates. People will continue to engage with a company if there’s a relationship, if there’s trust. Otherwise, they will find alternatives (if they even exist). And trust takes a while to build, and a fraction of that time to lose.
Testing has proven to me to be one of the best ways to build and maintain that trust: being transparent and honest with clients so that their issues are acknowledged and there’s a plan to address them. It takes an inordinate amount of time and skill to manage this type of relationship, but the results are worth every bit.