John Maizels goes sleuthing.
This is one for everyone in the broadcasting industry, and especially for managers and career technologists.
Turns out we techies have a major weakness that comes from our very strength, and the last 24 hours have reminded me how important it is to consider the impossible! Has this happened to you?
Technorama is nearing the AGM and yesterday we used Electionbuddy to send member voting invitations for the elections. It’s our first time using that system. Many of our members use gmail addresses. Nothing unusual about that; half the world uses Google’s no-charge platforms. So it was really disturbing when we checked the Electionbuddy bounce list, only to discover that a large number of those @gmail.com emails – maybe all – had been rejected when we sent the voting papers.
Because elections are sort of important, we immediately went on a hunt to find out what problem we’d created, whether our records were wrong, or whether Electionbuddy had broken. We tried to eliminate all the obvious reasons. If you’re used to solving business problems, I’ll bet you’ve already come up with a few explanations for what went wrong. But unless you’ve picked up other news, I’ll also bet you haven’t come up with the right answer.
One of the Four Drivers of Technologists is that we’re driven to fix things. Working out what’s wrong is an artform; a mix of experience, observation, analysis, and intuition.
In the words of Sherlock Holmes:
When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
That’s problem determination. And in the last 24 hours several hundred million people, maybe over a billion, had a problem determination opportunity: “was this gmail bounce due to something I did?”
I’ll bet my house that very, very few got the analysis right.
Nobody routinely tests the impossible first – we just eliminate anything implausible as a matter of course. Can’t happen.
In this case, the implausible was a global Google failure that occurred at the same time as we triggered our election start… very unfortunately for us, because 24 hours earlier or later there wasn’t a problem. Google claim the outage was 45 minutes, which is a lifetime for a global system. Our emails bounced 10 hours or more after that reported period, and many other users have reported the same. The outage might well have been way more than the reported period. Who knew that a corporate entity might seek to reduce the claimed impact of a customer problem?
When a large number of the election emails bounced back, it was obvious that all except one were at gmail.com. One solitary yahoo.com address in a sea of gmails. That should have been a bright red flag, yet it never occurred to anyone on the team that the error might have been with gmail’s system, and we didn’t test that. Because it was the first time we’d used Electionbuddy, you can throw learning-curve into the mix of barriers-to-success. So we phoned round, we sent test emails (all of which worked), we contacted Electionbuddy (a few lessons there), and tried to work out what to do next. We probably burned ten or more person-hours to no useful effect.
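For anyone who wants to make that red flag harder to miss, here is a minimal sketch of the check we skipped: tally the bounced addresses by mail domain and see whether one provider dominates. The function names, the 80% threshold, and the sample addresses are all hypothetical, made up for illustration; this is not an Electionbuddy feature or export format.

```python
from collections import Counter


def bounce_domains(addresses):
    """Count bounced addresses per mail domain (the text after the last '@')."""
    return Counter(
        addr.rsplit("@", 1)[1].strip().lower()
        for addr in addresses
        if "@" in addr
    )


def dominant_domain(addresses, threshold=0.8):
    """Return (domain, share) if a single domain accounts for at least
    `threshold` of all bounces, otherwise None.

    The 0.8 default is an arbitrary illustrative cut-off, not a standard.
    """
    counts = bounce_domains(addresses)
    total = sum(counts.values())
    if total == 0:
        return None
    domain, hits = counts.most_common(1)[0]
    share = hits / total
    return (domain, share) if share >= threshold else None


# Hypothetical bounce list: nine gmail.com addresses and one yahoo.com address.
bounced = [f"member{i}@gmail.com" for i in range(9)] + ["member9@yahoo.com"]

result = dominant_domain(bounced)
if result:
    domain, share = result
    print(f"{share:.0%} of bounces are at {domain}: check that provider's status first")
```

If one domain accounts for nearly every bounce, the likelier suspect is that provider rather than your own records or your sending platform, and a quick look at the provider's status page is cheaper than ten person-hours of phoning round.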
It was only this morning that reports of Google’s outage surfaced, and that’s knowledge that changes the game. Now comes the process of fixing, much more manual than we’d like, but at least we know it wasn’t anything that we did.
Most importantly, the outage is a reminder that when a transmitter fails or the studio console buttons mysteriously do the wrong thing, or listeners report interference, or your playlist starts running backwards: before you do ANYTHING else, check the impossible. Check for volcanic eruption, fall in the price of gold, implosion of a galaxy far, far away, or an alien invasion. None of those things is plausible, but they might not be impossible, and an implausibility might just explain what’s happened.
Why is this important?
As the broadcast industry, we spend way too little time focused on technologist training, and in particular on problem determination as an artform and as a critical skill. That’s a huge failing for a sector that is fundamentally a technology business. Content might be king, but unless you have a very good megaphone you’re reliant on technology to get the content out, and that technology is going to break. The question is not “if?”; it’s “when?”, and “with how much pain and business impact?”
We need our technologists to have access to problem determination education, and we need to ensure they have the best tools and management support when the chips are down.
So the takeaway: once you’ve eliminated the impossible, then go on to fix the real problem. For Technorama, normal service with gmail and elections will resume shortly.
What’s your “oh, dear, missed that one” war story? What lessons can we learn?
Post your comments below or email [email protected]
About the Author
John Maizels has spent the last 50 years dabbling in IT and broadcast engineering; radio then television.
These days he is a technology evangelist who recognises Ohm as a god, helps TV students to understand the difference between SDI and a sprocket, and believes that every engineer should be forced to feel the pain that they inflict on their customers. He still enjoys doing the occasional VO and is passionate about radio and training.
He is a Fellow of the SMPTE, recipient of the CBAA’s Michael Law Award, and volunteer President of Technorama Inc, so at least the money is where the mouth is.