Automated Baseball?

Something big happened in a baseball game last night that is causing a buzz in the sports world today.  I think it’s related to a buzz in the world of software testing.

Armando Galarraga, a pitcher for the Detroit Tigers, was on the verge of pitching a “perfect game” — a game in which not only does no batter on the opposing team get a hit (a “no-hitter”), but no batter even makes it to first base. That means pitcher Galarraga would have had to outlast 27 batters trying to smack the ball into play. That’s some great pitching on his part, along with some exceptional defensive support from his teammates.

Perfect games are rare. In the 134-year history of Major League Baseball, there have only been 20 perfect games.  Two of them, amazingly, happened last month, which has never happened in one season.

And last night at 6 pm Pacific Standard Time, Armando Galarraga was set to be the 21st.

In the 9th and last inning, Galarraga faced one last batter: Jason Donald. Galarraga delivered a pitch and Donald connected. Tigers first baseman Miguel Cabrera ranged well off the bag to field the ball, so pitcher Galarraga ran to cover the base Donald was running for. Cabrera threw the ball to Galarraga, who caught it and touched first base a full step before Donald arrived in mid-stride.

But to everyone’s astonishment, first base umpire Jim Joyce called Donald safe!  A safe call meant that, in the umpire’s judgment, Donald had reached first base before the ball reached Galarraga’s glove, spoiling the perfect game.

As the crowd booed, Tigers manager Jim Leyland came out and argued with Joyce, but the call stood.  The crowd then watched the instant replay, which showed the Indians batter Donald out by a full step. Donald had not beaten the throw. He should have been out. Jim Joyce got the call wrong, and everybody saw it.

But in baseball, even though umpire judgment calls can be argued, those calls rarely get reversed unless another umpire saw the play.  It was hopeless.  Furthermore, it was time to move on to the next batter, which Galarraga did, retiring him to end the game.

It didn’t matter that the Tigers won the game.  The “perfect game”, a game in which Galarraga technically allowed no batter to reach first base, was spoiled even though the objective truth (according to the camera footage) showed that Galarraga did not allow Donald to reach first base safely.

Unlike in other sports, the camera has no say in how baseball games are decided.  In baseball, the umpires decide.  It’s purely human judgment in the moment. Other sports allow appeals to officials when the camera shows a different story than the ruling indicated. Not baseball.  At least, not *yet*.  After last night, that might change, because this particular game had a bearing on some historical statistics that make baseball much more interesting for a lot of people to follow.

That judgment call by umpire Jim Joyce is now the topic of sports radio call-in shows, newspaper sports sections, and online blogs and articles all across the country today – how he got the call wrong, what the camera showed, whether baseball should allow instant replay to influence the game, even how the call was handled by the pitcher, the umpire, the manager, and soon, the Commissioner of Baseball, who oversees everything in the sport.

How is this important to software testing?

There is a balance in baseball between what the camera sees and what the umpire sees.  In testing, there is a balance between what the tester can test and what the computer can test.

In software, testers use their judgment.  Machines have no judgment other than what they are programmed to do.  They are programmed to execute and record, to render and calculate.

As it happened, about an hour before that game, I was talking with Michael Bolton and Ben Simo online about the term “exploratory test automation.”  I had retweeted Elisabeth Hendrickson’s post about a class she was hosting at Agilistry (called “Exploring Automated Exploratory Testing”).

Bolton, Simo, and I were discussing that title, trying to see if we could come up with something more accurate, because Elisabeth’s title seemed to be a contradiction in terms. How do you automate exploration, when exploration is inherently human judgment and skill, reacting to what we learn in the moment, and automation is not?  We were pretty sure we knew what she meant by the class title, but how best to describe the interaction between machine and human?

It’s important to know that Michael, Ben, my brother, and I are people who believe in the power of language to convey ideas and meaning.  We argue over precision and semantics because they communicate more than just words.  We believe it is important to debate these kinds of things, openly, publicly, because it propels and provokes conversation about meaningful ideas that are meant to help all testers everywhere win more credibility and respect, much in the same way that arguing baseball calls can evolve the sport.

So we traded ideas of how to describe the computer’s role in exploration.  Since it was a public discussion on Twitter, people following that thread could chime in:

Michael Bolton’s idea was to call it “Tool-Supported Exploratory Testing” (yielding the humorously dyslexic acronym TSET)

James wanted to flip the words and call it “Exploratory Automated Testing”

Oliver Erlewein liked “ETX” (and so did I), though he doesn’t yet know what the X could stand for; it’s just cool.

Zeger van Hese suggested “Hybrid Exploratory Testing”

I offered the playful “Bionic Testing” after the Six-Million-Dollar Man.

Alan Page said it could simply be called “exploratory testing” and leave it at that, because whether or not your exploration is computer-assisted, it’s still exploration.  James liked that, and so did I.

But isn’t there a term or a phrase or a word that can more accurately and precisely describe the computer’s role in assisting testing?

Is it automation when you use a tool to help reveal a bug?

Is it automation when a machine executes a programmed test procedure?

Is it automation when you use Task Manager to see the list of processes in memory?

Is it automation when you execute Cucumber or FitNesse (keyword-driven) tests?

What do you call it when you click a button on a test harness and it clicks on the objects on the screen for you and delivers a report at the end of the script?

If it’s all “automation”, doesn’t that imply that it needs no human intervention?

I think we can find a better term.

Everyone can agree that computers help exploration.  Call them “probe droids” or “bots” or “tools” — they inform a human about things that are notoriously hard for humans to know on our own.  They do things that are hard or slow or tedious or expensive or impossible for a human to do.
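As a small illustration (a hypothetical sketch in Python, not something from the original Twitter discussion), here is the kind of “probe” a tool can run on a human’s behalf: the machine does the tedious repetition and arithmetic, and the human explorer judges whether the resulting numbers are interesting enough to chase.

```python
import statistics
import time

def probe_response_times(operation, runs=50):
    """Time an operation many times -- tedious for a human, trivial for a tool.

    Returns summary statistics; a human decides what (if anything)
    the outliers mean.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()                       # the thing being probed
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }

# Hypothetical use: probe some operation under test and report back.
stats = probe_response_times(lambda: sum(range(10_000)))
print(f"mean={stats['mean']:.6f}s  max={stats['max']:.6f}s")
```

The tool reports; it does not judge. Whether a large `max` relative to the `mean` is a bug, a test artifact, or noise is exactly the call the human still has to make.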

But we also know that it’s impossible for software to test itself in all the ways we can test it — just like it’s impossible for a camera to replace umpires at baseball games.  Computers and humans enhance each other.

Today in baseball, there’s a lot of energy and debate because of that game last night.  Galarraga’s near-perfect game may lead to a major change in using replay in baseball games.  The Commissioner of Baseball may even overturn Joyce’s ruling, meaning that the official record books would reflect a perfect game last night in Detroit.

Today in software testing, there’s energy and debate around the word “automation”, especially with more classes like Elisabeth’s and the more we talk about Test-Driven Design and tools on projects.

While baseball debates whether to use instant replay to help decide close plays, I’ll bet you that if they decide to use it, they will not call it “automated baseball.”  We testers *know* we use technology to help us with testing; I just think we can do better than “automated testing”.


11 Responses to “Automated Baseball?”

  1. Lanette Says:

    I call this the cyborg tester idea. Part woman, part machine. The part of automation that sucks the most so far, to me, is the validation. It is really hard to look for everything. If, when we assume, we make an “a-s-s” out of “u” and “me”, then test automation, built on so many assumptions, gives us a donkey farm so large even Lisa Crispin would think it was out of control. So, I like to keep the judgment on the “and nothing else unexpected happened that I didn’t predict” part for myself when possible.

    There is ALWAYS a problem with automated suites with intermittent failures. I mean with every single solitary suite I’ve ever worked on. The computer gets into this state, you go to isolate and it doesn’t repro. That’s because something else happened that we can’t reproduce easily and we don’t know what it is. Was it an update warning trying to fire? Was it a memory state? Was it a race condition with other tests? We try to narrow it down, but those bugs don’t get fixed often from my findings. Often they are a bug in the test automation itself. I wish we always had “instant replay” available.

    In baseball they could use some machine support. It is pure folly not to go with the more accurate information available. Also, if there is a tool that will HELP us, it is pure folly not to use it. However, it is pure folly to not have a balanced test strategy and just use automation! Who is watching the watchers? We need an answer, and “the computer has it all under control” is a bad answer.

    I read another blog that frustrated me. I couldn’t even comment on it; it made me so unhappy. Why is it ONLY progress if it uses all machines? Who could have so little faith in mankind? Progress is being made constantly with the most creative computer we know of, the human mind. We are using minds together in new ways. To see progress only as what we can automate is truly a testing world that I never want to see. That is like baseball with no umpires at all, just one guy reviewing the cameras in a booth. Maybe we should just stay home and play a video game instead of going to the field. Easier to control the parameters for a fair game.

    Before I rant further: I think we can do better than “automated testing” too, not just by improving our code, but by using our minds.

  2. Lanette Says:

    Whoops, the cloud terminology didn’t upset me, paste error. It was the blog, not the cloud glossary. 🙂

  3. Joe Strazzere Says:

    Technically, Major League Baseball already uses replay – just not for all plays. For example, the umpires can use replay for home run calls.

    The dissonance and fan reaction comes about mostly because the fan experience (both at home and in the park) is aided immensely by instant replay, yet the umpires aren’t allowed to take advantage of that same technology.

    It’s almost as if your boss (or some other observer) were allowed to run all the automated tests she chose, but you as the tester were forced to rely solely on manual testing. And then she would be angry at you for not spotting all the bugs she was seeing!

  4. jbtestpilot Says:

    UPDATE: Adam Goucher pointed out something I did not know…

    Instant replay *is* used in baseball (in limited scope) to determine close calls, just not in the context above.

    It is only in these three contexts:

    Home run calls (fair or foul)
    Whether the ball actually left the playing field
    Whether the ball was subject to spectator interference

    For more, see

  5. Aaron Pailthorp Says:

    There are certainly many sorts of automation, although all automation tools tend to get dropped into an imaginary box that is far, far simpler than reality (by people other than the authors and users of the automation). Automation is costly and difficult. Nonetheless, careful application, getting past the why-am-I-doing-this-again tasks, can yield huge results, particularly with exploration filling in the gaps. So I’m voting for TSET because I think it emphasizes the exploration, with the automation not being the point.

    People who take a firm stand with respect to automation are always fun. When push comes to shove, I have never seen a release delayed for lack of automation (most of the time the automation available does not provide needed assurance). I have seen releases delayed for lack of testing, but not lack of automation. Bugs or other issues will also delay releases, and how they got found is not part of why they may or may not cause delay.

    A more common experience is that the automation is still pitching fits, everyone knows that it is broken, so now we’re exploring to see if we can agree that enough diligence has been exercised to allow thoughtful engineers to endorse a release – risk issues have been identified and mitigated.

    I had some fun today with some automation I hope to use in a test someday to validate something in an LDAP server (user directory stuff). I could see full well that I was plugging in all the right stuff to get my search results and be able to complete my validation using a GUI tool to examine the LDAP server, but my automation just couldn’t seem to get the same results. Once I’d gotten past some simple exceptions it was throwing, it still came back with a null set every time. It turned out that the code I’d been given and directed to call wasn’t complete, it was lacking a population loop for the results set. I completed that code too, learned some stuff along the way, and got what I wanted.

    So none of that had much to do with the risk issues I need to focus on for our release. But, but but but, it did inform my opinion of risks, gave me inspiration to think of maybe some more fringe stuff, and kept me busy while I was leaving our devs alone to try and get past a mountain of backlog.

    So in the end (right before release) I’ll use every bit of automation I can muster together to off-load drudgery, and will use my hands and eyes to glue the rest together. And hopefully be in a position to offer good information about risks as we progress. Am I using automated testing? You bet, yessir! Am I an exploratory tester? I think so, who isn’t?

  6. Aaron Pailthorp Says:

    Oh yeah, and w.r.t. TR noting that baseball could use some machine support – I suggest there are some very specific requirements that lead us to the way calls are made (and blown) like the heart breaker that killed a perfect game.

    There is no time clock in baseball (one of the really great things about the game) and as such flow of the game is really really important. So umps using machine support, at least in the context of current technology and instant video replay, would screw up the game big time. The cost is the risk of a bad call.

    We mitigate by using the best and most well-trained umps we can, but the risk remains, and last night’s non-perfect game is a cost of games not being too long.

    Baseball is actually a nearly perfect game. Including the DH.

    The inappropriate use of machine assistance in software engineering can screw things up just as surely as instant replays for plays at the bag would screw up baseball.

  7. Jeremy Says:

    This is a great discussion; thanks, Jon. My thoughts:

    Even with the most state-of-the-art instant replay technology, there will always be some calls in sports that ultimately require objective human judgment to settle (just watch a season of the NFL and you’ll see what I mean). Similarly, I think we can say that even with the most state-of-the-art automation tools, there will always be some bugs in software testing that ultimately require objective human judgment to find or explain.

    And maybe it’s worth dwelling for a moment on the word “objective.” Umpires and referees are paid to make judgments without a stake in the outcome of the game other than its quality (a high number of uncalled fouls and broken rules lessens the quality of a game). Similarly, software testers are paid to make judgments and provide information without a stake in the outcome of the build other than its quality (a high number of undiscovered bugs and usability issues lessens the quality of a product). If a ref has a stake in one team winning or one player succeeding, he will fail his responsibility for objective judgment; if a tester has a stake in early deployment or a high bug rate, she will likewise fail her responsibility for objective judgment.

  8. Jeremy Says:

    More discussion about this on Future Tense:

    What I find interesting is that the sportswriter thinks we could replace umpires completely with technology, while the actual maker of the technology thinks that humans will always be needed.

  9. Automation Lab09 Says:

    Automation tooling is not for beginners (even for expert manual testers); it requires a creative mind to write the scripts and to make them reliable and robust. This is where most testers struggle and start hating automation…

  10. Albert Gareev Says:

    Hi Jon,

    My input to that discussion on Twitter was:
    Does automatic transmission make driving a car automated?

    And the answer is No.

    We could talk about automated operation of a car. But operating a car is not driving.

    In software testing, we use tools. Creating tools that assist in testing is Test Automation: e.g. creating load testing scripts, or functional GUI scripts for a scripted testing approach.
    A scripted testing approach with execution of the scripts by a computer is Automated [Scripted] Testing. A formal, scripted approach sucks, and that’s true of any activity, whether it’s medicine, management, or software testing.

    Now, Test Automation is cool. It’s like growing an IR-seeing eye for yourself, or connecting 10 extra hands. Expanding capabilities, not replacing, is the way.

    Thank you,
    Albert Gareev

    PS. Thanks for participation in testing challenge!

  11. Jonathan Kohl Says:

    I meld both together:
    Computer Assisted Testing:
    Tool-Supported Testing:
