Friday, November 18, 2016

FAILURE ANALYSIS


Licensed to Ill
Beastie Boys album cover (1986)

    "I wish to point out that the very physical development of the insect conditions it to be an essentially stupid and unlearning individual, cast in a mold which cannot be modified to any great extent. I also wish to show how these physiological conditions make it into a cheap mass-produced article, of no more individual value than a paper pie plate to be thrown away after it is once used. On the other hand, I wish to show that the human individual, capable of vast learning and study, which may occupy almost half of his life, is physically equipped, as the ant is not, for this capacity."
      — "The Human Use of Human Beings: Cybernetics and Society" (1950) by Norbert Wiener

I have some remarks about failure, and how we relate to and learn from it. Some are abstract, philosophical, general observations, others are concrete, personal, specific life lessons, so that's how I've sorted them out.

ABSTRACT, PHILOSOPHICAL, GENERAL OBSERVATIONS

Norbert Wiener, quoted above, was a mathematical genius who coined the word cybernetics for the science of control and communication. He explained that a cybernetic system used measurements of the amount of error to determine how to take corrective action, making error a vital part of the process.


diagram of centrifugal "flyball" governor
( wiki.eanswers.com/en/Centrifugal_governor )

A classic example is the centrifugal "flyball" governor on early steam engines. The twin balls on the spinning shaft are flung farther from the axis as the steam engine speeds up, and the difference (or error) between the desired and actual speed triggers a lever to turn the throttle up or down. The measured error is vital to the mechanism's operation.
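
For those who like to see an idea in code, here is a minimal Python sketch of the same error-driven feedback loop. The toy engine model and the numbers are my own invention, not the governor's real physics, but the principle is Wiener's: measure the error, and let the error drive the correction.

    def simulate_governor(target_rpm=1000.0, gain=0.0005, steps=12):
        throttle = 0.0                                # throttle opening, 0..1
        rpm = 0.0                                     # current engine speed
        for step in range(steps):
            error = target_rpm - rpm                  # measure the error
            throttle += gain * error                  # nudge the throttle up or down
            throttle = max(0.0, min(1.0, throttle))   # respect physical limits
            rpm = 1500.0 * throttle                   # toy engine response
            print(f"step {step:2d}  throttle {throttle:5.3f}  rpm {rpm:7.1f}")

    if __name__ == "__main__":
        simulate_governor()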

Or, as English poet William Blake said,

    "To be an error and to be cast out is a part of God's design."

      — "Complete Poetry and Prose of William Blake" (1850)

The School of Hard Knocks

What Norbert Wiener was arguing in the quote above, from his book written for the general reader, is that humans should be saved for the tasks which require a certain amount of trial-and-error learning, since we are so good at that, while the tasks requiring rote learning and canned responses should be given to the machines to do for us. In other words, as automation increases it will be our job to make mistakes and learn from them, and what we have learned will be passed on to machines that rarely make mistakes or learn from them.

Another 20th century technologist who talked about the importance of mistakes was R. Buckminster "Bucky" Fuller. In his 1968 book for the general reader, "Operating Manual For Spaceship Earth," Fuller teases with the title, and then reveals that there is no operating manual, and he certainly didn't write one.

    "Now there is one outstandingly important fact regarding Spaceship Earth, and that is that no instruction book came with it. I think it's very significant that there is no instruction book for successfully operating our ship. In view of the infinite attention to all other details displayed by our ship, it must be taken as deliberate and purposeful that an instruction book was omitted. Lack of instruction has forced us to find that there are two kinds of berries - red berries that will kill us and red berries that will nourish us. And we had to find out ways of telling which - was - which red berry before we ate it or otherwise we would die. So we were forced, because of a lack of an instruction book, to use our intellect, which is our supreme faculty, to devise scientific experimental procedures and to interpret effectively the significance of the experimental findings."

Fuller's geodesic domes proved to be remarkably strong and efficient structures, and he worked out their principles largely by trial and error, with little formal engineering training.

Many other speakers and authors have praised trial-and-error learning. Self-help guru and life coach Tony Robbins has told the tale of how he decided he wanted to get good at delivering seminars, and so he scheduled as many a day as he could, figuring that the more he did it the better he'd get. In "The E-Myth Revisited: Why Most Small Businesses Don't Work and What to Do About It" (2004), Michael E. Gerber tells a similar tale of wanting to get good at management consulting, and so taking as many jobs as he could get (at a reduced rate) until he gained proficiency. I imagine in each case that the early customers didn't get such a good deal, but ultimately both men became excellent at what they do.

The Second System Effect

It is not just the "trial" that is essential, but the "error" as well. There is ample evidence that we sometimes learn the wrong lessons from too much success.

One of the first books ever written about software project management, "The Mythical Man-Month: Essays on Software Engineering" (1975) by Frederick P. Brooks Jr., introduces the concept of the "second system effect," described this way by Wikipedia:

    ...the tendency of small, elegant, and successful systems to be plagued with feature creep due to inflated expectations.

    The phrase was first used by Fred Brooks... [to describe] the jump from a set of simple operating systems on the IBM 700/7000 series to OS/360 on the 360 series.

Software developers and their clients have been fighting this trend ever since. The problem is more than "feature creep," it is the complacency, arrogance, and even — dare I say it — hubris of engineers and their customers and managers after a project that is perhaps too successful, paving the way for overreach and downfall. It is described in "The Soul of a New Machine" (1981) by Tracy Kidder, in which Data General, a Massachusetts-based minicomputer company, was launching an engineering project for a new flagship computer. CEO Edson de Castro assembled his "A" team and sent them to a new R&D facility in North Carolina, an eleven-hour drive away. Apparently he expected them to fail, or at least get way behind schedule on a bloated, incompatible system (same thing), because he assembled a secret "B" team in a makeshift facility near the Massachusetts headquarters and had them build what he wanted all along, a 32-bit computer compatible with their 16-bit line. It was a job few engineers wanted, but this team didn't suffer from "second system syndrome" and delivered on time with what became the next flagship product. (Coincidentally I worked at DG while all this was happening, but knew nothing about it until I read Kidder's book.)

Another famous failed "second system" was the "Pink" project, A.K.A. "Taligent," a joint project of Apple and IBM intended to create a "Microsoft-killer" operating system for new PCs based on the PowerPC chip (itself a joint effort of Apple, IBM, and Motorola). It did not end well.

Other examples abound. I personally witnessed the rise and fall of a "second system" known as the Stellar GS-1000, over-designed by Prime and Apollo veterans, which ultimately set what was then the second-place record for the most venture capital money lost by a startup ($200 million).

The story has repeated itself with products ranging from the BlackBerry phone by RIM to Windows Vista from Microsoft.

The moral is that "experience" needs to include a mix of successes and failures to be valuable.

The Shame of Failure, the Failure of Shame


cartoon found on a cubicle wall at NASA's Kennedy Space Center
a few months after the space shuttle Challenger explosion

    "Victory has 100 fathers and defeat is an orphan."

      — Count Galeazzo Ciano (1942)

We have a big problem in our civilization, especially where the American "western loner" mythos abounds: as a side-effect of glorifying winning and winners, we heap shame upon losing and losers. As a result people often rewrite their histories to edit out failures. It has long been known among mathematicians that most breakthroughs are made by a mathematician intuitively realizing a theorem must be true and then working backwards to the premises that will provide a proof. But then the paper they write goes in the opposite direction, from premises to theorem, obscuring the discovery process.

In my business travels I happened upon an engineering forensics company in Silicon Valley called Failure Analysis Associates. They analyzed crashes and other accidents to learn from them. (Ironically, their major founder, Stanford professor Alan Tetelman, was killed on September 25, 1978, at the age of forty-four in the PSA Flight 182 air crash over San Diego.)

I was impressed by the quality of their work, but I guessed that their name (which provides the name of this blog entry) would come to be a hindrance. I expected it would be harder to get people to write checks to a "failure" company. Sure enough, they later changed their name to the innocuous "Exponent."

(I realize I'm taking a risk by putting the word failure in this blog entry's title, but I'm also encouraging folks like you to be courageous in your own failure analysis.)

I for one have learned not to trumpet failures, since they spook people so much, but I treasure the lessons from them. As business educator Marshall Thurber once said, "You're either on the winning team or the learning team." It's important to move beyond whatever shame comes with a failure, and to embrace the learning experience, even if the rest of the world doesn't see it that way.

Who Can Draw the Learning Curve?


learning curve illustration from Wikimedia

Google defines "learning curve" as

    the rate of a person's progress in gaining experience or new skills.

    "the latest software packages have a steep learning curve"

When I hear people use the term "steep learning curve" they are typically describing something hard to learn, like climbing a steep hill is hard. But looking at the above graph, the blue curve is the steep one, in which proficiency increased faster, so learning was easier. I have concluded that most people haven't given the concept much thought, and so have no idea what they are saying when they use the term.
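
Here is a toy Python sketch of what the graph is showing; the exponential model and rate constants are invented, not taken from the Wikimedia figure. The steeper "blue" learner reaches proficiency in fewer trials, which is to say the learning was easier, not harder.

    import math

    def proficiency(trial, rate, ceiling=100.0):
        """Proficiency after a number of trials, for a learner with the given rate."""
        return ceiling * (1.0 - math.exp(-rate * trial))

    for trial in range(0, 11, 2):
        red = proficiency(trial, rate=0.15)    # shallow curve: slower gains
        blue = proficiency(trial, rate=0.60)   # steep curve: faster gains
        print(f"trial {trial:2d}:  red {red:5.1f}   blue {blue:5.1f}")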

I feel fortunate to have studied with a professor in college, Dr. Gregory Bateson, who gave the concept a whole lot of thought. He definitely practiced the old alleged Einstein aphorism:

    "Everything should be made as simple as possible, but no simpler."

Though Cambridge-educated in Queen's English, Latin and Greek, he had a focused way of thinking and speaking that managed to bring clarity even to Santa Cruz, California college students. He helped me understand that the so-called "learning curve" was really a relationship between curves, like the red and blue curves above. We would expect the blue curve to happen after the red curve if it was the same learner, since people get better at learning. Bateson called this Deutero-Learning, or Learning II, learning to learn. He first mentioned it in 1942, in a short paper called "Social Planning and the Concept of Deutero-Learning," reprinted in "Steps To an Ecology of Mind: Collected Essays in Anthropology, Psychiatry, Evolution and Epistemology" (1972).

    "It is a commonplace that the experimental subject — whether animal or man, becomes a better subject after repeated experiments. He not only learns to salivate at the appropriate moments, or to recite the appropriate nonsense syllables; he also, in some way, learns to learn. He not only solves the problems set him by the experimenter, where each solving is a piece of simple learning; but, more than this, he becomes more and more skilled in the solving of problems."

He referenced Clark L. Hull's 1940 publication, "Mathematico-Deductive Theory of Rote Learning: a Study in Scientific Methodology."

Bateson revisited this material in 1964 in the paper "The Logical Categories of Learning and Communication," reprinted in "Steps To an Ecology of Mind" (1972), op. cit. He explained how Hull was a pioneer in attempting to quantify psychology and plotted real learning curves along the way.

    "In human rote learning Hull carried out very careful quantitative studies which revealed this phenomenon, and constructed a mathematical model which would simulate or explain the curves of Learning I which he recorded. He also observed a second-order phenomenon which we may call "learning to rote learn" and published the curves for this phenomenon in the Appendix to his book. These curves were separated from the main body of the book because , as he states, his mathematical model (of Rote Learning I) did not cover this aspect of the data."

Note that Hull was attempting to define and validate a simple mathematical model of learning, and the Learning II data didn't fit his model, so he bumped it to an appendix!

Bateson revisited this material because he was studying the causes of schizophrenia and other mental illnesses, and he hit upon the idea that mental illness might be triggered by Learning II that is in error, i.e. that learns the wrong lessons about how to learn. From this I learned that now and then you have to examine your learning-to-learn strategies, review them, and possibly change them; this is a type of process Bateson called Learning III.

Dead Men Tell No Tales

A major challenge of learning from mistakes is that some of them are fatal. Ecologist Ramon Margalef pointed out that predation is usually a lot more educational for the predator than for the prey. Business consultant Stewart Brand cautions against studying shipwreck survivors too much; the stories of the dead would probably be a lot more informative, if they were available. This is part of the genius of the book "The Perfect Storm: A True Story of Men Against the Sea" (1997) by Sebastian Junger. The author tells tales he can't possibly know are true, by making educated guesses and using expert opinions and available facts to paint plausible scenarios.

The effect we are talking about here is called survivorship bias, and there is a very illuminating blog entry about its discovery in David McRaney's blog "You Are Not So Smart." He describes the work of a World War II group of American mathematicians called the Applied Mathematics Panel, and specifically a statistician named Abraham Wald.

    How, the Army Air Force asked, could they improve the odds of a bomber making it home? Military engineers explained to the statistician that they already knew the allied bombers needed more armor, but the ground crews couldn't just cover the planes like tanks, not if they wanted them to take off. The operational commanders asked for help figuring out the best places to add what little protection they could...

    The military looked at the bombers that had returned from enemy territory. They recorded where those planes had taken the most damage. Over and over again, they saw that the bullet holes tended to accumulate along the wings, around the tail gunner, and down the center of the body. Wings. Body. Tail gunner. Considering this information, where would you put the extra armor? Naturally, the commanders wanted to put the thicker protection where they could clearly see the most damage, where the holes clustered. But Wald said no, that would be precisely the wrong decision. Putting the armor there wouldn't improve their chances at all.

    Do you understand why it was a foolish idea? The mistake, which Wald saw instantly, was that the holes showed where the planes were strongest. The holes showed where a bomber could be shot and still survive the flight home, Wald explained. After all, here they were, holes and all. It was the planes that weren't there that needed extra protection, and they had needed it in places that these planes had not. The holes in the surviving planes actually revealed the locations that needed the least additional armor. Look at where the survivors are unharmed, he said, and that's where these bombers are most vulnerable; that's where the planes that didn't make it back were hit.

    Taking survivorship bias into account, Wald went ahead and worked out how much damage each individual part of an airplane could take before it was destroyed — engine, ailerons, pilot, stabilizers, etc. – and then through a tangle of complicated equations he showed the commanders how likely it was that the average plane would get shot in those places in any given bombing run depending on the amount of resistance it faced. Those calculations are still in use today.
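
For the programmers in the audience, here is a little Python sketch of the bias Wald spotted. The section names and probabilities are invented, but the effect is real: if engine hits are the ones that bring a plane down, the holes we get to count on the survivors will under-represent the engine.

    import random

    random.seed(42)
    SECTIONS = ["wings", "fuselage", "tail gunner", "engine"]
    LOSS_CHANCE = {"wings": 0.05, "fuselage": 0.05, "tail gunner": 0.05, "engine": 0.6}

    observed_holes = {s: 0 for s in SECTIONS}
    losses = 0

    for _ in range(10000):                                  # one sortie per plane
        hits = [random.choice(SECTIONS) for _ in range(3)]  # three random hits each
        if any(random.random() < LOSS_CHANCE[h] for h in hits):
            losses += 1                                     # this plane never comes home
            continue
        for h in hits:                                      # we only get to count these
            observed_holes[h] += 1

    print("planes lost:", losses)
    for s in SECTIONS:
        print(f"holes counted on survivors, {s:12s}: {observed_holes[s]}")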

A similar problem in a less fatal context is the error of the missing denominator. It is only natural, for example, to ask how one can get rich, and then look for the answer by interviewing people who got rich. This is the premise of many self-help books, starting with the classic "Think and Grow Rich" (1937) by Napoleon Hill. Though there is much to be learned from these magnates, we don't get the corresponding stories of the people who tried to get rich and failed.

If R people tried to get rich and succeeded, while F tried and failed, the success rate is R/(R+F), which gives us a rough idea of the "odds" of success. Without the number F we can't compute this ratio. I always try to find the missing denominator whenever possible.
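
A tiny Python sketch, with made-up numbers, shows how much the answer depends on the F we usually never see:

    def success_rate(r, f):
        """Fraction of attempts that succeeded: R / (R + F)."""
        return r / (r + f)

    R = 100                      # success stories we get to read about
    for F in (0, 900, 99900):    # strivers who failed and wrote no books
        print(f"F = {F:6d}  ->  success rate {success_rate(R, F):.1%}")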

CONCRETE, PERSONAL, SPECIFIC LIFE LESSONS

    "Most people are skeptical about the wrong things and gullible about the wrong things."

      — Nassim Nicholas Taleb

Okay, enough abstractions. Nassim Nicholas Taleb likes to ask who has "skin in the game" as a way of qualifying potential experts. (The image is of American football and other contact sports, in which a player may literally leave some skin on the playing field.) I've lost some skin — metaphorically speaking — at various points in my career, and gained some valuable lessons.

Get It Before the Second Bounce

One thing I learned while still in school is that the things I remember best from my classes were the questions I got wrong on the tests. I'm not sure why, but they always seem to have been the most memorable lessons.

Once years ago in a job interview I was asked which Java package had the classes for handling sockets, and I couldn't remember. I didn't think it was that terrific a question, since it's trivial information that's easy to look up, but I still failed that particular test, and I will never again forget: the package java.net is the answer.

Recently I was annoyed by a Facebook quiz on how Floridian I am (it's my birth state) — it said I got 90% but didn't tell me what I got wrong! To heck with that! I refused to "share" their stupid quiz. How can I learn anything from it?

Showing Up For Work


Round Up ride

When I was in college I had a part time job as a ride operator at the Santa Cruz Boardwalk. For a while I worked for an old carny named R. T. Carr who had been in the circus, worked as a railroad cop, built circus miniatures as a hobby, and had many amazing tales of life adventures. I liked and admired him and I think he liked me too, but that didn't stop him from firing me for being habitually late. I was in charge of opening the Round Up ride weekends at 11:00 AM. If I took a bus from campus I had to leave at 8:55 AM, and I arrived an hour and forty-five minutes early. If I took the next bus at 10:55 AM I'd be 15 minutes late. If I left by 10:30 and hitchhiked (which I realize now was a mistake) I might get there on time. I usually left by 10:30. This strategy resulted in me being habitually late, which got me fired. (A few months later R. T. passed away and I took the opportunity to re-apply and get my old job back. Sigh.) What I now understand is that I should've gotten used to leaving at 8:55 AM and being an hour and forty-five minutes early, and made the best of it, taking all of the luck factor out of the situation.

The lesson that has stayed with me is that you have to plan to be early in order to make it on time. I'm glad I learned this before I began to work professionally. When I was traveling the western U.S. helping a sales team move supercomputers, I heard of a book called "Showing Up for Work and Other Keys to Business Success" (1988) by Michael & Timothy Mescon. In those pre-Amazon days I had to go to some lengths to obtain a copy, but I just had to have it. Here is a sample:

    A story: once there was a young man who attended school in a large urban university as a part-time student while holding down a full-time job. After nine arduous years of work and study, study and work, he completed his studies. He went to his professor and said, "I'm ready to graduate. I want to be successful. I need your advice."

    The professor looked at him for a moment and asked, "Are you absolutely certain you want to be a success?"

    The student assured the professor that "After nine years of being a second-class citizen, I am committed to the idea of success," and repeated his request for advice.

    "In that case, I will advise you," said the professor. "Show up."

    The student was stunned. "Do you mean to say that after nine years of paying tuition, attending classes, passing exams, and studying, you're saying all I have to do to succeed is to show up?"

    "Well," said the professor, "that's actually the truth only about 70 percent of the time. But if you want to increase your odds, show up on time. And if you want to devastate virtually all competition, show up on time, dressed to play. Chances are, you won't even have to break a sweat."

These days I've been hearing tales of a surge in flaky employees, and the dreaded "no call, no show" which was practically unheard of when I started my career. My ingrained habits of promptness and reliability continue to help me stand out.

The Impossible Is Less Likely

    "can't happen — The traditional program comment for code executed under a condition that should never be true, for example a file size computed as negative. Often, such a condition being true indicates data corruption or a faulty algorithm; it is almost always handled by emitting a fatal error message and terminating or crashing, since there is little else that can be done. This is also often the text emitted if the 'impossible' error actually happens! Although 'can't happen' events are genuinely infrequent in production code, programmers wise enough to check for them habitually are often surprised at how often they are triggered during development and how many headaches checking for them turns out to head off."

      — definition of "can't happen" from hacker-dictionary.com
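
Here is a minimal Python sketch of the habit, borrowing the hacker dictionary's own example of a file size that comes back negative: test for the "impossible" condition anyway, and fail loudly if it ever shows up.

    import os
    import sys

    def file_size(path):
        size = os.stat(path).st_size
        if size < 0:
            # "can't happen" -- a negative size suggests corruption or a bad stat()
            sys.exit(f"can't happen: negative size {size} for {path}")
        return size

    if __name__ == "__main__":
        print(file_size(__file__))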

Painful lessons have taught me habits that I adhere to firmly. Very occasionally I test my rules by breaking them (also known as tempting fate) and I usually regret it. Some rules include:

  • never carry opened containers of fluids in baggage (factory sealed is OK)

  • bring a spare suit of clothes along to customer meetings

  • never take any food or drink into a trade show booth, except water

  • when locking a car door or building door, hold onto the key while you do it

In each case I am working to reduce the possibility of calamity. For example, if it is impossible for me to lock a key inside a door, it happens less often.

Five Scenarios

I've found it's tough to get people to do disaster planning. There are obviously superstitious reasons for this: sometimes people don't want to "jinx" things. There is also a certain amount of denial involved: if you don't plan for it, maybe it can't happen.

In my own household in Southern California, we recognize that power outages, wildfires and earthquakes are our biggest risks from natural disasters, and have planned accordingly. This includes:

  • having a planned meeting place if we are forced from our home and communications are down (actually we have two: one walking distance, and the other outside the wildfire risk area)

  • having a designated contact person out of town (disasters are nearly always local, and it's good to have an unaffected party for everyone to check in with)

  • equipping all of our vehicles with disaster preparedness kits (see the last blog entry, GADGETS FOR THE TRAVELING TECHIE, August 2016, for more info).

But we know plenty of folks — most, actually — who have no plan.

When this happens on an institutional level, there can be bad consequences. One thinker who has made a career out of analyzing these types of problems is Nassim Nicholas Taleb, especially in his book "The Black Swan" (2007). He points out that low-probability, high-impact events are very hard to forecast accurately because they are rare, and give us few examples to analyze statistically.

The best tool I have ever encountered for getting organizations to do this paradoxically important yet non-urgent and possibly unnecessary preparation is a technique called "scenario planning." A set of slides from Maree Conway at Swinburne University of Technology provides a good overview of the approach.

The technique, though not developed by the Global Business Network, was popularized by them.

    Unlike forecasting which extrapolates past and present trends to predict the future, scenario planning is an interactive process for exploring alternative, plausible futures and what those might mean for strategies, policies, and decisions. Scenario planning was first used by the military in World War II and then by Herman Kahn at RAND ("Thinking the Unthinkable") during the Cold War, before being adapted to inform corporate strategy by Pierre Wack and other business strategists at Royal Dutch/Shell in the 1970s. The key principles of scenario planning include thinking from the outside in about the forces in the contextual environment that are driving change, engaging multiple perspectives to identify and interpret those forces, and adopting a long view.

      — Wikipedia article on Global Business Network

    ( en.wikipedia.org/wiki/Global_Business_Network )

A quick-and-dirty description of the process is to describe five future scenarios:

  1. best-case plausible
  2. better than average outcome
  3. typical, expected outcome
  4. worse than average outcome
  5. worst-case plausible

The term "plausible" is to rule out the paranoid and impossible, like Godzilla attacking, and to focus on a range of believable predictions, and then have a plan for each of them. It doesn't seem as macabre as simply "disaster planning," it has a good pedigree, and it prevents future panic when the unexpected (but not completely unexpected) is already in the plan.

Now go forth, fail, learn, succeed and prosper!


Disclaimer

I receive a commission on everything you purchase from Amazon.com after following one of my links, which helps to support my research. It does not affect the price you pay.

