M A R C H  1 9 9 8

The online version of this article appears in three parts. Click here to go to part one. Click here to go to part two.


A "Normal Accident"

PILOTS are safety practitioners, steeped in a can-do attitude toward survival and confident in their own skills. We tend to think that man-made accidents must lie within human control. This idea has been encouraged to some extent by the work of a group of Berkeley professors -- notably the political scientist Todd La Porte -- who study "high-reliability organizations," meaning those with good track records at handling apparently hazardous technologies: aircraft carriers, air-traffic-control centers, certain power companies. They believe that organizations can learn from past mistakes and can tailor themselves to achieve new objectives, and that if the right, albeit difficult, steps are taken, many accidents can be avoided.

Charles Perrow's thinking is more difficult for pilots like me to accept. Perrow came unintentionally to his theory about normal accidents after studying the failings of large organizations. His point is not that some technologies are riskier than others, which is obvious, but that the control and operation of some of the riskiest technologies require organizations so complex that serious failures are virtually guaranteed to occur. Those failures will occasionally combine in unforeseeable ways, and if they induce further failures in an operating environment of tightly interrelated processes, the failures will spin out of control, defeating all interventions. The resulting accidents are inevitable, Perrow asserts, because they emerge from the venture itself. You cannot eliminate one without killing the other.

Perrow's seminal book Normal Accidents: Living With High-Risk Technologies (1984) is an unusual work -- a hodgepodge of storytelling and exhortation, out of which this new way of thinking has risen. His central device is an organizational chart on which to plot the likelihood of serious system accidents. He does not append numerical values to the chart but uses a set of general risk indicators. In one quadrant stand the processes -- like those of most manufacturing -- that are simple, slow, linear, and visible, and in which the operators experience failures as isolated and containable events. In the opposite one stand the opaque and tangled processes characterized by a combination of what Perrow calls "interactive complexity" and "tight coupling." By "interactive complexity" he means not simply that there are many elements involved but that those elements are linked in multiple and often unpredictable ways. The failure of one part -- whether material, psychological, or organizational -- may coincide with the failure of an entirely different part, and this unforeseeable combination will cause the failure of other parts, and so on. If the system is large, the possible combinations of failures are practically infinite. Such unravelings seem to have an intelligence of their own: they expose hidden connections, neutralize redundancies, bypass "firewalls," and exploit chance circumstances that no engineer could have planned for. When the operating system is inherently quick and inflexible (like a chemical process, an automated response to missile attack, or a jet airliner in flight), the cascading failures can accelerate out of control, confounding the human operators and denying them a chance to jury-rig a recovery. That lack of slack is Perrow's tight coupling. Then the only difference between a harmless accident and a human tragedy may be a question, as in chemical plants, of which way the wind blows.

I ran across this thinking by chance, a year before the ValuJet crash, when I picked up a copy of Scott D. Sagan's book The Limits of Safety: Organizations, Accidents, and Nuclear Weapons (1993). Sagan, a Stanford political scientist who is a generation younger than Perrow, is the most persuasive of Perrow's interpreters, and with The Limits of Safety he has solidified system-accident thinking, focusing it more clearly than Perrow was able to. The Limits of Safety starts by placing high-reliability and normal-accident theories in opposition and then tests them against a laboriously researched and previously secret history of failures within U.S. nuclear-weapons programs. The test is a transparent artifice, but it serves to define the two theories. Sagan's obvious bias does not diminish his work.

Strategic nuclear weapons pose an especially difficult problem for system-accident thinking, for two reasons: first, there has never been an accidental nuclear detonation, let alone an accidental nuclear war; and second, if a real possibility of such an apocalyptic failure exists, it threatens the very logic of nuclear deterrence -- the expectation of rational behavior on which we continue to base our arsenals. Once again the pursuit of system accidents leads to uncomfortable ends. Sagan is not a man to advocate disarmament, and he shies away from doing so in his book, observing realistically that nuclear weapons are here to stay. Nonetheless, once he has defined "accidents" as less than nuclear explosions (as false warnings, near launches, and other unanticipated breakdowns in this ultimate "high-reliability" system), Sagan discovers a pattern of accidents, some of which were contained only by chance. The reader is hardly surprised when Sagan concludes that such accidents are inevitable.

The book interested me not because of the accidents themselves but because of their pattern, which seemed strangely familiar. Though the pattern represented possibilities that I as a pilot had categorically rejected, this new perspective required me to face the unpredictable side of my own experience with the sky. I had to admit that some of my friends had died in crazy and unlucky ways, that some flights had gone uncontrollably wrong, and that perhaps not even the pilots were to blame. What is more, I had to admit that no matter how carefully I checked my own airplanes, and how cautiously I flew them, the same could happen to me.

That is where we stand now as a society with ValuJet Flight 592, and it may explain our continuing discomfort with the accident. The ValuJet case represents a nearly perfect system accident. It arose from a process that fits most of Perrow's technical requirements of unpredictability and interactive complexity and some of those of tight coupling. More important, it fits the most basic definitions of an accident caused by the very functioning of the system or industry within which it occurred. Flight 592 burned because of its cargo of oxygen generators, yes, but more fundamentally because of a tangle of confusions that will take some entirely different form next time. It is frustrating to fight such a thing, and wrongdoing is difficult to assign.

ValuJet's Pretend Reality

TAKE, for example, the case of the two SabreTech mechanics who helped to remove the oxygen canisters from the ValuJet MD-80s, ignored the written work orders to install safety caps, stacked the dangerous canisters improperly in cardboard boxes, and finished by falsely signing off on the job. They will probably suffer for the rest of their lives for their negligence, as perhaps they should. But here is what really happened: Nearly 600 people logged time working on the three ValuJet airplanes in SabreTech's Miami hangar, and of them seventy-two logged 910 hours over several weeks for replacing oxygen generators, in most cases because they had "expired" -- reached the end of their approved lives. According to ValuJet work card No. 0069, which was supplied to investigators, the second step of the seven-step removal process was "If generator has not been expended, install shipping cap on firing pin."

This required a gang of hard-pressed mechanics to draw a verbal distinction between canisters that were "expired," meaning most of the ones they were removing, and canisters that were not "expended," meaning many of the same ones, loaded and ready to fire, on which they were expected to put nonexistent caps. Also involved were canisters that were expired and expended, and others that were not expired but were expended. And then, of course, there was the set of new replacement canisters, which were both unexpended and unexpired. If this seems confusing, do not waste your time trying to figure it out -- the SabreTech mechanics did not, nor should they have been expected to. The NTSB suggested that one problem at SabreTech's Miami facility may have been the presence of Spanish-speaking immigrants on the work force, but quite obviously the language problem lay on the other side -- with ValuJet and the English-speaking engineers, literalists, who wrote the orders and technical manuals as if they were writing to themselves. The real problem, in other words, was engineerspeak.

Before the accident the worry was not about old parts but about new ones -- the safe refurbishing of the MD-80s in time to meet the ValuJet deadline. The mechanics quickly removed the oxygen canisters from their brackets and wired green tags to most of them. The green tags meant "repairable," which these canisters were not. It is not clear how many of the seventy-two workers were aware that these canisters couldn't be used again, since the replacement of oxygen generators is a rare operation, though of the people questioned after the accident most claimed to have known at least why the canisters had to be removed. But here, too, there is evidence of confusion. After the accident two tagged canisters were found still lying in the SabreTech hangar. On one of the tags, under "Reason for Removal," someone had written, "out of date." On the other tag someone had written, "generators have been expired fired."

Yes, a mechanic might have found his way past the ValuJet work card and into the huge MD-80 maintenance manual, to chapter 35-22-01, within which line "h" would have instructed him to "store or dispose of oxygen generator." By diligently pursuing his options, the mechanic could have found his way to a different part of the manual and learned that "all serviceable and unserviceable (unexpended) oxygen generators (canisters) are to be stored in an area that ensures that each unit is not exposed to high temperatures or possible damage." By pondering the implications of the parentheses he might have deduced that the "unexpended" canisters were also "unserviceable" canisters and that because he had no shipping cap, he should perhaps take such canisters to a safe area and "initiate" them, according to the procedures described in section 2.D. To initiate an oxygen generator is of course to fire it off, triggering the chemical reaction that produces oxygen and leaves a mildly toxic residue within the canister, which is then classified as hazardous waste. Section 2.D contains the admonition "An expended oxygen generator (canister) contains both barium oxide and asbestos fibers and must be disposed of in accordance with local regulatory compliances and using authorized procedures." No wonder the mechanics stuck the old generators in boxes.

The supervisors and inspectors failed miserably here, though after the accident they proved clever at ducking responsibility. At the least they should have supplied the required safety caps and verified that those caps were being used. If they had -- despite all the other errors that were made -- Flight 592 would not have burned. For larger reasons, too, their failure is an essential part of this story. It represents not the avarice of profit takers but rather something more insidious -- the sort of collective relaxation of technical standards that the Boston College sociologist Diane Vaughan has called "the normalization of deviance," and that she believes existed at NASA in the years leading up to the 1986 explosion of the space shuttle Challenger. The leaking O-rings that caused the catastrophic blow-by of rocket fuel were a well-known design weakness, and had been the subject of worried memos and conferences up to the eve of the launch. Vaughan's book The Challenger Launch Decision (1996) is a 575-page exercise in system-accident thinking. After a long immersion in NASA's technical culture, Vaughan concludes that the O-ring worries were put aside in part because the agency had gotten away with launching the O-rings before. As Perrow has argued, what can go wrong usually goes right -- and then people draw the wrong conclusions. In a general way this is what happened at SabreTech. Some mechanics now claim to have expressed their concerns about the safety caps, but if they did, they were not heard. The operation had grown used to taking shortcuts.

But let us be honest -- mechanics who are too careful will never get the job done. The airline system as it stands today requires people, in flight or on the ground, to compromise, to make choices, and sometimes even to gamble. The SabreTech crews went astray -- but not far astray -- by allowing themselves quite naturally not to worry about discarded parts. A fire hazard? Sure. The mechanics taped off the lanyards and may have shoved the canisters a little farther away from the airplanes they were working on. The canisters had no warnings about heat on them and none of the standard hazardous-materials placards. It probably would not have mattered anyway, because the work area was crowded with placards and officially designated hazardous materials, and people had learned not to take them too seriously. Out of curiosity a few of the mechanics fired off some canisters and listened to the oxygen come out -- it went pssst. No one seems to have considered the possibility that the canisters might accidentally be shipped. The mechanics did finally carry the five cardboard boxes over to the shipping department, but only because that was where ValuJet property was stored -- an arrangement that itself made sense.

When the shipping clerk got to work the next morning, he found the boxes without explanation on the floor of the ValuJet area. The boxes were innocent-looking, and he left them alone until he was told to tidy up. Sending them to Atlanta seemed like the best way to do that. He had shipped off "company material" before without ValuJet's specific approval, and he had heard no complaints. He knew he was dealing with oxygen canisters, but apparently did not understand the difference between oxygen storage tanks and generators designed to fire off. When he prepared the boxes for shipping, he noticed the green "repairable" tags mistakenly placed on the canisters by the mechanics, and misunderstood them to signify "unserviceable" or "out of service," as he variably said after the accident. He also drew the unpredictable conclusion that the canisters were therefore empty. He asked the receiving clerk to fill out a shipping ticket. The receiving clerk did as he was asked, listing the tires and canisters, and put quotation marks around the word "Empty." Later, when asked why, he replied, "No reason. I always put like, when I put my check, I put 'Carlos' in quotations. No reason I put that." The reason was that it was his habit. On the shipping ticket he also put "5 boxes" between quotation marks.

But a day or so later, over by Flight 592, the ValuJet ramp agent who signed for the cargo didn't care about such subtleties. ValuJet was not authorized to carry hazardous cargoes of any sort, and it seems obvious now that a shipping ticket listing tires on wheel assemblies and oxygen canisters (whether or not they were empty) should have aroused the ramp agent's suspicions. No one would have complained had he opened the boxes, or summarily rejected the load. There was no hazardous-materials paperwork associated with it, but he had been formally trained in the recognition of unmarked hazards. His ValuJet station-operations manual specifically warned, "Cargo may be declared under a general description that may have hazards which are not apparent, that the shipper may not be aware of this. You must be conscious of the fact that these items have caused serious incidents, and in fact, endangered the safety of the aircraft and personnel involved." It also said,

Your responsibility in recognizing hazardous materials is dependent on your ability to: 1. Be Alert! 2. Take the time to ask questions! 3. Look for labels! ... Ramp agents should be alert whenever handling luggage or boxes. Any item that might be considered hazardous should be brought to the attention of your supervisor or pilot, and brought to the immediate attention of Flight Control and, if required, the FAA. REMEMBER: SAFETY OF PASSENGERS AND FELLOW EMPLOYEES DEPENDS ON YOU!

It is possible that the ramp agent was lulled by the company-material labels. Would the SabreTech workers ship hazardous cargo without letting him know? His conversation with the copilot, Richard Hazen, about the weight of the load may have lulled him as well. Hazen, too, had been formally trained to spot hazardous materials, and he would have understood better than the ramp agent the dangerous nature of oxygen canisters, but he said nothing. It was a routine moment in a routine day. The morning's pesky electrical problems had perhaps been resolved. The crew was calmly and rationally preparing the airplane for the next flight, a procedure that had always worked for them before. As a result the passengers' last line of defense folded. They were unlucky, and the system killed them.

Giving Up on a Zero-Accident Future

WHAT are we to make of this tangle of circumstance and error? One suspicion is that its causes may lie in the market forces of a deregulated airline industry, and that in order to keep such catastrophes from happening in the future we might need to consider the possibility of re-regulation -- a return to the old system of limited competition, union work forces, higher salaries, and expensive tickets. There are calls now for just that. The improvement in safety would come from slowing things down, and allowing a few anointed airlines the leisure to discover their mistakes and act on them. The effects on society, however, would be costly and anti-egalitarian -- a return to a constricted system that many fewer people could afford to use. Moreover, technical trends would argue against it. Despite the obvious chaos of the business and the apparent frequency of airline accidents, air travel has become safer under deregulation. Reductions in "procedural" and "engineered" accidents have more than compensated for any increase in system accidents -- which in any case must have occurred in the past as well.

The other way to regulate the airline industry is not economic but operational -- detailed governmental oversight of all the technical aspects of flight. This is an approach we have taken since the birth of the airlines, in the 1920s, and it is what we expect of the FAA today. Strictly applied standards are all the more important in a free market, in which unchecked competition would eventually require airlines to cut costs to the point of operating unsafely, until accidents forced them out of business one by one. A company should not overload its airplanes or fly them with worn-out parts, but it also cannot compete effectively against other companies that do. Day to day, airline executives may resent the intrusion of government, but in their more reflective moments they must also realize that they need this regulation in order to survive. The friendship that has grown up between the two sides -- between the regulators and the regulated -- is an expression of this fact, which no amount of self-reform at the FAA can change. When after the ValuJet crash David Hinson, of the FAA, reacted to accusations of cronyism by going to Congress and humbly requesting that his agency's "dual mandate" be eliminated, so that it would no longer be required by law to promote the airlines, he and Congress (which did as he requested) were engaged in a particularly hollow form of political theater.

The FAA's critics had real points to make. The agency had become too worried about the reactions of its allies in the airline industry, and it needed to try harder to enforce existing regulations. Perhaps it needed even to write some new regulations. Like NASA before the Challenger accident, the FAA needed to listen to the opinions and worries of its own lower-level employees. But there are limits to all this, too. When, at a post-crash press conference in Miami, a reporter asked Robert Francis, of the NTSB, "Shouldn't the government protect us against this kind of thing?" the best answer would have been "It cannot, and never will."

The truth helps, because in our frustration with such system accidents we may be tempted to invent solutions that, by adding to the obscurity and complexity of the system, may aggravate just those characteristics that led to the accidents in the first place. This argument for a theoretical point of diminishing safety is a central part of Perrow's thinking, and it seems to be borne out in practice. In his exploration of the North American early-warning system Sagan found that the failures of safety devices and backup systems gave the most dangerous false indications of missile attack -- the kind that could have triggered a response. The radiation accidents at Chernobyl and Three Mile Island were both induced by failures in the safety systems. Remember also that the ValuJet oxygen generators were safety devices, that they were backup systems, and that they were removed from the MD-80s because of regulations limiting their useful lives. This is not an argument against such devices but a reminder that elaboration comes at a price.

Human reactions add to the problem. Administrators can think up impressive chains of command and control, and impose complex double checks and procedures on an operating system, and they can load the structure with redundancies, but on the receiving end there comes a point -- in the privacy of a hangar or a cockpit -- beyond which people rebel. These rebellions are now common throughout the airline business -- and, indeed, throughout society. They result in unpredictable and arbitrary actions, all the more so because in the modern, insecure workplace they remain undeclared. The one thing that always gets done is the required paperwork.

Paperwork is a necessary and inevitable part of the system, but it, too, introduces dangers. The problem is not just the burden that it places on practical operations but also the deception that it breeds. The two unfortunate mechanics who signed off on the nonexistent safety caps just happened to be the slowest to slip away when the supervisors needed signatures. The other mechanics almost certainly would have signed too, as did the inspectors. Their good old-fashioned pencil-whipping is perhaps the most widespread form of Vaughan's "normalization of deviance." The falsification they committed was part of a larger deception -- the creation of an entire pretend reality that includes unworkable chains of command, unlearnable training programs, unreadable manuals, and the fiction of regulations, checks, and controls. Such pretend realities extend even into the most self-consciously progressive large organizations, with their attempts to formalize informality, to deregulate the workplace, to share profits and responsibilities, to respect the integrity and initiative of the individual. The systems work in principle, and usually in practice as well, but the two may have little to do with each other. Paperwork floats free of the ground and obscures the murky workplaces where, in the confusion of real life, system accidents are born.

It would be wrong to conclude that we should join the alarmists in their prophecies of doom. Flying will remain safe, and for conventional reasons, including the admirable reaction we have seen to the ValuJet crash. But it should also be clear that there are structural limits to flight safety, and that any dream of a zero-accident future is probably about as realistic as the old ValuJet promise to put safety first. If that is true, we had better get used to it. Conventional accidents -- those I call procedural or engineered -- will submit to our solutions, but as air travel continues to expand, we can expect capricious system accidents to blossom. Understanding why might keep us from making the system even more complex, and therefore perhaps more dangerous, too.

William Langewiesche is a contributing editor of The Atlantic and the author of Sahara Unveiled (1996). His article in this issue will appear in his book Inside the Sky: A Meditation on Flight, to be published this spring by Pantheon Books.

Illustrations by Philippe Weisbecker

Copyright © 1998 by The Atlantic Monthly Company. All rights reserved.
The Atlantic Monthly; March 1998; The Lessons of ValuJet 592; Volume 281, No. 3; pages 81 - 98.
