Board to revamp NASA management organization
BY WILLIAM HARWOOD
STORY WRITTEN FOR CBS NEWS "SPACE PLACE" & USED WITH PERMISSION
Posted: April 24, 2003
In case there were any doubts, the chairman of the Columbia Accident Investigation Board says that finding the root cause of the shuttle disaster is only part of the panel's charter and that lawmakers in Washington have made it clear they expect broad changes in NASA's organizational structure.
And that's exactly what they're going to get.
"You can see the depth and the details that we are exploring (in) these issues of risk, risk management," Harold Gehman told reporters Wednesday after the board's fifth public hearing. "We're going to interview many, many more experts on this subject and we're going to approach this issue probably more broadly than the Challenger report did. But we're also going to approach it with great care because of the 'law of unintended consequences.' We are not experts in this area, but clearly this area can stand some scrutiny."
"I'm just probably a little bit dense," a reporter said, "but it finally sunk in to me today, listening to some of your questions, that the management system-slash-culture, and I'm not sure how you separate those two, really is going to come under your microscope and you really are going to change it, aren't you?"
"That's correct," Gehman replied. "This is a result of my direct liaison with all of the (congressional) oversight committees, both the chairmen and minority members, who want a broader report than just what happened. And that's what they're going to get."
In earlier testimony before the board, a sociologist who spent years researching the organizational flaws that led to the 1986 Challenger disaster said that she, like former astronaut and CAIB member Sally Ride, hears "echoes of Challenger" in the way NASA managers came to accept as normal serious problems in a critical system.
In the case of Challenger, that critical system involved the O-ring seals in the joints of the shuttle's solid-fuel boosters. Flame from the interior of the rocket was never supposed to reach the O-rings, but heat damage was seen so frequently it came to be accepted as a normal occurrence and not a "safety of flight" issue. Challenger was destroyed by an O-ring burn-through.
In the case of Columbia, the critical system was the shuttle's thermal protection system. Foam insulation on the external fuel tank is not supposed to come off in flight. Yet dozens of protective heat-shield tiles on the shuttle's belly are damaged in virtually every flight by bits of foam debris breaking away during ascent. A large foam impact is believed by many to have played a role in the Columbia disaster.
In testimony at an earlier hearing, external tank engineers told the board that a top level requirement calling for no foam shedding at all remains on the books and that project engineers were well aware that a large enough piece of foam debris could pose a catastrophic threat to the shuttle.
A large piece broke off during a flight in October, hitting one of the shuttle's boosters. But foam shedding was not considered a safety of flight issue and while the impact was discussed at a flight readiness review before the next launch, it was not discussed in any detail during Columbia's FRR.
Eighty-one seconds after Columbia took off on Jan. 16, a suitcase-size piece of foam broke off the tank. One second later, it slammed into the leading edge of the shuttle's left wing at 450 mph, disintegrating in a shower of debris. A hurried analysis during the mission concluded the wing might have been damaged, but not to any catastrophic extent. It was not, in other words, a safety of flight issue and requests for close-up satellite imagery to inspect the area were turned down.
"I want to start from the point of view of Sally Ride's now famous statement, (that) she hears echoes of Challenger in Columbia," Diane Vaughan, author of "The Challenger Launch Decision," told the Columbia Accident Investigation Board Wednesday. "The question is what do these echoes mean? When you have problems that persist over time in spite of the change in personnel, it means that something systematic is going on in the organizations where these people work.
"Challenger was not just an O-ring failure, but it was a failure of the organizational system," she said. "What the echoes mean is that the problems that existed at the time of Challenger have not been fixed, despite all the resources and all the insights the presidential commission found, that these problems have still remained.
"And so one of the things that we need to think about is when an organizational system creates problems, the (corrective) strategies to make, the changes have to, in fact, address the causes in the system. If you don't do that, then the problems repeat and I believe that's what happened with Columbia."
Vaughan, a professor at Boston College, described the process by which well-intentioned NASA managers came to accept O-ring damage - or external tank foam shedding - as normal, calling it an "incremental descent into poor judgment" based on the "normalization of deviance."
"This was (a) design from which there were predicted to be no problems with the O-rings, no damage," she said. "An anomaly occurred early in flights of the shuttle and they accepted that anomaly. And then they continued to have anomalies and accepted more and more. This was not just blind acceptance, but they analyzed them thoroughly and on the basis of their engineering analysis and their tests they concluded that it was not a threat to flight safety.
"It's important to understand that this history was the background on which they made decisions on the eve of launch and that was one more step in which they, again, gradually had expanded the bounds of acceptable risk."
So how did this normalization of deviance, this eventual acceptance of damage in a system that was not supposed to experience regular damage, become standard operating procedure?
"It's important to know they were making decisions against a backdrop where problems were expected," Vaughan said. "Because the shuttle was designed to be reusable, they knew it was going to come back from outer space with damage and so there was damage on every mission. Put simply, in an environment like that, to have a problem is itself normal.
"So what to us in hindsight seemed to be clear signals of danger that should have been heeded, that is, the number of flaws in O-ring erosion that had happened prior to Challenger, looked different to them. What we saw as signals of danger they saw as mixed signals.
"They would have a problem flight, it would be followed by a flight in which there was no problem. They would have weak signals, something that in retrospect seemed to us to be a flight stopper to them was interpreted differently at the time. For example, cold, which was a problem with the Challenger flight, was not a clear problem and not a clear cause on an earlier launch.
"Finally, what we saw as signals of danger came to be routine," Vaughan said. "In the year before Challenger, they were having O-ring erosion on seven out of nine flights. At this time, it became a routine signal, not a warning sign."
By implication, NASA's acceptance of foam shedding as a routine event resulted in a mindset that played a role in the agency's post-launch decision-making process. The Boeing analysis of the foam impact was accepted even though it relied on limited test data from impacts by much smaller pieces of debris. The analysis concluded there was no safety of flight issue. Requests for satellite imagery were never made, officials have said, because there was no safety of flight issue.
"What was obvious with Challenger was that on the eve of the launch that the concerns of the engineers were not prioritized," Vaughan said. "It also seems to be the case in the requests for the imagery from Columbia that concerned engineers discovering the foam strike at this point described it as large, there was nothing in their experience like this. It was the size of a Coke cooler. This was unique."
They wanted better imagery, she said, to help determine how extensive the damage might be.
"But somebody up the hierarchy canceled the request," Vaughan said. "The request did not go through proper channels, which points to me the significance of rules and hierarchies over deference to technical expertise."
In both Challenger and Columbia, she said, "following the normal rules and procedures seemed to take precedence. And we know that, in fact, in conditions of uncertainty, people do follow habits and routines.
"However, under these circumstances, where you have something without precedent, it would seem that this would be a time not for hierarchical decision making, but for a more collective, collaborative, what does everybody think, let's open the floodgates and not pull on the usual people but especially, what are the concerns of our engineers?
"And also to let up on the idea that you have to have hard data. Engineering hunches and intuitions are not what you want to launch a mission with. But when you have a problem that occurs that's a crisis and you don't have adequate information ... engineering hunches and intuition ought to be enough to cause concerns."
Vaughan also cited external pressures on the system - policy directives from Washington, budget shortfalls and unrealistic expectations - as factors affecting the way decisions are made. But the focus of her testimony was lessons learned from Challenger and how NASA might develop a better way to track problems and to identify those that pose potential risks before those risks are manifested.
"When you're working in a situation where problems are expected, you have problems every day and people are busy with daily engineering decisions, it becomes very difficult to identify and stay in touch with the big picture," she said. "How do you identify the trend so that people are aware when they're gradually increasing the bounds of acceptable risk? It is certainly true based on what we know about organizations and accidents that this is a risky system. And what we know is the greater the complexity of the organization, the greater the possibility of failure."
In a series of questions, Gehman displayed a strong interest in figuring out how to improve the system without falling victim to what he called the "law of unintended consequences."
"I'm still trying to understand the principles here," he said to Vaughan. "It seems to me that in a very, very large, complex organization like NASA is, with a very, very risky mission, some decisions have to be taken at middle management levels. Not every decision and not every problem can be raised up to the top. And there must be a process by which the level 3 and level 4 (managers), the decisions are taken, minority views are listened to, competent engineers weigh these things and then they take a deep breath and say OK, we've heard you now we're going to move on. Then they report up that they've done their due diligence, you might say.
"I'm struggling to find a model, an organization model in my head, when you've got literally thousands and thousands of these decisions to make that you can keep bumping them up higher in the organization with the expectation that people up higher in the organization are better positioned to make engineering decisions than the engineers. You said yourself hindsight is perfect. We've got to be really careful about hindsight.
"I'm trying to figure out what principles to apply," Gehman repeated. "We as a board are certainly skittish about making organizational changes to a very complex organization for fear of invoking the law of unintended consequences. So I need to understand the principles, I'm trying to figure out a way I can apply your very useful analysis here and apply it to find a way to figure out what the principles are we ought to apply to this case. So the part I'm hung up on right now is how else can you resolve literally thousands of engineering issues except in a hierarchical manner in which some manager, he has 125 of these and he's sorted through them and he reports to his boss that his 125 are under control. I don't know how to do that."
Vaughan offered two observations.
"Somehow or other in the shuttle program there is a process by which when a design doesn't predict an anomaly it can be accepted," she said. "That seems to me to be a critical point, that if this is not supposed to be happening, why are we getting hundreds of debris hits if it wasn't supposed to happen at all?
"It's certainly true that in a program where technical problems are normal, you have to set priorities. But if there is no design flaw predicted, then having a problem should itself be a warning sign, not something that is taken for granted. The idea is to spot little mistakes so that they don't turn into big catastrophes, which means spotting them early on.
"Two things, both of them NASA may be very aware of, is that engineers' concerns need to be dealt with. I can understand the requirement for hard data. But what about the more intuitive kinds of arguments? People feel disempowered because they've got a hunch or intuition and let somebody else handle it because they feel like they're going to be chastised for arguing on the basis of what at NASA is considered subjective information and they're not going to speak up. So there need to be channels to assure that, even giving engineers special powers if that's what's necessary. The other is the idea of giving more clout to the safety people to surface problems."
In the end, she said, "What we find out from this comparison between Columbia and Challenger is that NASA as an organization did not learn from its previous mistakes and it did not properly address all the factors the presidential commission identified.
"They need to reach out and get more information and look at other models as well. Thinking about how you might restructure the post-launch decision making process so that what appears to have happened in Columbia doesn't happen again. How can that be made more efficient? Maybe it needs to look more like the pre-launch decision process. But is there any evidence that NASA has really played with alternative models? My point about organizational structure is as organizations grow and change, you have to change the structures. But don't do it without thinking about what the consequences might be on the ground."
For its part, NASA has expressed no interest in Vaughan's conclusions or expertise. In a particularly insightful - and funny - exchange, board member John Logsdon asked Vaughan if anyone at NASA had ever called her for organizational advice, pointing out that her book is required reading in the Navy's nuclear training program.
"The book did get quite a lot of publicity," she replied. "I heard from many organizations that were concerned with reducing risk and reducing errors and mistakes. The U.S. Forest Service called and I spoke to hot shots and smoke jumpers, I went to a conference the physicians held looking at errors in hospitals, I was called by people working in nuclear regulatory operations, (by) regular businesses where it wasn't risky in the sense that human lives were at cost. Everybody called. My high school boyfriend called. But NASA never called."