Task group panelists blast space shuttle management
BY WILLIAM HARWOOD
STORY WRITTEN FOR CBS NEWS "SPACE PLACE" & USED WITH PERMISSION
Posted: August 17, 2005
Seven members of an independent review panel today blasted NASA's management of the post-Columbia shuttle program, blaming poor leadership for ongoing, pervasive "cultural" problems and an erosion of engineering rigor that raise questions about the agency's willingness to fly without a thorough understanding of the risks involved.
In an "annex" at the end of the final report of the Return to Flight Task Group, led by former Apollo astronaut Thomas Stafford and former shuttle commander Richard Covey, seven of the 26 panel members wrote a scathing set of personal observations detailing "persistent cultural symptoms we observed throughout the assessment process."
"What our concerns about rigor, risk and requirements point to are a lack of focused, consistent, leadership and management," the panel members wrote. "What we observed, during the return-to-flight effort, was that NASA leadership often did not set the proper tone, establish achievable expectations, or hold people accountable for meeting them. On many occasions, we observed weak understanding of basic program management and systems engineering principles, an abandonment of traditional processes, and a lack of rigor in execution.
"Many of the leaders and managers that we observed did not have a solid foundation in either the theory or practice of these basic principles. ... NASA's early successes are rooted in program management techniques and disciplines that few current managers in the human spaceflight arena have been willing to study. As a result, they lack the crucial ability to accurately evaluate how much or how little risk is associated with their decisions, particularly decisions to sidestep or abbreviate any given procedure or process.
"It is essential that senior managers have previously-demonstrated program management and systems engineering skills and a dedication to well-established, rigorous principles as they apply to complex, geographically and organizationally dispersed programs. More to the point, we remain concerned that NASA senior leadership did not recognize or correct this, and indeed sent contrary signals that the rigor and discipline of a sound program management process was not required."
In the wake of the Feb. 1, 2003, Columbia disaster, the Columbia Accident Investigation Board - CAIB - made 29 recommendations, including 15 that were to be implemented before shuttles returned to flight. Former NASA Administrator Sean O'Keefe appointed an outside panel of experts - the Return to Flight Task Group - to assess NASA's implementation of those 15 recommendations.
Earlier this summer, the task group released its preliminary report, which concluded NASA had failed to fully implement the three most critical recommendations: to eliminate all foam insulation debris from the external fuel tank; to "harden" the shuttle's heat-shield system to resist any impacts that do occur; and to develop reliable tile and wing leading edge repair techniques.
Most observers, including most task group members, said those shortcomings were more the result of a literal interpretation of the accident board's recommendations and a better understanding of the shuttle's vulnerabilities than any lack of effort on NASA's part. Given the narrow focus of their charter, no one on the panel suggested NASA not proceed with launch of the shuttle Discovery on the first post-Columbia mission.
In any case, the group's final report was submitted today, and in an afternoon teleconference, Stafford and Covey both defended NASA, saying the personal observations by the "group of seven" were included in the final report at NASA Administrator Michael Griffin's direct request and that they did not represent the views of the panel as a whole.
"It was a very difficult task," Stafford said of NASA's ongoing recovery from the Columbia disaster. "In the end, they did a very competent job."
Covey would not address specific points made by the group of seven, but he said NASA handled a difficult process as well as could be expected.
"They had to go from an engineering and organizational approach that was focused on flying on a regular basis to one that went into almost a development mode and in some areas, an engineering redesign mode," he said. "Now, when that happens ... there's going to be some hiccups. It's not an easy transition, particularly when much of the design and development capability had long been lost within the program because of decisions that had been made years ago.
"Everybody's going to have a different perspective. I'll use the sausage analogy. If you watch sausage being made, it's not always pretty and some people are going to find it uglier than others. I personally did not find the process, as it played out, unusual. It's easy to look from hindsight and say this could have been done better."
Another panel member who spoke on background agreed that "individual observations should not take on the same weight as the assessments reached by the entire task group because they are, as advertised, individual observations."
"However, they should serve as a heads up to NASA leadership and should merit serious and thorough internal thought and consideration," the panel member said.
The annex to the task group's final report ran 28 pages and included 10 sets of observations by 17 panel members. The observations by the group of seven were by far the most detailed, covering 19-and-a-half pages.
The group was made up of Dan Crippen, former director of the Congressional Budget Office; Charles Daniel, a veteran NASA rocket engineer; Amy Donahue, a safety expert and professor of public policy at the University of Connecticut; Air Force Col. Susan Helms, a former shuttle astronaut who spent six months aboard the international space station; Susan Livingstone, a former undersecretary of the Navy; Rosemary O'Leary, a professor of public administration at Syracuse University; and William Wegner, an expert on nuclear safety programs.
Stafford and Covey said other task group panelists, some with broader experience managing large organizations, reached different conclusions.
Forrest McCartney, a retired Air Force lieutenant general, former director of the Kennedy Space Center and retired launch operations manager for Lockheed Martin, wrote that NASA is "dedicated to accomplishing the work necessary for safely returning to flight. They are to be congratulated on their efforts."
"While some of us might have approached the recovery process in a different way, the end result is what counts," he wrote. "The NASA headquarters leadership and space shuttle program office have done their best to implement the actions they believe will lead to a safe return to flight."
But the task group panelist who spoke on background said "NASA should pay close attention to these observations and either correct deficient processes and practices noted or to assure themselves that the observations do not represent the whole story."
That said, the group of seven's observations were stated in unusually direct language.
Citing "the enduring themes of dysfunctional organizational behavior," the group said a lack of personal accountability was pervasive in the shuttle program, "from the failure to establish responsibility for the loss of Columbia up to and including a failure to require an adequate risk assessment of (the shuttle Discovery's recent) flight."
"If no one, or no part of the organization, is held accountable for failing to meet those expectations, performance becomes simply a case of 'best effort' - a term that became common during many return-to-flight discussions.
"A general attitude within the space shuttle program seems to be that best-effort is a satisfactory substitute for meeting specific technical requirements; often requirements were not even documented to avoid the chance they could not be met. However, best-effort is a very poor substitute for a thorough understanding of the technical situation. Parts of the agency seem to have forgone their traditional engineering rigor in favor of 'when you have done your best effort, you are good to go.' This is not an appropriate philosophy for a high-performance organization that routinely puts the lives of its employees into high-risk situations."
While admitting the difficulty of achieving objectivity based on hindsight, "it appears to us that lessons that should have been learned (from the Challenger and Columbia disasters) have not been," the panel members wrote. "Perhaps we expected or hoped for too much. ... We expected up-front standards of validation, verification and certification. We expected rigorous and integrated risk management processes. We expected involved and insightful leadership from NASA headquarters. We were, overall, disappointed.
"There certainly are capable leaders to be found in the space shuttle program and throughout NASA. In our view, though, the return-to-flight effort, when taken as a whole, was not effectively led or managed. The absence of accountability, of having managers dedicated to program management processes, and of managers being assigned to programs only after demonstrating these skills are what we believe to be the causes of the surface-level symptoms we saw so often. In particular, leadership and managerial failures to set expectations and requirements and a failure to hold people accountable.
"These promoted a lack of engineering rigor, discipline and integrated risk assessment. Ultimately, this cost the program significant time and money while producing, in some areas, suspect, disappointing and/or inadequate results. Learning the lessons of these failures is important to NASA's future."
The group of seven focused on four broad areas: rigor, risk, requirements and leadership.
In this context, they wrote, rigor referred to an organization's use of and adherence to established standards and practices. According to Crippen and his co-authors, NASA's return-to-flight activities "often demonstrated a lack of standard processes, and, in some cases, simply a lack of any process at all."
"Once the Agency is on record as committed to a specific achievement, it becomes unpalatable to back off of that target for fear of appearing to fail," they wrote. "Instead, the adjustment of performance standards to allow a 'best-effort' provides the appearance that the goal has been met, but without the rigor and discipline necessary to do so safely or completely. Before making commitments to specific achievements, NASA should fully consider how much progress is feasible, and motivate public and private expectations accordingly. When achievements are mandatory at first but become 'goals' when the going gets tough, it sends a strong message to everyone that nothing is mandatory."
In hindsight, the panel members wrote, then-Administrator O'Keefe erred when he said the agency would implement the CAIB recommendations sight unseen. In doing so, NASA "short-circuited a more traditional and rigorous process."
"In our view, NASA leadership should not have foregone their traditional process of conducting detailed assessments of proposed changes," the panel members wrote. "In addition, before committing to a short-term launch date - that ultimately drove any number of important implementation decisions - NASA should have conducted detailed engineering assessments of the CAIB recommendations, traded them against other risk mitigation efforts, developed a clear understanding of the physics of foam loss, and devoted serious consideration of alternatives to 'fix the foam'; e.g., orbiter hardening or a redesigned external tank. This would have allowed the program to determine how long a stand-down was necessary to implement a reasonable set of requirements to reduce the risk of flying the vehicle.
"As we reviewed the return-to-flight effort, it was apparent that there were numerous instances when an opportunity was missed to implement the best solution because of this false schedule pressure. As early as September 2003 the (task group) was told that specific technical activities were not being performed because they could not meet the schedule. Too often we heard the lament: 'If only we'd known we were down for two years we would have approached this very differently...'"
Another lack of rigor cited by the panel - one that also was cited by the CAIB - is the widespread use of PowerPoint presentations in lieu of actual engineering data and analyses.
"Several members of the Task Group noted, as had CAIB before them, that many of the engineering packages brought before formal control boards were documented only in PowerPoint presentations," the panel members wrote. "In some instances, requirements are defined in presentations, approved with a cover letter and never transferred to formal documentation. Similarly, in many instances when data was requested by the Task Group, a PowerPoint presentation would be delivered without supporting engineering documentation. It appears that many young engineers do not understand the need for, or know how to prepare, formal engineering documents such as reports, white papers, or analyses."
Another factor affecting the rigor of NASA's engineering processes is lax leadership, Crippen and his co-authors concluded. During a February design certification review, "a senior program manager commented that, 'It is no longer an important question as to whether or not any given item is certified. Some things won't be certified ... Items don't have to be certified to fly, and we can even get waivers for the safety cert if need be.' It was astounding that there was no rebuttal to this statement, even though the individual was not the most senior person at the table."
"This mocking of rigor sends a message to junior staff that it is acceptable to modify or avoid established processes," Crippen's team wrote. "As a result, both organizational and individual accountability fell by the wayside. Senior leadership should not trivialize established processes since their attitudes can be infectious, either to the benefit or detriment of the space shuttle program and the agency."
The panelists were especially critical of the way NASA manages and assesses risk, saying "we do not believe the risk management processes in place within the space shuttle program are significantly robust."
"We note that NASA managers also tend to confuse the exhaustive and laudable Integrated Hazard Report system with integrated risk management. The space shuttle program has executed a thorough review of all Integrated Hazard Reports on its own initiative and at a considerable cost in hours and funds. As commendable as this effort has been, the review of thousands of Integrated Hazards does not constitute, nor should it be a substitute for, a comprehensive integrated risk management approach.
"Throughout the return-to-flight effort, there has been a reluctance to appropriately characterize the risks inherent in the space shuttle program. As an example, it has proven irresistible for some officials to characterize the modified external tank as 'safer,' the 'safest ever,' or even 'fixed,' when neither the baseline of the 'old' tanks nor the quantitative improvement of the 'new' design has been established. The tank may well be safer, but without adequate risk assessment based on objective evidence it is impossible to know."
In the area of requirements, the panel members raised the issue of waivers, a formal procedure that can allow a given system or component to fly even if it does not meet design specifications. NASA was criticized in the wake of the Challenger and Columbia mishaps for signing too many waivers instead of fixing the underlying problem. Today, Crippen's group charged, NASA gets around waivers by changing the terminology.
"The space shuttle program has been repeatedly cited for having too many waivers, and has become reluctant to add additional waivers, choosing instead to 'beat' the system by using other means," the panel members wrote.
In February, with numerous "open items" still unresolved following a design certification review, "the ET project announced ... that it would document them in a 'Verification Limitations Document.' While it is laudable that the project at least captured the deficiencies in the certification (unlike some others), the stated rationale for this approach was that the Verification Limitations Document would negate the need for any waivers. This, in effect, clouds the number of requirements that are not being met and diminishes the certification of the external tank."