Ground teams struggle to save Mars orbiter from itself
BY STEPHEN CLARK
Posted: November 7, 2009
Although engineers are still weeks from uplinking new command logic to eliminate an unlikely, but potentially fatal, scenario jeopardizing the Mars Reconnaissance Orbiter, the mission's project manager, Jim Erickson, said Friday he is confident the $720 million mission will resume soon.
Engineers are pursuing two paths of analysis to reach a solution to the reset problem.
One group is devising a fix, to be uplinked to the spacecraft, that would ensure the orbiter always knows it is at Mars. Another is investigating the root cause of the events.
In one far-fetched but plausible scenario, MRO could revert to its pre-launch mode and attempt to make a hardline connection with ground controllers more than 100 million miles away.
"If we had the same kind of resets that we've seen four of this year, but you get more severe ones and you get them too close together, you could have the vehicle forget that it's in mapping orbit around Mars and instead think that it's still on the launch pad and only communicate through an umbilical cable, which isn't long enough to get there anymore," Erickson said.
Four possible causes of the resets are being studied, according to Erickson.
The candidates include a momentary glitch removing power from a component in the computer, a problem with reference voltage, radiation, or a grounding issue.
"It's more than likely to be one of those four things or a flavor of them," Erickson said.
According to Erickson, it will be several weeks to a month before MRO is ready to gradually return to normal operations. The spacecraft has not been conducting science observations since its last computer reset Aug. 26.
Since late August, MRO has been in safe mode with its solar array tracking the sun for power and its antenna pointed at Earth to maintain communications.
"It's the safest condition for the spacecraft, so we said just leave it there until we get a better handle on this," said Doug McCuistion, director of NASA's Mars Exploration Program.
In early September, officials said they expected it to take a few weeks to recover MRO and resume operations, but it has now spent more than two months in safe mode.
Erickson said engineers are testing the new algorithms on the ground before uplinking them to MRO.
"We take our jobs of protecting this vehicle really seriously," Erickson said. "It's a really important asset to the American people. When we find something like this, we try to make sure that it can't happen (again)."
The worry is that MRO could experience two computer resets, more severe than any of the glitches so far, on its primary and redundant control strings within one minute of each other.
"The first one has to wipe out all information on the side of the spacecraft it's on now, and cause a side swap to the other side," Erickson said. "And then within a minute, we've got to have the same thing happen, where it wipes out all the information about what mission phase it's in."
It takes about a minute for the second string to repopulate the first string with information on MRO's mission phase.
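The timing condition can be illustrated with a small sketch. The Python below is not MRO flight software. The function name `phase_memory_at_risk`, the 60-second constant (taken from "about a minute"), and the representation of resets as timestamps in seconds are all assumptions made for this example. It treats every reset in the list as one of the severe, memory-wiping kind.

```python
# Illustrative only: not MRO flight software.
# Assumption: repopulating the other string takes 60 s ("about a minute").
REPOPULATION_SECONDS = 60.0


def phase_memory_at_risk(reset_times, window=REPOPULATION_SECONDS):
    """Return True if two consecutive severe resets are less than `window` seconds apart.

    `reset_times` is a hypothetical list of timestamps, in seconds, of resets
    severe enough to wipe one string's mission-phase information.
    """
    ordered = sorted(reset_times)
    # Compare each reset with the one that follows it.
    return any(later - earlier < window
               for earlier, later in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(phase_memory_at_risk([0.0, 30.0]))  # second reset lands inside the window
    print(phase_memory_at_risk([0.0, 65.0]))  # second reset lands after repopulation
```

In this simplified model only the gap between back-to-back severe resets matters, which is why a pair 65 seconds apart is treated as harmless while a pair 30 seconds apart is not.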
"So you could have resets that are 1 minute and 5 seconds apart and it's not a problem," Erickson said.
The computer resets began in February, followed by another anomaly in June and two in August. The increasing frequency of the events concerned NASA managers.
The fix being designed by NASA involves changing data parameters in MRO's computer. When the spacecraft reboots, it searches a table in its nonvolatile memory in the command and data handling unit's computer module interface card, or CMIC, to determine if the mission is in pre-launch, launch, cruise, orbit insertion, or mapping mode.
"In all the places where it's going to look, we have inserted only the possibility to be in mapping," Erickson said.
Engineers are leery of writing to the nonvolatile memory, which is similar to flash, so officials are being cautious to ensure the fix will not cause additional problems.
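The idea behind the fix can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual CMIC table layout, which the article does not describe: the `Phase` enum mirrors the five mission phases named above, while the slot names and the functions `restrict_to_mapping` and `phase_on_reboot` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    PRE_LAUNCH = "pre-launch"
    LAUNCH = "launch"
    CRUISE = "cruise"
    ORBIT_INSERTION = "orbit insertion"
    MAPPING = "mapping"


def restrict_to_mapping(table):
    """Return a copy of `table` in which every entry reads MAPPING."""
    return {slot: Phase.MAPPING for slot in table}


def phase_on_reboot(table, slot):
    """Look up the mission phase a rebooting computer would read from `slot`."""
    return table[slot]


if __name__ == "__main__":
    # Hypothetical table: slot names are invented for this example.
    original = {"default": Phase.PRE_LAUNCH, "last_known": Phase.MAPPING}
    patched = restrict_to_mapping(original)
    for slot in patched:
        print(slot, phase_on_reboot(patched, slot).value)
```

Once every entry holds the same value, it no longer matters which entry the computer consults after a reset: the only phase it can recover is mapping.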
"They've been working hard on the testbed to try to understand the interactions of the software and the CMIC," McCuistion said.
The new data parameter logic will likely be loaded into MRO's computer before officials identify the most likely root cause of the resets. It could take longer for engineers to close out the fault tree.
"They're having these resets due to a problem in the actual command and data handling system," Erickson said. "It's a hardware problem. These resets are happening so fast that they leave virtually no trace of what's causing them to happen."
"It's happening in the nanoseconds to milliseconds range, so there's nothing. Just some indication that the event was 'this trigger happened' and that's it," Erickson said.
In a teleconference with the NASA Advisory Council last month, McCuistion said the problem could stem from a parts aging issue.
Launched in 2005, MRO arrived at Mars in March 2006 and completed its primary phase of science operations in November 2008.
Once the probe is given a clean bill of health, it will restart science observations and play a larger role as a communications relay station for NASA's Spirit and Opportunity rovers. MRO will also be a significant part of the communications plan for the Mars Science Laboratory, or Curiosity, rover when it arrives in 2012.
NASA says MRO has returned more scientific data than all previous Mars missions combined. The orbiter's six instruments include a powerful telescope-like high-resolution camera, atmospheric sensors, an imaging spectrometer and a radar to probe the Martian subsurface.