If I understand it correctly, the alarm during the first lunar descent was due to the computer being overloaded by too much radar data. That sounds like something they would have thought to simulate, but it seems that everyone was surprised by it. Why didn't they simulate that particular type of failure (computer overflow in general, and radar-caused overflow in particular)?
- You are assuming they didn't have a plan for it; that's not a good assumption. – GdD Dec 23 '20 at 11:50
- Anyone who's tried to bomb-proof their software only to find a problem 3 months later knows the answer here :-( – Carl Witthoft Dec 23 '20 at 12:12
- ISTR they did pretty much that exact failure in a sim. Will look for reference. – Organic Marble Dec 23 '20 at 13:32
- @OrganicMarble Yeah, looks like they ran into it in a sim on July 5, which prompted the Guidance controllers to put together a list of how to handle the various alarms. (Source is Apollo: The Race to the Moon.) Want to put together an answer, or should I? – DylanSp Dec 23 '20 at 13:50
- @DylanSp go for it! – Organic Marble Dec 23 '20 at 13:54
- Stanisław Lem's Pilot Pirx story Ananke is about this case. It may provide fictitious reasons. – Hans-Peter Stricker Dec 24 '20 at 10:39
- The search space is simply too big. – Hakaishin Dec 24 '20 at 17:00
- @CarlWitthoft And had to write software to meet some high-ranking executive's arbitrary deadline. "If Kennedy wants to get to the Moon before the decade is out, he can come down here and debug the cursed thing himself!" -- Probably some Apollo software engineer. – Schwern Dec 25 '20 at 06:05
- @CarlWitthoft Or had someone suggest "we could (manually) test more" as a response to how bugs made it to production. You're telling me you wanted a tester to test (stupidly obscure long chain of circumstances) because we added one field elsewhere on the page that was completely unrelated? (Literally unrelated; we just had an unlucky choice of what data we used to validate with in production, which led to discovering an existing bug that was ~2+ years old.) – user3067860 Dec 28 '20 at 12:39
- You might find this long video interesting: Light Years Ahead | The 1969 Apollo Guidance Computer. – Fred Mar 22 '23 at 23:03
4 Answers
They did simulate the debugging alarms, such as the 1201 and 1202. From Apollo: The Race to the Moon*:
On July 5, just eleven days before the launch, [...] the scenario included one of the computer alarms that [Jay] Honeycutt (one of the simulation supervisors) had discovered. When the alarm went off, the controllers didn't know what to do with it.
The Guidance controllers subsequently put together a list of the different alarms and how they should be handled. When the 1202 and 1201 alarms occurred on Apollo 11, Steve Bales (Guidance controller) and Jack Garman (in the Guidance back room) knew how to handle them: as long as the alarms weren't firing continuously, the descent was still OK to proceed.
As for why the general issue of executive overflow (the guidance computer being overloaded) wasn't more thoroughly tested: as much as the Apollo program simulated and tested, there was still a limit to the number of different scenarios they could test in the time they had. Again from Apollo:
[Identifying and learning how to handle the computer alarms] was a pain in the ass, many of [the Guidance controllers] thought, because there were so many failure modes on a descent that were much more likely to happen.
* Chapter 24, end of section 3
- Simulating real radar data in a realistic way must have been a real challenge, if it was even possible. – GdD Dec 23 '20 at 16:18
As DylanSp's answer notes, the 1201/1202 alarms were simulated, but the details of the computer overload that caused them on the Apollo 11 flight were complex, and were not specifically simulated prior to the mission.
According to Mindell's Digital Apollo:
The trouble was that the rendezvous radar and the rest of the guidance system had different electrical power supplies. They both ran on alternating current of the same frequency, but had different phases (i.e. their alternating sine waves were out of sync). When the change in the [rendezvous radar mode] switch procedure was tested in the lab, technicians connected both to the same power supply, which caused them to run in phase, even though they would be out of phase in the spacecraft...
On Apollo 11, the power supplies on the LM fell into a particularly unfortunate phase angle. Hence the computer and radar were not in sync, causing the angle counters on the rendezvous radar to constantly increment or decrement in response to random electrical noise, sending nearly the maximum rate of data to the computer. The computer struggled to increment or decrement its counters for tracking the radar angles, which used up about 15 percent of its processing time.
More details can be found in Don Eyles' paper titled Tales From The Lunar Module Guidance Computer.
NASA went to great lengths to realistically test and simulate as much as they were able, but a few issues like this did slip through the cracks.
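The effect described in the Mindell quote above can be sketched as a toy processing-time budget. This is only an illustration and not how the guidance computer actually scheduled its work: the function names and the baseline load values are hypothetical, and the only figure taken from the quote is the roughly 15 percent of processing time consumed by the counter updates.

```python
# Toy model only. The ~15% figure comes from the Mindell quote;
# the baseline loads used below are hypothetical illustration values.

def cycle_overruns(baseline_load: float, stolen_fraction: float) -> bool:
    """Return True if the work planned for one cycle no longer fits.

    baseline_load: fraction of a cycle the normal jobs need (hypothetical).
    stolen_fraction: fraction of processing time lost to counter updates.
    """
    available = 1.0 - stolen_fraction
    return baseline_load > available


def backlog_after(cycles: int, baseline_load: float, stolen_fraction: float) -> float:
    """Unfinished work, in units of cycles, that piles up if nothing is shed."""
    deficit = max(0.0, baseline_load - (1.0 - stolen_fraction))
    return cycles * deficit


if __name__ == "__main__":
    # A lightly loaded computer absorbs the loss; a heavily loaded one does not.
    for load in (0.80, 0.90):
        print(load, cycle_overruns(load, 0.15), backlog_after(10, load, 0.15))
```

In this simplified picture, a computer with enough spare margin absorbs the lost time without incident, while one that is already near capacity falls further behind every cycle, which is the kind of overload the alarms reported.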
The lack of phase synchronization between the power supplies of the rendezvous radar and the guidance system wasn't simulated or anticipated because the engineering documentation was in error: it required only frequency locking, not phase synchronization. You don't simulate a problem that isn't defined as a problem :-)
- Do you have a reference to back up your assertions? Specifically, that "the engineering documentation was in error." Also, as stated elsewhere on this site, the problem was found twice independently prior to launch: https://space.stackexchange.com/a/37372/6944 – Organic Marble Mar 22 '23 at 21:31
- Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Community Mar 22 '23 at 21:43
The story of the simulation (the last before Apollo 11) in which Mission Control learned about program alarms is beautifully told in Gene Kranz's "Failure Is Not an Option", Chapter 15. It is the best example I know of the supremacy of preparation and training over more expedient ways of doing things. In the simulation they aborted the mission because of the alarm. That is probably what would have happened on Apollo 11 had the simulation not taken place.