
Posted by speckx 1 day ago

How NASA built Artemis II’s fault-tolerant computer (cacm.acm.org)
589 points | 219 comments
PunchyHamster 8 hours ago
I wonder how they made the voted-answer picker fail-resistant.
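One common answer, not necessarily the one Orion uses, is to have no single voter at all: every redundant channel runs its own copy of the voting logic over every channel's output, so a broken voter can only spoil its own channel's vote. The final merge then happens further downstream (in some designs, physically at the actuators). A minimal Python sketch of the idea; `majority`, `channel_votes`, and the integer outputs are hypothetical.

```python
from collections import Counter
from typing import AbstractSet, List, Optional, Sequence

def majority(values: Sequence[Optional[int]]) -> Optional[int]:
    """Value agreed on by a strict majority of live channels, else None.

    None models a fail-silent channel that produced no output.
    """
    live = [v for v in values if v is not None]
    if not live:
        return None
    value, count = Counter(live).most_common(1)[0]
    return value if 2 * count > len(live) else None

def channel_votes(outputs: Sequence[Optional[int]],
                  faulty_voters: AbstractSet[int] = frozenset(),
                  garbage: int = -1) -> List[Optional[int]]:
    """Each channel runs its own copy of the voter over every channel's output.

    A channel listed in `faulty_voters` emits `garbage` instead of voting,
    modelling a broken voter; it can only spoil its own vote.
    """
    return [garbage if i in faulty_voters else majority(outputs)
            for i in range(len(outputs))]

# Channel 2 computed a wrong value AND channel 1's voter is broken.
votes = channel_votes([42, 42, 7, 42], faulty_voters={1})
print(votes)            # [42, -1, 42, 42]
print(majority(votes))  # 42: the healthy voters still outvote the broken one
```

The replication only pushes the question down one level, which is why the last stage is usually something much simpler than a CPU.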
stevepotter 12 hours ago
It would be nice to see some of the software source. I’m super interested, and I think I helped pay for it.
nickpsecurity 20 hours ago
The ARINC scheduler, the RTOS, and the redundancy have been used in safety-critical systems for decades; ARINC dates back to the '90s. Most safety-critical microkernels, like INTEGRITY-178B and LynxOS-178B, came with a layer for that.
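The ARINC scheduling mentioned above is presumably ARINC 653-style time partitioning: a repeating major frame is split into fixed windows, and each partition runs only inside its own windows, so an overrunning partition can't steal CPU time from another. A minimal sketch of looking up which partition owns the CPU at a given time; the partition names and window lengths in `MAJOR_FRAME` are made up for illustration, not Orion's actual schedule.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

# HYPOTHETICAL major frame: (partition, window length in ms), run in order, forever.
MAJOR_FRAME: List[Tuple[str, int]] = [
    ("flight_control", 20),
    ("guidance", 15),
    ("comms", 10),
    ("flight_control", 20),   # a partition may own several windows per frame
    ("health_monitor", 15),
]

def partition_at(schedule: List[Tuple[str, int]], t_ms: int) -> str:
    """Return the partition that owns the CPU at time t_ms (ms since boot)."""
    ends = list(accumulate(length for _, length in schedule))  # window end offsets
    offset = t_ms % ends[-1]          # position within the current major frame
    return schedule[bisect_right(ends, offset)][0]  # first window ending after offset

for t in (0, 20, 44, 79, 100):
    print(t, partition_at(MAJOR_FRAME, t))
```

Because the table is fixed at build time, worst-case timing for each partition can be analysed offline, which is a large part of why certifiers like it.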

Their redundancy architecture is interesting. I'd be curious what innovations went into rad-hard fabrication, too. The Sandia Secure Processor (aka Score) was a neat example of a rad-hard, secure processor.

Their simulation systems might be helpful for others, too. We've seen growing interest in that, from FoundationDB to TigerBeetle.
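The deterministic simulation testing associated with FoundationDB and TigerBeetle routes every source of nondeterminism through one seeded random generator, so a failing run can be replayed exactly from its seed. A toy sketch of that idea applied to a 4-channel voter; `simulate`, the fault model (at most one single-bit flip per step, with probability `flip_prob`), and the stand-in computation are all assumptions. Real harnesses also simulate time, network, and disk.

```python
import random
from collections import Counter
from typing import List, Tuple

def simulate(seed: int, steps: int = 1000, channels: int = 4,
             flip_prob: float = 0.01) -> List[Tuple[int, int, int]]:
    """Run a toy redundant computation with injected bit flips.

    Returns a trace of (step, channel, bit) for every injected fault.
    """
    rng = random.Random(seed)  # the ONLY source of randomness in the run
    trace = []
    for step in range(steps):
        truth = step * 3                      # stand-in for the real computation
        outputs = [truth] * channels
        if rng.random() < flip_prob:          # inject at most one upset per step
            ch, bit = rng.randrange(channels), rng.randrange(32)
            outputs[ch] ^= 1 << bit           # single-bit upset in one channel
            trace.append((step, ch, bit))
        value, count = Counter(outputs).most_common(1)[0]
        assert value == truth and 2 * count > channels, f"vote failed, seed={seed}"
    return trace

# Same seed, same faults, same outcome: any failure can be replayed exactly.
assert simulate(seed=1234) == simulate(seed=1234)
```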

0xblinq 8 hours ago
They should have also built a fault-tolerant toilet.
spaceman123 19 hours ago
Probably the same way they built the fault-tolerant toilet.
jeron 16 hours ago
Ctrl+F "toilet", thank you for already commenting this.
RobRivera 7 hours ago
2 outlooks.

2.

Two.

gambiting 15 hours ago
So, an honest and perhaps slightly stupid question.

The astronauts have actual phones with them (iPhone 17s, I think?) and a regular ThinkPad that they use to upload photos from the cameras. How does all of that equipment work fine with all the cosmic radiation floating about? With the iPhone's CPU in particular, shouldn't random bit flips be causing constant crashes? Or is it simply that these errors happen but nothing detects them, so execution continues unhindered?

EdNutting 14 hours ago
They’re not mission-critical equipment. If they fail, nobody dies.

They’re not radiation hardened, so given enough time, they’d be expected to fail. Rebooting them might clear the issue or it might not (soft vs hard faults).

It's also impossible to predict when a failure would happen, but NASA, ESA, and others have data somewhere that makes them believe the risk is high enough that mission-critical systems need this level of redundancy.

gambiting 14 hours ago
> They’re not mission-critical equipment. If they fail, nobody dies.

Yes, for sure, but that's not my question. It's not "why is this allowed" but "why isn't this causing more visible problems with the iPhones themselves".

Like, do they need constant rebooting? Does this cause any noticeable problems with their operation? Realistically, when would you expect a consumer-grade phone to fail in these conditions?

mrheosuper 5 hours ago
A lot of "space-rated" components come from the consumer space, certified to work in space.

IIRC the helicopter on Mars (Ingenuity) used the same Snapdragon CPU that's in your phone.

Also, bit flips can happen without you knowing. A flip in free RAM, or in a temp file that's no longer needed, won't manifest as an error, but then your system isn't really deterministic anymore, since you're now relying on chance.

EdNutting 13 hours ago
Random bit flips due to radiation are infrequent: the stat is something like one bit flip per megabyte per 40,000 data centre RAM modules per year, i.e. extremely uncommon, but common enough to matter at scale.

Space is a harsher environment, but they’re only up there for about a week. So if there were an incident, it would be more likely to kill the devices, but one isn't very likely to happen in such a short period (while still being more likely than on Earth’s surface).
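The "short trip, low odds" argument is the usual Poisson arithmetic: if upsets arrive independently at an average rate λ, the chance of at least one in time t is 1 − e^(−λt), which grows with exposure time. A sketch with a made-up rate; `RATE_PER_DAY` is purely hypothetical, not a measured figure for phones beyond low Earth orbit.

```python
import math

def p_at_least_one(rate_per_day: float, days: float) -> float:
    """P(at least one upset) = 1 - exp(-rate * t), assuming a Poisson process."""
    return 1.0 - math.exp(-rate_per_day * days)

RATE_PER_DAY = 0.01  # HYPOTHETICAL upset rate for one device, for illustration only

for days in (7, 30, 365):
    print(f"{days:>3} days: {p_at_least_one(RATE_PER_DAY, days):.3f}")
```

Whatever the true rate turns out to be, the same formula shows why a week-long flight sees far fewer upsets than a year in orbit; the measured rate is exactly what flying these devices would help pin down.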

That said, part of the point of them taking these devices up is to find out how well they perform in practice. We just don’t really know how these consumer devices perform in space.

It will be interesting to see the results when they’re published!

SeanAnderson 19 hours ago
The typo in the first sentence of the first paragraph is oddly comforting, since AI wouldn't make such a typo, heh.

The typo in the first sentence of the second paragraph is sad, though. C'mon, proofread a little.

tux 18 hours ago
I think everyone should now make mistakes so we ca distinguish human vs ai.
zeristor 15 hours ago
No doubt this can be optimised for; adversarial training is like that.
ck2 9 hours ago
If I remember correctly, the Space Shuttle had four computers that all did the same processing and a fifth that decided which answer was correct if they didn't all match or some went down.

I can't find a Wikipedia article on it, but the Times had an article in 1981:

https://www.nytimes.com/1981/04/10/us/computers-to-have-the-...

Apparently the fifth was a standby, not the decider.
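The arrangement described above (four primaries in a redundant set, plus a fifth machine on standby rather than acting as judge) can be sketched as a set that votes out dissenting channels and falls back to the standby once no majority remains. This is a loose illustration of the pattern, not the Shuttle's actual redundancy-management logic; `RedundantSet` and its latching fallback rule are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

class RedundantSet:
    """HYPOTHETICAL redundant set of primaries with a standby fallback."""

    def __init__(self, channels: int = 4) -> None:
        self.active: Set[int] = set(range(channels))
        self.on_standby = False

    def step(self, outputs: Dict[int, int], standby_output: int) -> Tuple[int, str]:
        """Vote over active channels; drop dissenters; fall back if no majority."""
        if not self.on_standby:
            live = {ch: outputs[ch] for ch in self.active if ch in outputs}
            if live:
                value, n = Counter(live.values()).most_common(1)[0]
                if 2 * n > len(live):
                    # Keep only channels that agreed; silent or dissenting ones are voted out.
                    self.active = {ch for ch, v in live.items() if v == value}
                    return value, "primary"
            self.on_standby = True  # latch: never switch back automatically
        return standby_output, "standby"

rs = RedundantSet()
print(rs.step({0: 5, 1: 5, 2: 5, 3: 9}, standby_output=5))  # (5, 'primary'); channel 3 dropped
print(rs.active)                                            # {0, 1, 2}
```

With four channels the set survives two successive single failures; once only two disagreeing channels remain, there is no majority to trust, which is where an independent standby earns its keep.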

pbronez 7 hours ago
The Artemis computer handles way more flight functions than Apollo did. What are the practical benefits of that?

This electrify-and-integrate playbook has brought benefits to many industries, usually where better coordination unlocks efficiencies. Sometimes, though, the smarts just add new failure modes and predatory vendor relationships. In space, it's showing up as more modular spacecraft, lower costs, and more mission flexibility. But how is this playing out in manned spacecraft?
