Summary of Third Meeting (Oct 29--30, 2009, Austin, TX)

Goals

This final study meeting had two goals: understanding the constituency groups that had not presented at previous meetings, and crafting a plan for how the results of the study would be written up and presented to the CCC and funding agencies. Presentations from the life-critical systems and infrastructure working groups outlined the key issues facing their communities and some overlaps with other communities. Program managers from the NRL, NSF, and DARPA attended the meeting and provided feedback on how the study's results could be made most useful to them. In particular, a number of participants suggested that the study group propose a multi-agency program to fund cross-layer resilience research, and much of the later discussion focused on ways to pursue this suggestion.

Tuning up Story

We started the workshop, as we have the other two workshops, by telling the cross-layer visioning story for the series of workshops. Presenting the 10-20 slide story allows us to provide a basis for first-time attendees and to refine the story we'll be telling funding agencies. As always, this presentation starts the discussion on reliability challenges and cross-layer reliability approaches. The audience provided a number of suggestions:

Framing for public: Immuno-Logic

One of the breakout sessions focused on ways to sell this type of project to lay people. The discussion had two objectives: finding slogans that non-technical people would immediately jive to [HMQ: yes, I know I need a better word here.] and working out how to sell the story. The clear winner among the slogan concepts was "Immunologic." Everyone felt that the immune system was a good analogy for cross-layer reliability, as both the human immune system and cross-layer reliability are multi-layer defense systems. Furthermore, lay people have a basic understanding of the immune system and understand how detrimental diseases that directly impair it, such as leukemia and AIDS, are. Most people also understand that the human immune system is both innate and adaptive, two properties that we want computing systems to have. Finally, there is a synergy between the human immune system and cross-layer reliability due to the physical nature of both systems. [HMQ: I wasn't certain what to do with the physical sub-bullet.]

There was also further discussion of how to sell the cross-layer reliability story. Many of the points raised concerned protecting US-based jobs and protecting us. The technology industry has for several years felt pressure to outsource technical work to China and India. Many companies feel that outsourcing technical work to these countries is necessary to remain price-competitive in the global technology economy. This can be seen in both the increase in off-shore fabrication of silicon and the increase in off-shore electronics companies. Many people felt that increasing the reliability of US-based computing systems would help create value in US-built computing systems, increase the competitiveness of US-based companies, and increase jobs in the US technology market.

There is also a very strong story to be told in how our computing systems protect us. As stated in later sections, the cost of reliability failures in automobiles, implantable medical devices, and the energy infrastructure can be quite high. Reliability failures in these arenas can be costly both in human lives and economically. Even fairly trivial reliability failures in implantable medical devices can lead to surgery to have the device explanted and replaced with a new one. In 2003 a cascading failure in the Ohio power infrastructure ended up affecting the entire northeastern US and Canada, which left 55 million people without power, played a role in 11 fatalities, and cost $6B [http://en.wikipedia.org/wiki/Northeast_Blackout_of_2003, http://www.scientificamerican.com/article.cfm?id=2003-blackout-five-years-later]. Finally, we rely heavily on computational support for persistent surveillance for treaty monitoring of both the Comprehensive Test Ban Treaty and environmental treaties, as well as warfighter support for the wars in Iraq and Afghanistan. Reliability failures in these arenas can cause fatalities on the battlefield, lead to bad policy decisions, and create confusion in the geo-political arena [http://en.wikipedia.org/wiki/South_Atlantic_Flash].

Government/Strategy

[HMQ: My notes are weak on this topic and Andre's notes a little cryptic, so I could use Andre's help on this one.]

Education

There was a spirited discussion on education, as many people felt that resilience is not currently being taught in the EE and CS curricula. One participant pointed out that system reliability is taught as a discipline to mechanical and civil engineers, so there is a precedent for teaching these types of ideas to engineers. Many people also pointed out that we need to start thinking about how to teach system reliability to computer engineers, including how to work on the K-12 pipeline with robotics and CubeSat projects and competitions. Several people also felt that there could be competitions tied to conferences, as the branch predictor competition was tied to HPCA [HMQ: ?]. We have also added a new wiki page to the relxlayer website to foster continued discussion regarding educational opportunities [http://www.relxlayer.org/Education].

Research Organization

Life Critical

Infrastructure

Roadmap

Metrics

Addressing Challenges? (orphaned point)

Next Steps / Our path forward