Final Report
Content Assignment
Program Vision
What are you trying to do? [vision -- propose assign: AMD]
Why now? [vision -- propose assign: AMD]
How is it done today? [vision weak; techniques DATE paper has some more; maybe adequate with proposal -- propose assign: NPC]
Trends? [roadmap + vision except detail retained below -- propose assign: AMD]
TODO: connect low-level fault rate to high-level impact (effects people will see) [TODO]
- primarily this is something like: "this fault (variation?) rate will cause this problem in the future"
- spend X% of time rebooting/recovering ?
- crash after YY minutes of operation?
- spend Z% on energy overhead?
- die after Q (too few) days?
- horror stories (these don't necessarily ground the implications of fault rates for the lay public; they only ground the impact of failures of electronic systems)
What can we accomplish? [propose assign: HMQ]
Build reliable systems from unreliable components [vision]
Ground goals [TODO? -- do not really have these yet]
What's new? (Ideas and promising directions) [vision -- propose assign: AMD]
Why do this? [vision; maybe some from proposal, except bullet retained below -- propose assign: HMQ]
Specific big wins (challenges overcome) from focus groups? [probably falls out of mission impact work...]
- feasible to fly commercial components? and/or have any access to the most advanced technology? (close the commercial/aerospace component gap?); allow aerospace to exploit modern electronics?
- advanced technology safe for drive-by-wire?
- enable larger (more components, computation) medical devices?
- enable supercomputers able to solve XXX problems? [maybe can come from BlueGene example? run more than 4 days?]
- reduce energy used by computers...
- ??? review focus group output and select appropriate to highlight here ???
Why government leadership? [execsums and proposal have maybe 70% of this]
- leadership: cross-industry
- economic
- safety
- security
Challenge problems and areas of pain [propose assign: AMD]
Common/cross-cutting challenges [vision]
By focus group (5) [TODO assemble]
- Commercial
- Address growing reliability challenge with small enough overhead to avoid negating benefit of scaling
- Reduce energy per operation while retaining reliable operation
- Maintain or extend lifetimes in face of increasing wear effects
- Economically address demand for components with different reliability needs
- Navigating complex, multidimensional design space
- Aerospace
- System lifetimes >> time scale of changes in political and scientific needs
- Navigating complex, multidimensional design space
- Widening gap between commercial and mil/aero components
- Design for (uncommon) worst-case environment
- Bottleneck in testing
- Focus on part reliability over system reliability
- Large Scale
- Overhead required to achieve reliability using current, traditional fault-tolerance approaches is too high
- Life Critical
- Infrastructure
Big Science Questions? [tuneup from proposal -- propose assign: AMD]
Mission Impacts? [working out for technical execsum... -- propose assign: HMQ]
- security
- cyberphysical
- satellite
- supercomputer
- green?
- ???
Education [TODO -- propose assign: HMQ]
- what's missing from the curriculum just to deal with where we are today (EE/CS students are not being educated about resilience)
- what's needed to go with this / how to revolutionize the curriculum
Critical Questions? (big risk items? ... more strategic questions?) [TODO -- propose assign: AMD]
- enable concurrent research on understanding low-level upset and fatigue effects in ever-changing technology, alongside high-level mitigation
- manage developer burden (avoid increasing)
- must be careful not to add complexity that makes things worse (e.g. pushing more complexity into software without adequate validation that the software will handle appropriately)
- how to deal with legacy software (we don't want to treat this as an absolute mandate that inhibits innovation, but there should be some thinking about how to handle it without a complete rewrite)
- ??? others
Metrics, Goals, Measure and manage programs [??? Missing Metrics report ... might be some in DATE metrics paper? probably still a gap -- propose assign: AMD]
- Goals of metrics
- assess whether proposed research is attacking the right problems?
- measure whether research is making progress on solving the problem?
- Some possible, primary metrics
- Energy/Op at noise rate and performance target (noise rate: defects, variation, wear, transients)
- Post-fab adaptability to range of noise rates
- Timeliness and quality of adaptation
- Recommendations from metrics group
Research Organization and Infrastructure (see discussion starter ResearchOrgInfra) [Carter working on initial draft -- propose assign: NPC]
Examples and Illustrative Scenarios [-- propose assign: NPC]
Processor [probably in techniques/examples DATE paper]
SoC [TODO -- see how much coverage is in techniques DATE paper]
High-level software [TODO]
Layer discussion and community contributions: probably good to have a clear description of the layers and to call out the wide range of communities that could participate in this research [TODO -- propose assign: AMD]
- Process and Participants
summarize activities of group (meetings, wiki, focus groups...) [TODO -- propose assign: HMQ]