= Final Report =

== Content Assignment ==

=== Program Vision ===

 * What are you trying to do? [''vision'' -- '''''propose assign''''': AMD]
 * Why now? [''vision'' -- '''''propose assign''''': AMD]
 * How is it done today? [''vision'' weak; ''techniques'' DATE paper has some more; maybe adequate with ''proposal'' -- '''''propose assign''''': NPC]
 * Trends? [''roadmap'' + ''vision'' except detail retained below -- '''''propose assign''''': AMD]
   * TODO: connect low-level fault rate to high-level impact (that people will see) ['''TODO''']
     * primarily this is something like: "this fault (variation?) rate will cause this problem in the future"
       * spend X% of time rebooting/recovering?
       * crash after YY minutes of operation?
       * spend Z% on energy overhead?
       * die after Q (too few) days?
     * horror stories (these don't necessarily ground the implications of fault rates for the lay public; they only ground the impact of failures of electronic systems)
       * http://www.philstar.com/Article.aspx?articleId=525112&publicationSubCategoryId=200
       * http://en.wikipedia.org/wiki/Northeast_Blackout_of_2003
       * http://en.wikipedia.org/wiki/Vela_Incident
       * http://csem.engin.umich.edu/muri/MURIreport2004.pdf
 * What can we accomplish? ['''''propose assign''''': HMQ]
   * Build reliable systems from unreliable components [''vision'']
   * Ground goals ['''TODO? -- not really have''']
 * What's new? (Ideas and promising directions) [''vision'' -- '''''propose assign''''': AMD]
 * Why do this? [''vision'' maybe some from ''proposal'' except bullet retained -- '''''propose assign''''': HMQ]
   * Specific big wins (challenges overcome) from focus groups? [''probably falls out of mission impact work...'']
     * feasible to fly commercial? and/or have any access to most advanced technology? (close commercial/aerospace component gap?); allow aerospace to exploit modern electronics?
     * advanced technology safe for drive-by-wire?
     * enable larger (more components, computation) medical devices?
     * enable supercomputers able to solve XXX problems? [''maybe can come from BlueGene example? run more than 4 days?'']
     * reduce energy used by computers...
     * ??? review focus group output and select appropriate items to highlight here ???
 * Why government leadership? [''execsums'' and ''proposal'' have maybe 70% of this]
   * leadership: cross-industry
   * economic
   * safety
   * security
 * Challenge problems and areas of pain ['''''propose assign''''': AMD]
   * Common/cross-cutting challenges [''vision'']
   * By focus group (5) ['''TODO assemble''']
     * Commercial
       * Address growing reliability challenge with small enough overhead to avoid negating benefit of scaling
       * Reduce energy per operation while retaining reliable operation
       * Maintain or extend lifetimes in face of increasing wear effects
       * Economically address demand for components with different reliability needs
       * Navigating complex, multidimensional design space
     * Aerospace
       * System lifetimes >> changes in political and scientific need
       * Navigating complex, multidimensional design space
       * Widening gap between commercial and mil/aero components
       * Design for (uncommon) worst-case environment
       * Bottleneck in testing
       * Focus on part reliability over system reliability
     * Large Scale
       * Overhead required to achieve reliability using current and traditional fault-tolerance approaches is too high.
     * Life Critical
     * Infrastructure
 * Big Science Questions? [tuneup from ''proposal'' -- '''''propose assign''''': AMD]
 * Mission Impacts? [''working out for technical execsum...'' -- '''''propose assign''''': HMQ]
   * security
   * cyberphysical
   * satellite
   * supercomputer
   * green?
   * ???
 * Education ['''TODO''' -- '''''propose assign''''': HMQ]
   * what's missing in curriculum just to deal with where we are today (not educating EE/CS types about resilience)
   * what's needed to go with this / how to revolutionize curriculum
 * Critical Questions? (big risk items? ... more strategic questions?) ['''TODO''' -- '''''propose assign''''': AMD]
   * enable concurrent research in understanding low-level upset and fatigue effects with ever-changing technology, along with high-level mitigation
   * manage developer burden (avoid increasing)
   * must be careful not to add complexity that makes things worse (e.g. pushing more complexity into software without adequate validation that the software will handle it appropriately)
   * how to deal with legacy software (don't want to take this as an absolute mandate that inhibits innovation, but there should be some thinking about how to handle things without a complete rewrite)
   * ??? others
 * Metrics, Goals, Measure and manage programs ['''??? Missing Metrics report ... might be some in DATE metrics paper? probably still a gap''' -- '''''propose assign''''': AMD]
   * Goals of metrics
     * assess whether proposed research is attacking the right problems
     * measure whether research is making progress on solving the problem
   * Some possible, primary metrics
     * Energy/Op at noise rate and performance target (Noise rate: defects, variation, wear, transients)
     * Post-fab adaptability to range of noise rates
     * Timeliness and quality of adaptation
   * Recommendations from metrics group
 * Research Organization and Infrastructure (see discussion starter ResearchOrgInfra) [''carter working on initial draft'' -- '''''propose assign''''': NPC]
 * Examples and Illustrative Scenarios ['''''propose assign''''': NPC]
   * Processor [probably in ''techniques/examples'' DATE paper]
   * SoC ['''TODO''' -- see how much coverage is in ''techniques'' DATE paper]
   * High-level software ['''TODO''']
 * Layer discussion and community contributions: probably good to have a clear description of the layers and call out the wide range of communities that could participate in this research ['''TODO''' -- '''''propose assign''''': AMD]
 * Process and Participants
   * summarize activities of group (meetings, wiki, focus groups...) ['''TODO''' -- '''''propose assign''''': HMQ]
   * [[Participants|comprehensive list of participants]]