Question
How do we organize, manage, and analyze layering for cooperative fault mitigation?
Summary
A cross-layer reliable system must be able to communicate information and coordinate actions across levels. Broadly, this question asks how those levels should interact.
Sub-Questions
- What should new contracts and interfaces look like?
- What information is useful to reflect up the stack?
- What controls on lower levels should be exposed and how?
- What information is useful for higher levels to pass down?
- How do we evaluate and compose techniques across levels?
- How do we engineer and analyze adaptation and repair control loops across layers?
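One way to make these sub-questions concrete is the minimal Python sketch below, which shows one possible shape for a cross-layer interface. Every name in it is hypothetical and illustrative, not an existing API. This includes `FaultReport`, `Layer`, `accept_hint`, `on_fault`, `retry_budget` and the fault kinds. The sketch has three moving parts:

- Information flows up the stack as a fault report.
- A control (a retry budget) is exposed to higher layers.
- Each layer runs a trivial repair loop: it masks a fault if it can and still has budget left, and otherwise reflects the fault up the stack.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class FaultReport:
    """Hypothetical record reflected up the stack when a fault is detected."""
    origin: str                       # layer where the fault was first detected
    kind: str                         # coarse description of the fault
    handled_by: Optional[str] = None  # layer that masked it, or None if unhandled


class Layer:
    """One level of a hypothetical cross-layer stack (hypervisor, OS, runtime, app...)."""

    def __init__(self, name: str, can_repair: Set[str],
                 above: Optional["Layer"] = None, retry_budget: int = 1):
        self.name = name
        self.can_repair = can_repair      # fault kinds this layer knows how to mask
        self.above = above                # next layer up; None at the top of the stack
        self.retry_budget = retry_budget  # local repair attempts left before escalating

    def accept_hint(self, retry_budget: int) -> None:
        """Control exposed to higher layers: how much local repair effort to spend."""
        self.retry_budget = retry_budget

    def on_fault(self, report: FaultReport) -> FaultReport:
        """Mask the fault locally if possible; otherwise reflect it up the stack."""
        if report.kind in self.can_repair and self.retry_budget > 0:
            self.retry_budget -= 1
            report.handled_by = self.name
            return report
        if self.above is not None:
            return self.above.on_fault(report)
        return report  # reached the top unhandled; handled_by stays None


# Example stack (hypothetical fault kinds): application on top of runtime on top of OS.
app = Layer("application", can_repair={"failed_request"})
runtime = Layer("runtime", can_repair={"corrupted_object"}, above=app)
os_layer = Layer("os", can_repair={"correctable_memory_error"}, above=runtime)

# The OS cannot mask this fault, so it is reflected up and the runtime masks it.
report = os_layer.on_fault(FaultReport(origin="os", kind="corrupted_object"))
print(report.handled_by)  # runtime
```

Bounding local repair with a budget set from above is only one possible design. It lets an upper layer trade masking for visibility. For example, the upper layer can call `accept_hint(retry_budget=0)` on the layer below when it would rather see every fault immediately.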
Relevant Scenarios
Workshop Materials
Existing Work
Comments
- We should be careful not to place an increasing burden on the application programmer.
- Software is not a single piece but many layers, and by treating those layers separately we can achieve more than by lumping them together. To enumerate, from the bottom up: the virtual machine monitor/hypervisor, then the operating system, then the C++ libraries/runtime, then application frameworks (J2EE, Ruby on Rails, Python/Perl interpreters), and finally the actual application. In addition, many new applications have multiple tiers with multiple instances (front-end web servers talking to app servers talking to a backend database).
- Some error handling and information gathering can be inserted by the compiler. Even where we do communicate to or from the application, a number of things can be inserted automatically without adding a burden to the programmer.
- Fault model coercion: when exposing errors to higher layers, it may be useful to coerce faults into a small number of equivalence classes. This reduces the number of errors and behaviors that upper layers have to handle. It is one reason distributed systems often prefer fail-stop semantics: they reduce a large class of potential problems to a single kind.
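Fault model coercion can be illustrated with a minimal Python sketch. It assumes a hypothetical set of raw fault kinds and a two-class model (masked vs. fail-stop). The names `FaultClass`, `coerce` and `upper_layer_reaction` are illustrative only. Unknown faults default to fail-stop. That default is sound only if the lower layer actually halts the component rather than merely labeling the fault, and the sketch shows just the classification step.

```python
from enum import Enum


class FaultClass(Enum):
    """Hypothetical equivalence classes exposed to upper layers."""
    MASKED = "masked"        # handled below; upper layers need not react
    FAIL_STOP = "fail_stop"  # component has halted; upper layers fail over


# Hypothetical mapping from raw, low-level fault kinds to the classes above.
_COERCION = {
    "correctable_memory_error": FaultClass.MASKED,
    "retried_disk_read": FaultClass.MASKED,
    "uncorrectable_memory_error": FaultClass.FAIL_STOP,
    "kernel_panic": FaultClass.FAIL_STOP,
}


def coerce(fault_kind: str) -> FaultClass:
    """Map a raw fault onto one class; unknown faults are conservatively fail-stop.

    Assumes the lower layer enforces fail-stop (halts the component) for anything
    it reports as FAIL_STOP; this function only performs the classification.
    """
    return _COERCION.get(fault_kind, FaultClass.FAIL_STOP)


def upper_layer_reaction(fault_kind: str) -> str:
    """The upper layer handles two classes instead of every possible raw fault."""
    if coerce(fault_kind) is FaultClass.MASKED:
        return "continue"
    return "fail over to replica"


for kind in ("correctable_memory_error", "kernel_panic", "unknown_device_error"):
    print(f"{kind} -> {upper_layer_reaction(kind)}")
```

However many raw fault kinds the lower layers distinguish, the upper layer's logic stays a two-way branch, which is the reduction that makes fail-stop attractive.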