Infrastructure Challenges

Nature

Our modern infrastructure is highly computerized. This infrastructure includes our power grid, our building control (heating and cooling, fire suppression, security), and our telecommunications (phone, internet, cable). All of these are directly or indirectly life critical---communications are necessary for emergency response, power is required to run emergency and hospital equipment as well as to keep our environments livable, and much of building control is associated with keeping us safe. Because of the additional capabilities and economic benefits that computation provides, the trend over time has been to increase the computational components of these systems. Automation responds more quickly and consistently than humans and makes our systems run more efficiently. All this means we must provide high reliability for an ever-growing computerized system.

Because of our increasing dependence on computational and communication infrastructure (networking, computing, cloud computing---the combination of the network and the compute servers), outages also have a large, negative economic impact. Many modern workplaces grind to a halt when the network is out, resulting in large costs (e.g., consider the professional salaries of the impacted populace for the period of the outage, or alternatively the lost sales and reputation due to the outage). In cases where the computation controls a larger physical plant (e.g., power, heating, cooling), failure of the computation to provide appropriate control could endanger the controlled plant (e.g., allow a power line to overload or a chemical reaction to proceed out of control).

Infrastructure systems tend to be highly distributed. In many cases, their spatial distribution is essential to the services they provide---we must get power out to a large area, communication is about connecting distant people and machines, and building control must reach into all spaces in a building. This means the computations cannot be centralized in a carefully controlled environment, and the computing components are less physically accessible. It also means that system upgrades do not occur uniformly, so the system as a whole will almost always be composed of many different generations of technology.

Computation in some of these infrastructure roles (e.g. power, heating, cooling) is relatively inexpensive compared to the plant the computation is monitoring or controlling. As a result, this class of system has been able to tolerate larger overhead costs for reliability (e.g. if the computing is only 1% of the cost of the system, duplicating or triplicating it may only increase the system costs by 1--2%).
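The arithmetic behind the 1--2% figure can be sketched as follows. This is only an illustration under a simple assumption: the computing portion is a fixed fraction of total system cost, and each extra copy costs the same as the original. The function name `replication_overhead` and its parameters are hypothetical and do not come from any particular tool.

```python
def replication_overhead(compute_fraction: float, copies: int) -> float:
    """Fractional increase in total system cost when the computing
    portion is replicated.

    compute_fraction: share of the original system cost spent on
                      computing (e.g. 0.01 for 1%).
    copies:           total number of copies after replication
                      (1 = no redundancy, 2 = duplicated, 3 = triplicated).

    Assumes each extra copy costs the same as the original and that the
    rest of the plant is unchanged.
    """
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Only the (copies - 1) additional copies add cost.
    return compute_fraction * (copies - 1)


if __name__ == "__main__":
    for n in (2, 3):
        extra = replication_overhead(0.01, n)
        print(f"{n} copies of a 1% computing share add {extra:.1%} to system cost")
```

Under the same assumption, the overhead grows quickly once computing is a larger share of the system: with a 50% computing share, triplication would double the total cost.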

Availability is a key metric: what fraction of the time is the system down? Short service failures (a few milliseconds of network outage, a few seconds of heating or cooling control) may be tolerable, so infrastructure systems care about both the frequency of upsets and the time to recover. Long outage events must be very infrequent, whereas quick recovery events can occur at higher frequencies.
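The trade-off between upset frequency and recovery time can be made concrete with a small sketch. It assumes that outages do not overlap and that total downtime is small compared to a year, so the downtime fraction is roughly the sum of (events per year) times (recovery time). The event rates and durations in the example are hypothetical, chosen only to illustrate the point.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_fraction(events):
    """Approximate fraction of time the service is down.

    events: iterable of (events_per_year, recovery_seconds) pairs, one
            pair per class of upset.

    Assumes outages do not overlap and total downtime is much shorter
    than a year.
    """
    total_down_seconds = sum(rate * recovery for rate, recovery in events)
    return total_down_seconds / SECONDS_PER_YEAR


def availability(events):
    """Fraction of time the service is up, under the same assumptions."""
    return 1.0 - downtime_fraction(events)


if __name__ == "__main__":
    # Hypothetical numbers: frequent, quickly recovered upsets versus a
    # rare, slow one. Both contribute the same total downtime per year.
    frequent_short = [(1000, 0.010)]  # 1000 upsets/year, 10 ms each
    rare_long = [(1, 10.0)]           # 1 outage/year, 10 s long
    print(downtime_fraction(frequent_short))
    print(downtime_fraction(rare_long))
    print(availability(frequent_short + rare_long))
```

In this model a thousand 10 ms upsets cost the same yearly downtime as a single 10 s outage, which is why quick-recovery events can be tolerated at much higher rates than long ones. The model captures only the downtime fraction; it says nothing about whether an individual outage is short enough to go unnoticed.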

Increasing efficiency demands greater computational control---either to control more things or to find solutions closer to the optimum. This increases computational needs, but not excessively. Many green initiatives to reduce energy consumption rely on more sophisticated computation and monitoring to control energy usage.

Much of the economics of the computing infrastructure (perhaps more so in the context of networking and telecommunications) comes from riding the mainstream technology wave. So, while computing needs in some infrastructural areas (perhaps power and building control) might be satisfied with a freeze at 180nm technology, such a freeze now places a premium cost on maintaining access to older technology---one that will make the electronic parts even more expensive. The coupling and volume benefits between industries remain strong.

Challenges/Infrastructure (last edited 2010-02-08 23:39:25 by AndreDeHon)