The Future of Software: the reliability of interaction

If Moore's prediction continues to hold, the speed of the average PC will be approaching 100 gigahertz in 2010, and it will have a hard-disk capacity measured in terabytes. Dial-up modem connections will be replaced by high-speed network connections that, like the power grid today, will reach into every office, home, and hotel room.

So what's the problem? All this sounds good. Processors will be smaller and faster, and there will be a lot more of them, hiding in everything from cars to household appliances. Far more important than this, though, is that all these powerful embedded devices will likely be connected; they will be able to exchange data, and they will be able to interact.

We already have embedded processors in cars, linked by small "car-area" networks. But what if these processors start communicating over a broader network, say with nearby cars? The potential is enormous. In remote areas, the embedded processors could warn the driver and suggest a route to the nearest (open) gas station before the fuel left in the car becomes insufficient to reach it. We can also imagine collision avoidance systems that are programmed to take control of a car when a collision seems imminent. The embedded processors in the cars that are about to collide can negotiate a strategy that avoids the collision, without risking a roll-over.

If this indeed comes true, the reliability of the car you're driving in 2010 will depend on the reliability of the software in the other cars around you. Your airbags might inflate, for instance, if the software in a nearby car mistakenly warns your car that it is about to crash into it.

The significant difference between 2010 and today, then, will be that in 2010 we will have an unimaginable number of powerful computing devices that are all connected, and that can potentially interact. If history is a guide, each of these powerful devices may be controlled by multi-million line programs. Who is going to write this software, and how can we make sure that the resulting systems are reliable? Your car may now fail not because of faulty software in your own car, but because of faulty software in someone else's car that interacts with yours in an unexpected way. Which problems will have to be solved before we can trust these types of systems?

Many believe that today we are in the midst of a 'software crisis.' The first time this was expressed was in the late sixties, when large applications were perhaps 10,000 to 100,000 lines of code. The feeling of operating in crisis mode has never left us, and despite our best wishes for this to be different, the basic methods used in software development have not changed much either.

So how is it that today one can indeed write million-line programs and, at least some of the time, get them to work reliably? A modern telephone switch, for instance, is controlled by multi-million line programs. Undeniably, these systems work, even meeting very strict reliability requirements: the average downtime of a standard switch in the telephone network is less than 3 minutes per year. This is achieved with a good requirements process, a careful system of checks and balances, and perhaps most importantly through good tool support. The quality of the software development tools that programmers use today to develop and test code is vastly better than it was thirty years ago. Would those tools be able to carry us over to 2010? Unfortunately, something more will be needed.
Today's tools support a style of programming that is at odds with the needs of the types of applications that will soon dominate. They provide good support for the development of sequentially executing code, but have only limited support for multi-threading and concurrency. Today it is almost impossible to test or to debug a truly distributed application.

The tools that may soon help to fill this gap already exist, at least in prototype form. At Bell Labs, for instance, we are working on a tool called 'Spin' that is designed to help the programmer systematically and reproducibly find the inevitable bugs in distributed system applications. Spin is formally called a 'model checker.' That is, Spin works with a mathematical model of an application, and contains efficient algorithms for analyzing such models. This type of analysis has long been considered too computationally expensive to be applied broadly in industrial software development. But this is quickly changing.

In a recent project at Bell Labs, code-named 'FeaVer,' we used an automated model extraction technique to generate the Spin models directly from the source code, written in ANSI-C, of a newly designed telephone switch. The Spin models could be analyzed thoroughly to determine which of the call-processing features were implemented correctly and which were not. The model checker automatically finds and reports any system execution that can violate any required software feature. The error executions are reported as feasible interleavings of the executions of all participating concurrent processes in the system that contribute to the fault. The checking process is relatively fast. Within about one hour of clock time, the system can check the call-processing code for compliance with hundreds of complex feature requirements, and report the bugs back to the user.

But who has time to wait even one hour? Our goal is to build an interactive code checking system. We have a few ideas left to try to bring about such performance, but even supposing that they all fail, Moore's law can all by itself melt away the time requirements and reduce hours of computation to seconds in one or two decades: a doubling of processing power every eighteen months amounts to roughly a hundred-fold speedup per decade, which shrinks an hour-long check to under a minute in ten years and to a fraction of a second in twenty. It is a relatively safe prediction, therefore, that within the next decade programmers will gain access to a new breed of smart source code analyzers for both sequential and distributed systems code. These analyzers can be embedded into standard program development environments, and they can work without the programmer being aware of it. These analyzer daemons will be able to warn the programmer virtually instantaneously when subtle bugs get introduced into the code -- the true ideal of a programmer's omniscient helper.
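To make the idea of a model checking run concrete, here is a minimal sketch of the kind of model Spin can analyze, written in Promela, Spin's modeling language. The example is hypothetical and not taken from the FeaVer project: two concurrent processes update a shared counter without any locking, and an assertion states the result we would naively expect.

    byte count = 0;   /* shared counter, updated by both workers */
    byte done  = 0;   /* number of workers that have finished    */

    active [2] proctype Worker() {
        byte tmp;
        tmp = count;       /* read the shared counter                 */
        count = tmp + 1;   /* write it back; the read and the write   */
                           /* are separate steps, so an update can be */
                           /* lost when the two workers interleave    */
        done++
    }

    active proctype Monitor() {
        (done == 2) -> assert(count == 2)   /* can fail: count may be 1 */
    }

Running 'spin -a' on this model generates a verifier that, once compiled and run, reports the assertion violation and writes a trail file; replaying the trail with 'spin -t -p' shows the exact interleaving in which both workers read the counter before either writes it back, leaving it at one. This is, in miniature, the kind of feasible error interleaving that the checks of the call-processing code report, only there the properties are complex feature requirements rather than a simple assertion.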
Dr. Gerard J. Holzmann
Computing Principles Research
Bell Labs, Lucent Technologies
Murray Hill, NJ, USA