When a program like Microsoft Word or Excel "crashes," it means that something has gone seriously wrong during the program's execution. The operating system often recognizes that there is a serious problem and kills off the offending application in a clean way. When it does this, the operating system will say something cryptic like "fatal exception error" (and often display a large collection of hexadecimal numbers that are totally useless to you, the user, but might be of some use to the original programmer). The other way for a program to crash is for it to take the operating system down with it, meaning that you have to reboot.

Even though there is nothing you can do with the cryptic error messages, it might be nice to at least know what they mean! So let's go through the three most common:

  • Fatal exception error - An application program like Microsoft Word is made up of many layers and components. There is the core operating system, an operating system services layer, perhaps an encapsulation layer on top of the system services, hundreds of software libraries, internal function/class libraries and DLLs, and finally the main application layer. Most modern operating systems and languages (like C++, Java, etc.) support programming concepts known as exceptions and exception handling. Exceptions allow different layers to communicate problems to each other. For example, say that a program needs some memory, so it asks the operating system to reserve a block of memory. If the operating system is unable to honor the memory request (because the requested block is too big, or the system is low on memory, or whatever), it will "throw a memory exception" up to the layer that made the request. Various layers may continue to throw the exception upward. Somewhere along the line, one of the layers needs to "catch the exception" and deal with the problem. The program needs to say, "Wow -- the system is out of memory. I need to tell the user about this with a nice dialog box." If the program fails to catch the exception (because for some reason the programmer never wrote the code to handle that particular exception), the exception makes it all the way to the top of all the layers, and the operating system recognizes it as an "unhandled exception." The operating system then shuts down the program. Well-designed software handles all exceptions.
  • Invalid page fault - A program uses memory (RAM) to store data. For example, when you load a document into Microsoft Word, large parts of the file you are editing take up space in RAM. As the program needs memory, it requests blocks of memory of specific sizes from the operating system. The program remembers the location of each block it allocates using a "pointer." If the program tries to write data beyond the end of a memory block and into memory it does not own, or if the program gets confused and tries to access a non-existent block of memory using an invalid pointer, the processor's memory-protection hardware detects the bad access and the operating system reports an "invalid page fault" or a "segmentation fault." The operating system shuts down the program because the program obviously does not know what it is doing.
  • Illegal operation - A microprocessor has a finite number of instructions it understands, and each instruction is represented by a number known as an "opcode." The opcode 43 might mean "add," the opcode 52 might mean "multiply," etc. If the microprocessor is executing a program and comes to an opcode that it does not recognize or that it cannot execute because of the current state it is in, then the microprocessor stops to complain. The operating system handles this complaint by shutting down the offending program. Illegal opcodes normally come from software jumping to a location in memory that does not contain valid program information.
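
To make the idea of throwing and catching exceptions across layers a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how Word or any real operating system is written: the three functions, the AVAILABLE_BYTES limit, and the message text are all made up for this example. The bottom function plays the part of the operating system, the middle one plays a library that passes problems upward, and the top one plays the application that catches the problem and tells the user.

```python
# Simulated layers. Every name here is hypothetical and exists only
# to illustrate how an exception travels upward through layers.
AVAILABLE_BYTES = 1_000_000  # pretend the "system" has this much free memory


def os_allocate(size: int) -> bytearray:
    """Bottom layer: stands in for the operating system's allocator."""
    if size > AVAILABLE_BYTES:
        # The request cannot be honored, so throw a memory exception upward.
        raise MemoryError(f"cannot reserve {size} bytes")
    return bytearray(size)


def library_load(size: int) -> bytearray:
    """Middle layer: catches nothing, so exceptions pass straight through."""
    return os_allocate(size)


def app_open_document(size: int) -> str:
    """Top layer: catches the exception and reports it to the user."""
    try:
        buffer = library_load(size)
    except MemoryError as err:
        return f"Sorry, the system is out of memory ({err})."
    return f"Loaded {len(buffer)} bytes."
```

Calling app_open_document(2_000_000) returns the friendly message instead of crashing, because the top layer catches the exception. Calling library_load(2_000_000) directly, with no try/except around it, lets the exception reach the top of the program uncaught. Python then prints a traceback and stops the program, which is roughly what an "unhandled exception" looks like in this setting.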

All of these problems are caused by human error on the part of a programmer. The programmer is not diligent enough to catch an exception, or allows the program to access invalid memory. Sometimes, the root cause is incompetence or inexperience, but in many cases it is the complexity of today's programs. There are hundreds of exceptions and often millions of blocks of memory that a program manages in an intricate, layered environment. One false move and the application crashes -- software is very brittle. Testing finds many errors, but usually it does not find them all.