It’s important that database data be retrievable in a reasonable amount of time. This is particularly an issue for very large databases. The speed with which information is retrieved from a database depends on a number of factors. The size of the database and the speed of the hardware it is running on are obvious factors. Perhaps most critical, however, is the method used to access table data, which depends on the way the data is structured on the storage medium.
How quickly a system can retrieve desired information depends on the speed of the device that stores it. Different storage devices have a wide range of speeds, spanning many orders of magnitude. For fast retrievals, the information you want should reside on the fastest devices. Because it is difficult to predict which data items will be needed next, you can’t always make sure the data you are going to want next will be contained in the fastest storage device. Some storage allocation algorithms are nonetheless quite effective at making such predictions.
There is a hierarchy of storage types, ranging from the fastest to the slowest. In general, the faster a storage device is, the smaller its capacity. As a consequence, it is generally not possible to hold a large database entirely in the fastest available storage. The next best thing is to store that subset of the database that is most likely to be needed soon in the faster memory. If this is done properly, the overall performance of the system will be almost as fast as if the entire memory was as fast as the fastest component of it. Here are the components of a typical memory system, starting with the fastest part:
* Registers: The registers in a computer system are the fastest form of storage. They are integrated into the processor chip, which means they are implemented with the fastest technology and the delay for transfers between the processing unit and the registers is minimal. It is not feasible to store any portion of a database in the registers, which are limited in number and in size. Instead, registers hold the operands that the processor is currently working on.
* L1 cache: Level 1 cache is typically also located in the processor chip, but is not as intimately integrated with the processor as are the registers. It is the fastest form of storage that can store a significant fraction of a database.
* L2 cache: Level 2 cache is generally located on a separate chip from the processor. It has greater capacity and is usually somewhat slower than the L1 cache.
* Hard disk: Hard disk storage has orders of magnitude more capacity than does cache and is orders of magnitude slower. This is where databases are stored. Registers, L1 cache, and L2 cache are all volatile forms of memory. The data is lost when power is removed. Hard disk storage, on the other hand, is non-volatile. The data is retained even when the system is turned off. Because hard disk systems can hold a large database and retain it when power is off or interrupted, such systems are the normal home of all databases.
* Offline storage: It is not necessary to have immediate access to databases that are not in active use. They can be retained on storage media that are slower than hard drives. A sequential storage medium such as magnetic tape is fine for such use. Data access is exceedingly slow, but acceptable for data that is rarely if ever needed. Huge quantities of data can be stored on tape. Tape is the ideal home for archives of obsolete data that nevertheless need to be retained against the day when they might be called upon again.
Business users who just want to retrieve the information they need from a database need not concern themselves with how the DBMS does it. However, it is good to know what is going on “under the covers” to return to you the results you want in a timely fashion.
To receive my free mini-booklet on protecting yourself from SQL injection attacks, or to comment on this article, please fill in the form below. Thanks!