How a DBMS Organizes Database Data for Speedy Retrieval

Computer Storage

It’s important that database data be retrievable in a reasonable amount of time. This is particularly an issue for very large databases. The speed with which information is retrieved from a database depends on a number of factors. The size of the database and the speed of the hardware it is running on are obvious factors. Perhaps most critical, however, is the method used to access table data, which depends on the way the data is structured on the storage medium.

How quickly a system can retrieve desired information depends on the speed of the device that stores it. Different storage devices have a wide range of speeds, spanning many orders of magnitude. For fast retrievals, the information you want should reside on the fastest devices. Because it is difficult to predict which data items will be needed next, you can’t always make sure the data you are going to want next will be contained in the fastest storage device. Some storage allocation algorithms are nonetheless quite effective at making such predictions.

There is a hierarchy of storage types, ranging from the fastest to the slowest. In general, the faster a storage device is, the smaller its capacity. As a consequence, it is generally not possible to hold a large database entirely in the fastest available storage. The next best thing is to store that subset of the database that is most likely to be needed soon in the faster memory. If this is done properly, the overall performance of the system will be almost as fast as if the entire memory was as fast as the fastest component of it. Here are the components of a typical memory system, starting with the fastest part:

Registers: The registers in a computer system are the fastest form of storage. They are integrated into the processor chip, which means they are implemented with the fastest technology and the delay for transfers between the processing unit and the registers is minimal. It is not feasible to store any portion of a database in the registers, which are limited in number and in size. Instead, registers hold the operands that the processor is currently working on.

L1 cache: Level 1 cache is typically also located in the processor chip, but is not as intimately integrated with the processor as are the registers. It is the fastest form of storage that can store a significant fraction of a database.

L2 cache: Level 2 cache is generally located on a separate chip from the processor. It has greater capacity and is usually somewhat slower than the L1 cache.

Hard disk: Hard disk storage has orders of magnitude more capacity than does cache and is orders of magnitude slower. This is where databases are stored. Registers, L1 cache, and L2 cache are all volatile forms of memory. The data is lost when power is removed. Hard disk storage, on the other hand, is non-volatile. The data is retained even when the system is turned off. Because hard disk systems can hold a large database and retain it when power is off or interrupted, such systems are the normal home of all databases.

Offline storage: It is not necessary to have immediate access to databases that are not in active use. They can be retained on storage media that are slower than hard drives. A sequential storage medium such as magnetic tape is fine for such use. Data access is exceedingly slow, but acceptable for data that is rarely if ever needed. Huge quantities of data can be stored on tape. Tape is the ideal home for archives of obsolete data that nevertheless need to be retained against the day when they might be called upon again.

 

Business users who just want to retrieve the information they need from a database need not concern themselves with how the DBMS does it. However, it is good to know what is going on “under the covers” to return to you the results you want in a timely fashion.

To receive my free mini-booklet on protecting yourself from SQL injection attacks, or to comment on this article, please fill in the form below. Thanks!

Looking for a High Paying Job? There are a Lot of Openings Right Now.

Computer Storage

 

I’ve spent my career developing, using, and writing about databases. Databases, and the data they contain are the most valuable assets that organizations large and small possess. Making use of that data, curating it, and maintaining it are high value jobs in those organization. There  are a lot of different database management systems out there, some proprietary and others open source. They all have one thing in common, however. They rely on the SQL data language to talk to the database and retrieve useful information from it. Right now, thousands of job openings that require SQL knowledge are posted in the Web every day. People who can fill those openings are much in demand, which means salaries, already high, are on the way up.

Back in 1995, when I wrote the first edition of SQL For Dummies, recognition of the importance of SQL was just starting to emerge from the inner sanctum of corporate data centers. Now SQL is universal and contained in anything that stores data in a structured manner. I am working on a video course that will provide an easy onramp to SQL proficiency. I will keep you updated on what will be in the course and when I plan to roll it out. In the meantime, if you have specific areas that you will like to see covered in the course, please contact me at allen.taylor@ieee.org and let me know.