Creating a Database that Meets User Needs

Computer Storage

 

A database is a repository for some person’s or organization’s data. For it to meet user needs, it must satisfy several criteria:

  • It must be easy to enter new data into it.
  • It must be easy and fast to retrieve just the information you want from it.
  • It must store the data in such a way that database operations do not corrupt it.

A database system consists of the hardware the database resides on, the operating system that controls that hardware, the database management system (DBMS) that performs operations on the database, the database application that translates the user’s commands into instructions the DBMS understands, and the database itself.

The database application is the part of the system that interfaces directly with the user, so it is the part that must make it easy to enter new data, and easy and fast to retrieve the desired information. This is the part of the system that database developers create. The DBMS comes from a vendor such as Microsoft or Oracle or from a consortium of professionals as is the case for open source DBMSs such as PostgreSQL or MySQL. The operating system (OS) comes from an OS vendor, such as Microsoft, or from an open source community, as is the case with Linux. Finally, the hardware comes from a hardware vendor such as IBM, Dell, or HP.

The hardware, operating systems, and DBMSs are all general purpose and have proven to be reliable in the marketplace. For these things, you can just buy the components that have enough capacity to meet your needs. The custom part is the database application. This is the software piece that solves the particular problem that you have. Since your needs are not exactly like anyone else’s you are either going to have to produce this part yourself, or hire someone to do it. Even if you decide to hire someone, it is good to understand what’s important so that you can make sure that what is finally produced actually does meet your needs.

There is a time-tested method for making sure that an application that results from a development effort actually meets the needs of those who will be using it. That method, which guides developers through the entire life of the development project is called the System Development Life Cycle (SDLC). The SDLC breaks down the development effort into a sequence of phases that, when carefully followed give a high probability of a successful project. The SDLC is not restricted to database development. It applies to development projects of any kind.

The phases are:

  • Definition
  • Requirements
  • Evaluation
  • Design
  • Implementation
  • Final Documentation and Testing
  • Maintenance

Each phase is important and skipping any one of them can cause the entire project to fail. I will describe each of these phases in detail in subsequent posts, including why it is important.

How a DBMS Organizes Database Data for Speedy Retrieval

Computer Storage

It’s important that database data be retrievable in a reasonable amount of time. This is particularly an issue for very large databases. The speed with which information is retrieved from a database depends on a number of factors. The size of the database and the speed of the hardware it is running on are obvious factors. Perhaps most critical, however, is the method used to access table data, which depends on the way the data is structured on the storage medium.

How quickly a system can retrieve desired information depends on the speed of the device that stores it. Different storage devices have a wide range of speeds, spanning many orders of magnitude. For fast retrievals, the information you want should reside on the fastest devices. Because it is difficult to predict which data items will be needed next, you can’t always make sure the data you are going to want next will be contained in the fastest storage device. Some storage allocation algorithms are nonetheless quite effective at making such predictions.

There is a hierarchy of storage types, ranging from the fastest to the slowest. In general, the faster a storage device is, the smaller its capacity. As a consequence, it is generally not possible to hold a large database entirely in the fastest available storage. The next best thing is to store that subset of the database that is most likely to be needed soon in the faster memory. If this is done properly, the overall performance of the system will be almost as fast as if the entire memory was as fast as the fastest component of it. Here are the components of a typical memory system, starting with the fastest part:

Registers: The registers in a computer system are the fastest form of storage. They are integrated into the processor chip, which means they are implemented with the fastest technology and the delay for transfers between the processing unit and the registers is minimal. It is not feasible to store any portion of a database in the registers, which are limited in number and in size. Instead, registers hold the operands that the processor is currently working on.

L1 cache: Level 1 cache is typically also located in the processor chip, but is not as intimately integrated with the processor as are the registers. It is the fastest form of storage that can store a significant fraction of a database.

L2 cache: Level 2 cache is generally located on a separate chip from the processor. It has greater capacity and is usually somewhat slower than the L1 cache.

Hard disk: Hard disk storage has orders of magnitude more capacity than does cache and is orders of magnitude slower. This is where databases are stored. Registers, L1 cache, and L2 cache are all volatile forms of memory. The data is lost when power is removed. Hard disk storage, on the other hand, is non-volatile. The data is retained even when the system is turned off. Because hard disk systems can hold a large database and retain it when power is off or interrupted, such systems are the normal home of all databases.

Offline storage: It is not necessary to have immediate access to databases that are not in active use. They can be retained on storage media that are slower than hard drives. A sequential storage medium such as magnetic tape is fine for such use. Data access is exceedingly slow, but acceptable for data that is rarely if ever needed. Huge quantities of data can be stored on tape. Tape is the ideal home for archives of obsolete data that nevertheless need to be retained against the day when they might be called upon again.

 

Business users who just want to retrieve the information they need from a database need not concern themselves with how the DBMS does it. However, it is good to know what is going on “under the covers” to return to you the results you want in a timely fashion.

To receive my free mini-booklet on protecting yourself from SQL injection attacks, or to comment on this article, please fill in the form below. Thanks!

Marketing Analysts are in High Demand

According to the Bureau of Labor Statistics, marketing analyst employment is projected to increase by 19 percent from 2014 to 2024. This is a field where skills in SQL and visualization technologies such as Microsoft Power BI are tickets to interesting and well paid careers. Check out some details here.

Start Learning Data Mining Now

Click this link to go to the first lesson in the data mining course.

Big Data Coming Soon to an Organization Near You

Organizations have a trove of information stored both on premises and in the cloud. Finally they are starting to leverage it using business intelligence (BI) software. Here’s an article that describes the recent explosion in Big Data usage, as well as the even bigger explosion due to occur in 2017.

An easy onramp into BI is described in my new video course, Getting Started with Microsoft Power BI. Subscribe to this site, using the text boxes on the left, to keep up to date on Big Data, Business Intelligence, SQL, and all things data.

People Who Give TED Talks are Hugely Successful People, Right?

Monica Bryne is a critically acclaimed science fiction novelist. She recently gave a TED talk on the economic realities of being an artist. This was a real eye-opener for me. I have never considered myself to be a fantastically successful author. I’ve never won any awards or critical acclaim. However, I find that compared to Monica, and apparently many other artists of all kinds, I am a veritable superstar, in that I can support myself with my art. Of course it helps that my wife has a great job as a clinical laboratory scientist.

The Road to Business Intelligence

In1965, Gordon Moore wrote an article for an electronics industry magazine where he noticed that computer memory manufacturers were doubling the number of memory cells that they could fabricate on a single silicon chip about every two years. This translated to the fact that every two years, you could get about twice as much computing power for the same amount of dollars as you could get two years before. This is the hallmark of exponential growth.

Moore’s Law has been in operation now for over 50 years and still holds true. That is why the supercomputers we carry around in our pockets and purses, called phones, have the capabilities that they do.

Back when Gordon Moore made his famous observation, there were no iPhones, tablets, or personal computers. At that time a computer filled an entire room much bigger than the room you are probably in right now.

One of the most common tasks in business in those days, as it is now, was to do financial analysis with an electronic spreadsheet. I worked with a company that offered such a spreadsheet for IBM mainframe computers. They rented it to their clients for $25,000 a month, and that was in the days when $25,000 was worth a lot more than it is now.

Personal computers appeared on the scene in 1975, but were considered to be toys by IBM and the other mainframe manufacturers of the day, and were ignored. In many ways, they were toys because the programs that early hobbyists wrote for them were not particularly business oriented.

That changed in 1979 when Dan Bricklin’s Software Arts released VisiCalc for the Apple II. The program was immediately recognized as so valuable, that it not only justified its purchase price — all by itself it justified the purchase of the computer that it ran on as well. It was the world’s first killer app.

After a brief time in the sun, VisiCalc was superseded by Lotus 1–2–3, which had more features and ran on the IBM PC. Lotus 1–2–3 was later displaced by Microsoft Excel, which ran under the new Windows operating system as well as on the Apple Macintosh. Excel is still at the top of the spreadsheet heap today, but the market has evolved.

Today, organizations have data stored in a variety of formats that have been created by different programs. In many cases, they want to pull data from different sources, combine it in a meaningful way, and present the result in a way that brings out trends and emphasizes what is important. This field is called Business Intelligence or BI. A number of companies worldwide have been active in BI for many years. SAP is one example and SAS is another.

BI products have tended to be quite costly, earning high profit margins for their purveyors. However, Moore’s law has now become a factor in the BI marketplace. Microsoft has recently evolved extensions to its ubiquitous Excel spreadsheet product into a new cloud-based BI product called Power BI.

One way to look at Moore’s law is to say that every two years you get twice as much computing power for the same cost. There is another way, however. That is to say that every two years you get the same power for half the cost. The ultimate result of that way of looking at it is that commodities that are subject to Moore’s Law eventually become free. This is essentially true of hard disk drives, where you can now get a drive holding trillions of bytes for well under $100. It has also come to business intelligence in the form of Microsoft’s Power BI, which is literally free. No cost whatsoever. Nada.

Microsoft is using Power BI as a lure to pull businesses into their ecosystem of products. Power BI is free, but once you have it, you are going to want the other Microsoft products that are tightly connected to it.

Moore’s Law rolls on far beyond what Gordon Moore or anyone else at that time ever thought it would. We today are the beneficiaries. If you run a business of any kind today, take a look at Microsoft’s Power BI. The price is right. You might also be interested in my course on Power BI from O’Reilly Media, titled Getting Started with Microsoft Power BI. In addition, subscribe to my blog at allengtaylor.com.

Make a comment on this post:

Tesla Self-driving Car Foresees Crash Before It Happens and Brakes in Time

A Tesla in the Netherlands predicted an immanent crash several seconds before it happened, warned the driver and started braking, preventing a possible multiple car pileup. https://electrek.co/2016/12/27/tesla-autopilot-radar-technology-predict-accident-dashcam/

Where CRISPR Might Take Us in 2017

CRISPR is the new gene editing technology that has exploded onto the health and longevity map, including early recognition of its significance by the Nobel committee. Here are some exciting places that it could take us as soon as this year.

 

Nobel Prizewinner Has New Book on Reversing Aging

Elizabeth Blackburn of the Salk Institute in La Jolla, California has a new book named “The Telomere Effect” that explains lifestyle choices you can make that will add years to your life. Read about it here:

https://www.statnews.com/2017/01/03/aging-control-telomere-effect/