A database management system (DBMS) is a computer program (or more typically, a suite of them) designed to manage a database, a large set of structured data, and run operations on the data requested by numerous users. Typical examples of DBMS use include accounting, human resources and customer support systems. Originally found only in large companies with the computer hardware needed to support large data sets, DBMSs have more recently emerged as a fairly standard part of any company back office.
DBMSs contrast with the more general concept of a database application in that they are designed as the "engine" of a multi-user system. To fill this role, DBMSs are typically built around a private multitasking kernel with built-in networking support. A typical database application will not include these features internally, but may be able to support similar functionality by relying on the operating system to provide them.
Databases have been in use since the earliest days of electronic computing, but the vast majority of these were custom programs written to access custom databases. Unlike modern systems which can be applied to widely different databases and needs, these systems were tightly linked to the database in order to gain speed at the price of flexibility.
As computers grew in capability this tradeoff became increasingly unnecessary, and a number of general-purpose database systems emerged; by the mid-1960s several such systems were in commercial use. Interest in a standard started to grow, and Charles Bachman, author of one such product, IDS, founded the Database Task Group within Codasyl, the group responsible for the creation and standardization of COBOL. In 1971 they delivered their standard, which generally became known as the Codasyl approach, and soon a number of commercial products based on it were available.
The Codasyl approach was based on the "manual" navigation of a linked data set which was formed into a large network. When the database was first opened the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data. To find any particular record the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in Sweden" required the program to walk the entire data set and collect the matching results; there was, essentially, no concept of "find" or "search". This might sound like a serious limitation today, but in an era when the data was most often stored on magnetic tape such operations were too expensive to contemplate anyway.
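The navigation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Codasyl's actual interface: the `Record` class, its `next_record` pointer and the `find_by_country` function are all hypothetical names, and a real network database linked records in several directions rather than in one chain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """One record in a navigational data set (hypothetical layout)."""
    name: str
    country: str
    next_record: Optional["Record"] = None  # pointer to the next record, if any


def find_by_country(first: Optional[Record], country: str) -> List[str]:
    """Walk every pointer starting from the first record and collect matches.

    There is no "search" operation: the program itself visits each record.
    """
    matches = []
    current = first
    while current is not None:
        if current.country == country:
            matches.append(current.name)
        current = current.next_record
    return matches


# Build a three-record chain; the program is handed only the first record.
carol = Record("Carol", "Sweden")
bob = Record("Bob", "Norway", next_record=carol)
anna = Record("Anna", "Sweden", next_record=bob)

swedes = find_by_country(anna, "Sweden")
print(swedes)  # ['Anna', 'Carol']
```

Note that the cost of the query grows with the size of the whole data set, regardless of how few records match.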
IBM also had their own DBMS in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model.
Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award presentation was titled The Programmer as Navigator.
Edgar Codd worked at IBM in San Jose, one of their offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility which was becoming increasingly useful when the database was stored on disk instead of tape. In 1970 he wrote a number of papers outlining a new approach to database construction, eventually culminating in the groundbreaking A Relational Model of Data for Large Shared Data Banks.
In this paper he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records as in Codasyl, his concept was to use a "table" of fixed-length records. Such a system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of tables, with optional elements being moved out of the main table where they would take up room only if needed.
For instance, a common use of a database system is to track information about users: their name, login information, various addresses and phone numbers. In the navigational approach all of this data would be placed in a single record, and items that were not used would simply not be placed in the database. In the relational approach, the data would be split into a user table, an address table and a phone number table (for instance). Only if the address or phone numbers were provided would records be created in these optional tables.
Linking the information back together is key to this system. In the relational model some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as their key.
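As a rough sketch of the user example, the following uses Python's built-in `sqlite3` module as a stand-in for a relational DBMS. The table and column names (`users`, `addresses`, `phones`, `login`) are assumptions made for illustration; the text only says that the login name could serve as the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One main table plus two optional tables, all tied together by the login key.
conn.executescript("""
    CREATE TABLE users (login TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE addresses (login TEXT NOT NULL REFERENCES users(login),
                            street TEXT NOT NULL);
    CREATE TABLE phones (login TEXT NOT NULL REFERENCES users(login),
                         number TEXT NOT NULL);
""")

# Bob supplied an address and two phone numbers; Alice supplied neither,
# so no rows are created for her in the optional tables.
conn.execute("INSERT INTO users VALUES ('bob', 'Bob Smith')")
conn.execute("INSERT INTO users VALUES ('alice', 'Alice Jones')")
conn.execute("INSERT INTO addresses VALUES ('bob', '1 Main Street')")
conn.executemany("INSERT INTO phones VALUES (?, ?)",
                 [("bob", "555-0100"), ("bob", "555-0101")])
conn.commit()


def phones_for(login):
    """Find a user's phone numbers by searching the phone table for the key."""
    rows = conn.execute(
        "SELECT number FROM phones WHERE login = ? ORDER BY number", (login,))
    return [number for (number,) in rows]


def addresses_for(login):
    """Find a user's addresses by searching the address table for the key."""
    rows = conn.execute(
        "SELECT street FROM addresses WHERE login = ? ORDER BY street", (login,))
    return [street for (street,) in rows]


print(phones_for("bob"))       # ['555-0100', '555-0101']
print(addresses_for("alice"))  # []
```

Alice's record takes up space only in the main table, which is the point of moving optional data into related tables.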
This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for. Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to this problem was to create a new language dedicated to just this problem, a suggestion that would later develop into the almost-universal SQL today.
Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning sets of data in a single operation.
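To give a feel for "finding and returning sets of data in a single operation", here is a small sketch using SQL through Python's built-in `sqlite3` module. It is written in modern SQL rather than in the notation of Codd's paper, and the `users` and `addresses` tables and their columns are hypothetical. The program states which rows it wants, and the database does the collecting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (login TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE addresses (login TEXT NOT NULL REFERENCES users(login),
                            country TEXT NOT NULL);
    INSERT INTO users VALUES ('anna', 'Anna'), ('bob', 'Bob'), ('carol', 'Carol');
    INSERT INTO addresses VALUES ('anna', 'Sweden'), ('bob', 'Norway'),
                                 ('carol', 'Sweden');
""")

# A single statement re-links the two tables on the key and returns the
# whole matching set; the program contains no loop over individual records.
query = """
    SELECT users.name
    FROM users
    JOIN addresses ON addresses.login = users.login
    WHERE addresses.country = ?
    ORDER BY users.name
"""
swedes = [name for (name,) in conn.execute(query, ("Sweden",))]
print(swedes)  # ['Anna', 'Carol']
```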
IBM started working on a prototype system based on Codd's concepts as System R in the early 1970s. The first "quicky" version was ready in 1974/5, and work then started on multi-table systems in which the data could be broken down so that all of the data for a record (much of which is often optional) did not have to be stored in a single large "chunk". Follow-up multi-user versions were tested by customers in 1978 and 1979, by which time a standardized computer language, SQL, had been added. By this time it had become clear that Codd's ideas were both workable and superior to Codasyl, and IBM started working on true product versions of System R, known as SQL/DS and, later, Database 2 (DB2).
Codd's paper was also picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, with student programmers producing the code. Starting in 1973, INGRES delivered its first test products in 1974, and was generally ready for widespread use in 1979. During this time a number of people moved "through" the group; perhaps as many as 30 people worked on the project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL.
Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots of the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a rebuilt version of Sybase, and thus of INGRES. Only Larry Ellison's Oracle started from a different chain: it was based on IBM's papers on System R, and beat IBM to market when its first version was released in 1978.
Codd's paper was also read in Sweden, where Mimer SQL was developed from the mid-1970s at Uppsala University. In 1984 this project was consolidated into an independent enterprise. In the early 1980s Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.
In response, database programmers have built up a collection of design patterns on what will and will not work well in a relational system, to the point where applications often take on a "relational flavour" as a result. Factoring out information like addresses has to be carefully weighed against the performance problems it will cause, leading to many design decisions being made for the developer by the inherent limitations of the relational model.
Another solution was to fix the original problem and allow the database to store explicit information about the relationships between data. Instead of finding Bob's address by looking up the "key" in the address table, simply store a pointer to the data in question. In fact, if the data is "owned" by the original record (that is, no other records in USER point to it), it can be stored in the same physical location, thereby increasing the speed at which it can be accessed.
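The difference between the two lookups can be sketched in Python. This is a loose in-memory analogy rather than a description of any particular product; the class names, the `address_table` dictionary and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Address:
    street: str


# Key-based style: the user record holds only a key, and the address is
# found by searching a separate table for that key.
@dataclass
class KeyedUser:
    login: str


address_table: Dict[str, Address] = {"bob": Address("1 Main Street")}


def address_via_key(user: KeyedUser) -> Optional[Address]:
    """Look the key up in the address table (a search step)."""
    return address_table.get(user.login)


# Pointer-based style: the user record stores a direct reference to the
# address it owns, so no search through another table is needed.
@dataclass
class LinkedUser:
    login: str
    address: Optional[Address] = None


def address_via_pointer(user: LinkedUser) -> Optional[Address]:
    """Follow the stored reference directly."""
    return user.address


keyed_bob = KeyedUser("bob")
linked_bob = LinkedUser("bob", address=Address("1 Main Street"))

print(address_via_key(keyed_bob).street)       # 1 Main Street
print(address_via_pointer(linked_bob).street)  # 1 Main Street
```

Both calls return the same data; what differs is whether a search through a second structure is required to reach it.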
Such systems, known as multidimensional databases, have a number of advantages when dealing with large data sets. Although promoted for this role, they first came on the market in an era when databases were either too small to need such systems, or so large that they used custom solutions anyway. As a general solution the multidimensional system never became popular directly.
This could happen because of the multidimensional system's concept of ownership. In an OO program a particular object "owns" others in memory; Bob owns his address, for instance. This concept of ownership is absent from relational databases, requiring a huge amount of effort to map between the two concepts. Adding support for various OO languages and polymorphism re-created the multidimensional systems as object databases, which continue to serve a niche today.
When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.
Data security prevents unauthorised users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of the database, called subschemas (pronounced "sub-skeema"). For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data.
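The "subset of the database" part of this idea can be approximated with an SQL view, sketched here with Python's built-in `sqlite3` module. SQLite has no user accounts or passwords, so this example shows only how a restricted subset is defined, not how access to it is enforced; the `employees` table, the `payroll_view` name and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        salary        INTEGER NOT NULL,
        medical_notes TEXT
    );
    INSERT INTO employees VALUES (1, 'Bob', 52000, 'none on file');

    -- The payroll group would be given this view instead of the full table.
    CREATE VIEW payroll_view AS
        SELECT employee_id, name, salary FROM employees;
""")

cursor = conn.execute("SELECT * FROM payroll_view")
payroll_columns = [column[0] for column in cursor.description]
payroll_rows = cursor.fetchall()

print(payroll_columns)  # ['employee_id', 'name', 'salary']
print(payroll_rows)     # [(1, 'Bob', 52000)]
```

In a multi-user DBMS, the administrator would additionally restrict each group of users to its own view; that step has no equivalent in this sketch.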
The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can also keep duplicate records out of the database; for example, no two customers with the same customer number (key field) can be entered into the database. See ACID properties for more information.
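The duplicate-key rule can be demonstrated with Python's built-in `sqlite3` module, used here as a convenient stand-in for a DBMS. The `customers` table, its columns and the `try_insert` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring the customer number as the primary key asks the database
# itself to refuse a second row with the same number.
conn.execute("""
    CREATE TABLE customers (
        customer_number INTEGER PRIMARY KEY,
        name            TEXT NOT NULL
    )
""")


def try_insert(number, name):
    """Insert a customer; return False if the key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO customers VALUES (?, ?)", (number, name))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(1001, "Acme Ltd")
second = try_insert(1002, "Bolt & Co")
duplicate = try_insert(1001, "Duplicate Inc")
count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(first, second, duplicate, count)  # True True False 2
```

The check lives in the database rather than in each application program, so every program that writes to the table is subject to it.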
Database query languages and report writers allow users to interactively interrogate the database and analyse its data.
If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organisation. These controls are only available when a set of application programs is customised for each data entry and updating function.
A business information system is made up of subjects (customers, employees, vendors, etc.) and activities (orders, payments, purchases, etc.). Database design is the process of deciding how to organize this data into record types and how the record types will relate to each other. The DBMS should mirror the organisation's data structure and process transactions efficiently.
Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are performed by data administrators and systems analysts. Detailed database design is performed by database administrators.
The three most common organisations are the hierarchical, network and relational models. A database management system may provide one, two or all three methods. Inverted lists and other methods are also used. The most suitable structure depends on the application and on the transaction rate and the number of inquiries that will be made.
The dominant model in use today is the relational model, usually used with the SQL query language. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.
Database servers are specially designed computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with RAID disk arrays used for stable storage. Connected to one or more servers via a high-speed channel, hardware database accelerators are also used in large volume transaction processing environments.
wikipedia.org dumped 2003-03-17 with terodump