Pritam Kundu: February 2012

v DATABASE MANAGEMENT SYSTEM :

· Definition:

A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data. The collection of data, usually referred to as the Database, contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.

§ Database-management system is widely use in Banking, Airlines, University, Credit Card Transaction, Telecommunication, Finance, Manufacturing, Sales, Human Resources.

· Database System Vs File System:

• Data redundancy and inconsistency:

Since different programmers create the files and application programs over a long period, the various files are likely to have different formats and the programs may be written in several programming languages. Moreover, the same information may be duplicated in several places (files).

Data inconsistency that is, the various copies of the same data may no longer agree.

• Difficulty in accessing data:

The conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner.

Example-

Suppose that one of the bank officers needs to find out the names of all customers who live within a particular postal-code area. The officer asks the data-processing department to generate such a list. Because the designers of the original system did not anticipate this request, there is no application program on hand to meet it. There is, however, an application program to generate the list of all customers. The bank officer has now two choices: either obtains the list of all customers and extract the needed information manually or ask a system programmer to write the necessary application program. Both alternatives are obviously unsatisfactory.

• Data isolation:

As the data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

• Integrity problems:

The data values stored in the database must satisfy certain types of consistency constraints. For example, the balance of a bank account may never fall below a prescribed amount (say, $25). Developers enforce these constraints in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them.

• Atomicity problems:

A computer system, like any other mechanical or electrical device, is subject to failure. Consider a program to transfer $50 from account A to account B. If a system failure occurs during the execution of the program, it is possible that the $50 was removed from account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it is essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic. It is difficult to ensure atomicity in a conventional file-processing system.

• Concurrent-access anomalies:

For the sake of overall performance of the system and faster response, many systems allow multiple users to update the data simultaneously. In such an environment, interaction of concurrent updates may result in inconsistent data.

Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100 respectively) from account A at about the same time, the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state.

If the two programs run concurrently, they may both read the value $500, and write back $450 and $400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400, rather than the correct value of $350.

• Security problems:

Not every user of the database system should be able to access all the data. For example, in a banking system, payroll personnel need to see only that part of the database that has information about the various bank employees. They do not need access to information about customer accounts. But, since application programs are added to the system in an ad hoc manner, enforcing such security constraints is difficult.

- This is the measure disadvantage in file processing system over a database management system.

· Data Abstraction:

Many database system users are not computer trend developers hide the complexity form users throw severable table of abstraction to simplify users to interactions with the system. This is called data abstractor.

• Physical level:

The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.

• Logical level:

The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures.

• View level:

The highest level of abstraction describes only part of the entire database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.

· Data Model:

Data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.

Entity Relationship Model:

The entity-relationship (E-R) data model is based on a perception of real world that consist of a collection of basic objects call Entity and relationships among these objects.

v Database Legacy:

Flat file, Hierarchical & Network database are usually referred as Legacy Database. They represent the ways. People used to organize information in prehistoric times.

1. Flat file Database:

The flat file database was probably one of the earliest database management systems.

Problems of flat file database are: Data redundancy, slow processing speed, error prone, retrieved.

Hierarchical Database Network Database Relational Database

2. Hierarchical Database:

It stores data in Hierarchical structures similar to that of a family tree, organization chart, or pyramid. It is based on “parents/child” paradigm in each parents could have many children but each child has one and only one parents.

In the hierarchical database model redundancy is greatly reduce compared with Flat file database model. The hierarchical database maintains one to many relationships very well.

3. Network Database:

It is also based on the concept of parents/child relationship but removes the restriction of one child having one and only one parents.

v Data Integrity:

Data integrity falls into three following categories:

1. Entity Integrity: Entity integrity ensures that row can be identifying by an attribute called the Primary key. The primary key cannot have a null value.

2. Domain Integrity: Domain integrity refers to the range of valid entities for a given column. It ensure that there are only valid entities in the column.

3. Referential Integrity: Referential integrity ensures that for every value of Foreign Key, there is matching value of primary key.

v Database Languages:

A database system provides a data definition language to specify the database schema and a data manipulation language to express database queries and updates. In practice, the data definition and data manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language.

Ø Data-Definition Language:

We specify a database schema by a set of definitions expressed by a special language called a data-definition language (DDL).

For instance, the following statement in the SQL language defines the account table:

create table account

(account-number char(10),

balance integer)

Execution of the above DDL statement creates the account table. In addition, it updates a special set of tables called the data dictionary or data directory.

A data dictionary contains metadata—that is, data about data. The schema of a table is an example of metadata. A database system consults the data dictionary before reading or modifying actual data.

We specify the storage structure and access methods used by the database system by a set of statements in a special type of DDL called a data storage and definition language. These statements define the implementation details of the database schemas, which are usually hidden from the users.

The data values stored in the database must satisfy certain consistency constraints. For example, suppose the balance on an account should not fall below $100. The DDL provides facilities to specify such constraints. The database systems check these constraints every time the database is updated.

Ø Data-Manipulation Language:

Data manipulation is

• The retrieval of information stored in the database

• The insertion of new information into the database

• The deletion of information from the database

• The modification of information stored in the database

A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. There are basically two types:

• Procedural DMLs require a user to specify what data are needed and how to get those data.

• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data.

Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a user does not have to specify how to get the data, the database system has to figure out an efficient means of accessing data. The DML component of the SQL language is nonprocedural.

v The Query Processor:

The query processor components include

• DDL interpreter:

It interprets DDL statements and records the definitions in the data dictionary.

• DML compiler:

It translates DML statements in a query language into an evaluation plan consisting of low- level instructions that the query evaluation engine understands.

A query can usually be translated into any of a number of alternative evaluation plans that all give the same result. The DML compiler also performs query optimization, that is, it picks the lowest cost evaluation plan from among the alternatives.

• Query evaluation engine:

It executes low-level instructions generated by the DML compiler.

v Database Users and Administrators:

A primary goal of a database system is to retrieve information from and store new information in the database. People who work with a database can be categorized as database users or database administrators.

Ø Database Users and User Interfaces

There are four different types of database-system users, differentiated by the way they expect to interact with the system. Different types of user interfaces have been designed for the different types of users.

• Naive users:

Naïve users are unsophisticated users who interact with the system by invoking one of the application programs.

For example, a bank teller who needs to transfer $50 from account A to account B invokes a program called transfer. This program asks the teller for the amount of money to be transferred, the account from which the money is to be transferred, and the account to which the money is to be transferred.

The typical user interface for naive users is a forms interface, where the user can fill in appropriate fields of the form. Naive users may also simply read reports generated from the database.

• Application programmers:

Application programmers are computer professionals who write application programs. Application programmers can choose from many tools to develop user interfaces. Rapid application development (RAD) tools are tools that enable an application programmer to construct forms and reports without writing a program.

• Sophisticated users:

Sophisticated users interact with the system without writing programs. Instead, they form their requests in a database query language. They submit each such query to a query processor, whose function is to break down DML statements into instructions that the storage manager understands.

• Specialized users:

Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework. Among these applications are computer-aided design systems, knowledge base and expert systems, systems that store data with complex data types (for example, graphics data and audio data), and environment-modelling systems.

v Database Administrator:

One of the main reasons for using DBMSs is to have central control of both the data and the programs that access those data. A person who has such central control over the system is called a database administrator (DBA). The functions of a DBA include:

• Schema definition:

The DBA creates the original database schema by executing a set of data definition statements in the DDL.

• Storage structure and access-method definition

• Schema and physical-organization modification:

The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization, or to alter the physical organization to improve performance.

• Granting of authorization for data access:

By granting different types of authorization, the database administrator can regulate which parts of the database various users can access. The authorization information is kept in a special system structure that the database system consults whenever someone attempts to access the data in the system.

• Routine maintenance: Examples of the database administrator’s routine maintenance activities are:

ü 􀀀 Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of data in case of disasters such as flooding.

ü Ensuring that enough free disk space is available for normal operations, and upgrading disk space as required.

ü 􀀀 Monitoring jobs running on the database and ensuring that performance is not degraded by very expensive tasks submitted by some users.

v Storage Manager:

A storage manager is a program module that provides the interface between the low level data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the interaction with the file manager. The raw data are stored on the disk using the file system, which is usually provided by a conventional operating system. The storage manager translates the various DML statements into low-level file-system commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in the database.

The storage manager components include:

• Authorization and integrity manager:

It tests for the satisfaction of integrity constraints and checks the authority of users to access data.

• Transaction manager:

It ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.

• File manager:

It manages the allocation of space on disk storage and the data structures used to represent information stored on disk.

• Buffer manager:

It is responsible for fetching data from disk storage into main memory, and deciding what data to cache in main memory. The buffer manager is a critical part of the database system, since it enables the database to handle data sizes that are much larger than the size of main memory.

The storage manager implements several data structures as part of the physical system implementation:

• Data files: It stores the database itself.

• Data dictionary: It stores metadata about the structure of the database, in particular the schema of the database.

• Indices: It provides fast access to data items that hold particular values.

v Transaction Manager:

It ensures that the database remains in a consistent (correct) state despite system failures and that concurrent transaction execution proceed without conflicting.

· File Manager: It manages the allocation of space on disk storage & the data structures used to represent information stored on a disk

· Buffer Manger: It is responsible for fetching data from disk storage into main memory & deciding what data to cache in the memory. The buffer manager is a critical part of the database, since it enables the database to handle data sizes that much larger than the size of main memory.

v Entity:

An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. Entities are the building block of database. We represent an entity using a rectangular box that contains the name of the entity.

v Attributes:

An attributes is a property of an entity that differentiates from other entity & provide information about the entity. In an E-R diagram we represent attributes as ellipse.

• Simple and Composite attributes:

Simple attributes have been simple; that is, they are not divided into subparts.

Composite attributes can be divided into subparts (that is, other attributes).

For example, an attribute name could be structured as a composite attribute consisting of first-name, middle-initial, and last-name. So name is a composite attributes.

• Single-valued and Multi-valued attributes:

When an attribute has a single value for a particular entity, it is called the single-value attribute.

When an attribute has a multiple value for a particular entity, it is called the Multi-value attribute. As example, a person can have several phones no. So phones no. attribute is Multi-valued attribute.

• Derived attribute:

The value for this type of attribute can be derived from the values of other related attributes or entities.

As example consider, the customer entity set has an attribute age, which indicates the customer’s age. If the customer entity set also has an attribute date-of-birth, we can calculate age from date-of-birth and the current date.

Ø N.B: An attribute takes a null value when an entity does not have a value for it. The null value may indicate “not applicable”—that is, that the value does not exist for the entity. For example, one may have no middle name. Null can also designate that an attribute value is unknown. An unknown value may be either missing (the value does exist, but we do not have that information) or not known (we do not know whether or not the value actually exists).

v Relationship:

Relationship is used to establish a connection between a pair of logically related entities. It is a association between entities.

Separate entities can have relationships with each other. For example- if student study various course the entity are student and course, while the relationship between them is represented by a diamond type.

Student

Course

v Mapping Cardinalities:

Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set.

For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following:

• One to one. An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. (See Figure 2.4a.)

• One to many. An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A. (See Figure 2.4b.)

• Many to one. An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A. (See Figure 2.5a.)

• Many to many. An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A. (See Figure 2.5b.)

v Keys:

A key allows us to identify a set of attributes that suffice to distinguish entities from each other. Keys also help uniquely identify relationships, and thus distinguish relationships from each other. So keys plays an important role maintain data integrity.

· Candidate Key: It is important to have an attribute in a table that uniquely identifies a row. So an attributes which identifies a row is called Candidates Key. This attributes has values that are unique.

Consider the table Vehicle

VEHICLE:

Serial#	Regn#	Description
023451	5602	Leyland
023452	4502	Volvo
023453	4513	Toyota

The value of attribute Serial#, Regn# & Description are unique in every row. Therefore all three are candidates key.

ü Candidates key can also referred to as Surrogate Key

· Primary Key:

The candidate key that we choose to identify each row uniquely is called the Primary key In the table Vehicle: If we choose Serial# to identify rows uniquely serial# is the primary key.

· Alternative Key:

A candidate key that is not chosen as a primary key is an alternative key. In the table VEHICLE, if we choose Serial# as the primary key then Reg# & Description is the alternative key.

· N.B: It is important that we understood that a primary key is the only sure way to identify the row of a table. Hence an attribute key may have the null value. A null value is not to be permitted in a primary key. Since it would be difficult to uniquely identify rows containing null value.

· Supper Key:

· Composite Key:

In a certain table, a single attribute cannot be used to identify rows uniquely so a combination of two or more attributes is used as a primary key. Such keys are called Composite Key.

· Foreign Key:

When a primary key of one table appears as an attribute in another table, it is called the Foreign Key. In the second table a foreign key is used to relate two tables.

v Weak Entity Set:

An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a weak entity set. An entity set that has a primary key is termed a strong entity set.

For a weak entity set to be meaningful, it must be associated with another entity set, called the identifying or owner entity set. The relationship associating the weak entity set with the identifying entity set is called the identifying relationship.

Although a weak entity set does not have a primary key, we nevertheless need a means of distinguishing among all those entities in the weak entity set that depend on one particular strong entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made. Hence the primary key of a weak entity set is formed by the primary key of the identifying entity set, plus the weak entity set’s discriminator.

v Entity-Relationship Diagram:

An E-R diagram can express the overall logical structure of a database graphically. E-R diagrams are simple and clear—qualities that may well account in large part for the widespread use of the E-R model. Such a diagram consists of the following major components:

• Rectangles, which represent entity sets

• Ellipses, which represent attributes

• Diamonds, which represent relationship sets

• Lines, which link attributes to entity sets and entity sets to relationship sets

• Double ellipses, which represent multi-valued attributes

• Dashed ellipses, which denote derived attributes

• Double lines, which indicate total participation of an entity in a relationship set

• Double rectangles, which represent weak entity sets

Pritam Kundu

Pages

Tuesday 21 February 2012

DATABASE MANAGEMENT SYSTEM