Database Management - Department of Higher Education

Samara Gijsbertsen | Download | HTML Embed
  • Jan 21, 2010
  • Views: 15
  • Page(s): 164
  • Size: 1.35 MB
  • Report



1 ? Database Management

2 Subject: DATABSE MANAGEMENT Credits: 4 SYLLABUS Introduction to data base management system Data versus information, record, file; data dictionary, database administrator, functions and responsibilities; file-oriented system versus database system Database system architecture Introduction, schemas, sub schemas and instances; data base architecture, data independence, mapping, data models, types of database systems Data base security Threats and security issues, firewalls and database recovery; techniques of data base security; distributed data base Data warehousing and data mining Emerging data base technologies, internet, database, digital libraries, multimedia data base, mobile data base, spatial data base Lab: Working over Microsoft Access Suggested Readings: 1. A Silberschatz, H Korth, S Sudarshan, Database System and Concepts; fifth Edition; McGraw-Hill 2. Rob, Coronel, Database Systems , Seventh Edition, Cengage Learning

3 DATABASE MANAGEMENT DATABASE MANAGEMENT SYSTEM COURSE OVERVIEW This course provides an immediately useable tools and the The students on completion of the course shall develop the techniques in the methods of database management system, following skills and competencies: requirements analysis, definition, specification and design etc. It Database provides participants with the details if the tools, techniques DBMS and methods to lead or participate in the front-end phases. Database System Application Database systems are designed to manage large bodies of File System information.Management of data involves both defining structures for storage of information & providing way for Data Ionsistency manipulation of data. In addition, the database system must ensure safety of data. DBMS is collection of programs that enables you to store, modify, and extract important information from a database. There are many different types of DBMS, ranging from small sys-tems that run on personal computers to huge systems that run on mainframes. Objectives To help you to learn DBMS and design technique: what it is and how one goes about doing it. The primary goal of DBMS is to provide an environment that is both convenient & efficient for people to use in retrieving & storing information.. Database systems are designed to store large bodies of information. By the end of this material, you will be equipped with good knowledge of technical information that will help you develop & understand DBMS ii

4 DATABASE DATABASE MANAGEMENT MANAGEMENT CONTENT Lesson No. Topic Page No. INTRODUCTION TO DBMS Lesson 1 Introduction to Database I 1 Lesson 2 Introduction to Database II 5 Lesson 3 Tutorial 8 Lesson 4 Database Concepts I 9 Lesson 5 Database Concepts II 13 DATA MODELS IN DATABASES Lesson 6 Data Models 17 RELATIONAL DATA BASE MANAGEMENT SYSTEM Lesson 7 Relational Database Management System I 21 Lesson 8 Relational Database Management System II 27 Lesson 9 E-R Model I 31 Lesson 10 E-R Model II 36 STRUCTURED QUERY LANGUAGES Lesson 11 Structured Query Language(SQL) I 40 Lesson 12 LAB 45 Lesson 13 Lab 46 Lesson 14 SQL II 47 Lesson 15 LAB 55 Lesson 16 LAB 56 Lesson 17 SQL III 57 Lesson 18 Lab 62 Lesson 19 Lab 63 Lesson 20 SQL IV 64 Lesson 21 Lab 67 Lesson 22 Lab 68 Lesson 23 Integrity and security 69 Lesson 24 LAB 75 Lesson 25 LAB 76 Lesson 26 PL/SQL 77 Lesson 27 Lab 82 Lesson 28 Lab 83 Lesson 29 Database Triggers 84 Lesson 30 LAB 89 v

5 DATABASE DATABASE MANAGEMENT MANAGEMENT CONTENT Lesson No. Topic Page No. Lesson 31 LAB 90 Lesson 32 Database Cursors 91 Lessom 33 LAB 100 Lesson 34 LAB 101 NORMALIZATION Lesson 35 Normalisation I 102 Lesson 36 Normalisation I I 107 Lesson 37 Normalisation III 112 FILE ORGANIZATION METHODS Lesson 38 File Organization Method I 118 Lesson 39 File Organization Method II 123 DATA BASE OPERATIONAL MAINTENANCE Lesson 40 Transactions Management 130 Lesson 41 Concurrency Control I 136 Lesson 42 Concurency Control II 141 Lesson 43 Concurrency Control III 146 Lesson 44 Database Recovery 152 vi

6 DATABASE LESSON 1: INTRODUCTION TO DATABASE I MANAGEMENT Lessons Objective 1.2 Database System Applications Database There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that Database management system run on mainframes. Database are applied in wide no. of Essentials of data applications. Following are some of the examples :- Benefits of DBMS Banking: For customer information, accounts, loans & other Database system application banking transactions Purpose of database system Airlines: For reservation & schedule information 1.1 What is Database Management Universities: For student information, course System(DBMS) registration,grades etc. A database can be termed as a repository of data. A collection Credit card transaction: For purchase of credit cards & of actual data which constitutes the information regarding an generation of monthly statements. organisation is stored in a database. For ex. There are 1000 Tlecommunication: For keeping records of calls made , students in a college & we have to store their personal details, generating monthly billetc. marks details etc., these details will be recorded in a database. Finance: For storing information about holdings, sales & A collection of programs that enables you to store, modify, and purchase of financial statements extract information from a database is known as DBMS.The Sales: For customer,product & purchase information primary goal of a DBMS is to provide a way to store & retrieve database information that is both convenient & efficient. Manufacturing: For management of supply chain. Human Resource: For recording information about employees,salaries,tax,benefits etc. We can say that when ever we need to have a computerised Actual data storage system, we need a database system 1.3 Purpose of Database system A file system is one in which we keep the information in Database systems are designed to manage large bodies of operating system files. Before the evolution of DBMS, information.Management of data involves both defining organisations used to store information in file systems. A structures for storage of information & providing way for typical file processing system is supported by a conventional manipulation of data. In addition, the database system must operating system. The system stores permanent records in ensure safety of data. various files & it need application program to extract records , DBMS is collection of programs that enables you to store, or to add or delete records . modify, and extract important information from a database. We will compare both systems with the help of an example. There are many different types of DBMS, ranging from small There is a saving bank enterprise that keeps information about sys-tems that run on personal computers to huge systems that all customers & saving accounts. Following manipulations has run on mainframes. to be done with the system Good data management is an essential prerequisite to corporate A program to debit or credit an account success. A program to add a new account. Data Information A program to find balance of an account. Information Knowledge A program to generate monthly statements. Knowledge Judgment As the need arises new applications can be added at a particular Judgment Decision point of time as checking accounts can be added in a saving Decision Success account. Using file system for storing data has got following provided that data is: disadvantages:- complete 1. Data Redundancy and Inconsistency accurate Different programmers work on a single project , so various timely files are created by different programmers at some interval of time. So various files are created in different formats & different easily available programs are written in different programming language. 1

7 Same information is repeated.For ex name & address may The primary goal of DBMS is to provide an environment DATABASE appear in saving account file as well as in checking account. This that is both convenient & efficient for people to use in redundancy sesults in higher storage space & access cost.It also retrieving & storing information. leads to data inconsistency which means that if we change some DBMS systems are ubiquitous today & most people record in one place the change will not be reflected in all the interact either directly or indirectly with database many times places. For ex. a changed customer address may be reflected in MANAGEMENT every day. saving record but not any where else. Database systems are designed to store large bodies of 2. Difficulty in Accesing data information. Accessing data from a list is also a difficulty in file A major purpose of a DBMS is to provide users with an system.Suppose we want to see the records of all customers abstract view of data i.e. the system hides how the data is who has a balance less than $10,000, we can either check the list stored & maintained. & find the names manually or write an application program .If we write an application program & at some later time, we need Review Terms to see the records of customer who have a balance of less than Database $20,000, then again a new program has to be written. DBMS It means that file processing system do not allow data to be Database System Application accessed in a convenient manner. File System 3. Data Isolation Data Ionsistency As the data is stored in various files, & various files may be Consistency constraints stored in different format, writing application program to retrieve the data is difficult. Atomicity Redundancy 4. Integrity Problems Sometimes, we need that data stored should satisfy certain Data isolation constraints as in a bank a minimum deposit should be of $100. Data Security Developers enforce these constraints by writing appropriate Students Activity programs but if later on some new constraint has to be added then it is difficult to change the programs to enforce them. 1. What is database?Explain with example? 5. Atomicity Problems Any mechanical or electrical devive is subject to failure, and so is the computer system. In this case we have to ensure that data should be restored to a consistent state.For example an amount of $50 has to be transferred from Account A to Account B. Let the amount has been debited from account A but have not been credited to Account B and in the mean time, some failure occurred. So, it will lead to an inconsistent state. So,we have to adopt a mechanism which ensures that either full 2. What is DBMS?Explain with example? transaction hould b executed or no transaction should be excuted i.e. the fund transfer should be atomic. 6. Concurrent access Problems Many systems allows multiple users to update the data simultaneously. It can also lead the data in an inconsistent state.Suppose a bank account contains a balance of $ 500 & two customers want to withdraw $100 & $50 simultaneously. Both the transaction reads the old balance & withdraw from that old balance which will result in $450 & &400 which is incorrect. 3. List four significant difference between file system & 7.Security Problems DBMS? All the user of database should not be able to access all the data. For example a payroll Personnel needs to access only that part of data which has information about various employees & are not needed to access information about customer accounts. Points to Ponder A DBMS contains collection of inter-related data & collection of programs to access the data. 2

8 4. What are the advantages of DBMS? 10. Explain consistency constraints in database? DATABASE MANAGEMENT 5. Explain various applications of database? 6. Explain data inconsistency with example? 7. Explain data security? Why it is needed?Explain with example? 8. Explain isolation & atomicity property of database? 9. Explain why redundancy should be avoided in database? 3


10 DATABASE LESSON 2: INTRODUCTION TO DATABASE II MANAGEMENT Lesson Objective Data abstraction View of data Levels of data Physical level Logical level View level Database language DDL,DML View of Data A database contains a no. of files & certain programs to access & modify these files.But the actual data is not shown to the user, View level the system hides actual details of how data is stored & main- tained. This level contains the actual data which is shown to the users. This is the highest level of abstraction & the user of Data Abstraction this level need not know the actual details(complexity) of Data abstraction is the process of distilling data down to its data storage. essentials. The data when needed should be retrieved efficiently. As all the details are not of use for all the users, so we hide the actual(complex) details from users. Various level of abstraction to data is provided which are listed below:- Physical level It is the lowest level of abstraction & specifies how the data is actually stored. It describes the complex data structure in details. Database Language As a language is required to understand any thing, similarly to create or manipulate a database we need to learn a language.Database language is divided into mainly 2 parts :- 1. DDL(Data definition language) 2. DML(Data Manipulation language) Data Definition Language (DDL) Used to specify a database scheme as a set of definitions Logical level expressed in a DDL It is the next level of abstraction & describes what data are 1. DDL statements are compiled, resulting in a set of tables stored in database & what relationship exists between varius stored in a special file called a data dictionary or data data. It is less complex than physical level & specifies simple directory. structures. Though the complexity of physical level is 2. The data directory contains metadata (data about data) required at logical level, but users of logical level need not 3. The storage structure and access methods used by the know these complexities. database system are specified by a set of definitions in a 5

11 special type of DDL called a data storage and definition Dml DATABASE language Query language 4. basic idea: hide implementation details of the database Students Activity schemes from the users 1. Define data abstraction? Data Manipulation Language (DML) MANAGEMENT 1. Data Manipulation is: retrieval of information from the database insertion of new information into the database deletion of information in the database modification of information in the database 2. A DML is a language which enables users to access and manipulate data. The goal is to provide efficient human 2. How many views of data abstraction are there? Explain in interaction with the system. details? 3. There are two types of DML: procedural: the user specifies what data is needed and how to get it nonprocedural: the user only specifies what data is needed Easier for user May not generate code as efficient as that produced by procedural languages 4. A query language is a portion of a DML involving 3. Explain database language? Differentiate between DDL & information retrieval only. The terms DML and query DML? language are often used synonymously. Points to Ponder DBMS systems are ubiquitous today & most people interact either directly or indirectly with database many times every day. Database systems are designed to store large bodies of information. A major purpose of a DBMS is to provide users with an abstract view of data i.e. the system hides how the data is stored & maintained. Structure of a database is defined through DDL. & manipulated through DML. DDL statements are compiled, resulting in a set of tables stored in a special file called a data dictionary or data directory. A query language is a portion of a DML involving information retrieval only. The terms DML and query language are often used synonymously. Review Terms Data Security Data Views Data Abstraction Physical level Logical level View level Database language Ddl 6



14 DATABASE LESSON 4: DATABASE CONCEPTS I MANAGEMENT Lesson objective Null value allowed Data dictionary Null or non-existing data value may be or may not be allowed for an element. Element with possible null values Meta data needs special considerations in reports and may cause Database schema problems, if used as a key. Database Instance Default value Data independence Data element may have a default value. Default value may be a variable, like current date and time of the day (DoD). Data Dictionary English language dictionaries define data in terms such as Element coding (allowed values) and intra-element known facts or things used as a basis for inference or validation details or reference to other documents reckoning, typically (in modern usage) operated upon or Explanation of coding (code tables, etc.) and validation manipulated by computers, or Factual information used as a rules when validating this element alone in the application basis for discussion, reasoning, or calculation domain. Data dictionary may cover the whole organisation, a part of the Inter-element validation details or reference to other organisation or a database. In its simplest form, the data documents dictionary is only a collection of data element definitions. More Validation rules between this element and other elements in advanced data dictionary contains database schema with the data dictionary. reference keys, still more advanced data dictionary contains Database table references entity-relationship model of the data elements or objects. Reference to tables the element is used and the role of the Parts of Data Dictionary element in each table. Special indication when the data element is the key for the table or a part of the key. 1. Data Element Definitions Definitions and references needed to understand the Data element definitions may be independent of table defini- meaning of the element tions or a part of each table definition Short application domain definitions and references to Data element number other documents needed to understand the meaning and Data element number is used in the technical documents. use of the data element. Data element name (caption) Source of the data in the element Commonly agreed, unique data element name from the Short description in application domain terms, where the application domain. This is the real life name of this data data is coming. Rules used in calculations producing the element. element values are usually written here. Short description Validity dates for the data element definition Description of the element in the application domain. Validity dates, start and possible end dates, when the element is or was used. There may be several time periods Security classification of the data element the element has been used. Organisation-specific security classification level or possible restrictions on use. This may contain technical links to History references security systems. Date when the element was defined in present form, references to superseded elements, etc. Related data elements External references List of closely related data element names when the relation References to books, other documents, laws, etc. is important. Version of the data element document Field name(s) Version number or other indicator. This may include formal Field names are the names used for this element in version control or configuration management references, computer programs and database schemas. These are the but such references may be hidden, depending on the technical names, often limited by the programming system used. languages and systems. Date of the data element document Code format Writting date of this version of the data element Data type (characters, numeric, etc.), size and, if needed, document. special representation. Common programming language notation, input masks, etc. can be used. 9

15 Quality control references For example, meta-data would document data about data DATABASE Organisation-specific quality control endorsements, dates, elements or attributes, (name, size, data type, etc) and data etc. about records or data structures (length, fields, columns, etc) Data element notes and data about data (where it is located, how it is associated, Short notes not included in above parts. ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the MANAGEMENT Table Definitions data. Table definition is usually available with SQL command help table Data Independence 1. The ability to modify a scheme definition in one level Tablename without affecting a scheme definition in a higher level is Table name called data independence. Table owner or database name 2. There are two kinds: List of data element (column) names and details Physical data independence Key order for all the elements, which are possible keys The ability to modify the physical scheme without Possible information on indexes causing application programs to be rewritten Possible information on table organisation Modifications at this level are usually to improve Technical table organisation, like hash, heap, B+ -tree, AVL - performance tree, ISAM, etc. may be in the table definition. Logical data independence Duplicate rows allowed or not allowed The ability to modify the conceptual scheme Possible detailed data element list with complete data without causing application programs to be element definitions rewritten Possible data on the current contents of the table Usually done when logical structure of database is The size of the table and similar site specific information altered may be kept with the table definition. 3. Logical data independence is harder to achieve as the Security classification of the table application programs are usually heavily dependent on the Security classification of the table is usually same or higher logical structure of the data. An analogy is made to abstract than its elements. However, there may be views accessing data types in programming languages. parts of the table with lower security. Points to Ponder Database schema Data dictionary is a collection of data elements & its It is the overall structure is called a database schema. Database definition. schema is usually graphical presentation of the whole database. Database Schema is the overall structure of a database. Tables are connected with external keys and key colums. When accessing data from several tables, database schema will be Database instance is the structure of a database at a needed in order to find joining data elements and in complex particular time. cases to find proper intermediate tables. Some database Meta-data is the data about data. products use the schema to join the tables automatically. The ability to modify a scheme definition in one level Database system has several schemas according to the level of without affecting a scheme definition in a higher level is abstraction.The physical schema describes the database design at called data independence. physical level. The logical schema describes the database design Review Terms at logical level.A database can also have sub-schemas(view level) that describes different views of database. Database Instance Schema Database Instance Database Schema 1. Databases change over time. Physical schema 2. The information in a database at a particular point in time is called an instance of the database Logical schema 3. Analogy with programming languages: Physical data independence Data type definition - scheme Database Language Value of a variable - instance DDL Meta-Data DML Meta-data is definitional data that provides information about Query Language or documentation of other data managed within an application Data dictionary or environment. Metadata 10

16 DATABASE Student Activity 1. What is difference between database Schema & database instance? MANAGEMENT 2. What do you understand by the structure of a database? 3. Define physical schema and logical schema? 4. Define data independence?Explain types of data independence? 5. Define data dictionary, meta-data? 6. Define various elements of data dictionary? 11


18 DATABASE LESSON 5: DATABASE CONCEPTS II MANAGEMENT Lesson Objective Storage structure and access method definition: writing Database manager a set of definitions translated by the data storage and definition language compiler Database user Scheme and physical organization modification: writing Database administrator a set of definitions used by the DDL compiler to generate Role of Database administrator modifications to appropriate internal system tables (e.g. data Role of Database user dictionary). This is done rarely, but sometimes the database Database architecture scheme or physical organization must be modified. Database Manager Granting of authorization for data access: granting The database manager is a program module which provides the different types of authorization for data access to various interface between the low-level data stored in the database and users the application programs and queries submitted to the system. Integrity constraint specification: generating integrity 1. Databases typically require lots of storage space (gigabytes). constraints. These are consulted by the database manager This must be stored on disks. Data is moved between disk module whenever updates occur. and main memory (MM) as needed. Database Users 2. The goal of the database system is to simplify and facilitate The database users fall into several categories: access to data. Performance is important. Views provide Application programmers are computer professionals interact- simplification. ing with the system through DML calls embedded in a program 3. So the database manager module is responsible for written in a host language (e.g. C, PL/1, Pascal). Interaction with the file manager: Storing raw data These programs are called application programs. on disk using the file system usually provided by a The DML precompiler converts DML calls (prefaced by a conventional operating system. The database manager special character like $, #, etc.) to normal procedure calls in a must translate DML statements into low-level file host language. system commands (for storing, retrieving and The host language compiler then generates the object code. updating data in the database). Some special types of programming languages combine Integrity enforcement: Checking that updates in the Pascal-like control structures with control structures for the database do not violate consistency constraints (e.g. no manipulation of a database. bank account balance below $25) These are sometimes called fourth-generation languages. Security enforcement: Ensuring that users only have They often include features to help generate forms and access to information they are permitted to see display data. Backup and recovery: Detecting failures due to Sophisticated users interact with the system without power failure, disk crash, software errors, etc., and writing programs. restoring the database to its state before the failure They form requests by writing queries in a database query Concurrency control: Preserving data consistency language. when there are concurrent users. These are submitted to a query processor that breaks a 4. Some small database systems may miss some of these DML statement down into instructions for the database features, resulting in simpler database managers. (For manager module. example, no concurrency is required on a PC running MS- DOS.) These features are necessary on larger systems Specialized users are sophisticated users writing special database application programs. These may be Database Administrator CADD systems, knowledge-based and expert systems, The database administrator is a person having central control complex data systems (audio/video), etc. over data and programs accessing that data. Duties of the Naive users are unsophisticated users who interact database administrator include: with the system by using permanent application Scheme definition: the creation of the original database programs (e.g. automated teller machine). scheme. This involves writing a set of definitions in a DDL 1. Database systems are partitioned into modules for different (data storage and definition language), compiled by the functions. Some functions (e.g. file systems) may be DDL compiler into a set of tables stored in the data provided by the operating system. dictionary. 13

19 2. Components include: DATABASE Nave Application Sophisticated Database Users Programs Users Administrator File manager manages allocation of disk space and data structures used to represent information on disk. Database manager: The interface between low-level Application Application Query Database Interfaces programs Schema data and application programs and queries. MANAGEMENT Query processor translates statements in a query language into low-level instructions the database manager understands. (May also attempt to find an Query DDL equivalent but more efficient form.) DML PreCompiler processor compiler DML precompiler converts DML statements embedded in an application program to normal procedure calls in a host language. The precompiler Application Program Database Manager interacts with the query processor. object code DDL compiler converts DDL statements to a set of tables containing metadata stored in a data dictionary. In addition, several data structures are required for physical File Manager system implementation: Data files: store the database itself. Data dictionary: stores information about the structure of the database. It is used heavily. Great emphasis should be placed on developing a good design and efficient implementation of the dictionary. Indices: provide fast access to data items holding particular Data files values. Data Database System Architecture Data dictionary Database systems are partitioned into modules for different storage functions. Some functions (e.g. file systems) may be provided by the operating system. Components Include Points to Ponder File manager manages allocation of disk space and data Database manager is a program module which provides structures used to represent information on disk. the interface between the low-level data stored in the Database manager: The interface between low-level data database and the application programs and application programs and queries. Database administrator is a person having central control Query processor translates statements in a query language over data into low-level instructions the database manager Database user is a person who access the database at various understands. (May also attempt to find an equivalent but level. more efficient form.) Data files: store the database itself. DML precompiler converts DML statements embedded Data dictionary: stores information about the structure of in an application program to normal procedure calls in a the database. host language. The precompiler interacts with the query DML precompiler converts DML statements embedded processor. in an application program to normal procedure calls in a DDL compiler converts DDL statements to a set of tables host language. containing metadata stored in a data dictionary. File manager manages allocation of disk space and data In addition, several data structures are required for physical structures used to represent information on disk. system implementation: Review Terms Data files: store the database itself. Database Instance Data dictionary: stores information about the structure of Schema the database. It is used heavily. Great emphasis should be placed on developing a good design and efficient Database Schema implementation of the dictionary. Physical schema Indices: provide fast access to data items holding particular Logical schema values. Physical data independence 14

20 Database Language DATABASE DDL DML Query Language Data dictionary MANAGEMENT Metadata Database Administrator Database User Student Activity 1. What are the various kinds of database users? 2. What do you understand by the structure of a database? 3. Define physical schema and logical schema? 4. Define file manager, dml compiler,data files? 15


22 DATABASE LESSON 6: DATA MODELS MANAGEMENT Lesson objective If we want to create a structure where in a course various Understsnding data models students are there & these students are given certain marks in assignment. Different types of data models Hierchachical data model network data model Relational model Data models are a collection of conceptual tools for describing data, data relationships, data semantics and data constraints. A data model is a description of both a container for data and a methodology for storing and retrieving data from that However, as you can imagine, the hierarchical database model container. Actually, there isnt really a data model thing. Data has some serious problems. For one, you cannot add a record models are abstractions, oftentimes mathematical algorithms to a child table until it has already been incorporated into the and concepts. You cannot really touch a data model. But parent table. This might be troublesome if, for example, you nevertheless, they are very useful. The analysis and design of wanted to add a student who had not yet signed up for any data models has been the cornerstone of the evolution of courses. databases. As models have advanced so has database efficiency. Worse, yet, the hierarchical database model still creates repetition There are various kinds of data models i.e. in a database records of data within the database. You might imagine that in the can be arranged in various ways.The various ways in which data database system shown above, there may be a higher level that can be represented are:- includes multiple course. In this case, there could be redundancy 1. Hierarchial data model because students would be enrolled in several courses and thus each course tree would have redundant student information. 2. Network data model Redundancy would occur because hierarchical databases handle 3. Relational Model one-to-many relationships well but do not handle many-to- 4. E-R-Model many relationships well. This is because a child may only have The Hierarchical Model one parent. However, in many cases you will want to have the Organization of the records is as a collection of trees. As its child be related to more than one parent. For instance, the name implies, the Hierarchical Database Model defines hierarchi- relationship between student and class is a many-to-many. cally-arranged data. Not only can a student take many subjects but a subject may Perhaps the most intuitive way to visualize this type of also be taken by many students. How would you model this relationship is by visualizing an upside down tree of data. In relationship simply and efficiently using a hierarchical database? this tree, a single table acts as the root of the database from The answer is that you wouldnt. which other tables branch out. You will be instantly familiar with this relationship because that is how all windows-based directory management systems (like Windows Explorer) work these days. Relationships in such a system are thought of in terms of children and parents such that a child may only have one parent but a parent can have multiple children. Parents and children are Though this problem can be solved with multiple databases tied together by links called pointers (perhaps physical creating logical links between children, the fix is very kludgy and addresses inside the file system). A parent will have a list of awkward. pointers to each of their children. Faced with these serious problems, the computer brains of the world got together and came up with the network model. Network Databases In many ways, the Network Database model was designed to solve some of the more serious problems with the Hierarchical Database Model. Specifically, the Network model solves the problem of data redundancy by representing relationships in terms of sets rather than hierarchy. The model had its origins in 17

23 the Conference on Data Systems Languages (CODASYL) which The Sequence of Columns is Insignificant DATABASE had created the Data Base Task Group to explore and design a The Sequence of Rows is Insignificant method to replace the hierarchical model. Each Column Has a Unique Name The network model is very similar to the hierarchical model Certain fields may be designated as keys, which means that actually. In fact, the hierarchical model is a subset of the network searches for specific values of that field will use indexing to MANAGEMENT model. However, instead of using a single-parent tree hierarchy, speed them up. Where fields in two different tables take values the network model uses set theory to provide a tree-like from the same set, a join operation can be performed to select hierarchy with the exception that child tables were allowed to related records in the two tables by matching values in those have more than one parent. This allowed the network model to fields. Often, but not always, the fields will have the same name support many-to-many relationships. in both tables. For example, an orders table might contain Visually, a Network Database looks like a hierarchical Database (customer-ID, product-code) pairs and a products table in that you can see it as a type of tree. However, in the case of a might contain (product-code, price) pairs so to calculate a given Network Database, the look is more like several trees which customers bill you would sum the prices of all products share branches. Thus, children can have multiple parents and ordered by that customer by joining on the product-code fields parents can have multiple children. of the two tables. This can be extended to joining multiple tables on multiple fields. Because these relationships are only specified at retreival time, relational databases are classed as dynamic database management system. The RELATIONAL database model is based on the Relational Algebra. A basic understanding of the relational model is necessary to effectively use relational database software such as Oracle, Microsoft SQL Server, or even personal database systems such as Access or Fox, which are based on the relational model. Nevertheless, though it was a dramatic improvement, the network model was far from perfect. Most profoundly, the model was difficult to implement and maintain. Most imple- mentations of the network model were used by computer programmers rather than real users. What was needed was a simple model which could be used by real end users to solve real problems. Relational Model The relational model was formally introduced by Dr. E. F. Points to Ponder Codd in 1970 and has evolved since then, through a series of Data models are a collection of conceptual tools for writings. The model provides a simple, yet rigorously defined, describing data, data relationships, data semantics and data concept of how users perceive data. Network model solves the constraints. problem of data redundancy by representing relationships in terms of sets. A relational database is a collection of two- Types of data models are:- dimensional tables. The organization of data into relational 1. Hierarchial data model tables is known as the logical view of the database. That is, the 2. Network data model form in which a relational database presents data to the user and 3. Relational Model the programmer. The way the database software physically stores the data on a computer disk system is called the internal 4. E-R-Model view. The internal view differs from product to product and The Hierarchical Database Model defines hierarchically- does not concern us here. arranged data. A relational database allows the definition of data structures, Network model solves the problem of data redundancy by storage and retrieval operations and integrity constraints. In representing relationships in terms of sets. such a database the data and relations between them are Network model solves the problem of data redundancy by organised in tables. A table is a collection of records and each representing relationships in terms of sets. record in a table contains the same fields. Review Terms Properties of Relational Tables Data models Values Are Atomic Hierchical data model Each Row is Unique Network data model Column Values Are of the Same Kind Relational data model 18

24 DATABASE Students Activity 1. Define data models? MANAGEMENT 2. Define hierarchichal data model? 3. Define relational data model? 4. Define relational data model? 19


26 DATABASE LESSON 7: RELATIONAL DATABASE MANAGEMENT SYSTEM I MANAGEMENT student_id student_name address Lesson Objective Understanding RDBMS Understanding data structures 8656789 Peta Williams 9, Davis Hall Understanding data manipulation 8700074 John Smith 9, Davis Hall 8900020 Arun Kumar 90, Second Hall Understanding various relational algebra operation 8801234 Peter Chew 88, Long Hall Understanding data integrity 8654321 Reena Rani 88, Long Hall 8712374 Kathy Garcia 88, Long Hall The relational model was proposed by E. F. Codd in 1970. It The relation student 8612345 Chris Watanabe 11, Main Street deals with database management from an abstract point of student_id subject_id view. The model provides specifications of an abstract database management system.To use the database management systems 8700074 CP302 based on the relational model however, users do not need to 8900020 CP302 master the theoretical foundations. Codd defined the model as 8900020 CP304 consisting of the following three components: 8700074 MA111 1. Data Structure - a collection of data structure types for 8801234 CP302 building the database. 8801234 CH001 2. Data Manipulation - a collection of operators that may be The relation enrolment used to retrieve, derive or modify data stored in the data structures. subject_id subject_name department CP302 Database Management Comp. Science 3. Data Integrity - a collection of rules that implicitly or CP304 Software Engineering Comp. Science explicitly define a consistent database state or changes of CH001 Introduction to Chemistry Chemistry states PH101 Physics Physics Data Structure MA111 Pure Mathematics Mathematics Often the information that an organisation wishes to store in a We list a number of properties of relations: computer and process is complex and unstructured. For example, we may know that a department in a university has 1. Each relation contains only one record type. 200 students, most are full-time with an average age of 22 years, 2. Each relation has a fixed numer of columns that are and most are females. Since natural language is not a good explicitly named. Each attribute name within a relation is language for machine processing, the information must be unique. structured for efficient processing. In the relational model the 3. No two rows in a relation are the same. information is structures in a very simple way 4. Each item or element in the relation is atomic, that is, in We consider the following database to illustrate the basic each row, every attribute has only one value that cannot be concepts of the relational data model. decomposed and therefore no repeating groups are allowed. 5. Rows have no ordering associated with them. 6. Columns have no ordering associated with them (although most commercially available systems do). The above properties are simple and based on practical consider- The above database could be mapped into the following ations. The first property ensures that only one type of relational schema which consists of three relation schemes. Each information is stored in each relation. The second property relation scheme presents the structure of a relation by specifying involves naming each column uniquely. This has several its name and the names of its attributes enclosed in parenthe- benefits. The names can be chosen to convey what each column sis. Often the primary key of a relation is marked by is and the names enable one to distinguish between the column underlining. and its domain. Furthermore, the names are much easier to student(student_id, student_name, address) remember than the position of the position of each column if enrolment(student_id, subject_id) the number of columns is large. subject(subject_id, subject_name, department) The third property of not having duplicate rows appears An example of a database based on the above relational model obvious but is not always accepted by all users and designers of is: DBMS. The property is essential since no sensible context free 21

27 meaning can be assigned to a number of rows that are exactly DATABASE Data Manipulation the same. The manipulative part of relational model makes set processing The next property requires that each element in each relation be (or relational processing) facilities available to the user. Since atomic that cannot be decomposed into smaller pieces. In the relational operators are able to manipulate relations, the user relation model, the only composite or compound type (data does not need to use loops in the application programs. Avoiding loops can result in significant increase in the produc- MANAGEMENT that can be decomposed into smaller pieces) is a relation. This simplicity of structure leads to relatively simple query and tivity of application programmers. manipulative languages. The primary purpose of a database in an enterprise is to be able The relation is a set of tuples and is closely related to the to provide information to the various users in the enterprise. concept of relation in mathematics. Each row in a relation may The process of querying a relational database is in essence a way be viewed as an assertion. For exaple, the relation student of manipulating the relations that are the database. For asserts that a student by the name of Reena Rani has example, one may wish to know student_id 8654321 and lives at 88, Long Hall. Similarly the 1. names of all students enrolled in CP302, or relation subject asserts that one of the subjects offered by the 2. names of all subjects taken by John Smith. Department of Computer Science is CP302 Database Manage- ment. The Relational Algebra The relational algebra is a procedural query language. It consists In the relational model, a relation is the only compound data of a set of operations that take one or two relations as input structure since relation do not allow repeating groups or and produce a new relation as their result. The fundamental pointers. operations in the relational algebra are select, project, union, set We now define the relational terminology: difference, Cartesian product, and rename. In addition to the Relation - essentially a table fundamental operations, there are several other operations- Tuple - a row in the relation namely, set intersection, natural join, division, and assignment. Attribute - a column in the relation We will define these operations in terms of the fundamental Degree of a relation - number of attributes in the relation operations. Cardinality of a relation - number of tuples in the relation Fundamental Operations Domain - a set of values that an attribute is permitted to take. The select, project, and rename operations are called unary Same domain may be used by a number of different attributes. operations, because they operate on one relation. The other Primary key - as discussed in the last chapter, each relation must three operations operate on pairs of relations and are, therefore, have an attribute (or a set of attributes) that uniquely identifies called binary operations. each tuple. Various operations are shown as follows: Each such attribute (or a set of attributes) is called a candidate key of the relation if it satisfies the following properties: Operation Symbol Operation Symbol (a) the attribute or the set of attributes uniquely identifies Projection Cartesian product each tuple in the relation (called uniqueness), and Join Selection (b) if the key is a set of attributes then no subset of these attributes has property (a) (called minimality). Left outer join Renaming There may be several distinct set of attributes that may serve as Right outer join candidate keys. One of the candidate keys is arbitrarily chosen as Union the primary key of the relation. Intersection Full outer join The three relations above student, enrolment and subject have Assignment Semijoin degree 3, 2 and 3 respectively and cardinality 4, 6 and 5 respec- tively. The primary key of the the relation student is student_id, of relation enrolment is (student_id, subject_id), and finally the The Select Operation primary key of relation subject is subject_id. The relation student probably has another candidate key. If we can assume The select operation selects tuples that satisfy a given condition.. the names to be unique than the student_name is a candidate The argument relation is in parentheses after the . Thus, to key. If the names are not unique but the names and address select those tuples of the loan relation where the branch is together are unique, then the two attributes (student_id, Perryridge, we write address) is a candidate key. Note that both student_id and (student_id, address) cannot be candidate keys, only one can. branch-name = Perryridge (loan) Similarly, for the relation subject, subject_name would be a If the loan relation is as shown , then the relation that results candidate key if the subject names are unique. from the preceding query will be a different relation. The relational model is the most popular data model for We can find all tuples in which the amount lent is more than commercial data processing applications.It is very much simple $1200 writing due to which the programmers work is reduced. amount>1200 (loan) 22

28 In general, we allow comparisons using =, , , in the Example: The table E (for Employee) DATABASE selection predicate. Furthermore, we can combine several enr ename dept predicates into a larger predicate by using the connectives and 1 Bill A (), or (V), and not (). Thus, to find those tuples pertaining to loans of more than $1200 made by the Perryridge branch, we 2 Sarah C write 3 John A MANAGEMENT branch-name = Perryridge L amount>1200 (loan) Example: The table D (for Department) dnr dname A Marketing B Sales C Legal The selection predicate may include comparisons between two Result Relational algebra attributes. To Illustrate, consider the relation loan-officer that enr ename dept dnr dname EXD consists of three attributes: customer-name, banker-name, and 1 Bill A A Marketing loan-number, which specifies that a particular banker is the loan 1 Bill A B Sales 1 Bill A C Legal officer for a loan that belongs to some customer. To find all 2 Sarah C A Marketing customers who have the same name as their loan officer, we can 2 Sarah C B Sales write 2 Sarah C C Legal 3 John A A Marketing customer-name = banker-name (loan-officer) 3 John A B Sales Projection 3 John A C Legal Projection is the operation of selecting certain attributes from a Seldom useful in practice. relation R to form a new relation S. For example, one may only Can give a huge result. be interested in the list of names from a relation that has a number of other attributes. Projection operator may then be The Union Operation used. Like selection, projection is an unary operator. Consider a query to find the names of all bank customers who II loan-number, amount (loan) have either an account or a loan or both. Note that the customer relation does not contain the information, since a customer Composition of Relational Operations does not need to have either an account or a loan at the bank. The fact that the result of a relational operation is itself a To answer this query, we need the information in the depositor relation is important. Consider the more complicated query relation and in the borrower relation . We know how to find Perryridge Find those customers who live1500 in Harrison. We write: the names of all customers with a loan in the bank: II customer-name ( customer-city = Harrison (customer)) II customer-name (borrower) Notice that, instead of giving the name of a relation as the We also know how to find the names of all customers with an argument of the projection operation, we give an expression account in the bank: that evaluates to a relation. II customer-name (depositor) In general, since the result of a relational-algebra operation is of To answer the query, we need the union of these two sets; that the same type (relation) as its inputs, relational-algebra opera- is, we need all customer names that appear in either or both of tions can be composed together into a the two relations. We find these data by the binary operation loan-number amount union, denoted, as in set theory, by U. So the expression needed L-11 900 is L-14 1500 II customer-name (borrower) U II customer-name (depositor) L-15 1500 L-16 1300 There are 10 tuples in the result, even though there are seven L-17 1000 distinct borrowers and six depositors. This apparent discrepancy L-23 2000 occurs because Smith, Jones, and Hayes are borrowers as well as L-93 500 depositors. Since relations are sets, duplicate values are elimi- nated. customer-name Loan number and the amount of the loan. Adams Relational-algebra expression. Composing relational-algebra Curry operations into relational-algebra expressions in just like Hayes Jackson composing arithmetic operations (such as +, -, *, and %) into Jones arithmetic expressions. Smith Cartesian Product Williams Lindsay The cartesian product of two tables combines each row in one Johnson table with each row in the other table. Turner 23

29 Names of all customers who have either a loan or an account. of the query. For relational-algebra queries, assignment must DATABASE For a union operation r U s to be valid, we require that two always be made to a temporary relation variable.Note that the conditions hold: assignment operation does not provide any additional power to the algebra. It is, however, a convenient way to express complex 1. The relations r and s must have the same number of queries. attributes. MANAGEMENT 2. The domains of the ith attribute of r and the ith attribute Points to Ponder of s must be the same, for all i. The relational model was proposed by E. F. Codd in Note that r and s can be, in general, temporary relations that are 1970. the result of relational-algebra expressions. provides specifications of an abstract database management system The Set-Intersection Operation The first additional-relational algebra operation that we shall It consists of the following three components: define is set intersection (I ). Suppose that we wish to find all 1. Data Structure a collection of data structure types for customers who have both a loan and an account. Using set building the database. intersection, we can write 2. Data Manipulation a collection of operators that may be IIcustomer-name (borrower) IIcustomer-name (deposi- used to retrieve, derive or modify data stored in the data tor) structures. Note that we can rewrite any relational algebra expression that 3. Data Integrity a collection of rules that implicitly or uses set intersection by replacing the intersection operation with explicitly define a consistent database state or changes of. a pair of set-difference operations as: Relational algebra describes a set of algebraic operations that operates on tables, & output a table as Thus, set intersection is not a fundamental operation and does a result. not add any power to the relational algebra. It is simply more Review Terms convenient to write r I s than to write r (r s). Table/Relation The Set Difference Operation Tuple The set-difference operation, denoted by -, allows us to find Domain tuples that are in one relation but are not in another. The expression r s produces a relation containing those tuples in r Database schema but not in s. Database instance We can find all customers of the bank who have an account but Keys not a loan by writing Primary key II customer-name (depositor) II customer-name (borrower) Foreign key As with the union operation, we must ensure that set differ- Relational algebra ences are taken between compatible relations. Therefore, for a Student Activity set difference operation r s to be valid, 1. Why do we use RDBMS? we require that the relations r and s be of the same arity, and that the domains of the ith attribute of r and the ith attribute of s be the same. The Assignment Operation It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables. The assignment operation, denoted by , works like assignment in a programming language. temp1 amount>1200 (loan) 2. Define relation,tuple,domain,keys? temp2 II loan-number, amount (loan) result = temp1 temp2 The evaluation of an assignment does not result in any relation being displayed to the user. Rather, the result of the expression to the right of the is assigned to the relation variable on the left of the . This relation variable may be used in subsequent expressions. With the assignment operation, a query can be written as a sequential program consisting of a series of assignment followed by an expression whose value is displayed as the result 24

30 3. What is the difference between Intersection, Union & DATABASE Cartesian product? MANAGEMENT 25


32 DATABASE LESSON 8: RELATIONAL DATABASE MANAGEMENT SYSTEM II MANAGEMENT Lesson Objectives distinct). An example arises in the query Find the number of Elaborating various other features of Relational algebra branches appearing in the pt-works relation. In this case, a branch name count only once, regardless of the number of Understanding aggregate function employees working that branch. We write this query as follows: Understanding joins Gcount-distinct(branch-name) (pt-works) Understanding natural, outer, inner joins? In the above figure, the result of this query is a single row Aggregate Functions containing the value 3. Aggregate functions take a collection of values and return a Suppose we want to find the total salary sum of all part-time single value as a result. For example, the aggregate function sum employees at each branch of the bank separately, rather than the takes a collection of values and returns the sum of the values. sun for the entire bank. To do so, we need to partition the Thus, the function sum applied on the collection relation pt-works into group based on the branch, and to apply {1,1,3,4,4,11} the aggregate function on each group. returns the value 24. The aggregate function avg returns the The following expression using the aggregation operator G average of the values. When applied to the preceding collection, achieves the desired result: it returns the value 4. The aggregate function count returns the branch-name G sum(salary) (pt-works) number of the elements in the collection, and returns 6 on the In the expression, the attribute branch-name in the left-hand preceding collection. Other common aggregate function include subscript of G indicates that the input relation pt-works must min and max, which return the minimum and maximum be divided into groups based on the value of branch-name. values in a collection; they return 1 and 11, respectively, on the Following Figure shows the resulting groups. preceding collection. employee-name branch-name salary The collections on which aggregate functions operate can have Rao Austin 1500 multiple occurrences of a value; the order in which the values Sato Austin 1600 appear is not relevant. Such collections are called multisets. Sets Johnson Downtown 1500 are a special case of multisets where there is only one copy of Loreena Downtown 1300 each element. Peterson Downtown 2500 To illustrate the concept of aggregation, we take the following Adams Perryridge 1500 example Brown Perryridge 1300 Gopal Perryridge 5300 Adams Perryridge 1500 Brown Perryridge 1300 The expression sum(salary) in the right-hand subscript of G Gopal Perryridge 5300 indicates that for each group of tuples (that is, each branch), the Johnson Downtown 1500 aggregation function sum must be applied on the collection of Loreena Downtown 1300 values of the salary attribute. The output relation consists of Peterson Downtown 2500 tuples with the branch name, and the sum of the salaries for Rao Austin 1500 the branch, as shown in Figure Sato Austin 1600 branch-name sum of salary Austin 3100 G sum(salary) (pt-works) Downtown 5300 The symbol G is the letter G in calligraphic fount; read it as Perryridge 8100 calligraphic G. The relational-algebra operation G signifies that aggregation is to be applied, and its subscript specifies the The general from of the aggregation operation G is as follows: aggregate operation to be applied. The result of the expression G1,G2,,Gn G F1 (A1), F2(A2),,Fm(Am)(E) above is a relation with a single attribute, containing a single Where E is any relational-algebra expression; G1, G2,, Gn row with a numerical value corresponding to the sum of all the constitute a list of attributes on which to group; each Fi is an series of all employees working part-time in the bank. aggregate function; and each Ai is an attribute There are cases where we must eliminate multiple occurrences of a value before computing an aggregate function. If we do want Join The Natural-Join Opeation to eliminate duplicates, we use the same function names as before, with the addition of the hyphenated string distinct If is often desirable to simplify certain queries that require a appended to the end of the function name (for example, count- Cartesian product. Usually, a query that involves a Cartesian 27

33 product includes a selection operation on the result of the the natural join. The tuple (Gates, null, null, Redmond, 5300) is DATABASE Cartesian product. Consider the query Find the names of all such a tuple. Thus, all information from the right relation is customers who have a loan at the bank, along with the loan present in the result of the right outer join. number and the loan amount. We first form the Cartesian The full outer join ( ) does both of those operations, product of the borrower and loan relations. Then, we select padding tuples from the left relation that did not match any those tuples that pertain to only the same loan-number, MANAGEMENT from the right relation, as well as tuples from the right relation followed by the projection of the resulting customer-name, that did not match any from the left relation, and adding them loan-number, and amount: to the result of the join. IIcustomer-name,, amount ( = (borrower x loan)) The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. It Result of employee ft-works. is denoted by the join symbol ( ). The natural-join opera- tion forms a Cartesian product of its two arguments, performs a selection forcing equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes. Outer Join The outer-join operation is an extension of the join operation to deal with missing information. Suppose that we have the Result of employee ft-works. relations with the following schemas, which contain data on Renaming Tables and Columns full-time employees: Example: The table E (for Employee) employee (employee-name, street, city) ft-works (employee-name, branch-name, salary) Suppose that we want to generate a single relation with all the information (street, city, branch name, and salary) about full- time employees. A possible approach would be to use the natural join operation as follows: Example: The table D (for Department) employee ft-works nr name We can use the outer-join operation to avoid this loss of A Marketing information. There are actually three forms of the operation: let B Sales outer join, denoted ; right outer join, denoted ; and C Legal full outer join, denoted . All three forms of outer join compute the join, and add extra tuples to the result of the join. We want to join these tables, but: Several columns in the result will have the same name (nr and name). How do we express the join condition, when there are two columns called nr? The result of employee ft-works. Solutions Rename the attributes, using the rename operator. Keep the names, and prefix them with the table name, as is done in SQL. (This is somewhat unorthodox.) Result Relational algebra Result of employee ft-works. enr ename dept dnr dname 1 Bill A A Marketing (p (enr, ename, dept)(E)) ? dept = dnr (p (dnr, The left outer join ( ) takes all tuples in the left relation that 2 Sarah C C Legal dname)(D)) did not match with any tuple in the right relation, pads the 3 John A A Marketing tuple with null values for all other attributes from the right relation, and adds them to the result of the natural join. The You can use another variant of the renaming operator to change tuple (Smith, Revolver, Death Valley, null, null) is such a tuple. the name of a table, for example to change the name of E to R. All information from the left relation is present in the result of This is necessary when joining a table with itself (see below). the left outer join. .p R(E) The right outer join ( ) is symmetric with the left outer join: A third variant lets you rename both the table and the columns: It pads tuples from the right relation that did not match any (E) .p R(enr, ename, dept) from the left relation with nulls and adds them to the result of 28

34 5. Define rename operators? DATABASE Points to Ponder Aggregate functions take a collection of values and return a single value as a result. Usually, a query that involves a Cartesian product includes a selection operation on the result of the Cartesian product. MANAGEMENT Review Terms Aggregate functions Joins Natural join Outer join Right outer join Left outer join Student Notes 1. Define aggregate functions with example? 2. Define joins?What is natural join? 3. Differentiate between inner join & outer join? 4. Differentiate between left outer join & right outer join with the help of example? 29


36 DATABASE LESSON 9: E-R MODEL - I MANAGEMENT Lesson Objective Each entity has a value for each of its attributes.For each Understanding entity attribute ,there is a set of permitted values called domain or value set. Understanding relationship Understanding attribute, domain, entity set Simple and Composit Attributes A simple attribute has got one value for its attribute & a Understanding Simple & composit Attributes comosit attribute is one which can be divided into sub-parts. Understanding Derived Attribute For example an attribute name can be divided into first name Understanding relationship set middle name & last name. Components of E-R-Diagrams Single and Multivalued Attributes Designing E-R-diagrams An attribute which have got only one value is known as single ER considers the real world to consist of entities and relation- valued attribute.For ex. the loan_no attribute will have only one ships among them. An Entity is a thing which can be distinctly loan_no. There may be cases when an attribute has a set of identified, for example a person, a car, a subroutine, a wire, an values for a specific entity.For ex. an attribute phone_no. may event. have a value zero,one or several phone_no. This is known as multivalued attribute. A Relationship is an association among entities, eg person Owns car Derived Attribute Its value is derived from value of other related attributes or is an association between a person and a car person EATS dish entities.For ex. an attribute age can be calculated from another IN place is an association among a person, a dish and a place. attribute date_of_birth. Attribute, Value, Domain, Entity Set The information about one entity is expressed by a set of Relationship, Relationship Set A relationship is an association among several entities.For ex. a (attribute,value) pairs, eg a car model could be: customer A is associated with loan_no L1. A relationship set is | Name = R1222 | a subset of the cartesian product of entity sets. For example a Power = 7.3 relationship set (abbreviated as RSet) on the relationship Nseats =5 Person HAS_EATEN Dish IN Place could be as shown in Values of attributes belong to different value-sets or domains, --------------------------------------- | RSet 'Person HAS_EATEN Dish IN Place' | for example, for a car, |---------------------------------------| | Person | Dish | Place | Nseats is an integer between 1 and 12 |---------|--------|--------------------| Entities defined by the same set of attributes can be grouped | Steve | Duck | Kuala Lumpur | | Weiren | Duck | Beijing | into an Entity Set (abbreviated as ESet) as shown in | Paolo | Noodles| Naples | | Mike | Fondue | Geneva | |-------------------------------| | Paolo | Duck | Beijing | | ESET : CarModel | |---------------------------------------| |-------------------------------| | Name | Power | Nseats | Notice that an RSet is an ESet having ESets as attributes. |----------|--------------------| | R1222 | 7.3 | 5 | Components of ER-D | HZ893 | 6.8 | 5 | The overall logical structure of a database can be expressed | R1293 | 5.4 | 4 | graphically by an E-R diagram.The various components of an |-------------------------------| E-R diagram are as folows:- 1. Rectapgles, which represent entity sets. An Entity Set A given set of attributes may be referred to as an entity type. All 2. _ Ellipses, which represent attributes. entities in a given ESet are of the same type, but sometimes 3. Diamonds, which represents relationship sets there can be more that one set of the same type. The set of all 4. Lines, which link attributes to entity sets and entity sets to persons who are customers at a given bank can be defined as an relationship sets. entity set customer. The individual entity that constitutes a set 5. Double ellipses, which represent multi-valued attributes. are said to be extension to entity set. So all the individual bank customers are the extension of entity set customer. 6. Dashed ellipse which represents derived attributes 7. Double lines,which indicate total participation of an entity in a relationship set. 31

37 DATABASE ENTITY RELATIONSHIP DIAGRAM NOTATIONS Peter Chen developed ERDs in 1976. Since then Charles Bachman and James Martin have added some sligh refinements to the basic ERD principles. MANAGEMENT Entity An entity is an object or concept about which you want to store information. Weak Entity A weak entity is dependent on another entity to exist.. Attributes Attributes are the properties or characteristics of an entity. Key attribute A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute. Multivalued attribute A multivalued attribute can have more than one value. For example, an employee entity can have multiple skill values. Derived attribute A derived attribute is based on another attribute. For example, an employee's monthly salary is based on the employee's annual salary. Relationships Relationships illustrate how two entities share information in the database structure. 32

38 DATABASE Weak relationship To connect a weak entity with others, you MANAGEMENT should use a weak relationship notation. Cardinality Cardinality specifies how many instances of an entity relate to one instance of another entity. Ordinality is also closely linked to cardinality. While cardinality specifies the occurences of a relationship, ordinality describes the relationship as either mandatory or optional. In other words, cardinality specifies the maximum number of relationships and ordinality specifies the absolute minimum number of relationships. Recursive relationship In some cases, entities can be self-linked. For example, employees can supervise other employees. Mapping Cardinalities It express the number of entities to which other entity can be associated via a relationship set. It can be of the following types:- 1. One to one: An entity in A is associated with at most one entity in B, & an entity in B is associated with at most one entity in A. 2. One to many: An entity in A is associated with any number(zero or more) of entities in B.An entity in B however can be associated with at most one entity in A. 3. Many to one: An entity in A is associated with at most one entity in B.An entity in B however can be associated with any number(zero or more) of entities in A. 4. Many to many: An entity in A is associated with any number(zero or more) of entities in B & an entity in B is associated with any number(zero or more) of entities in A. 33

39 One to many relationship DATABASE 2. Define relationship, relationship set? MANAGEMENT Many to one relationship 3. Differentiate between simple & composit attribute? One to one relationship 4. Define derived attribute? Points to Ponder An Entity is a thing which can be distinctly identified, for example a person, a car, a subroutine, a wire, an event. A Relationship is an association among entities, eg person Owns car A given set of attributes may be referred to as an entity type A simple attribute has got one value for its attribute & a 5. Differentiate between single & multi-valued attribute? comosit attribute is one which can be divided into sub- parts. value is derived from value of other related attributes or entities is known as derived attribute. A relationship is an association among several entities A relationship set is a subset of the cartesian product of entity sets 6. Define cardinality?Explain various kinds of cardinality? Review Terms Entity Entity set Attribute Domain Value Relationship 7. Define various components of E-R-Diagram? Relationship set Cardinality Association Student Notes 1. Define entity, domain,value? 34


41 DATABASE LESSON 10: E-R MODEL - II MANAGEMENT Lesson Objective A superkey may contain extraneous attributes, and we are Designing E-R-Diagram often interested in the smallest superkey. A superkey for which no subset is a superkey is called a candidate key. Understanding keys In the example above, S.I.N. is a candidate key, as it is Super key,primary key, composit key minimal, and uniquely identifies a customer entity. Entity relationship diagram methodology A primary key is a candidate key (there may be more than Developing Entity Relationship Diagrams one) chosen by the DB designer to identify entities in an (ERDs) entity set. Why An entity set that does not possess sufficient attributes to form Entity Relationship Diagrams are a major data modelling tool a primary key is called a weak entity set. One that does have a and will help organize the data in your project into entities and primary key is called a strong entity set. define the relationships between the entities. This process has For example, proved to enable the analyst to produce a good database The entity set transaction has attributes transaction-number, structure so that the data can be stored and retrieved in a most date and amount. efficient manner. Different transactions on different accounts could share the Information same number. Entity These are not sufficient to form a primary key (uniquely A data entity is anything real or abstract about which we want to identify a transaction). store data. Entity types fall into five classes: roles, events, Thus transaction is a weak entity set. locations, tangible things or concepts. E.g. employee, payment, campus, book. Specific examples of an entity are called in- For a weak entity set to be meaningful, it must be part of a one- stances. E.g. the employee John Jones, Mary Smiths payment, to-many relationship set. This relationship set should have no etc. descriptive attributes. (Why?) The idea of strong and weak entity sets is related to the Relationship existence dependencies seen earlier. A data relationship is a natural association that exists between one or more entities. E.g. Employees process payments. Member of a strong entity set is a dominant entity. Cardinality defines the number of occurrences of one entity Member of a weak entity set is a subordinate entity. for a single occurrence of the related entity. E.g. an employee A weak entity set does not have a primary key, but we need a may process many payments but might not process any means of distinguishing among the entities. payments depending on the nature of her job. The discriminator of a weak entity set is a set of attributes that Attribute allows this distinction to be made. A data attribute is a characteristic common to all or most The primary key of a weak entity set is formed by taking the instances of a particular entity. Synonyms include property, data primary key of the strong entity set on which its existence element, field. E.g. Name, address, Employee Number, pay rate depends (see Mapping Constraints) plus its discriminator. are all attributes of the entity employee. An attribute or combination of attributes that uniquely identifies one and only To illustrate one instance of an entity is called a primary key or identifier. E.g. transaction is a weak entity. It is existence-dependent on Employee Number is a primary key for Employee. account. Keys The primary key of account is account-number. Differences between entities must be expressed in terms of transaction-number distinguishes transaction entities within attributes. the same account (and is thus the discriminator). A superkey is a set of one or more attributes which, taken So the primary key for transaction would be (account- collectively, allow us to identify uniquely an entity in the number, transaction-number). entity set. Just Remember: The primary key of a weak entity is found by For example, in the entity set customer, customer-name and taking the primary key of the strong entity on which it is S.I.N. is a superkey. existence-dependent, plus the discriminator of the weak entity Note that customer-name alone is not, as two customers set. could have the same name. 36

42 3. Examine relationships between entities closely. Are they DATABASE An Entity Relationship Diagram Methodology necessary? Are there any relationships missing? Eliminate any redundant relationships. Dont connect relationships to Identify the roles, events, locations, tangible things 1. Identify Entities or concepts about which the end-users want to each other. store data. Find the natural associations between pairs of 4. Use colors to highlight important portions of your MANAGEMENT 2. Find Relationships diagram. entities using a relationship matrix. Put entities in rectangles and relationships on line 3. Draw Rough ERD segments connecting the entities. Determine the number of occurrences of one entity 4. Fill in Cardinality for a single occurrence of the related entity. Identify the data attribute(s) that uniquely identify 5. Define Primary Keys one and only one occurrence of each entity. Eliminate Many-to-Many relationships and include 6. Draw Key-Based ERD primary and foreign keys in each entity. Name the information details (fields) which are 7. Identify Attributes essential to the system under development. For each attribute, match it with exactly one entity 8. Map Attributes that it describes. Adjust the ERD from step 6 to account for entities 9. Draw fully attributed ERD or relationships discovered in step 8. Does the final Entity Relationship Diagram 10. Check Results accurately depict the system data? A Simple Example A company has several departments. Each department has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number. 1. Identify Entities The entities in this system are Department, Employee, Supervi- sor and Project. One is tempted to make Company an entity, but it is a false entity because it has only one instance in this problem. True entities must have more than one instance. 2. Find Relationships We construct the following Entity Relationship Matrix: Department Employee Supervisor Project Department is assigned run by Employee belongs to works on Supervisor Runs Project uses 3. Draw Rough ERD We connect the entities whenever a relationship is shown in the entity Relationship Matrix. E-R diagram with an attribute attached to a relationship set. If a relationship set has also some attributes associated with it, Tips for Effective ER Diagrams then we link these ttributes to that relationship set. For 1. Make sure that each entity only appears once per diagram. example, we have the accessdrate descriptive attribute attached 2. Name every entity, relationship, and attribute on your to the relationship set depositor to specify the most ecent date diagram. on which a customer accessed that account 37

43 E-R diagram with composite, multivalued, and derived DATABASE Student Activity attributes 1. Explain the difference between primary key, candidate key & super key? MANAGEMENT 2. Why is redundancy a bad practice ? It shows how composite attributes can be represented in the E- R notation. Here, a composite attribute name, with component attributes first-name, middle-initial, md last-name replaces the simple attribute customer-name of customer. Also, a compos- te attribute address, whose component attributes are street, city, state, and zip-code) replaces the attributes customer-street and customer-city of customer. The attribute street is tself a composite attribute whose component attributes are street- number, street-name, md apartment number. 3. Construct an E-R-diagram for a car insurance company It also illustrates a multivalued attribute phone-number, whose customer owns one or more cars each.Each car is depicted by a louble ellipse, and a derived attribute age, depicted associated with it zero to any number of recorded student? by a dashed ellipse. Points to Ponder ER considers the real world to consist of entities and relationships among them. An Entity is a thing which can be distinctly identified A Relationship is an association among entities The information about one entity is expressed by a set of (attribute,value) 4. Design an E-R diagram for keeping tracks of the exploits A given set of attributes may be referred to as an entity type of your favourite sports team. You should store the Attributes are of various types:- matches played, scores in each match, players in each match Single & Multi-valued & individual player statistics for each match. Summary Simple & Composit statistics should be modelled as derived attribute? Derived E-R-diagrams are expressed through various symbols Cardinality expresses the number of entities to which other entity can be associated via a relationship set Review Terms Entity. Relationship. Attribute. dOMAIN E-R-D Symbols. Various kinds of keys. Types of Attributes Types of Cardinality 38


45 DATABASE LESSON 11: STRUCTURED QUERY LANGUAGE(SQL)I MANAGEMENT Lesson Objective these tables. Tables are uniquely identified by their names and Understanding SQL are comprised of columns and rows. Columns contain the column name, data type, and any other attributes for the Understanding DDL, DML column. Rows contain the records or data for the columns. Creating tables Here is a sample table called weather. Selecting data city, state, high, and low are the columns. The rows contain the Constraints data for this table: Drop Weather Insert city state high low Select with conditions Phoenix Arizona 105 90 SQL (pronounced ess-que-el) stands for Structured Query Tucson Arizona 101 92 Language. SQL is used to communicate with a database. Flagstaff Arizona 88 69 According to ANSI (American National Standards Institute), it is the standard language for relational database management San Diego California 77 60 systems. SQL statements are used to perform tasks such as New Albuquerque 80 72 update data on a database, or retrieve data from a database. Mexico Some common relational database management systems that Data-Definition Language use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, The set of relations in a database must be specified to the Ingres, etc. Although most database systems use SQL, most of system by means of a data definition language (DDL). them also have their own additional proprietary extensions that are usually only used on their system. However, the standard The SQL DDL allows specification of not only a set of SQL commands such as Select, Insert, Update, Delete, relations, but also information about each relation, including Create, and Drop can be used to accomplish almost The schema for each relation everything that one needs to do with a database. The domain of values associated with each attribute The SQL language has several parts: The integrity constraints Data-definition language (DDL). The SQL DDL The set of indices to be maintained for each relation provides commands for defin-ing relation schemas, deleting The security and authorization information for each relation relations, and modifying relation schemas. The physical storage structure of each relation on disk Interactive data-manipulation language (DML). The SQL DML includes a query language based on both the Creating Tables relational algebra and the tuple relational calculus. It includes The create table statement is used to create a new table. Here is also commands to insert tuples into, delete tuples from, and the format of a simple create table statement: modify tuples in the database. create table tablename View definition. The SQL DOL includes commands for (column1 data type, defining views. column2 data type, Transaction control. SQL includes commands for column3 data type); specifying the beginning and ending of transactions. Format of create table if you were to use optional constraints: Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can be embedded create table tablename within general-purpose programming lan-guages, such as C, (column1 data type C++, Java, PUr, Cobol, Pascal, and Fortran. [constraint], Integrity. The SQL DDL includes commands for specifying column2 data type integrity constraints that the data stored in the database [constraint], must satisfy. Updates that violate integrity constraints are disallowed. column3 data type Authorization. The SQL DDL includes commands for [constraint]); specifying access rights to relations and views. [ ] = optional A relational database system contains one or more objects called Note: You may have as many columns as youd like, and the tables. The data or information for the database are stored in constraints are optional. 40

46 following information about your new employees: firstname, DATABASE Example create table employee lastname, title, age, and salary. (first varchar(15), last varchar(20), age number(3), MANAGEMENT address varchar(30), city varchar(20), state varchar(20)); To create a new table, enter the keywords create table followed Drop a Table by the table name, followed by an open parenthesis, followed The drop table command is used to delete a table and all rows by the first column name, followed by the data type for that in the table. column, followed by any optional constraints, and followed by To delete an entire table including all of its rows, issue the drop a closing parenthesis. It is important to make sure you use an table command followed by the tablename. drop table is open parenthesis before the beginning table, and a closing different from deleting all of the records in the table. Deleting parenthesis after the end of the last column definition. Make all of the records in the table leaves the table including column sure you seperate each column definition with a comma. All and constraint information. Dropping the table removes the SQL statements should end with a ;. table definition as well as all of its rows. The table and column names must start with a letter and can be drop table tablename followed by letters, numbers, or underscores - not to exceed a total of 30 characters in length. Do not use any SQL reserved Example keywords as names for tables or column names (such as drop table myemployees; select, create, insert, etc). Inserting into a Table Data types specify what the type of data can be for that particu- The insert statement is used to insert or add a row of data into lar column. If a column called Last_Name, is to be used to the table. hold names, then that particular column should have a To insert records into a table, enter the key words insert into varchar (variable-length character) data type. followed by the table name, followed by an open parenthesis, Here are the most common Data types: followed by a list of column names separated by commas, Fixed-length character string. Size is specified in followed by a closing parenthesis, followed by the keyword char(size) parenthesis. Max 255 bytes. values, followed by the list of values enclosed in parenthesis. Varchar(size) Variable-length character string. Max size is specified in The values that you enter will be held in the rows and they will parenthesis. match up with the column names that you specify. Strings Number value with a max number of column digits number(size) specified in parenthesis. should be enclosed in single quotes, and numbers should not. Date Date value insert into tablename Number value with a maximum number of digits of "size" (first_column,...last_column) number(size,d) total, with a maximum number of "d" digits to the right of the decimal. values (first_value,...last_value); Number value with a maximum number of digits of "size" In the example below, the column name first will match up number(size,d) total, with a maximum number of "d" digits to the right of the decimal. with the value Luke, and the column name state will match up with the value Georgia. Constraints When tables are created, it is common for one or more columns Example to have constraints associated with them. A constraint is insert into employee basically a rule associated with a column that the data entered (first, last, age, address, city, state) into that column must follow. For example, a unique values (Luke, Duke, 45, 2130 Boars Nest, constraint specifies that no two records can have the same value in a particular column. They must all be unique. The other two Hazard Co, Georgia); most popular constraints are not null which specifies that a Note: All strings should be enclosed between single quotes: column cant be left blank, and primary key. A primary key string constraint defines a unique identification of each record (or row) Students Activity in a table. 1. Insert data into your new employee table. Its now time for you to design and create your own table. If you decide to change or redesign the table, you can either drop it and recreate it or you can create a completely different one. Students Activity You have just started a new company. It is time to hire some employees. You will need to create a table that will contain the 41

47 DATABASE The Like pattern matching operator can also be used in the conditional selection of the where clause. Like is a very powerful operator that allows you to select only rows that are like what 2. Your first three employees are the following: you specify. The percent sign % can be used as a wild card to MANAGEMENT Jonie Weber, Secretary, 28, 19500.00 match any possible character that might appear before or after Potsy Weber, Programmer, 32, 45300.00 the characters specified. For example: Dirk Smith, Programmer II, 45, 75020.00 select first, last, city from empinfo where first LIKE Er%; This SQL statement will match any first names that start with Er. Strings must be in single quotes. Or you can specify, select first, last from empinfo 3. Enter these employees into your table first, and then insert where last LIKE %s; at least 5 more of your own list of employees in the table This statement will match any last names that end in a s. select * from empinfo where first = Eric; This will only select rows where the first name equals Eric exactly. Sample Table: empinfo first Last id age city state John Jones 99980 45 Payson Arizona Mary Jones 99982 25 Payson Arizona Eric Edwards 88232 32 San Diego California Selecting Data Mary Ann Edwards 88233 32 Phoenix Arizona The select statement is used to query the database and retrieve Ginger Howell 98002 42 Cottonwood Arizona selected data that match the criteria that you specify. Here is the format of a simple select statement: Sebastian Smith 92001 23 Gila Bend Arizona Gus Gray 22322 35 Bagdad Arizona select column1 Mary Ann May 32326 52 Tucson Arizona [,column2",etc] Erica Williams 32327 60 Show Low Arizona from tablename Leroy Brown 32380 22 Pinetop Arizona [where condition]; Elroy Cleaver 32382 22 Globe Arizona [] = optional The column names that follow the select keyword determine which columns will be returned in the results. You can select as Some more examples many column names that youd like, or you can use a * to select all columns. select first, last, city from empinfo; The table name that follows the keyword from specifies the select last, city, age from empinfo table that will be queried to retrieve the desired results. where age > 30; The where clause (optional) specifies which data values or rows select first, last, city, state from empinfo will be returned or displayed, based on the criteria described where first LIKE J%; after the keyword where. select * from empinfo; = Conditional selections Equal used in the where clause: select first, last, from empinfo > Greater than where last LIKE %s; < Less than select first, last, age from empinfo >= Greater than or equal where last LIKE %illia%;

48 7. Select all columns for everyone whose last name ends in DATABASE Student Activity 1. Display the first name and age for everyone thats in the ith. table. MANAGEMENT Points to Ponder 2. Display the first name, last name, and city for everyone SQL is used to communicate with a database. thats not from Payson. The set of relations in a database must be specified to the system by means of a data definition language (DDL). The create table statement is used to create a new table A constraint is basically a rule associated with a column that the data entered into that column must follow The drop table command is used to delete a table and all rows in the table. The select statement is used to query the database and 3. Display all columns for everyone that is over 40 years old. retrieve selected data that match the criteria that you specify. 4. Display the first and last names for everyone whose last The insert statement is used to insert or add a row of data name ends in an ay. into the table. The Like pattern matching operator can also be used in the conditional selection of the where clause Review Terms SQL DDL DML Constraints 5. Display all columns for everyone whose first name equals Insert Mary. Create Select Drop Constraints Like 6. Display all columns for everyone whose first name contains Mary. 43




52 DATABASE LESSON 14: SQL-II MANAGEMENT Lesson Objective *The WHERE clause (optional) specifies which data values or Elaborating Select statement rows will be returned or displayed, based on the criteria described after the keyword where. Rename operator Aggregate function Example SELECT name, age, salary FROM employee WHERE age > 50; Having The above statement will select all of the values in the name, Order-by age, and salary columns from the employee table whose age is Group-by greater than 50. IN & Between Comparison Operators The SELECT statement is the core of SQL, and it is likely that the vast majority of your SQL commands will be SELECT = Equal statements.There are enormous amount of options available > Greater than for the SELECT statement.When constructing SQL Queries < Less than (with the SELECT statement), it is very useful to know all of >= Greater than or equal to the possible options and the best or more efficient way to do 1000; *The column names that follow the SELECT keyword determine which columns will be returned in the results. You can select as many column names that youd like, or you can use a * to select all columns. The order they are specified will be the order that they are returned in your query results. *The table name that follows the keyword FROM specifies the table that will be queried to retrieve the desired results. 47

53 2. Select all columns from the items_ordered table for DATABASE Items_ordered whoever purchased a Tent. customerid order_date item quantity Price 10330 30-Jun-1999 Pogo stick 1 28.00 10101 30-Jun-1999 Raft 1 58.00 10298 01-Jul-1999 Skateboard 1 33.00 MANAGEMENT 10101 01-Jul-1999 Life Vest 4 125.00 10299 06-Jul-1999 Parachute 1 1250.00 10339 27-Jul-1999 Umbrella 1 4.50 10449 13-Aug-1999 Unicycle 1 180.79 10439 14-Aug-1999 Ski Poles 2 25.50 10101 18-Aug-1999 Rain Coat 1 18.30 10449 01-Sep-1999 Snow Shoes 1 45.00 3. Select the customerid, order_date, and item values from the 10439 18-Sep-1999 Tent 1 88.00 items_ordered table for any items in the item column that 10298 19-Sep-1999 Lantern 2 29.00 10410 28-Oct-1999 Sleeping Bag 1 89.22 start with the letter S. 10438 01-Nov-1999 Umbrella 1 6.75 10438 02-Nov-1999 Pillow 1 8.50 10298 01-Dec-1999 Helmet 1 22.00 10449 15-Dec-1999 Bicycle 1 380.50 10449 22-Dec-1999 Canoe 1 280.00 10101 30-Dec-1999 Hoola Hoop 3 14.75 10330 01-Jan-2000 Flashlight 4 28.00 10101 02-Jan-2000 Lantern 1 16.00 10299 18-Jan-2000 Inflatable Mattress 1 38.00 10438 18-Jan-2000 Tent 1 79.99 10413 19-Jan-2000 Lawnchair 4 32.00 4. Select the distinct items in the items_ordered table. In other 10410 30-Jan-2000 Unicycle 1 192.50 words, display a listing of each of the unique items from the items_ordered table. Customers customerid firstname lastname city state 10101 John Gray Lynden Washington 10298 Leroy Brown Pinetop Arizona 10299 Elroy Keller Snoqualmie Washington 10315 Lisa Jones Oshkosh Wisconsin 10325 Ginger Schultz Pocatello Idaho 10329 Kelly Mendoza Kailua Hawaii 10330 Shawn Dalton Cannon Beach Oregon Aggregate Functions 10338 Michael Howell Tillamook Oregon 10339 Anthony Sanchez Winslow Arizona MIN returns the smallest value in a given column 10408 Elroy Cleaver Globe Arizona MAX returns the largest value in a given column 10410 Mary Ann Howell Charleston South Carolina SUM returns the sum of the numeric values in a given column 10413 Donald Davids Gila Bend Arizona AVG returns the average value of a given column 10419 Linda Sakahara Nogales Arizona COUNT returns the total number of values in a given column 10429 Sarah Graham Greensboro North Carolina COUNT(*) returns the number of rows in a table 10438 Kevin Smith Durango Colorado 10439 Conrad Giles Telluride Colorado Aggregate functions are used to compute against a returned 10449 Isabela Moore Yuma Arizona column of numeric data from your SELECT statement. They basically summarize the results of a particular column of Students Activity selected data. We are covering these here since they are required 1. From the items_ordered table, select a list of all items by the next topic, GROUP BY. Although they are required purchased for customerid 10449. Display the customerid, for the GROUP BY clause, these functions can be used item, and price for this customer. without the GROUP BY clause. For example: SELECT AVG(salary)FROM employee; This statement will return a single result which contains the average value of everything returned in the salary column from the employee table. Another Example SELECT AVG(salary)FROM employee WHERE title = Programmer; This statement will return the average salary for all employees whose title is equal to Programmer 48

54 SELECT column1, SUM(column2) DATABASE Example SELECT Count(*)FROM employees; FROM list-of-tables This particular statement is slightly different from the other GROUP BY column-list; aggregate functions since there isnt a column supplied to the Lets say you would like to retrieve a list of the highest paid count function. This statement will return the number of rows salaries in each dept: in the employees table. MANAGEMENT SELECT max(salary), dept Students Activity FROM employee 1. Select the maximum price of any item ordered in the GROUP BY dept; items_ordered table. Hint: Select the maximum price only. This statement will select the maximum salary for the people in each unique department. Basically, the salary for the person who makes the most in each department will be displayed. Their, salary and their department will be returned. For example, take a look at the items_ordered table. Lets say you want to group everything of quantity 1 together, everything of quantity 2 together, everything of quantity 3 together, etc. If you would like to determine what the largest cost item is for each grouped quantity (all quantity 1s, all quantity 2s, all 2. Select the average price of all of the items ordered that were quantity 3s, etc.), you would enter: purchased in the month of Dec. SELECT quantity, max(price) FROM items_ordered GROUP BY quantity; Enter the statement in above, and take a look at the results to see if it returned what you were expecting. Verify that the maximum price in each Quantity Group is really the maximum price. GROUP BY - Multiple Grouping Columns - What if ? What if you ALSO want to display their lastname for the query 3. What are the total number of rows in the items_ordered below: table? SELECT max(salary), dept FROM employee GROUP BY dept; What youll need to do is: SELECT lastname, max(salary), dept FROM employee GROUP BY dept, lastname; This is a called multiple grouping columns. Students Activity 4. For all of the tents that were ordered in the items_ordered 1. How many people are in each unique state in the customers table, what is the price of the lowest tent? Hint: Your query table? Select the state and display the number of people in should return the price only. each. Hint: count is used to count rows in a column, sum works on numeric data only. Group by Clause The Group By clause will gather all of the rows together that contain data in the specified column(s) and will allow aggregate 2. From the items_ordered table, select the item, maximum functions to be performed on the one or more columns. This price, and minimum price for each specific item in the table. can best be explained by an example: Hint: The items will need to be broken up into separate GROUP BY clause syntax: groups. 49

55 DATABASE 2. From the items_ordered table, select the item, maximum MANAGEMENT price, and minimum price for each specific item in the table. Only display the results if the maximum price for one of the items is greater than 190.00. 3. How many orders did each customer make? Use the items_ordered table. Select the customerid, number of orders they made, and the sum of their orders. Click the Group By answers link below if you have any problems. 3. How many orders did each customer make? Use the items_ordered table. Select the customerid, number of orders they made, and the sum of their orders if they purchased more than 1 item. Having Clause The HAVING clause allows you to specify conditions on the rows for each group - in other words, which rows should be selected will be based on the conditions you specify. The HAVING clause should follow the GROUP BY clause if you are going to use it. HAVING clause syntax: SELECT column1, SUM(column2) Order by Clause FROM list-of-tables ORDER BY is an optional clause which will allow you to GROUP BY column-list display the results of your query in a sorted order (either HAVING condition; ascending order or descending order) based on the columns that you specify to order by. HAVING can best be described by example. Lets say you have an employee table containing the employees name, department, ORDER BY clause syntax: salary, and age. If you would like to select the average salary for SELECT column1, SUM(column2) each employee in each department, you could enter: FROM list-of-tables SELECT dept, avg(salary) ORDER BY FROM employee column-list [ASC | DESC]; GROUP BY dept; [ ] = optional But, lets say that you want to ONLY calculate & display the This statement will select the employee_id, dept, name, age, and average if their salary is over 20000: salary from the employee_info table where the dept equals SELECT dept, avg(salary) Sales and will list the results in Ascending (default) order based FROM employee on their Salary. GROUP BY dept ASC = Ascending Order - default HAVING avg(salary) > 20000; DESC = Descending Order Students Activity For example 1. How many people are in each unique state in the customers SELECT employee_id, dept, name, age, salary table that have more than one person in the state? Select the FROM employee_info state and display the number of how many people are in WHERE dept = Sales each if its greater than 1. ORDER BY salary; If you would like to order based on multiple columns, you must seperate the columns with commas. For example: SELECT employee_id, dept, name, age, salary FROM employee_info WHERE dept = Sales 50

56 ORDER BY salary, age DESC; greater than or equal to 50000.00 AND the title is equal to DATABASE Programmer. Both of these conditions must be true in order Students Activity for the rows to be returned in the query. If either is false, then it 1. Select the lastname, firstname, and city for all customers in will not be displayed. the customers table. Display the results in Ascending Order Although they are not required, you can use paranthesis around based on the lastname. MANAGEMENT your conditional expressions to make it easier to read: SELECT employeeid, firstname, lastname, title, salary FROM employee_info WHERE (salary >= 50000.00) AND (title = Programmer); Another Example SELECT firstname, lastname, title, salary FROM employee_info 2. Same thing as exercise #1, but display the results in WHERE (title = Sales) OR (title = Programmer); Descending order. This statement will select the firstname, lastname, title, and salary from the employee_info table where the title is either equal to Sales OR the title is equal to Programmer. Students Activity 1. Select the customerid, order_date, and item from the items_ordered table for all items unless they are Snow Shoes or if they are Ear Muffs. Display the rows as long as they are not either of these two items. 3. Select the item and price for all of the items in the items_ordered table that the price is greater than 10.00. Display the results in Ascending order based on the price. 2. Select the item and price of all items that start with the letters S, P, or F. Combining conditions and Boolean Operators The AND operator can be used to join two or more conditions in the WHERE clause. Both sides of the AND condition must be true in order for the condition to be met and for those rows to be displayed. SELECT column1, SUM(column2) FROM list-of-tables In and Between Conditional Operators SELECT col1, SUM(col2) WHERE condition1 AND condition2; FROM list-of-tables The OR operator can be used to join two or more conditions in the WHERE clause also. However, either side of the OR WHERE col3 IN (list-of-values); operator can be true and the condition will be met - hence, the SELECT col1, SUM(col2) rows will be displayed. With the OR operator, either side can be FROM list-of-tables true or both sides can be true. WHERE col3 BETWEEN value1 AND value2; For Example The IN conditional operator is really a set membership test SELECT employeeid, firstname, lastname, title, salary operator. That is, it is used to test whether or not a value (stated before the keyword IN) is in the list of values provided after FROM employee_info the keyword IN. WHERE salary >= 50000.00 AND title = Programmer; For Example This statement will select the employeeid, firstname, lastname, SELECT employeeid, lastname, salary title, and salary from the employee_info table where the salary is 51

57 FROM employee_info DATABASE WHERE lastname IN (Hernandez, Jones, Roberts, Ruiz); This statement will select the employeeid, lastname, salary from the employee_info table where the lastname is equal to either: Hernandez, Jones, Roberts, or Ruiz. It will return the rows if it MANAGEMENT is ANY of these values. The IN conditional operator can be rewritten by using com- pound conditions using the equals operator and combining it Mathematical Operators with OR - with exact same output results: Standard ANSI SQL-92 supports the following first four basic arithmetic operators: SELECT employeeid, lastname, salary FROM employee_info + addition WHERE lastname = Hernandez OR lastname = Jones OR - subtraction lastname = Roberts * multiplication OR lastname = Ruiz; / division As you can see, the IN operator is much shorter and easier to read when you are testing for more than two or three values. % modulo You can also use NOT IN to exclude the rows in your list. The modulo operator determines the integer remainder of the The BETWEEN conditional operator is used to test to see division. This operator is not ANSI SQL supported, however, whether or not a value (stated before the keyword BETWEEN) most databases support it. The following are some more useful is between the two values stated after the keyword BE- mathematical functions to be aware of since you might need TWEEN. them. These functions are not standard in the ANSI SQL-92 For example specs, therefore they may or may not be available on the specific SELECT employeeid, age, lastname, salary RDBMS that you are using. However, they were available on several major database systems that I tested. They WILL work FROM employee_info on this tutorial. WHERE age BETWEEN 30 AND 40; ABS(x) returns the absolute value of x This statement will select the employeeid, age, lastname, and returns the sign of input x as -1, 0, or 1 (negative, zero, or salary from the employee_info table where the age is between 30 SIGN(x) positive respectively) and 40 (including 30 and 40). modulo - returns the integer remainder of x divided by y MOD(x,y) (same as x%y) This statement can also be rewritten without the BETWEEN returns the largest integer value that is less than or equal operator: FLOOR(x) to x SELECT employeeid, age, lastname, salary CEILING(x) or returns the smallest integer value that is greater than or FROM employee_info CEIL(x) equal to x POWER(x,y) returns the value of x raised to the power of y WHERE age >= 30 AND age

58 DATABASE Points to Ponder SQL provides rename operator both relations and attributes Aggregate functions are used to compute against a returned column of numeric data from your SELECT statement. MANAGEMENT The GROUP BY clause will gather all of the rows together that contain data in the specified column(s) and will allow aggregate functions to be performed on the one or more columns. The HAVING clause allows you to specify conditions on the rows for each group ORDER BY is an optional clause which will allow you to display the results of your query in a sorted order Review Terms Rename operator Aggregate function Having Order-by Group-by 53




62 DATABASE LESSON 17: SQL-III MANAGEMENT Lesson Objective Now, whenever a purchase is made from a repeating customer, Table joins the 2nd table, Purchases only needs to be updated! Weve just eliminated useless redundant data, that is, weve just normal- Tuple variable ized this database! String operators Notice how each of the tables have a common Set operators cusomer_number column. This column, which contains the Views unique customer number will be used to JOIN the two tables. Update Using the two new tables, lets say you would like to select the customers name, and items theyve purchased. Here is an Delete example of a join statement to accomplish this: Table Joins SELECT customer_info.firstname, customer_info.lastname, All of the queries up until this point have been useful with the purchases.item exception of one major limitation - that is, youve been selecting from only one table at a time with your SELECT FROM customer_info, purchases statement. It is time to introduce you to one of the most WHERE customer_info.customer_number = beneficial features of SQL & relational database systems - the purchases.customer_number; Join. To put it simply, the Join makes relational database This particular Join is known as an Inner Join or systems relational. Equijoin. This is the most common type of Join that you Joins allow you to link data from two or more tables together will see or use. into a single query resultfrom one single SELECT statement. Notice that each of the columns are always preceeded with the A Join can be recognized in a SQL SELECT statement if it table name and a period. This isnt always required, however, it has more than one table after the FROM keyword. is good practice so that you wont confuse which colums go with what tables. It is required if the name column names are For example the same between the two tables. I recommend preceeding all SELECT list-of-columns of your columns with the table names when using joins. FROM table1,table2 Note: The syntax described above will work with most WHERE search-condition(s) Database Systems . However, in the event that this doesnt Joins can be explained easier by demonstrating what would work with yours, please check your specific database documenta- happen if you worked with one table only, and didnt have the tion. ability to use joins. This single table database is also some- Although the above will probably work, here is the ANSI SQL- times referred to as a flat table. Lets say you have a one-table 92 syntax specification for an Inner Join using the preceding database that is used to keep track of all of your customers and statement above that you might want to try: what they purchase from your store: SELECT customer_info.firstname, customer_info.lastname, Everytime a new row is inserted into the table, all columns will purchases.item be be updated, thus resulting in unnecessary redundant data. FROM customer_info INNER JOIN purchases For example, every time Wolfgang Schultz purchases some- ON customer_info.customer_number = thing, the following rows will be inserted into the table: purchases.customer_number; id first last Address city state zip date item price Another Example 10982 Wolfgang Schultz 300 N. 1st Ave Yuma AZ 85002 032299 snowboard 45.00 10982 Wolfgang Schultz 300 N. 1st Ave Yuma AZ 85002 091199 gloves 15.00 SELECT employee_info.employeeid, 10982 Wolfgang Schultz 300 N. 1st Ave Yuma AZ 85002 100999 lantern 35.00 employee_info.lastname, employee_sales.comission 10982 Wolfgang Schultz 300 N. 1st Ave Yuma AZ 85002 022900 tent 85.00 FROM employee_info, employee_sales WHERE employee_info.employeeid = An ideal database would have two tables: employee_sales.employeeid; 1. One for keeping track of your customers This statement will select the employeeid, lastname (from the 2. And the other to keep track of what they purchase: employee_info table), and the comission value (from the Customer_info table: employee_sales table) for all of the rows where the employeeid customer_number firstname lastname address city state zip in the employee_info table matches the employeeid in the employee_sales table. Purchases table: customer_number date item price 57

63 DATABASE Students Activity String Operations 1. Write a query using a join to determine which items were SQL specifies strings by enclosing them in single quotes, for ordered by each of the customers in the customers table. example, Perryridge, as we saw earlier. A single quote character Select the customerid, firstname, lastname, order_date, that is part of a string can be specified by using two single quote item, and price for everything each customer purchased in characters; for example the string Its right can be specified by It s right. MANAGEMENT the items_ordered table. The most commonly used operation on strings is pattern matching using the op-erator like. We describe patterns by using two special characters: Percent (%): The % character matches any sub string. Underscore (-): The - character matches any character. Patterns are case sensitive; that is, uppercase characters do not match lowercase char-acters, or vice versa. To illustrate pattern matching, we consider the following exam-ples: Perry% matches any string beginning with Perry. 2. Repeat exercise #1, however display the results sorted by %idge% matches any string containing idge as a sub state in descending order string, for example, Perryridge, Rock Ridge, Mianus Bridge, and Ridgeway. - - - matches any string of exactly three characters. - - % matches any string of at least three characters. SQL expresses patterns by using the like comparison operator. Consider the query Find the names of all customers whose street address includes the substring Main. This query can be written as select customer-name from customer Tuple Variables where customer-street like %Main% The as clause is particularly useful in defining the notion of tuple variables, as is done in the tuple relational calculus. A tuple Joined Relations variable in SQL must be associated with a particular relation. SQL provides not only the basic Cartesian-product mechanism Tuple variables are defined in the from clause by way of the as for joining tuples of relations found in its earlier versions, but, clause. To illustrate, we rewrite the query For all customers who SQL also provides various other mechanisms - have a loan from the bank, find their names, loan numbers, and Loan number Branch name Amout loan amount as select customer-name, T. loan-number, L-170 Downtown 3000 S.amount L-230 Redwood 4000 from borrower as T, loan as S L-260 Perryridge 1700 where T. loan-number = S. loan-number Customer name Loan number Note that we define a tuple variable in the from clause by Jones L-170 placing it after the name of the relation with which it is Smith L-230 associated, with the keyword as in between (the keyword as is Hayes L-155 optional). When we write expressions of the form relation- The loan and borrower relations. name. attribute-name, the relation name is, in effect, an for joining relations, including condition joins and natural implicitly defined tuple variable. joins, as well as var-ious forms of outer joins. These additional operations are typically used as subquery expressions in the Tuple variables are most useful for comparing two tuples in the from clause. same relation. Recall that, in such cases, we could use the rename operation in the relational algebra. Suppose that we want the Examples query Find the names of all branches that have assets greater We illustrate the various join operations by using the relations than at least one branch located in Brooklyn. We can write the loan and borrower in . We start with a simple example of inner SQL expression joins. select distinct T.branch-name loan inner join borrower on loan. loan-number - borrower. from branch as T, branch as S loan-number where T.assets > S.assets and S.branch-city = Brooklyn The expression computes the theta join of the loan and the borrower relations, with the join condition being loan. Loan- Observe that we could not use the notation branch. asset, since number = borrower. loan-number. The attributes of the result it would not be clear which reference to branch is intended. 58

64 consist of the attributes of the Right-hand-side relation result. Although outer -join expressions are typically used in the DATABASE followed by the attributes of the right-hand-side relation. from clause, they can be used anywhere that a relation can be Note that the attribute loan-number appears twice in the figure- used. the first occur-rence is from loan, and the second is from Each of the variants of the join operations in SQL consists of a borrower. The SQL standard does not require attribute names join type and a join condition. The join condition defines which MANAGEMENT in such results to be unique. An as clause should be used to tuples in the two relations match and what attributes are assign unique names to attributes in query and subquery results. present in the result of the join. The join type defines how We rename the result relation of a join and the attributes of the tuples in each result relation by using an as clause, as illustrated here: Branch name Loan number Amount Customer name loan inner join borrower on loan. loan-number = L-170 Downtown 3000 Jones as Lb(loan-number, branch, amount, L- 230 Redwood 4000 Smith cust, cust-Loan-num) The result of loan natural inner join borrower. We rename the second occurrence of loan-number to cust- loan- num. The ordering of the attributes in the result of the join is Join types important for the renaming. Inner join Next, we consider an example of the left outer join operation: Left outer join loan left outer join borrower on loan. Loan-number = bor- Right outer join rower. loan-number Full outer join Loan number Branch name Amount Customer name Loan number Join Types and Join Conditions L-170 Downtown 3000 Jones L-170 The first join type is the inner join, and the other three are the L-230 Redwood 4000 Smith L-230 outer joins. Of the three join conditions, we have seen the The result of loan inner join borrower on natural join and the on condition before, and we shall discuss loan. loan-number = borrower .loan-number. the using condition, later in this section. The use of a join condition is mandatory for outer joins, but is Loan number Branch name Amount Customer name Loan number optional for inner joins (if it is omitted, a Cartesian product L-170 Downtown 3000 Jones L-170 L- 230 Redwood 4000 Smith L-230 results). Syntactically, the keyword natural appears before the L-260 Perryridge 1700 null null join type, as illustrated earlier, whereas the on and using con- ditions appear at the end of the join expression. The keywords The result of loan left outer join borrower on loan. loan- inner and outer are optional, since the rest of the join type number = borrower .loan-number. enables us to deduce whether the join is an inner join or an We can compute the left outer join operation logically as outer join. follows. First, compute the result of the inner join as before. The meaning of the join condition natural, in terms of which Then, for every tuple t in the left-hand-side relation loan that tuples from the two relations match, is straightforward. The does not match any tuple in the right-hand-side relation ordering of the attributes in the result of a natural join is as borrower in the inner join, add a tuple r to the result of the follows. The join attributes (that is, the attributes common to join: The attributes of tuple r that are derived from the left- both relations) appear first, in the order in which they appear in hand-side relation are filled in with the values from tuple t, and the left-hand-side relation. Next come all nonjoin attributes of the remaining attributes of r are filled with null values. The the left-hand-side relation, and finally all nonjoin attributes of tuples (L-170, Downtown, 3000) and (L-230, Redwood, 4000) the right-hand-side relation. join with tuples from borrower and appear in the result of the inner join, and hence in the result of the left outer join. On the The right outer join is symmetric to the left outer join. Tuples other hand, the tuple (L-260, Perryridge, 1700) did not match from the right-hand--side relation that do not match any tuple any tuple from borrower in the inner join, and hence a tuple (L- in the left-hand-side relation are padded with nulls and are 260, Perryridge, 1700, null, null) is present in the result of the added to the result of the right outer join. left outer join. Here is an example of combining the natural join condition Finally, we consider an example of the natural join operation: with the right outer join type: loan natural inner join borrower loan natural right outer join borrower This expression computes the natural join of the two relations. The attributes of the result are defined by the join type, which is The only attribute name common to loan and borrower is loan- a natural join; hence, loan-number appears only once. The first number. However, the attribute loan-number appears only once two tuples in the result are from the inner natural join of loan in the result of the natural join, whereas it appears twice in the and borrower. The tuple (Hayes, L-155) from the right-hand- result of the join with the on condition. side relation does not match any tuple from the left-hand-side relation loan in the natural inner join. Hence, the tuple (L-155, Join Types and Conditions null, null, Hayes) appears in the join result. We saw examples of the join operations permitted in SQL Join op-erations take two relations and return another relation as the 59

65 The join condition using (Ai, A2, . . . ,An) is similar to the DATABASE natural join condition, ex-cept that the join attributes are the attributes A1, A2, . . . , An, rather than all attributes that are common to both relations. The attributes A1, A2,, An must consist of only attributes that are common to both relations, and they appear only once in the result of the join. MANAGEMENT The full outer join is a combination of the left and right outer- join types. After the operation computes the result of the inner join, it extends with nulls tuples from Loan number Branch name Amount Customer name L-170 Downtown 3000 Jones L-230 Redwood 4000 Smith L-155 Null null Hayes The result of loan natural right outer join borrower. the left-hand-side relation that did not match with any from the right-hand-side, and adds them to the result. Similarly, it extends with nulls tuples from the right-hand--side relation that did not match with any tuples from the left-hand-side relation and adds them to the result. loan full outer join borrower using (loan-number) Loan number Branch name Amount Customer name L-170 Downtown 3000 Jones L- 230 Redwood 4000 Smith L-260 Perryridge 1700 null L-155 null Null Hayes The result of loan full outer join borrower using(loan- number). Points to Ponder Joins allow you to link data from two or more tables together into a single query resultfrom one single SELECT statement. Tuple variables are defined in the from clause by way of the as clause We describe patterns by using two special characters: Percent (%): The % character matches any sub string. Underscore (-): The - character matches any character. Various kinds of joins are as follows:- Join types Inner join Left outer join Right outer join Full outer join Review Terms Table joins Tuple variable String operators Set operators Views Update Delete 60




69 DATABASE LESSON 20: SQL-IV MANAGEMENT Lesson Objectives The Intersect Operation Set operators To find all customers who have both a loan and an account at the bank, we write Union (select distinct customer-name Intersect from depositor) Except intersect (select distinct customer-name View from borrower) Update The intersect operation automatically eliminates duplicates. Delete Thus, in the preceding query, if a customer-say, Jones-has Set Operations several accounts and loans at the bank, then Jones will appear The SQL operations union, intersect, and except operate on only once in the result. relations and correspond to the relational-algebra operations U, If we want to retain all duplicates, we must write intersect all in n, and -. Like union, intersection, and set difference in relational place of intersect: algebra, the relations participating in the operations must be compatible; that is, they must have the same set of attributes. (select customer-name select customer-name from depositor) from depositor intersect all (select customer-name and the set of customers who have a loan at the bank, which from borrower) can be derived by The number of duplicate tuples that appear in the result is select customer-name equal to the minimum number of duplicates in both d and b. Thus, if Jones has three accounts and two loans at the bank, from borrower then there will be two tuples with the name Jones in the result. We shall refer to the relations obtained as the result of the preceding queries as d and b, respectively. The Except Operation To find all customers who have an account but no loan at the The Union Operation bank, we write To find all customers having a loan, an account, or both at the (select distinct customer-name bank, we write from depositor) (select customer-name except (select customer-name from depositor) from borrower) union The except operation automatically eliminates duplicates. Thus, (select customer-name in the preceding query, a tuple with customer name Jones will from borrower) appear (exactly once) in the result only if Jones has an account at The union operation automatically eliminates duplicates, unlike the bank, but has no loan at the bank. the select clause. Thus, in the preceding query, if a customer-say, If we want to retain all duplicates, we must write except all in Jones-has several accounts or loans (or both) at the bank, then place of except: Jones will appear only once in the result. (select customer-name If we want to retain all duplicates, we must write union all in from depositor) place of union: except all (select customer-name from borrower) (select customer-name The number of duplicate copies of a tuple in the result is equal from depositor) to the number of duplicate copies of the tuple in d minus the union all (select Customer-name number of duplicate copies of the tuple in b, provided that the from borrower) difference is positive. Thus, if Jones has three accounts and one The number of duplicate tuples in the result is equal to the loan at the bank, then there will be two tuples with the name total number of duplicates that appear in both d and b. Thus, Jones in the result. If, instead, this customer has two accounts if Jones has three accounts and two loans at the bank, then and three loans at the bank, there will be no tuple with the there will be five tuples with the name Jones in the result. name Jones in the result. 64

70 [and|or column DATABASE Views We define a view in SQL by using the create view command. To OPERATOR value]; define a view, we must give the view a name and must state the [] = optional query that computes the view. The form of the create view command is Examples update phone_book MANAGEMENT create view v as where is any legal query expression. The set area_code = 623 view name is repre-sented by v. where prefix = 979; As an example, consider the view consisting of branch names update phone_book and the names of customers who have either an account or a set last_name = Smith, prefix=555, suffix=9292 loan at that branch. Assume that we want this view to be called where last_name = Jones; all-customer. We define this view as follows: update employee create view all-customer as set age = age+1 (select branch-name, customer-name where first_name=Mary and last_name=Williams; from depositor, account where depositor.account-number = account.account- Students Activity number) 1. Jonie Weber just got married to Bob Williams. She has requested that her last name be updated to Weber-Williams. union (select branch-name, customer-name from borrower, loan where borrower. loan-number = The attribute names of a view can be specified explicitly as follows: create view branch-total-Ioan(branch-name, total-loan) as 2. Dirk Smiths birthday is today, add 1 to his age. select branch-name, sum(amount) from loan group by branch-name The preceding view gives for each branch the sum of the amounts of all the loans at the branch. Since the expression sum (amount) does not have a name, the attribute name is specified explicitly in the view definition. 3. All secretaries are now called Administrative Assistant. View names may appear in any place that a relation name may Update all titles accordingly. appear. Using the view all-customer, we can find all customers of the Perryridge branch by writing select customer-name from all-customer where branch-name = Perryridge Modification of the Database Update The update statement is used to update or change records that 4. Everyone thats making under 30000 are to receive a 3500 a match a specified criteria. This is accomplished by carefully year raise. constructing a where clause. update tablename set columnname = newvalue [,nextcolumn = newvalue2...] where columnname OPERATOR value 65

71 5. Everyone thats making over 33500 are to receive a 4500 a DATABASE Students Activity year raise. 1. Jonie Weber-Williams just quit, remove her record from the table. MANAGEMENT 6. All Programmer II titles are now promoted to Programmer III. 2. Its time for budget cuts. Remove all employees who are making over 70000 dollars. 7. All Programmer titles are now promoted to Programmer II. Points to Ponder The SQL operations union, intersect, and except operate on relations The union operation automatically eliminates duplicates, unlike the select clause The intersect operation automatically eliminates duplicates. To define a view, we must give the view a name and must state the query that computes the view Deleting Records The update statement is used to update or change records The delete statement is used to delete records or rows from the that match a specified criteria table. The delete statement is used to delete records or rows from delete from tablename the table. where columnname Review Terms OPERATOR value Set operators [and|or column Union OPERATOR value]; Intersect [ ] = optional Except Examples View delete from employee; Update Note: if you leave off the where clause, all records will be deleted! Delete delete from employee Students Activity where lastname = May; delete from employee where firstname = Mike or firstname = Eric; To delete an entire record/row from a table, enter delete from followed by the table name, followed by the where clause which contains the conditions to delete. If you leave off the where clause, all records will be deleted. 66



74 DATABASE LESSON 23: INTEGRITY AND SECURITY MANAGEMENT Lesson Objective Null Value Concepts Data constraints While creating tables, if a row lacks a data value for a particular column, that value is said to be null. Columns of any data types Column Level Constraints may contain null values unless the column was defined as not Level Constraints null when the table was created. Table level constraints Principles of Null values NULL value Setting a null value is appropriate when the actual value is Primary Key unknown, or when a value would not be meaningful. Unique key A null value is not equivalent to a value of zero. Default key A null value will evaluate to null in any expression. e.g. null Foreign key multiplied by 10 is null. NOT NULL constraints When a column name is defined as not null, then that Check constraints column becomes a mandatory column. It implies that the user is forced to enter data into that column. Data Constraints Example: Create table client master with a not null constraint Integrity constraints ensure that changes made to the database on columns client no, Name, address, address2. by authorized users do not result in a loss of data consistency. Thus, integrity constraints guard against accidental damage to NOT NULL as a column constraint: the database. CREATE TABLE client master Besides the cell name, cell length and cell data type, there are (client_no varchar2(6) NOT NULL, other parameters i.e. other data constraints that can be passed to name varchar2(20) NOT NULL, the DBA at cell creation time. address 1 varchar2(30) NOT NULL, These data constraints will be connected to a cell by the DBA as address2 varchar2(30) NOT NULL, flags. Whenever a user attempts to load a cell with data, the DBA will check the data being loaded into the cell against the city varchar2(15), state varchar2(15), pin code number( 6), data constraints defined at the time the cell was created. If the remarks varchar2(60), bal_due number (10,2)); data being loaded fails any of the data constraint checks fired by Primary Key Concepts the DBA, the DBA will not load the data into the cell, reject the A primary key is one or more columns in a table used to entered record, and will flash an error message to the user. uniquely identify each row. in the table. Primary key values must These constraints are given a constraint name and the DBA not be null and must be unique across the column. stores the constraints with its name and instructions internally A multicolumn primary key is called a composite primary key. along with the cell itself The only function that a primary key performs is to uniquely The constraint can either be placed at the column level or at the identify a row and thus if one column is used it is just as good table level. as if multiple columns are used. Multiple columns i.e. (com- Column Level Constraints posite keys) are used only when the system designed requires a If the constraints are defined along with the column definition, primary key that cannot be contained in a single column. it is called as a column level constraint. Column level constraint Examples can be applied to anyone column at a time i.e. they are local to a Primary Key as a column constraint: specific column. If the constraint spans across multiple Create client_master where client_no is the primary key. columns, the user will have to use table level constraints. CREATE TABLE client master Table Level Constraints (client_no varchar2(6) PRIMARY KEY, If the data constraint attached to a specific cell in a table references the contents of another cell in the table then the user name varchar2(20), add}-essl varchar2(30), address2 will have to use table level constraints. Table level constraints are varchar2(30), stored as a part of the global table definition. city varcbar2(15), state varchar2(15), pincode number(6), Examples of different constraints that can be applied on the remarks varchar2(60), bal_due number (10,2)); table are as follows: Primary Key as a table constraint: Create a sales order details table where 69

75 CREATE TABLE sales_order DATABASE Column Name: Data Type Size Attributes (s_order_no varchar2(6) PRIMARY KEY, S_order_no varchar2 6 Primary Key s _order_date date, client_no varchar2(6), product_no varchar2 6 Primary Key dely _Addr varchar2(25), salesman _no varchar2(6), qty _ordered Number 8 dely_type char(l) DEFAULT F, MANAGEMENT qty - disp Number 8 billed_yn char( l), dely_date date, product_rate Number 8,2 order_status varchar2(1 0)) - CREATE TABLE sales order details Foreign Key Concepts (s_order_no varchar2(6), product_no varchar2(6), Foreign keys represent relationships between tables. A foreign key is a Column (or a group of columns) whose values -are qty _ordered number(8), qty - disp number(8), derived from the primary key of the same or some other table. product_rate number(8,2), The existence of a foreign key implies that the table with the PRIMARY KEY (s_order_no, product_no)); foreign key is related to the - primary key table from which the Unique Key Concepts foreign-key is derived. A foreign key must have a corresponding A unique key is similar to a primary key, except that the purpose primary key value in the primary key table to have a meaning. of a unique key is to ensure that information in the column for For example, the s_order_no column is the primary key of each record is unique, as with telephone or drivers license table sales_order. In table sales_order__details, s _order_no is a numbers. A table may have many unique keys. foreign key that references the values in table sales Example: Create Table client_master with unique constraint on order . column client_no The Foreign Key References Constraint UNIQUE as a column constraint: rejects an Insert or Update of a value, if a corresponding CREATE TABLE client master value does not currently exist in the primary key table (client_no varchar2( 6) CONSTRAINT cnmn - ukey UNIQUE, rejects a DEL_TE, if it would invalidate a REFERENCES name varchar2(20), address 1 varchar2(30), address2 constrain varchar2(30), must reference a PRIMARY KEY or UNIQUE column(s) city varchar2(15), state varchar2(l5), pincode number(6), in primary key table remarks varch_2(60), bal_due number(lO,2), partpaY311 will reference the PRIMARY KEY of the primary key table char(l)); if no column or group of columns is specified in the constraint UNIQUE as a table constraint: must reference a table, not a view or cluster; CREATE TABLE client master requires that you own the primary key table, have Reference (client_no varchar2(6), name varchar2(20), privilege on it, or have column-level Reference privilege on addressl varchar2(30), address2 varchar2(30), the referenced colwnns in the primary key table; city varchar2(15), state varchar2(15), pincode number(6), doesnt restrict how other constraints may reference the remarks varchar2(60), bal_due number(lO,2), same tables; CONSTRAINT cnmn_ukey UNIQUE (client_no)); requires that the FOREIGN KEY column(s) and the CONSTRAINT column(s) have matching data types; Default Value Concepts At the time of cell creation a default value can be assigned to it. may reference the same table named in the CREATE When the user is loading a record with values and leaves this TABLE statement; cell empty, the DBA will automatically load this cell with the must not reference the same column more than once (in a default value specified- The data type of the default value single constraint). should match the data type of the column. You can use the Example: Create table sales_order _details with primary key as default clause to specify any default value you want. s_order_no and product_no and foreign key as s_order_no Create sales_order table where: referencing column s_order_no in the sales order table. Column Name Data Type Size Attributes Foreign Key as a Column Constraint S_order no varchar2 6 Primary Key S_order date date CREATE TABLE sales order details Client_no varchar2 6 (s_order_no varchar2(6) REFERENCES sales_order, Dely_Addr varchar2 25 Salesman_no varchar2 6 product_no varchar2(6), Dely_type char I Delivery: part (P) / full (F) qty _ordered number(8), qty - disp number(8), product_rate Billed_yn char I Default 'F' number(8,2), Dely_date date Primary Key (s_order_no, product_no)); Order_status varchar2 10 70

76 DATABASE FOREIGN KEY as a Table Constraint Check with not Null Integrity Constraints Create Table sales order details According to the ANSI I ISO standard, a not null integrity ( s _order_no varchar2( 6), constraint is an example of a CHECK integrity constraint, where the condition is, CHECK (column_name IS NOT product_no varchar2(6), NULL). Therefore, the not null integrity constraint for a single qty_ordered number(8), qty_disp number(8), column can, in practice be written in two forms, by using the MANAGEMENT product_rate number(8,2), not null constraint or by using a CHECK constrllint. For ease Primary Key (s_order_no, product_no), of use, you should always choose to define the not null integrity constraint instead of a CHECK constraint with the is Foreign Key (s_order_no) References sales_order); not null condition. Check Integrity Constraints Here we shall look at the method via which data constraints can Use the Check constraint when you need to enforce integrity be attached to the cell so that data validation can be done at rules that can be evaluated based on a logical expression. Never table level itself using the power of the DBA. . use Check constraints if the constraint can be defined using the A constraint clause restricts the range of valid values for one not null, primary key or foreign key constraint. column (a column constraint) or for a group of columns (a Following are a few examples of appropriate CHECK con- table constraint). Any INSERT, UPDATE or DELETE straints: statement evaluates a relevant constraint; the constraint must be a Check constraint on the client no column of the client satisfied for the statement to succeed. master so that no client no value starts with C. Constraints can be connected to a table by CREATE TABLE or a Check constant on name column of the client master so ALTER TABLE command. Use ALTER T ABLE to add. or that the name is entered in upper case. drop the constraint from a table. Constraints are recorded in the a Check constraint on the city column of the client_master data dictionary. If you dont name a constraint, it is assigned the so that only the cities Bombay, New Delhl , Mapras name SYS - Cn, where n is an integer that makes the name and Calcutta are allowed. unique in the database. Create T Able client master Restrictions on Check Constraints (client_no varchar2(6) Constraint ck_clientno A Check integrity constraint requires that a condition be true or unknown for every row of the table. If a statement causes the Check ( client_no like C%), condition to evaluate to false; the statement is rolled back. The name varchar2(20) Constraint ck_cname condition of a CHECK constraint has the following limita- Check (name = upper(name, tions: address I varchar2(30), address2 varchar2(30), The condition must be a Boolean expression that can be city varchar2( 15) Constraint ck _city evaluated using the values in the row being inserted or updated. Check (city In Cnewdelhi, Bombay, Calcutta, Madras)), The condition cannot contain sub queries or sequences. state varchar2(l5), pin code number(6), The condition cannot include the Sysdate, Uid, User or remarks varchar2(60), bal- due number(10,2)); Userenv SQL function When using CHECK constraints, consider the ANSI I ISO standard which states that a CHECK constraint is violated only Defining Different Constraints on the Table if the condition evaluates to False, True and unknown values Create a sales_order _details table where do not violate a check condition. Therefore, make sure that a Column Name: Data Type Size Attributes CHECK constraint that you define actually enforces the rule you Primary Key, Foreign Key need to enforce. S_order_ no varchar2 6 references s order no of sales order table. For example, consider the following CHECK constraint for Primary Key, Foreign Key emp table: product_no varchar2 6 references product_no of CHECK ( sal > 0 or comm >= 0 ) product_master table. qty_ordered number 8 not null At first glance, this rule may be interpreted as do not allow a Qty_disp number 8. row in emp table unless the employees salary is greater than 0 product_rate number 8,2 not null or the employees commission is greater than or equal to 0. However, note that if a row is inserted with a null salary and a negative commission, the row does not violate the CHECK Create Table sales_order details constraint because the entire check condition is evaluated as (s_order_no varchar2(6) Contraint order_fkey unknown. In this particular case, you can account for such References sales_order, violations by placing not null integrity constraint on both the product_no varchar2(6) Constraint product_fkey sal and comm columns. References product_master, qty _ordered number(8) Not Null, 71

77 qty - disp number(8), When the user is loading a record with values and leaves DATABASE product_rate number(8,2) Not Null, this cell empty, the DBA will automatically load this cell with the default value specified. Primary Key (s_order_no, product_no)); Foreign keys represent relationships between tables. A Defining Integrity Constraints in the Alter Table foreign key is a Column (or a group of columns) whose MANAGEMENT Command values -are derived from the primary key of the same or You can also define integrity constraints using the constraint some other table. clause in the Alter Table command. The following examples Use the CHECK constraint when you need to enforce show the definitions of several integrity constraints: integrity rules that can be evaluated based on a logical 1. Add Primary Key constant on column supplier_no in table expression supplier_master; PL/SQL is Oracles procedural language extension to SQL. Alter Table supplier_master PL/SQL enables you to mix SQL statements with Add Primary Key (suppplier_no); procedural constructs 2. Add Foreign Key constraint on column s order no in table Procedures, functions, and packages are all examples of PL/ sales order details referencing table sales_order, modify SQL program units. column qty _ordered to include Not Null constant; Database triggers are procedures that are stored in the Alter Table sales order details database and are implicitly executed (fired) when the Add Constraint order_tkey contents of a table are changed Foreign Key (s_order_no) References sales_order Review Terms Modify (qty_ordered number(8) Not Null); Integrity constraints Dropping Integrity Constraints in the Alter Table Cmmand Table level constraints You can drop an integrity constraint if the rule that it enforces is Column level constraint no longer true or if the constraint is no longer needed. Drop Check constraints the constraint using the Alter Table command with .the DROP NULL constraints clause. The following examples illustrate the dropping of integrity constraints: Primary key 1. Drop the Primary Key constraint from supplier_master; Foreign key Alter Table supplier_master Unique key Drop Primary Key; Students Activity 2. Drop Foreign Key constraint on column product_no in 1. What is constraint?How many kinds of constraints are table sales_order_details; there? Alter Table sales order details Drop Constraint product- fkey; Note : Dropping UniquE and Primary Key constraints drops the associated indexes. Points to Ponder Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data 2. Define primary key,unique key,foreign key? consistency. If the constraints are defined along with the column definition, it is called as a column level constraint. If the data constraint attached to a specific cell in a table references the contents of another cell in the table then the user will have to use table level constraints While creating tables, if a row lacks a data value for a particular column, that value is said to be null. 3. Define check integrity constraints? A primary key is one or more columns in a table used to uniquely identify each row. A unique key is similar to a primary key, except that the purpose of a unique key is to ensure that information in the column for each record is unique. 72

78 4. Differentiate between table level & column level constraint? DATABASE MANAGEMENT 5. Define Not Null constraint with the help of an example? 73




82 DATABASE LESSON 26: PL/SQL MANAGEMENT Lesson objectives Packages Difference between SQL & PL/SQL A package is a group of related procedures and functions, together with the cursors and variables they use, stored together Stored procedures in the database for continued use as a unit. Similar to Packages standalone procedures and functions, packaged procedures and Stored procedures functions can be called explicitly by applications or users. Functions Procedures and Functions PL/SQL is Oracles procedural language extension to SQL. PL/ A procedure or function is a schema object that consists of a set SQL enables you to mix SQL statements with procedural of SQL statements and other PL/SQL constructs, grouped constructs. With PL/SQL, you can define and execute PL/SQL together, stored in the database, and executed as a unit to solve program units such as procedures, functions, and packages. PL/ a specific problem or perform a set of related tasks. Procedures SQL program units generally are categorized as anonymous and functions permit the caller to provide parameters that can blocks and stored procedures. be input only, output only, or input and output values. An anonymous block is a PL/SQL block that appears within Procedures and functions allow you to combine the ease and your application and it is not named or stored in the database. flexibility of SQL with the procedural functionality of a In many applications, PL/SQL blocks can appear wherever SQL structured programming language. statements can appear. For example, the following statement creates the A stored procedure is a PL/SQL block that Oracle stores in the Credit_Account procedure, which credits money to a bank database and can be called by name from an application. When account: you create a stored procedure, Oracle parses the procedure and Create Procedure credit_account stores its parsed representation in the database. Oracle also (acct Number, credit Number) AS allows you to create and store functions (which are similar to /* This procedure accepts two arguments: an account procedures) and packages (which are groups of procedures and functions). number and an amount of money to credit to the specified account. If the specified account does not exist, a An Introduction to Stored Procedures and Packages Oracle allows you to access and manipulate database informa- new account is created. */ tion using procedural schema objects called PL/SQL program old_balance Number; units. Procedures, functions, and packages are all examples of new_balance Number; PL/SQL program units. Begin PL/SQL is Oracles procedural language extension to SQL. It Select balance Into old_balance From accounts extends SQL with flow control and other statements that make it possible to write complex programs in it. The PL/SQL engine Where acct_id = acct is the tool you use to define, compile, and execute PL/SQL For Update Of balance; program units. This engine is a special component of many new_balance := old_balance + credit; Oracle products, including Oracle Server. Update accounts Set balance = new_balance While many Oracle products have PL/SQL components, this Where acct_id = acct; chapter specifically covers the procedures and packages that can be stored in an Oracle database and processed using the Oracle Commit; Server PL/SQL engine. The PL/SQL capabilities of each Oracle Exception tool are described in the appropriate tools documentation. When No_Data_Found Then Stored Procedures and Functions Insert Into accounts (acct_id, balance) Procedures and functions are schema objects that logically group Values(acct, credit); a set of SQL and other PL/SQL programming language statements together to perform a specific task. Procedures and When Others Then functions are created in a users schema and stored in a database Rollback; for continued use. You can execute a procedure or function End credit_account; interactively using an Oracle tool, such as SQL*Plus, or call it Notice that the Credit_Account procedure includes both SQL explicitly in the code of a database application, such as an Oracle and PL/SQL statements. Forms or Precompiler application, or in the code of another procedure or trigger. 77

83 management change, only the procedures need to be modified, DATABASE Procedure Guidelines Use the following guidelines to design and use all stored not all of the applications that use the procedures. procedures: Integrity Define procedures to complete a single, focused task. Do Stored procedures improve the integrity and consistency of your not define long procedures with several distinct subtasks, applications. By developing all of your applications around a MANAGEMENT because subtasks common to many procedures might be common group of procedures, you can reduce the likelihood of duplicated unnecessarily in the code of several procedures. committing coding errors. Do not define procedures that duplicate the functionality For example, you can test a procedure or function to guarantee already provided by other features of Oracle. For example, that it returns an accurate result and, once it is verified, reuse it do not define procedures to enforce simple data integrity in any number of applications without testing it again. If the rules that you could easily enforce using declarative integrity data structures referenced by the procedure are altered in any way, constraints. only the procedure needs to be recompiled; applications that call Benefits of Procedures the procedure do not necessarily require any modifications. Procedures provide advantages in the following areas. Anonymous PL/SQL Blocks vs. Stored Procedures Security A stored procedure is created and stored in the database as a Stored procedures can help enforce data security. You can restrict schema object. Once created and compiled, it is a named object the database operations that users can perform by allowing that can be executed without recompiling. Additionally, them to access data only through procedures and functions. dependency information is stored in the data dictionary to guarantee the validity of each stored procedure. For example, you can grant users access to a procedure that updates a table, but not grant them access to the table itself. As an alternative to a stored procedure, you can create an When a user invokes the procedure, the procedure executes with anonymous PL/SQL block by sending an unnamed PL/SQL the privileges of the procedures owner. Users who have only block to Oracle Server from an Oracle tool or an application. the privilege to execute the procedure (but not the privileges to Oracle compiles the PL/SQL block and places the compiled query, update, or delete from the underlying tables) can invoke version in the shared pool of the SGA, but does not store the the procedure, but they cannot manipulate table data in any source code or compiled version in the database for reuse other way. beyond the current instance. Shared SQL allows anonymous PL/SQL blocks in the shared pool to be reused and shared Performance until they are flushed out of the shared pool. Stored procedures can improve database performance in several In either case, moving PL/SQL blocks out of a database ways: application and into database procedures stored either in the The amount of information that must be sent over a database or in memory, you avoid unnecessary procedure network is small compared with issuing individual SQL recompilations by Oracle at runtime, improving the overall statements or sending the text of an entire PL/SQL block performance of the application and Oracle. to Oracle, because the information is sent only once and thereafter invoked when it is used. External Procedures A PL/SQL procedure executing on an Oracle8 Server can call an A procedures compiled form is readily available in the external procedure or function that is written in the C program- database, so no compilation is required at execution time. ming language and stored in a shared library. The C routine If the procedure is already present in the shared pool of the executes in a separate address space from that of the Oracle SGA, retrieval from disk is not required, and execution can Server. begin immediately. Packages Memory Allocation Packages encapsulate related procedures, functions, and Because stored procedures take advantage of the shared associated cursors and variables together as a unit in the memory capabilities of Oracle, only a single copy of the database. procedure needs to be loaded into memory for execution by You create a package in two parts: the specification and the body. multiple users. Sharing the same code among many users A packages specification declares all public constructs of the results in a substantial reduction in Oracle memory require- package and the body defines all constructs (public and private) ments for applications. of the package. This separation of the two parts provides the Productivity following advantages: Stored procedures increase development productivity. By The developer has more flexibility in the development cycle. designing applications around a common set of procedures, You can create specifications and reference public procedures you can avoid redundant coding and increase your productivity. without actually creating the package body. For example, procedures can be written to insert, update, or You can alter procedure bodies contained within the package delete rows from the EMP table. These procedures can then be body separately from their publicly declared specifications in called by any application without rewriting the SQL statements the package specification. As long as the procedure necessary to accomplish these tasks. If the methods of data specification does not change, objects that reference the 78

84 altered procedures of the package are never marked invalid; new_balance NUMBER; DATABASE that is, they are never marked as needing recompilation insufficient_funds EXCEPTION; The following example creates the specification and body for a BEGIN package that contains several procedures and functions that SELECT balance INTO old_balance FROM accounts process banking transactions. WHERE acct_id = acct FOR UPDATE OF balance; MANAGEMENT Create Package bank_transactions (null) AS new_balance := old_balance - debit; minimum_balance Constant Number := 100.00; IF new_balance >= minimum_balance THEN Procedure apply_transactions; UPDATE accounts SET balance = new_balance Procedure enter_transaction (acct Number, kind Char, WHERE acct_id = acct; amount Number); do_journal_entry(acct, D); END bank_transactions; ELSE Create Package Body bank_transactions AS RAISE insufficient_funds; /* Package to input bank transactions */ END IF; new_status CHAR(20); /* Global variable to record status of EXCEPTION transaction being applied. Used for update in WHEN NO_DATA_FOUND THEN Apply_Transactions. */Procedure do_journal_entry (acct new_status := Nonexistent account; Number, kind CHAR) IS WHEN insufficient_funds THEN /* Records a journal entry for each bank transaction applied new_status := Insufficient funds; by the Apply_Transactions procedure. */ WHEN OTHERS THEN /* Returns other errors to applica- Begin tion */ Insert Into journal new_status := Error: || SQLERRM(SQLCODE); Values (acct, kind, sysdate); END debit_account; IF kind = D THEN PROCEDURE apply_transactions IS new_status := Debit applied; /* Applies pending transactions in the table TRANSACTIONS ELSIF kind = C THEN to the new_status := Credit applied; ACCOUNTS table. Used at regular intervals to update bank ELSE accounts without interfering with input of new transactions. */ new_status := New account; /* Cursor fetches and locks all rows from the TRANSAC- END IF; TIONS END do_journal_entry; table with a status of Pending. Locks released after all pending Procedure credit_account (acct Number, credit Number) IS transactions have been applied. */ /* Credits a bank account the specified amount. If the account CURSOR trans_cursor IS does not exist, the procedure creates a new account first. */ SELECT acct_id, kind, amount FROM transactions old_balance NUMBER; WHERE status = Pending new_balance NUMBER; ORDER BY time_tag BEGIN FOR UPDATE OF status; SELECT balance INTO old_balance FROM accounts BEGIN WHERE acct_id = acct FOR trans IN trans_cursor LOOP /* implicit open and fetch FOR UPDATE OF balance; /* Locks account for credit update */ */ IF trans.kind = D THEN new_balance := old_balance + credit; debit_account(trans.acct_id, trans.amount); UPDATE accounts SET balance = new_balance ELSIF trans.kind = C THEN WHERE acct_id = acct; credit_account(trans.acct_id, trans.amount); do_journal_entry(acct, C); ELSE EXCEPTION new_status := Rejected; WHEN NO_DATA_FOUND THEN /* Create new account END IF; if not found */ /* Update TRANSACTIONS table to return result of applying INSERT INTO accounts (acct_id, balance) this transaction. */ VALUES(acct, credit); UPDATE transactions SET status = new_status do_journal_entry(acct, N); WHERE CURRENT OF trans_cursor; When Others Then /* Return other errors to application */ END LOOP; new_status := Error: || SQLERRM(SQLCODE); COMMIT; /* Release row locks in TRANSACTIONS table. */ END credit_account; END apply_transactions; PROCEDURE debit_account (acct NUMBER, debit NUM- PROCEDURE enter_transaction (acct NUMBER, kind BER) IS CHAR, amount NUMBER) IS /* Debits an existing account if result is greater than the /* Enters a bank transaction into the TRANSACTIONS table. allowed minimum balance. */ A new transaction is always put into this queue before being old_balance NUMBER; 79

85 applied to the specified account by the DATABASE How Oracle Stores Procedures and Packages APPLY_TRANSACTIONS When you create a procedure or package, Oracle procedure. Therefore, many transactions can be simultaneously compiles the procedure or package input without interference. */ stores the compiled code in memory BEGIN INSERT INTO transactions stores the procedure or package in the database MANAGEMENT VALUES (acct, kind, amount, Pending, sysdate); Points to Ponder COMMIT; PL/SQL enables you to mix SQL statements with END enter_transaction; procedural constructs. END bank_transactions; Procedures and functions are schema objects that logically Packages allow the database administrator or application group a set of SQL and other PL/SQL programming developer to organize similar routines. They also offer increased language statements together to perform a specific task. functionality and database performance. A package is a group of related procedures and functions Benefits of Packages Packages are used to define related procedures, variables, and Review Terms cursors and are often implemented to provide advantages in the Difference between SQL & PL/SQL following areas: Stored procedures encapsulation of related procedures and variables Packages declaration of public and private procedures, variables, Stored procedures constants, and cursors Functions better performance Students Activity Encapsulation 1. Differentiate between SQL & PL-SQL? Stored packages allow you to encapsulate (group) related stored procedures, variables, datatypes, and so forth in a single named, stored unit in the database. This provides for better organiza- tion during the development process. Encapsulation of procedural constructs in a package also makes privilege management easier. Granting the privilege to use a package makes all constructs of the package accessible to the grantee. Public and Private Data and Procedures 2. Define stored procedures & functions? The methods of package definition allow you to specify which variables, cursors, and procedures are public Directly accessible to the user of a package. private Hidden from the user of a package. For example, a package might contain ten procedures. You can define the package so that only three procedures are public and therefore available for execution by a user of the package; the remainder of the procedures are private and can only be accessed 3. Define packages? by the procedures within the package. Performance Improvement An entire package is loaded into memory when a procedure within the package is called for the first time. This load is completed in one operation, as opposed to the separate loads required for standalone procedures. Therefore, when calls to related packaged procedures occur, no disk I/O is necessary to execute the compiled code already in memory. A package body can be replaced and recompiled without affecting the specification. As a result, objects that reference a packages constructs (always via the specification) need not be recompiled unless the package specification is also replaced. By using packages, unnecessary recompilations can be minimized, resulting in less impact on overall database performance. 80




89 DATABASE LESSON 29: DATABASE TRIGGERS MANAGEMENT Lesson Objectives Triggering Event or Statement Database triggers A triggering event or statement is the SQL statement that causes a trigger to be fired. A triggering event can be an Insert, Parts of triggers Update, or Delete statement on a table. Types of triggers . . . Update of parts_on_hand on inventory . . . Executing trigger which means: when the Parts_On_Hand column of a row in Database triggers are procedures that are stored in the database the INventory table is updated, fire the trigger. Note that when and are implicitly executed (fired) when the contents of a table the triggering event is an Update statement, you can include a are changed. column list to identify which columns must be updated to fire Introduction the trigger. You cannot specify a column list for Insert and Oracle allows the user to define procedures that are implicitly Delete statements, because they affect entire rows of informa- executed (i.e. executed by, Oracle itself), when an insert, update tion. or delete is issued against a table from SQL * Plus or through A triggering event can specify multiple DML statements, as in an application. These procedures are called database triggers. . . . Insert or Update or Delete of inventory . . . The major point that make these triggers stand alone is that they are fired implicitly (i.e. internally) by Oracle itself and not which means: when an Insert, Update, or Delete statement is explicitly called by the user, as done in normal procedures. A issued against the Inventory table, fire the trigger. When trigger can include SQL and PL/SQL statements to execute as a multiple types of DML statements can fire a trigger, you can use unit and can invoke stored procedures. However, procedures conditional predicates to detect the type of triggering statement. and triggers differ in the way that they are invoked. A procedure In this way, you can create a single trigger that executes different is explicitly executed by a user, application, or trigger. Triggers code based on the type of statement that fires the trigger. (one or more) are implicitly fired (executed) by Oracle when a Trigger Restriction triggering INSERT, UPDATE, or DELETE statement is A trigger restriction specifies a Boolean (logical) expression that issued, no matter which user is connected or which application must be True for the trigger to fire. The trigger action is not is being used. executed if the trigger restriction evaluates to FALSE or Use Of Database Triggers Unknown. In the example, the trigger restriction is Triggers can supplement the standard capabilities of Oracle to new.parts_on_hand < new.reorder_point provide a highly customized database management system. For Trigger Action example, a trigger can restrict DML operations against a table to A trigger action is the procedure (PL/SQL block) that contains those issued during regular business hours. A trigger could also the SQL statements and PL/SQL code to be executed when a restrict DML operations to occur only at certain times during triggering statement is issued and the trigger restriction weekdays. Other uses for triggers are to evaluates to TRUE. automatically generate derived column values Like stored procedures, a trigger action can contain SQL and prevent invalid transactions PL/SQL statements, define PL/SQL language constructs (variables, constants, cursors, exceptions, and so on), and call enforce complex security authorizations stored procedures. Additionally, for row triggers (described in enforce referential integrity across nodes in a distributed the next section), the statements in a trigger action have access database to column values (new and old) of the current row being enforce complex business rules processed by the trigger. Two correlation names provide access provide transparent event logging to the old and new values for each column. provide sophisticated auditing Types of Triggers maintain synchronous table replicates This section describes the different types of triggers: row and statement triggers; and Before, After, and Instead-of triggers. gather statistics on table access Row vs. Statement Triggers Parts of a Trigger When you define a trigger, you can specify the number of times A trigger has three basic parts: the trigger action is to be executed: once for every row affected a triggering event or statement by the triggering statement (such as might be fired by an a trigger restriction UPDATE statement that updates many rows), or once for the a trigger action triggering statement, no matter how many rows it affects. 84

90 Before row trigger DATABASE Row Triggers A row trigger is fired each time the table is affected by the Before modifying each row affected by the triggering triggering statement. For example, if an UPDATE statement statement and before checking appropriate integrity updates multiple rows of a table, a row trigger is fired once for constraints, the trigger action is executed provided that the each row affected by the UPDATE statement. If a triggering trigger restriction was not violated. statement affects no rows, a row trigger is not executed at all. MANAGEMENT After statement trigger Row triggers are useful if the code in the trigger action depends After executing the triggering statement and applying any on data provided by the triggering statement or rows that are deferred integrity constraints, the trigger action is executed. affected. After row trigger Statement Triggers After modifying each row affected by the triggering A statement trigger is fired once on behalf of the triggering statement and possibly applying appropriate integrity statement, regardless of the number of rows in the table that constraints, the trigger action is executed for the current row the triggering statement affects (even if no rows are affected). provided the trigger restriction was not violated. Unlike For example, if a DELETE statement deletes several rows Before row triggers, After row triggers lock rows. from a table, a statement-level DELETE trigger is fired only You can have multiple triggers of the same type for the same once, regardless of how many rows are deleted from the table. statement for any given table. For example you may have two Statement triggers are useful if the code in the trigger action Before statement triggers for Update statements on the EMP does not depend on the data provided by the triggering table. Multiple triggers of the same type permit modular statement or the rows affected. For example, if a trigger makes a installation of applications that have triggers on the same complex security check on the current time or user, or if a trigger tables. Also, Oracle snapshot logs use After row triggers, so you generates a single audit record based on the type of triggering can design your own After row trigger in addition to the Oracle- statement, a statement trigger is used. defined After row trigger. Before vs. After Triggers You can create as many triggers of the preceding different types When defining a trigger, you can specify the trigger timing- as you need for each type of DML statement (Insert, Update, or whether the trigger action is to be executed before or after the Delete). For example, suppose you have a table, SAL, and you triggering statement. Before and After apply to both statement want to know when the table is being accessed and the types of and row triggers queries being issued. The example below contains a sample package and trigger that tracks this information by hour and Before Triggers type of action (for example, Update, Delete, or Insert) on table Before triggers execute the trigger action before the triggering SAL. A global session variable, Stat.Rowcnt, is initialized to statement is executed. This type of trigger is commonly used in zero by a Before statement trigger. Then it is increased each time the following situations: the row trigger is executed. Finally the statistical information is When the trigger action should determine whether the saved in the table Stat_Tab by the After statement trigger. triggering statement should be allowed to complete. Using Sample Package and Trigger for SAL Table a BEFORE trigger for this purpose, you can eliminate unnecessary processing of the triggering statement and its Drop Table stat_tab; eventual rollback in cases where an exception is raised in the Create Table stat_tab(utype CHAR(8), trigger action. rowcnt Integer, uhour Integer); Create or Replace Package stat is To derive specific column values before completing a rowcnt Integer; triggering INSERT or UPDATE statement. END; After Triggers / AFTER triggers execute the trigger action after the triggering Create Trigger bt Before Update or Delete or Insert on sal statement is executed. AFTER triggers are used in the following Begin situations: stat.rowcnt := 0; When you want the triggering statement to complete before END; executing the trigger action. / Create Trigger rt Before Update or Delete Oor Insert on sal If a BEFORE trigger is already present, an AFTER trigger for Each Row Begin can perform different actions on the same triggering stat.rowcnt := stat.rowcnt + 1; statement. END; Combinations / Using the options listed above, you can create four types of Create Trigger at After Update or Delete or Insert on sal triggers: Declare BEFORE statement trigger typ CHAR(8); Before executing the triggering statement, the trigger action hour Number; is executed. Begin IF updating 85

91 THEN typ := update; END IF; Then DATABASE IF deleting THEN typ := delete; END IF; Insert Into dept IF inserting THEN typ := insert; END IF; Values(:n.deptno, :n.dept_type); hour := TRUNC((SYSDATE - TRUNC(SYSDATE)) * 24); Else UPDATE stat_tab Update dept SET dept.dept_type = :n.dept_type SET rowcnt = rowcnt + stat.rowcnt Where dept.deptno = :n.deptno; MANAGEMENT WHERE utype = typ End If; AND uhour = hour; If Not Exists Select * From project IF SQL%ROWCOUNT = 0 THEN Where project.projno = :n.projno INSERT INTO stat_tab VALUES (typ, stat.rowcnt, hour); Then END IF; Insert Into project EXCEPTION Values(:n.projno, :n.project_level); WHEN dup_val_on_index THEN ELSE UPDATE stat_tab Update project SET project.level = :n.level SET rowcnt = rowcnt + stat.rowcnt Where project.projno = :n.projno; WHERE utype = typ End If; AND uhour = hour; END; END; The actions shown for rows being inserted into the / Manager_Info view first test to see if appropriate rows already Instead of Triggers exist in the base tables from which Manager_Info is derived. Instead of triggers provide a transparent way of modifying The actions then insert new rows or update existing rows, as views that cannot be modified directly through SQL DML appropriate. Similar triggers can specify appropriate actions for statements (Insert, Update, and Delete). These triggers are called UPDATE and Delete. Instead of triggers because, unlike other types of triggers, Trigger Execution Oracle fires the trigger instead of executing the triggering A trigger can be in either of two distinct modes: statement. The trigger performs update, insert, or delete operations directly on the underlying tables. enabled An enabled trigger executes its trigger action if a triggering Users write normal INSERT, DELETE, and UPDATE statement is issued and the trigger restriction (if any) evaluates to TRUE. statements against the view and the INSTEAD OF trigger disabled A disabled trigger does not execute its trigger action, even if a works invisibly in the background to make the right actions take triggering statement is issued and the trigger restriction (if any) place. By default, INSTEAD OF triggers are activated for each would evaluate to TRUE. row. Example of an Instead of Trigger For enabled triggers, Oracle automatically The following example shows an Instead of trigger for executes triggers of each type in a planned firing sequence inserting rows into the Manager_Info view. when more than one trigger is fired by a single SQL Create View manager_info AS statement SELECT, e.empno, d.dept_type, d.deptno, p.level, performs integrity constraint checking at a set point in time p.projno with respect to the different types of triggers and guarantees FROM emp e, dept d, project p that triggers cannot compromise integrity constraints WHERE e.empno = d.mgr_no provides read-consistent views for queries and constraints AND d.deptno = p.resp_dept; manages the dependencies among triggers and objects Create Trigger manager_info_insert referenced in the code of the trigger action Instead of Insert on manager_info Eferencing New As n new manager information uses two-phase commit if a trigger updates remote tables For Each Row in a distributed database Begin fires multiple triggers in an unspecified order, if more than one If Not Exists Select * From emp trigger of the same type exists for a given statement WHERE emp.empno = :n.empno Points to Ponder THEN Allowing the user to define procedures that are implicitly INSERT INTO emp executed is known as triggers VALUES(:n.empno,; ELSE Triggers can supplement the standard capabilities of Oracle UPDATE emp SET = to provide a highly customized database management WHERE emp.empno = :n.empno; system END IF; A trigger restriction specifies a Boolean (logical) expression If Not Exists Select * From dept that must be TRUE for the trigger to fire. Where dept.deptno = :n.deptno 86

92 A trigger has three basic parts: DATABASE 1. a triggering event or statement 2. a trigger restriction 3. a trigger action 6. Define various restriction of trigger? MANAGEMENT Review Terms Database triggers Parts of triggers Types of triggers Executing trigger Students Activity 1. Explain the use of database triggers?Explain its need? 2. What are the various parts of database trigger? 3. Differentiate between before & after trigger? 4. Differentiate between row & after trigger? 5. Define INSTEAD OF Trigger with the help of an example? 87




96 DATABASE LESSON 32: DATABASE CURSORS MANAGEMENT Lesson Objectives records from a record, set created using a select statement are to Defining cursor be evaluated & processed one at a time, then the only method available is by using Explicit Cursors. Use of cursor Also cursors can be used to evaluate the success of updates and Explicit cursor deletes and the number of this affected (Implicit ). Implicit cursor Explicit Cursor Parametrized cursor You can explicitly declare a cursor to process the rows individu- When ever an SQL statement is executed, Oracle DBA performs ally. A cursor declared by the user is called Explicit Cursor. For the following tasks:- queries that return more than one row, you must declare a Reserves an area in memory called private SQL Area. cursor explicitly. Populates this area with the appropriate data Why Use an Explicit Cursor? Processes the data in memory area. Cursors can be used when the users wants to process data one Frees the memory area when the execution is complete. row at a time. What is Cursor ? Example Oracle DBA uses a work area for its internal processing. This Update an Acctmast table und set a value in its Balance amount work area is private to SQLs operation and is called a cursor. column depending upon whether The data that is stored is called Active Data Set. The size of the the account has an amount debited or credited in the Accttran cursor in memory is the size required to hold the number of table. The records from the Accttran table will be fetched one at rows in the Active Data set. a time and. updated in the Acctmast table depending upon whether the account is debited or credited. Example PL/SQL raises an error if an embedded select statement When a user first select statement as retrieves more than one row. Such an error forces an abnormal Select empno,job,salary from Employeewhere dept_no=20 termination of the PL/SQL block. Such an error can be The resultant data set is as follows:- eliminated by using a cursor. Active Data Set Explicit Cursor Management 3456 IVAN MANAGER 10000 The steps involved in declaring a cursor and manipulating 3459 PRADEEP - ANALYST 7000 data in the active set are : Declare a cursor that specifies the 3446 MITA PROGRMR 4000 SQL select statement that you want to process 3463 VIJAY CLERK 2000 Open a cursor. 3450 ALDRIN ACCTANT 3000 Fetch data from the cursor one row at a time. Contents -of a cursor. Close the cursor. When a query returns multiple rows, in addition to the data A cursor is defined in the declarative part of a PL/SQL block by held in, the cursor, Oracle will also open and maintain a row naming it and specifying a query. Then, three commands are pointer. Depending on user requests to view data the row used to control the cursor: open, fetch and close. pointer will be relocated within the cursors Active Data Set. First, initialize the cursor with the open statement, this ; Additionally Oracle also maintains cursor variables loaded with defines a private SQL area executes a query associated with the the value of the total no. of rows fetched from the active data cursor populates the Active Data Set. set. Sets the Active Data Set1s row pointer to the first record. Use of Cursors in PL/SQL While SQL is the natural language of the DBA, it does not have The fetch statement retrieves the current row and advances the any procedural capabilities such as condition checking, looping cursor to the next row. You can execute fetch repeatedly until all and branching. For this, Oracle provides the PL/SQL. Program- rows have been retrieved. mers can use it to create programs for validation and When the last row has been processed, close the cursor with the manipulation of table data. PL/SQIJ adds to the power of close statement. This will release the memory occupied by the SQL and provides the user with all the functionality of a cursor and its Data Set. programming environment. Focus: The HRD manager has decided to raise the salary for all A PL/SQL block of code includes the Procedural code for the employees in department No. 20 by 0.05. Whenever any looping and branching along with the SQL statement. If such raise is given to the. employees, a record for the same is 91

97 maintained in the emp_ raise table. It includes the employee DATABASE Fetching a Record from the Cursor number, the date when the raise was given and the actual raise. The fetch statement retrieves the rows from the active set to the Write a PL/SQL block to update the salary of each employee variables one at a time. Each time a fetch is executed, The focus and insert a record in the emp _raise table;- of the DBA cursor advances to the next row in the Active Set. The table definition is as follows: One can make use of any loop structure (Loop-End Loop MANAGEMENT Table name: employee along with While, For, IF-End If ) to fetch the records from the cursor into variables one row at a time. Column name Data Type Size Attributes Syntax : FETCH cursorname INTO variable1, variable2, ... , emp _code varchar 10 Primary Key, via which we shall seek data in the table. For each column value returned by the query associated cursor, Ename varchar 20 The first name of the candidate. there must be a corresponding variable in the into list. And, Depmo number 5 The department number. Job varchar 20 Employee job details. their datatypes must match. These variables will be declared in Sal number 8,2 The current salary of the employee. the DECLARE section of the PL/SQL block. - Example : DECLARE emp _code varchar 10 is the part of a composite key via which we shall seek data in the table. cursor c - emp is select emp _code, salary from raise date date The date on which the raise was given employee raise amt number 8,2 The raise given to the employee. where deptno = 20; Emp _code and raise_date together form a composite primary /* Declaration of memory variable that holds data key. fetched from the cursor */ str - emp _code employee.emp - code%type; Declaring a Cursor To do the above via a PL/SQL block it is necessary to declare a num _salary employee.salary%type; cursor and associate it with a query before referencing it in any BEGIN statement within the PL/SQL block. This is because forward open c_emp; references to object are not allowed in PL/SQL. /* infinite loop to fetch data from cursor c - emp one Syntax : CURSOR cursorname IS row at a time */ SQL statement; loop Example: DECLARE fetch c - emp into str - emp _code, num _salary; /* Declaration of the cursor named c - emp /* Updating the salary in the employee table as The active data set wi# include the names, department current salary + raise */ numbers and salaries of all the employees belonging update employee set salary = num_salary + to department 20 */ (num_salary * .05) cursor c - emp is where emp _code = str - emp _code; select emp _code, salary from employee /*insert a record in the emp _raise table */ insert into where deptno = 20; emp _raise values The cursor name is not a PL/SQL variable;. it is used only to (str_emp_code, sysdate, num_salary * 0.05) reference the query. It cannot be assigned any values or be used end loop; in an expression. commit; Opening a Cursor Opening the cursor executes the query and identifies the active END; set, that contains all the rows which meet the query search Note that the current program will result into an indefinite loop criteria. as there is no exit provided from the loop. The Exit from the SYntax : OPEN cursorname ; loop can be provided using cursor variables as explained on page 101 section Explicit Cursor Variables. Also note if you Example: DECLARE execute a fetch and there are no more rows left in the active data cursor c - emp is select emp _code, salary from set, the values of the Explicit cursor ,variables are indetermi- employee nate. where deptno = 20; Closing a Cursor BEGIN The close statement disables the cursor and the active set becomes undefined. This will release the memory occupied by /* Opening cursor c - emp */ the cursor and its Data Set. Once a cursor is closed, the user can open c_emp; reopen the cursor using the open statement. END; Syntax : CLOSE cursorname ; Open statements retrieves the records frO1n the database and places it in the cursor (private SQL area). 92

98 Example : DECLARE commit; DATABASE ursor c - emp is select _mp _code, salary from close c - emp ; employee END; where deptno = 20; %FOUND: is the logical opposite of %notfound. It str- emp _code employee.emp - code%type; evaluates to true, if the last fetch succeeded because a row MANAGEMENT num _salary employee.salary%type; was available; or to false, if the last fetch failed because no more rows were available. BEGIN Syntax : cursorname%Found open c_emp; Example: The PL/SQL block will be as follows: loop Declare fetch c - emp into str - emp _code, num_salary; cursor c - emp is select emp _code, salary from employee update employee set salary = num_salary + (num_salary * .05) where deptno = 20; where emp _code = str - emp _code; str - emp _code employee.emp - code%type; insert into emp _raise values num_salary employee.salary_type; (str - emp code, sysdate, num_salary * 0.05) Begin end loop; open c_emp; commit; loop /* Close cursor c - emp */ fetch c - emp into str - emp _code, num _salary; close c_emp; * If no. of record retrieved:> 0 then END; process the data else exit the loop. */ Explicit Cursor Attributes Oracle provides certain attributes / cursor variables to control if c_emp%)found then the execution of the cursor. Whenever any cursor (explicit or update employee set salary = num_salary + (num_salary * implicit) is opened and used Oracle creates a set of four system. .05) variables via which Oracle keeps track of the Current status of where emp _code =str - emp _code; the cursor. You can access these cursor variables. They are insel1 into emp _raise values described below. (str_emp_code, sysdate, num_salary * 0.05) %Notfound : evaluates to t11,le, if the last fetch has failed because no more rows were available; or to false, if the last else fetch returned a row. exit; Svntax : cursorname%NOTFOUND end if; Example: DECLARE end loop; cursor c - emp is select emp _code, salary from employee commit; where deptno = 20; close c - emp ; str - emp _code employee.emp -=.code%type; END; num_salary employee. salary%type; %ISOPEN : evaluates to true, if an explicit cursor is open; BEGIN or to false, if it is closed. open c_emp; Syntax : Cursorname%ISOPEN loop Example : Declare fetch c_emp into str_emp_code, num_salary; cursor c - emp is select emp _code, salary from employee / * If no. of records retrieved is 0 or if all the records are where deptno = 20; fetched then exit the loop. */ str - emp _code employee.emp - code%type; exit when c_emp%notfound ; num _salary employee. salary%type; update employee set salary = num_salary + (num_salary * Begin .05) open c - emp; where -emp _code = str - emp _code; /* If the cursor is open insert into emp _raise values continue with the data processing (str_emp_code, sysdate, num_salary * O_05) else end loop; display an appropriate error message */ 93

99 if c_emp%isopen then Example : The PL/SQL block will be rewritten as follows: DATABASE loop Declare fetch c - emp into str - emp _code, num _salary; cursor c - emp is select emp _code, salary from employee exit when c - emp%notfound ; where deptno = 20; update employee set salary = num_salary + (num_salary * Begin MANAGEMENT .05} for emp_rec in c_emp where emp _code = str - emp _code; loop insert into emp _raise values update employee (str - emp _code, sysdate, num _salary * 0.05) set salary = emp_rec.salary + (emp_rec.salary * .05) end loop; where emp _code = emp _rec.emp _code; commit; insert into emp _raise values close c - emp ; (emp_rec.emp_code, sysdate, emp_rec.salary *.05) else end loop; dbms_output. put_line (Unable to open Cursor); commit; end if ; End; END; When you use a cursor for loop, the cursor for loop %Rowcount : returns the number of rows fetched from the implicitly declares emp _rec as belonging to type c _ active set. It is set to zero when the cursor is opened. emp%rowtype and retrieves the records as declared in the Syntax : cursomaine%Rowcount cursor c - emp. Example: Display the names, department name and salary The sequence of statements inside the loop is executed once of the first 10 employees getting the highest salary. for every row that satisfies the query associated with the Declare cursor. with each iteration, a record will be fetched from c _emp into emp _rec. Dot notation should be used to make cursor c - emp is a reference to individual items. select ename, deptno, salary from employee, deptmaster Example: emp yec.emp _code where deptmaster.deptno = employee.deptno where emp _rec is a row type variable and emp _code is the order by salary desc ; name of the field. str - ename employee.ename%type ; When you leave the loop, the cursor is closed automatically. num - deptno employee.deptno%type ; This is true even if you use an exit or goto statement to num_salary employee.salary%type; leave the loop prematurely, or if an exception is raised inside the loop. Thus, When you exit the loop it closes the cursor c Begin - emp. open c_emp; Note: The record is defined only inside the loop. You cannot dbms_output.Put_line (Name Department Salary); refer to its item outside the loop. The reference in the following dbms - output. Put _line ( -); example is illegal: -loop Begin fetch c - emp into str - ename, num - deptno, num _salary; for c1rec in c1 loop dbms -output.Put_line (str - ename II , II num - deptno II end loop; exit when c_emp%rowcount = 10 ; result:= clrec.n2 + 3; /* referencing c 1rec outside for loop is end loop; . illegal */ II num_salary); End; END; Focus: A bank has an ACCTMAST table where it holds the current status of a ,clients bank account ( i.e. currently what the Cursor for Loops client has in the savings bank account.) In most situations that require an explicit cursor, you can simplify coding by using a cursor for loop instead of the open, Another table called the ACCTTRAN table holds each transac- fetch and close statements. tion as it occurs at the back. i.e. Deposits / Withdrawals of clients. A client can deposit money which must be then A cursor for loop implicitly declares its loop index as a ADDED to the amount held against that specific clients name %rowtype record, opens a cursor, repeatedly fetches rows of in the ACCTMAST table. This is referred to as a CREDIT type values from the active set into items in the record, and closes the transaction. cursor when all rows have been processed. 94

100 A client may withdraw money from his account. This must be where acctno = acctnum; DATABASE SUBTRACTED from the amount held against that specific else clients name in the ACCTMAST table. This is referred to as a /* if the account is credited then update the DEBIT type transaction. acctmast table as balance = balance + amt */ update The ACCTTRAN table must therefore hold a flag that acctmast MANAGEMENT indicates whether the transaction type was CREDIT or DEBIT. set balance = (balance + amt) Based on this flag define a cursor which will update the where acctno = acctnum ; ACCTMAST Balance field contents. end if; Write a PL/SQL block that updates the acctmast table and sets update accttran set processed = Y the balance depending upon whether the account is debited OT where acctno = acctnum; credited. The updation should be done only for those records exit when acc - updt%notfound ; that are not processed i.e. the processed flag is N in the accttran table. end loop; 1. Create the following tables close acc - updt; Table name: acctmast commit; End; Column name Data Type Size Attributes Implicit Cursor Acctno varchar2 4 'Primary Key Oracle implicitly opens a cursor to process each SQL statement Name varchar2 20 Account Name not associated with an explicitly declared cursor. PL/SQL lets Balance Number 8 The balance in the account. you refer to the most recent implicit cursor as the SQL cursor. So, although you cannot use the open, fetch, and close state- Table name : Accttran ments to control an implicit cursor, you can still use cursor attributes to access information about 1:\1e most recently Column name Data Type Size Attributes executed SQL statement. Acctno varchar2 4 is the foreign key field which Implicit Cursor Attributes references table acttmast. TrnDate date The Transaction Date: The SQL cursor has four attributes as described below. When deb crd char 1 The Dr / Cr Flag. appended to the cursor name (i.e. SQL), these attributes let you Amount number 7,2 Processed char 1 A flag indicating whether the record is access information about the execution of insert, update, delete processed or not and single-row select statements. Implicit cursor attributes return the boolean null value, until they are set by a cursor The following PL/SQL code updates the acctmast table operation. depending upon the daily transactions entered in the accttran The values of the cursor attributes always refer to the most table. recently executed SQL statement, wherever the statement Declare appears. It might be in a different scope (in a sub-block). So, if cursor acc-updt is you want to save an attribute value for later use, assign it a boolean variable immediately. select acctno, deb - crd, amount from accttran %Notfound : evaluates to true, if an insert, update or delete where processed = N; affected no rows, or a single -row select returns no rows. acctnum char(4); Otherwise, it evaluates to false. db_cd char( 1); Syntax : QL %Notfound amt number(7,2); Example : The HRD manager has decided to raise the salary Begin of employees by 0.15. Write a PL/SQL block to accept the open acc - updt ; employee number and update the salary of that employee. Display appropriate message based on the existence of the /* perform the updation for all the records retrieved by the record in the employee table. cursor */ Begin loop update employee set salary = salary * 0.15 fetch acc - updt into acctnum, db - cd,amt; where emp _code = &emp _code; /* if the account is debited then update the if sq1%Botfound- then acctmast table as balance = balance - amt */ dbms_output.put_line(Employee No. Does not Exist); if db_cd = d then else update acctmast dbms_- output.put_line(Employee Record Modified set balance = (balance - amt) Successfully); 95

101 end if; We should be able to pass a value to cursor only when it is DATABASE END; being opened. For $at the cursor must be declared in such a way that it recognizes that it will receive the requested value( s) at the %FOUND : is the logical opposite of %notfound. Note, time of opening the cursor. Such a cursor is known as Param- however that both attributes evaluate to null until they are eterized Cursor. set by an implicit or explicit cursor operation. %found MANAGEMENT evaluates to true, if an insert, update or delete affected one Syntax : CURSOR cursor_name (variable_name datatype) is or more rows, or a single-row select returned one or more select statement... tows. Otherwise, it evaluates to false. The scope of cursor parameters are local to that cursor, which Syntax: SQL%Notfound metros that they can be referenced only within the query declared Example: The example in sq1%notfound will be written as in the cursor declaration. The values of cursor parameters are follows: used by the associated query when the cursor is opened. Begin For example. update employee set salaty = salaty * 0.15 cursor c_emp (num_deptno number) is where emp _code = &emp _code; select job, ename from emp where deptno > num - deptn; if sq1%found then The parameters to a cursor can be passed in the open statement. They can either be constant values or the contents of a memory dbms - output.put _line(Employee Record variable. Modified Successfully); For example else OPEN c_emp (30) dbms_output.put_line(Employee No. Does not Exist); OPEN c_emp (num_deptno) end if; Note: The memory variable should be declared in the DE- END; CLARE section and the value should be assigned to that %Rowcount : returns the number of rows affected by an memory variable. insert, update or delete, or select into statement. Each parameter in the declaration must have a corresponding Example: The HRD manager has decided to raise the salary value in the open statement. Remember that the parameters of of employees working as Programmers by 0.15. Write a a cursor cannot return values. PL/SQL block to accept the employee number and update Example : the salary of that employee. Display appropriate message based on the existence of the record in the employee table. Allow insert, update and delete for the table itemmast on the bases of the table itemtran table.. Declare 1. Create the following tables rows_affected char( 4) ; Table name: itemmast Begin update employee set salary = salary * 0.15 Column name Data Type Size Attributes where job = Programmers ; itemid Number 4 Primary Key rows_affected := to_char(sq1%rowcount) description Varchar 20 The item description if sq1%rowcount > 0 then Bal stock Number 3 The balance stock for an item dbms - output.put_line(rows _affected II Employee Table name: ItemTran Records Modified Successfully) ; Column name Data Type Size Attributes else Itemid Number 4 Foreign key via which we shall seek data in the table. dbms_output.put_line(There are no Employees Description Varchar 30 Item description Operation Char 1 The kind of operation on Item Mast table working as Programmers); i.e. Insert Update Delete(I U, D) Qty Number 3 The Qty sold end if; Status Varchar 30 The status of the Operation. END; %ISOPEN : Oracle automatically closes the SQL cursor after Based on the value in the operation column of table itemtran executing its associated SQL statement. As a result, the records for table itemmast is either inserted updated or sq1%isopen always evaluates to false. deleted. On the basis of success/failure of insert, update and delete operation the status column in the table itemtran is Parameterized Cursors updated with appropriate text indicating success or reason for So far we have used cursors querying all the records from a table. failure. Following are the 3 cases which are to be taken care of. Sometimes records are brought in memory selectively. While 1. If Operation = I then the itemid against along with declaring a cursor? the select statement must include a where description and qty is inserted into the required columns of clause to retrieve data conditionally. 96

102 the table itemmast. If insert is successful then the status update itemtran DATABASE field of itemtran table is updated to Successful - else it is set itemtran.status = SUCCESSFUL updated as Item Already Exist. where itemid = itemidno; 2. If Operation = U then the qty against this operation /* if the record is notfound and the operation is update/ column is added to bal:...:.stock column of the table delete then set the status to Item Not Present */ MANAGEMENT itemmast where itemid of table itemmast is same as that of itemtran. If update is successful then the status column of elsif oper = U or oper = D then itemtran table is updated to Successful else it is updated as update itemtran Item Does Not Exist. set itemtran.status = Item Not Present 3. If Operation= D , then a Tow from itemmast is deleted where itemid = itemidno; whose itemid is equal to the itemid in the table itemtran end if; with the operation column having the value D . If delete is successful then the status column of itemtran table is else updated to Successful else it is updated as Item Does Not /* if the record is found and the operation is insert then set Exist. the status to Item Already exists *,/ The following PL/SQL code takes care of the above three cases. if oper = I then Declare update itemtran /* Cursor scantable retrieves all the records of table itemtran */ set itemtran.status = Item Already Exists cursor scantable is where itemid = itemidno; select itemchk operation, qty, description from itemtran; /* if the record is found and the operation is update/delete /* Cursor Itemchk accepts the value of item id from the current then perform the update or delete operation raw of cursor set the status to Successful */ . scantable */ elsif oper = D then cursor itemchk(mastitemid number) is delete from item_mast where item - mast.itemid = select itemid from item_mast itemidno; update itemtran where itemid = mastitemid; set itemtran.status = Successful /* variable_ that hold data from the cursor scantable */ where itemid = itemidno; itemidno number( 4); elsif oper = U then de scrip varchar2(30); update item_mast oper char(l); set item_mast. bal- stock = quantity quantity number(3); where itemid = itemidno; /* variable that hold data from the cursor itemchk */ dummyitem number( 4); update itemtran - Begin set itemtran.status = Successful /* open the scantable cursor */ where itemid = itemidno; open scantable; end if; loop close itemchk; / * Fetch the records from the scan table cursor */fetch exit when scantable%notfound; scantable into itemidno, oper, quantity, descrip; end loop; /* Open the itemchk cursor close scantable; Note that the value of variable passed to the itemchk cursor commit; is set to the value of item id in the current row of cursor END; scantable */ Points to Ponder open itemchk(itemidno); Oracle DBA uses a work area for its internal processing. fetch itemchk into dummyitem; You can explicitly declare a cursor to process the rows /* if the record is not found and the operation is insert individually. A cursor declared by the user is called Explicit then insert the new record and set the status to Successful Cursor */if itemchk%notfound then Opening the cursor executes the query and identifies the if oper = T then insert into item - mast(itemid, bal- stock, active set, that contains all the rows which meet the query description) search criteria. values(itemidno; quantity, descrip); 97

103 The fetch statement retrieves the rows from the active set to DATABASE the variables one at a time The close statement disables the cursor and the active set becomes undefined Oracle implicitly opens a cursor to process each SQL MANAGEMENT statement not associated with an explicitly declared cursor. Review Terms Defining cursor Use of cursor Explicit cursor Implicit cursor Parametrized cursor Students Activity 1. Define cursors?Explain its usage? 2. Differentiate between explicit & implicit cursor with the help of an example? 98

104 DATABASE MANAGEMENT 99 Student Notes



107 DATABASE LESSON 35: NORMALISATION - I MANAGEMENT Lesson Objectives relation schema had an attribute whose domain consists of Normalisation iden-tification numbers encoded as above, the schema would not be in first normal form. First Normal Form First normal form (1NF) sets the very basic rules for an Functional dependencies organized database: Closure of a set of functional dependencies Eliminate duplicative columns from the same table. When deciding upon the structure of data to be stored in a Create separate tables for each group of related data and file(s), or a database, the two main issues to be considered are: identify each row with a unique column or set of columns 1. removing data duplication from the files/database (the primary key). 2. avoiding data confusion, where different versions of the Functional Dependencies same information is located in different places in the same Functional dependencies plays key role in designing good file database. A functional dependency is a type of constraint that is Data Normalisation is the process of determining the correct a generalization of the notion of key. structure for data in files or databases so that the problems Functional dependencies are constraints. on the set of legal mentioned cannot occur relations. They. allow us to express facts about the enterprise Data is structured by following a series of steps. Each step that we are modeling with our database. removes the potential for a particular problem to occur in the If we define the notion of a superkey as follows. Let R be a data, e.g. duplication, and each step builds upon the previous relation schema. A subset K of R is a superkey of R if, in any steps legal relation r(R), for all pairs tl and t2 of tuples in r such that First Normal Form tlt2, then t1[K] t2[K_] . That is, no two tuples in any legal The first of the normal forms that we study, first normal form, relation r(R) may have the same value on attribute set K. imposes a very basic requirement on relations. The notion of functional dependency generalizes the notion of A domain is atomic if elements of the domain are considered superkey. Consider a relation schema R, and let R and R. to be indivisible units. We say that a relational schema R is in The functional dependency first normal form (lNF) if the domains of all attributes of R C are atomic. holds on schema R if, in any legal relation r(R), for all pairs of A set of names is an example of a non atomic value. For tuples t1 and t2 in r such that t1() = t2 () it is also the case example, if the schema of a relation employee included an that t1() = t2() attribute children whose domain elements are sets of names, the schema would not be in first normal form. Using the functional-dependency notation, we say that K is a superkey of R if K R. That is, K is a superkey if, whenever Composite attributes, such as an attribute address with t1[K) = t2[K], it is also the case that t1[R]= t2[R] (that is, component attributes street and city, also have nonatomic t1=t2). domains. Functional dependencies allow us to express constraints that we Integers are assumed to be atomic, so the set of integers is an cannot express with superkeys. Consider the schema atomic domain; the set of all sets of integers is a nonatomic domain. The distinction is that we do not normally consider Loan-info-schema = (loan-number, branch-name, customer- integers to have subparts, but we consider-sets of integers to name, amount) which is simplification of the Lending-schema have subparts-namely, the integers making up the set. But the that we saw earlier. The set of functional dependencies that we important issue is not what the domain itself is, but rather expect to hold on this elation schema is how we use domain elemeents in our database. The domain of loan-nurnber amount loan-number branch-name all integers would be nonatomic if we considered each integer to We would not, however, expect the functional dependency be an ordered list of digits. loan-number customer-name As a practical illustration of the above point, consider an to hold, since, in general, a given loan can be made to more than organization that as-signs employees identification numbers of one customer (for example, to both members of a husband- the following form: The first two letters specify the department wife pair). and the remaining four digits are a unique number within the department for the employee. Examples of such numbers would be CS0012 and EE1127. Such identification numbers can be divided into smaller units, and are there-fore nonatomic. If a 102

108 We shall use functional dependencies in two ways: DATABASE Customer-name Customer-street Customer-city 1. To test relations to see whether they are legal under a given Jones Main Harrison set of functional dependencies. If a relation r is legal under Smith North Rye a set F of functional dependencies, we say that r satisfies F. Hayes Main Harrison 2. To specify constraints on the set of legal relations. We shall Curry North Rye MANAGEMENT thus concern our-selves with only those relations that satisfy Lindsay Park Pittsfield a given set of functional dependen-cies. If we wish to Tufl1et Putnam Stamford constrain ourselves to relations on schema R that satisfy a Williams Nassau Princeton set F of functional dependencies, we say that F holds on R. Adams Spring Pittsfield Johnson Alma Palo Alto To see which functional dependencies are satisfied. Observe that Glenn Sand Hill Woodside A C is satisfied. There are two tuples that have an A value Brooks Senator Brooklyn of a1. These tuples have the same C value-namely, C1. Similarly, Green Walnut Stamford the two tu-ples with an A value of a2 have the same C value, C2. There are no other pairs of distinct tuples that have the The customer relation same A value. The functional dependency C A is not L-17 Downtown 1000 satisfied, however. To see that it is not, consider the tuples t1 = L-23 Redwood 2000 (a2, b3, C2, d3) and L-15 Perryridge 1500 L-14 Downtown 1500 L-93 Mianus 500 a1 B1 C1 d1 L-11 Round Hill 900 a1 B2 c1 d2 L-29 Pownal 1200 a2 B2 c2 d2 L-16 North Town 1300 a2 B2 C2 d3 L-18 Downtown 2000 L-25 Perryridge 2500 a3 B3 C2 d4 L-10 Brighton 2200 Sample relation r. The loan relation. t2= (a3, b3, C2, d4). These two tuples have the same C values, can have streets with the same name. Thus, it is possible, at C2, but they have dif-ferent A values, a2 and a3, respectively. some time, to have an instance of the customer relation in Thus, we have found a pair of tuples t1 and t2 such that t1(C) which customer-street customer-city is not satis-fied. So, we = t2 [C], but t1 [A] t2 [A]. would not include customer-street customer-city in the set Many other functional dependencies are satisfied by r, including, of functional dependencies that hold on Customer-schema. for example, the functional dependency AB D. Note that we In the loan relation (on Loan-schema), we see that the depen- use AB as a shorthand for {A,B}, to conform with standard dency loan--number amount is satisfied. In contrast to the practice. Observe that there is no pair of distinct tuples tl and t2 case of customer-city and customer-street in Customer-schema, such that tl [AB] = t2 [AB]. Therefore, if t1 [AB] = t2 [AB], it we do believe that the real-world enterprise that we are model- must be that t1 = t2 and, thus, t1 [D] = t2[D]. So, r satisfies ing requires each loan to have only one amount. Therefore, we AB D. want to require that loan-number amount be satisfied by the Some functional dependencies are said to be trivial because they loan relation at all times. In other words, we require that the are satisfied by all relations. For example, A A is satisfied by constraint loan-number amount hold on Loan-schema. all relations involving attribute A. Reading the definition of In the branch relation, we see that branch-name assets is functional dependency literally, we see that, for all tuples t1 and satisfied, as is assets branch-name. We want to require that t2 such that t1(A] = t2 [A], it is the case that h[A] = t2 [A]. branch-name assets hold on Branch-schema. However, we Similarly, AB A is satisfied by all relations involving attribute do not wish to require that assets branch-name hold, since it A. In general, a functional dependency of the form is is possible to have several branches that have the same asset trivial if . value. To distinguish between the concepts of a relation satisfying a In what follows, we assume that, when we design a relational dependency and a dependency holding on a schema, we return database, we first list those functional dependencies that must to the banking example. If we consider the customer relation always hold. In the banking example, our list of dependencies (on Customer-schema) in Figure 7.3, we see that customer- includes the following: street customer-city is satisfied. However, we believe that, in Downtown Brooklyn 9000000 the real world, two cities Redwood Palo Alto 2100000 Perryridge Horseneck 1700000 Mianus Horseneck 400000 Round Hill Horseneck 8000000 Pownal Bennington 300000 North Town Rye 3700000 Brighton Brooklyn 7100000 103

109 On Branch-schema: Therefore, we have shown that, whenever tl and t2 are tuples DATABASE branch-name branch-city such that tl [A] = t2 [A], it must be that tl [H] = t2[H]. But that is exactly the-definition of A H. branch-name assets Let F be a set of functional dependencies. The closure of F, On Customer-schema: denoted by F+, is the set of all functional dependencies logically customer-name customer-city MANAGEMENT implied by F. Given F, we can compute F+ directly from the customer-name customer-street formal definition of functional dependency. If F were large, this On Loan-schema: process would be lengthy and difficult. Such a computation of F+ requires argu-ments of the type just used to show that loan-number amount AH is in the closure of our example set of dependencies. loan-nll1nber branch-name Axioms, or rules of inference, provide a simpler technique for On Borrower-schema: reasoning about functional dependencies. In the rules that No functional dependencies follow, we use Greek letters (, , . . .) for sets of attributes, On Account-schema: and uppercase Roman letters from the beginning of the alphabet for individual attributes. We use to denote U. account-number branch-name account-number balance We can use the following three rules to find logically implied functional dependencies. By applying these rules repeatedly, we On Depositor-schema: can find all of F+ given F. This collection of rules is called No functional dependencies Armstrongs axioms in honor of the person who first pro- Closure of a Set of Functional posed it. Dependencies Reflexivity rule. If is a set of attributes and then It is not sufficient to consider the given set of functional a holds. dependencies. Rather, we need to consider all functional Augmentation rule. If holds and is a set of dependencies that hold. We shall see that, given a set F of attributes, then holds. functional dependencies, we can prove that certain other Transitivity rule. If holds and holds, then functional dependencies hold. We say that such functional holds. dependencies are logically implied by F. Armstrongs axioms are sound, because they do not generate More formally, given a relational schema R, a functional any incorrect func-tional dependencies. They are complete, dependency f on R is log-ically implied by a set of functional because, for a given set F of functional de-pendencies, they dependencies F on R if every relation instance r(R) that satisfies allow us to generate all F+. The bibliographical notes provide F also satisfies f. ref-erences for proofs of soundness and completeness. Suppose we are given a relation schema R = (A, B, C, G, H, I) Although Armstrongs axioms are complete, it is tiresome to and the set of functional dependencies use them directly for the computation of F+. To simplify AB matters further, we list additional rules. It is pos-sible to use AC Armstrongs axioms to prove that these rules are correct CG H Union rule. If holds and holds, then CG I Decomposition rule. If holds, then holds BH and holds. The functional dependency Pseudotransitivity rule. If holds and holds, then holds. AH Let us apply our rules to the example of schema R = (A, B, C, is logically implied. That is, we can show that, whenever our G, H, I) and the set F of functional dependencies {A B, A given set of functional dependencies holds on a relation, AH C, CG H, CG I, B H}. We list several members of F+ must also hold on the relation. Suppose that tl and t2 are tuples here: such that A H. Since A B and B H hold, we apply the transitiv- t1(A] = t2[A] ity rule. Observe that it was much easier to use Armstrongs Since we are given that A B, it follows from the definition of axioms to show that A H holds than it was to argue directly functional dependency that from the definitions, as we did earlier in this section. t1(B] = t2[B] CG HI. Since CG and CG I, the union rule Then, since we are given that B H, it follows from the implies that CG HI. definition of functional dependency that AG I. Since A C and CG I, the pseudotransitivity t1(H] = t2[H] rule implies that A G I holds. Another way of finding that AG I holds is as follows. We use the aug-mentation rule on A C to infer AG CG. 104

110 Applying the transitivity rule to this dependency and CG I, 4. Define functional dependecy? DATABASE we infer AG I. Points To Ponder Normalisation is the process of determining the correct structure for data in files or databases MANAGEMENT A relational schema R is in first normal form (lNF) if the domains of all attributes of R are atomic. Functional dependencies plays key role in designing good database. A functional dependency is a type of constraint 5. Define Closure of a set of functional dependency? that is a generalization of the notion of key. The bad design of a database suggests that we should decompose a relation schema that has many attributes into several schemas with fewer attributes. Review Terms Normalisation First Normal Form Functional dependencies Closure of a set of functional dependencies Students Activity 1. Define Normalisation? 2. How normalisation can improve the redundancy of data? 3. Define 1NF? 105

111 Student Notes 106 DATABASE MANAGEMENT

112 DATABASE LESSON 36: NORMALISATION - II MANAGEMENT Lesson objectives the relation branch customer Decomposition Customer name Loan number Amount Properties of decomposition Jones L-17 1000 Smith L-23 2000 Lossless join decomposition Hayes L-15 1500 2NF Jackson L-14 1500 Jones L-93 500 Decomposition Turner L-11 900 The bad design of a database suggests that we should decom- Williams L-29 1200 pose a relation schema that has many attributes into several Hayes L-16 1300 schemas with fewer attributes. Careless decom-position, Johnson L-18 2000 however, may lead to another form of bad design. Glenn L-25 2500 Brooks L-10 2200 Consider an alternative design in which we decompose Lend- ing-schema into the following two schemas: The relation customer-loan. Branch-customer-schema = (branch-name, branch-city, assets, The result of computing branch-customer [>

113 That is, {R1, R2,..., Rn} is a decomposition of R if, for i = DATABASE Customer Laon Branch name Branch city Assets Amount name number 1,2,..., n, each Ri is a subset of R, and every attribute in R Downtown Brooklyn 9000000 Jones L-17 1000 Downtown Brooklyn 9000000 Jones L-93 500 appears in at least one Ri. Redwood Palo Alto 2100000 Smith L-23 2000 Let r be a relation on schema R, and let Ti = Ri (T) for i = Perryridge Horseneck 1700000 Hayes L-15 1500 1,2,..n. That is, {r1, r2, r3rn} is the database that results Perryridge Horseneck 1700000 Hayes L-16 1300 MANAGEMENT Downtown Brooklyn 9000000 Jackson L-14 1500 from decomposing R into {R1, R2" . . , Rn}. Mianus Horseneck 400000 Jones L-17 1000 It is always the case that Mianus Horseneck 400000 Jones L-93 500 Round Hill Horseneck 8000000 Turner L-11 900 r r1 [>

114 Lending-schema is an example of a bad database design. DATABASE Second Normal Form Assume that we decompose it to the following three relations: The general requirements of 2NF are: Branch-schema = (branch-name, branch-city, assets) Remove subsets of data that apply to multiple rows of a Loan-schema = (loan-number, branch-name, amount) table and place them in separate rows. Borrower-schema = (customer-name, loan-number) Create relationships between these new tables and their MANAGEMENT predecessors through the use of foreign keys. We claim that this decomposition has several desirable proper- ties, which we discuss next. These rules can be summarized in a simple statement: 2NF attempts to reduce the amount of redundant data in a table by Lossless-join Decomposition extracting it, placing it in new table(s) and creating relationships When we decompose a relation into a number of smaller between those tables. relations, it is crucial that the decomposition be lossless. We Lets look at an example. Imagine an online store that main- must first present a criterion for determining whether a tains customer information in a database. Their Customers decomposition is lossy. table might look something like this: Let R be a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. CustNum FirstName LastName Address City State ZIP 1 John Doe 12 Main Street Sea Cliff NY 11579 This decomposition is a lossless-join decom-position of R if at 2 Alan Johnson 82 Evergreen Tr Sea Cliff NY 11579 least one of the following functional dependencies is in F+: 3 Beth Thompson 1912 NE 1st St Miami FL 33157 R1 R2 Rl 4 Jacob Smith 142 Irish Way South Bend IN 46637 5 Sue Ryan 412 NE 1st St Miami FL 33157 R1 R2 R2 In other words, if R1 R2 forms a superkey of either R1 or A brief look at this table reveals a small amount of redundant R2, the decomposition of R is a lossless-join decomposition. data. Were storing the Sea Cliff, NY 11579 and Miami, FL We can use attribute closure to efficiently test for superkeys, as 33157 entries twice each. Now, that might not seem like too we have seen earlier. much added storage in our simple example, but imagine the We now demonstrate that our decomposition of Lending- wasted space if we had thousands of rows in our table. schema is a lossless-join decomposition by showing a sequence Additionally, if the ZIP code for Sea Cliff were to change, wed of steps that generate the decomposition. We begin by need to make that change in many places throughout the decomposing Lending-schema into two schemas: database. Branch-schema = (branch-name, branch-city, assets) In a 2NF-compliant database structure, this redundant informa- tion is extracted and stored in a separate table. Our new table Loan-info-schema = (branch-name, customer-name, loan- (lets call it ZIPs) might look like this: number, amount) Since branch-name branch-city assets, the augmentation rule ZIP City State for functional depen-dencies implies that 11579 Sea Cliff NY branch-name branch-name branch-city assets 33157 Miami FL Since Branch-schema n Loan-info-schema = {branch-name}, it 46637 South Bend IN follows that our initial decomposition is a lossless-join If we want to be super-efficient, we can even fill this table in decomposition. advance the post office provides a directory of all valid ZIP Next, we decompose Loan-info-schema into codes and their city/state relationships. Surely, youve encoun- Loan-schema = (loan-number, branch-name, amount) tered a situation where this type of database was utilized. Someone taking an order might have asked you for your ZIP Borrower-schema = (customer-name, loan-number) code first and then knew the city and state you were calling This step results in a lossless-join decomposition, since loan- from. This type of arrangement reduces operator error and number is a common at-tribute and loan-number amount increases efficiency. branch-name. Now that weve removed the duplicative data from the For the general case of decomposition of a relation into Customers table, weve satisfied the first rule of second normal multiple parts at once, the test for lossless join decomposition form. We still need to use a foreign key to tie the two tables is more complicated. See the bibliographical notes for references together. Well use the ZIP code (the primary key from the on the topic. ZIPs table) to create that relationship. Heres our new Custom- While the test for binary decomposition is clearly a sufficient ers table: condition for lossless join, it is a necessary condition only if all constraints are functional dependencies. We shall see other types CustNum FirstName LastName Address ZIP of constraints later (in particular, a type of constraint called 1 John Doe 12 Main Street 11579 multivalued dependencies), that can ensure that a decomposi- 2 Alan Johnson 82 Evergreen Tr 11579 3 Beth Thompson 1912 NE 1st St 33157 tion is lossless join even if no functional dependencies are 4 Jacob Smith 142 Irish Way 46637 present. 5 Sue Ryan 412 NE 1st St 33157 109

115 Weve now minimized the amount of redundant information 4. Define 2NF with the help of an example? DATABASE stored within the database and our structure is in second normal form. Points to Ponder The bad design of a database suggests that we should MANAGEMENT decompose a relation schema that has many attributes into several schemas with fewer attributes. when we decompose a relation into a number of smaller relations, it is crucial that the decomposition be lossless 2NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in new table(s) and creating relationships between those tables. We can use a given set of functional dependencies in designing a relational database in which most of the undesirable properties do not occur When we decompose a relation into a number of smaller relations, it is crucial that the decomposition be lossless 2NF remove subsets of data that apply to multiple rows of a table and place them in separate rows. Review Terms Decomposition Properties of decomposition Lossless join decomposition 2NF Students Activity 1. Define decomposition? Why it is required? 2. Define desirable properties of decomposition? 3. Define Lossless join decomposition? 110

116 DATABASE MANAGEMENT 111 Student Notes

117 DATABASE LESSON 37: NORMALISATION - III MANAGEMENT Lesson Objectives We claim that Loan-info-schema is not in a desirable form, since BCNF it suffers from the problem of repetition of information. We observe that, if there are several customer names associated 3NF with a loan, in a relation on Loan-info--schema, then we are Comparison of BCNF & 3NF forced to repeat the branch name and the amount once for each 4NF customer. We can eliminate this redundancy by redesigning our database such that all schemas are in BCNF. One approach to Boyce-Codd Normal Form this problem is to take the existing non--BCNF design as a Using functional dependencies, we can define several normal starting point, and to decompose those schemas that are not in forms that represent good database designs. BCNF. Consider the decomposition of Loan-info-schema into Definition two schemas: One of the more desirable normal forms that we can obtain is Loan-schema = (loan-number, branch-name, amount) Boyce-Codd normal form (BCNF). A relation schema R is in BCNF with respect to a set F of functional dependencies if, for Borrower-schema = (customer-name, loan-number) all functional dependencies in F+ of the form , where This decomposition is a lossless-join decomposition. R and R, at least one of the following holds: To determine whether these schemas are in BCNF, we need to is a trivial functional dependency (that is, ). determine what functional dependencies apply to them. In this is a superkey for schema R. example, it is easy to see that A database design is in BCNF if each member of the set of loan-number amount branch-name relation schemas that constitutes the design is in BCNF. applies to the Loan-schema, and that only trivial functional As an illustration, consider the following relation schemas dependencies apply to Borrower-schema. Although loan- and their respective functional dependencies: number is not a superkey for Loan-info-schema, it is a candidate Customer-schema = (customer-name, customer-street, key for Loan-schema. Thus, both schemas of our decomposi- customer-city) tion are in SCNF. customer-name -t customer-street customer-city It is now possible to avoid redundancy in the case where there are several cus-tomers associated with a loan. There is exactly Branch-schema = (branch_name, assets, branch-city) one tuple for each loan in the relation on Loan-schema, and one branch-name -t assets branch-city tuple for each customer of each loan in the relation on Bor- Loan-info-schema = (branch-name, customer-name, loan- rower-schema. Thus, we do not have to repeat the branch name number, amount) and the amount once for each customer associated with a loan. loan-number -t amount branch-name Often testing of a relation to see if it satisfies BCNF can be We claim that Customer-schema is in BCNF. We note that a simplified: candidate key for the, schema is cust9mer-name. The only To check if a nontrivial dependency causes a violation nontrivial functional dependencies that hold on Customer- of SCNF, comput + (the attribute closure of ), and schema have customer-name on the left side of the arrow. Since verify that it includes all attributes of R; that is, it is a customer-name is a candidate key, functional dependencies with superkey of R. customer-name on the left side do not violate the definition of To check if a relation schema R is in SCNF, it suffices to BCNF. Similarly, it can be shown easily that the relation schema check only the depen- dencies in the given set F for violation Branch-schema is in BCNF. of BCNF, rather than check all depen-dencies in F+. The schema Loan-info-schema, however, is not in BCNF. First, We can show that if none of the dependencies in F causes a note that loan-number is not a superkey for Loan-info-schema, violation of SCNF, then none of the dependencies in F+ will since we could have a pair of tuples represent-ing a single loan cause a violation of SCNF either. made to two people - for example, Unfortunately, the latter procedure does not work when a (Dowhtown, John Bell, L-44, 1000) relation is decomposed. That is it does not suffice to use F (Downtown, Jane Bell, L-44, 1000) when we test a relation Ri in a decomposition of R, for Because we did not list functional dependencies that rule out violation of BCNF. For example consider relation schema R the preceding case, loan--number is not a candidate key. (A, B, C, D, E), with functional dependencies F containing A However, the functional dependency loan-nurnber amount Band BC D. Suppose this were decomposed into R1(A, B) is nontriviaL Therefore, Loan-info-schema does not satisfy the and R2(A, C, D, E). Now, neither of the dependencies in F definition of BCNF. contains only attributes from (A, C, D, E) so we might be 112

118 misled into thinking R2 satisfies BCNF. In fact, there is a definition seems rather unintuitive, and it is not obvious why it DATABASE dependency AC D in F+ (which can be inferred using the is useful. It represents, in some sense, a minimal relaxation of pseudotransitivity rule from the two dependencies in F), which the BCNF conditions that helps ensure that every schema has a shows that R2 is not in BCNF. Thus, we may need a depen- dependency-preserving decomposition into 3NF. Its purpose dency that is in F+, but is not in F, to show that a decomposed will become more clear later, when we study decomposition into relation is not in BCNF. 3NF. MANAGEMENT An alternative BCNF test is sometimes easier than computing Observe that any schema that satisfies BCNF also satisfies 3NF, every dependency in F+. To check if a relation Ri in a decompo- since each of its functional dependencies would satisfy one of sition of R is in BCNF, we apply this test: the first two alternatives. BCNF is there-fore a more restrictive . For every subset a of attributes in R.i, check that a+ (the constraint than is 3NF. attribute closure of a under F) either includes no attribute of Ri The definition of 3NF allows certain functional dependencies - a, or includes all attributes of Ri. that are not allowed in BCNF. A dependency that If the condition is violated by some set of attributes a in Ri, satisfies only the third alternative of the 3NF definition is not consider the following functional dependency, which can be allowed in BCNF, but is allowed in 3NF. shown to be present in F+: Let us return to our Banker-schema example (Section 7.6). We (+ - ) Ri have shown that this relation schema does not have a depen- dency-preserving, lossless-join decomposition into BCNF. This The above dependency shows that Ri violates BCNF, and is a schema, however, turns out to be in 3NF. To see that it is, we witness for the viola-tion. The BCNF decomposition note that {customer-name, branch-name} is a candidate key for algorithm, which we shall see in Section 7.6.2, makes use of the Banker-schema, so the only attribute not contained in a witness. candidate key for Banker-schema is banker-name. The only Third Normal Form nontrivial functional dependencies of the form As we saw earlier, there are relational schemas where a BCNF banker-name decomposition cannot be dependency preserving. For such include {customer-name, branch-name} as part of Q. Since schemas, we have two alternatives if we wish to check if an {customer-name, branch-name} is a candidate key, these update violates any functional dependencies: dependencies do not violate the definition of 3NF. Pay the extra cost of computing joins to test for violations. As an optimization when testing for 3NF, we can consider only Use an alternative decomposition, third normal form functional depen-dencies in the given set F, rather than in F+. (3NF), which we present below, which makes testing of Also, we can decompose the dependen-cies in F so that their. updates cheaper. Unlike BCNF, 3NF decompo-sitions may right-hand side consists of only single attributes, and use the contain some redundancy in the decomposed schema. resultant set in place of F. We shall see that it is always possible to find a lossless-join, Given a dependency , we can use the same attribute- dependency-preserving decomposition that is in 3NF. Which of closure-based tech-nique that we used for BCNF to check if is the two alternatives to choose is a design decision to be made a superkey. If is not a superkey, we have to verify whether each by the database designer on the basis of the application require- attribute in is contained in a candidate key of R; this test is ments. rather more expensive, since it involves finding candidate keys. Definition In fact, test-ing for 3NF has been .shown to be NP-hard; thus, BCNF requires that all nontrivial dependencies be of the form it is very unlikely that there is a polynomial time complexity , where is a super key. 3NF relaxes this constraint algorithm for the task. slightly by allowing nontrivial functional de-pendencies whose Comparison of BCNF and 3NF left side is not a superkey. Of the two normal forms for relational-database schemas, 3NF A relation schema R is in third normal form (3NF) with respect and BCNF, there are advantages to 3NF in that we know that it to a set F of func-tional dependencies if, for all functional is always possible to obtain a 3NF design without sacrificing a dependencies in F+ of the form , where R and lossless join or dependency preservation. Nevertheless, there are R, at least one of the following holds: disadvantages to 3NF: If we do not eliminate all transitive is a trivial functional dependency. relations schema depen-dencies, we may have to use null values is a superkey for R. to represent some of the possible meaningful relationships among data items, and there is the problem of repetition of Each attribute A in - is contained in a candidate key for information. R. As an illustration of the null value problem, consider again the Note that the third condition above does not say that a single Banker-schema and its associated functional dependencies. Since candidate key should contain all the attributes in - ; each banker-name branch-name, we may want to represent attribute A in - may be contained in a different candidate relationships between values for banker-name and values for key. branch--name in our database. If we are to do so, however, The first two alternatives are the same as the two alternatives in either there must be a correspond-ing value for customer-name, the definition of BCNF. The third alternative of the 3NF or we must use a null value for the attribute customer--name1. 113

119 Customer-name bank-name branch-name Consider again our banking example. Assume that, in an DATABASE alternative design for the bank database schema, we have the Jones Johnson Perryridge schema Smith Johnson Perryridge BC-schema = (loan-number, customer-name, customer-street, Hayes Johnson Perryridge customer-city) Jackson Johnson Perryridge MANAGEMENT Curry Johnson Perryridge The astute reader will recognize this schema as a non-BCNF Turner Johnson Perryridge schema because of the functional dependency customer-name customer-street customer-city An instance of Banker-schema. that we asserted earlier, and because customer-name is not a key As an illustration of the repetition of information problem, for BC-schema. How-ever, assume that our bank is attracting consider the instance of Banker-schema. Notice that the wealthy customers who have several ad-dresses (say, a winter information indicating that Johnson is working at the home and a summer home). Then, we no longer wish to en- Perryridge branch is repeated. force the functional dependency customer-name Recall that our goals of database design with functional customer-street customer-city. If we remove this functional dependencies are: dependency, we find BC-schema to be in BCNF with respect to 1. BCNF our modified set of functional dependencies. Yet, even though BC-schema is now in BCNF, we still have the problem of 2. Lossless join repetition of information that we had earlier. 3. Dependency preservation To deal with this problem, we must define a new form of Since it is not always possible to satisfy all three, we may be constraint, called a mul-tivalued dependency. As we did for forced to choose between BCNF and dependency preservation functional dependencies, we shall use multivalued dependencies with 3NF. to define a normal form for relation schemas. This normal It is worth noting that SQL does not provide a way of form, called fourth normal form (4NF), is more restrictive than specifying functional depen-dencies, except for the special case BCNF. We shall see that every 4NF schema is also in BCNF, but of declaring superkeys by using the primary key or unique there are BCNF schemas that are not in 4NF. constraints. It is possible, although a little complicated, to write Multivalued Dependencies assertions that enforce a functional dependency, unfortunately Functional dependencies rule out certain tuples from being in a testing the assertions would be very expensive in most database relation. If A B, then we cannot have two tuples with the systems. Thus even if we had a dependency-preserving same A value but different B values. Mul-tivalued dependencies, decomposition, if we use standard SQL we would not be able on the other hand, do not rule out the existence of certain to efficiently test a functional dependency whose left-hand side tuples. Instead, they require that other tuples of a certain form is not a key. be present in the rela-tion. For this reason, functional depen- Although testing functional dependencies may involve a join if dencies sometimes are referred to as equality- generating the decomposition is not dependency preserving, we can reduce dependencies, and multivalued dependencies are referred to as the cost by using materialized views, which many database tuple- generating dependencies. systems support. Given a BCNF decomposition that is not de- Let R be a relation schema and let R and R. The pendency preserving, we consider each dependency in a multivalued dependency minimum cover Fc that is not preserved in the decomposition. For each such dependency , we define a materialized view that computes a join of all relations in the decomposition, and holds on R if, in any legal relation r(R), for all pairs of tuples t1 projects the result on . The functional dependency can be and t2 in r such that t1[] = t2[], there exist tuples t3 and t4 in easily tested on the ma-terialized view, by means of a constraint r such that unique (). On the negative side, there is a space and time tr[] = t2[] = t3[] = t4[] overhead due to the materialized view, but on the positive side, t3 [] = td{] the application programmer need not worry about writing code to keep redundant data consistent on updates; it is the job of t3[R - ] = t2[R - ] the database system to maintain the material-ized view, that is, t4 [] .= t2 [] keep up up to date when the database is updated t4[R - ] = t1(R - ] Thus, in case we are not able to get a dependency-preserving BCNF decomposition, it is generally preferable to opt for R- - BCNF, and use techniques such as materialized views to reduce T1 A1 ai +1 .aj Aj +1 an T2 A1 bi + 1 bj Bj +1 .bn the cost of checking functional dependencies. T3 A1 Ai+1 . aj Bj +1 .bn Fourth Normal Form T4 A1 Bi +1 .bj Aj +1 an Some relation schemas, even though they are in BCNF, do not seem to be sufficiently normalized, in the sense that they still Tabular representation of suffer from the problem of repetition of infor-mation. 114

120 This definition. is less complicated than it appears to be. dependencies that occur in practice appear to be quite simple. DATABASE Figure7.16 gives a tabular picture of t1, t2, t3 and t4. Intuitively, For complex dependencies, it is better to reason about sets of the multivalued dependency says that the relationship dependencies by using a system of inference rules. between and is independent of the relationship between From the definition of multivalued dependency, we can derive and R - . If the multivalued dependency is satisfied the following rule: by all relations on schema R, then is a trivial MANAGEMENT , then multivalued dependency on schema R. Thus is trivial if or U = R. In other words, every functional dependency is also a multivalued dependency. To illustrate the difference between functional and multivalued dependencies, we consider the BC-schema again, and the Definition of Fourth Normal Form relation bc (BC-schema). We must repeat the loan number once Consider again our BC-schema example in which the for each address a customer has, and we must repeat the address multivalued dependency customer-name customer-street for each loan a customer has. This repetition is unnecessary, customer-city holds, but no nontrivial functional de-pendencies since the relationship between a customer and his address is hold. We saw in the opening paragraphs of Section 7.8 that, independent of the relationship between that customer and a although BC-schema is in BCNF, the design is not ideal, since loan. If a customer (say, Smith) has a loan (say, loan number L- we must repeat a customers address information lor each loan. 23), we want that loan to We shall see that we can use the given multivalued de-pendency be associated with all Smiths addresses. Thus, the relation of to improve the database design, by decomposing BC-schema Figure 7.18 is illegal. To make this relation legal, we need to add into a fourth normal form decomposition. the tuples (L-23, Smith, Main, Manchester) and (L-27, Smith, A relation schema R is in fourth normal form (4NF) with North, Rye) to the bc relation respect to a set 0 of functional and multivalued dependencies if, Comparing the preceding example with our definition of for all multivalued dependencies in D+ of the form , multivalued dependency, we see that we want the multivalued where R and R, at least one of the following holds dependency is a trivial multivalued dependency. customer-name customer-street customer-city is a superkey for schema R. to hold. (The multivalued dependency customer-name A database design is in 4NF if each member of the set of loan-number will do as well. We shall soon see that they are relation schemas that consti-tutes the design is in 4NE equivalent.) Note that the definition of 4NF differs from the definition of As with functional dependencies, we shall use multivalued BCNF in only the use of multivalued dependencies instead of dependencies in two ways: functional dependencies. Every 4NF schema is in 1. To test relations to determine whether they are legal under a BCNF. To see this fact, we note that, if a schema R is not in given set of func-tional and multivalued dependencies BCNF, then there is 2. To specify constraints on the set of legal relations; we shall result := {R}; done := false; thus concern our-selves with only those relations that satisfy compute D+; Given schema Ri, let Di denote the restriction of a given set of functional and mul-tivalued dependencies D+ to Ri while (not done) do Customer-street if (there is a schema Ri in result that is not in 4NF W.r.t.- Di) Loan-number Customer-name Customer-city then begin L-23 Smith Rye North let a nontrivial multivalued dependency that holds on L-23 Smith Main Manchester Ri such that Ri is not in Di, and = ; L-93 Curry Lake Horseneck result := (result - Ri) U (Ri - U ( , ); Relation bc: An example of redundancy in a BCNF relation. end Loan-number Customer-name Customer-street customercity else done := true; L-23 Smith North Rye L-27 Smith Main Manchester 4NF decomposition algorithm. An illegal bc relation. a nontrivial functional dependency holding on R, where a is not a superkey. Since implies , R cannot be Note that, if a relation r fails to satisfy a given multivalued in 4NF. dependency, we can con-struct a relation r that does satisfy the multivalued dependency by adding tuples to r. Let R be a relation schema, and let R1, R2,. . . , Rn be a decom- position of R. To check if each relation schema Ri in the Let D denote a set of functional and multivalued dependencies. decomposition is in 4NF, we need to find what multivalued The closure D+ of D is the set of all functional and dependencies hold on each Ri. Recall that, for a set F of multivalued dependencies logically implied by o. As we did for functional dependencies, the restriction Fi of F to Ri is all functional dependencies, we can compute D+ from 0, using the functional dependencies in F+ that include only attributes of Ri. formal definitions of functional dependencies and multivalued Now consider a set D of both functional and multivalued dependencies. We can man-age with such reasoning for very simple multivalued dependencies. Luckily, multi-valued 115

121 dependencies. The restriction of D to Ri is the set Di consisting 4. Define Multivalued dependencies? DATABASE of 1. All functional dependencies in D+ that include only attributes of Ri 2. All multivalued dependencies of the form MANAGEMENT Ri where Ri and is in D+. Points to Ponder A relation schema R is in BCNF with respect to a set F of 5. Define 4NF? functional dependencies if, for all functional dependencies in F+ of the form third normal form (3NF), which we present below, which makes testing of updates cheaper Mul-tivalued dependencies do not rule out the existence of certain tuples Instead, they require that other tuples of a certain form be present in the rela-tion. Every 4NF schema is also in BCNF, but there are BCNF schemas that are not in 4NF. Review Terms BCNF 3NF Comparison of BCNF & 3NF 4NF Students Activity 1. Define BCNF? 2 Define 3NF? 3. Differentiate between BCNF & 3NF? 116

122 DATABASE MANAGEMENT 117 Student Notes

123 DATABASE LESSON 38: FILE ORGANIZATION METHOD - I MANAGEMENT Lesson objectives Disk storage is called direct access storage as it is Physical Storage Media possible to read data on the disk in any order (unlike sequential access). Performance Measures of Disks Disk storage usually survives power failures and Optimization of Disk-Block Access system crashes. Fixed-Length Records Optical storage: CD-ROM (compact-disk read-only Variable-Length Records memory), WORM (write-once read-many) disk (for Sequential File Organization archival storage of data), and Juke box (containing a Clustering File Organization few drives and numerous disks loaded on demand). Data Dictionary Storage Tape Storage: used primarily for backup and archival data. Overview of Physical Storage Media Cheaper, but much slower access, since tape must 1. Several types of data storage exist in most computer be read sequentially from the beginning. systems. They vary in speed of access, cost per unit of data, and reliability. Used as protection from disk failures! Cache: most costly and fastest form of storage. 2. The storage device hierarchy is presented , where the higher Usually very small, and managed by the operating levels are expensive (cost per bit), fast (access time), but the system. capacity is smaller. Main Memory (MM): the storage area for data available to be operated on. General-purpose machine instructions operate on main memory. Contents of main memory are usually lost in a power failure or crash. Usually too small (even with megabytes) and too expensive to store the entire database. Flash memory: EEPROM (electrically erasable programmable read-only memory). Data in flash memory survive from power failure. Reading data from flash memory takes about 10 nano-secs (roughly as fast as from main memory), and writing data into flash memory is more Storage-device hierarchy complicated: write-once takes about 4-10 microsecs. 3. Another classification: Primary, secondary, and tertiary To overwrite what has been written, one has to first storage. erase the entire bank of the memory. It may 1. Primary storage: the fastest storage media, such as cash support only a limited number of erase cycles (104 and main memory. to 106). 2. Secondary (or on-line) storage: the next level of the It has found its popularity as a replacement for hierarchy, e.g., magnetic disks. disks for storing small volumes of data (5-10 megabytes). 3. Tertiary (or off-line) storage: magnetic tapes and optical disk juke boxes. Magnetic-disk storage: primary medium for long- term storage. 4. Volatility of storage. Volatile storage loses its contents when the power is removed. Without power backup, data in the Typically the entire database is stored on disk. volatile storage (the part of the hierarchy from main Data must be moved from disk to main memory memory up) must be written to nonvolatile storage for in order for the data to be operated on. safekeeping. After operations are performed, data must be Performance Measures of Disks copied back to disk if any changes were made. The main measures of the qualities of a disk are capacity, access time, data transfer rate, and reliability, 118

124 1. Access time: the time from when a read or write request is The restore operation writes back the blocks of each DATABASE issued to when data transfer begins. To access data on a file continuously (or nearly so). Some systems, such as given sector of a disk, the arm first must move so that it is MS-DOS, have utilities that scan the disk and then positioned over the correct track, and then must wait for the move blocks to decrease the fragmentation. sector to appear under it as the disk rotates. The time for Nonvolatile write buffers. Use nonvolatile RAM (such repositioning the arm is called seek time, and it increases MANAGEMENT as battery-back-up RAM) to speed up disk writes with the distance the arm must move. Typical seek time drastically (first write to nonvolatile RAM buffer and range from 2 to 30 milliseconds. inform OS that writes completed). Average seek time is the average of the seek time, measured Log disk. Another approach to reducing write latency over a sequence of (uniformly distributed) random is to use a log disk, a disk devoted to writing a requests, and it is about one third of the worst-case seek sequential log. All access to the log disk is sequential, time. essentially eliminating seek time, and several Once the seek has occurred, the time spent waiting for the consecutive blocks can be written at once, making sector to be accesses to appear under the head is called writes to log disk several times faster than random rotational latency time. Average rotational latency time is writes. about half of the time for a full rotation of the disk. File Organization (Typical rotational speeds of disks ranges from 60 to 120 rotations per second). 1. A file is organized logically as a sequence of records. The access time is then the sum of the seek time and the 2. Records are mapped onto disk blocks. latency and ranges from 10 to 40 milli-sec. 3. Files are provided as a basic construct in operating systems, 2. Data transfer rate, the rate at which data can be retrieved so we assume the existence of an underlying file system. from or stored to the disk. Current disk systems support 4. Blocks are of a fixed size determined by the operating transfer rate from 1 to 5 megabytes per second. system. 3. Reliability, measured by the mean time to failure. The typical 5. Record sizes vary. mean time to failure of disks today ranges from 30,000 to 6. In relational database, tuples of distinct relations may be of 800,000 hours (about 3.4 to 91 years). different sizes. Optimization of Disk-Block Access 7. One approach to mapping database to files is to store 1. Data is transferred between disk and main memory in units records of one length in a given file. called blocks. 8. An alternative is to structure files to accommodate variable- 2. A block is a contiguous sequence of bytes from a single length records. (Fixed-length is easier to implement.) track of one platter. Fixed-Length Records 3. Block sizes range from 512 bytes to several thousand. 1. Consider a file of deposit records of the form: 4. The lower levels of file system manager covert block aaaaaaaaaaaatype deposit = record addresses into the hardware-level cylinder, surface, and bname : char(22); sector number. account# : char(10); 5. Access to data on disk is several orders of magnitude slower than is access to data in main memory. Optimization balance : real; techniques besides buffering of blocks in main memory. end Scheduling: If several blocks from a cylinder need to If we assume that each character occupies one byte, an be transferred, we may save time by requesting them in integer occupies 4 bytes, and a real 8 bytes, our deposit the order in which they pass under the heads. A record is 40 bytes long. commonly used disk-arm scheduling algorithm is the The simplest approach is to use the first 40 bytes for elevator algorithm. the first record, the next 40 bytes for the second, and File organization. Organize blocks on disk in a way so on. that corresponds closely to the manner that we expect However, there are two problems with this approach. data to be accessed. For example, store related It is difficult to delete a record from this structure. information on the same track, or physically close Space occupied must somehow be deleted, or we need tracks, or adjacent cylinders in order to minimize seek to mark deleted records so that they can be ignored. time. IBM mainframe OSs provide programmers fine control on placement of files but increase Unless block size is a multiple of 40, some records will programmers burden. cross block boundaries. UNIX or PC OSs hide disk organizations from users. It would then require two block accesses to read or Over time, a sequential file may become fragmented. write such a record. To reduce fragmentation, the system can make a back- 2. When a record is deleted, we could move all successive up copy of the data on disk and restore the entire disk. records up one, which may require moving a lot of records. 119

125 We could instead move the last record into the hole DATABASE Sequential File Organization created by the deleted record 1. A sequential file is designed for efficient processing of This changes the order the records are in. records in sorted order on some search key. It turns out to be undesirable to move records to Records are chained together by pointers to permit fast occupy freed space, as moving requires block accesses. retrieval in search key order. MANAGEMENT Also, insertions tend to be more frequent than Pointer points to next record in order. deletions. Records are stored physically in search key order (or as It is acceptable to leave the space open and wait for a close to this as possible). subsequent insertion. This minimizes number of block accesses. This leads to a need for additional structure in our file 2. It is difficult to maintain physical sequential order as records design. are inserted and deleted. 3. So one solution is: Deletion can be managed with the pointer chains. At the beginning of a file, allocate some bytes as a file Insertion poses problems if no space where new header. record should go. This header for now need only be used to store the If space, use it, else put new record in an overflow address of the first record whose contents are deleted. block. This first record can then store the address of the Adjust pointers accordingly. second available record, To insert a new record, we use Problem: we now have some records out of physical the record pointed to by the header, and change the sequential order. header pointer to the next available record. If very few records in overflow blocks, this will work If no deleted records exist we add our new record to well. the end of the file. If order is lost, reorganize the file. 4. Note: Use of pointers requires careful programming. If a record pointed to is moved or deleted, and that pointer is Reorganizations are expensive and done when system not corrected, the pointer becomes a dangling pointer. load is low. Records pointed to are called pinned. 3. If insertions rarely occur, we could keep the file in physically 5. Fixed-length file insertions and deletions are relatively sorted order and reorganize when insertion occurs. In this simple because one size fits all. For variable length, this is case, the pointer fields are no longer required. not the case. Clustering File Organization Variable-Length Records 1. One relation per file, with fixed-length record, is good for Variable-length records arise in a database in several ways: small databases, which also reduces the code size. Storage of multiple items in a file. 2. Many large-scale DB systems do not rely directly on the Record types allowing variable field size underlying operating system for file management. One large OS file is allocated to DB system and all relations are stored Record types allowing repeating fields in one file. Organization of Records in Files 3. To efficiently execute queries involving There are several ways of organizing records in files. , one may store the depositor tuple heap file organization. Any record can be placed anywhere for each cname near the customer tuple for the in the file where there is space for the record. There is no corresponding cname ordering of records. 4. This structure mixes together tuples from two relations, sequential file organization. Records are stored in but allows for efficient processing of the join. sequential order, based on the value of the search key of 5. If the customer has many accounts which cannot fit in one each record. block, the remaining records appear on nearby blocks. This hashing file organization. A hash function is computed file structure, called clustering, allows us to read many of the on some attribute of each record. The result of the function required records using one block read. specifies in which block of the file the record should be 6. Our use of clustering enhances the processing of a placed to be discussed in chapter 11 since it is closely particular join but may result in slow processing of other related to the indexing structure. types of queries, such as selection on customer. clustering file organization. Records of several different For example, the query relations can be stored in the same file. Related records of aaaaaaaaaaaaselect * the different relations are stored on the same block so that one I/O operation fetches related records from all the from customer relations. now requires more block accesses as our customer relation is now interspersed with the deposit relation. 120

126 7. Thus it is a trade-off, depending on the types of query that DATABASE Students Activity the database designer believes to be most frequent. Careful 1. Define various types of storage media? use of clustering may produce significant performance gain. Data Dictionary Storage 1. The database also needs to store information about the MANAGEMENT relations, known as the data dictionary. This includes: Names of relations. Names of attributes of relations. Domains and lengths of attributes. Names and definitions of views. 2. Define qualities of a disk? Integrity constraints (e.g., key constraints). plus data on the system users: Names of authorized users. Accounting information about users. plus (possibly) statistical and descriptive data: Number of tuples in each relation. Method of storage used for each relation (e.g., clustered or non-clustered). 2. When we look at indices well also see a need to store 3. Define sequential file storage? information about each index on each relation: Name of the index. Name of the relation being indexed. Attributes the index is on. Type of index. 3. This information is, in itself, a miniature database. We can use the database to store data about itself, simplifying the overall structure of the system, and allowing the full power 4. Define clustering file storage? of the database to be used to permit fast access to system data. Points to Ponder Several types of data storage exist in most computer systems as main memory,cache, Flash memory, Magnetic- disk storage,optical storage, tape storage. The main measures of the qualities of a disk are capacity, access time, data transfer rate, and reliability A sequential file is designed for efficient processing of 5. Define data dictionary storage? records in sorted order on some search key Clustering structure mixes together tuples from two relations, but allows for efficient processing of the join. The database also needs to store information about the relations, known as the data dictionary. Review Terms Storage device Sequential file storage Clustering file storage Data dictionary storage Search key 121

127 Student Notes 122 DATABASE MANAGEMENT

128 DATABASE LESSON 39: FILE ORGANISATION METHOD - II MANAGEMENT Lesson objectives 5. Space Overhead additional space occupied by an Indexing & Hashing index structure. Ordered Indices 6. We may have more than one index or hash function for a file. (The library may have card catalogues by author, subject Dense and Sparse Indices or title.) Multi-Level Indices 7. The attribute or set of attributes used to look up records in Index Update a file is called the search key (not to be confused with Secondary Indices primary key, etc.). Static Hashing Ordered Indices Hash functions 1. In order to allow fast random access, an index structure may Bucket overflow be used. Dynamic hashing 2. A file may have several indices on different search keys. Indexing and Hashing 3. If the file containing the records is sequentially ordered, the index whose search key specifies the sequential order of the 1. Many queries reference only a small proportion of records in file is the primary index, or clustering index. Note: The a file. For example, finding all records at Perryridge branch search key of a primary index is usually the primary key, but only returns records where bname = Perryridge. it is not necessarily so. 2. We should be able to locate these records directly, rather 4. Indices whose search key specifies an order different from than having to read every record and check its branch-name. the sequential order of the file are called the secondary We then need extra file structuring. indices, or nonclustering indices. Basic Concepts 1. An index for a file works like a catalogue in a library. Cards in alphabetic order tell us where to find books by a particular author. 2. In real-world databases, indices like this might be too large to be efficient. Well look at more sophisticated indexing techniques. 3. There are two kinds of indices. Ordered indices: indices are based on a sorted ordering of the values. Hash indices: indices are based on the values being Dense and Sparse Indices distributed uniformly across a range of buckets. The buckets to which a value is assigned is determined by a 1. There are two types of ordered indices: function, called a hash function. Dense Index: 4. We will consider several indexing techniques. No one An index record appears for every search key value in technique is the best. Each technique is best suited for a file. particular database application. This record contains search key value and a pointer to 5. Methods will be evaluated on: the actual record. 1. Access Types types of access that are supported Sparse Index: efficiently, e.g., value-based search or range search. Index records are created only for some of the records. 2. Access Time time to find a particular data item or To locate a record, we find the index record with the set of items. largest search key value less than or equal to the search 3. Insertion Time time taken to insert a new data key value we are looking for. item (includes time to find the right place to insert). We start at that record pointed to by the index record, 4. Deletion Time time to delete an item (includes and proceed along the pointers in the file (that is, time taken to find item, as well as to update the index sequentially) until we find the desired record. structure). 123

129 DATABASE MANAGEMENT Dense index. 2. Notice how we would find records for Perryridge branch using both methods. (Do it!) 4. Two-level sparse index. 5. Use binary search on outer index. Scan index block found until correct index record found. Use index record as before - scan block pointed to for desired record. 6. For very large files, additional levels of indexing may be required. 7. Indices must be updated at all levels when insertions or deletions require it. Sparse index. 8. Frequently, each level of index corresponds to a unit of 3. Dense indices are faster in general, but sparse indices require physical storage (e.g. indices at the level of track, cylinder and less space and impose less maintenance for insertions and disk). deletions. (Why?) Index Update 4. A good compromise: to have a sparse index with one entry Regardless of what form of index is used, every index must be per block. updated whenever a record is either inserted into or deleted Why is this good? from the file. Biggest cost is in bringing a block into main memory. 1. Deletion We are guaranteed to have the correct block with this Find (look up) the record method, unless record is on an overflow block (actually If the last record with a particular search key value, could be several blocks). delete that search key value from index. Index size still small. For dense indices, this is like deleting a record in a file. Multi-Level Indices For sparse indices, delete a key value by replacing key 1. Even with a sparse index, index size may still grow too values entry in index by next search key value. If that large. For 100,000 records, 10 per block, at one index record value already has an index entry, delete the entry. per block, thats 10,000 index records! Even if we can fit 100 2. Insertion index records per block, this is 100 blocks. Find place to insert. 2. If index is too large to be kept in main memory, a search Dense index: insert search key value if not present. results in several disk reads. Sparse index: no change unless new block is created. (In If there are no overflow blocks in the index, we can use this case, the first search key value appearing in the new binary search. block is inserted into the index). This will read as many as 1+log2(b) blocks (as many as Secondary Indices 7 for our 100 blocks). 1. If the search key of a secondary index is not a candidate key, If index has overflow blocks, then sequential search it is not enough to point to just the first record with each typically used, reading all b index blocks. search-key value because the remaining records with the 3. Solution: Construct a sparse index on the index same search-key value could be anywhere in the file. Therefore, a secondary index must contain pointers to all the records. 124

130 DATABASE Hash Functions 1. A good hash function gives an average-case lookup that is a small constant, independent of the number of search keys. 2. We hope records are distributed uniformly among the buckets. MANAGEMENT 3. The worst hash function maps all keys to the same bucket. 4. The best hash function maps all keys to distinct addresses. 5. Ideally, distribution of keys to addresses is uniform and random. 6. Suppose we have 26 buckets, and map names beginning Sparse secondary index on cname. with ith letter of the alphabet to the ith bucket. 2. We can use an extra-level of indirection to implement Problem: this does not give uniform distribution. secondary indices on search keys that are not candidate keys. Many more names will be mapped to A than to X. A pointer does not point directly to the file but to a bucket Typical hash functions perform some operation on the that contains pointers to the file. internal binary machine representations of characters in To perform a lookup on Peterson, we must read all a key. three records pointed to by entries in bucket 2. For example, compute the sum, modulo # of buckets, Only one entry points to a Peterson record, but three of the binary representations of characters of the records need to be read. search key. As file is not ordered physically by cname, this may take using this method for 10 buckets (assuming the ith 3 block accesses. character in the alphabet is represented by integer i). 3. Secondary indices must be dense, with an index entry for Handling of Bucket Overflows every search-key value, and a pointer to every record in the 1. Open hashing occurs where records are stored in different file. buckets. Compute the hash function and search the 4. Secondary indices improve the performance of queries on corresponding bucket to find a record. non-primary keys. 2. Closed hashing occurs where all records are stored in one 5. They also impose serious overhead on database bucket. Hash function computes addresses within that hh((KKi ) ) = h ( K ) i j modification: whenever a file is updated, every index must bucket. (Deletions are difficult.) Not used much in database be updated. applications. Designer must decide whether to use secondary indices or not. 3. Drawback to our approach: Hash function must be Static Hashing chosen at implementation time. Index schemes force us to traverse an index structure. Hashing Number of buckets is fixed, but the database may avoids this. grow. Hash File Organization If number is too large, we waste space. 1. Hashing involves computing the address of a data item by If number is too small, we get too many collisions, computing a function on the search key value. resulting in records of many search key values being in 2. A hash function h is a function from the set of all search the same bucket. key values K to the set of all bucket addresses B. Choosing the number to be twice the number of We choose a number of buckets to correspond to the search key values in the file gives a good space/ number of search key values we will have stored in the performance tradeoff. database. Hash Indices To perform a lookup on a search key value K i , we 1. A hash index organizes the search keys with their associated compute , and search the bucket with that address. pointers into a hash file structure. If two search keys i and j map to the same address, 2. We apply a hash function on a search key to identify a because , then the bucket at the address bucket, and store the key and its associated pointers in the obtained will contain records with both search key bucket (or in overflow buckets). values. 3. Strictly speaking, hash indices are only secondary index In this case we will have to check the search key value of structures, since if a file itself is organized using hashing, every record in the bucket to get the ones we want. there is no need for a separate hash index structure on it. Insertion and deletion are simple. 125

131 All such entries will have a common hash prefix, but DATABASE Dynamic Hashing 1. As the database grows over time, we have three options: the length of this prefix may be less than i. Choose hash function based on current file size. Get So we give each bucket an integer giving the length of performance degradation as file grows. the common hash prefix. Choose hash function based on anticipated file size. Number of bucket entries pointing to bucket j is then MANAGEMENT Space is wasted initially. 2 (iij). Periodically re-organize hash structure as file grows. 4. To find the bucket containing search key value Ki: Requires selecting new hash function, recomputing all Compute h(Kt ). addresses and generating new bucket assignments. Take the first i high order bits of h(Kt). Costly, and shuts down database. Look at the corresponding table entry for this i-bit 2. Some hashing techniques allow the hash function to be string. modified dynamically to accommodate the growth or Follow the bucket pointer in the table entry. shrinking of the database. These are called dynamic hash functions. 5. We now look at insertions in an extendable hashing scheme. Extendable hashing is one form of dynamic hashing. Follow the same procedure for lookup, ending up in some bucket j. Extendable hashing splits and coalesces buckets as database size changes. If there is room in the bucket, insert information and insert record in the file. This imposes some performance overhead, but space efficiency is maintained. If the bucket is full, we must split the bucket, and redistribute the records. As reorganization is on one bucket at a time, overhead is acceptably low. If bucket is split we may need to increase the number of bits we use in the hash. 3. How does it work? 6. Two cases exist: 1. If i = ij, then only one entry in the bucket address table points to bucket j. Then we need to increase the size of the bucket address table so that we can include pointers to the two buckets that result from splitting bucket j. We increment i by one, thus considering more of the hash, and doubling the size of the bucket address table. Each entry is replaced by two entries, each containing original value. Now two entries in bucket address table point to bucket j. General extendable hash structure. We allocate a new bucket z, and set the second We choose a hash function that is uniform and pointer to point to z. random that generates values over a relatively large Set ij and iz to i. range. Rehash all records in bucket j which are put in either Range is b-bit binary integers (typically b=32). j or z. is over 4 billion, so we dont generate that many Now insert new record. buckets! It is remotely possible, but unlikely, that the new Instead we create buckets on demand, and do not use hash will still put all of the records in one bucket. all b bits of the hash initially. If so, split again and increment i again. At any point we use i bits where . 2. If i>ij , then more than one entry in the bucket address The i bits are used as an offset into a table of bucket table points to bucket j. addresses. Then we can split bucket j without increasing the Value of i grows and shrinks with the database. size of the bucket address table (why?). Note that the i appearing over the bucket address table Note that all entries that point to bucket j tells how many bits are required to determine the correspond to hash prefixes that have the same correct bucket. value on the leftmost ij bits. It may be the case that several entries point to the same bucket. 126

132 We allocate a new bucket z, and set ij and iz to the DATABASE Points to Ponder original ij value plus 1. An index for a file works like a catalogue in a library Now adjust entries in the bucket address table that In order to allow fast random access, an index structure may previously pointed to bucket j. be used. Leave the first half pointing to bucket j, and make Indices whose search key specifies an order different from MANAGEMENT the rest point to bucket z. the sequential order of the file are called the secondary Rehash each record in bucket j as before. indices Reattempt new insert. Dense Index appears for every search key value in file. This 7. Note that in both cases we only need to rehash records in record contains search key value and a pointer to the actual bucket j. record. 8. Deletion of records is similar. Buckets may have to be Sparse Index: are created only for some of the records. coalesced, and bucket address table may have to be halved. Regardless of what form of index is used, every index must 9. Insertion is illustrated for the example deposit file be updated whenever a record is either inserted into or deleted from the file. 32-bit hash values on bname are shown An initial empty hash structure is shown Hashing involves computing the address of a data item by computing a function on the search key value. We insert records one by one. We (unrealistically) assume that a bucket can only hold Review Terms 2 records, in order to illustrate both situations Index described. Ordered index As we insert the Perryridge and Round Hill records, Dense Index this first bucket becomes full. Sparse index When we insert the next record (Downtown), we must Hashing split the bucket. Hash function Since i = i0, we need to increase the number of bits we use from the hash. Static hashing We now use 1 bit, allowing us 21 = 2 buckets. Dynamic hashing This makes us double the size of the bucket address Students Activity table to two entries. 1. Define index and hashing? We split the bucket, placing the records whose search key hash begins with 1 in the new bucket, and those with a 0 in the old bucket Next we attempt to insert the Redwood record, and find it hashes to 1. That bucket is full, and i = i1. So we must split that bucket, increasing the number of bits we must use to 2. 2. Define ordered indices? This necessitates doubling the bucket address table again to four entries (F We rehash the entries in the old bucket. We continue on for the deposit records , obtaining the extendable hash structure 8. Advantages: Extendable hashing provides performance that does 3. Differentiate between sparse index & dense index? not degrade as the file grows. Minimal space overhead - no buckets need be reserved for future use. Bucket address table only contains one pointer for each hash value of current prefix length. 9. Disadvantages: Extra level of indirection in the bucket address table Added complexity 127

133 4. Define secondary index & multilevel index? DATABASE MANAGEMENT 5. Define Hashing?What is hash function? 6. Differentiate between Static hashing & dynamic hashing? 128

134 DATABASE MANAGEMENT 129 Student Notes

135 DATABASE LESSON 40: TRANSACTIONS MANAGEMENT MANAGEMENT Lesson Objectives To gain a better understanding of ACID properties and the Transactions need for them, con-sider a simplified banking system consisting of several accounts and a set of trans-actions that access and ACID properties update those accounts. For the time being, we assume that the Transaction state database permanently resides on disk, but that some portion of Implementation of ACID properties it is temporarily residing in main memory. Collections of operations that form a single logical unit of Transactions access data using two operations: work are called transac-tions. A database system must ensure read(X), which transfers the data item X from the database proper execution of transactions despite fail-ures-either the to a local buffer belonging to the transaction that executed entire transaction executes, or none of it does. For example, a the read operation. transfer of funds from a checking account to a savings account write(X), which transfers the data item X from the the local is a single operation from the customers standpoint; within the buffer of the trans action that executed the write back to the database system, however, it consists of several operations. database. Clearly, it is essential that all these operations occur, or that, in case of a failure, none occur. Furthermore, it must manage In a real database system, the write operation does not necessar- concurrent execution of transactions in a way that avoids the ily result in the imme-diate update of the data on the disk; the introduction of inconsistency. In our funds-transfer example, a write operation may be temporarily stored in memory and transaction computing the customers total money might see executed on the disk later. For now, however, we shall assume the checking-account balance before it is debited by the funds - that the write operation updates the database immediately. transfer transaction, but see the savings balance after it is Let Ti be a transaction that transfers $50 from account A to credited. As a result, it would obtain an incorrect result. account B. This trans-action can be defined as Transaction Ti: read(A); A transaction is a unit of program execution that accesses and A := A-50; possibly updates var-ious data items. Usually, a transaction is write(A); initiated by a user program written in a high-level data-manipu- read(B); lation language or programming language (for example, SQL, COBOL, C, C++, or Java), where it is delimited by statements B := B + 50; (or function calls) of the form begin transaction and end write(B). transaction. The transaction consists of all opera-tions executed Let us now consider each of the ACID requirements. (For ease between the begin transaction and end transaction. of presentation, we consider them in an order different from To ensure integrity of the data, we require that the database the order A-C-I-D). system maintain the following properties of the transactions. Consistency: The consistency requirement here is that the These properties are often called the ACID properties, the sum of A and B be unchanged by the execution of the acronym is derived from the first letter of each of the four transaction. Without the consistency requirement, money properties. could be created or destroyed by the transaction It can be Atomicity. Either all operations of the transaction are verified easily that, if the database is consistent before an reflected properly in the database, or none are. execution of the transaction, the database remains Consistency. Execution of a transaction in isolation (that consistent after the execution of the transac-tion. is, with no other transaction executing concurrently) Ensuring consistency for an individual transaction is the preserves the consistency of the database. responsibility of the application programmer who codes the Isolation. Even though multiple transactions may execute transaction. This task may be facil-itated by automatic testing of concurrently, the system guarantees that, for every pair of integrity constraints transactions Ti and Tj, it appears to Ti that either Tj finished Atomicity: Suppose that, just before the execution of execution before Ti started, or Tj started execu-tion after Ti transaction Ti the values of accounts A and Bare $1000 and finished. Thus, each transaction is unaware of other $2000, respectively. Now suppose that, dur-ing the transactions executing concurrently in the system. execution of transaction Ti, a failure occurs that prevents Ti Durability. After a transaction completes successfully, the from com-pleting its execution successfully. Examples of changes it has made to the database persist, even if there are such failures include power failures, hardware failures, and system failures. software errors. Further, suppose that the fail-ure happened after the write (A) operation but before the write(B) 130

136 operation. In this case, the values of accounts A and B is executing, with the de-ducted total written to A and the DATABASE reflected in the database are $950 and $2000. The system increased total yet to be written to B. If a second concurrently destroyed $50 as a result of this failure. In particular, we running transaction reads A and B at this intermediate point note that the sum A + B is no longer preserved. and computes A + B, it will observe an inconsistent value. Thus, because of the failure, the state of the system no longer Furthermore, if this second transaction then performs updates on A and B based on the in-consistent values that it read, the MANAGEMENT reflects a real state of the world that the database is supposed to capture. We term such a state an inconsistent state. We must database may be left in an inconsistent state even after both ensure that such inconsistencies are not visible in a database transactions have completed. system. Note, however, that the system must at some point be A way to avoid the problem of concurrently executing transac- in an inconsistent state. Even if transaction Ti is executed to tions is to execute transactions serially-that is, one after the comple-tion, there exists a point at which the value of account other. However, concur-rent execution of transactions provides A is $950 and the value of account B is $2000, which is clearly significant performance benefits Other solutions have therefore an inconsistent state. This state, how-ever, is eventually replaced been developed; they allow multiple transactions to execute by the consistent state where the value of account A is $950, concurrently. and the value of account B is $2050. Thus, if the transaction The isolation property of a transaction ensures that the concur- never started or was guaranteed to complete, such an inconsis- rent execution of transactions results in a system state that is tent state would not be visible except during the execution of equivalent to a state that could have been obtained had these the transaction. That is the reason for the atomicity requirement: transactions executed one at a time in some order. Ensuring the If the atomicity property is present, all actions of the transac- isolation property is the responsibility of a component of the tion are reflected in the database, or none are. database system called the concurrency-control component. The basic idea behind ensuring atomicity is this: The database Transaction State system keeps track (on disk) of the old values of any data on In the absence of failures, all transactions complete successfully. which a transaction performs a write, and, if the transaction However, as we noted earlier, a transaction may not always does not complete its execution, the database sys-tem restores complete its execution successfully. Such a transaction is termed the old values to make it appear as though the transaction never aborted. If we are to ensure the atomicity property, an aborted executed. transaction must have no effect on the state of the database. Durability: Once the execution of the transaction Thus, any changes that the aborted transaction made to the completes successfully, and the user who initiated the database must be undone. Once the changes caused by an transaction has been notified that the transfer of funds has aborted transaction have been undone, we say that the transac- taken place, it must be the case that no system failure will tion has been rolled back. It is part of the responsibility of the result in a loss of data corresponding to this transfer of recovery scheme to manage transaction aborts. funds. A transaction that completes its execution successfully is said to The durability property guarantees that, once a transaction be committed. A committed transaction that has performed completes suc-cessfully, all the updates that it carried out on the updates transforms the database into a new consistent state, database persist, even if there is a system failure after the which must persist even if there is a system failure. transaction completes execution. Once a transaction has committed, we cannot undo its effects by We assume for now that a failure of the computer system may aborting it. The only way to undo the effects of a committed result in loss of data in main memory, but data written to disk transaction is to execute a compensating transaction. For are never lost. We can guarantee durability by ensuring that instance, if a transaction added $20 to an account, the compen- either sating transaction would subtract $20 from the account. 1. The updates carried out by the transaction have been written However, it is not always possible to create such a compensating to disk be-fore the transaction completes. transaction. Therefore, the responsibility of writing and 2. Information about the updates carried out by the executing a compensating transaction is left to the user, and is transaction and written to disk is sufficient to enable the not handled by the database system. database to reconstruct the updates when the database We need to be more precise about what we mean by successful system is restarted after the failure. completion of a trans-action. We therefore establish a simple Ensuring durability is the responsibility of a component of the abstract transaction model. A transaction must be in one of the database sys-tem called the recovery-management component. following states: The transaction-manage-ment component and the recovery- Active, the initial state; the transaction stays in this state management component are closely re-lated. while it is executing . Partially committed, after the final Isolation: Even if the consistency and atomicity properties statement has been executed are ensured for each transaction, if several transactions are Failed, after the discovery that normal execution can no executed concurrently, their oper-ations may interleave in longer proceed some undesirable way, resulting in an inconsistent state. Aborted, after the transaction has been rolled back and the For example, as we saw earlier, the database is temporarily database has been restored to its state prior to the start of inconsistent while the transaction to transfer funds from A to B the transaction 131

137 Committed, after successful completion writes temporarily in nonvolatile storage, and to perform the ac- DATABASE The state diagram corresponding to a transaction appears in tual writes only after the transaction enters the committed state. Figure 23.1 . We say that a transaction has committed only if it If the system should fail after the transaction has entered the has entered the committed state. Simi-larly, we say that a committed state, but before it could complete the external transaction has aborted only if it has entered the aborted state. writes, the database system will carry out the external writes (using\~he data in nonvolatile storage) when the system is MANAGEMENT A transaction is said to have terminated if has either committed or aborted. restarted. A transaction starts in the active state. When it finishes its final Handling external writes can be more complicated in some statement, it enters the partially committed state. At this point, situations. For example suppose the external action is that of the transaction has completed its exe-cution, but it is still dispensing cash at an automated teller machine, and the system possible that it may have to be aborted, since the actual output fails just before the cash is actually dispensed (we assume that may still be temporarily residing in main memory, and thus a cash can be dispensed atomically). It makes no sense to hardware failure may preclude its successful completion. dispense cash when the system is restarted, since the user may have left the machine. In such a case a compensat-ing transac- The database system then writes out enough information to tion, such as depositing the cash back in the users account, disk that, even in the event of a failure, the updates performed needs to be executed when the system is restarted. by the transaction can be re-created when the system restarts after the failure. When the last of this information is written For certain applications, it may be desirable to allow active out, the transaction enters the committed state. transactions to dis-play data to users, particularly for long- duration transactions that run for minutes or hours. As mentioned earlier, we assume for now that failures do not Unfortunately, we cannot allow such output of observable data result in loss of data on disk. unless we are willing to compromise transaction atomicity. Most A transaction enters the failed state after the system determines current transaction systems ensure atomicity and, therefore, that the transac-tion can no longer proceed with its normal forbid this form of interaction with users. execution (for example, because of hard-ware or logical errors). Such a transaction must be rolled back. Then, it enters the Implementation of Atomicity and aborted state. At this point, the system has two options: Durability The recovery-management component of a database system can Committed support atomicity and durability by a variety of schemes. We Partially committed first consider a simple, but extremely in-efficient, scheme called the shadow copy scheme. This scheme, which is based on making copies of the database, called shadow copies, assumes that only one transac-tion is active at a time. The scheme also Active assumes that the database is simply a file on disk. A pointer called db-pointer is maintained on disk; it points to the current copy of the database. In the shadow-copy scheme, a transaction that wants to update the database first creates a complete copy of the database. All Failed Aborted updates are done on the new database copy, leaving the original copy, the shadow copy, untouched. If at any point the trans- action has to be aborted, the system merely deletes the new Figure 23.1 state diagram of a transaction copy. The old copy of the database has not been affected. It can restart the transaction, but only if the transaction was If the transaction completes, it is committed as follows. First, aborted as a result of some hardware or software error that the operating system is asked to make sure that all pages of the was not created through the internal logic of the new copy of the database have been written out to disk. (Unix transaction. A restarted transaction is considered to be a new systems use the flush command for this purpose.) After the transaction. operat-ing system has written all the pages to disk, the database It can kill the transaction. It usually does so because of system updates the pointer db-pointer to point to the new copy some internal logical error that can be corrected only by of the database; the new copy then becomes the current copy of rewriting the application program, or be-cause the input was the database. The old copy of the database is then deleted. Fig- bad, or because the desired data were not found in the ure 23.2 depicts the scheme, showing the database state before database. and after the update. We must be cautious when dealing with observable external db-pointer db-pointer writes, such as writes to a terminal or printer. Once such a write has occurred, it cannot be erased, since it may have been seen external to the database system. Most systems allow such writes Old copy of Old copy of database (to be to take place only after the transaction has entered the commit- database deleted ) New copy of database ted state. One way to implement such a scheme is for the database system to store any value associated with such external (a) Before update (a) After update 132

138 Figure 23.2 Shadow-copy technique for atomicity and durabil- Unfortunately, this implementation is extremely inefficient in DATABASE ity. the context of large databases, since executing a single transac- The transaction is said to have been committed at the point tion requires copying the entire database. Furthermore, the where the updated db-pointer is written to disk. implementation does not allow transactions to execute concurrently with one another. There are practical ways of We now consider how the technique handles transaction and implementing atomicity and durability that are much less MANAGEMENT system failures. First, consider transaction failure. If the expensive and more powerful. transaction fails at any time before db-pointer is updated, the old contents of the database are not affected. We can abort the Points to Ponder trans-action by just deleting the new copy of the database. Once Collections of operations that form a single logical unit of the transaction has been committed, all the updates that it work are called transac-tions performed are in the database pointed to by db-pointer. Thus, Usually, a transaction is initiated by a user program written either all updates of the transaction are reflected, or none of the in a high-level data-manipulation language or programming effects are reflected, regardless of transaction failure. language Now consider the issue of system failure. Suppose that the Transactions access data using two operations: system fails at any time before the updated db-pointer is written to disk. Then, when the system restarts, it will read db-pointer Read(X) arid will thus see the original contents of the database, and Write(X) none of the effects of the transaction will be visible on the A transaction may not always complete its execution database. Next, suppose that the system fails after db-pointer successfully. Such a transaction is termed aborted. has been updated on disk. Before the pointer is updated, all A transaction must be in one of the following states: updated pages of the new copy of the database were written to Active disk. Again, we assume that, once a file is written to disk, its contents will not be damaged even if there is a system failure. Failed Therefore, when the system restarts, it will read db-pointer and Aborted will thus see the contents of the database after all the updates Commited performed by the transaction. The recovery-management component of a database system The implementation actually depends on the write to db- can support atomicity and durability by a variety of schemes pointer being atomic; that is, either all its bytes are written or none of its bytes are written. If some of the bytes of the Review Terms pointer were updated by the write, but others were not, the Transactions pointer is meaningless, and neither old nor new versions of the ACID properties database may be found when the system restarts. Luckily, disk Transaction state systems provide atomic updates to entire blocks, or at least to a disk sector. In other words, the disk system guarantees that it Implementation of ACID properties will update db-pointer atomically, as long as we make sure that Students Activity db-pointer lies entirely in a single sector, which we can ensure by 1. Define transaction with the help of an example? storing db-pointer at the beginning of a block. Thus, the atomicity and durability properties of transactions are ensured by the shadow-copy implementation of the recovery- management component. As a simple example of a transaction outside the database domain, consider a text editing session. An entire editing session can be modeled as a transaction. The actions executed by the transaction are reading and updating the file. Saving the file at the end of editing corresponds to a commit of the editing 2. Define ACID properties of a database? transaction; quitting the editing session without saving the file corresponds to an abort of the editing transaction. Many text editors use essentially the implementation just described, to ensure that an editing session is transactional. A new file is used to store the updated file. At the end of the editing session, if the updated file is to be saved, the text editor uses a file rename command to rename the new file to have the actual file name. The rename, assumed to be implemented as an 3. Define various states of a transaction ? atomic operation by the underlying file system, deletes the old file as well. 133

139 DATABASE MANAGEMENT 4. Define the implementation of Atomicity property of a database? 5. Define implementation of Durability? 6. Define the importance of consistency in a database transaction? 7. Define Isolation of a database? 134

140 DATABASE MANAGEMENT 135 Student Notes

141 DATABASE LESSON 41: CONCURRENCY CONTROL - I MANAGEMENT Lesson objectives to help identify those executions that are guaranteed to ensure concurrent executions consistency. Advantages of concurrency The database system must control the interaction among the concurrent trans-actions to prevent them from destroying the Problems with concurrency consistency of the database. It does so through a variety of Serializibility mechanisms called concurrency-control schemes Concurrent Executions Consider again the simplified banking system of Section 23.1, Transaction-processing systems usually allow multiple transac- which has several accounts, and a set of transactions that access tions to run concur-rently. Allowing multiple transactions to and update those accounts. Let Tl and T2 be two transactions update data concurrently causes several complications with that transfer funds from one account to another. Transaction Tl consistency of the data, as we saw earlier. Ensuring consistency transfers $50 from account A to account R It is defined as in spite of concurrent execution of transactions requires extra T1: read(A); work; it is far easier to insist that transactions run serially-that is, A := A-50; one at a time, each starting only after the previous one has completed. However, there are two good reasons for allowing write(A); concurrency: read(B); Improved throughput and resource utilization. A B := B + 50; transaction consists of many steps. Some involve I/O write(B). activity; others involve CPU activity. The CPU and the disks in a computer system can operate in parallel. Therefore, I/O Transaction T2 transfers 10 percent of the balance from account activity can be done in parallel with processing at the CPU. A to account B. It is defined as The parallelism of the CPU and the I/O system can T2: read(A); therefore be exploited to run multiple transactions in temp:= A * 0.1; parallel. While a read or write on behalf of one transaction A :=A - temp; is in progress on one disk, another transaction can be running in the CPU, while another disk may be executing a write(A); read or write on behalf of a third transaction. All of this read(B); increases the throughput of the system-that is, the number B := B + temp; of transactions executed in a given amount of time. write(B). Correspondingly, the processor and disk utilization also increase; in other words, the processor and disk spend less Suppose the current values of accounts A and Bare $1000 and time idle, or not performing any useful work. $2000, respectively. Suppose also that the two transactions are executed one at a time in the order Tl followed by T2. This Reduced waiting time. There may be a mix of execution sequence appears in Figure 23.3. In the figure, the transactions running on a sys-tem, some short and some sequence of instruction steps is in chronological order from top long. If transactions run serially, a short transaction may to bottom, with in-structions of Tl appearing in the left have to wait for a preceding long transaction to complete, column and instructions of T2 appearing in the right column. which can lead to unpredictable delays in running a The final values of accounts A and B, after the execution in transaction. If the transactions are oper-ating on different Figure 23.3 takes place, are $855 and $2145, respectively. Thus, parts of the database, it is better tolet them run the total amount of money in concurrently, sharing the CPU cycles and disk accesses among them. Concurrent execution reduces the T1 T2 unpredictable delays in running transactions. Moreover, it read(A) read(A) also reduces the average response time: the average time for A := A-50 temp:= A * 0.1 a transaction to be completed after it has been submitted. write (A) A := A ~ temp The motivation for using concurrent execution in a database is read(B) write(A) essentially the same as the motivation for using multiprogram- ming in an operating system. B := B + 50 read(B) When several transactions run concurrently, database consistency write (B) B := B + temp can be destroyed despite the correctness of each individual write(B) transaction. In this section, we present the concept of schedules Figure 41.1 Schedule I-a serial schedule in which Tl is followed by T2. 136

142 Accounts A and B-that is, the sum A + B - is preserved after T1 T2 DATABASE the execution of both transactions. read(A) Similarly, if the transactions are executed one at a time in the A := A-50 order T2 followed by TI, then the corresponding execution write(A) sequence is that of Figure 41.2. Again, as expected, the sum A + read(A) MANAGEMENT B is preserved, and the final values of accounts A and Bare $850 and $2150, respectively. temp := A * 0.1 The execution sequences just described are called schedules. A :=A - temp They represent the chronological order in which instructions are write(A) executed in the system. Clearly, a sched-ule for a set of transac- read(B) tions must consist of all instructions of those transactions, and must preserve the order in which the instructions appear in each B := B + 50 individual transac-tion. For example, in transaction TI, the write(B) instruction write(A) must appear before the instruction read(B), read(B) in any valid schedule. In the following discussion, we shall refer B := B + temp to the first execution sequence (TI followed by T2) as schedule 1, and to the second execution sequence (T2 followed by T1 )as write(B) schedule 2. Figure 41.3 Schedule 3-a concurrent schedule equivalent to These schedules are serial: Each serial schedule consists of a schedule 1. sequence of instruc-tions from various transactions, where the another transaction. Thus, the number of possible schedules instructions belonging to one single trans-action appear for a set of n transactions is much larger than n!. together in that schedule. Thus, for a set of n transactions, there Returning to our previous example, suppose that the two exist n! different valid serial schedules. transactions are exe-cuted concurrently. One possible schedule When the database system executes several transactions appears in Figure 41.3. After this execution takes place, we arrive concurrently, the corre-sponding schedule no longer needs to be at the same state as the one in which the transactions are serial. If two transactions are running con-currently, the executed serially in the order Tl followed by T2. The sum A + B operating system may execute one transaction for a little while, is indeed preserved. then perform a context switch, execute the second transaction Not all concurrent executions result in a correct state. To for some time, and then switch back to the first transaction for illustrate, consider the schedule of Figure 23.6. After the some time, and so on. With multiple transac-tions, the CPU execution of this schedule, we arrive at a state where the final time is shared among all the transactions. values of accounts A and Bare $950 and $2100, respectively. Several execution sequences are possible, since the various This final state is an inconsistent state, since we have gained $50 instructions from both transactions may now be interleaved. In in the process of the concur-rent execution. Indeed, the sum A general, it is not possible to predict exactly how many instruc- + B is not preserved by the execution of the two transactions. tions of a transaction will be executed before the CPU switches If control of concurrent execution is left entirely to the operat- to ing system, many possible schedules, including ones that leave Tl T2 the database in an inconsistent state, such as the one just read(A) described, are possible. It is the job of the database system to ensure that any schedule that gets executed will leave the temp:= A * 0.1 database in a consistent state. The concurrency-control compo- A :=A - temp nent of the database system carries out this task. write(A) We can ensure consistency of the database under concurrent read(B) execution by making sure that any schedule that executed has B := B + temp the same effect as a schedule that could have occurred without any concurrent execution. That is, the schedule should, in some write(B) sense, be equivalent to a serial schedule. We examine this idea in read(A) Section A := A-50 Serializability write(A) The database system must control concurrent execution of read(B) transactions, to ensure that the database state remains consis- B := B + 50 tent. Before we examine how the database write(B) T1 T2 Figure 41.2 Schedule 2-a serial schedule in which T2 is followed read(A) by TI. A := A-50 read(A) 137

143 temp := A * 0.1 Concurrent execution reduces the unpredictable delays in DATABASE A :=A temp running transactions. Moreover, it also reduces the average response time write (A) The database system must control concurrent execution of read(B) transactions, to ensure that the database state remains write (A) MANAGEMENT consistent read(B) Since transactions are programs, it is computationally B := B + 50 difficult to determine ex-actly what operations a transaction write (B) performs and how operations of various trans-actions interact B := B + temp write (B) Review Terms Figure 41.4 Schedule 4-a concurrent schedule. Concurrent executions system can carry out this task, we must first understand which Advantages of concurrency schedules will en-sure consistency, and which schedules will not. Problems with concurrency Since transactions are programs, it is computationally difficult to Serializibility determine ex-actly what operations a transaction performs and Students Activity how operations of various trans-actions interact. For this 1. Define concurrency? reason, we shall not interpret the type of operations that a transaction can perform on a data item. Instead, we consider only two operations: read and write. We thus assume that, between a read (Q) instruction and a write (Q) instruction on a data item Q, a transaction may perform an arbitrary sequence of op-erations on the copy of Q that is residing in the local buffer of the transaction. Thus, the only significant operations of a transaction, from a scheduling point of view, are its read and write instructions. We shall therefore usually show only read and write instructions in schedules, as we do in schedule 3 in 2. Define the advantages & disadvantages of concurrency Figure 41.5. control? In this section, we discuss different forms of schedule equiva- lence; they lead to the notions of conflict serializability and view serializability. T1 T2 read(A) write(A) read(A) write(A) read(B) 3. Define serializibility? write(B) read(B) write(B) Figure 41.5 Schedule 3-showing only the read and write instructions. Points to Ponder Transaction-processing systems usually allow multiple transactions to run concur-rently 4. When a database can enter into an inconsistent state? Allowing multiple transactions to update data concurrently causes several complications with consistency of the data The parallelism of the CPU and the I/O system can therefore be exploited to run multiple transactions in parallel 138

144 5. How we can ignore inconsistency problem? DATABASE MANAGEMENT 139

145 Student Notes 140 DATABASE MANAGEMENT

146 DATABASE LESSON 42: CONCURENCY CONTROL - II MANAGEMENT Lesson objectives new schedule S!. We expect S to be equivalent to S!, since all Types of serializibility instructions appear in the same order in both schedules except for Ii and Ij, whose order does not matter. Conflict Serializability Since the write(A) instruction of T2 in schedule 3 of Figure 23.7 View Serializability does not conflict with the read(B) instruction of TI, we can Implementation of isolation swap these instructions to generate an equivalent schedule, Transaction definition in SQL schedule 5, in. Figure 23.8. Regardless of the initial system state, Conflict Serializability schedules 3 and 5 both produce the same final system state. Let us consider a schedule 5 in which there are two consecutive We continue to swap non-conflicting instructions: instructions Ii and Ij, of transactions Ti and Tj, respectively (i Swap the read (B) instruction of TI with the read (A) j). If Ii and Ij refer to different data items, then we can swap Ii instruction of T2. and Ij without affecting the results of any instruction in the Swap the write (B) instruction of TI with the write (A) schedule. However, if Ii and Ij refer to the same data item Q, instruction of T2. then the order of the two steps may matter. Since we are dealing Swap the write(B) instruction of TI with the read(A) with only read and write instructions, there are four cases that instruction of T2. we need to consider: T1 T2 1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is read by Ti and Tj, Read (A) regardless of the order. Write (A) 2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti Read (A) does not read the value of Q that is written by Tj in Read (B) instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written by Tj. Thus, the order of Ii and Ij Write (A) matters. Write (B) 3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters Read (B) for reasons similar to those of the previous case. Write (B) 4. Ii = write(Q), Ij = write(Q). Since both instructions are write Figure 42.1 Schedule 5-schedule 3 after swapping of a pair of operations, the order of these instructions does not affect instructions. either Ti or Tj. However, the value obtained by the next The final result of these swaps, schedule 6 of Figure 42.2, is a read(Q) instruction of 5 is affected, since the result of only serial schedule. Thus, we have shown that schedule 3 is the latter of the two write instructions is preserved in the equivalent to a serial schedule. This equivalence implies that, database. If there is no other write(Q) instruction after Ii regardless of the initial system state, schedule 3 will produce the and Ij in 5, then the order of Ii and Ij directly affects the final same final state as will some serial schedule. value of Q in the database state that results from schedule 5. If a schedule S can be transformed into a schedule S by a series of swaps of non--conflicting instructions, we say that Sand S Thus, only in the case where both Ii and Ij are read instructions are conflict equivalent. does the relative order of their execution not matter. In our previous examples, schedule 1 is not conflict equivalent We say that Ii and Ij conflict if they are operations by different to schedule 2. How- ever, schedule 1 is conflict equivalent to transactions on the same data item, and at least one of these schedule 3, Because the read(B) and write(B) instruction of Tl instructions is a write operation. can be swapped with the read(A) and write(A) instruction of T2. To illustrate the concept of conflicting instructions, we consider The concept of conflict equivalence leads to the concept of schedule 3, in Fig-ure 23.7. The write(A) instruction of Tl conflict serializability. We say that a schedule S is conflict conflicts with the read(A) instruction of T2. However, the serializable if it is conflict equivalent to a serial schedule. Thus, write(A) instruction of T2 does not conflict with the read(B) schedule 3 is conflict serializable, since it is conflict equivalent to instruction of TI, because the two instructions access different the serial schedule 1. data items. Finally, consider schedule 7 of Figure 42.3; it consists of only Let Ii and Ij be consecutive instructions of a schedule 5. If Ii and the significant op-erations (that is, the read and write) of Ij are instructions of different transactions and Ii and Ij do not transactions T3 and T4. This schedule is not conflict serializable, conflict, then we can swap the order of Ii and Ij to produce a 141

147 since it is not equivalent to either the serial schedule or read(B) DATABASE the serial schedule . B := B + 50 It is possible to have two schedules that produce the same write(B) outcome, but that are not conflict equivalent. For example, read(A) consider transaction T5, which transfers $10 A := A + 10 MANAGEMENT T1 T2 write (A) read(A) figure 42.4 schedule 8. write(A) Consider two schedules S and S, where the same set of read(B) transactions participates in both schedules. The schedules S and write(B) S are said to be view equivalent if three conditions are met: read(A) 1. For each data item Q, if transaction Ti reads the initial value write(A) of Q in schedule S, then transaction Ti must, in schedule S, read(B) also read the initial value of Q. write (B) 2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was produced by a write(Q) Figure 42.2 Schedule 6-a serial schedule that is equivalent to operation executed by transaction Ti, then the read(Q) schedule 3. operation of transaction Ti must, in schedule S, also read T3 T4 the value of Q that was produced by the same write(Q) Read (Q) operation of transaction Ti. Write (Q) 3. For each data item Q, the transaction (if any) that performs Write (Q) the final write(Q) operation in schedule S must perform the final write(Q) operation in sched-ule S. Figure 42.3 Schedule 7. Conditions 1 and 2 ensure that each transaction reads the same from account B to account A. Let schedule 8 be as defined in values in both schedules and, therefore, performs the same Figure 42.4. We claim that schedule 8 is not conflict equivalent computation. Condition 3, coupled with conditions 1 and 2, to the serial schedule , since, in schedule 8, the write (B) ensures that both schedules result in the same final system instruction of Ts conflicts with the read (B) instruction of TI. state. Thus, we cannot move all the instructions of TI before those of T5 by swapping con-secutive non-conflicting instructions. In our previous examples, schedule 1 is not view equivalent to However, the final values of accounts A and B after the schedule 2, since, in schedule 1, the value of account A read by execution of either schedule 8 or the serial schedule are transaction T2 was produced by T1 whereas this case does not the same -$960 and $2040, respectively. hold in schedule 2. However, schedule 1 is view equivalent to schedule 3, because the values of account A and B read by We can see from this example that there are less stringent transaction T2 were produced by T1 in both schedules. definitions of schedule equivalence than conflict equivalence. For the system to determine that schedule 8 produces the same The concept of view equivalence leads to the concept of view outcome as the serial schedule , it must analyze the serializability. We say that a schedule 5 is view serializable if it is com-putation performed by TI and T5, rather than just the read view equivalent to a serial schedule. and write operations. In general, such analysis is hard to As an illustration, suppose that we augment schedule 7 with implement and is computationally expensive. How-ever, there transaction T6, and obtain schedule 9 in Figure 42.5. Schedule 9 are other definitions of schedule equivalence based purely on is view serializable. Indeed, it is view equivalent to the serial the read and write operations. We will consider one such schedule , since the one read(Q) instruction reads definition in the next section. the initial value of Q in both schedules, and T6 performs the final write of Q in both schedules. View Serializability In this section, we consider a form of equivalence that is less Every conflict-serializable schedule is also view serializable, but stringent than conflict equivalence, but that, like conflict there are view-serializable schedules that are not conflict equivalence, is based on only the read and write operations of serializable. Indeed, schedule 9 is not con-flict serializable, since transactions. every pair of consecutive instructions conflicts, and, thus, no swapping of instructions is possible. TI T2 Observe that, in schedule 9, transactions T4 and T6 perform read(A) write(Q) operations without having performed a read(Q) A := A-50 operation. Writes of this sort are called blind writes. Blind write (A) writes appear in any view-serializable schedule that is not conflict read(B) seri-alizable. B := B - 10 write(B) 142

148 schedule must have the same effect as would some serial DATABASE T3 T4 T6 schedule. Thus, conflict and view serializability are both read( Q) acceptable. write(Q) The SQL.92 standard also allows a transaction to specify that it write( Q) write(Q) may be executed in a manner that causes it to become Figure 42.5 Schedule 9-a view-serializable schedule. MANAGEMENT nonserializable with respect to other transactions. We study Implementation of Isolation such weaker levels of consistency in Section 16.8. So far, we have seen what properties a schedule must have if it Points to Ponder is to leave the database in a consistent state and allow transac- A transaction is a unit of program execution that accesses tion failures to be handled in a safe manner. Specifically, and possibly updates var-ious data items. schedules that are conflict or view serializable and cascadeless satisfy these requirements. To ensure integrity of the data, we require that the database system maintain the following properties of the There are various concurrency-control schemes that we can use transactions. These properties are often called the ACID to ensure that, even when multiple transactions are executed properties. concurrently, only acceptable sched-ules are generated, regardless of how the operating-system time-shares resources (such as Allowing multiple transactions to update data concurrently CPU time) among the transactions. causes several complications with consistency As a trivial example of a concurrency-control scheme, consider The database system must control concurrent execution of this scheme: A transaction acquires a lock on the entire database transactions, to ensure that the database state remains before it starts and releases the lock after it has committed. consistent While a transaction holds a lock, no other transaction is allowed Review Terms to acquire the lock and all must therefore wait for the lock to be Inconsistent state released. As a result of the locking policy, only one transaction can execute at a time. Therefore, only serial schedules are Transaction state generated. These are trivially serializable, and it is easy to verify Active that they are cascadeless as well. Partially committed A concurrency-control scheme such as this one leads to poor Failed performance, since it forces transactions to wait for preceding Aborted transactions to finish before they can start. In other words, it Committed provides a poor degree of concurrency. As explained in Section 23.4, concurrent execution has several performance benefits. Terminated The goal of concurrency-control schemes is to provide a high Transaction degree of concur-rency, while ensuring that all schedules that can Restart be generated are conflict or view serializable, and are cascadeless. Kill The schemes have different trade-offs in terms of the amount Observable external writes of concurrency they allow and the amount of overhead that Shadow copy scheme they incur. Some of them allow only conflict serializable schedules to be generated; others allow certain view-serializable Concurrent executions schedules that are not conflict-serializable to be generated. Serial execution Transaction Definition in SQL Schedules A data-manipulation language must include a construct for Conflict of operations specifying the set of ac-tions that constitute a transaction. Conflict equivalence The SQL standard specifies that a transaction begins implicitly. Conflict serializability Transactions are ended by one of these SQL statements: View equivalence Commit work commits the current transaction and begins View serializability a new one. Rollback work causes the current transaction to abort. Students Activity The keyword work is optional in both the statements. If a 1. Define Transaction? program terminates with-out either of these commands, the updates are either committed or rolled back -which of the two happens is not specified by the standard and depends on the im-plementation. The standard also specifies that the system must ensure both serializability and freedom from cascading rollback. The definition of serializability used by the stan-dard is that a 143

149 2. Define Conflict Serializibility? DATABASE MANAGEMENT 3. Define View Serializibility? 4. Define Commit,Rollback transaction? 144

150 DATABASE MANAGEMENT 145 Student Notes

151 DATABASE LESSON 43: CONCURRENCY CONTROL - III MANAGEMENT Lesson objectives 2. Subsequently, a data item Q can be locked by Ti only if the Locking mechanism parent of Q is currently locked by Ti. Graph based protocol 3. Data items may be unlocked at any time. Timestamps 4. A data item that has been locked and unlocked by Ti cannot subsequently be relocked by T1. Timestamp based protocol All schedules that are legal under the tree protocol are conflict Timestamp ordering protocols serializable. Validation protocol To illustrate this protocol, consider the database graph of When the lock manager receives an unlock message from a Figure 43.1. The follow-ing four transactions follow the tree transaction, it deletes the record for that data item in the linked protocol on this graph. We show only the lock and unlock list corresponding to that transaction. It tests the record that instructions: follows, if any to see if that request can now be granted. If it TlO : lock-X(B); lock-X(E); lock-X(D); unlock(B); unlock(E); can, the lock man-ager grants that request, and processes the lock-X(G); unlock(D); unlock(G). record following it, if any, similarly, and so on. T11: lock-X(D); lock-X(H); unlock(D); unlock(H). If a transaction aborts, the lock manager deletes any waiting request made by the transaction. Once the database system has T12: lock-X(B); lock-X(E); unlock(E); unlock(B). taken appropriate actions to undo the transaction it releases all T13: lock-X(D); lock-X(H); unlock(D); unlock(H). locks held by the aborted transaction. One possible schedule in which these four transactions This algorithm guarantees freedom from starvation for lock participated appears in Figure 43.2. Note that, during its requests, since a re-quest can never be granted while a request execution, transaction T1O holds locks on two dis-joint subtrees. received earlier is waiting to be granted. Observe that the schedule of Figure 43.2 is conflict serializable. Graph-based Protocols It can be shown not only that the tree protocol ensures conflict But, if we wish to develop protocols that are not two phase, we serializability, but also that this proto-col ensures freedom from need additional information on how each transaction will access deadlock. the database. There are various models that can give us the The tree protocol in Figure 43.2 does not ensure recoverability additional information, each differing in the amount of and cascadeless-ness. To ensure recoverability and information provided. The simplest model requires that we cascadelessness, the protocol can be modified to not permit have prior knowledge about the order in which the database release of exclusive locks until the end of the transaction. items will be accessed. Given such information, it is possible to Holding exclu-sive locks until the end of the transaction reduces construct locking protocols that are not two phase, but that, concurrency. Here is an alterna-tive that improves concurrency, nevertheless, ensure conflict serializability. but ensures only recoverability: For each data item with an To acquire such prior knowledge, we impose a partial ordering uncommitted write we record which transaction performed the on the set D = {d1, d2, . . ., dh} of all data items. If di dj, last write to the data item. Whenever a transaction Ti performs a then any transaction accessing both di and dj must access di read of an uncommitted data item, we record a commit before accessing dj. This partial ordering may be the result of dependency of Ti on the transaction that performed the last either the logical or the physical organization of the data, or it write to the data item. Transaction Ti is then not permitted to may be imposed solely for the purpose of concurrency control. commit until the commit of all transactions on which it has a The partial ordering implies that the set D may now be viewed commit dependency. If any of these trans-actions aborts, Ti as a directed acyclic graph, called a database graph. In this section, must also be aborted. for the sake of simplicity, we will restrict our attention to only those graphs that are rooted trees. We will present a simple protocol, called the tree protocol, which is restricted to employ only exclusive locks. References to other, more complex, graph- based locking protocols are in the bibliographical notes. In the tree protocol, the only lock instruction allowed is lock-X. Each transaction Ti can lock a data item at most once, and must observe the following rules: 1. The first lock by Ti may be on any data item. 146

152 Figure 43.1 Tree-structured database graph TS(Ti), and a new transaction Tj enters the system, then TS(Ti) DATABASE < TS(Tj). There are two simple methods for implementing this T10 T11 T12 T13 lock-X(B) scheme: lock-x (D) 1. Use the value of the system clock as the timestamp; that is, lock-x(H) a transactions time-stamp is equal to the value of the clock unlock(D) MANAGEMENT when the transaction enters the system. lock-x(E) lock-x(D) 2. Use a logical counter that is incremented after a new unlock(B) timestamp has been assigned; that is, a transactions unlock(E) timestamp is equal to the value of the counter when the lock-x (B) transaction enters the system. lock-x (E) unlock (H) The timestamps of the transactions determine the serializability lock-X(G) order. Thus, if TS(Ti) < TS(Tj), then the system must ensure unlock(D) that the produced schedule is equiva-lent to a serial schedule in lock-x (D) which transaction Ti appears before transaction Tj. lock-X(H) unlock(D) To implement this scheme, we associate with each data item Q unlock(H) two timestamp values: unlock(E) W-timestamp(Q) denotes the largest timestamp of any unlock(B) unlock (G) transaction that exe-cuted write(Q) successfully. R-timestamp(Q) denotes the largest timestamp of any Figure 43.1 Serializable schedule under the tree protocol. transaction that exe-cuted read(Q) successfully. The tree-locking protocol has an advantage over the two-phase These timestamps are updated whenever a new read(Q) or locking protocol in that, unlike two-phase locking, it is dead- write(Q) instruction is executed. lock-free, so no rollbacks are required. The tree-locking protocol has another advantage over the two-phase locking protocol in The Timestamp-ordering Protocol that unlocking may occur earlier. Earlier unlocking may lead to The timestamp-ordering protocol ensures that any conflicting shorter waiting times, and to an increase in concurrency. read and write opera-tions are executed in timestamp order. However, the protocol has the disadvantage that, in some cases, This protocol operates as follows: a transaction may have to lock data items that it does not access. 1. Suppose that transaction Ti issues read(Q). For example, a transaction that needs to access data items A and a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a J in the database graph of Figure 43.1 must lock not only A and value of Q that was already overwritten. Hence, the J, but also data items B, D, and H. This additional locking read operation is rejected, and Ti is rolled back. results in increased locking overhead, the possibility of addi- b. If TS(Ti) ~ W-timestamp(Q), then the read operation tional waiting time, and a potential decrease in concurrency. is executed, and R-timestamp(Q) is set to the Further, without prior knowledge of what data items will need maximum of R-timestamp(Q) and TS(Ti). to be locked, transactions will have to lock the root of the tree, 2. Suppose that transaction Ti issues write(Q). and that can reduce concurrency greatly. a. If TS(Ti) < R-timestamp(Q), then the value of Q that For a set of transactions, there may be conflict-serializable Ti is producing was needed previously, and the system schedules that cannot be obtained through the tree protocol. assumed that that value would never be produced. Indeed, there are schedules possible under the two-phase Hence, the system rejects the write operation and rolls locking protocol that are not possible under the tree protocol, Ti back. and vice versa. Examples of such schedules are explored in the exercises. b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system rejects Timestamp-based Protocols this write operation and rolls Ti back. The locking protocols that we have described thus far determine the order between every pair of conflicting transactions at c. Otherwise, the system executes the write operation and execution time by the first lock that both members of the pair sets W-time- stamp(Q) to TS(Ti). TS(Ti) = R/W - request that involves incompatible modes. Another method for TS(Q). determining the serializability order is to select an ordering If a transaction Ti is rolled back by the concurrency-control among transactions in advance. The most common method for scheme as result of is-suance of either a read or write operation, doing so is to use a timestamp-ordering scheme. the system assigns it a new timestamp and restarts it. Timestamps To illustrate this protocol, we consider transactions TI4 and T15. With each transaction Ti in the system, we associate a unique Transaction TI4 displays the contents of accounts A and B: fixed timestamp, de-noted by TS(Ti). This timestamp is T14: read(B); assigned by the database system before the trans-action Ti starts read(A); execution. If a transaction Ti has been assigned timestamp display(A + B). 147

153 Transaction TI5 transfers $50 from account A to account B, and Recoverability alone can be ensured by tracking DATABASE then displays the uncommitted writes, and al-lowing a transaction Ti to contents of both: commit only after the commit of any transaction that wrote a value that Ti read. Commit dependencies. T15: read(B); B := B - 50; Validation-based Protocols MANAGEMENT In cases where a majority of transactions are read-only transac- write(B); tions, the rate of con-flicts among transactions may be low. read(A); Thus, many of these transactions, if executed without the A := A + 50; supervision of a concurrency-control scheme, would neverthe- write(A); less leave the system in a consistent state. A concurrency-control scheme imposes overhead of code execution and possible delay display(A + B). of transactions. It may be better to use an alterna-tive scheme In presenting schedules under the timestamp protocol, we shall that imposes less overhead. A difficulty in reducing the assume that a trans-action is assigned a timestamp immediately overhead is that we do not know in advance which transactions before its first instruction. Thus, in sched-ule 3 of Figure 43.3, will be involved in a conflict. To gain that knowledge, we need a TS(TI4) < TS(TI5), and the schedule is possible under the time- scheme for monitoring the system. stamp protocol. We assume that each transaction Ti executes in two or three We note that the preceding execution can also be produced by different phases in its lifetime, depending on whether it is a the two-phase lock-ing protocol. There are, however, schedules read-only or an update transaction. The phases are, in order, that are possible under the two-phase locking protocol, but are 1. Read phase. During this phase, the system executes not possible under the timestamp protocol, and vice versa transaction Ti. It reads the values of the various data items The timestamp-ordering protocol ensures conflict serializability. and stores them in variables local to Ti. It performs all write This is because conflicting operations are processed in operations on temporary local variables, without updates of timestamp order. the actual database. The protocol ensures freedom from deadlock, since no 2. Validation phase. Transaction Ti performs a validation test transaction ever waits. However, there is a possibility of to determine whe-ther it can copy to the database the starvation of long transactions if a sequence of conflicting temporary local variables that hold the results of write short transactions causes repeated restarting of the long operations without causing a violation of serializability. transaction. If 3. Write phase. If transaction Ti succeeds in validation (step 2), T14 T15 then the system applies the actual updates to the database. read (B) Otherwise, the system rolls back Ti. Read (B) Each transaction must go through the three phases in the order B: = B-50 shown. However, all three phases of concurrently executing Write (B) transactions can be interleaved. read (A) read (A) To perform the validation test; we need to know when the Display(A + B) various phases of trans-actions Ti took place. We shall, there- A : = A+50 fore, associate three different timestamps with transaction Ti: Write (A) 1. Start(Ti), the time when Ti started its execution. Display (A+B) 2. Validation(Ti), the time when Ti finished its read phase and Figure 43.13 Schedule 3. started its vali-dation phase. a transaction is found to be getting restarted repeatedly, 3. Finish(Ti), the time when Ti finished its write phase. conflicting transactions need to be temporarily blocked to enable We determine the serializability order by the timestamp- the transaction to finish. ordering technique, using the value of the timestamp The protocol can generate schedules that are not recoverable. Validation(Ti). Thus, the value TS(Ti) = Validation(Ti) and, if However, it can be extended to make the schedules recoverable, TS(Tj) < TS(Tk), then any produced schedule must be equiva- in one of several ways: lent to a serial schedule in which transaction Tj appears before Recoverability and cascadelessness can be ensured by performing transaction Tk. The reason we have chosen Validation(Ti), rather all writes together at the end of the transaction. The writes than Start(Ti), as the timestamp of transaction Ti is that we can must be atomic in the fol-lowing sense: While the writes are in expect faster response time provided that conflict rates among progress, no transaction is permitted to access any of the data transactions are indeed low. items that have been written. The validation test for transaction Tj requires that, for all Recoverability and cascadelessness can also be guaranteed by transactions Ti with TS(Ti) < TS(Tj), one of the following two using a limited form of locking, whereby reads of conditions must hold: uncommitted items are postponed until the transaction that updated the item commits 148

154 1. Finish(Ti) < Start(Tj) Since Ti completes its execution With each transaction Ti in the system, we associate a unique DATABASE before Tj started, the serializability order is indeed fixed timestamp, This timestamp is assigned by the maintained. database system before the trans-action Ti starts execution 2. The set of data items written by Ti does not intersect with The timestamp-ordering protocol ensures that any the set of data items read by T, and Ti completes its write conflicting read and write opera-tions are executed in MANAGEMENT phase before Tj starts its validation phase (Start(Tj) < timestamp order Finish(Ti) < Validation(Tj)). This condition ensures that Review Terms T14 T15 Locking mechanism read(B) Graph based protocol read(B) B := B 50 Timestamps read (A) Timestamp based protocol A := A + 50 read (A) Timestamp ordering protocols (validate) Validation protocol display (A + B) (validate) Students Activity write(B) 1. Define locks in a database system?Why it is advisable write (A) Figure 43.5 Schedule 5, a schedule produced by using valida- tion. the writes of Ti and Tj do not overlap. Since the writes of Ti do not affect the read of Tj, and since Tj cannot affect the read of Ti the serializability order is indeed maintained. As an illustration, consider again transactions T14 and T15. Suppose that TS(T14) < TS(T15). Then, the validation phase 2. Define graph based protocol? succeeds in the schedule 5 in Figure 43.5. Note that the writes to the actual variables are performed only after the validation phase of T15. Thus, T14 reads the old values of B and A, and this schedule is serializable. The validation scheme automatically guards against cascading rollbacks, since the actual writes take place only after the transaction issuing the write has committed. However, there is a possibility of starvation of long transactions, due to a sequence of conflicting short transactions that cause repeated restarts of the long transaction. To avoid starvation, conflicting transac- 3. Define Timestamps? tions must be temporarily blocked, to enable the long transaction to finish. This validation scheme is called the optimistic concurrency control scheme since transactions execute optimistically, assuming they will be able to finish execution and validate at the end. In contrast, locking and timestamp ordering are pessimistic in that they force a wait or a rollback whenever a conflict is detected, even though there is a chance that the schedule may be conflict serializable. 4. Define Timestamp based protocol? Points to Ponder When the lock manager receives an unlock message from a transaction, it deletes the record for that data item in the linked list corresponding to that transaction If we wish to develop protocols that are not two phase, we need additional information on how each transaction will access the database One method for determining the serializability order is to select an ordering among transactions in advance 149

155 5. Define Timestamp ordering protocol? DATABASE MANAGEMENT 6. Define validation based protocol? 150

156 DATABASE MANAGEMENT 151 Student Notes

157 DATABASE LESSON 44: DATABASE RECOVERY MANAGEMENT Lesson objectives when there is an error. Hence, the fail-stop assumption is a Failure reasonable one. Types of failures Disk failure. A disk block loses its content as a result of either a head crash or failure during a data transfer Storage structure operation. Copies of the data on other disks, or archival Storage Type backups on tertiary media, such as tapes, are used to recover Stable-Storage Implementation from the failure. Data Access To determine how the system should recover from failures, we Recovery and Atomicity need to identify the failure modes of those devices used for storing data. Next, we must consider how these failure modes Log-Based Recovery affect the contents of the database. We can then propose Checkpoints algorithms to ensure database consistency and transaction The database system must take actions in advance to ensure that atomicity despite failures. These algorithms, known as recovery the atomicity and durability properties of transactions as a algorithms, have two parts: computer system, like any other device, is subject to failure from 1. Actions taken during normal transaction processing to a variety of causes: disk crash, power outage, software error, a ensure that enough information exists to allow recovery fire in the machine room, even sabotage. In any failure, from failures. information may be lost. are preserved. An integral part of a 2. Actions taken after a failure to recover the database contents database system is a recovery scheme that can restore the to a state that ensures database consistency, transaction database to the consistent state that existed before the failure. atomicity, and durability. The recovery scheme must also provide high availability; that is, it must minimize the time for which the database is not usable Storage Structure after a crash. The various data items in the database may be stored and accessed in a number of different storage media. To understand Failure Classification how to ensure the atomicity and durability properties of a There are various types of failure that may occur in a system, transaction, we must gain a better under-standing of these each of which needs to be dealt with in a different manner. The storage media and their access methods. simplest type of failure is one that does not result in the loss of information in the system. The failures that are more difficult Storage Types tom deal with are those that result in loss of information. Storage media can be distinguished by their relative speed, Various types of failure are: capacity, and resilience to failure, and classified as volatile storage or nonvolatile stor-age. We review these terms, and introduce Transaction Failure another class of storage, called stable stor-age. There are two types of errors that may cause a transaction to fail: Volatile storage. Information residing in volatile storage Logical error. The transaction can no longer continue with its does not usually sur-vive system crashes. Examples of such normal ex-ecution because of some internal condition, such as storage are main memory and cache memory. Access to bad input, data not found, overflow, or resource limit exceeded. volatile storage is extremely fast, both because of the speed System error. The system has entered an undesirable state (for of the memory access itself, and because it is possible to example, deadlock), as a result of which a transaction cannot access any data item in volatile storage directly. continue with its nor-mal execution. The transaction, however, Nonvolatile storage. Information residing in nonvolatile can be executed at a later time. storage survives sys-tem crashes. Examples of such storage System crash There is a hardware malfunction, or a bug in the are disk and magnetic tapes. Disks are used for online database soft- ware or the operating system, that causes the loss storage, whereas tapes are used for archival storage.Both of the content of volatile storage, and brings transaction however, are subject to failure (for example, head crash), processing to a halt. The content of nonvolatile storage remains which may result in loss of information. At the current intact, and is not corrupted. state of technology, nonvolatile st-age is slower than volatile The assumption that hardware errors and bugs in the software storage by several orders of magnitude. This is because disk bring the system to a halt, but do not corrupt the nonvolatile and tape devices are electromechanical, rather than based en- storage contents, is known as the fail-stop assumption. Well- tirely on chips, as is volatile storage. In database systems, designed systems have numerous internal checks, at the disks are used for most nonvolatile storage. Other hardware and the software level, that bring the system to a halt nonvolatile media are normally used only for backup data. 152

158 Flash storage, though nonvolatile, has insuffi-cient capacity of remote backup, one of the blocks is local, whereas the other DATABASE for most database systems. is at a remote site. An output operation is executed as follows: Stable storage. Information residing in stable storage is 1. Write the information onto the first physical block. never lost (never should be taken with a grain of salt, since 2. When the first write completes successfully, write the same theoretically never cannot be guaranteed for example, it is information onto the second physical block. MANAGEMENT possible, although extremely unlikely, that a black hole may 3. The output is completed only after the second write envelop the earth and permanently destroy all data!). completes successfully. Although stable stor-age is theoretically impossible to obtain, it can be closely approximated by techniques that During recovery, the system examines each pair of physical make data loss extremely unlikely. blocks. If both are the same and no detectable error exists, then no further actions are necessary. (Recall that errors in a disk The distinctions among the various storage types are often less block, such as a partial write to the block, are detected by storing clear in practice than in our presentation. Certain systems a checksum with each block.) If the system detects an error in provide battery backup, so that some main memory can survive one block, then it replaces its content with the content of the system crashes and power failures. Alternative forms of non- other block. If both blocks contain no detectable error, but they volatile storage, such as optical media, provide an even higher differ in content, then the system replaces the content of the degree of reliability than do disks. first block with the value of the second. This recovery procedure Stable-Storage Implementation ensures that a write to stable storage either succeeds completely To implement stable storage, we need to replicate the needed (that is, updates all copies) or results in no change. information in sev-eral nonvolatile storage media (usually disk) The requirement of comparing every corresponding pair of with independent failure modes, and to update the information blocks during recovery is expensive to meet. We can reduce the in a controlled manner to ensure that failure during data transfer cost greatly by keeping track of block writes that are in progress, does not damage the needed information. using a small amount of nonvolatile RAM. On recovery, only RAID systems guarantee that the failure of a single disk (even blocks for which writes were in progress need to be compared. during data transfer) will not result in loss of data. The The protocols for writing out a block to a remote site are similar simplest and fastest form of RAID is the mirrored disk, which to the protocols for writing blocks to a mirrored disk system keeps two copies of each block, on separate disks. Other forms We can extend this procedure easily to allow the use of an of RAID offer lower costs, but at the expense of lower arbitrarily large number of copies of each block of stable performance. storage. Although a large number of copies reduces the RAID systems, however, cannot guard against data loss due to probability of a failure to even lower than two copies-do, it is disasters such as fires or flooding. Many systems store archival usually reasonable to simulate stable storage with only two backups of tapes off-site to guard against such disasters. copies. However, since tapes cannot be carried off-site continually, updates since the most recent time that tapes were carried off- Data Access site could be lost in such a disaster. More secure systems keep a The database system resides permanently on nonvolatile storage copy of each block of stable storage at a remote site, writing it (usually disks), and is partitioned into fixed-length storage units out over a computer network, in addition to storing the block called blocks. Blocks are the units of data transfer to and from on a local disk system. Since the blocks are output to a remote disk, and may contain several data items. We shall assume that system as and when they are output to local storage, once an no data item spans two or more blocks. This assumption is output operation is complete, the output is not lost, even in realistic for most data-processing applications, such as our the event of a disaster such as a fire or flood. We study such banking example. remote backup systems Transactions input information from the disk to main memory, In the remainder of this section, we discuss how storage media and then output the information back onto the disk. The input can be protected from failure during data transfer. Block transfer and output operations are done in block units. The blocks between memory and disk storage can result in residing on the disk are referred to as physical blocks; the blocks residing temporarily in main memory are referred to as buffer Successful completion. The transferred information blocks. The area of memory where blocks reside temporarily is arrived safely at its des-tination. called the disk buffer. Partial failure. A failure occurred in the midst of transfer, Block movements between disk and main memory are initiated and the destination block has incorrect information. through the fol-lowing two operations: Total failure. The failure occurred sufficiently early during 1. input(B) transfers the physical block B to main memory. the transfer that the destination block remains intact. 2. output(B) transfers the buffer block B to the disk, and We require that, if a data-transfer failure occurs, the system replaces the appropriate physical block there. detects it and invokes a recovery procedure to restore the block Each transaction Ti has a private work area in which copies of all to a consistent state. To do so, the system must maintain two the data items accessed and updated by Ti are kept. The system physical blocks for each logical database block; in the case of creates this work area when the transaction is initiated; the mirrored disks, both blocks are at the same location; in the case system removes it when the transaction either commits or 153

159 aborts. Each data item X kept in the work area of transaction Ti for this difficulty is that we have modified the database without DATABASE is denoted by Xi. Transaction Ti interacts with the database having assurance that the transaction will indeed commit. Our system by transferring data to and from its work area to the goal is to perform either all or no database modifications made system buffer. We transfer data by these two operations: by Ti. However, if Ti performed multiple database modifica- 1. read(X) assigns the value of data item X to the local variable tions, several output operations may be re-quired, and a failure may occur after some of these modifications have been made, MANAGEMENT Xi. It executes this operation as follows: but before all of them are made. a. If block B x on which X resides is not in main memory, it issues input(B x). To achieve our goal of atomicity, we must first output informa- tion describing the modifications to stable storage, without b. It assigns to Xi the value of X from the buffer block. modifying the database itself. As we shall see, this procedure 2. write(X) assigns the value of local variable Xi to data item X will allow us to output all the modifications made by a commit- in the buffer block. ted transaction, despite failures. There are two ways to perform It executes this operation as follows: such outputs; we study them in Sections 44.4 and 44.5. In these a. If block B x on which X resides is not in main two sections, we shall assume that transactions are executed memory, it issues input(Bx L serially; in other words, only a single transaction is active at a time. We shall describe how to handle concurrently executing b. It assigns the value of Xi to X in buffer B x. transactions later, in Section 44.6. Note that both operations may require the transfer of a block Log-Based Recovery from disk to main mem-ory. They do not, however, specifically The most widely used structure for recording database modifi- require the transfer of a block from main mem-ory to disk. cations is the log. The log is a sequence of log records, recording all the update activities in the database. There are several types A buffer block is eventually written out to the disk either of log records. An update log record describes a single data-base because the buffer man-ager needs the memory space for other write. It has these fields: purposes or because the database system wishes to reflect the change to B on the disk. We shall say that the database system Transaction identifier is the unique identifier of the transaction performs a force-output of buffer B if it issues an output(B). that performed the write operation. When a transaction needs to access a data item X for the first Data-item identifier is the unique identifier of the data item time, it must execute read (X). The system then performs all written. Typically, it is the location on disk of the data item. updates to X on Xi. After the transaction ac-cesses X for the Old value is the value of the data item prior to the write. final time, it must execute write(X) to reflect the change to X in New value is the value that the data item will have after the the database itself. write. The output(B x) operation for the buffer block B x on which X Other special log records exist to record significant events during resides ,does not need to take effect immediately after write(X) transaction pro-cessing, such as the start of a transaction and is executed, since the block Bx may contain other data items that the commit or abort of a transaction. We denote the various are still being accessed. Thus, the actual output may take place types of log records as: later. Notice that, if the system crashes after the write(X) . Transaction Ti has started. operation was executed but before output(Bx) was executed, the new value of X is never written to disk and, thus, is lost. . Transaction Ti has performed a write on data item Xj. Xj Recovery and Atomicity had value VI before the write, and will have value V2 after Consider again our simplified banking system and transaction the write. Ti that transfers $50 from account A to account B, with initial values of A aI)d B being $1000 and $2000, respectively. . Transaction Ti has committed. . . Suppose that a system crash has occurred during the execution Transaction Ti has aborted. of Ti, after output (B A) has taken place, but before output(B Whenever a transaction performs a write, it is essential that the B) was executed, where B A and B B denote the buffer blocks log record for that write be created before the database is on which A and B reside. Since the memory contents were lost, modified. Once a log record exists, we can output the modifica- we do not know the fate of the transaction; thus, we could tion to the database if that is desirable. Also, we have the ability invoke one of two possible recovery procedures: to undo a modification that has already been output to the Execute Ti. This procedure will result in the value of A database. We undo it by using the old-value field in log records. becoming $900, rather than $950. Thus, the system enters For log records to be useful for recovery from system and disk an inconsistent state. failures, the log must reside in stable storage. For now, we Do not execute Ti. The current system state has values of assume that every log record is written to the end of the log on $950 and $2000 for A and B, respectively. Thus, the system stable storage as soon as it is created. In Section 44.7, we shall enters an inconsistent state. see when it is safe to relax this requirement so as to reduce the overhead imposed by logging. Two techniques for using the log In either case, the database is left in an inconsistent state, and to ensure transaction atomicity despite failures. Observe that the thus this simple re-covery scheme does not work. The reason log contains a complete record of all database activity. As a 154

160 result, the volume of data stored in the log may become DATABASE unreasonably large. Deferred Database Modification The deferred-modification technique ensures transaction atomicity by recording all database modifications in the log, but MANAGEMENT deferring the execution of all write operations of a transaction until the transaction partially commits. Recall that a transaction Figure 44.2 Portion of the database log corresponding to To is said to be partially committed once the final action of the and Tl transaction has been ex-ecuted. The version of the deferred- appears in Figure 44.3. Note that the value of A is changed in modification technique that we describe in this section assumes the database only after the record has been placed that transactions are executed serially. in the log. When a transaction partially commits, the information on the Using the log, the system can handle any failure that results in log associated with the transaction is used in executing the the loss of informa-tion on volatile storage. The recovery deferred writes. If the system crashes before the transaction scheme uses the following recovery procedure: completes its execution, or if the transaction aborts, then the redo(Ti) sets the value of all data items updated by informa-tion on the log is simply ignored. transaction Ti to the new values. The execution of transaction Ti proceeds as follows. Before Ti The set of data items updated by Ti and their respective new starts its execution, a record is written to the log. A values can be- found in the log. write(X) operation by Ti results in the writing of a new record The redo operation must be independent; that is, executing it to the log. Finally, when Ti partially commits, a record is written to the log. characteristic is required if we are to guarantee correct behavior When transaction Ti partially commits, the records associated even if a failure occurs during the recovery process. with it in the log are used in executing the deferred writes. Since After a failure, the recovery subsystem consults the log to a failure may occur while this updating is taking place, we must determine which trans-actions need to be redone. Transaction Ti ensure that, before the start of these updates, all the log records needs to be redone if and only if the log contains both the are written out to stable storage. Once they have been written, record and the record . Thus, if the the actual updating takes place, and the transaction enters the system crashes after the transaction completes its execution, the committed state. recovery scheme uses the information in the log to restore the Observe that only the new value of the data item is required by system to a previous consistent state after the transaction had the deferred modification technique. Thus, we can simplify the completed. general update-log record struc-ture that we saw in the previous As an illustration, let us return to our banking example with section, by omitting the old-value field. transactions To and Tl executed one after the other in the order To illustrate, reconsider our simplified banking system. Let To To followed by Tl Figure 44.2 shows the log that results from be a transaction that transfers $50 from account A to account B: the complete execution of To and Tl Let us suppose that the To: read (A); Log Database A := A-50; write(A); read(B); B := B + 50; write(B). A = 950 Let Tl be a transaction that withdraws $100 from account C: B = 2050 T1: read(C); C := C - 100; write(C). Suppose that these transactions are executed serially, in the order C = 600 To followed by Tl and that the values of accounts A, B, and C Figure 44.3 State of the log and database corresponding. to To before the execution took place were $1000, $2000, and $700, and Tl respectively. There are various orders in which the actual outputs can take place to both the database system and the log as a result of the < T0,A,950> < T0,A,950> < T0,A,950> execution of To and Tl One such order 155

161 found in the log , the system performs the operation redo(Ti) DATABASE .In other words ,it restarts the recovery action from the begin- ning. Since redo writes values to the database independent of the values currentlyin the database, the result of a successful (a) (b) (c) second attempt at redo is the same as though redo has suc- Figure 44.4 Same log shown at three different times. ceeded the first time. MANAGEMENT System crashes before the completion of transaction, so that we Checkpoints can see how the recovery technique restores the database to a When a system failure occurs, we must consult the log to consistent state.Assume that the crash occurs just after the log determine those transac-tions that need to be redone and those record for the step. that need to be undone. In principle, we need to search the Write(B) entire log to determine this information. There are two major Of transaction t0 has been written to stable storage.The log at difficulties with this approach: the time of crash appears in figure 44.4(a) When the system 1. The search process is time consuming. comes back up, no redo actions need to be taken, since no 2. Most of the transaction that, need to be redone have already commit record appears in the log. The values of accounts A written their updates into the database. and B remain $1000 and $2000, respectively. The log records of the incomplete transaction To can be deleted from the log. To reduce these types of overhead, we introduce checkpoints. During execution, the system maintains the log. In addition, Now, let us assume the crash comes just after the log record for the system periodically performs checkpoints, which require the the step following sequence of actions to take place: write( C) 1. Output onto stable storage all log records currently residing of transaction Tl has been written to stable storage. In this case, in main memory. the log at the time of the crash is as in Figure 17.4b. When the 2. Output to the disk all modified buffer blocks. system comes back up, the operation redo(To) is performed, since the record 3. Output onto stable storage a log record . Transactions are not allowed to perform any update action, such as writing to any buffer blockor writing a log record , while a appears in the log on the disk. After this operation is executed, checkpoint is in progress. the values of accounts A and Bare $950 and $2050, respectively. The value of account Remains $700. As before, the log records Points to Ponder of the incomplete transaction Tl can be deleted from the log. There are various types of failure that may occur in a system, Finally, assume that a crash occurs just after the log record each of which needs to be dealt with in a different manner as transaction failure,system crash, disk failures. Storage media can be distinguished by their relative speed, is written to stable storage. The log at the time of this crash is capacity, and resilience to failure, and classified as volatile as in Figure 44.4c. When the system comes back up, two storage or nonvolatile stor-age commit records are in the log: one for To and one for Tl Therefore, the system must perform operations redo(To) and RAID systems guarantee that the failure of a single disk redo(T1), in the order in which their commit records appear in (even during data transfer) will not result in loss of data the log. After the system executes these operations, the values When a system failure occurs, we must consult the log to of accounts A, B, and Care $950, $2050, and $600, respectively. determine those transac-tions that need to be redone and Finally, let us consider a case in which a second system crash those that need to be undone. occurs during re-covery from the first crash. Some changes may Review Terms have been made to the database as a result of the redo opera- Failure tions, but all changes may not have been made. When the System comes up after the second crash, recovery proceeds Types of failures exactly as in the pr_.g examples. For each commit record Storage structure Storage Type Finally, let us consider a case in which a second system crash Stable-Storage Implementation occurs during re-covery from the first crash. Some changes may Data Access have been made to the database as a Recovery and Atomicity result of the redo operations, but all changes may not have Log-Based Recovery been made. When the System comes up after the second crash, Checkpoints recovery proceeds exactly as in the pr_.g examples. For each commit record 156

162 DATABASE Students Activity 1. Define database recovery? MANAGEMENT 2. What are the various kinds of failures? 3. Define log-based recovery? 4. Define checkpoints? 157

163 Student Notes 158 DATABASE MANAGEMENT

164 The lesson content has been compiled from various sources in public domain including but not limited to the internet for the convenience of the users. The university has no proprietary right on the same. ? Rai Technology University Campus Dhodballapur Nelmangala Road, SH -74, Off Highway 207, Dhodballapur Taluk, Bangalore - 561204 E-mail: [email protected] | Web:

Load More