DML - Data Manipulation Language -used to retrieve, store, modify, delete, insert and update data in database.
SELECT – Retrieves data from a table
INSERT – Inserts data into a table
UPDATE – Updates existing data into a table
DELETE – Deletes all records from a table
DDL - Data Definition Language- used to create and modify the structure of database objects in database.
CREATE – Creates objects in the database
ALTER – Alters objects of the database
DROP – Deletes objects of the database
TRUNCATE – Deletes all records from a table and resets table identity to initial value.
DCL - Data Control Language- used to create roles, permissions, and referential integrity as well it is used to control access to database by securing it.
GRANT – Gives user’s access privileges to database
REVOKE – Withdraws user’s access privileges to database given with the GRANT command
TCL - Transactional Control Language -used to manage different transactions occurring within a database.
COMMIT – Saves work done in transactions
ROLLBACK – Restores database to original state since the last COMMIT command in transactions
SAVE TRANSACTION – Sets a savepoint within a transaction
DIFFERENCE BETWEEN TRUNCATE DELETE and DROP
TRUNCATE
TRUNCATE is a DDL command
TRUNCATE is executed using a table lock and whole table is locked for remove all records.
We cannot use Where clause with TRUNCATE.
TRUNCATE removes all rows from a table.
Minimal logging in transaction log, so it is performance wise faster.
TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log.
To use Truncate on a table you need at least ALTER permission on the table.
Truncate uses the less transaction space than Delete statement.
Truncate cannot be used with indexed views.
DELETE
DELETE is a DML command.
DELETE is executed using a row lock, each row in the table is locked for deletion.
We can use where clause with DELETE to filter & delete specific records.
The DELETE command is used to remove rows from a table based on WHERE condition.
It maintain the log, so it slower than TRUNCATE.
The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row.
Identity of column keep DELETE retain the identity.
To use Delete you need DELETE permission on the table.
Delete uses the more transaction space than Truncate statement.
Delete can be used with indexed views.
DROP
The DROP command removes a table from the database.
All the tables' rows, indexes and privileges will also be removed.
No DML triggers will be fired.
The operation cannot be rolled back.
DROP and TRUNCATE are DDL commands, whereas DELETE is a DML command.
DELETE operations can be rolled back (undone), while DROP and TRUNCATE operations cannot be rolled back.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are – Insertion, update and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for storing employee’s address and emp_dept for storing the department details in which the employee works. At some point of time the table looks like this:
emp_id
|
emp_name
|
emp_address
|
emp_dept
|
101
|
Rick
|
Delhi
|
D001
|
101
|
Rick
|
Delhi
|
D002
|
123
|
Maggie
|
Agra
|
D890
|
166
|
Glenn
|
Chennai
|
D900
|
166
|
Glenn
|
Chennai
|
D004
|
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two departments of the company. If we want to update the address of Rick then we have to update the same in two rows or the data will become inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the database, Rick would be having two different addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and currently not assigned to any department then we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the rows that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss about normalization.
Normalization
Here are the most commonly used normal forms:
A candidate key is a key that uniquely identifies rows in a table. Any of the identified candidate keys can be used as the table's primary key. Candidate keys that are not part of the primary key are called alternate keys. One can describe a candidate key as a super key that contains only the minimum number of columns necessary to determine uniqueness.Candidate Key – A Candidate Key can be any column or a combination of columns that can qualify as unique key in database. There can be multiple Candidate Keys in one table. Each Candidate Key can qualify as Primary Key.
Prime attributes are the attributes of the candidate key which defines the uniqueness (Eg: SSN number in an employee database)
A primary key is a column in a table whose values uniquely identify the rows in the table. The primary key is chosen from this list of candidates based on its perceived value to the business as an identifier.Primary Key – A Primary Key is a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be Primary Key.
Primary Key:
There can only be one primary key in a table
In some DBMS it cannot be NULL - e.g. MySQL adds NOT NULL
Primary Key is a unique key identifier of the record
Unique Key:
Can be more than one unique key in one table
Unique key can have NULL values
It can be a candidate key
Unique key can be NULL and may not be unique
What are the different types of Normalization?
Ans: Different Types of Normalization are:
First Normal Form (1NF): A relation is said to be in 1NF only when all the entities of the table contain unique or atomic values.
Second Normal Form (2NF): A relation is said to be in 2NF only if it is in 1NF and all the non-key attribute of the table is fully dependent on the primary key.
Third Normal Form (3NF): A relation is said to be in 3NF only if it is in 2NF and every non-key attribute of the table is not transitively dependent on the primary key.
What is BCNF?
Ans: BCNF is Boyce Code Normal form. It is the higher version of 3Nf which does not have any multiple overlapping candidate keys.
What do you understand by Data Independence? What are its two types?
Ans: Data Independence refers to the ability to modify the schema definition in one level in such a way that it does not affect the schema definition in the next higher level.
The 2 types of Data Independence are:
Physical Data Independence: It modifies the schema at the physical level without affecting the schema at the conceptual level.
Logical Data Independence: It modifies the schema at the conceptual level without affecting or causing changes in the schema at the view level.
Define the relationship between ‘View’ and ‘Data Independence’.
Ans: View is a virtual table which does not have its data on its own rather the data is defined from one or more underlying base tables.
Views account for logical data independence as the growth and restructuring of base tables is not reflected in views.
Q #19) What are the advantages and disadvantages of views in the database?
Ans: Advantages of Views:
As there is no physical location where the data in views is stored, it generates output without wasting resources.
Data access is restricted as it does not allow commands like insertion, updation, and deletion.
Disadvantages of Views:
Now, let’s say that we want to run a query to find all the details of any employees who are named ‘Abc’?
SELECT * FROM Employee WHERE Employee_Name = 'Abc'
What would happen without an index?
Database software would literally have to look at every single row in the Employee table to see if the Employee_Name for that row is ‘Abc’. And, because we want every row with the name ‘Abc’ inside it, we can not just stop looking once we find just one row with the name ‘Abc’, because there could be other rows with the name Abc. So, every row up until the last row must be searched – which means thousands of rows in this scenario will have to be examined by the database to find the rows with the name ‘Abc’. This is what is called a full table scan
How a database index can help performance
The whole point of having an index is to speed up search queries by essentially cutting down the number of records/rows in a table that need to be examined. An index is a data structure (most commonly a B- tree) that stores the values for a specific column in a table.
How does B-trees index work?
The reason B- trees are the most popular data structure for indexes is due to the fact that they are time efficient – because look-ups, deletions, and insertions can all be done in logarithmic time. And, another major reason B- trees are more commonly used is because the data that is stored inside the B- tree can be sorted. The RDBMS typically determines which data structure is actually used for an index. But, in some scenarios with certain RDBMS’s, you can actually specify which data structure you want your database to use when you create the index itself.
How does a hash table index work?
The reason hash indexes are used is because hash tables are extremely efficient when it comes to just looking up values. So, queries that compare for equality to a string can retrieve values very fast if they use a hash index.
For instance, the query we discussed earlier could benefit from a hash index created on the Employee_Name column. The way a hash index would work is that the column value will be the key into the hash table and the actual value mapped to that key would just be a pointer to the row data in the table. Since a hash table is basically an associative array, a typical entry would look something like “Abc => 0x28939″, where 0x28939 is a reference to the table row where Abc is stored in memory. Looking up a value like “Abc” in a hash table index and getting back a reference to the row in memory is obviously a lot faster than scanning the table to find all the rows with a value of “Abc” in the Employee_Name column.
The disadvantages of a hash index
Hash tables are not sorted data structures, and there are many types of queries which hash indexes can not even help with. For instance, suppose you want to find out all of the employees who are less than 40 years old. How could you do that with a hash table index? Well, it’s not possible because a hash table is only good for looking up key value pairs – which means queries that check for equality
What exactly is inside a database index? So, now you know that a database index is created on a column in a table, and that the index stores the values in that specific column. But, it is important to understand that a database index does not store the values in the other columns of the same table. For example, if we create an index on the Employee_Name column, this means that the Employee_Age and Employee_Address column values are not also stored in the index. If we did just store all the other columns in the index, then it would be just like creating another copy of the entire table – which would take up way too much space and would be very inefficient.
How does a database know when to use an index? When a query like “SELECT * FROM Employee WHERE Employee_Name = ‘Abc’ ” is run, the database will check to see if there is an index on the column(s) being queried. Assuming the Employee_Name column does have an index created on it, the database will have to decide whether it actually makes sense to use the index to find the values being searched – because there are some scenarios where it is actually less efficient to use the database index, and more efficient just to scan the entire table.
What is the cost of having a database index?
It takes up space – and the larger your table, the larger your index. Another performance hit with indexes is the fact that whenever you add, delete, or update rows in the corresponding table, the same operations will have to be done to your index. Remember that an index needs to contain the same up to the minute data as whatever is in the table column(s) that the index covers.
An index consists of column values(Eg: John) from one table, and that those values are stored in a data structure.
So now the database will use the index to find employees named John because the index will presumably be sorted alphabetically by the Users name. And, because it is sorted, it means searching for a name is a lot faster because all names starting with a “J” will be right next to each other in the index!