Wednesday, 31 July 2019

SSL certificate and communication explained




SSL and SSL Certificates Explained For Beginners


Secure Sockets Layer (SSL) and Transport Layer security (TLS ) are protocols that provide secure communications over a computer network or link.
They are commonly used in web browsing and email.
In this tutorial we will look:
  • TLS and SSL
  • Public and Private keys
  • Why we need certificates and what they do
  • How to get a digital certificate and understand the different common certificate types.

What is TLS

TLS is based on SSL and was developed as a replacement in response to known vulnerabilities in SSLv3.
SSL is the term commonly used, and today usually refers to TLS.

Security Provided

SSL/TLS provides data encryption, data integrity and authentication.
This means that when using SSL/TLS you can be confident that
  • No one has read your message
  • No one has changed your message
  • You are communicating with the intended person (server)
When sending a message between two parties you have two problems that you need to address.
  • How do you know that no one has read the message?
  • How do you know that no one has changed the message?
The solutions to these problems are to:
  • Encrypt it.– This makes the content unreadable so that to anyone viewing the message it is just gibberish.
  • Sign it– This allows the recipient to be confident that it was you who sent the message, and that the message hasn’t been changed.
Both of these processes require the use of keys.
These keys are simply numbers (128 bit being common) that are then combined with the message using a particular method, commonly known as an algorithm- e.g. RSA, to either encrypt or sign the message.

Symmetrical Keys and Public and Private Keys

Almost all encryption methods in use today employ public and private keys.
These are considered much more secure than the old symmetrical key arrangement.
With a symmetrical key, a key is used to encrypt or sign the message, and the same key is used to decrypt the message.
This is the same as the keys (door, car keys) we deal with in everyday life.
The problem with this type of key arrangement is if you lose the key anyone who finds it can unlock your door.
With Public and Private keys, two keys are used that are mathematically related (they belong as a key pair), but are different.
This means a message encrypted with a public key cannot be decrypted with the same public key.
To decrypt the message you require the private key.
If this type of key arrangement were used with your car. Then you could lock the car, and leave the key in the lock as the same key cannot unlock the car.
This type of key arrangement is very secure and is used in all modern encryption/signature systems.

 Keys and SSL Certificates

SSL/TLS use public and private key system for data encryption and data Integrity.
Public keys can be made available to anyone, hence the term public.
Because of this there is a question of trust, specifically:
How do you know that a particular public key belongs to the person/entity that it claims.
For example, you receive a key claiming to belong to your bank.
How do you know that it does belong to your bank?
The answer is to use a digital certificate.
A certificate serves the same purpose as a passport does in everyday life.
A passport established a link between a photo and a person, and that link has been verified by atrusted authority (passport office).
A digital certificate provides a link between a public key and an entity (business,domain name etc) that has been verified (signed) by a trusted third party ( A certificate authority)
digital certificate provides a convenient way of distributing trusted public encryption keys.

Obtaining a Digital Certificate

You get a digital certificate from a recognized Certificate authority (CA). Just like you get a passport from a passport office.
In fact the procedure is very similar.
You fill out the appropriate forms add your public keys (they are just numbers) and send it/them to the certificate authority. (this is a certificate Request)
The certificate authority does some checks ( depends on authority), and sends you back the keys enclosed in a certificate.
The certificate is signed by the Issuing Certificate authority, and this it what guarantees the keys.
Now when someone wants your public keys, you send them the certificate, they verify the signatureon the certificate, and if it verifies, then they can trust your keys.

Example Usage

To illustrate we will look at a typical web browser and web server connection using SSL. (https).
This connection is used on the Internet to send email in Gmail etc and when doing online banking,shopping etc.
  1. Browser connects to server Using SSL (https)
  2. Server Responds with Server Certificate containing the public key of the web server.
  3. Browser verifies the certificate by checking the signature of the CA. To do this the CA certificate needs to be in the browser’s trusted store( See later)
  4. Browser uses this Public Key to agree a session key with the server.
  5. Web Browser and server encrypt data over the connection using the session key.

Database concepts summary

DML -  Data Manipulation Language -used to retrieve, store, modify, delete, insert and update data in database.
SELECT – Retrieves data from a table
INSERT –  Inserts data into a table
UPDATE – Updates existing data into a table
DELETE – Deletes all records from a table

DDL - Data Definition Language- used to create and modify the structure of database objects in database.
CREATE – Creates objects in the database
ALTER – Alters objects of the database
DROP – Deletes objects of the database
TRUNCATE – Deletes all records from a table and resets table identity to initial value.

DCL - Data Control Language- used to create roles, permissions, and referential integrity as well it is used to control access to database by securing it.
GRANT – Gives user’s access privileges to database
REVOKE – Withdraws user’s access privileges to database given with the GRANT command

TCL - Transactional Control Language -used to manage different transactions occurring within a database.
COMMIT – Saves work done in transactions
ROLLBACK – Restores database to original state since the last COMMIT command in transactions
SAVE TRANSACTION – Sets a savepoint within a transaction


DIFFERENCE BETWEEN TRUNCATE DELETE and DROP
TRUNCATE
  • TRUNCATE is a DDL command
  • TRUNCATE is executed using a table lock and whole table is locked for remove all records.
  • We cannot use Where clause with TRUNCATE.
  • TRUNCATE removes all rows from a table.
  • Minimal logging in transaction log, so it is performance wise faster.
  • TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log.
  • To use Truncate on a table you need at least ALTER permission on the table.
  • Truncate uses the less transaction space than Delete statement.
  • Truncate cannot be used with indexed views.
DELETE
  • DELETE is a DML command.
  • DELETE is executed using a row lock, each row in the table is locked for deletion.
  • We can use where clause with DELETE to filter & delete specific records.
  • The DELETE command is used to remove rows from a table based on WHERE condition.
  • It maintain the log, so it slower than TRUNCATE.
  • The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row.
  • Identity of column keep DELETE retain the identity.
  • To use Delete you need DELETE permission on the table.
  • Delete uses the more transaction space than Truncate statement.
  • Delete can be used with indexed views.


DROP
  • The DROP command removes a table from the database.
  • All the tables' rows, indexes and privileges will also be removed.
  • No DML triggers will be fired.
  • The operation cannot be rolled back.
  • DROP and TRUNCATE are DDL commands, whereas DELETE is a DML command.
  • DELETE operations can be rolled back (undone), while DROP and TRUNCATE operations cannot be rolled back.


Anomalies in DBMS

There are three types of anomalies that occur when the database is not normalized. These are – Insertion, update and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for storing employee’s address and emp_dept for storing the department details in which the employee works. At some point of time the table looks like this:
emp_id
emp_name
emp_address
emp_dept
101
Rick
Delhi
D001
101
Rick
Delhi
D002
123
Maggie
Agra
D890
166
Glenn
Chennai
D900
166
Glenn
Chennai
D004
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two departments of the company. If we want to update the address of Rick then we have to update the same in two rows or the data will become inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the database, Rick would be having two different addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and currently not assigned to any department then we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the rows that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss about normalization.

Normalization

Here are the most commonly used normal forms:
  • First normal form(1NF)
  • Second normal form(2NF)
  • Third normal form(3NF)
  • Boyce & Codd normal form (BCNF)
A candidate key is a key that uniquely identifies rows in a table. Any of the identified candidate keys can be used as the table's primary key. Candidate keys that are not part of the primary key are called alternate keys. One can describe a candidate key as a super key that contains only the minimum number of columns necessary to determine uniqueness.Candidate Key – A Candidate Key can be any column or a combination of columns that can qualify as unique key in database. There can be multiple Candidate Keys in one table. Each Candidate Key can qualify as Primary Key.
Prime attributes are the attributes of the candidate key which defines the uniqueness (Eg: SSN number in an employee database)
A primary key is a column in a table whose values uniquely identify the rows in the table. The primary key is chosen from this list of candidates based on its perceived value to the business as an identifier.Primary Key – A Primary Key is a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be Primary Key.
Primary Key:
  • There can only be one primary key in a table
  • In some DBMS it cannot be NULL - e.g. MySQL adds NOT NULL
  • Primary Key is a unique key identifier of the record
Unique Key:
  • Can be more than one unique key in one table
  • Unique key can have NULL values
  • It can be a candidate key
  • Unique key can be NULL and may not be unique


What are the different types of Normalization?
Ans: Different Types of Normalization are:
  • First Normal Form (1NF): A relation is said to be in 1NF only when all the entities of the table contain unique or atomic values.
  • Second Normal Form (2NF): A relation is said to be in 2NF only if it is in 1NF and all the non-key attribute of the table is fully dependent on the primary key.
  • Third Normal Form (3NF): A relation is said to be in 3NF only if it is in 2NF and every non-key attribute of the table is not transitively dependent on the primary key.
What is BCNF?
Ans: BCNF is Boyce Code Normal form. It is the higher version of 3Nf which does not have any multiple overlapping candidate keys.
What do you understand by Data Independence? What are its two types?
Ans: Data Independence refers to the ability to modify the schema definition in one level in such a way that it does not affect the schema definition in the next higher level.
The 2 types of Data Independence are:
  • Physical Data Independence: It modifies the schema at the physical level without affecting the schema at the conceptual level.
  • Logical Data Independence: It modifies the schema at the conceptual level without affecting or causing changes in the schema at the view level.
Define the relationship between ‘View’ and ‘Data Independence’.
Ans: View is a virtual table which does not have its data on its own rather the data is defined from one or more underlying base tables.
Views account for logical data independence as the growth and restructuring of base tables is not reflected in views.
Q #19) What are the advantages and disadvantages of views in the database?
Ans: Advantages of Views:
  • As there is no physical location where the data in views is stored, it generates output without wasting resources.
  • Data access is restricted as it does not allow commands like insertion, updation, and deletion.
Disadvantages of Views:
  • View becomes irrelevant if we drop a table related to that view.
  • More memory is occupied when the view is created for large tables.

Now, let’s say that we want to run a query to find all the details of any employees who are named ‘Abc’?
SELECT * FROM Employee WHERE Employee_Name = 'Abc'
What would happen without an index?
Database software would literally have to look at every single row in the Employee table to see if the Employee_Name for that row is ‘Abc’. And, because we want every row with the name ‘Abc’ inside it, we can not just stop looking once we find just one row with the name ‘Abc’, because there could be other rows with the name Abc. So, every row up until the last row must be searched – which means thousands of rows in this scenario will have to be examined by the database to find the rows with the name ‘Abc’. This is what is called a full table scan
How a database index can help performance
The whole point of having an index is to speed up search queries by essentially cutting down the number of records/rows in a table that need to be examined. An index is a data structure (most commonly a B- tree) that stores the values for a specific column in a table.
How does B-trees index work?
The reason B- trees are the most popular data structure for indexes is due to the fact that they are time efficient – because look-ups, deletions, and insertions can all be done in logarithmic time. And, another major reason B- trees are more commonly used is because the data that is stored inside the B- tree can be sorted. The RDBMS typically determines which data structure is actually used for an index. But, in some scenarios with certain RDBMS’s, you can actually specify which data structure you want your database to use when you create the index itself.
How does a hash table index work?
The reason hash indexes are used is because hash tables are extremely efficient when it comes to just looking up values. So, queries that compare for equality to a string can retrieve values very fast if they use a hash index.
For instance, the query we discussed earlier could benefit from a hash index created on the Employee_Name column. The way a hash index would work is that the column value will be the key into the hash table and the actual value mapped to that key would just be a pointer to the row data in the table. Since a hash table is basically an associative array, a typical entry would look something like “Abc => 0x28939″, where 0x28939 is a reference to the table row where Abc is stored in memory. Looking up a value like “Abc” in a hash table index and getting back a reference to the row in memory is obviously a lot faster than scanning the table to find all the rows with a value of “Abc” in the Employee_Name column.
The disadvantages of a hash index
Hash tables are not sorted data structures, and there are many types of queries which hash indexes can not even help with. For instance, suppose you want to find out all of the employees who are less than 40 years old. How could you do that with a hash table index? Well, it’s not possible because a hash table is only good for looking up key value pairs – which means queries that check for equality
What exactly is inside a database index? So, now you know that a database index is created on a column in a table, and that the index stores the values in that specific column. But, it is important to understand that a database index does not store the values in the other columns of the same table. For example, if we create an index on the Employee_Name column, this means that the Employee_Age and Employee_Address column values are not also stored in the index. If we did just store all the other columns in the index, then it would be just like creating another copy of the entire table – which would take up way too much space and would be very inefficient.
How does a database know when to use an index? When a query like “SELECT * FROM Employee WHERE Employee_Name = ‘Abc’ ” is run, the database will check to see if there is an index on the column(s) being queried. Assuming the Employee_Name column does have an index created on it, the database will have to decide whether it actually makes sense to use the index to find the values being searched – because there are some scenarios where it is actually less efficient to use the database index, and more efficient just to scan the entire table.
What is the cost of having a database index?
It takes up space – and the larger your table, the larger your index. Another performance hit with indexes is the fact that whenever you add, delete, or update rows in the corresponding table, the same operations will have to be done to your index. Remember that an index needs to contain the same up to the minute data as whatever is in the table column(s) that the index covers.
An index consists of column values(Eg: John) from one table, and that those values are stored in a data structure.
So now the database will use the index to find employees named John because the index will presumably be sorted alphabetically by the Users name. And, because it is sorted, it means searching for a name is a lot faster because all names starting with a “J” will be right next to each other in the index!