Index pages make it possible to directly access any row in a table. Remember that Indexes are not required. You can query and manipulate data without an index. However, data access is considerably faster when appropriate indexes are available (A good database engineer takes pride in efficient access structures)
Every column in a table can be indexed – but a decision has to be made keeping in view the additional overhead that index maintenance can present.
As a rule, I create indexes on the following:
- Primary keys
- Foreign keys /keys used in joins
- Columns that occur frequently in SQL ‘Where’ clauses
- Columns in SQL ‘order by’ or ‘group by’ clauses
Types of Indexes – Clustered and Non clustered
A clustered index physically orders rows in a table. So, you can only have one clustered index on a table.
Non clustered indexes use storage location information in index pages to navigate to data pages (if clustered indexes exist). If a non clustered index is built on top of a ‘heap’ (a table without a clustered index), internal rowID’s are used to point to the data on disk.
Only if you frequently process ranges of values – for example: WHERE key BETWEEN 10 AND 10000 and/or sort on key value would a clustered index on that value be worthwhile.
Note that in MySQL, the column(s) that you define as primary key automatically becomes the clustered index. So, if you have a surrogate key (based on an autonumber) on a table, the DB engine automatically creates a clustered index based on that column. Since you can only have one clustered index on a table, this kinda sucks!
In MS Sql server on the other hand, you can have a table that has a non-clustered index on a primary key and a clustered index on a field that is involved in range queries/sort (rock on!)
Key values in a clustered index MUST BE UNIQUE. If the Unique keyword is not specified explicitly, uniqueness is enforced by an internal identifier (that is inaccessible to the user).
Do not use clustered indexes if you don’t have to. When adding a new row to a full ‘data’ page, the RDBMS system does a ‘page split’ by moving approximately half the rows to a new page to make room. Page links in the index and data pages then need to be updated to maintain the physical sequence of records.
Page splits never need to occur in heaps, as pages are not linked (by definition, the pages are not in any order)
An index that is created on more than one column on a table is called a composite index. An index on (column1, column2) is NOT the same as an index on (column2, column1). Define the most unique/most selective column first.
Composite indexes are useful for tables with multiple column keys. For example, in a telephone directory, a composite index on (last name, firstname) can speed up searches.
Note that if the WHERE clause of a query references a lower order column of the composite index, it must also reference ALL higher order columns defined in the index for the query optimizer to use the composite index.
For example, if I have a composite index on (last name, first name) on Employee_Table,
- Select Phone from Employee_Table where Last_Name=’blah’ (Composite index is used)
- Select Phone from Employee_Table where First_Name=’blah’ (Composite index is NOT used)
- Select Phone from Employee_Table where Last_Name=’blah1’ AND First_Name=’blah2’ (Composite index is used)