Tuesday, August 26, 2008

SQL Server INDEXING Tutorial

Relational databases like SQL Server use indexes to find data quickly when a query is processed. Creating and removing indexes from a database schema will rarely result in changes to an application's code; indexes operate 'behind the scenes' in support of the database engine. However, creating the proper index can drastically increase the performance of an application.

The SQL Server engine uses an index in much the same way a reader uses a book index. For example, one way to find all references to INSERT statements in a SQL book would be to begin on page one and scan each page of the book. We could mark each time we find the word INSERT until we reach the end of the book. This approach is pretty time consuming and laborious. Alternately, we can also use the index in the back of the book to find a page number for each occurrence of the INSERT statements. This approach produces the same results as above, but with tremendous savings in time.

When a SQL Server has no index to use for searching, the result is similar to the reader who looks at every page in a book to find a word: the SQL engine needs to visit every row in a table. In database terminology we call this behavior a table scan, or just scan.

A table scan is not always a problem, and is sometimes unavoidable. However, as a table grows to thousands of rows and then millions of rows and beyond, scans become correspondingly slower and more expensive.

Consider the following query on the Products table of the Northwind database. This query retrieves products in a specific price range.

SELECT ProductID, ProductName, UnitPrice
FROM Products WHERE (UnitPrice > 12.5) AND (UnitPrice < 14)

There is currently no index on the Product table to help this query, so the database engine performs a scan and examines each record to see if UnitPrice falls between 12.5 and 14. In the diagram below, the database search touches a total of 77 records to find just three matches.



Now imagine if we created an index, just like a book index, on the data in the UnitPrice column. Each index entry would contain a copy of the UnitPrice value for a row, and a reference (just like a page number) to the row where the value originated. SQL will sort these index entries into ascending order. The index will allow the database to quickly narrow in on the three rows to satisfy the query, and avoid scanning every row in the table.

Create An Index

Having a data connection in the Server Explorer view of Visual Studio.NET allows us to easily create new indexes:
• Navigate to the Products table of the Northwind database.
• Right click the table and select Design Table from the context menu.
• With the design screen in focus, click the Indexes/Keys item on the View menu of the IDE.

This should bring you to the following tabbed dialog box.


The dialog is currently displaying an existing index on the Products table: the PK_Products index. We will see later in this chapter how primary key fields are automatically indexed to enforce uniqueness in the key values.

• In the above dialog click on the New button, and in the Index name text box, replace the existing entry with IDX_UnitPrice.
• Beneath the text box is a control where we set the columns to index. Pull down the entry with ProductID and select the UnitPrice column instead.
• Leave all of the other options with default settings.
• Close the dialog and the table design view, making sure to save all of the changes when prompted to do so. The IDE will then issue the commands to create the new index.

We can create the same index using the following SQL. The command specifies the name of the index (IDX_UnitPrice), the table name (Products), and the column to index (UnitPrice).
CREATE INDEX [IDX_UnitPrice] ON Products (UnitPrice)
To verify that the index is created, use the following stored procedure to see a list of all indexes on the Products table:
EXEC sp_helpindex Customers

How It Works

The database takes the columns specified in a CREATE INDEX command and sorts the values into a special data structure known as a B-tree. A B-tree structure supports fast searches with a minimum amount of disk reads, allowing the database engine to quickly find the starting and stopping points for the query we are using.

Conceptually, we may think of an index as shown in the diagram below. On the left, each index entry contains the index key (UnitPrice). Each entry also includes a reference (which points) to the table rows which share that particular value and from which we can retrieve the required information.

 

Much like the index in the back of a book helps us to find keywords quickly, so the database is able to quickly narrow the number of records it must examine to a minimum by using the sorted list of UnitPrice values stored in the index. We have avoided a table scan to fetch the query results. Given this sketch of how indexes work, lets examine some of the scenarios where indexes offer a benefit

Taking Advantage of Indexes

The database engine can use indexes to boost performance in a number of different queries. Sometimes these performance improvements are dramatic. An important feature of SQL Server 2000 is a component known as the query optimizer. The query optimizer's job is to find the fastest and least resource intensive means of executing incoming queries. An important part of this job is selecting the best index or indexes to perform the task. In the following sections we will examine the types of queries with the best chance of benefiting from an index.

Searching For Records

The most obvious use for an index is in finding a record or set of records matching a WHERE clause. Indexes can aid queries looking for values inside of a range (as we demonstrated earlier), as well as queries looking for a specific value. By way of example, the following queries can all benefit from an index on UnitPrice:

DELETE FROM Products WHERE UnitPrice = 1

UPDATE Products SET Discontinued = 1 WHERE UnitPrice > 15

SELECT * FROM PRODUCTS WHERE UnitPrice BETWEEN 14 AND 16

Indexes work just as well when searching for a record in DELETE and UPDATE commands as they do for SELECT statements.

Sorting Records

When we ask for a sorted dataset, the database will try to find an index and avoid sorting the results during execution of the query. We control sorting of a dataset by specifying a field, or fields, in an ORDER BY clause, with the sort order as ASC (ascending) or DESC (descending). For example, the following query returns all products sorted by price:

SELECT * FROM Products ORDER BY UnitPrice ASC

With no index, the database will scan the Products table and sort the rows to process the query. However, the index we created on UnitPrice (IDX_UnitPrice) earlier provides the database with a presorted list of prices. The database can simply scan the index from the first entry to the last entry and retrieve the rows in sorted order.
The same index works equally well with the following query, simply by scanning the index in reverse.

SELECT * FROM Products ORDER BY UnitPrice DESC

Grouping Records

We can use a GROUP BY clause to group records and aggregate values, for example, counting the number of orders placed by a customer. To process a query with a GROUP BY clause, the database will often sort the results on the columns included in the GROUP BY. The following query counts the number of products at each price by grouping together records with the same UnitPrice value.
SELECT Count(*), UnitPrice FROM Products GROUP BY UnitPrice
The database can use the IDX_UnitPrice index to retrieve the prices in order. Since matching prices appear in consecutive index entries, the database is able count the number of products at each price quickly. Indexing a field used in a GROUP BY clause can often speed up a query.


Maintaining a Unique Column


Columns requiring unique values (such as primary key columns) must have a unique index applied. There are several methods available to create a unique index. Marking a column as a primary key will automatically create a unique index on the column. We can also create a unique index by checking the Create UNIQUE checkbox in the dialog shown earlier. The screen shot of the dialog displayed the index used to enforce the primary key of the Products table. In this case, the Create UNIQUE checkbox is disabled, since an index to enforce a primary key must be a unique index. However, creating new indexes not used to enforce primary keys will allow us to select the Create UNIQUE checkbox. We can also create a unique index using SQL with the following command:

CREATE UNIQUE INDEX IDX_ProductName On Products (ProductName)

The above SQL command will not allow any duplicate values in the ProductName column, and an index is the best tool for the database to use to enforce this rule. Each time an application adds or modifies a row in the table, the database needs to search all existing records to ensure none of values in the new data duplicate existing values. Indexes, as we should know by now, will improve this search time.


Index Drawbacks

There are tradeoffs to almost any feature in computer programming, and indexes are no exception. While indexes provide a substantial performance benefit to searches, there is also a downside to indexing. Let's talk about some of those drawbacks now.

Indexes and Disk Space

Indexes are stored on the disk, and the amount of space required will depend on the size of the table, and the number and types of columns used in the index. Disk space is generally cheap enough to trade for application performance, particularly when a database serves a large number of users. To see the space required for a table, use the sp_spaceused system stored procedure in a query window.

EXEC sp_spaceused Orders


Given a table name (Orders), the procedure will return the amount of space used by the data and all indexes associated with the table, like so:
Name    rows     reserved    data    index_size  unused      
------- -------- ----------- ------  ----------  -------
Orders  830      504 KB      160 KB  320 KB      24 KB
According to the output above, the table data uses 160 kilobytes, while the table indexes use twice as much, or 320 kilobytes. The ratio of index size to table size can vary greatly, depending on the columns, data types, and number of indexes on a table.

Indexes and Data Modification

Another downside to using an index is the performance implication on data modification statements. Any time a query modifies the data in a table (INSERT, UPDATE, or DELETE), the database needs to update all of the indexes where data has changed. As we discussed earlier, indexing can help the database during data modification statements by allowing the database to quickly locate the records to modify, however, we now caveat the discussion with the understanding that providing too many indexes to update can actually hurt the performance of data modifications.

This leads to a delicate balancing act when tuning the database for performance.
In decision support systems and data warehouses, where information is stored for reporting purposes, data remains relatively static and report generating queries outnumber data modification queries. In these types of environments, heavy indexing is commonplace in order to optimize the reports generated. In contrast, a database used for transaction processing will see many records added and updated. These types of databases will use fewer indexes to allow for higher throughput on inserts and updates.

Every application is unique, and finding the best indexes to use for a specific application usually requires some help from the optimization tools offered by many database vendors. SQL Server 2000 and Access include the Profiler and Index Tuning Wizard tools to help tweak performance.

Now we have enough information to understand why indexes are useful and where indexes are best applied. It is time now to look at the different options available when creating an index and then address some common rules of thumb to use when planning the indexes for your database.

Clustered Indexes

Earlier in the article we made an analogy between a database index and the index of a book. A book index stores words in order with a reference to the page numbers where the word is located. This type of index for a database is a nonclustered index; only the index key and a reference are stored. In contrast, a common analogy for a clustered index is a phone book. A phone book still sorts entries into alphabetical order. The difference is, once we find a name in a phone book, we have immediate access to the rest of the data for the name, such as the phone number and address.

For a clustered index, the database will sort the table's records according to the column (or columns) specified by the index. A clustered index contains all of the data for a table in the index, sorted by the index key, just like a phone book is sorted by name and contains all of the information for the person inline. The nonclustered indexes created earlier in the chapter contain only the index key and a reference to find the data, which is more like a book index. You can only create one clustered index on each table.

In the diagram below we have a search using a clustered index on the UnitPrice column of the Products table. Compare this diagram to the previous diagram with a regular index on UnitPrice. Although we are only showing three columns from the Products table, all of the columns are present and notice the rows are sorted into the order of the index, there is no reference to follow from the index back to the data.


A clustered index is the most important index you can apply to a table. If the database engine can use a clustered index during a query, the database does not need to follow references back to the rest of\ the data, as happens with a nonclustered index. The result is less work for the database, and consequently, better performance for a query using a clustered index.

To create a clustered index, simply select the Create As CLUSTERED checkbox in the dialog box we used at the beginning of the chapter. The SQL syntax for a clustered index simply adds a new keyword to the CREATE INDEX command, as shown below:

CREATE CLUSTERED INDEX IDX_SupplierID ON Products(SupplierID)

Most of the tables in the Northwind database already have a clustered index defined on a table. Since we can only have one clustered index per table, and the Products table already has a clustered index (PK_Products) on the primary key (ProductId), the above command should generate the following error:

Cannot create more than one clustered index on table 'Products'.
Drop the existing clustered index 'PK_Products' before creating another.
As a general rule of thumb, every table should have a clustered index. If you create only one index for a table, use a clustered index. Not only is a clustered index more efficient than other indexes for retrieval operations, a clustered index also helps the database efficiently manage the space required to store the table.

In SQL Server, creating a primary key constraint will automatically create a clustered index (if none exists) using the primary key column as the index key.
Sometimes it is better to use a unique nonclustered index on the primary key column, and place the clustered index on a column used by more queries. For example, if the majority of searches are for the price of a product instead of the primary key of a product, the clustered index could be more effective if used on the price field. A clustered index can also be a UNIQUE index.

A Disadvantage to Clustered Indexes

If we update a record and change the value of an indexed column in a clustered index, the database might need to move the entire row into a new position to keep the rows in sorted order. This behavior essentially turns an update query into a DELETE followed by an INSERT, with an obvious decrease in performance. A table's clustered index can often be found on the primary key or a foreign key column, because key values generally do not change once a record is inserted into the database.

Composite Indexes

A composite index is an index on two or more columns. Both clustered and nonclustered indexes can be composite indexes. Composite indexes are especially useful in two different circumstances. First, you can use a composite index to cover a query. Secondly, you can use a composite index to help match the search criteria of specific queries. We will go onto more detail and give examples of these two areas in the following sections

Covering Queries with an Index

Earlier in the article we discussed how an index, specifically a nonclustered index, contains only the key values and a reference to find the associated row of data. However, if the key value contains all of the information needed to process a query, the database never has to follow the reference and find the row; it can simply retrieve the information from the index and save processing time. This is always a benefit for clustered indexes.
As an example, consider the index we created on the Products table for UnitPrice. The database copied the values from the UnitPrice column and sorted them into an index. If we execute the following query, the database can retrieve all of the information for the query from the index itself.

SELECT UnitPrice FROM Products ORDER BY UnitPrice

We call these types of queries covered queries, because all of the columns requested in the output are contained in the index itself. A clustered index, if selected for use by the query optimizer, always covers a query, since it contains all of the data in a table.

For the following query, there are no covering indexes on the Products table.

SELECT ProductName, UnitPrice FROM Products ORDER BY UnitPrice

This is because although the database will use the index on UnitPrice to avoid sorting records, it will need to follow the reference in each index entry to find the associated row and retrieve the product name. By creating a composite index on two columns (ProductName and UnitPrice), we can cover this query with the new index.

Matching Complex Search Criteria

For another way to use composite indexes, let's take a look at the OrderDetails table of Northwind. There are two key values in the table (OrderID and ProductID); these are foreign keys, referencing the Orders and Products tables respectively. There is no column dedicated for use as a primary key; instead, the primary key is the combination of the columns OrderID and ProductID.
The primary key constraint on these columns will generate a composite index, which is unique of course. The command the database would use to create the index looks something like the following:

CREATE UNIQUE CLUSTERED INDEX PK_Order_Details
ON [Order Details] (OrderID, ProductID)

The order in which columns appear in a CREATE INDEX statement is significant. The primary sort order for this index is OrderID. When the OrderID is the same for two or more records, the database will sort this subset of records on ProductID.
The order of columns determines how useful the index is for a query. Consider the phone book sorted by last name then first name. The phone book makes it easy to find all of the listings with a last name of Smith, or all of the listings with a last name of Jones and a first name of Lisa, but it is difficult to find all listings with a first name of Gary without scanning the book page by page.
Likewise, the composite index on Order Details is useful in the following two queries:

SELECT * FROM [Order Details] WHERE OrderID = 11077

SELECT * FROM [Order Details] WHERE OrderID = 11077 AND ProductID = 13

However, the following query cannot take advantage of the index we created since ProductID is the second part of the index key, just like the first name field in a phone book.

SELECT * FROM [Order Details] WHERE ProductID = 13

In this case, ProductID is a primary key, however, so an index does exist on the ProductID column for the database to use for this query.
Suppose the following query is the most popular query executed by our application, and we decided we needed to tune the database to support it.

SELECT ProductName, UnitPrice FROM Products ORDER BY UnitPrice

We could create the following index to cover the query. Notice we have specified two columns for the index: UnitPrice and ProductName (making the index a composite index):

CREATE INDEX IX_UnitPrice_ProductName ON Products(UnitPrice, ProductName)

While covered queries can provide a performance benefit, remember there is a price to pay for each index we add to a table, and we can also never cover every query in a non-trivial application.

Additional Index Guidelines

Choosing the correct columns and types for an index is another important step in creating an effective index. In this section, we will talk about two main points, namely short index keys and selective indexes (we'll explain what selective indexes are in just a moment).

Keep Index Keys Short

The larger an index key is, the harder a database has to work to use the index. For instance, an integer key is smaller in size then a character field for holding 100 characters. In particular, keep clustered indexes as short as possible.

There are several approaches to keeping an index key short. First, try to limit the index to as few columns as possible. While composite indexes are useful and can sometimes optimize a query, they are also larger and cause more disk reads for the database. Secondly, try to choose a compact data type for an index column, based on the number of bytes required for each data type. Integer keys are small and easy for the database to compare. In contrast, strings require a character-by-character comparison.
As a rule of thumb, try to avoid using character columns in an index, particularly primary key indexes. Integer columns will always have an advantage over character fields in ability to boost the performance of a query.

Distinct Index Keys

The most effective indexes are the indexes with a small percentage of duplicated values. Think of having a phone book for a city where 75% of the population has the last name of Smith. A phone book in this area might be easier to use if the entries were sorted by the resident's first names instead. A good index will allow the database to disregard as many records as possible during a search.
An index with a high percentage of unique values is a selective index. Obviously, a unique index is the most selective index of all, because there are no duplicate values. SQL Server will track statistics for indexes and will know how selective each index is. The query optimizer utilizes these statistics when selecting the best index to use for a query.

Maintaining Indexes

In addition to creating an index, we'll need to view existing indexes, and sometimes delete or rename them. This is part of the ongoing maintenance cycle of a database as the schema changes, or even naming conventions change.

View Existing Indexes

A list of all indexes on a table is available in the dialog box we used to create an index. Click on the Selected index drop down control and scroll through the available indexes.
There is also a stored procedure named sp_helpindex. This stored procedure gives all of the indexes for a table, along with all of the relevant attributes. The only input parameter to the procedure is the name of the table, as shown below.

EXEC sp_helpindex Customers

Rename an Index

We can also rename any user created object with the sp_rename stored procedure, including indexes. The sp_rename procedure takes, at a minimum, the current name of the object and the new name for the object. For indexes, the current name must include the name of the table, a dot separator, and the name of the index, as shown below:

EXEC sp_rename 'Products.IX_UnitPrice', 'IX_Price'

This will change the name of the IX_UnitPrice index to IX_Price.

Delete an Index

It is a good idea to remove an index from the database if the index is not providing any benefit. For instance, if we know the queries in an application are no longer searching for records on a particular column, we can remove the index. Unneeded indexes only take up storage space and diminish the performance of modifications. You can remove most indexes with the Delete button on the index dialog box, which we saw earlier. The equivalent SQL command is shown below.
DROP Index Products.IX_Price

Again, we need to use the name of the table and the name of the index, with a dot separator. Some indexes are not so easy to drop, namely any index supporting a unique or primary key constraint. For example, the following command tries to drop the PK_Products index of the Products table.

DROP INDEX Products.PK_Products

Since the database uses PK_Products to enforce a primary key constraint on the Products table, the above command should produce the following error.
An explicit DROP INDEX is not allowed on index 'Products.PK_Products'.
It is being used for PRIMARY KEY constraint enforcement.

Removing a primary key constraint from a table is a redesign of the table, and requires careful thought. It makes sense to know the only way to achieve this task is to either drop the table and use a CREATE TABLE command to recreate the table without the index, or to use the ALTER TABLE command.

Sunday, August 17, 2008

Searching for Dates and Times in SQL Server ??

Before you can effectively query date/time (or temporal) data, you have to know something about how date/time values are stored. SQL Server supports two date/time data types: datetime and smalldatetime. The difference between the two is the amount of storage used. Datetime uses 8 bytes of storage, while smalldatetime uses only 4 bytes. For this reason, datetime can represent date/time values within a wider range and with more precision than smalldatetime. These differences are summarized in the table below.



TypeMinimumMaximumPrecision
datetimeJan 1, 1753
midnight
Dec 31, 9999
23:59:59.997
(0.003 seconds until midnight)
To the nearest
3.33 milliseconds
smalldatetimeJan 1, 1900
midnight
Jun 6, 2079
23:59
(1 minute until midnight)
To the nearest
minute


Both datetime and smalldatetime represent the date and time as a value that’s equal to the number of days in relationship to a base date. In SQL Server, that base date is midnight on January 1, 1900. As you can see in the table, the smalldatetime type can only represent dates from this base date on. In contrast, the datetime type can also represent dates that are before January 1, 1900. To do that, it stores those values as negative numbers.
To visualize how date/time values are stored, you can think of them as consisting of two parts. The integer portion represents the number of whole days since January 1, 1900. The fractional portion represents the fraction of a day that’s passed since midnight. For example, the date/time value representing noon on January 4, 1900 is stored as 3.5. In this case, 3 represents three full days since the base date and 0.5 represents one half of a day between midnight and noon. To see this, submit the following query:

SELECT CAST(CAST('1900-01-04 12:00' AS datetime) AS float)


Datetime and smalldatetime are like the floating-point data types, float and real, in that they’re approximate numerics. That means the value retrieved from SQL Server may be different from the value that was originally stored. For example, if you store the expression 10/3.0 in a column of data type float, you’ll retrieve a value 3.3333330000000001. Although this is a reasonable representation of ten thirds, it’s not exact since it’s rounded past the 6th digit. In fact, if you add three such values together, you get 9.9999990000000007, not 10. Of course, most programmers understand this as a rounding error. And it’s a persistent problem for all digital computers, not just those running SQL Server. Still, you need to be aware of it as you code search conditions. In contrast, when working with exact numeric data, the value retrieved from SQL Server is exactly the value that was originally stored. For example, if you store 10/3.0 in a column of data type int, it’s stored as 3 and retrieved as 3. In this case, SQL Server implicitly casts the result of the expression as a real value, 3.333333. Then, SQL Server implicitly casts 3.333333 as an integer because it’s being stored in a column of type int. Although this is still a rounding error, it occurs before the value is stored, not as a result of the physical limitations of computer storage. In other words, the error was introduced by using the wrong data type, not by the inherent limitation of the data type itself. Since the system always returns the same value as was stored, the data type is exact.
Now, to see how this affects date/time values, consider the date and time value for 8:00AM on January 4, 1900. As you saw above, noon on this day is stored as 3.5, or halfway through the fourth day. In contrast, 8:00AM is one third of the way through the day, so its representation will be approximate. To see this for yourself, submit the following query:

SELECT CAST(CAST('1900-01-04 08:00' AS datetime) AS float)

You’ll get the following result:
3.3333333333333335
But if you submit this query:

SELECT CAST(3.3333333 AS datetime), CAST(3.3333334 AS datetime)

you’ll get the following results:

1900-01-04 07:59:59.997             1900-01-04 08:00:00.003

As you can see, these three values are all quite close. In fact, they’re close enough to be considered 8:00AM for most applications. However, in a search condition based on a single value, such as:

WHERE (DTValue = '1900-01-04 08:00')

you’d only match those rows where the stored value exactly matches 3.3333333333333335.

SQL Server doesn’t provide data types for storing just the date or just the time. So if you store a date/time value without an explicit time, the fractional portion of the value is set to zero. This represents midnight as 00:00:00. Similarly, if you store a date/time value without an explicit date, the integer portion of the value is set to zero. This represents the date January 1, 1900. To see this, submit the following query:

SELECT CAST('1900-01-04' AS datetime), CAST('10:00' AS datetime)

which returns the following result:
1900-01-04 00:00:00.000              1900-01-01 10:00:00.000

Whether you can ignore the date or the time component when you query a date/time column depends on how the column has been designed and used.

Saturday, August 16, 2008

Views in SQL Server

A view is a virtual table that consists of columns from one or more tables. Though it is similar to a table, it is stored in the database. It is a query stored as an object. Hence, a view is an object that derives its data from one or more tables. These tables are referred to as base or underlying tables.

Once you have defined a view, you can reference it like any other table in a database.

A view serves as a security mechanism. This ensures that users are able to retrieve and modify only the data seen by them. Users cannot see or access the remaining data in the underlying tables. A view also serves as a mechanism to simplify query execution. Complex queries can be stored in the form as a view, and data from the view can be extracted using simple queries.

Example

Consider the Publishers table below. If you want users to see only two columns in the table, you can create a view called vwPublishers that will refer to the Publishers table and the two columns required. You can grant Permissions to users to use the view and revoke Permissions from the base Publishers table. This way, users will be able to view only the two columns referred to by the view. They will not be able to query on the Publishers table.

Publishers

Publd

PubName

City

State

Country

0736

New Moon Books

Boston

MA

USA

0877

Binnet & Hardly

Washington

DC

USA

1389

Algodata Infosystems

Berkeley

CA

USA

1622

Five Lakes Publishing

Chicago

IL

USA

VW Publishers

Publd

PubName

0736

New Moon Books

0877

Binnet & Hardly

1389

Algodata Infosystems

1622

Five Lakes Publishing

Views ensure the security of data by restricting access to the following data:

  • Specific rows of the tables.
  • Specific columns of the tables.
  • Specific rows and columns of the tables.
  • Rows fetched by using joins.
  • Statistical summary of data in a given tables.
  • Subsets of another view or a subset of views and tables.

Some common examples of views are:

  • A subset of rows or columns of a base table.
  • A union of two or more tables.
  • A join of two or more tables.
  • A statistical summary of base tables.
  • A subset of another view, or some combination of views and base table.
Creating Views

A view can be created by using the CREATE VIEW statement.

Syntax

CREATE VIEW view_name
[(column_name[,column_name]….)]
[WITH ENCRYPTION]
AS select_statement [WITH CHECK OPTION]

Where:

view_name specifies the name of the view and must follow the rules for identifiers.

column_name specifies the name of the column to be used in view. If the column_name option is not specified, then the view is created with the same columns as specified in the select_statement.

WITH ENCRYPTION encrypts the text for the view in the syscomments table.

AS specifies the actions that will be performed by the view.

select_statement specifies the SELECT Statement that defines a view. The view may use the data contained in other views and tables.

WITH CHECK OPTION forces the data modification statements to fulfill the criteria given in the SELECT statement defining the view. It also ensures that the data is visible after the modifications are made permanent.

The restrictions imposed on views are as follows:

  • A view can be created only in the current database.
  • The name of a view must follow the rules for identifiers and must not be the same as that of the base table.
  • A view can be created only if there is a SELECT permission on its base table.
  • A SELECT INTO statement cannot be used in view declaration statement.
  • A trigger or an index cannot be defined on a view.
  • The CREATE VIEW statement cannot be combined with other SQL statements in a single batch.

Example

CREATE VIEW vwCustomer
AS
SELECT CustomerId, Company Name, Phone
FROM Customers

Creates a view called vwCustomer. Note that the view is a query stored as an object. The data is derived from the columns of the base table Customers.

You use the view by querying the view like a table.

SELECT *FROM vwCUSTOMER

The output of the SELECT statement is:

CustomerId

Company Name

Phone

ALFKI

Alfreds Futterkiste

030-0074321

ANTON

Antonio Moreno Taqueria

(5)555-3932

(91 rows affected)

Getting Informations Related to Views

SQL Server stores information on the view in the following system tables:

  • SYSOBJECTS — stores the name of the view.
  • SYSCOLUMNS — stores the names of the columns defined in the view.
  • SYSDEPENDS — stores information on the view dependencies.
  • SYSCOMMENTS — stores the text of the view definition.

There are also certain system-stored procedures that help retrieve information on views. The sp_help system-stored procedure displays view-related information. It displays the view definition, provided the name of the view is given as its parameter.

Example

Sp_helptext vwCustomer

Displays the definition of the vwCustomer view.

Note

If a view is created with the WITH ENCRYPTION option, it cannot view the sp_helptext system-stored procedure.

Friday, August 15, 2008

SQL Server Stored Procedures A Small Tutorial

Stored procedures are extremely useful objects.  Not only do they store T/SQL scripts for later execution, but they also provide us with an extremely important security barrier between the user interface and the database.  The security barrier is used to prevent the users from needing SELECT, INSERT, UPDATE and/or DELETE rights directly to the database tables and views. 

This is done through what is called permissions chaining.  When a user has rights to execute a stored procedure they are given temporary rights to use the table objects within the procedures which are used by the table.
Creating stored procedures is very easy.  Take your Transact SQL code and put it below the CREATE PROCEDURE command, and end the batch.  Like all other database objects the name of the stored procedure must be unique within the schema (or owner for SQL 2000 and below).  As an example lets create a stored procedure which returns the names of all the tables in the current database.
 
CREATE PROCEDURE ShowTables AS
SELECT schema_name(schema_id), name
FROM sys.tables
GO

 
As you can see the basic syntax is very simple.  To run this stored procedure we simply run the stored procedure name.
 
exec ShowTables
 
You can add in input parameters to handle filtering, or which would need to be inserted into a table.  An input parameter is simply a variable which you set when you run the procedure.  You can access the value of the input parameter within the stored procedure as you would any other variable.  Let’s look at the same procedure but this time we want to filter the tables by the first letter.
 
CREATE PROCEDURE ShowTables
  @FilterChar NVARCHAR(2)
AS
SET @FilterChar = @FilterChar + ‘%’
SELECT schema_name(schema_id), name
FROM sys.tables
WHERE name LIKE @FilterChar
GO

 
In this example as you can see we take the input parameter, and add the % wild card, then use the variable to filter down the records to see only the records which start with the character we supply.  Running the stored procedure with an input parameter is just as easy.
 
exec ShowTables @FilterChar=N'C'
 
We can also use output parameters to get values back from the stored procedures.  Output parameters are used basically in the same way that input parameters are, however you add the OUTPUT keyword after the parameter.  Within the stored procedure simply set the output variable to the value you want it to return to the calling code.  This can be done anywhere within the stored procedure, as long as the variable still holds the value when the stored procedure has completed it’s execution the value will be returned to the calling code.  First lets look at the code to create the stored procedure.
 
 CREATE PROCEDURE ShowTables
  @FilterChar NVARCHAR(2),
  @RowCount INT OUTPUT
AS
SET @FilterChar = @FilterChar + ‘%’
SELECT schema_name(schema_id), name
FROM sys.tables
WHERE name LIKE @FilterChar
SET @RowCount = @@ROWCOUNT
GO

 
We run the stored procedure in much the same way we do with the input parameter.
 
DECLARE @RowCount INT
exec ShowTables @FilterChar=N’C', @RowCount=@RowCount OUTPUT
SELECT @RowCount
 
 
In this case we are simply returning the row count as a second record set, but you’ll get the basic idea.
As I wrote earlier you can add records to a table with the stored procedure.
 
CREATE PROCEDURE InsertTable
  @Id INT,
  @Value VARCHAR(20)
AS
INSERT INTO SomeTable
(Id, Value)
VALUE
(@Id, @Value)
GO

 
As you can see it’s a very basic method.  It’s a regular insert statement with the parameters passed to it.

Wednesday, August 13, 2008

SQL case statement : Using the CASE Statement in a SQL SELECT Clause

SQL Server provides a mechanism for returning different values in a SELECT clause based on Boolean conditions: the CASE statement. This statement resembles Visual Basics Select Case statement.

The SQL CASE statement has WHEN, THEN, and ELSE clauses along with an END terminator. The syntax is:


CASE [expression]
WHEN [value | Boolean expression] THEN [return value]
[ELSE [return value]]
END
The [expression] is optional and contains a table column or a variable. When you specify [expression] directly after the CASE, you must populate the [value] parameter in the WHEN clause:

DECLARE @TestVal int
SET @TestVal = 3

SELECT
CASE @TestVal
WHEN 1 THEN 'First'
WHEN 2 THEN 'Second'
WHEN 3 THEN 'Third'
ELSE 'Other'
END
SQL Server compares this value to the expression and when the values match, it returns the THEN clauses [return value]. If none of the WHEN clauses equates to true, SQL Server returns the [return value] in the optional ELSE clause. If the ELSE clause is omitted and no value is matched, NULL is returned.

If you dont specify [expression], you must include the [Boolean expression] in the WHEN clause. This can contain any valid Boolean expression SQL Server allows:


DECLARE @TestVal int
SET @TestVal = 5

SELECT
CASE
WHEN @TestVal <=3 THEN 'Top 3' ELSE 'Other' END

Tuesday, August 12, 2008

SQL JOINS TUTORIAL

This is just a basic overview to give you an idea the data a particular join will return to you. This is not a technical discussion - just concepts.

We will start with just an empty diagram:

The T1 circle represents all the records in table 1. The T2 circle represents all the records in table 2. Notice how there is a bit of overlap of the 2 circles in the middle. Simple right?

I will use red to signify the records that will be returned by a particular join.

INNER JOIN
An inner join only returns those records that have “matches” in both tables. So for every record returned in T1 - you will also get the record linked by the foreign key in T2. In programming logic - think in terms of AND.

OUTER JOIN
An outer join is the inverse of the inner join. It only returns those records not in T1 and T2. “Give me the records that DON’T have a match.” In programming logic - think in terms of NOT AND.

LEFT JOIN
A left join returns all the records in the “left” table (T1) whether they have a match in the right table or not.

If, however, they do have a match in the right table - give me the “matching” data from the right table as well. If not - fill in the holes with null.

It should be noted that the same thing is possible with a right join - most people just use a left one.

LEFT OUTER JOIN
A left outer join combines the ideas behind a left join and an outer join. Basically - if you use a left outer join you will get the records in the left table that DO NOT have a match in the right table.

Again it is noted that the same thing is possible with a right outer join - most people just use a left one.

Theta JOIN
A theta join is the Cartesian product of the 2 tables and not normally what people are looking for - but what they sometimes get by mistake. How many of us have written a join similar to this only to get way more then we were ever expecting.

SELECT t1.*, t2.*
FROM table1 t1, table2 t2
WHERE t1.id = 5;


So there you have the basic concepts of joins. Next time you need to use a join and have no clue what to do to get the data you need from the database, draw a picture. It may help you figure out what join to use.



SQL Interview Questions, Explanation of statements SELECT ,INSERT, LIKE etc...

How would you find out the total number of rows in a table?
Use SELECT COUNT(*) ... in query

How do you eliminate duplicate values in SELECT ?
Use SELECT DISTINCT ... in SQL query
How you insert records into a table
Using SQL INSERT statement

How do you delete record from a table ?
Using DELETE statement
Example : DELETE FROM EMP

How do you select a row using indexes?
Specify the indexed columns in the WHERE clause of query.

How do you find the maximum value in a column?
Use SELECT MAX(...) .. in query

How do you retrieve the first 5 characters of FIRSTNAME column of table EMP ?
SELECT SUBSTR(FIRSTNAME,1,5) FROM EMP

My SQL statement SELECT AVG(SALARY) FROM EMP yields inaccurate results. Why?
Because SALARY is not declared to have NULLs and the employees for whom the
salary is not known are also counted.

How do you concatenate the FIRSTNAME and LASTNAME from EMP table to give a complete name?
SELECT FIRSTNAME || ‘ ‘ || LASTNAME FROM EMP

What is UNION,UNION ALL in SQL?
UNION : eliminates duplicates
UNION ALL: retains duplicates
Both these are used to combine the results of different SELECT statements.

Suppose I have five SQL SELECT statements connected by UNION/UNION ALL, how many times
should I specify UNION to eliminate the duplicate rows?
Once.

In the WHERE clause what is BETWEEN and IN?
BETWEEN supplies a range of values while IN supplies a list of values.

Is BETWEEN inclusive of the range values specified?
Yes.

What is 'LIKE' used for in WHERE clause? What are the wildcard characters?
LIKE is used for partial string matches. ‘%’ ( for a string of any character ) 
and ‘_’ (for any single character ) are the two wild card characters.

When do you use a LIKE statement?
To do partial search e.g. to search employee by name, you need not specify 
the complete name; using LIKE, you can search for partial string matches.

Example SQL : SELECT EMPNO FROM EMP
WHERE EMPNAME LIKE 'RAMESH%'

% is used to represent remaining all characters in the name.
This query fetches all records contains RAMESH in six characters.

What do you accomplish by GROUP BY ... HAVING clause?
GROUP BY partitions the selected rows on the distinct values of the column on 
which you group by. HAVING selects GROUPs which match the criteria specified


Consider the employee table with column PROJECT nullable. How can you get a list
of employees who are not assigned to any project?
SQL  :  SELECT EMPNO 
FROM EMP
WHERE PROJECT IS null;


What are the large objects supported by oracle and db2?
Blob , Clob ( Binary Large Objects, Character Large Objects)

What's the difference between a primary key and a unique key?
Primary key wont allow nulls, unique key allow nulls.
Both Primary key and Unique key enforce the uniqueness of the column on which they are defined.

What is a join and explain different types of joins?
INNER JOIN
OUTER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
INNER JOIN

What is a self join?
Joining two instances of a same table. 
Sample SQL : SELECT A.EMPNAME , B.EMPNAME
FROM EMP A, EMP B
WHERE A.MGRID = B.EMPID

What is a transaction and ACID?
Transaction - A transaction is a logicl unint of work. All steps must be commited or rolled back.
ACID - Atomicity, Consistency, Isolation and Duralbility, these are properties of a transaction.


Materialized Query Tables in db2 ( This feature might not be available in oracle) ?
Materialized Query Tables or MQTs are also known as automatic summary 
tables. A materialized query table (MQT) is a table whose definition is based upon the result of a
query. The data that is contained in an MQT is derived from one or more tables on which the materialized
query table definition is based. MQT improve the query performance.

Sample SQL to creat MQT.

CREATE TABLE CUSTOMER_ORDER AS
(SELECT SUM(AMOUNT) AS TOTAL_SUM,
TRANS_DT,
STATUS
FROM DB2INST2.CUSTOMER_ORDER
WHERE TRANS_DT BETWEEN '1/1/2001' AND '12/31/2001'
GROUP BY TRANS_DT,
STATUS)
DATA INITIALLY DEFERRED REFRESH DEFERRED;