Tuesday, July 31, 2007

Reading Excel Data From C#.NET: Problem With Data Types

Problem:
While retrieving data from Excel using any scripting language like ASP.NET
the data is retrieved from the Excel file, the data does not retrieve the correct/exact data type. The data type for each column is determined based on the values (scanning of rows by excel driver) of the first eight or 16 rows.

The problem is due to the fact that Excel Driver scans the first eight or 16 rows, the format of the data (for each particular column) in the first eight rows should be the same for all the rows. If the data in the first eight rows is numeric or numbers and the next few rows is alphanumeric (characters and numbers) then these few rows would not be retrieved or not retrieved in the correct datatype.

Eg: Say your database field or column (say weight) that can have values like

1) 900,
2) 900kgs,
3) 900pounds.

Then, if the first rows has values in numbers "900" and the following row(s) has values in alphanumeric (say 900kgs, 900pounds etc.,) then,

Excel cannot retrieve these rows since the values are in alphanumeric and the Excel driver has already scanned the first row and identified the values might be in numeric so it will only read the nummeric data in that field.

Solution:

Connection String:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";


1) "HDR=Yes;" indicates that the first row contains columnnames, not data. "HDR=No;" indicates the opposite.

2) "IMEX=1;" tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text. Note that this option might affect excel sheet write access negative.

SQL syntax "SELECT * FROM [sheet1$]". I.e. excel worksheet name followed by a "$" and wrapped in "[" "]" brackets.

Important:

Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows".

That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance.

If the Excel workbook is protected by a password, you cannot open it for data access, even by supplying the correct password with your connection string. If you try, you receive the following error message: "Could not decrypt file."

2 comments: