C# Tutorial - Using The Built In OLEDB CSV Parser

Skill

C# Tutorial - Using The Built In OLEDB CSV Parser

Posted in:

So, a while ago, one of the other writers here wrote a small tutorial on parsing simple CSV files in C#. It mostly just showed off the string split method, and only worked on really simple CSV files - no quoted fields, etc. Well, we got a comment asking about that, so today I sat down thinking I would write up a more robust parser. But as I read through the RFC that describes CSV files, I thought to myself, am I suffering from NIH (not invented here) syndrome? Do I really need to write a full CSV parser?

And, as you might expect, the answer is no. Not only did I not need to write a parser, I found that there is one that is built into OLEDB subsystem of Windows! And it actually takes fewer lines of code to use this built in parser than it does to do the simple string split algorithm that was in the previous tutorial (and it feels a lot nicer too). I was originally planning on this being a decently long tutorial, when I thought I would write my own parser - but you are actually in for a really short and simple one today. Less for me to write, less for you to read, and more functionality to boot!

So, without further ado, I think we can jump straight into the code:

using System;
using System.Data;
using System.IO; //not used by default
using System.Data.OleDb; //not used by default

namespace CSVParserExample
{
  class CSVParser
  {
    public static DataTable ParseCSV(string path)
    {
      if (!File.Exists(path))
        return null;

      string full = Path.GetFullPath(path);
      string file = Path.GetFileName(full);
      string dir = Path.GetDirectoryName(full);

      //create the "database" connection string
      string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
        + "Data Source=\"" + dir + "\\\";"
        + "Extended Properties=\"text;HDR=No;FMT=Delimited\"";

      //create the database query
      string query = "SELECT * FROM " + file;

      //create a DataTable to hold the query results
      DataTable dTable = new DataTable();

      //create an OleDbDataAdapter to execute the query
      OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString);

      try
      {
        //fill the DataTable
        dAdapter.Fill(dTable);
      }
      catch (InvalidOperationException /*e*/)
      { }

      dAdapter.Dispose();

      return dTable;
    }
  }
}

So, first off, you will need to add the namespaces System.IO and System.Data.OleDb. The first we need because we will be doing some path manipulation, and the second gives us access to what we will need to do CSV parsing. I've created a nice static function here that takes a path to a CSV file and returns the contents of the file in a DataTable (which is really easy to view and manipulate).

The weird thing about all of this is that the CSV file gets treated as a database table. We need to create a connection string with the Jet OLEDB provider, and you set the Data Source to be the directory that contains the CSV file. Under extended properties, 'text' means that we are parsing a text file (as opposed to, say, an Excel file), the HDR property can be set to 'Yes' (the first row in the CSV files is header information) or 'No' (there is no header row), and setting the FMT property set to 'Delimited' essentially says that we will be working with a comma separated value file. You can also set FMT to 'FixedLength', for files where the fields are fixed length - but that wouldn't be a CSV file anymore, would it?

The next part to do is create the actual query. In this case, we want everything, so we have a "SELECT *". What are we selecting from? Well, in this somewhat twisted worldview, the directory is the database, so the file is the table we are selecting from.

Now we are into normal OLEDB territory - we create a DataTable that we will be filling with results, and we create a OleDbDataAdapter to actually execute the query. Then (inside of a try block, because it can throw an exception) we fill the data table with the results of the query. Afterwords, we clean up after ourselves by disposing the OleDbDataAdapter, and we return the now filled data table.

And using the now filled data table is extremely simple - we actually talk about it here and here.

And there you go! I'm not sure how parsing CSV could be much easier. If you would like the Visual Studio project I used to test all this, you can grab it here. And, as always, if you have any questions or comments, leave them below.

Andy
07/10/2008 - 14:27

Great stuff! I would never have thought of that...

reply

Tim Drew
07/21/2008 - 07:51

When I try the above code I receive an exception "Syntax error in FROM clause."
Excel can read the file with no questions or errors.
The variable 'query' is set as follows "SELECT * FROM hba-info.csv". The exception occurs when executing dAdapter.Fill(dTable);

reply

PochoCL
10/04/2008 - 12:23

if you have “Syntax error in FROM clause.”...
is beacause u must to clear the whitespaces in the filename.

reply

Slicc
10/06/2008 - 02:07

Also, putting sqaure brackets around the file name in the FROM clause of the select string may fix the Syntax error in FROM clause problem.

reply

Anonymous
02/23/2009 - 05:11

This worked for me thanks.

reply

DigitalDan3
10/14/2008 - 13:14

Thanks for the great article. To answer Tim Drews question and I came up against the same thing. Pulled my hair out for 3 hours. If the FileName contains any "-" you need to Square Bracket the file name.

“SELECT * FROM [hba-info.csv]”

reply

Brian
01/12/2010 - 16:20

Thanks, DigitalDan3!

I was struggling with this.

reply

Tina
07/29/2008 - 03:44

Nice article.

I have found that there can be issues with some foreign characters example the file contained äüö and then the data table held äüö

But very handy for file with non foreign characters.

Thank you.

reply

Mike Bowman
07/31/2008 - 07:33

Is there a generalization that allows reading from delimited files that use a separator other than the comma?

reply

Mike449
07/31/2008 - 08:25

Is there any provision for using a delimiter other than a comma?

reply

Dave Whiteford
08/19/2008 - 09:55

FMT=Delimited(,)

Substitute that in, in place of the existing parameter, and change the comma to your delimiter.

reply

brawtaman
02/07/2010 - 11:49

Couldn't get this to work.

reply

Dave Whiteford
08/19/2008 - 09:50

Great writeup, that connection string has saved me a lot of time migrating a legacy system record file which I managed to get to CSV format.

reply

Karl Rose
08/20/2008 - 14:32

getting a oleDBException with Cannot update, database or object is read-only.

on line Adapter.Fill()
--------------------
try
{
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (InvalidOperationException /*e*/)
{ }

--------------------
i only added the headers = yes on the connection string. the file on disk is read/write and not open.

any idea what i could be doing wrong. I had this working before.
the dat file is from a recently opened file and then resaved with a few lines ripped off and resaved as a dat file so the parser will work well. could the adapter think the file is still open for writing.??

reply

Reo2
08/28/2008 - 13:05

This is a great routine! Thanks for posting this gem.

reply

Mehul Barot
09/18/2008 - 01:29

No doubt this is great, but i have face one problem with this. In my csv file i had decimal data type, after parsing csv using OLEDBConnection I checked the data and i came to know that my data is converted to integer.

Is there any solution for it???

reply

Thuta
09/19/2008 - 02:56

Hi Mehul,

If you want to solve this problem, you may need to create a schema(.ini) file to define your data types in file.

Please refer to following links:
http://doc.ddart.net/mssql/sql2000/html/mdacxml/htm/odbcjetschema_ini_file.htm

http://users.drew.edu/skass/sql/TextDriver.htm

reply

Leifsa
11/28/2008 - 06:38

In my case the filename where too long (>70 characters) so the brackets didn't help. I renamed the file to Guid.NewGuid().ToString() .tmp and it solved my problem.

reply

jainabhishek8
06/25/2009 - 23:36

getting a oleDBException with Cannot update, database or object is read-only.

on line Adapter.Fill()
--------------------
try
{
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (InvalidOperationException /*e*/)
{ }

--------------------

reply

Anonymous
02/04/2010 - 09:08

This may be your issue:
http://support.microsoft.com/kb/245407

reply

Furism
07/16/2009 - 10:29

Just chiming in to say that this is a great article, but I've also got an error message for this piece:

 try
            {
                //fill the DataTable
                dAdapter.Fill(dTable);
            }

The error is:

"Could not find installable ISAM."

What gives?

reply

Tim
11/07/2009 - 16:31

I am trying to add a where clause to this algorithm.
When my query is "SELECT * FROM [hospitalData.csv] WHERE zipcode <= 80010"
When I run this I also get the "Syntax error in FROM clause."
The exception occurs when executing dAdapter.Fill(dTable);

Can anyone spot the problem? Is having a WHERE clause possible when using this technique?

reply

Anonymous
01/29/2010 - 03:59

I tried this code:

If (Not file.Exists(filename)) Then
      Return
    End If

    Dim full As String = System.IO.Path.GetFullPath(filename)
    Dim sfile As String = System.IO.Path.GetFileName(full)
    Dim dir As String = System.IO.Path.GetDirectoryName(full)

    'create the "database" connection string
    Dim connString As String = "Provider=Microsoft.Jet.OLEDB.4.0;" _
      & "Data Source=""" + dir + "\\"";" _
      & "Extended Properties=""text;HDR=No;FMT=Delimited(,)"""

    'create the database query
    Dim query As String = "SELECT * FROM " + sfile

    'create a DataTable to hold the query results
    Dim dTable As DataTable = New DataTable()

    'create an OleDbDataAdapter to execute the query
    Dim dAdapter As OleDbDataAdapter = New OleDbDataAdapter(query, connString)

    Try
      'fill the DataTable
      dAdapter.Fill(dTable)
    Catch ex As InvalidOperationException
      dAdapter.Dispose()
    End Try

But i'm having a problem with this file :

CategoryId, CategoryName
1,Cat1
2,Cat2
3,Cat3

After running the code (VB version), my datatable only contains one column, and each datarow contains a field with the entire text line

Is there a way to use the first line as column name definition?

reply

Add Comment

Put code snippets inside language tags:
[language] [/language]

Examples:
[javascript] [/javascript]
[actionscript] [/actionscript]
[csharp] [/csharp]

See here for supported languages.

Javascript must be enabled to submit anonymous comments - or you can login.

Sponsors