So, a while ago, one of the other writers here wrote a small tutorial on parsing simple CSV files in C#. It mostly just showed off the string split method, and only worked on really simple CSV files - no quoted fields, etc. Well, we got a comment asking about that, so today I sat down thinking I would write up a more robust parser. But as I read through the RFC that describes CSV files, I thought to myself, am I suffering from NIH (not invented here) syndrome? Do I really need to write a full CSV parser?
And, as you might expect, the answer is no. Not only did I not need to write a parser, I found that there is one that is built into OLEDB subsystem of Windows! And it actually takes fewer lines of code to use this built in parser than it does to do the simple string split algorithm that was in the previous tutorial (and it feels a lot nicer too). I was originally planning on this being a decently long tutorial, when I thought I would write my own parser - but you are actually in for a really short and simple one today. Less for me to write, less for you to read, and more functionality to boot!
So, without further ado, I think we can jump straight into the code:
using System.Data;
using System.IO; //not used by default
using System.Data.OleDb; //not used by default
namespace CSVParserExample
{
class CSVParser
{
public static DataTable ParseCSV(string path)
{
if (!File.Exists(path))
return null;
string full = Path.GetFullPath(path);
string file = Path.GetFileName(full);
string dir = Path.GetDirectoryName(full);
//create the "database" connection string
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM " + file;
//create a DataTable to hold the query results
DataTable dTable = new DataTable();
//create an OleDbDataAdapter to execute the query
OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString);
try
{
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (InvalidOperationException /*e*/)
{ }
dAdapter.Dispose();
return dTable;
}
}
}
So, first off, you will need to add the namespaces System.IO and System.Data.OleDb. The first we need because we will be doing some path manipulation, and the second gives us access to what we will need to do CSV parsing. I've created a nice static function here that takes a path to a CSV file and returns the contents of the file in a DataTable (which is really easy to view and manipulate).
The weird thing about all of this is that the CSV file gets treated as a database table. We need to create a connection string with the Jet OLEDB provider, and you set the Data Source to be the directory that contains the CSV file. Under extended properties, 'text' means that we are parsing a text file (as opposed to, say, an Excel file), the HDR property can be set to 'Yes' (the first row in the CSV files is header information) or 'No' (there is no header row), and setting the FMT property set to 'Delimited' essentially says that we will be working with a comma separated value file. You can also set FMT to 'FixedLength', for files where the fields are fixed length - but that wouldn't be a CSV file anymore, would it?
The next part to do is create the actual query. In this case, we want everything, so we have a "SELECT *". What are we selecting from? Well, in this somewhat twisted worldview, the directory is the database, so the file is the table we are selecting from.
Now we are into normal OLEDB territory - we create a DataTable that we will be filling with results, and we create a OleDbDataAdapter to actually execute the query. Then (inside of a try block, because it can throw an exception) we fill the data table with the results of the query. Afterwords, we clean up after ourselves by disposing the OleDbDataAdapter, and we return the now filled data table.
And using the now filled data table is extremely simple - we actually talk about it here and here.
And there you go! I'm not sure how parsing CSV could be much easier. If you would like the Visual Studio project I used to test all this, you can grab it here. And, as always, if you have any questions or comments, leave them below.
07/10/2008 - 14:27
Great stuff! I would never have thought of that...
07/21/2008 - 07:51
When I try the above code I receive an exception "Syntax error in FROM clause."
Excel can read the file with no questions or errors.
The variable 'query' is set as follows "SELECT * FROM hba-info.csv". The exception occurs when executing dAdapter.Fill(dTable);
10/04/2008 - 12:23
if you have “Syntax error in FROM clause.”...
is beacause u must to clear the whitespaces in the filename.
10/06/2008 - 02:07
Also, putting sqaure brackets around the file name in the FROM clause of the select string may fix the Syntax error in FROM clause problem.
02/23/2009 - 05:11
This worked for me thanks.
10/14/2008 - 13:14
Thanks for the great article. To answer Tim Drews question and I came up against the same thing. Pulled my hair out for 3 hours. If the FileName contains any "-" you need to Square Bracket the file name.
“SELECT * FROM [hba-info.csv]”
01/12/2010 - 16:20
Thanks, DigitalDan3!
I was struggling with this.
07/29/2008 - 03:44
Nice article.
I have found that there can be issues with some foreign characters example the file contained äüö and then the data table held äüö
But very handy for file with non foreign characters.
Thank you.
11/09/2010 - 00:59
What to do If file has foreign characters?
07/31/2008 - 07:33
Is there a generalization that allows reading from delimited files that use a separator other than the comma?
07/31/2008 - 08:25
Is there any provision for using a delimiter other than a comma?
08/19/2008 - 09:55
FMT=Delimited(,)
Substitute that in, in place of the existing parameter, and change the comma to your delimiter.
02/07/2010 - 11:49
Couldn't get this to work.
08/19/2008 - 09:50
Great writeup, that connection string has saved me a lot of time migrating a legacy system record file which I managed to get to CSV format.
08/20/2008 - 14:32
getting a oleDBException with Cannot update, database or object is read-only.
on line Adapter.Fill()
--------------------
try
{
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (InvalidOperationException /*e*/)
{ }
--------------------
i only added the headers = yes on the connection string. the file on disk is read/write and not open.
any idea what i could be doing wrong. I had this working before.
the dat file is from a recently opened file and then resaved with a few lines ripped off and resaved as a dat file so the parser will work well. could the adapter think the file is still open for writing.??
08/28/2008 - 13:05
This is a great routine! Thanks for posting this gem.
09/18/2008 - 01:29
No doubt this is great, but i have face one problem with this. In my csv file i had decimal data type, after parsing csv using OLEDBConnection I checked the data and i came to know that my data is converted to integer.
Is there any solution for it???
09/19/2008 - 02:56
Hi Mehul,
If you want to solve this problem, you may need to create a schema(.ini) file to define your data types in file.
Please refer to following links:
http://doc.ddart.net/mssql/sql2000/html/mdacxml/htm/odbcjetschema_ini_file.htm
http://users.drew.edu/skass/sql/TextDriver.htm
11/28/2008 - 06:38
In my case the filename where too long (>70 characters) so the brackets didn't help. I renamed the file to Guid.NewGuid().ToString() .tmp and it solved my problem.
06/25/2009 - 23:36
getting a oleDBException with Cannot update, database or object is read-only.
on line Adapter.Fill()
--------------------
try
{
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (InvalidOperationException /*e*/)
{ }
--------------------
02/04/2010 - 09:08
This may be your issue:
http://support.microsoft.com/kb/245407
07/16/2009 - 10:29
Just chiming in to say that this is a great article, but I've also got an error message for this piece:
{
//fill the DataTable
dAdapter.Fill(dTable);
}
The error is:
"Could not find installable ISAM."
What gives?
04/27/2010 - 15:26
I got that error too. ~Matt
11/07/2009 - 16:31
I am trying to add a where clause to this algorithm.
When my query is "SELECT * FROM [hospitalData.csv] WHERE zipcode <= 80010"
When I run this I also get the "Syntax error in FROM clause."
The exception occurs when executing dAdapter.Fill(dTable);
Can anyone spot the problem? Is having a WHERE clause possible when using this technique?
01/29/2010 - 03:59
I tried this code:
Return
End If
Dim full As String = System.IO.Path.GetFullPath(filename)
Dim sfile As String = System.IO.Path.GetFileName(full)
Dim dir As String = System.IO.Path.GetDirectoryName(full)
'create the "database" connection string
Dim connString As String = "Provider=Microsoft.Jet.OLEDB.4.0;" _
& "Data Source=""" + dir + "\\"";" _
& "Extended Properties=""text;HDR=No;FMT=Delimited(,)"""
'create the database query
Dim query As String = "SELECT * FROM " + sfile
'create a DataTable to hold the query results
Dim dTable As DataTable = New DataTable()
'create an OleDbDataAdapter to execute the query
Dim dAdapter As OleDbDataAdapter = New OleDbDataAdapter(query, connString)
Try
'fill the DataTable
dAdapter.Fill(dTable)
Catch ex As InvalidOperationException
dAdapter.Dispose()
End Try
But i'm having a problem with this file :
CategoryId, CategoryName
1,Cat1
2,Cat2
3,Cat3
After running the code (VB version), my datatable only contains one column, and each datarow contains a field with the entire text line
Is there a way to use the first line as column name definition?
03/21/2010 - 09:29
If you want to build a parser that will deal with quotes with commas in, just split the whole line by quotes first, then replace the commas in that field with a tilda or something, join the first split back up into a line, the split the line by comma and then replace the tilda....simples ! 10 mins and job done.
10/02/2010 - 14:38
Er, no, because you'll replace the separators with a tilde too.
03/21/2010 - 12:16
//Line is string from a StreamReader
string TmpLine = "";
string NewLine = "";
bool swap = false;
int first = 0;
int last = 0;
for (n = 0; n < Line.Length; n++)
{
if (Line.Substring(n, 1) == "\"" && swap == true)
{
swap = false;
last = n; //next instance
TmpLine = Line.Substring(first, (last - first) + 1);//get the quoted section
NewLine = TmpLine.Replace(",", "~");
Line = Line.Replace(TmpLine, NewLine);
n++; //skip and extra 1
}
if (Line.Substring(n, 1) == "\"" && swap == false)
{
swap = true;
first = n; //first instance
}
}
//Line is now a as per orginal CSV but with text,text,"text~text" instead of text,text,"text,text" .... now just use string.split, to finish off.
You can change the "~" tilda later.
04/13/2010 - 03:47
when i run your source i see that everything is filled in one column - using the dAdapter.Fill(dTable).
is the fill method not able to set in the values in their respective columns?
04/13/2010 - 04:01
hah,
Got it, just added/specified a delimiter in your code :P
thanks for this nifty trick!
07/05/2010 - 23:31
Thanks for the article.
Having a problem trying to read a list of reference numbers from a CSV file. When the 'number' contains an alpha the data is not being read so there's no way to notify the user about bad data. It works fine if the data is in an Excel spreadsheet.
Sample Data:
ITEM_REF_NO
905893
1000O
913798
Second record contains an alpha O at the end of the number but this data does not appear in the datatable.
My code is C# and is copied from the sample.
string full = Path.GetFullPath(strFileName);
string file = Path.GetFileName(full);
string dir = Path.GetDirectoryName(full);
//create the "database" connection string
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM [" + file + "]";
DataTable dt = new DataTable();
OleDbDataAdapter adpCSV = new OleDbDataAdapter(query, connString);
try
{
adpCSV.Fill(dt);
// view the datatable here and it contains 3 rows but the second row is empty
Any ideas how to get this working?
01/03/2011 - 11:55
Hey Cag,
I am having a similar issue. Although, I am reading zip codes.
for example:
ZIP:
77707
31820-3911
43062
12508-2000
33122
When I read that data using this parser it outpouts to a grid and the grid shows everying except record 2 and 4 with the dash lines, it is like the parser just makes it disapear when I output to datatable.
When I run the parser on other files with similar data the dash line and the rest of the zip code shows up. So, I am wondering if this is a whitespace issue or something. I am clueless, it works sometimes and other times it doesn't.
If anyone can help me and cag, Id appreciate it.
01/03/2011 - 12:01
correction: record 2 and 4 show but they are just empty on the grid.
07/09/2010 - 12:10
Julian - Somewhat newby to C#...Your two posts were absolutely perfect.
We don't have the luxury of changing our input/config file's layout (otherwise everything would be tilde- or pipe-delimited for this very reason.
I read in a CSV potentially having commas in quotes as literal values. I take that string input to build a class, then add it to a List collection of those classes. Yes, the constructor overrides allowing for either a string or array input...regardless, you shouldn't have to use a data-adapter to fill a data-table, especially if you class it out for use in other areas of the application - for mere display? sure - for classing out the type (and the props are dynamic) this is absolutely perfect...well done
08/04/2010 - 11:50
is it possible to use this to add an additional column with a specified value for each of the rows?
09/24/2010 - 02:04
Thanks Buddy! Working From Reviews .Good Code!
11/09/2010 - 00:27
What if Forign charecters are there. They are not uploading properly.
01/09/2011 - 15:16
First thanks for this tutorial!!!
If you get the error "Der 'Microsoft.Jet.OLEDB.4.0'-Provider ist nicht auf dem lokalen Computer registriert." or "The 'Microsoft.Jet.OLEDB.4.0' provider is not registered on the local machine." you probably try to to compile it for 64 bit cpu target. The error is because there is not a 64 bit version of jet. I know because I had the same problem. To force your app to use the 32 bit change the target cpu to x86 in the advanced compiler options:
Project => Project Properties => Build => Platform target =>x64
01/16/2011 - 13:06
I still have a problem with the OleDbDataAdapter
It always seems to use the first line of the file to determine how many columns there are, then will only read that number for the rest of the file.
I have a seperated file where the line lengths are different - fewer columns on the first line than the rest of the file
02/17/2011 - 15:38
Does this parser work for files delimited in other ways? Tab, for example?
05/05/2011 - 03:12
Nope it doesn't which makes it pretty unsolid. The delimiter of the OLEDB driver is used, and it can only be changed within this driver, so there is no simple way to change the delimiter from "," to ";" or "". Unfortunately the RFC4180 doesn't specify the "," as Separator. So per example in European countries the decimal separator is "," and not "." like in US. So in European countries ";" is used as csv delimiter most likely.
The delimiter can be specified in the registry at the following location:
HKEY_LOCAL_MACHINE \ SOFTWARE \ Microsoft \ Jet \ 4.0 \ Engines \ Text
"Format" = "TabDelimited"
or
"Format" = "Delimited(;)"
Nice approach with this OLEDB connection, but I guess this issue is hard to handle with it.
By the way, if someone got a solution for this, please let me know! I like this database query thingy. ;)
04/12/2011 - 12:06
SELECT AS does not work. IE> can't rename the columns
05/20/2011 - 02:27
Great article, thanks.
How would you go about modifying this code to work with just a simple string holding the CSV data, as opposed to an actual file?
07/04/2011 - 04:33
Hi,
I tried using the code but it won't delimit.
My delimiter is an equal sign (=), and i tried to put it insted of the standard delimiter, but it won't work...
code: ..... +"Extended Properties=\"text;HDR=No;FMT=Delimited(=)\"";
What am I doing wrong?
Thanks,
07/19/2011 - 09:55
This is cool, my only problem is my column names show up as F0,F1,F2 etc..Any idea how to get the column names? They're defined in my first row
07/19/2011 - 10:00
Please ignore.. found it myself.. had to set HDR=Yes in the extended props.. Sorry! Noob in the building :)
08/05/2011 - 14:08
Any idea how to use WHERE clause in SQL statement.
For Example, if I have a date column in csv and I want only records from certain dates I tried this condition and it does not work
string query = "SELECT * FROM " + file + " WHERE [Date Started]>='03/18/2011';
It gives me an exception when attemption dAdapter.Fill
08/05/2011 - 14:10
string query = "SELECT * FROM" + file + "WHERE [Date Started] >= '03/18/2011'"
01/12/2012 - 19:38
Maybe a space before and after the file variable?
08/21/2011 - 02:41
Hi
i am using the above code for getting csv file to datset. BUT In that dataset, the firstrow, firstcolumn is concatinated with the an unknowncode i.e. is "".
how to remove the uncoded for fistcolumn in the datset.
can you plz help.
12/28/2011 - 23:22
I had the same problem and discovered you need to specify a CharacterSet in the Extended Properties. The value can be ANSI, UNICODE or a numeric code page (e.g. 65001 for UTF-8):
"Extended Properties=\"text;CharacterSet=65001;HDR=No;FMT=Delimited\""
10/21/2011 - 05:59
Why the date column title get empty for this code?
I have a file with the title as "Modified date" and that have a content like "10-06-11 9:44"
as I am look in to my datatable after parse function the title is empty just like "".
any idea.?
thanks
Chetan
11/23/2011 - 15:44
This has been a very educational code set. As I need to create a pre-processor to read the CSV file and manipulate some fields of data and write back out to a TXT file. When I run this code and substitute in my CSV file or use your sample CSV file I get no errors but the Form displays empty. Any thoughts on what I should be looking at?
Thanks
12/06/2011 - 20:36
@ gbroche
I am having the same issue.
Do you know what might be wrong with it?
Thank you.
01/19/2012 - 01:29
I used above code to read data from Excel it runs fine. But I have invalid data in file like I have typed dataset and I wrote double to string type data in some columns(in Excel file) but no Exception generated after reading. I need to generate Exception. Can any one suggest.
Add Comment
[language] [/language]
Examples:
[javascript] [/javascript]
[actionscript] [/actionscript]
[csharp] [/csharp]
See here for supported languages.
Javascript must be enabled to submit anonymous comments - or you can login.