Building a Simple CSV Parser in C#

Skill

Building a Simple CSV Parser in C#

Posted in:

Edit: 2011/04/28
As several people have pointed out, the code in tutorial does not create a true CSV parser. We did, however, publish a followup tutorial that is a fully functional CSV parser.

And here it is:
C# Tutorial - Using The Built In OLEDB CSV Parser

So you are sitting around and you somehow have 100 Comma Separated Value files (CSV) and you are not quite sure exactly what the best way to read them is. Well, if you are using Visual Studio and C#, you are in quite a bit of luck, because you can read a CSV file quite easily. With one very small function you can spit out a list of values, separated conveniently by rows and columns. Then you can take this list and use it however you want, perhaps in a DataTable or GridView object.

The parser we are going to build today is going to be extremely simple, and will in fact break on more complicated CSV files (files that have commas actually in the data, etc...). But for most CSV files, this will work fine - and look for a tutorial in the near future about building a parser that can easily deal with even the most convoluted of CSV files.

To start off, you need to open up Visual Studio and start a new C# application project, so go ahead and do that, naming the project whatever you want. Once your project is up and ready, you need to find a place to build and call your parser function. If you right click on your Form1.cs, then go to 'View Code' you will get your form1's code. Inside the main Form1 : Form class, under your public Form1() definition is the perfect place for your function for now. Later on you can move it to somewhere more permanent, but for now we will get the function working.

Sadly, one of the namespaces we will be using is not declared by default in the standard 'using' statements at the top of our file. But all you have to do is add it below all the others:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO; //System.IO is not used by default

Now we can tear up some serious code. The first step is to declare our parser function. It will look something like:

public List<string[]> parseCSV(string path)
{
}

This is a pretty simple function, which will take in a string that represents the path of the CSV file and spit out a List of string arrays. Now you may be asking why not just use a string array of string arrays (string[][])? Well, adding elements to an array is not exactly efficient, but a list can be added to, subtracted from, and is just generally a lot more flexible.

The next thing we need to do is declare our return variable, which is really just one line. So inside the function, as the first line, we have:

List<string[]> parsedData = new List<string[]>();

This is just declaring a List of string arrays that will hold our file information as we read each line. Now the cool thing is that the System.IO namespace has this neat class called StreamReader, which can open a text based file and read it line by line. This gives us the advantage of just calling a method that reads the file line by line rather than byte by byte. StreamReader is extremely handy for reading text files and is a perfect candidate for us in this case.

We are gong to declare our StreamReader with a using statement so it will be disposed once it leaves scope, and when it is disposed it will be closed automatically. So declaring the new StreamReader object will look something like:

using (StreamReader readFile = new StreamReader(path))
{
}

Take notice that the actual declaration is inside the using statement. Inside this statement we will be doing everything involving reading the file and building our list of string arrays. Now all a comma separated values file is is exactly what you would think - a file full of values separated by commas. Each line really corresponds to a row of data, so all we have to do is read the file line by line, then separate the values. Since the StreamReader Class can read a file line by line, all we really need to do is take the line and split it. But first we have to declare some variables, inside the using block of course.

We will need two variables, one to hold the line as it is read and an array to hold the separated values. We will call these line and row:

using (StreamReader readFile = new StreamReader(path))
{
  string line;
  string[] row;

Next we have to read the file line by line, which can be done with a very simple while statement. We will be reading the file until the current line is null, which will be the case when there are no more lines to read. To do this we set our line variable to our currently read line, then when the line is null, stop reading the file. It will look like so:

while ((line = readFile.ReadLine()) != null)
{
}

A very simple while loop that runs through the file until you reach a line that is empty. *Take note that a line is not null if it is space, newline, or the like. A line is only null if there is truly nothing there.* Inside this loop, all we need to do is split each line at the commas, then add the resulting array to our list. Luckily there are many times you need to split a string, so the basic string class has a method to do just this. After we split the line, adding it to the list is just as simple. We call list.add(). So with two lines of code we can do what we need to. After our additions our while statement will look like:

while ((line = readFile.ReadLine()) != null)
{
  row = line.Split(',');
  parsedData.Add(row);
}

Simple yet effective. So simple in fact, that you really don't have to read in just comma separated files, but any file separated by a standard character can be read. all you have to do is change the split() call to whatever character is splitting the file. As mentioned above, the first line sets our row variable to the values of our split string, and the second line adds that string array to our list. Not difficult to understand at all.

The end of the while loop actually means the end of our using block as well. After the using block we have a completely filled list of string arrays, which represent rows of data in our CSV file. All we need to do now is put the whole thing in a Try-Catch block, which will catch any errors we may get when attempting to open or read the file.

We don't actually need anything fancy, in fact we will just catch any exception we get in the using block (since there are a bunch of different kinds that could be thrown). So our final function will look something like:

public List<string[]> parseCSV(string path)
{
  List<string[]> parsedData = new List<string[]>();

  try
  {
    using (StreamReader readFile = new StreamReader(path))
    {
      string line;
      string[] row;

      while ((line = readFile.ReadLine()) != null)
      {
        row = line.Split(',');
        parsedData.Add(row);
      }
    }
  }
  catch (Exception e)
  {
    MessageBox.Show(e.Message);
  }

  return parsedData;
}

Notice that we just take the message from the exception and display it with the standard MessageBox class. This will work, and our function will actually just return an empty list on any error, which means that our code actually doesn't break, we just don't get any data. So after we return our parseData list, whether filled or not, our function ends.

A small function that is easy to understand and even easier to build. Even better, since it returns a basic list object, you can use the data returned to do anything from fill a grid to making complex calculations. You can also make this function read any type of separated file, just change the separator in the split statement.

DataGrid populated with CSV data

Using our new function to fill a DataGrid

If you would like a full Visual Studio Solution for this tutorial, one can be found here. I hope this tutorial was informative and most of all useful, and I'll be back soon with a tutorial on the more complicated version of this parser (the one that can deal with complex CSV files). Just remember that when you need coding help, Switch on the Code.

Sam
05/26/2008 - 00:04

Very cool! Easy to follow and useful. thanks

reply

Anonymous
02/17/2011 - 08:48

http://oscar.angress.de/2011/02/17/csv-converter/

reply

Zohar
06/17/2008 - 01:26

Thanks , finally I found what i look for !

reply

Andy
07/09/2008 - 03:07

But what if one of the fields has a comma in it? *bug*

reply

The Tallest
07/09/2008 - 06:15

In the article itself, it says that this is just a simple CSV parser:

The parser we are going to build today is going to be extremely simple, and will in fact break on more complicated CSV files (files that have commas actually in the data, etc...).

reply

baxxtab
11/08/2010 - 12:47

Then it's not a csv parser. It's only something that splits lines based on a character. If I build an xml parser that only reads nodes and not attributes, then I can't call it an xml parser can I?

Sorry man, you're only halfway there. Calling something incomplete 'complete' doesn't make it so.

Try renaming your page to "comma-parser" or something.

reply

Anonymous
11/16/2010 - 09:42

CSV stands for "Comma Seperated Values" ... idiot! What a moronic comment from somebody try to sound smart. Haha

reply

Anonymous
02/15/2011 - 13:02

Nope. CSV is a well-known format that allows commas in data. If this doesn't handle that, it's not a CSV reader.

reply

Anonymous
04/28/2011 - 15:09

A CSV parser also can't use readLine to read the file. Values may contain newlines (surrounded by double quotations). FAIL!

reply

Derelikt
05/12/2011 - 12:41

Just to clear all of this mess up, CSV does = comma separated values, and a simple CSV reader (which this is) would be limited yet effective for files that you control.

BTW everything starts out simple, it's up to you to add the functionality to make it more complex. stop criticizing and let people learn the simple approach so they to can become better programmers. A.K.A. Stop being a cry baby and shut the heck up! =P

reply

Anonymous
07/18/2011 - 20:45

This code looks good... I read the comments it is not good to comment like this.. I did not see any CSV with quotes...

reply

Shreyans
07/11/2008 - 11:31

Can you please tell me how do I refer CSV file with indirect link.

We are using different CSV files. So instead of having hard code can we refer csv through the other directry or other files like xml.

Thanks in advance.

reply

MD Philip
08/06/2008 - 09:24

Every one could have done what you have done here. If people are searchig for a csv parser it is just to find the one that can address some complicated csv file with commas in the data, etc.

But thanks for your post.

reply

The Tallest
08/06/2008 - 09:27

For those with more complex CSV parsing needs, check out this post: Using The Built In OLEDB CSV Parser

reply

Anonymous
10/04/2011 - 19:42

with OLEDB we lose the leading zeros

reply

Anonymous
02/08/2012 - 06:57

so, by the comments above, the OLE DB CSV Parser is NOT a CSV Parser, because it does have bugs... :P

When people are stupid, they just are.

reply

Shweta
09/07/2008 - 05:35

Does anybody have done work on complicated CSV parsers. If yaa, please tell me.

reply

Shweta
09/07/2008 - 05:39

What happen , if there is commas and spaces in between ?

reply

The Reddest
09/07/2008 - 10:17

Shweta, check out the Using The Built In OLEDB CSV Parser tutorial.

reply

Anonymous
09/06/2009 - 22:01

I have built one, for the code go to
http://programmersfirststep.orgfree.com/CommaSeperatedValue.cs

reply

Anonymous (michael)
09/06/2009 - 22:04

my post above,is not a tutorial, but it is the source code, so i assume that you can pretty much make out the information (it does not use the split method, or that other tutorial listed by the author, it is a csv parser that only has one current flaw.... when the line ends and is in a quote it doesnt continue it as a token, but im working on that

reply

Katski11
09/30/2008 - 00:24

This was very informative especially to the ones just starting to learn parsing a csv file...

keep up the good work!

More Power...

reply

Mohammed Wafy
12/04/2008 - 09:21

very good and helpfull tutorial
i just started to parse CSV files and this will help me more .

thanks

reply

GUIOOS
08/06/2009 - 15:44

How convert CSV to XML file using ASP.NET and VISUALSTUDIO 2008
1. Load my file in a folder in my WebSite
2. Build the XML file (from CSV file, of course)!!!
Thanks

reply

The Reddest
08/06/2009 - 16:28

1. Load the CSV into a DataTable using this tutorial.

2. Serialize the DataTable to XML using .NET's built in XML serialization.

reply

The Blackest
09/14/2009 - 21:37

Rich,

Nice work and I like it all except I think you should be using a HashTable in preference of List<T> - ArrayList and strings are so limiting... String.Contains / StringComparison - too close to regex that it is.. well lacking in the accuracy!

:-)

reply

Jonathan
01/28/2010 - 01:14

Simple and effective. But, if you're csv data has comma's embedded in values, this could break.

Here is a one at codeplex that can handle this.
http://commonlibrarynet.codeplex.com/

reply

JDomo
02/04/2010 - 23:41

Hello.. Thanks for the good work on this great helpful tutorial. I am still a newbie for C#, as this is the first programming i which to learn. Now still try hard to understand it and play with it.

I need some help or advise, for the DataGrid, is it possible to edit the name of the "column1" ?
If can, den how am i going to do that?

Thanks in advance.

:)

reply

Steve
03/26/2010 - 17:49

This helped me kick off a project for work. Where do I send the check?
Thanks for explaining the steps in English. This makes it a great learning exercise. I look forward to finding more solutions here.

reply

Amresh Kumar (SSE)
08/15/2010 - 09:44

Thanks! Really this helped me to start my dream desktop application flying. I was trying some other menthod but everytime I found some obstruction. Now it helped me to anticipate the hurdle.

reply

Scott
08/20/2010 - 09:17

Good tutorial, one question though.

The csv film I'm reading contains "" which then get displayed as ??

Does anyone know how I can get round this problem?

Many thanks,

Scott.

reply

Siyad Ahmed
08/26/2010 - 08:53

I get following error.

Cannot serialize the DataTable. DataTable name is not set.

when i tries to do

DataTable.WriteXml()

reply

renjith
10/15/2010 - 01:37

Good Note

the code is working for me..

thanks

reply

Anonymous
11/22/2010 - 14:00

hi

I get the below error when I run the code on a Live server as opposed to my local machine... any ideas? thanks

System.IO.FileNotFoundException: Could not find file 'c:\windows\system32\inetsrv\Test.csv'. File name: 'c:\windows\system32\inetsrv\Test.csv' at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy) at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy) at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options) at System.IO.StreamReader..ctor(String path, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize) at System.IO.StreamReader..ctor(String path, Boolean detectEncodingFromByteOrderMarks) at _Default.parseCSV(String path)

reply

The Reddest
11/22/2010 - 15:52

Does the file "c:\windows\system32\inetsrv\Test.csv" exist on the remote server? When you run it on a remote machine, the code is not looking for that file on your local computer.

reply

Anonymous
12/13/2010 - 22:12

Where do you specify the text file to be parsed? And how do you bind parsedData to a datagrid? Thanks.

reply

The Reddest
12/14/2010 - 10:54

The source code for the working example is provided at the bottom of the post. It should have answers for both of your questions.

reply

Jonathan Wood
12/19/2010 - 12:44

You code seems good as far as it goes. But it's possible for values to have embedded commas. These values should be wrapped in double quotes. In addition, it's possible for values to have embedded double quotes. So some additional steps are required to correctly part CSV file.

My code to both read and write CSV files can be viewed at http://www.blackbeltcoder.com/Articles/files/reading-and-writing-csv-files-in-c.

reply

C# Newbie
12/28/2010 - 04:58

//This tutorial is very useful to me.
//It is easy to understand.
//I am an intern to a multinational company
//and I was asked to self-study CSV Reader in C#

//Thanks for the help!! ^_____^

reply

Greg McKee
12/29/2010 - 11:06

Great Tutorial. Thanks You!!!

reply

John, Another Newbie to C#
01/04/2011 - 21:41

Love C#, and your tutorial. This will help me create a Service that reads parameter output files from a SMT Solder Reflow oven at work, then move that data to a SQL Server table as a record of process control.
To Anonymous - The file cannot be seen unless you store it in the C:\ drive of your Server.
Try changing path in code to read:
\\[LocalPCname]\c$\windows\system32\inetsrv\Test.csv
then run on server (your server logon will, of course, need to have permissions to look at that folder on your local PC, with Sharing)
or place the CSV file on the Server's C: drive

reply

Rui
01/24/2011 - 11:55

Thanks mate, this is just what I was looking for.

reply

Anonymous
02/11/2011 - 13:18

There“s one mistake in your example code.

The line "dataGridView1.DataSource = newTable";

comes direct after the line newTable.Rows.Add(row);

otherwise the DataGridView will be empty.

reply

Anonymous
03/06/2011 - 03:02

///
/// Parses CSV data containing Quoted cells with double quote escape for quotes (Excel "Save as CSV" standard format), "cell,1",cell 2,"Cell ""3"""
///
/// CSV data in string format
///
public static List> ParseCSVData(String data)
{
List> csvData = new List>();
// parse the CSV data

char prevChr = ' ';

Boolean inQuotes = false;
StringBuilder cellBuilder = new StringBuilder();
List rowBuilder = new List();

foreach (char chr in data.ToCharArray())
{
if (chr == '"' && inQuotes) // cell quotes ended
{
inQuotes = false;
}
else if (chr == '"' && !inQuotes && prevChr != chr) // cell encapsulated by quotes
{
inQuotes = true;
}
else
{
if (chr == '\r' && !inQuotes) // secondary newline character (ignored)
{
// do nothing
}
if (chr == '\n' && !inQuotes) // new line character
{
rowBuilder.Add(cellBuilder.ToString());
cellBuilder = new StringBuilder();
csvData.Add(rowBuilder);
rowBuilder = new List();
}
if (chr == ',' && !inQuotes) // cell delimiter
{
rowBuilder.Add(cellBuilder.ToString());
cellBuilder = new StringBuilder();
}
else
{
cellBuilder.Append(chr);
}
}
prevChr = chr;
}

return csvData;
}

reply

Anonymous
03/23/2011 - 13:40

Thanks for the code, very nicely explained. Everybody should learn from you how to explain code!

reply

EricF
03/24/2011 - 12:16

Excellent article, great intro to many key concepts. For those folks complaining about needing to remove embdded commas inside quoted fields, here is a method that removes the comma and surrounding quotes. It could be improved - it only expects one field in the line that has quotes with embedded commas.

    /// <summary>
    /// The Description field in the input line from the CSV file sometimes contains
    /// an embedded comma.  This must be removed in order to easily parse the line
    /// using split(","), otherwise the Description field gets split into two fields.
    /// This method removes the embedded comma and surrounding quotes.
    /// </summary>
    /// <param name="line">The line that may have a Description field with an embedded comma.</param>
    /// <returns>string: the input line, cleaned of the embedded comma and surrounding quotes.</returns>
    private string FixDescriptionField(string line)
    {
        const int START_INDEX_ZERO = 0;
        const int NUMBER_OF_REMOVED_CHARS = 3;

        string CleanLine = line;

        // Find index of first quote, if any
        int firstQuoteIdx = line.IndexOf('\"', START_INDEX_ZERO);

        if( firstQuoteIdx > 0 )
        {
            // Find the column index of the closing quote
            int secondQuoteIdx = line.IndexOf('\"', firstQuoteIdx+1);

            // If there was a closing quote and sanity check passes...
            if (secondQuoteIdx != -1 &&
                secondQuoteIdx > firstQuoteIdx)
            {
                // Must be a line that has a quoted description and embedded comma.
                string Description = line.Substring(firstQuoteIdx + 1, secondQuoteIdx - firstQuoteIdx - 1);

                // Replace the embedded comma with a space
                Description = Description.Replace(',', ' ');

                // Rebuild the line without the quotes or embedded comma
                CleanLine = line.Substring(0, firstQuoteIdx);
                CleanLine += Description;

                // Figure out the new line length after removing the 2 quotes and 1 comma.
                int FinalCleanLineLength = line.Length - NUMBER_OF_REMOVED_CHARS;
                CleanLine += line.Substring(secondQuoteIdx + 1, FinalCleanLineLength - CleanLine.Length);
            }
        }

        return CleanLine;
    }

reply

colin
04/09/2011 - 12:47

Thanks found this a very useful starting ground. Really well explained. Look forward to more.

reply

Dharmendra
06/30/2011 - 01:41

This method does not pick leading zeros... gochhhaaaa :(

///
/// This method read provided CSV file and return a datatable.
///
/// The location to pick the file.
private DataTable ReadCSV(string FileName)
{

DataTable csvDataTable = new DataTable();
try
{
//no try/catch - add these in yourselfs or let exception happen
String[] csvData = File.ReadAllLines(FileName);

//if no data
if (csvData.Length == 0)
{
return csvDataTable;
}

String[] headings = csvData[0].Split(',');

//for each heading
for (int i = 0; i < headings.Length; i++)
{
////replace spaces with underscores for column names
//headings[i] = headings[i].Replace(" ", "_");

//add a column for each heading
csvDataTable.Columns.Add(headings[i], typeof(string));
}

//populate the DataTable
for (int i = 1; i < csvData.Length; i++)
{
//create new rows
DataRow row = csvDataTable.NewRow();

for (int j = 0; j < headings.Length; j++)
{
//fill them
row[j] = csvData[i].Split(',')[j];
}

//add rows to over DataTable
csvDataTable.Rows.Add(row);
}
}
catch (Exception ex)
{
MessageBox.Show( ex.Message.ToString());
}
//return the CSV DataTable
return csvDataTable;

}

reply

PWX
07/28/2011 - 07:38

Perfect for beginners, best tutorials out there.

reply

Anonymous
10/04/2011 - 19:43

just make a note that using OLEDB to connect and read from CSV will lose the leading zeros if the values have any

reply

Tony
11/01/2011 - 20:07

Great tutorial thanks

reply

Derrick
12/03/2011 - 17:28

If you are concerned about commas, try using the pike | as the separator. Murach's VB 2010 uses this method, and save the file with a dat extension. The result is the similar.

reply

Anonymous
01/06/2012 - 09:24

this is crap

reply

Add Comment

Put code snippets inside language tags:
[language] [/language]

Examples:
[javascript] [/javascript]
[actionscript] [/actionscript]
[csharp] [/csharp]

See here for supported languages.

Javascript must be enabled to submit anonymous comments - or you can login.
CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.