Posted by The Hairiest on 04/17/2008

Building a Simple CSV Parser in C#

So you are sitting around and you somehow have 100 comma-separated values (CSV) files, and you are not quite sure what the best way to read them is. Well, if you are using Visual Studio and C#, you are in luck, because you can read a simple CSV file quite easily. With one very small function you can spit out a list of values, organized conveniently by rows and columns. Then you can take this list and use it however you want, perhaps in a DataTable or GridView object.

The parser we are going to build today is going to be extremely simple, and will in fact break on more complicated CSV files (files that have commas actually in the data, etc...). But for most CSV files, this will work fine - and look for a tutorial in the near future about building a parser that can easily deal with even the most convoluted of CSV files.
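To make that limitation concrete, here is a small sketch of what a plain comma split (the approach the parser in this tutorial relies on) does to a quoted field that contains a comma. The class name SplitLimitationDemo and the sample line are made up for illustration.

```csharp
using System;

public static class SplitLimitationDemo
{
    // Naive split: every comma is treated as a separator, quoted or not.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }

    public static void Main()
    {
        // Hypothetical sample line: the second field is quoted and contains a comma.
        string line = "1,\"Smith, John\",42";

        string[] fields = NaiveSplit(line);

        // A CSV-aware parser would report 3 fields here; the naive split reports 4.
        Console.WriteLine(fields.Length);
        foreach (string field in fields)
        {
            Console.WriteLine("[" + field + "]");
        }
    }
}
```

The quoted name ends up cut in two, with the quote characters left in the data. If your files never contain quoted fields, this is not a problem.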

To start off, you need to open up Visual Studio and start a new C# application project, so go ahead and do that, naming the project whatever you want. Once your project is up and ready, you need to find a place to build and call your parser function. If you right click on Form1.cs and choose 'View Code', you will get the form's code. Inside the main Form1 : Form class, just under the public Form1() constructor, is the perfect place for your function for now. Later on you can move it to somewhere more permanent, but for now we will get the function working.

Sadly, one of the namespaces we will be using is not declared by default in the standard 'using' statements at the top of our file. But all you have to do is add it below all the others:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO; //System.IO is not used by default

Now we can tear up some serious code. The first step is to declare our parser function. It will look something like:

public List<string[]> parseCSV(string path)
{
}

This is a pretty simple function, which takes in a string that represents the path of the CSV file and spits out a List of string arrays. Now you may be asking why not just use an array of string arrays (string[][])? Well, an array has a fixed size, so adding an element means allocating a new array and copying everything over. A List can be added to and removed from, and is just generally a lot more flexible.
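As a rough sketch of that trade-off (the class and method names here are illustrative and not part of the tutorial's project): a List<string[]> grows as rows are added, and if a caller really does want a string[][] at the end, ToArray() produces one in a single copy.

```csharp
using System;
using System.Collections.Generic;

public static class RowStorageDemo
{
    // Collects rows in a List, then converts to a jagged array once at the end.
    public static string[][] CollectRows(IEnumerable<string> lines)
    {
        List<string[]> rows = new List<string[]>();
        foreach (string line in lines)
        {
            rows.Add(line.Split(','));   // the List grows its internal buffer as needed
        }
        return rows.ToArray();           // one copy into a string[][]
    }

    public static void Main()
    {
        string[][] rows = CollectRows(new string[] { "a,b", "c,d,e" });
        Console.WriteLine(rows.Length);      // number of rows
        Console.WriteLine(rows[1].Length);   // number of values in the second row
    }
}
```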

The next thing we need to do is declare our return variable, which is really just one line. So inside the function, as the first line, we have:

List<string[]> parsedData = new List<string[]>();

This is just declaring a List of string arrays that will hold our file information as we read each line. Now the cool thing is that the System.IO namespace has this neat class called StreamReader, which can open a text based file and read it line by line. This gives us the advantage of just calling a method that reads the file line by line rather than byte by byte. StreamReader is extremely handy for reading text files and is a perfect candidate for us in this case.

We are going to declare our StreamReader with a using statement so it will be disposed once it leaves scope, and disposing it closes the file automatically. So declaring the new StreamReader object will look something like:

using (StreamReader readFile = new StreamReader(path))
{
}

Take notice that the actual declaration is inside the using statement. Inside this statement we will be doing everything involving reading the file and building our list of string arrays. Now a comma separated values file is exactly what you would think - a file full of values separated by commas. Each line corresponds to a row of data, so all we have to do is read the file line by line, then separate the values. Since the StreamReader class can read a file line by line, all we really need to do is take each line and split it. But first we have to declare some variables, inside the using block of course.

We will need two variables, one to hold the line as it is read and an array to hold the separated values. We will call these line and row:

using (StreamReader readFile = new StreamReader(path))
{
  string line;   // holds the current line as it is read
  string[] row;  // holds the separated values of that line
}

Next we have to read the file line by line, which can be done with a very simple while statement. We will be reading the file until the current line is null, which will be the case when there are no more lines to read. To do this we set our line variable to our currently read line, then when the line is null, stop reading the file. It will look like so:

while ((line = readFile.ReadLine()) != null)
{
}

A very simple while loop that runs through the file until ReadLine() returns null, which happens only at the end of the file. *Take note that a line is not null if it is blank or contains only spaces. ReadLine() returns an empty string for a blank line, and null only when there is truly nothing left to read.* Inside this loop, all we need to do is split each line at the commas, then add the resulting array to our list. Luckily, splitting a string is such a common task that the basic string class has a method to do just this: Split(). After we split the line, adding it to the list is just as simple: we call the list's Add() method. So with two lines of code we can do what we need to. After our additions our while statement will look like:

while ((line = readFile.ReadLine()) != null)
{
  row = line.Split(',');
  parsedData.Add(row);
}
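The difference between a blank line and the end of the file is easy to check in isolation. The sketch below uses StringReader (also in System.IO) in place of a file, so it runs without any CSV on disk. The sample text and the names ReadLineDemo and ReadRows are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Same loop as above, but over any TextReader (StreamReader is one).
    public static List<string[]> ReadRows(TextReader reader)
    {
        List<string[]> rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            rows.Add(line.Split(','));
        }
        return rows;
    }

    public static void Main()
    {
        // Three lines of input: data, a blank line, data.
        using (StringReader reader = new StringReader("a,b\n\nc,d"))
        {
            List<string[]> rows = ReadRows(reader);
            Console.WriteLine(rows.Count);       // the blank line still counts as a row
            Console.WriteLine(rows[1].Length);   // that row holds a single empty string
        }
    }
}
```

So a blank line in the middle of a file does not stop the loop. It simply shows up as a row containing one empty value, which you may want to skip when you use the data.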

Simple yet effective. So simple, in fact, that you are not limited to comma separated files: any file whose values are separated by a single standard character can be read. All you have to do is change the character passed to Split() to whatever character separates the values. As mentioned above, the first line sets our row variable to the values of our split string, and the second line adds that string array to our list. Not difficult to understand at all.
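If you would rather not edit the function each time the separator changes, one option is to pass the separator in. This is only a sketch of that idea: ParseDelimited and its separator parameter are hypothetical names, and the demo writes its own small tab-separated file to a temporary location so it can run on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DelimitedReader
{
    // Hypothetical variant of parseCSV that takes the separator as a parameter.
    public static List<string[]> ParseDelimited(string path, char separator)
    {
        List<string[]> parsedData = new List<string[]>();
        using (StreamReader readFile = new StreamReader(path))
        {
            string line;
            while ((line = readFile.ReadLine()) != null)
            {
                parsedData.Add(line.Split(separator));
            }
        }
        return parsedData;
    }

    public static void Main()
    {
        // Write a small tab-separated file to a temp location, then read it back.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "name\tage\nAda\t36\n");

        List<string[]> rows = ParseDelimited(path, '\t');
        Console.WriteLine(rows.Count + " rows, " + rows[0].Length + " columns");

        File.Delete(path);
    }
}
```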

The end of the while loop actually means the end of our using block as well. After the using block we have a completely filled list of string arrays, which represent the rows of data in our CSV file. All we need to do now is put the whole thing in a try-catch block, which will catch any errors we may get when attempting to open or read the file.

We don't actually need anything fancy, in fact we will just catch any exception we get in the using block (since there are a bunch of different kinds that could be thrown). So our final function will look something like:

public List<string[]> parseCSV(string path)
{
  List<string[]> parsedData = new List<string[]>();

  try
  {
    using (StreamReader readFile = new StreamReader(path))
    {
      string line;
      string[] row;

      while ((line = readFile.ReadLine()) != null)
      {
        row = line.Split(',');
        parsedData.Add(row);
      }
    }
  }
  catch (Exception e)
  {
    MessageBox.Show(e.Message);
  }

  return parsedData;
}

Notice that we just take the message from the exception and display it with the standard MessageBox class. This will work, and if an error occurs our function simply returns whatever it has read so far, which is an empty list if the file could not be opened at all. That means our code doesn't crash, we just don't get any data. So after we return our parsedData list, whether filled or not, our function ends.
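One consequence of returning a list either way is that the caller cannot tell a failed read from a file that really is empty. If that distinction matters to you, one possible variation (a sketch, not the method used in this tutorial) reports success as a bool and hands the error message back to the caller instead of showing a MessageBox. TryParseCsv and its out parameters are hypothetical names, and the sketch assumes you only want to handle file access errors, letting anything else (such as a null path) propagate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvReading
{
    // Returns true on success; on failure, rows is empty and error holds the message.
    public static bool TryParseCsv(string path, out List<string[]> rows, out string error)
    {
        rows = new List<string[]>();
        error = null;
        try
        {
            using (StreamReader readFile = new StreamReader(path))
            {
                string line;
                while ((line = readFile.ReadLine()) != null)
                {
                    rows.Add(line.Split(','));
                }
            }
            return true;
        }
        catch (IOException e)   // includes file-not-found and directory-not-found
        {
            error = e.Message;
        }
        catch (UnauthorizedAccessException e)
        {
            error = e.Message;
        }

        rows.Clear();   // discard any partially read rows
        return false;
    }

    public static void Main()
    {
        // A path that should not exist, to exercise the failure branch.
        string missing = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".csv");

        List<string[]> rows;
        string error;
        if (!TryParseCsv(missing, out rows, out error))
        {
            Console.WriteLine("Could not read file: " + error);
        }
    }
}
```

In a Windows Forms project, the caller can then decide whether to show the message in a MessageBox, log it, or ignore it.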

A small function that is easy to understand and even easier to build. Even better, since it returns a basic list object, you can use the data returned to do anything from fill a grid to making complex calculations. You can also make this function read any type of separated file, just change the separator in the split statement.
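As an example of putting the returned list to use, here is a minimal sketch that copies it into a DataTable. It assumes the first row of the file holds unique column names, which the tutorial's parser does not require. Rows shorter than the header are padded with empty strings, and extra values are dropped. The names CsvTable and ToDataTable are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class CsvTable
{
    // Assumption: rows[0] contains unique column names; every later row is data.
    public static DataTable ToDataTable(List<string[]> rows)
    {
        DataTable table = new DataTable();
        if (rows.Count == 0)
        {
            return table;   // nothing was parsed, so return an empty table
        }

        foreach (string header in rows[0])
        {
            table.Columns.Add(header);
        }

        for (int i = 1; i < rows.Count; i++)
        {
            DataRow dataRow = table.NewRow();
            for (int c = 0; c < table.Columns.Count; c++)
            {
                // Pad short rows with empty strings; ignore values past the last column.
                dataRow[c] = c < rows[i].Length ? rows[i][c] : "";
            }
            table.Rows.Add(dataRow);
        }
        return table;
    }

    public static void Main()
    {
        List<string[]> rows = new List<string[]>();
        rows.Add(new string[] { "name", "age" });
        rows.Add(new string[] { "Ada", "36" });

        DataTable table = ToDataTable(rows);
        Console.WriteLine(table.Rows[0]["name"]);
    }
}
```

In a form, the resulting table could then be assigned to a grid control's DataSource property.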

[Screenshot: DataGrid populated with CSV data. Caption: Using our new function to fill a DataGrid]

If you would like a full Visual Studio Solution for this tutorial, one can be found here. I hope this tutorial was informative and most of all useful, and I'll be back soon with a tutorial on the more complicated version of this parser (the one that can deal with complex CSV files). Just remember that when you need coding help, Switch on the Code.

Sam
05/26/2008 - 00:04

Very cool! Easy to follow and useful. thanks

Zohar
06/17/2008 - 01:26

Thanks, finally I found what I was looking for!

Andy
07/09/2008 - 03:07

But what if one of the fields has a comma in it? *bug*

The Tallest
07/09/2008 - 06:15

In the article itself, it says that this is just a simple CSV parser:

The parser we are going to build today is going to be extremely simple, and will in fact break on more complicated CSV files (files that have commas actually in the data, etc...).

Shreyans
07/11/2008 - 11:31

Can you please tell me how I can refer to a CSV file through an indirect link?

We are using different CSV files. So instead of hard-coding the path, can we refer to the CSV through another directory or another file, like XML?

Thanks in advance.

MD Philip
08/06/2008 - 09:24

Everyone could have done what you have done here. If people are searching for a CSV parser, it is just to find one that can handle complicated CSV files with commas in the data, etc.

But thanks for your post.

The Tallest
08/06/2008 - 09:27

For those with more complex CSV parsing needs, check out this post: Using The Built In OLEDB CSV Parser

Shweta
09/07/2008 - 05:35

Has anybody done work on complicated CSV parsers? If so, please tell me.

Shweta
09/07/2008 - 05:39

What happens if there are commas and spaces in between?

The Reddest
09/07/2008 - 10:17

Shweta, check out the Using The Built In OLEDB CSV Parser tutorial.

Katski11
09/30/2008 - 00:24

This was very informative especially to the ones just starting to learn parsing a csv file...

keep up the good work!

More Power...

Mohammed Wafy
12/04/2008 - 09:21

Very good and helpful tutorial.
I just started to parse CSV files and this will help me a lot.

thanks
