Here is what I want to do. From the internet I download some data that
are in csv format. All data will be in one long string. Now I need to
extract every cell. The problem ist that some cell content contains
commas and/or double quotes. Some of the cell contents that contain
double quotes use double double quotes and others don't, i.e. some look
like
quot;this quot;itemquot; is badquot;, quot;this item is okquot;
and others like
quot;this quot;quot;itemquot;quot; is badquot;, quot;this item is okquot;
There is also a chance that some cell are in double quotes (if they
contains commas or double quotes) and others are not in double quotes
(if they do not contain commas or double quotes). Considering all this
(and possibly more stuff) parsing becomes non trivial.
As a first approach I stored the content downloaded into a file and
then use odbc like this:
connectionString = @quot;Driver={Microsoft Text Driver (*.txt;
*.csv)};DBQ=quot; Path.GetDirectoryName(filename);
connection = new OdbcConnection(connectionString);
connection.Open();
command = new OdbcCommand(quot;Select * FROM quot;
Path.GetFileName(filename), connection);
reader = command.ExecuteReader();
Unfortunately, this approach does not work for the above scenarios.
Excel reads the files in question just fine, though. So my question is
what is the best approach to read csv files, preferably without having
to create temporary files?
Thanks
My understanding is that a .csv file has data fields separated by a
comma character, and if the field contains a comma then that field is
enclosed in quotes (being double-quotes), and if the field contains
quotes then those are indicated by two consecutive quotes.
The site en.wikipedia.org/wiki/Comma-separated_values perhaps
better explains .csv files, and has a pointer to required drivers.
Hope this helps.
--
rob Wrote:
gt; Here is what I want to do. From the internet I download some data that
gt; are in csv format. All data will be in one long string. Now I need to
gt; extract every cell. The problem ist that some cell content contains
gt; commas and/or double quotes. Some of the cell contents that contain
gt; double quotes use double double quotes and others don't, i.e. some
gt; look
gt; like
gt;
gt; quot;this quot;itemquot; is badquot;, quot;this item is okquot;
gt;
gt; and others like
gt;
gt; quot;this quot;quot;itemquot;quot; is badquot;, quot;this item is okquot;
gt;
gt; There is also a chance that some cell are in double quotes (if they
gt; contains commas or double quotes) and others are not in double quotes
gt; (if they do not contain commas or double quotes). Considering all this
gt; (and possibly more stuff) parsing becomes non trivial.
gt;
gt; As a first approach I stored the content downloaded into a file and
gt; then use odbc like this:
gt;
gt; connectionString = @quot;Driver={Microsoft Text Driver (*.txt;
gt; *.csv)};DBQ=quot; Path.GetDirectoryName(filename);
gt; connection = new OdbcConnection(connectionString);
gt; connection.Open();
gt; command = new OdbcCommand(quot;Select * FROM quot;
gt; Path.GetFileName(filename), connection);
gt; reader = command.ExecuteReader();
gt;
gt; Unfortunately, this approach does not work for the above scenarios.
gt; Excel reads the files in question just fine, though. So my question is
gt; what is the best approach to read csv files, preferably without having
gt; to create temporary files?
gt;
gt; Thanks--
Bryan Hessey
------------------------------------------------------------------------
Bryan Hessey's Profile: www.excelforum.com/member.php...oamp;userid=21059
View this thread: www.excelforum.com/showthread...hreadid=537641Bryan,
This is also my understanding of csv files (though some people say that
for double quotes you don't need double double quotes). In any case,
the problem is that most cells use double double quotes for double
quotes. Unfortunately, some cells don't seem to be formated right and
don't enclose double quotes into double double quotes as outlined in my
example. I know that Excel can load it just fine but with odbc it does
not work. Any idea if the parsing algorithm used by Excel is somehow
accessible through the .NET framework (I was hoping Excel uses odbc).
ThanksBryan Hessey wrote:
gt; My understanding is that a .csv file has data fields separated by a
gt; comma character, and if the field contains a comma then that field is
gt; enclosed in quotes (being double-quotes), and if the field contains
gt; quotes then those are indicated by two consecutive quotes.
gt;
gt; The site en.wikipedia.org/wiki/Comma-separated_values perhaps
gt; better explains .csv files, and has a pointer to required drivers.
gt;
gt; Hope this helps.
gt;
gt; --
gt;
gt; rob Wrote:
gt; gt; Here is what I want to do. From the internet I download some data that
gt; gt; are in csv format. All data will be in one long string. Now I need to
gt; gt; extract every cell. The problem ist that some cell content contains
gt; gt; commas and/or double quotes. Some of the cell contents that contain
gt; gt; double quotes use double double quotes and others don't, i.e. some
gt; gt; look
gt; gt; like
gt; gt;
gt; gt; quot;this quot;itemquot; is badquot;, quot;this item is okquot;
gt; gt;
gt; gt; and others like
gt; gt;
gt; gt; quot;this quot;quot;itemquot;quot; is badquot;, quot;this item is okquot;
gt; gt;
gt; gt; There is also a chance that some cell are in double quotes (if they
gt; gt; contains commas or double quotes) and others are not in double quotes
gt; gt; (if they do not contain commas or double quotes). Considering all this
gt; gt; (and possibly more stuff) parsing becomes non trivial.
gt; gt;
gt; gt; As a first approach I stored the content downloaded into a file and
gt; gt; then use odbc like this:
gt; gt;
gt; gt; connectionString = @quot;Driver={Microsoft Text Driver (*.txt;
gt; gt; *.csv)};DBQ=quot; Path.GetDirectoryName(filename);
gt; gt; connection = new OdbcConnection(connectionString);
gt; gt; connection.Open();
gt; gt; command = new OdbcCommand(quot;Select * FROM quot;
gt; gt; Path.GetFileName(filename), connection);
gt; gt; reader = command.ExecuteReader();
gt; gt;
gt; gt; Unfortunately, this approach does not work for the above scenarios.
gt; gt; Excel reads the files in question just fine, though. So my question is
gt; gt; what is the best approach to read csv files, preferably without having
gt; gt; to create temporary files?Bryan,
This is also my understanding of csv files (though some people say that
for double quotes you don't need double double quotes). In any case,
the problem is that most cells use double double quotes for double
quotes. Unfortunately, some cells don't seem to be formated right and
don't enclose double quotes into double double quotes as outlined in my
example. I know that Excel can load it just fine but with odbc it does
not work. Any idea if the parsing algorithm used by Excel is somehow
accessible through the .NET framework (I was hoping Excel uses odbc).
ThanksBryan Hessey wrote:
gt; My understanding is that a .csv file has data fields separated by a
gt; comma character, and if the field contains a comma then that field is
gt; enclosed in quotes (being double-quotes), and if the field contains
gt; quotes then those are indicated by two consecutive quotes.
gt;
gt; The site en.wikipedia.org/wiki/Comma-separated_values perhaps
gt; better explains .csv files, and has a pointer to required drivers.
gt;
gt; Hope this helps.
gt;
gt; --
gt;
gt; rob Wrote:
gt; gt; Here is what I want to do. From the internet I download some data that
gt; gt; are in csv format. All data will be in one long string. Now I need to
gt; gt; extract every cell. The problem ist that some cell content contains
gt; gt; commas and/or double quotes. Some of the cell contents that contain
gt; gt; double quotes use double double quotes and others don't, i.e. some
gt; gt; look
gt; gt; like
gt; gt;
gt; gt; quot;this quot;itemquot; is badquot;, quot;this item is okquot;
gt; gt;
gt; gt; and others like
gt; gt;
gt; gt; quot;this quot;quot;itemquot;quot; is badquot;, quot;this item is okquot;
gt; gt;
gt; gt; There is also a chance that some cell are in double quotes (if they
gt; gt; contains commas or double quotes) and others are not in double quotes
gt; gt; (if they do not contain commas or double quotes). Considering all this
gt; gt; (and possibly more stuff) parsing becomes non trivial.
gt; gt;
gt; gt; As a first approach I stored the content downloaded into a file and
gt; gt; then use odbc like this:
gt; gt;
gt; gt; connectionString = @quot;Driver={Microsoft Text Driver (*.txt;
gt; gt; *.csv)};DBQ=quot; Path.GetDirectoryName(filename);
gt; gt; connection = new OdbcConnection(connectionString);
gt; gt; connection.Open();
gt; gt; command = new OdbcCommand(quot;Select * FROM quot;
gt; gt; Path.GetFileName(filename), connection);
gt; gt; reader = command.ExecuteReader();
gt; gt;
gt; gt; Unfortunately, this approach does not work for the above scenarios.
gt; gt; Excel reads the files in question just fine, though. So my question is
gt; gt; what is the best approach to read csv files, preferably without having
gt; gt; to create temporary files?
Beyond my scope, but
www.creativyst.com/Doc/Articl...m#CSVariations and
it's comments on Excel might suggest that it doesn't.
--
rob Wrote:
gt; Bryan,
gt;
gt; This is also my understanding of csv files (though some people say
gt; that
gt; for double quotes you don't need double double quotes). In any case,
gt; the problem is that most cells use double double quotes for double
gt; quotes. Unfortunately, some cells don't seem to be formated right and
gt; don't enclose double quotes into double double quotes as outlined in
gt; my
gt; example. I know that Excel can load it just fine but with odbc it does
gt; not work. Any idea if the parsing algorithm used by Excel is somehow
gt; accessible through the .NET framework (I was hoping Excel uses odbc).
gt;
gt; Thanks
gt;
gt;
gt; Bryan Hessey wrote:
gt; gt; My understanding is that a .csv file has data fields separated by a
gt; gt; comma character, and if the field contains a comma then that field
gt; is
gt; gt; enclosed in quotes (being double-quotes), and if the field contains
gt; gt; quotes then those are indicated by two consecutive quotes.
gt; gt;
gt; gt; The site en.wikipedia.org/wiki/Comma-separated_values perhaps
gt; gt; better explains .csv files, and has a pointer to required drivers.
gt; gt;
gt; gt; Hope this helps.
gt; gt;
gt; gt; --
gt; gt;
gt; gt; rob Wrote:
gt; gt; gt; Here is what I want to do. From the internet I download some data
gt; that
gt; gt; gt; are in csv format. All data will be in one long string. Now I need
gt; to
gt; gt; gt; extract every cell. The problem ist that some cell content
gt; contains
gt; gt; gt; commas and/or double quotes. Some of the cell contents that
gt; contain
gt; gt; gt; double quotes use double double quotes and others don't, i.e. some
gt; gt; gt; look
gt; gt; gt; like
gt; gt; gt;
gt; gt; gt; quot;this quot;itemquot; is badquot;, quot;this item is okquot;
gt; gt; gt;
gt; gt; gt; and others like
gt; gt; gt;
gt; gt; gt; quot;this quot;quot;itemquot;quot; is badquot;, quot;this item is okquot;
gt; gt; gt;
gt; gt; gt; There is also a chance that some cell are in double quotes (if
gt; they
gt; gt; gt; contains commas or double quotes) and others are not in double
gt; quotes
gt; gt; gt; (if they do not contain commas or double quotes). Considering all
gt; this
gt; gt; gt; (and possibly more stuff) parsing becomes non trivial.
gt; gt; gt;
gt; gt; gt; As a first approach I stored the content downloaded into a file
gt; and
gt; gt; gt; then use odbc like this:
gt; gt; gt;
gt; gt; gt; connectionString = @quot;Driver={Microsoft Text Driver (*.txt;
gt; gt; gt; *.csv)};DBQ=quot; Path.GetDirectoryName(filename);
gt; gt; gt; connection = new OdbcConnection(connectionString);
gt; gt; gt; connection.Open();
gt; gt; gt; command = new OdbcCommand(quot;Select * FROM quot;
gt; gt; gt; Path.GetFileName(filename), connection);
gt; gt; gt; reader = command.ExecuteReader();
gt; gt; gt;
gt; gt; gt; Unfortunately, this approach does not work for the above
gt; scenarios.
gt; gt; gt; Excel reads the files in question just fine, though. So my question
gt; is
gt; gt; gt; what is the best approach to read csv files, preferably without
gt; having
gt; gt; gt; to create temporary files?--
Bryan Hessey
------------------------------------------------------------------------
Bryan Hessey's Profile: www.excelforum.com/member.php...oamp;userid=21059
View this thread: www.excelforum.com/showthread...hreadid=537641An (Excel) CSV file is:
1. simply a text file with columns delimited by a comma
2. simply a text file with columns delimited by a comma and text values
enclosed in double quotes.
In 1, a missing value is indicated by two contiguous commas, and in 2. a
missing value is indicated by ,quot;quot;,
Your CSV file is much more complicated (dos not follow the patter 1 or 2)
and you appear to be using ADO.NET. Although you have ruled it out, you might
have to pre-process your file as 1 or 2 above before you can execute SQL
queries. You might find that when the SQL query works, it ignores the first
row (unless it contains or you add column names in the first row).
quot;robquot; wrote:
gt; Here is what I want to do. From the internet I download some data that
gt; are in csv format. All data will be in one long string. Now I need to
gt; extract every cell. The problem ist that some cell content contains
gt; commas and/or double quotes. Some of the cell contents that contain
gt; double quotes use double double quotes and others don't, i.e. some look
gt; like
gt;
gt; quot;this quot;itemquot; is badquot;, quot;this item is okquot;
gt;
gt; and others like
gt;
gt; quot;this quot;quot;itemquot;quot; is badquot;, quot;this item is okquot;
gt;
gt; There is also a chance that some cell are in double quotes (if they
gt; contains commas or double quotes) and others are not in double quotes
gt; (if they do not contain commas or double quotes). Considering all this
gt; (and possibly more stuff) parsing becomes non trivial.
gt;
gt; As a first approach I stored the content downloaded into a file and
gt; then use odbc like this:
gt;
gt; connectionString = @quot;Driver={Microsoft Text Driver (*.txt;
gt; *.csv)};DBQ=quot; Path.GetDirectoryName(filename);
gt; connection = new OdbcConnection(connectionString);
gt; connection.Open();
gt; command = new OdbcCommand(quot;Select * FROM quot;
gt; Path.GetFileName(filename), connection);
gt; reader = command.ExecuteReader();
gt;
gt; Unfortunately, this approach does not work for the above scenarios.
gt; Excel reads the files in question just fine, though. So my question is
gt; what is the best approach to read csv files, preferably without having
gt; to create temporary files?
gt;
gt; Thanks
gt;
gt;
quot;robquot; gt; wrote in message ups.com...
gt; Bryan,
gt;
gt; This is also my understanding of csv files (though some people say that
gt; for double quotes you don't need double double quotes). In any case,
gt; the problem is that most cells use double double quotes for double
gt; quotes. Unfortunately, some cells don't seem to be formated right and
gt; don't enclose double quotes into double double quotes as outlined in my
gt; example. I know that Excel can load it just fine but with odbc it does
gt; not work. Any idea if the parsing algorithm used by Excel is somehow
gt; accessible through the .NET framework (I was hoping Excel uses odbc).
gt;
gt; ThanksIs this CSV file(s) going to be an on-going thing? If not, and since you
say it
works fine in Excel, how about opening it in Excel and then re-exporting it
to
either another CSV file or TabDelimited file and then running your code
against that?
That way, you may be able to get the data out in the formatting you need for
your
application.
james
This will be an ongoing thing. I was thinking about opening it in Excel
(programatically) and store it with tabs but there is a chance some of
the cell content contains tabs as well. Another option would be to open
the file in Excel (programatically) and then read out the cells. This
is a huge overkill, though. Worse it's something that will run on a
server. Although it usually runs during off-peak times I still don't
like the idea doing it this way. Therefore, I might really end up
writing my own parser. Thanks for the input, though. It's definitely
appreciated.
quot;robquot; gt; wrote in message ups.com...
gt; This will be an ongoing thing. I was thinking about opening it in Excel
gt; (programatically) and store it with tabs but there is a chance some of
gt; the cell content contains tabs as well. Another option would be to open
gt; the file in Excel (programatically) and then read out the cells. This
gt; is a huge overkill, though. Worse it's something that will run on a
gt; server. Although it usually runs during off-peak times I still don't
gt; like the idea doing it this way. Therefore, I might really end up
gt; writing my own parser. Thanks for the input, though. It's definitely
gt; appreciated.
gt;
Your welcome. Sorry I couldn't offer any better suggestion. It does sound as
though you will have to write some sort of parser yourself. The scary part
is
you have no control over how the original file is created and what sort of
input
it allows. And with that being the case, it seems that writing a parser
routine that
will consistantly return the correct output, will be tough. Especially if
the app will
be running unattended. Otherwise you could write an editor of sorts that
would allow
a user to make some decisions on what to keep and what to discard. That
would go
a long way towards increasing the dependability of the app. (as long as the
end uers
understands what needs to be retained and what needs to be discarded)
I don't envy you on this one!!!
james
Try this:
www.codeproject.com/cs/database/CsvReader.asp
Cheers,
Jason
On 1 May 2006 09:34:19 -0700, rob wrote:
gt; This will be an ongoing thing. I was thinking about opening it in Excel
gt; (programatically) and store it with tabs but there is a chance some of
gt; the cell content contains tabs as well. Another option would be to open
gt; the file in Excel (programatically) and then read out the cells. This
gt; is a huge overkill, though. Worse it's something that will run on a
gt; server. Although it usually runs during off-peak times I still don't
gt; like the idea doing it this way. Therefore, I might really end up
gt; writing my own parser. Thanks for the input, though. It's definitely
gt; appreciated.
- Jun 22 Fri 2007 20:38
Parse csv files
close
全站熱搜
留言列表
發表留言