how is csv file recognized

sometime i do not see any special character in how columns are listed in the csv file.

how does the system recognize which columns are what and how the data is assigned?



Open in new window

Who is Participating?
Ingeborg Hawighorst (Microsoft MVP / EE MVE)Connect With a Mentor Microsoft MVP ExcelCommented:

what characters are we talking about? Can you post a data sample in a txt or csv format?

Excel will separate the columns by commas.

What exactly is the question? What do you want to achieve?

cheers, teylyn
SterlingMcClungConnect With a Mentor Commented:
I agree that more information is needed to be able to answer properly.  However, some csv files will enclose the data for each column in quotation marks to help in determining what data should be in the columns.
dbruntonConnect With a Mentor Commented:
The comma is usually the character that defines the column break.  Not always and you can usually tell the application what character to look for.  For exampe in some cases it coud be the Tab character - this doesn't usually show up in text files when viewed with a text editor.

How the data is assigned is an interesting question.  The application could look at the data in the first row and see what types are there but I suspect most applications have a good rummange through each row and check types found against what is in the first row.

For example if in the first row-third column looks like a number and all rows-third columns are numbers then numbers will be assigned to that column.  If one of those rows-third column has letters then the column will probably be assigned text.   Dollar signs could signify money.   Columns with numbers-hyphens-numbers could signify dates and the application would probably check to see if the dates are valid.
Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

anushahannaAuthor Commented:
I saw sometime something like the character Al 0141 separating the records.. what may be its purpose? 
anushahannaAuthor Commented:

How does the system decipher the above. I am meaning 2 records. what is intelligent in the above csv that the system can pick it up- if it can't what should do to help it?
SterlingMcClungConnect With a Mentor Commented:
CSV files contains records(and possibly column names) that are separated by a newline charater.  Within each record the columns are separated by commas.  In the example that you give in 34936012, this would be deciphered as a 12 column record.  If you changed this to the following:

Open in new window

Then you would end up with one real record that has the firts four columns named, but the last four columns would be unnamed.

The purpose of using other characters to delimit the columns is to allow you to include a comma within a column.  If you wanted two columns to contain the "this,that" in one column and "me,you" in the second column, you could use a tab-delimited file or a pipe-delimited file.  Te basic idea is that you separate your data with a character that will never appear in the data itself.  If there isn't a character that won't appear in your data, then you have to start worrying about escaping characters.
dbruntonConnect With a Mentor Commented:
It looks like Alt 0141 is meant to be a non displayable character in normal use.

If you go here you'll see a blank space for the 141 character.  The characters here are the ones normally used for Windows.

As for the second question it will probably recognise Col1 etc as a mixture of letters and numbers.  If this is the first row it may identify it as a header row where these are the names of columns.  In your example in the question you had a blank row 2.  That would signify to the application that Row 1 is a header row rather than a data row.
Ingeborg Hawighorst (Microsoft MVP / EE MVE)Connect With a Mentor Microsoft MVP ExcelCommented:
AFAIK, in a typical CSV file, the first row of a CSV file determines the field names, separated by commas.
The following rows have the data, also separated by commas.

If the first row in the file consists of ...


... then that means that there are 12 fields.

>> I am meaning 2 records.

That does not make sense to me (but maybe that's just me). A straightforward CSV file looks like this:


... and so on. If a field is empty, you will see just two commas, like in row 4 of the above sample, where there is no value for Field4 in the 3rd row of data.

The first row defines the fields, the ensuing rows have the data. Each data row must match the number of fields defined by the first row.

There are exceptions. The first row may actually already be the first row of data, so there are no field definitions.

In any case, we're just guessing at what's going on in your data. So please do as has been suggested several times above and post a data sample. Explain what you expect this data sample to translate to in a table.

cheers, teylyn
Don ThomsonConnect With a Mentor Commented:
The Alt 0141 is in actual fact the letter ì - Not the english i but with the dot a little to the left and at an angle
If you want to see it open Notepad - change the font size to about 48 and holding down the ALT key pres 141   -Not 0141 - It's called extended ascii - Standard Ascii Code goes from 000-127 and includes all the printer control codes, 0-9 a-z A-Z and some extras - Extended ascii includes foreign characters as well
like Ü  or +  and goes from ALT 128-255

A CSV file is simply a format that allows one type of program to import or export raw data in a form that can be read by another program. Virtually all database programs allow you to either import data using a CSV format or export it in a CSV format.  It is the basic database format going back to the original DOS based Basic programming language.

Any database that has a fixed number of columns can be converted to a CSV file.

The basic delimiter is the comma - if you have 8 columns of figures - you will generally have 8 fields separated by 7 commas

If a field contains an Alpha character then it is treated as a test field
If a field is all numbers it is treated as a numeric field

But what do you do with say Barcodes such as 0 12345 67891 5  or a phone number 212-555-1212
They are all numbers but if you write them as a number it strips the 0 from the barcode or subtracts the second and third number from the first.

Or what if your field is   Potatoes,Red    - Now we have 8 fields with 8 commas - and one field left over

How does a CSV handle these situations - It encloses the entire filed whether or not it a barcode, phone number or a field with an embedded comman and surrounds it with double quotes  "   "
This tells the program that this field is a TEXT field and everything in between it is one field

This works well until you get into something like measurements say 48"   - now what does the program do?
Some will look for a matching double quote - so We need to do "48""
Depending on how sophisticated the program import routine is it may work - it may not.

Myself - if I know that there are embeded comma or "  in some fields - I change to a Tab delimited format using a tab Alt 008  as the delimiter

The easiest way to understand this is to play with a program like Excel and notepad

Create some test data - on a couple of lines - If you don't have data in a certain field just repeat the comma or do field1,"",  if the blank field is a text field.

Normally if there are specific numberic fields you may not want to have a title or header row as the first line generally tells thye importing program what TYPE of field it is.

If you have an extension of .csv  - Excel will assume it's a comma delimited file with double quotes around the text field

If you have the same file as a .txt - It will prompt you for what the field types are

Hope this helps
ocanada_techguyConnect With a Mentor Commented:
There is inconsistency, as CSV is a very "loose" format carried over from the old text-based terminals and OSes circa 1970 1980.
Somebody had the smart idea that the first row "could" be a header record that has the names of the fields, but there is nothing to say whether they are text or numeric fields.
If you go to import with Excel, there's a toggle whether that first row even exists.
Whether CR-LF pairs or just CR are used for records can vary.
Often text fields need to be surrounded by quotes.
Wether carriage returns and/or commas are allowed to be in a text field also varies.
For example, what if you need to embed commas in a text field?  What if you need to have a multi-line text-field?  Some pay greatest attention to the opening and closing of quotes.  Some pay greatest attention to using CR or LF for one or the other.  Some programs CSV import will like it or handle it one way, some will not or another.

XML solved all that.
anushahannaAuthor Commented:
thanks all - i am trying to absorb all the input...

the background of this is this:

attached soi format works with my program. but now the source is coming from a diff place with csv format, but it does not work- i am wondering if there is any fundamental issue with the csv char formatting that was not so in the soi file.

the columns are different- they can be mapped once the csv file is made to work. just renaming the file to soi did not help.

please rename test1.txt to test1.soi

thank you.
dbruntonConnect With a Mentor Commented:
End of line formats are different in the two files.

Is the first file, the test2.csv, the one you are having problems with?  And what type of computer system is it coming from?
Don ThomsonConnect With a Mentor Commented:
I imported both files into MSAccess tables with only one problem, In the Test1.txt - The "Sample ID" must have some kind of control Character because it replace the header field name with Field55
This would easily be fixed using a template.

Both files imported all of the data without error except as described above

Importing into Excel is always a crap shot unless the CSV file is actually delimited with double quotes around the text fields. I always brieng CSV files into Access where I can tell it what type of field it actually is. This is especially true with Date and time fields as they may have to be brought in as text field then reformated depending on their structure.

I'm enclosing the access file as it went in so you can see it
anushahannaAuthor Commented:
thanks very much - I have told the source programmer to take note of what is found and fix it.

appreciate it.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.