Solved

how is csv file recognized

Posted on 2011-02-19
14
1,116 Views
Last Modified: 2012-08-13
sometime i do not see any special character in how columns are listed in the csv file.

how does the system recognize which columns are what and how the data is assigned?

thanks
Col1,Col2,Col3,Col4,Col51,2,3,4,5,6,7,8,9,10

Col1,Col2,Col3,Col4,Col5<char>1,2,3,4,5<char>6,7,8,9,10<char>

Open in new window

0
Comment
Question by:anushahanna
  • 4
  • 3
  • 2
  • +3
14 Comments
 
LVL 50

Accepted Solution

by:
teylyn earned 100 total points
ID: 34935967
Hello,

what characters are we talking about? Can you post a data sample in a txt or csv format?

Excel will separate the columns by commas.

What exactly is the question? What do you want to achieve?

cheers, teylyn
0
 
LVL 7

Assisted Solution

by:SterlingMcClung
SterlingMcClung earned 100 total points
ID: 34935976
I agree that more information is needed to be able to answer properly.  However, some csv files will enclose the data for each column in quotation marks to help in determining what data should be in the columns.
0
 
LVL 47

Assisted Solution

by:dbrunton
dbrunton earned 150 total points
ID: 34935998
The comma is usually the character that defines the column break.  Not always and you can usually tell the application what character to look for.  For exampe in some cases it coud be the Tab character - this doesn't usually show up in text files when viewed with a text editor.

How the data is assigned is an interesting question.  The application could look at the data in the first row and see what types are there but I suspect most applications have a good rummange through each row and check types found against what is in the first row.

For example if in the first row-third column looks like a number and all rows-third columns are numbers then numbers will be assigned to that column.  If one of those rows-third column has letters then the column will probably be assigned text.   Dollar signs could signify money.   Columns with numbers-hyphens-numbers could signify dates and the application would probably check to see if the dates are valid.
0
 
LVL 6

Author Comment

by:anushahanna
ID: 34936007
I saw sometime something like the character Al 0141 separating the records.. what may be its purpose?
http://www.tedmontgomery.com/tutorial/altchrc.html
0
 
LVL 6

Author Comment

by:anushahanna
ID: 34936012
Col1,Col2,Col3,Col4,One,Two,Three,Four,Five,Six,Seven,Eight

How does the system decipher the above. I am meaning 2 records. what is intelligent in the above csv that the system can pick it up- if it can't what should do to help it?
0
 
LVL 7

Assisted Solution

by:SterlingMcClung
SterlingMcClung earned 100 total points
ID: 34936049
CSV files contains records(and possibly column names) that are separated by a newline charater.  Within each record the columns are separated by commas.  In the example that you give in 34936012, this would be deciphered as a 12 column record.  If you changed this to the following:
 
Col1,Col2,Col3,Col4
One,Two,Three,Four,Five,Six,Seven,Eight

Open in new window

Then you would end up with one real record that has the firts four columns named, but the last four columns would be unnamed.

The purpose of using other characters to delimit the columns is to allow you to include a comma within a column.  If you wanted two columns to contain the "this,that" in one column and "me,you" in the second column, you could use a tab-delimited file or a pipe-delimited file.  Te basic idea is that you separate your data with a character that will never appear in the data itself.  If there isn't a character that won't appear in your data, then you have to start worrying about escaping characters.
0
 
LVL 47

Assisted Solution

by:dbrunton
dbrunton earned 150 total points
ID: 34936053
It looks like Alt 0141 is meant to be a non displayable character in normal use.

If you go here http://en.wikipedia.org/wiki/Windows-1252 you'll see a blank space for the 141 character.  The characters here are the ones normally used for Windows.


As for the second question it will probably recognise Col1 etc as a mixture of letters and numbers.  If this is the first row it may identify it as a header row where these are the names of columns.  In your example in the question you had a blank row 2.  That would signify to the application that Row 1 is a header row rather than a data row.
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 50

Assisted Solution

by:teylyn
teylyn earned 100 total points
ID: 34936178
AFAIK, in a typical CSV file, the first row of a CSV file determines the field names, separated by commas.
The following rows have the data, also separated by commas.

If the first row in the file consists of ...

Col1,Col2,Col3,Col4,One,Two,Three,Four,Five,Six,Seven,Eight

... then that means that there are 12 fields.

>> I am meaning 2 records.

That does not make sense to me (but maybe that's just me). A straightforward CSV file looks like this:

Field1,Field2,Field3,Field4,Field5
ValueA1,ValueA2,ValueA3,ValueA4,ValueA5
ValueB1,ValueB2,ValueB3,ValueB4,ValueB5
ValueC1,ValueC2,ValueC3,,ValueC5

... and so on. If a field is empty, you will see just two commas, like in row 4 of the above sample, where there is no value for Field4 in the 3rd row of data.

The first row defines the fields, the ensuing rows have the data. Each data row must match the number of fields defined by the first row.

There are exceptions. The first row may actually already be the first row of data, so there are no field definitions.

In any case, we're just guessing at what's going on in your data. So please do as has been suggested several times above and post a data sample. Explain what you expect this data sample to translate to in a table.

cheers, teylyn
0
 
LVL 14

Assisted Solution

by:Don Thomson
Don Thomson earned 100 total points
ID: 34937140
The Alt 0141 is in actual fact the letter ì - Not the english i but with the dot a little to the left and at an angle
If you want to see it open Notepad - change the font size to about 48 and holding down the ALT key pres 141   -Not 0141 - It's called extended ascii - Standard Ascii Code goes from 000-127 and includes all the printer control codes, 0-9 a-z A-Z and some extras - Extended ascii includes foreign characters as well
like Ü  or +  and goes from ALT 128-255


A CSV file is simply a format that allows one type of program to import or export raw data in a form that can be read by another program. Virtually all database programs allow you to either import data using a CSV format or export it in a CSV format.  It is the basic database format going back to the original DOS based Basic programming language.

Any database that has a fixed number of columns can be converted to a CSV file.

The basic delimiter is the comma - if you have 8 columns of figures - you will generally have 8 fields separated by 7 commas

If a field contains an Alpha character then it is treated as a test field
If a field is all numbers it is treated as a numeric field

But what do you do with say Barcodes such as 0 12345 67891 5  or a phone number 212-555-1212
They are all numbers but if you write them as a number it strips the 0 from the barcode or subtracts the second and third number from the first.

Or what if your field is   Potatoes,Red    - Now we have 8 fields with 8 commas - and one field left over

How does a CSV handle these situations - It encloses the entire filed whether or not it a barcode, phone number or a field with an embedded comman and surrounds it with double quotes  "   "
This tells the program that this field is a TEXT field and everything in between it is one field

This works well until you get into something like measurements say 48"   - now what does the program do?
Some will look for a matching double quote - so We need to do "48""
Depending on how sophisticated the program import routine is it may work - it may not.

Myself - if I know that there are embeded comma or "  in some fields - I change to a Tab delimited format using a tab Alt 008  as the delimiter

The easiest way to understand this is to play with a program like Excel and notepad

Create some test data - on a couple of lines - If you don't have data in a certain field just repeat the comma or do field1,"",  if the blank field is a text field.

Normally if there are specific numberic fields you may not want to have a title or header row as the first line generally tells thye importing program what TYPE of field it is.

If you have an extension of .csv  - Excel will assume it's a comma delimited file with double quotes around the text field

If you have the same file as a .txt - It will prompt you for what the field types are

Hope this helps
0
 
LVL 11

Assisted Solution

by:ocanada_techguy
ocanada_techguy earned 50 total points
ID: 34951396
There is inconsistency, as CSV is a very "loose" format carried over from the old text-based terminals and OSes circa 1970 1980.
Somebody had the smart idea that the first row "could" be a header record that has the names of the fields, but there is nothing to say whether they are text or numeric fields.
If you go to import with Excel, there's a toggle whether that first row even exists.
Whether CR-LF pairs or just CR are used for records can vary.
Often text fields need to be surrounded by quotes.
Wether carriage returns and/or commas are allowed to be in a text field also varies.
For example, what if you need to embed commas in a text field?  What if you need to have a multi-line text-field?  Some pay greatest attention to the opening and closing of quotes.  Some pay greatest attention to using CR or LF for one or the other.  Some programs CSV import will like it or handle it one way, some will not or another.

XML solved all that.
0
 
LVL 6

Author Comment

by:anushahanna
ID: 34957439
thanks all - i am trying to absorb all the input...

the background of this is this:

attached soi format works with my program. but now the source is coming from a diff place with csv format, but it does not work- i am wondering if there is any fundamental issue with the csv char formatting that was not so in the soi file.

the columns are different- they can be mapped once the csv file is made to work. just renaming the file to soi did not help.

please rename test1.txt to test1.soi

thank you.
test2.csv
test1.txt
0
 
LVL 47

Assisted Solution

by:dbrunton
dbrunton earned 150 total points
ID: 34967675
End of line formats are different in the two files.

Is the first file, the test2.csv, the one you are having problems with?  And what type of computer system is it coming from?
0
 
LVL 14

Assisted Solution

by:Don Thomson
Don Thomson earned 100 total points
ID: 34968976
I imported both files into MSAccess tables with only one problem, In the Test1.txt - The "Sample ID" must have some kind of control Character because it replace the header field name with Field55
This would easily be fixed using a template.

Both files imported all of the data without error except as described above

Importing into Excel is always a crap shot unless the CSV file is actually delimited with double quotes around the text fields. I always brieng CSV files into Access where I can tell it what type of field it actually is. This is especially true with Date and time fields as they may have to be brought in as text field then reformated depending on their structure.

I'm enclosing the access file as it went in so you can see it
 db2.mdb
0
 
LVL 6

Author Comment

by:anushahanna
ID: 34976525
thanks very much - I have told the source programmer to take note of what is found and fix it.

appreciate it.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Today, still in the boom of Apple, PC's and products, nearly 50% of the computer users use Windows as graphical operating systems. If you are among those users who love windows, but are grappling to keep the system's hard drive optimized, then you s…
Shadow IT is coming out of the shadows as more businesses are choosing cloud-based applications. It is now a multi-cloud world for most organizations. Simultaneously, most businesses have yet to consolidate with one cloud provider or define an offic…
The viewer will learn how to simulate a series of coin tosses with the rand() function and learn how to make these “tosses” depend on a predetermined probability. Flipping Coins in Excel: Enter =RAND() into cell A2: Recalculate the random variable…
This Micro Tutorial will demonstrate how to use longer labels with horizontal bar charts instead of the vertical column chart.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now