asked on

Text File Delimited Problem - Comma

I have a text file that is comma delimited, but there is a field that has some "non-delimiter" commas included in the text. This is causing the parsing to be incorrect when I import this file to Access. THis is causing about 100,000 records to come into the database table incorrectly. Is there a solution to this problem?

Tom Farrar

ASKER

Can I "save as" the current comma delimited file, and it will save with the new delimiter? Then the comma will be replaced by the new delimiter? I cannot download the original file before it was saved as comma delimited.

Rainer Jeschor

Hi,
do you have any chance to get into the process before the comma-delimited file is created?
Normally you would have to encapsulate the fields containing the delimiter - this is called Text qualifier and normally the quotes character is used.

Sample:
ID,Text,AnotherText
1,This should work,As there is no delimiter
2.This will fail, as it contains,a delimiert in, both field

would become
1,This should work,As there is no delimiter
2."This will fail, as it contains","a delimiert in, both field"

This would be your only chance - and that has to be done before you start importing the file as only the tool/program which generates the file knows, where this field starts and ends.

HTH
Rainer

RobSampson

If you only have the final file available, and have no control over the export, there is only one other i

RobSampson

.....option I can think of. Since you say there is "a" field that has these extra commas, then can we assume that there is always a fixed amount of known fields either side of that? So say you have 10 fields, if the first three never have extra commas, and the last six never have extra commas, then
anything in between, no matter how many commas, would be the fourth field. We could write a script to reformat the file....

Rob.

Jeffrey Coachman

Coming late to the party but...

You can simply do a find/replace on the source file and change the commas to another delimiter (the "|" is a common option)
Then the import should work.
Then when the import is done, do a find replace on the imported table to change the "|" back to a comma.

Rainer Jeschor

@Rob: wunderful idea!

@boag2000: How should that work? If I replace every "," with a "|" I would not change anything - a line with originally 5 delimiters + 1 wrong comma will then have also 5 + 1 delimiters

Jeffrey Coachman

RainerJ,

Oops, yes you are correct, sorry about that.

Despite reading your post, I was thinking that this was a space delimited file, and the comma was only in the address...

Again, sorry about that...
;-)

Jeff

Hamed Nasr

Can you do that manually for few records?
If yes, do you follow a logical process in performing this? If yes, then it can be done.

If you have to do a mental processing, meaning "ah this is a name of a person", then it proves to be a hard task unless you tell the process to expect a comma after such words.

List few anticipated records of the file and see if a logical process exists.
Waiting for your feedback.

"I was thinking that this was a space delimited file"
This is why we keep asking for example of records to be included in the body of the question, to save the brain from unnecessarily working trivial things.

rspahitz

As mentioned, there are several options, including going back to the source and having it saved as a proper CSV file (with "qualified" fields), or using some basic "intelligence" to determine where the comma-riddled field is located (pull off the from and back pieces.) Or if you can manually locate those problem items and fix them, that works.

Beyond that you need to add more computer logic into the process, which can get complex. For example, if you know that a comma-riddled field is surrounded by numeric fields, you could search for the numeric field separated by command and conclude that the piece before is your problem field (although you have to watch for your problem containing the sequence of a comma and number) or maybe the next field is a fixed length or a date or some other way to distinguish where the next field is located. Basically, the more complex the data, the more complex the logic needed to find the different parts.

Tom Farrar

ASKER

Hi All - Thanks for the comments. A couple of facts you have pointed out seem to indicate the process of correctly parsing the file will most likely require some manual manipulation.

More facts: there are half-million records, each record has 23 fields, and two of those fields appear to be problem areas. One is an account description field that has data such as "Mileage, gas, and travel" (there are probably six or so descriptions similar to this in this field. The second field is a company name like "My Company, Inc" and there are probably 50 different combinations of names that include one or more commas. Because the file cannot be recreated, it appears the solution is to open in Excel and manually adjust columns until the proper alignment is made for the 23 fields. I have begun this process.

Thanks much for your thoughts, and if you have other thoughts, please share.

rspahitz

>field that has data such as "Mileage, gas, and travel"

Can you post some examples?
If it's something like 12345,75,ontario then you may be able to look for characteristics of the data and "link" them together as a single field. Even if it works only 80% of the time, that means you only have to manually update 20% of the records.

Hamed Nasr

If you are happy with excel, go ahead. But "half-million records, " I doubt if it is a good idea to do that.

At least you may isolate most of the fields in access table fields, and the rest in a one field.
Use VBA to manipulate the one field to split it further.

As rspahitz suggested, you reduce the volume of data to be processed. In addition you may automate the process and reduce introducing new errors.

als315

You can also read this file from VBA and save data from proper strings to one table, but wrong strings to other and analyze them manually. If you can upload some sample with good and problem strings, may be we can find any solution

Tom Farrar

ASKER

Okay, I can give that a try.