Link to home
Start Free TrialLog in
Avatar of ststesting
ststesting

asked on

hidden characters in a csv?

I have a log downloaded from UPS everyday as a csv.

The csv has ~300 records in it, but the import stops after ~26-28 records (it's consist per file, the changes when downloading a file on a new day)

If I open that csv in wordpad, make no changes and save it, they all import correctly.  This has happened over multiple days, there is nothing i can see that is different on the failing records.

We have a vb program that reads a file and outputs the ascii value and it bombs when it gets to the record that stops the import.


Any ideas on what I'm missing?
Avatar of Bill Prew
Bill Prew

I would need to see the file to have an idea of what might be causing a problem.

~bp
I have seen problems like this with hidden Word special characters added during copy/paste.  I want to say opening it in ultraedit or notepad++ might allow you to see the character.  
Hi STS,


Several things come to mind.  First, run a binary compare on the file you receive and the file that Wordpad writes.

1.  Wordpad will read a file with unix line terminators.  (NEWLINE)  The Microsoft newline is the two character combination CR/LF.  Wordpad will write a Microsoft formatted line.  This is the most likely scenario.

2.  When reading into WordPad and then writing a new file, TAB characters may be replaced by spaces, or vice versa.

3.  Some of the Microsoft I/O routines consider a quote to have a higher precedence than end-of-line.  I doubt that this is causing this error, but it can cause the NEWLINE or CR/LF characters to embed in text.  If the file passes tests 1 and 2 above, check for quotes.



Good Luck,
Kent
Avatar of ststesting

ASKER

billprew, I saw that coming...

Is it allowable to send a file directly to you? I can't edit the data as saving it would fix the problem.  I'd rather not post it all here.


Yes, you can email me the file and I'll take a look.  I have a gmail <dot> com account with the same address as my username.  I don't want to actually type that out in full here because web crawlers will pick it up and spam it.

~bp
What do you see if you open your .csv file in notepad (not write or notepad++), you should see some strange characters.
Please just attach the file to this thread using the "File" link under the text entry box.  Thanks.
Got your file, but I didn't seen any odd characters in it looking it over quickly.  I also was able to open it in Excel with no error or problem, does that work for you?

~bp
Just a thought, are you sure you didn't send me a file that you had already opened and saved in Notepad, which fixed the problem?

~bp
Hi STS,

Open the file with Notepad (not Wordpad).

Do the lines look correct or do they all run together?

Now open the file with Wordpad and SAVE AS a new file.

Check the file lengths.  If the new file is about 300 bytes longer than the original, it's the NEWLINE - CR/LF conversion.


Kent
Yes, I can open in excel also and can't find any problems in it.

I just tried to import again using the exact file I sent, it is the correct (failing) one.
This particular one only imports 26 records (skipping the first line)

I can open it in wordpad, save and exit and all 177 go in.


I haven't installed notepad++ yet.  Notepad shows no difference, but the lines do mostly run together.
Opening it with notepad and saving as a new file makes the size 151 bytes larger in properties
If you attach this mystery file to your question as I suggested above, myself and other experts can help you to find the issue as well.
edit last line of last post-- Opening it in wordpad and saving as a new file makes the size 151 bytes larger in properties
SOLUTION
Avatar of Kent Olsen
Kent Olsen
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Okay, it looks like the issue *may* revolve around the fact that the data lines in the original file are terminated with a mixture of some being just a line feed character, and others being the standard PC format of carriage return / line feed.  Wordpad cleans tat up.

I also see one other odd character at offset 0x3C3F but am still looking in to that...

~bp
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Interestingly, here is the profile of the end of lines used for each record, seems somewhat random (I was expecting perhaps a patter, like every so many lines a CRLF was tossed in).

  1 : CRLF
  2 : LF
  3 : LF
  4 : LF
  5 : CRLF
  6 : LF
  7 : LF
  8 : LF
  9 : CRLF
 10 : LF
 11 : LF
 12 : LF
 13 : LF
 14 : CRLF
 15 : LF
 16 : LF
 17 : CRLF
 18 : LF
 19 : LF
 20 : CRLF
 21 : LF
 22 : LF
 23 : CRLF
 24 : LF
 25 : CRLF
 26 : CRLF
 27 : CRLF
 28 : LF
 29 : LF
 30 : LF
 31 : LF
 32 : CRLF
 33 : LF
 34 : LF
 35 : LF
 36 : LF
 37 : LF
 38 : LF
 39 : CRLF
 40 : LF
 41 : LF
 42 : LF
 43 : LF
 44 : CRLF
 45 : LF
 46 : CRLF
 47 : LF
 48 : LF
 49 : LF
 50 : CRLF
 51 : CRLF
 52 : LF
 53 : LF
 54 : LF
 55 : LF
 56 : LF
 57 : LF
 58 : LF
 59 : LF
 60 : CRLF
 61 : CRLF
 62 : LF
 63 : LF
 64 : LF
 65 : LF
 66 : LF
 67 : LF
 68 : LF
 69 : LF
 70 : LF
 71 : LF
 72 : LF
 73 : LF
 74 : LF
 75 : LF
 76 : LF
 77 : LF
 78 : LF
 79 : LF
 80 : LF
 81 : LF
 82 : LF
 83 : LF
 84 : LF
 85 : LF
 86 : LF
 87 : LF
 88 : LF
 89 : LF
 90 : LF
 91 : LF
 92 : LF
 93 : LF
 94 : LF
 95 : LF
 96 : LF
 97 : LF
 98 : LF
 99 : LF
100 : LF
101 : LF
102 : LF
103 : LF
104 : LF
105 : LF
106 : LF
107 : LF
108 : LF
109 : LF
110 : LF
111 : LF
112 : CRLF
113 : CRLF
114 : LF
115 : LF
116 : CRLF
117 : CRLF
118 : CRLF
119 : LF
120 : LF
121 : LF
122 : LF
123 : LF
124 : LF
125 : LF
126 : LF
127 : LF
128 : LF
129 : LF
130 : LF
131 : LF
132 : LF
133 : LF
134 : LF
135 : LF
136 : LF
137 : LF
138 : LF
139 : LF
140 : LF
141 : LF
142 : LF
143 : LF
144 : LF
145 : LF
146 : LF
147 : LF
148 : LF
149 : LF
150 : LF
151 : LF
152 : LF
153 : LF
154 : LF
155 : LF
156 : LF
157 : LF
158 : LF
159 : LF
160 : CRLF
161 : LF
162 : LF
163 : LF
164 : LF
165 : LF
166 : LF
167 : CRLF
168 : LF
169 : LF
170 : LF
171 : LF
172 : LF
173 : LF
174 : LF
175 : LF
176 : CRLF
177 : LF
178 : CRLF
179 : CRLF

Open in new window

~bp
Hi Bill,

That suggests that the file is being written by UPS from different processes.  Weird....
Are you collecting it using an FTP process btw?  If so then use ASC rather than BIN mode,

If you are scripting the process then using more, type, or find /v "" or the like will generally clean it up, e.g.

find /v "" < original.csv > new.csv


Steve
Sorry having posted I just see Bill had already suggested using more to clean the file, sorry bill.

Steve
Odd for sure.

It's downloaded with a light program without many options called autoload- or manually downloaded.
http://www.ups.com/content/us/en/resources/track/quantum_view/data/autoload.html




The more command does fix this for me, thanks so much!

I still don't see why it happens...it doesn't like 3 CRLF in a row?  Wouldn't quite a few people be running into this issue?
Thanks so much for all the help, I can't tell you how happy I am to get past this one.
Hi STS,

It has nothing to do with anything that you're doing.  It's happening on the UPS end, or with the lightweight client that you're using to download the file.

You might use a regular FTP client to get the file.  Use BINARY mode and see if the original file has both NEWLINE and CR/LF terminators.  Then just switch to ASCII mode and get the file again.  The new one should be just fine.

The are a LOT of FTP clients out there.  You should consider an upgrade from what you're using now.


Good Luck,
Kent
Glad that helped, interesting file.

Not sure why it trips up the import, something about switch back and forth from <CR><LF> to a single character <LF> must make it sad at some point.

Thanks for the feedback.

~bp
SouthMod. I disagree.  The action of editing the data in any way "Corrected" it as it amended what turned out to be the issue, i.e. the replacing of end of line characters CRLF and LF mixed to just whatever the text editor left it with, i.e. probably CR LF.

Without using a hex editor or similar it would have been difficult to pull part of the file, and if it is sensitive information regarding UPS shipping then posting here would likely be out of the question. Of course until the issue was known to be an end of line one it wouldn't have been found otherwise and others did suggets it MIGHT be BUT Billprew offered a way of fixing it, and the other peopl contributing were given assists.

So who complained!?

Steve
@SouthMod,

I do understand the point about behind the scenes communications, and appreciate the guidance. in this case there wasn't really any reasonably feasible way for me to view the problem data, so when the poster asked if I could look at it if he sent it to me I agreed, out of a desire to help them out. As Steve mentions, the act of adjusting the data in this case was extremely likely to have eliminated or obfuscated the problem.

FWIW, my intention was genuinely about helping resolve the posters problem, and not about gaining points, etc.  To those on this thread that were offended by my approach to that assistance I genuinely apologize - not for helping the poster, but if my approach was perceived as unfair or inappropriate.

SouthMod, I am disappointed that this resulted in such a negative comment in my member profile. Especially since the phrasing there seems to imply I requested the poster to send me the file, and that is not the case.  I asked for it to be posted here, and then the poster asked me if I could look at it "privately" since it contained sensitive data.

~bp
All, I'm trying to view this objectively:

>> Is it allowable to send a file directly to you?
@billprew - You should have replied that it is not allowed -> This is maybe a weak point for EE in general and should be recognized and adressed. Southmod?

>> Please just attach the file to this thread using the "File" link under the text entry box.  Thanks.
Xterm is pointing in the right direction here, asking to post the file

>> If you attach this mystery file to your question as I suggested above, myself and other experts can help you to find the issue as well.
Xterm is again asking for the file

>> and if it is sensitive information regarding UPS shipping
@dragon-it - You are correct -> But the OP could have may an additional comment acknowledging the sensitive nature of the file.

>> in this case there wasn't really any reasonably feasible way for me to view the problem data
@billprew - You are absolutely correct! IMHO this is a weak point for EE (see above).

Us Experts all want to help EE Members the best way they can. The nature of this case (and others) where sensitive or private information is involved, together with the way EE is setup (open, accessible) is in conflict. With this in mind I would like Southmod to review this case again and think again about billprew's negative profile comment which sounds out of proportion. I would also like to know how this issue (OP's sensitive data / EE's openness and accessibility) can be adressed as this is the cause for all this IMHO.

Thanks.
Southmod, like others have stated, I could not have edit the file and posted it, as that would have fixed the problem.

Not editing was not an option; I didn't want to post my customers addresses, charges and account info.  I only asked to send it to billprew, after seeing his name at the top of the dos hall of fame list, I figured I'd risk sending it to one person with a reputation.

In either case, I did ask if I could send it, I apologize for the trouble.  However, the problem got solved and others that come across this case in the future will see the parts of the file that are relevant.  Reading this thread would have fixed my problem; which to me is hardly meaningless.

Thanks again for the help everyone!
-Jon