Link to home
Start Free TrialLog in
Avatar of Dilen Pat
Dilen Pat

asked on

Notepad++ search/replace doesn’t find all instances

I have some cvs data. If I open up in excel will show each record in the form:
ABC|DEF|GHI=xxx|JKL=yyy

I want to do some data analysis on the difference in value between xxx and yyy but there’s a lot of junk characters in there. xxx and yyy can also have “|” in it so it’s a bad delimiter to have.

What I usually do to resolve everything is I copy all the records from excel and paste into notepad++, then using find/replace I replace “|GHI” with “‘GHI” and “|JKL” with “‘JKL’. I.e I replace the delimiter with a character that isn’t being used in the xxx and yyy values (in this case I use the left apostrophe you see on the top left of your keyboard (near ESC))

After doing this, I can just copy/paste back into excel and use the ‘text to columns’ feature to split the cell using the ‘ as a delimiter.

This has always worked for me. Except now for a new dataset it’s stopped working. In notepad++ it doesn’t replace all instances. I.e for a dataset with 280 rows, it only applies find and replace with 260 of them. For the instances that weren’t replaced the seem to look exactly the same as those that were. I’m unsure why this is happening

I tried copy pasting |GHI from a record where it does get replaced, and replacing |GHI from a record where it doesn’t get replaced, but it still doesn’t work
Avatar of Dilen Pat
Dilen Pat

ASKER

I tried with copying just one record into notepad++

So in the find and replace window, it is set under Normal mode.

It doesn't work initially under Normal mode
When I switch to Extended mode, it does work
And then when I undo the replace, and switch back to Normal mode, it starts working again?

I'm on a work laptop so I can't really reinstall notepad++
Avatar of Shaun Vermaak
Bypass Notepad++ by using | as the delimiter in Excel. | is not a bad delimiter
ASKER CERTIFIED SOLUTION
Avatar of Ryan Chong
Ryan Chong
Flag of Singapore image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
[Was still editing]
Bypass Notepad++ by using | as the delimiter in Excel. | is not a bad delimiter
User generated image
I'm on a work laptop so I can't really reinstall notepad++
PS: You get a portable version that requires not installation
I usually use notepad++ during my data analysis because I could be doing other keep kinds of work on excel like filtering the data before playing with it and this causes some unexpected conflicting behaviour with the data that is hidden due to the filtering (whether it's an excel glitch or an issue in my understanding, I don't know). But copying the filtered records to notepad++ and then back (onto
A new spreadsheet) resolves this issue
Attach/Embed whole source file (280 lines) so that I can test
I have some cvs data. If I open up in excel will show each record in the form:
Excel has a lot of intelligence, which is a problem when working with raw data. The only way to be almost sure, that you have the correct data in Excel is to use the Data tab with the Import from Text function as Ryan already wrote.

What I usually do to resolve everything is I copy all the records from excel and paste into notepad++, then using find/replace I replace “|GHI” with “‘GHI” and “|JKL” with “‘JKL’
Why? If the column terminator is part of a column value, then it must be enclosed in the text qualifier. If the text qualifier is part of the column value, then the column is enclosed in the text qualifier and the text qualifier is doubled to escape it. These are the most common CSV rules.

E.g. pipe as column delimiter and double quote as text qualifier
ABC|DEF|GHI=xxx|JKL=yyy|"MNO=|x|"|"PQR=""Hi!"

Open in new window


Another problem can be the character itself. Is it really in all instances a pipe?

Cause
I want to do some data analysis on the difference in value between xxx and yyy but there’s a lot of junk characters in there. xxx and yyy can also have “|” in it so it’s a bad delimiter to have.

maybe it is no DSV (delimiter separated value) format, but product specific?
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I have recommended this question be closed as follows:

Accept: Ryan Chong (https:#a42429639)

If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.

broomee9
Experts-Exchange Cleanup Volunteer