parpaa
asked on
How to remove the junk values from csv file
Hi All,
I 've this csv file which looks fine when I open it, but when I reach to the bottom of this report I see this signature
File Layout
ID, M_Name,L_Name,F_Name, Street_name,SSN,Phone1,
Records:17k approximately
I remove this junk data manually and load this file into the table [SQL SERVER 2008].
Is there any way we can write a script to remove this junk values?
Please let me know if you need any thing else?
I 've this csv file which looks fine when I open it, but when I reach to the bottom of this report I see this signature
Report Generated by
Report generated time
Name of the Company
Company Address
File Layout
ID, M_Name,L_Name,F_Name, Street_name,SSN,Phone1,
Records:17k approximately
I remove this junk data manually and load this file into the table [SQL SERVER 2008].
Is there any way we can write a script to remove this junk values?
Please let me know if you need any thing else?
use this command:
findstr /B /V "Report Name Company" filename.csv > fixedfile.csv
this command will output all lines not beginning with those search strings to the fixedfile.csv
Good Luck,
- gurutc
findstr /B /V "Report Name Company" filename.csv > fixedfile.csv
this command will output all lines not beginning with those search strings to the fixedfile.csv
Good Luck,
- gurutc
ASKER
awesome I will try that now :)
ASKER
GuruTc,
unfortunately it is not excluding these values, though it created new csv file.
Anything missing here?
unfortunately it is not excluding these values, though it created new csv file.
Anything missing here?
ASKER
"ReportName"
"Copyright (c) 2000-2013 , inc. All rights reserved."
"Confidential Information - Do Not Distribute"
"Generated By: XYZ 10/1/2013 12:35 PM"
"Name of the company"
This is something tried:
findstr /B /V "ReportName Copyright Confidential Generated Name" filename.csv > fixedfile.csv
but no luck
Are there any leading spaces before those strings in the file?
Another way is a multi step but could be put in a batch file:
findstr /V /I /C:"Report Generated by" filename.csv > fixedfile1.csv
findstr /V /I /C:"Name of" fixedfile1.csv > fixedfile2.csv
findstr /V /I /C:"Company Address" fixedfile2.csv > fixedfile3.csv
This will remove the lines with these strings no matter if there are spaces or tabs or anything else before the strings.
- gurutc
Another way is a multi step but could be put in a batch file:
findstr /V /I /C:"Report Generated by" filename.csv > fixedfile1.csv
findstr /V /I /C:"Name of" fixedfile1.csv > fixedfile2.csv
findstr /V /I /C:"Company Address" fixedfile2.csv > fixedfile3.csv
This will remove the lines with these strings no matter if there are spaces or tabs or anything else before the strings.
- gurutc
Your try would work if you took out the /B which means beginning of line and add /I making it case-insensitive.
- gurutc
- gurutc
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I think what you need is to have a list of first words from each line in the signature. Try:
findstr /B /V "<first word from report name here> Copyright Confidential Generated <first word of the company name here>" filename.csv > fixedfile.csv
So if the report name is: Address List and the company name is ABC Inc. then your command will be:
findstr /B /V "Address Copyright Confidential Generated ABC" filename.csv > fixedfile.csv
findstr /B /V "<first word from report name here> Copyright Confidential Generated <first word of the company name here>" filename.csv > fixedfile.csv
So if the report name is: Address List and the company name is ABC Inc. then your command will be:
findstr /B /V "Address Copyright Confidential Generated ABC" filename.csv > fixedfile.csv
ASKER
@gurutc
"ReportName"
"Copyright (c) 2000-2013 , inc. All rights reserved."
"Confidential Information - Do Not Distribute"
"Generated By: XYZ 10/1/2013 12:35 PM"
"Name of the company"
As you see here it has double quotations at beginning of the lines. Sorry i should have told you this before.
"ReportName"
"Copyright (c) 2000-2013 , inc. All rights reserved."
"Confidential Information - Do Not Distribute"
"Generated By: XYZ 10/1/2013 12:35 PM"
"Name of the company"
As you see here it has double quotations at beginning of the lines. Sorry i should have told you this before.
ASKER
Thank you so much Guru.. this one has removed the values. Closing this out.
ASKER
Hi @gurutc,
If am using /I /V it is excluding many valid records. Is there any other way to remove these double quotes of each column and use
findstr /B /V ..
I know this ticket has been closed. Do you want me to open a new ticket for this request?
If am using /I /V it is excluding many valid records. Is there any other way to remove these double quotes of each column and use
findstr /B /V ..
I know this ticket has been closed. Do you want me to open a new ticket for this request?
ASKER
setlocal enabledelayedexpansion
if exist after.csv del after.csv
for /f "delims=" %%A in (before.csv) do (
set csvline=%%A
echo !csvline:"=! >> after.csv
)
I'm using this code to trim the double quotes.. but it is not working.
When I get back to my PC we"ll solve this
-gurutc
-gurutc
Hi,
Use the \ directive to escape the double quotes and treat it as literal:
findstr /V /B "\"ReportName \"Copyright \"Confidential \"Generated \"Name" filename.csv > fixedfile.csv
You only have to escape the quotes that are at the beginning of each of your search terms so those quote marks also become part of what's being searched for.
So only the lines starting with the following will be recognized and left out:
"ReportName
"Copyright
"Confidential
"Generated
"Name
If there's anything hidden before the " mark you'll have to remove the /B which means beginning of line, but let's count on our hoping there isn't and this'll work groovy.
- gurutc
Use the \ directive to escape the double quotes and treat it as literal:
findstr /V /B "\"ReportName \"Copyright \"Confidential \"Generated \"Name" filename.csv > fixedfile.csv
You only have to escape the quotes that are at the beginning of each of your search terms so those quote marks also become part of what's being searched for.
So only the lines starting with the following will be recognized and left out:
"ReportName
"Copyright
"Confidential
"Generated
"Name
If there's anything hidden before the " mark you'll have to remove the /B which means beginning of line, but let's count on our hoping there isn't and this'll work groovy.
- gurutc
ASKER
Thank you so much again guru :)
ASKER
Once I remove this junk/signature values from csv file, I load them into tables using SSIS package. Thanks