Solved

How to remove the junk values from csv file

Posted on 2013-10-24
Last Modified: 2013-10-24
Hi All,

I have a CSV file that looks fine when I open it, but at the bottom of the report there is this signature:

Report Generated by
Report generated time
Name of the Company
Company Address

File Layout
ID, M_Name,L_Name,F_Name, Street_name,SSN,Phone1,

Records: approximately 17k
I currently remove this junk data manually and then load the file into a table (SQL Server 2008).

Is there a way to write a script that removes these junk values?

Please let me know if you need anything else.
Question by:parpaa
16 Comments
 

Author Comment

by:parpaa
ID: 39597644
Almost forgot:
Once I remove the junk/signature values from the CSV file, I load the data into tables using an SSIS package. Thanks.
0
 
LVL 16

Expert Comment

by:gurutc
ID: 39597687
Use this command:

findstr /B  /V "Report Name Company"  filename.csv > fixedfile.csv

This command writes every line that does not begin with one of those search words to fixedfile.csv.
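
For readers who want the matching rule spelled out, here is a rough Python sketch of the idea behind `/B /V` with a space-separated word list. It only illustrates the rule and is not how findstr is implemented; the function name `drop_lines_starting_with` and the sample rows are made up for illustration.

```python
def drop_lines_starting_with(lines, words):
    """Keep only the lines that do not begin with any of the given words.

    Rough analogue of `findstr /B /V "w1 w2 w3"`: the words are
    alternatives, and a line is dropped if it starts with any one of them.
    """
    prefixes = tuple(words)
    return [line for line in lines if not line.startswith(prefixes)]


if __name__ == "__main__":
    # Hypothetical sample: one data row followed by signature lines.
    sample = [
        "1,A,Smith,John,Main St,000-00-0000,555-0100",
        "Report Generated by",
        "Name of the Company",
        "Company Address",
    ]
    for line in drop_lines_starting_with(sample, ["Report", "Name", "Company"]):
        print(line)
```

Note that a start-of-line test like this only matches if the word really is the first thing on the line; any leading character, such as a space or a quotation mark, defeats it.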

Good Luck,

- gurutc
0
 

Author Comment

by:parpaa
ID: 39597716
awesome I will try that now :)
0

 

Author Comment

by:parpaa
ID: 39597768
gurutc,
Unfortunately, it created a new CSV file but did not exclude these values.
Is anything missing here?
0
 

Author Comment

by:parpaa
ID: 39597791

"ReportName"
"Copyright (c) 2000-2013 , inc. All rights reserved."
"Confidential Information - Do Not Distribute"
"Generated By: XYZ  10/1/2013 12:35 PM"
"Name of the company"

This is what I tried:
findstr /B  /V "ReportName Copyright Confidential Generated Name"  filename.csv > fixedfile.csv

But no luck.
0
 
LVL 16

Expert Comment

by:gurutc
ID: 39597813
Are there any leading spaces before those strings in the file?

Another way takes several steps, but they could be put in a batch file:

findstr /V  /I  /C:"Report Generated by"  filename.csv > fixedfile1.csv
findstr /V  /I  /C:"Name of"  fixedfile1.csv > fixedfile2.csv
findstr /V  /I  /C:"Company Address"  fixedfile2.csv > fixedfile3.csv

This removes the lines containing these strings regardless of any spaces, tabs, or other characters before them. The cleaned result ends up in fixedfile3.csv.
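
The chained filters can also be pictured as repeated "drop every line containing this phrase, ignoring case" passes. The Python sketch below is only an illustration of that logic, not a replacement for the commands above; the function names `drop_lines_containing` and `drop_all` are hypothetical.

```python
def drop_lines_containing(lines, phrase):
    """Drop every line that contains `phrase` anywhere, ignoring case.

    Rough analogue of one `findstr /V /I /C:"phrase"` pass.
    """
    needle = phrase.lower()
    return [line for line in lines if needle not in line.lower()]


def drop_all(lines, phrases):
    """Apply one pass per phrase, feeding each result into the next pass."""
    for phrase in phrases:
        lines = drop_lines_containing(lines, phrase)
    return lines


if __name__ == "__main__":
    # Hypothetical sample with leading whitespace before the signature text.
    sample = ["1,A,Smith", "  Report Generated by", "\tName of the Company"]
    print(drop_all(sample, ["Report Generated by", "Name of", "Company Address"]))
```

Because the phrase may appear anywhere on the line, this style of filter will also drop a data row that happens to contain one of the phrases.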

- gurutc
0
 
LVL 16

Expert Comment

by:gurutc
ID: 39597822
Your attempt would work if you removed /B, which anchors the match to the beginning of the line, and added /I to make it case-insensitive.

- gurutc
0
 
LVL 16

Accepted Solution

by:
gurutc earned 2000 total points
ID: 39597835
findstr   /V  /I  "ReportName Copyright Confidential Generated Name"  filename.csv > fixedfile.csv
0
 
LVL 27

Expert Comment

by:Zberteoc
ID: 39597858
I think what you need is to have a list of first words from each line in the signature. Try:

findstr /B  /V "<first word from report name here> Copyright Confidential Generated <first word of the company name here>"  filename.csv > fixedfile.csv

So if the report name is:  Address List  and the company name is  ABC Inc.  then your command will be:

findstr /B  /V "Address Copyright Confidential Generated ABC"  filename.csv > fixedfile.csv
0
 

Author Comment

by:parpaa
ID: 39597883
@gurutc
"ReportName"
"Copyright (c) 2000-2013 , inc. All rights reserved."
"Confidential Information - Do Not Distribute"
"Generated By: XYZ  10/1/2013 12:35 PM"
"Name of the company"
As you can see, each of these lines begins with a double quotation mark. Sorry, I should have told you this before.
0
 

Author Closing Comment

by:parpaa
ID: 39597901
Thank you so much, Guru. This one removed the values. Closing this out.
0
 

Author Comment

by:parpaa
ID: 39598009
Hi @gurutc,

When I use /I /V, it excludes many valid records. Is there a way to remove the double quotes around each column and then use

findstr /B /V ..

I know this ticket has been closed. Should I open a new ticket for this request?
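
The lost records are what one would expect once /B is dropped: with /I, a word such as Name matches anywhere on a line, including column names like M_Name. The Python sketch below reproduces that effect on made-up rows; the function name `find_false_positives` and all sample data are hypothetical, and the header row is only assumed to be quoted like the rest of the file.

```python
SIGNATURE_WORDS = ["ReportName", "Copyright", "Confidential", "Generated", "Name"]


def find_false_positives(lines, words=SIGNATURE_WORDS):
    """Return lines a case-insensitive 'contains' filter would drop even
    though they do not start with a signature word (after an optional
    leading double quote)."""
    lowered = tuple(word.lower() for word in words)
    hits = []
    for line in lines:
        low = line.lower()
        contains = any(word in low for word in lowered)
        starts = low.lstrip('"').startswith(lowered)
        if contains and not starts:
            hits.append(line)
    return hits


if __name__ == "__main__":
    sample = [
        '"ID","M_Name","L_Name","F_Name","Street_name","SSN","Phone1"',
        '"17","Q","Doe","Jane","12 Namekagon Rd","000-00-0000","555-0101"',
        '"18","R","Roe","Sam","4 Oak St","000-00-0000","555-0102"',
        '"Name of the company"',
    ]
    for line in find_false_positives(sample):
        print("would be lost:", line)
```

In this sample the header row and the Namekagon Rd record are flagged, while the genuine signature line is not.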
0
 

Author Comment

by:parpaa
ID: 39598018
setlocal enabledelayedexpansion
if exist after.csv del after.csv
for /f "delims=" %%A in (before.csv) do (
  set csvline=%%A
  echo !csvline:"=! >> after.csv
  )

I'm using this code to strip the double quotes, but it is not working.
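
One caution about deleting every double quote: if any field contains a comma (an address, for example), the quotes are what keep that field in one column. As an alternative outside the batch approach, here is a minimal Python sketch that uses the standard csv module to remove quotes only where they are not needed. The function name `strip_quotes_safely` and the sample row are made up for illustration.

```python
import csv
import io


def strip_quotes_safely(text):
    """Rewrite CSV text so that quotes remain only around fields that need them.

    Fields containing a comma, a quote, or a line break stay quoted, so the
    column count of every row is preserved.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical row: the last field contains a comma.
    print(strip_quotes_safely('"1","Smith","12 Main St, Apt 4"\n'), end="")
```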
0
 
LVL 16

Expert Comment

by:gurutc
ID: 39598116
When I get back to my PC, we'll solve this.

-gurutc
0
 
LVL 16

Expert Comment

by:gurutc
ID: 39598328
Hi,

Use a backslash (\) to escape each double quote so that it is treated as a literal character:

findstr   /V   /B "\"ReportName \"Copyright \"Confidential \"Generated \"Name"  filename.csv > fixedfile.csv

You only have to escape the quotes at the beginning of each search term, so that those quote marks become part of what is being searched for.

Only the lines starting with the following will then be matched and left out:

"ReportName
"Copyright
"Confidential
"Generated
"Name

If there's anything hidden before the " mark, you'll have to remove the /B (which means beginning of line), but let's hope there isn't and this'll work groovy.
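
For anyone doing the same clean-up outside the command prompt, the quoted-prefix idea can be sketched in Python as below. This is an illustration under assumptions, not part of the findstr solution: the function name `remove_signature` is hypothetical, and UTF-8 is only a guess at the file's encoding.

```python
# Prefixes include the leading double quote, mirroring the escaped search terms.
SIGNATURE_PREFIXES = (
    '"ReportName',
    '"Copyright',
    '"Confidential',
    '"Generated',
    '"Name',
)


def remove_signature(src_path, dst_path, prefixes=SIGNATURE_PREFIXES):
    """Copy src_path to dst_path, skipping lines that start with a prefix.

    Returns the number of lines kept. The encoding is an assumption;
    adjust it to match the real export.
    """
    kept = 0
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            if line.startswith(prefixes):
                continue
            dst.write(line)
            kept += 1
    return kept
```

A call such as `remove_signature("filename.csv", "fixedfile.csv")` would write the filtered copy and leave the original file untouched. As with /B, a line is only skipped when the quote mark is its very first character.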

- gurutc
0
 

Author Comment

by:parpaa
ID: 39599406
Thank you so much again guru :)
0
