[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

Remove rows from csv file based on date column

Posted on 2013-11-19
13
Medium Priority
?
545 Views
Last Modified: 2013-11-20
I have a csv file with columns below.  The second column is the only field that is an actual date.  What we would like to do is remove all rows from the csv file where the date in that column is < 3 days from today whenever run.  Meaning, delete all records where the date is 3 days or older then today.  Don't have any special preference, but figured some type of bat file utilizing DOS commands, or VB script, powershell, etc.  We've used things like 'FINDSTR' etc in the past for certain strings, but not a date so not sure where to start.  Any help would greatly be appreciated.  
Columns:

PackageReference1,ShipmentInformationCollectiondate,ShipmentInformationActualWeight,ShipmentInformationLeadTrackingNumber,ShipmentInformationServiceType,ThirdPartyUPSAccountNumber
"15231622","11/6/2013","1.00","581089305388","3","444444444"
"15242251","11/6/2013","1.00","581089305399","92","555555555"
"15282587","11/16/2013","1.00","581089305403","92","555555555"
"15278493","11/18/2013","1.00","581089305414","92","444444444"
0
Comment
Question by:ibgadmin
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 6
13 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39660735
Why not simply import the csv into Excel and filter the date?

You have attached a test Excel file, using your sample.
test.xlsx
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39660803
I'm looking for something to run fully automated via code within a bat file, etc.  The file is pretty much like a log file I receive from a vendor so I don't extract or have control over writing the file nor do we run Excel on the on windows server where I need it to run unattended.
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39660810
Also, the csv file needs to remain intact as a csv file once records are removed.  Again think of it more of as log file that you want to remove older records from but keep everything else as is.
0
Survive A High-Traffic Event with Percona

Your application or website rely on your database to deliver information about products and services to your customers. You can’t afford to have your database lose performance, lose availability or become unresponsive – even for just a few minutes.

 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39661503
Looks like the case for grep.
If you're on Linux you probably already have it installed, if on Windows get it from here: http://gnuwin32.sourceforge.net/packages/grep.htm

The command would be something like this:
grep '11/18/2013\|11/19/2013\|11/20/2013' log-file.csv

That basically says: parse log-file and return only the lines that contain "11/18/2013", "11/19/2013" or "11/20/2013".

Then you can pipe/redirect the output to the file of your choice, and you can programatically set the dates using the script language you have on that server.

LE: forgot to escape the "|"
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39662460
I was kind of looking at grep, but am no expert with grep and I need to keep everything but the last 3 days in the csv log file dynamically.  So the grep script would need to be something where it's like TODAY-3 only, so if today beeing 11/20/2013, it would need to remove any record LE to 11/17/2013 on the fly.  I won't have an opportunity to fill in the dates as you recommend.  Again, I need it to be fully un-attended/automated through a bat script, etc.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39663025
OK, I was too lazy to write the whole script. Powershell.

Here it is:
$dte = Get-Date
$today = $dte.Month, $dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$yesterday = $dte.Month, $dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$otherday = $dte.Month, $dte.Day, $dte.Year -join "\/"

&"X:\path\to\sed\sed.exe" -i `""/"$today"\|"$yesterday"\|"$otherday"/!d"`" "Y:\path\to\log\log.csv"

Open in new window

Tested and working with your sample data.

This uses sed -i with the d command, to delete lines from the logfile.
You can get sed from here: http://gnuwin32.sourceforge.net/packages/sed.htm

Replace X:\path\to\sed\sed.exe and Y:\path\to\log\log.csv with the actual paths.
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39663200
That worked but unfortunately it removed the Header row with the column headings.  I need to keep that intact, so is there a way to NOT remove the header row?
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39663231
Also, if I understand how this is working, how would one say change this to keep 7 days vs. 3?  I guess I was expecting some kind of variable that contained how many days to exclude where that could be changed on the fly.  That would be preferred and would make it usable for other files with different rules if that's possible.  Sorry if I wasn't clear about that...
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39663246
Yup.

&"X:\path\to\sed\sed.exe" -i `""/PackageReference\|"$today"\|"$yesterday"\|"$otherday"/!d"`" "Y:\path\to\log\log.csv"

Open in new window


As long as there is a column named PackageReference, it will keep the header.
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39663285
That worked perfectly!  Now as long as it can be changed a bit to accommodate the question to keeping 7 days, etc and making it usable over and over for other files to parse and modify, that would be great!
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39663312
$dte = Get-Date             #current date
$history = 3                # number of days to keep
$searchstring = "PackageReference"   # must be an actual value from the header
$period = @(0) * $history   #array to store formatted dates
for ( $i = 0; $i -lt $history; $i++ ) {
  $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
  $searchstring += "\|" + $period[$i]
  $dte = $dte.AddDays(-1)
}

&"X:\path\to\sed\sed.exe" -i `""/"$searchstring"/!d"`" "X:\path\to\log\log.csv"

Open in new window

You'll need to modify $history and $searchstring to accomodate your needs.

LE: refactored code
0
 
LVL 35

Accepted Solution

by:
Dan Craciun earned 2000 total points
ID: 39663649
Further modification. No longer matters what the first line contains, it will keep it.

Param([Int32]$history = 3,                 # number of days to keep
      [string]$log = "Y:\path\to\log.csv") # path to log file
$dte = Get-Date             # current date
$searchstring = ""          # 
$period = @(0) * $history   # array to store formatted dates

for ( $i = 0; $i -lt $history; $i++ ) {
  $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
  $searchstring += "\|" + $period[$i]
  $dte = $dte.AddDays(-1)
}
$searchstring = $searchstring.Substring(2) # remove first \| 

&"X:\path\to\sed\sed.exe" -i -e1p -e `""/"$searchstring"/!d"`" $log

Open in new window


Replace path\to\sed, save it as whatever.ps1, then run it as
whatever.ps1 -history 5 -log "Y:\path\to\log.csv"
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39663767
Thanks so much Dan for your help - this works perfectly.
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Although it can be difficult to imagine, someday your child will have a career of his or her own. He or she will likely start a family, buy a home and start having their own children. So, while being a kid is still extremely important, it’s also …
Q&A with Course Creator, Mark Lassoff, on the importance of HTML5 in the career of a modern-day developer.
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …
Windows 8 came with a dramatically different user interface known as Metro. Notably missing from that interface was a Start button and Start Menu. Microsoft responded to negative user feedback of the Metro interface, bringing back the Start button a…

656 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question