Solved

Remove rows from csv file based on date column

Posted on 2013-11-19
13
529 Views
Last Modified: 2013-11-20
I have a csv file with columns below.  The second column is the only field that is an actual date.  What we would like to do is remove all rows from the csv file where the date in that column is < 3 days from today whenever run.  Meaning, delete all records where the date is 3 days or older then today.  Don't have any special preference, but figured some type of bat file utilizing DOS commands, or VB script, powershell, etc.  We've used things like 'FINDSTR' etc in the past for certain strings, but not a date so not sure where to start.  Any help would greatly be appreciated.  
Columns:

PackageReference1,ShipmentInformationCollectiondate,ShipmentInformationActualWeight,ShipmentInformationLeadTrackingNumber,ShipmentInformationServiceType,ThirdPartyUPSAccountNumber
"15231622","11/6/2013","1.00","581089305388","3","444444444"
"15242251","11/6/2013","1.00","581089305399","92","555555555"
"15282587","11/16/2013","1.00","581089305403","92","555555555"
"15278493","11/18/2013","1.00","581089305414","92","444444444"
0
Comment
Question by:ibgadmin
  • 7
  • 6
13 Comments
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39660735
Why not simply import the csv into Excel and filter the date?

You have attached a test Excel file, using your sample.
test.xlsx
0
 

Author Comment

by:ibgadmin
ID: 39660803
I'm looking for something to run fully automated via code within a bat file, etc.  The file is pretty much like a log file I receive from a vendor so I don't extract or have control over writing the file nor do we run Excel on the on windows server where I need it to run unattended.
0
 

Author Comment

by:ibgadmin
ID: 39660810
Also, the csv file needs to remain intact as a csv file once records are removed.  Again think of it more of as log file that you want to remove older records from but keep everything else as is.
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39661503
Looks like the case for grep.
If you're on Linux you probably already have it installed, if on Windows get it from here: http://gnuwin32.sourceforge.net/packages/grep.htm

The command would be something like this:
grep '11/18/2013\|11/19/2013\|11/20/2013' log-file.csv

That basically says: parse log-file and return only the lines that contain "11/18/2013", "11/19/2013" or "11/20/2013".

Then you can pipe/redirect the output to the file of your choice, and you can programatically set the dates using the script language you have on that server.

LE: forgot to escape the "|"
0
 

Author Comment

by:ibgadmin
ID: 39662460
I was kind of looking at grep, but am no expert with grep and I need to keep everything but the last 3 days in the csv log file dynamically.  So the grep script would need to be something where it's like TODAY-3 only, so if today beeing 11/20/2013, it would need to remove any record LE to 11/17/2013 on the fly.  I won't have an opportunity to fill in the dates as you recommend.  Again, I need it to be fully un-attended/automated through a bat script, etc.
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39663025
OK, I was too lazy to write the whole script. Powershell.

Here it is:
$dte = Get-Date
$today = $dte.Month, $dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$yesterday = $dte.Month, $dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$otherday = $dte.Month, $dte.Day, $dte.Year -join "\/"

&"X:\path\to\sed\sed.exe" -i `""/"$today"\|"$yesterday"\|"$otherday"/!d"`" "Y:\path\to\log\log.csv"

Open in new window

Tested and working with your sample data.

This uses sed -i with the d command, to delete lines from the logfile.
You can get sed from here: http://gnuwin32.sourceforge.net/packages/sed.htm

Replace X:\path\to\sed\sed.exe and Y:\path\to\log\log.csv with the actual paths.
0
 

Author Comment

by:ibgadmin
ID: 39663200
That worked but unfortunately it removed the Header row with the column headings.  I need to keep that intact, so is there a way to NOT remove the header row?
0
 

Author Comment

by:ibgadmin
ID: 39663231
Also, if I understand how this is working, how would one say change this to keep 7 days vs. 3?  I guess I was expecting some kind of variable that contained how many days to exclude where that could be changed on the fly.  That would be preferred and would make it usable for other files with different rules if that's possible.  Sorry if I wasn't clear about that...
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39663246
Yup.

&"X:\path\to\sed\sed.exe" -i `""/PackageReference\|"$today"\|"$yesterday"\|"$otherday"/!d"`" "Y:\path\to\log\log.csv"

Open in new window


As long as there is a column named PackageReference, it will keep the header.
0
 

Author Comment

by:ibgadmin
ID: 39663285
That worked perfectly!  Now as long as it can be changed a bit to accommodate the question to keeping 7 days, etc and making it usable over and over for other files to parse and modify, that would be great!
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39663312
$dte = Get-Date             #current date
$history = 3                # number of days to keep
$searchstring = "PackageReference"   # must be an actual value from the header
$period = @(0) * $history   #array to store formatted dates
for ( $i = 0; $i -lt $history; $i++ ) {
  $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
  $searchstring += "\|" + $period[$i]
  $dte = $dte.AddDays(-1)
}

&"X:\path\to\sed\sed.exe" -i `""/"$searchstring"/!d"`" "X:\path\to\log\log.csv"

Open in new window

You'll need to modify $history and $searchstring to accomodate your needs.

LE: refactored code
0
 
LVL 34

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 39663649
Further modification. No longer matters what the first line contains, it will keep it.

Param([Int32]$history = 3,                 # number of days to keep
      [string]$log = "Y:\path\to\log.csv") # path to log file
$dte = Get-Date             # current date
$searchstring = ""          # 
$period = @(0) * $history   # array to store formatted dates

for ( $i = 0; $i -lt $history; $i++ ) {
  $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
  $searchstring += "\|" + $period[$i]
  $dte = $dte.AddDays(-1)
}
$searchstring = $searchstring.Substring(2) # remove first \| 

&"X:\path\to\sed\sed.exe" -i -e1p -e `""/"$searchstring"/!d"`" $log

Open in new window


Replace path\to\sed, save it as whatever.ps1, then run it as
whatever.ps1 -history 5 -log "Y:\path\to\log.csv"
0
 

Author Comment

by:ibgadmin
ID: 39663767
Thanks so much Dan for your help - this works perfectly.
0

Featured Post

Announcing the Most Valuable Experts of 2016

MVEs are more concerned with the satisfaction of those they help than with the considerable points they can earn. They are the types of people you feel privileged to call colleagues. Join us in honoring this amazing group of Experts.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Citrix XenApp, Internet Explorer 11 set to Enterprise Mode and using central hosted sites.xml file.
Have you tried to learn about Unicode, UTF-8, and multibyte text encoding and all the articles are just too "academic" or too technical? This article aims to make the whole topic easy for just about anyone to understand.
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question