  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 548

Remove rows from csv file based on date column

I have a CSV file with the columns below. The second column is the only field that holds an actual date. Each time the script runs, we would like to remove every row whose date in that column is 3 or more days older than today, keeping only the most recent rows. I have no particular preference for the tool: a bat file using DOS commands, VBScript, PowerShell, etc. would all be fine. We have used things like FINDSTR in the past to match specific strings, but never a date, so I am not sure where to start. Any help would be greatly appreciated.
Columns:

PackageReference1,ShipmentInformationCollectiondate,ShipmentInformationActualWeight,ShipmentInformationLeadTrackingNumber,ShipmentInformationServiceType,ThirdPartyUPSAccountNumber
"15231622","11/6/2013","1.00","581089305388","3","444444444"
"15242251","11/6/2013","1.00","581089305399","92","555555555"
"15282587","11/16/2013","1.00","581089305403","92","555555555"
"15278493","11/18/2013","1.00","581089305414","92","444444444"
Asked by: ibgadmin
1 Solution
 
Dan Craciun (IT Consultant) commented:
Why not simply import the CSV into Excel and filter on the date?

I have attached a test Excel file built from your sample.
test.xlsx

ibgadmin (Author) commented:
I'm looking for something that runs fully automated from code, within a bat file or similar. The file is essentially a log file I receive from a vendor, so I neither extract it nor control how it is written, and we do not run Excel on the Windows server where this needs to run unattended.

ibgadmin (Author) commented:
Also, the file needs to remain a valid CSV file once the records are removed. Again, think of it as a log file from which you want to remove older records while keeping everything else as is.
 
Dan Craciun (IT Consultant) commented:
This looks like a case for grep.
If you're on Linux you probably already have it installed; on Windows, get it from here: http://gnuwin32.sourceforge.net/packages/grep.htm

The command would be something like this:
grep '11/18/2013\|11/19/2013\|11/20/2013' log-file.csv

That basically says: parse log-file and return only the lines that contain "11/18/2013", "11/19/2013" or "11/20/2013".

You can then redirect the output to a file of your choice, and set the dates programmatically using whatever scripting language is available on that server.

Later edit: forgot to escape the "|".
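
As a rough illustration of setting the dates programmatically, the sketch below builds the same kind of alternation for the last N days in PowerShell. The function name Get-RecentDatePattern and its parameters are hypothetical, and it assumes the M/d/yyyy format (no leading zeros) seen in the sample rows. It produces a .NET-style pattern with a plain "|", so it suits Select-String rather than the escaped "\|" form grep needs above.

```powershell
# Hypothetical helper (not from the thread): returns a regex alternation of
# the last $Days dates, newest first, formatted as M/d/yyyy.
function Get-RecentDatePattern {
    param(
        [int]$Days = 3,
        [datetime]$From = (Get-Date)
    )
    $dates = for ($i = 0; $i -lt $Days; $i++) {
        # InvariantCulture keeps "/" as the separator regardless of locale;
        # [regex]::Escape makes sure the date is matched literally.
        [regex]::Escape($From.AddDays(-$i).ToString('M/d/yyyy', [cultureinfo]::InvariantCulture))
    }
    $dates -join '|'
}

# Possible usage (assumes log-file.csv exists in the current directory):
#   $pattern = Get-RecentDatePattern -Days 3
#   Select-String -Path 'log-file.csv' -Pattern $pattern | ForEach-Object { $_.Line }
```

Like the grep command, this only returns the matching rows, so the header line would still be dropped.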

ibgadmin (Author) commented:
I had been looking at grep, but I am no expert with it, and I need to keep only the last 3 days in the CSV log file, determined dynamically. The script would need to work relative to TODAY-3: if today is 11/20/2013, it must remove every record dated on or before 11/17/2013 on the fly. I won't have an opportunity to fill in the dates by hand as you suggest. Again, it needs to run fully unattended from a bat script or similar.

Dan Craciun (IT Consultant) commented:
OK, I was too lazy to write the whole script earlier. Here it is, in PowerShell:

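$dte = Get-Date                                           # today
# "/" is escaped as "\/" because "/" is also the sed pattern delimiter
$today = $dte.Month, $dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$yesterday = $dte.Month, $dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$otherday = $dte.Month, $dte.Day, $dte.Year -join "\/"    # two days ago

# Edit the file in place (-i), deleting every line that does NOT match one of the three dates (!d)
&"X:\path\to\sed\sed.exe" -i `""/"$today"\|"$yesterday"\|"$otherday"/!d"`" "Y:\path\to\log\log.csv"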


Tested and working with your sample data.

This uses sed -i with the d command, to delete lines from the logfile.
You can get sed from here: http://gnuwin32.sourceforge.net/packages/sed.htm

Replace X:\path\to\sed\sed.exe and Y:\path\to\log\log.csv with the actual paths.
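
One caveat that may be worth checking before relying on this: the pattern is matched anywhere in the line, so a short date can also match inside a longer one. The sketch below demonstrates the effect with PowerShell's own -match operator on a row from the sample; the same substring behaviour applies to an unanchored sed pattern. Including the surrounding double quotes in the pattern is one possible way to pin the match to the whole field, but passing literal quotes from PowerShell through to sed.exe needs extra escaping that has not been tested here.

```powershell
# Sample row copied from the question.
$row = '"15231622","11/6/2013","1.00","581089305388","3","444444444"'

# A bare date matches anywhere in the line, so 1/6/2013 also hits 11/6/2013.
$loose  = '1/6/2013'
# Including the field's surrounding double quotes ties the match to the whole value.
$strict = '"1/6/2013"'

$looseHit  = $row -match $loose
$strictHit = $row -match $strict
"loose: $looseHit, strict: $strictHit"   # prints: loose: True, strict: False
```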

ibgadmin (Author) commented:
That worked, but unfortunately it also removed the header row with the column headings. I need to keep that intact, so is there a way to NOT remove the header row?

ibgadmin (Author) commented:
Also, if I understand how this works, how would one change it to keep 7 days instead of 3? I was expecting some kind of variable holding the number of days to keep, which could be changed on the fly. That would be preferred, and it would make the script usable for other files with different rules, if that's possible. Sorry if I wasn't clear about that...

Dan Craciun (IT Consultant) commented:
Yup.

&"X:\path\to\sed\sed.exe" -i `""/PackageReference\|"$today"\|"$yesterday"\|"$otherday"/!d"`" "Y:\path\to\log\log.csv"


As long as the header row contains the text PackageReference, it will be kept.
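
If the header text differs between files, a possible alternative is to keep the first line by position instead of by content. The following is a minimal pure-PowerShell sketch of that idea, not the sed command used above; the function name Select-HeaderAndMatches is hypothetical, and it assumes the first line of the input is always the header.

```powershell
# Hypothetical helper (not from the thread): emits the first line untouched,
# then only those remaining lines that match $Pattern (a regular expression).
function Select-HeaderAndMatches {
    param(
        [string[]]$Lines,
        [string]$Pattern
    )
    if ($Lines.Count -eq 0) { return }   # nothing to do for empty input
    $Lines[0]                            # header is kept whatever it contains
    $Lines | Select-Object -Skip 1 | Where-Object { $_ -match $Pattern }
}

# Possible usage (path is a placeholder):
#   Select-HeaderAndMatches -Lines (Get-Content 'Y:\path\to\log\log.csv') -Pattern '11/18/2013|11/19/2013'
```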

ibgadmin (Author) commented:
That worked perfectly! Now, if it can be changed to accommodate the earlier question about keeping 7 days, and made reusable for parsing and modifying other files, that would be great!

Dan Craciun (IT Consultant) commented:
$dte = Get-Date             #current date
$history = 3                # number of days to keep
$searchstring = "PackageReference"   # must be an actual value from the header
$period = @(0) * $history   #array to store formatted dates
for ( $i = 0; $i -lt $history; $i++ ) {
  $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
  $searchstring += "\|" + $period[$i]
  $dte = $dte.AddDays(-1)
}

&"X:\path\to\sed\sed.exe" -i `""/"$searchstring"/!d"`" "X:\path\to\log\log.csv"


You'll need to modify $history and $searchstring to accommodate your needs.

Later edit: refactored code.
 
Dan Craciun (IT Consultant) commented:
Further modification: it no longer matters what the first line contains; the script always keeps it.

Param([Int32]$history = 3,                 # number of days to keep
      [string]$log = "Y:\path\to\log.csv") # path to log file
$dte = Get-Date             # current date
$searchstring = ""          # sed alternation of dates, built in the loop below
$period = @(0) * $history   # array to store formatted dates

for ( $i = 0; $i -lt $history; $i++ ) {
  $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
  $searchstring += "\|" + $period[$i]
  $dte = $dte.AddDays(-1)
}
$searchstring = $searchstring.Substring(2) # remove first \| 

&"X:\path\to\sed\sed.exe" -i -e1p -e `""/"$searchstring"/!d"`" $log


Replace path\to\sed with the actual path, save the script as whatever.ps1, then run it from its folder as:
.\whatever.ps1 -history 5 -log "Y:\path\to\log.csv"
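
For servers where installing sed is not an option, the same result can probably be reached in PowerShell alone by parsing the date column and comparing it with a cutoff, rather than matching date strings. This is a swapped-in technique, not the script above. The sketch assumes M/d/yyyy dates in the second column, a header on the first line, and no commas inside the first field, as in the sample data; the function name Select-RecentRows is hypothetical, and a row with an unparsable date will raise an error rather than be silently dropped.

```powershell
# Hypothetical helper (not from the thread): keeps the header plus every row
# whose second column is newer than ($Today - $Days).
function Select-RecentRows {
    param(
        [string[]]$Lines,
        [int]$Days = 3,
        [datetime]$Today = (Get-Date).Date
    )
    $cutoff  = $Today.Date.AddDays(-$Days)     # rows dated on or before this are removed
    $culture = [cultureinfo]::InvariantCulture
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        if ($i -eq 0) { $Lines[0]; continue }                        # header is always kept
        if ([string]::IsNullOrWhiteSpace($Lines[$i])) { continue }   # skip blank lines
        # Second field with its surrounding double quotes stripped, e.g. 11/18/2013
        $field = $Lines[$i].Split(',')[1].Trim('"')
        $date  = [datetime]::ParseExact($field, 'M/d/yyyy', $culture)
        if ($date -gt $cutoff) { $Lines[$i] }                        # original line, unchanged
    }
}

# Possible usage (path is a placeholder). Read everything first, then overwrite:
#   $path = 'Y:\path\to\log.csv'
#   $kept = Select-RecentRows -Lines (Get-Content $path) -Days 7
#   $kept | Set-Content $path
```

With today being 11/20/2013 and -Days 3, this removes rows dated 11/17/2013 or earlier, which matches the rule described earlier in the thread. Because the surviving lines are written back verbatim, the quoting of the original file is preserved; it is worth checking that the encoding Set-Content writes matches what downstream tools expect.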

ibgadmin (Author) commented:
Thanks so much for your help, Dan. This works perfectly.