?
Solved

Remove rows from csv file based on date column

Posted on 2013-11-19
13
Medium Priority
?
542 Views
Last Modified: 2013-11-20
I have a csv file with columns below.  The second column is the only field that is an actual date.  What we would like to do is remove all rows from the csv file where the date in that column is < 3 days from today whenever run.  Meaning, delete all records where the date is 3 days or older then today.  Don't have any special preference, but figured some type of bat file utilizing DOS commands, or VB script, powershell, etc.  We've used things like 'FINDSTR' etc in the past for certain strings, but not a date so not sure where to start.  Any help would greatly be appreciated.  
Columns:

PackageReference1,ShipmentInformationCollectiondate,ShipmentInformationActualWeight,ShipmentInformationLeadTrackingNumber,ShipmentInformationServiceType,ThirdPartyUPSAccountNumber
"15231622","11/6/2013","1.00","581089305388","3","444444444"
"15242251","11/6/2013","1.00","581089305399","92","555555555"
"15282587","11/16/2013","1.00","581089305403","92","555555555"
"15278493","11/18/2013","1.00","581089305414","92","444444444"
0
Comment
Question by:ibgadmin
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 6
13 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39660735
Why not simply import the csv into Excel and filter the date?

You have attached a test Excel file, using your sample.
test.xlsx
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39660803
I'm looking for something to run fully automated via code within a bat file, etc.  The file is pretty much like a log file I receive from a vendor so I don't extract or have control over writing the file nor do we run Excel on the on windows server where I need it to run unattended.
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39660810
Also, the csv file needs to remain intact as a csv file once records are removed.  Again think of it more of as log file that you want to remove older records from but keep everything else as is.
0
Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39661503
Looks like the case for grep.
If you're on Linux you probably already have it installed, if on Windows get it from here: http://gnuwin32.sourceforge.net/packages/grep.htm

The command would be something like this:
grep '11/18/2013\|11/19/2013\|11/20/2013' log-file.csv

That basically says: parse log-file and return only the lines that contain "11/18/2013", "11/19/2013" or "11/20/2013".

Then you can pipe/redirect the output to the file of your choice, and you can programatically set the dates using the script language you have on that server.

LE: forgot to escape the "|"
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39662460
I was kind of looking at grep, but am no expert with grep and I need to keep everything but the last 3 days in the csv log file dynamically.  So the grep script would need to be something where it's like TODAY-3 only, so if today beeing 11/20/2013, it would need to remove any record LE to 11/17/2013 on the fly.  I won't have an opportunity to fill in the dates as you recommend.  Again, I need it to be fully un-attended/automated through a bat script, etc.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39663025
OK, I was too lazy to write the whole script. Powershell.

Here it is:
$dte = Get-Date
$today = $dte.Month, $dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$yesterday = $dte.Month, $dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$otherday = $dte.Month, $dte.Day, $dte.Year -join "\/"

&"X:\path\to\sed\sed.exe" -i `""/"$today"\|"$yesterday"\|"$otherday"/!d"`" "Y:\path\to\log\log.csv"

Open in new window

Tested and working with your sample data.

This uses sed -i with the d command, to delete lines from the logfile.
You can get sed from here: http://gnuwin32.sourceforge.net/packages/sed.htm

Replace X:\path\to\sed\sed.exe and Y:\path\to\log\log.csv with the actual paths.
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39663200
That worked but unfortunately it removed the Header row with the column headings.  I need to keep that intact, so is there a way to NOT remove the header row?
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39663231
Also, if I understand how this is working, how would one say change this to keep 7 days vs. 3?  I guess I was expecting some kind of variable that contained how many days to exclude where that could be changed on the fly.  That would be preferred and would make it usable for other files with different rules if that's possible.  Sorry if I wasn't clear about that...
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39663246
Yup.

&"X:\path\to\sed\sed.exe" -i `""/PackageReference\|"$today"\|"$yesterday"\|"$otherday"/!d"`" "Y:\path\to\log\log.csv"

Open in new window


As long as there is a column named PackageReference, it will keep the header.
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39663285
That worked perfectly!  Now as long as it can be changed a bit to accommodate the question to keeping 7 days, etc and making it usable over and over for other files to parse and modify, that would be great!
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39663312
$dte = Get-Date             #current date
$history = 3                # number of days to keep
$searchstring = "PackageReference"   # must be an actual value from the header
$period = @(0) * $history   #array to store formatted dates
for ( $i = 0; $i -lt $history; $i++ ) {
  $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
  $searchstring += "\|" + $period[$i]
  $dte = $dte.AddDays(-1)
}

&"X:\path\to\sed\sed.exe" -i `""/"$searchstring"/!d"`" "X:\path\to\log\log.csv"

Open in new window

You'll need to modify $history and $searchstring to accomodate your needs.

LE: refactored code
0
 
LVL 35

Accepted Solution

by:
Dan Craciun earned 2000 total points
ID: 39663649
Further modification. No longer matters what the first line contains, it will keep it.

Param([Int32]$history = 3,                 # number of days to keep
      [string]$log = "Y:\path\to\log.csv") # path to log file
$dte = Get-Date             # current date
$searchstring = ""          # 
$period = @(0) * $history   # array to store formatted dates

for ( $i = 0; $i -lt $history; $i++ ) {
  $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
  $searchstring += "\|" + $period[$i]
  $dte = $dte.AddDays(-1)
}
$searchstring = $searchstring.Substring(2) # remove first \| 

&"X:\path\to\sed\sed.exe" -i -e1p -e `""/"$searchstring"/!d"`" $log

Open in new window


Replace path\to\sed, save it as whatever.ps1, then run it as
whatever.ps1 -history 5 -log "Y:\path\to\log.csv"
0
 
LVL 1

Author Comment

by:ibgadmin
ID: 39663767
Thanks so much Dan for your help - this works perfectly.
0

Featured Post

Get real performance insights from real users

Key features:
- Total Pages Views and Load times
- Top Pages Viewed and Load Times
- Real Time Site Page Build Performance
- Users’ Browser and Platform Performance
- Geographic User Breakdown
- And more

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Whether you've completed a degree in computer sciences or you're a self-taught programmer, writing your first lines of code in the real world is always a challenge. Here are some of the most common pitfalls for new programmers.
Computer science students often experience many of the same frustrations when going through their engineering courses. This article presents seven tips I found useful when completing a bachelors and masters degree in computing which I believe may he…
In this video, we discuss why the need for additional vertical screen space has become more important in recent years, namely, due to the transition in the marketplace of 4x3 computer screens to 16x9 and 16x10 screens (so-called widescreen format). …
Simple Linear Regression
Suggested Courses

800 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question