Solved

# Remove rows from csv file based on date column

Posted on 2013-11-19
I have a csv file with the columns below. The second column is the only field that is an actual date. We would like to remove all rows from the csv file where the date in that column is 3 or more days older than today, whenever the job is run. We don't have any special preference for the tool, but figured some type of bat file utilizing DOS commands, or VB script, PowerShell, etc. We've used things like 'FINDSTR' in the past for certain strings, but not for a date, so I'm not sure where to start. Any help would be greatly appreciated.
Columns:

"15231622","11/6/2013","1.00","581089305388","3","444444444"
"15242251","11/6/2013","1.00","581089305399","92","555555555"
"15282587","11/16/2013","1.00","581089305403","92","555555555"
"15278493","11/18/2013","1.00","581089305414","92","444444444"

Expert Comment

ID: 39660735
Why not simply import the csv into Excel and filter the date?

I have attached a test Excel file, using your sample.
test.xlsx

Author Comment

ID: 39660803
I'm looking for something to run fully automated via code within a bat file, etc. The file is pretty much like a log file I receive from a vendor, so I don't extract it or have control over how it is written, nor do we run Excel on the Windows server where it needs to run unattended.

Author Comment

ID: 39660810
Also, the csv file needs to remain intact as a csv file once records are removed. Again, think of it more as a log file that you want to remove older records from while keeping everything else as is.

Expert Comment

ID: 39661503
Looks like the case for grep.
If you're on Linux you probably already have it installed; on Windows, get it from here: http://gnuwin32.sourceforge.net/packages/grep.htm

The command would be something like this:
grep '11/18/2013\|11/19/2013\|11/20/2013' log-file.csv

That basically says: parse log-file and return only the lines that contain "11/18/2013", "11/19/2013" or "11/20/2013".

Then you can pipe/redirect the output to the file of your choice, and you can programmatically set the dates using the script language you have on that server.

LE: forgot to escape the "|"
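
The dates in that pattern do not have to be typed by hand. As a rough sketch only (written in Python, which is an assumption, since the thread does not say what is installed on the server), the alternation for the last N days could be generated like this. `date_alternation` is a made-up helper name, and the M/D/YYYY format without leading zeros is taken from the sample rows.

```python
from datetime import date, timedelta


def date_alternation(today: date, days: int) -> str:
    """Return a grep-style alternation covering the last `days` dates.

    Dates are written as M/D/YYYY without leading zeros, as in the sample
    rows, and joined with the escaped "|" used in the grep command above.
    """
    wanted = [today - timedelta(days=offset) for offset in range(days)]
    # Oldest date first, so the result reads like the hand-written pattern.
    return r"\|".join(f"{d.month}/{d.day}/{d.year}" for d in reversed(wanted))
```

Calling `date_alternation(date.today(), 3)` gives a string that can be passed to grep in place of the hard-coded dates.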

Author Comment

ID: 39662460
I was kind of looking at grep, but I'm no expert with it, and I need to remove everything but the last 3 days from the csv log file dynamically. So the grep script would need to work like TODAY-3: if today is 11/20/2013, it would need to remove any record dated on or before 11/17/2013 on the fly. I won't have an opportunity to fill in the dates as you recommend. Again, I need it to be fully unattended/automated through a bat script, etc.
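
The TODAY-3 rule can also be expressed as a real date comparison rather than a text match. The following is a minimal sketch under assumptions: Python is used for illustration, `is_recent` is a hypothetical name, and the column is assumed to always hold M/D/YYYY dates as in the sample.

```python
from datetime import date, datetime, timedelta


def is_recent(date_text: str, today: date, days: int = 3) -> bool:
    """True if date_text (M/D/YYYY) is newer than `today` minus `days`.

    With days=3 and today=11/20/2013 this keeps 11/18 through 11/20 and
    rejects 11/17 and anything older.
    """
    row_date = datetime.strptime(date_text, "%m/%d/%Y").date()
    cutoff = today - timedelta(days=days)  # newest date that gets removed
    return row_date > cutoff
```

Unlike a list of date strings, a comparison like this also keeps rows dated in the future, which may or may not be what a given log needs.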

Expert Comment

ID: 39663025
OK, I was too lazy to write the whole script. Powershell.

Here it is:
$dte = Get-Date
$today = $dte.Month,$dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$yesterday = $dte.Month,$dte.Day, $dte.Year -join "\/"
$dte = $dte.AddDays(-1)
$otherday = $dte.Month,$dte.Day, $dte.Year -join "\/"

&"X:\path\to\sed\sed.exe" -i ""/"$today"\|"$yesterday"\|"$otherday"/!d"" "Y:\path\to\log\log.csv"

Tested and working with your sample data.

This uses sed -i with the d command, to delete lines from the logfile.
You can get sed from here: http://gnuwin32.sourceforge.net/packages/sed.htm

Replace X:\path\to\sed\sed.exe and Y:\path\to\log\log.csv with the actual paths.
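
For readers unfamiliar with sed, the `/pattern/!d` expression deletes every line that does not match the pattern. Here is a small Python analogue of that behaviour, offered as an illustration and not as a replacement for the tested command; `keep_matching` and its `quoted` option are hypothetical names. It also shows that the match is a plain substring match, so a date such as 1/6/2013 is found inside 11/6/2013 unless the surrounding quotes are included.

```python
import re


def keep_matching(lines, dates, quoted=False):
    """Keep only the lines containing one of `dates` (substring match).

    With quoted=True each date is wrapped in double quotes, so it can only
    match a whole quoted CSV field.
    """
    wrap = '"{}"' if quoted else "{}"
    pattern = re.compile("|".join(re.escape(wrap.format(d)) for d in dates))
    return [line for line in lines if pattern.search(line)]


rows = [
    '"15231622","11/6/2013","1.00","581089305388","3","444444444"',
    '"15278493","11/18/2013","1.00","581089305414","92","444444444"',
]

# Unquoted: "1/6/2013" is a substring of "11/6/2013", so the first row is kept.
print(keep_matching(rows, ["1/6/2013"]))
# Quoted: the opening quote ties the match to the start of the field.
print(keep_matching(rows, ["1/6/2013"], quoted=True))
```

In a log that only ever contains past dates, the longer date in such a collision always lies later in the same year, so this mostly matters if future-dated rows can appear.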

Author Comment

ID: 39663200
That worked but unfortunately it removed the Header row with the column headings.  I need to keep that intact, so is there a way to NOT remove the header row?
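
One general way to protect a header row is to pass the first line through unconditionally and apply the date test only to the lines after it. The sketch below assumes Python for illustration; `filter_keep_header` is a made-up helper, and the `keep` test is supplied by the caller.

```python
from typing import Callable, Iterable, Iterator


def filter_keep_header(
    lines: Iterable[str],
    keep: Callable[[str], bool],
    header_lines: int = 1,
) -> Iterator[str]:
    """Yield the first `header_lines` lines untouched, then only the lines
    for which keep(line) is true."""
    for index, line in enumerate(lines):
        if index < header_lines or keep(line):
            yield line
```

Because the header is identified by position and not by content, this does not depend on any particular column name being present.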

Author Comment

ID: 39663231
Also, if I understand how this is working, how would one say change this to keep 7 days vs. 3?  I guess I was expecting some kind of variable that contained how many days to exclude where that could be changed on the fly.  That would be preferred and would make it usable for other files with different rules if that's possible.  Sorry if I wasn't clear about that...

Expert Comment

ID: 39663246
Yup.

&"X:\path\to\sed\sed.exe" -i ""/PackageReference\|"$today"\|"$yesterday"\|"$otherday"/!d"" "Y:\path\to\log\log.csv"

As long as there is a column named PackageReference, it will keep the header.

Author Comment

ID: 39663285
That worked perfectly! Now as long as it can be changed a bit to accommodate the question to keeping 7 days, etc and making it usable over and over for other files to parse and modify, that would be great!

Expert Comment

ID: 39663312
$dte = Get-Date                       # current date
$history = 3                          # number of days to keep
$searchstring = "PackageReference"    # must be an actual value from the header
$period = @(0) * $history             # array to store formatted dates
for ( $i = 0; $i -lt $history; $i++ ) {
    $period[$i] = $dte.Month,$dte.Day, $dte.Year -join "\/"
    $searchstring += "\|" + $period[$i]
    $dte = $dte.AddDays(-1)
}

&"X:\path\to\sed\sed.exe" -i ""/"$searchstring"/!d"" "X:\path\to\log\log.csv"

You'll need to modify $history and $searchstring to accommodate your needs.

LE: refactored code

Accepted Solution

Dan Craciun earned 500 total points

ID: 39663649
Further modification. No longer matters what the first line contains, it will keep it.

Param([Int32]$history = 3,                  # number of days to keep
      [string]$log = "Y:\path\to\log.csv")  # path to log file

$dte = Get-Date                             # current date
$searchstring = ""
$period = @(0) * $history                   # array to store formatted dates
for ( $i = 0; $i -lt $history; $i++ ) {
    $period[$i] = $dte.Month, $dte.Day, $dte.Year -join "\/"
    $searchstring += "\|" + $period[$i]
    $dte = $dte.AddDays(-1)
}
$searchstring = $searchstring.Substring(2)  # remove first \|

&"X:\path\to\sed\sed.exe" -i -e1p -e ""/"$searchstring"/!d"" $log


Replace path\to\sed, save it as whatever.ps1, then run it as
whatever.ps1 -history 5 -log "Y:\path\to\log.csv"
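
For servers where installing sed is not an option, the same idea (keep the first line, keep rows from the last N days, rewrite the file in place) can be sketched without external tools. This is an illustrative Python version under stated assumptions, not the accepted solution: `prune_log`, `is_old` and `main` are made-up names, the date is assumed to sit in the second column as M/D/YYYY, and every record is assumed to occupy one line.

```python
import argparse
import csv
import os
import tempfile
from datetime import date, datetime, timedelta


def is_old(line, cutoff, date_column):
    """True if the line's date field parses and is on or before `cutoff`."""
    try:
        fields = next(csv.reader([line]))
        row_date = datetime.strptime(fields[date_column], "%m/%d/%Y").date()
    except (StopIteration, IndexError, ValueError):
        return False  # lines that cannot be parsed are kept, not deleted
    return row_date <= cutoff


def prune_log(path, history=3, today=None, date_column=1, header_lines=1):
    """Rewrite `path` in place, keeping the header and the last `history`
    days of rows. Returns the number of rows removed."""
    today = today or date.today()
    cutoff = today - timedelta(days=history)
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    # newline="" leaves the original line endings untouched.
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        for index, line in enumerate(src):
            if index >= header_lines and is_old(line, cutoff, date_column):
                removed += 1
                continue
            dst.write(line)  # kept lines are copied through unchanged
    os.replace(dst.name, path)  # swap the pruned copy in for the original
    return removed


def main(argv=None):
    parser = argparse.ArgumentParser(description="Drop old rows from a CSV log.")
    parser.add_argument("--history", type=int, default=3, help="days to keep")
    parser.add_argument("--log", required=True, help="path to the CSV file")
    args = parser.parse_args(argv)
    return prune_log(args.log, args.history)
```

Kept lines are copied through as raw text, so quoting stays exactly as the vendor wrote it. Calling `main()` from the usual `if __name__ == "__main__":` guard would give a command-line script with `--history` and `--log` options that mirror the `-history` and `-log` parameters above.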

Author Comment

ID: 39663767
Thanks so much Dan for your help - this works perfectly.