On a daily basis, I want to compare two text files and place the differences between them into a 3rd text file.
The text files are comma delimited extracts from an Oracle DB which contain personnel data. One file will be from the previous day. The 2nd file will be the newest one with data changes.
I want my 3rd file to contain only the changed rows and new rows.
Some things to know about the data:
1. The first column "EMPLOYEE_ID" will never change for existing records. But new rows will be assigned a new EMPLOYEE_ID value.
2. The headers for the data elements will not have double quotes. However the data itself will include double quotes. I need the 1st line of headers included in the 3rd file creation.
3. Yes I know that "EMPL_SUFFIX_CD" is in there twice. I have no control over this. :(
4. Each file, each day, will have headers at the top but the data itself will be sorted differently each day. I have no control over this.
I have powerscript code put together, however the part that compares is not giving me what I expect.
My script does this:
1. Rename the newest file to append current date. For the example data below, I've just named them "yesterdaydata" and "todaydata".
2. sort both text files by the 1st column and keep headers the top. I'm assuming this will make it easier to compare the two files. I can't get this to work.
3a. check each line of file 1 (yesterday) with file 2 (today) and send any EMPLOYEE_ID data that has changed from file 2 (today) to the 3rd file. <- this not working
3b. Also check if there are any new lines of data added (new EMPLOYEE_IDs) to File 2 (today) and put them in the 3rd file as well. <-- not working
Example of contents:
yesterdayfile contents:
EMPLOYEE_ID,EMPL_LAST_NM,EMPL_FIRST_NM,EMPL_MIDDLE_NM,EMPL_SUFFIX_CD,EMAIL_ADDR,HOME_DEPT_CD,DEPT_NM,HOME_UNIT_CD,UNIT_NM,STR_1_NM,STR_2_NM,CITY_NM,ZIP,PH_NO,TITLE_CD,TITL_LONG_DD,EMPL_SUFFIX_CD,INT_RPT_POSN_ID_NO,EMPLOYEE_STATUS
"0000000002","CHAMBERS","DIANE","S","","","400","Office Of The Upity","xxxx","Book Reader","111 Broad St","","NY","55555","5555555555","0000","SR. SUPERVISOR ","","","Inactive"
"0000001012","PETERSON","NORM","P","","","600","Office of Cheers","xxxx","Beer Taster","112 Broad St","","NY","55555","5555555555","0000","JR. ACCOUNTANT ","","","Inactive"
"0000004403","CLAVIN","CLIFF","Q","","","500","Office Of The Mail","xxxx","Stamp Inspector","39 Broad St","","NY","55555","5555555555","0000","SR. INSPECTOR ","","","Inactive"
todayfile contents:
EMPLOYEE_ID,EMPL_LAST_NM,EMPL_FIRST_NM,EMPL_MIDDLE_NM,EMPL_SUFFIX_CD,EMAIL_ADDR,HOME_DEPT_CD,DEPT_NM,HOME_UNIT_CD,UNIT_NM,STR_1_NM,STR_2_NM,CITY_NM,ZIP,PH_NO,TITLE_CD,TITL_LONG_DD,EMPL_SUFFIX_CD,INT_RPT_POSN_ID_NO,EMPLOYEE_STATUS
"0000000002","CHAMBERS","DIANE","S","","","400","Office Of The Upity","xxxx","Book Reader","111 Broad St","","NY","55555","5555555555","0000","SR. SUPERVISOR ","","","Active"
"0000001012","PETERSON","NORM","P","","","600","Office of Cheers","xxxx","Beer Taster","112 Broad St","","NY","55555","5555555555","0000","JR. ACCOUNTANT ","","","Inactive"
"0000004403","CLAVICAL","CLIFF","Q","","","500","Office Of The Mail","xxxx","Stamp Inspector","39 Broad St","","NY","55555","5555555555","0000","SR. INSPECTOR ","","","Inactive"
"0000009901","LEBECK","CARLA","B","","","900","Office Of The Earings","xxxx","Earring Tester","14 Broad St","","NY","55555","5555555555","0000","SR. FUNKY DRESSER ","","","Inactive"
The differences between the files are:
CLAVIN changed to CLAVICAL for Employee_id 0000004403
New employee LEBECK
Employee "0000000002" changed to "Active"
What I want to show in the 3rd file is the following (not in any particular order but need headers at the top):
EMPLOYEE_ID,EMPL_LAST_NM,EMPL_FIRST_NM,EMPL_MIDDLE_NM,EMPL_SUFFIX_CD,EMAIL_ADDR,HOME_DEPT_CD,DEPT_NM,HOME_UNIT_CD,UNIT_NM,STR_1_NM,STR_2_NM,CITY_NM,ZIP,PH_NO,TITLE_CD,TITL_LONG_DD,EMPL_SUFFIX_CD,INT_RPT_POSN_ID_NO,EMPLOYEE_STATUS
"0000000002","CHAMBERS","DIANE","S","","","400","Office Of The Upity","xxxx","Book Reader","111 Broad St","","NY","55555","5555555555","0000","SR. SUPERVISOR ","","","Active"
"0000004403","CLAVICAL","CLIFF","Q","","","500","Office Of The Mail","xxxx","Stamp Inspector","39 Broad St","","NY","55555","5555555555","0000","SR. INSPECTOR ","","","Inactive"
"0000009901","LEBECK","CARLA","B","","","900","Office Of The Earings","xxxx","Earring Tester","14 Broad St","","NY","55555","5555555555","0000","SR. FUNKY DRESSER ","","","Inactive"
Here's the code I have so far:
$workingDir = "C:\PS_Scripts\"
# #################################################
# Obtain file name from yesterday employee file.
# #################################################
$YesterdayDate = Get-Date ((Get-Date).AddDays(-1)) -Format yyyyMMdd
$YesterdayFile = $workingDir + (Get-ChildItem C:\PS_Scripts\ -Filter "employees-$YesterdayDate*")
"Yesterday file is: $YesterdayFile"
# #################################################
# Rename todays employee file to append todays date
# #################################################
$fileName = "C:\PS_Scripts\employees.txt"
# Check the file exists
if (-not(Test-Path $fileName)) {break}
# Display the original name
"Original filename: $fileName"
$fileObj = get-item $fileName
# Get the date
$DateStamp = get-date -uformat "%Y%m%d"
$extOnly = $fileObj.extension
if ($extOnly.length -eq 0) {
$nameOnly = $fileObj.Name
rename-item "$fileObj" "$nameOnly-$DateStamp"
}
else {
$nameOnly = $fileObj.Name.Replace( $fileObj.Extension,'')
rename-item "$fileName" "$nameOnly-$DateStamp$extOnly"
}
$TodayFile = $workingDir + (Get-ChildItem C:\PS_Scripts\ -Filter "employees-$DateStamp*")
"TodayFile is: $TodayFile"
# ##########################################################
# Sort the two files (or just the new one assuming we
# did the other one yesterday)
# ##########################################################
(Nothing I've tried so far for sorting has worked so I have no code to show).
(For some attempts I get an error that a column exists twice in the data but I have no control over this)
# ##########################################################
# Compare yesterday employee list with todays employee list.
# Place differences into new employee text file
#
# (This is not working either even after manually sorting the files content)
# ##########################################################
get-content $TodayFile |
where {$YesterdayFile -notcontains $_} |
set-content C:\PS_Scripts\diff.txt
# place first line (headers) into final file
get-content $TodayFile -First 1 > C:\PS_Scripts\employees_final.txt
get-content C:\PS_Scripts\diff.txt >> C:\PS_Scripts\employees_final.txt