additional requirement for unzipping files.
I got an additional requirement from the team for the script you shared earlier.
Can we integrate the following requirement into that script?
Requirement:
The date-time stamp of the zip file, the date-time stamps of all data files (.dsv), and the date-time stamp of the control file should match, or else an error should be thrown. This check should be done after the unzip process.
Here are the files,
this is the zip file, CMMDC_PMM_DWExtract_ALL_031522_04_30_40.zip
Inside the zip file, we have 10 files, 1 control file(.dsv) and 9 data files (.dsv)
data files :
CMMDC_PMM_DWExtract_ALL_CLAIMS_031522_04_30_40.dsv
CMMDC_PMM_DWExtract_ALL_VISIT_031522_04_30_43.dsv
CMMDC_PMM_DWExtract_ALL_PROVIDER_031522_04_30_40.dsv etc.
Control file :
CMMDC_PMM_DWExtract_ALL_Control_031522_04_30_40.dsv
We need to check whether all of these date-time stamps in the file names match; if not, write an error to the error log. If the control file doesn't exist, throw an error as well.
Sorry for the additional requirement... it's crazy here -_-
Here's the earlier script, to be clear; I need an additional modification to this code.
ASKER
Thanks for the response.
This isn't showing any output?
Will the script work for both the date and the time stamp? It should match both.
I tried inserting an @pause as the second line and changing to -Skip 3; it just asks me to press Enter.
@PowerShell.exe -Command "Invoke-Expression -Command ((Get-Content -Path '%~f0' | Select-Object -Skip 2) -join [environment]::NewLine)"
@exit /b %Errorlevel%
$inFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST'
$outFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND'
$errorLog = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\CMMDC_PMM_DWExtract_ALL_Expand.csv'

$filePattern = '*.zip'
$timestampPattern = '_(?<Timestamp>\d{6}_\d\d_\d\d_\d\d)$'

$zipItems = Get-ChildItem -LiteralPath $inFolder -Filter $filePattern -File |
    Where-Object {$_.BaseName -match $timestampPattern} |
    Select-Object -Property Name, FullName, @{n='Timestamp'; e={[DateTime]::ParseExact($Matches['Timestamp'], 'MMddyy_HH_mm_ss', $null)}}, LastWriteTime

$errors = $zipItems |
    Group-Object -Property {$_.Timestamp.Date} |
    Where-Object {$_.Count -gt 1} |
    Select-Object -ExpandProperty Group |
    Select-Object -Property *, @{n='Error'; e={'Duplicate Date'}}

If (-not $errors) {
    Try {
        $zipItem = $zipItems |
            Sort-Object -Property Timestamp |
            Select-Object -First 1

        Expand-Archive -Path $zipItem.FullName -DestinationPath $outFolder -Force -Verbose -ErrorAction Stop

        $errors = Get-ChildItem -LiteralPath $outFolder -Filter '*.dsv' -File |
            Where-Object {$_.BaseName -match $timestampPattern} |
            Select-Object -Property Name, FullName, @{n='Date'; e={[DateTime]::ParseExact($Matches['Date'], 'MMddyy_HH_mm_ss', $null)}}, LastWriteTime |
            Where-Object {$_.Timestamp -ne $zipItem.Timestamp} |
            Select-Object -Property *, @{n='Error'; e={'Incorrect Date'}}
    } Catch {
        $message = $_.Exception.Message
        $errors = $zipItem |
            Select-Object -Property *, @{n='Error'; e={$message}}
    }
}
If ($errors) {
    $errors | Export-Csv -NoTypeInformation -Path $errorLog -ErrorAction Stop
    Write-Warning 'Found errors:'
    $errors | Format-Table -AutoSize | Out-String | Write-Warning
}
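On the question of whether both the date and the time are compared, here is a minimal sketch using the sample stamps from the question. It shows that the 'MMddyy_HH_mm_ss' format parses to a full DateTime, so a comparison with -ne covers date and time, while .Date (as used for the duplicate check) drops the time part. Passing InvariantCulture is an assumption made for this sketch; the script itself passes $null.

```powershell
$format  = 'MMddyy_HH_mm_ss'
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# Sample stamps taken from the file names in the question.
$zipStamp   = [DateTime]::ParseExact('031522_04_30_40', $format, $culture)
$visitStamp = [DateTime]::ParseExact('031522_04_30_43', $format, $culture)

# Full DateTime comparison: the stamps differ by three seconds, so this is a mismatch.
$isMismatch = $visitStamp -ne $zipStamp

# Date-only comparison, as in the duplicate check: both fall on the same calendar day.
$isSameDay = $visitStamp.Date -eq $zipStamp.Date

"Mismatch: $isMismatch; same day: $isSameDay"
```

With these inputs, the VISIT file would be reported as a mismatch, while two zip files carrying these stamps would count as duplicates for the same date.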
ASKER
I tested the above script. Although the date and time stamps are correct, it still reports the error "Incorrect Date":
"CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"
Also, I think it's not comparing the zip file time stamp, the control file time stamp, and the data files' time stamps with each other.
The date-time stamps of all three kinds of files should match.
CMMDC_PMM_DWExtract_ALL_031522_04_30_40.zip
data files :
CMMDC_PMM_DWExtract_ALL_CLAIMS_031522_04_30_40.dsv
CMMDC_PMM_DWExtract_ALL_VISIT_031522_04_30_43.dsv
CMMDC_PMM_DWExtract_ALL_PROVIDER_031522_04_30_40.dsv etc.
Control file :
CMMDC_PMM_DWExtract_ALL_Control_031522_04_30_40.dsv
We should compare whether all three have the same date/time stamp and throw an error if they don't. Please let me know if you have any more questions.
$outFolder is expected to be empty when unzipping the files; dsv files still lying around there will lead to errors.
Lines 10 - 12 read all the zip files in the input folder, convert the time stamp in each file name to a DateTime object, and add it to the properties.
Lines 14 - 18 will find zip files with a duplicate date.
Lines 22 - 24 find the oldest zip file.
Line 26 unzips the archive.
Lines 28 - 32 will get all dsv files from the expansion target folder (so control file and all data files), add the Timestamp property, and compare their Timestamps with the Timestamp from the zip file.
So this is comparing all dsv files with the zip.
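The requirement also asks for an error when the control file is missing, which the comparison described above does not test on its own. The following is a rough sketch of how both checks could look on a list of base names. The sample names, the Get-NameTimestamp helper, and the assumption that the control file is the one with "_Control_" in its name are illustrative and not part of the script.

```powershell
$timestampPattern = '_(?<Timestamp>\d{6}_\d\d_\d\d_\d\d)$'

# Sample base names, as they might come back from Get-ChildItem after the unzip.
$zipBaseName  = 'CMMDC_PMM_DWExtract_ALL_031522_04_30_40'
$dsvBaseNames = @(
    'CMMDC_PMM_DWExtract_ALL_CLAIMS_031522_04_30_40'
    'CMMDC_PMM_DWExtract_ALL_VISIT_031522_04_30_43'
    'CMMDC_PMM_DWExtract_ALL_Control_031522_04_30_40'
)

# Hypothetical helper: returns the stamp text at the end of a base name, or $null.
function Get-NameTimestamp ([string]$BaseName) {
    if ($BaseName -match $timestampPattern) { $Matches['Timestamp'] } else { $null }
}

$expected = Get-NameTimestamp $zipBaseName
$problems = @()

# Assumption: the control file is identified by "_Control_" in its name.
if (-not ($dsvBaseNames -like '*_Control_*')) {
    $problems += 'Control file missing'
}

# Every dsv name must carry exactly the stamp of the zip file.
foreach ($name in $dsvBaseNames) {
    if ((Get-NameTimestamp $name) -ne $expected) {
        $problems += "Timestamp mismatch: $name"
    }
}

$problems
```

Unlike the script, this sketch also flags dsv names that carry no stamp at all, because a missing stamp never equals the expected one.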
ASKER
That's just an example date and time stamp,
I will try removing all the files and run it.
Thanks for the info
ASKER
But it's still showing the same error when I run the script with the out folder empty.
This is the error:
CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"
ASKER
"CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"
"CMMDC_PMM_DWExtract_ALL_VISIT_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_VISIT_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"
"CMMDC_PMM_DWExtract_ALL_PROVIDER_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_PROVIDER_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"
"CMMDC_PMM_DWExtract_ALL_EMPLOYEE_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_EMPLOYEE_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"
The data files, the control file, and the zip file all have the same date/time stamp in their names.
This is from error log.
@PowerShell.exe -Command "Invoke-Expression -Command ((Get-Content -Path '%~f0' | Select-Object -Skip 2) -join [environment]::NewLine)"
@exit /b %Errorlevel%
$inFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST'
$outFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND'
$errorLog = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\CMMDC_PMM_DWExtract_ALL_Expand.csv'

$filePattern = '*.zip'
$timestampPattern = '_(?<Timestamp>\d{6}_\d\d_\d\d_\d\d)$'
$dtFormat = 'MMddyy_HH_mm_ss'

$zipItems = Get-ChildItem -LiteralPath $inFolder -Filter $filePattern -File |
    Where-Object {$_.BaseName -match $timestampPattern} |
    Select-Object -Property Name, FullName, @{n='Timestamp'; e={[DateTime]::ParseExact($Matches['Timestamp'], $dtFormat, $null)}}, LastWriteTime

$errors = $zipItems |
    Group-Object -Property {$_.Timestamp.Date} |
    Where-Object {$_.Count -gt 1} |
    Select-Object -ExpandProperty Group |
    Select-Object -Property *, @{n='Error'; e={'Duplicate Date'}}

If (-not $errors) {
    Try {
        $zipItem = $zipItems |
            Sort-Object -Property Timestamp |
            Select-Object -First 1

        Expand-Archive -Path $zipItem.FullName -DestinationPath $outFolder -Force -Verbose -ErrorAction Stop

        $errors = Get-ChildItem -LiteralPath $outFolder -Filter '*.dsv' -File |
            Where-Object {$_.BaseName -match $timestampPattern} |
            Select-Object -Property Name, FullName, @{n='Timestamp'; e={[DateTime]::ParseExact($Matches['Timestamp'], $dtFormat, $null)}}, LastWriteTime |
            Where-Object {$_.Timestamp -ne $zipItem.Timestamp} |
            Select-Object -Property *, @{n='Error'; e={"Incorrect Date; expected from zip: $($zipItem.Timestamp)"}}
    } Catch {
        $message = $_.Exception.Message
        $errors = $zipItem |
            Select-Object -Property *, @{n='Error'; e={$message}}
    }
}
If ($errors) {
    $errors | Export-Csv -NoTypeInformation -Path $errorLog -ErrorAction Stop
    Write-Warning 'Found errors:'
    $errors | Format-Table -AutoSize | Out-String | Write-Warning
}
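Judging from the difference between the two listings, a likely reason the first version flagged every file is that its calculated property was named 'Date' and read $Matches['Date'], while the pattern names its capture group 'Timestamp'. That would also fit the empty third column in the error log. The sketch below reproduces the effect on a single sample name; the $item object is a stand-in built for illustration, and it assumes strict mode is not enabled.

```powershell
$timestampPattern = '_(?<Timestamp>\d{6}_\d\d_\d\d_\d\d)$'
$baseName = 'CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16'

$null = $baseName -match $timestampPattern

$fromTimestampGroup = $Matches['Timestamp']   # the captured stamp text
$fromDateGroup      = $Matches['Date']        # no group with that name, so $null

# Stand-in for an item from the first version: it has a 'Date' property, not 'Timestamp'.
$item = [pscustomobject]@{ Name = $baseName; Date = $fromDateGroup }

$zipTimestamp = [DateTime]::ParseExact('031922_04_31_16', 'MMddyy_HH_mm_ss',
    [System.Globalization.CultureInfo]::InvariantCulture)

# $item.Timestamp does not exist and evaluates to $null, which never equals the zip's
# DateTime, so the file is reported even though the names agree.
$reportedAsError = $item.Timestamp -ne $zipTimestamp

"Captured: $fromTimestampGroup; reported as error: $reportedAsError"
```

The corrected version reads the 'Timestamp' group into a 'Timestamp' property, so the comparison works on two real DateTime values.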
ASKER
Can we add a file name format check for the zip file to this code?
The oldest file we pick should also be checked for its file name and date format,
CMMDC_PMM_DWExtract_ALL_MMddyy_HH_mm_ss, and if it doesn't match, an error should be thrown.
If we get a file without ss (CMMDC_PMM_DWExtract_ALL_MMddyy_HH_mm_) or without the standard name (CMMDC_P_DWExtract_ALL_MMddyy_HH_mm_ss), then it should throw an error.
This is the final requirement, sorry; I know it's crazy 😅. Thank you so much.
We are done with this after this requirement.
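One possible shape for such a name check is sketched below, assuming the full base name must be the fixed prefix followed by a stamp that parses as a real date and time. The Test-ZipBaseName function and the $namePattern variable are hypothetical names introduced here. Note that -match ignores case by default; -cmatch would be the case-sensitive variant.

```powershell
# Anchored at both ends, so the whole base name has to fit the expected format.
$namePattern = '^CMMDC_PMM_DWExtract_ALL_(?<Timestamp>\d{6}_\d\d_\d\d_\d\d)$'
$dtFormat    = 'MMddyy_HH_mm_ss'

# Hypothetical helper: $true only if the name fits the pattern and the stamp is a valid date/time.
function Test-ZipBaseName ([string]$BaseName) {
    if ($BaseName -match $namePattern) {
        $parsed = [DateTime]::MinValue
        [DateTime]::TryParseExact($Matches['Timestamp'], $dtFormat,
            [System.Globalization.CultureInfo]::InvariantCulture,
            [System.Globalization.DateTimeStyles]::None, [ref]$parsed)
    } else {
        $false
    }
}

$okName      = Test-ZipBaseName 'CMMDC_PMM_DWExtract_ALL_031522_04_30_40'   # valid
$noSeconds   = Test-ZipBaseName 'CMMDC_PMM_DWExtract_ALL_031922_04_31_'     # seconds missing
$wrongPrefix = Test-ZipBaseName 'CMMDC_P_DWExtract_ALL_031522_04_30_40'     # non-standard name
$badMonth    = Test-ZipBaseName 'CMMDC_PMM_DWExtract_ALL_133122_04_30_40'   # digits fit, month 13 does not

"$okName $noSeconds $wrongPrefix $badMonth"
```

The last case is why the sketch parses the stamp in addition to matching the pattern: six digits alone do not guarantee a valid date.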
ASKER
If an incorrect name doesn't matter when it's newer than the oldest one, why does an incorrect date in a newer zip file matter?
ASKER
If there are duplicate files, stop the process and notify through email.
If there are no duplicate files, check whether the name format of the oldest file picked is good, or else send an email notification.
Like this, we have multi-level checks before loading the data.
ASKER
This is working fine, but it's checking all zip files instead of only the picked zip file (oldest date). It's also picking up a .tgr file in that folder and throwing an error. Can you make it check just the oldest file we are going to unzip? It needs to check for duplicate files like before, but only check the file name pattern of the picked file (oldest date) that we are unarchiving.
ASKER
this is from error log
"CMMDC_PMM_DWExtract_ALL_031922_04_31.zip",\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\Upload\PCMMDC_PMM_DWExtract_ALL031922_04_31.zip,,"3/19/2022 8:28:04 AM","Unexpected Filename"
"CMMDC_PMM_DWExtract_ALL_032422_04_31_4.zip",\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\Upload\CMMDC_PMM_DWExtract_ALL_032422_04_31_4.zip,,"4/11/2022 2:42:51 PM","Unexpected Filename"
"SAMPLE.TGR",\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\Upload\SAMPLE.TGR,,"4/6/2022 11:39:56 AM","Unexpected Filename"
Make sure you're using the code from https://www.experts-exchange.com/questions/29239277/additional-requirement-for-unzipping-files.html?anchorAnswerId=43422206#a43422206, that $fileFilter in line 7 is set to "*.zip" or something like "CMMDC_*.zip", and that nothing below line 7 is changed.
ASKER
If you can't guarantee (because of all the tests you have to run) that the archive name is correct to start with, then picking "just the oldest zip" might return one with an invalid name.
What are "all the zip files in the folder"? If there are unrelated files in there, you can adjust the $fileFilter to include a partial name, like I suggested above, for example "CMMDC_*.zip".
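To illustrate the effect of a narrower filter, here is a small sketch that creates a throwaway folder with three empty files and lists it with both filters. The temporary folder and the name 'OTHER_EXPORT_031922.zip' are made up for this example; the other two names come from the thread.

```powershell
# Throwaway folder with sample files; nothing here touches the real share.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $folder
'CMMDC_PMM_DWExtract_ALL_031922_04_31_16.zip', 'OTHER_EXPORT_031922.zip', 'SAMPLE.TGR' |
    ForEach-Object { $null = New-Item -ItemType File -Path (Join-Path $folder $_) }

# '*.zip' returns every zip in the folder; 'CMMDC_*.zip' leaves out unrelated archives.
$allZips   = @(Get-ChildItem -LiteralPath $folder -Filter '*.zip' -File)
$cmmdcZips = @(Get-ChildItem -LiteralPath $folder -Filter 'CMMDC_*.zip' -File)

"All zips: $($allZips.Count); CMMDC zips: $($cmmdcZips.Count)"

Remove-Item -LiteralPath $folder -Recurse -Force
```

In both cases SAMPLE.TGR is not returned, which is why a .tgr file showing up in the error log suggests that the filter was changed or not applied.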
ASKER
And if it is supposed to be "pick the one with a valid name and the oldest date, and continue", then testing for invalid file names is rather pointless. You'd only ever have an error if no file at all had a valid name.
ASKER
But think long and hard about the ramifications; try to explain to someone else under which conditions exactly a zip file will now be picked and extracted.
Again: what would be the point of checking for invalid names and reporting these as errors if you're picking one of the files, basically at random, just because it happens to have a valid name?
If the existence of a zip file with a valid name is enough to justify using the files it contains, then testing for the existence of zip files with invalid names is a rather moot point, because by continuing with just any one of the files, you basically declared that an invalid name is not an error anymore, but just some "yeah well, that happens."
Reporting these as errors, even though the process continued and extracted "any" archive, will be really confusing for anyone not intimately familiar with the process.
And why would it be a full "terminating" error if two files with valid names happen to have the same date, but not a terminating error anymore if there are two files that have the same date as well, but one of them happens to be missing an underscore, so it is now ignored, because it's "only" an invalid name?
ASKER
I can request it as a new question.