Additional requirement for unzipping files

Straw C (asker, Denmark):

I got an additional requirement from the team for the script you shared earlier.

Can we integrate the following requirement into that script?

Requirement:

After the unzip process, the date-time stamps of the zip file, all .dsv data files, and the control file should match; otherwise, throw an error.



Here are the files.

This is the zip file: CMMDC_PMM_DWExtract_ALL_031522_04_30_40.zip

Inside the zip file we have 10 files: 1 control file (.dsv) and 9 data files (.dsv).


Data files:

CMMDC_PMM_DWExtract_ALL_CLAIMS_031522_04_30_40.dsv

CMMDC_PMM_DWExtract_ALL_VISIT_031522_04_30_43.dsv

CMMDC_PMM_DWExtract_ALL_PROVIDER_031522_04_30_40.dsv, etc.

Control file:

CMMDC_PMM_DWExtract_ALL_Control_031522_04_30_40.dsv


We need to check whether all these date-time stamps in the file names match; if they don't, write an error to the error log. And if, for example, the control file doesn't exist, throw an error as well.



Sorry for the additional requirement... it's crazy here -_-




To be clear, here's the earlier script; this code needs the additional modification:



https://www.experts-exchange.com/questions/29239217/Need-PowerShell-script-to-check-if-zip-files-in-a-folder-have-same-date-stamp.html?anchor=a43421469&notificationFollowed=284441729
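A minimal sketch of the requested check, working on file names only (nothing is read from disk). It assumes the stamp is always the trailing _MMddyy_HH_mm_ss part of each base name and that the control file is the .dsv whose name contains _Control_; Get-NameTimestamp and Test-ExtractTimestamps are hypothetical helper names, not part of the earlier script.

```powershell
# Hypothetical helper: returns the DateTime encoded as _MMddyy_HH_mm_ss at the end of
# a file's base name, or $null if the name carries no valid stamp.
function Get-NameTimestamp {
    param([Parameter(Mandatory)][string]$Name)
    $base = [IO.Path]::GetFileNameWithoutExtension($Name)
    if (-not ($base -match '_(?<ts>\d{6}_\d{2}_\d{2}_\d{2})$')) { return $null }
    $stamp = [DateTime]::MinValue
    # TryParseExact also rejects impossible values such as month 13
    $ok = [DateTime]::TryParseExact($Matches['ts'], 'MMddyy_HH_mm_ss',
        [Globalization.CultureInfo]::InvariantCulture,
        [Globalization.DateTimeStyles]::None, [ref]$stamp)
    if ($ok) { return $stamp }
    return $null
}

# Hypothetical check: emits one message per problem, nothing if everything matches.
function Test-ExtractTimestamps {
    param(
        [Parameter(Mandatory)][string]$ZipName,
        [Parameter(Mandatory)][string[]]$DsvNames
    )
    $zipStamp = Get-NameTimestamp -Name $ZipName
    if ($null -eq $zipStamp) { "No valid timestamp in zip name: $ZipName"; return }

    # Assumption: the control file is the .dsv whose name contains _Control_
    if (-not ($DsvNames -like '*_Control_*')) { 'Control file missing' }

    foreach ($name in $DsvNames) {
        $stamp = Get-NameTimestamp -Name $name
        # Full DateTime comparison, so both the date and the time must match
        if ($null -eq $stamp -or $stamp -ne $zipStamp) { "Timestamp mismatch: $name" }
    }
}

# Example with the names from the question
$dsvNames = 'CMMDC_PMM_DWExtract_ALL_CLAIMS_031522_04_30_40.dsv',
            'CMMDC_PMM_DWExtract_ALL_VISIT_031522_04_30_43.dsv',
            'CMMDC_PMM_DWExtract_ALL_Control_031522_04_30_40.dsv'
Test-ExtractTimestamps -ZipName 'CMMDC_PMM_DWExtract_ALL_031522_04_30_40.zip' -DsvNames $dsvNames
# -> Timestamp mismatch: CMMDC_PMM_DWExtract_ALL_VISIT_031522_04_30_43.dsv
```

With real files, the .dsv names could come from (Get-ChildItem -LiteralPath $outFolder -Filter '*.dsv' -File).Name after Expand-Archive, and any returned messages could be written to the error log.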




oBdA:

This should do it:
@PowerShell.exe -Command "Invoke-Expression -Command ((Get-Content -Path '%~f0' | Select-Object -Skip 2) -join [environment]::NewLine)"
@exit /b %Errorlevel%

$inFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST'
$outFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND'
$errorLog = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\CMMDC_PMM_DWExtract_ALL_Expand.csv'
$filePattern = '*.zip'

$timestampPattern = '_(?<MM>\d\d)(?<dd>\d\d)(?<yy>\d\d)_\d\d_\d\d_\d\d$'
$zipItems = Get-ChildItem -LiteralPath $inFolder -Filter $filePattern -File |
	Where-Object {$_.BaseName -match $timestampPattern} |
	Select-Object -Property Name, FullName, @{n='Date'; e={"$($Matches['yy'])$($Matches['MM'])$($Matches['dd'])"}}, LastWriteTime

$errors = $zipItems |
	Group-Object -Property Date |
	Where-Object {$_.Count -gt 1} |
	Select-Object -ExpandProperty Group |
	Select-Object -Property *, @{n='Error'; e={'Duplicate Date'}}

If (-not $errors) {
	Try {
		$zipItem = $zipItems |
			Sort-Object -Property Date |
			Select-Object -First 1
		Expand-Archive -Path $zipItem.FullName -DestinationPath $outFolder -Force -Verbose -ErrorAction Stop

		$errors = Get-ChildItem -LiteralPath $outFolder -Filter '*.dsv' -File |
			Where-Object {$_.BaseName -match $timestampPattern} |
			Select-Object -Property Name, FullName, @{n='Date'; e={"$($Matches['yy'])$($Matches['MM'])$($Matches['dd'])"}}, LastWriteTime |
			Where-Object {$_.Date -ne $zipItem.Date} |
			Select-Object -Property *, @{n='Error'; e={'Incorrect Date'}}
	} Catch {
		$message = $_.Exception.Message
		$errors = $zipItem |
			Select-Object -Property *, @{n='Error'; e={$message}} 
	}
}

If ($errors) {
	$errors | Export-Csv -NoTypeInformation -Path $errorLog -ErrorAction Stop
	Write-Warning 'Found errors:'
	$errors | Format-Table -AutoSize | Out-String | Write-Warning
}
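The duplicate check above groups the zip files on a yyMMdd key built from the file name; building the key as yy + MM + dd also lets Sort-Object order it chronologically as a plain string. Any group with more than one member is a duplicate date. A small stand-alone sketch with made-up file names:

```powershell
# Made-up zip names; the first two share the date 03/15/22
$names = 'CMMDC_PMM_DWExtract_ALL_031522_04_30_40.zip',
         'CMMDC_PMM_DWExtract_ALL_031522_09_12_05.zip',
         'CMMDC_PMM_DWExtract_ALL_031622_04_30_40.zip'

$timestampPattern = '_(?<MM>\d\d)(?<dd>\d\d)(?<yy>\d\d)_\d\d_\d\d_\d\d$'

# Same key as the script: yy + MM + dd, e.g. '220315'
$items = foreach ($name in $names) {
    $base = [IO.Path]::GetFileNameWithoutExtension($name)
    if ($base -match $timestampPattern) {
        [pscustomobject]@{ Name = $name; Date = $Matches['yy'] + $Matches['MM'] + $Matches['dd'] }
    }
}

# Groups with more than one member are duplicate dates
$duplicates = @($items | Group-Object -Property Date |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Group)

$duplicates.Name   # the two 031522 files
```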


Straw C (asker):

Hi oBdA,

Thanks for the response.

This isn't showing any output.

Will the script work for both the date and the time stamp? It should match both.

I tried inserting @pause as the second line and changing -Skip to 3, but it just asks me to press Enter.

oBdA:

This now checks both the date and the time stamp for the .dsv files.
@PowerShell.exe -Command "Invoke-Expression -Command ((Get-Content -Path '%~f0' | Select-Object -Skip 2) -join [environment]::NewLine)"
@exit /b %Errorlevel%

$inFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST'
$outFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND'
$errorLog = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\CMMDC_PMM_DWExtract_ALL_Expand.csv'
$filePattern = '*.zip'

$timestampPattern = '_(?<Timestamp>\d{6}_\d\d_\d\d_\d\d)$'
$zipItems = Get-ChildItem -LiteralPath $inFolder -Filter $filePattern -File |
	Where-Object {$_.BaseName -match $timestampPattern} |
	Select-Object -Property Name, FullName, @{n='Timestamp'; e={[DateTime]::ParseExact($Matches['Timestamp'], 'MMddyy_HH_mm_ss', $null)}}, LastWriteTime

$errors = $zipItems |
	Group-Object -Property {$_.Timestamp.Date} |
	Where-Object {$_.Count -gt 1} |
	Select-Object -ExpandProperty Group |
	Select-Object -Property *, @{n='Error'; e={'Duplicate Date'}}

If (-not $errors) {
	Try {
		$zipItem = $zipItems |
			Sort-Object -Property Timestamp |
			Select-Object -First 1

		Expand-Archive -Path $zipItem.FullName -DestinationPath $outFolder -Force -Verbose -ErrorAction Stop

		$errors = Get-ChildItem -LiteralPath $outFolder -Filter '*.dsv' -File |
			Where-Object {$_.BaseName -match $timestampPattern} |
			Select-Object -Property Name, FullName, @{n='Date'; e={[DateTime]::ParseExact($Matches['Date'], 'MMddyy_HH_mm_ss', $null)}}, LastWriteTime |
			Where-Object {$_.Timestamp -ne $zipItem.Timestamp} |
			Select-Object -Property *, @{n='Error'; e={'Incorrect Date'}}

	} Catch {
		$message = $_.Exception.Message
		$errors = $zipItem |
			Select-Object -Property *, @{n='Error'; e={$message}} 
	}
}

If ($errors) {
	$errors | Export-Csv -NoTypeInformation -Path $errorLog -ErrorAction Stop
	Write-Warning 'Found errors:'
	$errors | Format-Table -AutoSize | Out-String | Write-Warning
}
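This version parses the whole MMddyy_HH_mm_ss stamp into a DateTime, so a comparison of Timestamp values takes the time into account, while the duplicate check groups on $_.Timestamp.Date, i.e. the calendar day only. A short illustration of the difference; passing InvariantCulture instead of $null (the current culture) is a choice of this sketch:

```powershell
$culture = [Globalization.CultureInfo]::InvariantCulture
$format  = 'MMddyy_HH_mm_ss'

# Two stamps from the same day, three seconds apart (as in the VISIT example)
$a = [DateTime]::ParseExact('031522_04_30_40', $format, $culture)
$b = [DateTime]::ParseExact('031522_04_30_43', $format, $culture)

$a -eq $b                # False: a full DateTime comparison sees the time
$a.Date -eq $b.Date      # True: grouping on .Date treats both as the same day
$a.ToString('yyyy-MM-dd HH:mm:ss', $culture)   # 2022-03-15 04:30:40
```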


Straw C (asker):

oBdA,

I tested the above script. Although the date and time stamps are correct, it still reports the error "Incorrect Date":

"CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"



Also, I think it's not comparing the zip file time stamp, the control file time stamp, and all the data file time stamps.

The date-time stamps of all three kinds of files should match:


Zip file:

CMMDC_PMM_DWExtract_ALL_031522_04_30_40.zip

Data files:

CMMDC_PMM_DWExtract_ALL_CLAIMS_031522_04_30_40.dsv

CMMDC_PMM_DWExtract_ALL_VISIT_031522_04_30_43.dsv

CMMDC_PMM_DWExtract_ALL_PROVIDER_031522_04_30_40.dsv, etc.

Control file:

CMMDC_PMM_DWExtract_ALL_Control_031522_04_30_40.dsv



We should check whether all three have the same date/time stamp and throw an error if they don't. Please let me know if you have any more questions.

oBdA:

The second data file is not correct: time is 04_30_43, should be 04_30_40.
$outFolder is expected to be empty when unzipping the files; dsv files still lying around there will lead to errors.

Line numbers below refer to the script above, counting from its first line (the @PowerShell.exe line).
Lines 10-12 read all zip files in the input directory, convert the time stamp in each file name to a DateTime object, and add it as the Timestamp property.
Lines 14-18 find zip files with a duplicate date.
Lines 22-24 find the oldest zip file.
Line 26 unzips the archive.
Lines 28-32 get all .dsv files from the expansion target folder (that is, the control file and all data files), add the Timestamp property, and compare each of their Timestamps with the Timestamp from the zip file.
So this compares all .dsv files with the zip.
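If the expansion folder should not rely on manual cleanup, one option is to delete leftover .dsv files from $outFolder right before Expand-Archive. A minimal sketch; Clear-ExpandFolder is a hypothetical name, and it supports -WhatIf so you can preview what would be deleted:

```powershell
# Hypothetical helper: removes leftover .dsv files from the expansion folder so that
# only files from the archive being expanded take part in the timestamp comparison.
function Clear-ExpandFolder {
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(Mandatory)][string]$Path)
    # Remove-Item honours -WhatIf/-Confirm passed to this function
    Get-ChildItem -LiteralPath $Path -Filter '*.dsv' -File |
        Remove-Item -ErrorAction Stop
}

# Usage, placed before Expand-Archive in the script:
# Clear-ExpandFolder -Path $outFolder -WhatIf   # preview only
# Clear-ExpandFolder -Path $outFolder
```

An alternative design is to expand each archive into a new, empty subfolder, which avoids deleting anything.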

Straw C (asker):

oBdA,


That's just an example date and time stamp.

I will try removing all the files and running it again.

Thanks for the info.

Straw C (asker):

Hey, I restarted and ran the script again; it's working now.

But it still shows the same error when I run the script with the out folder empty.

This is the error:

CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"

oBdA:

"Same error" meaning what exactly? Please select the script's output and paste it here as [code].

Straw C (asker):

Here's the error: 

"CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"
"CMMDC_PMM_DWExtract_ALL_VISIT_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_VISIT_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"

"CMMDC_PMM_DWExtract_ALL_PROVIDER_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_PROVIDER_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"

"CMMDC_PMM_DWExtract_ALL_EMPLOYEE_031922_04_31_16.dsv",'\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND\CMMDC_PMM_DWExtract_ALL_EMPLOYEE_031922_04_31_16.dsv,,"3/19/2022 4:31:16 AM","Incorrect Date"


The data file names, the control file, and the zip file all have the same date/time stamp in their names.

This is from the error log.

oBdA:

Should be fixed now; this will now also add the expected zip date in case of an error:
@PowerShell.exe -Command "Invoke-Expression -Command ((Get-Content -Path '%~f0' | Select-Object -Skip 2) -join [environment]::NewLine)"
@exit /b %Errorlevel%

$inFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST'
$outFolder = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST_EXPAND'
$errorLog = '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\CMMDC_PMM_DWExtract_ALL_Expand.csv'
$filePattern = '*.zip'

$timestampPattern = '_(?<Timestamp>\d{6}_\d\d_\d\d_\d\d)$'
$dtFormat = 'MMddyy_HH_mm_ss'
$zipItems = Get-ChildItem -LiteralPath $inFolder -Filter $filePattern -File |
	Where-Object {$_.BaseName -match $timestampPattern} |
	Select-Object -Property Name, FullName, @{n='Timestamp'; e={[DateTime]::ParseExact($Matches['Timestamp'], $dtFormat, $null)}}, LastWriteTime

$errors = $zipItems |
	Group-Object -Property {$_.Timestamp.Date} |
	Where-Object {$_.Count -gt 1} |
	Select-Object -ExpandProperty Group |
	Select-Object -Property *, @{n='Error'; e={'Duplicate Date'}}

If (-not $errors) {
	Try {
		$zipItem = $zipItems |
			Sort-Object -Property Timestamp |
			Select-Object -First 1

		Expand-Archive -Path $zipItem.FullName -DestinationPath $outFolder -Force -Verbose -ErrorAction Stop

		$errors = Get-ChildItem -LiteralPath $outFolder -Filter '*.dsv' -File |
			Where-Object {$_.BaseName -match $timestampPattern} |
			Select-Object -Property Name, FullName, @{n='Timestamp'; e={[DateTime]::ParseExact($Matches['Timestamp'], $dtFormat, $null)}}, LastWriteTime |
			Where-Object {$_.Timestamp -ne $zipItem.Timestamp} |
			Select-Object -Property *, @{n='Error'; e={"Incorrect Date; expected from zip: $($zipItem.Timestamp)"}}

	} Catch {
		$message = $_.Exception.Message
		$errors = $zipItem |
			Select-Object -Property *, @{n='Error'; e={$message}} 
	}
}

If ($errors) {
	$errors | Export-Csv -NoTypeInformation -Path $errorLog -ErrorAction Stop
	Write-Warning 'Found errors:'
	$errors | Format-Table -AutoSize | Out-String | Write-Warning
}
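For reference, why the previous version flagged every file: its .dsv step created a calculated property named Date from $Matches['Date'], a group the pattern doesn't define (which would also explain the empty column in the error log), and then filtered on $_.Timestamp, a property those objects don't have. A missing property reads as $null, and $null -ne <any DateTime> is always true. A minimal reproduction with hand-built objects:

```powershell
$zipTimestamp = [DateTime]::new(2022, 3, 19, 4, 31, 16)

# Shaped like the previous version's output: a 'Date' property, but no 'Timestamp'
$old = [pscustomobject]@{ Name = 'CMMDC_PMM_DWExtract_ALL_CLAIMS_031922_04_31_16.dsv'; Date = $null }

# Outside Set-StrictMode, reading a missing property returns $null ...
$null -eq $old.Timestamp            # True
# ... so this filter is true for every file -> 'Incorrect Date'
$old.Timestamp -ne $zipTimestamp    # True

# The fixed version names the calculated property Timestamp, so real values are compared
$new = [pscustomobject]@{ Name = $old.Name; Timestamp = $zipTimestamp }
$new.Timestamp -ne $zipTimestamp    # False
```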


Straw C (asker):

Thanks oBdA, will test it.

Straw C (asker):

oBdA, this is working fine, thanks.

Can we add a file name format check for the zip file to this code?

The oldest-dated file we are picking should also be checked for its file name and date format:

CMMDC_PMM_DWExtract_ALL_MMddyy_HH_mm_ss; if it doesn't match, throw an error.

If we get a file without the seconds (CMMDC_PMM_DWExtract_ALL_MMddyy_HH_mm_) or without the standard name (CMMDC_P_DWExtract_ALL_MMddyy_HH_mm_ss), it should throw an error.

This is the final requirement, sorry; I know it's crazy 😅. Thank you so much!

We are done with this after this requirement.
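One way to express this format check is a regular expression anchored at both ends, followed by TryParseExact so that impossible values such as month 13 or hour 25 are rejected as well. A minimal sketch; Test-ZipName is a hypothetical name, and the pattern assumes the .zip extension is part of the checked name:

```powershell
# Hypothetical helper: $true only if the name is exactly
# CMMDC_PMM_DWExtract_ALL_MMddyy_HH_mm_ss.zip and the stamp is a real date/time.
function Test-ZipName {
    param([Parameter(Mandatory)][string]$Name)
    # Anchored with ^ and $, so a missing seconds part or a changed prefix both fail
    if (-not ($Name -match '^CMMDC_PMM_DWExtract_ALL_(?<ts>\d{6}_\d{2}_\d{2}_\d{2})\.zip$')) {
        return $false
    }
    $parsed = [DateTime]::MinValue
    return [DateTime]::TryParseExact($Matches['ts'], 'MMddyy_HH_mm_ss',
        [Globalization.CultureInfo]::InvariantCulture,
        [Globalization.DateTimeStyles]::None, [ref]$parsed)
}

Test-ZipName -Name 'CMMDC_PMM_DWExtract_ALL_031522_04_30_40.zip'   # True
Test-ZipName -Name 'CMMDC_PMM_DWExtract_ALL_031922_04_31.zip'      # False: no seconds
Test-ZipName -Name 'CMMDC_P_DWExtract_ALL_031522_04_30_40.zip'     # False: wrong prefix
Test-ZipName -Name 'CMMDC_PMM_DWExtract_ALL_133122_04_30_40.zip'   # False: month 13
```

Note that -match is case-insensitive; -cmatch would require the prefix to match case exactly.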



oBdA:

So all zip files in the folder must follow this pattern?

Straw C (asker):

oBdA, if we can check just the oldest file we are picking, that's fine. There's no need to check all zip files, only the oldest-dated file we pick.

oBdA:

If you only ever want the oldest anyway, why then the check for multiple files with the same date?
If an incorrect name doesn't matter when the file is newer than the oldest one, why does an incorrect date in a newer zip file matter?

Straw C (asker):

The process will stop if we find duplicate dates. These are all checks we need to perform before loading the data into the data warehouse:

If there are duplicate files, stop the process and send an email notification.

If there are no duplicate files, check whether the format of the picked (oldest) file is good; otherwise, send an email notification.

Like this, we have multi-level checks before loading the data.
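The ordering described here (stop at the first failed check, then notify) can be sketched as a small stage runner. Invoke-Stages is a hypothetical name, the checks are illustrative, and the e-mail notification is left as a comment because the mail setup isn't known:

```powershell
# Hypothetical stage runner: runs the checks in order and stops at the first one that
# fails. Each check is a script block that returns an error message, or nothing if it passes.
function Invoke-Stages {
    param([Parameter(Mandatory)][System.Collections.Specialized.OrderedDictionary]$Stages)
    foreach ($stage in $Stages.GetEnumerator()) {
        $message = & $stage.Value
        if ($message) {
            # First failure wins; later stages are not run
            return [pscustomobject]@{ Stage = $stage.Key; Error = "$message" }
        }
    }
}

# Example: two made-up zip names on the same day, so the first stage fails
$zipNames = 'CMMDC_PMM_DWExtract_ALL_031522_04_30_40.zip',
            'CMMDC_PMM_DWExtract_ALL_031522_09_00_00.zip'

$result = Invoke-Stages -Stages ([ordered]@{
    'Duplicate dates' = {
        $days = foreach ($n in $zipNames) {
            if ($n -match '_(?<day>\d{6})_\d{2}_\d{2}_\d{2}\.zip$') { $Matches['day'] }
        }
        if (@($days | Group-Object | Where-Object { $_.Count -gt 1 }).Count -gt 0) {
            'Two or more zip files share a date'
        }
    }
    'Oldest file name format' = {
        # Placeholder: a name check on the oldest file would go here
    }
})

if ($result) {
    # This is where the e-mail notification and a non-zero exit code would go
    Write-Warning "$($result.Stage): $($result.Error)"
}
```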

oBdA (accepted solution):

This solution is only available to Experts Exchange members.

Straw C (asker):

Hi oBdA,

This is working fine, but it's checking all zip files instead of only the picked zip file (oldest date). It's also picking up a .tgr file in that folder and throwing an error. Can you make it check just the oldest file we are going to unzip? It needs to check for duplicate files like before, but only check the file name pattern of the picked (oldest) file we are unarchiving.

oBdA:

If $fileFilter is set to "*.zip", it won't pick up a .tgr file.

Straw C (asker):

Oh! When I run the script, it's picking up the .tgr file and throwing an error in the output.

This is from the error log:

"CMMDC_PMM_DWExtract_ALL_031922_04_31.zip",\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\Upload\PCMMDC_PMM_DWExtract_ALL031922_04_31.zip,,"3/19/2022 8:28:04 AM","Unexpected Filename"
"CMMDC_PMM_DWExtract_ALL_032422_04_31_4.zip",\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\Upload\CMMDC_PMM_DWExtract_ALL_032422_04_31_4.zip,,"4/11/2022 2:42:51 PM","Unexpected Filename"
"SAMPLE.TGR",\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST\Upload\SAMPLE.TGR,,"4/6/2022 11:39:56 AM","Unexpected Filename"

oBdA:

Can't reproduce.
Make sure you're using the code from https://www.experts-exchange.com/questions/29239277/additional-requirement-for-unzipping-files.html?anchorAnswerId=43422206#a43422206, that $fileFilter in line 7 is set to "*.zip" or something like "CMMDC_*.zip", and that nothing below line 7 is changed.
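To double-check which files a given filter actually returns, here is a minimal sketch of narrowing the selection to the expected prefix and extension. Get-ExtractZips is a hypothetical name; the extra Where-Object on Extension is a belt-and-braces guard, since in Windows PowerShell (.NET Framework) a three-letter extension pattern such as *.zip can also match longer extensions like .zipx.

```powershell
# Hypothetical helper: returns only the .zip files in $Folder whose names match $FileFilter.
function Get-ExtractZips {
    param(
        [Parameter(Mandatory)][string]$Folder,
        [string]$FileFilter = 'CMMDC_*.zip'
    )
    Get-ChildItem -LiteralPath $Folder -Filter $FileFilter -File |
        Where-Object { $_.Extension -eq '.zip' }   # exact extension, nothing longer
}

# Usage:
# Get-ExtractZips -Folder '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST' | Select-Object Name
```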

Straw C (asker):

Now it's taking just the .zip files, but it's taking all the zip files in the folder! Can we do anything to pick just the oldest .zip?

oBdA:

How would that work?
If you can't guarantee (because of all the tests you have to run) that the archive name is correct to start with, then picking "just the oldest zip" might return one with an invalid name.
What are "all the zip files in the folder"? If there are unrelated files in there, you can adjust the $fileFilter to include a partial name, like I suggested above, for example "CMMDC_*.zip".

Straw C (asker):

Yeah, I get it, but all the zip files have the same name with different time stamps. Thanks anyway :) I will try to do that in a separate script.

oBdA:

As long as all zip files have valid names, the script will use the oldest of them (lines 37-39).

Straw C (asker):

Yeah, but the check for files with invalid names stops the execution, even though the oldest-dated file I'm picking is good.

oBdA:

Having a valid file name is a requirement to determine the oldest file; otherwise the date can't be parsed from the file name.
And if it is supposed to be "take the one with a valid name and the oldest date, and continue", then testing for invalid file names is rather pointless. You'd only ever get an error if no file at all had a valid name.
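The point above can be seen in a small sketch with made-up names: a name that fails to parse has no timestamp, so it cannot take part in the sort at all, even if it is in fact the oldest file.

```powershell
# Made-up names; the first one lacks the seconds and so cannot be parsed
$names = 'CMMDC_PMM_DWExtract_ALL_031922_04_31.zip',
         'CMMDC_PMM_DWExtract_ALL_032422_04_31_40.zip',
         'CMMDC_PMM_DWExtract_ALL_040122_04_31_40.zip'

$culture = [Globalization.CultureInfo]::InvariantCulture
$parsed = foreach ($name in $names) {
    $stamp = [DateTime]::MinValue
    # -and short-circuits, so TryParseExact only runs when the pattern matched
    $ok = ($name -match '_(?<ts>\d{6}_\d{2}_\d{2}_\d{2})\.zip$') -and
          [DateTime]::TryParseExact($Matches['ts'], 'MMddyy_HH_mm_ss', $culture,
              [Globalization.DateTimeStyles]::None, [ref]$stamp)
    $timestamp = $null
    if ($ok) { $timestamp = $stamp }
    [pscustomobject]@{ Name = $name; Valid = $ok; Timestamp = $timestamp }
}

# "Oldest" can only be decided among the names that carry a parseable stamp
$oldestValid = $parsed | Where-Object Valid | Sort-Object -Property Timestamp | Select-Object -First 1
$invalid     = @($parsed | Where-Object { -not $_.Valid })

$oldestValid.Name   # CMMDC_PMM_DWExtract_ALL_032422_04_31_40.zip
$invalid.Name       # CMMDC_PMM_DWExtract_ALL_031922_04_31.zip (its age is unknown to the script)
```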

Straw C (asker):

Can we pick the file first based on the oldest date and time stamp, and then run all these checks, such as whether it has a valid name? I know this is a bit dumb, but can we rearrange our script?

oBdA:

Of course that can be done technically.
But think long and hard about the ramifications; try to explain to someone else under which conditions exactly a zip file will now be picked and extracted.
Again: what would be the point of checking for invalid names and reporting these as errors if you're picking one of the files, basically at random, just because it happens to have a valid name?
If the existence of a zip file with a valid name is enough to justify using the files it contains, then testing for the existence of zip files with invalid names is a rather moot point, because by continuing with just any one of the files, you basically declared that an invalid name is not an error anymore, but just some "yeah well, that happens."
Reporting these as errors, even though the process continued and extracted "any" archive, will be really confusing for anyone not intimately familiar with the process.

And why would it be a full "terminating" error if two files with valid names happen to have the same date, but not a terminating error anymore if there are two files that have the same date as well, but one of them happens to be missing an underscore, so it is now ignored, because it's "only" an invalid name?

Straw C (asker):

Yeah. To make it less complex, can we do another script that just checks for the existence of zip files in the folder, throwing an error if there are none, and also checks whether all the zip file names have the correct format, throwing an error otherwise?


I can request it as a new question.
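For the separate pre-check described here, a minimal sketch could look like this. Test-ZipFolder is a hypothetical name, and it assumes every zip in the folder must be named CMMDC_PMM_DWExtract_ALL_MMddyy_HH_mm_ss.zip; it throws, so a caller can turn the failure into a non-zero exit code or a notification.

```powershell
# Hypothetical pre-check: throws if the folder contains no zip files, or if any zip
# name does not follow CMMDC_PMM_DWExtract_ALL_MMddyy_HH_mm_ss.zip.
function Test-ZipFolder {
    param([Parameter(Mandatory)][string]$Folder)
    $zips = @(Get-ChildItem -LiteralPath $Folder -Filter '*.zip' -File)
    if ($zips.Count -eq 0) { throw "No zip files found in '$Folder'." }

    $culture = [Globalization.CultureInfo]::InvariantCulture
    $bad = foreach ($zip in $zips) {
        $stamp = [DateTime]::MinValue
        # Name must match exactly, and the stamp must be a real date/time
        $ok = ($zip.Name -match '^CMMDC_PMM_DWExtract_ALL_(?<ts>\d{6}_\d{2}_\d{2}_\d{2})\.zip$') -and
              [DateTime]::TryParseExact($Matches['ts'], 'MMddyy_HH_mm_ss', $culture,
                  [Globalization.DateTimeStyles]::None, [ref]$stamp)
        if (-not $ok) { $zip.Name }
    }
    if ($bad) { throw "Unexpected file name(s): $($bad -join ', ')" }
}

# Usage as a stand-alone pre-check:
# try   { Test-ZipFolder -Folder '\\Informatica\DEV\EXTRACTS\Extracts\OMAP\EVV\EVV_TST'; exit 0 }
# catch { Write-Warning $_.Exception.Message; exit 1 }
```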