Christopher Minor
asked on
Project for humanity, have 1800 text files to search for numbers, do some math, print results to text file
There are 1800 scripts to search like the one below, which I have created for humanity, so you can see the need to be able to do this in bulk. I have provided some code.
SCRIPT
The world (SILENCE TAG="500") as we have created it (SILENCE TAG="500") is a process (SILENCE TAG"250") of our thinking (SILENCE MSEC="1000") It cannot be changed (SILENCE TAG="500") without changing our thinking (SILENCE TAG="1000") ~ Albert Einstein
I need a batch file that can go through all of the files in a folder, search the text files for numbers inside these tags: (SILENCE TAG"500"), then divide each number by 2 and store the result in a variable like sum1. It copies the variable into a text file. It then searches for the next number, divides it by 2, stores the result in a variable like sum2, adds sum1+sum2 together, and copies that into the text file. It does this until it finds 1000. Then the saved variable is added to the 1000.
PRINT TO DOC = PTD
The world (500/2=250) 250 PTD 250 remains
as we have created it (500/2=250) 250+250 remains .500 PTD 250 remains
is a process (250/2=125) 125+250 remains .375 PTD 125 remains
of our thinking (1000) 1000+125 remains 1.125 PTD 0 remains
It cannot be changed (500 / 2 = 250) 0 remains .250 PTD 250 remains
without changing our thinking .250 remains .250 PTD
1000+250 remains 1.250 PTD
END OF DOC
OUTPUT TO TEXT DOC
1st L-01-TV-01-Clip-01-silence
2nd .250
3rd .500
4th .375
5th 1.125+1-second-fadeout
6th .250
7th 1.250+1-second-fadeout
END OF DOC
Here is what I can contribute
@ECHO OFF
SETLOCAL ENABLEEXTENSIONS
SETLOCAL ENABLEDELAYEDEXPANSION
REM DO UP TO 30 NUMBERS FOUND
for /L %%i in (1,1,30) do (
for /f "tokens=3 delims=. " %%A in (
)
:START
'findstr /rc:"At revision [0-9][0-9]*."'
do echo %%A
IF /I "%%i" EQU "1" GOTO first(
) ELSE (
IF /I "%%A" EQU "250" GOTO small(
) ELSE (
IF /I "%%A" EQU "500" GOTO medium (
) ELSE (
IF /I "%%A" EQU "500" GOTO large (
)
REM Copy short filename ADD -silence PTD
:first
set /a num1=%%A
set /a sum1=num1/2
set /a remains=sum1
set /a numout1=sum1
echo %numout1% >C:\Labels\L-01\filename-silence.txt
GOTO START
:small
set /a num2=%%A
set /a sum2=num2/2
set /a numout2=sum2+remains
echo %numout2% >C:\Labels\L-01\filename-silence.txt
GOTO START
:medium
set /a num3=%%A
set /a sum3=num3/2
set /a numout3=sum3+remains
echo %numout3% >C:\Labels\L-01\filename-silence.txt
GOTO START
:large
set /a sum4=%%A
set /a numout4=sum4+remains
echo %numout4%+1-second-fadeout >C:\Labels\L-01\filename-silence.txt
GOTO START
ENDLOCAL
ENDLOCAL
)
You indicated this is a sample input file:
The world (SILENCE TAG="500") as we have created it (SILENCE TAG="500") is a process (SILENCE TAG"250") of our thinking (SILENCE MSEC="1000") It cannot be changed (SILENCE TAG="500") without changing our thinking (SILENCE TAG="1000") ~ Albert Einstein
Questions:
- Are the numeric values always either 250, 500 or 1000, or can there be other values?
- I'm seeing several different formats of the info inside the parens (see below), are these true variations, or do they all fit a single template, and if so what is that?
- (SILENCE TAG="500")
- (SILENCE TAG"250")
- (SILENCE MSEC="1000")
»bp
ASKER
David Favor, the files are small and time is no concern, so any logic will do. The files are only 1k each; the largest has about 25 tags to be processed in it.
I have never done any code writing, but I see a loop that checks against EOF, allowing the next file to be processed. It searches the text for a number, the number is compared in the If/Else, then passed to one of :first, :small, :medium or :large and processed, then control returns to :START. The files are sequenced by number: L-01-TV 01-Clip-01 through the last number, then L-01-TV 02-Clip-01 through the last number, and so on, up to L-13 levels.
How the files are actually formatted:
The world (<SILENCE MSEC ="500"/>) as we have created it (<SILENCE MSEC ="500"/>) is a process (<SILENCE MSEC ="250"/>) of our thinking (<SILENCE MSEC="1000"/>) It cannot be changed (<SILENCE MSEC ="500"/>) without changing our thinking (<SILENCE MSEC ="1000"/>) ~ Albert Einstein
Bill Prew, I'm sorry about the tags not being the same. That happened because I posted this first at Super User, which uses formatting tags, and the system deleted all of these tags (<SILENCE MSEC ="500"/>), so I reformatted them and missed some. I have corrected them above and below.
Answers:
- The numeric values are always either 250, 500 or 1000
- That was why I made the :first :small :medium :large processes.
- The files are 1k each; the largest file has 28 tags
Can you provide a sample of a couple of the actual input files (at least one of the larger ones) for testing?
»bp
ASKER
There are some indicators I use that did not get deleted in L-08-TV-05-Clip-13.txt, so I uploaded it again: L-08-TV-05-Clip-13.txt
I'm still not sure I understand completely what you're trying to do, but this seems to create what you're after.
It's PowerShell at its core, but wrapped in Batch, so save it as Whatever.cmd
Output files will be saved right next to the input files, with -silence added.
@PowerShell.exe -Command "Invoke-Expression -Command ((Get-Content -Path '%~f0' | Select-Object -Skip 2) -join [environment]::NewLine)"
@exit /b %Errorlevel%
$SourceDir = 'C:\Temp'
$Filter = '*.txt'
$Recurse = $false
$MaxMatches = 30
$dtProvider = New-Object -TypeName System.Globalization.CultureInfo -ArgumentList 'en-US'
Get-ChildItem -Path $SourceDir -Filter $Filter -File -Recurse:$Recurse | Where-Object {$_.BaseName -notmatch '-silence$'} | ForEach-Object {
	Write-Host "Processing $($_.Name)"
	$inFile = $_.FullName
	$outFile = "$($_.DirectoryName)\$($_.BaseName)-silence$($_.Extension)"
	$content = Get-Content -LiteralPath $_.FullName -Raw
	$results = [regex]::Matches($content, '\<SILENCE\s+MSEC\s*=\s*"(?<Silence>\d+)"')
	If ($results.Count -eq 0) {
		Write-Warning "Found no 'SILENCE' tags in '$($_.FullName)'!"
	} Else {
		$oldRemains = $newRemains = 0
		$i = 0
		$(ForEach ($result in $results) {
			$i += 1
			$silence = [int]$result.Groups['Silence'].Value
			If ($silence -eq 1000) {
				$newRemains = $silence
			} Else {
				$newRemains = [int]($silence / 2)
			}
			$out = (($oldRemains + $newRemains) / 1000).ToString('N3', $dtProvider)
			Write-Host "  Tag $($i.ToString().PadLeft(2)): $($silence.ToString().PadLeft(5)); out: $($out)"
			$out | Write-Output
			If ($silence -eq 1000) {
				$oldRemains = 0
			} Else {
				$oldRemains = $newRemains
			}
			If ($i -ge $MaxMatches) {
				If ($i -lt $results.Count) {
					Write-Warning "Stopped processing '$($inFile)' after $($MaxMatches) of $($results.Count) tags!"
				}
				Break
			}
		}) | Set-Content -Path $outFile
	}
}
ASKER
I want you to know how important this batch file is to my being able to do what I am creating for humanity. It will allow me to automate something that was prone to mistakes. Thank you so much.
ASKER
oBdA, I don't see how 250 and 500 are being processed. Could you add some comments? It would help me understand the code. Thank you so much.
It should be pretty straightforward. Based on your description and your samples, the only "silence" value that differs in handling is 1000, so the script handles every "silence" value except 1000 the same way: it's divided by two, and half of the previous value is added. If the value is 1000, it doesn't get divided, and nothing is carried over into the next iteration.
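To make the rule above concrete, here is a minimal sketch of the same arithmetic in Python (not the PowerShell used in this thread). The names `TAG` and `durations` are made up for illustration, and the sample text is the asker's corrected Einstein line; treat it as a sketch of the described rule, not as the script itself.

```python
import re

# Same idea as the pattern in the PowerShell script: <SILENCE MSEC ="500"/>
TAG = re.compile(r'<SILENCE\s+MSEC\s*=\s*"(\d+)"')

def durations(text):
    """Return one duration in seconds for each SILENCE tag found in text."""
    out = []
    carry = 0  # half of the previous non-1000 value, in milliseconds
    for match in TAG.finditer(text):
        ms = int(match.group(1))
        if ms == 1000:
            # 1000 is not halved, and nothing is carried to the next tag
            out.append((carry + ms) / 1000)
            carry = 0
        else:
            half = ms // 2
            out.append((carry + half) / 1000)
            carry = half
    return out

sample = (
    'The world (<SILENCE MSEC ="500"/>) as we have created it '
    '(<SILENCE MSEC ="500"/>) is a process (<SILENCE MSEC ="250"/>) '
    'of our thinking (<SILENCE MSEC="1000"/>) It cannot be changed '
    '(<SILENCE MSEC ="500"/>) without changing our thinking '
    '(<SILENCE MSEC ="1000"/>) ~ Albert Einstein'
)

for seconds in durations(sample):
    print(f"{seconds:.3f}")
```

For the sample line this prints 0.250, 0.500, 0.375, 1.125, 0.250 and 1.250, one per line, which matches the values in the asker's walkthrough.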
ASKER
This is to give me the duration of time to display images in the video. Here is an example of the output file:
OUTPUT TO TEXT DOC
(1st line) L-01-TV-01-Clip-01-silence
(2nd line) .250
(3rd line) .500
(4th line) .375
(5th line) 1.125 +1-second-fadeout
(6th line) .250
(7th line) 1.250 +1-second-fadeout
END OF DOC
The (+1-second-fadeout) appended to the amount marks where to add a 1 second fadeout. I will use it in Notepad++ with find and replace to construct the file for making the video using FFmpeg at the command line. Otherwise it adds a lot more work.
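As a rough illustration of the output layout described above, here is a small Python sketch (again, not the PowerShell from this thread). It assumes the tag values have already been extracted into a list of milliseconds; the function name `format_lines`, the leading-dot number format and the single space before the marker are taken from the example output or invented for the sketch.

```python
def format_lines(name, tags_ms):
    """Build the output lines: a header, then one duration per tag."""
    lines = [f"{name}-silence"]  # 1st line: short file name plus -silence
    carry = 0  # half of the previous non-1000 value, in milliseconds
    for ms in tags_ms:
        if ms == 1000:
            total, carry, marker = carry + ms, 0, " +1-second-fadeout"
        else:
            half = ms // 2
            total, carry, marker = carry + half, half, ""
        # "0.250" becomes ".250"; "1.125" is left unchanged
        text = f"{total / 1000:.3f}".lstrip("0")
        lines.append(text + marker)
    return lines

for line in format_lines("L-01-TV-01-Clip-01", [500, 500, 250, 1000, 500, 1000]):
    print(line)
```

Only the 1000 values get the marker, so a later find and replace on `+1-second-fadeout` touches exactly the lines that end a phrase group.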
ASKER
I don't see how the next file in the folder is being called up, because there are 1800 of these files. Could you explain that for me? I got the other code.
ASKER
Get-ChildItem
This is what is getting the file, does the code return here after it finds no more silence? Thanks for explaining the other code.
ASKER
I see it, thanks so much, you have made a great contribution to humanity's project. How would I drop you a note, so I can let you see what you contributed towards? I have been working on this project for 5 years.
First question is how fast this has to run.
1800 files... if they're small + time is no concern, then any logic will do.
If the files are big + you must process them repeatedly + quickly, say all 1800 in a few seconds... or sub-second (< 1 second)... it's best to arrange the code as follows.
1) The first script is a master script that finds all the files, then passes them off to a processing script, managing the total number of running scripts, to create pseudo-threading using heavyweight processes.
2) The second script just processes a single file, for some result.
3) Your master script can then be passed the name of any processing script, so you can run various transforms on your data.
4) Attach a copy of a subset of your data + likely someone can provide comments about processing your data.
This will be much faster than attempting to reverse engineer your code to work out the original data format.
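The master/worker split in steps 1 to 3 could be sketched roughly as follows, in Python rather than Batch or PowerShell. For simplicity this sketch swaps in a thread pool instead of the separate heavyweight processes described above, and the worker is a plain function rather than a second script. `run_master`, `count_silence_tags` and the `max_workers` default are all hypothetical names and values chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def count_silence_tags(path):
    """Example worker: process a single file and return some result."""
    return Path(path).read_text(encoding="utf-8").count("<SILENCE")

def run_master(folder, worker, pattern="*.txt", max_workers=4):
    """Find all matching files and hand each one to `worker`.

    `max_workers` caps how many files are processed at the same time.
    Returns a dict mapping each file name to its worker result.
    """
    files = sorted(Path(folder).glob(pattern))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, files))
    return {f.name: r for f, r in zip(files, results)}
```

Because the worker is passed in as an argument, a different transform can be run over the same folder without touching the master, for example `run_master(r"C:\Temp", count_silence_tags)`. For 1800 files of about 1k each, a plain sequential loop would very likely be fast enough.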