Christopher Minor asked:

Project for humanity, have 1800 text files to search for numbers, do some math, print results to text file

I have 1800 scripts like the one below to search, which I have created for humanity, so you can see the need to be able to do this in bulk. I have provided some code.

SCRIPT
The world (SILENCE TAG="500") as we have created it (SILENCE TAG="500") is a process (SILENCE TAG"250") of our thinking (SILENCE MSEC="1000") It cannot be changed (SILENCE TAG="500") without changing our thinking (SILENCE TAG="1000") ~ Albert Einstein

I need a batch file that can go through all of the files in a folder, search the text files for the numbers inside tags like (SILENCE TAG"500"), divide each number by 2, store the result in a variable like sum1, and copy the variable into a text file. It then searches for the next number, divides it by 2, stores the result in a variable like sum2, adds sum1+sum2 together, and copies that result into the text file. It does this until it finds 1000; then the saved variable is added to the 1000.

PRINT TO DOC = PTD
The world (500/2=250): .250 PTD, 250 remains
as we have created it (500/2=250): 250 + 250 remaining = .500 PTD, 250 remains
is a process (250/2=125): 125 + 250 remaining = .375 PTD, 125 remains
of our thinking (1000): 1000 + 125 remaining = 1.125 PTD, 0 remains
It cannot be changed (500/2=250): 250 + 0 remaining = .250 PTD, 250 remains
without changing our thinking (1000): 1000 + 250 remaining = 1.250 PTD
END OF DOC

OUTPUT TO TEXT DOC
1st L-01-TV-01-Clip-01-silence
2nd .250
3rd .500
4th .375
5th 1.125+1-second-fadeout
6th .250
7th 1.250+1-second-fadeout
END OF DOC

Here is what I can contribute

@ECHO OFF
SETLOCAL ENABLEEXTENSIONS
SETLOCAL ENABLEDELAYEDEXPANSION


REM Do up to 30 numbers found


for /L %%i in (1,1,30) do (
for /f "tokens=3 delims=. " %%A in (
)
:START

'findstr /rc:"At revision [0-9][0-9]*."'
do echo %%A


IF /I "%%i" EQU "1" GOTO first(


) ELSE (
IF /I "%%A" EQU "250" GOTO small(


) ELSE (
IF /I "%%A" EQU "500" GOTO medium (


) ELSE (
IF /I "%%A" EQU "1000" GOTO large (

)

REM Copy short filename, add -silence, PTD


:first
set /a num1=%%A
set /a sum1=num1/2
set /a remains=sum1
set /a numout1=sum1
echo %numout1% >C:\Labels\L-01\filename-silence.txt
GOTO START


:small
set /a num2=%%A
set /a sum2=num2/2
set /a numout2=sum2+remains
echo %numout2% >C:\Labels\L-01\filename-silence.txt
GOTO START


:medium
set /a num3=%%A
set /a sum3=num3/2
set /a numout3=sum3+remains
echo %numout3% >C:\Labels\L-01\filename-silence.txt
GOTO START


:large
set /a sum4=%%A
set /a numout4=sum4+remains
echo %numout4%+1-second-fadeout >C:\Labels\L-01\filename-silence.txt
GOTO START


ENDLOCAL
ENDLOCAL
)



David Favor

Aside: About your processing approach.

First question is how fast this has to run.

1800 files... if they're small + time is no concern, then any logic will do.

If files are big + you must process them repeatedly + quickly, say all 1800 in a few seconds... or sub second (< 1 second)... best to arrange code as follows.

1) First script as a master script to find all files, then pass them off to a processing script, managing total number of scripts, to create pseudo threading using heavy weight processes.

2) Second script will just process a single file, for some result.

3) Your master script can then be passed any name for a processing script, so you can run various transforms on your data.

4) Attach a copy of a subset of your data + likely someone can provide comments about processing your data.

This will be much faster than attempting to reverse engineer your code, to attempt coming up with original data format.
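
The master/worker split described in points 1 to 3 might look roughly like the sketch below. It is written in Python rather than Batch or PowerShell, and it is only an illustration: `run_master`, `worker_cmd` and `max_workers` are hypothetical names, and the worker is assumed to be any command that takes one file path as its last argument and prints its result.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_master(folder, worker_cmd, pattern="*.txt", max_workers=4):
    """Run worker_cmd once per matching file, at most max_workers at a time.

    worker_cmd is a hypothetical command prefix (a list of strings); the file
    path is appended as its last argument. Returns {file name: worker stdout}.
    """
    files = sorted(Path(folder).glob(pattern))

    def run_one(path):
        # Each file is handled by its own operating-system process.
        done = subprocess.run(
            worker_cmd + [str(path)], capture_output=True, text=True, check=True
        )
        return path.name, done.stdout.strip()

    # The thread pool only caps how many worker processes run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, files))
```

Because the worker is just a command line, the same master can be pointed at a different processing script without being changed, which is the point of item 3.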

Bill Prew

First question:

You indicated this is a sample input file:

The world (SILENCE TAG="500") as we have created it (SILENCE TAG="500") is a process (SILENCE TAG"250") of our thinking (SILENCE MSEC="1000") It cannot be changed (SILENCE TAG="500") without changing our thinking (SILENCE TAG="1000") ~ Albert Einstein

Questions:
  1. Are the numeric values always either 250, 500 or 1000, or can there be other values?
  2. I'm seeing several different formats of the info inside the parens (see below), are these true variations, or do they all fit a single template, and if so what is that?
    • (SILENCE TAG="500")
    • (SILENCE TAG"250")
    • (SILENCE MSEC="1000")


»bp

Christopher Minor
ASKER

David Favor: the files are small and time is no concern, so any logic will do. The files are only 1k each; the largest has about 25 tags to be processed.
I have never written any code, but I see it as a loop that checks for EOF, allowing the next file to be processed. It searches the text for a number, the number is compared in the If/Else, then passed to :first, :small, :medium or :large and processed, and then control returns to :START. The files are sequenced by number: L-01-TV 01-Clip-01 through the last number, then L-01-TV 02-Clip-01 through the last number, etc., up to L-13 levels.

How the files are actually formatted:
The world (<SILENCE MSEC ="500"/>) as we have created it (<SILENCE MSEC ="500"/>) is a process (<SILENCE MSEC ="250"/>) of our thinking (<SILENCE MSEC="1000"/>) It cannot be changed (<SILENCE MSEC ="500"/>) without changing our thinking (<SILENCE MSEC ="1000"/>) ~ Albert Einstein 
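
As a rough illustration of how the numbers can be pulled out of this format, here is a minimal Python sketch (not the Batch file asked for). It assumes every tag looks like the ones in the line above, with an optional space before the equals sign; `SILENCE_TAG` and `silence_values` are names made up for the example.

```python
import re

# Tolerates the optional space before "=" seen in the sample line.
SILENCE_TAG = re.compile(r'<SILENCE\s+MSEC\s*=\s*"(\d+)"')


def silence_values(text):
    """Return the MSEC numbers of every SILENCE tag, in document order."""
    return [int(number) for number in SILENCE_TAG.findall(text)]


sample = (
    'The world (<SILENCE MSEC ="500"/>) as we have created it '
    '(<SILENCE MSEC ="500"/>) is a process (<SILENCE MSEC ="250"/>) '
    'of our thinking (<SILENCE MSEC="1000"/>)'
)
print(silence_values(sample))  # [500, 500, 250, 1000]
```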

Bill Prew,
I'm sorry about the tags not being the same. That happened because I first posted this at Super User, which uses formatting tags, and the system deleted all of the (<SILENCE MSEC ="500"/>) tags, so I retyped them and missed some. I have corrected them above and below.
Answers:
  1. The numeric values are always either 250, 500 or 1000
  2. That was why I made the :first :small :medium :large processes.
  3. The files are 1k each the largest file has 28 TAGS

Bill Prew

Can you provide a sample of a couple of the actual input files (at least one of the larger ones) for testing?

»bp

Christopher Minor

I would be glad to. Thank you for asking.
L-01-TV-11-Clip-01.txt
L-08-TV-05-Clip-13.txt
Some indicators I use did not get deleted in L-08-TV-05-Clip-13.txt, so I uploaded it again: L-08-TV-05-Clip-13.txt

oBdA

I'm still not sure I completely understand what you're trying to do, but this seems to create what you're after.
It's PowerShell at its core, but wrapped in Batch, so save it as Whatever.cmd.
Output files will be saved right next to the input files, with -silence added to the name.
@PowerShell.exe -Command "Invoke-Expression -Command ((Get-Content -Path '%~f0' | Select-Object -Skip 2) -join [environment]::NewLine)"
@exit /b %Errorlevel%

$SourceDir = 'C:\Temp'
$Filter = '*.txt'
$Recurse = $false
$MaxMatches = 30

$dtProvider = New-Object -TypeName System.Globalization.CultureInfo -ArgumentList 'en-US'
Get-ChildItem -Path $SourceDir -Filter $Filter -File -Recurse:$Recurse | Where-Object {$_.BaseName -notmatch '-silence$'} | ForEach-Object {
	Write-Host "Processing $($_.Name)"
	$inFile = $_.FullName
	$outFile = "$($_.DirectoryName)\$($_.BaseName)-silence$($_.Extension)"
	$content = Get-Content -LiteralPath $_.FullName -Raw
	$results = [regex]::Matches($content, '\<SILENCE\s+MSEC\s*=\s*"(?<Silence>\d+)"')
	If ($results.Count -eq 0) {
		Write-Warning "Found no 'SILENCE' tags in '$($_.FullName)'!"
	} Else {
		$oldRemains = $newRemains = 0
		$i = 0
		$(ForEach ($result in $results) {
			$i += 1
			$silence = [int]$result.Groups['Silence'].Value
			If ($silence -eq 1000) {
				$newRemains = $silence
			} Else {
				$newRemains = [int]($silence / 2)
			}
			$out = (($oldRemains + $newRemains) / 1000).ToString('N3', $dtProvider)
			Write-Host "    Tag $($i.ToString().PadLeft(2)): $($silence.ToString().PadLeft(5)); out: $($out)"
			$out | Write-Output 
			If ($silence -eq 1000) {
				$oldRemains = 0
			} Else {
				$oldRemains = $newRemains
			}
			If ($i -ge $MaxMatches) {
				If ($i -lt $results.Count) {
					Write-Warning "Stopped processing '$($inFile)' after $($MaxMatches) of $($results.Count) tags!"
				}
				Break
			}
		}) | Set-Content -Path $outFile
	}
}


Christopher Minor

I want you to know how important this batch file is to what I am creating for humanity. It will allow me to automate something that was prone to mistakes. Thank you so much.
oBdA,
I don't see how 250 and 500 are being processed. Could you add some comments? It would help me understand the code. Thank you so much.

oBdA

It should be pretty straightforward. Based on your description and your samples, the only "silence" value that differs in handling is 1000, so the script handles every "silence" value except 1000 the same way: it's divided by two, and half of the previous value is added. If the value is 1000, it isn't divided, and it won't be added in the next iteration.
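
The rule just described can be checked in isolation with a few lines of Python. This is a minimal sketch of the arithmetic only, not the PowerShell script itself; `display_times_ms` is a hypothetical name, and integer division is assumed to be acceptable because the values are always 250, 500 or 1000.

```python
def display_times_ms(silences):
    """Apply the carry-over rule to a list of silence values in milliseconds."""
    carried = 0  # half of the previous value, still to be added
    totals = []
    for silence in silences:
        if silence == 1000:
            # 1000 is used whole and nothing is carried into the next tag.
            totals.append(carried + silence)
            carried = 0
        else:
            half = silence // 2
            totals.append(carried + half)
            carried = half
    return totals


print(display_times_ms([500, 500, 250, 1000, 500, 1000]))
# [250, 500, 375, 1125, 250, 1250]
```

Divided by 1000, these are the .250, .500, .375, 1.125, .250 and 1.250 values from the Einstein sample in the question.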

Christopher Minor

This is to give me the length of time to display each image in the video. Here is an example of the output file:

OUTPUT TO TEXT DOC
(1st line)   L-01-TV-01-Clip-01-silence
(2nd line)   .250
(3rd line)   .500
(4th line)   .375
(5th line)   1.125 +1-second-fadeout
(6th line)   .250
(7th line)   1.250 +1-second-fadeout
END OF DOC

The (+1-second-fadeout) added to the amount is for adding a 1 second fadeout, which I will use in Notepad++ with find and replace to construct the file for making the video using FFmpeg at the command line. Otherwise it adds a lot more work.
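
One way the requested layout could be produced is sketched below in Python. It is an assumption-laden illustration, not part of the posted script: `silence_lines` is a hypothetical helper, the suffix is written without a space before the plus sign (the two listings in this thread differ on that), and the leading zero is dropped to match the ".250" style of the sample.

```python
def silence_lines(clip_name, silences, totals_ms):
    """Build the lines of the -silence file for one clip.

    silences are the raw tag values; totals_ms are the matching display
    times in milliseconds (both lists are assumed to have the same length).
    """
    lines = [f"{clip_name}-silence"]
    for silence, total in zip(silences, totals_ms):
        # 375 -> "0.375" -> ".375", matching the sample output above.
        text = f"{total / 1000:.3f}".lstrip("0")
        if silence == 1000:
            text += "+1-second-fadeout"
        lines.append(text)
    return lines
```

For the Einstein sample, `silence_lines("L-01-TV-01-Clip-01", [500, 500, 250, 1000, 500, 1000], [250, 500, 375, 1125, 250, 1250])` yields the seven lines of the example file, with the fadeout suffix on the two lines that come from a 1000 tag.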
I don't see how the next file in the folder is being called up, because there are 1800 of these files. Could you explain that for me? I understood the other code.
Get-ChildItem
This is what gets the file. Does the code return here after it finds no more silence tags? Thanks for explaining the other code.
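
For comparison, the same "list the files once, then handle them one after another" shape can be sketched in Python. In the posted script, Get-ChildItem produces the whole list of files and ForEach-Object runs its block once per file, so nothing has to jump back to the top. The names `input_files`, `process_all` and `handle` below are hypothetical and only illustrate that shape.

```python
from pathlib import Path


def input_files(folder, pattern="*.txt"):
    """Return the files to process, one per loop iteration, in name order."""
    return sorted(
        path
        for path in Path(folder).glob(pattern)
        # Skip earlier results so a second run does not process its own output.
        if path.is_file() and not path.stem.endswith("-silence")
    )


def process_all(folder, handle):
    """Call handle(path) once per input file; handle is a hypothetical callback."""
    for path in input_files(folder):
        handle(path)
```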

ASKER CERTIFIED SOLUTION
oBdA

This solution is only available to members.

Christopher Minor

I see it, thanks so much. You have made a great contribution to humanity's project. How would I drop you a note, so I can let you see what you contributed towards? I have been working on this project for 5 years.