PowerShell Scripting Assistance "Searchable PDF Code Fragment"

ITguy565
Experts,

I am looking for a "free" way to take a PDF document and save its contents to a variable in a "searchable" format.


So essentially I want to do the following:


Invoke-WebRequest -Uri "Path to PDF"

Store this information in a variable in a searchable format, and then run a query for some "text" or "string" within that PDF file.


What is the easiest way of doing this? Please provide a code snippet for it.
Commented:
This uses iTextSharp.dll (licensed under AGPL) from https://github.com/itext/itextsharp
Latest binary release is here: https://github.com/itext/itextsharp/releases/tag/5.5.11
No need to install anything: download itextsharp-all-5.5.11.zip, extract "itextsharp-dll-core.zip" from it, extract "itextsharp.dll" from that, drop it in a folder, and change the Add-Type path in the function's Begin block accordingly.
Like Get-Content, the function returns an array of strings, one per line of extracted text.
You can assign or filter the output as usual:
$found = ConvertFrom-Pdf -Path "Path to PDF file" | Where-Object {$_ -like '*Find me*'}
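
If you also want to know where in the text a match occurs, Select-String can be run over the same output. This is only a minimal sketch: the sample lines below are made-up stand-ins for what ConvertFrom-Pdf would return, so the snippet runs without a PDF or the DLL.

```powershell
# Stand-in for: $lines = ConvertFrom-Pdf -Path "Path to PDF file"
# (hypothetical sample text, not taken from a real document)
$lines = @(
	'Invoice number: 12345',
	'Customer: Contoso Ltd.',
	'Please find me in this document.',
	'Total due: 99.00'
)

# -SimpleMatch treats the pattern as literal text instead of a regular expression.
# Matching is case-insensitive unless -CaseSensitive is added.
$hits = $lines | Select-String -Pattern 'find me' -SimpleMatch

# For piped strings, LineNumber is the position of the string in the input,
# i.e. the line number within the extracted text.
$hits | ForEach-Object { '{0}: {1}' -f $_.LineNumber, $_.Line }
```

Compared to Where-Object, this returns MatchInfo objects rather than plain strings, which is handy when the position of the hit matters.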


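Function ConvertFrom-Pdf {
[CmdletBinding()]
Param(
	# ValueFromPipelineByPropertyName lets the FullName alias bind when
	# file objects from Get-ChildItem are piped in.
	[Parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
	[Alias('FullName')]
	[string]$Path,
	[switch]$Raw
)
	Begin {
		# Adjust this path to wherever you saved itextsharp.dll.
		Add-Type -Path 'C:\Temp\itextsharp.dll'
	}
	Process {
		Try {
			# Resolve relative paths against PowerShell's current location,
			# which can differ from the .NET working directory.
			If (-not [IO.Path]::IsPathRooted($Path)) {
				$Path = (Resolve-Path -Path $Path -ErrorAction Stop).Path
			}
			$pdfReader = New-Object -TypeName 'iTextSharp.text.pdf.PdfReader' -ArgumentList $Path -ErrorAction Stop
			For ($i = 1; $i -le $pdfReader.NumberOfPages; $i++) {
				Write-Verbose "Converting page $($i)"
				$pageText = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdfReader, $i)
				# Assumption: -Raw was declared but never used in the original;
				# here it emits each page as one unsplit string.
				If ($Raw) {
					$pageText
				} Else {
					$pageText -split "`r?`n"
				}
			}
		} Catch {
			$PSCmdlet.WriteError($_)
		} Finally {
			# Always release the file handle, even if extraction failed.
			If ($pdfReader) {
				Try {$pdfReader.Close()} Catch {}
				Remove-Variable -Name pdfReader -ErrorAction SilentlyContinue
			}
		}
	}
}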
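
Since the question starts from a URL rather than a local file, one option is to download the PDF to a temporary file first and hand that file to ConvertFrom-Pdf. This is a rough sketch, assuming the ConvertFrom-Pdf function above is already loaded in the session; the helper name Get-PdfTextFromUri and the example URL are made up for illustration.

```powershell
# Hypothetical helper: downloads a PDF and returns its text via ConvertFrom-Pdf.
Function Get-PdfTextFromUri {
	[CmdletBinding()]
	Param(
		[Parameter(Mandatory=$true)]
		[string]$Uri
	)
	# Unique temporary file to hold the download; removed again in Finally.
	$tempFile = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath ([Guid]::NewGuid().ToString() + '.pdf')
	Try {
		# -OutFile writes the response body to disk instead of returning it.
		# -UseBasicParsing avoids the Internet Explorer dependency in Windows PowerShell 5.1.
		Invoke-WebRequest -Uri $Uri -OutFile $tempFile -UseBasicParsing -ErrorAction Stop
		ConvertFrom-Pdf -Path $tempFile
	} Finally {
		If (Test-Path -LiteralPath $tempFile) {
			Remove-Item -LiteralPath $tempFile -ErrorAction SilentlyContinue
		}
	}
}

# Usage (placeholder URL):
# $text  = Get-PdfTextFromUri -Uri 'https://example.com/some.pdf'
# $found = $text | Where-Object { $_ -like '*Find me*' }
```

The extracted text ends up in a variable as an array of strings, so the same Where-Object filter shown earlier applies unchanged.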


Thanks oDbA!!
