Solved

How to search a PDF File using VB.NET

Posted on 2011-02-21
Last Modified: 2012-05-11
Hello,

I have a table with about 5,000 records. For each value in the table, I need to find the page numbers of the PDFs in which that value appears. Is there a way to loop through the table and store the matching page numbers in a column of the same table? I am using VB.NET with Access 2007.

Thanks,

Victor
Question by:vcharles
 
LVL 22

Expert Comment

by:plusone3055
ID: 34946251
0
 

Author Comment

by:vcharles
ID: 34946799
How do I modify the code below to loop through the 5,000 records and achieve the same task? My project is in VB.NET; do you know a good program to convert the code below to VB.NET?

using Acrobat;
using AFORMAUTLib;

private void pdfRandD(string fPath)
{
    // Read the total page count, then close the PD document
    AcroPDDocClass objPages = new AcroPDDocClass();
    objPages.Open(fPath);
    long TotalPDFPages = objPages.GetNumPages();
    objPages.Close();

    // Open the file in Acrobat and get the forms object used to run JavaScript
    AcroAVDocClass avDoc = new AcroAVDocClass();
    avDoc.Open(fPath, "Title");
    IAFormApp formApp = new AFormAppClass();
    IFields myFields = (IFields)formApp.Fields;

    string searchWord = "Search String";
    string k = "";
    StreamWriter sw = new StreamWriter(@"D:\KCG_FileChecker_Inputs\MAC\pdf\0230_525490_23_cha17.txt", false);

    for (int p = 0; p < TotalPDFPages; p++)
    {
        int numWords = int.Parse(myFields.ExecuteThisJavascript("event.value=this.getPageNumWords(" + p + ");"));

        // Rebuild the text of page p word by word
        k = "";
        for (int i = 0; i < numWords; i++)
        {
            string chkWord = myFields.ExecuteThisJavascript("event.value=this.getPageNthWord(" + p + "," + i + ", true);");
            k = k + " " + chkWord;
        }

        // Log the page label when the page text contains the search word
        if (k.Trim().Contains(searchWord))
        {
            int pNum = int.Parse(myFields.ExecuteThisJavascript("event.value=this.getPageLabel(" + p + ",true);"));
            sw.WriteLine("The Word " + searchWord + " is exists in " + pNum);
        }
    }

    sw.Close();
    MessageBox.Show("Process completed");
}


Thanks,

Victor
0
 
LVL 23

Accepted Solution

by:
wdosanjos earned 500 total points
ID: 34948772
Check the iTextSharp library (http://sourceforge.net/projects/itextsharp/), more specifically the PdfReader class.

You can do something like this (untested):

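' Requires "Imports iTextSharp.text.pdf" and "Imports System.Text" at the top of the file
Dim reader As PdfReader, page As Integer, npages As Integer, content As String, buffer() As Byte

reader = New PdfReader("YourPDF.pdf")
npages = reader.NumberOfPages

For page = 1 To npages
     ' Note: this is the page's raw content stream, so text may be mixed with PDF drawing operators
     buffer = reader.GetPageContent(page)
     content = Encoding.UTF8.GetString(buffer, 0, buffer.Length)
     ' Search your content here
Next page

reader.Close()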
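
For the "search your content here" step and the loop over the table values, something along these lines might work. This is only a sketch and not tested against a real PDF or database: FindPages, pageTexts, and the sample strings are made-up names and data for illustration, and it assumes you have already collected one string of text per page from the reader loop above.

```vb
Imports System
Imports System.Collections.Generic

Module PageSearchSketch

    ' Returns the 1-based page numbers whose text contains the search value.
    ' Assumption: pageTexts(0) holds the text of page 1, pageTexts(1) of page 2, and so on.
    Function FindPages(pageTexts As IList(Of String), value As String) As List(Of Integer)
        Dim hits As New List(Of Integer)
        If String.IsNullOrEmpty(value) Then Return hits

        For i As Integer = 0 To pageTexts.Count - 1
            ' Case-insensitive match; switch to StringComparison.Ordinal for an exact-case match
            If pageTexts(i).IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0 Then
                hits.Add(i + 1)
            End If
        Next

        Return hits
    End Function

    Sub Main()
        ' Stand-in data: in the real program these strings would come from the PdfReader loop
        Dim pageTexts As New List(Of String) From {
            "Invoice A100 shipped",
            "Nothing here",
            "Order a100 and B200"
        }

        ' Stand-in for the values read from the Access table
        Dim values As String() = {"A100", "B200", "C300"}

        For Each v As String In values
            Dim pages As List(Of Integer) = FindPages(pageTexts, v)
            ' Comma-separated page list, ready to be stored in a text column
            Console.WriteLine(v & ": " & String.Join(",", pages))
        Next
    End Sub

End Module
```

In the real project you would extract the page texts once per PDF, loop over the rows of the Access table instead of the sample array, and write the joined page list back to the row with an UPDATE. Reading the PDF once and reusing the page texts for all 5,000 values should be much cheaper than reopening the file for every record.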

 

Author Comment

by:vcharles
ID: 35093799
Thank You.
0
 
LVL 70

Expert Comment

by:Qlemo
ID: 36032357
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
0
