Solved

How to search a PDF File using VB.NET

Posted on 2011-02-21
Last Modified: 2012-05-11
Hello,

I have a table with about 5000 records. I need to find the page numbers in the PDFs where a value from the table appears. Is there a way to loop through the table and copy those page numbers into a column in the same table? I am using VB.NET with Access 2007.

Thanks,

Victor
Question by:vcharles
6 Comments
 
LVL 22

Expert Comment

by:plusone3055
ID: 34946251
 

Author Comment

by:vcharles
ID: 34946799
How do I modify the code below to loop through the 5000 records and achieve the same task? My project is in VB.NET. Do you know a good program to convert the code below to VB.NET?

using System.IO;
using System.Windows.Forms;
using Acrobat;
using AFORMAUTLib;

private void pdfRandD(string fPath)
{
    // Count the pages in the document.
    AcroPDDocClass objPages = new AcroPDDocClass();
    objPages.Open(fPath);
    long TotalPDFPages = objPages.GetNumPages();
    objPages.Close();

    // Open the document in Acrobat so the form-field JavaScript bridge can query it.
    AcroAVDocClass avDoc = new AcroAVDocClass();
    avDoc.Open(fPath, "Title");
    IAFormApp formApp = new AFormAppClass();
    IFields myFields = (IFields)formApp.Fields;

    string searchWord = "Search String";
    string k = "";
    StreamWriter sw = new StreamWriter(@"D:\KCG_FileChecker_Inputs\MAC\pdf\0230_525490_23_cha17.txt", false);

    for (int p = 0; p < TotalPDFPages; p++)
    {
        int numWords = int.Parse(myFields.ExecuteThisJavascript("event.value=this.getPageNumWords(" + p + ");"));

        // Rebuild the page text word by word.
        k = "";
        for (int i = 0; i < numWords; i++)
        {
            string chkWord = myFields.ExecuteThisJavascript("event.value=this.getPageNthWord(" + p + "," + i + ", true);");
            k = k + " " + chkWord;
        }

        // Record the page label when the search string appears on this page.
        if (k.Trim().Contains(searchWord))
        {
            int pNum = int.Parse(myFields.ExecuteThisJavascript("event.value=this.getPageLabel(" + p + ",true);"));
            sw.WriteLine("The word " + searchWord + " exists on page " + pNum);
        }
    }

    sw.Close();
    MessageBox.Show("Process completed");
}


Thanks,

Victor
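
For the "loop through the 5000 records" part, a rough sketch in VB.NET using System.Data.OleDb might look like the following. Everything about the schema here is an assumption for illustration: a table named Records with columns ID, SearchValue and PageNumbers, a single PDF searched for every record, and the ACE OLE DB provider installed for Access 2007 files. The findPages argument is a placeholder for whatever routine returns the matching page numbers for a PDF path and a search value.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data.OleDb

Module TableLoop
    ' Hypothetical schema: Records(ID, SearchValue, PageNumbers).
    ' findPages(pdfPath, value) is assumed to return the matching page numbers.
    Public Sub FillPageNumbers(ByVal dbPath As String, ByVal pdfPath As String, _
                               ByVal findPages As Func(Of String, String, List(Of Integer)))
        Dim connectionString As String = _
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & dbPath

        Using connection As New OleDbConnection(connectionString)
            connection.Open()

            ' Read every row first so the data reader is closed before updating.
            Dim rows As New List(Of KeyValuePair(Of Integer, String))()
            Using selectCommand As New OleDbCommand( _
                "SELECT ID, SearchValue FROM Records", connection)
                Using dataReader As OleDbDataReader = selectCommand.ExecuteReader()
                    While dataReader.Read()
                        Dim id As Integer = Convert.ToInt32(dataReader(0))
                        ' Convert.ToString turns a NULL column into an empty string.
                        Dim value As String = Convert.ToString(dataReader(1))
                        rows.Add(New KeyValuePair(Of Integer, String)(id, value))
                    End While
                End Using
            End Using

            ' OleDb parameters are positional: pages first, then ID.
            Using updateCommand As New OleDbCommand( _
                "UPDATE Records SET PageNumbers = ? WHERE ID = ?", connection)
                updateCommand.Parameters.Add("@pages", OleDbType.VarWChar, 255)
                updateCommand.Parameters.Add("@id", OleDbType.Integer)

                For Each row As KeyValuePair(Of Integer, String) In rows
                    ' Skip empty search values; they would match every page.
                    If String.IsNullOrEmpty(row.Value) Then Continue For

                    Dim pages As List(Of Integer) = findPages(pdfPath, row.Value)
                    Dim pageText As List(Of String) = _
                        pages.ConvertAll(Function(p) p.ToString())

                    ' Store the hits as a comma-separated list, e.g. "3,17,42".
                    updateCommand.Parameters(0).Value = String.Join(",", pageText.ToArray())
                    updateCommand.Parameters(1).Value = row.Key
                    updateCommand.ExecuteNonQuery()
                Next
            End Using
        End Using
    End Sub
End Module
```

Any function with a matching signature can be passed in with AddressOf. With 5000 records, re-reading the PDF for every row is likely to be slow, so it may be worth extracting each page's text once and searching the cached strings instead.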
 
LVL 23

Accepted Solution

by:
wdosanjos earned 500 total points
ID: 34948772
Check the iTextSharp library (http://sourceforge.net/projects/itextsharp/), specifically the PdfReader class.

You can do something like this (untested):

Imports System.Text
Imports iTextSharp.text.pdf

Dim reader As PdfReader, page As Integer, npages As Integer, content As String, buffer() As Byte

reader = New PdfReader("YourPDF.pdf")
npages = reader.NumberOfPages

For page = 1 To npages
     ' GetPageContent returns the page's raw content stream (PDF operators included)
     buffer = reader.GetPageContent(page)
     content = Encoding.UTF8.GetString(buffer, 0, buffer.Length)
     ' Search your content here
Next page

reader.Close()
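
One caveat with searching the bytes from GetPageContent: they are the page's content stream, so the visible text is mixed with drawing operators and may be split or encoded in ways a plain substring search will miss. As a possible alternative, iTextSharp 5.x ships a PdfTextExtractor class in the iTextSharp.text.pdf.parser namespace. The sketch below (also untested against your files) uses it to collect the page numbers whose extracted text contains a value. The module and function names PdfSearch and FindPages are made up for illustration, and the case-insensitive comparison is an assumption about what you want.

```vbnet
Imports System
Imports System.Collections.Generic
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

Module PdfSearch
    ' Returns the 1-based page numbers whose extracted text contains searchValue.
    Public Function FindPages(ByVal pdfPath As String, _
                              ByVal searchValue As String) As List(Of Integer)
        Dim pages As New List(Of Integer)()
        Dim reader As New PdfReader(pdfPath)
        Try
            For pageNumber As Integer = 1 To reader.NumberOfPages
                ' Extract the readable text of this page rather than its raw stream.
                Dim pageText As String = _
                    PdfTextExtractor.GetTextFromPage(reader, pageNumber)

                ' Case-insensitive match (an assumption; use Ordinal for exact case).
                If pageText.IndexOf(searchValue, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    pages.Add(pageNumber)
                End If
            Next
        Finally
            ' Release the file even if extraction throws.
            reader.Close()
        End Try
        Return pages
    End Function
End Module
```

Text extraction only works for PDFs that contain real text; scanned pages stored as images will come back empty, and values that wrap across lines may not match as a single string.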

 

Author Comment

by:vcharles
ID: 35093799
Thank You.
 
LVL 70

Expert Comment

by:Qlemo
ID: 36032357
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.