Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

How to search a PDF File using VB.NET

Posted on 2011-02-21
6
Medium Priority
?
1,792 Views
Last Modified: 2012-05-11
Hello,

I have a table with about 5000 records, I need to find the page numbers of the PDFs when a value from the table is found in the PDF. Is there a way to loop through the table and copy the page numbers of the PDFs in a column in the same table? I am using VB.NET with ACCESS 2007.

Thanks,

Victor
0
Comment
Question by:vcharles
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
6 Comments
 
LVL 22

Expert Comment

by:plusone3055
ID: 34946251
0
 

Author Comment

by:vcharles
ID: 34946799
How do I modify the code below to loop through the 5000 records to achieve the same task. My project is in VB.NET, do you know a good program to convert the coide below to vb.net?

using Acrobat;
using AFORMAUTLib;                              
private void pdfRandD(string fPath)        
{ AcroPDDocClass objPages = new AcroPDDocClass();            
objPages.Open(fPath);            
long TotalPDFPages = objPages.GetNumPages();              
objPages.Close();        
AcroAVDocClass avDoc = new AcroAVDocClass();  
avDoc.Open(fPath, "Title");          
IAFormApp formApp = new AFormAppClass();            
IFields myFields = (IFields)formApp.Fields;                        
string searchWord = "Search String";            
string k = "";            
StreamWriter sw = new StreamWriter(@"D:\KCG_FileChecker_Inputs\MAC\pdf\0230_525490_23_cha17.txt", false);          
for (int p = 0; p < TotalPDFPages; p++)            
{int numWords = int.Parse(myFields.ExecuteThisJavascript("event.value=this.getPageNumWords(" + p + ");"));              
 k = "";                
for (int i = 0; i < numWords; i++) {string chkWord = myFields.ExecuteThisJavascript("event.value=this.getPageNthWord(" + p + "," + i + ", true);");                  
 k = k + " " + chkWord;}
 if(k.Trim().Contains(searchWord))                
{int pNum = int.Parse(myFields.ExecuteThisJavascript("event.value=this.getPageLabel(" + p + ",true);"));                    
sw.WriteLine("The Word " + searchWord + " is exists in " + pNum);            
 }            
}            
sw.Close();            
MessageBox.Show("Process completed");        
}


Thamnks,

Victor
0
 
LVL 23

Accepted Solution

by:
wdosanjos earned 2000 total points
ID: 34948772
Check the iTextSharp library (http://sourceforge.net/projects/itextsharp/) more specifically the PdfReader class.

You can do something like this (untested):

Dim reader as PdfReader, page As Integer, npages As Integer, content As String, buffer() As Byte

reader = New PdfReader("YourPDF.pdf")
npages = reader.NumberOfPages

For page = 1 To npages
     buffer = reader.GetPageContent(page)
     content = Encoding.UTF8.GetString(buffer, 0, buffer.Length);
     ' Search your content here
Next page

reader.Close()

Open in new window

0
 

Author Comment

by:vcharles
ID: 35093799
Thank You.
0
 
LVL 71

Expert Comment

by:Qlemo
ID: 36032357
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…

604 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question