Solved

Reading .PDF file

Posted on 2003-11-20
Last Modified: 2010-04-16
Hi friends,
I would like to read the content of a .PDF file and store it in a database (SQL Server 2000). What methods are available to achieve this?

Thanks,
Ramachandra

Question by:raama16
8 Comments
 
LVL 3

Expert Comment

by:barryfandango
ID: 9787417
raama16,

In principle you can open the file with C# and put it into a BLOB ("image") field in SQL Server. This is generally not recommended, though, because moving entire files in and out of the database can really slow down your SQL Server. It's often better to store just the filename and/or path in the database and keep the file itself on the hard disk. (Just a suggestion, of course.)
 
LVL 3

Accepted Solution

by:
barryfandango earned 68 total points
ID: 9787498
using System.IO;
using System.Data;
using System.Data.SqlClient;

// Read the whole PDF into a byte array. ImageFile holds the path to the PDF.
FileStream myFile = new FileStream(ImageFile, FileMode.Open, FileAccess.Read);
byte[] MyPDF = new byte[myFile.Length];
myFile.Read(MyPDF, 0, (int)myFile.Length);
myFile.Close();

string ConnectString = "MyDSNEtc";  // placeholder: use your own connection string
SqlConnection myCon = new SqlConnection(ConnectString);
myCon.Open();

// Call the AddPDF stored procedure with the row id and the file bytes.
SqlCommand myCmd = new SqlCommand("AddPDF", myCon);
myCmd.CommandType = CommandType.StoredProcedure;
myCmd.Parameters.Add(new SqlParameter("@Id", SqlDbType.Int));
myCmd.Parameters.Add(new SqlParameter("@Data", SqlDbType.Image));
myCmd.Parameters["@Id"].Value = 1;  // placeholder value: supply the key for this row
myCmd.Parameters["@Data"].Value = MyPDF;
myCmd.ExecuteNonQuery();
myCon.Close();

This uses a stored procedure that would look something like this:

CREATE PROCEDURE dbo.AddPDF
(
      @Id int,
      @Data image
)
AS
INSERT INTO MyPDFTable
      ( Id, Data )
VALUES
      ( @Id, @Data )
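
For completeness, here is a rough sketch of the reverse direction: reading the stored bytes back out and writing them to disk. It reuses the MyPDFTable, Id and Data names from the procedure above; the PdfStore class, the SavePdfToDisk method and the inline SELECT are illustrative assumptions rather than part of the original answer, and File.WriteAllBytes requires .NET 2.0 or later.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

// Hypothetical helper, not from the original answer.
public static class PdfStore
{
    // Returns true if a row with the given Id was found and written to outputPath.
    public static bool SavePdfToDisk(string connectionString, int id, string outputPath)
    {
        using (SqlConnection con = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT Data FROM MyPDFTable WHERE Id = @Id", con))
        {
            cmd.Parameters.Add("@Id", SqlDbType.Int).Value = id;
            con.Open();

            using (SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.SingleRow))
            {
                // No matching row, or the column is NULL: nothing to write.
                if (!reader.Read() || reader.IsDBNull(0))
                {
                    return false;
                }

                // An image column comes back from the reader as a byte array.
                byte[] pdf = (byte[])reader[0];
                File.WriteAllBytes(outputPath, pdf);
                return true;
            }
        }
    }
}
```

A call would look something like PdfStore.SavePdfToDisk(ConnectString, 1, @"C:\temp\out.pdf"), where both the id and the path are placeholders.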
 
LVL 9

Assisted Solution

by:malharone
malharone earned 66 total points
ID: 9789101
I think rama means actually parsing the contents. I don't think it's easily possible, since PDFs use many different encryption and encoding schemes. What I've done is let the user open the PDF file first and, in the reader, press CTRL+A, CTRL+C to copy all the content. I then wrote a little program that does pattern recognition on the copied data using regex and a little bit of AI. I also let users interactively create their own patterns, and the parsed contents are then stored in a DB/Excel file.
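
As a minimal sketch of the regex step described above (not the poster's actual program), the following assumes the copied text contains fields laid out one per line as "Label: value". The CopiedPdfText class, the ExtractFields method and the field layout are all hypothetical; real PDF text will usually need patterns tailored to the document.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls "Label: value" fields out of text copied from a PDF reader.
public static class CopiedPdfText
{
    // Assumed layout: one field per line, label made of letters and spaces,
    // then a colon, then the value. Handles both \n and \r\n line endings.
    private static readonly Regex FieldPattern = new Regex(
        @"^[ \t]*(?<label>[A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(?<value>[^\r\n]+?)[ \t]*\r?$",
        RegexOptions.Multiline);

    public static Dictionary<string, string> ExtractFields(string text)
    {
        Dictionary<string, string> fields = new Dictionary<string, string>();
        foreach (Match m in FieldPattern.Matches(text))
        {
            // A later occurrence of the same label overwrites an earlier one.
            fields[m.Groups["label"].Value] = m.Groups["value"].Value;
        }
        return fields;
    }
}
```

The resulting dictionary could then be written to a table with ordinary parameterized INSERT statements; lines that do not match the pattern are simply skipped.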
 
LVL 10

Expert Comment

by:ptmcomp
ID: 9800166
If you want to extract the text, buy a third-party tool or use Acrobat. (Implementing it yourself would take you months - believe me!)
About the performance of storing text in a database: we once zipped the text to make it faster, and it got slower because zipping was slower than the database. It depends on the computer and network speed you have. Of course, local files are faster than a database over the network, but in a database you have transaction and locking control.
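
Since the trade-off depends on your hardware, it is worth measuring before committing to compression. The sketch below is one way to do that, under the assumption that gzip (via System.IO.Compression.GZipStream, .NET 2.0 or later) stands in for the "zip" step mentioned above; the TextCompression class and its method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for checking whether compressing text before storage pays off.
public static class TextCompression
{
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the remaining compressed data
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Times a single compression run and reports the compressed size.
    public static TimeSpan TimeCompress(string text, out int compressedSize)
    {
        Stopwatch watch = Stopwatch.StartNew();
        byte[] compressed = Compress(text);
        watch.Stop();
        compressedSize = compressed.Length;
        return watch.Elapsed;
    }
}
```

Comparing the time returned by TimeCompress against the time saved on the database round trip for the smaller payload gives a rough idea of whether compression helps on a given machine and network.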
 
LVL 1

Author Comment

by:raama16
ID: 9829639
Hi Friends,
I am going to accept one of the above answers. Before that, is there any way to read a PDF file using the Crystal Reports .NET engine?

Thanks,
Ramachandra
 
LVL 10

Assisted Solution

by:ptmcomp
ptmcomp earned 66 total points
ID: 9830071
I don't think so, since reporting is the opposite of parsing.