Reading .PDF file

Posted on 2003-11-20
Last Modified: 2010-04-16
Hi friends,
I would like to read the content of a .PDF file and store it in a database (SQL Server 2000). What methods are available to achieve this?

Thanks,
Ramachandra

Question by:raama16
 
LVL 3

Expert Comment

by:barryfandango
ID: 9787417
raama16,

In principle, you can open the file with C# and put it into a blob, i.e. an "image" field in SQL Server. Generally this is not recommended, though, as moving entire files in and out can really slow down your SQL Server. It's often better to store just the filename and/or path and keep the file itself on the hard disk. (Just a suggestion, of course.)
 
LVL 3

Accepted Solution

by:barryfandango
barryfandango earned 68 total points
ID: 9787498
using System.Data;
using System.Data.SqlClient;
using System.IO;

// Placeholder path to the PDF to store
string ImageFile = @"C:\Docs\Sample.pdf";

// ReadAllBytes keeps reading until the whole file is in memory
// (a single FileStream.Read call is not guaranteed to do that)
byte[] MyPDF = File.ReadAllBytes(ImageFile);

// Placeholder: SqlConnection expects an ADO.NET connection string, not a DSN
string ConnectString = "Server=...;Database=...;Integrated Security=SSPI";

using (SqlConnection myCon = new SqlConnection(ConnectString))
using (SqlCommand myCmd = new SqlCommand("AddPDF", myCon))
{
    myCmd.CommandType = CommandType.StoredProcedure;
    // SqlDbType.Int (there is no SqlDbType.Int32) maps to an int column
    myCmd.Parameters.Add("@Id", SqlDbType.Int).Value = 1; // placeholder id
    myCmd.Parameters.Add("@Data", SqlDbType.Image).Value = MyPDF;

    myCon.Open();
    myCmd.ExecuteNonQuery();
} // disposing the connection closes it, even if an exception is thrown

This uses a stored procedure that would look something like this:

CREATE PROCEDURE dbo.AddPDF
(
      @Id int,
      @Data image
)
AS
INSERT INTO MyPDFTable
      ( Id, Data )
VALUES
      ( @Id, @Data )
 
LVL 9

Assisted Solution

by:malharone
malharone earned 66 total points
ID: 9789101
I think Rama actually means parsing the contents... I don't think that's easily possible, since PDFs can use many different encryption and encoding schemes. What I've done is let the user open the PDF file first, then press CTRL+A, CTRL+C in the reader to copy all the content. I then wrote a little program that does pattern recognition on the data using regex and a little bit of AI. I also let users interactively create their own patterns. The parsed contents are then stored in a DB/Excel file.
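As a rough illustration of the regex part of this approach (not malharone's actual program, and without the "AI" part), the C# sketch below applies user-supplied patterns with named groups to text copied out of the reader and collects the captured fields. The class name `PdfTextParser`, the sample text, and the field names (`InvoiceNo`, `Date`, `Total`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PdfTextParser
{
    // Applies each user-defined pattern to the copied text and stores every
    // named group captured by the first match into a field map.
    public static Dictionary<string, string> ExtractFields(string text, IEnumerable<string> patterns)
    {
        var fields = new Dictionary<string, string>();
        foreach (string pattern in patterns)
        {
            // Multiline lets ^ anchor at the start of each copied line
            var regex = new Regex(pattern, RegexOptions.Multiline);
            Match match = regex.Match(text);
            if (!match.Success) continue;

            foreach (string name in regex.GetGroupNames())
            {
                // Skip the numbered groups ("0", "1", ...) that .NET adds automatically
                if (int.TryParse(name, out _)) continue;
                if (match.Groups[name].Success)
                    fields[name] = match.Groups[name].Value.Trim();
            }
        }
        return fields;
    }

    public static void Main()
    {
        // Made-up sample of what Ctrl+A, Ctrl+C in the PDF reader might produce
        string copied = "Invoice No: 10452\r\nDate: 2003-11-20\r\nTotal: 1,250.00\r\n";

        // Hypothetical patterns; in practice users could define these themselves
        string[] patterns =
        {
            @"^Invoice No:\s*(?<InvoiceNo>\d+)",
            @"^Date:\s*(?<Date>\d{4}-\d{2}-\d{2})",
            @"^Total:\s*(?<Total>[\d,.]+)"
        };

        foreach (var pair in ExtractFields(copied, patterns))
            Console.WriteLine($"{pair.Key} = {pair.Value}");
    }
}
```

Because the patterns are plain strings, they could come from a UI where users build their own, as described above; the resulting dictionary could then be written to a table row or a spreadsheet.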
LVL 10

Expert Comment

by:ptmcomp
ID: 9800166
If you want to extract the text, buy a third-party tool or use Acrobat. (Implementing it yourself would take you months, believe me!)
About the performance of storing text in a database: we once zipped the text to make it faster, and it got slower, because zipping was slower than the database. It depends on your computer and network speed. Of course, local files are faster than a database accessed over the network, but a database gives you transaction and locking control.
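Whether compression pays off is easiest to settle by measuring it on your own hardware. A minimal C# sketch, assuming the text is compressed with GZip (`System.IO.Compression.GZipStream`) before being written to a binary column; the post does not say which zip method was actually used, and the class name `TextCompression` and the sample text are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class TextCompression
{
    // Compresses UTF-8 text with GZip before it is stored.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the GZip footer (and closes output)
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the MemoryStream has been closed
            return output.ToArray();
        }
    }

    // Reverses Compress when the text is read back from the database.
    public static string Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Made-up, very repetitive sample; real extracted text compresses less predictably
        var sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.Append("Sample extracted PDF text. ");
        string text = sb.ToString();

        var timer = Stopwatch.StartNew();
        byte[] packed = Compress(text);
        timer.Stop();

        Console.WriteLine($"Original: {Encoding.UTF8.GetByteCount(text)} bytes, " +
                          $"compressed: {packed.Length} bytes, " +
                          $"compression took {timer.Elapsed.TotalMilliseconds:F2} ms");
        Console.WriteLine(Decompress(packed) == text ? "Round trip OK" : "Round trip FAILED");
    }
}
```

Compare the measured compression time with the time saved by sending fewer bytes to the server; as noted above, the outcome depends on CPU and network speed.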
 
LVL 1

Author Comment

by:raama16
ID: 9829639
Hi Friends,
I am going to accept one of the above answers. Before that, is there any way to read a PDF file using the Crystal Reports .NET engine?

Thanks,
Ramachandra
 
LVL 10

Assisted Solution

by:ptmcomp
ptmcomp earned 66 total points
ID: 9830071
I don't think so, since reporting is the opposite of parsing.