Solved

Download pdf file ASP.NET

Posted on 2012-03-28
Last Modified: 2012-06-21
Okay, I am working in VB.NET using VS 2010, on an ASP.NET 4.0 website. I am looking to create a service that downloads a publicly available PDF file from another website and then converts it into text for display. I am currently using HttpWebRequest and HttpWebResponse for my downloads, and that has worked well, but there is a problem with the file it brings down.

The test file is 17 KB, but when I download it, it shows a size of 21 KB. When I attempt to open the downloaded PDF file, I get a warning saying that the file could not be opened because it is either not a supported file type or because the file has been damaged. I know the test file is good, but I suspect that somewhere along the line the header is getting bloated with a couple of KB worth of junk.

Below is the code I am using to download and write.

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"
Dim wr As HttpWebRequest = CType(WebRequest.Create("http://www.somesite.com/test.pdf"), HttpWebRequest)
Dim ws As HttpWebResponse = CType(wr.GetResponse(), HttpWebResponse)

Dim memStream As MemoryStream = New MemoryStream

Dim length As Integer = 1024
Dim buffer As [Byte]() = New [Byte](length - 1) {}
Dim bytesRead As Integer = ws.GetResponseStream.Read(buffer, 0, length - 1)

' write the required bytes
While bytesRead > 0
    memStream.Write(buffer, 0, bytesRead)
    bytesRead = ws.GetResponseStream.Read(buffer, 0, length)
End While

Using fstr As FileStream = New FileStream(fName, FileMode.CreateNew, FileAccess.ReadWrite)
    memStream.WriteTo(fstr)
    fstr.Close()
End Using

parsePDF(fName)

'Delete the PDF - Currently disabled for testing
'System.IO.File.Delete(fName)


Help me Obi Wan Kenobi... I mean help me EE, I am lost and cannot find the answer on my own. I suspect it comes from the improper handling of the stream, but I can't figure it out.
Question by:Thomas_Hawkins
8 Comments
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 37780190
Have you considered using the WebClient class? I think it would make the task a bit simpler.

e.g.

...

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"

Using client As New System.Net.WebClient()
    client.DownloadFile("http://www.somesite.com/test.pdf", fName)
End Using

parsePDF(fName)

...

 

Author Comment

by:Thomas_Hawkins
ID: 37780308
Kaufmed, I tried that solution just after you suggested it, with the same results. The resulting PDF is 21 KB and unreadable. Here is the code:

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"
Using client As New System.Net.WebClient()
    client.DownloadFile("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=1&BorP=P&TID=AQU&CTRY=USA&DT=02/09/2011&DAY=D&STYLE=EQB.pdf", fName)
End Using

parsePDF(fName)

 
LVL 16

Expert Comment

by:Stephan
ID: 37780484
If the solution from kaufmed (download the file and save it) is not working, maybe the file that equibase pushes is being sent incorrectly. Try downloading a real PDF file like this:
http://archive.cs.uu.nl/mirror/CTAN/graphics/metapost/contrib/macros/automata/example.pdf

If that doesn't work, something else is wrong (maybe the parsePDF method?)
 
LVL 20

Accepted Solution

by:
BuggyCoder earned 500 total points
ID: 37780502
Try this:-

Dim request = WebRequest.Create("<your path>")
Dim response = TryCast(request.GetResponse(), HttpWebResponse)

If response IsNot Nothing Then
    ' Note: ContentLength is -1 when the server sends no Content-Length header
    Dim sReader = New BinaryReader(response.GetResponseStream())
    Dim bytes = sReader.ReadBytes(CInt(response.ContentLength))

    ' Write the downloaded bytes to disk
    Dim fs = New FileStream("c:/test.pdf", FileMode.CreateNew)
    fs.Write(bytes, 0, bytes.Length)
    fs.Close()
End If

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 37781148
I just tried the code in a new project, and it downloads the file correctly for me: 17 KB. If you put a breakpoint on the call to parsePDF, and then go to the download folder and view the file, is the size correct? Can you view the PDF in Adobe Viewer before parsePDF works with the file?
 
LVL 9

Expert Comment

by:darjimaulik
ID: 37781588
How are you trying to open the PDF file? Using code?
If so, you need a third-party tool to read the PDF. .NET does not have any built-in functionality for reading PDFs.

One of the best free ones is iTextSharp:
http://itextpdf.com/
 

Author Comment

by:Thomas_Hawkins
ID: 37783210
kaufmed, I've had the solution work for me as well, but for some reason sometimes (most times) it does not. When I've used stream and binary readers, the content length shows as -1. And yes, I've set a breakpoint right at the call to parsePDF(), and I am attempting to open the file in Adobe Reader 9.

StephanOnline, that PDF downloaded perfectly, 89 KB as intended. It is true that my file is a .cfm page rendered as a PDF, but I have successfully downloaded it before. Does anyone know of a way to do this properly, or am I going to have to scrape the page?

BuggyCoder, I've used your code successfully on numerous PDF files now; sadly, it gives me a ContentLength of -1 on my intended files. I suppose the fault is in the file I've chosen, a .cfm file presented as a PDF.

darjimaulik, I have used both SautinSoft's PDF Focus and PDFBox; I've not dabbled in iTextSharp at all.
 

Author Closing Comment

by:Thomas_Hawkins
ID: 37831009
I would've given an A, but everyone dropped out on me. This solution was almost perfect; however, it did not fix my issue.

What did solve it was sending a content type along with the request (request.ContentType = "application/pdf") and then reading the Content-Length header (response.GetResponseHeader("Content-Length")) to use with the FileStream object fs.
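
For anyone who runs into the same -1: a ContentLength of -1 means the server sent no Content-Length header, so ReadBytes(CInt(response.ContentLength)) cannot be used as-is. Below is a minimal sketch of one alternative, not the fix described above: read the response stream to its end and check that the data starts with the "%PDF-" signature before saving it. It assumes .NET 4.0 or later (for Stream.CopyTo); the names PdfDownloadSketch, LooksLikePdf, ReadAllBytes and DownloadPdf are made up for illustration.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

' Hypothetical helper module; none of these names come from the thread.
Module PdfDownloadSketch

    ' Returns True when the data begins with the "%PDF-" signature
    ' that a well-formed PDF file starts with.
    Function LooksLikePdf(ByVal data As Byte()) As Boolean
        If data Is Nothing OrElse data.Length < 5 Then Return False
        Return Encoding.ASCII.GetString(data, 0, 5) = "%PDF-"
    End Function

    ' Reads a stream to its end without relying on Content-Length.
    Function ReadAllBytes(ByVal source As Stream) As Byte()
        Using buffer As New MemoryStream()
            source.CopyTo(buffer) ' Stream.CopyTo requires .NET 4.0 or later
            Return buffer.ToArray()
        End Using
    End Function

    ' Downloads url and writes it to targetPath only if it looks like a PDF.
    Sub DownloadPdf(ByVal url As String, ByVal targetPath As String)
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Using body As Stream = response.GetResponseStream()
                Dim data As Byte() = ReadAllBytes(body)
                If Not LooksLikePdf(data) Then
                    ' The server may have returned an HTML page (login, error) instead.
                    Throw New InvalidDataException("Response does not start with %PDF-.")
                End If
                File.WriteAllBytes(targetPath, data)
            End Using
        End Using
    End Sub

End Module
```

Usage would be something like DownloadPdf("http://www.somesite.com/test.pdf", fName) followed by parsePDF(fName). If the signature check fails, opening the saved bytes in a text editor should show what the server actually sent, which may explain a size mismatch like 21 KB versus 17 KB.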