Solved

Download pdf file ASP.NET

Posted on 2012-03-28
Last Modified: 2012-06-21
Okay, I am working in VB.NET using VS 2010 on an ASP.NET 4.0 website. I am looking to create a service that downloads a publicly available PDF file from another website and then converts it to text for display. I am currently using HttpWebRequest and HttpWebResponse for my downloads, and that has worked well, but there is a problem with the file it brings down.

The test file is 17 KB, but when I download it, it shows a size of 21 KB. When I attempt to open the downloaded PDF, I get a warning saying that the file could not be opened because it is either not a supported file type or because it has been damaged. I know the test file is good, but I suspect that somewhere along the line the header is getting bloated with a couple of KB of junk.

Below is the code I am using to download and write.

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"
Dim wr As HttpWebRequest = CType(WebRequest.Create("http://www.somesite.com/test.pdf"), HttpWebRequest)
Dim ws As HttpWebResponse = CType(wr.GetResponse(), HttpWebResponse)
Dim respStream As Stream = ws.GetResponseStream()

Dim memStream As MemoryStream = New MemoryStream

Dim length As Integer = 1024
Dim buffer As [Byte]() = New [Byte](length - 1) {}
Dim bytesRead As Integer = respStream.Read(buffer, 0, length)

' Copy the response into memory one buffer at a time
While bytesRead > 0
    memStream.Write(buffer, 0, bytesRead)
    bytesRead = respStream.Read(buffer, 0, length)
End While
ws.Close()

' Write the buffered bytes to disk; End Using closes the file
Using fstr As FileStream = New FileStream(fName, FileMode.CreateNew, FileAccess.ReadWrite)
    memStream.WriteTo(fstr)
End Using

parsePDF(fName)

'Delete the PDF - Currently disabled for testing
'System.IO.File.Delete(fName)

Help me Obi Wan Kenobi... I mean help me EE, I am lost and cannot find the answer on my own. I suspect it comes from the improper handling of the stream, but I can't figure it out.
Question by:Thomas_Hawkins
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37780190
Have you considered using the WebClient class? I think it would make the task a bit simpler.

e.g.

...

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"

Using client As New System.Net.WebClient()
    client.DownloadFile("http://www.somesite.com/test.pdf", fName)
End Using

parsePDF(fName)

...

 

Author Comment

by:Thomas_Hawkins
ID: 37780308
Kaufmed, I tried that solution just after you suggested it, with the same result. The resulting PDF is 21 KB and unreadable. Here is the code:

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"
Using client As New System.Net.WebClient()
    client.DownloadFile("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=1&BorP=P&TID=AQU&CTRY=USA&DT=02/09/2011&DAY=D&STYLE=EQB.pdf", fName)
End Using

parsePDF(fName)

 
LVL 16

Expert Comment

by:Stephan
ID: 37780484
If the solution from kaufmed (download the file and save it) is not working, maybe the file that equibase pushes is being sent incorrectly. Try downloading a real PDF file like this:
http://archive.cs.uu.nl/mirror/CTAN/graphics/metapost/contrib/macros/automata/example.pdf

If that doesn't work, something else is wrong (maybe the parsePDF method?)
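
One way to narrow this down is to check whether the saved file is a PDF at all. The sketch below is only an illustration: LooksLikePdf and the PdfCheck module are hypothetical names, not part of the code in this thread. It assumes a valid PDF begins with the ASCII signature "%PDF-", which is the normal case. If the check fails, the server most likely returned something else, such as an HTML page, and opening the file in a text editor should show what it is.

```vbnet
Imports System.IO
Imports System.Text

Module PdfCheck
    ' Hypothetical helper: returns True when the file begins with the
    ' PDF signature "%PDF-", False when it is shorter or starts differently.
    Function LooksLikePdf(ByVal path As String) As Boolean
        Dim header(4) As Byte   ' 5 bytes, enough to hold "%PDF-"
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            Dim total As Integer = 0
            ' Read may return fewer bytes than requested, so loop until full
            While total < header.Length
                Dim n As Integer = fs.Read(header, total, header.Length - total)
                If n = 0 Then Return False   ' file ended before 5 bytes
                total += n
            End While
        End Using
        Return Encoding.ASCII.GetString(header) = "%PDF-"
    End Function
End Module
```

Calling LooksLikePdf(fName) just before parsePDF(fName) would separate a bad download from a problem inside parsePDF.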
 
LVL 20

Accepted Solution

by:
BuggyCoder earned 500 total points
ID: 37780502
Try this:-

Dim request = WebRequest.Create("<your path>")
Dim response = TryCast(request.GetResponse(), HttpWebResponse)

If response IsNot Nothing Then
	Dim sReader = New BinaryReader(response.GetResponseStream())
	Dim bytes = sReader.ReadBytes(CInt(response.ContentLength))

	Dim fs = New FileStream("c:/test.pdf", FileMode.CreateNew)
	fs.Write(bytes, 0, bytes.Length)
	fs.Close()
End If

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37781148
I just tried the code in a new project, and it downloads the file correctly for me: 17 KB. If you put a breakpoint on the call to parsePDF, and then go to the download folder and view the file, is the size correct? Can you view the PDF in Adobe Reader before parsePDF works with the file?
 
LVL 9

Expert Comment

by:darjimaulik
ID: 37781588
How are you trying to open the PDF file? Using code?
Then you need a third-party tool to read the PDF. .NET does not have any built-in functionality for reading PDFs.

One of the best free ones is iTextSharp:
http://itextpdf.com/
 

Author Comment

by:Thomas_Hawkins
ID: 37783210
kaufmed, I've had the solution work for me as well, but for some reason most of the time it does not. When I've used stream and binary readers, the content length shows as -1. And yes, I've set a breakpoint right at the call to parsePDF(), and I am attempting to open the file in Adobe Reader 9.

StephanOnline, that PDF downloaded perfectly, 89 KB as intended. It is true that my target is a .cfm page rendered as a PDF, but I have successfully downloaded it before. Does anyone know of a way to do this properly, or am I going to have to scrape the page?

BuggyCoder, I've used your code successfully on numerous PDF files now; sadly, it gives me a ContentLength of -1 on my intended files. I suppose the fault is in the file I've chosen, a .cfm file presented as a PDF.

darjimaulik, I have used both SautinSoft's PDF Focus and PDFBox, but I have not dabbled in iTextSharp.
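
For reference, a ContentLength of -1 means the response did not include a Content-Length header, so code that sizes its read from that value cannot work for such responses. A minimal sketch of a download that does not consult ContentLength is shown below. DownloadToFile and the PdfDownload module are hypothetical names chosen for illustration, and the sketch relies on Stream.CopyTo, which is available from .NET 4.0. It only removes the dependence on Content-Length; it does not explain why the server sends a different body.

```vbnet
Imports System.IO
Imports System.Net

Module PdfDownload
    ' Hypothetical helper: copies the whole response body to disk,
    ' reading until the stream ends instead of trusting ContentLength.
    Sub DownloadToFile(ByVal url As String, ByVal path As String)
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Using body As Stream = response.GetResponseStream()
                Using fs As New FileStream(path, FileMode.Create, FileAccess.Write)
                    body.CopyTo(fs)   ' reads until the server closes the body
                End Using
            End Using
        End Using
    End Sub
End Module
```

A call such as DownloadToFile(url, fName) would replace the BinaryReader.ReadBytes step, which needs a non-negative byte count.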
 

Author Closing Comment

by:Thomas_Hawkins
ID: 37831009
I would've given an A, but everyone dropped out on me. This solution was almost perfect; however, it did not fix my issue.

However, sending a content type along with the request (request.ContentType = "application/pdf") and then grabbing the Content-Length to use with the FileStream object fs (response.GetResponseHeader("Content-Length")) solved my issue.