Solved

Download pdf file ASP.NET

Posted on 2012-03-28
Last Modified: 2012-06-21
Okay, I am working in VB.NET using VS 2010, on an ASP.NET 4.0 website. I am looking to create a service that downloads a publicly available PDF file from another website and then converts it into text for display. I am currently using HttpWebRequest and HttpWebResponse for my downloads, and that has worked well, but there is a problem with the file it brings down.

The test file is 17 KB, but when I download it, it shows a size of 21 KB. When I attempt to open the downloaded PDF file, I get a warning saying that the file could not be opened because it is either not a supported file type or because the file has been damaged. I know the test file is good, but I suspect that somewhere along the line the header is getting bloated with a couple of KB of junk.

Below is the code I am using to download and write.

        Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"
        Dim wr As HttpWebRequest = CType(WebRequest.Create("http://www.somesite.com/test.pdf"), HttpWebRequest)
        Dim ws As HttpWebResponse = CType(wr.GetResponse(), HttpWebResponse)

        Dim memStream As MemoryStream = New MemoryStream

        Dim length As Integer = 1024
        Dim buffer As [Byte]() = New [Byte](length - 1) {}

        ' get the response stream once and reuse it for every read
        Dim respStream As Stream = ws.GetResponseStream()
        Dim bytesRead As Integer = respStream.Read(buffer, 0, length)

        ' write the required bytes
        While bytesRead > 0
            memStream.Write(buffer, 0, bytesRead)
            bytesRead = respStream.Read(buffer, 0, length)
        End While
        ws.Close()

        Using fstr As FileStream = New FileStream(fName, FileMode.CreateNew, FileAccess.ReadWrite)
            memStream.WriteTo(fstr)
        End Using

        parsePDF(fName)

        'Delete the PDF - Currently disabled for testing
        'System.IO.File.Delete(fName)


Help me Obi Wan Kenobi... I mean help me EE, I am lost and cannot find the answer on my own. I suspect it comes from the improper handling of the stream, but I can't figure it out.
Question by:Thomas_Hawkins
8 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37780190
Have you considered using the WebClient class? I think it would make the task a bit simpler.

e.g.

...

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"

Using client As New System.Net.WebClient()
    client.DownloadFile("http://www.somesite.com/test.pdf", fName)
End Using

parsePDF(fName)

...

 

Author Comment

by:Thomas_Hawkins
ID: 37780308
kaufmed, I tried that solution just after you suggested it, with the same results. The resulting PDF is 21 KB and unreadable. Here is the code:

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"

Using client As New System.Net.WebClient()
    client.DownloadFile("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=1&BorP=P&TID=AQU&CTRY=USA&DT=02/09/2011&DAY=D&STYLE=EQB.pdf", fName)
End Using

parsePDF(fName)

 
LVL 16

Expert Comment

by:Stephan
ID: 37780484
If the solution from kaufmed is not working, the file that equibase sends may itself be incorrect. Try downloading and saving a real PDF file like this one:
http://archive.cs.uu.nl/mirror/CTAN/graphics/metapost/contrib/macros/automata/example.pdf

If that doesn't work, something else is wrong (maybe the parsePDF method?)
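One quick way to tell a bad download from a bad parser is to look at the first bytes of the saved file, since a PDF normally begins with the signature "%PDF-". The following is a minimal sketch of that check; the module and function names (PdfCheck, LooksLikePdf) are made up for illustration and are not part of the code posted in this thread.

```vb
Imports System.IO
Imports System.Text

Module PdfCheck
    ' Hypothetical helper: returns True when the file starts with "%PDF-".
    Function LooksLikePdf(ByVal filePath As String) As Boolean
        Dim header(4) As Byte ' five bytes, indexes 0 through 4
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            ' A file shorter than the signature cannot be a PDF.
            If fs.Read(header, 0, header.Length) < header.Length Then
                Return False
            End If
        End Using
        Return Encoding.ASCII.GetString(header) = "%PDF-"
    End Function
End Module
```

If LooksLikePdf(fName) returns False right after the download, opening the file in a text editor should show what the server actually sent.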
LVL 20

Accepted Solution

by:
BuggyCoder earned 500 total points
ID: 37780502
Try this:-

Dim request = WebRequest.Create("<your path>")
Dim response = TryCast(request.GetResponse(), HttpWebResponse)

If response IsNot Nothing Then
	' ContentLength is -1 when the server sends no Content-Length header
	Dim sReader = New BinaryReader(response.GetResponseStream())
	Dim bytes = sReader.ReadBytes(CInt(response.ContentLength))

	Using fs As New FileStream("c:/test.pdf", FileMode.CreateNew)
		fs.Write(bytes, 0, bytes.Length)
	End Using
	response.Close()
End If

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37781148
I just tried the code in a new project, and it downloads the file correctly for me: 17 KB. If you put a breakpoint on the call to parsePDF, and then go to the download folder and view the file, is the size correct? Can you view the PDF in Adobe Viewer before parsePDF works with the file?
 
LVL 9

Expert Comment

by:darjimaulik
ID: 37781588
How are you trying to open the PDF file? Using code?
If so, you need a third-party tool to read the PDF. .NET does not have any built-in functionality for reading PDF files.

One of the best free options is iTextSharp:
http://itextpdf.com/
 

Author Comment

by:Thomas_Hawkins
ID: 37783210
kaufmed, I've had the solution work for me as well, but for some reason most of the time it does not. When I've used stream and binary readers, the content length is reported as -1. And yes, I've set a breakpoint right at the call to parsePDF(), and I am attempting to open the file in Adobe Reader 9.

StephanOnline, that PDF downloaded perfectly, 89 KB as intended. It is true that my target is a .cfm page rendered as a PDF, but I have successfully downloaded it before. Does anyone know of a way to do this properly, or am I going to have to scrape the page?

BuggyCoder, I've used your code successfully on numerous PDF files now; sadly, it gives me a ContentLength of -1 on my intended files. I suppose the fault is in the file I've chosen, a .cfm file presented as a PDF.

darjimaulik, I have used both SautinSoft's PDF Focus and PDFBox; I've not dabbled in iTextSharp at all.
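When ContentLength comes back as -1, any approach that sizes its read from that value cannot work. One option is to copy the response stream until it ends, which needs no length at all. Below is a minimal sketch, assuming .NET 4.0 (for Stream.CopyTo); the module and method names (DownloadSketch, DownloadTo) are hypothetical. It does not by itself explain why the saved file is 21 KB instead of 17 KB.

```vb
Imports System.IO
Imports System.Net

Module DownloadSketch
    ' Hypothetical helper: saves the response body without using ContentLength.
    Sub DownloadTo(ByVal url As String, ByVal filePath As String)
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Using source As Stream = response.GetResponseStream()
                Using target As New FileStream(filePath, FileMode.Create, FileAccess.Write)
                    ' CopyTo keeps reading until the source stream is exhausted.
                    source.CopyTo(target)
                End Using
            End Using
        End Using
    End Sub
End Module
```

A call such as DownloadSketch.DownloadTo(url, fName) would replace the manual buffer loop from the original question.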
 

Author Closing Comment

by:Thomas_Hawkins
ID: 37831009
I would have given an A, but everyone dropped out on me. This solution was almost perfect; however, it did not fix my issue.

What did solve it was sending a content type along with the request (request.ContentType = "application/pdf") and then reading the Content-Length response header (response.GetResponseHeader("Content-Length")) to get the length used when writing through the FileStream object fs.
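The closing comment does not include the final code, so the following is only a sketch of how the described fix might look when combined with the accepted solution. The module and method names (ClosingFixSketch, Download) and the error handling are assumptions, not the author's actual code.

```vb
Imports System
Imports System.IO
Imports System.Net

Module ClosingFixSketch
    ' Hypothetical reconstruction of the fix described in the closing comment.
    Sub Download(ByVal url As String, ByVal filePath As String)
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.ContentType = "application/pdf"

        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            ' Read the length from the header; TryParse fails if it is missing or not a number.
            Dim length As Integer
            If Not Integer.TryParse(response.GetResponseHeader("Content-Length"), length) Then
                Throw New InvalidOperationException("No usable Content-Length header was returned.")
            End If

            Using reader As New BinaryReader(response.GetResponseStream())
                Dim bytes As Byte() = reader.ReadBytes(length)
                Using fs As New FileStream(filePath, FileMode.CreateNew)
                    fs.Write(bytes, 0, bytes.Length)
                End Using
            End Using
        End Using
    End Sub
End Module
```

If the server still omits the header, this version fails loudly instead of writing a file of the wrong size.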


Question has a verified solution.
