Download pdf file ASP.NET

I am working in VB.NET in VS 2010 on an ASP.NET 4.0 website. I am looking to create a service that downloads a publicly available PDF file from another website and then converts it into text for display. I am currently using HttpWebRequest and HttpWebResponse for my downloads, and that has worked well, but there is a problem with the file it brings down.

The test file is 17 KB, but when I download it, it shows a size of 21 KB. When I attempt to open the downloaded PDF file, I get a warning saying that the file could not be opened because it is either not a supported file type or has been damaged. I know the test file is good, but I suspect that somewhere along the line the header is getting bloated with a couple of KB of junk.

Below is the code I am using to download and write.

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"
Dim wr As HttpWebRequest = CType(WebRequest.Create("http://www.somesite.com/test.pdf"), HttpWebRequest)
Dim ws As HttpWebResponse = CType(wr.GetResponse(), HttpWebResponse)

Dim memStream As MemoryStream = New MemoryStream

Dim length As Integer = 1024
Dim buffer As [Byte]() = New [Byte](length - 1) {}
Dim bytesRead As Integer = ws.GetResponseStream.Read(buffer, 0, length - 1)

' write the required bytes
While bytesRead > 0
    memStream.Write(buffer, 0, bytesRead)
    bytesRead = ws.GetResponseStream.Read(buffer, 0, length)
End While

Using fstr As FileStream = New FileStream(fName, FileMode.CreateNew, FileAccess.ReadWrite)
    memStream.WriteTo(fstr)
    fstr.Close()
End Using

parsePDF(fName)

'Delete the PDF - Currently disabled for testing
'System.IO.File.Delete(fName)

Help me Obi Wan Kenobi... I mean help me EE, I am lost and cannot find the answer on my own. I suspect it comes from the improper handling of the stream, but I can't figure it out.
Asked by Thomas_Hawkins

käµfm³d 👽 Commented:
Have you considered using the WebClient class? I think it would make the task a bit simpler.

e.g.

...

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"

Using client As New System.Net.WebClient()
    client.DownloadFile("http://www.somesite.com/test.pdf", fName)
End Using

parsePDF(fName)

...

Thomas_Hawkins (Author) Commented:
Kaufmed, I tried that solution just after you suggested it, with the same results. The resulting PDF is 21 KB and unreadable. Here is the code:

Dim fName As String = Server.MapPath("Programs/") & Date.Now.Ticks.ToString & ".pdf"
Using client As New System.Net.WebClient()
    client.DownloadFile("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=1&BorP=P&TID=AQU&CTRY=USA&DT=02/09/2011&DAY=D&STYLE=EQB.pdf", fName)
End Using

parsePDF(fName)

Stephan (Lead Software Engineer) Commented:
If the solution from kaufmed (download the file and save it) is not working, maybe the file that equibase pushes is being sent incorrectly. Try downloading a real PDF file like this:
http://archive.cs.uu.nl/mirror/CTAN/graphics/metapost/contrib/macros/automata/example.pdf

If that doesn't work, something else is wrong (maybe the parsePDF method?)

BuggyCoder Commented:
Try this:

Dim request = WebRequest.Create("<your path>")
Dim response = TryCast(request.GetResponse(), HttpWebResponse)

If response IsNot Nothing Then
    ' Read the whole body in one call, sized by the reported ContentLength
    Dim sReader = New BinaryReader(response.GetResponseStream())
    Dim bytes = sReader.ReadBytes(CInt(response.ContentLength))

    Dim fs = New FileStream("c:/test.pdf", FileMode.CreateNew)
    fs.Write(bytes, 0, bytes.Length)
    fs.Close()
End If

käµfm³d 👽 Commented:
I just tried the code in a new project, and it downloads the file correctly for me (17 KB). If you put a breakpoint on the call to parsePDF, and then go to the download folder and view the file, is the size correct? Can you view the PDF in Adobe Viewer before parsePDF works with the file?
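One quick way to check the file before parsePDF touches it is to look at its first bytes, since a PDF normally begins with the signature %PDF-. The following is a minimal sketch, not part of the original code: LooksLikePdf and its fileName parameter are hypothetical names chosen for illustration.

```vb
Imports System.IO
Imports System.Text

Module PdfCheck
    ' Hypothetical helper: returns True when the file starts with "%PDF-".
    Function LooksLikePdf(fileName As String) As Boolean
        Dim header(4) As Byte   ' five bytes, indexes 0 to 4
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            ' A file shorter than the signature cannot be a PDF.
            If fs.Read(header, 0, header.Length) < header.Length Then Return False
        End Using
        Return Encoding.ASCII.GetString(header) = "%PDF-"
    End Function
End Module
```

If LooksLikePdf(fName) returns False, opening the file in a text editor may show what the server actually sent instead of the PDF.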

darjimaulik Commented:
How are you trying to open the PDF file?
Using code?
Then you need to use a third-party tool to read the PDF. .NET does not have any built-in functionality for reading PDFs.

One of the best free ones is iTextSharp:
http://itextpdf.com/

Thomas_Hawkins (Author) Commented:
kaufmed, the solution has worked for me as well, but for some reason most of the time it does not. When I've used stream and binary readers, the content length shows as -1. And yes, I've set a breakpoint right at the call to parsePDF() and am attempting to open the file in Adobe Reader 9.

StephanOnline, I downloaded that PDF perfectly, 89 KB as intended. It is true that this is a .cfm page rendered as a PDF, but I have successfully downloaded it before. Does anyone know of a way to do this properly, or am I going to have to scrape the page?

BuggyCoder, I've used your code successfully on numerous PDF files now. Sadly, it gives me a ContentLength of -1 on my intended files. I suppose the fault is in the file I've chosen, a .cfm file presented as a PDF.

darjimaulik, I have used both SautinSoft's PDF Focus and PDFBox, but I have not dabbled in iTextSharp at all.
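On the ContentLength of -1 mentioned above: HttpWebResponse reports -1 when the server sends no Content-Length header, so sizing a read from it cannot work for such responses. One option is to copy the response stream until it ends. This is a minimal sketch, assuming .NET 4.0 or later for Stream.CopyTo; SaveResponse and its parameters are hypothetical names.

```vb
Imports System.IO
Imports System.Net

Module DownloadSketch
    ' Hypothetical helper: saves the response body without relying on ContentLength.
    Sub SaveResponse(url As String, fileName As String)
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Using source As Stream = response.GetResponseStream()
                Using target As New FileStream(fileName, FileMode.Create, FileAccess.Write)
                    ' CopyTo reads until the source stream reports end of data.
                    source.CopyTo(target)
                End Using
            End Using
        End Using
    End Sub
End Module
```

A call such as SaveResponse("http://www.somesite.com/test.pdf", fName) would then take the place of the manual read loop.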

Thomas_Hawkins (Author) Commented:
I would've given an A, but everyone dropped out on me. This solution was almost perfect; however, it did not fix my issue.

However, sending a ContentType along with the request (request.ContentType = "application/pdf") and then reading the Content-Length header (response.GetResponseHeader("Content-Length")) to use with the FileStream object fs solved my issue.
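The exact code was not posted, so the following is only a rough reconstruction of that description, layered on BuggyCoder's snippet. DownloadPdf and its parameters are hypothetical names, and whether the server returns a Content-Length header at all is an assumption that depends on the site.

```vb
Imports System
Imports System.IO
Imports System.Net

Module ContentLengthSketch
    ' Hypothetical helper reconstructing the fix described above.
    Sub DownloadPdf(url As String, fileName As String)
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.ContentType = "application/pdf"

        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            ' GetResponseHeader returns the header value as a string.
            Dim length As Integer
            If Not Integer.TryParse(response.GetResponseHeader("Content-Length"), length) Then
                Throw New InvalidOperationException("No usable Content-Length header.")
            End If

            Using reader As New BinaryReader(response.GetResponseStream())
                Dim bytes As Byte() = reader.ReadBytes(length)
                Using fs As New FileStream(fileName, FileMode.CreateNew)
                    fs.Write(bytes, 0, bytes.Length)
                End Using
            End Using
        End Using
    End Sub
End Module
```

The TryParse guard is an addition for safety: it makes a missing or malformed header fail loudly instead of producing a truncated file.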