Solved

Python urllib.urlopen() performance question

Posted on 2001-09-09
4
769 Views
Last Modified: 2012-06-21
Hello!  I have a Python script that retrieves the HTML text from a web page via urlopen() from urllib and then processes the result line by line via readline() calls to the returned object.

I noticed that the line-by-line read seems to be very slow on Win98SE, whereas the exact same code (and same interpreter) runs very quickly on Win2k Pro.

The snippet skeleton code in question is as follows:

[...]
webPage = urllib.urlopen(startURL)

line = webPage.readline()

while line != '':
   line = webPage.readline()
[...]

The actual retrieval (the call to urlopen) seems to go quickly, so I suspect something with how I'm calling or using readline().  Any suggestions or known issues with this?


Thanks,

AP9
0
Comment
Question by:ap9
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 22

Expert Comment

by:CJ_S
ID: 6469840
It's a common way of reading a file. I do not know much about Python, but usually there is also a readall method, you can try it.

line = webPage.readall()

It might be faster...

Regards,
CJ
0
 

Author Comment

by:ap9
ID: 6472564
Hello, CJ!  Thanks for the suggestion, but readall() isn't a valid method in the returned object.  The documentation says that it is a "file-like" object but not a real file object.

I think there is a method called "readlines()" which reads in all the lines, but I've tried it before and it's still slow.
0
 
LVL 22

Accepted Solution

by:
CJ_S earned 100 total points
ID: 6473580
Then I do not think you can overcome the problem if you intend to keep on using the same object. Maybe Python provides another object that does the same. Or if you can use, for example, the XMLHTTP object, you'd solve the speed-problem.

regards,
CJ
0
 

Author Comment

by:ap9
ID: 6485868
That's the thing, though -- on Win2k it works very quickly.  But I'll see about approaching it from another direction.  Thanks.
0

Featured Post

[Webinar] How Hackers Steal Your Credentials

Do You Know How Hackers Steal Your Credentials? Join us and Skyport Systems to learn how hackers steal your credentials and why Active Directory must be secure to stop them. Thursday, July 13, 2017 10:00 A.M. PDT

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Whether you’re a college noob or a soon-to-be pro, these tips are sure to help you in your journey to becoming a programming ninja and stand out from the crowd.
Today, the web development industry is booming, and many people consider it to be their vocation. The question you may be asking yourself is – how do I become a web developer?
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question