
Python urllib.urlopen() performance question

Posted on 2001-09-09
Last Modified: 2012-06-21
Hello!  I have a Python script that retrieves the HTML text from a web page via urlopen() from urllib and then processes the result line by line via readline() calls to the returned object.

I noticed that the line-by-line read seems to be very slow on Win98SE, whereas the exact same code (and same interpreter) runs very quickly on Win2k Pro.

The skeleton of the code in question is as follows:

import urllib

[...]
webPage = urllib.urlopen(startURL)

line = webPage.readline()
while line != '':
    # [...] process the current line here, before reading the next one
    line = webPage.readline()

webPage.close()
[...]

The actual retrieval (the call to urlopen) seems to go quickly, so I suspect something with how I'm calling or using readline().  Any suggestions or known issues with this?


Thanks,

AP9
Question by:ap9
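One way to test the suspicion that the time goes into the readline() calls rather than into urlopen() is to time the two phases separately. The sketch below is written for Python 3, where the function is urllib.request.urlopen rather than the urllib.urlopen used in the question. The helper name time_phases and its opener parameter are hypothetical, added only so the harness can be exercised with a fake file-like object instead of a live web page.

```python
import io
import time
import urllib.request

def time_phases(url, opener=urllib.request.urlopen):
    """Time opening `url` separately from reading it line by line.

    Returns (open_seconds, read_seconds, line_count).
    `opener` is a hypothetical hook: by default it is urllib.request.urlopen,
    but any callable returning a file-like object with readline() works.
    """
    start = time.perf_counter()
    page = opener(url)
    opened = time.perf_counter()

    count = 0
    try:
        line = page.readline()
        while line:  # readline() returns empty bytes/str at end of data
            count += 1
            line = page.readline()
    finally:
        page.close()
    done = time.perf_counter()
    return opened - start, done - opened, count

if __name__ == "__main__":
    # A fake opener standing in for a real HTTP response.
    fake = lambda url: io.BytesIO(b"<html>\n<body>\n</body>\n</html>\n")
    open_s, read_s, n = time_phases("http://example.invalid/", opener=fake)
    print("%d lines; open %.6fs, read %.6fs" % (n, open_s, read_s))
```

Run against the real startURL on both machines, the two timings would show whether the Win98SE slowdown is really in the read loop.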

Expert Comment

by:CJ_S
Reading line by line is a common way of reading a file. I do not know much about Python, but there is often also a readall method; you can try it.

line = webPage.readall()

It might be faster...

Regards,
CJ
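The object returned by urlopen() has no readall(), but it does have read(); called with no size argument, it returns the rest of the response in a single call, which is the closest equivalent to what is suggested here. A minimal Python 3 sketch, assuming the whole page fits comfortably in memory; the helper name read_all_lines is hypothetical. It replaces many Python-level readline() calls with one read() call, though whether that is faster on Win98SE would need to be measured.

```python
import io

def read_all_lines(page):
    """Read a whole response with one read() call, then split it into lines.

    `page` is any file-like object, e.g. the result of
    urllib.request.urlopen(url) in Python 3 (urllib.urlopen in Python 2).
    Line endings ("\\r\\n" or "\\n") are stripped from the returned lines.
    """
    try:
        data = page.read()  # no size argument: read until end of data
    finally:
        page.close()
    return data.splitlines()

if __name__ == "__main__":
    demo = io.BytesIO(b"<html>\r\n<body>\n</body></html>")
    print(read_all_lines(demo))
```

With a real page this would be used as `for line in read_all_lines(urllib.request.urlopen(startURL)): ...`.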

Author Comment

by:ap9
Hello, CJ!  Thanks for the suggestion, but readall() isn't a method of the returned object.  The documentation says that it is a "file-like" object but not a real file object.

I think there is a method called "readlines()" which reads in all the lines, but I've tried it before and it's still slow.
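If readlines() is just as slow, one possible cause (an assumption; the thread never pins it down) is that the file-like wrapper makes many small reads on the underlying connection. A workaround in that case is to do the buffering yourself: request large fixed-size chunks with read(size) and split them into lines in Python. The generator below, iter_lines_chunked, is a hypothetical helper written for Python 3, and the 64 KiB default chunk size is an arbitrary choice.

```python
import io

def iter_lines_chunked(page, chunk_size=64 * 1024):
    """Yield lines (bytes, line endings kept) from a file-like object,
    reading it in chunk_size pieces instead of one readline() per line."""
    pending = b""
    while True:
        chunk = page.read(chunk_size)
        if not chunk:  # empty bytes: end of data
            break
        pending += chunk
        lines = pending.split(b"\n")
        pending = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line + b"\n"
    if pending:
        yield pending  # final line without a trailing newline

if __name__ == "__main__":
    demo = io.BytesIO(b"<html>\n<body>\n</body>\n</html>")
    for line in iter_lines_chunked(demo, chunk_size=8):
        print(line)
```

The output matches what readlines() would return, so it can replace the readline() loop without changing how each line is processed.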

Accepted Solution

by:CJ_S (earned 100 total points)
Then I do not think you can overcome the problem if you intend to keep using the same object. Maybe Python provides another object that does the same job. Alternatively, if you can use something like the XMLHTTP object (the MSXML COM component on Windows), that might solve the speed problem.

regards,
CJ
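As for another Python object that does the same job, one standard-library candidate is the lower-level HTTP client: http.client in Python 3 (its Python 2 counterpart was the httplib module). A minimal sketch, assuming Python 3 and a plain GET request; the helper names request_target and fetch_lines are hypothetical. Unlike urlopen(), http.client does not follow redirects, so this version treats any status other than 200 as an error.

```python
import http.client
from urllib.parse import urlsplit

def request_target(url):
    """Split a URL into (scheme, host, port, path-with-query) for a GET."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.hostname, parts.port, path

def fetch_lines(url, timeout=10.0):
    """Fetch `url` with http.client and return the body as a list of lines."""
    scheme, host, port, path = request_target(url)
    if scheme == "https":
        conn = http.client.HTTPSConnection(host, port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        if resp.status != 200:  # no redirect handling in this sketch
            raise OSError("HTTP %d %s" % (resp.status, resp.reason))
        body = resp.read()  # read the whole body in one call
    finally:
        conn.close()
    return body.splitlines()
```

Usage would be `for line in fetch_lines(startURL): ...`; whether this is faster than urlopen() on a particular platform would have to be measured.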

Author Comment

by:ap9
That's the thing, though -- on Win2k it works very quickly.  But I'll see about approaching it from another direction.  Thanks.