Solved

Python scraper

Posted on 2014-12-21
Last Modified: 2014-12-27
I am trying to scrape a website for some information. I found a script and tried to convert it to Python, but the converted code still has some errors. Can anyone help me fix them? Thanks.

def scrapeEarningsZacks_(Stock=None,*args,**kwargs):
    varargin = cellarray(args)
    nargin = 1-[Stock].count(None)+len(args)

    s=urlread_(char('http://zacks.thestreet.com/CompanyView.php'),char('post'),[char('ticker'),Stock])
    try:
        etst=strfind_(s,char('Surprise%</strong></div></td>'))
    finally:
        pass
    etend=strfind_(s[etst:end()],char(' </table>'))
    et=s[etst:etst + etend]
    rowend=strfind_(et,char('</tr>'))
    earnings=cell_(length_(rowend) - 2,6)
    for i in arange_(1,(length_(rowend) - 1)).reshape(-1):
        if i == length_(rowend):
            row=et[rowend[i]:end()]
        else:
            row=et[rowend[i]:rowend[i + 1]]
        dst=strfind_(row,char('<td>'))
        for j in arange_(1,6).reshape(-1):
            if j == 6:
                a=row[dst[j]:end() - 23]
            else:
                a=row[dst[j]:dst[j + 1]]
            earnings[i,j]=a[5:(end() - 38)]
    emptyCells=cellfun_(isempty,earnings)
    row,col=find_(emptyCells,nargout=2)
    earnings[row,:]=[]
    return earnings

print scrapeEarningsZacks_(AAPL)

Question by:earngreen

Expert Comment

by:aikimark
ID: 40512319
Have you tried passing a string into the function?
print scrapeEarningsZacks_("AAPL")
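To illustrate why the quotes matter, here is a minimal sketch written for Python 3 (the line above uses the Python 2 print statement). The function `describe` is a made-up stand-in for the scraper, not part of the original script. Without quotes, Python treats AAPL as a variable name and raises NameError before the function is even called:

```python
def describe(ticker):
    # Hypothetical stand-in for scrapeEarningsZacks_; it only echoes its argument.
    return "ticker={0}".format(ticker)


try:
    describe(AAPL)  # bare name: Python looks up a variable called AAPL
except NameError as exc:
    error_message = str(exc)

result = describe("AAPL")  # quoted: a string literal is passed in

print(error_message)
print(result)
```

This only addresses the last line of the posted script; the helper calls inside the function (urlread_, strfind_ and so on) would still need to be defined or imported from somewhere.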


Expert Comment

by:Dave Baldwin
ID: 40512322
It looks like all the data on that page is posted through JavaScript. Your code will not run the JavaScript to get the data, so it is unlikely that you will be able to scrape that page. In particular, the input for selecting a stock is handled with JavaScript. It is not something you can 'post' to and get a result. This is that code:

<input  type="text" name="search_company"  id="search_company" value="Enter company name" size=18 onFocus="JavaScript:this.value=''" onBlur="JavaScript:Fill_Lookup()" onkeyup="get_ticker_info();">
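As a rough illustration of what a non-browser client sees, here is a sketch using only the Python 3 standard library. The class name `InputCollector` is my own; the markup is the tag quoted above. A static parser returns the event handlers as plain attribute strings and never executes them:

```python
from html.parser import HTMLParser

# The <input> tag quoted above, stored as a plain string.
SNIPPET = (
    '<input  type="text" name="search_company"  id="search_company" '
    'value="Enter company name" size=18 '
    'onFocus="JavaScript:this.value=\'\'" '
    'onBlur="JavaScript:Fill_Lookup()" onkeyup="get_ticker_info();">'
)


class InputCollector(HTMLParser):
    """Record the attributes of every <input> tag that is parsed."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # attrs is a list of (name, value) pairs; names arrive lowercased.
            self.inputs.append(dict(attrs))


collector = InputCollector()
collector.feed(SNIPPET)
search_box = collector.inputs[0]

# The on* handlers are just text to the parser; nothing here runs them.
handlers = {name: value for name, value in search_box.items()
            if name.startswith("on")}
print(handlers)
```

Whether this matters for a given page depends on whether the data you want is already in the HTML the server returns or is filled in later by those handlers.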

Expert Comment

by:aikimark
ID: 40512591
Are you running this code on Windows or Linux?

Author Comment

by:earngreen
ID: 40513143
This is on Linux.

Expert Comment

by:aikimark
ID: 40513222
What libraries have you imported?

Accepted Solution

by:clockwatcher
ID: 40514416
It would probably be easier if you just told us what you're hoping to return, rather than having us fix whatever is going on with the code you have there.

From the sample URL

   http://zacks.thestreet.com/CompanyView.php?ticker=AAPL

What would you like your scrapeEarnings to return? The entire table? Here's a Python 3 example of parsing that into Python objects using Beautiful Soup:

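from bs4 import BeautifulSoup
import urllib.request

class Earning(object):
    """One data row of the earnings table."""
    def __init__(self, table_row):
        # Calling a tag is shorthand for find_all(). The row must contain
        # exactly six <td> cells, otherwise the unpacking raises ValueError.
        (self.date,
         self.period_ending,
         self.estimate,
         self.reported,
         self.surprise,
         self.surprise_percent) = [i.text for i in table_row("td")]

    def __str__(self):
        return "\t".join((self.date, self.period_ending, self.estimate,
                         self.reported, self.surprise, self.surprise_percent))

class Earnings(object):
    """All earnings rows found on a company page."""
    def __init__(self, soup):
        self.soup = soup
        # The earnings data is the second table inside the element with id "divPrint".
        self.earnings_table = soup.find(id="divPrint")("table")[1]
        # Skip the first row, which holds the column headings.
        self.earnings_rows = self.earnings_table("tr")[1:]
        self.earnings = [Earning(e) for e in self.earnings_rows]

    def __str__(self):
        return "\n".join([str(e) for e in self.earnings])

def getEarningsForTicker(ticker):
    url = "http://zacks.thestreet.com/CompanyView.php?ticker={0}".format(ticker)
    # Naming the parser keeps the result the same from one machine to the next;
    # "html.parser" ships with Python (use "lxml" instead if you have it installed).
    return Earnings(BeautifulSoup(urllib.request.urlopen(url), "html.parser"))

def main():
    print(getEarningsForTicker('AAPL'))

if __name__ == '__main__':
    main()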
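If you want to try the same row-to-object idea without network access or Beautiful Soup, here is a stdlib-only sketch. Note that it swaps in html.parser for bs4, and the markup in SAMPLE_HTML is invented purely for illustration (it is not copied from the real page, and the figures are made up). `TableRowCollector` and `parse_earnings` are my own names. It assumes, like the code above, that the earnings data is the second table and that its first row is the header:

```python
from collections import namedtuple
from html.parser import HTMLParser

Earning = namedtuple(
    "Earning",
    "date period_ending estimate reported surprise surprise_percent",
)

# Hypothetical markup with made-up values, shaped like the layout assumed above.
SAMPLE_HTML = """
<div id="divPrint">
  <table><tr><td>some other table</td></tr></table>
  <table>
    <tr><td><strong>Date</strong></td><td><strong>Period Ending</strong></td>
        <td><strong>Estimate</strong></td><td><strong>Reported</strong></td>
        <td><strong>Surprise</strong></td><td><strong>Surprise%</strong></td></tr>
    <tr><td>2000-01-15</td><td>12/1999</td><td>1.00</td>
        <td>1.10</td><td>0.10</td><td>10.00%</td></tr>
    <tr><td>2000-04-15</td><td>03/2000</td><td>2.00</td>
        <td>1.90</td><td>-0.10</td><td>-5.00%</td></tr>
  </table>
</div>
"""


class TableRowCollector(HTMLParser):
    """Collect <td> text grouped by table and row (no nested-table support)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows, each row a list of cells
        self._cell = None  # text fragments of the cell being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None


def parse_earnings(html):
    collector = TableRowCollector()
    collector.feed(html)
    rows = collector.tables[1][1:]  # second table, header row skipped
    return [Earning(*row) for row in rows]


earnings = parse_earnings(SAMPLE_HTML)
for e in earnings:
    print("\t".join(e))
```

Unlike the Beautiful Soup version, this sketch does not look for id="divPrint"; it simply counts tables in the whole document, so it is less robust against layout changes.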


Author Comment

by:earngreen
ID: 40520120
clockwatcher, that worked out great. Thanks.