Solved

Python scraper

Posted on 2014-12-21
Last Modified: 2014-12-27
I am trying to scrape a website for some information. I found a script and tried to convert it to Python, but the converted code still has some errors. Can anyone help me fix them? Thanks.

def scrapeEarningsZacks_(Stock=None,*args,**kwargs):
    varargin = cellarray(args)
    nargin = 1-[Stock].count(None)+len(args)

    s=urlread_(char('http://zacks.thestreet.com/CompanyView.php'),char('post'),[char('ticker'),Stock])
    try:
        etst=strfind_(s,char('Surprise%</strong></div></td>'))
    finally:
        pass
    etend=strfind_(s[etst:end()],char(' </table>'))
    et=s[etst:etst + etend]
    rowend=strfind_(et,char('</tr>'))
    earnings=cell_(length_(rowend) - 2,6)
    for i in arange_(1,(length_(rowend) - 1)).reshape(-1):
        if i == length_(rowend):
            row=et[rowend[i]:end()]
        else:
            row=et[rowend[i]:rowend[i + 1]]
        dst=strfind_(row,char('<td>'))
        for j in arange_(1,6).reshape(-1):
            if j == 6:
                a=row[dst[j]:end() - 23]
            else:
                a=row[dst[j]:dst[j + 1]]
            earnings[i,j]=a[5:(end() - 38)]
    emptyCells=cellfun_(isempty,earnings)
    row,col=find_(emptyCells,nargout=2)
    earnings[row,:]=[]
    return earnings

print scrapeEarningsZacks_(AAPL)
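For comparison, here is a minimal plain-Python sketch of what the converted script seems to be attempting (it looks like machine-translated MATLAB, judging by nargin, strfind and cellfun): find the earnings table between two marker strings and pull the six cell values out of each row. The marker strings are copied from the script; using regular expressions instead of the script's fixed character offsets (a[5:end-38]) is a substitution, not part of the original, and it still assumes the 2014 page markup. The function name parse_earnings and the sample HTML are made up for illustration.

```python
import re

# Marker strings copied from the converted script. They only match if the
# page markup is exactly what the script expected (an assumption).
START_MARKER = 'Surprise%</strong></div></td>'
END_MARKER = ' </table>'

ROW_RE = re.compile(r'<tr[^>]*>(.*?)</tr>', re.S)   # one table row
CELL_RE = re.compile(r'<td[^>]*>(.*?)</td>', re.S)  # one data cell
TAG_RE = re.compile(r'<[^>]+>')                     # any HTML tag


def parse_earnings(html):
    """Return the complete six-cell rows found after the header marker."""
    start = html.find(START_MARKER)  # str.find returns -1, it never raises
    if start == -1:
        return []
    stop = html.find(END_MARKER, start)
    table = html[start:stop] if stop != -1 else html[start:]

    rows = []
    for row_html in ROW_RE.findall(table):
        cells = [TAG_RE.sub('', c).strip() for c in CELL_RE.findall(row_html)]
        # Drop incomplete rows and rows with empty cells, as the script tried to do.
        if len(cells) == 6 and all(cells):
            rows.append(cells)
    return rows


# Made-up markup shaped like the table the script expects.
sample = (
    '<table><tr><td><div><strong>Surprise%</strong></div></td></tr>'
    '<tr><td>1/1/2000</td><td>12/1999</td><td>1.00</td>'
    '<td>1.10</td><td>0.10</td><td>10.00%</td></tr>'
    ' </table>'
)
print(parse_earnings(sample))
```

Note that Python's str.find and slicing are 0-based, so none of the MATLAB-style 1-based index arithmetic needs to be carried over.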

Question by:earngreen

Expert Comment

by:aikimark
ID: 40512319
Have you tried passing a string into the function?
print scrapeEarningsZacks_("AAPL")
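Without quotes, Python treats AAPL as a variable name and raises NameError unless such a variable happens to exist. Also, print without parentheses is Python 2 syntax; Python 3 needs print(...). As a rough sketch (standard library only, Python 3), the converted script's urlread_ call with 'post' and ['ticker', Stock] corresponds to building a form-encoded POST request like the one below. The helper name build_ticker_request is hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = 'http://zacks.thestreet.com/CompanyView.php'


def build_ticker_request(ticker):
    """Build (but do not send) a form-encoded POST request for a ticker."""
    if not isinstance(ticker, str):
        raise TypeError('ticker must be a string such as "AAPL"')
    # urlencode({'ticker': 'AAPL'}) gives 'ticker=AAPL'; the body must be bytes.
    body = urlencode({'ticker': ticker}).encode('ascii')
    return Request(URL, data=body, method='POST')


req = build_ticker_request('AAPL')
print(req.get_method(), req.full_url, req.data)
# Sending it would be urllib.request.urlopen(req); whether this page
# actually accepts a POST is a separate question.
```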


Expert Comment

by:Dave Baldwin
ID: 40512322
It looks like all the data on that page is delivered through JavaScript. Your code will not run the JavaScript that gets the data, so it is unlikely that you will be able to scrape that page this way. In particular, the input for selecting a stock is handled by JavaScript; it is not something you can POST to and get a result from. This is that code:

<input  type="text" name="search_company"  id="search_company" value="Enter company name" size=18 onFocus="JavaScript:this.value=''" onBlur="JavaScript:Fill_Lookup()" onkeyup="get_ticker_info();">
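A quick, heuristic way to check for this in a page's HTML is to look for form fields whose behavior depends on JavaScript event-handler attributes (onkeyup, onblur and so on) rather than a plain form submission. The sketch below uses only the standard-library html.parser module; the class name InputFinder is hypothetical, and finding such attributes only suggests, rather than proves, that the data is loaded by script.

```python
from html.parser import HTMLParser


class InputFinder(HTMLParser):
    """Record <input> elements that carry JavaScript event handlers."""

    def __init__(self):
        super().__init__()
        self.js_inputs = []  # (name attribute, sorted handler names)

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        # html.parser lowercases attribute names, so onFocus becomes onfocus.
        handlers = sorted(name for name, _ in attrs if name.startswith('on'))
        if handlers:
            self.js_inputs.append((dict(attrs).get('name'), handlers))


# The <input> element quoted above.
snippet = ('<input  type="text" name="search_company"  id="search_company" '
           'value="Enter company name" size=18 '
           'onFocus="JavaScript:this.value=\'\'" '
           'onBlur="JavaScript:Fill_Lookup()" onkeyup="get_ticker_info();">')

finder = InputFinder()
finder.feed(snippet)
finder.close()
print(finder.js_inputs)
```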

Expert Comment

by:aikimark
ID: 40512591
Are you running this code on Windows or Linux?

Author Comment

by:earngreen
ID: 40513143
This is Linux

Expert Comment

by:aikimark
ID: 40513222
What libraries have you imported?

Accepted Solution

by:clockwatcher
ID: 40514416
It would probably be easier if you just told us what you want returned, rather than having us fix whatever is going on with the code you have there.

From the sample URL

   http://zacks.thestreet.com/CompanyView.php?ticker=AAPL

What would you like your scrapeEarnings function to return? The entire table? Here's a Python 3 example that parses it into Python objects using Beautiful Soup:

from bs4 import BeautifulSoup
import urllib.request

class Earning(object):
    """One row of the earnings table: six <td> cells in column order."""
    def __init__(self, table_row):
        # Unpacking raises ValueError if a row doesn't have exactly six cells.
        (self.date,
         self.period_ending,
         self.estimate,
         self.reported,
         self.surprise,
         self.surprise_percent) = [td.text.strip() for td in table_row("td")]

    def __str__(self):
        return "\t".join((self.date, self.period_ending, self.estimate,
                          self.reported, self.surprise, self.surprise_percent))

class Earnings(object):
    """All earnings rows from the second <table> inside id="divPrint"."""
    def __init__(self, soup):
        self.soup = soup
        self.earnings_table = soup.find(id="divPrint")("table")[1]
        self.earnings_rows = self.earnings_table("tr")[1:]  # skip the header row
        self.earnings = [Earning(e) for e in self.earnings_rows]

    def __str__(self):
        return "\n".join(str(e) for e in self.earnings)

def getEarningsForTicker(ticker):
    url = "http://zacks.thestreet.com/CompanyView.php?ticker={0}".format(ticker)
    with urllib.request.urlopen(url) as response:
        # Name the parser explicitly; recent bs4 versions warn when it is omitted.
        return Earnings(BeautifulSoup(response, "html.parser"))

def main():
    print(getEarningsForTicker('AAPL'))

if __name__ == '__main__':
    main()
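If Beautiful Soup isn't available, roughly the same extraction can be done with the standard library's html.parser. This is a sketch, not a drop-in replacement: like the code above, it assumes the earnings sit in the second table inside the element with id="divPrint" and that the first row is a header; it additionally assumes that element is a div and that the tables are not nested. The names DivPrintTables and earnings_rows and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class DivPrintTables(HTMLParser):
    """Collect <td> texts from every <table> inside <div id="divPrint">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while inside the divPrint container
        self.tables = []    # tables -> rows -> cell texts
        self.cell = None    # text pieces of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and (self.div_depth or dict(attrs).get('id') == 'divPrint'):
            self.div_depth += 1          # entering (or nesting inside) divPrint
        elif not self.div_depth:
            return                       # ignore everything outside divPrint
        elif tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self.tables[-1].append([])
        elif tag == 'td' and self.tables and self.tables[-1]:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_depth:
            self.div_depth -= 1
        elif tag == 'td' and self.cell is not None:
            self.tables[-1][-1].append(''.join(self.cell).strip())
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def earnings_rows(html):
    """Return six-cell data rows from the second table, skipping its header row."""
    parser = DivPrintTables()
    parser.feed(html)
    parser.close()
    if len(parser.tables) < 2:
        return []
    return [row for row in parser.tables[1][1:] if len(row) == 6]


# Made-up markup with the assumed layout.
sample = '''
<div id="divPrint">
  <table><tr><td>Company header</td></tr></table>
  <table>
    <tr><td>Date</td><td>Period Ending</td><td>Estimate</td>
        <td>Reported</td><td>Surprise</td><td>Surprise %</td></tr>
    <tr><td>1/1/2000</td><td>12/1999</td><td>1.00</td>
        <td>1.10</td><td>0.10</td><td>10.00%</td></tr>
  </table>
</div>
'''
for row in earnings_rows(sample):
    print('\t'.join(row))
```

The trade-off is that html.parser gives you events rather than a tree, so the container and table tracking that Beautiful Soup does for free has to be kept by hand.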


Author Comment

by:earngreen
ID: 40520120
clockwatcher, that worked out great. Thanks!