[Webinar] Streamline your web hosting managementRegister Today

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 462
  • Last Modified:

Python scraper

I am trying to scrape a website for some information. I found a script and tried to convert it to python but the conversion still has some errors. I wondered if anyone can assist with the errors. Thanks

def scrapeEarningsZacks_(Stock=None,*args,**kwargs):

    varargin = cellarray(args)

    nargin = 1-[Stock].count(None)+len(args)



    s=urlread_(char('http://zacks.thestreet.com/CompanyView.php'),char('post'),[char('ticker'),Stock])

    try:

        etst=strfind_(s,char('Surprise%</strong></div></td>'))

    finally:

        pass

    etend=strfind_(s[etst:end()],char(' </table>'))

    et=s[etst:etst + etend]

    rowend=strfind_(et,char('</tr>'))

    earnings=cell_(length_(rowend) - 2,6)

    for i in arange_(1,(length_(rowend) - 1)).reshape(-1):

        if i == length_(rowend):

            row=et[rowend[i]:end()]

        else:

            row=et[rowend[i]:rowend[i + 1]]

        dst=strfind_(row,char('<td>'))

        for j in arange_(1,6).reshape(-1):

            if j == 6:

                a=row[dst[j]:end() - 23]

            else:

                a=row[dst[j]:dst[j + 1]]

            earnings[i,j]=a[5:(end() - 38)]

    emptyCells=cellfun_(isempty,earnings)

    row,col=find_(emptyCells,nargout=2)

    earnings[row,:]=[]

    return earnings

print scrapeEarningsZacks_(AAPL)

Open in new window

0
earngreen
Asked:
earngreen
1 Solution
 
aikimarkCommented:
Have you tried passing a string into the function?
print scrapeEarningsZacks_("AAPL")

Open in new window

0
 
Dave BaldwinFixer of ProblemsCommented:
It looks like all the data on that page is posted thru javascript.  Your code will not run the javascript to get the data so it is unlikely that you will be able to scrape that page.  In particular, the input for selecting a stock is done with javascript.  It is not something you can 'post' to and get a result.  This is that code:

<input  type="text" name="search_company"  id="search_company" value="Enter company name" size=18 onFocus="JavaScript:this.value=''" onBlur="JavaScript:Fill_Lookup()" onkeyup="get_ticker_info();">
0
 
aikimarkCommented:
are you running this code in Windows or Linux?
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
earngreenAuthor Commented:
This is Linux
0
 
aikimarkCommented:
what libraries have you imported?
0
 
clockwatcherCommented:
It would probably be easier if you just told us what you're hoping to return rather than fix whatever is going on with that code that you have there.

From the sample URL

   http://zacks.thestreet.com/CompanyView.php?ticker=AAPL

What would you like your scrapeEarnings to return?  The entire table?    Here's a python3 example of parsing that into python objects using beautiful soup:

from bs4 import BeautifulSoup
import urllib.request

class Earning(object):
    def __init__(self, table_row):
        (self.date, 
         self.period_ending,
         self.estimate,
         self.reported,
         self.surprise,
         self.surprise_percent) = [i.text for i in table_row("td")]

    def __str__(self):
        return "\t".join((self.date, self.period_ending, self.estimate,
                         self.reported, self.surprise, self.surprise_percent))

class Earnings(object):
    def __init__(self, soup):
        self.soup = soup
        self.earnings_table = soup.find(id="divPrint")("table")[1]
        self.earnings_rows = self.earnings_table("tr")[1:]
        self.earnings = [Earning(e) for e in self.earnings_rows]

    def __str__(self):
        return "\n".join([str(e) for e in self.earnings])

def getEarningsForTicker(ticker):
    url = "http://zacks.thestreet.com/CompanyView.php?ticker={0}".format(ticker)
    return Earnings(BeautifulSoup(urllib.request.urlopen(url)))

def main():
    print(getEarningsForTicker('AAPL'))

if __name__ == '__main__':
    main()

Open in new window

0
 
earngreenAuthor Commented:
clockwatcher that worked out great. thx
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now