Need help parsing HTML into a Python DataFrame

Leo Torres asked
Last Modified: 2020-04-23
Trying to extract this into a DataFrame:
Time					Link																																											Description																									Source
Apr-22-20 01:30AM       https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html																							STMicro Sees Declining Demand for Automotive Chips Next Quarter												Bloomberg																								
Apr-21-20 10:43PM		https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-crude-oil-prices-test-coronavirus-stock-market-rally-netflix-snap-chipotle-earnings/?src=A00220		Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers		Investor's Business Daily
09:31PM					https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html																											Facebook to Invest $5.7 Billion in Ambanis Jio Platforms													Bloomberg
08:00PM					https://finance.yahoo.com/news/plastic-bags-making-comeback-last-000001077.html																									Plastic Bags Are Making a Comeback. Will It Last?															Bloomberg
07:27PM					https://finance.yahoo.com/news/rpt-bluetooth-phone-apps-tracking-232727649.html																									RPT-Bluetooth phone apps for tracking COVID-19 show modest early results									Reuters
Apr-20-20 09:00PM		https://finance.yahoo.com/news/jerremy-newsome-shares-rules-options-010014004.html																								Jerremy Newsome Shares The Rules For His Options Strategy													Benzinga

from this piece of HTML:
<table width="100%" cellpadding="0" cellspacing="0"><tr><td><table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr><td width="130" align="right" style="white-space:nowrap">Apr-22-20 01:30AM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html" target="_blank" class="tab-link-news">STMicro Sees Declining Demand for Automotive Chips Next Quarter</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-21-20 10:43PM&nbsp;&nbsp;</td><td align="left"><a href="https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-crude-oil-prices-test-coronavirus-stock-market-rally-netflix-snap-chipotle-earnings/?src=A00220" target="_blank" class="tab-link-news">Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers</a> <span style="color:#aa6dc0;font-size:9px">Investor's Business Daily</span></td></tr>
<tr><td width="130" align="right">09:31PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html" target="_blank" class="tab-link-news">Facebook to Invest $5.7 Billion in Ambanis Jio Platforms</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">08:00PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/plastic-bags-making-comeback-last-000001077.html" target="_blank" class="tab-link-news">Plastic Bags Are Making a Comeback. Will It Last?</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">07:27PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/rpt-bluetooth-phone-apps-tracking-232727649.html" target="_blank" class="tab-link-news">RPT-Bluetooth phone apps for tracking COVID-19 show modest early results</a> <span style="color:#aa6dc0;font-size:9px">Reuters</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-20-20 09:00PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/jerremy-newsome-shares-rules-options-010014004.html" target="_blank" class="tab-link-news">Jerremy Newsome Shares The Rules For His Options Strategy</a> <span style="color:#aa6dc0;font-size:9px">Benzinga</span></td></tr>
</table>

Leo Torres (Author) commented:
Figured it out on my own eventually

import pandas as pd
from finviz.helper_functions.request_functions import http_request_get

STOCK_URL = 'https://finviz.com/quote.ashx'


def get_news2(ticker):
    """
    Returns a DataFrame of news items (datetime, description, source, url) for a ticker

    :param ticker: stock symbol
    :return: pandas.DataFrame
    """
    page_parsed, _ = http_request_get(url=STOCK_URL, payload={'t': ticker}, parse=True)
    table = page_parsed.cssselect('table[class="fullview-news-outer"]')[0]
    all_news = page_parsed.cssselect('a[class="tab-link-news"]')
    # Each row yields four text nodes: timestamp, headline, a lone space, source
    headers = ['Datetime', 'Description', 'Space', 'Source']
    urls = [row.get('href') for row in all_news]
    data = [dict(zip(headers, row.xpath('td//text()'))) for row in table]
    df1 = pd.DataFrame(urls)
    df2 = pd.DataFrame(data)
    # Rows and links are in the same order, so join them on the positional index
    mergedDf = df2.merge(df1, left_index=True, right_index=True)
    mergedDf = mergedDf.rename(columns={0: "url"})
    mergedDf = mergedDf.drop(['Space'], axis=1)
    mergedDf['ticker'] = ticker
    return mergedDf
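
For anyone who only has the HTML snippet and not the finviz helper package, here is a rough alternative sketch using just the standard library's html.parser. The names NewsTableParser, parse_news_table and SAMPLE_HTML are made up for illustration, and it assumes every news row looks like the ones in the question (a time cell, then an a.tab-link-news link followed by a span holding the source). It is not what finviz does internally.

```python
from html.parser import HTMLParser

# Two rows copied from the question's HTML, used as sample input.
SAMPLE_HTML = """
<table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr><td width="130" align="right" style="white-space:nowrap">Apr-22-20 01:30AM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html" target="_blank" class="tab-link-news">STMicro Sees Declining Demand for Automotive Chips Next Quarter</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">09:31PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html" target="_blank" class="tab-link-news">Facebook to Invest $5.7 Billion in Ambanis Jio Platforms</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
</table>
"""


class NewsTableParser(HTMLParser):
    """Hypothetical helper: collects one dict per news <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._row = None     # dict being built for the current <tr>
        self._field = None   # field that the next text node belongs to
        self._td_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = {"Time": "", "Link": "", "Description": "", "Source": ""}
            self._td_count = 0
        elif self._row is None:
            return
        elif tag == "td":
            self._td_count += 1
            # Only the first cell of a row holds the timestamp
            self._field = "Time" if self._td_count == 1 else None
        elif tag == "a" and attrs.get("class") == "tab-link-news":
            self._row["Link"] = attrs.get("href", "")
            self._field = "Description"
        elif tag == "span":
            self._field = "Source"

    def handle_endtag(self, tag):
        if tag in ("td", "a", "span"):
            self._field = None
        elif tag == "tr" and self._row is not None:
            if self._row["Link"]:  # skip wrapper rows that have no news link
                # &nbsp; arrives as \xa0; normalise it and trim each field
                cleaned = {k: v.replace("\xa0", " ").strip() for k, v in self._row.items()}
                self.rows.append(cleaned)
            self._row = None

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] += data


def parse_news_table(html):
    """Return a list of dicts with Time, Link, Description and Source keys."""
    parser = NewsTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_news_table(SAMPLE_HTML)
for row in rows:
    print(row["Time"], "|", row["Source"], "|", row["Description"])
```

Passing the result to pd.DataFrame(rows) should give the Time / Link / Description / Source columns asked for above. Rows that only carry a time (like 09:31PM) keep just the time, same as in the desired output.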
