Leo Torres (United States of America) asked:

Need help parsing HTML into a Python DataFrame

I'm trying to extract the following into a DataFrame:
```text
Time              | Link                                                                                                                                                                      | Description                                                                                             | Source
Apr-22-20 01:30AM | https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html                                                                                    | STMicro Sees Declining Demand for Automotive Chips Next Quarter                                         | Bloomberg
Apr-21-20 10:43PM | https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-crude-oil-prices-test-coronavirus-stock-market-rally-netflix-snap-chipotle-earnings/?src=A00220 | Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers | Investor's Business Daily
09:31PM           | https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html                                                                                                   | Facebook to Invest $5.7 Billion in Ambanis Jio Platforms                                                | Bloomberg
08:00PM           | https://finance.yahoo.com/news/plastic-bags-making-comeback-last-000001077.html                                                                                           | Plastic Bags Are Making a Comeback. Will It Last?                                                       | Bloomberg
07:27PM           | https://finance.yahoo.com/news/rpt-bluetooth-phone-apps-tracking-232727649.html                                                                                           | RPT-Bluetooth phone apps for tracking COVID-19 show modest early results                                | Reuters
Apr-20-20 09:00PM | https://finance.yahoo.com/news/jerremy-newsome-shares-rules-options-010014004.html                                                                                        | Jerremy Newsome Shares The Rules For His Options Strategy                                               | Benzinga
```


from this piece of HTML:

```html
<table width="100%" cellpadding="0" cellspacing="0"><tr><td><table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr><td width="130" align="right" style="white-space:nowrap">Apr-22-20 01:30AM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html" target="_blank" class="tab-link-news">STMicro Sees Declining Demand for Automotive Chips Next Quarter</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-21-20 10:43PM&nbsp;&nbsp;</td><td align="left"><a href="https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-crude-oil-prices-test-coronavirus-stock-market-rally-netflix-snap-chipotle-earnings/?src=A00220" target="_blank" class="tab-link-news">Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers</a> <span style="color:#aa6dc0;font-size:9px">Investor's Business Daily</span></td></tr>
<tr><td width="130" align="right">09:31PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html" target="_blank" class="tab-link-news">Facebook to Invest $5.7 Billion in Ambanis Jio Platforms</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">08:00PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/plastic-bags-making-comeback-last-000001077.html" target="_blank" class="tab-link-news">Plastic Bags Are Making a Comeback. Will It Last?</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">07:27PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/rpt-bluetooth-phone-apps-tracking-232727649.html" target="_blank" class="tab-link-news">RPT-Bluetooth phone apps for tracking COVID-19 show modest early results</a> <span style="color:#aa6dc0;font-size:9px">Reuters</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-20-20 09:00PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/jerremy-newsome-shares-rules-options-010014004.html" target="_blank" class="tab-link-news">Jerremy Newsome Shares The Rules For His Options Strategy</a> <span style="color:#aa6dc0;font-size:9px">Benzinga</span></td></tr>
```


Tags: Python 3, web scraping, HTML
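For reference, the HTML in the question can also be parsed offline, without going through the finviz helpers. This is a minimal sketch, assuming `lxml` and `pandas` are installed; `news_table_to_df` and `SAMPLE_HTML` are hypothetical names, and the sample is just the first three rows of the HTML above. It reads the link from the same `<tr>` as its time and source, so rows cannot drift out of alignment.

```python
import lxml.html
import pandas as pd

# First three rows of the news-table HTML from the question (hypothetical sample)
SAMPLE_HTML = """<table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr><td width="130" align="right" style="white-space:nowrap">Apr-22-20 01:30AM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html" target="_blank" class="tab-link-news">STMicro Sees Declining Demand for Automotive Chips Next Quarter</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-21-20 10:43PM&nbsp;&nbsp;</td><td align="left"><a href="https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-crude-oil-prices-test-coronavirus-stock-market-rally-netflix-snap-chipotle-earnings/?src=A00220" target="_blank" class="tab-link-news">Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers</a> <span style="color:#aa6dc0;font-size:9px">Investor's Business Daily</span></td></tr>
<tr><td width="130" align="right">09:31PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html" target="_blank" class="tab-link-news">Facebook to Invest $5.7 Billion in Ambanis Jio Platforms</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
</table>"""


def news_table_to_df(html_text):
    """Build a Time/Link/Description/Source DataFrame from a finviz-style news table."""
    root = lxml.html.fromstring(html_text)
    records = []
    # '//' searches the whole parsed document, so this works on a fragment or a full page
    for tr in root.xpath('//table[@id="news-table"]//tr'):
        cells = tr.xpath('./td')
        links = tr.xpath('.//a[@class="tab-link-news"]')
        if len(cells) < 2 or not links:
            continue  # skip rows that are not news items
        spans = cells[1].xpath('.//span')
        records.append({
            # text_content() turns &nbsp; into '\xa0', which str.strip() removes
            'Time': cells[0].text_content().strip(),
            # The href comes from the same row as the time and source
            'Link': links[0].get('href'),
            'Description': links[0].text_content().strip(),
            'Source': spans[0].text_content().strip() if spans else None,
        })
    return pd.DataFrame(records, columns=['Time', 'Link', 'Description', 'Source'])


df = news_table_to_df(SAMPLE_HTML)
print(df[['Time', 'Source']])
```

Time-only rows such as `09:31PM` are kept as they appear on the page; they belong to the most recent full date above them.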


Leo Torres, 8/22/2022 (Mon):

I eventually figured it out on my own:

```python
import pandas as pd
from finviz.helper_functions.request_functions import http_request_get

STOCK_URL = 'https://finviz.com/quote.ashx'


def get_news2(ticker):
    """Return a DataFrame of news headlines, sources and URLs for a stock.

    :param ticker: stock symbol, e.g. 'AAPL'
    :return: pandas.DataFrame
    """
    # Download the quote page and parse it into an lxml tree
    page_parsed, _ = http_request_get(url=STOCK_URL, payload={'t': ticker}, parse=True)

    # The news rows live in <table class="fullview-news-outer">
    table = page_parsed.cssselect('table[class="fullview-news-outer"]')[0]
    all_news = page_parsed.cssselect('a[class="tab-link-news"]')

    # Each <tr> yields four text nodes: time, headline, the space before <span>, source
    headers = ['Datetime', 'Description', 'Space', 'Source']
    urls = [row.get('href') for row in all_news]
    data = [dict(zip(headers, row.xpath('td//text()'))) for row in table]

    # Rows and links appear in the same order, so align them on the index
    merged_df = pd.DataFrame(data).merge(pd.DataFrame({'url': urls}),
                                         left_index=True, right_index=True)
    merged_df = merged_df.drop(columns=['Space'])
    # Strip the trailing &nbsp;&nbsp; (\xa0) from the time cells
    merged_df['Datetime'] = merged_df['Datetime'].str.strip()
    merged_df['ticker'] = ticker
    return merged_df
```
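In the DataFrame returned by `get_news2`, `Datetime` is still text, and rows posted on the same day as the row above carry only a time (for example `09:31PM`, which belongs to Apr-21-20). A sketch for turning that column into real timestamps, assuming every cell follows the `Mon-DD-YY hh:mmAM/PM` or `hh:mmAM/PM` pattern seen in the question; `add_timestamps` is a hypothetical helper, not part of finviz:

```python
import pandas as pd


def add_timestamps(df, col='Datetime'):
    """Return a copy of df with a 'timestamp' column parsed from finviz-style time text."""
    text = df[col].str.strip()
    # Optional date part ('Apr-21-20') followed by a 12-hour time ('10:43PM')
    parts = text.str.extract(r'^(?:(?P<date>\w{3}-\d{2}-\d{2})\s+)?(?P<time>\d{2}:\d{2}[AP]M)$')
    # A time-only row inherits the most recent date above it
    dates = parts['date'].ffill()
    out = df.copy()
    out['timestamp'] = pd.to_datetime(dates + ' ' + parts['time'], format='%b-%d-%y %I:%M%p')
    return out


sample = pd.DataFrame({'Datetime': ['Apr-22-20 01:30AM', 'Apr-21-20 10:43PM', '09:31PM', '08:00PM']})
print(add_timestamps(sample))
```

The forward fill relies on the rows staying in page order, so apply it before any sorting.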

