adbyits

asked on

How do I split my website scraping results?

Hi all, so I have scraped a website and got the data (I think :)). I want to know how I can split it all apart and save each item of text in its own list element. At the end of it I want to save the data to a MySQL database. Here is my code; please let me know if you feel anything else can be changed.

from requests_html import HTMLSession

# create an HTML Session object
session = HTMLSession()

# Use the object above to connect to needed webpage
resp = session.get("https://www.adelaideairport.com.au/flight-information/flight-search/")

# Run JavaScript code on webpage
resp.html.render()

# collect the text of each flight row (elements with class SearchResultFlightListRow)
airline_list = []
airline_spans = resp.html.find('.SearchResultFlightListRow')
for span in airline_spans:
    airline_list.append(span.text)



print(airline_list)
Norie

This will extract the flight no, from, to, scheduled departure etc. for each flight.

airline_list = [span.text.split('\n') for span in airline_spans]

for flight in airline_list:
    if len(flight) == 7:
        flightno, From, to, scheduled, estimated, gate, status = flight 
    elif len(flight) == 6:
        flightno, From, to, scheduled, estimated, gate = flight 
        status = 'N/A'
    elif len(flight) == 5:
        flightno, From, to, scheduled, estimated = flight
        gate = 'N/A'
        status = 'N/A'
    else:
        # skip rows that don't match any of the known layouts
        continue

    # replace with code to add flight details to database
    print(f'Flight no {flightno} from {From} to {to} is scheduled to depart at {scheduled} from gate {gate}')
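
For the database step, here is a minimal sketch of what could replace that placeholder comment. It is only an illustration: the table name `flights`, its column names, and the helper names `add_flight` and `save_flights` are assumptions, not anything taken from the site or the original code. It works with any DB-API connection that uses `%s` placeholders, which is the style MySQL drivers use.

```python
# Sketch only: the table and column names below are assumed, adjust to your schema.
INSERT_SQL = (
    "INSERT INTO flights "
    "(flight_no, origin, destination, scheduled, estimated, gate, status) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def add_flight(cursor, flightno, origin, destination, scheduled, estimated,
               gate="N/A", status="N/A"):
    """Insert one flight using a parameterized query; missing gate/status default to 'N/A'."""
    cursor.execute(
        INSERT_SQL,
        (flightno, origin, destination, scheduled, estimated, gate, status),
    )


def save_flights(conn, airline_list):
    """Insert every row that has 5 to 7 fields and return how many were saved."""
    cursor = conn.cursor()
    saved = 0
    for flight in airline_list:
        if 5 <= len(flight) <= 7:
            # *flight spreads the 5, 6 or 7 values over the parameters
            add_flight(cursor, *flight)
            saved += 1
    conn.commit()
    cursor.close()
    return saved


# Usage, assuming the mysql-connector-python package and an existing table:
# import mysql.connector
# conn = mysql.connector.connect(host="localhost", user="your_user",
#                                password="your_password", database="your_db")
# save_flights(conn, airline_list)
```

Passing the values as a tuple rather than formatting them into the SQL string lets the driver handle quoting, which matters when the text comes from a scraped page.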


You should ask them for REST API access.
adbyits

ASKER

@Norie thanks mate, I get the first part, but what are the 2nd and 3rd parts doing?
Not sure what you mean.

If you mean the elif parts of the code then they are being used to handle when there's data missing, e.g. a missing gate number.
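
To see the missing-data handling in isolation, here is a small sketch that does the same job as the if/elif chain by padding short rows with 'N/A'. The function name `parse_flight`, the field labels, and the sample rows are made up for illustration, and it assumes, like the code above, that any missing fields are the trailing ones.

```python
# Field labels are assumed from the description above; they are not read from the page.
FIELDS = ["flightno", "origin", "destination", "scheduled", "estimated", "gate", "status"]


def parse_flight(text):
    """Split one row's text on newlines and label each part, filling gaps with 'N/A'."""
    parts = text.split("\n")
    parts += ["N/A"] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))


# Made-up sample rows, not real data from the site
full = parse_flight("QF123\nAdelaide\nSydney\n10:00\n10:05\n12\nBoarding")
short = parse_flight("QF456\nAdelaide\nPerth\n11:00\n11:00")

print(full["gate"], full["status"])    # 12 Boarding
print(short["gate"], short["status"])  # N/A N/A
```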
adbyits

ASKER

OK cool, I understand now. Thanks for the added code.
adbyits

ASKER

OK, here is the code now, mate:
from requests_html import HTMLSession

# create an HTML Session object
session = HTMLSession()

# Use the object above to connect to needed webpage
resp = session.get("https://www.adelaideairport.com.au/flight-information/flight-search/")

# Run JavaScript code on webpage
resp.html.render()

# parse <span class="with-image"> elements containing airline names
airline_list = [span.text.split('\n') for span in airline_spans]

for flight in airline_list:
    if len(flight) == 7:
        flightno, From, to, scheduled, estimated, gate, status = flight
    elif len(flight) == 6:
        flightno, From, to, scheduled, estimated, gate = flight
        status = 'N/A'
    elif len(flight) == 5:
        flightno, From, to, scheduled, estimated = flight
        gate = 'N/A'

#replace with code to add flight details to database
print (f'Flight no {flightno} from  {From} to {to} is scheduled to depart at {scheduled} from gate {gate}')



print(airline_list)



I am getting an error:
  File "sa_new.py", line 13, in <module>
    airline_list = [span.text.split('\n') for span in airline_list]
NameError: name 'airline_list' is not defined
adbyits

ASKER

OK mate, I got it working. The one thing I am no longer seeing, which I was getting with my code, is the airline name. This is one big thing I need.
The airlines aren't included in airline_spans.
adbyits

ASKER

It is, mate. It's the title of the logo:

<img alt="Qantas Airlines" title="Qantas Airlines" src="https://www.adelaideairport.com.au/wp-content/uploads/2018/01/1280px-Qantas_Airways_logo_2016-top_padding.png" data-width="1280" data-height="826">
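
As a rough illustration of reading that title, here is a sketch that pulls the `title` attribute out of the tag above using only Python's standard-library `html.parser`, which is a swapped-in technique rather than what the scraper itself uses. The class name `ImgTitleParser` is made up. With requests_html the same value should be reachable through an element's `attrs` dictionary, assuming the img can be found from the element you are looping over.

```python
from html.parser import HTMLParser


class ImgTitleParser(HTMLParser):
    """Collect the title attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "img":
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)


snippet = (
    '<img alt="Qantas Airlines" title="Qantas Airlines" '
    'src="https://www.adelaideairport.com.au/wp-content/uploads/2018/01/'
    '1280px-Qantas_Airways_logo_2016-top_padding.png" '
    'data-width="1280" data-height="826">'
)

parser = ImgTitleParser()
parser.feed(snippet)
print(parser.titles)  # ['Qantas Airlines']
```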