adbyits
asked on
How do I split my website scraping results?
Hi all, I have scraped a website and got the data (I think :)). I want to know how I can split it all apart and save each item of text as its own list element. In the end I want to save the data to a MySQL database. Here is my code; please let me know if you feel anything else could be changed.
from requests_html import HTMLSession
# create an HTML Session object
session = HTMLSession()
# Use the object above to connect to needed webpage
resp = session.get("https://www.adelaideairport.com.au/flight-information/flight-search/")
# Run JavaScript code on webpage
resp.html.render()
# parse <span class="with-image"> elements containing airline names
airline_list = []
airline_spans = resp.html.find('.SearchResultFlightListRow')
for span in airline_spans:
    airline_list.append(span.text)
print(airline_list)
You should ask them for REST API access.
ASKER
@Norie thanks mate, I get the first part, but what are the 2nd and 3rd parts doing?
Not sure what you mean.
If you mean the elif parts of the code then they are being used to handle when there's data missing, e.g. a missing gate number.
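The length checks described in the reply above could also be collapsed into a single padding step. This is only a minimal sketch, assuming (as the reply implies) that the missing values are always the trailing ones, gate and status. `FIELDS` and `normalise_flight` are hypothetical names, not part of the code posted in this thread.

```python
# Field names are illustrative; "origin" is used because "from" is a Python keyword.
FIELDS = ("flightno", "origin", "to", "scheduled", "estimated", "gate", "status")


def normalise_flight(parts, filler="N/A"):
    """Pad a split row to seven fields and return it as a dict."""
    if not 5 <= len(parts) <= len(FIELDS):
        raise ValueError(f"expected 5 to 7 fields, got {len(parts)}")
    # Append the filler once for every trailing field that is missing.
    padded = list(parts) + [filler] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, padded))
```

Calling `normalise_flight(flight)` inside the loop would then give every flight the same seven keys, whether or not the page showed a gate or a status.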
ASKER
OK cool, I understand now. Thanks for the added code.
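For the MySQL step mentioned in the original question, one possible shape is a parameterised INSERT run through `executemany`. This is a sketch only: the table name `flights` and its column names are hypothetical, and `save_flights` is a hypothetical helper that accepts any DB-API connection, for example one returned by `mysql.connector.connect(...)` from the mysql-connector-python package, which accepts the `%s` placeholder style used here.

```python
# Hypothetical table and column names; adjust to the real schema.
INSERT_SQL = (
    "INSERT INTO flights "
    "(flight_no, origin, destination, scheduled, estimated, gate, status) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def save_flights(conn, flights):
    """Insert rows of seven values each and return how many rows were sent."""
    rows = [tuple(flight) for flight in flights]
    if any(len(row) != 7 for row in rows):
        raise ValueError("every flight needs exactly seven values")
    cursor = conn.cursor()
    try:
        # Placeholders let the driver quote the values, avoiding SQL injection.
        cursor.executemany(INSERT_SQL, rows)
        conn.commit()
    finally:
        cursor.close()
    return len(rows)
```

Rows that are missing a gate or status would need to be padded to seven values before being passed in.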
ASKER
OK, here is the code now, mate. I am getting an error:
  File "sa_new.py", line 13, in <module>
    airline_list = [span.text.split('\n') for span in airline_list]
NameError: name 'airline_list' is not defined
from requests_html import HTMLSession
# create an HTML Session object
session = HTMLSession()
# Use the object above to connect to needed webpage
resp = session.get("https://www.adelaideairport.com.au/flight-information/flight-search/")
# Run JavaScript code on webpage
resp.html.render()
# parse <span class="with-image"> elements containing airline names
airline_list = [span.text.split('\n') for span in airline_spans]
for flight in airline_list:
    if len(flight) == 7:
        flightno, From, to, scheduled, estimated, gate, status = flight
    elif len(flight) == 6:
        flightno, From, to, scheduled, estimated, gate = flight
        status = 'N/A'
    elif len(flight) == 5:
        flightno, From, to, scheduled, estimated = flight
        gate = 'N/A'
    #replace with code to add flight details to database
    print (f'Flight no {flightno} from {From} to {to} is scheduled to depart at {scheduled} from gate {gate}')
print(airline_list)
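The NameError comes from the list comprehension reading a name before anything has been assigned to it: in the listing above, `airline_spans` is never created. A possible fix, sketched under the assumption that the `.SearchResultFlightListRow` selector from the original question is still the right one, is to run `find` first and only then split the text. `rows_from_spans` and `fetch_flight_rows` are hypothetical helper names.

```python
def rows_from_spans(spans):
    """Split each element's text into one list of fields per row."""
    return [span.text.split("\n") for span in spans]


def fetch_flight_rows(url, selector=".SearchResultFlightListRow"):
    """Render the page and return the split text of every matching row."""
    # requests_html is a third-party package; it is imported here so that
    # rows_from_spans can be used without it.
    from requests_html import HTMLSession

    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()                        # run the page's JavaScript
    airline_spans = resp.html.find(selector)  # define the spans first...
    return rows_from_spans(airline_spans)     # ...then split them
```

With this ordering, `airline_list = fetch_flight_rows("https://www.adelaideairport.com.au/flight-information/flight-search/")` would take the place of the failing line.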
ASKER
OK mate, I got it working. The one thing I am not seeing now that I was getting with my own code is the airline name, and this is one big thing I need.
The airlines aren't included in airline_spans.
ASKER
It is, mate; it's the title of the logo:
<img alt="Qantas Airlines" title="Qantas Airlines" src="https://www.adelaideairport.com.au/wp-content/uploads/2018/01/1280px-Qantas_Airways_logo_2016-top_padding.png" data-width="1280" data-height="826">
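If each flight row contains the logo image, the airline name could be read from the image's `title` attribute instead of from the row text. This is a minimal sketch using the `find(..., first=True)` method and the `attrs` dictionary of requests_html elements. Whether the `img` actually sits inside each `.SearchResultFlightListRow` element is an assumption that would need checking against the page, and `airline_name` is a hypothetical helper.

```python
def airline_name(row, default="N/A"):
    """Return the airline name from the first <img> inside a flight row."""
    img = row.find("img", first=True)  # None when the row has no image
    if img is None:
        return default
    # Prefer the title attribute and fall back to alt, then to the default.
    return img.attrs.get("title") or img.attrs.get("alt") or default
```

The result could be stored next to the split fields, for example with `[airline_name(span)] + span.text.split('\n')` for each span.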