Could you suggest how to automate information transfers between systems in this challenging scenario?

Eduardo Fuerte used Ask the Experts™
Hi Experts

Could you suggest how to automate information transfers between systems in this challenging scenario?

A web site called PMS stores all the information we need to create our booking portal: it holds the list of houses, the availability calendar of these houses, photos, descriptions, and the negotiated prices.

Getting this information, however, is a difficult mission, as PMS does not provide any integration APIs. The customer can provide us with credentials for the PMS, which is accessible through a web browser.

Could you outline a possible solution for this scenario in general lines, to give us a starting point? Maybe by using a robot (bot)?

Thanks in advance!
David Favor, Fractional CTO
Distinguished Expert 2018
Commented:
If I understand correctly:

1) You have a booking system for houses, similar to AirBNB.

2) Data lives on some website somewhere.

3) You're reselling the data, maybe as a booking agent.

A very common practice.

Without an API in place to pull data, this means you'll have to scrape data off the site as HTML, then parse the data.
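To make "scrape the HTML, then parse the data" concrete, here is a minimal sketch using only Python's standard library. The page structure and the `house-name` class are hypothetical; a real PMS page would need its own selectors, and the HTML would come from a live request made with the customer's credentials rather than the canned snippet used here.

```python
# Minimal sketch of parsing scraped listing HTML with Python's stdlib.
# The class name "house-name" is a hypothetical stand-in for whatever
# markers the real PMS pages use.
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Collects the text of every element whose class is 'house-name'."""
    def __init__(self):
        super().__init__()
        self._in_name = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if ("class", "house-name") in attrs:
            self._in_name = True

    def handle_endtag(self, tag):
        self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self.names.append(data.strip())

# In reality this would come from urllib.request.urlopen(...) after
# logging in; a static snippet stands in for the fetched page here.
PAGE = """
<div class="house-name">Casa Azul</div>
<div class="house-name">Villa Verde</div>
"""

parser = ListingParser()
parser.feed(PAGE)
print(parser.names)
```

The fragility David describes lives in the parser: rename that one class on the PMS side and `names` silently comes back empty.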

This is incredibly complex to get working, then keep working, because each time the website HTML changes in some minor way, you'll potentially have to rework large amounts of code on your end. And during the window between the HTML change and your code rework, your income will go to $0.

This is a very tricky way to generate income, as at any minute your income can go to $0 until your dev team does a lot of work.

Also, you must be sure the data site allows scraping, because if they don't, catching + blocking scrapers is trivial.

If I were doing this... first I would use a different data site with an API... If that was impossible, then I'd write a scraper which did a set of test scrapes every few minutes, which would produce known results. Then anytime the known results failed, this would indicate an HTML change on the data site. At this point, I'd put my own site in maintenance mode (rather than producing some random/crazy error) till I'd reworked all my code back to a functional state.
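The "known-result test scrape" idea above can be sketched in a few lines. `parse_listing` is a deliberately toy stand-in for the real scraper, and the probe pages are canned strings rather than live fetches; the point is only the shape of the health check that decides when to flip into maintenance mode.

```python
# Sketch of the known-result test scrape: run the parser against inputs
# whose expected output is known; if any check fails, assume the PMS
# changed its HTML and put the portal into maintenance mode.

def parse_listing(html: str) -> dict:
    """Toy parser: extracts 'name: price' from a one-line listing."""
    name, _, price = html.partition(":")
    return {"name": name.strip(), "price": price.strip()}

# Each probe pairs a page (normally fetched live every few minutes)
# with its known, expected parse result.
PROBES = [
    ("Casa Azul: 120 EUR", {"name": "Casa Azul", "price": "120 EUR"}),
    ("Villa Verde: 95 EUR", {"name": "Villa Verde", "price": "95 EUR"}),
]

def scraper_healthy() -> bool:
    """True only if every test scrape still produces its known result."""
    return all(parse_listing(page) == expected for page, expected in PROBES)

maintenance_mode = not scraper_healthy()
print("maintenance" if maintenance_mode else "serving")
```

A cron job (discussed further down the thread) would run this check on a schedule and toggle the portal's state accordingly.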

Tip: Many sites offer APIs. Better to pick a data site with an API.

Tip: Be sure to code using Selenium or a similar headless-browser tool (PhantomJS is now abandonware), because normal scrapers have no JavaScript or CSS processing, both of which are almost certainly required for this project.
Eduardo Fuerte, Developer and Analyst

Author

Commented:
Hi David

Thank you for such an elaborate reply.

What is the downside of not using a JavaScript/CSS-processing scraper, since only the data obtained is relevant?
noci, Software Engineer
Distinguished Expert 2018

Commented:
Sometimes all the handling is done in CSS/JS and only part of the content is in the HTML.

With a construct like `<div id="adfblae123"></div>`, the CSS can transform that ID, referring e.g. to an image, or the page may even use AJAX to add data to it.
(In the latter case it might be possible to make the AJAX calls yourself, though that would need more intelligent investigation.)
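noci's point can be shown concretely: the raw HTML carries only an empty, ID-bearing div, and the visible data arrives via a separate AJAX call returning JSON. If you can discover that endpoint (e.g. in the browser's Network tab), calling it directly is often easier and more robust than scraping the HTML. The endpoint and JSON payload below are hypothetical and canned for illustration.

```python
# What a plain (non-JS) scraper sees vs. what the page's JavaScript
# actually fetches. The AJAX response is a canned stand-in for a
# hypothetical endpoint like /api/houses/adfblae123.
import json
from html.parser import HTMLParser

RAW_HTML = '<div id="adfblae123"></div>'   # the HTML alone is empty

AJAX_RESPONSE = '{"id": "adfblae123", "photo": "/img/house1.jpg"}'

class DivFinder(HTMLParser):
    """Accumulates all text content found in the document."""
    def __init__(self):
        super().__init__()
        self.text = ""
    def handle_data(self, data):
        self.text += data

finder = DivFinder()
finder.feed(RAW_HTML)
print(repr(finder.text))            # '' -- no data in the HTML itself

data = json.loads(AJAX_RESPONSE)    # the real content is in the JSON
print(data["photo"])
```

This is why a scraper with no JavaScript processing "works" while silently returning nothing useful.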
Hello Eduardo,

You will have to prepare yourself for different Scenarios/Cases:
1) Initially track where exactly and how the information is stored (which div, etc.).
    There are several ways to do this, using Python or even commercial (paid) software.

2) Make sure you are always visiting the current/updated website; be aware of cached pages, cookies, etc.

3) Keep track of the website's update dates.

4) Decide how often you have to re-read/update your data to keep it accurate.

5) Above all, read the website's Terms of Use and license (if these change, you may no longer be allowed to "scrape" the site).
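For point 2 (always fetching the current page, never a cached copy), one common tactic is to send explicit no-cache headers with every request. A minimal sketch with Python's `urllib`, where the URL is a placeholder and the request is only built, not actually sent:

```python
# Sketch: build a GET request that asks intermediate caches to
# revalidate rather than serve a stale copy. The URL is hypothetical.
import urllib.request

def build_fresh_request(url: str) -> urllib.request.Request:
    """Return a Request carrying no-cache headers."""
    return urllib.request.Request(url, headers={
        "Cache-Control": "no-cache",
        "Pragma": "no-cache",           # for legacy HTTP/1.0 caches
    })

req = build_fresh_request("https://pms.example.com/houses")
print(req.get_header("Cache-control"))
```

Note that cookies and login sessions (also mentioned in point 2) would additionally need a cookie jar, e.g. `http.cookiejar` with an opener.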


Finally, I think it is mostly a matter of tactics and, at a later/second stage, a matter of which tools or programming language to use.
;-)
David Favor, Fractional CTO
Distinguished Expert 2018

Commented:
Just because a simple scraper works today doesn't mean it will work tomorrow.

In other words, if you write all your code expecting to never have any Javascript or CSS to process, at this point in time, this likely means your code will eventually fail... just as soon as some simple Javascript is added to the site.

Better to tool all your code today using some system which can handle any future site code change.

Budget is also a consideration: if you have an unlimited budget and can afford scraping plus rewriting your entire system from scratch each time, any approach will work.

Also remember, if a site's TOS (terms of service) prohibits scraping, you'll almost certainly be caught + blocked, which is another key reason to use an API system.
Eduardo Fuerte, Developer and Analyst

Author

Commented:
Hi

Thank you for all the replies.

The scraper application should be triggered by an OS cron job. What do you suggest?
noci, Software Engineer
Distinguished Expert 2018

Commented:
vixie-cron is dead due to lack of maintenance; cronie seems the best successor, if you can choose.
Otherwise, use the one delivered with your Linux/Unix distribution.
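Whichever cron implementation you use, the entry looks the same. A hypothetical crontab line (installed via `crontab -e`) that runs the scraper every 30 minutes; the interpreter and script paths are placeholders:

```shell
# m    h  dom mon dow  command
*/30   *   *   *   *   /usr/bin/python3 /opt/scraper/scrape_pms.py >> /var/log/scraper.log 2>&1
```

Redirecting stderr into the log (`2>&1`) matters here: a silent parse failure after an HTML change is exactly what the test-scrape monitoring discussed earlier is meant to catch.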
Eduardo Fuerte, Developer and Analyst

Author

Commented:
Thank you for the guidance!
