Solved

Present data from different sources in realtime

Posted on 2013-11-26 · 293 Views · Last Modified: 2013-11-26
Hello,

I have several data sources which, let's say (for simplicity), send the data in the following way:

source, id, key, value
.
.
.

I have about 200 such data sources which all send this kind of data when I request them. When the data arrive I need to do some calculations. For example I need to map each of their IDs to my IDs in the database.

What would be the best practice for that? Shall I keep all data in memory and do the calculations there? Currently I use curl to get the data with PHP and then perform all the calculations once all the sources have completed sending data, which is too slow.
Question by:infodigger
 
LVL 4

Expert Comment

by:smeghammer
Hi infodigger,

Several questions:

 - How are your data sources accessed (ODBC, HTTP, text files etc.)? - I assume HTTP
 - What format are your data sources sending the data as (DB records, JSON, ASCII text, SOAP etc.)? - I assume SOAP/JSON/ASCII
 - What is the frequency that the data is updated?
 - Are the different data sources related (i.e. are there dependencies where data from A is needed before data from B etc.)?
 - Do you need to keep a history of old data (i.e. calculations on CURRENT data depend on OLD data)?
 - How complex are the calculations? (it is likely that the time consuming bit will be the retrieval of data)
 - What do you need to do with the results after you have calculated them (present a dynamic web page, write into your own DB etc.)?

It is unclear from your description whether the data sources are PUSHING the data and you have listeners in your PHP code, or whether you are PULLING the data at specified time intervals (regardless of whether the data source has updated); or whether the data is retrieved for every web page request.

In general:
CURL gets data from an HTTP location, which will potentially be slow. If there is no other way of accessing the data, then you are stuck with this method.

If the web page currently retrieves the source data and does all the necessary calculations each time it is called in a browser, then a remote HTTP call is made by CURL, and possibly complex calculations are performed, for every web page request. This will potentially slow down the page load quite considerably. A much better solution is to retrieve the data periodically on your server, process it and cache the result somewhere (in RAM if there is sufficient, as local files, or in your local DB). The web page should then retrieve this LOCAL copy of the processed data - this will be perceived as much faster page loads etc.
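A minimal sketch of that periodic-cache pattern in PHP, for illustration only - the cache path, TTL and JSON format are my assumptions, not anything from your setup:

```php
<?php
// Periodic fetch-and-cache pattern, sketched. A cron job does the
// slow CURL retrieval and calls store_cache(); the web page only
// ever calls load_cached(). CACHE_FILE and CACHE_TTL are
// illustrative - tune the TTL to how often the sources change.

const CACHE_FILE = '/tmp/sources_cache.json';
const CACHE_TTL  = 300; // seconds

function cache_is_fresh(string $file, int $ttl): bool {
    return file_exists($file) && (time() - filemtime($file)) < $ttl;
}

function load_cached(string $file): ?array {
    if (!file_exists($file)) {
        return null;
    }
    $data = json_decode(file_get_contents($file), true);
    return is_array($data) ? $data : null;
}

function store_cache(string $file, array $processed): void {
    // Write to a temp file then rename, so a page request never
    // reads a half-written cache file.
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode($processed));
    rename($tmp, $file);
}
```

The page would fall back to a live fetch (or a "data pending" message) when cache_is_fresh() returns false.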

The format and how that format is processed to retrieve the data can potentially affect performance as well - make sure any string/XML manipulation is as efficient as it can be.

The calculations cannot be done anywhere except in memory, so I am not sure what you mean there - if you mean should you cache the data from the data source, then that will depend on expected frequency of update of the source.

You also suggest that you DO need all the data before you can calculate the result - "Currently I use curl to get data with php and then perform all the calculations once all the sources have completed sending data, which is too slow" - in which case, optimisation of the data processing, calculation and output code should be your first area for investigation.

I have made a lot of assumptions here, so apologies if some of the above is obvious or wrong...

Given what I understand, I would initially look into some kind of asynchronous process to retrieve, process and store the results LOCALLY. I would then code the web pages to retrieve the local result.
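On the retrieval side, PHP's curl_multi functions let the ~200 HTTP requests run in parallel rather than one after another, which on its own can cut the total fetch time dramatically. A rough sketch (the URL list and timeout are placeholders):

```php
<?php
// Fetch many URLs concurrently with curl_multi. Returns an array of
// response bodies keyed the same way as the input $urls array.

function fetch_all(array $urls, int $timeout = 10): array {
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers; curl_multi_select() sleeps until one of
    // the sockets has activity, so this loop does not busy-wait.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

With this, the total wall-clock time is roughly that of the slowest single source, not the sum of all 200.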

Cheers

Author Comment

by:infodigger
Hi smeghammer,

Thank you very much for your extensive answer.

The sources I am getting data from send XML when I send a request. The data changes all the time, like flight tickets for example. So every time a user makes a request, I have to check with every source and get its XML file, so it's no use caching the results on my server.

I can have asynchronous loading of these sources, but somehow they should be matched with each other, so I guess it will need to be a combination of javascript and server-side processing. If you have any example/case/scenario of such a thing please let me know. It is the same way that all the meta search engines work, like flights/hotels/insurance/price comparison/etc.

Thanks a lot!
LVL 4

Expert Comment

by:smeghammer
Ah - it's a portal?

You say you already have async caching, but the cached data need to be matched? What is it that matches them together? Is it just the act of searching for something? Logged-in user ID? If it is possible to 'pre-link' some of this cached data that might help.

Sorry, can't really suggest more without knowing a bit more about the logic of how your process works.

I suspect the deciding factor will be how often the source data is updated. If these are updated very frequently, or at random, then the only choice you have is to make requests in real time on each page load. You can probably do some optimisation for the data sources that are unlikely to change very often (addresses, maps or whatever) and cache these results locally.

Cheers

Author Comment

by:infodigger
smeghammer,

Let's say it's a hotel comparison site (it's a little more complicated, but that will help me explain the process).

You have your own hotels in the database with their ids, and each one has another id for each of the data sources you are receiving data from. For example you can have:

hotelid = 1, expedia_id =203, booking_id =394, etc.

When the user requests the data, you hit each source and the pricing for its id comes back as a result. For example:

expedia_id = 203 | price = $45
booking_id = 394 | price = $59

you need to match those ids with your id so that you present:

hotelid = 1 | price_expedia = $45, price_booking = $59

but the prices from expedia and booking come at different times (for example booking might send its response faster).

As you wait for all the sources to complete, you need to process in the background the data that you have already got and present it.
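In PHP terms, the matching step I mean is roughly this - the map and prices are just the numbers from the example above, and would really come from my database:

```php
<?php
// Map each source's id to our own hotelid (normally loaded from the DB).
$idMap = [
    'expedia' => [203 => 1],   // expedia_id 203 is our hotelid 1
    'booking' => [394 => 1],   // booking_id 394 is our hotelid 1
];

// Merge one source's price into the combined result as it arrives.
function merge_price(array &$hotels, array $idMap,
                     string $source, int $sourceId, string $price): void {
    if (!isset($idMap[$source][$sourceId])) {
        return; // unknown id: skip it
    }
    $hotelId = $idMap[$source][$sourceId];
    $hotels[$hotelId]['price_' . $source] = $price;
}

// Results arrive at different times; merge each as it comes in:
$hotels = [];
merge_price($hotels, $idMap, 'booking', 394, '$59'); // booking answered first
merge_price($hotels, $idMap, 'expedia', 203, '$45');
// $hotels[1] is now ['price_booking' => '$59', 'price_expedia' => '$45']
```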

Here is a good example:
http://www.kayak.co.uk/hotels/Crowne-Plaza-London,Kensington,London,England,United-Kingdom-c28501-h168980-details/2013-12-03/2013-12-06/2guests/expanded/#overview

You will see that as the page loads it does this job, connecting the data it receives with their database and presenting it in realtime.
LVL 4

Accepted Solution

by:smeghammer earned 500 total points
OK... I see the issue clearly now :-)

Other than code optimisation, the obvious approach - unless you do this already, in which case I apologise - is to use AJAX for each of the positions where the comparison price is to be rendered. Your main page will be rendered, and you will get each comparison price displaying as and when it is delivered - this is exactly what the example URL you sent appears to be doing.

The big issue of course with this is cross domain access. You would need to create a bunch of server-side proxy scripts that called each remote service and simply returned the XML. Your AJAX code would call these proxy PHP (I guess..) files, rather than trying to call the remote URLs directly.

Using the above methodology, the ACTUAL HTTP load time would not be any different, but the PERCEIVED page load time would be considerably better, as the main page would return quickly and each data point would fill in over the next few seconds as each AJAX call completed. This way, you don't have to wait for all data to return before performing the calculations and sending the page to the browser.
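Each proxy script can be very small - something like the sketch below, where the remote base URL and parameter name are of course made up for the example; your AJAX code would call this script on YOUR domain instead of the remote service:

```php
<?php
// proxy_source.php - same-origin proxy sketch: forwards the request
// to the remote XML service and echoes the response straight back.
// The base URL and 'hotel_id' parameter are illustrative.

function build_remote_url(string $base, string $id): string {
    return $base . '?hotel_id=' . urlencode($id);
}

$id     = isset($_GET['id']) ? $_GET['id'] : '';
$remote = build_remote_url('https://api.example-source.com/prices', $id);

$ch = curl_init($remote);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$xml = curl_exec($ch);
curl_close($ch);

header('Content-Type: text/xml');
echo $xml !== false ? $xml : '<error/>';
```

One such file per source (or one file parameterised by source name) keeps the AJAX calls same-origin, so the browser's cross-domain restriction never comes into play.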

Cheers

Author Closing Comment

by:infodigger
Thank you very much for your time to answer this question in so much detail.
LVL 4

Expert Comment

by:smeghammer
You are welcome.

Cheers