Present data from different sources in realtime

Posted on 2013-11-26
Last Modified: 2013-11-26

I have several data sources which, to simplify, send data in the following form:

source, id, key, value

I have about 200 such data sources, which all send this kind of data when I request it. When the data arrives I need to do some calculations. For example, I need to map each of their IDs to my IDs in the database.

What would be the best practice for that? Shall I keep all data in memory and do the calculations there? Currently I use curl to get data with php and then perform all the calculations once all the sources have completed sending data, which is too slow.
Question by: infodigger

Expert Comment

ID: 39676953
Hi infodigger,

Several questions:

 - How are your data sources accessed (ODBC, HTTP, text files etc.)? - I assume HTTP
 - What format are your data sources sending the data as (DB records, JSON, ASCII text, SOAP etc.)? - I assume SOAP/JSON/ASCII
 - What is the frequency that the data is updated?
 - Are the different data sources related (i.e. are there dependencies where data from A is needed before data from B etc.)?
 - Do you need to keep a history of old data (i.e. calculations on CURRENT data depend on OLD data)?
 - How complex are the calculations? (it is likely that the time consuming bit will be the retrieval of data)
 - What do you need to do with the results after you have calculated them (present dynamic web page, write into your own DB etc.)?

It is unclear from your description whether the data sources are PUSHING the data and you have listeners in your PHP code, whether you are PULLING the data at specified time intervals (regardless of whether the data source has updated), or whether the data is retrieved for every web page request.

In general:
CURL is getting data from an HTTP location, which will potentially be slow. If there is no other way of accessing the data, then you are stuck with this method.

If the webpage is currently retrieving the source data and doing all the necessary calculations each time it is called in a browser, then of course a remote HTTP call is made by CURL, and possibly complex calculations performed, for every web page request. This will potentially slow down the page load quite considerably. A much better solution is to retrieve the data periodically on your server, process it and cache the result somewhere (in RAM if there is sufficient, as local files, or in your local DB). The webpage should then retrieve this LOCAL copy of the processed data - this will be perceived as much faster page loads.
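To illustrate the "retrieve, process and cache locally" idea, here is a minimal sketch of an in-memory cache with an expiry time. It is written in Python purely for brevity; the class name `TTLCache`, the `loader` callback and the expiry length are assumptions made for the example, not anything from the original setup. It only helps for sources whose data stays valid for at least the expiry period.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep processed results locally and reload them only when they are stale."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested without waiting
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key; call loader() only if it is missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        value = loader()  # the slow part: remote fetch plus processing
        self._entries[key] = (now, value)
        return value
```

The web page would then call something like `cache.get("source-name", load_and_process)` (both names hypothetical) and only pay the remote round trip when the stored copy has expired.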

The format and how that format is processed to retrieve the data can potentially affect performance as well - make sure any string/XML manipulation is as efficient as it can be.

The calculations cannot be done anywhere except in memory, so I am not sure what you mean there - if you mean should you cache the data from the data source, then that will depend on expected frequency of update of the source.

You also suggest that you DO need all the data before you can calculate the result - "Currently I use curl to get data with php and then perform all the calculations once all the sources have completed sending data, which is too slow" - in which case, optimisation of the data processing, calculation and output code should be your first area for investigation.

I have made a lot of assumptions here, so apologies if some of the above is obvious or wrong...

Given what I understand, I would initially look into some kind of asynchronous process to retrieve, process and store LOCALLY the results. I would then code the web pages to retrieve the local result.
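As a rough sketch of what an asynchronous retrieval step could look like: the sources are requested in parallel and each response is handed on as soon as it arrives, instead of waiting for the slowest one. This is Python for brevity (in PHP, the `curl_multi_*` functions play a similar role). The names `fetch_all` and `fake_fetch` and the simulated delays are assumptions for illustration; `fake_fetch` stands in for a real HTTP call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterator, List, Optional, Tuple

Result = Tuple[str, Optional[str], Optional[str]]  # (source, payload, error)


def fetch_all(sources: List[str], fetch: Callable[[str], str], max_workers: int = 20) -> Iterator[Result]:
    """Request every source in parallel and yield each result as soon as it finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, name): name for name in sources}
        for future in as_completed(futures):
            name = futures[future]
            try:
                payload, error = future.result(), None
            except Exception as exc:
                # One failing source should not stop the others from being processed.
                payload, error = None, str(exc)
            yield name, payload, error


def fake_fetch(name: str) -> str:
    """Hypothetical stand-in for an HTTP request; sleeps to imitate network latency."""
    delays = {"booking": 0.05, "expedia": 0.3}
    time.sleep(delays.get(name, 0.1))
    return f"<prices source='{name}'/>"
```

Iterating over `fetch_all(["expedia", "booking"], fake_fetch)` yields the faster source first, so processing can start while slower sources are still pending.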


Author Comment

ID: 39677205
Hi smeghammer,

Thank you very much for your extensive answer.

The sources I am getting data from send XML when I send a request. The data changes all the time, like flight tickets for example. So every time a user makes a request, I have to check with every source and get its XML file, and it's no use caching the results on my server.

I can have asynchronous loading of these sources but somehow they should be matched with each other, so I guess it will need to be a combination of javascript and server-side processing. If you have any example/case/scenario of such a thing, please let me know. It is the same way that all the meta search engines work, like flights/hotels/insurance/price comparison/etc.

Thanks a lot!

Expert Comment

ID: 39677228
Ah - it's a portal?

You say you already have async caching, but the cached data need to be matched? What is it that matches them together? Is it just the act of searching for something? Logged-in user ID? If it is possible to 'pre-link' some of this cached data that might help.

Sorry, can't really suggest more without knowing a bit more about the logic of how your process works.

I suspect the deciding factor will be how often the source data is updated. If these are updated very frequently, or at random, then the only choice you have is to make requests in real time on each page load. You can probably do some optimisation for the data sources that are unlikely to change very often (addresses, maps or whatever) and cache these results locally.

Author Comment

ID: 39677246

Let's say it's a hotel comparison site (it's a little more complicated but that would help me explain the process).

You have your own hotels in database with their ids, and each one has another id for each of the data sources you are receiving data from. For example you can have:

hotelid = 1, expedia_id =203, booking_id =394, etc.

When the user requests the data, you hit each source and the pricing for its id comes back as a result. For example:

expedia_id = 203 | price = $45
booking_id = 394 | price = $59

you need to match those ids with your id so that you present:

hotelid = 1| price_expedia= $45, price_booking = $59

but the prices from expedia and booking arrive at different times (for example, booking might send its response faster).

As you wait for all the sources to complete, you need to process in the background the data that you have already got and present it.
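The matching step described above can be sketched as a lookup table from (source, source id) to the site's own hotel id, with each price merged in as it arrives. This is a Python sketch using the ids from the example; `ID_MAP` and `merge_price` are hypothetical names, and in practice the mapping would presumably be loaded from the database.

```python
from typing import Dict, Optional, Tuple

# (source, source_id) -> our own hotel id. Hard-coded here for illustration only.
ID_MAP: Dict[Tuple[str, int], int] = {
    ("expedia", 203): 1,
    ("booking", 394): 1,
}


def merge_price(results: Dict[int, Dict[str, float]], source: str, source_id: int, price: float) -> Optional[int]:
    """Attach one incoming price to our hotel record; return our hotel id, or None if unmapped."""
    hotel_id = ID_MAP.get((source, source_id))
    if hotel_id is None:
        return None  # a source id we have no mapping for is skipped
    results.setdefault(hotel_id, {})[f"price_{source}"] = price
    return hotel_id


results: Dict[int, Dict[str, float]] = {}
merge_price(results, "booking", 394, 59.0)  # booking answers first
merge_price(results, "expedia", 203, 45.0)  # expedia arrives later
# results is now {1: {"price_booking": 59.0, "price_expedia": 45.0}}
```

Because each call touches only one hotel record, the partial result can be shown after every arrival without waiting for the remaining sources.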

Here is a good example:,Kensington,London,England,United-Kingdom-c28501-h168980-details/2013-12-03/2013-12-06/2guests/expanded/#overview

You will see that the page loads and as it loads it does this job connecting the data it receives with their database and presenting them in realtime.

Accepted Solution

smeghammer earned 500 total points
ID: 39677276
OK... I see the issue clearly now :-)

Other than code optimisation, the obvious approach - unless you do this already, in which case I apologise - is to use AJAX for each of the positions where the comparison price is to be rendered. Your main page will be rendered, and you will get each comparison price displaying as and when it is delivered - this is exactly what the example URL you sent appears to be doing.

The big issue of course with this is cross domain access. You would need to create a bunch of server-side proxy scripts that called each remote service and simply returned the XML. Your AJAX code would call these proxy PHP (I guess..) files, rather than trying to call the remote URLs directly.
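A minimal sketch of the proxy idea, in Python rather than PHP for brevity: the browser asks your own server for a named source, and the server forwards the request only to a known upstream and returns the body unchanged. The upstream URLs, the `proxy` function and the injected `fetch` callable are all assumptions for illustration; real endpoints and parameters would come from each provider.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlencode

# Hypothetical upstream endpoints, keyed by the name the browser is allowed to ask for.
UPSTREAMS: Dict[str, str] = {
    "expedia": "https://expedia.example/prices",
    "booking": "https://booking.example/prices",
}


def proxy(source: str, params: Dict[str, str], fetch: Callable[[str], str]) -> Tuple[int, str]:
    """Return (status, body) for an AJAX request, forwarding only to known sources."""
    base = UPSTREAMS.get(source)
    if base is None:
        return 404, "unknown source"  # never forward to an arbitrary URL
    url = f"{base}?{urlencode(params)}"
    try:
        return 200, fetch(url)  # pass the upstream XML through untouched
    except Exception:
        return 502, "upstream error"
```

Restricting the proxy to a fixed list of sources keeps it from being used to reach arbitrary URLs through your server.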

Using the above methodology, the ACTUAL HTTP load time would not be any different, but the PERCEIVED page load time would be considerably better, as the main page would return quickly and each data point would fill in over the next few seconds as each AJAX call completed. This way, you don't have to wait for all data to return before performing the calculations and sending the page to the browser.


Author Closing Comment

ID: 39677305
Thank you very much for your time to answer this question in so much detail.

Expert Comment

ID: 39677425
You are welcome.

