

Present data from different sources in realtime

Posted on 2013-11-26
Medium Priority
Last Modified: 2013-11-26

I have several data sources which, to simplify, send data in the following format:

source, id, key, value

I have about 200 such data sources, which all send this kind of data when I request it. When the data arrives I need to do some calculations. For example, I need to map each of their IDs to my IDs in the database.

What would be the best practice for that? Shall I keep all data in memory and do the calculations there? Currently I use curl to get data with php and then perform all the calculations once all the sources have completed sending data, which is too slow.
Question by: infodigger

Expert Comment

ID: 39676953
Hi infodigger,

Several questions:

 - How are your data sources accessed (ODBC, HTTP, text files etc.)? - I assume HTTP
 - What format are your data sources sending the data as (DB records, JSON, ASCII text, SOAP etc.)? - I assume SOAP/JSON/ASCII
 - What is the frequency that the data is updated?
 - Are the different data sources related (i.e. are there dependencies where data from A is needed before data from B etc.)?
 - Do you need to keep a history of old data (i.e. calculations on CURRENT data depend on OLD data)?
 - How complex are the calculations? (it is likely that the time consuming bit will be the retrieval of data)
 - What do you need to do with the results after you have calculated them (present dynamic web page, write into your own DB etc.)?

It is unclear from your description whether the data sources are PUSHING the data and you have listeners in your PHP code, whether you are PULLING the data at specified time intervals (regardless of whether the data source has updated), or whether the data is retrieved for every web page request.

In general:
CURL is getting data from an HTTP location, which potentially will be slow. If there is no other way of accessing the data, then you are stuck with this method.

If the webpage is currently retrieving the source data and doing all the necessary calculations each time it is called in a browser, then of course a remote HTTP call is made by CURL, and possibly complex calculations performed, for every web page request. This will potentially slow down the page load quite considerably. A much better solution is to retrieve the data periodically on your server, process it and cache the result somewhere (in RAM if there is enough, as local files, or in your local DB). The webpage should then retrieve this LOCAL copy of the processed data - this will be perceived as much faster page loads etc.
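As a rough illustration of the "retrieve, process and cache locally" idea, here is a minimal in-memory sketch. It is written in JavaScript rather than PHP purely for illustration, and it only makes sense for data that can tolerate being slightly stale. The names `createCachedLoader`, `loader` and `ttlMs` are assumptions invented for this example, not part of any library.

```javascript
// Minimal time-based cache: the slow work is redone only when the stored
// result is older than ttlMs milliseconds.
// `loader` stands in for the slow "fetch remote data and process it" step.
// `now` is injectable so the behaviour can be checked without waiting.
function createCachedLoader(loader, ttlMs, now = Date.now) {
  let cached = null;     // last processed result
  let fetchedAt = null;  // timestamp of the last successful load

  return async function get() {
    const fresh = fetchedAt !== null && now() - fetchedAt < ttlMs;
    if (!fresh) {
      cached = await loader();
      fetchedAt = now();
    }
    return cached;
  };
}

// Usage sketch (fetchAndProcessAllSources is hypothetical):
// const getPrices = createCachedLoader(fetchAndProcessAllSources, 60000);
// const prices = await getPrices(); // slow at most once per minute
```

The web page handler would call the returned function instead of calling the remote sources directly, so most requests are served from the local copy.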

The format and how that format is processed to retrieve the data can potentially affect performance as well - make sure any string/XML manipulation is as efficient as it can be.

The calculations cannot be done anywhere except in memory, so I am not sure what you mean there - if you mean should you cache the data from the data source, then that will depend on expected frequency of update of the source.

You also suggest that you DO need all the data before you can calculate the result - "Currently I use curl to get data with php and then perform all the calculations once all the sources have completed sending data, which is too slow" - in which case, optimisation of the data processing, calculation and output code should be your first area for investigation.

I have made a lot of assumptions here, so apologies if some of the above is obvious or wrong...

Given what I understand, I would initially look into some kind of asynchronous process to retrieve, process and store LOCALLY the results. I would then code the web pages to retrieve the local result.


Author Comment

ID: 39677205
Hi smeghammer,

Thank you very much for your extensive answer.

The sources I am getting data from send XML when I send a request. The data changes all the time, like flight tickets for example. So every time a user makes a request, I have to check with every source and get its XML file, and it's no use caching the results on my server.

I can have asynchronous loading of these sources, but somehow they need to be matched with each other, so I guess it will have to be a combination of JavaScript and server-side processing. If you have any example/case/scenario of such a thing, please let me know. It is the same way that all the meta search engines work, like flights/hotels/insurance/price comparison/etc.

Thanks a lot!

Expert Comment

ID: 39677228
Ah - it's a portal?

You say you already have async caching, but the cached data need to be matched? What is it that matches them together? Is it just the act of searching for something? Logged-in user ID? If it is possible to 'pre-link' some of this cached data that might help.

Sorry, can't really suggest more without knowing a bit more about the logic of how your process works.

I suspect the deciding factor will be how often the source data is updated. If these are updated very frequently, or at random, then the only choice you have is to make requests in real time on each page load. You can probably do some optimisation for the data sources that are unlikely to change very often (addresses, maps or whatever) and cache these results locally.



Author Comment

ID: 39677246

Let's say it's a hotel comparison site (it's a little more complicated, but that will help me explain the process).

You have your own hotels in the database with their IDs, and each one has another ID for each of the data sources you are receiving data from. For example you can have:

hotelid = 1, expedia_id = 203, booking_id = 394, etc.

When the user requests the data, you hit each source and the pricing for its ID comes back as a result. For example:

expedia_id = 203 | price = $45
booking_id = 394 | price = $59

You need to match those IDs with your ID so that you present:

hotelid = 1 | price_expedia = $45, price_booking = $59
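The matching step described above can be sketched as a reverse lookup that is built once from the local database rows. This is only an illustration in JavaScript: the record shape, the second hotel and the helper names `buildIndex` and `mergePrices` are assumptions made up for the example.

```javascript
// Hypothetical rows from the local database: one record per hotel,
// holding the ID that each external source uses for that hotel.
const hotels = [
  { hotelId: 1, sourceIds: { expedia: 203, booking: 394 } },
  { hotelId: 2, sourceIds: { expedia: 877, booking: 112 } }, // invented row
];

// Build a reverse index once: "source:externalId" -> hotelId.
function buildIndex(hotelRows) {
  const index = new Map();
  for (const { hotelId, sourceIds } of hotelRows) {
    for (const [source, externalId] of Object.entries(sourceIds)) {
      index.set(`${source}:${externalId}`, hotelId);
    }
  }
  return index;
}

// Merge one source's rows into the combined result, keyed by hotelId.
// Rows whose external ID is not in the index are skipped.
function mergePrices(result, index, source, rows) {
  for (const { id, price } of rows) {
    const hotelId = index.get(`${source}:${id}`);
    if (hotelId === undefined) continue;
    if (!result.has(hotelId)) result.set(hotelId, { hotelId, prices: {} });
    result.get(hotelId).prices[source] = price;
  }
  return result;
}

const index = buildIndex(hotels);
const result = new Map();
mergePrices(result, index, 'expedia', [{ id: 203, price: 45 }]);
mergePrices(result, index, 'booking', [{ id: 394, price: 59 }]);
console.log(result.get(1));
```

Because each lookup is a single map access, a source's response can be merged the moment it arrives, in any order, without scanning the hotel list again.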

But the prices from expedia and booking arrive at different times (for example, booking might send its response faster).

While you wait for all the sources to complete, you need to process in the background the data you have already received and present it.
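One possible shape for "process what has already arrived" is to start every request at once and handle each response in its own callback instead of waiting for the slowest source. The sketch below is JavaScript with simulated sources; `fakeSource`, `queryAll` and the delays are invented for illustration, and a real version would make the HTTP request and parse the XML where the timer is.

```javascript
// Simulated source: resolves with its rows after `delayMs` milliseconds.
function fakeSource(name, delayMs, rows) {
  return () => new Promise((resolve) =>
    setTimeout(() => resolve({ source: name, rows }), delayMs));
}

// Start every request immediately and pass each response to `onResult`
// as soon as it arrives. A failing source (or a throwing handler) is
// reported through `onError` and does not block the other sources.
function queryAll(sources, onResult, onError = () => {}) {
  const pending = sources.map((fetchSource) =>
    fetchSource().then(onResult).catch(onError));
  return Promise.all(pending); // resolves once every source has settled
}

const arrived = [];
const done = queryAll(
  [
    fakeSource('expedia', 30, [{ id: 203, price: 45 }]),
    fakeSource('booking', 10, [{ id: 394, price: 59 }]),
  ],
  ({ source }) => arrived.push(source) // the faster source is handled first
);
done.then(() => console.log('arrival order:', arrived.join(', ')));
```

In this run the simulated booking response is handled before the expedia one, even though expedia was requested first.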

Here is a good example:

You will see that the page loads and as it loads it does this job connecting the data it receives with their database and presenting them in realtime.

Accepted Solution

smeghammer earned 2000 total points
ID: 39677276
OK... I see the issue clearly now :-)

Other than code optimisation, the obvious approach - unless you do this already, in which case I apologise - is to use AJAX for each of the positions where the comparison price is to be rendered. Your main page will be rendered, and you will get each comparison price displaying as and when it is delivered - this is exactly what the example URL you sent appears to be doing.

The big issue of course with this is cross domain access. You would need to create a bunch of server-side proxy scripts that called each remote service and simply returned the XML. Your AJAX code would call these proxy PHP (I guess..) files, rather than trying to call the remote URLs directly.
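A proxy of this kind could look roughly like the following. It is sketched in JavaScript for Node rather than the PHP suggested above, and it assumes a Node version with a global `fetch`. The upstream URLs, the `/proxy?source=...&id=...` query format and the function names are all hypothetical.

```javascript
// Hypothetical whitelist: the browser sends only a short source name,
// never a full URL, so the proxy cannot be pointed at arbitrary hosts.
const UPSTREAMS = new Map([
  ['expedia', 'https://expedia.example.com/prices'],
  ['booking', 'https://booking.example.com/prices'],
]);

// Translate "/proxy?source=expedia&id=203" into the upstream URL, or
// return null when the source is unknown or the id is missing.
function resolveUpstream(requestUrl) {
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  const base = UPSTREAMS.get(searchParams.get('source'));
  const id = searchParams.get('id');
  if (!base || !id) return null;
  const upstream = new URL(base);
  upstream.searchParams.set('id', id);
  return upstream.toString();
}

// Handler for Node's built-in http server: fetch the upstream XML on the
// server side and relay it unchanged to the browser.
async function handleProxy(req, res, fetchImpl = fetch) {
  const upstream = resolveUpstream(req.url);
  if (!upstream) {
    res.writeHead(400);
    res.end('unknown source or missing id');
    return;
  }
  try {
    const response = await fetchImpl(upstream);
    res.writeHead(response.status, { 'Content-Type': 'application/xml' });
    res.end(await response.text());
  } catch (err) {
    res.writeHead(502);
    res.end('upstream request failed');
  }
}
```

The handler can be passed to `require('node:http').createServer(handleProxy)`, and the page's AJAX code would then request `/proxy?source=booking&id=394` on its own domain instead of the remote URL.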

Using the above methodology, the ACTUAL HTTP load time would not be any different, but the PERCEIVED page load time would be considerably better, as the main page would return quickly and each data point would fill in over the next few seconds as each AJAX call completed. This way, you don't have to wait for all data to return before performing the calculations and sending the page to the browser.


Author Closing Comment

ID: 39677305
Thank you very much for your time to answer this question in so much detail.

Expert Comment

ID: 39677425
You are welcome.



Question has a verified solution.
