Solved

Need expert opinion for a real time data processing web application

Posted on 2016-09-15
4
30 Views
Last Modified: 2016-10-06
I have a financial project which receive real time stock data from some data vendor , save it into mysql database, then retrieve the data and send to the end user browser. The client software provided by the data vendor used to receive stock data is a program written by c/c++ running on the server.  this client can save the data into the mysql database(does not have to be mysql, could be switched to any other database). In order to retrieve the data from the database as quickly as possible, any framework can I use? heard about CES or ESP? spark streaming? any of them can be used for my project?  if not, how can I only retrieve the un-read data from the database as soon as it reach the database? the stock data feed is probably about maxium1000 records(my wild guess, might not be correct)  a second. see the sample below.

+---------------------+--------+-------------------+-------------+
| insertTime                  | symbal | trade_time                 | trade_price |
+---------------------+--------+-------------------+-------------+
| 2016-09-15 04:00:00 | AAPL   | 20160915040000017 |      111.70 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000017 |      111.70 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000200 |      111.69 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000200 |      111.69 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000272 |      111.51 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040113878 |      111.57 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040113887 |      111.57 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040114011 |      111.57 |
| 2016-09-15 04:01:20 | AAPL   | 20160915040120342 |      111.57 |
........

| 2016-09-15 04:28:29 | AAPL   | 20160915042828740 |      112.16 |
| 2016-09-15 04:28:33 | AAPL   | 20160915042833306 |      112.18 |
| 2016-09-15 04:31:39 | AAPL   | 20160915043138895 |      112.18 |
| 2016-09-15 04:31:39 | AAPL   | 20160915043138895 |      112.24 |
| 2016-09-15 04:35:13 | AAPL   | 20160915043513179 |      112.10 |
| 2016-09-15 04:35:16 | AAPL   | 20160915043515888 |      112.10 |
| 2016-09-15 04:35:16 | AAPL   | 20160915043515888 |      112.09 |
| 2016-09-15 04:35:58 | AAPL   | 20160915043558378 |      112.18 |
| 2016-09-15 04:35:58 | AAPL   | 20160915043558378 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734987 |      112.19 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734987 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734995 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735002 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735056 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735063 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735071 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735116 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735123 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735131 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735138 |      112.18 |
| 2016-09-15 04:37:36 | AAPL   | 20160915043736164 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752468 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752476 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752539 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752547 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752555 |      112.18 |
| 2016-09-15 04:38:01 | AAPL   | 20160915043801260 |      112.18 |
| 2016-09-15 04:38:31 | AAPL   | 20160915043831574 |      112.20 |
+---------------------+--------+-------------------+-------------+

I know the simple way to do is to add another column to mark if each record has been read or  not, then change the value after read. is this the best way to do?

I was thinking about the other option which is to capture the data from the data vendor client (c/c++ program) first, then forward the data to both the browser and the database simultaneously, in this way it could save the the time for the round trip to the database. however, I am not good at c/c++, only good at java,  have no idea how to get data from the c/c++ program. any work around solution? probably adopt apache kafka?
0
Comment
Question by:arthurwang
  • 2
4 Comments
 
LVL 11

Assisted Solution

by:Máté Farkas
Máté Farkas earned 250 total points (awarded by participants)
Comment Utility
This is a typical streamed time-series data problem.
Maybe MySQL (or any other relational database) is not the best solution for this. The key is the proper indexing and clustering of the table and then the querying and processing of data.
Everything depends on the bandwidth of data stream (how much data / sec received from vendor).
For many of these cases usually an CEP system is used (for example OneTick).
These system can receive, save and continously process time series data in real time.
0
 

Author Comment

by:arthurwang
Comment Utility
Thanks Mate, though I have no idea about the price of OneTick software(database or the streaming service) at this moment,  a while guess is that I might not be able to afford it under the current situation. So I prefer to go for open source project at this moment.
0
 
LVL 26

Accepted Solution

by:
dpearson earned 250 total points (awarded by participants)
Comment Utility
For the database access - there should be no need to add a new column and mark it as "read" - indeed that would make the process a lot slower (because each read would also require a write per row being read).

All you need is to make sure that there is an index on trade_time in the database and then have the client (either the browser or your Java code on the server) keep track of the last trade_time seen.  Then request all records where trade_time > last_seen.

The real challenge is deciding that you need to check for new data.  As you say, ideally that would happen as a side effect of the C++ program writing new records.  After it does that write, you need it to somehow signal to the Java code that a new write has just happened.

It doesn't need to necessarily push the data to Java and/or the browser - it just needs to let them know that new data has arrived.  That would be done through something like a socket from the C++ code to the Java code and you just push some data through it whenever new rows are written - which the Java code then reads from the database.  However if hooking that together seems too tricky, you could also just have the C++ code touch a file on the filesystem and have the Java codebase watch the last modified timestamp of that file.  That can be done in a tight loop (e.g. 1 ms sleeps) and so you'll realize new data has arrived very quickly.

Once you have that working, Java should be seeing the new MySQL rows very rapidly (within a millisecond or two of their arrival from C++) and now you just need to get that to the browser as quickly as possible.

Hope that helps,

Doug
0
 
LVL 26

Expert Comment

by:dpearson
Comment Utility
Inactive for 14 days.
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
A list of useful business intelligence software.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.
This tutorial will introduce the viewer to VisualVM for the Java platform application. This video explains an example program and covers the Overview, Monitor, and Heap Dump tabs.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now