Solved

Need expert opinion for a real time data processing web application

Posted on 2016-09-15
4
44 Views
Last Modified: 2016-10-06
I have a financial project which receive real time stock data from some data vendor , save it into mysql database, then retrieve the data and send to the end user browser. The client software provided by the data vendor used to receive stock data is a program written by c/c++ running on the server.  this client can save the data into the mysql database(does not have to be mysql, could be switched to any other database). In order to retrieve the data from the database as quickly as possible, any framework can I use? heard about CES or ESP? spark streaming? any of them can be used for my project?  if not, how can I only retrieve the un-read data from the database as soon as it reach the database? the stock data feed is probably about maxium1000 records(my wild guess, might not be correct)  a second. see the sample below.

+---------------------+--------+-------------------+-------------+
| insertTime                  | symbal | trade_time                 | trade_price |
+---------------------+--------+-------------------+-------------+
| 2016-09-15 04:00:00 | AAPL   | 20160915040000017 |      111.70 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000017 |      111.70 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000200 |      111.69 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000200 |      111.69 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000272 |      111.51 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040113878 |      111.57 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040113887 |      111.57 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040114011 |      111.57 |
| 2016-09-15 04:01:20 | AAPL   | 20160915040120342 |      111.57 |
........

| 2016-09-15 04:28:29 | AAPL   | 20160915042828740 |      112.16 |
| 2016-09-15 04:28:33 | AAPL   | 20160915042833306 |      112.18 |
| 2016-09-15 04:31:39 | AAPL   | 20160915043138895 |      112.18 |
| 2016-09-15 04:31:39 | AAPL   | 20160915043138895 |      112.24 |
| 2016-09-15 04:35:13 | AAPL   | 20160915043513179 |      112.10 |
| 2016-09-15 04:35:16 | AAPL   | 20160915043515888 |      112.10 |
| 2016-09-15 04:35:16 | AAPL   | 20160915043515888 |      112.09 |
| 2016-09-15 04:35:58 | AAPL   | 20160915043558378 |      112.18 |
| 2016-09-15 04:35:58 | AAPL   | 20160915043558378 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734987 |      112.19 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734987 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734995 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735002 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735056 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735063 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735071 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735116 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735123 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735131 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735138 |      112.18 |
| 2016-09-15 04:37:36 | AAPL   | 20160915043736164 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752468 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752476 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752539 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752547 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752555 |      112.18 |
| 2016-09-15 04:38:01 | AAPL   | 20160915043801260 |      112.18 |
| 2016-09-15 04:38:31 | AAPL   | 20160915043831574 |      112.20 |
+---------------------+--------+-------------------+-------------+

I know the simple way to do is to add another column to mark if each record has been read or  not, then change the value after read. is this the best way to do?

I was thinking about the other option which is to capture the data from the data vendor client (c/c++ program) first, then forward the data to both the browser and the database simultaneously, in this way it could save the the time for the round trip to the database. however, I am not good at c/c++, only good at java,  have no idea how to get data from the c/c++ program. any work around solution? probably adopt apache kafka?
0
Comment
Question by:arthurwang
  • 2
4 Comments
 
LVL 12

Assisted Solution

by:Máté Farkas
Máté Farkas earned 250 total points (awarded by participants)
ID: 41801083
This is a typical streamed time-series data problem.
Maybe MySQL (or any other relational database) is not the best solution for this. The key is the proper indexing and clustering of the table and then the querying and processing of data.
Everything depends on the bandwidth of data stream (how much data / sec received from vendor).
For many of these cases usually an CEP system is used (for example OneTick).
These system can receive, save and continously process time series data in real time.
0
 

Author Comment

by:arthurwang
ID: 41801103
Thanks Mate, though I have no idea about the price of OneTick software(database or the streaming service) at this moment,  a while guess is that I might not be able to afford it under the current situation. So I prefer to go for open source project at this moment.
0
 
LVL 26

Accepted Solution

by:
dpearson earned 250 total points (awarded by participants)
ID: 41803222
For the database access - there should be no need to add a new column and mark it as "read" - indeed that would make the process a lot slower (because each read would also require a write per row being read).

All you need is to make sure that there is an index on trade_time in the database and then have the client (either the browser or your Java code on the server) keep track of the last trade_time seen.  Then request all records where trade_time > last_seen.

The real challenge is deciding that you need to check for new data.  As you say, ideally that would happen as a side effect of the C++ program writing new records.  After it does that write, you need it to somehow signal to the Java code that a new write has just happened.

It doesn't need to necessarily push the data to Java and/or the browser - it just needs to let them know that new data has arrived.  That would be done through something like a socket from the C++ code to the Java code and you just push some data through it whenever new rows are written - which the Java code then reads from the database.  However if hooking that together seems too tricky, you could also just have the C++ code touch a file on the filesystem and have the Java codebase watch the last modified timestamp of that file.  That can be done in a tight loop (e.g. 1 ms sleeps) and so you'll realize new data has arrived very quickly.

Once you have that working, Java should be seeing the new MySQL rows very rapidly (within a millisecond or two of their arrival from C++) and now you just need to get that to the browser as quickly as possible.

Hope that helps,

Doug
0
 
LVL 26

Expert Comment

by:dpearson
ID: 41831396
Inactive for 14 days.
0

Featured Post

Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
sqlquerystress - To test db performance 8 39
sql server computed columns 11 29
CSV file parsing thru Java 13 26
sql import cannot be resolved jsp 3 16
Basic understanding on "OO- Object Orientation" is needed for designing a logical solution to solve a problem. Basic OOAD is a prerequisite for a coder to ensure that they follow the basic design of OO. This would help developers to understand the b…
I have a large data set and a SSIS package. How can I load this file in multi threading?
This theoretical tutorial explains exceptions, reasons for exceptions, different categories of exception and exception hierarchy.
This video teaches viewers about errors in exception handling.

816 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now