Arthur Wang

Need expert opinion on a real-time data processing web application

I have a financial project that receives real-time stock data from a data vendor, saves it into a MySQL database, then retrieves the data and sends it to the end user's browser. The client software the vendor provides for receiving the stock data is a C/C++ program running on the server. This client can save the data into the MySQL database (it does not have to be MySQL; it could be switched to any other database). To retrieve the data from the database as quickly as possible, is there a framework I can use? I have heard about CEP, ESP, and Spark Streaming. Can any of them be used for my project? If not, how can I retrieve only the unread data from the database as soon as it reaches the database? The stock data feed is probably at most about 1,000 records a second (a wild guess that might not be correct). See the sample below.

+---------------------+--------+-------------------+-------------+
| insertTime          | symbal | trade_time        | trade_price |
+---------------------+--------+-------------------+-------------+
| 2016-09-15 04:00:00 | AAPL   | 20160915040000017 |      111.70 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000017 |      111.70 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000200 |      111.69 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000200 |      111.69 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000272 |      111.51 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040113878 |      111.57 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040113887 |      111.57 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040114011 |      111.57 |
| 2016-09-15 04:01:20 | AAPL   | 20160915040120342 |      111.57 |
........

| 2016-09-15 04:28:29 | AAPL   | 20160915042828740 |      112.16 |
| 2016-09-15 04:28:33 | AAPL   | 20160915042833306 |      112.18 |
| 2016-09-15 04:31:39 | AAPL   | 20160915043138895 |      112.18 |
| 2016-09-15 04:31:39 | AAPL   | 20160915043138895 |      112.24 |
| 2016-09-15 04:35:13 | AAPL   | 20160915043513179 |      112.10 |
| 2016-09-15 04:35:16 | AAPL   | 20160915043515888 |      112.10 |
| 2016-09-15 04:35:16 | AAPL   | 20160915043515888 |      112.09 |
| 2016-09-15 04:35:58 | AAPL   | 20160915043558378 |      112.18 |
| 2016-09-15 04:35:58 | AAPL   | 20160915043558378 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734987 |      112.19 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734987 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043734995 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735002 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735056 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735063 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735071 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735116 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735123 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735131 |      112.18 |
| 2016-09-15 04:37:35 | AAPL   | 20160915043735138 |      112.18 |
| 2016-09-15 04:37:36 | AAPL   | 20160915043736164 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752468 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752476 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752539 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752547 |      112.18 |
| 2016-09-15 04:37:52 | AAPL   | 20160915043752555 |      112.18 |
| 2016-09-15 04:38:01 | AAPL   | 20160915043801260 |      112.18 |
| 2016-09-15 04:38:31 | AAPL   | 20160915043831574 |      112.20 |
+---------------------+--------+-------------------+-------------+

I know the simple way is to add another column that marks whether each record has been read, then change the value after reading it. Is this the best way to do it?
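
For comparison with the read-flag column, here is a minimal Java sketch of one common alternative: the reader remembers the highest row id it has already delivered and asks only for rows above it, so nothing is updated after a read. It assumes an AUTO_INCREMENT `id` column and a table named `trades`, neither of which appears in the sample above, and a single writer (the vendor client) so that ids become visible in order. `Trade`, `TradeSource`, `UnreadPoller`, and `InMemoryTrades` are hypothetical names; the in-memory class only stands in for the database so the idea can be run without a driver.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of the trade table. `id` is an assumed AUTO_INCREMENT column. */
final class Trade {
    final long id;
    final String symbol;
    final long tradeTime;  // e.g. 20160915040000017
    final String price;    // kept as text to avoid floating-point rounding

    Trade(long id, String symbol, long tradeTime, String price) {
        this.id = id;
        this.symbol = symbol;
        this.tradeTime = tradeTime;
        this.price = price;
    }
}

/**
 * Hypothetical data-access hook. A JDBC version would run something like:
 *   SELECT id, symbal, trade_time, trade_price FROM trades WHERE id > ? ORDER BY id
 * ("trades" is an assumed table name).
 */
interface TradeSource {
    List<Trade> fetchAfter(long lastSeenId);
}

/** Tracks the highest id already delivered instead of flagging each row as read. */
final class UnreadPoller {
    private final TradeSource source;
    private long lastSeenId;

    UnreadPoller(TradeSource source, long startAfterId) {
        this.source = source;
        this.lastSeenId = startAfterId;
    }

    /** Returns only the rows inserted since the previous call. */
    List<Trade> poll() {
        List<Trade> fresh = source.fetchAfter(lastSeenId);
        for (Trade t : fresh) {
            lastSeenId = Math.max(lastSeenId, t.id);
        }
        return fresh;
    }
}

/** In-memory stand-in for the table, used only to exercise the poller. */
final class InMemoryTrades implements TradeSource {
    private final List<Trade> rows = new ArrayList<>();

    void insert(String symbol, long tradeTime, String price) {
        rows.add(new Trade(rows.size() + 1, symbol, tradeTime, price));
    }

    @Override
    public List<Trade> fetchAfter(long lastSeenId) {
        List<Trade> out = new ArrayList<>();
        for (Trade t : rows) {
            if (t.id > lastSeenId) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Calling `poll()` repeatedly (for example from a scheduled task) returns each row once. The trade-off against the flag column is that the reader's position lives in the reader rather than in the table, so several independent readers can each keep their own position.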

I was also thinking about another option: capture the data from the vendor's client (the C/C++ program) first, then forward it to both the browser and the database simultaneously. This would save the time of the round trip to the database. However, I am not good at C/C++, only at Java, and I have no idea how to get data out of the C/C++ program. Is there any workaround? Perhaps adopting Apache Kafka?
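
To make the forward-to-both idea concrete, here is a rough Java sketch of the fan-out step only. It assumes the vendor client can be made to emit one tick per text line on something Java can read (its standard output or a TCP socket); whether the actual client supports that is unknown. `TickFanOut` and its methods are hypothetical names, and the sinks stand in for a database writer and a browser push channel.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Reads ticks line by line and hands every tick to every registered sink. */
final class TickFanOut {
    // CopyOnWriteArrayList lets sinks be added while pump() runs on another thread.
    private final List<Consumer<String>> sinks = new CopyOnWriteArrayList<>();

    void addSink(Consumer<String> sink) {
        sinks.add(sink);
    }

    /**
     * Forwards each non-empty line to all sinks until the stream ends.
     * Returns the number of ticks forwarded.
     */
    int pump(BufferedReader in) {
        int count = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;  // skip blank lines
                }
                for (Consumer<String> sink : sinks) {
                    sink.accept(line);
                }
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

In a real setup the `BufferedReader` would wrap the input stream of a `Socket` or of a `Process` started from Java, and a slow sink (the database insert, say) would normally sit behind its own queue so it cannot delay the browser path. A message broker such as Apache Kafka fills the same fan-out role, with the feed published once and consumed separately by the database writer and the web tier.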
Java, NoSQL Databases, Oracle Database, Microsoft SQL Server, Big Data


8/22/2022 - Mon
SOLUTION
Máté Farkas

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
Arthur Wang

ASKER
Thanks Máté. Though I have no idea about the price of the OneTick software (database or the streaming service) at this moment, my wild guess is that I cannot afford it in the current situation. So I prefer to go with an open source project for now.
ASKER CERTIFIED SOLUTION
dpearson

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.