points for bazarny (storing/retrieving large double[] arrays)

thank you
Igor Bazarny commented:
The original question is pending deletion, so I'm saving some of it...

Question was:

As the title reads, I have to manage many (> 20000) large double[] arrays (length > 100000). These arrays
represent measured power consumption values (kWh) at intervals of approx. five minutes.

These array values are used for billing purposes as well as near future power consumption forecasts.
These arrays have to be stored in an RDBMS, notably Oracle comes to mind. Object databases are out of
the question, because our customers don't even know what these things are ...

I am not an SQL/RDBMS expert; I'm a mathematician, I have to confess, so basically I know nothing at
all. My question boils down to this: how can I store/retrieve/update these arrays as efficiently as
possible? Storing a (small) fixed number of values per row is not an option; I've tried it and it's
simply too slow.

I've been reading my way into BLOBs. Saving/loading a double[] array into a BLOB is a two-step process:
wrapping a DataOutputStream around a ByteArrayOutputStream, hooked up to a BLOB, does the job, but somehow
I feel that this is going to be slow.
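
A minimal sketch of the two-step conversion described above, for readers who want to see it spelled out. The class name DoubleArrayCodec and its methods are hypothetical, and the byte layout (big-endian, 8 bytes per value, as DataOutputStream writes it) is an assumption rather than anything the thread prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: converts a double[] to and from the byte[] that would be stored in a BLOB.
final class DoubleArrayCodec {

    /** Writes every value as a big-endian IEEE 754 double (8 bytes each). */
    static byte[] toBytes(double[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(values.length * 8);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (double v : values) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back an array produced by toBytes. */
    static double[] fromBytes(byte[] data) {
        if (data.length % 8 != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8 bytes");
        }
        double[] values = new double[data.length / 8];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

The resulting byte[] could then be handed to JDBC, for example through PreparedStatement.setBytes. Whether that is fast enough for arrays of this size would have to be measured against the actual database and driver.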

OTOH, most (all?) of those RDBMSs implement an 'ARRAY' type; are these ARRAYs resizable? They need
to be, because newly measured data have to be appended to the existing 'vectors' of data. If these arrays
aren't resizable, they're useless for my purposes, but if they are, could they be a solution to my
problem?
I'm a bit stuck at the moment, so I'd really appreciate tips, hints, solutions, anything.

thanks in advance and kind regards

My part of discussion:


Could you describe to some extent the operations you are going to perform on your data? Maybe you simply
don't need to retrieve all your numbers and can let the database do your calculations. You know, in SQL you
can specify that you want to calculate the average, max, or sum of some set of values. This way you probably
wouldn't need to get a lot of data from the RDBMS.

Igor Bazarny

bazarny: most of the operations on those arrays (or vectors) are quite simple actually. Vector addition
is one of them. OTOH the selection criteria, i.e. which vectors are to be added, are quite complicated
(notably auto-correlation, in order to find periodic behaviour in the measured power consumption).

The data vectors represent the power consumption (kWh) of large electricity consumers (factories, national
railroad segments etc.). Knowledge of their consumption 'behaviour' is critical for electricity producing
and trading companies (it's all about money ;-)

These trading companies have to decide (sometimes using a granularity of fifteen minutes) how much electric
power to buy and distribute on their respective networks. Therefore they need a certain forecasting
model using these large arrays of data.

Basically the operations are simple addition (the grand total power consumption) and vector addition
(the 'behaviour' of a selected group of consumers). The nasty operations are: spike detection (an unexpected,
unusually large value with no periodic behaviour, hence the auto-correlation stuff), invalid data
detection (unusually low-valued sequences, representing measurement errors), low pass filtering (for base
load calculations) and simple (linear) regression methods for correlation purposes.
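
To make two of the operations listed above concrete, here is a small sketch of vector addition and of an auto-correlation coefficient at a given lag. The class LoadVectors is hypothetical, and the estimator used (mean-removed, normalised by the lag-0 sum of squares) is one common textbook choice, not necessarily the one used in the forecasting model discussed here.

```java
// Hypothetical helper illustrating vector addition and auto-correlation on load curves.
final class LoadVectors {

    /** Element-wise sum of two equally long vectors. */
    static double[] add(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        double[] sum = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            sum[i] = a[i] + b[i];
        }
        return sum;
    }

    /**
     * Sample auto-correlation coefficient of x at the given lag.
     * Values near 1 at some lag suggest periodic behaviour with that period.
     */
    static double autocorrelation(double[] x, int lag) {
        int n = x.length;
        if (lag < 0 || lag >= n) {
            throw new IllegalArgumentException("lag must be in [0, length)");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;

        double numerator = 0.0;   // sum of lagged products of deviations
        double denominator = 0.0; // sum of squared deviations (lag 0)
        for (int i = 0; i < n; i++) {
            double d = x[i] - mean;
            denominator += d * d;
            if (i + lag < n) {
                numerator += d * (x[i + lag] - mean);
            }
        }
        // A constant series has no variance; report 0 rather than dividing by zero.
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }
}
```

With five-minute samples, a daily pattern would show up as a peak around lag 288 (24 hours times 12 samples per hour). Computing this directly for many lags costs O(n * lags) per vector, so for arrays of this size an FFT-based approach may be worth considering.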

Thank you for your comment and kind regards

I guess you don't need all of your calculations to be performed in real time. Or you can do your calculations
incrementally, using the database to store intermediate data.

I would try to analyze the requirements and build a data model that avoids transferring large amounts
of data to/from the database. I believe you can do a lot of calculations using SQL, triggers and stored
procedures.

Note that this way your code would not be portable across servers, but if data access performance is
an issue you have to choose.

Another issue to consider is backup frequency requirements. You may decide to store your data in temporary
files and periodically save those files to the database, so you won't need to access the database each time you
need data.

Igor Bazarny,
Brainbench MVP for Java 1

you wrote:
>I guess you don't need all of your calculations to be
>performed in real time. Or you can make your calculations
>incrementally, using database to store intermediate data.

Indeed, not all calculations need to be performed in real time; they can be done overnight (back office
billing comes to mind). OTOH, if incoming data (measured power consumption) does not comply with the
forecasts (and hence the previously ordered amounts of kWh), a new forecast for the next, say, hour has
to be made in order to correct the mismatch between actual consumption and the predicted consumption.
These calculations have to be blazingly fast because these trading companies have to accept penalties
for every second where a mismatch between total power consumption and total power production occurs.

>I would try to analyze the requirements and build a data model
>that avoids transferring large amounts
>of data to/from the database. I believe you can do a lot of
>calculations using SQL, triggers and stored procedures.
>Note that this way your code would not be portable across
>servers, but if data access performance is
>an issue you have to choose.

I'd like to avoid such non-portability; not all our customers run the same RDBM systems. This would
be a maintenance nightmare.

>Another issue to consider is backup frequency
>requirements. You may decide to store your data in
>temporary files and periodically save those files to the
>database, so you won't need to access the database each time
>you need data.

I think you've got something here ... if our customers allow us to store a bunch of files (containing
those arrays) on their systems which they don't need to back up or manipulate in any way, I could use
these files on a daily basis, i.e. old data and results could be stored in a database, while new incoming
data will be stored in those files. Somewhere during the night these files should be stored in the database
also and the whole cycle could start over again ... Thank you, I haven't thought about this; I'll give
it a real serious thought.

kind regards

Here's a small update: indeed, I'm giving up on the idea of using BLOBs here; their functionality is
just not it. The last couple of days I've been reading up on release 1.4 and found some very interesting
packages; notably the java.nio.* packages could be extremely helpful, especially the memory mapped file
classes.
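
As an illustration of the memory mapped file idea, the sketch below appends new measurements to a file of raw doubles and reads a slice back through java.nio. The class MappedDoubleFile, its method names and the file layout (consecutive big-endian doubles, no header) are all assumptions made for the example; the try-with-resources syntax is also newer than release 1.4.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: one file per data vector, stored as consecutive 8-byte doubles.
final class MappedDoubleFile {

    /** Appends the values to the end of the file by mapping only the newly added region. */
    static void append(File file, double[] values) {
        if (values.length == 0) {
            return;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel channel = raf.getChannel()) {
            long oldSize = raf.length();
            long extra = values.length * 8L;
            raf.setLength(oldSize + extra); // grow the file explicitly before mapping
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, oldSize, extra);
            map.asDoubleBuffer().put(values);
            map.force(); // ask the OS to flush the mapped region to disk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads count values starting at element index firstIndex. */
    static double[] read(File file, long firstIndex, int count) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_ONLY, firstIndex * 8L, count * 8L);
            double[] values = new double[count];
            map.asDoubleBuffer().get(values);
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because only the requested region is mapped, reading the most recent hour of data does not require loading the whole vector. A single mapping is limited to Integer.MAX_VALUE bytes, which is far more than one vector of this size needs.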

Since this topic is in a 'quiescent' state now, I'd like to split the points among all three of you,
just to be fair. However, I have no idea how to accomplish that; could anyone tell me how to divide
the 300 points into 100/100/100? Should I delete this question and open three bogus topics, just to
collect those points?

kind regards

jos010697 (Author) commented:
Thank you. If you're still interested, I could keep you informed about any progress I make (if any ;-) I can be reached through email quite often at the following address:


kind regards
Igor Bazarny commented:

Thanks for points

You can also contact me via e-mail (just check out my member profile). But experience shows that using EE is the preferable way to solve problems compared to e-mailing me directly: on EE you can get a solution from a lot of people, whereas I'm not always responsive on e-mail. I have a job, you know... And on EE I have the luxury of choosing the questions I want to investigate, depending on available time and whatever comes to my mind. But you can still notify me directly if you think I can help you with further questions.

Igor Bazarny.

P.S. some point-split tricks:
- When you split points, post references to the additional questions in the original one; just copy the browser address line into a comment, and EE will convert anything which looks like a URL into a hyperlink.
- If you get your question answered, it's a good idea to keep it even if you want to split the points, so that others can use the accumulated knowledge (a PAQ search could be cheaper than posting a new question).
- Moderators really help to split points, and do it reasonably fast.
- Oh, I see that you have answered more questions than me; I guess you should have known all that, shouldn't you?
jos010697 (Author) commented:
Igor Bazarny,

No need to thank me for those points, well deserved.
About email vs EE: you're right, I'll keep my progress/questions EE-forum based. As you indicated, that's the more sensible thing to do.

About those questions answered by me: I don't really care about points/answers etc. If a comment works out fine, that's fine with me. I only answer questions when I'm 100% sure that the topic would go astray without my answer.

About knowing those 'mechanisms' about splitting points etc. Like I wrote before, I'm a clumsy oaf ;-) I really had no idea how to do that ... being the mathematician I am, I basically know nothing at all ;-)

kind regards
Question has a verified solution.
