Solved

PostgreSQL Database design for time based data

Posted on 2014-02-27
508 Views
Last Modified: 2014-03-14
Hi,

I am trying to work out the best database design for the following application.

In a factory we have about 500 machines/sensors that send back data every minute when in operation.

Each record received has
Date Time
Machine ID
Event Type - High / Low / Normal / Startup / Shutdown etc.
Various small data fields

For an average 8 hour day there would be 240,000 records and we keep records for many years.

All of the queries will have a date range as part of the search. We will be querying things like the following (see the SQL sketch after the list) -
Records between Date1 and Date2 WHERE MachineID = X
Records between Date1 and Date2 WHERE EventType = Low
Records between Date1 and Date2 WHERE MachineID = X and EventType = 5
Latest record WHERE MachineID = X and EventType = 1
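
A rough SQL sketch of those query patterns, assuming a single table called machine_events with columns ts (the timestamp), machine_id and event_type (these names are placeholders, not taken from the post):

-- records in a date range for one machine
SELECT * FROM machine_events
 WHERE ts BETWEEN '2014-01-01' AND '2014-02-01'
   AND machine_id = 42;

-- records in a date range for one event type
SELECT * FROM machine_events
 WHERE ts BETWEEN '2014-01-01' AND '2014-02-01'
   AND event_type = 'Low';

-- latest record for one machine and event type
SELECT * FROM machine_events
 WHERE machine_id = 42 AND event_type = 1
 ORDER BY ts DESC
 LIMIT 1;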

Questions
Should each machineID have its own table?
Should my primary key be a composite of DateTime, MachineID and EventType, or should it be a 'surrogate' key?
What sort of index should I create?

Thanks
Question by:mhdi
8 Comments
 

Author Comment

by:mhdi
ID: 39893830
Yes, I am intending to use Postgres. I selected the other topics as I figured the question on database design would most likely be similar across all SQL databases.
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 167 total points
ID: 39893841
My experience with large databases is that performance is generally ok as long as the indexes are suitable for the query being run. If you had one table with all the machines, then for example you would want indexes on:
1. machine_id and event type and date (still works ok if you don't provide a date)
2. machine_id and date (caters for when you don't have an event type)

For any other fields that are regularly queried, you'd need further indexes.
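
As a rough sketch of that single-table layout (the table name, column names and types below are assumptions, not something specified in the thread):

CREATE TABLE machine_events (
    id          bigserial PRIMARY KEY,  -- surrogate key; a composite of (ts, machine_id, event_type) is the alternative the asker mentions
    ts          timestamp NOT NULL,
    machine_id  integer   NOT NULL,
    event_type  smallint  NOT NULL
    -- ... plus the various small data fields ...
);

-- index 1: machine_id + event_type + date
CREATE INDEX machine_events_mid_etype_ts ON machine_events (machine_id, event_type, ts);

-- index 2: machine_id + date, for queries with no event type
CREATE INDEX machine_events_mid_ts ON machine_events (machine_id, ts);

Because machine_id and event_type are the leading columns of index 1, it also covers the 'no date' case mentioned above.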

All that said, I've worked with Informix and Oracle rather than Postgres when it comes to large quantities of data. It would be worthwhile writing a script to generate the quantity of data you're going to need to handle (i.e. several years' worth), load it into the database, and test performance before committing to a database design and application that may otherwise start to run into trouble later.
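
For example (a sketch only; the machine_events table from the earlier sketch and the figure of 500 machines are assumptions), a large batch of dummy rows can be generated directly in SQL with generate_series:

-- WARNING: one-minute samples around the clock for 500 machines over two years
-- is roughly 500 million rows; trim the ranges to whatever volume you want to test
INSERT INTO machine_events (ts, machine_id, event_type)
SELECT t.ts,
       m.machine_id,
       (random() * 4)::int              -- arbitrary event type code 0..4
FROM   generate_series('2012-01-01'::timestamp,
                       '2013-12-31'::timestamp,
                       '1 minute'::interval) AS t(ts),
       generate_series(1, 500)          AS m(machine_id);

Running the intended queries against that data with EXPLAIN ANALYZE shows whether the indexes are being used before any real data arrives.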

I personally would try to put it all in one table if Postgres can handle it; working around de-normalised data is time-consuming.
 
LVL 26

Expert Comment

by:Zberteoc
ID: 39894946
I would use one table with the following indexes:

(DateTime, MachineId, EventType)
(MachineId, EventType, DateTime)
(EventType, DateTime, MachineId)

 
LVL 35

Expert Comment

by:Terry Woods
ID: 39895719
@Zberteoc, could you please explain your reasoning for that choice of indexes? I don't understand why you've suggested those, and having extra columns in an index for a table containing an enormous quantity of data may have a performance cost.
 
LVL 26

Assisted Solution

by:Zberteoc
Zberteoc earned 166 total points
ID: 39896043
You are right, my bad. It should only be:

(DateTime, MachineId, EventType)
(MachineId, EventType)
(EventType)

just in case you have to search on any of these columns alone. It really all depends on how you query the table. If you are sure you will never search on EventType only, then you don't need that index. However, don't forget that for a composite index to be used, you HAVE TO have the first column of the index in the search criteria or in the join clauses.
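
What that rule means in practice (table and index names as in the earlier sketch, i.e. assumptions):

-- composite index with machine_id as the leading column
CREATE INDEX machine_events_mid_etype ON machine_events (machine_id, event_type);

-- these can use the index, because the leading column machine_id is in the WHERE clause
SELECT * FROM machine_events WHERE machine_id = 42 AND event_type = 1;
SELECT * FROM machine_events WHERE machine_id = 42;

-- this cannot use it efficiently, because the leading column is missing;
-- it needs its own index on (event_type) if it is a common query
SELECT * FROM machine_events WHERE event_type = 1;

Running EXPLAIN on each query confirms which index, if any, the planner actually picks.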
 
LVL 26

Expert Comment

by:Zberteoc
ID: 39896071
Sorry, I removed a comment meant for another question. :)
 
LVL 62

Accepted Solution

by:gheist
gheist earned 167 total points
ID: 39897844
Inserting 30,000 records/hour = 500/minute ≈ 8 rows/second.
That will work just fine on any average machine.
If you want to keep 500 persistent connections, consider pgpool instead of beefing up PostgreSQL.

Indexes are for data retrieval; for collecting data you don't need them, and they actually add I/O (say ~5 IOs per insert plus ~3 per index).
E.g. with 3 indexes at 8 rows/s that is (5 + 3×3) × 8 ≈ 110 IO/s (≈ 3600 RPM) for data collection alone.
That leads us to placing 1000+ IO/s SSD storage in the data collection path.
