Normal Practice for OLTP and OLAP Systems in Companies

Hi guys I hope you dont mind me asking a really newbie question.
Guys Im trying to build an inventory database of our vmware setup.
So, let's say my database is called 'vmdb'.
Kinds of things i want to collect are:
- Computername of hosts, vms
- Uptime of hosts, vms
- Memory on hosts, vms etc etc

What I want to understand is if you dont to do trend analysis, historical reporting, for example:
"How many times in the last 6 months has a host rebooted", or
"How many new vms have been created in the last year".... etc.

Is this where the term 'OLAP' comes in? Is this like a data warehouse, and usually would this 'OLAP' system contain:
- Much larger data because its historical?
- A mirror image of what is in the 'OLTP' system?

Sorry for these questions.

So, in 'best practice' environments, would you create TWO databases on TWO different servers?
- ONE for the OLTP system?
- ONE for the OLAP system?

Im trying to understand what the 2 terms mean 'OLTP' and 'OLAP' and "how' from a high level they work.

Another thing im trying to understand is if I have my vmdb database, with tables such as....
virtualMachines
physicalMachines etc etc.

Let's say i want to see historical data of uptime for a specific system, but uptime is a field in the 'physicalMachines' table, so some records in the physicalMachines table might be:

host_id (PK)   host_name (UNIQUE)        uptime          date
1                      system01         3 days        
2                      system05         3 days
3                      system03         3 days

So what I mean is, in this situation, you can only have one record in this table for a system, and thus only one record for this host on uptime. In this situation, how could you do reporting on this particular system over a period of time on uptime, for example...

host_id (PK)   host_name (UNIQUE)       uptime      date
1                      system01         3 days
1                      system01         6 days
1                      system01         9 days
1                      system01         3 days

You couldnt have the above situation where you have multiple records for the SAME system, because
the host_name is unique.

So, another question is, would you probably do the following....

- Create 2 tables for 'physicalMachines'
- One in the OLTP system
- One in the OLAP system

Copy at a scheduled daily time, the records from the OLTP 'physicalMachines' table, to the OLAP 'physicalMachines' table, then remove those records from the OLTP 'physicalMachines' table, so that when a new day commences, new data for that day can be entered into the OLTP system?

Sorry guys for all these questions. Im just not quite sure how most of the gurus, that is, you guys, do this incredible stuff, and would love to learn.

Any help greatly appreciated.



Would you suggest CLEANING out (that is, deleting/removing all data)

LVL 1
Simon336697Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

David ToddSenior DBACommented:
Hi,

I don't think that you have enough data to warrant thinking about olap.

I would carefully design a OLTP to store the VM transaction data - an row in the table to say that system was rebooted etc.
You may need some flags or status values to classify the row

The query to find the last reboot is something like this:
select
  l.hostname
  , max( l.logDateTime )
from dbo.vmlog l
group by
  l.hostname

uptime is based on something like this:
select
  max( l.LogDateTime )
from dbo.vmlog

select
  l.hostname
  , max( l.LogDateTime )
from dbo.vmlog l
where l.LogDateTime <
  (
  select max( li.LogDateTime )
  from dbo.vmlog li
  where li.hostname = l.hostname
  )

All this to say that much of your queries can at first glance be answered from a OLTP system. Bear in mind that there isn't going to be a lot of volume in this log table - assume that you are working a 40 hour week, and it will take 45 minutes per task and 5 to log it. That is only 8 entries a day, or approx 2000 entries for a year.

Compared to some tables where things get interesting at 1 million or 10 million rows, you could run this for almost your entire working life before things get exciting from a size point of view.

Bottom Line:
Start with a OLTP system to get the data first. Then start the reporting when there is something to see - its quite hard to create reports on empty data. If things warrant it in a year or two, think about OLAP and cubes.

Cheers
  David
0
Simon336697Author Commented:
David I cant thank you enough.

Your examples were great.

I spose where Im getting confused, and hope you can clarify for me.

If you are talking about collecting information on systems, for example, on the number of reboots a system has had over the last 6 months.

Would I be right in saying that you would not but the reboot field time and date in the systems table, because each system can have only ONE entry.

Would you create a reboots table for this, and a separate table for EVERY "olap-type" reporting requirement you need?

For example, let's say I wanted to do reporting on....
a) Reboots
b) Uptime
c) Disk Space
on a PER System Basis

Would you do the following, or something different?

===================================== Systems table
system_id    (PK)
system_name  (UNIQUE)

====================================== Reboots table
reboot_id
system_id (FK from systems table)
reboot_time_date
date_inserted

====================================== Uptime table
uptime_id
uptime
system_id (FK)
date_inserted

======================================= Disk Space
disk_space_id
system_id (FK)
drive_c
drive_d
drive_e
date_inserted

Would the above, be an okay way of doing it, or could you see improvements?

Sorry to ask, just really keen to get a handle on it :>)
0
David ToddSenior DBACommented:
Hi,

I don't see how the uptime can be meaningfully measured or logged. Isn't the uptime the time between adjacent reboots?

So we are down to logging the reboots and work vs the diskspace.

I have the following two tables for logging and reporting on diskspace
create table dbo.Server(
      ServerName sysname NOT NULL,
      ServerDesc varchar(255) NULL,
      ProductionSQL bit NULL DEFAULT (0),
      ProductionSQLInstanceName sysname NULL,
 constraint pk_Server primary key clustered
(
      ServerName ASC
) on [PRIMARY]
) on [PRIMARY]

create table dbo.ServerDriveLog(
      LogDateTime datetime NOT NULL DEFAULT (getdate()),
      ServerName sysname NOT NULL,
      DriveLetter char(1) NOT NULL,
      VolumeName char(20) NULL,
      FileSystem char(20) NULL,
      TotalSpace bigint NULL,
      FreeSpace bigint NULL,
 constraint pk_ServerDiskStats primary key clustered
(
      LogDateTime ASC,
      ServerName ASC,
      DriveLetter ASC
)on [PRIMARY]
) on [PRIMARY]

The query to report the current data is
SELECT v_CurrentDisks.ServerName, v_CurrentDisks.DriveLetter, v_CurrentDisks.ServerDesc, v_CurrentDisks.ServerDrive, v_CurrentDisks.UsedGB, v_CurrentDisks.AvailableGB, v_CurrentDisks.TwentyPGB, v_CurrentDisks.FreeGB, v_CurrentDisks.UsedPercent, v_CurrentDisks.LogDateTime
FROM infrastructure.dbo.v_CurrentDisks v_CurrentDisks
ORDER BY v_CurrentDisks.ServerName, v_CurrentDisks.DriveLetter

and a view of
create view dbo.v_CurrentDisks as
      select
            s.ServerName
            , l.DriveLetter
            , s.ServerDesc
            , rtrim( s.ServerName ) + ' ' + rtrim( coalesce( l.DriveLetter, '' ) ) as ServerDrive
      --      , VolumeName
      --      , FileSystem
            , round( ( coalesce( l.ATotalSpace, 0 ) - coalesce( l.AFreeSpace, 0 ) ) / 1024.0 / 1024.0 / 1024.0, 2 ) as UsedGB
            , round( ( coalesce( l.AFreeSpace, 0 ) - 0.2 * coalesce( l.ATotalSpace, 0 ) ) / 1024.0 / 1024.0 / 1024.0, 2 ) as AvailableGB
            , round( coalesce( l.ATotalSpace, 0 ) / 1024.0 / 1024.0 / 1024.0 * 0.2, 2 ) as TwentyPGB
            , round( ( coalesce( l.AFreeSpace, 0 ) ) / 1024.0 / 1024.0 / 1024.0, 2 ) as FreeGB
            , case
                  when l.ATotalSpace is not null then
                        1.0 - coalesce( l.AFreeSpace, 0 ) * 1.0 / l.ATotalSpace
                  else 0
            end as UsedPercent
            , (
            select
                  max( li.LogDateTime )
            from dbo.ServerDriveLog li
            ) LogDateTime
      from dbo.Server s
      left outer join
            (
            select
                  li.ServerName
                  , li.DriveLetter
                  , avg( li.TotalSpace ) ATotalSpace
                  , avg( li.FreeSpace ) AFreeSpace
            from dbo.ServerDriveLog li
            where li.LogDateTime > dateadd( day, datediff( day, 0, getdate()), 0 )
            group by
                  li.ServerName
                  , li.DriveLetter       
            ) l
            on s.ServerName = l.ServerName

I'd have a table called work or notes or something that encompasses reboots and uptime and other significant events - call it events

create table ServerEnvents(
      LogDateTime datetime NOT NULL DEFAULT (getdate()),
      ServerName sysname NOT NULL,
      Reboot bit NOT NULL default 0,
      Notes varchar(2000) NULL
 constraint pk_ServerEvents primary key clustered
(
      LogDateTime ASC,
      ServerName ASCC
)on [PRIMARY]
) on [PRIMARY]

HTH
  David
0
Powerful Yet Easy-to-Use Network Monitoring

Identify excessive bandwidth utilization or unexpected application traffic with SolarWinds Bandwidth Analyzer Pack.

Simon336697Author Commented:
You are so much more advanced than me Dave  (thnku)

Dave, what does your "primary key clustered" mean?
constraint pk_ServerEvents primary key clustered

 constraint pk_ServerEvents primary key clustered
(
      LogDateTime ASC,
      ServerName ASCC
)on [PRIMARY]
) on [PRIMARY]

Does the above mean that you are creating a primary key on both LogDateTime and ServerName?
0
David ToddSenior DBACommented:
Hi,

The primary key is whats called a composite key which uses both fields.

Clustered is as opposed to non-clustered, and is a property of the index that is associated with the primary key. Generally its easiest to start with the primary key being clustered, and other indexes are non-clustered. I've seen it illustrated where making another index the clustered index yeilds better performance, but its a good start.

There can be only one clustered index per table on MSSQL.

Cheers
  David
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Simon336697Author Commented:
Gotcha!

THANKS SO MUCH DAVE for helping me. :>)

Cant thank you enough :>)
0
Simon336697Author Commented:
Dave im going to post another quick one on olap because i think ill need to go down this track eventually.
0
David ToddSenior DBACommented:
Hi,

Yip, post another question on OLAP. I'm not the best person to answer OLAP questions though.

Cheer
  David
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.