I'm running a very homogeneous Solaris shop: about 75 servers in production and another 75 for lab and staging. I run all of it with two admins (one junior and myself), and I have been asked to justify more or fewer (??) personnel, including a description of our tasks, etc. (My management has no prior experience running this kind of operation at this level.)
I run a 24/7 system with an SLA aiming for five 9's. This is a startup company looking to get bigger contracts, and we have a tight budget.
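To put the five 9's target in concrete terms, here is a quick sketch of the yearly downtime budget it implies. It assumes a 365-day year and that availability is measured over the whole calendar year; the function name is just for illustration.

```python
# Downtime budget implied by an availability target.
# Assumption: a 365-day year, measured over the full calendar year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Return the minutes of downtime per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three 9's", 0.999), ("four 9's", 0.9999), ("five 9's", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(target):.2f} minutes/year")
```

Five 9's works out to a little over five minutes of total downtime per year, which is a useful number to put in front of non-technical management when discussing on-call coverage for two people.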
All my servers are from Sun Microsystems, running Solaris 10. In the production environment the physical layer is about fourteen Sun T1000s and six T5120s. On top of the physical layer I run Sun LDoms; each server is partitioned into two to six guest machines. I use Sun Cluster on 8 of them, Sun GlassFish 2.0 in three of the web farms, and Sun Web Server 7 for load balancing, also running on Sun Cluster.
I also administer the two database systems: one with Postgres 8.3 for the web services farms and another with MySQL for our proprietary application.
The whole system is connected to a small Brocade fabric, with two SAN switches and redundant paths, connected to Sun StorageTek disk arrays.
I administer everything inside our cage at a colocation facility. (I have a network admin who handles the Cisco network, firewalls, switches, etc., but who has no Unix or SAN skills.)
We cover everything: day-to-day admin tasks, design, testing of new implementations, labs, disaster recovery, etc. Not to mention running the corporate services that nobody cares about but that must be running all the time (in-house Exchange email server, SharePoint server, SQL Server and Peachtree for accounting, Internet, mobile VPNs, desktop support for about 10 laptops, etc.). As you can see, I don't get much sleep and the pressure is a little high (not to mention the risk the company is taking by putting all this in the hands of two people, one of them a junior admin).
Basically, I need a way to explain to my management why I think our technical environment is oversized for the number of personnel I have.
I don't know if there are official numbers for admin/server ratios that take into account risk, the kind of operation, etc. Some document with best practices would be fine; everything I find is for medium to big companies and we are a startup, so I need to be realistic about what I can get. The original request from my management was to provide a list of the tasks we perform day to day, to see if we could take on some more. When I replied that we had no room for more, I was asked to provide some kind of metrics showing what we spend our time on. Since my management does not come from IT, it is very difficult to explain our tasks, why they take all of our time, etc.
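One possible way to produce the kind of metrics being asked for is to log hours per task category for a few weeks and compare the total against available admin hours. Below is a minimal sketch of that tally. The categories, the hours, and the 40-hour week are invented placeholders for illustration, not real numbers from this environment.

```python
from collections import defaultdict


def summarize(entries, admins=2, hours_per_admin=40.0):
    """Total logged hours per category and compare them with weekly capacity."""
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    capacity = admins * hours_per_admin
    used = sum(totals.values())
    return dict(totals), used, used / capacity


# Hypothetical one-week log of (category, hours). Placeholder values only.
week_log = [
    ("day-to-day admin", 30.0),
    ("corporate services / desktop support", 20.0),
    ("lab and staging", 15.0),
    ("design and testing", 15.0),
    ("disaster recovery", 10.0),
    ("day-to-day admin", 10.0),
]

totals, used, utilization = summarize(week_log)
for category, hours in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{category:40s} {hours:5.1f} h")
print(f"total {used:.1f} h against {2 * 40.0:.1f} h capacity ({utilization:.0%})")
```

A utilization figure above 100% of nominal capacity, broken down by category, is something management without an IT background can read directly, without needing to understand the individual tasks.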
Any help would be great
Thanks
Manny