Cluster running same OS

Mike Paradis asked:

I'm looking for input from someone who has hands-on experience with clusters.
We already use virtual hosts (VMware, Proxmox) and they have their place, as do containers, but I'm not asking about those.


I want a solution where, as growth is needed, I can simply keep adding blades for at least the two main functions: serving web pages and running DB nodes.


Years ago, I put together a search engine project. I had 64 nodes (servers) to begin getting things set up. Different nodes would be given different jobs but all of them would read/write to one shared directory structure. Every node would work on different things but all of the tasks and results were put into centralized storage.

For this, I used the Linux GFS file system.


However, it's been so long that I can't recall some of the details, and searching the net I can't confirm what I'm looking for. Mainly: did I use just one OS?


I seem to recall that each node would network boot from one shared boot image, all using the same OS, and that I just needed to maintain that single OS from any one node; the rest would use it.


The point is, rather than maintaining a large number of web servers and DB servers that each have their own OS, it would be more efficient to have only, say, two operating system images, keep those updated, and fire up as many blades as I need: a cluster where all I have to do for growth is add nodes.
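For reference, the kind of network-boot setup I have in mind would be something roughly like this. This is only a sketch from memory, not a working recipe: the interface, addresses, paths and image names are placeholders, and a real diskless setup would also need an NFS (or overlay) root prepared on the storage server.

# rough sketch: every blade PXE-boots the same kernel/initrd and mounts one shared read-only root
yum -y install dnsmasq syslinux

cat > /etc/dnsmasq.d/pxe.conf <<'EOF'
interface=eth1
dhcp-range=10.0.0.100,10.0.0.250,12h
enable-tftp
tftp-root=/var/lib/tftpboot
dhcp-boot=pxelinux.0
EOF

mkdir -p /var/lib/tftpboot/pxelinux.cfg
cp /usr/share/syslinux/pxelinux.0 /var/lib/tftpboot/

cat > /var/lib/tftpboot/pxelinux.cfg/default <<'EOF'
DEFAULT shared
LABEL shared
  KERNEL vmlinuz
  APPEND initrd=initrd.img root=/dev/nfs nfsroot=10.0.0.10:/exports/node-root ro ip=dhcp
EOF

systemctl enable --now dnsmasq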


Does anyone here have this kind of hands-on experience and could offer a little insight into what might be available to do this kind of thing?

ASKER CERTIFIED SOLUTION by arnold (members-only content not shown)
skullnobrains

the question is a bit limited in terms of context, but unattended installation plus either scripts or whichever deploy tool you prefer, such as ansible, would allow that.

i commonly use dns to populate the node list so reverse proxies or other sub-clients get an up-to-date list of nodes for each service.
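for example, something along these lines. untested sketch : the service name, file paths and the rebuild-backend.sh helper are all made up.

# one A record per node under a service name, e.g. www.lan -> 10.0.0.11, 10.0.0.12, ...
# any proxy or sub-client can rebuild its node list from a single lookup
dig +short A www.lan | sort > /etc/haproxy/www-nodes.new
if ! cmp -s /etc/haproxy/www-nodes.new /etc/haproxy/www-nodes; then
    mv /etc/haproxy/www-nodes.new /etc/haproxy/www-nodes
    # regenerate the proxy backend from the fresh list, then reload (helper script is hypothetical)
    /usr/local/bin/rebuild-backend.sh && systemctl reload haproxy
fi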
Mike Paradis (ASKER)

I wasn't quite sure what to include in the question as it's been a long time since I used that GFS cluster.
What I need is a way to expand web server and DB nodes without having to maintain every single instance, meaning the operating system, its tools, etc. on each server/VM. That is taking up a lot of time.

It would be nice to have one or two OS images to maintain and to simply fire up another server that boots from the network, runs the same OS, and is added to the load-balanced list of servers responding to traffic.

What else can I provide for information? I'm happy to try.
SOLUTION (members-only content not shown)
Ok, let's see if I can provide more details that can be useful. I hope it's not too much though.

The OS could be anything, but we mostly use CentOS 7.9 or 8.

>Is your thought resource issues related to the DB or to the Front end systems?

It's actually both.

Right now, we have virtual machines with Apache/PHP installed on them handling the incoming web clients. Adding another is as simple as cloning one onto another host. This applies to any servers/services where we need to add more resources to handle the load, but it's mainly just web servers. Of those servers, one load-balanced set handles only incoming data from devices all over the net, and another load-balanced set handles members getting to their control panels.

The DB is a single hefty instance of MariaDB for all of the above.
However, I have to make some decisions soon on how to gain DB redundancy; the single instance is causing a lot of problems. I'm not sure if that will end up being a Galera cluster, for example, or just a master/master or master/slave setup. Using a cluster means adding ProxySQL in front of each client, or maybe a couple of centralized load-balanced instances.

When cloning the above servers, there is no point in having multiple copies of the same VM on one host, so instances are spread across multiple hosts and given as much of each host's resources as possible.

So even if I go with a DB cluster, I'll end up adding one node on each VM host, and at some point the VM layer no longer makes sense. It would make more sense to use the entire blade to run CentOS or BSD or something, and have it handle web connections and run an instance of MariaDB, probably as another node of a cluster.

The rest of the services could live on vm hosts without issues.

Other than the DB issue that needs to be solved, the main problem is that cloning/replicating is leading to more and more servers to maintain.

In our application, the main things we'll have to keep expanding are the ability to handle web requests and DB redundancy. I would like to find a way to have one single image that we maintain and that all VMs, or soon entire blades, run.

Fire one up and, done, it's serving just like the others, with only one OS to maintain. OK, maybe a handful, but the same idea. In my perfect world, I could use the same image across data centers as well and simply expand as needed.

SOLUTION (members-only content not shown)
So much usable information, thank you!

>a galera cluster is a good idea. you would need a little mild scripting rather than just a clone but merely a
>few lines. anyway, you probably won't scale the db often.

Yes, we're already running a three-node cluster with ProxySQL in front on each client/server.
I was thinking about putting a node on every nth server if I went with some sort of cluster, as there would be no need for 100 DB servers, and I imagine things would not work well after a while since all those nodes would need to stay synced up.
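For context, registering a new Galera node with ProxySQL on our side boils down to something like this. The hostname and hostgroup number are placeholders, and this is only the basic admin-interface flow, not our exact setup:

# add the new node to proxysql's server list via its admin interface (port 6032)
mysql -u admin -padmin -h 127.0.0.1 -P 6032 <<'SQL'
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, 'db4.loc', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SQL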

>the physical server is vmw5, the web server is www5, the db is db5. www5 will send all connections
>to db5 unless it is dead or severely overloaded. in that case it will use the backend "db" which is a
>dns name containing all the running dbs. note that if you have lots of physical hosts, you probably do not
>want that many dbs. that was merely an example.

That's a nice, clean approach. I'll keep that in mind. I do something similar but not quite the same: as I add clones, I'll name them something like cp4 and dbc04, etc., running on blade 4. But I'm not using DNS, just keeping track of machine names and where they are/belong.

>i would personally probably run a cluster of 3 or 5 physical dbs, and virtualize the rest. but that vastly
>depend on the setup, number of hosts, etc

I wondered about a mix of bare metal and VMs. Wouldn't that add up to the same thing, in the sense that the slowest drive is your top speed, or the slowest network connection is your fastest speed, etc.? So in the case of DB servers, I'm really not sure what would happen with a mix of bare metal and VMs. I've never had the opportunity to test that.

>deploying on multiple sites usually works with the dns as well : each site is associated with search suffixes
>such as "germany.lan lan". all progs use short names. you can easily populate cluster wide or dc wide
>host lists using for example www.germany.lan or www.lan. the same logic can easily apply to host types.

Yes, I do that too, but only internally. For example, a San Diego site would have everything under diego.loc, a St. Paul, MN site would have everything under stpaul.loc, etc.

>working with BSD is in that regard MUCH easier since they hardly ever break backwards compatibility.

We have been using CentOS for many years but have used lots of others along the way and still have a mix. I'm starting to lean toward Debian and BSD, and toward using ZFS more for storage. I just have to find time to play with it more.
 
>haproxy) is usually a rather simple way to deal with such moving setups without issues.

So, haproxy is currently how we are doing load balancing, but what I'm more interested in is having fewer OS instances to maintain. The other day I found an odd problem while trying to test another problem I've posted about. I used a bunch of website speed and other testing tools to see if our site was always reachable, since we've been told folks are getting gateway errors, presumably because of the DB or NFS issue we're having in that other post.

Several of the test sites could not even reach our site, so I decided to shut down the proxy and have just one web server handling things. Those same services could now reach our website. No idea why. I have not had the time to investigate that new problem.

>you probably want a way to update your hosts config from a central place. i usually mix both : configs
>are centralized  and actually the entirety of the deployment is, but i still use the dns so i do not need to
>restart services for no reason.

If I understand, you're not saying to use a cluster, but a centralized way to keep all of the web servers (and others) maintained. OSes don't only have configs though, they have packages that get updated, so I'm missing something in this.
I can see maintaining one version of a VM that gets cloned, with the different configs fetched as the new version of the VM comes up. I think this is what you're talking about.
number of dbs : more than 5 is probably useless or counter productive except for very heavy read loads with almost no writes. prefer odd numbers. i see little point in trying to scale dbs frequently as syncing will take time. do not virtualize dbs at all imho.

--

automated install goes far beyond configs.

i usually run install scripts periodically, every 5 minutes. a typical script will install the soft, check the config, install the config, stop the service if the config files are newer than the running process, start the service, run a few basic tests, and complain if anything failed. on an already installed machine, the first steps merely detect there is nothing to do.

during initial setup and until you master things properly, you probably want to do that automatically on some hosts in advance.

a separate task would select the useful scripts for the host and stick them in crontab

it is easy to wrap package installation so it works on multiple oses. config files are messier since locations vary. i often find it simpler to run dedicated instances of each service tied to a config file which is totally independent from the system.
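a stripped down example of such a script, with haproxy as the service, assuming a centos box and a central copy of the configs under /deploy (both are assumptions, adapt to taste) :

#!/bin/sh
# idempotent install-or-fix script for one service; run from cron every few minutes
set -e

# install the soft only if missing
rpm -q haproxy >/dev/null 2>&1 || yum -y install haproxy

# install the config only when it differs from the central copy
if ! cmp -s /deploy/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg; then
    cp /deploy/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg
    touch /var/run/haproxy.needs-restart
fi

# (re)start only when needed, otherwise leave the running process alone
if [ -e /var/run/haproxy.needs-restart ] || ! systemctl is-active --quiet haproxy; then
    systemctl restart haproxy
    rm -f /var/run/haproxy.needs-restart
fi

# a basic test, complain if anything failed (the health url is hypothetical)
curl -fsS -o /dev/null http://127.0.0.1:8080/health || echo "haproxy check failed on $(hostname)" >&2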

sky is the limit ;)
Ah yes, the evolution of what starts as a simple script and eventually turns into an amazing tool.

So I guess you are leaning away from 'clusters' at least in the sense of multiple machines all running the same image, etc. But you are advocating more of a managed method of keeping many servers matched at software and config levels.

I was hoping for a cluster because I love the idea of maintaining just one thing and then cloning it as needed, but since you have been doing this for many years and understand what I'll be facing and planning for, I think I should pay attention to your words. I really do appreciate this input.

I think the input gives me a path and it's one I was already on so no major changes, just get better at what I've been doing and maybe ask for tips now and then.

The only question I have, which might be a little unrelated, is about your DB mention.

>number of dbs : more than 5 is probably useless or counter productive except for very heavy read loads
>with almost no writes. prefer odd numbers.

Curious why you mention liking odd numbers?

> i see little point in trying to scale dbs frequently as syncing will take time. do not virtualize dbs at all imho.

The main DB I have now is running on a Proxmox host, but it's the only VM on the host. I figured this would be a good way to easily back up the server or clone it, etc. Mind you, there isn't much to setting up a DB server, but I wanted to do it this way because things were fluid recently during a never-ending upgrade.

The HPE blades I'm using have a RAID controller, but I didn't want to use that, so I put two SSD drives into it and set up a ZFS RAID1 (mirror) when installing Proxmox.

Installing OSes on these blades is a pain in the rear. You either have to install a SAS license to use RAID or you have to use an operating system that has software RAID. But this is why I was thinking along the lines of no virtualization: installing an OS and LAMP and making each of the DB instances part of a Galera cluster. Maybe not on every single blade, but as needed.

I've been using IBM blades for many years but I needed 120VAC power supplies for this particular rack so ended up using HPE.

The Galera cluster I've built does use three VMs at the moment. I thought it would be a good starting point as we slowly move non-production databases over to it to see how things go.
i found that cloning was a very inefficient way to manage. it takes time to update stuff, install something new, make config changes. building the masters is a pain. deploying as well in most cases.

i forgot you obviously also need a script to sync them all. i usually do this through git or a simple periodic wget but you may as well pull each script on demand. same same.
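the sync can be as dumb as a cron entry. the repo url, paths and scripts.d layout below are made up :

# pull the central repo every 5 minutes, then run whatever scripts were selected for this host
cat > /etc/cron.d/deploy <<'EOF'
*/5 * * * * root cd /opt/deploy && git pull -q && run-parts /opt/deploy/scripts.d
EOF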

 i will help with whatever you choose if you run into issues

if you craft it properly, you can make changes on all hosts, specific hosts, a category of hosts within minutes. and with that great a power, break everything as well ;)

--

odd numbers are due to the quorum mechanism. if a cluster of 4 members is split into two halves, neither of the 2x2 halves will have more than half the number of last known working hosts, so all hosts will decide they got separated from the cluster and stop working at the same time. with an odd number such as 5, any split leaves one side with a strict majority (e.g. 3 of 5), and that side keeps serving.

--

i would totally advocate against full disk backups for dbs. this is awfully inefficient, not incremental, and busy dbs will not restart properly from a regular disk backup and not even always from a snapshot.

--

totally with you regarding soft raid. hardware raid controllers are imho a has-been. entirely. they mostly still exist because vmware does not handle software raid.

even with virtio, the overhead of virtualisation is significant.

zfs over fuse adds overhead as well. i am unsure whether proxmox uses the in-kernel or the fuse version.

also beware that double caching in zfs and in the vm ram essentially wastes a very significant portion of your ram. up to half in a worst case scenario.

--

i do not know of a single operating system that does not handle software raid besides vmware.

--

i have no idea regarding your case but most web apps are bound by the sql. i tend to think a reasonable number of dbs on bare metal will be enough.

you can consider running the backend part of your apps in containers directly on the db hosts. this allows efficient, secure and low latency communication over unix sockets.

the security is not worse than in more traditional architectures and it is way more efficient. you would obviously keep a reverse proxy with a protocol break if your backends do fastcgi. said proxies also act as load balancers.

a well crafted static platform often performs way better than an automatically scaled less optimized one.

... but i have no idea what you are dealing with specifically so just throwing ideas. i did all the above or variations of the above on multiple production platforms, though.
I realize, based on your input and another question, that everything I'm trying to solve will end up overlapping in terms of how to handle web servers and DB servers.

https://www.experts-exchange.com/questions/29238555/db-cluster-ha-vs-load-balancing-and-or-both.html

I also feel unsure how to have these conversations online in this way. It's a lot of information, a lot of input, and a long thread where points/edits can be missed. The posts simply get very long, and a lot of the information shared is the kind that many IT people, or at least company policy, would require not be shared.

I'm assuming you've seen my other one with the weird mariadb problems we're having.

Anyhow, if I am understanding the overall picture better, I don't need a cluster, but I still need three things, some of which I already have, at least partially.

-A method of building/adding/maintaining nodes as traffic requires.
-A method of load balancing and fail over for web connections.
-A method of having at least two DB servers without too much complexity.

A few months ago, I thought about getting rid of front-end load balancing in favor of application logic.
I was thinking about adding self-balancing logic into the app, meaning it would keep things balanced based on its own knowledge of how many people or devices are connecting. It would split that traffic off on its own, not only to keep it balanced but also to keep it regional in some cases. Why have UK traffic coming to the US if the app can keep it in the UK?

All this sounds nice, and since we do use GitLab internally, I like your point about using git to keep things synced up.
Based on this and your DNS input, I think I can find a path forward. I have also been developing an internal DB-based tasking system that could work perfectly with this.

In the above question (the other link), there was a comment that sounds similar to what you are suggesting.

>One approach as David points out is to contain app,api and a db .

Since this all ties together, I thought it would be better to join all of that here. The main thing I am missing, then, is how to deal with having at minimum two databases so we can have failover. The DB has been the nightmare failure point so far: when something happens to the DB, everything stops, and that has been happening.

You and David seem to be pointing out that a host/server could contain everything needed to handle connections, web services and DB, all self-contained. What's not clear is how I keep these things synchronized.
Sure, one server could handle X number of whatever, but if incoming connections always want to use that server and it's down, then the client has to hit something else that has the same data it expects.

This is where I'm missing something in these ideas of having full instances on individual servers.

>i do not know of a single operating system that does not handle software raid besides vmware.

I think you mean after installing the OS. I've not seen any besides BSD/Debian that offer ZFS during install, so I don't have to mess with all kinds of configuring after the installation. Maybe I need to look at some other distros again.

>but i have no idea what you are dealing with specifically so just throwing ideas. i did all the above or
>variations of the above on multiple production platforms, though.

It's not large at the moment, but my concern is how it could grow exponentially, so we need to be ready before that happens by building the right environment. I want our venture to spend less time on the handling of traffic and more time on delivering the best features and functions possible. We are constantly distracted by the problem above and I have to solve that somehow.

SOLUTION (members-only content not shown)
Hi noci,

>Question is, does your setup require a cluster.... (tightly coupled set of systems working on a job)....
>what you probably need is a "standard" webserver and deploy it using f.e. ansible.
>Then an extra server is only adding a system to the array.

Yes, this is the conclusion I am coming to as well based on the input: keep doing what we're doing and find ways to improve on it. I opened a tab for Ansible too.
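From a quick first look at Ansible, I think what noci describes would look roughly like this on our side. This is purely a sketch; the inventory hostnames and the site.yml playbook are placeholders:

# static inventory of the load-balanced sets
cat > inventory.ini <<'EOF'
[webservers]
www1.loc
www2.loc
www3.loc

[dbservers]
db1.loc
db2.loc
db3.loc
EOF

# ad-hoc: make sure apache is present and current on every web node
ansible webservers -i inventory.ini -b -m package -a "name=httpd state=latest"

# or a full playbook (site.yml is hypothetical), so adding capacity is just
# "add a host to the inventory and re-run"
ansible-playbook -i inventory.ini site.yml --limit webservers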

We recently started using Proxmox thanks to your input, noci, and it's been a pretty interesting and reliable solution. The live migration without any downtime is very useful. I like how we can just keep adding hosts and they simply become part of the cluster. Definitely liking it more than VMware, but we'll keep running both for a while.

OpenVMS isn't an option at this time; it sounds pretty costly.

I think I have enough input to get somewhere now. The big challenge is dealing with the DB, which is another question.

Thanks everyone.
Closing a question and picking a solution is sometimes very difficult.

There are many times when multiple answers were the solution, or at least helpful enough to provide some direction.
Picking every comment that was in fact useful might dilute the overall post.

I don't know... over thinking it maybe :).
a mariadb cluster will solve the db issue as far as reliability is concerned.

it will also scale reads to some extent.

scaling writes with sql nodes is on the other hand very difficult. there are systems that do shard the data and will scale writes to some extent but running actual complex sql queries on them is generally slow.

such needs are better handled by nosql tech such as cassandra

it is quite easy to scale a single machine's io with zfs in a raid 10-like layout by simply throwing more disks at it. unmatched disk sizes and speeds are not much of an issue as long as the disks within each mirrored pair are matched.
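for instance (pool and device names made up) :

# raid10-style pool : a stripe of mirrored pairs
zpool create tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
# need more io later : just add another mirrored pair to the stripe
zpool add tank mirror /dev/sdf /dev/sdg
zpool status tank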

--

splitting traffic over multiple locations can be done with redirections and/or dns geolocation or external services such as cloudflare. this would be a topic by itself. note that nowadays, most browsers are capable of failing over between dns addresses, which allows simple and robust architectures.

good luck.
Luckily, our traffic is something like 90% reads and no complex queries.
The only issue is moving over to that with limited hands-on time and needing to get to redundancy asap.
I'll start a new question about that.

if that is your traffic profile and you do not expect a disk space shortage, don't lose time. you are in the exact situation that is best suited for a mariadb cluster. run it on raid 10 and on physical hosts. be done with it.

if you use zfs, prefer a bsd or solaris os. and either provide raw pools to mariadb, or use directio over a zfs filesystem.
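the zfs filesystem + directio route would look something like this. pool, dataset and config paths are made up, and the exact tuning depends on your workload :

# dedicated dataset for the datadir, recordsize matched to innodb's 16k pages
zfs create -o recordsize=16k -o atime=off -o mountpoint=/var/db/mysql tank/mysql
# avoid caching the same data twice (innodb buffer pool + zfs arc)
zfs set primarycache=metadata tank/mysql

# point mariadb at it and ask for direct io
# (doublewrite is commonly disabled on zfs since the filesystem already writes atomically)
cat > /etc/my.cnf.d/zfs.cnf <<'EOF'
[mysqld]
datadir = /var/db/mysql
innodb_flush_method = O_DIRECT
innodb_doublewrite = 0
EOF
# (the "raw pools" alternative mentioned above is not shown here)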
Yes, as mentioned in my other question, I wanted to give it a little more time to come to light, but no leads have come up that clearly show where the issue is, so it's time to make decisions.

On these blades, my options are hardware RAID 0, 1 and 1+0, and I have cache cards.
Or use FreeBSD, which is what I was leaning toward for ZFS.

I'll have to read up on using raw pools etc later.

>don't lose time. you are in the exact

Agreed. I updated the other question in terms of what hardware/drives I have handy to see what I can do today.