mokkan

asked on

YARN memory allocation

Hello,

I am new to Hadoop.  I have a question regarding YARN memory allocation.  If we have 16 GB of memory in the cluster, we can have at least three 4 GB containers and keep 4 GB for other uses.  If a job needs 10 GB of RAM, would it use three containers, or would it use one container and start consuming the rest of the RAM?
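For reference, the per-container memory bounds being asked about are set in yarn-site.xml. A minimal sketch using the standard property names (the 12 GB / 4 GB values here just mirror the numbers in the question, not a recommendation):

```xml
<configuration>
  <!-- Total memory YARN may hand out on this node (16 GB minus 4 GB reserved) -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12288</value>
  </property>
  <!-- Smallest and largest container YARN will grant to a single request -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
</configuration>
```

Under settings like these, a single container is capped at 4 GB, so a 10 GB job would have to spread its work across multiple containers (or the maximum would have to be raised); one container does not silently grow into the remaining RAM.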
David Favor

YARN doesn't manage memory.

Think of YARN as a fancy cron. All it does is schedule what runs where + when.

Unsure what you mean by 16GB in a cluster.

You'll have a set of applications running, which will either consume all the resources of a machine or, better, run inside LXD containers.

Most Hadoop processes crunch data, so the load is mainly CPU usage rather than disk I/O.

This said, if you run LXD on all your machines, you can run your Hadoop processes in their own LXD containers + run them at a low priority.

The net effect will be that your production applications serve visitor requests quickly, + when the production applications are dormant (for example during I/O waits or low traffic), the Hadoop processes will consume all free CPU cycles.

So when you refer to memory of a cluster, this has no meaning.

When you refer to total memory of a machine or container used within your cluster, this has meaning.

How your memory utilization works depends on many factors, such as whether any other processes are running or whether Hadoop "owns" all of memory. With LXD, you can set memory limits on a per-container basis or per process (within a container).
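As a sketch of that per-container approach (assuming LXD is installed; the container name `hadoop-worker` and the values are illustrative, not from the question):

```shell
# Cap the container at 4 GB of RAM (a hard limit by default)
lxc config set hadoop-worker limits.memory 4GB

# Give it the lowest CPU priority so production workloads win contention
lxc config set hadoop-worker limits.cpu.priority 0

# Verify the applied limits
lxc config show hadoop-worker
```

`limits.memory` and `limits.cpu.priority` are standard LXD instance options; the priority only matters when containers compete for CPU, which matches the "consume free cycles" behavior described above.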