Too many open files issue with Spark

Joseph Jean pierre asked:
I'm supporting a Spark Scala application with a Node.js front end (D3.js, etc.). The Spark side uses Spark Job Server to accept API requests as JSON (via curl) and return JSON output. I'm getting a "too many open files" error:

09:27:23.649 [task-result-getter-1] WARN  o.a.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 151.1 (TID 10373, localhost, executor driver): java.io.FileNotFoundException: /data/iot/installs/data/spark_work/blockmgr-952fad2f-2721-41da-a513-aebc7fc83c1a/35/shuffle_65_0_0.index.ffe8ecd3-64ab-445f-8fa7-8ba9aefcdeff (Too many open files)
      at java.io.FileOutputStream.open0(Native Method)
      at java.io.FileOutputStream.open(FileOutputStream.java:270)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
      at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:144)
      at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:127)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
      at org.apache.spark.scheduler.Task.run(Task.scala:109)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

I increased the ulimits; below is the output:

 ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 241474
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 999999
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 999999
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I still get the same error.
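
One thing worth double-checking: ulimit -a only shows the limits of the shell it was run in. If the Spark Job Server JVM is started from an init script, a systemd unit, or a different user account, it can still be running with a much lower open-files limit. A quick way to confirm (the pgrep pattern "spark" is only an assumption; adjust it to match your Job Server process):

# Show the limits the live JVM is actually running with, not the shell's.
# "Max open files" is the line that matters here.
cat /proc/$(pgrep -f spark | head -1)/limits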

My application.conf
spark {
  master = "local[*]"
  streamDuration = 15000

  local {
    #dir=${?MOTHER_DIRECTORY}"/iot/installs/data/spark_work"
    dir = "/data/iot/installs/data/spark_work"
  }

  ui {
    enabled = false
  }

  driver {
    memory = 2g
    cores = 16
    maxResultSize = 2g
  }

  executor {
    memory = 2g
    cores = 16
  }
}
David (Fractional CTO, Distinguished Expert 2018) commented:
Here's how I'd approach this. It's something I have to do on every machine running LXD containers, since running hundreds to thousands of containers on one machine requires a very large open-files/watches configuration.

Best to bump up your watches too, in case you need to do any kind of inotify() interaction...

net16 # cat 40-max-open-files
fs.file-max=10000000

net16 # cat 40-max-user-watches.conf
fs.inotify.max_queued_events=1048576
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576
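
These look like sysctl drop-in files; assuming they sit under /etc/sysctl.d/ (the 40-*.conf naming suggests that, but the path isn't shown above), the values can be loaded without waiting for a reboot:

# Re-read all sysctl configuration, including /etc/sysctl.d/ drop-ins
sysctl --system

# Spot-check the values that were applied
sysctl fs.file-max fs.inotify.max_user_watches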

Then modify your security limits to set the hard/system limit on the number of open files to unlimited.
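
On most Linux systems that means /etc/security/limits.conf or a drop-in under /etc/security/limits.d/. A minimal sketch (the wildcard domain, file name, and value are assumptions; substitute the account that runs the Spark Job Server, and note that nofile generally cannot be raised above the kernel's fs.nr_open, so a large explicit number is safer than "unlimited"):

# /etc/security/limits.d/99-nofile.conf  (hypothetical file name)
# <domain>  <type>  <item>   <value>
*           soft    nofile   1000000
*           hard    nofile   1000000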

Then reboot.
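
After the reboot, a quick sanity check that the new limits apply to a fresh session:

# Soft and hard open-file limits for the current shell
ulimit -Sn
ulimit -Hn
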
Joseph Jean pierre, Sr Delivery Manager (Author), commented:
Thank you, David.
