Memory recommendation for Spark execution


Spark can be configured to run in standalone mode or on top of Hadoop YARN or Mesos. In the Syncfusion Big Data Platform, Spark is configured to run on top of YARN. In a Hadoop cluster, YARN allocates resources for applications running on the cluster. A Spark application runs as an independent set of processes (executors) on the cluster, coordinated by the SparkContext object in your main program (called the driver program). With the default configuration, the Spark command-line interface runs with one driver and two executors.

 

To learn more about Spark execution, please refer to the following link:

http://spark.apache.org/docs/latest/cluster-overview.html

 

The following table lists the properties that configure the Spark driver and executor memory:

Property                  | Default / Configured Value | Description
------------------------- | -------------------------- | -----------
spark.executor.memory     | 512 MB                     | Amount of memory to use per executor process.
spark.executor.instances  | 2                          | The number of executors to launch.
spark.driver.memory       | 1024 MB                    | Amount of memory to use for the driver process, i.e., where the SparkContext is initialized.
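These values can also be set programmatically when constructing the Spark context. Below is a minimal sketch in Scala; the application name is a placeholder, and note the caveat on driver memory in client mode:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: setting the memory properties from the table above programmatically.
// Note: in client mode (e.g. the spark-shell) the driver JVM is already running
// before SparkConf is read, so spark.driver.memory must instead be set in
// spark-defaults.conf or passed via the --driver-memory command-line option.
val conf = new SparkConf()
  .setAppName("MemoryConfigExample") // placeholder application name
  .set("spark.executor.memory", "512m")
  .set("spark.executor.instances", "2")
  .set("spark.driver.memory", "1024m")

val sc = new SparkContext(conf)
```

On the Syncfusion Big Data Platform, these defaults can also be edited through the Cluster Manager application, as linked at the end of this article.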

 

Spark shell required memory = (Driver memory + 384 MB) + (Number of executors × (Executor memory + 384 MB))

Here, 384 MB is the memory overhead that YARN adds to each container (the driver and each executor) to account for off-heap usage when executing jobs; with the default memory settings above, the 384 MB overhead applies to every container.

 

Example:

Spark required memory = (1024 + 384) + (2 × (512 + 384)) = 3200 MB
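For reference, here is the same calculation as a small Scala helper (the function and variable names are illustrative):

```scala
// Sketch of the required-memory formula above. The fixed 384 MB overhead
// is an assumption that holds for these default driver/executor sizes.
val overheadMb = 384

def sparkRequiredMemoryMb(driverMb: Int, executorMb: Int, numExecutors: Int): Int =
  (driverMb + overheadMb) + numExecutors * (executorMb + overheadMb)

println(sparkRequiredMemoryMb(1024, 512, 2)) // prints 3200
```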

To learn more about Spark configuration on YARN, please refer to the following link:

http://spark.apache.org/docs/latest/running-on-yarn.html

 

Use the following check to determine whether there is enough memory available in YARN for the Spark shell to function properly:

Enough memory for Spark (Boolean) = (Memory Total - Memory Used) > Spark required memory
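Expressed as a small Scala helper (names are illustrative), with a sample invocation using the 3200 MB figure from the example above:

```scala
// Sketch: true when YARN's free memory exceeds the memory Spark requires.
def enoughMemoryForSpark(memoryTotalMb: Long, memoryUsedMb: Long, requiredMb: Long): Boolean =
  (memoryTotalMb - memoryUsedMb) > requiredMb

// Example: a cluster with 8192 MB total and 2048 MB used has 6144 MB free,
// which is more than the 3200 MB the Spark shell requires.
println(enoughMemoryForSpark(8192L, 2048L, 3200L)) // true
```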

 

You can verify that the memory required for Spark is available in the YARN Resource Manager web interface.

Resource Manager URL:  http://<name_node_host>:8088/cluster 

Here, Memory Total is the total memory available to YARN across the cluster; it is configured per node with the property "yarn.nodemanager.resource.memory-mb". Both the Memory Total and Memory Used values are shown on the Resource Manager UI's cluster page.
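If you prefer to check these values programmatically rather than through the web UI, the Resource Manager also exposes them over its REST API. A minimal sketch in Scala (the host placeholder must be replaced with your Resource Manager host):

```scala
import scala.io.Source

// Sketch: read cluster memory metrics from the YARN Resource Manager REST API.
// The JSON response includes "totalMB" (Memory Total) and "allocatedMB"
// (Memory Used) fields. Replace <name_node_host> with your actual host.
val url = "http://<name_node_host>:8088/ws/v1/cluster/metrics"
val metricsJson = Source.fromURL(url).mkString
println(metricsJson)
```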

 

 

Note:

In addition to verifying the required memory for Spark execution from the YARN metrics, it is also mandatory to check the available physical memory (RAM) on the machine. Even when YARN reports enough available memory, other applications or processes outside Hadoop and Spark may consume physical memory, and in that case the Spark shell cannot run properly. An equivalent amount of free physical RAM is therefore required as well.

 

To learn more about editing the configuration of Hadoop and its ecosystem, including Spark, using our Cluster Manager application, please refer to the following link:

https://help.syncfusion.com/bigdata/cluster-manager/cluster-management#customization-of-hadoop-and-all-hadoop-ecosystem-configuration-files

 

To fine-tune Spark for maximum performance based on the available machines and their hardware specifications, please refer to the following link:

https://help.syncfusion.com/bigdata/cluster-manager/performance-improvements#spark 
