While running a Spark application with huge data or a large number of columns, we may see a java.lang.StackOverflowError. To solve this issue, we need to increase the JVM thread stack size.
First, we need to identify whether the StackOverflowError occurred on the driver side or the executor side. If the issue occurred on the driver side, set the stack size using ‘spark.driver.extraJavaOptions’; if it occurred on the executor side, set it using ‘spark.executor.extraJavaOptions’. If you don't know where it occurred, set the stack size on both the driver and executor sides (the log-searching sketch below can also help pinpoint it):
--conf spark.driver.extraJavaOptions="-Xss4m"
--conf spark.executor.extraJavaOptions="-Xss4m"
If the issue is not fixed, increase the stack size step by step, for example 4m, 8m, 16m, 32m, etc. If the issue still persists even after increasing the stack size to some large value, for example 1024m, then the problem is happening at the code level (typically unbounded or very deep recursion), and we need to fix it in the code itself.
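On YARN, a quick way to tell which side threw the error is to search the aggregated application logs. The following is a small sketch, assuming log aggregation is enabled; replace <application_id> with your own:
# The container in which the error appears tells you whether it came
# from the driver or from an executor.
yarn logs -applicationId <application_id> 2>/dev/null | grep -B 2 -A 10 "java.lang.StackOverflowError"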
Spark runs two kinds of JVM processes, the driver and the executors. To print JVM process information, the JVM supports three verbose options: -verbose:class, -verbose:gc and -verbose:jni.
To enable verbose class output, we need to use the -verbose:class option. This option will help us find class loading errors like ClassNotFoundException and NoClassDefFoundError.
To enable it from the Spark side, we need to add the following two parameters:
--conf "spark.driver.extraJavaOptions=-verbose:class" \
--conf "spark.executor.extraJavaOptions=-verbose:class" \
Note:
--verbose - prints verbose information, such as the resolved Spark configuration.
-verbose:class - prints details about class loader activity.

SparkPi is a simple Spark example application used to calculate the value of Pi.
The following are the steps to run this example on various clusters:
Apache Spark:
Client mode:
$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 2 \
$SPARK_HOME/examples/jars/spark-examples_*.jar 1000
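In client mode the driver runs locally, so SparkPi's result line is printed directly on your console, while Spark's own logging goes to stderr (see the console appender configuration later in this post). The last argument (1000 above) is the number of partitions used for the estimation. A small sketch to capture just the result:
# Discard Spark's logging (stderr) and keep only the result line printed on stdout.
$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
$SPARK_HOME/examples/jars/spark-examples_*.jar 1000 2>/dev/null | grep "Pi is roughly"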
Cluster mode:
$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 2 \
$SPARK_HOME/examples/jars/spark-examples_*.jar 1000
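In cluster mode the driver runs inside a YARN container, so the "Pi is roughly ..." line ends up in the driver's container log rather than on your console. One way to retrieve it, assuming log aggregation is enabled:
# Pull the SparkPi result line out of the aggregated logs.
yarn logs -applicationId <application_id> 2>/dev/null | grep "Pi is roughly"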
Cloudera CDH:
Client mode:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10
Cluster mode:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10
Cloudera Spark3 (CDS):
Client mode:
spark3-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/opt/cloudera/parcels/SPARK3/lib/spark3/examples/jars/spark-example*.jar 10
Cluster mode:
spark3-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/opt/cloudera/parcels/SPARK3/lib/spark3/examples/jars/spark-example*.jar 10
Cloudera CDP:
Client mode:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_*.jar 10
Cluster mode:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_*.jar 10
Hortonworks HDP:
Client mode:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 10
Cluster mode:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 10
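Whichever distribution you run on, you can confirm that a run completed successfully by asking YARN for the application's final status (the id below is a placeholder):
# The application report should show a final state of SUCCEEDED for a completed run.
yarn application -status <application_id>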
By default, Spark uses the $SPARK_HOME/conf/log4j.properties file to configure log4j, and this log4j configuration is set at the cluster level. Sometimes, when you need to troubleshoot or fix performance issues, you need to customize the logs. This blog post will help you customize the Spark logs for both the driver and the executors.
The following are the steps to customize the logs:
Using the same log4j.properties for driver and executors
Step1: Copy log4j.properties to a temporary directory, for example /tmp
cp $SPARK_HOME/conf/log4j.properties /tmp
Step2: Update the log4j.properties file
In the following properties, I have modified the logging level from INFO to DEBUG.
vi /tmp/log4j.properties
log4j.rootLogger=${root.logger}
root.logger=DEBUG,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
shell.log.level=WARN
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
log4j.logger.org.apache.spark.repl.Main=${shell.log.level}
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=${shell.log.level}
Step3: Run the Spark Application
Client mode
spark-submit \
--verbose \
--master yarn \
--deploy-mode client \
--files /tmp/log4j.properties#log4j.properties \
--driver-java-options "-Dlog4j.configuration=/tmp/log4j.properties" \
--conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" \
--class org.apache.spark.examples.SparkPi \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10
Cluster mode
spark-submit \
--verbose \
--master yarn \
--deploy-mode cluster \
--files /tmp/log4j.properties#log4j.properties \
--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties" \
--conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" \
--class org.apache.spark.examples.SparkPi \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10
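To confirm the customized file was picked up: in client mode the extra DEBUG lines appear directly on the driver console, while in cluster mode you can check the aggregated YARN logs, for example:
# A non-zero count of DEBUG lines indicates the custom log4j.properties took effect.
yarn logs -applicationId <application_id> 2>/dev/null | grep -c " DEBUG "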
Using separate log4j.properties files for driver and executors
Step1: For the driver, create a separate log4j.properties file, say log4j_driver.properties, and update the configuration
cp $SPARK_HOME/conf/log4j.properties /tmp/log4j_driver.properties
Step2: For the executors, create a separate log4j.properties file, say log4j_executor.properties, and update the configuration
cp $SPARK_HOME/conf/log4j.properties /tmp/log4j_executor.properties
Step3: Run the Spark Application
Client mode
spark-submit \
--verbose \
--master yarn \
--deploy-mode client \
--files /tmp/log4j_driver.properties,/tmp/log4j_executor.properties \
--driver-java-options "-Dlog4j.configuration=/tmp/log4j_driver.properties" \
--conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j_executor.properties" \
--class org.apache.spark.examples.SparkPi \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10
Cluster mode
spark-submit \
--verbose \
--master yarn \
--deploy-mode cluster \
--files /tmp/log4j_driver.properties,/tmp/log4j_executor.properties \
--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j_driver.properties" \
--conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j_executor.properties" \
--class org.apache.spark.examples.SparkPi \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10
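If you are unsure which configuration file each side actually loaded, log4j 1.x (the version used by the configuration shown above) prints its own initialization steps when the log4j.debug system property is set. A sketch of adding it to the options above:
--conf spark.driver.extraJavaOptions="-Dlog4j.debug=true -Dlog4j.configuration=log4j_driver.properties" \
--conf spark.executor.extraJavaOptions="-Dlog4j.debug=true -Dlog4j.configuration=log4j_executor.properties" \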
Spark Logs Extractor is a simple shell script tool used to collect the Spark application logs and event logs in a compressed format.
The following are the steps to use this tool:
Step1:
Download the spark_logs_extractor.sh script to any location and give it execute permission.
wget https://raw.githubusercontent.com/rangareddy/spark-logs-extractor/main/spark_logs_extractor.sh
chmod +x spark_logs_extractor.sh
Step2:
While running the spark_logs_extractor.sh script, provide the application_id.
sh spark_logs_extractor.sh <application_id>
Replace <application_id> with your Spark application id.
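For example, with a placeholder application id (you can find yours with yarn application -list or in the ResourceManager UI):
sh spark_logs_extractor.sh application_1234567890123_0001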