Ranga Reddy Software Engineer

Spark with custom logging

2021-09-20
Ranga Reddy

Spark Custom Logging

By default, Spark uses the $SPARK_HOME/conf/log4j.properties file to configure log4j, and this configuration is set at the cluster level. When you troubleshoot an application or investigate a performance issue, you often need more detailed logs for that one application without changing the cluster-wide setting. This blog post shows how to customize the Spark logs for both the driver and the executors.

Usage

The following are steps to customize the logs:

1. Specify a single custom log4j.properties for the driver and executors

Step1: Copy the log4j.properties file to a temporary directory, for example /tmp

cp $SPARK_HOME/conf/log4j.properties /tmp
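
The copy step can be made a little more forgiving. On some installations the conf directory may contain only log4j.properties.template, so the sketch below falls back to the template when the plain file is missing. The function name copy_log4j_conf is hypothetical and not part of Spark; adjust the paths to your environment.

```shell
# Copy Spark's log4j config into a destination directory.
# Falls back to the bundled template when log4j.properties is absent.
# copy_log4j_conf is a hypothetical helper name, not part of Spark.
# Usage: copy_log4j_conf "$SPARK_HOME/conf" /tmp
copy_log4j_conf() {
  conf_dir="$1"   # directory holding Spark's config, e.g. "$SPARK_HOME/conf"
  dest_dir="$2"   # where the editable copy should go, e.g. /tmp
  if [ -f "$conf_dir/log4j.properties" ]; then
    cp "$conf_dir/log4j.properties" "$dest_dir/log4j.properties"
  elif [ -f "$conf_dir/log4j.properties.template" ]; then
    cp "$conf_dir/log4j.properties.template" "$dest_dir/log4j.properties"
  else
    echo "no log4j.properties found in $conf_dir" >&2
    return 1
  fi
}
```

Either way the result is named log4j.properties in the destination directory, so the remaining steps do not change.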

Step2: Update the log4j.properties file

In the following properties, I have changed the root logging level (root.logger) from INFO to DEBUG.

vi /tmp/log4j.properties
log4j.rootLogger=${root.logger}
root.logger=DEBUG,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
shell.log.level=WARN
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
log4j.logger.org.apache.spark.repl.Main=${shell.log.level}
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=${shell.log.level}
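
If you switch levels often, the same change can be scripted instead of made in an editor. This is a minimal sketch that assumes the file has a root.logger=LEVEL,appender line like the listing above; the function name set_root_level is hypothetical.

```shell
# Rewrite the level on the "root.logger=" line, keeping the appender list.
# set_root_level is a hypothetical helper name used only for this example.
# Usage: set_root_level /tmp/log4j.properties DEBUG
set_root_level() {
  props_file="$1"   # properties file to edit, e.g. /tmp/log4j.properties
  level="$2"        # new level, e.g. DEBUG, INFO, WARN
  # -i.bak edits the file in place and leaves a backup next to it
  sed -i.bak -E 's/^(root\.logger=)[A-Za-z]+(,.*)?$/\1'"$level"'\2/' "$props_file"
}
```

Only the root.logger line is touched. The per-package overrides such as log4j.logger.org.apache.parquet=ERROR stay as they are.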

Step3: Run the Spark Application

Client mode

spark-submit \
  --verbose \
  --master yarn \
  --deploy-mode client \
  --files /tmp/log4j.properties#log4j.properties \
  --driver-java-options "-Dlog4j.configuration=file:/tmp/log4j.properties" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" \
  --class org.apache.spark.examples.SparkPi \
  /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10

Cluster mode

spark-submit \
  --verbose \
  --master yarn \
  --deploy-mode cluster \
  --files /tmp/log4j.properties#log4j.properties \
  --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" \
  --class org.apache.spark.examples.SparkPi \
  /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10
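
One way to check that the custom configuration took effect is to count the DEBUG lines in the application logs. In client mode the driver log is printed on the console where spark-submit runs, while container logs on YARN can be fetched with yarn logs -applicationId. The sketch below assumes the ConversionPattern shown earlier, where the level is the third whitespace-separated field; count_level is a hypothetical helper name.

```shell
# Count log lines at a given level, read from standard input.
# Assumes the layout "%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n",
# so the level is the third whitespace-separated field of each line.
# count_level is a hypothetical helper name.
# Usage: yarn logs -applicationId <application ID> | count_level DEBUG
count_level() {
  awk -v lvl="$1" '$3 == lvl { n++ } END { print n + 0 }'
}
```

A count of zero for DEBUG suggests the process is still reading the default cluster-level configuration.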

2. Specify different custom log4j.properties files for the driver and executors

Step1: For the driver, create a separate log4j.properties file, say log4j_driver.properties, and update the configuration

cp $SPARK_HOME/conf/log4j.properties /tmp/log4j_driver.properties

Step2: For the executors, create a separate log4j.properties file, say log4j_executor.properties, and update the configuration

cp $SPARK_HOME/conf/log4j.properties /tmp/log4j_executor.properties

Step3: Run the Spark Application

Client mode

spark-submit \
  --verbose \
  --master yarn \
  --deploy-mode client \
  --files /tmp/log4j_driver.properties,/tmp/log4j_executor.properties \
  --driver-java-options "-Dlog4j.configuration=file:/tmp/log4j_driver.properties" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j_executor.properties" \
  --class org.apache.spark.examples.SparkPi \
  /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10

Cluster mode

spark-submit \
  --verbose \
  --master yarn \
  --deploy-mode cluster \
  --files /tmp/log4j_driver.properties,/tmp/log4j_executor.properties \
  --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j_driver.properties" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j_executor.properties" \
  --class org.apache.spark.examples.SparkPi \
  /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-example*.jar 10
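
The only real difference between the client and cluster commands is how the driver locates its file. In client mode the driver runs on the submitting host and reads the local path. In cluster mode it runs inside a YARN container, where --files has placed the file in the working directory, so the bare file name is enough. The sketch below captures that rule; driver_log4j_opt is a hypothetical helper name, and the file: prefix follows the Spark documentation's advice to give the protocol explicitly for a local file.

```shell
# Print the -Dlog4j.configuration option for the driver.
# client:  the driver runs on the submitting host, so use the local file path.
# cluster: the driver runs in a YARN container; --files ships the file into
#          its working directory, so the bare file name is used.
# driver_log4j_opt is a hypothetical helper name.
driver_log4j_opt() {
  deploy_mode="$1"   # client or cluster
  props_path="$2"    # local path, e.g. /tmp/log4j_driver.properties
  case "$deploy_mode" in
    client)  echo "-Dlog4j.configuration=file:${props_path}" ;;
    cluster) echo "-Dlog4j.configuration=$(basename "$props_path")" ;;
    *)       echo "unknown deploy mode: $deploy_mode" >&2; return 1 ;;
  esac
}
```

For example, driver_log4j_opt cluster /tmp/log4j_driver.properties prints the value passed to spark.driver.extraJavaOptions in the cluster-mode command above.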
