Collecting Logs in Azure Databricks

This article demonstrates how to use Azure Databricks with Spark to generate logs and collect them in Azure Log Analytics, using Docker to build the monitoring library.

By Shubham Dangare · Mar. 10, 20 · Tutorial


Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. In this blog, we will see how to collect logs from Azure Databricks into Azure Log Analytics (ALA). Before going further, let's look at how to set up a Spark cluster in Azure Databricks.

Create a Spark Cluster in Databricks

  1. In the Azure portal, go to the Databricks workspace that you created, and then click Launch Workspace.
  2. You are redirected to the Azure Databricks portal. From the portal, click New Cluster.
  3. Under “Advanced Options,” click the “Init Scripts” tab. Go to the last line in the “Init Scripts” section, select “DBFS” in the “Destination” drop-down, and enter “dbfs:/databricks/spark-monitoring/spark-monitoring.sh” in the text box. Click “Add.” (The script is copied to this location in the “Configure the Azure Databricks Workspace” section below.)

Run a Spark SQL job

  1. In the left pane, select Azure Databricks. From Common Tasks, select New Notebook.
  2. In the Create Notebook dialog box, enter a name, select a language, and select the Spark cluster that you created earlier.

Create a Notebook

  1. Click the Workspace button.
  2. In the Create Notebook dialog, enter a name and select the notebook's default language.
  3. If there are running clusters, the Cluster drop-down is displayed. Select your cluster.

Adding a Logger to a Databricks Notebook

```scala
// Cell 1: write a legacy cluster-named init script (it runs for the cluster named
// in the path) that appends a Log Analytics appender to the executor log4j
// configuration. Init scripts run when the cluster starts, so restart it afterwards.
// Only loggers under com.knoldus.pnp.samplejob are routed to appender A1.
dbutils.fs.put("/databricks/init/dev-heb-spark-cluster/verbose_logging.sh", """#!/bin/bash
echo "log4j.appender.A1=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender
log4j.appender.A1.layout=com.microsoft.pnp.logging.JSONLayout
log4j.appender.A1.layout.LocationInfo=false
log4j.additivity.com.knoldus.pnp.samplejob=false
log4j.logger.com.knoldus.pnp.samplejob=INFO, A1" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
""", true)

// Cell 2: log messages and publish a custom metric from the driver.
import com.microsoft.pnp.logging.Log4jConfiguration
import org.apache.log4j.Logger
import org.apache.spark.internal.Logging
import org.apache.spark.metrics.UserMetricsSystems
import org.apache.spark.sql.SparkSession

object SumNumbers extends Logging {
  private final val METRICS_NAMESPACE = "SumNumbers"
  private final val COUNTER_NAME = "counter1"
  // Plain log4j logger named after this object's class.
  val test = Logger.getLogger(getClass.getName)

  def computeSumOfNumbersFromOneTo(value: Long, spark: SparkSession): Long = {
    // Optionally load a custom log4j configuration instead of the cluster default:
    // Log4jConfiguration.configure("/databricks/spark-monitoring/log4j.properties")

    test.info("Hello There")
    // Helpers from Spark's Logging trait; TRACE and DEBUG messages appear
    // only if the configured log level allows them.
    logTrace("data testing")
    logDebug("data testing")
    logInfo("data testing")
    logWarning("data testing")
    logError("data testing")

    // Register (or look up) a custom counter in a user metrics system.
    val driverMetricsSystem = UserMetricsSystems
      .getMetricSystem(METRICS_NAMESPACE, builder => {
        builder.registerCounter(COUNTER_NAME)
      })
    driverMetricsSystem.counter(COUNTER_NAME).inc()

    // Sum 0 + 1 + ... + value on the cluster.
    val sumOfNumbers = spark.range(value + 1).reduce(_ + _)
    driverMetricsSystem.counter(COUNTER_NAME).inc(5)
    sumOfNumbers
  }
}

SumNumbers.computeSumOfNumbersFromOneTo(100, spark)
```
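The `dbutils.fs.put` call above writes an init script that appends five log4j 1.x properties attaching the library's `LogAnalyticsAppender` to a single logger, `com.knoldus.pnp.samplejob`. Because log4j routes events by logger name, only classes in that package (or its sub-packages) are sent to `A1`; events from other loggers are handled by whatever appenders their parent loggers, ultimately the root logger, use. The helper below is a hypothetical sketch (the `AppenderConfig` object and its parameters are not part of the monitoring library) that renders the same lines for any logger name, which may make it easier to adapt the script to your own package.

```scala
// Hypothetical helper (not part of spark-monitoring): renders the log4j 1.x
// properties used by the init script for any logger, level, and appender name.
object AppenderConfig {
  def lines(loggerName: String, level: String = "INFO", appender: String = "A1"): Seq[String] = Seq(
    // Appender and JSON layout classes, as used in the init script.
    s"log4j.appender.$appender=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender",
    s"log4j.appender.$appender.layout=com.microsoft.pnp.logging.JSONLayout",
    s"log4j.appender.$appender.layout.LocationInfo=false",
    // Stop these events from also propagating to parent loggers' appenders.
    s"log4j.additivity.$loggerName=false",
    // Send this logger (and its child loggers) at `level` and above to the appender.
    s"log4j.logger.$loggerName=$level, $appender"
  )
}

// Example: generate the lines for your own package.
println(AppenderConfig.lines("com.example.jobs").mkString("\n"))
```

Generating the lines from one logger name keeps the `additivity` and `logger` entries in sync, which is the part most likely to drift when the script is edited by hand.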


Now that the notebook is set up, let's configure the cluster to send logs to an Azure Log Analytics workspace. For that, we first need to create a Log Analytics workspace in Azure.

  • First, deploy the Spark monitoring library to the Azure Databricks cluster.
  • Clone or download the spark-monitoring GitHub repository (https://github.com/mspnp/spark-monitoring).
  • Install the Azure Databricks CLI.
    • A personal access token is required to use the CLI. For instructions, see the Azure Databricks token management documentation.
    • You can also use the CLI from Azure Cloud Shell.

Build the Azure Databricks Monitoring Library Using Docker

Linux:

```shell
chmod +x spark-monitoring/build.sh
docker run -it --rm -v `pwd`/spark-monitoring:/spark-monitoring -v "$HOME/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh
```

Windows:

```shell
docker run -it --rm -v %cd%/spark-monitoring:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 maven:3.6.1-jd
```

Configure the Azure Databricks Workspace

Copy the JAR files and init scripts to Databricks.

  1. Use the Azure Databricks CLI to create a directory named dbfs:/databricks/spark-monitoring: `dbfs mkdirs dbfs:/databricks/spark-monitoring`
  2. Open the /src/spark-listeners/scripts/spark-monitoring.sh script file and add your Log Analytics workspace ID and key to the following lines: `export LOG_ANALYTICS_WORKSPACE_ID=` and `export LOG_ANALYTICS_WORKSPACE_KEY=`
  3. Use the Azure Databricks CLI to copy /src/spark-listeners/scripts/spark-monitoring.sh to the directory created in step 1: `dbfs cp <local path to spark-monitoring.sh> dbfs:/databricks/spark-monitoring/spark-monitoring.sh`
  4. Use the Azure Databricks CLI to copy all of the JAR files from the spark-monitoring/src/target folder to the directory created in step 1: `dbfs cp --overwrite --recursive <local path to target folder> dbfs:/databricks/spark-monitoring/`
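Because step 2 edits the script by hand, it can help to confirm that both values were actually filled in before copying the script to DBFS. The following is a minimal, hypothetical sketch (the `ScriptCheck` object is not part of the library) that assumes the settings appear as plain `export NAME=value` lines, as in step 2, and reports which required names are missing or empty.

```scala
// Hypothetical pre-upload check: finds required `export NAME=value` settings
// that are missing or empty in a shell script's text.
object ScriptCheck {
  // Matches lines such as: export LOG_ANALYTICS_WORKSPACE_ID=abc123
  private val ExportLine = """^\s*export\s+([A-Z0-9_]+)=(.*)$""".r

  def missingSettings(script: String, required: Seq[String]): Seq[String] = {
    val values: Map[String, String] = script.split("\\r?\\n").iterator.collect {
      // Trim whitespace and surrounding double quotes from the value.
      case ExportLine(name, value) => name -> value.trim.stripPrefix("\"").stripSuffix("\"")
    }.toMap
    // A name counts as missing if it is absent or assigned an empty value.
    required.filter(name => values.get(name).forall(_.isEmpty))
  }
}

val required = Seq("LOG_ANALYTICS_WORKSPACE_ID", "LOG_ANALYTICS_WORKSPACE_KEY")
val script = scala.io.Source.fromFile("spark-monitoring.sh").mkString // local copy of the script
println(ScriptCheck.missingSettings(script, required))
```

An empty result means both lines have a value; it does not verify that the ID and key are correct.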

Everything is now set up, and you can query the Log Analytics workspace to retrieve the logs:

```
Event | search "error"
```

This query returns every record in the Event table that contains the term "error" in any column, which includes error-level log entries. In the same way, we can retrieve logs for other classes.
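Note that `search` is a free-text match: it looks for the term in every column, case-insensitively by default, so it also returns records that merely mention "error" in their message, not only records logged at error level. The local sketch below illustrates that difference on in-memory data. The `LogRecord` fields are hypothetical (not the actual Log Analytics schema), and the whole-term matching is only a rough analogue of the query engine's behavior.

```scala
// Rough, local analogue of a free-text `search "term"` over log records.
object SearchSketch {
  // Hypothetical, simplified record; not the real Log Analytics table schema.
  final case class LogRecord(level: String, logger: String, message: String)

  // Lower-case the text and split it into alphanumeric terms.
  private def terms(text: String): Set[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSet

  // Keep records where any column contains the term as a whole word.
  def search(records: Seq[LogRecord], term: String): Seq[LogRecord] = {
    val t = term.toLowerCase
    records.filter(r => Seq(r.level, r.logger, r.message).exists(col => terms(col).contains(t)))
  }
}

val records = Seq(
  SearchSketch.LogRecord("ERROR", "SumNumbers", "data testing"),
  SearchSketch.LogRecord("INFO", "SumNumbers", "recovered from an error"),
  SearchSketch.LogRecord("INFO", "SumNumbers", "Hello There"))
// Both the ERROR record and the INFO record mentioning "error" match.
SearchSketch.search(records, "error").foreach(println)
```

If you only want error-level entries, filtering on the table's log-level column is more precise than a free-text search.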

References

  1. https://github.com/mspnp/spark-monitoring

Published at DZone with permission of Shubham Dangare. See the original article here.
