
Alerting: Email & Slack Alerts for Job Failures

🎬 Story Time: "The Job Failed… and Nobody Knew"

Rahul, a data engineer at an e-commerce company, receives a frantic message at 10 AM:

"Why is today's dashboard blank?"

It turns out:

  • ETL pipeline failed at 2:00 AM
  • No one received an alert
  • No monitoring was set up
  • Dashboards showed stale data

Rahul thinks:

"A pipeline without alerts is like a plane without sensors."

He opens Databricks Workflows to configure Email + Slack alerts for every step.


🔥 1. Why Alerts Matter in Production

Alerts help teams react immediately to failures like:

  • Cluster issues
  • Node failures
  • Schema mismatches
  • API rate limits
  • File unavailability
  • Data validation failures
  • Logic bugs in notebooks/scripts

Without alerts, teams lose hours, or worse, publish incorrect data.


📧 2. Email Alerts in Databricks

Email alerts are the simplest and fastest way to get notified.

How to Add Email Alerts

  1. Go to your Job / Workflow
  2. Click Alerts
  3. Add:
    • Your email
    • Team distribution email
    • On-call group email

Choose alert type:

  • On Failure
  • On Success
  • On Start
  • On Duration Over Threshold

Example: Alert Configuration


On Failure → analytics-team@company.com
On Duration Exceeded → dataops@company.com

Databricks automatically sends:

  • Error message
  • Failed task name
  • Logs link
  • Run details
  • Cluster info

Perfect for morning triage.
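
If you manage jobs as code, the same email targets can be set through the Jobs API. Below is a minimal sketch using the Jobs API 2.1 update endpoint; the workspace URL, token, job ID, and addresses are placeholders, and the duration-based field assumes a run-duration threshold is also configured on the job.

import requests

# Placeholders: substitute your workspace URL, a personal access token, and the job ID
DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
JOB_ID = 123

payload = {
    "job_id": JOB_ID,
    "new_settings": {
        "email_notifications": {
            "on_failure": ["analytics-team@company.com"],
            # Duration-based alerts also need a run-duration threshold (job health rule)
            "on_duration_warning_threshold_exceeded": ["dataops@company.com"],
        }
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()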


📨 3. Slack Alerts: For Real-Time Team Visibility

Most modern teams prefer Slack notifications because:

  • Everyone sees alerts
  • Rapid response coordination
  • On-call rotation visibility
  • Faster triage

Step 1: Create a Slack Webhook URL

In Slack:

Apps → Incoming Webhooks → Create New Webhook

Select channel, e.g., #data-alerts.

Copy the webhook URL.
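
Before wiring it into Databricks, a quick smoke test confirms the webhook works. A minimal sketch, with a placeholder URL:

import requests

# Placeholder: the webhook URL copied from Slack
webhook_url = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

# Incoming webhooks accept a simple JSON payload with a "text" field
resp = requests.post(webhook_url, json={"text": "Test message from the data-alerts setup"})
print(resp.status_code)  # 200 means the message reached #data-alerts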

Step 2: Add Slack Webhook to Databricks Workflows

In the Job configuration:


Alerts → Add → Webhook → Paste URL
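
In the Jobs API, webhook alerts reference a notification destination by ID rather than the raw URL. As a rough sketch (assuming a workspace admin has already created a Slack destination and you know its ID):

# Fragment of the job's new_settings, analogous to the email example above
new_settings = {
    "webhook_notifications": {
        "on_failure": [{"id": "<slack-destination-id>"}]  # placeholder destination ID
    }
}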

Step 3: Customize Slack Message (Optional)

Databricks sends structured info like:

  • Status
  • Workflow name
  • Link to job run
  • Failed task
  • Failure reason

But you can also design your own message via a Python task:

import requests

# Keep the webhook URL out of code, e.g. in a secret scope (scope and key names are examples)
slack_webhook_url = dbutils.secrets.get(scope="alerts", key="slack_webhook")

# Assumes an upstream task published a "task_name" task value; adjust taskKey/key to your workflow
failed_task = dbutils.jobs.taskValues.get(taskKey="validate_data", key="task_name", default="unknown")

payload = {
    "text": f"🚨 Databricks Job Failed: {failed_task}"
}

requests.post(slack_webhook_url, json=payload)

Now failures appear instantly in Slack.


⛑️ 4. Alerts for Multi-Task Workflows (Per Task)

Databricks allows:

✔ Alerts for the entire workflow

✔ Alerts per individual task

This is extremely helpful when:

  • A validation task fails
  • The upstream ingestion tasks ran fine
  • Only the downstream team needs to be notified

Example:

validate_data → On Failure → #quality-alerts
load_gold → On Failure → #data-engineering
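
In the job definition, this corresponds to notification settings attached to the individual task rather than to the job. A rough task fragment (task key, email, and destination ID are examples, and per-task webhook support may depend on your workspace version):

# Per-task notification fragment (illustrative placeholders)
validate_data_task = {
    "task_key": "validate_data",
    "email_notifications": {
        "on_failure": ["data-quality-team@company.com"]
    },
    # A webhook destination that posts into #quality-alerts, where supported
    "webhook_notifications": {
        "on_failure": [{"id": "<quality-alerts-destination-id>"}]
    },
}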

πŸ› οΈ 5. On-Failure Trigger Tasks (Advanced Alerts)​

You can create error handling tasks inside workflows.

Example:

validate → load_gold
              ↓
       notify_failure

The notify_failure task runs only when an upstream task fails. In the UI this is the task's Run if condition set to trigger on upstream failure (e.g. "At least one failed"); in the Jobs API it is the run_if field.
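
Putting it together, a minimal sketch of the notify_failure task definition (task keys and the notebook path are examples):

# notify_failure depends on the upstream tasks and runs only if one of them failed
notify_failure_task = {
    "task_key": "notify_failure",
    "depends_on": [{"task_key": "validate"}, {"task_key": "load_gold"}],
    "run_if": "AT_LEAST_ONE_FAILED",
    "notebook_task": {"notebook_path": "/Workflows/notify_failure"},  # example path
}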

Inside this task (with requests imported and slack_url holding the webhook URL from Step 1):

requests.post(slack_url, json={"text": "Validation failed in Databricks!"})

This enables fully automated error routing.


🧪 6. Real Example: Notebook Alert on Error

In a notebook:

try:
    df = spark.table("silver.sales")
    # Fail fast if the silver table is empty
    assert df.count() > 0, "silver.sales returned zero rows"
except Exception as e:
    print(f"ERROR: {e}")
    raise  # re-raise so the task run is marked as Failed

Because the exception is re-raised, the task run fails and Databricks triggers the configured failure alerts. (Calling dbutils.notebook.exit() instead would end the run successfully, so no on-failure alert would fire.)


📊 7. Alerts With Databricks SQL (Dashboards)

Databricks SQL supports condition-based alerts that evaluate a query on a schedule and fire when a threshold is crossed:

  • Revenue drop alerts
  • Data drift detection
  • SLA monitoring
  • Missing data alerts

Example:

An alert that runs SELECT COUNT(*) AS row_count FROM daily_sales and fires when row_count < 1000

Alert notifications can be delivered via:

  • Email
  • Slack webhooks
  • PagerDuty
  • Custom HTTP endpoints

🧠 Best Practices

  1. Always configure on-failure alerts
  2. Use Slack → primary, email → secondary
  3. Create separate channels per pipeline type
  4. Add file-based triggers + alerts for ingestion issues
  5. Include run URL in alert message
  6. Add retry logic and alert only after retries fail (see the sketch after this list)
  7. Use service principals for webhook authentication
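
For point 6, retries are configured per task; a small sketch of the relevant fields (values are illustrative):

# Retry settings are defined on the task (illustrative values)
task_with_retries = {
    "task_key": "load_gold",
    "max_retries": 2,                     # retry up to 2 times before the task counts as failed
    "min_retry_interval_millis": 60_000,  # wait 1 minute between attempts
    "retry_on_timeout": True,
}
# Notification settings can then limit failure alerts to the final attempt,
# so the team is only paged once the retries are exhausted.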

🎉 Real-World Ending: "The Alert Saved the Morning"

The next day, at exactly 2:01 AM:

  • The API returned empty data
  • The validation task failed
  • Slack alerted the team instantly
  • The issue was resolved before business hours

At 9:00 AM, dashboards were fresh.

Rahul's manager said:

"Finally… the pipeline can talk to us when things go wrong."

And that's the magic of Databricks Alerts.


📘 Summary

Databricks supports:

  • ✔ Email alerts
  • ✔ Slack alerts
  • ✔ Webhook-based alerts
  • ✔ On-failure tasks
  • ✔ SQL alerts
  • ✔ Per-task notification targeting

A must-have component for production-grade pipeline monitoring.


👉 Next Topic

Cluster Policies: Cost & Security Enforcement