
Tables in Databricks: Managed vs External

🧭 A Simple Story to Begin​

Imagine Databricks has two types of β€œhomes” where your tables can live.

🏠 Home Type 1: Databricks Takes Care of Everything​

You store your table and Databricks decides where to put the data files, how to organize them, and even cleans up after you.
This is a Managed Table.

🏑 Home Type 2: You Bring Your Own Folder​

You point Databricks to a location you control in cloud storage (S3, ADLS, GCS).
Databricks stores table metadata, but the actual files live where you choose.
This is an External Table.

That’s the entire concept in one simple picture.


💼 What Is a Managed Table?

A Managed Table is one where:

  • Databricks decides where the data files are stored
  • Data and metadata are both controlled by Databricks
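  • Dropping the table also deletes the data files (with Unity Catalog, the files are removed after a retention period rather than immediately)
  • Storage path lives inside a managed storage location (set at the metastore, catalog, or schema level in Unity Catalog, or the workspace root storage with the legacy Hive metastore)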

📦 Example

CREATE TABLE sales_bronze (
  id INT,
  amount DOUBLE
);

No LOCATION given → automatically managed.
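
A minimal sketch of the managed lifecycle, reusing the sales_bronze table from the example above. The inserted values are made up for illustration, and the exact storage path you see will depend on how your environment is configured.

```sql
-- Create the managed table (no LOCATION clause).
CREATE TABLE IF NOT EXISTS sales_bronze (
  id INT,
  amount DOUBLE
);

-- Add a couple of illustrative rows (values are made up).
INSERT INTO sales_bronze VALUES (1, 19.99), (2, 5.00);

-- Inspect the table: the Type row should read MANAGED, and the
-- Location row should point into Databricks-managed storage.
DESCRIBE TABLE EXTENDED sales_bronze;

-- Dropping a managed table removes the metadata and the data files.
DROP TABLE sales_bronze;
```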

✔ Benefits of Managed Tables

  • Easiest to use
  • Automatic cleanup
  • Perfect for internal Lakehouse workflows
  • Delta features work smoothly

✖ When Managed Tables Are NOT Ideal

  • When multiple tools, systems, or teams need file-level access
  • When you must keep tight control over the physical storage layout
  • When the data is cataloged or governed outside Databricks (e.g., AWS Glue)

📁 What Is an External Table?

An External Table stores:

  • Metadata inside Databricks
  • Data files outside Databricks (in a place you choose)

📦 Example

CREATE TABLE logs_raw
USING delta
LOCATION 'abfss://raw@datalake.dfs.core.windows.net/logs/';

You are telling Databricks:

“My files are stored here; just manage the table definition.”

✔ Benefits of External Tables

  • You control the cloud storage location
  • Easier for sharing data with non-Databricks systems
  • Good for multi-cloud or shared architectures
  • Other tools can read the files directly, subject to your cloud storage permissions

✖ Downsides

  • If you drop the table, the files remain (you must clean them up manually)
  • More responsibility on your side
  • Slightly more setup required
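
The first downside can be seen in a short sketch that reuses logs_raw and its path from the example above. It assumes Delta files already exist at that path and that you have permission to use it (with Unity Catalog this typically means an external location has been configured).

```sql
-- Register the existing files as an external table.
CREATE TABLE IF NOT EXISTS logs_raw
USING delta
LOCATION 'abfss://raw@datalake.dfs.core.windows.net/logs/';

-- Drop only the table definition; the files under logs/ are left in place.
DROP TABLE logs_raw;

-- Because the files survived, the same statement registers the table again.
CREATE TABLE logs_raw
USING delta
LOCATION 'abfss://raw@datalake.dfs.core.windows.net/logs/';

-- The data is queryable again without reloading anything.
SELECT COUNT(*) FROM logs_raw;
```

To actually reclaim the storage, the files have to be deleted with your cloud provider's tooling after the table is dropped.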

🔍 Managed vs External: The One-Sentence Difference

Managed tables keep both metadata and data under Databricks' control, in storage that Databricks manages. External tables keep metadata in Databricks, but the data lives in a location you choose.


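📝 How to Check Table Type

DESCRIBE TABLE EXTENDED table_name;

You'll see, among other rows:

  • Type: MANAGED or EXTERNAL
  • Location: where the data actually lives

(DESCRIBE DETAIL table_name also reports the location for Delta tables, but it has no table-type column.)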
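
To check many tables at once, one option is the information schema. This is a sketch that assumes Unity Catalog is enabled; `main` and `default` are placeholder catalog and schema names, not names from this article.

```sql
-- List tables in one schema together with their type.
-- `main` and `default` are placeholder catalog and schema names.
SELECT table_name, table_type
FROM system.information_schema.tables
WHERE table_catalog = 'main'
  AND table_schema = 'default'
ORDER BY table_name;
```

The table_type column distinguishes MANAGED from EXTERNAL tables, alongside other object types such as views.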

🧠 When Should You Use Which?​

βœ” Use Managed Tables When:​

  • You want Databricks to handle everything
  • You are building Bronze → Silver → Gold tables
  • The data is internal to your Lakehouse
  • You don't care about controlling the cloud path
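
As a rough illustration of the Bronze to Silver step with managed tables, the sketch below derives a cleaned table from sales_bronze. The name sales_silver and the filter condition are hypothetical; the point is only that no storage path appears anywhere.

```sql
-- Build a managed Silver table from the Bronze table.
-- No LOCATION clause, so Databricks chooses where the files go.
CREATE OR REPLACE TABLE sales_silver AS
SELECT id, amount
FROM sales_bronze
WHERE amount IS NOT NULL;  -- illustrative cleanup rule
```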

✔ Use External Tables When:

  • You must control your own storage folder
  • You share files with other systems or teams
  • You are migrating existing data into Databricks
  • You use external governance/security layers
  • Data must remain even if the table is dropped
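
For the migration case, a sketch of registering files that already sit in cloud storage might look like this. The table name legacy_orders and its path are hypothetical, and the example assumes the folder contains Parquet files you are allowed to read.

```sql
-- Register existing Parquet files in place; no data is copied or moved.
-- The table name and path are hypothetical.
CREATE TABLE IF NOT EXISTS legacy_orders
USING PARQUET
LOCATION 'abfss://raw@datalake.dfs.core.windows.net/legacy_orders/';

-- The schema is inferred from the files, so the data can be queried directly.
SELECT * FROM legacy_orders LIMIT 10;
```

Because the table is external, dropping it later leaves the original files untouched, which keeps the migration reversible.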

📦 Simple Visual

Managed Table
├─ Metadata -> Databricks
└─ Data Files -> Databricks-managed storage

External Table
├─ Metadata -> Databricks
└─ Data Files -> Your cloud storage path

📘 Summary

  • Databricks has two types of tables: Managed and External.
  • Managed tables keep both the data and metadata under Databricks' control, in storage that Databricks manages.
  • External tables store metadata in Databricks but data in a location you choose.
  • Managed tables are simple and great for internal Lakehouse workflows.
  • External tables give you full control and are ideal for multi-tool ecosystems.
  • Dropping a managed table deletes data; dropping an external table does not.

Both table types are essential; you choose based on how much control you need.


👉 Next Topic

Delta Lake Overview: The Storage Layer of Databricks