Databricks File Browser & Workspace Files API
Not all files in Databricks are data.
Some are:
- SQL scripts
- Configuration files
- ML artifacts
- Small reference datasets
Understanding where files live and how to manage them correctly is critical for a clean, secure, and scalable Databricks environment.
This article explains the Databricks File Browser and the Workspace Files API, using real-world scenarios and best practices.
A Common Confusion (A Short Story)
Meet Ravi, a data engineer.
Ravi uploads:
- A CSV to DBFS using the File Browser
- A Python config file to the workspace
- A model artifact using MLflow
Weeks later, Ravi asks:
"Where should files actually live in Databricks?"
The answer depends on purpose, size, and lifecycle.
What Is the Databricks File Browser?
The Databricks File Browser is a UI-based tool that allows you to:
- Upload small files
- Browse DBFS paths
- Inspect file contents
You can find it in the Databricks workspace under Data → DBFS.
What Is DBFS (Databricks File System)?
DBFS is an abstraction layer over cloud object storage:
- Amazon S3
- Azure Data Lake Storage (ADLS)
- Google Cloud Storage
Example path:
dbfs:/mnt/raw-data/orders.csv
DBFS is best suited for:
✅ Small reference files
✅ Temporary artifacts
✅ Development utilities
❌ Not for large-scale production ingestion
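For quick inspection without the UI, the same DBFS paths are visible from a notebook through `dbutils.fs`. A minimal sketch, assuming the illustrative mount path used above:

```python
# Run inside a Databricks notebook, where `dbutils` is available by default.
# The mount path is the illustrative one from the example above.

# List files and their sizes under a DBFS directory
for f in dbutils.fs.ls("dbfs:/mnt/raw-data/"):
    print(f.path, f.size)

# Preview the first bytes of a file to confirm its format
print(dbutils.fs.head("dbfs:/mnt/raw-data/orders.csv", 500))
```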
Uploading Files Using File Browser
Typical use cases:
- Lookup tables
- Sample datasets
- Config files for notebooks
To upload: Data → DBFS → Upload
⚠️ Size Limitation
- File Browser is not designed for large datasets
- Production ingestion should use cloud storage directly
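The same kind of small runtime file can also be written programmatically instead of through the UI. A minimal sketch with `dbutils.fs.put`, using a hypothetical lookup file and path:

```python
# Write a tiny lookup file to DBFS from a notebook.
# Path and contents are hypothetical; production datasets belong in cloud storage.
dbutils.fs.put(
    "dbfs:/tmp/lookups/country_codes.csv",
    "code,name\nIN,India\nUS,United States\n",
    True,  # overwrite=True
)
```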
Accessing Files in Notebooks
Python
df = spark.read.csv("dbfs:/mnt/raw-data/orders.csv", header=True)
SQL
SELECT * FROM csv.`dbfs:/mnt/raw-data/orders.csv`;
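For small files, the same DBFS path is usually also exposed on the driver's local filesystem under `/dbfs`, so plain Python tools such as pandas can read it directly. A minimal sketch, assuming the cluster provides the DBFS FUSE mount:

```python
import pandas as pd

# /dbfs/... is the local (FUSE) view of dbfs:/... on most all-purpose clusters
df_small = pd.read_csv("/dbfs/mnt/raw-data/orders.csv")
print(df_small.head())
```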
Workspace Files vs DBFS (Critical Difference)
| Feature | Workspace Files | DBFS |
|---|---|---|
| Purpose | Code & configs | Data & artifacts |
| Versioning | Yes | No |
| Best for | Scripts, YAML, JSON | CSV, Parquet, temp files |
| Access | Workspace-scoped | Cluster-wide |
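The access difference in the table shows up directly in code: Spark reads DBFS URIs from any node, while workspace files are opened with standard Python file APIs on recent Databricks Runtime versions. A minimal sketch, with a hypothetical workspace path:

```python
# DBFS: cluster-wide data access via Spark
orders = spark.read.csv("dbfs:/mnt/raw-data/orders.csv", header=True)

# Workspace file: workspace-scoped, read with the ordinary file API
# (path is hypothetical; adjust to your own user folder)
with open("/Workspace/Users/ravi@example.com/config.yaml") as f:
    raw_config = f.read()
```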
What Are Workspace Files?
Workspace Files live inside:
/Workspace/Users/...
They are:
- Version-controlled
- Permission-aware
- Ideal for collaboration
Example:
/Workspace/Repos/project/config.yaml
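When the code runs inside a Repo, the working directory is typically the notebook's own folder, so a config file that sits next to the code can be loaded with a relative path. A minimal sketch, assuming PyYAML is available on the cluster and the illustrative repo layout above:

```python
import yaml

# A relative path works when the notebook and config.yaml live in the same Repo folder;
# the absolute /Workspace/Repos/project/config.yaml path also works.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config)
```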
Workspace Files API (Programmatic Access)
The Workspace Files API allows you to:
- Upload files
- Download files
- List directories
- Automate file management
Example: Upload a File
curl -X POST \
  -H "Authorization: Bearer <TOKEN>" \
  -F path=/Users/<user>/config.yaml \
  -F format=AUTO \
  -F overwrite=true \
  -F content=@config.yaml \
  https://<databricks-instance>/api/2.0/workspace/import
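If you prefer Python over raw REST calls, the Databricks SDK for Python (`databricks-sdk`) wraps the same Workspace API. A minimal sketch, assuming the SDK is installed and credentials come from environment variables or a config profile; the target path is illustrative:

```python
import io
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads host/token from env vars or ~/.databrickscfg

# Upload config.yaml into the current user's workspace folder
me = w.current_user.me().user_name
target = f"/Users/{me}/config.yaml"

with open("config.yaml", "rb") as f:
    w.workspace.upload(target, io.BytesIO(f.read()), overwrite=True)

# List the folder to confirm the upload
for item in w.workspace.list(f"/Users/{me}"):
    print(item.path)
```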
When to Use Workspace Files API
✅ CI/CD pipelines
✅ Automated deployments
✅ Config-driven workflows
✅ Infrastructure-as-code setups
File Management Best Practices
1. Don't Treat DBFS as Git
DBFS is not version-controlled.
✅ Use Workspace Repos for code
✅ Use DBFS only for runtime files
2. Keep Data Out of Workspace Files
Workspace Files are not designed for datasets.
✅ Use cloud storage for data
✅ Register tables via Unity Catalog
3. Secure Access with Unity Catalog & Permissions
Workspace Permissions → Folder → Access Control
Avoid:
❌ Hardcoding secrets in files
❌ Storing credentials in DBFS
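Instead of embedding credentials in files, resolve them at runtime from a secret scope. A minimal sketch, with hypothetical scope and key names:

```python
# Scope and key names are hypothetical; create them with the Databricks CLI or API.
storage_key = dbutils.secrets.get(scope="prod-secrets", key="adls-access-key")

# The value is redacted if printed in a notebook and never lands in DBFS or a config file.
# <storage-account> is a placeholder for your ADLS Gen2 account name.
spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", storage_key)
```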
Common Mistakes to Avoid
❌ Uploading production datasets via File Browser
❌ Mixing code and data locations
❌ Assuming DBFS is a data lake
❌ Ignoring file permissions
How This Fits in a LakeFlow Architecture
Workspace Files
  | (configs, code)
  ↓
Databricks Jobs / LakeFlow
  ↓
Cloud Storage + Delta Tables
This separation ensures:
- Clean architecture
- Secure governance
- Scalable pipelines
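To make the separation concrete, here is a minimal sketch of a config-driven job: the configuration is a workspace file, the data stays in cloud storage, and the output is a governed Delta table. Paths and config keys are hypothetical:

```python
import yaml

# 1. Configuration lives with the code (workspace file; path is hypothetical)
with open("/Workspace/Repos/project/config.yaml") as f:
    cfg = yaml.safe_load(f)

# 2. Data lives in cloud storage (e.g. an abfss:// or s3:// path from the config)
raw = spark.read.csv(cfg["source_path"], header=True)

# 3. Output is a Delta table registered in the catalog (hypothetical table-name key)
raw.write.format("delta").mode("overwrite").saveAsTable(cfg["target_table"])
```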
Final Thoughts
The Databricks File Browser and Workspace Files API are small tools with big impact.
Used correctly, they:
- Simplify development
- Improve collaboration
- Prevent architectural mistakes
Remember:
Code belongs in the workspace. Data belongs in the lakehouse.
Summary
Databricks File Browser and Workspace Files API address different file management needs within the platform. The File Browser and DBFS are best suited for small runtime files and temporary artifacts, while Workspace Files provide version-controlled, permission-aware storage for code and configuration. Proper separation of code and data, along with API-driven automation, ensures a clean, secure, and scalable workspace architecture.
Next topic: Databricks Table Maintenance – Vacuum, Retention & Backups