
Databricks File Browser & Workspace Files API

Not all files in Databricks are data.

Some are:

  • SQL scripts
  • Configuration files
  • ML artifacts
  • Small reference datasets

Understanding where files live and how to manage them correctly is critical for a clean, secure, and scalable Databricks environment.

This article explains Databricks File Browser and the Workspace Files API, using real-world scenarios and best practices.


A Common Confusion (A Short Story)

Meet Ravi, a data engineer.

Ravi uploads:

  • A CSV to DBFS using the File Browser
  • A Python config file to the workspace
  • A model artifact using MLflow

Weeks later, Ravi asks:

“Where should files actually live in Databricks?”

The answer depends on purpose, size, and lifecycle.


What Is the Databricks File Browser?

The Databricks File Browser is a UI-based tool that allows you to:

  • Upload small files
  • Browse DBFS paths
  • Inspect file contents

πŸ“ You can find it in the Databricks workspace under Data β†’ DBFS.


What Is DBFS (Databricks File System)?

DBFS is an abstraction layer over cloud object storage:

  • Amazon S3
  • Azure Data Lake Storage (ADLS)
  • Google Cloud Storage

An example DBFS path:

dbfs:/mnt/raw-data/orders.csv

DBFS is best suited for:

✔ Small reference files
✔ Temporary artifacts
✔ Development utilities

❌ Not for large-scale production ingestion
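
In a notebook, you can browse the same paths programmatically with dbutils. A minimal sketch, assuming the illustrative mount point from the example above actually exists:

# List the files under the mount point (display() is available in notebooks)
display(dbutils.fs.ls("dbfs:/mnt/raw-data/"))

# Preview the first 500 bytes of a file without loading it fully
print(dbutils.fs.head("dbfs:/mnt/raw-data/orders.csv", 500))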


Uploading Files Using File Browser

Typical use cases:

  • Lookup tables
  • Sample datasets
  • Config files for notebooks
To upload, navigate to: Data → DBFS → Upload

⚠️ Size Limitation

  • File Browser is not designed for large datasets
  • Production ingestion should use cloud storage directly
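
For small runtime files, dbutils.fs.put is a programmatic alternative to the UI. A minimal sketch; the path and contents below are purely illustrative:

# Write a small lookup file directly to DBFS (small files only)
dbutils.fs.put(
    "dbfs:/tmp/lookup/country_codes.csv",
    "code,name\nIN,India\nUS,United States\n",
    True,  # overwrite if the file already exists
)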

Accessing Files in Notebooks

Python

df = spark.read.csv("dbfs:/mnt/raw-data/orders.csv", header=True)

SQL

SELECT * FROM csv.`dbfs:/mnt/raw-data/orders.csv`;

Workspace Files vs DBFS (Critical Difference)

| Feature    | Workspace Files     | DBFS                     |
|------------|---------------------|--------------------------|
| Purpose    | Code & configs      | Data & artifacts         |
| Versioning | Yes                 | No                       |
| Best for   | Scripts, YAML, JSON | CSV, Parquet, temp files |
| Access     | Workspace-scoped    | Cluster-wide             |

What Are Workspace Files?

Workspace Files live inside:

/Workspace/Users/...

They are:

  • Version-controlled (when backed by Repos/Git folders)
  • Permission-aware
  • Ideal for collaboration

Example:

/Workspace/Repos/project/config.yaml
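
On recent Databricks runtimes, Workspace Files are exposed under a regular filesystem path, so a notebook can read them like local files. A minimal sketch, assuming the repo path above exists and PyYAML is available on the cluster:

import yaml

# Workspace Files appear as local paths inside notebooks
with open("/Workspace/Repos/project/config.yaml") as f:
    config = yaml.safe_load(f)

print(config)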

Workspace Files API (Programmatic Access)

The Workspace Files API allows you to:

  • Upload files
  • Download files
  • List directories
  • Automate file management

Example: Upload a File

curl -X POST \
  -H "Authorization: Bearer <TOKEN>" \
  -F path=/Users/<user>/project/config.yaml \
  -F format=AUTO \
  -F content=@config.yaml \
  https://<databricks-instance>/api/2.0/workspace/import

When to Use Workspace Files API

✔ CI/CD pipelines
✔ Automated deployments
✔ Config-driven workflows
✔ Infrastructure-as-code setups
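
In a CI/CD job, the same import call is typically scripted rather than run by hand. A minimal sketch, reusing the environment variables above; the target path is illustrative:

import base64
import os
import requests

host = os.environ["DATABRICKS_HOST"]
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# Base64-encode the local config file and import it into the workspace
with open("config.yaml", "rb") as f:
    content = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers=headers,
    json={
        "path": "/Users/<user>/project/config.yaml",
        "format": "AUTO",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()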


File Management Best Practices

1. Don’t Treat DBFS as Git

DBFS is not version-controlled.

✔ Use Workspace Repos for code
✔ Use DBFS only for runtime files


2. Keep Data Out of Workspace Files

Workspace Files are not designed for datasets.

✔ Use cloud storage for data
✔ Register tables via Unity Catalog
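
For example, data can be read from cloud object storage and registered as a governed table. A minimal sketch, assuming an illustrative ADLS container and a Unity Catalog schema named main.analytics:

# Read raw data from cloud object storage, not from Workspace Files
df = spark.read.csv(
    "abfss://raw@mystorageaccount.dfs.core.windows.net/orders/",
    header=True,
)

# Register it as a governed Unity Catalog table
df.write.mode("overwrite").saveAsTable("main.analytics.orders")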


3. Secure Access with Unity Catalog & Permissions

Set access controls via: Workspace Permissions → Folder → Access Control

Avoid:

❌ Hardcoding secrets in files
❌ Storing credentials in DBFS
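
Instead, resolve credentials at runtime from a secret scope. A minimal sketch, assuming a scope named prod-scope and a key named storage-key (both illustrative):

# Fetch a credential from a Databricks secret scope at runtime
storage_key = dbutils.secrets.get(scope="prod-scope", key="storage-key")

# Use it in configuration instead of embedding it in files
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    storage_key,
)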


Common Mistakes to Avoid

❌ Uploading production datasets via File Browser
❌ Mixing code and data locations
❌ Assuming DBFS is a data lake
❌ Ignoring file permissions


How This Fits in a LakeFlow Architecture

Workspace Files (configs, code)
        ↓
Databricks Jobs / LakeFlow
        ↓
Cloud Storage + Delta Tables

This separation ensures:

  • Clean architecture
  • Secure governance
  • Scalable pipelines

Final Thoughts

The Databricks File Browser and Workspace Files API are small tools with big impact.

Used correctly, they:

  • Simplify development
  • Improve collaboration
  • Prevent architectural mistakes

Remember:

Code belongs in the workspace. Data belongs in the lakehouse.


Summary

Databricks File Browser and Workspace Files API address different file management needs within the platform. The File Browser and DBFS are best suited for small runtime files and temporary artifacts, while Workspace Files provide version-controlled, permission-aware storage for code and configuration. Proper separation of code and data, along with API-driven automation, ensures a clean, secure, and scalable workspace architecture.


📌 Next topic: Databricks Table Maintenance – Vacuum, Retention & Backups