Databricks File Browser & Workspace Files API
Not all files in Databricks are data.
Some are:
- SQL scripts
- Configuration files
- ML artifacts
- Small reference datasets
Understanding where files live and how to manage them correctly is critical for a clean, secure, and scalable Databricks environment.
This article explains the Databricks File Browser and the Workspace Files API, using real-world scenarios and best practices.
A Common Confusion (A Short Story)
Meet Ravi, a data engineer.
Ravi uploads:
- A CSV to DBFS using the File Browser
- A Python config file to the workspace
- A model artifact using MLflow
Weeks later, Ravi asks:
"Where should files actually live in Databricks?"
The answer depends on purpose, size, and lifecycle.
What Is the Databricks File Browser?
The Databricks File Browser is a UI-based tool that allows you to:
- Upload small files
- Browse DBFS paths
- Inspect file contents
You can find it in the Databricks workspace under Data → DBFS.
What Is DBFS (Databricks File System)?
DBFS is an abstraction layer over cloud object storage:
- Amazon S3
- Azure Data Lake Storage (ADLS)
- Google Cloud Storage
Example path:
dbfs:/mnt/raw-data/orders.csv
DBFS is best suited for:
✅ Small reference files
✅ Temporary artifacts
✅ Development utilities
❌ Not for large-scale production ingestion
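For quick inspection without the UI, the same DBFS paths are visible from a notebook through `dbutils.fs`. A minimal sketch, assuming the illustrative mount path used above:

```python
# Run inside a Databricks notebook, where `dbutils` is available by default.
# The mount path is the illustrative one from the example above.

# List files and their sizes under a DBFS directory
for f in dbutils.fs.ls("dbfs:/mnt/raw-data/"):
    print(f.path, f.size)

# Preview the first bytes of a file to confirm its format
print(dbutils.fs.head("dbfs:/mnt/raw-data/orders.csv", 500))
```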
Uploading Files Using File Browser
Typical use cases:
- Lookup tables
- Sample datasets
- Config files for notebooks
To upload: Data → DBFS → Upload
⚠️ Size Limitation
- File Browser is not designed for large datasets
- Production ingestion should use cloud storage directly
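The same kind of small runtime file can also be written programmatically instead of through the UI. A minimal sketch with `dbutils.fs.put`, using a hypothetical lookup file and path:

```python
# Write a tiny lookup file to DBFS from a notebook.
# Path and contents are hypothetical; production datasets belong in cloud storage.
dbutils.fs.put(
    "dbfs:/tmp/lookups/country_codes.csv",
    "code,name\nIN,India\nUS,United States\n",
    True,  # overwrite=True
)
```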
Accessing Files in Notebooks
Python
df = spark.read.csv("dbfs:/mnt/raw-data/orders.csv", header=True)
SQL
SELECT * FROM csv.`dbfs:/mnt/raw-data/orders.csv`;
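For small files, the same DBFS path is usually also exposed on the driver's local filesystem under `/dbfs`, so plain Python tools such as pandas can read it directly. A minimal sketch, assuming the cluster provides the DBFS FUSE mount:

```python
import pandas as pd

# /dbfs/... is the local (FUSE) view of dbfs:/... on most all-purpose clusters
df_small = pd.read_csv("/dbfs/mnt/raw-data/orders.csv")
print(df_small.head())
```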
Workspace Files vs DBFS (Critical Difference)
| Feature | Workspace Files | DBFS |
|---|---|---|
| Purpose | Code & configs | Data & artifacts |
| Versioning | Yes | No |
| Best for | Scripts, YAML, JSON | CSV, Parquet, temp files |
| Access | Workspace-scoped | Cluster-wide |
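The access difference in the table shows up directly in code: Spark reads DBFS URIs from any node, while workspace files are opened with standard Python file APIs on recent Databricks Runtime versions. A minimal sketch, with a hypothetical workspace path:

```python
# DBFS: cluster-wide data access via Spark
orders = spark.read.csv("dbfs:/mnt/raw-data/orders.csv", header=True)

# Workspace file: workspace-scoped, read with the ordinary file API
# (path is hypothetical; adjust to your own user folder)
with open("/Workspace/Users/ravi@example.com/config.yaml") as f:
    raw_config = f.read()
```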
What Are Workspace Files?
Workspace Files live inside:
/Workspace/Users/...
They are:
- Version-controlled
- Permission-aware
- Ideal for collaboration
Example:
/Workspace/Repos/project/config.yaml
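When the code runs inside a Repo, the working directory is typically the notebook's own folder, so a config file that sits next to the code can be loaded with a relative path. A minimal sketch, assuming PyYAML is available on the cluster and the illustrative repo layout above:

```python
import yaml

# A relative path works when the notebook and config.yaml live in the same Repo folder;
# the absolute /Workspace/Repos/project/config.yaml path also works.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config)
```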
Workspace Files API (Programmatic Access)
The Workspace Files API allows you to:
- Upload files
- Download files
- List directories
- Automate file management
Example: Upload a File
curl -X POST \
  -H "Authorization: Bearer <TOKEN>" \
  -F path=/Users/<user>/config.yaml \
  -F format=AUTO \
  -F overwrite=true \
  -F content=@config.yaml \
  https://<databricks-instance>/api/2.0/workspace/import
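If you prefer Python over raw REST calls, the Databricks SDK for Python (`databricks-sdk`) wraps the same Workspace API. A minimal sketch, assuming the SDK is installed and credentials come from environment variables or a config profile; the target path is illustrative:

```python
import io
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads host/token from env vars or ~/.databrickscfg

# Upload config.yaml into the current user's workspace folder
me = w.current_user.me().user_name
target = f"/Users/{me}/config.yaml"

with open("config.yaml", "rb") as f:
    w.workspace.upload(target, io.BytesIO(f.read()), overwrite=True)

# List the folder to confirm the upload
for item in w.workspace.list(f"/Users/{me}"):
    print(item.path)
```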
When to Use Workspace Files API
✅ CI/CD pipelines
✅ Automated deployments
✅ Config-driven workflows
✅ Infrastructure-as-code setups
File Management Best Practices
1. Don't Treat DBFS as Git
DBFS is not version-controlled.
✅ Use Workspace Repos for code
✅ Use DBFS only for runtime files
2. Keep Data Out of Workspace Files
Workspace Files are not designed for datasets.
✅ Use cloud storage for data
✅ Register tables via Unity Catalog
3. Secure Access with Unity Catalog & Permissions
Workspace Permissions → Folder → Access Control
Avoid:
❌ Hardcoding secrets in files
❌ Storing credentials in DBFS
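Instead of embedding credentials in files, resolve them at runtime from a secret scope. A minimal sketch, with hypothetical scope and key names:

```python
# Scope and key names are hypothetical; create them with the Databricks CLI or API.
storage_key = dbutils.secrets.get(scope="prod-secrets", key="adls-access-key")

# The value is redacted if printed in a notebook and never lands in DBFS or a config file.
# <storage-account> is a placeholder for your ADLS Gen2 account name.
spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", storage_key)
```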
Common Mistakes to Avoid
❌ Uploading production datasets via File Browser
❌ Mixing code and data locations
❌ Assuming DBFS is a data lake
❌ Ignoring file permissions
How This Fits in a LakeFlow Architecture
Workspace Files
  | (configs, code)
  ↓
Databricks Jobs / LakeFlow
  ↓
Cloud Storage + Delta Tables
This separation ensures:
- Clean architecture
- Secure governance
- Scalable pipelines
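To make the separation concrete, here is a minimal sketch of a config-driven job: the configuration is a workspace file, the data stays in cloud storage, and the output is a governed Delta table. Paths and config keys are hypothetical:

```python
import yaml

# 1. Configuration lives with the code (workspace file; path is hypothetical)
with open("/Workspace/Repos/project/config.yaml") as f:
    cfg = yaml.safe_load(f)

# 2. Data lives in cloud storage (e.g. an abfss:// or s3:// path from the config)
raw = spark.read.csv(cfg["source_path"], header=True)

# 3. Output is a Delta table registered in the catalog (hypothetical table-name key)
raw.write.format("delta").mode("overwrite").saveAsTable(cfg["target_table"])
```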
Final Thoughts
The Databricks File Browser and Workspace Files API are small tools with big impact.
Used correctly, they:
- Simplify development
- Improve collaboration
- Prevent architectural mistakes
Remember:
Code belongs in the workspace. Data belongs in the lakehouse.
Summary
Databricks File Browser and Workspace Files API address different file management needs within the platform. The File Browser and DBFS are best suited for small runtime files and temporary artifacts, while Workspace Files provide version-controlled, permission-aware storage for code and configuration. Proper separation of code and data, along with API-driven automation, ensures a clean, secure, and scalable workspace architecture.
Next topic: Databricks Table Maintenance – Vacuum, Retention & Backups