dbfs

Creates, updates, deletes, gets or lists a dbfs resource.

Overview

Name: dbfs
Type: Resource
Id: databricks_workspace.files.dbfs

Fields

The following fields are returned by SELECT queries:

| Name | Datatype | Description |
|---|---|---|
| file_size | integer | |
| is_dir | boolean | True if the path is a directory. |
| modification_time | integer | Last modification time of given file in milliseconds since epoch. |
| path | string | The absolute path of the file or directory. |
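Because modification_time is reported in milliseconds since epoch, it usually needs converting before it is readable. The following is a minimal Python sketch of that conversion. The helper name and the sample row are hypothetical and are not part of the provider.

```python
from datetime import datetime, timezone


def modification_time_to_datetime(modification_time_ms: int) -> datetime:
    """Convert a modification_time value (milliseconds since epoch) to a UTC datetime."""
    seconds, millis = divmod(modification_time_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=millis * 1000
    )


# Hypothetical row shaped like the SELECT output; the values are made up.
row = {
    "path": "/tmp/example.csv",
    "is_dir": False,
    "file_size": 2048,
    "modification_time": 1700000000123,
}

modified_at = modification_time_to_datetime(row["modification_time"])
print(modified_at.isoformat())
```

Splitting the value with divmod keeps the millisecond part exact instead of relying on floating-point division.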

Methods

The following methods are available for this resource:

| Name | Accessible by | Required Params | Optional Params | Description |
|---|---|---|---|---|
| list | select | path, deployment_name | | List the contents of a directory, or details of the file. If the file or directory does not exist, ... |
| create | insert | deployment_name, path | | Opens a stream to write to a file and returns a handle to this stream. There is a 10 minute idle ... |
| delete | delete | deployment_name | | Delete the file or directory (optionally recursively delete all files in the directory). This call ... |
| add_block | exec | deployment_name, handle, data | | Appends a block of data to the stream specified by the input handle. If the handle does not exist, ... |
| close | exec | deployment_name, handle | | Closes the stream specified by the input handle. If the handle does not exist, this call throws an ... |
| get_status | exec | path, deployment_name | | Gets the file information for a file or directory. If the file or directory does not exist, this call ... |
| mkdirs | exec | deployment_name, path | | Creates the given directory and necessary parent directories if they do not exist. If a file (not a ... |
| move | exec | deployment_name, source_path, destination_path | | Moves a file from one location to another location within DBFS. If the source file does not exist, ... |
| put | exec | deployment_name, path | | Uploads a file through the use of multipart form post. It is mainly used for streaming uploads, but ... |
| read | exec | path, deployment_name | length, offset | Returns the contents of a file. If the file does not exist, this call throws an exception with ... |

Parameters

Parameters can be passed in the WHERE clause of a query. Check the Methods section to see which parameters are required or optional for each operation.

| Name | Datatype | Description |
|---|---|---|
| deployment_name | string | The Databricks Workspace Deployment Name (default: dbc-abcd0123-a1bc) |
| path | string | The path of the file to read. The path should be the absolute DBFS path. |
| length | integer | The number of bytes to read starting from the offset. This has a limit of 1 MB, and a default value of 0.5 MB. |
| offset | integer | The offset to read from in bytes. |
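Since length is capped at 1 MB per call, reading a larger file means issuing several read calls with increasing offset values. The sketch below computes the (offset, length) pairs for a file of a known file_size. It assumes that 1 MB can be treated conservatively as 1,000,000 bytes. The function name is hypothetical.

```python
from typing import Iterator, Tuple

# Assumption: treat the documented "1 MB" limit conservatively as 1,000,000 bytes.
MAX_READ_LENGTH = 1_000_000


def read_windows(
    file_size: int, length: int = MAX_READ_LENGTH
) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover a file of file_size bytes."""
    if not 0 < length <= MAX_READ_LENGTH:
        raise ValueError(f"length must be between 1 and {MAX_READ_LENGTH}")
    offset = 0
    while offset < file_size:
        # The final window is shortened so it never runs past the end of the file.
        yield offset, min(length, file_size - offset)
        offset += length


for offset, length in read_windows(2_500_000):
    print(f"offset={offset} length={length}")
```

Each pair can then be supplied as the offset and length parameters of one read call.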

SELECT examples

List the contents of a directory, or details of the file. If the file or directory does not exist, ...

SELECT
file_size,
is_dir,
modification_time,
path
FROM databricks_workspace.files.dbfs
WHERE path = '{{ path }}' -- required
AND deployment_name = '{{ deployment_name }}' -- required
;
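The {{ path }} and {{ deployment_name }} markers in the query are placeholders to be replaced with real values before the query is run. One way to fill them in is sketched below in Python. The render helper is hypothetical and is not part of StackQL. It rejects values containing a single quote instead of trying to escape them, so a value cannot end the SQL string literal early.

```python
import re

# Matches placeholders written as {{ name }}, with optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

QUERY_TEMPLATE = """SELECT
file_size,
is_dir,
modification_time,
path
FROM databricks_workspace.files.dbfs
WHERE path = '{{ path }}' -- required
AND deployment_name = '{{ deployment_name }}' -- required
;"""


def render(template: str, params: dict) -> str:
    """Replace every {{ name }} placeholder with the matching value from params."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing value for placeholder {name!r}")
        value = str(params[name])
        if "'" in value:
            raise ValueError(f"value for {name!r} must not contain a single quote")
        return value

    return PLACEHOLDER.sub(substitute, template)


# The deployment name is the example default from the Parameters table.
query = render(
    QUERY_TEMPLATE,
    {"path": "/tmp", "deployment_name": "dbc-abcd0123-a1bc"},
)
print(query)
```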

INSERT examples

Opens a stream to write to a file and returns a handle to this stream. There is a 10 minute idle ...

INSERT INTO databricks_workspace.files.dbfs (
path,
overwrite,
deployment_name
)
SELECT
'{{ path }}' /* required */,
{{ overwrite }},
'{{ deployment_name }}'
RETURNING
handle
;

DELETE examples

Delete the file or directory (optionally recursively delete all files in the directory). This call ...

DELETE FROM databricks_workspace.files.dbfs
WHERE deployment_name = '{{ deployment_name }}' -- required
;

Lifecycle Methods

Appends a block of data to the stream specified by the input handle. If the handle does not exist, ...

EXEC databricks_workspace.files.dbfs.add_block 
@deployment_name='{{ deployment_name }}' -- required
@@json=
'{
"handle": {{ handle }},
"data": "{{ data }}"
}'
;
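The JSON body passed to add_block carries a handle and a data value. The sketch below shows one way to split a payload into blocks and build one JSON body per block. It rests on two assumptions that this page does not state: that data must be base64-encoded, as in the underlying Databricks DBFS API, and that 512 KiB of raw bytes per block stays safely under any per-block size limit. The function name is hypothetical.

```python
import base64
import json
from typing import Iterator

# Assumption: 512 KiB of raw bytes per block is a conservative size.
DEFAULT_BLOCK_BYTES = 512 * 1024


def add_block_bodies(
    handle: int, payload: bytes, block_size: int = DEFAULT_BLOCK_BYTES
) -> Iterator[str]:
    """Yield one JSON body per block, suitable for the @@json argument."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(payload), block_size):
        block = payload[start:start + block_size]
        yield json.dumps(
            {
                "handle": handle,
                # Assumption: the API expects the block as base64 text.
                "data": base64.b64encode(block).decode("ascii"),
            }
        )


# Hypothetical handle value, as would be returned by the create method.
bodies = list(add_block_bodies(42, b"hello world", block_size=4))
for body in bodies:
    print(body)
```

Each generated body corresponds to one add_block call, issued in order.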