clusters

Creates, updates, deletes, gets or lists a clusters resource.

Overview

| | |
|:--|:--|
| Name | `clusters` |
| Type | Resource |
| Id | `databricks_workspace.compute.clusters` |

Fields

The following fields are returned by SELECT queries:

| Name | Datatype | Description |
|:--|:--|:--|
| `cluster_id` | `string` | Canonical identifier for the cluster. This id is retained during cluster restarts and resizes, while each new cluster has a globally unique id. |
| `driver_instance_pool_id` | `string` | The optional ID of the instance pool to which the driver of the cluster belongs. The cluster uses the instance pool with ID `instance_pool_id` if the driver pool is not assigned. |
| `driver_node_type_id` | `string` | The node type of the Spark driver. This field is optional; if unset, the driver node type is set to the same value as `node_type_id`. This field, along with `node_type_id`, should not be set if `virtual_cluster_size` is set. If `driver_node_type_id`, `node_type_id`, and `virtual_cluster_size` are all specified, `driver_node_type_id` and `node_type_id` take precedence. |
| `instance_pool_id` | `string` | The optional ID of the instance pool to which the cluster belongs. |
| `node_type_id` | `string` | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the `clusters/listNodeTypes` API call. |
| `policy_id` | `string` | The ID of the cluster policy used to create the cluster if applicable. |
| `spark_context_id` | `integer` | A canonical SparkContext identifier. This value *does* change when the Spark driver restarts. The pair `(cluster_id, spark_context_id)` is a globally unique identifier over all Spark contexts. |
| `cluster_name` | `string` | Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name will be an empty string. For job clusters, the cluster name is automatically set based on the job and job run IDs. |
| `creator_user_name` | `string` | Creator user name. The field won't be included in the response if the user has already been deleted. |
| `single_user_name` | `string` | Single user name if `data_security_mode` is `SINGLE_USER`. |
| `autoscale` | `object` | Parameters needed in order to automatically scale clusters up and down based on load. Note: autoscaling works best with Databricks Runtime versions 3.0 or later. |
| `autotermination_minutes` | `integer` | Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination. |
| `aws_attributes` | `object` | Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used. |
| `azure_attributes` | `object` | Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used. |
| `cluster_cores` | `number` | Number of CPU cores available for this cluster. Note that this can be fractional, e.g. 7.5 cores, since certain node types are configured to share cores between Spark nodes on the same instance. |
| `cluster_log_conf` | `object` | The configuration for delivering Spark logs to a long-term storage destination. Three kinds of destinations (DBFS, S3 and Unity Catalog volumes) are supported. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every `5 mins`. The destination of driver logs is `$destination/$clusterId/driver`, while the destination of executor logs is `$destination/$clusterId/executor`. |
| `cluster_log_status` | `object` | Cluster log delivery status. |
| `cluster_memory_mb` | `integer` | Total amount of cluster memory, in megabytes. |
| `cluster_source` | `string` | Determines whether the cluster was created by a user through the UI, created by the Databricks Jobs Scheduler, or through an API request. (API, JOB, MODELS, PIPELINE, PIPELINE_MAINTENANCE, SQL, UI) |
| `custom_tags` | `object` | Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to `default_tags`. Notes:<br />- Currently, Databricks allows at most 45 custom tags.<br />- Clusters can only reuse cloud resources if the resources' tags are a subset of the cluster tags. |
| `data_security_mode` | `string` | Data security mode decides what data governance model to use when accessing data from a cluster.<br /><br />The following modes can only be used when `kind = CLASSIC_PREVIEW`:<br />* `DATA_SECURITY_MODE_AUTO`: Databricks will choose the most appropriate access mode depending on your compute configuration.<br />* `DATA_SECURITY_MODE_STANDARD`: Alias for `USER_ISOLATION`.<br />* `DATA_SECURITY_MODE_DEDICATED`: Alias for `SINGLE_USER`.<br /><br />The following modes can be used regardless of `kind`:<br />* `NONE`: No security isolation for multiple users sharing the cluster. Data governance features are not available in this mode.<br />* `SINGLE_USER`: A secure cluster that can only be used by the single user specified in `single_user_name`. Most programming languages, cluster features and data governance features are available in this mode.<br />* `USER_ISOLATION`: A secure cluster that can be shared by multiple users. Cluster users are fully isolated so that they cannot see each other's data and credentials. Most data governance features are supported in this mode, but programming languages and cluster features might be limited.<br /><br />The following modes are deprecated starting with Databricks Runtime 15.0 and will be removed in future Databricks Runtime versions:<br />* `LEGACY_TABLE_ACL`: This mode is for users migrating from legacy Table ACL clusters.<br />* `LEGACY_PASSTHROUGH`: This mode is for users migrating from legacy Passthrough on high concurrency clusters.<br />* `LEGACY_SINGLE_USER`: This mode is for users migrating from legacy Passthrough on standard clusters.<br />* `LEGACY_SINGLE_USER_STANDARD`: This mode provides a way to run with neither UC nor passthrough enabled.<br /><br />(DATA_SECURITY_MODE_AUTO, DATA_SECURITY_MODE_DEDICATED, DATA_SECURITY_MODE_STANDARD, LEGACY_PASSTHROUGH, LEGACY_SINGLE_USER, LEGACY_SINGLE_USER_STANDARD, LEGACY_TABLE_ACL, NONE, SINGLE_USER, USER_ISOLATION) |
| `default_tags` | `object` | Tags that are added by Databricks regardless of any `custom_tags`, including:<br />- Vendor: Databricks<br />- Creator: `<username_of_creator>`<br />- ClusterName: `<name_of_cluster>`<br />- ClusterId: `<id_of_cluster>`<br />- Name: `<Databricks internal use>` |
| `docker_image` | `object` | Custom Docker image (BYOC). |
| `driver` | `object` | Node on which the Spark driver resides. The driver node contains the Spark master and the Databricks application that manages the per-notebook Spark REPLs. |
| `driver_node_type_flexibility` | `object` | Flexible node type configuration for the driver node. |
| `enable_elastic_disk` | `boolean` | Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. |
| `enable_local_disk_encryption` | `boolean` | Whether to enable LUKS on cluster VMs' local disks. |
| `executors` | `array` | Nodes on which the Spark executors reside. |
| `gcp_attributes` | `object` | Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used. |
| `init_scripts` | `array` | The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If `cluster_log_conf` is specified, init script logs are sent to `<destination>/<cluster-ID>/init_scripts`. |
| `is_single_node` | `boolean` | This field can only be used when `kind = CLASSIC_PREVIEW`. When set to true, Databricks will automatically set single node related `custom_tags`, `spark_conf`, and `num_workers`. |
| `jdbc_port` | `integer` | Port on which the Spark JDBC server is listening, in the driver node. No service will be listening on this port in executor nodes. |
| `kind` | `string` | The kind of compute described by this compute specification.<br /><br />Depending on `kind`, different validations and default values will be applied.<br /><br />Clusters with `kind = CLASSIC_PREVIEW` support the following fields, whereas clusters with no specified `kind` do not:<br />* [is_single_node](/api/workspace/clusters/create#is_single_node)<br />* [use_ml_runtime](/api/workspace/clusters/create#use_ml_runtime)<br />* [data_security_mode](/api/workspace/clusters/create#data_security_mode) set to `DATA_SECURITY_MODE_AUTO`, `DATA_SECURITY_MODE_DEDICATED`, or `DATA_SECURITY_MODE_STANDARD`<br /><br />By using the [simple form](https://docs.databricks.com/compute/simple-form.html), your clusters automatically use `kind = CLASSIC_PREVIEW`.<br /><br />(CLASSIC_PREVIEW) |
| `last_restarted_time` | `integer` | The timestamp at which the cluster was last started or restarted. |
| `last_state_loss_time` | `integer` | Time when the cluster driver last lost its state (due to a restart or driver failure). |
| `num_workers` | `integer` | Number of worker nodes that this cluster should have. A cluster has one Spark Driver and `num_workers` Executors for a total of `num_workers` + 1 Spark nodes. Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in `spark_info` will gradually increase from 5 to 10 as the new nodes are provisioned. |
| `remote_disk_throughput` | `integer` | If set, the configurable throughput (in Mb/s) for the remote disk. Currently only supported for GCP HYPERDISK_BALANCED disks. |
| `runtime_engine` | `string` | Determines the cluster's runtime engine, either standard or Photon. This field is not compatible with legacy `spark_version` values that contain `-photon-`. Remove `-photon-` from the `spark_version` and set `runtime_engine` to `PHOTON`. If left unspecified, the runtime engine defaults to standard unless the `spark_version` contains `-photon-`, in which case Photon will be used. (NULL, PHOTON, STANDARD) |
| `spark_conf` | `object` | An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` respectively. |
| `spark_env_vars` | `object` | An object containing a set of optional, user-specified environment variable key-value pairs. Note that a key-value pair of the form (X,Y) will be exported as is (i.e., `export X='Y'`) while launching the driver and workers. To specify an additional set of `SPARK_DAEMON_JAVA_OPTS`, we recommend appending them to `$SPARK_DAEMON_JAVA_OPTS` as shown in the example below. This ensures that all default Databricks-managed environment variables are included as well. Example Spark environment variables: `{"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"}` or `{"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}` |
| `spark_version` | `string` | The Spark version of the cluster, e.g. `3.3.x-scala2.11`. A list of available Spark versions can be retrieved by using the `clusters/sparkVersions` API call. |
| `spec` | `object` | The spec contains a snapshot of the latest user-specified settings that were used to create/edit the cluster. Note: not included in the response of the ListClusters API. |
| `ssh_public_keys` | `array` | SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name `ubuntu` on port `2200`. Up to 10 keys can be specified. |
| `start_time` | `integer` | Time (in epoch milliseconds) when the cluster creation request was received (when the cluster entered a `PENDING` state). |
| `state` | `string` | Current state of the cluster. (ERROR, PENDING, RESIZING, RESTARTING, RUNNING, TERMINATED, TERMINATING, UNKNOWN) |
| `state_message` | `string` | A message associated with the most recent state transition (e.g., the reason why the cluster entered a `TERMINATED` state). |
| `terminated_time` | `integer` | Time (in epoch milliseconds) when the cluster was terminated, if applicable. |
| `termination_reason` | `object` | Information about why the cluster was terminated. This field only appears when the cluster is in a `TERMINATING` or `TERMINATED` state. |
| `total_initial_remote_disk_size` | `integer` | If set, the total initial volume size (in GB) of the remote disks. Currently only supported for GCP HYPERDISK_BALANCED disks. |
| `use_ml_runtime` | `boolean` | This field can only be used when `kind = CLASSIC_PREVIEW`. `effective_spark_version` is determined by `spark_version` (DBR release), this field `use_ml_runtime`, and whether `node_type_id` is a GPU node or not. |
| `worker_node_type_flexibility` | `object` | Flexible node type configuration for worker nodes. |
| `workload_type` | `object` | Cluster attributes showing the cluster's workload types. |
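
Several of the field descriptions above carry small rules that can be checked mechanically. The following is a minimal Python sketch, not part of the provider, covering three of them: the `autotermination_minutes` range, the `num_workers` + 1 node count, and the log paths derived from a `cluster_log_conf` destination. The function names and the sample destination and cluster ID are hypothetical, and treating the bounds 10 and 10000 as inclusive is an assumption.

```python
from typing import Optional


def check_autotermination(minutes: Optional[int]) -> bool:
    """Return True if the value satisfies the documented rule.

    None means "not set" (no automatic termination) and 0 explicitly
    disables it; otherwise the threshold must lie between 10 and 10000.
    Inclusive bounds are an assumption.
    """
    if minutes is None or minutes == 0:
        return True
    return 10 <= minutes <= 10000


def total_spark_nodes(num_workers: int) -> int:
    """One Spark driver plus `num_workers` executors."""
    return num_workers + 1


def log_destinations(destination: str, cluster_id: str) -> dict:
    """Driver and executor log paths: $destination/$clusterId/{driver,executor}."""
    base = f"{destination.rstrip('/')}/{cluster_id}"
    return {"driver": f"{base}/driver", "executor": f"{base}/executor"}


if __name__ == "__main__":
    # Sample values below are made up for illustration.
    print(check_autotermination(5))    # below the documented minimum
    print(total_spark_nodes(10))
    print(log_destinations("dbfs:/cluster-logs", "0123-456789-abcdefgh"))
```

Note that `num_workers` reflects the desired size, so the node count computed here is a target rather than a live measurement.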

Methods

The following methods are available for this resource:

| Name | Accessible by | Required Params | Optional Params | Description |
|:--|:--|:--|:--|:--|
| `get` | `select` | `cluster_id`, `deployment_name` | | Retrieves the information for a cluster given its identifier. Clusters can be described while they are |
| `list` | `select` | `deployment_name` | `filter_by`, `page_size`, `page_token`, `sort_by` | Return information about all pinned and active clusters, and all clusters terminated within the last |
| `create` | `insert` | `deployment_name`, `spark_version` | | Creates a new Spark cluster. This method will acquire new instances from the cloud provider if |
| `change_owner` | `exec` | `deployment_name`, `cluster_id`, `owner_username` | | Change the owner of the cluster. You must be an admin and the cluster must be terminated to perform |
| `delete` | `exec` | `deployment_name`, `cluster_id` | | Terminates the Spark cluster with the specified ID. The cluster is removed asynchronously. Once the |
| `edit` | `exec` | `deployment_name`, `cluster_id`, `spark_version` | | Updates the configuration of a cluster to match the provided attributes and size. A cluster can be |
| `events` | `exec` | `deployment_name`, `cluster_id` | | Retrieves a list of events about the activity of a cluster. This API is paginated. If there are more |
| `permanent_delete` | `exec` | `deployment_name`, `cluster_id` | | Permanently deletes a Spark cluster. This cluster is terminated and resources are asynchronously |
| `pin` | `exec` | `deployment_name`, `cluster_id` | | Pinning a cluster ensures that the cluster will always be returned by the ListClusters API. Pinning a |
| `resize` | `exec` | `deployment_name`, `cluster_id` | | Resizes a cluster to have a desired number of workers. This will fail unless the cluster is in a |
| `restart` | `exec` | `deployment_name`, `cluster_id` | | Restarts a Spark cluster with the supplied ID. If the cluster is not currently in a RUNNING state, |
| `start` | `exec` | `deployment_name`, `cluster_id` | | Starts a terminated Spark cluster with the supplied ID. This works similar to createCluster except: |
| `unpin` | `exec` | `deployment_name`, `cluster_id` | | Unpinning a cluster will allow the cluster to eventually be removed from the ListClusters API. |
| `update` | `exec` | `deployment_name`, `cluster_id`, `update_mask` | | Updates the configuration of a cluster to match the partial set of attributes and size. Denote which |
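
The required parameters listed above can be checked before a statement is sent. The sketch below is a hypothetical Python helper, not something the provider ships: it copies the Required Params column into a lookup table and reports which parameters a caller has not yet supplied.

```python
# Required parameters per method, transcribed from the Methods table.
_ID_ONLY = {"deployment_name", "cluster_id"}

REQUIRED_PARAMS = {
    "get": {"cluster_id", "deployment_name"},
    "list": {"deployment_name"},
    "create": {"deployment_name", "spark_version"},
    "change_owner": _ID_ONLY | {"owner_username"},
    "delete": _ID_ONLY,
    "edit": _ID_ONLY | {"spark_version"},
    "events": _ID_ONLY,
    "permanent_delete": _ID_ONLY,
    "pin": _ID_ONLY,
    "resize": _ID_ONLY,
    "restart": _ID_ONLY,
    "start": _ID_ONLY,
    "unpin": _ID_ONLY,
    "update": _ID_ONLY | {"update_mask"},
}


def missing_params(method: str, supplied: dict) -> list:
    """Return the required parameter names absent from `supplied`, sorted."""
    if method not in REQUIRED_PARAMS:
        raise KeyError(f"unknown method: {method}")
    return sorted(REQUIRED_PARAMS[method] - set(supplied))


if __name__ == "__main__":
    # "abc" is a placeholder value, not a real cluster ID.
    print(missing_params("get", {"cluster_id": "abc"}))
```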

Parameters

Parameters can be passed in the WHERE clause of a query. Check the Methods section to see which parameters are required or optional for each operation.

| Name | Datatype | Description |
|:--|:--|:--|
| `cluster_id` | `string` | The cluster about which to retrieve information. |
| `deployment_name` | `string` | The Databricks Workspace Deployment Name (default: dbc-abcd0123-a1bc) |
| `filter_by` | `object` | Filters to apply to the list of clusters. |
| `page_size` | `integer` | Use this field to specify the maximum number of results to be returned by the server. The server may further constrain the maximum number of results returned in a single page. |
| `page_token` | `string` | Use `next_page_token` or `prev_page_token` returned from the previous request to list the next or previous page of clusters respectively. |
| `sort_by` | `object` | Sort the list of clusters by specific criteria. |
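
The `page_token` description implies a simple forward-paging loop: pass the `next_page_token` from each response into the next request until none is returned. A minimal Python sketch of that loop follows. `fetch_page` is a hypothetical callable standing in for whatever issues the `list` request, and the response shape (a dict with `clusters` and `next_page_token` keys) is an assumption for illustration.

```python
from typing import Callable, Iterator, Optional


def iter_clusters(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield clusters page by page until no next_page_token is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("clusters", [])
        token = page.get("next_page_token")
        if not token:  # missing or empty token: last page reached
            break


def fake_fetch(token: Optional[str]) -> dict:
    """Stand-in for a real request: two fixed pages keyed by token."""
    pages = {
        None: {"clusters": [{"cluster_id": "a"}], "next_page_token": "t1"},
        "t1": {"clusters": [{"cluster_id": "b"}], "next_page_token": ""},
    }
    return pages[token]


if __name__ == "__main__":
    for cluster in iter_clusters(fake_fetch):
        print(cluster["cluster_id"])
```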

SELECT examples

Retrieves the information for a cluster given its identifier. Clusters can be described while they are

SELECT
  cluster_id,
  driver_instance_pool_id,
  driver_node_type_id,
  instance_pool_id,
  node_type_id,
  policy_id,
  spark_context_id,
  cluster_name,
  creator_user_name,
  single_user_name,
  autoscale,
  autotermination_minutes,
  aws_attributes,
  azure_attributes,
  cluster_cores,
  cluster_log_conf,
  cluster_log_status,
  cluster_memory_mb,
  cluster_source,
  custom_tags,
  data_security_mode,
  default_tags,
  docker_image,
  driver,
  driver_node_type_flexibility,
  enable_elastic_disk,
  enable_local_disk_encryption,
  executors,
  gcp_attributes,
  init_scripts,
  is_single_node,
  jdbc_port,
  kind,
  last_restarted_time,
  last_state_loss_time,
  num_workers,
  remote_disk_throughput,
  runtime_engine,
  spark_conf,
  spark_env_vars,
  spark_version,
  spec,
  ssh_public_keys,
  start_time,
  state,
  state_message,
  terminated_time,
  termination_reason,
  total_initial_remote_disk_size,
  use_ml_runtime,
  worker_node_type_flexibility,
  workload_type
FROM databricks_workspace.compute.clusters
WHERE cluster_id = '{{ cluster_id }}' -- required
  AND deployment_name = '{{ deployment_name }}' -- required
;
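
The `start_time` and `terminated_time` columns come back as epoch milliseconds, which are hard to read directly. The Python sketch below shows one way to post-process a result row. It assumes the row is already available as a dict keyed by column name, which depends on the client used; `summarize` is a hypothetical helper and the sample row is made up.

```python
from datetime import datetime, timezone
from typing import Optional


def epoch_ms_to_utc(ms: Optional[int]) -> Optional[datetime]:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    if ms is None:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def summarize(row: dict) -> dict:
    """Pick out identity, state and human-readable timestamps from a row."""
    return {
        "cluster_id": row["cluster_id"],
        "state": row["state"],
        "started": epoch_ms_to_utc(row.get("start_time")),
        # terminated_time is only present "if applicable"
        "terminated": epoch_ms_to_utc(row.get("terminated_time")),
    }


if __name__ == "__main__":
    sample = {  # illustrative values only
        "cluster_id": "0123-456789-abcdefgh",
        "state": "RUNNING",
        "start_time": 1700000000000,
    }
    print(summarize(sample))
```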

INSERT examples

Creates a new Spark cluster. This method will acquire new instances from the cloud provider if

INSERT INTO databricks_workspace.compute.clusters (
  spark_version,
  apply_policy_default_values,
  autoscale,
  autotermination_minutes,
  aws_attributes,
  azure_attributes,
  clone_from,
  cluster_log_conf,
  cluster_name,
  custom_tags,
  data_security_mode,
  docker_image,
  driver_instance_pool_id,
  driver_node_type_flexibility,
  driver_node_type_id,
  enable_elastic_disk,
  enable_local_disk_encryption,
  gcp_attributes,
  init_scripts,
  instance_pool_id,
  is_single_node,
  kind,
  node_type_id,
  num_workers,
  policy_id,
  remote_disk_throughput,
  runtime_engine,
  single_user_name,
  spark_conf,
  spark_env_vars,
  ssh_public_keys,
  total_initial_remote_disk_size,
  use_ml_runtime,
  worker_node_type_flexibility,
  workload_type,
  deployment_name
)
SELECT
  '{{ spark_version }}' /* required */,
  {{ apply_policy_default_values }},
  '{{ autoscale }}',
  {{ autotermination_minutes }},
  '{{ aws_attributes }}',
  '{{ azure_attributes }}',
  '{{ clone_from }}',
  '{{ cluster_log_conf }}',
  '{{ cluster_name }}',
  '{{ custom_tags }}',
  '{{ data_security_mode }}',
  '{{ docker_image }}',
  '{{ driver_instance_pool_id }}',
  '{{ driver_node_type_flexibility }}',
  '{{ driver_node_type_id }}',
  {{ enable_elastic_disk }},
  {{ enable_local_disk_encryption }},
  '{{ gcp_attributes }}',
  '{{ init_scripts }}',
  '{{ instance_pool_id }}',
  {{ is_single_node }},
  '{{ kind }}',
  '{{ node_type_id }}',
  {{ num_workers }},
  '{{ policy_id }}',
  {{ remote_disk_throughput }},
  '{{ runtime_engine }}',
  '{{ single_user_name }}',
  '{{ spark_conf }}',
  '{{ spark_env_vars }}',
  '{{ ssh_public_keys }}',
  {{ total_initial_remote_disk_size }},
  {{ use_ml_runtime }},
  '{{ worker_node_type_flexibility }}',
  '{{ workload_type }}',
  '{{ deployment_name }}'
RETURNING
  cluster_id,
  driver_instance_pool_id,
  driver_node_type_id,
  instance_pool_id,
  node_type_id,
  policy_id,
  spark_context_id,
  cluster_name,
  creator_user_name,
  single_user_name,
  autoscale,
  autotermination_minutes,
  aws_attributes,
  azure_attributes,
  cluster_cores,
  cluster_log_conf,
  cluster_log_status,
  cluster_memory_mb,
  cluster_source,
  custom_tags,
  data_security_mode,
  default_tags,
  docker_image,
  driver,
  driver_node_type_flexibility,
  enable_elastic_disk,
  enable_local_disk_encryption,
  executors,
  gcp_attributes,
  init_scripts,
  is_single_node,
  jdbc_port,
  kind,
  last_restarted_time,
  last_state_loss_time,
  num_workers,
  remote_disk_throughput,
  runtime_engine,
  spark_conf,
  spark_env_vars,
  spark_version,
  spec,
  ssh_public_keys,
  start_time,
  state,
  state_message,
  terminated_time,
  termination_reason,
  total_initial_remote_disk_size,
  use_ml_runtime,
  worker_node_type_flexibility,
  workload_type
;
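
The template above lists every insertable column, but the Methods table marks only `spark_version` and `deployment_name` as required. The Python sketch below generates a shorter statement from just the columns a caller supplies. It is an illustration under assumptions, not the provider's tooling: it assumes object and array columns are passed as JSON text inside a single-quoted literal (as the quoted placeholders suggest), that booleans are written as `true`/`false`, and that a single quote inside a literal is escaped by doubling it. The `RETURNING` clause is left out for brevity.

```python
import json

REQUIRED = ("spark_version", "deployment_name")


def sql_literal(value) -> str:
    """Render a Python value as a literal for the SELECT list."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (dict, list)):
        value = json.dumps(value)  # object/array columns as JSON text (assumption)
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(values: dict) -> str:
    """Build an INSERT covering only the supplied columns."""
    missing = [name for name in REQUIRED if name not in values]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    columns = ",\n".join(f"  {name}" for name in values)
    literals = ",\n".join(f"  {sql_literal(v)}" for v in values.values())
    return (
        "INSERT INTO databricks_workspace.compute.clusters (\n"
        f"{columns}\n)\nSELECT\n{literals}\n;"
    )


if __name__ == "__main__":
    print(build_insert({
        "spark_version": "3.3.x-scala2.11",
        "num_workers": 2,
        "spark_env_vars": {"SPARK_LOCAL_DIRS": "/local_disk0"},
        "deployment_name": "dbc-abcd0123-a1bc",
    }))
```

The sample values are taken from the field and parameter descriptions on this page; substitute values valid for your own workspace.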

Lifecycle Methods

Change the owner of the cluster. You must be an admin and the cluster must be terminated to perform

EXEC databricks_workspace.compute.clusters.change_owner
@deployment_name='{{ deployment_name }}' --required
@@json=
'{
  "cluster_id": "{{ cluster_id }}",
  "owner_username": "{{ owner_username }}"
}'
;
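
Filling the `@@json` body by hand is error-prone when a value contains a double quote or backslash. A small Python sketch of producing the body with `json.dumps`, which handles JSON escaping, is shown below. `change_owner_payload` is a hypothetical helper; the two keys are the ones used in the statement above.

```python
import json


def change_owner_payload(cluster_id: str, owner_username: str) -> str:
    """Return the JSON body for change_owner with values safely escaped."""
    return json.dumps(
        {"cluster_id": cluster_id, "owner_username": owner_username},
        indent=2,
    )


if __name__ == "__main__":
    # Placeholder values for illustration.
    print(change_owner_payload("0123-456789-abcdefgh", "someone@example.com"))
```

This covers JSON escaping only. Because the body sits inside a single-quoted literal, a value containing a single quote would still need escaping at the SQL level, which this sketch does not attempt.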