# instance_pools

Creates, updates, deletes, gets or lists an `instance_pools` resource.
## Overview

| Name | instance_pools |
|---|---|
| Type | Resource |
| Id | databricks_workspace.compute.instance_pools |
## Fields

The following fields are returned by `SELECT` queries:

### get
| Name | Datatype | Description |
|---|---|---|
| instance_pool_id | string | Canonical unique identifier for the pool. |
| node_type_id | string | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call. |
| instance_pool_name | string | Pool name requested by the user. Pool name must be unique. Length must be between 1 and 100 characters. |
| aws_attributes | object | Attributes related to instance pools running on Amazon Web Services. If not specified at pool creation, a set of default values will be used. |
| azure_attributes | object | Attributes related to instance pools running on Azure. If not specified at pool creation, a set of default values will be used. |
| custom_tags | object | Additional tags for pool resources. Databricks will tag all pool resources (e.g., AWS instances and EBS volumes) with these tags in addition to `default_tags`. Note: Databricks currently allows at most 45 custom tags. |
| default_tags | object | Tags that are added by Databricks regardless of any `custom_tags`, including: Vendor: Databricks, InstancePoolCreator: <user_id_of_creator>, InstancePoolName: <name_of_pool>, InstancePoolId: <id_of_pool>. |
| disk_spec | object | Defines the specification of the disks that will be attached to all Spark containers. |
| enable_elastic_disk | boolean | Autoscaling local storage: when enabled, the instances in this pool dynamically acquire additional disk space when their Spark workers are running low on disk space. On AWS, this feature requires specific AWS permissions to function correctly; refer to the User Guide for more details. |
| gcp_attributes | object | Attributes related to instance pools running on Google Cloud Platform. If not specified at pool creation, a set of default values will be used. |
| idle_instance_autotermination_minutes | integer | Automatically terminates the extra instances in the pool cache after they have been inactive for this many minutes, provided the min_idle_instances requirement is already met. If not set, the extra pool instances are automatically terminated after a default timeout. If specified, the threshold must be between 0 and 10000 minutes. Set this value to 0 to remove idle instances from the cache immediately, as long as the minimum cache size is still satisfied. |
| max_capacity | integer | Maximum number of outstanding instances to keep in the pool, including both instances used by clusters and idle instances. Clusters that require further instance provisioning will fail during upsize requests. |
| min_idle_instances | integer | Minimum number of idle instances to keep in the instance pool. |
| node_type_flexibility | object | Flexible node type configuration for the pool. |
| preloaded_docker_images | array | Custom Docker images (BYOC) to preload on the pool's instances. |
| preloaded_spark_versions | array | A list containing at most one preloaded Spark image version for the pool. Pool-backed clusters started with the preloaded Spark version will start faster. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call. |
| remote_disk_throughput | integer | If set, the configurable throughput (in Mb/s) for the remote disk. Currently only supported for GCP HYPERDISK_BALANCED disk types. |
| state | string | Current state of the instance pool (ACTIVE, DELETED, STOPPED). |
| stats | object | Usage statistics about the instance pool. |
| status | object | Status of failed pending instances in the pool. |
| total_initial_remote_disk_size | integer | If set, the total initial volume size (in GB) of the remote disks. Currently only supported for GCP HYPERDISK_BALANCED disk types. |
### list

| Name | Datatype | Description |
|---|---|---|
| instance_pool_id | string | Canonical unique identifier for the pool. |
| node_type_id | string | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call. |
| instance_pool_name | string | Pool name requested by the user. Pool name must be unique. Length must be between 1 and 100 characters. |
| aws_attributes | object | Attributes set during instance pool creation which are related to Amazon Web Services. |
| azure_attributes | object | Attributes related to instance pools running on Azure. If not specified at pool creation, a set of default values will be used. |
| custom_tags | object | Additional tags for pool resources. Databricks will tag all pool resources (e.g., AWS instances and EBS volumes) with these tags in addition to `default_tags`. Note: Databricks currently allows at most 45 custom tags. |
| default_tags | object | Tags that are added by Databricks regardless of any `custom_tags`, including: Vendor: Databricks, InstancePoolCreator: <user_id_of_creator>, InstancePoolName: <name_of_pool>, InstancePoolId: <id_of_pool>. |
| disk_spec | object | Defines the specification of the disks that will be attached to all Spark containers. |
| enable_elastic_disk | boolean | Autoscaling local storage: when enabled, the instances in this pool dynamically acquire additional disk space when their Spark workers are running low on disk space. On AWS, this feature requires specific AWS permissions to function correctly; refer to the User Guide for more details. |
| gcp_attributes | object | Attributes related to instance pools running on Google Cloud Platform. If not specified at pool creation, a set of default values will be used. |
| idle_instance_autotermination_minutes | integer | Automatically terminates the extra instances in the pool cache after they have been inactive for this many minutes, provided the min_idle_instances requirement is already met. If not set, the extra pool instances are automatically terminated after a default timeout. If specified, the threshold must be between 0 and 10000 minutes. Set this value to 0 to remove idle instances from the cache immediately, as long as the minimum cache size is still satisfied. |
| max_capacity | integer | Maximum number of outstanding instances to keep in the pool, including both instances used by clusters and idle instances. Clusters that require further instance provisioning will fail during upsize requests. |
| min_idle_instances | integer | Minimum number of idle instances to keep in the instance pool. |
| node_type_flexibility | object | Flexible node type configuration for the pool. |
| preloaded_docker_images | array | Custom Docker images (BYOC) to preload on the pool's instances. |
| preloaded_spark_versions | array | A list containing at most one preloaded Spark image version for the pool. Pool-backed clusters started with the preloaded Spark version will start faster. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call. |
| remote_disk_throughput | integer | If set, the configurable throughput (in Mb/s) for the remote disk. Currently only supported for GCP HYPERDISK_BALANCED disk types. |
| state | string | Current state of the instance pool (ACTIVE, DELETED, STOPPED). |
| stats | object | Usage statistics about the instance pool. |
| status | object | Status of failed pending instances in the pool. |
| total_initial_remote_disk_size | integer | If set, the total initial volume size (in GB) of the remote disks. Currently only supported for GCP HYPERDISK_BALANCED disk types. |
## Methods
The following methods are available for this resource:
| Name | Accessible by | Required Params | Optional Params | Description |
|---|---|---|---|---|
| get | select | instance_pool_id, deployment_name |  | Retrieve the information for an instance pool based on its identifier. |
| list | select | deployment_name |  | Gets a list of instance pools with their statistics. |
| create | insert | deployment_name, instance_pool_name, node_type_id |  | Creates a new instance pool using idle and ready-to-use cloud instances. |
| replace | replace | deployment_name, instance_pool_id, instance_pool_name, node_type_id |  | Modifies the configuration of an existing instance pool. |
| delete | delete | deployment_name |  | Deletes the instance pool permanently. The idle instances in the pool are terminated asynchronously. |
## Parameters
Parameters can be passed in the WHERE clause of a query. Check the Methods section to see which parameters are required or optional for each operation.
| Name | Datatype | Description |
|---|---|---|
| deployment_name | string | The Databricks Workspace Deployment Name (default: dbc-abcd0123-a1bc) |
| instance_pool_id | string | The canonical unique identifier for the instance pool. |
## SELECT examples

### get
Retrieve the information for an instance pool based on its identifier.
```sql
SELECT
instance_pool_id,
node_type_id,
instance_pool_name,
aws_attributes,
azure_attributes,
custom_tags,
default_tags,
disk_spec,
enable_elastic_disk,
gcp_attributes,
idle_instance_autotermination_minutes,
max_capacity,
min_idle_instances,
node_type_flexibility,
preloaded_docker_images,
preloaded_spark_versions,
remote_disk_throughput,
state,
stats,
status,
total_initial_remote_disk_size
FROM databricks_workspace.compute.instance_pools
WHERE instance_pool_id = '{{ instance_pool_id }}' -- required
AND deployment_name = '{{ deployment_name }}' -- required
;
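The `stats` column is returned as a JSON object. As a hypothetical sketch, assuming the query backend exposes SQLite's `json_extract` function and that the stats object carries an `idle_count` field, a single attribute can be projected out of it:

```sql
-- Hypothetical: project one attribute out of the stats JSON object.
-- Assumes json_extract is available and stats contains an idle_count field.
SELECT
instance_pool_name,
json_extract(stats, '$.idle_count') AS idle_count
FROM databricks_workspace.compute.instance_pools
WHERE instance_pool_id = '{{ instance_pool_id }}' -- required
AND deployment_name = '{{ deployment_name }}' -- required
;
```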
### list

Gets a list of instance pools with their statistics.
```sql
SELECT
instance_pool_id,
node_type_id,
instance_pool_name,
aws_attributes,
azure_attributes,
custom_tags,
default_tags,
disk_spec,
enable_elastic_disk,
gcp_attributes,
idle_instance_autotermination_minutes,
max_capacity,
min_idle_instances,
node_type_flexibility,
preloaded_docker_images,
preloaded_spark_versions,
remote_disk_throughput,
state,
stats,
status,
total_initial_remote_disk_size
FROM databricks_workspace.compute.instance_pools
WHERE deployment_name = '{{ deployment_name }}' -- required
;
```
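Because `state` is returned as a plain string (ACTIVE, DELETED, STOPPED), ordinary SQL predicates can narrow the listing. A minimal sketch, assuming the extra predicate is applied over the rows the API returns:

```sql
-- Sketch: list only pools currently in the ACTIVE state.
SELECT
instance_pool_id,
instance_pool_name,
min_idle_instances,
max_capacity,
state
FROM databricks_workspace.compute.instance_pools
WHERE deployment_name = '{{ deployment_name }}' -- required
AND state = 'ACTIVE'
;
```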
## INSERT examples

### create
Creates a new instance pool using idle and ready-to-use cloud instances.
```sql
INSERT INTO databricks_workspace.compute.instance_pools (
instance_pool_name,
node_type_id,
aws_attributes,
azure_attributes,
custom_tags,
disk_spec,
enable_elastic_disk,
gcp_attributes,
idle_instance_autotermination_minutes,
max_capacity,
min_idle_instances,
node_type_flexibility,
preloaded_docker_images,
preloaded_spark_versions,
remote_disk_throughput,
total_initial_remote_disk_size,
deployment_name
)
SELECT
'{{ instance_pool_name }}' /* required */,
'{{ node_type_id }}' /* required */,
'{{ aws_attributes }}',
'{{ azure_attributes }}',
'{{ custom_tags }}',
'{{ disk_spec }}',
{{ enable_elastic_disk }},
'{{ gcp_attributes }}',
{{ idle_instance_autotermination_minutes }},
{{ max_capacity }},
{{ min_idle_instances }},
'{{ node_type_flexibility }}',
'{{ preloaded_docker_images }}',
'{{ preloaded_spark_versions }}',
{{ remote_disk_throughput }},
{{ total_initial_remote_disk_size }},
'{{ deployment_name }}'
RETURNING
instance_pool_id
;
```
### Manifest

```yaml
# Description fields are for documentation purposes
- name: instance_pools
  props:
    - name: deployment_name
      value: "{{ deployment_name }}"
      description: Required parameter for the instance_pools resource.
    - name: instance_pool_name
      value: "{{ instance_pool_name }}"
      description: |
        Pool name requested by the user. Pool name must be unique. Length must be between 1 and 100 characters.
    - name: node_type_id
      value: "{{ node_type_id }}"
      description: |
        This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.
    - name: aws_attributes
      description: |
        Attributes related to instance pools running on Amazon Web Services. If not specified at pool creation, a set of default values will be used.
      value:
        availability: "{{ availability }}"
        instance_profile_arn: "{{ instance_profile_arn }}"
        spot_bid_price_percent: {{ spot_bid_price_percent }}
        zone_id: "{{ zone_id }}"
    - name: azure_attributes
      description: |
        Attributes related to instance pools running on Azure. If not specified at pool creation, a set of default values will be used.
      value:
        availability: "{{ availability }}"
        spot_bid_max_price: {{ spot_bid_max_price }}
    - name: custom_tags
      value: "{{ custom_tags }}"
      description: |
        Additional tags for pool resources. Databricks will tag all pool resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Note: Databricks currently allows at most 45 custom tags.
    - name: disk_spec
      description: |
        Defines the specification of the disks that will be attached to all Spark containers.
      value:
        disk_count: {{ disk_count }}
        disk_iops: {{ disk_iops }}
        disk_size: {{ disk_size }}
        disk_throughput: {{ disk_throughput }}
        disk_type:
          azure_disk_volume_type: "{{ azure_disk_volume_type }}"
          ebs_volume_type: "{{ ebs_volume_type }}"
    - name: enable_elastic_disk
      value: {{ enable_elastic_disk }}
      description: |
        Autoscaling local storage: when enabled, the instances in this pool dynamically acquire additional disk space when their Spark workers are running low on disk space. On AWS, this feature requires specific AWS permissions to function correctly; refer to the User Guide for more details.
    - name: gcp_attributes
      description: |
        Attributes related to instance pools running on Google Cloud Platform. If not specified at pool creation, a set of default values will be used.
      value:
        gcp_availability: "{{ gcp_availability }}"
        local_ssd_count: {{ local_ssd_count }}
        zone_id: "{{ zone_id }}"
    - name: idle_instance_autotermination_minutes
      value: {{ idle_instance_autotermination_minutes }}
      description: |
        Automatically terminates the extra instances in the pool cache after they have been inactive for this many minutes, provided the min_idle_instances requirement is already met. If not set, the extra pool instances are automatically terminated after a default timeout. If specified, the threshold must be between 0 and 10000 minutes. Set this value to 0 to remove idle instances from the cache immediately, as long as the minimum cache size is still satisfied.
    - name: max_capacity
      value: {{ max_capacity }}
      description: |
        Maximum number of outstanding instances to keep in the pool, including both instances used by clusters and idle instances. Clusters that require further instance provisioning will fail during upsize requests.
    - name: min_idle_instances
      value: {{ min_idle_instances }}
      description: |
        Minimum number of idle instances to keep in the instance pool.
    - name: node_type_flexibility
      description: |
        Flexible node type configuration for the pool.
      value:
        alternate_node_type_ids:
          - "{{ alternate_node_type_ids }}"
    - name: preloaded_docker_images
      description: |
        Custom Docker images (BYOC) to preload on the pool's instances.
      value:
        - basic_auth:
            password: "{{ password }}"
            username: "{{ username }}"
          url: "{{ url }}"
    - name: preloaded_spark_versions
      value:
        - "{{ preloaded_spark_versions }}"
      description: |
        A list containing at most one preloaded Spark image version for the pool. Pool-backed clusters started with the preloaded Spark version will start faster. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call.
    - name: remote_disk_throughput
      value: {{ remote_disk_throughput }}
      description: |
        If set, the configurable throughput (in Mb/s) for the remote disk. Currently only supported for GCP HYPERDISK_BALANCED disk types.
    - name: total_initial_remote_disk_size
      value: {{ total_initial_remote_disk_size }}
      description: |
        If set, the total initial volume size (in GB) of the remote disks. Currently only supported for GCP HYPERDISK_BALANCED disk types.
```
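Since only `deployment_name`, `instance_pool_name`, and `node_type_id` are required, a minimal create can omit every optional column and let the provider fall back to its defaults. A sketch with purely illustrative values (the pool name and node type shown are placeholders, not defaults):

```sql
-- Minimal create: only the required columns, illustrative values.
INSERT INTO databricks_workspace.compute.instance_pools (
instance_pool_name,
node_type_id,
deployment_name
)
SELECT
'analytics-pool', -- hypothetical pool name
'i3.xlarge',      -- hypothetical AWS node type; see :method:clusters/listNodeTypes
'{{ deployment_name }}'
RETURNING
instance_pool_id
;
```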
## REPLACE examples

### replace
Modifies the configuration of an existing instance pool.
```sql
REPLACE databricks_workspace.compute.instance_pools
SET
instance_pool_id = '{{ instance_pool_id }}',
instance_pool_name = '{{ instance_pool_name }}',
node_type_id = '{{ node_type_id }}',
custom_tags = '{{ custom_tags }}',
idle_instance_autotermination_minutes = {{ idle_instance_autotermination_minutes }},
max_capacity = {{ max_capacity }},
min_idle_instances = {{ min_idle_instances }},
remote_disk_throughput = {{ remote_disk_throughput }},
total_initial_remote_disk_size = {{ total_initial_remote_disk_size }}
WHERE
deployment_name = '{{ deployment_name }}' -- required
AND instance_pool_id = '{{ instance_pool_id }}' -- required
AND instance_pool_name = '{{ instance_pool_name }}' -- required
AND node_type_id = '{{ node_type_id }}' -- required
;
```
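A common edit is resizing the pool. The sketch below changes only the capacity bounds; it assumes the edit endpoint accepts a SET list without the other optional columns, and the numbers are purely illustrative:

```sql
-- Sketch: raise the pool's capacity bounds (illustrative values).
REPLACE databricks_workspace.compute.instance_pools
SET
instance_pool_name = '{{ instance_pool_name }}',
node_type_id = '{{ node_type_id }}',
min_idle_instances = 10,
max_capacity = 200
WHERE
deployment_name = '{{ deployment_name }}' -- required
AND instance_pool_id = '{{ instance_pool_id }}' -- required
AND instance_pool_name = '{{ instance_pool_name }}' -- required
AND node_type_id = '{{ node_type_id }}' -- required
;
```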
## DELETE examples

### delete
Deletes the instance pool permanently. The idle instances in the pool are terminated asynchronously.
```sql
DELETE FROM databricks_workspace.compute.instance_pools
WHERE deployment_name = '{{ deployment_name }}' -- required
;
```
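Because deleted pools report a DELETED `state`, a follow-up `SELECT` can confirm the operation; this sketch simply re-lists the pools and inspects `state`:

```sql
-- Confirm the deletion by inspecting pool states after the DELETE.
SELECT
instance_pool_id,
instance_pool_name,
state
FROM databricks_workspace.compute.instance_pools
WHERE deployment_name = '{{ deployment_name }}' -- required
;
```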