serving_endpoints
Creates, updates, deletes, gets or lists a serving_endpoints resource.
Overview
| Name | serving_endpoints |
| Type | Resource |
| Id | databricks_workspace.serving.serving_endpoints |
Fields
The following fields are returned by SELECT queries:
- get
- list
| Name | Datatype | Description |
|---|---|---|
id | string | System-generated ID of the endpoint. This is used to refer to the endpoint in the Permissions API |
name | string | The name of the serving endpoint. |
budget_policy_id | string | The budget policy associated with the endpoint. |
ai_gateway | object | |
config | object | The config that is currently being served by the endpoint. |
creation_timestamp | integer | The timestamp when the endpoint was created in Unix time. |
creator | string | The email of the user who created the serving endpoint. |
data_plane_info | object | Information required to query DataPlane APIs. |
description | string | Description of the serving model |
email_notifications | object | Email notification settings. |
endpoint_url | string | Endpoint invocation url if route optimization is enabled for endpoint |
last_updated_timestamp | integer | The timestamp when the endpoint was last updated by a user in Unix time. |
pending_config | object | The config that the endpoint is attempting to update to. |
permission_level | string | The permission level of the principal making the request. (CAN_MANAGE, CAN_QUERY, CAN_VIEW) |
route_optimized | boolean | Boolean representing if route optimization has been enabled for the endpoint |
state | object | Information corresponding to the state of the serving endpoint. |
tags | array | Tags attached to the serving endpoint. |
task | string | The task type of the serving endpoint. |
| Name | Datatype | Description |
|---|---|---|
id | string | System-generated ID of the endpoint, included to be used by the Permissions API. |
name | string | The name of the serving endpoint. |
budget_policy_id | string | The budget policy associated with the endpoint. |
usage_policy_id | string | The usage policy associated with serving endpoint. |
ai_gateway | object | |
config | object | The config that is currently being served by the endpoint. |
creation_timestamp | integer | The timestamp when the endpoint was created in Unix time. |
creator | string | The email of the user who created the serving endpoint. |
description | string | Description of the endpoint |
last_updated_timestamp | integer | The timestamp when the endpoint was last updated by a user in Unix time. |
state | object | Information corresponding to the state of the serving endpoint. |
tags | array | Tags attached to the serving endpoint. |
task | string | The task type of the serving endpoint. |
Methods
The following methods are available for this resource:
| Name | Accessible by | Required Params | Optional Params | Description |
|---|---|---|---|---|
get | select | name, deployment_name | Retrieves the details for a single serving endpoint. | |
list | select | deployment_name | Get all serving endpoints. | |
create | insert | deployment_name, name | Create a new serving endpoint. | |
update | update | name, deployment_name | Used to batch add and delete tags from a serving endpoint with a single API call. | |
update_config | replace | name, deployment_name | Updates any combination of the serving endpoint's served entities, the compute configuration of those | |
delete | delete | name, deployment_name | Delete a serving endpoint. | |
query | exec | name, deployment_name | Query a serving endpoint |
Parameters
Parameters can be passed in the WHERE clause of a query. Check the Methods section to see which parameters are required or optional for each operation.
| Name | Datatype | Description |
|---|---|---|
deployment_name | string | The Databricks Workspace Deployment Name (default: dbc-abcd0123-a1bc) |
name | string | The name of the serving endpoint. This field is required and is provided via the path parameter. |
SELECT examples
- get
- list
Retrieves the details for a single serving endpoint.
SELECT
id,
name,
budget_policy_id,
ai_gateway,
config,
creation_timestamp,
creator,
data_plane_info,
description,
email_notifications,
endpoint_url,
last_updated_timestamp,
pending_config,
permission_level,
route_optimized,
state,
tags,
task
FROM databricks_workspace.serving.serving_endpoints
WHERE name = '{{ name }}' -- required
AND deployment_name = '{{ deployment_name }}' -- required
;
Get all serving endpoints.
SELECT
id,
name,
budget_policy_id,
usage_policy_id,
ai_gateway,
config,
creation_timestamp,
creator,
description,
last_updated_timestamp,
state,
tags,
task
FROM databricks_workspace.serving.serving_endpoints
WHERE deployment_name = '{{ deployment_name }}' -- required
;
INSERT examples
- create
- Manifest
Create a new serving endpoint.
INSERT INTO databricks_workspace.serving.serving_endpoints (
name,
ai_gateway,
budget_policy_id,
config,
description,
email_notifications,
rate_limits,
route_optimized,
tags,
deployment_name
)
SELECT
'{{ name }}' /* required */,
'{{ ai_gateway }}',
'{{ budget_policy_id }}',
'{{ config }}',
'{{ description }}',
'{{ email_notifications }}',
'{{ rate_limits }}',
{{ route_optimized }},
'{{ tags }}',
'{{ deployment_name }}'
RETURNING
id,
name,
budget_policy_id,
ai_gateway,
config,
creation_timestamp,
creator,
data_plane_info,
description,
email_notifications,
endpoint_url,
last_updated_timestamp,
pending_config,
permission_level,
route_optimized,
state,
tags,
task
;
# Description fields are for documentation purposes
- name: serving_endpoints
props:
- name: deployment_name
value: "{{ deployment_name }}"
description: Required parameter for the serving_endpoints resource.
- name: name
value: "{{ name }}"
description: |
The name of the serving endpoint. This field is required and must be unique across a Databricks workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores.
- name: ai_gateway
description: |
The AI Gateway configuration for the serving endpoint. NOTE: External model, provisioned throughput, and pay-per-token endpoints are fully supported; agent endpoints currently only support inference tables.
value:
fallback_config:
enabled: {{ enabled }}
guardrails:
input:
invalid_keywords:
- "{{ invalid_keywords }}"
pii:
behavior: "{{ behavior }}"
safety: {{ safety }}
valid_topics:
- "{{ valid_topics }}"
output:
invalid_keywords:
- "{{ invalid_keywords }}"
pii:
behavior: "{{ behavior }}"
safety: {{ safety }}
valid_topics:
- "{{ valid_topics }}"
inference_table_config:
catalog_name: "{{ catalog_name }}"
enabled: {{ enabled }}
schema_name: "{{ schema_name }}"
table_name_prefix: "{{ table_name_prefix }}"
rate_limits:
- renewal_period: "{{ renewal_period }}"
calls: {{ calls }}
key: "{{ key }}"
principal: "{{ principal }}"
tokens: {{ tokens }}
usage_tracking_config:
enabled: {{ enabled }}
- name: budget_policy_id
value: "{{ budget_policy_id }}"
description: |
The budget policy to be applied to the serving endpoint.
- name: config
description: |
The core config of the serving endpoint.
value:
name: "{{ name }}"
auto_capture_config:
catalog_name: "{{ catalog_name }}"
enabled: {{ enabled }}
schema_name: "{{ schema_name }}"
table_name_prefix: "{{ table_name_prefix }}"
served_entities:
- burst_scaling_enabled: {{ burst_scaling_enabled }}
entity_name: "{{ entity_name }}"
entity_version: "{{ entity_version }}"
environment_vars: "{{ environment_vars }}"
external_model:
provider: "{{ provider }}"
name: "{{ name }}"
task: "{{ task }}"
ai21labs_config:
ai21labs_api_key: "{{ ai21labs_api_key }}"
ai21labs_api_key_plaintext: "{{ ai21labs_api_key_plaintext }}"
amazon_bedrock_config:
aws_region: "{{ aws_region }}"
bedrock_provider: "{{ bedrock_provider }}"
aws_access_key_id: "{{ aws_access_key_id }}"
aws_access_key_id_plaintext: "{{ aws_access_key_id_plaintext }}"
aws_secret_access_key: "{{ aws_secret_access_key }}"
aws_secret_access_key_plaintext: "{{ aws_secret_access_key_plaintext }}"
instance_profile_arn: "{{ instance_profile_arn }}"
anthropic_config:
anthropic_api_key: "{{ anthropic_api_key }}"
anthropic_api_key_plaintext: "{{ anthropic_api_key_plaintext }}"
cohere_config:
cohere_api_base: "{{ cohere_api_base }}"
cohere_api_key: "{{ cohere_api_key }}"
cohere_api_key_plaintext: "{{ cohere_api_key_plaintext }}"
custom_provider_config:
custom_provider_url: "{{ custom_provider_url }}"
api_key_auth:
key: "{{ key }}"
value: "{{ value }}"
value_plaintext: "{{ value_plaintext }}"
bearer_token_auth:
token: "{{ token }}"
token_plaintext: "{{ token_plaintext }}"
databricks_model_serving_config:
databricks_workspace_url: "{{ databricks_workspace_url }}"
databricks_api_token: "{{ databricks_api_token }}"
databricks_api_token_plaintext: "{{ databricks_api_token_plaintext }}"
google_cloud_vertex_ai_config:
project_id: "{{ project_id }}"
region: "{{ region }}"
private_key: "{{ private_key }}"
private_key_plaintext: "{{ private_key_plaintext }}"
openai_config:
microsoft_entra_client_id: "{{ microsoft_entra_client_id }}"
microsoft_entra_client_secret: "{{ microsoft_entra_client_secret }}"
microsoft_entra_client_secret_plaintext: "{{ microsoft_entra_client_secret_plaintext }}"
microsoft_entra_tenant_id: "{{ microsoft_entra_tenant_id }}"
openai_api_base: "{{ openai_api_base }}"
openai_api_key: "{{ openai_api_key }}"
openai_api_key_plaintext: "{{ openai_api_key_plaintext }}"
openai_api_type: "{{ openai_api_type }}"
openai_api_version: "{{ openai_api_version }}"
openai_deployment_name: "{{ openai_deployment_name }}"
openai_organization: "{{ openai_organization }}"
palm_config:
palm_api_key: "{{ palm_api_key }}"
palm_api_key_plaintext: "{{ palm_api_key_plaintext }}"
instance_profile_arn: "{{ instance_profile_arn }}"
max_provisioned_concurrency: {{ max_provisioned_concurrency }}
max_provisioned_throughput: {{ max_provisioned_throughput }}
min_provisioned_concurrency: {{ min_provisioned_concurrency }}
min_provisioned_throughput: {{ min_provisioned_throughput }}
name: "{{ name }}"
provisioned_model_units: {{ provisioned_model_units }}
scale_to_zero_enabled: {{ scale_to_zero_enabled }}
workload_size: "{{ workload_size }}"
workload_type: "{{ workload_type }}"
served_models:
- scale_to_zero_enabled: {{ scale_to_zero_enabled }}
model_name: "{{ model_name }}"
model_version: "{{ model_version }}"
burst_scaling_enabled: {{ burst_scaling_enabled }}
environment_vars: "{{ environment_vars }}"
instance_profile_arn: "{{ instance_profile_arn }}"
max_provisioned_concurrency: {{ max_provisioned_concurrency }}
max_provisioned_throughput: {{ max_provisioned_throughput }}
min_provisioned_concurrency: {{ min_provisioned_concurrency }}
min_provisioned_throughput: {{ min_provisioned_throughput }}
name: "{{ name }}"
provisioned_model_units: {{ provisioned_model_units }}
workload_size: "{{ workload_size }}"
workload_type: "{{ workload_type }}"
traffic_config:
routes:
- traffic_percentage: {{ traffic_percentage }}
served_entity_name: "{{ served_entity_name }}"
served_model_name: "{{ served_model_name }}"
- name: description
value: "{{ description }}"
description: |
:param email_notifications: :class:`EmailNotifications` (optional) Email notification settings.
- name: email_notifications
value:
on_update_failure:
- "{{ on_update_failure }}"
on_update_success:
- "{{ on_update_success }}"
- name: rate_limits
description: |
Rate limits to be applied to the serving endpoint. NOTE: this field is deprecated, please use AI Gateway to manage rate limits.
value:
- calls: {{ calls }}
renewal_period: "{{ renewal_period }}"
key: "{{ key }}"
- name: route_optimized
value: {{ route_optimized }}
description: |
Enable route optimization for the serving endpoint.
- name: tags
description: |
Tags to be attached to the serving endpoint and automatically propagated to billing logs.
value:
- key: "{{ key }}"
value: "{{ value }}"
UPDATE examples
- update
Used to batch add and delete tags from a serving endpoint with a single API call.
UPDATE databricks_workspace.serving.serving_endpoints
SET
add_tags = '{{ add_tags }}',
delete_tags = '{{ delete_tags }}'
WHERE
name = '{{ name }}' --required
AND deployment_name = '{{ deployment_name }}' --required
RETURNING
tags;
REPLACE examples
- update_config
Updates any combination of the serving endpoint's served entities, the compute configuration of those
REPLACE databricks_workspace.serving.serving_endpoints
SET
auto_capture_config = '{{ auto_capture_config }}',
served_entities = '{{ served_entities }}',
served_models = '{{ served_models }}',
traffic_config = '{{ traffic_config }}'
WHERE
name = '{{ name }}' --required
AND deployment_name = '{{ deployment_name }}' --required
RETURNING
id,
name,
budget_policy_id,
ai_gateway,
config,
creation_timestamp,
creator,
data_plane_info,
description,
email_notifications,
endpoint_url,
last_updated_timestamp,
pending_config,
permission_level,
route_optimized,
state,
tags,
task;
DELETE examples
- delete
Delete a serving endpoint.
DELETE FROM databricks_workspace.serving.serving_endpoints
WHERE name = '{{ name }}' --required
AND deployment_name = '{{ deployment_name }}' --required
;
Lifecycle Methods
- query
Query a serving endpoint
EXEC databricks_workspace.serving.serving_endpoints.query
@name='{{ name }}' --required,
@deployment_name='{{ deployment_name }}' --required
@@json=
'{
"client_request_id": "{{ client_request_id }}",
"dataframe_records": "{{ dataframe_records }}",
"dataframe_split": "{{ dataframe_split }}",
"extra_params": "{{ extra_params }}",
"input": "{{ input }}",
"inputs": "{{ inputs }}",
"instances": "{{ instances }}",
"max_tokens": {{ max_tokens }},
"messages": "{{ messages }}",
"n": {{ n }},
"prompt": "{{ prompt }}",
"stop": "{{ stop }}",
"stream": {{ stream }},
"temperature": {{ temperature }},
"usage_context": "{{ usage_context }}"
}'
;