serving_endpoints
Creates, updates, deletes, gets or lists a serving_endpoints
resource.
Overview
Name | serving_endpoints |
Type | Resource |
Id | databricks_workspace.realtimeserving.serving_endpoints |
Fields
The following fields are returned by SELECT
queries:
- get
- list
Serving endpoint was retrieved successfully.
Name | Datatype | Description |
---|---|---|
id | string | |
name | string | |
config | object | |
creation_timestamp | integer | |
creator | string | |
last_updated_timestamp | integer | |
permission_level | string | |
route_optimized | boolean | |
state | object | |
tags | array |
List of serving endpoints was retrieved successfully.
Name | Datatype | Description |
---|---|---|
id | string | |
name | string | |
ai_gateway | object | |
config | object | |
creation_timestamp | integer | |
creator | string | |
last_updated_timestamp | integer | |
state | object | |
tags | array | |
task | string |
Methods
The following methods are available for this resource:
Name | Accessible by | Required Params | Optional Params | Description |
---|---|---|---|---|
get | select | deployment_name | Retrieves the details for a single serving endpoint. | |
list | select | deployment_name | ||
create | insert | deployment_name | ||
updateconfig | update | deployment_name | Updates any combination of the serving endpoint's served entities, the compute configuration of those served entities, and the endpoint's traffic config. An endpoint that already has an update in progress can not be updated until the current update completes or fails. | |
patch | update | deployment_name | Used to batch add and delete tags from a serving endpoint with a single API call. | |
put | replace | deployment_name | Used to update the rate limits of a serving endpoint. NOTE: Only foundation model endpoints are currently supported. For external models, use AI Gateway to manage rate limits. | |
delete | delete | deployment_name | ||
query | exec | deployment_name |
Parameters
Parameters can be passed in the WHERE
clause of a query. Check the Methods section to see which parameters are required or optional for each operation.
Name | Datatype | Description |
---|---|---|
deployment_name | string | The Databricks Workspace Deployment Name (default: dbc-abcd0123-a1bc) |
SELECT
examples
- get
- list
Retrieves the details for a single serving endpoint.
SELECT
id,
name,
config,
creation_timestamp,
creator,
last_updated_timestamp,
permission_level,
route_optimized,
state,
tags
FROM databricks_workspace.realtimeserving.serving_endpoints
WHERE deployment_name = '{{ deployment_name }}' -- required;
List of serving endpoints was retrieved successfully.
SELECT
id,
name,
ai_gateway,
config,
creation_timestamp,
creator,
last_updated_timestamp,
state,
tags,
task
FROM databricks_workspace.realtimeserving.serving_endpoints
WHERE deployment_name = '{{ deployment_name }}' -- required;
INSERT
examples
- create
- Manifest
No description available.
INSERT INTO databricks_workspace.realtimeserving.serving_endpoints (
data__name,
data__route_optimized,
data__config,
data__tags,
data__rate_limits,
data__ai_gateway,
deployment_name
)
SELECT
'{{ name }}',
'{{ route_optimized }}',
'{{ config }}',
'{{ tags }}',
'{{ rate_limits }}',
'{{ ai_gateway }}',
'{{ deployment_name }}'
RETURNING
id,
name,
ai_gateway,
config,
creation_timestamp,
creator,
last_updated_timestamp,
permission_level,
route_optimized,
state,
tags
;
# Description fields are for documentation purposes
- name: serving_endpoints
props:
- name: deployment_name
value: string
description: Required parameter for the serving_endpoints resource.
- name: name
value: required
- name: route_optimized
value: string
- name: config
value: required
- name: tags
value: object
- name: rate_limits
value: Array of object
- name: ai_gateway
value: Array of object
UPDATE
examples
- updateconfig
- patch
Updates any combination of the serving endpoint's served entities, the compute configuration of those served entities, and the endpoint's traffic config. An endpoint that already has an update in progress can not be updated until the current update completes or fails.
UPDATE databricks_workspace.realtimeserving.serving_endpoints
SET
data__served_entities = '{{ served_entities }}',
data__served_models = '{{ served_models }}',
data__traffic_config = '{{ traffic_config }}',
data__auto_capture_config = '{{ auto_capture_config }}'
WHERE
deployment_name = '{{ deployment_name }}' --required
RETURNING
id,
name,
config,
creation_timestamp,
creator,
last_updated_timestamp,
permission_level,
route_optimized,
state;
Used to batch add and delete tags from a serving endpoint with a single API call.
UPDATE databricks_workspace.realtimeserving.serving_endpoints
SET
data__delete_tags = '{{ delete_tags }}',
data__add_tags = '{{ add_tags }}'
WHERE
deployment_name = '{{ deployment_name }}' --required
RETURNING
key,
value;
REPLACE
examples
- put
Used to update the rate limits of a serving endpoint. NOTE: Only foundation model endpoints are currently supported. For external models, use AI Gateway to manage rate limits.
REPLACE databricks_workspace.realtimeserving.serving_endpoints
SET
data__rate_limits = '{{ rate_limits }}'
WHERE
deployment_name = '{{ deployment_name }}' --required
RETURNING
rate_limits;
DELETE
examples
- delete
No description available.
DELETE FROM databricks_workspace.realtimeserving.serving_endpoints
WHERE deployment_name = '{{ deployment_name }}' --required;
Lifecycle Methods
- query
Serving endpoint was queried successfully and returned predictions.
EXEC databricks_workspace.realtimeserving.serving_endpoints.query
@deployment_name='{{ deployment_name }}' --required
@@json=
'{
"prompt": "{{ prompt }}",
"input": "{{ input }}",
"temperature": "{{ temperature }}",
"stop": "{{ stop }}",
"max_tokens": {{ max_tokens }},
"n": {{ n }},
"stream": {{ stream }},
"dataframe_records": "{{ dataframe_records }}",
"instances": "{{ instances }}",
"inputs": "{{ inputs }}",
"messages": "{{ messages }}",
"extra_params": "{{ extra_params }}",
"dataframe_split": "{{ dataframe_split }}"
}';