# statement_execution

Creates, updates, deletes, gets, or lists a `statement_execution` resource.
## Overview

| Name | statement_execution |
|---|---|
| Type | Resource |
| Id | databricks_workspace.sql.statement_execution |
## Fields

The following fields are returned by SELECT queries:

### get_result_chunk

| Name | Datatype | Description |
|---|---|---|
| byte_count | integer | The number of bytes in the result chunk. This field is not available when using `INLINE` disposition. |
| chunk_index | integer | The position within the sequence of result set chunks. |
| data_array | array | The `JSON_ARRAY` format is an array of arrays of values, where each non-null value is formatted as a string. Null values are encoded as JSON `null`. |
| external_links | array |  |
| next_chunk_index | integer | When fetching, provides the `chunk_index` for the _next_ chunk. If absent, indicates there are no more chunks. The next chunk can be fetched with a :method:statementexecution/getstatementresultchunkn request. |
| next_chunk_internal_link | string | When fetching, provides a link to fetch the _next_ chunk. If absent, indicates there are no more chunks. This link is an absolute `path` to be joined with your `$DATABRICKS_HOST`, and should be treated as an opaque link. This is an alternative to using `next_chunk_index`. |
| row_count | integer | The number of rows within the result chunk. |
| row_offset | integer | The starting row offset within the result set. |
### get

| Name | Datatype | Description |
|---|---|---|
| statement_id | string | The statement ID is returned upon successfully submitting a SQL statement, and is a required reference for all subsequent calls. |
| manifest | object | The result manifest provides schema and metadata for the result set. |
| result | object | Contains the result data of a single chunk when using `INLINE` disposition. When using `EXTERNAL_LINKS` disposition, the array `external_links` is used instead to provide URLs to the result data in cloud storage. Exactly one of these alternatives is used. (While the `external_links` array prepares the API to return multiple links in a single response, currently only a single link is returned.) |
| status | object | The status response includes execution state and, if relevant, error information. |
## Methods

The following methods are available for this resource:
| Name | Accessible by | Required Params | Optional Params | Description |
|---|---|---|---|---|
| get_result_chunk | select | statement_id, chunk_index, deployment_name |  | After the statement execution has `SUCCEEDED`, this request can be used to fetch any chunk by index. |
| get | select | statement_id, deployment_name |  | This request can be used to poll for the statement's status. |
| cancel | insert | statement_id, deployment_name |  | Requests that an executing statement be canceled. Callers must poll for status to see the terminal state. |
| execute | insert | deployment_name, statement, warehouse_id | byte_limit, catalog, disposition, format, on_wait_timeout, parameters, query_tags, row_limit, schema, wait_timeout | Execute a SQL statement and optionally await its results for a specified time. |
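The four methods above correspond to calls against the Databricks SQL Statement Execution REST API. The following is a minimal sketch of that mapping, assuming the endpoint paths documented for that API and a caller-supplied `call(method, path, payload)` transport (a hypothetical helper, e.g. a thin wrapper around an HTTP client with authentication attached):

```python
def execute_statement(call, warehouse_id, statement, **options):
    """Submit a statement for execution; returns the StatementResponse dict."""
    body = {"warehouse_id": warehouse_id, "statement": statement, **options}
    return call("POST", "/api/2.0/sql/statements/", body)

def get_statement(call, statement_id):
    """Poll the status (and, once finished, the results) of a statement."""
    return call("GET", f"/api/2.0/sql/statements/{statement_id}", None)

def get_result_chunk(call, statement_id, chunk_index):
    """Fetch one result chunk by index after the statement has SUCCEEDED."""
    return call(
        "GET",
        f"/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}",
        None,
    )

def cancel_statement(call, statement_id):
    """Request cancellation; callers must still poll for the terminal state."""
    return call("POST", f"/api/2.0/sql/statements/{statement_id}/cancel", None)
```

Injecting the transport keeps the sketch testable without a live workspace; in practice `call` would prepend `$DATABRICKS_HOST` and send the request.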
## Parameters

Parameters can be passed in the `WHERE` clause of a query. Check the Methods section to see which parameters are required or optional for each operation.
| Name | Datatype | Description |
|---|---|---|
| chunk_index | integer | The position within the sequence of result set chunks. |
| deployment_name | string | The Databricks Workspace Deployment Name (default: dbc-abcd0123-a1bc) |
| statement_id | string | The statement ID is returned upon successfully submitting a SQL statement, and is a required reference for all subsequent calls. |
## SELECT examples

### get_result_chunk

After the statement execution has `SUCCEEDED`, this request can be used to fetch any chunk by index.
```sql
SELECT
  byte_count,
  chunk_index,
  data_array,
  external_links,
  next_chunk_index,
  next_chunk_internal_link,
  row_count,
  row_offset
FROM databricks_workspace.sql.statement_execution
WHERE statement_id = '{{ statement_id }}' -- required
  AND chunk_index = '{{ chunk_index }}' -- required
  AND deployment_name = '{{ deployment_name }}' -- required
;
```
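As the `next_chunk_index` field indicates, a full result set is retrieved chunk by chunk until that field is absent. A small sketch of that loop, where `fetch_chunk` is a hypothetical stand-in for the `get_result_chunk` call:

```python
def iter_result_chunks(fetch_chunk, first_chunk_index=0):
    """Yield result chunks in order, following next_chunk_index until absent."""
    index = first_chunk_index
    while index is not None:
        chunk = fetch_chunk(index)
        yield chunk
        # Absence of next_chunk_index signals this was the last chunk.
        index = chunk.get("next_chunk_index")
```

The same traversal could equally follow `next_chunk_internal_link`, which the API offers as an opaque alternative to the index.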
### get

This request can be used to poll for the statement's status. The `StatementResponse` contains the `statement_id` along with the execution status and, once execution finishes, the result manifest and data.
```sql
SELECT
  statement_id,
  manifest,
  result,
  status
FROM databricks_workspace.sql.statement_execution
WHERE statement_id = '{{ statement_id }}' -- required
  AND deployment_name = '{{ deployment_name }}' -- required
;
```
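Polling with `get` typically loops until the statement reaches a terminal state. A sketch of that loop, assuming the states named elsewhere on this page (`PENDING`, `SUCCEEDED`, `FAILED`, `CANCELED`) and a caller-supplied `get_status` callable that performs the actual request:

```python
import time

# Terminal states named in this page's descriptions; the API may define others.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}

def wait_for_statement(get_status, interval_s=1.0, max_polls=60):
    """Poll get_status() until the statement reaches a terminal state."""
    for _ in range(max_polls):
        response = get_status()
        if response["status"]["state"] in TERMINAL_STATES:
            return response
        time.sleep(interval_s)
    raise TimeoutError("statement did not reach a terminal state in time")
```

Bounding the number of polls avoids spinning forever on a statement that never settles; a production client would likely also back off between polls.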
## INSERT examples

### cancel

Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.
```sql
INSERT INTO databricks_workspace.sql.statement_execution (
  statement_id,
  deployment_name
)
SELECT
  '{{ statement_id }}',
  '{{ deployment_name }}'
;
```
### execute

Execute a SQL statement and optionally await its results for a specified time.
```sql
INSERT INTO databricks_workspace.sql.statement_execution (
  statement,
  warehouse_id,
  byte_limit,
  catalog,
  disposition,
  format,
  on_wait_timeout,
  parameters,
  query_tags,
  row_limit,
  schema,
  wait_timeout,
  deployment_name
)
SELECT
  '{{ statement }}' /* required */,
  '{{ warehouse_id }}' /* required */,
  {{ byte_limit }},
  '{{ catalog }}',
  '{{ disposition }}',
  '{{ format }}',
  '{{ on_wait_timeout }}',
  '{{ parameters }}',
  '{{ query_tags }}',
  {{ row_limit }},
  '{{ schema }}',
  '{{ wait_timeout }}',
  '{{ deployment_name }}'
RETURNING
  statement_id,
  manifest,
  result,
  status
;
```
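The `parameters` field described in the manifest expects objects with a `name`, an optional `value` (omitted for SQL NULL), and an optional `type` (omitted means the value is interpreted as a string). A hedged helper for building that shape — `sql_param` is a hypothetical name, not part of the API:

```python
def sql_param(name, value=None, type_=None):
    """Build one statement parameter object.

    Omitting value represents SQL NULL; omitting type_ leaves the value
    to be interpreted as a string by the server.
    """
    param = {"name": name}
    if value is not None:
        param["value"] = value
    if type_ is not None:
        param["type"] = type_
    return param
```

For example, `sql_param("my_date", "2020-01-01", "DATE")` produces the typed parameter object shown in the `parameters` description below.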
### Manifest

```yaml
# Description fields are for documentation purposes
- name: statement_execution
  props:
    - name: statement_id
      value: "{{ statement_id }}"
      description: Required parameter for the statement_execution resource.
    - name: deployment_name
      value: "{{ deployment_name }}"
      description: Required parameter for the statement_execution resource.
    - name: statement
      value: "{{ statement }}"
      description: |
        The SQL statement to execute. The statement can optionally be parameterized, see `parameters`. The maximum query text size is 16 MiB.
    - name: warehouse_id
      value: "{{ warehouse_id }}"
      description: |
        Warehouse upon which to execute a statement. See also [What are SQL warehouses?] [What are SQL warehouses?]: https://docs.databricks.com/sql/admin/warehouse-type.html
    - name: byte_limit
      value: {{ byte_limit }}
      description: |
        Applies the given byte limit to the statement's result size. Byte counts are based on internal data representations and might not match the final size in the requested `format`. If the result was truncated due to the byte limit, then `truncated` in the response is set to `true`. When using `EXTERNAL_LINKS` disposition, a default `byte_limit` of 100 GiB is applied if `byte_limit` is not explicitly set.
    - name: catalog
      value: "{{ catalog }}"
      description: |
        Sets default catalog for statement execution, similar to [`USE CATALOG`] in SQL. [`USE CATALOG`]: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-use-catalog.html
    - name: disposition
      value: "{{ disposition }}"
      description: |
        The fetch disposition provides two modes of fetching results: `INLINE` and `EXTERNAL_LINKS`. Statements executed with `INLINE` disposition will return result data inline, in `JSON_ARRAY` format, in a series of chunks. If a given statement produces a result set with a size larger than 25 MiB, that statement execution is aborted, and no result set will be available. **NOTE** Byte limits are computed based upon internal representations of the result set data, and might not match the sizes visible in JSON responses. Statements executed with `EXTERNAL_LINKS` disposition will return result data as external links: URLs that point to cloud storage internal to the workspace. Using `EXTERNAL_LINKS` disposition allows statements to generate arbitrarily sized result sets for fetching up to 100 GiB. The resulting links have two important properties: 1. They point to resources _external_ to the Databricks compute; therefore any associated authentication information (typically a personal access token, OAuth token, or similar) _must be removed_ when fetching from these links. 2. These are URLs with a specific expiration, indicated in the response. The behavior when attempting to use an expired link is cloud specific.
    - name: format
      value: "{{ format }}"
      description: |
        Statement execution supports three result formats: `JSON_ARRAY` (default), `ARROW_STREAM`, and `CSV`. Important: The formats `ARROW_STREAM` and `CSV` are supported only with `EXTERNAL_LINKS` disposition. `JSON_ARRAY` is supported in `INLINE` and `EXTERNAL_LINKS` disposition. When specifying `format=JSON_ARRAY`, result data will be formatted as an array of arrays of values, where each value is either the *string representation* of a value, or `null`. For example, the output of `SELECT concat('id-', id) AS strCol, id AS intCol, null AS nullCol FROM range(3)` would look like this: ``` [ [ "id-1", "1", null ], [ "id-2", "2", null ], [ "id-3", "3", null ], ] ``` When specifying `format=JSON_ARRAY` and `disposition=EXTERNAL_LINKS`, each chunk in the result contains compact JSON with no indentation or extra whitespace. When specifying `format=ARROW_STREAM` and `disposition=EXTERNAL_LINKS`, each chunk in the result will be formatted as Apache Arrow Stream. See the [Apache Arrow streaming format]. When specifying `format=CSV` and `disposition=EXTERNAL_LINKS`, each chunk in the result will be a CSV according to [RFC 4180] standard. All the columns values will have *string representation* similar to the `JSON_ARRAY` format, and `null` values will be encoded as "null". Only the first chunk in the result would contain a header row with column names. For example, the output of `SELECT concat('id-', id) AS strCol, id AS intCol, null as nullCol FROM range(3)` would look like this: ``` strCol,intCol,nullCol id-1,1,null id-2,2,null id-3,3,null ``` [Apache Arrow streaming format]: https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format [RFC 4180]: https://www.rfc-editor.org/rfc/rfc4180
    - name: on_wait_timeout
      value: "{{ on_wait_timeout }}"
      description: |
        When `wait_timeout > 0s`, the call will block up to the specified time. If the statement execution doesn't finish within this time, `on_wait_timeout` determines whether the execution should continue or be canceled. When set to `CONTINUE`, the statement execution continues asynchronously and the call returns a statement ID which can be used for polling with :method:statementexecution/getStatement. When set to `CANCEL`, the statement execution is canceled and the call returns with a `CANCELED` state.
    - name: parameters
      description: |
        A list of parameters to pass into a SQL statement containing parameter markers. A parameter consists of a name, a value, and optionally a type. To represent a NULL value, the `value` field may be omitted or set to `null` explicitly. If the `type` field is omitted, the value is interpreted as a string. If the type is given, parameters will be checked for type correctness according to the given type. A value is correct if the provided string can be converted to the requested type using the `cast` function. The exact semantics are described in the section [`cast` function] of the SQL language reference. For example, the following statement contains two parameters, `my_name` and `my_date`: ``` SELECT * FROM my_table WHERE name = :my_name AND date = :my_date ``` The parameters can be passed in the request body as follows: ` { ..., "statement": "SELECT * FROM my_table WHERE name = :my_name AND date = :my_date", "parameters": [ { "name": "my_name", "value": "the name" }, { "name": "my_date", "value": "2020-01-01", "type": "DATE" } ] } ` Currently, positional parameters denoted by a `?` marker are not supported by the Databricks SQL Statement Execution API. Also see the section [Parameter markers] of the SQL language reference. [Parameter markers]: https://docs.databricks.com/sql/language-manual/sql-ref-parameter-marker.html [`cast` function]: https://docs.databricks.com/sql/language-manual/functions/cast.html
      value:
        - name: "{{ name }}"
          type: "{{ type }}"
          value: "{{ value }}"
    - name: query_tags
      description: |
        An array of query tags to annotate a SQL statement. A query tag consists of a non-empty key and, optionally, a value. To represent a NULL value, either omit the `value` field or manually set it to `null` or white space. Refer to the SQL language reference for the format specification of query tags. There's no significance to the order of tags. Only one value per key will be recorded. A sequence in excess of 20 query tags will be coerced to 20. Example: { ..., "query_tags": [ { "key": "team", "value": "eng" }, { "key": "some key only tag" } ] }
      value:
        - key: "{{ key }}"
          value: "{{ value }}"
    - name: row_limit
      value: {{ row_limit }}
      description: |
        Applies the given row limit to the statement's result set, but unlike the `LIMIT` clause in SQL, it also sets the `truncated` field in the response to indicate whether the result was trimmed due to the limit or not.
    - name: schema
      value: "{{ schema }}"
      description: |
        Sets default schema for statement execution, similar to [`USE SCHEMA`] in SQL. [`USE SCHEMA`]: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-use-schema.html
    - name: wait_timeout
      value: "{{ wait_timeout }}"
      description: |
        The time in seconds the call will wait for the statement's result set as `Ns`, where `N` can be set to 0 or to a value between 5 and 50. When set to `0s`, the statement will execute in asynchronous mode and the call will not wait for the execution to finish. In this case, the call returns directly with `PENDING` state and a statement ID which can be used for polling with :method:statementexecution/getStatement. When set between 5 and 50 seconds, the call will behave synchronously up to this timeout and wait for the statement execution to finish. If the execution finishes within this time, the call returns immediately with a manifest and result data (or a `FAILED` state in case of an execution error). If the statement takes longer to execute, `on_wait_timeout` determines what should happen after the timeout is reached.
```
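Per the `query_tags` description above, only one value per key is recorded and any sequence in excess of 20 tags is coerced to 20. A client-side sketch of that normalization — `normalize_query_tags` is a hypothetical helper, not part of the API, and the server's choice of which duplicates and surplus tags to drop is an assumption here (first occurrence kept, trailing tags trimmed):

```python
def normalize_query_tags(tags, limit=20):
    """Keep the first tag seen per key, then truncate to the limit."""
    seen = {}
    for tag in tags:
        # setdefault keeps the first value recorded for each key.
        seen.setdefault(tag["key"], tag)
    return list(seen.values())[:limit]
```

Running this before submitting `query_tags` makes the payload match what the service would record anyway.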