quality_monitors
Creates, updates, deletes, or gets a `quality_monitors` resource.
Overview
| Name | quality_monitors |
|---|---|
| Type | Resource |
| Id | databricks_workspace.catalog.quality_monitors |
Fields
The following fields are returned by SELECT queries (`get` method):
| Name | Datatype | Description |
|---|---|---|
| dashboard_id | string | [Create:ERR Update:OPT] ID of the dashboard that visualizes the computed metrics. This can be empty if the monitor is in the PENDING state. |
| baseline_table_name | string | [Create:OPT Update:OPT] Baseline table name. Baseline data is used to compute drift from the data in the monitored `table_name`. The baseline table and the monitored table must have the same schema. |
| drift_metrics_table_name | string | [Create:ERR Update:IGN] Table that stores drift metrics data. Format: `catalog.schema.table_name`. |
| output_schema_name | string | [Create:REQ Update:REQ] Schema where output tables are created. Must be in two-level format `{catalog}.{schema}`. |
| profile_metrics_table_name | string | [Create:ERR Update:IGN] Table that stores profile metrics data. Format: `catalog.schema.table_name`. |
| table_name | string | [Create:ERR Update:IGN] UC table to monitor. Format: `catalog.schema.table_name`. |
| assets_dir | string | [Create:REQ Update:IGN] Absolute path to a custom directory for storing data-monitoring assets. The UI and Python APIs normally prepopulate this with a default user location. |
| custom_metrics | array | [Create:OPT Update:OPT] Custom metrics. |
| data_classification_config | object | [Create:OPT Update:OPT] Data classification related config. |
| inference_log | object | Configuration for monitoring inference logs. |
| latest_monitor_failure_msg | string | [Create:ERR Update:IGN] The latest error message for a monitor failure. |
| monitor_version | integer | [Create:ERR Update:IGN] The monitor configuration version currently in use, represented as an integer (1, 2, 3, ...). Negative values are possible and can indicate a corrupted `monitor_version`. |
| notifications | object | [Create:OPT Update:OPT] Field for specifying notification settings. |
| schedule | object | [Create:OPT Update:OPT] The monitor schedule. |
| slicing_exprs | array | [Create:OPT Update:OPT] List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement. For example, `slicing_exprs=["col_1", "col_2 > 10"]` generates two slices for `col_2 > 10` (True and False) and one slice per unique value in `col_1`. For high-cardinality columns, only the top 100 unique values by frequency generate slices. |
| snapshot | object | Configuration for monitoring snapshot tables. |
| status | string | [Create:ERR Update:IGN] The monitor status. (MONITOR_STATUS_ACTIVE, MONITOR_STATUS_DELETE_PENDING, MONITOR_STATUS_ERROR, MONITOR_STATUS_FAILED, MONITOR_STATUS_PENDING) |
| time_series | object | Configuration for monitoring time series tables. |
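The `slicing_exprs` rule described above can be illustrated on a small in-memory dataset. The following is a minimal Python sketch under stated assumptions: `slice_row_counts` is a hypothetical helper, rows are plain dicts standing in for the monitored table, and each predicate is supplied as a Python callable. It mimics only the documented grouping behaviour, not how Databricks actually computes slices.

```python
from collections import Counter


def slice_row_counts(rows, column_exprs, predicate_exprs, top_k=100):
    """Count rows per slice, mimicking the documented slicing_exprs rules.

    rows: list of dicts standing in for the monitored table (hypothetical data).
    column_exprs: bare column names; each distinct value becomes a slice,
        limited to the top_k most frequent values.
    predicate_exprs: maps an expression string to a Python callable that
        evaluates it; each predicate yields a True slice and a False slice.
    """
    counts = {}
    for col in column_exprs:
        freq = Counter(row[col] for row in rows)
        # High-cardinality columns: only the most frequent values become slices.
        for value, n in freq.most_common(top_k):
            counts[(col, value)] = n
    for text, predicate in predicate_exprs.items():
        # A predicate and its complement each form one slice.
        n_true = sum(1 for row in rows if predicate(row))
        counts[(text, True)] = n_true
        counts[(text, False)] = len(rows) - n_true
    return counts


rows = [
    {"col_1": "a", "col_2": 5},
    {"col_1": "b", "col_2": 12},
    {"col_1": "a", "col_2": 20},
]
slices = slice_row_counts(rows, ["col_1"], {"col_2 > 10": lambda r: r["col_2"] > 10})
print(slices)
```

With `["col_1", "col_2 > 10"]` this yields one slice per value of `col_1` plus a True and a False slice for the predicate, matching the example in the table.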
Methods
The following methods are available for this resource:
| Name | Accessible by | Required Params | Optional Params | Description |
|---|---|---|---|---|
| get | select | table_name, deployment_name | | [DEPRECATED] Gets a monitor for the specified table. Use the Data Quality Monitors API instead. |
| create | insert | table_name, deployment_name, output_schema_name, assets_dir | | [DEPRECATED] Creates a new monitor for the specified table. Use the Data Quality Monitors API instead. |
| update | replace | table_name, deployment_name, output_schema_name | | [DEPRECATED] Updates a monitor for the specified table. Use the Data Quality Monitors API instead. |
| delete | delete | table_name, deployment_name | | [DEPRECATED] Deletes a monitor for the specified table. Use the Data Quality Monitors API instead. |
| cancel_refresh | exec | table_name, refresh_id, deployment_name | | [DEPRECATED] Cancels an already-initiated refresh job. Use the Data Quality Monitors API instead. |
| regenerate_dashboard | exec | table_name, deployment_name | | [DEPRECATED] Regenerates the monitoring dashboard for the specified table. Use the Data Quality Monitors API instead. |
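The required parameters listed in the Methods table can be checked client-side before a statement is sent. A minimal Python sketch follows; `REQUIRED_PARAMS` and `missing_params` are hypothetical names, not part of StackQL or the Databricks API, and the sets are transcribed from the table.

```python
# Required parameters per method, transcribed from the Methods table.
REQUIRED_PARAMS = {
    "get": {"table_name", "deployment_name"},
    "create": {"table_name", "deployment_name", "output_schema_name", "assets_dir"},
    "update": {"table_name", "deployment_name", "output_schema_name"},
    "delete": {"table_name", "deployment_name"},
    "cancel_refresh": {"table_name", "refresh_id", "deployment_name"},
    "regenerate_dashboard": {"table_name", "deployment_name"},
}


def missing_params(method, supplied):
    """Return the sorted names of required parameters absent from `supplied`."""
    if method not in REQUIRED_PARAMS:
        raise ValueError(f"unknown method: {method}")
    return sorted(REQUIRED_PARAMS[method] - set(supplied))


print(missing_params("create", {"table_name": "main.sales.orders", "deployment_name": "dbc-abcd0123-a1bc"}))
```

Passing a dict works because `set(supplied)` takes its keys, so the same call accepts either a parameter dict or a plain list of names.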
Parameters
Parameters can be passed in the WHERE clause of a query. Check the Methods section to see which parameters are required or optional for each operation.
| Name | Datatype | Description |
|---|---|---|
| deployment_name | string | The Databricks workspace deployment name (default: dbc-abcd0123-a1bc). |
| refresh_id | integer | ID of the refresh job to cancel; required by `cancel_refresh`. |
| table_name | string | UC table name in the format `catalog.schema.table_name`. This field corresponds to the `{full_table_name_arg}` argument in the endpoint path. |
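Malformed names are a common cause of failed calls, so a quick shape check on `table_name` (three-level) and `output_schema_name` (two-level) can help. This is a minimal Python sketch with hypothetical helper names; it checks only the dotted structure and does not handle backtick-quoted identifiers that themselves contain dots.

```python
import re

# Shape-only checks (hypothetical helpers): each part must be non-empty and
# free of dots and whitespace. Quoted identifiers containing dots are not handled.
_PART = r"[^.\s]+"
FULL_TABLE_NAME = re.compile(rf"{_PART}\.{_PART}\.{_PART}")
SCHEMA_NAME = re.compile(rf"{_PART}\.{_PART}")


def is_full_table_name(name: str) -> bool:
    """True if name looks like catalog.schema.table_name."""
    return FULL_TABLE_NAME.fullmatch(name) is not None


def is_two_level_schema(name: str) -> bool:
    """True if name looks like catalog.schema (the output_schema_name format)."""
    return SCHEMA_NAME.fullmatch(name) is not None


print(is_full_table_name("main.sales.orders"), is_two_level_schema("main.monitoring"))
```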
SELECT examples
`get`: [DEPRECATED] Gets a monitor for the specified table. Use the Data Quality Monitors API instead.
```sql
SELECT
  dashboard_id,
  baseline_table_name,
  drift_metrics_table_name,
  output_schema_name,
  profile_metrics_table_name,
  table_name,
  assets_dir,
  custom_metrics,
  data_classification_config,
  inference_log,
  latest_monitor_failure_msg,
  monitor_version,
  notifications,
  schedule,
  slicing_exprs,
  snapshot,
  status,
  time_series
FROM databricks_workspace.catalog.quality_monitors
WHERE table_name = '{{ table_name }}' -- required
AND deployment_name = '{{ deployment_name }}' -- required
;
```
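Once a row comes back from the `get` query, the `status`, `monitor_version`, and `latest_monitor_failure_msg` fields can be combined into a quick health summary. A minimal Python sketch, assuming the row has already been fetched into a dict keyed by column name; `summarize_monitor` is a hypothetical helper, and its interpretations follow the Fields table (negative versions may indicate corruption, `dashboard_id` may be empty while PENDING).

```python
FAILURE_STATUSES = {"MONITOR_STATUS_ERROR", "MONITOR_STATUS_FAILED"}


def summarize_monitor(row: dict) -> str:
    """Return a one-line summary of a quality_monitors row (hypothetical helper)."""
    version = row.get("monitor_version")
    # The Fields table notes that negative versions can indicate corruption.
    if version is not None and version < 0:
        return "possibly corrupted monitor_version"
    status = row.get("status")
    if status in FAILURE_STATUSES:
        return "failing: " + (row.get("latest_monitor_failure_msg") or "no message")
    if status == "MONITOR_STATUS_PENDING":
        return "pending (dashboard_id may be empty)"
    if status == "MONITOR_STATUS_DELETE_PENDING":
        return "delete pending"
    if status == "MONITOR_STATUS_ACTIVE":
        return "active"
    return f"unrecognized status: {status}"


print(summarize_monitor({"status": "MONITOR_STATUS_ACTIVE", "monitor_version": 3}))
```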
INSERT examples
`create`: [DEPRECATED] Creates a new monitor for the specified table. Use the Data Quality Monitors API instead. The monitor can be created with the INSERT statement below or declared in the manifest that follows it.
```sql
INSERT INTO databricks_workspace.catalog.quality_monitors (
  output_schema_name,
  assets_dir,
  baseline_table_name,
  custom_metrics,
  data_classification_config,
  inference_log,
  latest_monitor_failure_msg,
  notifications,
  schedule,
  skip_builtin_dashboard,
  slicing_exprs,
  snapshot,
  time_series,
  warehouse_id,
  table_name,
  deployment_name
)
SELECT
  '{{ output_schema_name }}' /* required */,
  '{{ assets_dir }}' /* required */,
  '{{ baseline_table_name }}',
  '{{ custom_metrics }}',
  '{{ data_classification_config }}',
  '{{ inference_log }}',
  '{{ latest_monitor_failure_msg }}',
  '{{ notifications }}',
  '{{ schedule }}',
  {{ skip_builtin_dashboard }},
  '{{ slicing_exprs }}',
  '{{ snapshot }}',
  '{{ time_series }}',
  '{{ warehouse_id }}',
  '{{ table_name }}',
  '{{ deployment_name }}'
RETURNING
  dashboard_id,
  baseline_table_name,
  drift_metrics_table_name,
  output_schema_name,
  profile_metrics_table_name,
  table_name,
  assets_dir,
  custom_metrics,
  data_classification_config,
  inference_log,
  latest_monitor_failure_msg,
  monitor_version,
  notifications,
  schedule,
  slicing_exprs,
  snapshot,
  status,
  time_series
;
```
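Object- and array-typed columns such as `schedule`, `notifications`, and `slicing_exprs` sit inside single-quoted placeholders in the INSERT above. Assuming those placeholders expect JSON text, a minimal Python sketch for producing it is shown below. `json_for_placeholder` is a hypothetical helper, the example values are made up, and `"UNPAUSED"` for `pause_status` is an assumed enum value; the keys mirror the manifest entries.

```python
import json


def json_for_placeholder(value) -> str:
    """Serialize a dict or list as compact JSON for a '{{ ... }}' placeholder.

    Raises ValueError if the JSON contains a single quote, which would end
    the surrounding SQL string literal early (no escaping is attempted).
    """
    text = json.dumps(value, separators=(",", ":"))
    if "'" in text:
        raise ValueError("value contains a single quote")
    return text


# Hypothetical values; keys mirror the schedule and notifications manifest entries.
schedule = {
    "quartz_cron_expression": "0 0 6 * * ?",  # Quartz fields: sec min hour dom month dow
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",  # assumed enum value
}
notifications = {"on_failure": {"email_addresses": ["data-team@example.com"]}}

print(json_for_placeholder(schedule))
print(json_for_placeholder(notifications))
print(json_for_placeholder(["col_1", "col_2 > 10"]))
```

Compact separators keep each value on one line, which fits the one-value-per-line layout of the INSERT template.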
```yaml
# Description fields are for documentation purposes
- name: quality_monitors
  props:
    - name: table_name
      value: "{{ table_name }}"
      description: Required parameter for the quality_monitors resource.
    - name: deployment_name
      value: "{{ deployment_name }}"
      description: Required parameter for the quality_monitors resource.
    - name: output_schema_name
      value: "{{ output_schema_name }}"
      description: |
        [Create:REQ Update:REQ] Schema where output tables are created. Must be in two-level format {catalog}.{schema}.
    - name: assets_dir
      value: "{{ assets_dir }}"
      description: |
        [Create:REQ Update:IGN] Absolute path to a custom directory for storing data-monitoring assets. The UI and Python APIs normally prepopulate this with a default user location.
    - name: baseline_table_name
      value: "{{ baseline_table_name }}"
      description: |
        [Create:OPT Update:OPT] Baseline table name. Baseline data is used to compute drift from the data in the monitored `table_name`. The baseline table and the monitored table must have the same schema.
    - name: custom_metrics
      description: |
        [Create:OPT Update:OPT] Custom metrics.
      value:
        - name: "{{ name }}"
          definition: "{{ definition }}"
          input_columns: "{{ input_columns }}"
          output_data_type: "{{ output_data_type }}"
          type: "{{ type }}"
    - name: data_classification_config
      description: |
        [Create:OPT Update:OPT] Data classification related config.
      value:
        enabled: {{ enabled }}
    - name: inference_log
      description: |
        Configuration for monitoring inference logs.
      value:
        problem_type: "{{ problem_type }}"
        timestamp_col: "{{ timestamp_col }}"
        granularities:
          - "{{ granularities }}"
        prediction_col: "{{ prediction_col }}"
        model_id_col: "{{ model_id_col }}"
        label_col: "{{ label_col }}"
        prediction_proba_col: "{{ prediction_proba_col }}"
    - name: latest_monitor_failure_msg
      value: "{{ latest_monitor_failure_msg }}"
      description: |
        [Create:ERR Update:IGN] The latest error message for a monitor failure.
    - name: notifications
      description: |
        [Create:OPT Update:OPT] Field for specifying notification settings.
      value:
        on_failure:
          email_addresses:
            - "{{ email_addresses }}"
        on_new_classification_tag_detected:
          email_addresses:
            - "{{ email_addresses }}"
    - name: schedule
      description: |
        [Create:OPT Update:OPT] The monitor schedule.
      value:
        quartz_cron_expression: "{{ quartz_cron_expression }}"
        timezone_id: "{{ timezone_id }}"
        pause_status: "{{ pause_status }}"
    - name: skip_builtin_dashboard
      value: {{ skip_builtin_dashboard }}
      description: |
        Whether to skip creating a default dashboard summarizing data quality metrics.
    - name: slicing_exprs
      value:
        - "{{ slicing_exprs }}"
      description: |
        [Create:OPT Update:OPT] List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement. For example, `slicing_exprs=["col_1", "col_2 > 10"]` generates two slices for `col_2 > 10` (True and False) and one slice per unique value in `col_1`. For high-cardinality columns, only the top 100 unique values by frequency generate slices.
    - name: snapshot
      value: "{{ snapshot }}"
      description: |
        Configuration for monitoring snapshot tables.
    - name: time_series
      description: |
        Configuration for monitoring time series tables.
      value:
        timestamp_col: "{{ timestamp_col }}"
        granularities:
          - "{{ granularities }}"
    - name: warehouse_id
      value: "{{ warehouse_id }}"
      description: |
        Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used.
```
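The `schedule.quartz_cron_expression` value uses Quartz cron syntax rather than five-field Unix cron. Quartz expressions have 6 or 7 space-separated fields (seconds, minutes, hours, day-of-month, month, day-of-week, optional year), and Quartz requires `?` in either day-of-month or day-of-week. A rough Python pre-check is sketched below; `check_quartz_cron` is a hypothetical helper that validates only these two structural rules, not individual field values.

```python
def check_quartz_cron(expr: str) -> list:
    """Return a list of structural problems; an empty list means the rough checks passed."""
    fields = expr.split()
    if len(fields) not in (6, 7):
        return [f"expected 6 or 7 fields, got {len(fields)}"]
    problems = []
    day_of_month, day_of_week = fields[3], fields[5]
    # Quartz does not support giving both day fields a value; one must be '?'.
    if "?" not in (day_of_month, day_of_week):
        problems.append("one of day-of-month or day-of-week must be '?'")
    return problems


print(check_quartz_cron("0 0 6 * * ?"))  # daily at 06:00
print(check_quartz_cron("0 6 * * *"))    # Unix-style, too few fields
```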
REPLACE examples
`update`: [DEPRECATED] Updates a monitor for the specified table. Use the Data Quality Monitors API instead.
```sql
REPLACE databricks_workspace.catalog.quality_monitors
SET
  output_schema_name = '{{ output_schema_name }}',
  baseline_table_name = '{{ baseline_table_name }}',
  custom_metrics = '{{ custom_metrics }}',
  dashboard_id = '{{ dashboard_id }}',
  data_classification_config = '{{ data_classification_config }}',
  inference_log = '{{ inference_log }}',
  latest_monitor_failure_msg = '{{ latest_monitor_failure_msg }}',
  notifications = '{{ notifications }}',
  schedule = '{{ schedule }}',
  slicing_exprs = '{{ slicing_exprs }}',
  snapshot = '{{ snapshot }}',
  time_series = '{{ time_series }}'
WHERE
  table_name = '{{ table_name }}' -- required
  AND deployment_name = '{{ deployment_name }}' -- required
  AND output_schema_name = '{{ output_schema_name }}' -- required
RETURNING
  dashboard_id,
  baseline_table_name,
  drift_metrics_table_name,
  output_schema_name,
  profile_metrics_table_name,
  table_name,
  assets_dir,
  custom_metrics,
  data_classification_config,
  inference_log,
  latest_monitor_failure_msg,
  monitor_version,
  notifications,
  schedule,
  slicing_exprs,
  snapshot,
  status,
  time_series
;
```
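The `[Create:... Update:...]` tags in the Fields table indicate which SET assignments an update actually honors: `Update:IGN` fields such as `latest_monitor_failure_msg` are ignored, and `output_schema_name` is `Update:REQ`. A minimal Python sketch that sorts a proposed update payload accordingly; `UPDATE_MODE` and `split_update` are hypothetical names, the modes are transcribed from the Fields table and manifest, and fields without a tag (`inference_log`, `snapshot`, `time_series`) are reported separately rather than guessed.

```python
# Update behaviour per field, transcribed from the [Create:... Update:...] tags.
UPDATE_MODE = {
    "output_schema_name": "REQ",
    "dashboard_id": "OPT",
    "baseline_table_name": "OPT",
    "custom_metrics": "OPT",
    "data_classification_config": "OPT",
    "notifications": "OPT",
    "schedule": "OPT",
    "slicing_exprs": "OPT",
    "assets_dir": "IGN",
    "drift_metrics_table_name": "IGN",
    "profile_metrics_table_name": "IGN",
    "table_name": "IGN",
    "latest_monitor_failure_msg": "IGN",
    "monitor_version": "IGN",
    "status": "IGN",
}


def split_update(payload: dict):
    """Split a payload into (honored, ignored, untagged, missing_required)."""
    honored, ignored, untagged = {}, {}, {}
    for key, value in payload.items():
        mode = UPDATE_MODE.get(key)
        if mode in ("REQ", "OPT"):
            honored[key] = value
        elif mode == "IGN":
            ignored[key] = value
        else:
            untagged[key] = value  # no tag in the Fields table
    missing = [k for k, m in UPDATE_MODE.items() if m == "REQ" and k not in payload]
    return honored, ignored, untagged, missing


result = split_update({
    "output_schema_name": "main.monitoring",
    "schedule": '{"timezone_id":"UTC"}',
    "latest_monitor_failure_msg": "old error",
    "time_series": "{}",
})
print(result)
```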
DELETE examples
`delete`: [DEPRECATED] Deletes a monitor for the specified table. Use the Data Quality Monitors API instead.
```sql
DELETE FROM databricks_workspace.catalog.quality_monitors
WHERE table_name = '{{ table_name }}' -- required
AND deployment_name = '{{ deployment_name }}' -- required
;
```
Lifecycle Methods
`cancel_refresh`: [DEPRECATED] Cancels an already-initiated refresh job. Use the Data Quality Monitors API instead.
```sql
EXEC databricks_workspace.catalog.quality_monitors.cancel_refresh
@table_name='{{ table_name }}' /* required */,
@refresh_id='{{ refresh_id }}' /* required */,
@deployment_name='{{ deployment_name }}' /* required */
;
```
`regenerate_dashboard`: [DEPRECATED] Regenerates the monitoring dashboard for the specified table. Use the Data Quality Monitors API instead.
```sql
EXEC databricks_workspace.catalog.quality_monitors.regenerate_dashboard
@table_name='{{ table_name }}' /* required */,
@deployment_name='{{ deployment_name }}' /* required */
@@json=
'{
  "warehouse_id": "{{ warehouse_id }}"
}'
;
```
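When the `regenerate_dashboard` statement is generated from a script, building the `@@json` body with a JSON serializer avoids hand-quoting mistakes. A minimal Python sketch follows; `render_regenerate_dashboard` is a hypothetical helper, and omitting the `@@json` clause when no `warehouse_id` is given is an assumption based on `warehouse_id` being optional for dashboard creation.

```python
import json


def render_regenerate_dashboard(table_name, deployment_name, warehouse_id=None):
    """Render the regenerate_dashboard EXEC statement as a string (hypothetical helper)."""
    lines = [
        "EXEC databricks_workspace.catalog.quality_monitors.regenerate_dashboard",
        f"@table_name='{table_name}',",
        f"@deployment_name='{deployment_name}'",
    ]
    if warehouse_id:
        # json.dumps handles quoting and escaping inside the JSON body.
        lines.append("@@json='" + json.dumps({"warehouse_id": warehouse_id}) + "'")
    lines.append(";")
    return "\n".join(lines)


print(render_regenerate_dashboard("main.sales.orders", "dbc-abcd0123-a1bc", "abc123"))
```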