quality_monitors

Creates, updates, deletes, gets or lists a quality_monitors resource.

Overview

Name	`quality_monitors`
Type	Resource
Id	`databricks_workspace.catalog.quality_monitors`

Fields

The following fields are returned by SELECT queries:

Name	Datatype	Description
`dashboard_id`	`string`	[Create:ERR Update:OPT] Id of dashboard that visualizes the computed metrics. This can be empty if the monitor is in PENDING state.
`baseline_table_name`	`string`	[Create:OPT Update:OPT] Baseline table name. Baseline data is used to compute drift from the data in the monitored `table_name`. The baseline table and the monitored table shall have the same schema.
`drift_metrics_table_name`	`string`	[Create:ERR Update:IGN] Table that stores drift metrics data. Format: `catalog.schema.table_name`.
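`output_schema_name`	`string`	[Create:REQ Update:REQ] Schema where output tables are created. Needs to be in 2-level format `{catalog}.{schema}`.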
`profile_metrics_table_name`	`string`	[Create:ERR Update:IGN] Table that stores profile metrics data. Format: `catalog.schema.table_name`.
`table_name`	`string`	[Create:ERR Update:IGN] UC table to monitor. Format: `catalog.schema.table_name`
`assets_dir`	`string`	[Create:REQ Update:IGN] Field for specifying the absolute path to a custom directory to store data-monitoring assets. Normally prepopulated to a default user location via UI and Python APIs.
`custom_metrics`	`array`	[Create:OPT Update:OPT] Custom metrics.
`data_classification_config`	`object`	[Create:OPT Update:OPT] Data classification related config.
`inference_log`	`object`	Configuration for monitoring inference logs.
`latest_monitor_failure_msg`	`string`	[Create:ERR Update:IGN] The latest error message for a monitor failure.
`monitor_version`	`integer`	[Create:ERR Update:IGN] The monitor configuration version currently in use. Versions are numbered sequentially (1, 2, 3, ...). The field can also hold negative values, which can indicate a corrupted `monitor_version`.
`notifications`	`object`	[Create:OPT Update:OPT] Field for specifying notification settings.
`schedule`	`object`	[Create:OPT Update:OPT] The monitor schedule.
`slicing_exprs`	`array`	[Create:OPT Update:OPT] List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement. For example, `slicing_exprs=["col_1", "col_2 > 10"]` generates the following slices: two slices for `col_2 > 10` (True and False), and one slice per unique value in `col_1`. For high-cardinality columns, only the top 100 unique values by frequency generate slices.
`snapshot`	`object`	Configuration for monitoring snapshot tables.
`status`	`string`	[Create:ERR Update:IGN] The monitor status. (MONITOR_STATUS_ACTIVE, MONITOR_STATUS_DELETE_PENDING, MONITOR_STATUS_ERROR, MONITOR_STATUS_FAILED, MONITOR_STATUS_PENDING)
`time_series`	`object`	Configuration for monitoring time series tables.
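
The slicing rule described for `slicing_exprs` can be modelled in a few lines of Python. This is only an illustrative sketch, not how Databricks computes slices: the helper `enumerate_slices`, the sample rows, and the `name=value` label format are all assumptions made for the example. Only the rule itself (one slice per distinct column value, two slices per predicate, top 100 values by frequency) comes from the field description.

```python
from collections import Counter

TOP_N = 100  # from the field description: only the top 100 values by frequency get slices


def enumerate_slices(rows, column_exprs, predicate_exprs, top_n=TOP_N):
    """Return hypothetical slice labels for the given expressions.

    rows            -- list of dicts, one per table row (made-up sample data)
    column_exprs    -- column names; each distinct value becomes one slice
    predicate_exprs -- predicate strings; each becomes a True and a False slice
    """
    slices = []
    for col in column_exprs:
        counts = Counter(row[col] for row in rows)
        # High-cardinality columns are truncated to the most frequent values.
        for value, _ in counts.most_common(top_n):
            slices.append(f"{col}={value}")
    for predicate in predicate_exprs:
        # A predicate contributes itself and its complement.
        slices.append(f"{predicate}=True")
        slices.append(f"{predicate}=False")
    return slices


rows = [
    {"col_1": "a", "col_2": 5},
    {"col_1": "b", "col_2": 12},
    {"col_1": "a", "col_2": 20},
]
slices = enumerate_slices(rows, ["col_1"], ["col_2 > 10"])
print(slices)
```

Each expression is handled independently, so the number of slices is the sum, not the product, of the slices produced by each expression.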

Methods

The following methods are available for this resource:

Name	Accessible by	Required Params	Description
`get`	`select`	`table_name`, `deployment_name`	[DEPRECATED] Gets a monitor for the specified table. Use the Data Quality Monitors API instead.
`create`	`insert`	`table_name`, `deployment_name`, `output_schema_name`, `assets_dir`	[DEPRECATED] Creates a new monitor for the specified table. Use the Data Quality Monitors API instead.
`update`	`replace`	`table_name`, `deployment_name`, `output_schema_name`	[DEPRECATED] Updates a monitor for the specified table. Use the Data Quality Monitors API instead.
`delete`	`delete`	`table_name`, `deployment_name`	[DEPRECATED] Deletes a monitor for the specified table. Use the Data Quality Monitors API instead.
`cancel_refresh`	`exec`	`table_name`, `refresh_id`, `deployment_name`	[DEPRECATED] Cancels an already-initiated refresh job. Use the Data Quality Monitors API instead.
`regenerate_dashboard`	`exec`	`table_name`, `deployment_name`	[DEPRECATED] Regenerates the monitoring dashboard for the specified table. Use the Data Quality Monitors API instead.

Parameters

Parameters can be passed in the WHERE clause of a query. Check the Methods section to see which parameters are required or optional for each operation.

Name	Datatype	Description
`deployment_name`	`string`	The Databricks Workspace Deployment Name (default: dbc-abcd0123-a1bc)
`refresh_id`	`integer`	ID of the refresh job to cancel. Required by `cancel_refresh`.
`table_name`	`string`	UC table name in format `catalog.schema.table_name`. This field corresponds to the {full_table_name_arg} arg in the endpoint path.
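
Because `table_name` must be a 3-level name (`catalog.schema.table_name`) and `output_schema_name` a 2-level name (`{catalog}.{schema}`), it can help to check the shape of these values before building a query. The sketch below is a minimal, hypothetical helper: `split_qualified_name` and the sample names `main.sales.orders` and `main.monitoring` are invented for illustration, and the check assumes plain identifiers with no quoted parts that contain dots.

```python
def split_qualified_name(name, levels):
    """Split a dot-separated name and verify it has exactly `levels` parts.

    Assumption: identifiers are unquoted and contain no dots themselves.
    """
    parts = name.split(".")
    if len(parts) != levels or any(part == "" for part in parts):
        raise ValueError(f"expected {levels} dot-separated parts, got {name!r}")
    return tuple(parts)


# 3-level name for the monitored table, 2-level name for the output schema.
catalog, schema, table = split_qualified_name("main.sales.orders", 3)
out_catalog, out_schema = split_qualified_name("main.monitoring", 2)
print(catalog, schema, table, out_catalog, out_schema)
```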

`SELECT` examples

[DEPRECATED] Gets a monitor for the specified table. Use the Data Quality Monitors API instead.

SELECT
dashboard_id,
baseline_table_name,
drift_metrics_table_name,
output_schema_name,
profile_metrics_table_name,
table_name,
assets_dir,
custom_metrics,
data_classification_config,
inference_log,
latest_monitor_failure_msg,
monitor_version,
notifications,
schedule,
slicing_exprs,
snapshot,
status,
time_series
FROM databricks_workspace.catalog.quality_monitors
WHERE table_name = '{{ table_name }}' -- required
AND deployment_name = '{{ deployment_name }}' -- required
;
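
The query templates on this page use `{{ name }}` placeholders. One way to fill them from a script is sketched below. The `render` helper is hypothetical and not part of any StackQL tooling, the table and deployment values are made up, and doubling single quotes is assumed to be an acceptable way to escape string literals. The shortened column list is used only to keep the example small.

```python
import re

# Matches placeholders such as {{ table_name }} and captures the name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace every {{ name }} placeholder with the matching value."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing value for placeholder: {key}")
        # Assumption: single quotes are escaped by doubling them.
        return str(values[key]).replace("'", "''")

    return PLACEHOLDER.sub(substitute, template)


template = (
    "SELECT status, dashboard_id\n"
    "FROM databricks_workspace.catalog.quality_monitors\n"
    "WHERE table_name = '{{ table_name }}'\n"
    "AND deployment_name = '{{ deployment_name }}';"
)
query = render(
    template,
    {"table_name": "main.sales.orders", "deployment_name": "dbc-abcd0123-a1bc"},
)
print(query)
```

Raising on a missing key, rather than leaving the placeholder in place, avoids sending a query that still contains `{{ ... }}` for a required parameter.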

`INSERT` examples

create
Manifest

[DEPRECATED] Creates a new monitor for the specified table. Use the Data Quality Monitors API instead.

INSERT INTO databricks_workspace.catalog.quality_monitors (
output_schema_name,
assets_dir,
baseline_table_name,
custom_metrics,
data_classification_config,
inference_log,
latest_monitor_failure_msg,
notifications,
schedule,
skip_builtin_dashboard,
slicing_exprs,
snapshot,
time_series,
warehouse_id,
table_name,
deployment_name
)
SELECT 
'{{ output_schema_name }}' /* required */,
'{{ assets_dir }}' /* required */,
'{{ baseline_table_name }}',
'{{ custom_metrics }}',
'{{ data_classification_config }}',
'{{ inference_log }}',
'{{ latest_monitor_failure_msg }}',
'{{ notifications }}',
'{{ schedule }}',
{{ skip_builtin_dashboard }},
'{{ slicing_exprs }}',
'{{ snapshot }}',
'{{ time_series }}',
'{{ warehouse_id }}',
'{{ table_name }}',
'{{ deployment_name }}'
RETURNING
dashboard_id,
baseline_table_name,
drift_metrics_table_name,
output_schema_name,
profile_metrics_table_name,
table_name,
assets_dir,
custom_metrics,
data_classification_config,
inference_log,
latest_monitor_failure_msg,
monitor_version,
notifications,
schedule,
slicing_exprs,
snapshot,
status,
time_series
;

# Description fields are for documentation purposes
- name: quality_monitors
props:
  - name: table_name
    value: "{{ table_name }}"
    description: Required parameter for the quality_monitors resource.
  - name: deployment_name
    value: "{{ deployment_name }}"
    description: Required parameter for the quality_monitors resource.
  - name: output_schema_name
    value: "{{ output_schema_name }}"
    description: |
      [Create:REQ Update:REQ] Schema where output tables are created. Needs to be in 2-level format {catalog}.{schema}
  - name: assets_dir
    value: "{{ assets_dir }}"
    description: |
      [Create:REQ Update:IGN] Field for specifying the absolute path to a custom directory to store data-monitoring assets. Normally prepopulated to a default user location via UI and Python APIs.
  - name: baseline_table_name
    value: "{{ baseline_table_name }}"
    description: |
      [Create:OPT Update:OPT] Baseline table name. Baseline data is used to compute drift from the data in the monitored `table_name`. The baseline table and the monitored table shall have the same schema.
  - name: custom_metrics
    description: |
      [Create:OPT Update:OPT] Custom metrics.
    value:
      - name: "{{ name }}"
        definition: "{{ definition }}"
        input_columns: "{{ input_columns }}"
        output_data_type: "{{ output_data_type }}"
        type: "{{ type }}"
  - name: data_classification_config
    description: |
      [Create:OPT Update:OPT] Data classification related config.
    value:
      enabled: {{ enabled }}
  - name: inference_log
    description: |
      Configuration for monitoring inference logs.
    value:
      problem_type: "{{ problem_type }}"
      timestamp_col: "{{ timestamp_col }}"
      granularities:
        - "{{ granularities }}"
      prediction_col: "{{ prediction_col }}"
      model_id_col: "{{ model_id_col }}"
      label_col: "{{ label_col }}"
      prediction_proba_col: "{{ prediction_proba_col }}"
  - name: latest_monitor_failure_msg
    value: "{{ latest_monitor_failure_msg }}"
    description: |
      [Create:ERR Update:IGN] The latest error message for a monitor failure.
  - name: notifications
    description: |
      [Create:OPT Update:OPT] Field for specifying notification settings.
    value:
      on_failure:
        email_addresses:
          - "{{ email_addresses }}"
      on_new_classification_tag_detected:
        email_addresses:
          - "{{ email_addresses }}"
  - name: schedule
    description: |
      [Create:OPT Update:OPT] The monitor schedule.
    value:
      quartz_cron_expression: "{{ quartz_cron_expression }}"
      timezone_id: "{{ timezone_id }}"
      pause_status: "{{ pause_status }}"
  - name: skip_builtin_dashboard
    value: {{ skip_builtin_dashboard }}
    description: |
      Whether to skip creating a default dashboard summarizing data quality metrics.
  - name: slicing_exprs
    value:
      - "{{ slicing_exprs }}"
    description: |
      [Create:OPT Update:OPT] List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement. For example, `slicing_exprs=["col_1", "col_2 > 10"]` generates the following slices: two slices for `col_2 > 10` (True and False), and one slice per unique value in `col_1`. For high-cardinality columns, only the top 100 unique values by frequency generate slices.
  - name: snapshot
    value: "{{ snapshot }}"
    description: |
      Configuration for monitoring snapshot tables.
  - name: time_series
    description: |
      Configuration for monitoring time series tables.
    value:
      timestamp_col: "{{ timestamp_col }}"
      granularities:
        - "{{ granularities }}"
  - name: warehouse_id
    value: "{{ warehouse_id }}"
    description: |
      Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used.

`REPLACE` examples

update

[DEPRECATED] Updates a monitor for the specified table. Use the Data Quality Monitors API instead.

REPLACE databricks_workspace.catalog.quality_monitors
SET 
output_schema_name = '{{ output_schema_name }}',
baseline_table_name = '{{ baseline_table_name }}',
custom_metrics = '{{ custom_metrics }}',
dashboard_id = '{{ dashboard_id }}',
data_classification_config = '{{ data_classification_config }}',
inference_log = '{{ inference_log }}',
latest_monitor_failure_msg = '{{ latest_monitor_failure_msg }}',
notifications = '{{ notifications }}',
schedule = '{{ schedule }}',
slicing_exprs = '{{ slicing_exprs }}',
snapshot = '{{ snapshot }}',
time_series = '{{ time_series }}'
WHERE 
table_name = '{{ table_name }}' --required
AND deployment_name = '{{ deployment_name }}' --required
AND output_schema_name = '{{ output_schema_name }}' --required
RETURNING
dashboard_id,
baseline_table_name,
drift_metrics_table_name,
output_schema_name,
profile_metrics_table_name,
table_name,
assets_dir,
custom_metrics,
data_classification_config,
inference_log,
latest_monitor_failure_msg,
monitor_version,
notifications,
schedule,
slicing_exprs,
snapshot,
status,
time_series;

`DELETE` examples

delete

[DEPRECATED] Deletes a monitor for the specified table. Use the Data Quality Monitors API instead.

DELETE FROM databricks_workspace.catalog.quality_monitors
WHERE table_name = '{{ table_name }}' --required
AND deployment_name = '{{ deployment_name }}' --required
;

Lifecycle Methods

cancel_refresh
regenerate_dashboard

[DEPRECATED] Cancels an already-initiated refresh job. Use the Data Quality Monitors API instead.

EXEC databricks_workspace.catalog.quality_monitors.cancel_refresh
@table_name='{{ table_name }}', -- required
@refresh_id='{{ refresh_id }}', -- required
@deployment_name='{{ deployment_name }}' -- required
;

[DEPRECATED] Regenerates the monitoring dashboard for the specified table. Use the Data Quality Monitors API instead.

EXEC databricks_workspace.catalog.quality_monitors.regenerate_dashboard
@table_name='{{ table_name }}', -- required
@deployment_name='{{ deployment_name }}' -- required
@@json=
'{
"warehouse_id": "{{ warehouse_id }}"
}'
;
