Client
Main Client Class
The main client class provides the primary interface to all DASL operations. It handles authentication, API communication, and provides methods for managing datasources, rules, and workspace configuration.
- class dasl_client.Client(auth)[source]
Bases: object
Antimatter security lakehouse client.
- Parameters:
  - auth (Authorization)
- __init__(auth)[source]
Initialize a new client. You should generally prefer to use the new_workspace function if creating a new workspace or the for_workspace function if connecting to an existing workspace.
- Parameters:
  - auth (Authorization) – Authorization instance for authorizing requests to the DASL control plane.
- Returns:
Client
- static new_workspace(admin_email, app_client_id, service_principal_id, service_principal_secret, workspace_url=None, region='us-east-1', dasl_host=None)[source]
Register a new workspace and return a client for it.
- Parameters:
  - admin_email (str) – The email address associated with the (DASL) workspace admin, if the workspace will be created.
  - app_client_id (str) – The Databricks app connection client ID to use for authentication calls related to the workspace.
  - service_principal_id (str) – The ID of the Databricks service principal that will interact with Databricks on your behalf.
  - service_principal_secret (str) – An OAuth secret that entitles the service principal to make Databricks API calls on your behalf.
  - workspace_url (Optional[str]) – The full base URL of the Databricks workspace being registered. If you omit this value, it will be inferred if you are running within a Databricks notebook. Otherwise, an exception will be raised.
  - region (str) – The name of the DASL region.
  - dasl_host (Optional[str]) – The URL of the DASL server. This value should not generally be specified. When specified, this value overrides region.
- Return type:
Client
- Returns:
Client for the newly created workspace.
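Example (a minimal sketch of registering a new workspace from a Databricks notebook; all credential values are placeholders):
    from dasl_client import Client

    # Placeholder credentials: supply the values for your own Databricks workspace.
    client = Client.new_workspace(
        admin_email="admin@example.com",
        app_client_id="<app-client-id>",
        service_principal_id="<service-principal-id>",
        service_principal_secret="<service-principal-secret>",
        # workspace_url is omitted here, so it is inferred from the notebook context.
    )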
- static for_workspace(workspace_url=None, service_account_token=None, region='us-east-1', dasl_host=None)[source]
Create a client for the argument workspace, if specified, or the current workspace if running in a Databricks notebook context.
- Parameters:
  - workspace_url (Optional[str]) – The full base URL of the Databricks workspace being registered. If you omit this value, it will be inferred if you are running within a Databricks notebook. Otherwise, an exception will be raised.
  - service_account_token (Optional[str]) – Antimatter service account token. If provided, the client will use this token for auth instead of (automatic) secret-based auth.
  - region (str) – The name of the DASL region.
  - dasl_host (Optional[str]) – The URL of the DASL server. This value should not generally be specified. When specified, this value overrides region.
- Return type:
Client
- Returns:
Client for the existing workspace.
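Example (a minimal sketch; the workspace URL is a placeholder):
    from dasl_client import Client

    # Inside a Databricks notebook the workspace URL can be inferred automatically.
    client = Client.for_workspace()

    # Otherwise, pass the workspace URL (and optionally a service account token).
    client = Client.for_workspace(workspace_url="https://<workspace>.cloud.databricks.com")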
- static new_or_existing(admin_email, app_client_id, service_principal_id, service_principal_secret, workspace_url=None, service_account_token=None, region='us-east-1', dasl_host=None)[source]
Initialize a new client for the workspace associated with the argument Databricks workspace_url. If no such workspace exists, one will be created for you.
- Parameters:
  - admin_email (str) – The email address associated with the (DASL) workspace admin, if the workspace will be created. Ignored if the workspace already exists.
  - app_client_id (str) – The Databricks app connection client ID to use for authentication calls related to the workspace. If the workspace already exists, the existing config will be updated to use this client ID.
  - service_principal_id (str) – The ID of the Databricks service principal that will interact with Databricks on your behalf. If the workspace already exists, the existing config will be updated to use this service principal ID.
  - service_principal_secret (str) – An OAuth secret that entitles the service principal to make Databricks API calls on your behalf. If the workspace already exists, the existing config will be updated to use this service principal secret.
  - workspace_url (Optional[str]) – The full base URL of the Databricks workspace being registered. If you omit this value, it will be inferred if you are running within a Databricks notebook. Otherwise, an exception will be raised. If the workspace already exists, the existing config will be updated to use this value.
  - service_account_token (Optional[str]) – Antimatter service account token. If provided, the client will use this token for auth instead of (automatic) secret-based auth. Ignored if the workspace doesn't exist.
  - region (str) – The name of the DASL region.
  - dasl_host (Optional[str]) – The URL of the DASL server. This value should not generally be specified. When specified, this value overrides region.
- Return type:
Client
- Returns:
Client for the newly created or existing workspace.
- get_admin_config()[source]
Retrieve the AdminConfig from the DASL server. Note that the service principal secret will be redacted server side, so if you plan to make changes and issue a request using put_admin_config, you will need to repopulate the service_principal_secret correctly before passing the result back to put_admin_config.
- Return type:
AdminConfig
- Returns:
AdminConfig containing the current settings.
- put_admin_config(config)[source]
Update the AdminConfig stored in the DASL server. See the AdminConfig docs for details about its contents.
- Parameters:
  - config (AdminConfig) – AdminConfig to replace the existing. Note that the service principal credentials will be verified server side before the request is accepted.
- Return type:
None
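Example (a minimal sketch of the get/modify/put round trip; client is assumed to be an existing Client and the secret is a placeholder):
    admin_config = client.get_admin_config()

    # The secret is redacted server side, so repopulate it before writing back.
    admin_config.service_principal_secret = "<service-principal-secret>"
    client.put_admin_config(admin_config)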
- get_config()[source]
Retrieve the WorkspaceConfig from the DASL server. The returned value can be updated directly and passed to put_config in order to make changes.
- Return type:
WorkspaceConfig
- Returns:
WorkspaceConfig containing the current configuration.
- put_config(config_in)[source]
Update the WorkspaceConfig stored in the DASL server. See the WorkspaceConfig docs for details about its contents.
- Parameters:
  - config_in (WorkspaceConfig) – WorkspaceConfig to replace the existing.
- Return type:
None
- Returns:
WorkspaceConfig. Note that the returned value is a clone of config_in and may not be precisely equal to the originally passed value.
- get_datasource(name)[source]
Get the DataSource with the argument name from the DASL server. The returned value can be updated directly and passed to replace_datasource in order to make changes.
- Parameters:
  - name (str) – The unique name of the DataSource within this workspace.
- Return type:
DataSource
- Returns:
DataSource
- delete_datasource(name)[source]
Delete the DataSource with the argument name from the DASL server. The DataSource will not necessarily be deleted immediately as the server will dispatch background tasks to clean up any allocated resources before actually deleting the resource, so it may take some time before its name is available for reuse.
- Parameters:
  - name (str) – The unique name of the DataSource within this workspace.
- Return type:
None
- list_datasources(cursor=None, limit=None)[source]
List the DataSources in this workspace. Each yielded DataSource contains all fields in the DataSource as if it were fetched using the get_datasource method.
- Parameters:
  - cursor (Optional[str]) – The ID of a DataSource. If specified, the results will contain DataSources starting (lexically) directly after this DataSource. If not specified, then the results will begin with the lexically least DataSource.
  - limit (Optional[int]) – The maximum number of DataSources to yield. If there are fewer than this number of DataSources beginning directly after cursor, then all such DataSources will be yielded. If not specified, then all DataSources starting directly after cursor will be returned.
- Yields DataSource:
One DataSource at a time in lexically increasing order
- Return type:
Iterator[DataSource]
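Example (a minimal sketch of paging through DataSources; client is assumed to be an existing Client and the cursor value is a placeholder):
    # Iterate over every DataSource in the workspace.
    for ds in client.list_datasources():
        print(ds)

    # Yield at most 10 DataSources starting directly after a known name.
    for ds in client.list_datasources(cursor="<datasource-name>", limit=10):
        print(ds)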
- create_datasource(name, ds_in)[source]
Create a new DataSource. The chosen name must be unique for your workspace, and cannot refer to a DataSource that already exists and has not been deleted. See the documentation for delete_datasource as there are some caveats around name reuse.
- Parameters:
  - name (str) – The unique name of this DataSource in the workspace.
  - ds_in (DataSource) – The specification of the DataSource to create. See the documentation for the DataSource type for more details.
- Returns DataSource:
Note that the returned value is a clone of ds_in and may not be precisely equal to the originally passed value.
- Return type:
DataSource
- replace_datasource(name, ds_in)[source]
Replace an existing DataSource. The name must refer to a DataSource that already exists in your workspace.
- Parameters:
  - name (str) – The name of the existing DataSource to replace.
  - ds_in (DataSource) – The specification of the DataSource taking the place of the existing DataSource.
- Returns DataSource:
Note that the returned value is a clone of ds_in and may not be precisely equal to the originally passed value.
- Return type:
DataSource
- get_rule(name)[source]
Get the Rule with the argument name from the DASL server. The returned value can be updated directly and passed to replace_rule in order to make changes.
- Parameters:
  - name (str) – The unique name of the Rule within this workspace.
- Return type:
Rule
- Returns:
Rule
- delete_rule(name)[source]
Delete the Rule with the argument name from the DASL server. The Rule will not necessarily be deleted immediately as the server will dispatch background tasks to clean up any allocated resources before actually deleting the resource, so it may take some time before its name is available for reuse.
- Parameters:
  - name (str) – The unique name of the Rule within this workspace.
- Return type:
None
- list_rules(cursor=None, limit=None)[source]
List the Rules in this workspace. Each yielded Rule contains all fields in the Rule as if it were fetched using the get_rule method.
- Parameters:
  - cursor (Optional[str]) – The ID of a Rule. If specified, the results will contain Rules starting (lexically) directly after this Rule. If not specified, then the results will begin with the lexically least Rule.
  - limit (Optional[int]) – The maximum number of Rules to yield. If there are fewer than this number of Rules beginning directly after cursor, then all such Rules will be yielded. If not specified, then all Rules starting directly after cursor will be returned.
- Yields Rule:
One Rule at a time in lexically increasing order.
- Return type:
Iterator[Rule]
- create_rule(name, rule_in)[source]
Create a new Rule. The chosen name must be unique for your workspace, and cannot refer to a Rule that already exists and has not been deleted. See the documentation for delete_rule as there are some caveats around name reuse.
- Parameters:
  - name (str) – The unique name of this Rule in the workspace.
  - rule_in (Rule) – The specification of the Rule to create. See the documentation for the Rule type for more details.
- Returns Rule:
Note that the returned value is a clone of rule_in and may not be precisely equal to the originally passed value.
- Return type:
Rule
- replace_rule(name, rule_in)[source]
Replace an existing Rule. The name must refer to a Rule that already exists in your workspace.
- exec_rule(spark, rule_in)[source]
Locally execute a Rule. Must be run from within a Databricks notebook or else an exception will be raised. This is intended to facilitate Rule development.
- Parameters:
  - spark – Spark context from the Databricks notebook. Will be injected into the execution environment for use by the Rule notebook.
  - rule_in (Rule) – The specification of the Rule to execute.
- Returns ExecRule:
A class containing various information and functionality relating to the execution. See the docs for ExecRule for additional details, but note that you must call its cleanup function or tables created just for this request will leak.
- Return type:
ExecRule
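Example (a minimal sketch of local Rule execution from a Databricks notebook; client is assumed to be an existing Client, spark is the notebook Spark context, and the rule name is a placeholder):
    rule = client.get_rule("<rule-name>")
    execution = client.exec_rule(spark, rule)
    try:
        pass  # inspect the execution results here; see the ExecRule docs
    finally:
        # Always call cleanup, or tables created for this request will leak.
        execution.cleanup()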
- adhoc_transform(warehouse, request, timeout=datetime.timedelta(seconds=300))[source]
Run a sequence of ADHOC transforms against a SQL warehouse to mimic the operations performed by a datasource.
- Parameters:
  - warehouse (str) – The warehouse ID to run the transforms against.
  - request (TransformRequest) – The request containing the transforms to run.
  - timeout (timedelta)
- Return type:
TransformResponse
- Returns:
A TransformResponse object containing the results after running the transforms.
- Raises:
NotFoundError if the rule does not exist
- Raises:
Exception for a server-side error or timeout
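Example (a minimal sketch; client is assumed to be an existing Client, the warehouse ID is a placeholder, and request is a TransformRequest built per its own documentation, whose fields are not shown here):
    from datetime import timedelta

    response = client.adhoc_transform(
        warehouse="<sql-warehouse-id>",
        request=request,  # a TransformRequest; its fields are documented separately
        timeout=timedelta(seconds=120),
    )
    print(response)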
- get_observable_events(warehouse, kind, value, cursor=None, limit=None)[source]
Get the observable events associated with a specific field and value.
- Parameters:
  - warehouse (str) – The warehouse ID to perform the operation on.
  - kind (str) – The observable kind.
  - value (str) – The observable value.
  - cursor (Optional[str]) – A cursor to be used when paginating results.
  - limit (Optional[int]) – A limit on the number of results to return.
- Return type:
EventsList
- Returns:
EventsList
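Example (a minimal sketch; client is assumed to be an existing Client, and the warehouse ID, observable kind, and value are illustrative placeholders):
    events = client.get_observable_events(
        warehouse="<sql-warehouse-id>",
        kind="ip",          # illustrative observable kind
        value="10.0.0.1",   # illustrative observable value
        limit=100,
    )
    print(events)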
- list_presets()[source]
List the Presets in this workspace. This will include any user defined presets if a custom presets path has been configured in the workspace.
- Return type:
DataSourcePresetsList
- Returns:
DataSourcePresetsList
- get_preset(name)[source]
Get the preset with the argument name from the DASL server. If the preset name begins with ‘internal_’ it will instead be collected from the user catalog, provided a preset path is set in the workspace config.
- Parameters:
  - name (str) – The unique name of the DataSource preset within this workspace.
- Return type:
DataSourcePreset
- Returns:
DataSourcePreset
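Example (a minimal sketch; client is assumed to be an existing Client and the preset name is a placeholder):
    presets = client.list_presets()
    print(presets)

    preset = client.get_preset("<preset-name>")
    print(preset)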
- purge_preset_cache()[source]
Purge the datasource preset cache. This will cause the DASL workspace to fetch presets from the provided sources.
- Return type:
None
- generate_query(sql, warehouse=None, start_date=None, end_date=None)[source]
Generate a query from the given SQL.
- Parameters:
  - sql (str) – The SQL to use to create the query data set.
  - warehouse (Optional[str]) – The SQL warehouse to use to execute the SQL. If omitted, the default SQL warehouse specified in the workspace config will be used.
  - start_date (Optional[str]) – The optional starting date to filter by for the provided SQL used to create the data set. Only rows with their time column (see the time_col parameter) greater than or equal to this value will be included in the data set. You must specify a value for this parameter if you wish to filter by time. Valid values include actual timestamps and computed timestamps (such as now()).
  - end_date (Optional[str]) – The optional ending date to filter by for the provided SQL used to create the data set. The same caveats apply as with the start_date parameter. However, this parameter is not required; if omitted when a start_date is provided, the current date will be used.
- Returns str:
The ID of the query generation operation. This value can be used with get_query_status to track the progress of the generation process, and eventually to perform lookups on the completed query.
- Return type:
str
- get_query_status(id)[source]
Check the status of a query generation operation. Since generation happens in the background, it is up to the caller to check the status until the return value’s status member is either equal to “succeeded” or “failed”.
- Parameters:
  - id (str) – The ID of the query generation operation.
- Returns DbuiV1QueryGenerateStatus:
The important field is status (as used in the example code).
- Return type:
DbuiV1QueryGenerateStatus
The following example demonstrates usage of the API.
Example:
    import time

    id = client.generate_query("SELECT now() as time")
    while True:
        time.sleep(3)
        status = client.get_query_status(id)
        if status.status == "failed":
            raise Exception("query failed")
        if status.status == "succeeded":
            break
- query_lookup(id, warehouse=None, pagination=None, start_value=None, row_count=None, refinements=None)[source]
Perform a lookup on a query, which applies refinements to the query and returns the results.
- Parameters:
  - id (str) – The query ID returned from generate_query and get_query_status.
  - warehouse (Optional[str]) – The optional SQL warehouse ID to use to compute the results. If not specified, uses the default SQL warehouse configured for the workspace.
  - pagination (Optional[DbuiV1QueryLookupRequestPagination]) – A sequence of fields and a direction that can be applied to a lookup request. If 'fetchPreceding' is true, the prior n rows up to the first row that matches the provided fields will be returned. Otherwise, the n rows following the first row that matches the provided fields will be returned.
  - start_value (Optional[str]) – An optional start value to constrain the data being returned. This will be applied to the primary ordering column if provided, before any refinements.
  - row_count (Optional[int]) – The maximum number of rows to include in a page. Defaults to 1000, and must be in the range [1, 1000].
  - refinements (Optional[List[str]]) – Pipeline filters to be applied to the result. Any SQL which is valid as a pipeline stage (i.e. coming between |> symbols) is valid here, such as ORDER BY id, or WHERE column = 'value'.
- Return type:
DbuiV1QueryLookupResult
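Example (a minimal sketch that continues the generate_query example above; client and id are assumed to be as in that example):
    result = client.query_lookup(
        id,
        row_count=100,
        refinements=["ORDER BY time"],  # any valid pipeline-stage SQL
    )
    print(result)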
- query_histogram(id, interval, warehouse=None, start_date=None, end_date=None, refinements=None)[source]
Compute a histogram for a query: refinements are applied to the query and the matching rows are returned as a frequency map over time intervals.
- Parameters:
  - id (str) – The query ID returned from generate_query and get_query_status.
  - interval (str) – The duration of each interval in the resulting frequency map. This must be an interval string in the format: '1 day', '3 minutes 2 seconds', '2 weeks'.
  - warehouse (Optional[str]) – The optional SQL warehouse ID to use to compute the results. If not specified, uses the default SQL warehouse configured for the workspace.
  - start_date (str) – The start date filter. The resulting frequency map will be restricted to rows where the time column value is greater than or equal to this value. Valid values include literal timestamps and function calls such as now().
  - end_date (Optional[str]) – The optional end date filter. If specified, the resulting frequency map will contain only rows where the time column value is less than or equal to this value.
  - refinements (Optional[List[str]]) – Pipeline filters to be applied to the result. Any SQL which is valid as a pipeline stage (i.e. coming between |> symbols) is valid here, such as ORDER BY id, or WHERE column = 'value'.
- Return type:
DbuiV1QueryHistogramResult
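Example (a minimal sketch that continues the generate_query example above; client and id are assumed to be as in that example, and the date values are illustrative):
    histogram = client.query_histogram(
        id,
        interval="1 hour",                  # bucket size, assumed to follow the interval string format
        start_date="2024-01-01 00:00:00",   # illustrative literal timestamp
        end_date="now()",
    )
    print(histogram)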
Authentication Factory Methods
The client provides a convenient factory method for authentication within a Databricks notebook:
- static Client.for_workspace(workspace_url=None, service_account_token=None, region='us-east-1', dasl_host=None)[source]
Create a client for the argument workspace, if specified, or the current workspace if running in a Databricks notebook context.
- Parameters:
  - workspace_url (Optional[str]) – The full base URL of the Databricks workspace being registered. If you omit this value, it will be inferred if you are running within a Databricks notebook. Otherwise, an exception will be raised.
  - service_account_token (Optional[str]) – Antimatter service account token. If provided, the client will use this token for auth instead of (automatic) secret-based auth.
  - region (str) – The name of the DASL region.
  - dasl_host (Optional[str]) – The URL of the DASL server. This value should not generally be specified. When specified, this value overrides region.
- Return type:
Client
- Returns:
Client for the existing workspace.
Datasource Operations
Methods for managing datasources:
- Client.list_datasources(cursor=None, limit=None)[source]
List the DataSources in this workspace. Each yielded DataSource contains all fields in the DataSource as if it were fetched using the get_datasource method.
- Parameters:
  - cursor (Optional[str]) – The ID of a DataSource. If specified, the results will contain DataSources starting (lexically) directly after this DataSource. If not specified, then the results will begin with the lexically least DataSource.
  - limit (Optional[int]) – The maximum number of DataSources to yield. If there are fewer than this number of DataSources beginning directly after cursor, then all such DataSources will be yielded. If not specified, then all DataSources starting directly after cursor will be returned.
- Yields DataSource:
One DataSource at a time in lexically increasing order
- Return type:
Iterator[DataSource]
- Client.get_datasource(name)[source]
Get the DataSource with the argument name from the DASL server. The returned value can be updated directly and passed to replace_datasource in order to make changes.
- Parameters:
  - name (str) – The unique name of the DataSource within this workspace.
- Return type:
DataSource
- Returns:
DataSource
- Client.create_datasource(name, ds_in)[source]
Create a new DataSource. The chosen name must be unique for your workspace, and cannot refer to a DataSource that already exists and has not been deleted. See the documentation for delete_datasource as there are some caveats around name reuse.
- Parameters:
  - name (str) – The unique name of this DataSource in the workspace.
  - ds_in (DataSource) – The specification of the DataSource to create. See the documentation for the DataSource type for more details.
- Returns DataSource:
Note that the returned value is a clone of ds_in and may not be precisely equal to the originally passed value.
- Return type:
DataSource
- Client.replace_datasource(name, ds_in)[source]
Replace an existing DataSource. The name must refer to a DataSource that already exists in your workspace.
- Parameters:
  - name (str) – The name of the existing DataSource to replace.
  - ds_in (DataSource) – The specification of the DataSource taking the place of the existing DataSource.
- Returns DataSource:
Note that the returned value is a clone of ds_in and may not be precisely equal to the originally passed value.
- Return type:
DataSource
- Client.delete_datasource(name)[source]
Delete the DataSource with the argument name from the DASL server. The DataSource will not necessarily be deleted immediately as the server will dispatch background tasks to clean up any allocated resources before actually deleting the resource, so it may take some time before its name is available for reuse.
- Parameters:
  - name (str) – The unique name of the DataSource within this workspace.
- Return type:
None
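Example (a minimal sketch of a DataSource lifecycle; client is assumed to be an existing Client and ds_spec is a DataSource built per the DataSource docs, whose fields are not shown here):
    created = client.create_datasource("my_datasource", ds_spec)

    # Fetch the DataSource, adjust it as needed, then replace it.
    current = client.get_datasource("my_datasource")
    client.replace_datasource("my_datasource", current)

    # Delete it. The name may not be immediately reusable (see delete_datasource).
    client.delete_datasource("my_datasource")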
Rule Operations
Methods for managing rules:
- Client.list_rules(cursor=None, limit=None)[source]
List the Rules in this workspace. Each yielded Rule contains all fields in the Rule as if it were fetched using the get_rule method.
- Parameters:
  - cursor (Optional[str]) – The ID of a Rule. If specified, the results will contain Rules starting (lexically) directly after this Rule. If not specified, then the results will begin with the lexically least Rule.
  - limit (Optional[int]) – The maximum number of Rules to yield. If there are fewer than this number of Rules beginning directly after cursor, then all such Rules will be yielded. If not specified, then all Rules starting directly after cursor will be returned.
- Yields Rule:
One Rule at a time in lexically increasing order.
- Return type:
Iterator[Rule]
- Client.get_rule(name)[source]
Get the Rule with the argument name from the DASL server. The returned value can be updated directly and passed to replace_rule in order to make changes.
- Parameters:
  - name (str) – The unique name of the Rule within this workspace.
- Return type:
Rule
- Returns:
Rule
- Client.create_rule(name, rule_in)[source]
Create a new Rule. The chosen name must be unique for your workspace, and cannot refer to a Rule that already exists and has not been deleted. See the documentation for delete_rule as there are some caveats around name reuse.
- Parameters:
  - name (str) – The unique name of this Rule in the workspace.
  - rule_in (Rule) – The specification of the Rule to create. See the documentation for the Rule type for more details.
- Returns Rule:
Note that the returned value is a clone of rule_in and may not be precisely equal to the originally passed value.
- Return type:
Rule
- Client.replace_rule(name, rule_in)[source]
Replace an existing Rule. The name must refer to a Rule that already exists in your workspace.
- Client.delete_rule(name)[source]
Delete the Rule with the argument name from the DASL server. The Rule will not necessarily be deleted immediately as the server will dispatch background tasks to clean up any allocated resources before actually deleting the resource, so it may take some time before its name is available for reuse.
- Parameters:
  - name (str) – The unique name of the Rule within this workspace.
- Return type:
None
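Example (a minimal sketch of a Rule lifecycle; client is assumed to be an existing Client and rule_spec is a Rule built per the Rule docs, whose fields are not shown here):
    created = client.create_rule("my_rule", rule_spec)

    # Fetch the Rule, adjust it as needed, then replace it.
    current = client.get_rule("my_rule")
    client.replace_rule("my_rule", current)

    # Delete it. The name may not be immediately reusable (see delete_rule).
    client.delete_rule("my_rule")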
Workspace Configuration
Methods for workspace management:
- Client.get_config()[source]
Retrieve the WorkspaceConfig from the DASL server. The returned value can be updated directly and passed to put_config in order to make changes.
- Return type:
WorkspaceConfig
- Returns:
WorkspaceConfig containing the current configuration.
- Client.put_config(config_in)[source]
Update the WorkspaceConfig stored in the DASL server. See the WorkspaceConfig docs for details about its contents.
- Parameters:
  - config_in (WorkspaceConfig) – WorkspaceConfig to replace the existing.
- Return type:
None
- Returns:
WorkspaceConfig. Note that the returned value is a clone of config_in and may not be precisely equal to the originally passed value.
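Example (a minimal sketch of the get/modify/put round trip; client is assumed to be an existing Client, and the modified field is left as a comment because WorkspaceConfig fields are documented separately):
    config = client.get_config()

    # Adjust the returned WorkspaceConfig in place, e.g.:
    # config.<some_setting> = "<new value>"

    client.put_config(config)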