Databricks
This guide walks you through connecting a Databricks workspace to Roundtable so the AI can query your data using the query_databricks tool.
Prerequisites
Before you begin, make sure you have:
- An active Databricks workspace (on AWS, Azure, or GCP).
- A Databricks SQL warehouse or all-purpose cluster with SQL access.
- Permissions to generate a personal access token in Databricks.
- Admin access to your Roundtable organization (to create connections).
Step 1: Gather Your Databricks Credentials
You'll need four values from your Databricks workspace:
Host
Your Databricks workspace URL without the https:// prefix.
- Find it in your browser's address bar when logged into Databricks.
- Example:
dbc-a1b2c3d4-e5f6.cloud.databricks.com
HTTP Path
The connection path for your SQL warehouse or cluster.
- In Databricks, go to SQL Warehouses (or Compute for all-purpose clusters).
- Click your warehouse/cluster.
- Open the Connection Details tab.
- Copy the HTTP Path value.
- Example:
/sql/1.0/warehouses/abcdef1234567890
- Example:
Catalog
The Unity Catalog name that contains your data.
- Example:
mainoranalytics
If your Databricks workspace doesn't use Unity Catalog, you can leave this field as hive_metastore (the legacy default) or set it to match your metastore configuration.
Personal Access Token
- In Databricks, click your user avatar in the top-right corner.
- Select Settings.
- Go to Developer → Access Tokens.
- Click Generate New Token.
- Enter a comment (e.g.,
Roundtable read access) and set an expiration. - Click Generate and copy the token.
The token is only shown once. Copy it immediately and store it securely. If you lose it, you'll need to generate a new one.
Step 2: Add the Connection in Roundtable
- Go to your Roundtable organization's Settings → Connections.
- Click Add Connection.
- Select Databricks as the connection type.
- Fill in the required fields:
| Field | Description | Example |
|---|---|---|
| Connection Name | A friendly label for this connection | Production Databricks |
| Host | Databricks workspace URL (without https://) | dbc-a1b2c3d4-e5f6.cloud.databricks.com |
| HTTP Path | SQL warehouse or cluster connection path | /sql/1.0/warehouses/abcdef1234567890 |
| Catalog | Unity Catalog name | analytics |
| Token | Personal access token | dapi•••••••••••••••• |
- Click Save.
Step 3: Test the Connection
After saving, click the Test Connection button. Roundtable will:
- Connect to your Databricks workspace using the provided host and token.
- Verify that the SQL warehouse is reachable via the HTTP path.
- Confirm that the catalog exists and the token has query permissions.
Common test failures and their fixes:
| Error | Cause | Fix |
|---|---|---|
Authentication failed | Invalid or expired token | Generate a new personal access token |
Could not connect to host | Incorrect host URL | Verify the workspace URL in your browser |
HTTP Path not found | Wrong path or warehouse is stopped | Check the path in Connection Details; start the warehouse |
Catalog not found | Incorrect catalog name | Verify the catalog name in Databricks Data Explorer |
Make sure your Databricks SQL warehouse is running before testing the connection. Serverless warehouses may auto-start, but classic warehouses need to be started manually.
Step 4: Attach to a Workspace
- Open the workspace where you want Databricks access.
- Go to Settings → Connections.
- Select your Databricks connection from the dropdown.
- Click Save.
The query_databricks tool is now active in this workspace.
Example: Querying Your Data
Once the connection is live, workspace members can ask the AI to query Databricks in natural language:
User:
List all tables in the analytics catalog and show me the top 10 rows from the events table.
AI response (using query_databricks):
The AI will:
- Run a discovery query:
SHOW TABLES IN analytics.default;
- Then fetch sample data:
SELECT *FROM analytics.default.eventsLIMIT 10;
- Return the table list and sample rows in a formatted summary.
Another example:
User:
What's the conversion rate from signup to first purchase by channel this quarter?
AI response:
SELECT
signup_channel,
COUNT(DISTINCT s.user_id) AS signups,
COUNT(DISTINCT p.user_id) AS purchasers,
ROUND(COUNT(DISTINCT p.user_id) * 100.0 / COUNT(DISTINCT s.user_id), 2) AS conversion_rate
FROM analytics.default.signups s
LEFT JOIN analytics.default.purchases p
ON s.user_id = p.user_id
AND p.purchase_date >= s.signup_date
AND p.purchase_date <= DATE_ADD(s.signup_date, 90)
WHERE s.signup_date >= DATE_TRUNC('QUARTER', CURRENT_DATE())
GROUP BY signup_channel
ORDER BY conversion_rate DESC;
Include your catalog structure and key table descriptions in the workspace's system prompt so the AI can write accurate queries without needing to explore the schema first.