OverArchitected

February 2026

Nick Karpov & Holly Smith

Welcome to OverArchitected with Nick & Holly — web edition!

Every month we pick our favorite Databricks features and try to shoehorn them into one architecture. This page recaps each feature we covered — with the episode video ↘ queued up to jump between sections as you scroll.

For the web edition, each feature summary is augmented by Claude Opus 4.6 from the raw episode transcript, and each one is followed by unfiltered Director’s Commentary.

In this month's episode

01New Connectors in LakeFlow Connect GA / Beta 02Databricks Assistant — Skills, Agent Mode, and Docs Public Preview 03forEachBatch() in Spark Declarative Pipelines Public Preview 04Foundation Models — Claude 4.5, GPT 5.2, Haiku GA 05Delta Sharing to Iceberg Clients Public Preview 06Knowledge Assistant is GA GA 07Lakebase Autoscaling Public Preview

01 New Connectors in LakeFlow Connect GA / Beta

What is it? Eight new connectors for LakeFlow Connect: Microsoft Dynamics 365, Jira, Confluence, Salesforce (now with incremental loads and formula field support), Meta Ads, Excel file reading, NetSuite, and PostgreSQL. These are fully managed, no-code ingestion pipelines that land data directly into Delta tables.

Is it for you? If you're currently running custom scripts or third-party ETL tools to get data from any of these sources into Databricks, this replaces all of that. Set up a connection, pick your tables, and data flows automatically with checkpointing and error handling.

Try it Navigate to your workspace, go to Data Ingestion > LakeFlow Connect, and create a new connection. OAuth setup takes about 5 minutes for most connectors.

📄 Read the docs

Jira connector (beta) Confluence connector (beta) Microsoft Dynamics 365 (public preview) Salesforce incremental loads PostgreSQL connector (public preview) Meta Ads (beta) Read Excel files (beta) NetSuite connector

Director's Commentary (Nick)

Connectors are basically your lifeblood. Without them your workspace is DOA. It's hard to believe I spent years of my life in the field implementing exactly these types of connectors, which can now be deployed with just a few clicks. We're iterating insanely fast here and this release just goes to show that. I don't expect any slowdown anytime soon! If you don't see a connector for your service... make some noise and we'll build it.

02 Databricks Assistant — Skills, Agent Mode, and Docs Public Preview

What is it? Three updates. The Assistant now lives on docs.databricks.com as a chat interface. Agent Mode lets the Assistant interact directly with your workspace (run queries, create clusters, etc.) and is in public preview. And skills — an open standard from Anthropic — let you bundle instructions, scripts, and context into folders that the agent discovers progressively instead of loading everything at once.

Is it for you? Agent Mode is for anyone who wants to operate their workspace conversationally. Skills are for teams who want to encode their specific workflows, patterns, and tribal knowledge so the Assistant can learn them incrementally.

Try it Open any notebook and click the Assistant icon. For Agent Mode, toggle it on in the Assistant panel. For skills, create a .skills/ folder in your repo with a SKILLS.md file describing available capabilities.

📄 Read the docs

Create skills for Databricks Assistant Assistant on the documentation site Assistant Agent Mode (public preview)

Director's Commentary (Nick)

All the assistants. EVERYWHERE. Databricks assistant is taking over every surface area of the product and I love it. There's still some hard lines between this assistant and that assistant, but I think we all see where this is going, so, expect many more updates here in the coming months. For now, asking the assistant to "create a skill based on what we've done here" at the end of any piece of work is an absolute must.

As for me? I'm a CLI/TUI person and I'm still on the fence about all the tooling in the harness world. MCP servers hosting tools, SKILLS.md, AGENTS.md, CLAUDE.md... it's exhausting. My personal favorite way to work right now is to use an extremely lightweight harness (Pi agent!!) with as few tools as possible with an extremely capable model (the latest, duh). This forces the model to use bash + CLI tool. This... works insanely well. The "are CLIs good enough?" debate continues to rage on X. I ~think~ yes, but it's anyones guess how things evolve.

03 forEachBatch() in Spark Declarative Pipelines Public Preview

What is it? Spark Declarative Pipelines (formerly DLT) now support forEachBatch() — giving you micro-batch control over how data is written. This unlocks custom sinks (JDBC, REST APIs), MERGE INTO operations, and any processing logic that needs to happen per-batch rather than per-record.

Is it for you? If you've been stuck on classic Structured Streaming because SDP couldn't handle your write pattern (merges, upserts, external sinks), this is your ticket in. If your pipelines are already working with simple append/overwrite, you don't need this.

Try it In your SDP pipeline definition, use the new sink parameter with a forEachBatch function. The syntax mirrors classic Structured Streaming's forEachBatch API.

📄 Read the docs

forEachBatch for SDP (public preview)

Director's Commentary (Nick)

This was actually my very first complaint when I saw DLT. After years in the field abusing forEachBatch() for every and any use case that didn't "fit the mold", to suddenly lose that ability in DLT was basically a non starter. If you've been stuck on classic Structured Streaming because of quirky SDP edges, it's time to take another look. This product has seriously matured.

04 Foundation Models — Claude 4.5, GPT 5.2, Haiku GA

What is it? Three new models on the Foundation Model API: Claude Opus 4.5, Claude Haiku 4.5 (both Anthropic), and GPT 5.2 (OpenAI). All available as Databricks-hosted endpoints — meaning you get inference tables, AI Gateway, PII detection, and usage tracking without any extra setup.

Is it for you? If you're building AI features on Databricks (agents, RAG, structured extraction), these are the models to use. Hosting them through Databricks means your data stays in your environment and every call is logged and governable.

Try it Available immediately in the AI Playground — no setup needed. For production use, enable AI Gateway (beta preview) on a foundation model endpoint to get inference tables, PII detection, rate limiting, and usage tracking.

📄 Read the docs

Claude Opus 4.5 Claude Haiku 4.5 OpenAI GPT 5.2

Director's Commentary (Nick)

I'm backfilling this content so I'm writing this FROM THE FUTURE MARTY! Aka. 4.5 and 5.2 are OLD NEWS. Codex 5.3 and Opus/Sonnet 4.6 are absolutely bonkers good. Start using. Check out our March episode for the latest.

05 Delta Sharing to Iceberg Clients Public Preview

What is it? Delta tables can now be shared to external consumers as Iceberg-formatted data via Delta Sharing. Any Iceberg-compatible client (Snowflake, Trino, Spark on another platform) can read your shared tables without needing Databricks.

Is it for you? If you need to share data with partners or teams that aren't on Databricks but use Iceberg-compatible tools, this eliminates the need for data export/copy workflows. Your data stays in place; they read it live.

Try it Enable Iceberg compatibility on your table (ALTER TABLE SET TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')), then add it to a Delta Share. Note: deletion vectors must be disabled on the table.

📄 Read the docs

Delta Sharing to external Iceberg clients (public preview)

Director's Commentary (Nick)

I worked very closely on the Delta project for a few years. All I can say is I'm sick of discerning the difference between these things. Delta? Iceberg? Diceberg? Icelta? Who cares. Databricks supports ACID like tables on top of blob storage. That's the point. This feature, among many others, is just nailing the format wars coffin shut.

06 Knowledge Assistant is GA GA

What is it? The Knowledge Assistant in Agent Bricks is now generally available. It's a no-code RAG chatbot — point it at Unity Catalog files, volumes, or a vector search index, and it builds a question-answering agent with full citations (page numbers, source excerpts). Deployable as a serving endpoint.

Is it for you? If your team has documentation, manuals, policies, or knowledge bases that people constantly ask questions about, this turns them into a searchable, conversational interface in minutes. No ML expertise required.

Try it Go to Machine Learning > Agent Bricks, select Knowledge Assistant, upload your documents or point to a UC volume, and deploy. You'll have a working chatbot with citations in under 10 minutes.

📄 Read the docs

Agent Bricks Knowledge Assistant GA

Director's Commentary (Nick)

As much as I'm obsessed with coding harnesses of late, I can't overlook how big this feature is. Anybody can point and click a smart knowledge assistant in a few minutes. So, either let your users build their own knowledge assistants, or spend the 5 minutes it takes to create one and expose it to them ASAP. We have a lot of demos now like Casper's Kitchens that demonstrate exactly what the video shows. Try it!!

07 Lakebase Autoscaling Public Preview

What is it? Lakebase — Databricks-managed PostgreSQL — now supports autoscaling in public preview, including scale-to-zero. No more manually choosing capacity units. It also ships with ACL support for fine-grained access control and read-write access from the SQL editor.

Is it for you? If you need an operational database for serving applications, APIs, or low-latency lookups on top of your lakehouse data, Lakebase is the native answer. Autoscaling means you pay for what you use and don't manage infrastructure.

Try it In your workspace, go to SQL > Lakebase and create a new instance. It provisions in seconds. Connect from any Postgres-compatible client or use the built-in SQL editor.

📄 Read the docs

Lakebase autoscaling (public preview) Lakebase SQL editor read-write access Lakebase autoscaling ACL support

Director's Commentary (Nick)

OLTP on Databricks is something many users have wanted for a very long time. I've been waiting for it since the first time I wanted to serve a Delta table live from Databricks. Now with a simple few clicks (or one API call) I can synchronize my data directly into application databases. Lakebase is serverless, autoscaling, and instant... frankly it exceeds what I could have imagined years ago. My favorite feature is git style branching. This means I can basically use production data directly in my dev environment at almost no cost. The bigger picture around Lakebase: Databricks is now totally end to end. Data platform, apps, operational data stores... the future is building on Databricks.

Next →OverArchitected: March 2026