Architecture & Foundations

Your Databricks Migration Will Fail in One of Three Places. Here's the Free Tool That Fixes All Three.

Boitumelo Dikoko
Databricks migration strategy and platform modernisation support for complex legacy estates.

TL;DR: Lakebridge is a free, open-source toolkit from Databricks Labs that handles the three hardest parts of a Databricks migration: figuring out how big the job really is, translating all that legacy SQL, and proving the numbers tie out afterwards. If you’re planning a migration, spend a day with it before you write the proposal.

The part of a migration nobody budgets for

Here’s a scene you’ve probably lived through.

A client wants to move off Teradata, Oracle, Synapse, or Snowflake and onto Databricks. The business case is solid. Everyone’s excited. Then someone opens the source system and finds 80,000 lines of legacy SQL, orchestration jobs nobody documented, stored procedures written by a guy who left in 2017, and a quiet, creeping fear that when you finally cut over, the numbers won’t tie out.

The Databricks part is the easy bit. It’s everything around Databricks that eats the timeline.

That’s the part clients don’t budget for properly. And it’s exactly what Lakebridge is built to solve.

What Lakebridge actually does

Lakebridge is a Databricks Labs toolkit. Free. Open source. Backed by the Labs team.

It covers the three phases where migrations actually fail:

| Phase | What it does | Why it matters |
| --- | --- | --- |
| Assessment | Profiles your existing warehouse and analyzes the SQL | Turns guesswork into a real scope |
| Conversion | Three different transpilers to translate your SQL | Gets you out of “someone has to rewrite all this” hell |
| Reconciliation | Compares source vs. target data after migration | Proves the numbers tie out before go-live |

If you’re planning, scoping, or currently in the middle of a Databricks migration, this is a tool you should have opened a tab on already.


Phase 1: Assessment - “how bad is it, really?”

Before you quote a client or commit to a timeline, you need two numbers:

  1. The TCO savings you’ll get by moving
  2. The effort it’ll take to get there

Most teams guess. Lakebridge ships two tools to stop you guessing.

The Profiler

Connects to your existing warehouse and sizes up the workload. Table volumes, query complexity, features in use. It gives you a defensible number for “how much smaller will this be after we migrate?”

The Analyzer

Scans the actual SQL and orchestration code. Flags the constructs that will bite you during translation: recursive CTEs, vendor-specific functions, the weird procedural bits.

Why this matters: This is the phase consultancies underinvest in the most. Running the Lakebridge assessment up front changes the conversation with stakeholders from “trust us” to “here are the numbers.” That’s a very different meeting.
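To make the Analyzer’s job concrete, here’s a toy version of the idea in Python. The patterns and effort weights below are invented for illustration; they are not Lakebridge’s actual rules, just the shape of the approach: scan the SQL, count the scary constructs, turn the counts into a score.

```python
import re

# Toy complexity scanner in the spirit of the Analyzer.
# Patterns and weights are illustrative, not Lakebridge's.
FLAGS = {
    "recursive_cte": (re.compile(r"\bWITH\s+RECURSIVE\b", re.I), 5),
    "teradata_qualify": (re.compile(r"\bQUALIFY\b", re.I), 3),
    "oracle_connect_by": (re.compile(r"\bCONNECT\s+BY\b", re.I), 5),
}

def score_sql(sql: str) -> dict:
    """Return the constructs found and a rough effort score."""
    hits = {name: len(pat.findall(sql)) for name, (pat, _) in FLAGS.items()}
    score = sum(FLAGS[name][1] * n for name, n in hits.items())
    return {"hits": {k: v for k, v in hits.items() if v}, "score": score}

sample = """
WITH RECURSIVE org AS (SELECT * FROM employees)
SELECT * FROM org QUALIFY ROW_NUMBER() OVER (ORDER BY id) = 1
"""
print(score_sql(sample))
```

Run that over a whole codebase and you have the raw material for a complexity-weighted estimate instead of a gut feel.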

Phase 2: Conversion - “who’s going to rewrite all this?”

This is where Lakebridge gets genuinely interesting. It ships with three transpilers under one roof, and they’re not redundant. They’re complementary.

BladeBridge

The mature, battle-tested option. Broad dialect coverage and some ETL handling built in. This is what you reach for on a Teradata or Netezza job where you need predictable results.

Morpheus

Narrower dialect support today, but it has experimental dbt support. That’s a big deal if your client already lives in the dbt ecosystem.

Switch

Converts SQL and other sources directly into Databricks notebooks using large language models.

This is the one to watch. For the long tail of weird, non-standard procedural SQL that rule-based transpilers choke on, a well-prompted LLM is often the pragmatic answer. Having it packaged as a first-class option inside an official Databricks Labs tool, rather than as a bespoke internal hack, is a genuine shift in how these migrations can be done.
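The hybrid pattern Switch represents, deterministic rules first and an LLM for the leftovers, fits in a few lines. Everything here is illustrative: the rewrite rules are toys, and the LLM call is stubbed out rather than a real model invocation.

```python
from typing import Optional

def rule_based(sql: str) -> Optional[str]:
    """Deterministic rewrites; None means the rules gave up."""
    if "CONNECT BY" in sql.upper():
        return None  # hierarchical query: no simple rule for it here
    return sql.replace("SEL ", "SELECT ")  # toy Teradata-ism fix

def llm_fallback(sql: str) -> str:
    """Stand-in for an LLM call; a real Switch run prompts a model."""
    return f"-- needs LLM/manual conversion:\n-- {sql}"

def convert(statements: list[str]) -> list[str]:
    # Rules first, LLM for whatever the rules couldn't handle.
    return [rule_based(s) or llm_fallback(s) for s in statements]

print(convert(["SEL * FROM t",
               "SELECT id FROM emp CONNECT BY PRIOR id = mgr_id"]))
```

The useful property is the split itself: you get a cheap, auditable path for the 80% and a flagged queue for the 20% that needs a model or a human.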

Phase 3: Reconciliation - “do the numbers match?”

This is the phase that keeps data leads up at night.

It’s also the one clients quietly assume “will just work.” It doesn’t.

Here’s what actually happens during cutover:

  • Source and target systems are both live
  • Row counts drift between them
  • Nulls get coerced differently in the two engines
  • Timezone handling diverges silently
  • Finance asks why the monthly revenue number moved by 0.3%

Nobody has a good answer. Everybody panics.

Lakebridge’s Reconciler is purpose-built for this moment. It compares source and target datasets and gives you a defensible answer to the “do the numbers tie out?” question.

On regulated or commercially sensitive workloads, that’s not a nice-to-have. It’s the difference between a successful cutover and an embarrassing rollback.
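The core idea behind the Reconciler can be sketched in miniature: compare row counts plus a per-column fingerprint, and watch a silent null coercion fail the check. This is a concept sketch, not the tool’s implementation; the real thing pushes these checks down into the source and target engines rather than pulling rows into Python.

```python
import hashlib

def column_fingerprint(values) -> str:
    """Order-independent hash of a column's values."""
    digests = sorted(hashlib.sha256(repr(v).encode()).hexdigest()
                     for v in values)
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def reconcile(source_rows, target_rows, columns) -> dict:
    report = {"row_count_match": len(source_rows) == len(target_rows),
              "columns": {}}
    for col in columns:
        src = column_fingerprint(r[col] for r in source_rows)
        tgt = column_fingerprint(r[col] for r in target_rows)
        report["columns"][col] = src == tgt
    return report

source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
target = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 0.0}]  # NULL coerced!
print(reconcile(source, target, ["id", "amount"]))
```

Note what the toy example catches: the row counts match, `id` matches, and `amount` fails because one engine turned a NULL into 0.0. That is exactly the class of silent drift that makes Finance ask about the 0.3%.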

What it supports (and what it doesn’t)

The supported-sources matrix is worth looking at yourself before you scope anything. Coverage is broad, but uneven across the three phases.

Here’s the quick read:

Full-path coverage (assessment + conversion + reconciliation)

  • Synapse
  • Oracle
  • Snowflake
  • MSSQL

If you’re migrating from one of these, Lakebridge has you covered end to end.

Assessment + conversion only (bring your own reconciliation)

  • Teradata
  • Netezza
  • Redshift
  • PostgreSQL

Analysis only (you can scope it, but conversion is still manual)

  • SSIS, DataStage, SAS, Alteryx
  • ADF, Oozie (orchestration)

Orchestration conversion

  • Airflow

That pattern tells you something useful: Lakebridge is honest about where it’s mature and where it’s still filling in. Treat it as a very strong starting point, not a press-the-button-and-go magic wand.

How we would use it on a real engagement

Discovery week (days 1 to 5)

Run the Profiler and Analyzer against the source environment. Use the output to produce:

  • A real TCO model
  • A complexity-weighted migration backlog

This replaces the hand-waved “it’ll take 3 to 6 months” estimate with something you can actually defend in a steering committee.
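A complexity-weighted backlog is just analyzer-style scores mapped to effort buckets. The thresholds and per-bucket hours below are invented for illustration; calibrate them against your own pilot results before putting them in front of a steering committee.

```python
# Illustrative thresholds and hours -- calibrate against real pilots.
HOURS = {"easy": 2, "medium": 8, "hard": 24}

def bucket(score: int) -> str:
    if score < 3:
        return "easy"
    if score < 10:
        return "medium"
    return "hard"

def backlog(scores: dict) -> dict:
    """Map {job: complexity score} to buckets and a total effort figure."""
    buckets = {job: bucket(s) for job, s in scores.items()}
    total = sum(HOURS[b] for b in buckets.values())
    return {"buckets": buckets, "estimated_hours": total}

print(backlog({"daily_load": 1, "revenue_rollup": 6, "legacy_proc": 15}))
```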

Pilot conversion (weeks 2 to 4)

Pick a representative slice of SQL. Ideally one easy, one medium, one hairy. Run it through BladeBridge and Switch side by side.

You learn two things fast:

  1. What percentage of your codebase transpiles cleanly
  2. What the human effort per job looks like for the bits that don’t

Cutover and reconciliation (ongoing)

As workloads land in Databricks, wire the Reconciler in as a gated step before anything is declared “migrated.” Nothing gets marked done until the numbers tie out.

This is the discipline that separates clean migrations from the ones that quietly rot for six months before someone notices the dashboards are wrong.
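The gate itself is trivial to express; the discipline is in enforcing it. A minimal sketch, with made-up check names:

```python
def gate(checks: dict) -> str:
    """A job is 'migrated' only if every reconciliation check passed."""
    return "migrated" if checks and all(checks.values()) else "blocked"

# Hypothetical check results fed in from a reconciliation run.
statuses = {
    "dim_customer": gate({"row_counts": True, "amount_checksum": True}),
    "fct_revenue": gate({"row_counts": True, "amount_checksum": False}),
}
print(statuses)
```

Wire whatever real reconciliation output you have into a check like this in CI, and “done” stops being a matter of opinion.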

The honest takeaway

Lakebridge isn’t going to make a migration trivial. Nothing will.

What it will do is remove the three most expensive sources of ambiguity in a migration engagement:

  1. Scoping guesswork
  2. Translation toil
  3. Post-cutover uncertainty

Those three things are where margin goes to die on fixed-price projects, and where client trust gets damaged on T&M ones.

If you’re a data lead staring down a Databricks migration, or a consultancy scoping one, the honest recommendation is this:

Spend a day with Lakebridge before you write the proposal.

The assessment output alone will sharpen your numbers, and the reconciler will save you a cutover weekend or two down the line.

It’s free. It’s from Databricks Labs. There’s genuinely no reason not to.

Useful links

Planning a migration to Databricks and want a second set of eyes on the scope? Book a 30-minute architecture call.