The open lakehouse: warehouse guarantees, lake cost, formats that are yours.
Kavuka Lakehouse brings warehouse guarantees (ACID transactions, schema enforcement and time travel) onto cheap object storage, in Delta Lake or Apache Iceberg, with the Medallion architecture organizing quality from raw to business-ready data. One source of truth for ETL, BI, ML and AI, deployed by those who operate petabytes.
- ACID + time travel: on cheap object storage
- Delta & Iceberg: open formats, no lock-in
- Medallion: bronze · silver · gold
- Petabytes: native scale in production
The architecture comes from an operator, not a slide: GUÉP's own infrastructure runs at petabyte scale, processing billions of fiscal documents — with TCO modeled by the discipline of those who pay their own bills.
Your warehouse charges for your success. And your data is its hostage.
The bill that scales with volume
The proprietary warehouse bill grows with every terabyte of success, and lock-in turns every future decision into a ransom negotiation.
Two systems and the pipeline that breaks
Duplicated lake and warehouse pay for storage twice and live tied by fragile copy pipelines — the plumbing that consumes the data team.
The BI number that doesn't match the model
With truth split across systems, the dashboard diverges from the model and the board decides on numbers no one can reconcile.
Cost: the dual architecture charges three times: duplicated storage, the copy pipeline that breaks, and divided truth (the BI number ≠ the model number). The proprietary warehouse adds a fourth charge: the bill that scales with success and the lock-in that scales with the bill.
From modeled TCO to a governed platform, no big bang.
- 01 Model: the real TCO, comparing your current scenario against the lakehouse in cloud, bare metal or hybrid, with the discipline of those who pay their own bills.
- 02 Architect: format (Delta or Iceberg), engine and Medallion chosen for your case, not for our interest. Independence is part of the offer.
- 03 Migrate: incremental. Workloads move by priority and ROI, and the warehouse shrinks as the lakehouse takes over, without the leap of faith of a big bang.
- 04 Operate: the governed platform, or assisted operation by those who do this in-house, at petabyte scale, every day.
The platform behind a single source of truth
A layer of open transactional tables over cheap object storage — and everything ETL, BI, ML and AI need to consume the same governed data.
Open formats
Delta Lake and/or Apache Iceberg, no lock-in
Medallion architecture
Bronze → silver → gold, traceable quality
Processing engines
Batch and streaming on the same platform
BI and SQL on the lake
The warehouse without the warehouse, over one source
ML and AI
Features, training and serving with no parallel copy
Unified governance
Catalog, permissions, lineage and quality
Cloud, bare metal or hybrid
On-premises and sovereignty the clouds don't prioritize
Modeled TCO
Current bill vs the lakehouse, before the first byte
Who migrates to Kavuka Lakehouse
Paying too much for the warehouse
Companies with a proprietary warehouse bill scaling with volume: migration with TCO modeled for real.
Dual architectures
Separate lake + warehouse tied by copy pipelines: the unification that ends the plumbing and the divided truth.
AI projects in production
Training and serving that demand a single governed base — AI without the parallel copy that diverges from BI.
Volumes and regulation
From hundreds of terabytes to petabytes, including those who need on-premises or hybrid deployment: our native scale.
The protection of an architecture that is yours
In the lakehouse, anti-lock-in is not a marketing promise: it is a property of the format. Data stays in open, auditable tables, the engine becomes a choice, and operator credibility replaces the consultant's slide.
- Open, auditable formats (Delta Lake, Apache Iceberg): the data is yours, with no platform-vendor lock-in.
- Catalog, lineage and permissions under a single standard for data and AI: governance that does not fragment across systems.
- The credential of our own petabytes: infrastructure processing billions of fiscal documents, not a proof-of-concept pilot.
- Documented TCO: CAPEX, OPEX and the comparison against your current warehouse delivered before the decision — the business case ready for the board.
- Deployment in cloud, on-premises or hybrid as your sovereignty and regulation require — not as the vendor prefers.
We left the proprietary warehouse without a big bang: workloads migrated by priority and the bill dropped as the lakehouse took over.
The BI number started matching the model because it is finally the same data. The board stopped arguing which report to trust.
Hiring those who operate petabytes in-house is a different conversation: the TCO came modeled, not estimated on a slide. We decided with the number in hand.
Bring your current data bill.
We return the compared lakehouse TCO — with the architecture designed and the business case ready for the board.
- For businesses only. No purchase commitment.
- Data used solely for commercial contact.
- Enterprise leads answered within 1 business day.
What a lakehouse is and how to migrate to it
The lakehouse is the architecture that unified the data lake and the data warehouse. Instead of keeping two duplicated systems (the cheap lake without transactional guarantees, and the reliable but expensive and closed warehouse), the lakehouse adds a layer of open transactional tables on top of cheap object storage. With the Delta Lake and Apache Iceberg formats, it brings to the lake what was once exclusive to the warehouse: ACID transactions, schema evolution and enforcement, time travel (the ability to query a table as it was at an earlier version or timestamp, for as long as that history is retained) and performance optimizations. The result is a single source of truth serving ETL, BI, machine learning and generative AI, without the fragile pipelines copying data between systems.
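To make the idea of a transactional table layer concrete, here is a minimal sketch in plain Python. It is a toy model, not the Delta Lake or Iceberg implementation: the `ToyTable` class, its `commit` and `read` methods and the sample rows are all hypothetical. It only illustrates two properties named above: a write either lands as a whole new version or is rejected, and earlier versions remain readable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for an open table format: an append-only list of snapshots."""

    # Each entry is the full list of rows visible at that version (version 0 is empty).
    _snapshots: list = field(default_factory=lambda: [[]])

    @property
    def version(self) -> int:
        return len(self._snapshots) - 1

    def commit(self, new_rows, expected_version):
        """Append rows as one new version; reject the write if the table moved on."""
        if expected_version != self.version:
            raise RuntimeError("conflict: table changed since it was read")
        self._snapshots.append(self._snapshots[-1] + list(new_rows))
        return self.version

    def read(self, version=None):
        """Read the latest snapshot, or an earlier one (time travel)."""
        v = self.version if version is None else version
        return list(self._snapshots[v])


table = ToyTable()
v1 = table.commit([{"id": 1, "amount": 100}], expected_version=0)
v2 = table.commit([{"id": 2, "amount": 250}], expected_version=v1)

latest = table.read()             # both rows
as_of_v1 = table.read(version=v1)  # only the first row, as the table was at v1
```

Real table formats keep metadata and data files on object storage rather than full in-memory copies, and they expire old versions according to a retention policy, but the contract seen by readers and writers is similar in spirit.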
The internal organization of the lakehouse follows the Medallion pattern, today the canonical way to structure data quality in progressive layers. The bronze layer holds the faithful, traceable raw data — the reprocessable source of truth. The silver layer delivers clean, validated, deduplicated and conformed data: this is where entity resolution lives, reconciling records of the same customer or company scattered across sources. The gold layer delivers curated data, ready for business consumption — the dashboards, the models and the reports for the board. Quality grows layer by layer, always with the traceability to return to the origin of any number.
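The three layers can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a production pipeline: the sample records, the `to_silver` and `to_gold` helpers and the naive normalization used as an entity-resolution key are all hypothetical, and a real deployment would run on a processing engine over table files.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as received (hypothetical sample data).
bronze = [
    {"source": "erp", "customer": " ACME Ltd ", "amount": "100.50"},
    {"source": "crm", "customer": "acme ltd", "amount": "100.50"},      # same fact, second source
    {"source": "erp", "customer": "Globex", "amount": "not-a-number"},  # fails validation
    {"source": "crm", "customer": "Globex", "amount": "40"},
]


def to_silver(rows):
    """Clean, validate and deduplicate, keeping the source for traceability."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # invalid rows stay in bronze only, where they can be reprocessed
        customer = row["customer"].strip().lower()  # naive entity-resolution key
        key = (customer, amount)
        if key in seen:
            continue
        seen.add(key)
        silver.append({"customer": customer, "amount": amount, "source": row["source"]})
    return silver


def to_gold(rows):
    """Aggregate to a business-ready figure: total amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Because bronze is never modified, any gold number can be recomputed from the raw records if a cleaning rule changes.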
Delta Lake or Iceberg? The choice depends on the ecosystem, not on allegiance. Delta is at its best in the Spark and Databricks world; Iceberg is the standard for multi-engine portability — open REST catalog and support from Trino, Flink, Dremio and BigQuery — and interoperability between the two grows with each release (UniForm, Iceberg v3), pointing toward ecosystem unification. The market has consolidated the lakehouse as the standard architecture of the decade: Databricks defined the category, Snowflake opened up to Iceberg and Microsoft brought the model to Fabric. The argument the CTO most wants to hear is the anti-lock-in of open formats; the business case that unlocks migration is the TCO against the proprietary warehouse.
Migrating does not require a big bang. The right approach is incremental: first you model the real TCO — your current scenario against the lakehouse, in cloud, bare metal or hybrid — then you choose format, engine and Medallion organization for the concrete case, and then workloads move by priority and ROI, while the warehouse shrinks as the lakehouse takes over. Open formats guarantee that no future decision is held hostage by a vendor. Kavuka's edge in the local market is operator credibility: GUÉP does not design the lakehouse on a slide — it operates its own at petabyte scale, processing billions of fiscal documents, with CAPEX, OPEX and TCO studies made for its own decisions. When you bring your current data bill, we return the compared TCO, with the architecture designed — the business case ready for the board.
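As a rough illustration of what a compared TCO looks like, the sketch below sums CAPEX and growing yearly OPEX for two scenarios over several horizons. Every figure, the `Scenario` class and the simple compound-growth formula are assumptions made for this example; they are not GUÉP's model or real customer numbers.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    capex: float          # one-off investment (hardware, migration effort)
    opex_per_year: float  # recurring cost in year 1
    opex_growth: float    # yearly growth of the recurring cost, e.g. 0.30 = 30%

    def tco(self, years: int) -> float:
        """Total cost of ownership: CAPEX plus OPEX compounded over the horizon."""
        opex = sum(self.opex_per_year * (1 + self.opex_growth) ** y for y in range(years))
        return self.capex + opex


# Hypothetical figures, for illustration only.
warehouse = Scenario("current warehouse", capex=0, opex_per_year=1_000_000, opex_growth=0.30)
lakehouse = Scenario("lakehouse", capex=600_000, opex_per_year=500_000, opex_growth=0.10)

for years in (1, 3, 5):
    wh, lh = warehouse.tco(years), lakehouse.tco(years)
    print(f"{years}y  warehouse={wh:,.0f}  lakehouse={lh:,.0f}  difference={wh - lh:,.0f}")
```

With these made-up inputs the lakehouse costs more in the first year because of the upfront investment and less over longer horizons, which is why the horizon and growth assumptions need to be stated explicitly in any business case.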
What is a lakehouse?
It is the architecture that adds a layer of open transactional tables (Delta Lake, Apache Iceberg) over cheap object storage — bringing warehouse guarantees to the lake (ACID, schema, time travel) and serving ETL, BI, ML and AI from a single source, with no duplicated systems.
Delta or Iceberg?
It depends on the ecosystem: Delta is at its best in the Spark/Databricks world; Iceberg is the multi-engine portability standard (Trino, Flink, Dremio, BigQuery) with an open REST catalog — and interoperability between the two grows with each release. We choose for your case, not by allegiance: independence is part of the offer.
What is the Medallion architecture?
It is the pattern of progressive layers: bronze (the faithful raw data, the reprocessable source of truth), silver (clean, validated, deduplicated and conformed — where entity resolution lives) and gold (curated for business consumption). Quality that grows with traceability.
Why GUÉP to deploy this?
Because we operate what we sell: our own infrastructure at petabyte scale, processing billions of fiscal documents — with CAPEX, OPEX and TCO studies made for our own decisions. The architecture comes from an operator, not a slide.
Can I leave my current warehouse without a big bang?
Yes — migration is incremental: workloads move by priority and ROI, the warehouse shrinks as the lakehouse takes over, and open formats guarantee that no future decision is held hostage. The compared TCO shows the path before the first byte.
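One way to picture "by priority and ROI" is as a sort followed by a split into migration waves. The sketch below is a hypothetical planning aid: the `Workload` fields, the priority scale, the ROI definition (yearly saving divided by migration cost) and the sample workloads are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: int          # 1 = most urgent (hypothetical scale)
    yearly_saving: float   # expected warehouse cost removed by migrating
    migration_cost: float  # one-off effort to move the workload

    @property
    def roi(self) -> float:
        return self.yearly_saving / self.migration_cost


def plan_waves(workloads, wave_size=2):
    """Order by priority, then by ROI (highest first), and split into waves."""
    ordered = sorted(workloads, key=lambda w: (w.priority, -w.roi))
    return [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]


workloads = [
    Workload("finance_reports", priority=2, yearly_saving=80_000, migration_cost=40_000),
    Workload("clickstream_etl", priority=1, yearly_saving=300_000, migration_cost=60_000),
    Workload("ml_features", priority=1, yearly_saving=120_000, migration_cost=60_000),
    Workload("legacy_exports", priority=3, yearly_saving=10_000, migration_cost=20_000),
]

waves = plan_waves(workloads)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {[w.name for w in wave]}")
```

Each wave moves a bounded set of workloads, so the warehouse bill can be checked against the plan before the next wave starts.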
What is the difference between a data lake, a warehouse and a lakehouse?
A data lake is cheap, flexible storage but without transactional guarantees. A warehouse is reliable and performant but expensive and closed. The lakehouse unifies the two: warehouse guarantees (ACID, schema, time travel) over the cost and openness of the lake — one platform instead of two.
Does the lakehouse run on-premises or only in the cloud?
It runs where your sovereignty and regulation require: cloud, on-premises or hybrid. The on-premises scenario is precisely the one the big clouds don't prioritize — and the one we operate in-house, on GUÉP's own infrastructure.
Let's talk
Your next high-impact decision starts with the right data.
Talk to a GUÉP specialist and find where applied intelligence creates the most value in your operation.