Medallion Architecture Layer - Bronze


The Bronze Layer is the raw data repository within the Medallion Architecture and the initial landing point for all data ingested from external source systems. It applies only minimal, "just-enough" transformations: typically converting the source data into a compatible table format (such as Delta Lake) while keeping the original schema "as-is" and adding ingestion metadata columns (such as load time and process ID). Its primary purpose is data preservation: it acts as a trustworthy historical archive that establishes foundational data lineage and ensures auditability.
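To make the "as-is plus metadata" idea concrete, here is a minimal sketch in plain Python. It is not taken from the source: the function name `to_bronze`, the column names `_load_time` and `_process_id`, and the sample records are all hypothetical, and a real pipeline would write to a table format such as Delta Lake instead of returning a list.

```python
from datetime import datetime, timezone
import uuid


def to_bronze(records, process_id=None, load_time=None):
    """Attach ingestion metadata to raw records without altering source fields.

    The metadata column names (_load_time, _process_id) are an assumed
    convention for this sketch, not something prescribed by the source.
    """
    process_id = process_id or str(uuid.uuid4())
    load_time = load_time or datetime.now(timezone.utc)
    bronze_rows = []
    for record in records:
        row = dict(record)  # copy, so source columns and values stay "as-is"
        row["_load_time"] = load_time.isoformat()
        row["_process_id"] = process_id
        bronze_rows.append(row)
    return bronze_rows


# Hypothetical raw records: values are kept exactly as the source delivered
# them (strings, nulls), with no casting or cleansing at this layer.
raw = [
    {"order_id": "17", "amount": "12.50"},
    {"order_id": "18", "amount": None},
]
bronze = to_bronze(raw, process_id="run-001")
```

Type casting, null handling, and deduplication are deliberately left out here; in this architecture they belong to the layers downstream of Bronze.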



Failure Prevention: Retaining raw data in the Bronze Layer makes replay and recovery possible even when the source system keeps data only briefly (e.g., Kafka retention periods of 24 hours to 7 days).

Utility/Auditability: The Bronze Layer provides the historical archive and the first stable point of data lineage, which is essential for auditability, compliance, and tracing the origin of any data downstream.

Flexibility: It enables future architectural agility by allowing engineers to reprocess the raw data later to support new use cases or modified data requirements without having to re-read the data from the original source system.
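The replay and reprocessing benefits above can be sketched as follows. This is an illustrative example under stated assumptions, not a definitive implementation: the in-memory list `BRONZE_ORDERS` stands in for a Bronze table, and `replay`, `revenue_per_customer`, and the `_load_time` column are hypothetical names.

```python
from collections import defaultdict

# Hypothetical Bronze rows: raw source fields kept as strings, plus load metadata.
BRONZE_ORDERS = [
    {"order_id": "17", "customer": "a", "amount": "12.50",
     "_load_time": "2024-01-01T00:00:00+00:00"},
    {"order_id": "18", "customer": "b", "amount": "7.25",
     "_load_time": "2024-01-01T00:00:00+00:00"},
    {"order_id": "19", "customer": "a", "amount": "30.00",
     "_load_time": "2024-01-02T00:00:00+00:00"},
]


def replay(bronze_rows, transform, since=None):
    """Re-run a transformation over archived rows, optionally from a load time.

    Timestamps are compared as strings, which is only valid because every
    row here uses the same ISO 8601 format and UTC offset.
    """
    selected = [
        row for row in bronze_rows
        if since is None or row["_load_time"] >= since
    ]
    return transform(selected)


def revenue_per_customer(rows):
    """A new downstream requirement, defined after the data was ingested."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += float(row["amount"])
    return dict(totals)


# Full rebuild from the archive; the source system is never contacted.
all_time = replay(BRONZE_ORDERS, revenue_per_customer)

# Partial recovery: reprocess only rows loaded on or after a given time.
recent = replay(BRONZE_ORDERS, revenue_per_customer,
                since="2024-01-02T00:00:00+00:00")
```

Because the transformation reads only from the archived rows, the same call works whether or not the original records are still available in the source system.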





The Bronze layer is where we land all the data from external source systems. The table structures in this layer correspond to the source system table structures "as-is," along with any additional metadata columns that capture the load date/time, process ID, etc. The focus in this layer is quick Change Data Capture and the ability to provide an historical archive of source (cold storage), data lineage, auditability, reprocessing if needed without rereading the data from the source system.



Source: https://delta.io/pdfs/dldg_databricks.pdf • Page: 202 • Key: leeDeltaLakeDefinitive