Data Lake
Definition
A data lake is a centralized storage repository that holds raw data in its native format until needed for analytics. Unlike data warehouses that store structured, pre-processed data, data lakes accept structured, semi-structured, and unstructured data at any scale. Data lakes typically run on object storage (S3, Azure Blob) or HDFS, with query engines like Apache Spark, Presto/Trino, or Databricks processing data on demand.
