“A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.”

James Dixon, founder and CTO of Pentaho, although what makes this wisdom is that it has been repeated so often.

… whereas a data warehouse has structure and a schema!
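To make the contrast concrete, here is a minimal sketch in Python of "schema on read" (the lake) versus "schema on write" (the warehouse). Everything in it is an assumption made up for illustration: the sample events, the field names `customer` and `amount`, and the helper functions. Plain Python lists stand in for real storage, so treat it as a toy model of the idea rather than how any particular product works.

```python
import json

# Hypothetical raw events, exactly as they might arrive from different sources.
raw_events = [
    '{"customer": "ana", "amount": 12.5}',
    '{"customer": "bo", "amount": "7", "note": "free text"}',
    'not even json',
]

# Hypothetical warehouse schema: exact field names and types, fixed up front.
WAREHOUSE_SCHEMA = {"customer": str, "amount": float}


def write_to_lake(lines):
    """Lake: keep every record in its native form; nothing is checked."""
    return list(lines)


def read_amounts_from_lake(lake):
    """Lake: the structure is decided only now, by this particular reader."""
    rows = []
    for line in lake:
        try:
            record = json.loads(line)
            rows.append((str(record["customer"]), float(record["amount"])))
        except (ValueError, KeyError, TypeError):
            continue  # this reader cannot use the record; it stays in the lake
    return rows


def write_to_warehouse(lines):
    """Warehouse: a record must match the schema before it is stored."""
    table, rejected = [], []
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            rejected.append(line)
            continue
        fits = (
            isinstance(record, dict)
            and record.keys() == WAREHOUSE_SCHEMA.keys()
            and all(isinstance(record[k], t) for k, t in WAREHOUSE_SCHEMA.items())
        )
        if fits:
            table.append((record["customer"], record["amount"]))
        else:
            rejected.append(line)
    return table, rejected


lake = write_to_lake(raw_events)
lake_rows = read_amounts_from_lake(lake)
warehouse_rows, warehouse_rejected = write_to_warehouse(raw_events)
```

In this sketch the lake accepts all three events and the reader decides later which of them it can interpret, while the warehouse turns away anything that does not match its schema at load time. Both choices have a cost: the lake defers the cleaning work to every reader, and the warehouse discards (or forces you to fix) data before you know whether you will need it.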

