The Dior moment for data

Published November 22, 2021 | Team Crayon Data

As the Second World War intensified, women were asked to fill in for jobs usually done by men, who were now on the battlefield. Some of these jobs were physically hard. This accelerated the trend toward trousers and slimmer skirts for women: outfits that were easier to manage in the workplace.

As life settled into a postwar rhythm around 1947, French designer Christian Dior saw an opportunity. He wanted to make a grand return to the fashion of the 1860s. His antidote: bring gentleness and femininity back to women’s attire. He retained the functionality and comfort, no doubt. But he pioneered a trend that dominated the second half of the 20th century.


Data experienced a similar evolution.

A few years ago, data was choked and rationed out in trickles from data warehouses refreshed monthly. Then the cloud and data lakes came along. If you’re wondering what the difference is (a short sketch follows the definitions):

  • A data lake is a vast pool of raw data with no restrictions on format, and it can be expanded at minimal cost.
  • A data warehouse is a purpose-built repository for structured, filtered data in restricted formats, and it is expensive to expand.
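To make the contrast concrete, here is a minimal sketch in Python. It assumes an object store bucket named corp-data-lake and the standard boto3 S3 client; the bucket name, object keys, and table definition are all hypothetical. The lake accepts any object as-is, while the warehouse refuses anything that does not match a schema declared up front.

```python
import json
import boto3

s3 = boto3.client("s3")

# Data lake: raw objects land as-is, whatever their format.
s3.put_object(
    Bucket="corp-data-lake",
    Key="raw/clicks/2021-11-22.json",
    Body=json.dumps({"user_id": 42, "page": "/home"}),
)
s3.put_object(
    Bucket="corp-data-lake",
    Key="raw/logs/app.log",
    Body=b"2021-11-22 10:01:02 INFO service started",
)

# Data warehouse: nothing gets in without matching a schema declared up front.
WAREHOUSE_DDL = """
CREATE TABLE clicks (
    user_id INT          NOT NULL,
    page    VARCHAR(255) NOT NULL,
    ts      TIMESTAMP    NOT NULL
);
"""
```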

This freedom on formats and costs had a revolutionary impact on what gets stored and used. Organizations started storing unstructured data and were able to unlock use cases that were unimaginable in the warehouse era. It set off a mad race to move to the cloud, as firms saw the cost savings and the democratization of data.

Understandably, financial services firms were the last to join the race. Banks and insurance firms have a lot more credibility to lose, and the cloud had to prove its security and reliability over time. Till then, information security teams treated uploading data to the cloud as if the doors of the bank had been thrown open and data was being dumped onto the footpath. Cut to the end of 2021, and there is hardly a bank that is not already on the cloud or does not have a data lake roadmap.


Data lakes brought about a lot of advantages in terms of:

– Cost of storage & processing

– Decoupling writes and reads so they can happen in parallel

– Supporting disparate data formats and data tools (see the sketch after this list)
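As one illustration of the last two points, here is a minimal PySpark sketch. The session setup is standard; the lake paths are hypothetical, and it assumes the cluster already has credentials for the object store.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-formats-demo").getOrCreate()

# One engine, three formats, side by side in the same lake.
clicks = spark.read.json("s3a://corp-data-lake/raw/clicks/")
orders = spark.read.option("header", True).csv("s3a://corp-data-lake/raw/orders/")
events = spark.read.parquet("s3a://corp-data-lake/curated/events/")

# Storage and compute are decoupled: this job appends new files while
# other clusters keep reading the ones already there.
clicks.write.mode("append").parquet("s3a://corp-data-lake/curated/clicks/")
```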

While these advantages are real, firms suddenly found themselves dealing with multiple data puddles that didn’t recognize other puddles within the same lake. (A data puddle is a single-purpose or single-project data mart built using big data technology.) The reason: banks went hunting for individual ‘use cases’ that would use the data lake. Instead, they should have viewed it as a one-stop shop that would deliver all their data needs over the next 10 years.


Data lakes need to have some of the old warehousing properties, like:

– Data quality and schema enforcement

– Support for ACID transactions (atomic, consistent, isolated & durable)

– Indexing for enhanced accessibility

The ‘Dior moment’ is to bring back these old-fashioned concepts while retaining the functionality of the cloud. A lot of new-age firms are now mastering the concept of a ‘lakehouse’, which embeds the disciplines of warehousing in a data lake. Visualization, analytics, and even application data can be powered up and enhanced with a lakehouse. In fact, it can bring a lot of reliability to machine learning algos as well.
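Here is a minimal sketch of those warehouse disciplines running on a lake, using Delta Lake, one popular open lakehouse format (the ‘bricks’ side of the story; the ‘flakes’ side reaches similar goals by a different route). It assumes a Spark session with the Delta libraries on the classpath; the bucket and paths are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "s3a://corp-data-lake/curated/clicks_delta/"

# ACID write: the append either fully commits to Delta's transaction log or not at all.
df = spark.createDataFrame([(42, "/home")], ["user_id", "page"])
df.write.format("delta").mode("append").save(path)

# Schema enforcement: a write whose schema doesn't match the table is rejected,
# bringing the old warehouse discipline back on top of cheap lake storage.
bad = spark.createDataFrame([("oops", "/home", "extra")],
                            ["user_id", "page", "surprise_column"])
try:
    bad.write.format("delta").mode("append").save(path)
except Exception as err:
    print("Rejected by schema enforcement:", err)

# Readers always see a consistent snapshot, even while writers commit.
spark.read.format("delta").load(path).show()
```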

There are a million ways of positioning this concept, as flakes or as bricks. But the fact is that we cannot throw away some of the core concepts pioneered with the introduction of data warehouses. The cloud is the future, for sure, but some of the old-fashioned features will return to rule!