Data Democratization: Stage 1, Drain the (data) swamp
Updated: Apr 12, 2021
The proliferation of applications (legacy and cloud), IoT devices, databases, and unstructured files (images, videos, etc.) has created an enormous gap between data creation and the ability to extract value from that data. In response, enterprises started storing data on low-cost data lake storage. But without a proper way to catalog, clean, and prepare such data, it is impossible to get value out of that storage. Some enterprises hire large teams to curate these vast piles of data using various data governance tools, but those efforts continue to fall short. Thus the swamp of structured and unstructured data keeps growing, and the result is a huge execution gap between data creation and data analysis.
There need to be automated and intelligent ways to label, clean, and curate data, and thereby drain the data swamp. Data cleansing and governance tools should offer rule-based artificial intelligence techniques for data cleansing and curation. The good news is that with the advent of cloud technologies, access to sophisticated machine learning algorithms is getting easier and requires very little upfront investment. The enterprise data governance team can experiment with various machine learning techniques (supervised or unsupervised learning) to bridge the execution gap and drain the data swamp.
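To make the idea of automated labeling and cleansing a little more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a description of any particular governance tool: the rule set `RULES`, the function names `label_column` and `flag_outliers`, and the thresholds are all hypothetical. The first function applies simple rules to guess what a column contains; the second uses a median-based statistical check as a lightweight stand-in for an unsupervised technique that flags suspicious values without any labeled training data.

```python
import re
import statistics

# Hypothetical rule set: each label maps to a pattern a value must fully match.
RULES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "number": re.compile(r"-?\d+(\.\d+)?"),
}


def label_column(values, threshold=0.8):
    """Return the first label whose rule matches at least `threshold`
    of the non-empty values, or "unknown" if no rule qualifies."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "unknown"
    for label, pattern in RULES.items():
        hits = sum(1 for v in cleaned if pattern.fullmatch(v))
        if hits / len(cleaned) >= threshold:
            return label
    return "unknown"


def flag_outliers(numbers, cutoff=3.5):
    """Flag values whose modified z-score, computed from the median
    absolute deviation (MAD), exceeds `cutoff`. No labels are needed."""
    med = statistics.median(numbers)
    mad = statistics.median(abs(x - med) for x in numbers)
    if mad == 0:
        return []  # all values (nearly) identical; nothing to flag
    return [x for x in numbers if 0.6745 * abs(x - med) / mad > cutoff]


if __name__ == "__main__":
    print(label_column(["a@b.com", "c@d.org", ""]))
    print(flag_outliers([10, 11, 9, 10, 12, 500]))
```

In practice, a governance team would likely replace the hand-written rules with a trained classifier and the median-based check with a richer anomaly detection model, but the overall shape stays the same: label first, then flag what looks wrong for human review.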