SELF-SERVICE FOR THE HADOOP DATA LAKE

Organizations are deploying Hadoop data lakes to provide unprecedented access to data for data science and analytics. However, unlike the data warehouse, Hadoop isn’t “pasteurized.” Getting data into Hadoop is easy, but getting it out in a way that is easily and securely consumed is hard.

The question is how Big Data IT can enable enterprise self-service when the very advantages of Hadoop (frictionless ingest, flexible schema-on-read, and the absence of enforced data governance) make self-service an almost insurmountable challenge. Three challenges arise when moving from small Hadoop projects to the data lake: finding the right data in the cluster, understanding the data, and governing the data.

This paper offers a solution to the challenge of self-service for Hadoop data.
