Data Lake vs. Data Warehouse - what strategy is working at your organization?

1.5k views, 2 Upvotes, 11 Comments

Director of IT in Software, 201 - 500 employees
Data Warehouse has been working for us over the years. Data Lake is something that I am interested in implementing down the road.
Chief Information Technology Officer in Manufacturing, 501 - 1,000 employees
Data warehousing.
CIO in Education, 1,001 - 5,000 employees
Yes. We are successfully using both.
2 2 Replies
VP of IT Business Systems in Software, 1,001 - 5,000 employees

Howard, we are also planning to leverage both, as there is a lot of self-service reporting as well. We should connect sometime on this.

CIO in Education, 1,001 - 5,000 employees

Sounds good.

Chief Information Officer in Software, 11 - 50 employees
We utilize both, but our preference is Data Lake.

As with all solutions, it depends upon how they are being used, what problem domains you are attempting to solve, the competency of the staff using the technologies, and the overall cost-to-benefit ratio.
2 5 Replies
VP of IT Business Systems in Software, 1,001 - 5,000 employees

The problem domain is business apps, including CRM, ERP, etc. We have a data lake as well today, but it lacks governance and structure. We are finding that the self-service model is working, but users have to spend a lot of time massaging the data in order to report.

Which solution are you leveraging today?

Chief Information Officer in Software, 11 - 50 employees

I would not even try to make a recommendation without a good working knowledge of your platform and/or problem domains. It all depends upon the size and scale of the Data Lake you are envisioning, how it is to be used, and what you are attempting to accomplish. 

As in everything else, one size does not fit all, and I have no idea about the size, scope, and overall CRM/ERP solutions you are looking to attack or address, not to mention the overall budget. :)

I guess my main question is: are these large enterprise CRM and ERP solutions like PeopleSoft or SAP, or are they smaller CRM and ERP solutions like Dynamics 365, ePROMIS, Salesforce, etc.? Are they currently cloud-based, or do they reside on legacy hardware in a true HA or non-HA (hot or passive standby) environment?

In the past, for large-scale Fortune 500-1000 type enterprise solutions, we have used both Databricks- and Cloudera-based solutions. We also have some experience with IBM Data Lake for SAP and PeopleSoft migrations.

If you have already employed some type of earlier Hadoop repository, Databricks has some pretty good migration and management tools. Cloudera began as a managed Hadoop offering, so they are pretty good at migrating earlier Hadoop and Spark stacks. If you are using a cloud provider like AWS, Azure, or Google, you might want to spec and/or compare their Data Lake pricing against the other major Data Lake solutions (two of which are mentioned above).

For smaller Data Lakes (like where I am now), we base it on a more "roll your own" solution. We are not ready to scale to a traditional Hadoop-type repo, so we are primarily basing our solution on DataStax Cassandra. We are not handling copious amounts of unstructured data; our primary data (outside of corporate and MDM-type data) is IoT-based, using message-based protocols and communication layers between our K8s-based microservices. These are hosted on our private clouds (sister corporation).

There are some litigious-type requirements around this data, but our primary objective for a Data Lake is reporting, analytics, and predictive analysis on the IoT data we are collecting across several business verticals (with some overlap and cross-cutting business domains).

Our data warehouse solution handles most of our internal business apps, which are relatively small, but it saves us money on compliance and retention requirements and provides cost and performance improvements through archival opportunities. We have to transfer part of this corp data into our IoT core solutions based upon contractual agreements, billing, and SLAs, but not for much else.

Best of luck in finding the right solution. Shop around, and if you are selecting a vendor, I strongly suggest having your senior tech staff get on a couple of calls with the vendors and let them walk you through demos of the environment(s) under consideration (preferably a demo that is as close as possible to your problem domain).

Ask the Vendors for architectural blueprints, overall tech stacks, and of course business success stories. If you have the latitude and budget for a prelim PoC (after selecting a Vendor), I highly recommend it. 

I sincerely hope this helps and does not further muddy the waters. :)

If you are using AWS today, a good solution to augment with Snowflake.

Of course

Chief Information Officer in Software, 11 - 50 employees

Wish there was some way to edit content. Skip those last two lines; they are not supposed to be part of my reply.
