Are data lakes a good way to manage big data?

4.1k views, 3 upvotes, 11 comments
VP Information Technology in Consumer Goods, 4 years ago

A principle of innovation is to avoid the next logical step; that is incrementalism. I think we should avoid data lakes or other designs that create yet another copy of the data and another way to spend money on storing it. We should look to leverage AI and ML to use data in place, placing only the data needed where it is needed, when it is needed. Consider ALL the data at all sites as a global data environment that you can use and analyze without making copies.

Director of IT in Software, 4 years ago

It depends on the size, source, and format of the data. Data lakes are great for large volumes of unstructured data coming from multiple sources in various formats. A data lake can ingest a large amount of data quickly and adjust readily to the data generation stream. If you have such a use case, a data lake will be a good option for you. The important thing here is that a data lake just aggregates large amounts of data from multiple sources into a cost-effective repository; it does not natively give you any intelligence from the data. You need to apply data science to get something useful out of it.

CISO in Software, 4 years ago

Data lakes are critical for XDR and SIEM to perform current and historical correlation and analysis.

Vice President - Global Head of Emerging Technologies & Digital Innovation, 4 years ago

The data lake is just a beginning toward modern data management. The important thing is how you curate your lake to make it fit for data analysis. Every modern data management initiative needs to meet the demands of three groups of users: data analysts, data scientists, and business users. The objective is to make the data usable and available, with proper segmentation and security.

Director of Technology in Government, 4 years ago

Data lakes can be very beneficial for entities that need to increase operational efficiency and innovation, but if there is no oversight or no clear purpose for their contents, they can drive up compute costs, add complexity, and cause a loss of data integrity.

No title, 4 years ago

Agreed. You must have a viable, flexible architecture and a plan covering short-, mid-, and long-term operations and growth, coupled with the various data types, categorizations, and cross-functional operations involved in data collection, persistence, access, and retention.
