Are data lakes a good way to manage big data?
It depends on the size, source, and format of the data. Data lakes are great for large volumes of unstructured data coming from multiple sources in various formats. A lake can ingest data quickly and scale with the rate at which the data is generated. If that describes your use case, a data lake will be a good option. The important caveat is that a data lake only aggregates large amounts of data from multiple sources into a cost-effective repository; it provides no intelligence from the data natively, so you need to apply data science to get something useful out of it. A rough sketch of the "ingest raw, analyze later" pattern follows below.
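For illustration only: this minimal sketch shows what raw ingestion can look like, using a local directory as a stand-in for an object store; the file names, source labels, and partition layout are assumptions, not a prescription.

```python
# Minimal "landing zone" sketch: land files as-is (schema-on-read),
# partitioned by source system and ingestion date so they can be
# found and reprocessed later. Paths and names are illustrative.
import shutil
from datetime import date
from pathlib import Path

LAKE_ROOT = Path("datalake/raw")  # local stand-in for e.g. s3://my-lake/raw

def ingest(file_path: str, source: str) -> Path:
    """Copy a file into the raw zone without parsing or transforming it."""
    dest_dir = LAKE_ROOT / f"source={source}" / f"ingest_date={date.today().isoformat()}"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(file_path).name
    shutil.copy2(file_path, dest)  # raw bytes untouched; schema is applied on read
    return dest

# Usage (hypothetical files):
#   ingest("clickstream-2024-05-01.json", source="web")
#   ingest("orders_export.csv", source="erp")
```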
Data lakes are also critical for XDR and SIEM tooling, which needs to correlate and analyze both current and historical event data.
The data lake is just the beginning of modern data management. What matters is how you curate your lake so that it is fit for analysis. Every modern data management initiative needs to serve three groups of users: data analysts, data scientists, and business users. The objective is to make the data usable and available, with proper segmentation and security (see the sketch below).
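As a rough illustration of what "curating" can mean, one common step is promoting raw files into a typed, partitioned, access-controlled zone that analysts and BI tools query directly; the column names, partition key, and dropped field below are assumptions for the example.

```python
# Curation sketch: raw CSV -> typed, partitioned Parquet in a "curated" zone.
import pandas as pd

RAW = "datalake/raw/source=erp/ingest_date=2024-05-01/orders_export.csv"
CURATED = "datalake/curated/orders"  # analysts and BI tools point here

df = pd.read_csv(RAW, parse_dates=["order_date"])
df["order_month"] = df["order_date"].dt.to_period("M").astype(str)

# Segmentation/security: strip fields business users should not see.
df = df.drop(columns=["customer_email"], errors="ignore")

# Typed, columnar, partitioned output: cheap to scan and easier to govern.
df.to_parquet(CURATED, partition_cols=["order_month"], index=False)
```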
Data lakes can be very beneficial for organizations that need to increase operational efficiency and innovation, but without oversight or a clear purpose for the contents they can drive up compute costs, add complexity, and erode data integrity.
Agreed. You need a viable and flexible architecture, and a plan for short-, mid-, and long-term operations and growth, that accounts for the various data types, categorization, and cross-functional operations across the collection, persistence, access, and retention process.
A principle of innovation is to avoid the next logical step; that is incrementalism. I think we should avoid data lakes and other designs that create yet another copy of data and another way to spend money on storage. We should instead leverage AI and ML to use data in place, moving only the data that is needed, where and when it is needed. Consider all the data at all sites a global data environment that you can use and analyze without making copies. The sketch below shows one way to query data where it already lives.
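For example, engines such as DuckDB, Trino, or Athena can query files where they already sit; in this sketch the bucket path and column name are hypothetical, and nothing is copied into a separate store.

```python
# Sketch of analyzing data in place instead of copying it into another repository.
import duckdb

con = duckdb.connect()           # in-memory; nothing is loaded until queried
con.execute("INSTALL httpfs")    # extension that lets DuckDB read s3:// paths
con.execute("LOAD httpfs")

# Aggregate directly over Parquet files in an existing application bucket
# (path and 'region' column are made up for illustration).
result = con.execute("""
    SELECT region, count(*) AS events
    FROM read_parquet('s3://existing-app-bucket/events/*/*.parquet')
    GROUP BY region
""").fetchdf()
print(result)
```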