BIG DATA ENTERPRISE SOLUTION
A Data Lakehouse combining the benefits of a structured Data Warehouse with the power and flexibility of a cloud Data Lake
Provide a modern cloud analytics platform in Microsoft Azure that serves multiple purposes: a reporting platform in the traditional sense (Data Warehouse) combined with a flexible and scalable solution (Data Lake), built on the Databricks platform.
The main goals of the project are:
- Create a central Data Lake, providing low-cost storage for archiving structured, semi-structured and unstructured data
- Decouple storage and compute, thus enabling the storage of much higher data volumes while increasing scalability
- Facilitate much faster and easier data reloads or one-time transformations by using highly scalable compute clusters
- Support not only daily loads, but also hourly or even near-real-time data updates
- Prepare a solution which is ready for future data science scenarios
Adopt the latest trend for enterprise reporting solutions by using “big data” technology, which can store and process much larger data volumes with much better performance than is possible with traditional data warehouses. At the same time, avoid the lack of governance and structure that is inherent in pure data lake approaches.
Our client benefits in the following ways:
- Get all the benefits of a Data Warehouse, plus additional benefits:
- Compute can be scaled up and down massively within minutes
- Infrastructure cost is controlled by not paying while compute is idle
- Virtually unlimited data history thanks to the very low cost of storage
- Very easy and quick to integrate new source systems
- Easy to build data science solutions using the same technology (Databricks) and directly from the Data Lakehouse
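To illustrate the cost-control benefit listed above, the following minimal sketch compares a compute cluster that runs around the clock with one that is billed only for the hours it is actually used. The hourly rate and the daily usage hours are hypothetical figures chosen for the arithmetic, not values from the project or from any Azure price list.

```python
def monthly_compute_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Return the cost of a cluster billed only for the hours it runs."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate * hours_per_day * days


# Hypothetical figures, for illustration only.
HOURLY_RATE = 10.0      # assumed cost per cluster hour, arbitrary currency units
USED_HOURS_PER_DAY = 3  # assumed daily load window

always_on = monthly_compute_cost(HOURLY_RATE, 24)
on_demand = monthly_compute_cost(HOURLY_RATE, USED_HOURS_PER_DAY)
savings = 1 - on_demand / always_on

print(f"Always-on cluster: {always_on:.2f}")
print(f"On-demand cluster: {on_demand:.2f}")
print(f"Relative saving:   {savings:.1%}")
```

Under these assumed numbers the compute bill scales with usage rather than with calendar time. Storage is billed separately, so the data remains available while the clusters are stopped.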
APPROACH AND EXPERTISE (conclusions)
IPS is involved in most phases and aspects of the project: from consultancy and architecture, through implementation of the framework and of project-related tasks, to testing, system maintenance and handling of support questions.
IPS partners with architects from Microsoft and Databricks for architecture and consultation on this project, with the client involved as well. This cooperation takes place both remotely and in person, with occasional events organized at the IPS premises in Prague, Czech Republic.
IPS has implemented the following technology stack throughout the project on behalf of the customer:
- Azure Data Lake Storage (ADLS Gen2)
- Azure Databricks (ADB)
- Power BI Premium Capacity (P2)
- SharePoint, Excel
- Visual Studio, Azure DevOps, Git, PowerShell
- REST API