Managing Data in the Lakehouse
Introduction to Data Lakehouse
“Data Lakehouse” is a new architecture paradigm in the data management space that combines the best characteristics of data warehouses and data lakes. Once you load data into a data lake, there is no need to load it again into a warehouse for additional analysis or business intelligence. You can directly query the data where it resides, in cheaper but highly reliable storage often called “object stores”, which reduces the operational overhead of data pipelines.
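As a rough illustration of querying data in place, the sketch below lands a few JSON Lines files in a local temporary directory that stands in for an object store, then aggregates over those files directly without copying them into a separate warehouse. The directory layout, the file names, and the `write_objects` and `total_by_region` helpers are all hypothetical; a real lakehouse would use a query engine and a columnar format rather than plain Python.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_objects(root: Path) -> None:
    """Land raw files in the 'lake' (a local directory standing in for an object store)."""
    batches = {
        "sales/part-0001.jsonl": [
            {"region": "EMEA", "amount": 120.0},
            {"region": "APAC", "amount": 80.0},
        ],
        "sales/part-0002.jsonl": [
            {"region": "EMEA", "amount": 30.0},
        ],
    }
    for key, rows in batches.items():
        path = root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("\n".join(json.dumps(row) for row in rows))


def total_by_region(root: Path, prefix: str) -> dict:
    """Aggregate over the files where they sit; nothing is copied elsewhere."""
    totals = defaultdict(float)
    for path in sorted((root / prefix).glob("*.jsonl")):
        for line in path.read_text().splitlines():
            row = json.loads(line)
            totals[row["region"]] += row["amount"]
    return dict(totals)


def demo() -> dict:
    """Write the sample objects and query them in place."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_objects(root)
        return total_by_region(root, "sales")
```

Calling `demo()` returns the total amount per region, computed straight from the landed files.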
Key Features of a Data Lakehouse
At a high level, a Data Lakehouse has the following characteristics:
- Transaction Support
- Schema enforcement and governance
- Support for BI tools
- Storage decoupled from compute
- Support for the latest storage formats
- Support for API access
- Support for both structured and unstructured data
- Support for streaming data
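Two of the characteristics above, transaction support and schema enforcement, can be sketched together in a few lines. This is a toy in-memory model under stated assumptions, not how any particular lakehouse format implements them: the `SCHEMA` mapping, the `Table` class, and its `commit` method are hypothetical names chosen for illustration. A batch is validated in full before it is published, so a bad row rejects the whole batch and readers never see a partial write.

```python
from dataclasses import dataclass, field

# Hypothetical table schema: column name -> required Python type.
SCHEMA = {"order_id": int, "region": str, "amount": float}


class SchemaError(ValueError):
    """Raised when a row does not match the table schema."""


def validate(row: dict) -> None:
    """Reject rows with missing, extra, or wrongly typed columns."""
    if set(row) != set(SCHEMA):
        raise SchemaError(f"columns {sorted(row)} do not match schema {sorted(SCHEMA)}")
    for column, expected in SCHEMA.items():
        if not isinstance(row[column], expected):
            raise SchemaError(f"{column!r} must be {expected.__name__}")


@dataclass
class Table:
    committed: list = field(default_factory=list)  # one entry per committed batch

    def commit(self, batch: list) -> int:
        """Validate the whole batch first, then publish it in a single step."""
        for row in batch:
            validate(row)
        self.committed.append(list(batch))
        return len(self.committed)  # the new table version

    def rows(self) -> list:
        """Readers only ever see rows from fully committed batches."""
        return [row for batch in self.committed for row in batch]


table = Table()
table.commit([{"order_id": 1, "region": "EMEA", "amount": 10.0}])

rejected = None
try:
    # The second row has a string order_id, so the entire batch is rejected.
    table.commit([
        {"order_id": 2, "region": "APAC", "amount": 5.0},
        {"order_id": "3", "region": "APAC", "amount": 1.0},
    ])
except SchemaError as exc:
    rejected = str(exc)
```

After this runs, the table still holds only the first committed row, because the valid row in the rejected batch was never published.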
Challenges in the Current Approach
Today, a growing number of companies are building new data warehouses or data lakes in the cloud, or consolidating and modernizing their on-premises data warehouses and data lakes to run there. However, many do not see the fast time to value they expect from their investment. This is usually attributed to a lack of proper tooling for data integration, data quality, data governance, and metadata management, or to a reliance on hand coding to meet those requirements.
Hand-coded data management projects typically start with the tools bundled with the cloud provider’s platform-as-a-service (PaaS) or infrastructure-as-a-service (IaaS) offering. Hand coding may be suitable for prototyping and training, but the result is difficult to maintain and not reusable. If you change or upgrade the technology, platform, or processing engine, you have to re-engineer and recode everything, which makes it expensive and risky.
Using multiple products that are not integrated with one another to cover the whole of data management is often risky and complex. Similarly, relying on the limited solutions offered by cloud vendors has its downside, as they tend to be basic. Cloud data management requires a multi-cloud strategy and deployment model.
The Solution: Informatica Cloud Lakehouse Data Management
Informatica Cloud Lakehouse Data Management is the industry’s only enterprise-class, cloud-native, end-to-end data management solution for lakehouses, as well as data warehouses and data lakes, in the cloud.
Built on Informatica Intelligent Cloud Services (IICS), the industry’s most advanced enterprise iPaaS (Integration Platform as a Service), the Informatica Cloud Lakehouse Data Management Solution combines best-of-breed data integration, data quality, and metadata management.
IICS is a modern, modular, multi-cloud, microservices-based, API-driven, AI-powered iPaaS. It supports all the leading cloud platforms (Amazon, Microsoft, Snowflake, Databricks, and Google).
The solution encompasses three main pillars:
- Data Integration
  - Cloud Data Integration
  - Cloud Data Integration Elastic
  - Cloud Mass Ingestion
- Data Quality
  - Cloud Data Quality
- Metadata Management
  - Enterprise Data Catalog
IICS Cloud Data Integration provides pre-built, cloud-native connectivity to virtually any type of data, whether multi-cloud or on-premises, so you can rapidly ingest and integrate all types of data.
Cloud Data Integration Elastic provides serverless Spark processing for increased scalability and capacity on demand.
IICS Cloud Mass Ingestion enables you to ingest data at scale from a variety of sources, including files, databases, change data capture, and streaming of real-time data.
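To make the change data capture idea concrete, here is a minimal sketch of replaying ordered change events against a target keyed by primary key. It is not Informatica's API: the event shape (`op`, `key`, `data`) and the `apply_changes` function are assumptions made purely for illustration.

```python
def apply_changes(target: dict, events: list) -> dict:
    """Replay ordered change events against a target keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Merge the changed columns over whatever is already stored.
            target[key] = {**target.get(key, {}), **event["data"]}
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


# Hypothetical change stream captured from a source 'customers' table.
events = [
    {"op": "insert", "key": 1, "data": {"name": "Ada", "city": "London"}},
    {"op": "insert", "key": 2, "data": {"name": "Grace", "city": "New York"}},
    {"op": "update", "key": 1, "data": {"city": "Paris"}},
    {"op": "delete", "key": 2},
]

customers = apply_changes({}, events)
```

Because only the changes are shipped, the target stays in sync with the source without a full reload; the order of events matters, which is why they are applied strictly in sequence here.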
IICS Cloud Data Quality provides cloud-native capabilities so you can take a holistic approach to your data quality needs, ensuring that your data warehouse holds data that is cleansed, standardized, trusted, and secure.
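What “cleansed” and “standardized” mean in practice can be sketched with a couple of simple rules. The field names, the `standardize` and `is_valid` functions, and the validation thresholds below are hypothetical and far simpler than the rules a data quality product would apply.

```python
import re


def standardize(record: dict) -> dict:
    """Apply simple cleansing rules: trim, normalize case, strip phone punctuation."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "phone": re.sub(r"\D", "", record["phone"]),  # keep digits only
    }


def is_valid(record: dict) -> bool:
    """Very rough validity check on an already standardized record."""
    email_ok = re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record["email"]) is not None
    phone_ok = len(record["phone"]) >= 7  # assumed minimum length
    return email_ok and phone_ok


raw = {"name": "  ada   lovelace ", "email": " Ada@Example.COM ", "phone": "+44 (20) 7946-0958"}
clean = standardize(raw)
```

Running standardization before validation means the checks operate on a single canonical form of each field rather than on every variant that appears in the source data.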
Your data management starts with finding and cataloging all your data, and Informatica Enterprise Data Catalog does this for your data assets and their relationships. It enables intelligent, automated, end-to-end visibility and data lineage across your environment.
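Data lineage can be thought of as a directed graph from source datasets to the datasets derived from them. The sketch below walks such a graph breadth-first to find everything downstream of a given dataset; the `LINEAGE` mapping, the dataset names, and the `downstream` function are hypothetical and do not reflect how Enterprise Data Catalog stores or exposes lineage.

```python
from collections import deque

# Hypothetical lineage: dataset -> datasets derived directly from it.
LINEAGE = {
    "crm.customers": ["lake.customers_raw"],
    "lake.customers_raw": ["lake.customers_clean"],
    "lake.customers_clean": ["bi.churn_dashboard", "ml.churn_features"],
}


def downstream(dataset: str, lineage: dict) -> list:
    """Return every dataset reachable from `dataset`, in breadth-first order."""
    seen = {dataset}
    queue = deque([dataset])
    order = []
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream("lake.customers_raw", LINEAGE)
```

A traversal like this answers the typical impact question: if a source dataset changes, which reports and models depend on it.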