Managing Data in the Lakehouse

· Big Data Zone ·

Introduction to Data Lakehouse

“Data Lakehouse” is a new architecture paradigm in the data management space that combines the best characteristics of data warehouses and data lakes. Once you load data into a data lake, there is no need to load it into a warehouse again for additional analysis or business intelligence. You can directly query the data where it resides, in cheaper but highly reliable storage often called an “object store”, which reduces the operational overhead on data pipelines.

Key Features of a Data Lakehouse

At a high level, a data lakehouse has the following characteristics:

  • Transaction Support
  • Schema enforcement and Governance
  • Support for BI tools
  • Storage decoupled from compute
  • Support for the latest storage formats
  • Support for API access
  • Support for both structured and unstructured data
  • Support for streaming data
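
Two of these features, schema enforcement and transaction support, can be illustrated with a toy in-memory table. This is only a minimal sketch and not how any particular lakehouse engine implements them; the `Table` class, its `append_batch` method, and the sample `orders` schema are hypothetical names chosen for illustration.

```python
from typing import Any


class SchemaError(ValueError):
    """Raised when a row does not match the table schema."""


class Table:
    """Toy table: a hypothetical stand-in for a lakehouse table."""

    def __init__(self, schema: dict[str, type]) -> None:
        self.schema = schema
        self.rows: list[dict[str, Any]] = []

    def _validate(self, row: dict[str, Any]) -> None:
        # Schema enforcement: column names and value types must match.
        if set(row) != set(self.schema):
            raise SchemaError(f"columns {sorted(row)} do not match the schema")
        for column, expected in self.schema.items():
            if not isinstance(row[column], expected):
                raise SchemaError(f"{column!r} expected {expected.__name__}")

    def append_batch(self, batch: list[dict[str, Any]]) -> None:
        # Validate every row before writing, so a batch commits
        # all-or-nothing (a very rough analogue of a transaction).
        for row in batch:
            self._validate(row)
        self.rows.extend(batch)


orders = Table({"order_id": int, "amount": float})
orders.append_batch([{"order_id": 1, "amount": 9.5}])

try:
    # The second row has the wrong type, so the whole batch is rejected.
    orders.append_batch([
        {"order_id": 2, "amount": 20.0},
        {"order_id": 3, "amount": "bad"},
    ])
except SchemaError:
    pass
```

After the failed batch, `orders.rows` still holds only the first committed row, which is the behavior readers of the table would rely on.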

Challenges in the Current Approach

Today, a growing number of companies are building new data warehouses or data lakes in the cloud, or consolidating and modernizing their on-premises data warehouses and data lakes to run there. However, many do not see a fast time to value from their investment. This is usually attributed to a lack of proper toolsets for data integration, data quality, data governance, and metadata management, or to a reliance on hand coding for those requirements.

Hand-coded data management projects typically start with the tools bundled with the cloud provider’s platform-as-a-service (PaaS) or infrastructure-as-a-service (IaaS) offering. Hand coding may be suitable for prototyping and training, but the result is difficult to maintain and not reusable. If you change or upgrade the technology, platform, or processing engine, you have to re-engineer and recode everything, which makes it expensive and risky.

Using multiple products that are not integrated with one another to cover the full scope of data management is often risky and complex. Similarly, relying on the limited solutions offered by cloud vendors has its downside, as those tools are basic. Cloud data management requires a multi-cloud strategy and deployment model.

The Solution: Informatica Cloud Lakehouse Data Management

Informatica Cloud Lakehouse Data Management is the industry’s only enterprise-class, cloud-native, end-to-end data management solution for lakehouses (as well as data warehouses and data lakes) in the cloud.

Built on Informatica Intelligent Cloud Services (IICS), the industry’s most advanced enterprise iPaaS (Integration Platform as a Service), the Informatica Cloud Lakehouse Data Management Solution combines best-of-breed data integration, data quality, and metadata management.

IICS is a modern, modular, multi-cloud, microservices-based, API-driven, AI-powered integration platform as a service (iPaaS). IICS supports all the leading cloud platforms (Amazon, Microsoft, Snowflake, Databricks, and Google).

The solution encompasses three main pillars:

  • Data Integration
    1. Cloud Data Integration
    2. Cloud Data Integration Elastic
    3. Cloud Mass Ingestion
  • Data Quality
    1. Cloud Data Quality
  • Metadata Management
    1. Enterprise Data Catalog

Data Integration

IICS Cloud Data Integration provides pre-built, cloud-native connectivity to virtually any type of data, whether multi-cloud or on-premises, so you can rapidly ingest and integrate all types of data.

Cloud Data Integration Elastic provides serverless Spark processing for increased scalability and capacity on demand.

IICS Cloud Mass Ingestion enables you to ingest data at scale from a variety of sources, including files, databases, change data capture, and streaming of real-time data.
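
To make the change data capture idea concrete, here is a small sketch of applying an ordered stream of change events to a target table keyed by primary key. The event shape (`op`, `key`, `row`) and the `apply_changes` function are assumptions made for illustration; they are not the format or API used by Cloud Mass Ingestion.

```python
from typing import Any

Row = dict[str, Any]


def apply_changes(target: dict[int, Row], events: list[dict[str, Any]]) -> dict[int, Row]:
    """Apply insert/update/delete events, in order, to a copy of target."""
    result = {key: dict(row) for key, row in target.items()}
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: merge the new column values over any existing row.
            result[key] = {**result.get(key, {}), **event["row"]}
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


# Hypothetical source changes captured since the last sync.
customers = {1: {"name": "Ada", "city": "London"}}
events = [
    {"op": "insert", "key": 2, "row": {"name": "Lin", "city": "Oslo"}},
    {"op": "update", "key": 1, "row": {"city": "Paris"}},
    {"op": "delete", "key": 2},
]
synced = apply_changes(customers, events)
```

Only the changed rows travel from source to target, which is why change data capture tends to scale better than repeatedly reloading full tables.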

Data Quality

IICS Cloud Data Quality provides cloud-native capabilities so you can take a holistic approach to your data quality needs, ensuring that your data warehouse holds data that is cleansed, standardized, trusted, and secure.
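
As a rough illustration of what "cleansed" and "standardized" can mean in practice, the sketch below normalizes a few contact fields and then filters out records that fail basic checks. The field names, the rules, and the `standardize_record` and `is_valid` helpers are all hypothetical; real data quality rules would be considerably more thorough, and nothing here reflects the Cloud Data Quality API.

```python
def standardize_record(record: dict[str, str]) -> dict[str, str]:
    """Collapse whitespace, normalize case, and keep only digits in the phone."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
    }


def is_valid(record: dict[str, str]) -> bool:
    """Very rough checks, for illustration only."""
    return (
        bool(record["name"])
        and "@" in record["email"]
        and len(record["phone"]) >= 7
    )


# Hypothetical raw input with inconsistent formatting.
raw = [
    {"name": "  ada   lovelace ", "email": " Ada@Example.COM ", "phone": "(555) 010-9999"},
    {"name": "", "email": "not-an-email", "phone": "12"},
]
cleaned = [standardize_record(r) for r in raw]
trusted = [r for r in cleaned if is_valid(r)]
```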

Metadata Management

Data management starts with finding and cataloging all your data, and Informatica Enterprise Data Catalog does this for your data assets and their relationships. It enables intelligent, automated, end-to-end visibility and data lineage across your environment.
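
Data lineage is essentially a directed graph of assets, and a common question to ask of it is "what is affected downstream if this source changes?". The sketch below answers that with a breadth-first traversal. The asset names, the `lineage` mapping, and the `downstream` function are invented for illustration and do not represent how Enterprise Data Catalog stores or exposes lineage.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets derived from it.
lineage = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["lakehouse.dim_customer"],
    "lakehouse.dim_customer": ["bi.revenue_dashboard"],
    "erp.orders": ["lakehouse.fact_orders"],
    "lakehouse.fact_orders": ["bi.revenue_dashboard"],
}


def downstream(asset: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream("crm.customers", lineage)
```

Here `impacted` lists the staging table, the dimension table, and the dashboard that all depend on `crm.customers`.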

Opinions expressed by DZone contributors are their own.