How To Increase Data Quality With Memphis Schemaverse

In this article, you will learn about Memphis Schemaverse and discover how using it can increase your data quality.

By Idan Asulin · Feb. 24, 23 · Tutorial

What Is Memphis?

Memphis{dev} is an open-source real-time data processing platform that provides end-to-end support for in-app streaming use cases using the Memphis distributed message broker.

“It took me three minutes to do in Memphis what took me a week and a half in Kafka.”
Memphis{dev} enables building next-generation applications that require large volumes of streamed and enriched data, modern protocols, zero ops, rapid development, extreme cost reduction, and significantly less development time for data-oriented developers and data engineers.

Four Critical Pillars:

  • Performance – Enhanced cache usage
  • Resiliency – Never lose a message, with 99.95% uptime
  • Observability – Out-of-the-box observability that makes sense and reduces troubleshooting time
  • Developer Experience – Modularity, inline processing, schema management, and GitOps. Built by developers for everyone, not just developers.

Memphis’ Key Features (Current and Future):

  • A distributed cloud-native message broker
  • Written in Golang
  • Low-code UI and CLI for engineers, ops, and product teams
  • Out-of-the-box monitoring and notification center
  • Dead-letter queue (DLQ)
  • Real-time message tracing
  • Schema management, validation, and transformation
  • Connector framework built by the Memphis team and the community *
  • Serverless stream enrichment *
  • Enhanced security and auditing
  • Object tagging

Data Contracts

In a federated data platform, in which responsibilities are distributed between stakeholders, teams, and sources, it is harder to control and establish a single standard. This is where the concept of data contracts comes into play. Why do data contracts matter? Because (a) they provide insight into who owns which data products, and (b) they support setting standards and managing your data pipelines with confidence. They also provide crucial information on what data is being consumed, by whom, and for what purpose. Bottom line: data contracts are essential for robust data management.

The most basic building block for controlling and ensuring the quality of the data that flows through your organization between its different owners is defining well-written schemas and data models.

Why?

  • Data pipelines are constantly breaking and creating data quality AND usability issues.
  • There is a communication chasm between service implementers, data engineers, and data consumers.
  • There are multiple approaches to solving these issues, and data engineers are still very much pioneers in exploring the frontier of future best practices.

Formats

A quick overview of the most popular formats.

  • Protobuf
    The modern. Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.
  • Avro
    The popular. Apache Avro™ is the leading serialization format for record data and the first choice for streaming data pipelines. It offers excellent schema evolution.
  • JSON
    The simplest. JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.
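
For illustration, here is a minimal sketch of a JSON Schema for a simple record; the fname, lname, and id fields are only assumptions that mirror the payload used in the code examples later in this article.

JSON
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "fname": { "type": "string" },
        "lname": { "type": "string" },
        "id": { "type": "integer" }
    },
    "required": ["fname", "lname", "id"]
}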

Serialization Process

Serialization converts a data object — a combination of code and data — into a series of bytes that saves the object’s state in an easily transmittable form. The opposite process is called deserialization.

Serialization is essentially a function, with each data format providing its own implementation. If the structure or content of a given record does not match the schema defined in the .proto/.avro/.json file, serialization fails, and the message is therefore not sent to the broker.

That process provides the message's first layer of validation before it reaches the broker itself, using the client-side cache to store the schema locally.
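
The sketch below illustrates that validation step with the protobufjs library, which also appears in the consumer example later in this article; the Test message and its fields are illustrative assumptions, not part of Memphis itself.

JavaScript
const protobuf = require("protobufjs");

// Parse an inline Protobuf schema (protobuf.load("schema.proto") works the same way for a file).
const root = protobuf.parse(`
    syntax = "proto3";
    message Test {
        string fname = 1;
        string lname = 2;
        int32 id = 3;
    }
`).root;
const Test = root.lookupType("Test");

const payload = { fname: "AwesomeString", lname: "AwesomeString", id: 54 };

// verify() returns null when the payload matches the schema, or an error string when it does not.
const problem = Test.verify(payload);
if (problem) {
    // A mismatch here means serialization fails and the message never reaches the broker.
    throw new Error(problem);
}

// Serialization: the object becomes a byte array that can be sent over the wire...
const bytes = Test.encode(Test.create(payload)).finish();

// ...and deserialization restores the original object on the receiving side.
console.log(Test.decode(bytes));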

Memphis Schemaverse

Memphis Schemaverse provides a robust schema store and schema management layer on top of the Memphis broker, without standalone compute or dedicated resources. Through a modern UI and a programmatic approach, both technical and non-technical users can create and define different schemas, attach a schema to multiple stations, and choose whether or not the schema should be enforced.
Memphis’ low-code approach removes the serialization step, as it is embedded within the producer library.
Schemaverse supports versioning, GitOps methodologies, and schema evolution.

Also Includes:

  • Great UI and programmatic approach
  • Embedded within the broker
  • Zero-trust enforcement
  • Versioning
  • Out-of-the-box monitoring
  • Import & Export schemas
  • Low/no-code validation and serialization
  • No configuration needed
  • Native support in Python, Go, Node.js

Getting Started

Step 1: Create a new schema (currently only available through the Memphis GUI)


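
The schema body you paste into the GUI depends on your use case. As a rough sketch, a Protobuf schema matching the payload used in the examples below could look like this; the message name Test and its fields are assumptions chosen to line up with the consumer example that loads schema.proto.

ProtoBuf
syntax = "proto3";

message Test {
    string fname = 1;
    string lname = 2;
    int32 id = 3;
}
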
Step 2: Attach Schema
Head to your station and, in the top-left corner, click “+ Attach schema”.


Step 3: Code example in Node.js
Memphis abstracts the need for external serialization functions and embeds them within the SDK.

Producer (Protobuf example):

JavaScript
 
const memphis = require("memphis-dev");

(async function () {
    try {
        // Connect to the Memphis broker.
        await memphis.connect({
            host: "localhost",
            username: "root",
            connectionToken: "*****"
        });

        // Create a producer attached to the station the schema is enforced on.
        const producer = await memphis.producer({
            stationName: "marketing-partners.prod",
            producerName: "prod.1"
        });

        // The payload is a plain object; the SDK validates and serializes it
        // against the attached schema, so no protobufjs code is needed here.
        var payload = {
            fname: "AwesomeString",
            lname: "AwesomeString",
            id: 54
        };

        try {
            await producer.produce({
                message: payload
            });
        } catch (ex) {
            // Validation/serialization failures surface here, and the message is not sent.
            console.log(ex.message);
        }
    } catch (ex) {
        console.log(ex);
        memphis.close();
    }
})();


Consumer (Requires .proto file to decode messages):

JavaScript
const memphis = require("memphis-dev");
var protobuf = require("protobufjs");

(async function () {
    try {
        // Connect to the Memphis broker.
        await memphis.connect({
            host: "localhost",
            username: "root",
            connectionToken: "*****"
        });

        // Create a consumer for the station.
        const consumer = await memphis.consumer({
            stationName: "marketing",
            consumerName: "cons1",
            consumerGroup: "cg_cons1",
            maxMsgDeliveries: 3,
            maxAckTimeMs: 2000,
            genUniqueSuffix: true
        });

        // Load the .proto file so received bytes can be decoded back into objects.
        const root = await protobuf.load("schema.proto");
        var TestMessage = root.lookupType("Test");

        consumer.on("message", message => {
            // getData() returns the raw bytes; decode them with the Protobuf schema.
            const x = message.getData();
            var msg = TestMessage.decode(x);
            console.log(msg);
            message.ack();
        });

        consumer.on("error", error => {
            console.log(error);
        });
    } catch (ex) {
        console.log(ex);
        memphis.close();
    }
})();


Data processing · Data quality · Schema · Data management · Open source

Published at DZone with permission of Idan Asulin. See the original article here.

Opinions expressed by DZone contributors are their own.
