DZone

Data Engineering Practices to Avoid

Even the most skilled and experienced data engineers can make mistakes. Here are some of them and how to steer clear.

By Devin Partida · Feb. 14, 23 · Opinion

Data engineers are in increasingly high demand, especially as more company leaders realize that better decision-making depends on reliable information. However, even the most skilled and experienced professionals can make mistakes. Here are some common ones and how to steer clear of them.

Preventing Safe and Effective Data Collaboration

Data usage does not happen in a vacuum. The times when only a few people or departments have access to information are in the past. It’s now standard practice for employees throughout organizations to use and add to databases. As a result, data engineers must incorporate collaboration capabilities into their design and management of information pipelines. 

However, data engineers must also recognize that people may work with information in isolation. The best approach is to use tools and create environments that let employees handle data safely, whether they are working independently or with colleagues.

Failing to Learn How Businesses Will Use Data

Data engineers lose valuable opportunities and waste precious time if they don’t engage in productive discussions about how people use data and why. For example, some companies use it to track customer trends, while others do so to stop fraud.

Those are two valid but different reasons to rely on company data. A critical part of a data engineer’s job is learning about what business leaders and other affected parties hope and expect to do with the information a company has. From there, they can build solutions that surpass expectations and remain relevant over the long term. 

Underestimating the Ramifications of Poor-Quality Data

Low-quality information can lead to questionable results, making people lose confidence in data-backed operations. Sometimes, the problem can even cause reputational damage. For example, a 2022 study found nearly half of respondents most often measure data quality by the number of customer complaints received. By the time complaints arrive, however, it could be too late to protect the company's reputation.

Another worrisome takeaway from the research was that, on average, companies spend almost 800 hours per month resolving data-quality problems. Engineers should strongly consider spending their time differently by focusing on aspects that will lead to better-quality information from the start. Of course, no solution eliminates every error, but if data engineers can prevent most issues, they’ll have more time to spend on productive activities that help businesses grow. 
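One way to catch quality problems at the start, rather than after customers complain, is to validate records as they enter a pipeline. The following is a minimal sketch and not a definitive implementation. The field names (order_id, amount), the rules, and the function names are hypothetical and chosen only for illustration; a real pipeline would encode its own business rules.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ValidationResult:
    valid: list[dict[str, Any]]
    rejected: list[tuple[dict[str, Any], str]]  # (record, reason) pairs


def _first_problem(record: dict[str, Any], seen_ids: set) -> Optional[str]:
    """Return the first rule the record breaks, or None if it passes."""
    # Hypothetical rules: every record needs a unique order_id
    # and a non-negative numeric amount.
    order_id = record.get("order_id")
    if order_id is None:
        return "missing order_id"
    if order_id in seen_ids:
        return "duplicate order_id"
    amount = record.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return "amount is not a number"
    if amount < 0:
        return "negative amount"
    return None


def validate_records(records: list[dict[str, Any]]) -> ValidationResult:
    """Split records into valid ones and rejected ones with a reason."""
    valid, rejected = [], []
    seen_ids: set = set()
    for record in records:
        reason = _first_problem(record, seen_ids)
        if reason is None:
            seen_ids.add(record["order_id"])
            valid.append(record)
        else:
            rejected.append((record, reason))
    return ValidationResult(valid, rejected)
```

Keeping the rejected records together with a reason, instead of silently dropping them, gives engineers something concrete to review before bad data reaches reports or customers.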

Overlooking the Need for Data Security

Resolving a hack can cost millions of dollars. In addition, statistics indicate small and medium-sized businesses are data breach targets 43% of the time, yet those entities typically have fewer resources to put toward recovery. The costs could encompass cybersecurity experts, public relations professionals, and regulatory fines, among other expenses.

Data engineers are not solely responsible for keeping a company’s information safe. They’ll typically work with cybersecurity or IT teams. However, they must give ongoing input on maintaining information’s safety as it moves through the organization. Discussions should also center on strategies to keep unauthorized parties from accessing it, whether such attempts happen outside or within the organization.

Falling Behind With Data Access Options

Data engineers who specify which parties can access information, and for what reasons, must also encourage those individuals to use modern data access tools and strategies. Otherwise, people may find that the current processes keep them from doing their jobs well. Some could even try to circumvent the procedures out of desperation, putting businesses at elevated risk.

A 2023 Immuta survey found 46% of data practitioners believed their organizations’ outdated access policies made it harder for people to work. Moreover, 51% of respondents said legacy policies prevented them from scaling their security options. These are just some of the many reasons why data engineers must continually highlight the pervasive problems of having access policies that no longer suit current needs. 

Investing in Overly Complicated Products or Solutions

The market has many options to help data engineers do their jobs better. The problem is that they’re not equally suitable, and there’s no guarantee that popular solutions are the best for particular businesses. Data engineers must strive for simplicity, making their code easy for others to understand and follow. 

Another best practice is to keep the approach as modular as possible. Then, if something breaks or otherwise functions unexpectedly, data engineers will find it easier to troubleshoot the issue and prevent future mishaps. 
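As a rough illustration of the modular approach, the sketch below builds a pipeline from small, single-purpose steps and reports which step failed. The step names, the email field, and the run_pipeline helper are all hypothetical; they stand in for whatever transformations a real pipeline needs.

```python
from typing import Any, Callable

Rows = list[dict[str, Any]]
Step = Callable[[Rows], Rows]


def drop_empty_emails(rows: Rows) -> Rows:
    """Remove rows whose email is missing or empty."""
    return [row for row in rows if row.get("email")]


def normalize_emails(rows: Rows) -> Rows:
    """Trim whitespace and lowercase every email."""
    return [{**row, "email": row["email"].strip().lower()} for row in rows]


def deduplicate_by_email(rows: Rows) -> Rows:
    """Keep only the first row seen for each email."""
    seen: set = set()
    unique = []
    for row in rows:
        if row["email"] not in seen:
            seen.add(row["email"])
            unique.append(row)
    return unique


def run_pipeline(rows: Rows, steps: list[Step]) -> Rows:
    """Apply each step in order, naming the step if one fails."""
    for step in steps:
        try:
            rows = step(rows)
        except Exception as exc:
            raise RuntimeError(f"step {step.__name__!r} failed") from exc
    return rows
```

Because each step is an ordinary function, it can be tested on its own, reordered, or replaced without touching the others, and a failure points to one named step rather than to the pipeline as a whole.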

Maintaining Too Many Manual Processes

A data engineering career comes with numerous challenges. The role will likely prove even tougher for those who use manual processes when automated options exist. A 2021 survey of data engineers showed 97% experienced burnout from their daily work. When asked about the reasons, some mentioned manual and repetitive processes for data preparation and pipeline management. Others said their colleagues ask for too many things without providing adequate time to meet their needs.

Fortunately, automation is well-suited to many data engineering steps. Automation can assist in moving, collecting, and preparing the content for later use. Now is an excellent time for data engineers to explore this option, even if they currently have manageable workloads.
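To make the idea concrete, here is a small sketch that automates collecting, preparing, and moving files using only the Python standard library. It assumes a hypothetical layout in which CSV files arrive in an inbox folder; the folder roles, function names, and the whitespace-trimming rule are illustrative assumptions, not a prescribed design.

```python
import csv
import shutil
from pathlib import Path


def prepare_row(row: dict) -> dict:
    """Trim whitespace from field names and values.

    csv.DictReader uses a None key for surplus fields and None values
    for missing ones, so both cases are handled here.
    """
    return {
        key.strip(): (value or "").strip()
        for key, value in row.items()
        if key is not None
    }


def process_inbox(inbox: Path, prepared: Path, archive: Path) -> list[str]:
    """Clean every CSV in inbox, write it to prepared, archive the original."""
    prepared.mkdir(parents=True, exist_ok=True)
    archive.mkdir(parents=True, exist_ok=True)
    handled = []
    for source in sorted(inbox.glob("*.csv")):
        with source.open(newline="") as src:
            rows = [prepare_row(row) for row in csv.DictReader(src)]
        if rows:  # files with no data rows are archived but not rewritten
            with (prepared / source.name).open("w", newline="") as dst:
                writer = csv.DictWriter(dst, fieldnames=list(rows[0].keys()))
                writer.writeheader()
                writer.writerows(rows)
        shutil.move(str(source), str(archive / source.name))
        handled.append(source.name)
    return handled
```

A function like this could be run on a schedule by whatever scheduler or orchestration tool a team already uses, replacing a repetitive manual step with one that behaves the same way every time.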

Be Proactive for a Smoother Experience

These are some of the most common data engineering mistakes, and knowing about them makes them easier to recognize and avoid. Data engineers who stay proactive are also more likely to achieve outcomes that benefit both themselves and the businesses they support.


Opinions expressed by DZone contributors are their own.
