Data Manipulation in R Using dplyr

Learn about the primary functions of the dplyr package and the power of this package to transform and manipulate your datasets with ease in R.

By Sibanjan Das · Nov. 15, 17 · Tutorial

In our previous article, we discussed the importance of data preprocessing and data management tasks in a data science pipeline. Also, we provided a brief explanation of the dplyr R package. This article will focus on the power of this package to transform your datasets with ease in R.

The dplyr package has five primary functions, commonly known as verbs. These verbs help perform most typical data manipulation operations, which we discuss in the sections below.

Glimpse

The glimpse function lists every column in a dataset along with its type and as many of its values as fit on a single line.

library(dplyr)
glimpse(mtcars)

Select 

select chooses a subset of columns (variables) from a dataset. For instance, select(mtcars, mpg) returns only the mpg column from the mtcars dataset:

select(mtcars, mpg:disp) returns the columns from mpg through disp, that is, mpg, cyl, and disp:

select(mtcars, mpg:disp, -cyl) returns the columns from mpg through disp, excluding cyl:
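
The three select calls above can be compared side by side. The following is a minimal sketch, assuming dplyr is installed; the variable names (one_col, col_range, without_cyl) are ours for illustration and are not part of the original article. It prints only the column names each call keeps:

```r
library(dplyr)

# Single column: the result is still a data frame, not a bare vector
one_col <- select(mtcars, mpg)

# Range of adjacent columns, in the order they appear in mtcars
col_range <- select(mtcars, mpg:disp)

# Same range with cyl excluded
without_cyl <- select(mtcars, mpg:disp, -cyl)

names(one_col)      # "mpg"
names(col_range)    # "mpg" "cyl" "disp"
names(without_cyl)  # "mpg" "disp"
```

Note that select only changes which columns are returned; all 32 rows of mtcars are kept in each result.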

Pipe Operator 

The pipe operator (%>%) ties multiple operations together by passing the result of each step as the first argument of the next. This is especially convenient when we need to apply several operations to a dataset in sequence.

We can read mtcars %>% select(wt, mpg, disp) from left to right: from the mtcars dataset, select the wt, mpg, and disp variables.
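
To see what the pipe buys us, the sketch below writes the same operation twice: once as nested function calls and once as a pipeline. The trailing head(3) step and the variable names nested and piped are our additions for illustration, not from the original article:

```r
library(dplyr)

# Nested form: has to be read from the inside out
nested <- head(select(mtcars, wt, mpg, disp), 3)

# Piped form: reads left to right, each result feeds the next call
piped <- mtcars %>%
  select(wt, mpg, disp) %>%
  head(3)

# Both forms produce exactly the same data frame
identical(nested, piped)  # TRUE
```

The piped version scales better as more steps are added, since each new operation is appended at the end rather than wrapped around everything before it.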

Mutate 

mutate is used to add new columns to a dataset. It is useful for creating attributes that are functions of other attributes in the dataset, which makes it one of the essential tools for feature creation in the data preprocessing stage.

mtcars %>% mutate(nv=wt+mpg) creates a new attribute NV by adding WT and MPG together.
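
As a rough sketch of how this looks in a feature-creation step, the example below builds the article's nv column alongside two derived columns. The names wt_lbs and hp_per_lb are hypothetical and chosen only for illustration; the unit conversion relies on mtcars recording wt in thousands of pounds:

```r
library(dplyr)

with_features <- mtcars %>%
  mutate(
    nv = wt + mpg,           # the article's example column
    wt_lbs = wt * 1000,      # hypothetical: wt is stored in units of 1000 lbs
    hp_per_lb = hp / wt_lbs  # hypothetical: may refer to wt_lbs created just above
  )

# mutate returns a new data frame; mtcars itself is unchanged
ncol(mtcars)         # 11
ncol(with_features)  # 14
```

Columns defined earlier in the same mutate call are available to later expressions, so related features can be derived in a single step.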

Filter 

The filter function keeps only the rows (cases) whose values satisfy the given conditions.

mtcars %>% filter(hp>123) returns the rows whose hp value is greater than 123.
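
filter also accepts more than one condition. The sketch below, with illustrative variable names and thresholds of our own choosing, shows the article's single condition, two conditions combined with AND, and an OR condition:

```r
library(dplyr)

# Single condition, as in the article (strictly greater than 123)
powerful <- mtcars %>% filter(hp > 123)

# Comma-separated conditions are combined with AND;
# in mtcars, am == 1 marks a manual transmission
powerful_manual <- mtcars %>% filter(hp > 123, am == 1)

# Use | for OR (the mpg thresholds here are arbitrary)
extreme_mpg <- mtcars %>% filter(mpg < 15 | mpg > 30)

nrow(powerful)
nrow(powerful_manual)
range(extreme_mpg$mpg)
```

Because the comparison is strict, cars with hp exactly equal to 123 are excluded from the first two results.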

Group_by

group_by is used to group data together based on one or more columns. It is often used along with a summarizing function to derive aggregated values:

mtcars %>% filter(hp>123) %>% group_by(am) 
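
On its own, group_by does not change the rows or columns of the data; it only records how later verbs should split it. A small sketch, assuming dplyr is installed and using the illustrative name grouped, makes this visible:

```r
library(dplyr)

grouped <- mtcars %>%
  filter(hp > 123) %>%
  group_by(am)

is_grouped_df(grouped)  # TRUE: the result carries grouping metadata
group_vars(grouped)     # "am"
n_groups(grouped)       # 2: one group per distinct am value

# Grouping alone keeps every filtered row
nrow(grouped) == nrow(filter(mtcars, hp > 123))  # TRUE
```

The grouping only takes effect once a verb such as summarize is applied to the grouped result.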

Summarize

summarize is used to aggregate multiple values to a single value. It is most often used with the group_by function, and the output has one row per group:

mtcars %>% filter(hp>123) %>% group_by(am) %>% summarize(avg_wt=mean(wt)) 

This command calculates the average wt for each unique value of the am column, using only the mtcars rows with hp > 123.
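
summarize can compute several aggregates per group in one call. In the sketch below, avg_wt is the article's aggregate, while n_cars and max_hp are hypothetical columns added for illustration:

```r
library(dplyr)

by_am <- mtcars %>%
  filter(hp > 123) %>%
  group_by(am) %>%
  summarize(
    n_cars = n(),       # hypothetical: number of rows in each group
    avg_wt = mean(wt),  # the article's aggregate
    max_hp = max(hp)    # hypothetical: largest hp in each group
  )

# One row per distinct value of am
by_am
```

Including a count such as n() next to an average is a handy sanity check, since it shows how many rows each group mean is based on.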

Arrange 

arrange is used to sort cases in ascending or descending order. The default is ascending order:

mtcars %>% filter(hp>123) %>% arrange(mpg) 

As shown below, use desc to order the data in descending order.

mtcars %>% filter(hp>123) %>% arrange(desc(mpg)) 
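
arrange also accepts several sort keys, with later keys breaking ties in earlier ones. The sketch below is our own illustration rather than part of the original article; it sorts by cyl in ascending order and, within each cyl value, by mpg in descending order:

```r
library(dplyr)

sorted <- mtcars %>%
  filter(hp > 123) %>%
  arrange(cyl, desc(mpg))  # cyl ascending, then mpg descending within each cyl

# Show the sort keys next to hp for the first few rows
sorted %>%
  select(cyl, mpg, hp) %>%
  head()
```

Each key can be wrapped in desc independently, so ascending and descending orders can be mixed in a single call.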

To learn more about dplyr, see the package documentation.

Though we can perform these tasks using base R functions, the verbs in dplyr are optimized for high performance, are easier to work with, and share a consistent syntax. So, pick up a dataset, get started with dplyr, and share your data preparation story on DZone so others can learn from it.
