DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
Securing Your Software Supply Chain with JFrog and Azure
Register Today

Trending

  • What Is JHipster?
  • Grow Your Skills With Low-Code Automation Tools
  • Rule-Based Prompts: How To Streamline Error Handling and Boost Team Efficiency With ChatGPT
  • Understanding Data Compaction in 3 Minutes

Trending

  • What Is JHipster?
  • Grow Your Skills With Low-Code Automation Tools
  • Rule-Based Prompts: How To Streamline Error Handling and Boost Team Efficiency With ChatGPT
  • Understanding Data Compaction in 3 Minutes
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Feature Engineering

Feature Engineering

Feature engineering is a huge part of ML and deep learning. Read on to learn how feature engineering helps developers work with data.

Bhalchandra Madhekar user avatar by
Bhalchandra Madhekar
·
Jul. 15, 18 · Analysis
Like (3)
Save
Tweet
Share
8.94K Views

Join the DZone community and get the full member experience.

Join For Free

Data has become a first-class asset for modern businesses, corporations, and organizations irrespective of their size and scale. Any intelligent system, regardless of its complexity, needs to be powered by data. At the heart of any intelligent system, we have one or more data insight algorithms based on some sort of means of learning from data, such as machine learning, deep learning, or statistical methods, which consume this data to gather knowledge and provide intelligent insights over a period of time. Algorithms are pretty generic by themselves and cannot work out of the box on plain, raw data. There is a need to extract meaningful features from raw data so that it can be understood and consumed.

Any intelligent data insight system basically consists of an end-to-end pipeline, from ingesting raw data to leveraging data processing techniques to wrangle, process, and engineer meaningful features and attributes from this data. Then we usually leverage techniques like statistical models or machine learning models to model on these features and then deploy this model if necessary for future usage based on the problem to be solved at hand. A typical standard machine learning pipeline based on the Cross-industry standard process for a data mining industry standard process model is depicted below.

Ingesting raw data and building models on top of this data directly would be foolhardy since we wouldn’t get the desired results or performance and also algorithms themselves do not automatically extract meaningful features from plain raw data. The data preparation aspect pointed out in the figure above, where we deal with various methodologies to extract meaningful attributes or features from the raw data after it has gone through the necessary analysis of wrangling and pre-processing. Feature Engineering is an art as well as a science and this is the reason Data Scientists often spend 70% of their time in the data preparation phase before modeling.

“Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.” - Dr. Jason Brownlee

This gives us some insight into why feature engineering is the process used for transforming data into features to act as inputs for machine learning models, namely that good quality features help in improving the overall model performance and accuracy. Features are also very much dependent on the underlying question. Thus, even though the machine learning task might be the same in different scenarios, like a classification of IoT events into normal and abnormal behavior or classifying customer sentiments, the features extracted in each scenario will be very different from each other.

What Are Features?

A feature is, typically, a specific representation on top of raw data, which is an individual, measurable attribute, typically depicted by a column in a dataset. Considering a generic two-dimensional dataset, each observation is depicted by a row and each feature by a column, which will have a specific value for an observation

Thus, like in the example in the figure above, each row typically indicates a feature vector and the entire set of features across all the observations forms a two-dimensional feature matrix, also known as a feature-set. This is akin to data frames or spreadsheets representing two-dimensional data. Typically machine learning algorithms work with these numeric matrices or tensors and hence most feature engineering techniques deal with converting raw data into some numeric representations which can be easily understood by these algorithms.

Features can be of two major types based on the dataset. Inherent raw features are obtained directly from the dataset with no extra data manipulation or engineering. Derived features are usually obtained from feature engineering, where we extract features from existing data attributes. A simple example would be creating a new feature “order fulfillment in days” from an orders dataset containing “order date” by just subtracting their order date from the current date. On the other hand, with deep learning, in particular, features are usually simple since the algorithms generate their own internal transformations. This approach requires large amounts of data and comes at the expense of interpretability. However, these trade-offs are often worthwhile in image processing or natural language processing use cases.

For most other use cases companies face, e.g. predictive analytics, feature engineering is necessary to convert data into a machine learning-ready format. The choice of features is crucial for both interpretability and performance. Without feature engineering, we wouldn’t have the accurate machine learning systems deployed by major companies today.

Feature Engineering

Numeric data typically represents data in the form of scalar values depicting observations, recordings or measurements. Here, by numeric data, we mean continuous data and not discrete data which is typically represented as categorical data. Numeric data can also be represented as a vector of values where each value or entity in the vector can represent a specific feature. Integers and floats are the most common and widely used numeric data types for continuous numeric data. Even though numeric data can be directly fed into machine learning models, you would still need to engineer features which are relevant to the scenario, problem, and domain before building a model. Hence the need for feature engineering still remains.

Data science Engineering Machine learning Feature engineering

Opinions expressed by DZone contributors are their own.

Trending

  • What Is JHipster?
  • Grow Your Skills With Low-Code Automation Tools
  • Rule-Based Prompts: How To Streamline Error Handling and Boost Team Efficiency With ChatGPT
  • Understanding Data Compaction in 3 Minutes

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com

Let's be friends: