
The Art of Feature Engineering in the World of Data Science

A look at how data science and big data teams can use data analysis, data modeling, and other techniques to solve real-world problems.

by Ramesh Manickavel · Sep. 11, 18 · Tutorial

Shubham Agrawal, CEO of Balludas Information Technology, talks with Hari Shrawgi, manager of the Talent Acquisition team.

The Problem

Shubham: Hari, we have a lot of open positions to be backfilled, and our engineering teams are unhappy with the hiring turnaround.

Hari: But boss, we are doing our best to bring in good candidates and to schedule interviews for the earliest possible date. The delay lies with the engineering team, which is slow to interview candidates and to provide feedback for the next rounds.

Shubham: Hari, we are one team and I don’t like excuses. You claim to have the best data scientists on your team. Why don’t you work with them to come up with some level of analysis and recommendation? It will help us set expectations.

Hari meets with his team: Bhawna Bhardwaj, Mohit Jain, Preeti Patel, and Vasu Sharma. The team has expertise in building forecasting and prediction models with data science tools and techniques.

Brainstorming

Hari: Let’s discuss a high-level approach to the problem that Shubham brought up this morning.

Vasu: We have tons of data in our system. Let’s process it and perform time series analyses to build a forecast.

Preeti: Yes! Also, we can use decision trees or rule-based classifiers for recommendations.

Bhawna: I guess we need to take a step back before we rush into a solution. Hari and a couple of us know the entire hiring process, so we have enough domain knowledge among ourselves. Let’s think about what to do next.

Mohit: Before we get into modeling, how about brainstorming the attributes we need and getting them ready for further processing?

Vasu: You mean feature engineering. But isn’t that time-consuming? You know our CEO, he wants results yesterday!

Preeti: Let’s bring a method to the madness. How about a fishbone diagram to identify all possible causes that could potentially influence the problem? This shouldn't take much time.

Feature Selection

[Figure: Hiring Cycle Time - Fishbone]

Hari: It’s amazing that, in such a short time, we were able to cover all possible influencers.

Bhawna: Though we have identified all the attributes, we may not be able to get data for each of them.

Mohit: Yes, we have constraints such as data privacy, sanity of collected data, and more. But we will use data transformation techniques to minimize the challenges.

Vasu: Hari, we need to set the expectation that if we are unable to get key features, even after the data transformation that Mohit is referring to, the accuracy and precision of the model may be lower.

Preeti: We also need to be careful about over-fitting the model since we have all possible attributes and data.

Transforming Features

Bhawna: I just ran descriptive statistics on the data set we got. It looks like we have a few records where the expected salary and the job location are missing.

Mohit: For the salary, since only a few records are affected, we can work with the respective hiring team to fill it in. For the missing location, we can do similarity-based imputation, since we have data for the same hiring manager, similar jobs and responsibilities, and other matching criteria.

Vasu: There are some rows where the class label itself, such as the time taken to shortlist the profile, is missing. We can’t do any imputation there, so it’s better to drop those records.

Preeti: We can convert the expected salary into 10 bins such as 25-50k, 50-75k, etc., and replace the numeric value with the categorical ID. This will help reduce noise and improve the fit of the models.
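Mohit's and Vasu's suggestions might be sketched as follows. This is a minimal illustration using only the Python standard library, not the team's actual pipeline: the records, the field names (`manager`, `job`, `location`, `days_to_shortlist`), and the choice of "most common location among rows with the same manager and job" as the similarity rule are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical requisition records; field names and values are illustrative only.
records = [
    {"manager": "A", "job": "SAP Consultant", "location": "Pune", "days_to_shortlist": 12},
    {"manager": "A", "job": "SAP Consultant", "location": "Pune", "days_to_shortlist": 15},
    {"manager": "A", "job": "SAP Consultant", "location": None, "days_to_shortlist": 9},
    {"manager": "B", "job": "Data Engineer", "location": "Chennai", "days_to_shortlist": None},
    {"manager": "B", "job": "Data Engineer", "location": "Chennai", "days_to_shortlist": 20},
]


def impute_location(rows, keys=("manager", "job")):
    """Fill a missing location with the most common one among rows sharing `keys`."""
    seen = {}
    for row in rows:
        if row["location"] is not None:
            group = tuple(row[k] for k in keys)
            seen.setdefault(group, Counter())[row["location"]] += 1

    filled = []
    for row in rows:
        row = dict(row)  # work on a copy so the input is left untouched
        group = tuple(row[k] for k in keys)
        if row["location"] is None and group in seen:
            row["location"] = seen[group].most_common(1)[0][0]
        filled.append(row)
    return filled


def drop_missing_label(rows, label="days_to_shortlist"):
    """Drop rows whose target value is missing; the label is never imputed."""
    return [row for row in rows if row[label] is not None]


cleaned = drop_missing_label(impute_location(records))
```

Rows with no similar neighbor keep their missing location, which leaves the decision about them to a later, explicit step.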
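Preeti's binning idea could look something like the sketch below. The bin edges follow her example (ten 25k-wide bins starting at 25k); clamping out-of-range salaries into the first or last bin is an assumption added here, and the function names are hypothetical.

```python
def salary_bin(salary, low=25_000, width=25_000, n_bins=10):
    """Map a salary to a bin ID in 0..n_bins-1.

    Assumption: values below or above the covered range are clamped
    into the first or last bin rather than rejected.
    """
    index = int((salary - low) // width)
    return max(0, min(n_bins - 1, index))


def bin_label(bin_id, low=25_000, width=25_000):
    """Human-readable label for a bin ID, e.g. 0 -> '25-50k'."""
    start = (low + bin_id * width) // 1000
    return f"{start}-{start + width // 1000}k"


# Hypothetical expected salaries replaced by their categorical bin IDs.
expected_salaries = [30_000, 50_000, 74_999, 120_000]
bin_ids = [salary_bin(s) for s in expected_salaries]
labels = [bin_label(b) for b in bin_ids]
```

Each bin is closed on the left, so a salary of exactly 50k falls into the 50-75k bin.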

Identifying Key Features

Bhawna: We have identified so many features. It’s important that we select the vital few to improve the goodness of fit of our model and to limit over-fitting.

Mohit: Well, we have a couple of options. We can identify variables that are related to each other through correlation analysis.
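As a rough illustration of the correlation analysis Mohit mentions, the sketch below computes Pearson correlation for every pair of numeric features and lists the strongly related ones. The feature names, the sample values, and the 0.8 threshold are assumptions for the example only.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical numeric features, one value per requisition.
features = {
    "interview_rounds": [2, 3, 3, 4, 5, 5],
    "feedback_delay_days": [1, 4, 2, 6, 3, 5],
    "cycle_days": [20, 28, 30, 41, 52, 50],
}


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


strong = correlated_pairs(features)
```

Two input features that are strongly correlated with each other are candidates for keeping only one of them.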

Vasu: Yes, but correlation doesn’t mean causation. To ensure the impact, we can conduct statistical hypothesis testing.
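Vasu does not say which test he has in mind. One simple option is a permutation test for a difference in means, sketched below under that assumption. The two groups (requisitions with prompt versus delayed interview feedback) and their values are hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical hiring cycle times in days.
prompt_feedback = [18, 22, 20, 25, 19, 21]
delayed_feedback = [30, 35, 28, 40, 33, 31]

p_value = permutation_test(prompt_feedback, delayed_feedback)
```

A small p-value says the difference is unlikely to be chance alone. It still does not establish causation, which is part of why a designed experiment can be a useful follow-up.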

Preeti: We can also apply Design of Experiments. This helps not only in identifying key features but also in optimizing the outcome, which in our case is the cycle time.

Bhawna: Principal Component Analysis also helps in reducing the dimensionality. Like correlation analysis, it works only on numeric data. That’s why feature transformation helps: it converts categorical data into numeric data. For example, in our problem, instead of categorizing the level as ‘Associate, Senior, Principal’, we can number the levels.
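To make Preeti's point concrete, here is a minimal sketch of a two-level full factorial design with two factors and the main effect of each on cycle time. The factors (`slots_preblocked`, `same_day_feedback`) and the measured cycle times are invented for illustration. A real experiment would also need replication and randomization.

```python
# Hypothetical 2x2 full factorial: each factor is coded -1 (off) or +1 (on),
# and cycle_days is the measured hiring cycle time for that combination.
runs = [
    {"slots_preblocked": -1, "same_day_feedback": -1, "cycle_days": 40},
    {"slots_preblocked": +1, "same_day_feedback": -1, "cycle_days": 34},
    {"slots_preblocked": -1, "same_day_feedback": +1, "cycle_days": 28},
    {"slots_preblocked": +1, "same_day_feedback": +1, "cycle_days": 22},
]


def main_effects(design, factors, response):
    """Main effect of a factor = mean response at +1 minus mean response at -1."""
    effects = {}
    for factor in factors:
        high = [run[response] for run in design if run[factor] == +1]
        low = [run[response] for run in design if run[factor] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects


effects = main_effects(runs, ["slots_preblocked", "same_day_feedback"], "cycle_days")
# The factor with the most negative effect shortens the cycle time the most.
best_lever = min(effects, key=effects.get)
```

With these made-up numbers, same-day feedback has the larger effect, so it would be both a key feature and the first setting to optimize.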
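Bhawna's numbering of levels is an ordinal encoding. A minimal sketch, assuming the three levels are ordered from junior to senior and numbered from 1 (the names `LEVEL_ORDER` and `encode_level` are hypothetical):

```python
# Assumed seniority order, lowest to highest.
LEVEL_ORDER = ["Associate", "Senior", "Principal"]
LEVEL_TO_RANK = {level: rank for rank, level in enumerate(LEVEL_ORDER, start=1)}


def encode_level(level):
    """Replace a level name with its numeric rank; unknown names are an error."""
    try:
        return LEVEL_TO_RANK[level]
    except KeyError:
        raise ValueError(f"unknown level: {level!r}") from None


levels = ["Senior", "Associate", "Principal", "Senior"]
encoded = [encode_level(level) for level in levels]
```

Numbering the levels 1, 2, 3 implies they are evenly spaced. That is a modeling assumption worth revisiting if the gap between Senior and Principal matters more than the gap between Associate and Senior.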

Exploratory Data Analysis and Modeling

Mohit: Wow, even before we apply a model, we can visualize the trend and oscillation of individual features as well as the relationships between them.

Vasu: That’s the power of Exploratory Data Analysis. Visualization helps not just with explaining ‘why it happened’ but also with some level of recommendation.

Preeti: Very true. From our analysis, it seems to be difficult to get profiles for a Senior SAP consultant, especially from this vendor.

Bhawna: Maybe we can look at alternative vendors, and also work with the hiring manager on whether a senior-level hire is really needed or whether we can adjust to the next level.

Hari: Awesome work, folks. I never knew that feature engineering itself could help identify causes and recommendations. While we continue working on implementing a suitable model, I’ll share this and the next steps with our CEO.
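An observation like Preeti's might surface from a simple per-group summary even before any chart is drawn. The sketch below groups hypothetical sourcing records by role and vendor and compares the median number of days to receive a profile. All names and numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sourcing records: days until the vendor supplied a profile.
sourcing = [
    {"role": "Senior SAP Consultant", "vendor": "Vendor X", "days_to_profile": 45},
    {"role": "Senior SAP Consultant", "vendor": "Vendor X", "days_to_profile": 50},
    {"role": "Senior SAP Consultant", "vendor": "Vendor X", "days_to_profile": 60},
    {"role": "Senior SAP Consultant", "vendor": "Vendor Y", "days_to_profile": 20},
    {"role": "Senior SAP Consultant", "vendor": "Vendor Y", "days_to_profile": 25},
    {"role": "Data Engineer", "vendor": "Vendor X", "days_to_profile": 10},
    {"role": "Data Engineer", "vendor": "Vendor X", "days_to_profile": 12},
    {"role": "Data Engineer", "vendor": "Vendor X", "days_to_profile": 14},
]


def median_by_group(rows, keys, value):
    """Median of `value` for each distinct combination of `keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])
    return {group: median(values) for group, values in groups.items()}


summary = median_by_group(sourcing, ("role", "vendor"), "days_to_profile")
slowest = max(summary, key=summary.get)
```

Here the slowest combination is the senior SAP role at one particular vendor, while the same role at another vendor is much faster. That contrast points toward a vendor-specific cause rather than a role-specific one.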

Conclusion

[Figure: CEO Appreciation]

Shubham: This is absolutely fantastic, team! It looks like a lot of work has already been done to identify the key challenges. While we explore this further, I'll work with the engineering team on addressing some of the root causes. Good job again!

We have great predictive modeling algorithms and techniques available, but let’s not forget the foundation. Feature engineering helps us understand the problem from the perspective of the customer for whom we are building those models. After all, “Well begun is half done.”
