How to Design a Better Decision Tree With Pruning

A decision tree is a supervised machine learning algorithm that is used for classification and regression problems. Let's see how to design one with pruning.

By Mahipal Nehra · Jan. 07, 22 · Analysis


Decision tree (DT) analysis is a general predictive modeling tool for machine learning, and one of the simplest and most useful structures in the field. As the name indicates, DTs are trees of decisions: they are constructed by algorithmically splitting a data set according to different conditions. The decision tree is one of the most commonly used and practical techniques for supervised learning.

But before moving on to designing decision trees with pruning, let's review the underlying concepts.

Understanding Decision Trees

A decision tree is a supervised machine learning algorithm that is used for classification and regression problems. Decision trees make predictions by following a set of nested if-else conditions. Because they handle both classification and regression, a widely used algorithm for growing them is called CART (Classification and Regression Trees), though it is not the only one: several algorithms (such as ID3 and C4.5) have been proposed to build decision trees. The aim is to create a model that predicts the target variable's value by learning simple decision rules inferred from the data features.
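To make the nested if-else intuition concrete, here is a minimal sketch (the iris dataset and the depth cap are assumptions for illustration) that grows a CART tree with scikit-learn and prints the if-else rules it learned:

# Minimal sketch: grow a CART tree and print its learned decision rules
# as the nested if-else conditions they represent.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Each internal node tests one feature against a threshold (a binary split).
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# export_text renders the fitted tree as its if-else rule set.
print(export_text(tree, feature_names=iris.feature_names))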

Each internal node of a decision tree represents a decision; each leaf node represents a predicted outcome.

[Figure: CART model visualized — a root node at depth zero, binary-splitting internal nodes, and leaf nodes where category predictions are made. Source link omitted.]

In the above diagram, the root node is the beginning of the decision tree, at depth zero. Below it are the child (internal) nodes, where binary decisions are made. Finally, there are the leaf nodes, where predictions of a category are made.

The algorithm that builds a decision tree aims to predict a target variable from the input variables and their attributes. The tree's structure is created through a series of binary splits, starting at the root node and branching downward. To reach a leaf node, a sample passes through multiple internal nodes, each of which checks one condition on the way to the final prediction.

Assumptions While Building Decision Trees

Some of the assumptions that you have to make while working with decision trees are:

  • At the start, the whole training set is treated as the root.
  • Feature values are preferably categorical; continuous values are discretized before the decision tree model is built.
  • The attribute values are used to recursively distribute records.
  • Statistical approaches are used to determine which attributes should be placed as root or internal nodes of the tree.

Decision trees use the Sum of Product (SOP), or Disjunctive Normal Form, representation.

Each path from the root to a leaf forms a conjunction (product) of the attribute tests along it, and the different paths ending in the same class together form a disjunction (sum) of those conjunctions.
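As a tiny, invented illustration, a tree whose "yes" predictions sit at the end of two different root-to-leaf paths encodes the sum (OR) of two products (ANDs):

# Hypothetical two-path tree read as Sum of Products (DNF):
# yes = (outlook == "sunny" AND humidity <= 70) OR (outlook == "overcast")
def predict(outlook: str, humidity: float) -> str:
    if outlook == "sunny":  # root test
        return "yes" if humidity <= 70 else "no"  # product: sunny AND humidity <= 70
    return "yes" if outlook == "overcast" else "no"  # second path to "yes"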

Why Decision Tree?

A decision tree follows the same process a human follows when making a real-life decision, which makes it simple to understand. It is a natural fit for decision-making problems in machine learning, and one reason it is so commonly used to train models is that it makes all the possible outcomes of a problem explicit. Moreover, it requires less data cleaning than many other algorithms.

But the decision tree also has its limitations, the most significant being overfitting.

Overfitting in Decision Trees

Overfitting is a significant practical difficulty with decision trees. A decision tree will always overfit the training data if it is allowed to grow to its maximum depth, because the tree is then shaped to fit every sample in the training set perfectly. As a result, it ends up with branches encoding strict rules derived from sparse data, which hurts prediction accuracy on samples that are not part of the training set. The deeper the tree grows, the more complex its sequence of decision rules becomes. Capping the maximum depth is the easiest way to simplify a tree and handle overfitting.
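As a rough sketch of the effect (the dataset and split are assumptions), an unconstrained tree scores perfectly on its training data but gives up accuracy on held-out data, while a depth-capped tree typically narrows that gap:

# Sketch: an unconstrained tree memorizes the training set, while
# capping max_depth trades training accuracy for generalization.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None lets the tree grow to its maximum depth
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={clf.score(X_tr, y_tr):.3f}, "
          f"test={clf.score(X_te, y_te):.3f}")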

But how to improve our tree model in a more precise way? Let’s find out!

How to Prevent Overfitting in Decision Trees With Pruning?

Pruning is a technique used to reduce overfitting in decision trees. It simplifies the tree by eliminating its weakest rules. It can be further divided into:

  • Pre-pruning stops the tree at an early stage by limiting its growth through constraints set before training, such as the min_samples_split, min_samples_leaf, or max_depth parameters chosen via hyperparameter tuning (see the pre-pruning sketch after this list).
  • Post-pruning, or backward pruning, is applied after the decision tree has been built, when it has grown extremely deep and shows overfitting. The fully grown tree is cut back by collapsing its weakest branches through cost-complexity pruning (a sketch follows the next two paragraphs).
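A minimal pre-pruning sketch: the growth constraints are applied while the tree is being built, and a small grid search picks among them (the grid values and dataset are illustrative assumptions):

# Sketch: pre-pruning via hyperparameter tuning with GridSearchCV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Constraints enforced during growth (pre-pruning).
param_grid = {
    "max_depth": [3, 5, 10],
    "min_samples_split": [2, 10, 50],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)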

Cost-complexity pruning starts with the unpruned tree, generates a sequence of progressively smaller subtrees, and selects the best one via cross-validation.

It is important that pruning both yields a good subtree, i.e., one that keeps accuracy high, and keeps the search for that subtree computationally tractable. Done well, pruning not only reduces overfitting but also leaves a decision tree that is less complex, easier to understand, and easier to explain than the unpruned tree, while maintaining its performance.
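A sketch of that procedure with scikit-learn's cost-complexity pruning (the dataset is an assumption): cost_complexity_pruning_path enumerates the effective alphas of the subtree sequence, and cross-validation selects the subtree that generalizes best.

# Sketch: post-pruning via cost-complexity pruning. Each ccp_alpha on
# the path corresponds to one subtree of the fully grown tree;
# cross-validation picks the alpha (i.e., the subtree) that scores best.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
scores = [
    cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                    X, y, cv=5).mean()
    for a in path.ccp_alphas
]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
print(f"best ccp_alpha: {best_alpha:.5f}")

# Refit the pruned tree with the selected alpha.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)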

Takeaway

We now know that decision trees belong to the supervised machine learning algorithms. Unlike many other supervised learning algorithms, decision trees can be used to solve both classification and regression problems.

Decision trees build models that predict a variable's value or class from simple decision rules derived from the training data. The other takeaway of this post is pruning: it helps decision trees make accurate predictions while reducing their complexity, whether by setting growth constraints up front or by cutting back the fully grown tree.
