DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Stop Fine-Tuning for Everything: A Decision Tree for RAG vs Tuning vs Tools
  • Building Call Graphs for Code Exploration Using Tree-Sitter
  • Two-Pass Huffman in Blocks of 2 Symbols: Golang Implementation
  • Implementing LSM Trees in Golang: A Comprehensive Guide

Trending

  • Evaluating SOC Effectiveness Using Detection Coverage and Response Metrics
  • Building a Skill-Based Agentic Reviewer with Claude Code: A Practical Guide Using Skills.MD, MCP Servers, Tools, and Tasks
  • Smart Deployment Strategies for Modern Applications
  • One Query, Four GPUs: Tracing a Distributed Training Stall Across Nodes

Exporting Decision Trees in Textual Format With sklearn

In this post, we explore how to make decision trees using Python and a open data set.

By 
Giuseppe Vettigli user avatar
Giuseppe Vettigli
·
Jun. 12, 19 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
13.4K Views

Join the DZone community and get the full member experience.

Join For Free

In the past, we have covered Decision Trees showing how interpretable these models can be (see the tutorials here). In the previous tutorials, we exported the rules of the models using the function export_graphviz from sklearn and visualized the output of this function in a graphical way with an external tool which is not easy to install in some cases. Luckily, since version 0.21.2, scikit-learn offers the possibility to export Decision Trees in a textual format (I implemented this feature personally!) and in this post we will see an example how of to use this new feature.

Let's train a tree with two layers on the famous iris dataset using all the data and print the resulting rules using the brand new function export_text:

from sklearn.tree import DecisionTreeClassifier
from sklearn.tree.export import export_text
from sklearn.datasets import load_iris

iris = load_iris()
X = iris['data']
y = ['setosa']*50+['versicolor']*50+['virginica']*50
decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2)
decision_tree = decision_tree.fit(X, y)
r = export_text(decision_tree, feature_names=iris['feature_names'])
print(r)
|--- petal width (cm) <= 0.80
|   |--- class: setosa
|--- petal width (cm) >  0.80
|   |--- petal width (cm) <= 1.75
|   |   |--- class: versicolor
|   |--- petal width (cm) >  1.75
|   |   |--- class: virginica

Reading the data, we note that if the feature petal widthis less or equal than 80mm the samples are always classified as setosa. Otherwise if the petal width is less or equal than 1.75cm they're classified as versicolor, or as virginica if the petal width is more than 1.75cm. This model might well suffer from overfitting, but tells us some important details of the data. It's easy to note that the petal width is the only feature used, we could even say that the petal width is small for setosa samples, medium for versicolor, and large for virginica.

To understand how the rules separate the labels we can also print the number of samples from each class (class weights) on the leaves:

r = export_text(decision_tree, feature_names=iris['feature_names'],
                decimals=0, show_weights=True)
print(r)
|--- petal width (cm) <= 1
|   |--- weights: [50, 0, 0] class: setosa
|--- petal width (cm) >  1
|   |--- petal width (cm) <= 2
|   |   |--- weights: [0, 49, 5] class: versicolor
|   |--- petal width (cm) >  2
|   |   |--- weights: [0, 1, 45] class: virginica

Here we have the number of samples per class among square brackets. Recalling that we have 50 samples per class, we see that all the samples labeled as setosa are correctly modeled by the tree, while for 5 virginica and 1 versicolor the model fails to capture the information given by the label.

Check out the documentation of the function export_text to discover all its capabilities here.

Tree (data structure)

Published at DZone with permission of Giuseppe Vettigli. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Stop Fine-Tuning for Everything: A Decision Tree for RAG vs Tuning vs Tools
  • Building Call Graphs for Code Exploration Using Tree-Sitter
  • Two-Pass Huffman in Blocks of 2 Symbols: Golang Implementation
  • Implementing LSM Trees in Golang: A Comprehensive Guide

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook