# scikit-learn: Using GridSearch to Tune the Hyperparameters of VotingClassifier

### When building a classification ensemble, you need to be sure that the right classifiers are being included and the wrong ones are being excluded. Here's how to do that.


In my last post, I showed how to create a multi-class classification ensemble using scikit-learn’s `VotingClassifier` and finished by noting that I didn’t know which classifiers should be part of the ensemble.

Each classifier in the ensemble needs to improve the overall score; any that don't can be excluded.

We have a TF/IDF-based classifier as well as the classifiers I wrote about in the last post. This is the code defining the classifiers:

```python
import pandas as pd
from sklearn import linear_model
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

Y_COLUMN = "author"
TEXT_COLUMN = "text"

unigram_log_pipe = Pipeline([
    ('cv', CountVectorizer()),
    ('logreg', linear_model.LogisticRegression())
])

ngram_pipe = Pipeline([
    ('cv', CountVectorizer(ngram_range=(1, 2))),
    ('mnb', MultinomialNB())
])

tfidf_pipe = Pipeline([
    ('tfidf', TfidfVectorizer(min_df=3, max_features=None,
                              strip_accents='unicode', analyzer='word',
                              token_pattern=r'\w{1,}', ngram_range=(1, 3),
                              use_idf=1, smooth_idf=1, sublinear_tf=1,
                              stop_words='english')),
    ('mnb', MultinomialNB())
])

classifiers = [
    ("ngram", ngram_pipe),
    ("unigram", unigram_log_pipe),
    ("tfidf", tfidf_pipe),
]

mixed_pipe = Pipeline([
    ("voting", VotingClassifier(classifiers, voting="soft"))
])
```

Now we’re ready to work out which classifiers are needed. We’ll use `GridSearchCV` to do this.

```python
from sklearn.model_selection import GridSearchCV


def combinations_on_off(num_classifiers):
    # Every non-zero binary weight vector, i.e. every non-empty
    # subset of classifiers switched on
    return [[int(x) for x in list("{0:0b}".format(i).zfill(num_classifiers))]
            for i in range(1, 2 ** num_classifiers)]


param_grid = dict(
    voting__weights=combinations_on_off(len(classifiers))
)

y = train_df[Y_COLUMN].copy()
X = pd.Series(train_df[TEXT_COLUMN])

grid_search = GridSearchCV(mixed_pipe, param_grid=param_grid, n_jobs=-1,
                           verbose=10, scoring="neg_log_loss")
grid_search.fit(X, y)

cv_results = grid_search.cv_results_
for mean_score, params in zip(cv_results["mean_test_score"],
                              cv_results["params"]):
    print(params, mean_score)

print("Best score: %0.3f" % grid_search.best_score_)
print("Best parameters set:")
best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(param_grid.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))
```
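As a sanity check, here's what `combinations_on_off` generates for three classifiers — each entry is a 0/1 weight vector that switches the corresponding classifier on or off, covering every non-empty subset:

```python
def combinations_on_off(num_classifiers):
    # Enumerate 1..2^n - 1 in binary, zero-padded to n digits,
    # so each value becomes an on/off weight vector
    return [[int(x) for x in list("{0:0b}".format(i).zfill(num_classifiers))]
            for i in range(1, 2 ** num_classifiers)]

print(combinations_on_off(3))
# [[0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
```

Passing these vectors as `voting__weights` means the grid search fits the same three pipelines each time but only lets a subset of them vote.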

Let’s run the grid search and see what it comes up with:

```
{'voting__weights': [0, 0, 1]} -0.60533660756
{'voting__weights': [0, 1, 0]} -0.474562462086
{'voting__weights': [0, 1, 1]} -0.508363479586
{'voting__weights': [1, 0, 0]} -0.697231760084
{'voting__weights': [1, 0, 1]} -0.456599644003
{'voting__weights': [1, 1, 0]} -0.409406571361
{'voting__weights': [1, 1, 1]} -0.439084397238

Best score: -0.409
Best parameters set:
        voting__weights: [1, 1, 0]
```

We can see from the output that we’ve tried every combination of each of the classifiers. The output suggests that we should only include the `ngram_pipe` and `unigram_log_pipe` classifiers. `tfidf_pipe` should not be included; our log loss score is worse when it is added.
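Given those results, a trimmed-down ensemble keeping only the two winning pipelines might look like the following sketch. The toy corpus and labels here are purely illustrative, not the article's dataset:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# The two pipelines the grid search kept, rebuilt as before
ngram_pipe = Pipeline([
    ('cv', CountVectorizer(ngram_range=(1, 2))),
    ('mnb', MultinomialNB())
])
unigram_log_pipe = Pipeline([
    ('cv', CountVectorizer()),
    ('logreg', LogisticRegression())
])

final_ensemble = VotingClassifier(
    [("ngram", ngram_pipe), ("unigram", unigram_log_pipe)],
    voting="soft")

# Hypothetical toy data for illustration only
X = ["the raven perched above", "call me ishmael", "nevermore said the raven",
     "the whale swam on", "quoth the raven nevermore", "a great white whale"]
y = ["poe", "melville", "poe", "melville", "poe", "melville"]

final_ensemble.fit(X, y)
print(final_ensemble.predict(["the raven spoke"]))
```

Because no weights are passed, both classifiers vote with equal weight, which is equivalent to the winning `[1, 1, 0]` configuration with `tfidf_pipe` dropped.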

The code is on GitHub if you want to see it all in one place.


