1.12. Multiclass and multioutput algorithms (2024)

This section of the user guide covers functionality related to multi-learning problems, including multiclass, multilabel, and multioutput classification and regression.

The modules in this section implement meta-estimators, which require a base estimator to be provided in their constructor. Meta-estimators extend the functionality of the base estimator to support multi-learning problems, which is accomplished by transforming the multi-learning problem into a set of simpler problems, then fitting one estimator per problem.

This section covers two modules: sklearn.multiclass and sklearn.multioutput. The chart below demonstrates the problem types that each module is responsible for, and the corresponding meta-estimators that each module provides.

[Figure: problem types handled by sklearn.multiclass and sklearn.multioutput, with the meta-estimators each module provides]

The table below provides a quick reference on the differences between problem types. More detailed explanations can be found in subsequent sections of this guide.

Problem type                            Number of targets   Target cardinality   Valid type_of_target
Multiclass classification               1                   >2                   'multiclass'
Multilabel classification               >1                  2 (0 or 1)           'multilabel-indicator'
Multiclass-multioutput classification   >1                  >2                   'multiclass-multioutput'
Multioutput regression                  >1                  Continuous           'continuous-multioutput'
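The target types in the table can be checked programmatically. The following is a minimal sketch, assuming the helper `sklearn.utils.multiclass.type_of_target` is available; the arrays and dictionary keys are made up for illustration.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative targets (hypothetical data), one per row of the table.
targets = {
    "multiclass": np.array(["apple", "pear", "orange", "apple"]),
    "multilabel": np.array([[1, 0, 1], [0, 1, 1], [0, 0, 0]]),
    "multiclass-multioutput": np.array([[0, 2], [1, 0], [2, 1]]),
    "multioutput regression": np.array([[31.4, 94.2], [40.5, 109.1], [25.0, 30.7]]),
}

# Map each illustrative target to the type string scikit-learn infers for it.
detected = {name: type_of_target(y) for name, y in targets.items()}
for name, kind in detected.items():
    print(f"{name}: {kind}")
```

Checking the inferred type before fitting is a quick way to confirm that a target array is shaped the way the chosen estimator expects.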

Below is a summary of scikit-learn estimators that have multi-learning support built-in, grouped by strategy. You don’t need the meta-estimators provided by this section if you’re using one of these estimators. However, meta-estimators can provide additional strategies beyond what is built-in:

  • Inherently multiclass:

    • naive_bayes.BernoulliNB

    • tree.DecisionTreeClassifier

    • tree.ExtraTreeClassifier

    • ensemble.ExtraTreesClassifier

    • naive_bayes.GaussianNB

    • neighbors.KNeighborsClassifier

    • semi_supervised.LabelPropagation

    • semi_supervised.LabelSpreading

    • discriminant_analysis.LinearDiscriminantAnalysis

    • svm.LinearSVC (setting multi_class="crammer_singer")

    • linear_model.LogisticRegression (setting multi_class="multinomial")

    • linear_model.LogisticRegressionCV (setting multi_class="multinomial")

    • neural_network.MLPClassifier

    • neighbors.NearestCentroid

    • discriminant_analysis.QuadraticDiscriminantAnalysis

    • neighbors.RadiusNeighborsClassifier

    • ensemble.RandomForestClassifier

    • linear_model.RidgeClassifier

    • linear_model.RidgeClassifierCV

  • Multiclass as One-Vs-One:

    • svm.NuSVC

    • svm.SVC

    • gaussian_process.GaussianProcessClassifier (setting multi_class="one_vs_one")

  • Multiclass as One-Vs-The-Rest:

  • Support multilabel:

    • tree.DecisionTreeClassifier

    • tree.ExtraTreeClassifier

    • ensemble.ExtraTreesClassifier

    • neighbors.KNeighborsClassifier

    • neural_network.MLPClassifier

    • neighbors.RadiusNeighborsClassifier

    • ensemble.RandomForestClassifier

    • linear_model.RidgeClassifier

    • linear_model.RidgeClassifierCV

  • Support multiclass-multioutput:

    • tree.DecisionTreeClassifier

    • tree.ExtraTreeClassifier

    • ensemble.ExtraTreesClassifier

    • neighbors.KNeighborsClassifier

    • neighbors.RadiusNeighborsClassifier

    • ensemble.RandomForestClassifier
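As the last group in the list suggests, some estimators accept a 2d target directly, without any meta-estimator. The sketch below assumes a synthetic dataset; the second target `y2` is a hypothetical property derived from two features purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with a first 3-class target.
X, y1 = make_classification(n_samples=60, n_features=8, n_informative=4,
                            n_classes=3, random_state=0)

# Hypothetical second target with values in {0, 1, 2}.
y2 = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)

# Stack the two 1d targets column-wise into a (n_samples, 2) matrix.
Y = np.column_stack([y1, y2])

# KNeighborsClassifier handles the multiclass-multioutput target natively.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, Y)
pred = knn.predict(X[:5])
print(pred.shape)
```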

1.12.1. Multiclass classification

Warning

All classifiers in scikit-learn do multiclass classification out-of-the-box. You don’t need to use the sklearn.multiclass module unless you want to experiment with different multiclass strategies.

Multiclass classification is a classification task with more than two classes. Each sample can only be labeled as one class.

For example, classification using features extracted from a set of images of fruit, where each image may either be of an orange, an apple, or a pear. Each image is one sample and is labeled as one of the 3 possible classes. Multiclass classification makes the assumption that each sample is assigned to one and only one label - one sample cannot, for example, be both a pear and an apple.

While all scikit-learn classifiers are capable of multiclass classification, the meta-estimators offered by sklearn.multiclass permit changing the way they handle more than two classes because this may have an effect on classifier performance (either in terms of generalization error or required computational resources).

1.12.1.1. Target format

Valid multiclass representations for type_of_target (y) are:

  • 1d or column vector containing more than two discrete values. An example of a vector y for 4 samples:

    >>> import numpy as np
    >>> y = np.array(['apple', 'pear', 'apple', 'orange'])
    >>> print(y)
    ['apple' 'pear' 'apple' 'orange']

  • Dense or sparse binary matrix of shape (n_samples, n_classes) with a single sample per row, where each column represents one class. An example of both a dense and sparse binary matrix y for 4 samples, where the columns, in order, are apple, orange, and pear:

    >>> import numpy as np
    >>> from sklearn.preprocessing import LabelBinarizer
    >>> y = np.array(['apple', 'pear', 'apple', 'orange'])
    >>> y_dense = LabelBinarizer().fit_transform(y)
    >>> print(y_dense)
    [[1 0 0]
     [0 0 1]
     [1 0 0]
     [0 1 0]]
    >>> from scipy import sparse
    >>> y_sparse = sparse.csr_matrix(y_dense)
    >>> print(y_sparse)
      (0, 0)	1
      (1, 2)	1
      (2, 0)	1
      (3, 1)	1

For more information about LabelBinarizer, refer to Transforming the prediction target (y).

1.12.1.2. OneVsRestClassifier

The one-vs-rest strategy, also known as one-vs-all, is implemented in OneVsRestClassifier. The strategy consists of fitting one classifier per class. For each classifier, the class is fitted against all the other classes. In addition to its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy and is a fair default choice.

Below is an example of multiclass learning using OvR:

>>> from sklearn import datasets
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.svm import LinearSVC
>>> X, y = datasets.load_iris(return_X_y=True)
>>> OneVsRestClassifier(LinearSVC(dual="auto", random_state=0)).fit(X, y).predict(X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

OneVsRestClassifier also supports multilabel classification. To use this feature, feed the classifier an indicator matrix, in which cell [i, j] indicates the presence of label j in sample i.
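The multilabel use of OneVsRestClassifier described above can be sketched as follows. This is an illustration under assumptions: the data comes from `make_multilabel_classification`, and `LogisticRegression` is an arbitrary choice of base estimator.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Y is a dense indicator matrix of shape (n_samples, n_labels):
# Y[i, j] == 1 means label j is present in sample i.
X, Y = make_multilabel_classification(n_samples=100, n_features=20,
                                      n_classes=4, random_state=0)

# One binary classifier is fitted per label (column of Y).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# Predictions come back in the same indicator format as the input.
Y_pred = ovr.predict(X)
print(len(ovr.estimators_), Y_pred.shape)
```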

1.12.1.3. OneVsOneClassifier

OneVsOneClassifier constructs one classifier per pair of classes. At prediction time, the class which received the most votes is selected. In the event of a tie (among two classes with an equal number of votes), it selects the class with the highest aggregate classification confidence by summing over the pair-wise classification confidence levels computed by the underlying binary classifiers.

Since it requires fitting n_classes * (n_classes - 1) / 2 classifiers, this method is usually slower than one-vs-the-rest, due to its O(n_classes^2) complexity. However, this method may be advantageous for algorithms such as kernel algorithms which don’t scale well with n_samples. This is because each individual learning problem only involves a small subset of the data whereas, with one-vs-the-rest, the complete dataset is used n_classes times. The decision function is the result of a monotonic transformation of the one-versus-one classification.

Below is an example of multiclass learning using OvO:

>>> from sklearn import datasets
>>> from sklearn.multiclass import OneVsOneClassifier
>>> from sklearn.svm import LinearSVC
>>> X, y = datasets.load_iris(return_X_y=True)
>>> OneVsOneClassifier(LinearSVC(dual="auto", random_state=0)).fit(X, y).predict(X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
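To make the classifier counts concrete, the sketch below compares the number of fitted binary estimators under one-vs-one and one-vs-rest on a 10-class problem. The digits dataset and the shallow decision tree are assumptions chosen only to keep the example quick.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
n_classes = len(set(y))  # 10 digit classes

# A small base estimator, picked only for speed (illustrative choice).
base = DecisionTreeClassifier(max_depth=3, random_state=0)

ovo = OneVsOneClassifier(base).fit(X, y)
ovr = OneVsRestClassifier(base).fit(X, y)

n_ovo = len(ovo.estimators_)  # one estimator per pair of classes
n_ovr = len(ovr.estimators_)  # one estimator per class
print(n_ovo, n_ovr)
```

With 10 classes, one-vs-one fits 10 * 9 / 2 = 45 estimators while one-vs-rest fits 10, although each one-vs-one estimator sees only the samples of its two classes.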

1.12.1.4. OutputCodeClassifier

Error-Correcting Output Code-based strategies are fairly different from one-vs-the-rest and one-vs-one. With these strategies, each class is represented in a Euclidean space, where each dimension can only be 0 or 1. Another way to put it is that each class is represented by a binary code (an array of 0 and 1). The matrix which keeps track of the location/code of each class is called the code book. The code size is the dimensionality of the aforementioned space. Intuitively, each class should be represented by a code as unique as possible and a good code book should be designed to optimize classification accuracy. In this implementation, we simply use a randomly-generated code book as advocated in [3] although more elaborate methods may be added in the future.

At fitting time, one binary classifier per bit in the code book is fitted. At prediction time, the classifiers are used to project new points in the class space and the class closest to the points is chosen.

In OutputCodeClassifier, the code_size attribute allows the user to control the number of classifiers which will be used. It is expressed as a fraction of the total number of classes.

A number between 0 and 1 will require fewer classifiers than one-vs-the-rest. In theory, log2(n_classes) / n_classes is sufficient to represent each class unambiguously. However, in practice, it may not lead to good accuracy since log2(n_classes) is much smaller than n_classes.

A number greater than 1 will require more classifiers than one-vs-the-rest. In this case, some classifiers will in theory correct for the mistakes made by other classifiers, hence the name “error-correcting”. In practice, however, this may not happen as classifier mistakes will typically be correlated. The error-correcting output codes have a similar effect to bagging.

Below is an example of multiclass learning using Output-Codes:

>>> from sklearn import datasets
>>> from sklearn.multiclass import OutputCodeClassifier
>>> from sklearn.svm import LinearSVC
>>> X, y = datasets.load_iris(return_X_y=True)
>>> clf = OutputCodeClassifier(LinearSVC(dual="auto", random_state=0),
...                            code_size=2, random_state=0)
>>> clf.fit(X, y).predict(X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1,
       1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
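The effect of code_size on the number of fitted classifiers can be inspected directly. The following sketch assumes the digits dataset (10 classes) and a shallow decision tree as base estimator, both chosen for illustration; it reads the fitted `estimators_` and `code_book_` attributes.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OutputCodeClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)  # 10 classes
base = DecisionTreeClassifier(max_depth=5, random_state=0)

# code_size < 1: fewer binary classifiers than classes.
compact = OutputCodeClassifier(base, code_size=0.5, random_state=0).fit(X, y)

# code_size > 1: more binary classifiers than classes (redundant bits).
redundant = OutputCodeClassifier(base, code_size=2, random_state=0).fit(X, y)

# The code book has one row per class and one column per binary classifier.
print(len(compact.estimators_), compact.code_book_.shape)
print(len(redundant.estimators_), redundant.code_book_.shape)
```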

1.12.2. Multilabel classification

Multilabel classification (closely related to multioutput classification) is a classification task labeling each sample with m labels from n_classes possible classes, where m can be 0 to n_classes inclusive. This can be thought of as predicting properties of a sample that are not mutually exclusive. Formally, a binary output is assigned to each class, for every sample. Positive classes are indicated with 1 and negative classes with 0 or -1. It is thus comparable to running n_classes binary classification tasks, for example with MultiOutputClassifier. This approach treats each label independently whereas multilabel classifiers may treat the multiple classes simultaneously, accounting for correlated behavior among them.

For example, prediction of the topics relevant to a text document or video. The document or video may be about one of ‘religion’, ‘politics’, ‘finance’ or ‘education’, several of the topic classes or all of the topic classes.

1.12.2.1. Target format

A valid representation of multilabel y is either a dense or a sparse binary matrix of shape (n_samples, n_classes). Each column represents a class. The 1’s in each row denote the positive classes a sample has been labeled with. An example of a dense matrix y for 3 samples:

>>> import numpy as np
>>> y = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]])
>>> print(y)
[[1 0 0 1]
 [0 0 1 1]
 [0 0 0 0]]

Dense binary matrices can also be created using MultiLabelBinarizer. For more information, refer to Transforming the prediction target (y).

An example of the same y in sparse matrix form:

>>> from scipy import sparse
>>> y_sparse = sparse.csr_matrix(y)
>>> print(y_sparse)
  (0, 0)	1
  (0, 3)	1
  (1, 2)	1
  (1, 3)	1
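As a small illustration of building such an indicator matrix with MultiLabelBinarizer, the sketch below starts from per-sample sets of topic names. The topic sets are invented for the example.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical topic sets for three documents; the last has no topic at all.
topics = [{"politics", "finance"}, {"religion"}, set()]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(topics)

# Columns follow the sorted class names stored in mlb.classes_.
print(list(mlb.classes_))
print(y)
```

Each row of `y` has a 1 in the columns of the topics that apply, and the all-zero last row corresponds to a sample with no labels.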

1.12.2.2. MultiOutputClassifier

Multilabel classification support can be added to any classifier with MultiOutputClassifier. This strategy consists of fitting one classifier per target. This allows multiple target variable classifications. The purpose of this class is to extend estimators to be able to estimate a series of target functions (f1,f2,f3…,fn) that are trained on a single X predictor matrix to predict a series of responses (y1,y2,y3…,yn).

You can find a usage example for MultiOutputClassifier as part of the section on Multiclass-multioutput classification since it is a generalization of multilabel classification to multiclass outputs instead of binary outputs.

1.12.2.3. ClassifierChain

Classifier chains (see ClassifierChain) are a way of combining a number of binary classifiers into a single multi-label model that is capable of exploiting correlations among targets.

For a multi-label classification problem with N classes, N binary classifiers are assigned an integer between 0 and N-1. These integers define the order of models in the chain. Each classifier is then fit on the available training data plus the true labels of the classes whose models were assigned a lower number.

When predicting, the true labels will not be available. Instead the predictions of each model are passed on to the subsequent models in the chain to be used as features.

Clearly the order of the chain is important. The first model in the chain has no information about the other labels while the last model in the chain has features indicating the presence of all of the other labels. In general one does not know the optimal ordering of the models in the chain so typically many randomly ordered chains are fit and their predictions are averaged together.
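The idea of averaging several randomly ordered chains can be sketched as follows. This is one possible way to do it, not a prescribed recipe: the synthetic data, the logistic regression base estimator, the number of chains (5), and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=200, n_features=20,
                                      n_classes=5, random_state=0)

# Build several chains, each with its own random label ordering.
chains = [
    ClassifierChain(LogisticRegression(max_iter=1000),
                    order="random", random_state=seed)
    for seed in range(5)
]
for chain in chains:
    chain.fit(X, Y)

# Average the per-label probabilities over the chains, then threshold.
Y_proba = np.mean([chain.predict_proba(X) for chain in chains], axis=0)
Y_pred = (Y_proba >= 0.5).astype(int)
print(Y_pred.shape)
```

The ordering actually used by each fitted chain is available in its `order_` attribute, which can help when comparing chains.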

1.12.3. Multiclass-multioutput classification

Multiclass-multioutput classification (also known as multitask classification) is a classification task which labels each sample with a set of non-binary properties. The number of properties is greater than 1 and the number of classes per property is greater than 2. A single estimator thus handles several joint classification tasks. This is both a generalization of the multilabel classification task, which only considers binary attributes, as well as a generalization of the multiclass classification task, where only one property is considered.

For example, classification of the properties “type of fruit” and “colour” for a set of images of fruit. The property “type of fruit” has the possible classes: “apple”, “pear” and “orange”. The property “colour” has the possible classes: “green”, “red”, “yellow” and “orange”. Each sample is an image of a fruit, a label is output for both properties and each label is one of the possible classes of the corresponding property.

Note that all classifiers handling multiclass-multioutput (also known as multitask classification) tasks support the multilabel classification task as a special case. Multitask classification is similar to the multioutput classification task with different model formulations. For more information, see the relevant estimator documentation.

Below is an example of multiclass-multioutput classification:

>>> from sklearn.datasets import make_classification
>>> from sklearn.multioutput import MultiOutputClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.utils import shuffle
>>> import numpy as np
>>> X, y1 = make_classification(n_samples=10, n_features=100,
...                            n_informative=30, n_classes=3,
...                            random_state=1)
>>> y2 = shuffle(y1, random_state=1)
>>> y3 = shuffle(y1, random_state=2)
>>> Y = np.vstack((y1, y2, y3)).T
>>> n_samples, n_features = X.shape # 10,100
>>> n_outputs = Y.shape[1] # 3
>>> n_classes = 3
>>> forest = RandomForestClassifier(random_state=1)
>>> multi_target_forest = MultiOutputClassifier(forest, n_jobs=2)
>>> multi_target_forest.fit(X, Y).predict(X)
array([[2, 2, 0],
       [1, 2, 1],
       [2, 1, 0],
       [0, 0, 2],
       [0, 2, 1],
       [0, 0, 2],
       [1, 1, 0],
       [1, 1, 1],
       [0, 0, 2],
       [2, 0, 0]])

Warning

At present, no metric in sklearn.metrics supports the multiclass-multioutput classification task.

1.12.3.1. Target format

A valid representation of multioutput y is a dense matrix of shape (n_samples, n_outputs) of class labels, with one column per property. It is a column-wise concatenation of 1d multiclass variables. An example of y for 3 samples:

>>> import numpy as np
>>> y = np.array([['apple', 'green'], ['orange', 'orange'], ['pear', 'green']])
>>> print(y)
[['apple' 'green']
 ['orange' 'orange']
 ['pear' 'green']]

1.12.4. Multioutput regression

Multioutput regression predicts multiple numerical properties for each sample. Each property is a numerical variable and the number of properties to be predicted for each sample is greater than or equal to 2. Some estimators that support multioutput regression are faster than just running n_output estimators.

For example, prediction of both wind speed and wind direction, in degrees, using data obtained at a certain location. Each sample would be data obtained at one location and both wind speed and direction would be output for each sample.

1.12.4.1. Target format

A valid representation of multioutput y is a dense matrix of shape (n_samples, n_output) of floats. It is a column-wise concatenation of continuous variables. An example of y for 3 samples:

>>> import numpy as np
>>> y = np.array([[31.4, 94], [40.5, 109], [25.0, 30]])
>>> print(y)
[[ 31.4  94. ]
 [ 40.5 109. ]
 [ 25.   30. ]]

1.12.4.2. MultiOutputRegressor

Multioutput regression support can be added to any regressor with MultiOutputRegressor. This strategy consists of fitting one regressor per target. Since each target is represented by exactly one regressor, it is possible to gain knowledge about the target by inspecting its corresponding regressor. As MultiOutputRegressor fits one regressor per target, it cannot take advantage of correlations between targets.

Below is an example of multioutput regression:

>>> from sklearn.datasets import make_regression
>>> from sklearn.multioutput import MultiOutputRegressor
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> X, y = make_regression(n_samples=10, n_targets=3, random_state=1)
>>> MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, y).predict(X)
array([[-154.75474165, -147.03498585,  -50.03812219],
       [   7.12165031,    5.12914884,  -81.46081961],
       [-187.8948621 , -100.44373091,   13.88978285],
       [-141.62745778,   95.02891072, -191.48204257],
       [  97.03260883,  165.34867495,  139.52003279],
       [ 123.92529176,   21.25719016,   -7.84253   ],
       [-122.25193977,  -85.16443186, -107.12274212],
       [ -30.170388  ,  -94.80956739,   12.16979946],
       [ 140.72667194,  176.50941682,  -17.50447799],
       [ 149.37967282,  -81.15699552,   -5.72850319]])
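The "one regressor per target" behaviour can be checked by comparing MultiOutputRegressor against regressors fitted separately on each target column. The sketch below assumes synthetic data and a `Ridge` base regressor, both picked for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

X, Y = make_regression(n_samples=50, n_features=10, n_targets=3,
                       noise=1.0, random_state=0)

# Meta-estimator: fits an independent clone of Ridge for each column of Y.
multi = MultiOutputRegressor(Ridge(alpha=1.0)).fit(X, Y)
Y_pred = multi.predict(X)

# Same thing done by hand, one target column at a time.
separate = np.column_stack([
    Ridge(alpha=1.0).fit(X, Y[:, j]).predict(X) for j in range(Y.shape[1])
])

print(len(multi.estimators_), np.allclose(Y_pred, separate))
```

The fitted per-target regressors are available in `multi.estimators_`, in the same order as the columns of `Y`.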

1.12.4.3. RegressorChain

Regressor chains (see RegressorChain) are analogous to ClassifierChain as a way of combining a number of regressions into a single multi-target model that is capable of exploiting correlations among targets.
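A minimal sketch of a regressor chain, assuming synthetic data, a `Ridge` base regressor, and an explicit chain order of [0, 1, 2]. Inspecting the coefficient shapes shows that each later model in the chain receives the earlier targets as extra features.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

X, Y = make_regression(n_samples=50, n_features=10, n_targets=3,
                       noise=1.0, random_state=0)

# Target 0 is modelled first, then target 1, then target 2.
chain = RegressorChain(Ridge(alpha=1.0), order=[0, 1, 2]).fit(X, Y)
Y_pred = chain.predict(X)

# Number of input features seen by each model in the chain:
# 10 original features, plus one per preceding target.
n_inputs = [est.coef_.shape[0] for est in chain.estimators_]
print(Y_pred.shape, n_inputs)
```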
