LabelEncoder encodes labels as integers between 0 and n_classes-1.
Each distinct label is assigned one of a run of consecutive integers:
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])  # transform categories into integers
array([0, 0, 1, 2], dtype=int64)
>>> le.inverse_transform([0, 0, 1, 2])  # transform integers back into categories
array([1, 1, 2, 6])
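One thing the session above does not show is what happens when transform receives a label that was not present during fit: LabelEncoder raises a ValueError. The sketch below illustrates this; safe_transform is a hypothetical helper written for this example, not part of scikit-learn.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit([1, 2, 2, 6])


def safe_transform(encoder, values):
    """Hypothetical helper: return the codes, or None if any value was unseen in fit."""
    try:
        return encoder.transform(values)
    except ValueError:
        # LabelEncoder.transform raises ValueError for labels it has not seen
        return None


known = safe_transform(le, [1, 6])    # both labels were seen during fit
unknown = safe_transform(le, [1, 3])  # 3 was never seen during fit
```

Returning None is only one possible policy. Depending on the task, it may be more appropriate to let the exception propagate or to refit the encoder on the extended label set.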
It works the same way for string labels:

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])  # transform categories into integers
array([2, 2, 1], dtype=int64)
>>> list(le.inverse_transform([2, 2, 1]))  # transform integers back into categories
['tokyo', 'tokyo', 'paris']
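Since classes_ is stored in sorted order and each label's code is its position in that array, the full label-to-code mapping can be read off directly. A minimal sketch (the name mapping is chosen here for illustration):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

# classes_ is sorted; the index of each class in classes_ is its integer code.
mapping = {str(label): code for code, label in enumerate(le.classes_)}
```

This makes it explicit that the codes follow the sort order of the labels, not the order in which the labels first appear in the data.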
To convert all ID labels in a DataFrame to consecutive integers using one shared encoding:

from sklearn.preprocessing import LabelEncoder
import numpy as np
import pandas as pd

df = pd.read_csv('testdata.csv', sep='|', header=None)

    0   1   2   3   4   5
0  37  52  55  50  38  54
1  17  32  20   9   6  48
2  28  10  56  51  45  16
3  27  49  41  30  53  19
4  44  29   8   1  46  13
5  11  26  21  14   7  33
6   0  39  22  33  35  43
7  18  15  47   5  25  34
8  23   2   4   9   3  31
9  12  57  36  40  42  24

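le = LabelEncoder()
le.fit(np.unique(df.values))  # fit once on every value in the table
df.apply(le.transform)        # encode each column with the same fitted encoder

With the values shown, every integer from 0 to 57 occurs somewhere in the table, so each value is already its own code and the output matches the input:

    0   1   2   3   4   5
0  37  52  55  50  38  54
1  17  32  20   9   6  48
2  28  10  56  51  45  16
3  27  49  41  30  53  19
4  44  29   8   1  46  13
5  11  26  21  14   7  33
6   0  39  22  33  35  43
7  18  15  47   5  25  34
8  23   2   4   9   3  31
9  12  57  36  40  42  24
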
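The effect of a shared encoder is easier to see with non-numeric IDs. The following sketch uses a small made-up table (df_ids is a hypothetical stand-in for testdata.csv) and applies the same fit-once, transform-every-column approach, then decodes the result again:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: the same IDs recur across columns.
df_ids = pd.DataFrame({0: ["u7", "u3", "u9"], 1: ["u3", "u9", "u1"]})

le = LabelEncoder()
le.fit(np.unique(df_ids.values))       # one vocabulary for the whole table
encoded = df_ids.apply(le.transform)   # a given ID gets the same code in every column

# Because a single encoder was used, one inverse_transform decodes every column.
decoded = encoded.apply(le.inverse_transform)
```

Here "u3" is encoded as 1 in both columns, which is what makes the codes comparable across the table.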
To convert the ID labels of each column in a DataFrame to consecutive integers separately (one independent encoding per column):

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # list of column names to encode

    def fit(self, X, y=None):
        return self  # nothing to learn here; each column is fitted in transform

    def transform(self, X):
        '''
        Transforms columns of X specified in self.columns using
        LabelEncoder(). If no columns specified, transforms all
        columns in X.
        '''
        output = X.copy()
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            # DataFrame.iteritems() was removed in pandas 2.0; items() replaces it
            for colname, col in output.items():
                output[colname] = LabelEncoder().fit_transform(col)
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

MultiColumnLabelEncoder(columns=[0, 1, 2, 3, 4, 5]).fit_transform(df)

Or:

df.apply(LabelEncoder().fit_transform)

   0  1  2  3  4  5
0  8  8  8  7  5  9
1  3  5  2  2  1  8
2  7  1  9  8  7  1
3  6  7  6  4  9  2
4  9  4  1  0  8  0
5  1  3  3  3  2  5
6  0  6  4  5  4  7
7  4  2  7  1  3  6
8  5  0  0  2  0  4
9  2  9  5  6  6  3

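Both per-column variants above discard the fitted encoders, so the codes cannot be mapped back to the original values afterwards. If that matters, one option is to keep one encoder per column in a dictionary. This is a minimal sketch on made-up data; df_small and encoders are names chosen for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data for illustration.
df_small = pd.DataFrame({"a": [37, 17, 28], "b": [52, 32, 10]})

encoders = {}             # column name -> fitted LabelEncoder
encoded = df_small.copy()
for name in df_small.columns:
    encoders[name] = LabelEncoder()
    encoded[name] = encoders[name].fit_transform(df_small[name])

# Each column is decoded with the encoder that was fitted on it.
restored = encoded.copy()
for name, enc in encoders.items():
    restored[name] = enc.inverse_transform(encoded[name])
```

The same dictionary can also be reused to encode new rows consistently with the original data.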
# Create some toy data in a pandas DataFrame
fruit_data = pd.DataFrame({
    'fruit': ['apple', 'orange', 'pear', 'orange'],
    'color': ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4]
})

    color   fruit  weight
0     red   apple       5
1  orange  orange       6
2   green    pear       3
3   green  orange       4

MultiColumnLabelEncoder(columns=['fruit', 'color']).fit_transform(fruit_data)

Or, overwriting the two columns of fruit_data in place:

fruit_data[['fruit', 'color']] = fruit_data[['fruit', 'color']].apply(LabelEncoder().fit_transform)

   color  fruit  weight
0      2      0       5
1      1      1       6
2      0      2       3
3      0      1       4

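As a side note, scikit-learn documents LabelEncoder as intended for target labels (y). For encoding several feature columns at once, the library also provides OrdinalEncoder. It is a different class from the one used in this post and is shown here only as a possible alternative. It is sketched on the same toy data:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

fruit_data = pd.DataFrame({
    'fruit': ['apple', 'orange', 'pear', 'orange'],
    'color': ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4]
})

enc = OrdinalEncoder()
# Returns a float array with one column of codes per input column.
codes = enc.fit_transform(fruit_data[['fruit', 'color']])

# enc.categories_ holds the sorted categories of each column, so the
# fitted mapping is retained and enc.inverse_transform(codes) is available.
```

Unlike the apply-based one-liner, the fitted OrdinalEncoder keeps the mapping for every column, at the cost of returning floats instead of integers by default.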
Original article: https://blog.csdn.net/u010412858/article/details/78386407