I have a NumPy array like this:
l1 = (['United States', 'England', 'South Africa'])
Sometimes an entry can hold more than one value:
l1 = ([['United States','South Korea'], 'England', 'South Africa'])
I want to use MultiLabelBinarizer to encode these values. According to the scikit-learn documentation for fit_transform, the parameter should be:
y : iterable of iterables
    A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.
How can I convert this NumPy array of lists and single strings into sets?
I have tried this:
value = [set(v) for v in l1]
list_2sets = np.asarray(value)
But this does not seem to work properly.
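A likely reason the attempt above misbehaves, sketched here under the assumption that the entries are plain Python strings and lists: set() iterates over whatever it is given, so a bare string is broken into its characters instead of being kept as one label.

```python
# Same data as in the question, written as a plain list for illustration.
l1 = [['United States', 'South Korea'], 'England', 'South Africa']

value = [set(v) for v in l1]

# A list entry becomes a set of its labels, as intended.
print(value[0])

# A bare string entry is iterated character by character,
# so 'England' turns into a set of single letters.
print(sorted(value[1]))
```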
The problem is that I do not have the full set of values to consider (all countries). When I do have it, the following works:
mlb.fit_transform(headings.split(', ') for headings in l1)
Here, headings is the list of all values considered:
['England','Spain', ...]
But I do not have those values yet, so I want to apply MultiLabelBinarizer without 'headings'.
1 solution
#1
Try to preprocess your array of strings as follows:
In [50]: l1 = [[x] if isinstance(x, str) else x for x in l1]
In [51]: l1
Out[51]: [['United States', 'South Korea'], ['England'], ['South Africa']]
For Python 2.x:
In [50]: l1 = [[x] if isinstance(x, (str, unicode)) else x for x in l1]
In [51]: l1
Out[51]: [['United States', 'South Korea'], ['England'], ['South Africa']]
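Once every sample is a list of labels, it can be handed to MultiLabelBinarizer without supplying the classes up front; the classes are then inferred from the data. The following is a minimal sketch for Python 3, assuming the input looks like the list in the question (the names samples, mlb and encoded are only illustrative):

```python
from sklearn.preprocessing import MultiLabelBinarizer

l1 = [['United States', 'South Korea'], 'England', 'South Africa']

# Wrap bare strings in a one-element list so each sample is an iterable of labels.
samples = [[x] if isinstance(x, str) else list(x) for x in l1]

# No classes argument: fit_transform collects the labels from the samples.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(samples)

print(mlb.classes_)  # the inferred labels, in sorted order
print(encoded)       # one row per sample, one column per label
```

Each column of encoded corresponds to the label at the same position in mlb.classes_, so the first row has ones in the 'South Korea' and 'United States' columns.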