I have a variable. The variable is two-dimensional, but I don't know if it is a list or an array. Think of this variable as a matrix of size n by m. I want to append a column of size n by 1 to it, so my new variable would be n by m+1. This is how I am doing it:
train_data_features.append(train['NewsDesk'])
This is the error I am getting:
train_data_features.append(train['NewsDesk'])
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/sparse/base.py", line 440, in __getattr__
raise AttributeError(attr + " not found")
AttributeError: append not found
And this is my whole code:
import os
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from KaggleWord2VecUtility import KaggleWord2VecUtility
import pandas as pd
import numpy as np
if __name__ == '__main__':
    train = pd.read_csv(os.path.join(os.path.dirname(__file__), 'data', 'NYTimesBlogTrain.csv'), header=0)
    test = pd.read_csv(os.path.join(os.path.dirname(__file__), 'data', 'NYTimesBlogTest.csv'), header=0)

    train["Headline"].fillna(0)

    print 'A sample headline is:'
    print train["Headline"][0:10]
    #raw_input("Press Enter to continue...")

    #print 'Download text data sets. If you already have NLTK datasets downloaded, just close the Python download window...'
    #nltk.download()  # Download text data sets, including stop words

    # Initialize an empty list to hold the clean reviews
    clean_train_reviews = []

    # Loop over each review; create an index i that goes from 0 to the length
    # of the movie review list
    print "Cleaning and parsing the training set headlines...\n"
    for i in xrange(0, len(train["Headline"])):
    #for i in xrange(0, 10):
        if pd.isnull(train["Headline"][i]) == False:
            clean_train_reviews.append(" ".join(KaggleWord2VecUtility.review_to_wordlist(train["Headline"][i], True)))
        else:
            clean_train_reviews.append(" ")

    print 'clean train reviews (headlines)'
    print clean_train_reviews

    # ****** Create a bag of words from the training set
    #
    print "Creating the bag of words...\n"

    # Initialize the "CountVectorizer" object, which is scikit-learn's
    # bag of words tool.
    vectorizer = CountVectorizer(analyzer="word",
                                 tokenizer=None,
                                 preprocessor=None,
                                 stop_words=None,
                                 max_features=5000)

    # fit_transform() does two functions: First, it fits the model
    # and learns the vocabulary; second, it transforms our training data
    # into feature vectors. The input to fit_transform should be a list of
    # strings.
    train_data_features = vectorizer.fit_transform(clean_train_reviews)
    print 'train_data_features'
    print train_data_features
    print 'train_data_features.shape'
    print train_data_features.shape

    # Take a look at the words in the vocabulary
    vocab = vectorizer.get_feature_names()
    print 'vocab'
    print vocab

    # Sum up the counts of each vocabulary word
    #dist = np.sum(train_data_features, axis=0)
    dist = train_data_features.sum(axis=0)
    print 'dist'
    print dist

    # For each, print the vocabulary word and the number of times it
    # appears in the training set
    print 'tag+count'
    for tag, count in zip(vocab, dist):
        print count, tag

    print 'and'
    # for i in xrange(0, len(train["NewsDesk"])):
    for i in xrange(0, 10):
        if pd.isnull(train["NewsDesk"][i]) == False:
            print train['NewsDesk'][i]
        else:
            print ' '

    train_data_features.append(train['NewsDesk'])
1 Answer
#1
There isn't an append for sparse matrices. But there are vstack and hstack. I'll illustrate with a simple matrix:
In [121]: from scipy import sparse

In [122]: M = sparse.csr_matrix([[0,1,0],[1,0,1]])

In [123]: M.A            # show as array
Out[123]:
array([[0, 1, 0],
       [1, 0, 1]], dtype=int32)

In [124]: M.todense()    # show a numpy matrix
Out[124]:
matrix([[0, 1, 0],
        [1, 0, 1]], dtype=int32)

In [125]: col = np.array([[2],[3]])   # a simple column array

In [126]: col
Out[126]:
array([[2],
       [3]])

In [128]: sparse.hstack([M, col])
Out[128]:
<2x4 sparse matrix of type '<class 'numpy.int32'>'
        with 5 stored elements in COOrdinate format>

In [129]: sparse.hstack([M, col]).A
Out[129]:
array([[0, 1, 0, 2],
       [1, 0, 1, 3]], dtype=int32)

In [130]: sparse.vstack([M, [1,2,3]]).A   # or add a row
Out[130]:
array([[0, 1, 0],
       [1, 0, 1],
       [1, 2, 3]], dtype=int32)
numpy append is just a fancy wrapper for np.concatenate; vstack and hstack are simpler wrappers. Also, append does not change the array in place (unlike the list append). It's best to avoid it and think in terms of concatenate instead.
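To make that last point concrete, here is a small sketch (plain numpy, not from the original post) contrasting np.append with list.append and np.concatenate:

import numpy as np

a = np.array([[0, 1, 0],
              [1, 0, 1]])

# np.append returns a NEW array; without an axis it also flattens the result.
flat = np.append(a, [2, 3])
print(flat)          # [0 1 0 1 0 1 2 3]
print(a.shape)       # (2, 3) -- the original array is unchanged

# With axis=1 it is essentially np.concatenate([a, col], axis=1); you still
# have to capture the return value.
col = np.array([[2], [3]])
wider = np.append(a, col, axis=1)
print(wider.shape)   # (2, 4)

# A Python list, by contrast, really is modified in place by append.
lst = [0, 1, 0]
lst.append(2)
print(lst)           # [0, 1, 0, 2]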
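Applied back to the question, a minimal sketch of the same idea might look like the following. It assumes the categorical NewsDesk column is first encoded as integer codes (pd.factorize is just one possible encoding choice, not something from the original post) and then stacked onto the CountVectorizer output:

from scipy import sparse
import pandas as pd

# train_data_features is the n x m sparse matrix returned by fit_transform.
# Encode the categorical NewsDesk column as integer codes (NaN becomes -1);
# this encoding is an assumption, not part of the original code.
newsdesk_codes, _ = pd.factorize(train['NewsDesk'])

# Reshape into an n x 1 column and hstack it onto the existing features,
# giving an n x (m+1) sparse matrix.
newsdesk_col = newsdesk_codes.reshape(-1, 1)
train_data_features = sparse.hstack([train_data_features, newsdesk_col]).tocsr()

print(train_data_features.shape)   # (n, m + 1)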