Appending a column to a two-dimensional variable

Time: 2021-12-01 23:18:04

I have a two-dimensional variable, but I don't know whether it is a list or an array. Think of it as a matrix of size n by m. I want to append a column of size n by 1 to it, so that the new variable is n by m+1. This is how I am doing it:

train_data_features.append(train['NewsDesk'])

This is the error I am getting:

train_data_features.append(train['NewsDesk'])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/sparse/base.py", line 440, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: append not found

And this is my whole code:

import os
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from KaggleWord2VecUtility import KaggleWord2VecUtility
import pandas as pd
import numpy as np

if __name__ == '__main__':
    train = pd.read_csv(os.path.join(os.path.dirname(__file__), 'data', 'NYTimesBlogTrain.csv'), header=0)
    test = pd.read_csv(os.path.join(os.path.dirname(__file__), 'data', 'NYTimesBlogTest.csv'), header=0)
    train["Headline"].fillna(0)
    print 'A sample headline is:'
    print train["Headline"][0:10]
    #raw_input("Press Enter to continue...")


    #print 'Download text data sets. If you already have NLTK datasets downloaded, just close the Python download window...'
    #nltk.download()  # Download text data sets, including stop words

    # Initialize an empty list to hold the clean reviews
    clean_train_reviews = []
    # Loop over each review; create an index i that goes from 0 to the length
    # of the movie review list
    print "Cleaning and parsing the training set headlines...\n"
    for i in xrange( 0, len(train["Headline"])):
    #for i in xrange( 0, 10):
        if pd.isnull(train["Headline"][i])==False:
            clean_train_reviews.append(" ".join(KaggleWord2VecUtility.review_to_wordlist(train["Headline"][i], True)))
        else:
            clean_train_reviews.append(" ")
    print 'clean train reviews (headlines)'
    print clean_train_reviews  

    # ****** Create a bag of words from the training set
    #
    print "Creating the bag of words...\n"


    # Initialize the "CountVectorizer" object, which is scikit-learn's
    # bag of words tool.
    vectorizer = CountVectorizer(analyzer = "word",   \
                             tokenizer = None,    \
                             preprocessor = None, \
                             stop_words = None,   \
                             max_features = 5000)

    # fit_transform() does two functions: First, it fits the model
    # and learns the vocabulary; second, it transforms our training data
    # into feature vectors. The input to fit_transform should be a list of
    # strings.

    train_data_features = vectorizer.fit_transform(clean_train_reviews)
    print 'train_data_features'
    print train_data_features
    print 'train_data_features.shape'
    print train_data_features.shape
    # Take a look at the words in the vocabulary
    vocab = vectorizer.get_feature_names()
    print 'vocab'
    print vocab

    # Sum up the counts of each vocabulary word
    #dist = np.sum(train_data_features, axis=0)
    dist = train_data_features.sum (axis=0)
    print 'dist'
    print dist
    # For each, print the vocabulary word and the number of times it 
    # appears in the training set
    print 'tag+count'
    for tag, count in zip(vocab, dist):
        print count, tag
        print 'and'

#    for i in xrange( 0, len(train["NewsDesk"])):    
    for i in xrange( 0, 10):    
        if pd.isnull(train["NewsDesk"][i])==False:
            print train['NewsDesk'][i]
        else:
            print '   '

    train_data_features.append(train['NewsDesk'])

1 Answer

#1


Sparse matrices have no append method, but scipy.sparse provides vstack and hstack. I'll illustrate with a simple matrix:

In [120]: import numpy as np
In [121]: from scipy import sparse
In [122]: M = sparse.csr_matrix([[0,1,0],[1,0,1]])

In [123]: M.A   # show as array
Out[123]: 
array([[0, 1, 0],
       [1, 0, 1]], dtype=int32)

In [124]: M.todense()  # show a numpy matrix
Out[124]: 
matrix([[0, 1, 0],
        [1, 0, 1]], dtype=int32)

In [125]: col=np.array([[2],[3]])  # a simple column array
In [126]: col
Out[126]: 
array([[2],
       [3]])

In [128]: sparse.hstack([M,col])
Out[128]: 
<2x4 sparse matrix of type '<class 'numpy.int32'>'
    with 5 stored elements in COOrdinate format>

In [129]: sparse.hstack([M,col]).A
Out[129]: 
array([[0, 1, 0, 2],
       [1, 0, 1, 3]], dtype=int32)

In [130]: sparse.vstack([M,[1,2,3]]).A   # or add a row
Out[130]: 
array([[0, 1, 0],
       [1, 0, 1],
       [1, 2, 3]], dtype=int32)
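
As a rough sketch of how this might carry over to the question's case: a text column such as NewsDesk cannot be stacked directly, so the example below first turns it into integer codes with pandas. The data, the variable names (features, news_desk), and the choice of category codes are all assumptions made for illustration; one-hot encoding may well suit a classifier better.

```python
import pandas as pd
from scipy import sparse

# Stand-in for the CountVectorizer output: a 3x4 sparse count matrix (made-up data).
features = sparse.csr_matrix([[0, 1, 0, 2],
                              [1, 0, 0, 0],
                              [0, 0, 3, 0]])

# Hypothetical text column with one missing value, standing in for train['NewsDesk'].
news_desk = pd.Series(["Business", None, "Culture"])

# Encode the categories as integers; pandas assigns -1 to missing values.
codes = news_desk.astype("category").cat.codes.to_numpy()

# hstack needs a 2-D block with the same number of rows, so reshape to (n, 1).
col = sparse.csr_matrix(codes.reshape(-1, 1))

# hstack returns a new matrix; format="csr" asks for CSR instead of the default COO.
combined = sparse.hstack([features, col], format="csr")

print(combined.shape)
print(combined.toarray())
```

Note that the result is assigned to a new name: like np.append, sparse.hstack builds a new matrix and leaves its inputs untouched.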

numpy's append is just a fancy wrapper for np.concatenate; vstack and hstack are simpler wrappers. Also, unlike list append, np.append does not change the array in place; it returns a new array. It is best to just avoid it and think in terms of concatenate instead.
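
A small sketch of that difference, using made-up arrays (the names a, col, and rows are only for illustration):

```python
import numpy as np

a = np.array([[0, 1, 0],
              [1, 0, 1]])
col = np.array([[2], [3]])

# np.append returns a new array; `a` itself is left unchanged.
b = np.append(a, col, axis=1)

# The same result expressed directly with concatenate and hstack.
c = np.concatenate([a, col], axis=1)
d = np.hstack([a, col])

print(a.shape, b.shape)
print(np.array_equal(b, c) and np.array_equal(b, d))

# A Python list, by contrast, is modified in place, and append returns None.
rows = [[0, 1, 0], [1, 0, 1]]
result = rows.append([1, 2, 3])
print(result, len(rows))
```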
