I have a data set coming from a customer survey. As it stands, each column corresponds to a question. When importing the data I end up with the column name being the question:
['this is a long question 01', 'this is a long question 02', ..., 'this is a long question 186']
That's right: 186 questions, so 186 columns.
I am new to pandas. My analysis is quite simple; I just need to do a few things like:
myDataFrame.loc[myDataFrame['column1'] == 'BLue hair']
Since the real name of column1 is very long, working with it becomes cumbersome. I figured I could just refer to the column by its position instead. Something like:
myDataFrame.loc[myDataFrame[33] == 'BLue hair']
That doesn't seem to work with either DataFrame.loc or DataFrame.iloc.
I was wondering what the proper way of doing this is. By the way, transposing the dataframe gets rid of the column name issue, but it complicates my analysis unnecessarily.
I have not yet grasped many of the concepts behind pandas and dataframes, so I appreciate any suggestions.
2 solutions
#1
1
If you write
cols = myDataFrame.columns
then you can use
myDataFrame[myDataFrame[cols[33]] == 'BLue hair']
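Here is a small self-contained sketch of the same idea. The three-column frame and its values are invented for illustration, so it uses position 0 rather than 33. It also shows why myDataFrame[33] fails in the question: plain square brackets look a column up by its label, not its position, so an integer only works when a column is literally named with that integer. Selecting with iloc is a positional alternative.

```python
import pandas as pd

# Hypothetical stand-in for the survey data; the real frame has 186 columns.
myDataFrame = pd.DataFrame({
    'this is a long question 01': ['BLue hair', 'Red hair', 'BLue hair'],
    'this is a long question 02': ['yes', 'no', 'no'],
    'this is a long question 03': [3, 5, 4],
})

cols = myDataFrame.columns

# Option 1: fetch the label by position, then select the column by label.
by_label = myDataFrame[myDataFrame[cols[0]] == 'BLue hair']

# Option 2: select the column by position directly with iloc.
by_position = myDataFrame[myDataFrame.iloc[:, 0] == 'BLue hair']

# Plain brackets search the column labels, so an integer raises KeyError
# unless a column is actually named 0.
try:
    myDataFrame[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(by_label)
```

Keep in mind that positions are zero-based, so cols[33] refers to the 34th question.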
#2
1
My preference in this situation is to number your columns and use a dictionary to link each question with a number.
For example:
# list of questions, equivalent to existing column names
questions = ['this is a long question 001', 'this is a long question 002',
             'this is a long question 003', 'this is a long question 004']
# create dictionary mapping id -> question
id_question = dict(enumerate(questions, 1))
# {1: 'this is a long question 001', 2: 'this is a long question 002',
#  3: 'this is a long question 003', 4: 'this is a long question 004'}
# reverse dictionary (question -> id) for easy access later
question_id = {v: k for k, v in id_question.items()}
# redefine column names in dataframe from the id_question keys
# (assumes df's columns are exactly `questions`, in the same order)
df.columns = list(id_question)
Now you can easily convert between the numeric id and your questions via the 2 dictionaries you have created.
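To make that concrete, here is a minimal end-to-end sketch. The dictionary names follow the answer, but the frame and its values are invented, and using rename with the question-to-id mapping (instead of assigning to df.columns) is a substitution of mine so that the result does not depend on column order.

```python
import pandas as pd

# Hypothetical survey frame; question texts and answers are invented.
questions = ['this is a long question 001', 'this is a long question 002',
             'this is a long question 003', 'this is a long question 004']
df = pd.DataFrame([['BLue hair', 'yes', 3, 'a'],
                   ['Red hair', 'no', 5, 'b']], columns=questions)

# id -> question text, numbered from 1 in column order
id_question = dict(enumerate(df.columns, 1))
# question text -> id
question_id = {v: k for k, v in id_question.items()}

# Replace each long question with its numeric id.
df = df.rename(columns=question_id)

# Filter on question 1 using its short numeric label.
blue = df[df[1] == 'BLue hair']

# Recover the full question text when reporting results.
print(id_question[1])
print(blue)
```

After the rename, df[1] works because 1 is now a real column label, which is exactly what the positional attempt in the question was missing.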