I have a data set coming from a customer survey. As it stands, each column corresponds to a question. When importing the data I end up with the column name being the question:
['this is a long question 01', 'this is a long question 02', ..., 'this is a long question 186']
That is right: 186 questions, so 186 columns.
I am new to pandas. My analysis is quite simple; I just need to do a few things like:
myDataFrame.loc[myDataFrame['column1'] == 'BLue hair']
As the name of column1 is really long, managing it becomes cumbersome. I figured I could just reference the column by its position instead. Something like:
myDataFrame.loc[myDataFrame[33] == 'BLue hair']
That doesn't seem to work with either DataFrame.loc or DataFrame.iloc.
I was wondering what the proper way of doing this is. By the way, transposing the dataframe lets me get rid of the column name issue, but it complicates my analysis unnecessarily.
I have not yet grasped many of the concepts of working with pandas and dataframes, so I appreciate any suggestions.
2 Answers
If you write
cols = myDataFrame.columns
then you can use
myDataFrame[myDataFrame[cols[33]] == 'BLue hair']
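For context, myDataFrame[33] fails because it looks for a column whose label is 33, and here the labels are the question strings. A minimal sketch of selecting by position follows; the small dataframe and its values are made up for illustration, and position 0 stands in for position 33:

```python
import pandas as pd

# Hypothetical survey data with long question strings as column names.
df = pd.DataFrame({
    "this is a long question 01": ["BLue hair", "Red hair", "BLue hair"],
    "this is a long question 02": [1, 2, 3],
    "this is a long question 03": ["yes", "no", "yes"],
})

# iloc selects by 0-based position, so this is the first column
# regardless of what it is called.
mask = df.iloc[:, 0] == "BLue hair"
blue = df[mask]  # df.loc[mask] selects the same rows

# Equivalent: look up the label at that position, then index by label.
same = df[df[df.columns[0]] == "BLue hair"]
```

Both forms build a boolean mask from one column and use it to filter rows, so they should return the same rows.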
My preference in this situation is to number your columns and use a dictionary to link each question with a number.
For example:
# list of questions, equivalent to existing column names
questions = ['this is a long question 001', 'this is a long question 002',
'this is a long question 003', 'this is a long question 004']
# create dictionary mapping id -> question
id_question = dict(enumerate(questions, 1))
# {1: 'this is a long question 001', 2: 'this is a long question 002',
#  3: 'this is a long question 003', 4: 'this is a long question 004'}
# reverse dictionary (question -> id) for easy access later
question_id = {v: k for k, v in id_question.items()}
# redefine column names in dataframe from the id_question dict keys
df.columns = list(id_question)
Now you can easily convert between the numeric id and your questions via the 2 dictionaries you have created.
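As a rough end-to-end sketch of this approach, assuming a tiny made-up dataframe (the data values and the names blue and report are illustrative only):

```python
import pandas as pd

questions = ["this is a long question 001", "this is a long question 002",
             "this is a long question 003"]

# Hypothetical survey responses, one column per question.
df = pd.DataFrame([["BLue hair", 5, "yes"],
                   ["Red hair", 3, "no"]], columns=questions)

# id -> question, and the reverse lookup question -> id.
id_question = dict(enumerate(questions, 1))
question_id = {q: i for i, q in id_question.items()}

# Replace the long labels with their numeric ids (1, 2, 3).
df.columns = list(id_question)

# Filtering now only needs the short numeric label.
blue = df[df[1] == "BLue hair"]

# Put the full question text back when presenting results.
report = blue.rename(columns=id_question)
```

Keeping both dictionaries means the short ids can be used during analysis while the original wording stays available for reporting.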