I'm new to data science and am trying to do some data wrangling with Python 2.7 in an IPython notebook. A tutorial I was following for my first project asked me to replace all NaN inputs with 0 or 1. But I'd like to consider another approach: first, for the non-numerical columns, look at the value counts across all rows where Credit_History is NaN...
Ideal Output when Credit_History is NaN:
Self_Employed
Yes 532
No 32
Married
No 398
Yes 213
And for the numerical values, I'd like to get the mean of each column when Credit_History is NaN.
Ideal output for numerical values when Credit_History is NaN:
Mean Applicant Income: 54003.1232
LoanAmount: 35435.12
Loan_Amount_Term: 360
Thanks in advance!
1 Solution
#1
For value counts, you can use pd.Series.value_counts:
df.loc[pd.isnull(df['Credit_History']), 'Self_Employed'].value_counts()
df.loc[pd.isnull(df['Credit_History']), 'Married'].value_counts()
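As a minimal sketch of how this plays out, assuming a small made-up DataFrame (the values below are hypothetical, not your data), you can build the NaN mask once and reuse it for each categorical column:

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (hypothetical values)
df = pd.DataFrame({
    'Credit_History': [1.0, np.nan, np.nan, 0.0, np.nan],
    'Self_Employed': ['No', 'Yes', 'No', 'No', 'Yes'],
    'Married': ['Yes', 'No', 'No', 'Yes', 'Yes'],
})

# Boolean mask: True where Credit_History is missing
missing = df['Credit_History'].isnull()

# Value counts per categorical column, restricted to the masked rows
counts = {col: df.loc[missing, col].value_counts()
          for col in ['Self_Employed', 'Married']}

for col in ['Self_Employed', 'Married']:
    print(col)
    print(counts[col])
```

In a notebook only the last expression of a cell is displayed, so printing each result (or putting each call in its own cell) lets you see both counts.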
To calculate the mean, you can use pd.DataFrame.mean:
cols = ['Applicant_Income', 'LoanAmount', 'Loan_Amount_Term']
df.loc[pd.isnull(df['Credit_History']), cols].mean()
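If you would rather not list the numeric columns by hand, one possible variation is to let pandas pick them with select_dtypes. This is only a sketch on made-up data (the values are hypothetical, and the column names are taken from the snippet above):

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (hypothetical values)
df = pd.DataFrame({
    'Credit_History': [1.0, np.nan, np.nan, 0.0],
    'Self_Employed': ['No', 'Yes', 'No', 'No'],
    'Applicant_Income': [5000, 3000, 4000, 6000],
    'LoanAmount': [120.0, 100.0, np.nan, 150.0],
    'Loan_Amount_Term': [360, 360, 180, 360],
})

# Boolean mask: True where Credit_History is missing
missing = df['Credit_History'].isnull()

# Keep only numeric columns of the masked rows
numeric = df.loc[missing].select_dtypes(include=[np.number])

# Credit_History is all NaN in this subset, so drop it before averaging
means = numeric.drop('Credit_History', axis=1).mean()
print(means)
```

Note that mean() skips NaN values by default, so a missing LoanAmount in one row does not turn the whole column mean into NaN.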