Using value_counts() for a specific value in a column

Date: 2022-02-01 11:09:47

I'm new to data science and am trying to do some data wrangling with Python 2.7 in an IPython notebook. A tutorial I was following for my first project asked me to replace all NaN inputs with 0 or 1. But I'd like to consider another approach, where I first look at the value counts of the non-numerical columns for all rows that have Credit_History as NaN...

Ideal output for non-numerical values when Credit_History is NaN:

Self_Employed
Yes  532
No   32

Married
No   398
Yes  213

And for the numerical values, I'd like to get the mean of each column when Credit_History is NaN.

Ideal output for numerical values when Credit_History is NaN:

Mean Applicant Income: 54003.1232
LoanAmount: 35435.12
Loan_Amount_Term: 360

Thanks in advance!

1 Solution

#1


For value counts, you can use pd.Series.value_counts:

df.loc[pd.isnull(df['Credit_History']), 'Self_Employed'].value_counts()
df.loc[pd.isnull(df['Credit_History']), 'Married'].value_counts()
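
To see the filtering and counting steps end to end, here is a minimal sketch on a made-up frame. Only the column names come from the question; the rows are hypothetical toy data, and the `missing` variable name is just for illustration. Passing `dropna=False` is optional and is shown because the counted column may itself contain NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "Credit_History": [1.0, np.nan, np.nan, 0.0, np.nan],
    "Self_Employed": ["No", "Yes", "No", "No", "Yes"],
    "Married": ["Yes", "No", "No", "Yes", np.nan],
})

# Boolean mask: True for rows where Credit_History is missing.
missing = df["Credit_History"].isnull()

# Count each category among the rows selected by the mask.
self_employed_counts = df.loc[missing, "Self_Employed"].value_counts()

# dropna=False also counts rows where Married itself is NaN.
married_counts = df.loc[missing, "Married"].value_counts(dropna=False)

print(self_employed_counts)
print(married_counts)
```

By default value_counts() skips NaN, so without `dropna=False` the counts for a column may add up to fewer rows than the mask selects.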

For calculating mean, you can use pd.DataFrame.mean:

cols = ['Applicant_Income', 'LoanAmount', 'Loan_Amount_Term']

df.loc[pd.isnull(df['Credit_History']), cols].mean()
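
The same mask can be reused for the numerical columns. The sketch below again uses hypothetical toy rows with the column names from the answer. It also computes the means for the rows where Credit_History is present, which is an addition not in the original answer, on the assumption that a side-by-side comparison is useful when deciding how to fill the NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names follow the answer above.
df = pd.DataFrame({
    "Credit_History": [1.0, np.nan, np.nan, 0.0],
    "Applicant_Income": [5000, 3000, 7000, 4000],
    "LoanAmount": [120.0, 100.0, np.nan, 150.0],
    "Loan_Amount_Term": [360, 360, 180, 360],
})

cols = ["Applicant_Income", "LoanAmount", "Loan_Amount_Term"]
missing = df["Credit_History"].isnull()

# Column means over rows where Credit_History is NaN.
# mean() skips NaN entries within each column by default.
means_missing = df.loc[missing, cols].mean()

# Column means over the remaining rows, for comparison.
means_present = df.loc[~missing, cols].mean()

print(means_missing)
print(means_present)
```

If the two sets of means are close, the missing Credit_History rows probably look like the rest of the data on these columns. A large gap suggests the values are not missing at random.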
