I want to replicate what a WHERE clause does in SQL, using Python. A WHERE clause is often complex and combines multiple conditions. I can do it the following way, but I think there should be a smarter way to achieve this. I have the following data and code.
My requirement: I want to select all columns, but only for rows where the first letter of the address is 'N'. This is the initial data frame.
import pandas as pd
d = {'name': ['john', 'tom', 'bob', 'rock', 'dick'], 'Age': [23, 32, 45, 42, 28], 'YrsOfEducation': [10, 15, 8, 12, 10], 'Address': ['NY', 'NJ', 'PA', 'NY', 'CA']}
df = pd.DataFrame(data=d)
df['col1'] = df['Address'].str[0:1]  # new column holding only the first letter of the Address column
n = df['col1'] == 'N'  # Boolean mask: True where the first letter is 'N'
newdata = df[n]  # filter the data frame with the mask
newdata1 = newdata.drop('col1', axis=1)  # finally, drop the extra column 'col1'
So after 7 lines of code I get this output:
My question is: how can I do this more efficiently, or is there a smarter way to do it?
1 solution
A new column is not necessary:
newdata = df[df['Address'].str[0] == 'N']  # filter the data frame
print(newdata)
  Address  Age  YrsOfEducation  name
0      NY   23              10  john
1      NJ   32              15   tom
3      NY   42              12  rock
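Since the question also mentions WHERE clauses with multiple conditions, here is a minimal sketch of how Boolean masks can be combined with `&` (AND), `|` (OR) and `~` (NOT). The extra conditions on `Age` and `YrsOfEducation` are made up for illustration and are not part of the original requirement. `str.startswith('N')` is used here as an alternative to `str[0] == 'N'`.

```python
import pandas as pd

d = {'name': ['john', 'tom', 'bob', 'rock', 'dick'],
     'Age': [23, 32, 45, 42, 28],
     'YrsOfEducation': [10, 15, 8, 12, 10],
     'Address': ['NY', 'NJ', 'PA', 'NY', 'CA']}
df = pd.DataFrame(data=d)

# Roughly: WHERE Address LIKE 'N%'
starts_with_n = df['Address'].str.startswith('N')

# Roughly: WHERE Address LIKE 'N%' AND Age > 30  (illustrative condition)
both = df[starts_with_n & (df['Age'] > 30)]

# Roughly: WHERE Address LIKE 'N%' OR YrsOfEducation < 10  (illustrative condition)
either = df[starts_with_n | (df['YrsOfEducation'] < 10)]

# Roughly: WHERE NOT (Address LIKE 'N%')
neither = df[~starts_with_n]

print(both)
print(either)
print(neither)
```

Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` or `==` in Python. Use `&` and `|` rather than `and` and `or`, which raise an error when applied to a whole Series.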