I am working with the web scraping framework Scrapy, and I am a bit of a noob when it comes to Python. How do I iterate over all of the scraped items, which seem to be in a dictionary, and strip the whitespace from each one?
Here is the code I have been playing with in my item pipeline:
    for info in item:
        info[info].lstrip()
But this code does not work, because I cannot select items individually. So I tried to do this:
    for key, value in item.items():
        value[1].lstrip()
This second method works to a degree, but the problem is that I have no idea how then to loop over all of the values.
I know this is probably such an easy fix, but I cannot seem to find it. Any help would be greatly appreciated. :)
7 solutions
#1
1
Not a direct answer to the question, but I would suggest you look at Item Loaders and input/output processors. A lot of your cleanup can be taken care of there.
An example which strips each entry would be:
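    from scrapy.loader import ItemLoader
    from itemloaders.processors import MapCompose  # older Scrapy: scrapy.loader.processors
    class StripItemLoader(ItemLoader):
        default_output_processor = MapCompose(str.strip)  # unicode.strip on Python 2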
#2
15
In a dictionary comprehension (available in Python >=2.7):
    clean_d = {k: v.strip() for k, v in d.items()}  # d.iteritems() also works on Python 2
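To make the comprehension concrete, here is a small sketch with made-up data (the dict `d` and its keys are invented for illustration, not taken from the question). It uses `.items()`, which works on both Python 2.7 and Python 3, and it assumes every value is a string; a non-string value would raise an AttributeError.

```python
# Hypothetical scraped data; keys and values are invented for illustration.
d = {"title": "  Blue Widget \n", "price": "\t9.99 "}

# Build a new dict; the original d is left untouched.
clean_d = {k: v.strip() for k, v in d.items()}

print(clean_d)
```

Because a new dict is built, the result has to be used in place of `d`, or assigned back to it, for the change to have any effect.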
#3
2
What you should note is that lstrip() returns a copy of the string rather than modifying the object (strings are immutable). To actually update your dictionary, you'll need to assign the stripped value back to the item.
For example:
    for k, v in your_dict.iteritems():
        your_dict[k] = v.lstrip()
Note the use of .iteritems(), which in Python 2 returns an iterator instead of a list of key-value pairs. This makes it somewhat more efficient.
I should add that in Python 3, .items() has been changed to return "views", so .iteritems() is not required (and no longer exists).
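Applied to the item pipeline from the question, the assign-back idea could look roughly like the sketch below. The class name `StripWhitespacePipeline` is made up, and the handling of list values is an assumption: Scrapy selectors often return lists of strings, which might explain why `value[1]` partly worked for the asker. A plain dict stands in for the item so the snippet runs without Scrapy installed.

```python
class StripWhitespacePipeline(object):
    """Hypothetical pipeline: strips whitespace from string fields."""

    def process_item(self, item, spider):
        # list(...) takes a snapshot of the keys before values are reassigned.
        for key in list(item.keys()):
            value = item[key]
            if isinstance(value, str):
                item[key] = value.strip()
            elif isinstance(value, list):
                # Assumed case: a field holding a list of extracted strings.
                item[key] = [v.strip() if isinstance(v, str) else v
                             for v in value]
        return item


# A plain dict stands in for a Scrapy item in this sketch.
item = {"name": ["  Alice ", " Bob"], "city": " Paris\n", "age": 30}
cleaned = StripWhitespacePipeline().process_item(item, spider=None)
print(cleaned)
```

The item is modified in place and then returned, which is what the next pipeline stage expects to receive. On Python 2, `str` would need to become `basestring` to cover unicode values.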
#4
2
Try
    for k, v in item.items():
        item[k] = v.replace(' ', '')  # note: removes every space, not only leading/trailing ones
or as a comprehension, as suggested by monkut:
    newDic = {k: v.replace(' ', '') for k, v in item.items()}
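One thing worth checking before using replace: it removes every space, including those between words, and leaves tabs and newlines alone, whereas strip() only trims whitespace at the ends. A quick comparison, using an invented sample string:

```python
s = "  hello world \n"

print(repr(s.replace(" ", "")))  # removes all spaces, keeps the newline
print(repr(s.strip()))           # trims both ends, keeps the inner space
print(repr(s.lstrip()))          # trims the left side only
```

If the goal is only to clean up the edges of scraped text, strip() is probably the closer match.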
#5
0
Although @zquare had the best answer for this question, I feel I need to chime in with a Pythonic method that also accounts for dictionary values that are not strings. Mind you, this is not recursive: it only works with one-dimensional dictionaries.
d.update({k: v.lstrip() for k, v in d.items() if isinstance(v, str) and v.startswith(' ')})
This updates the original dictionary value if the value is a string and starts with a space.
UPDATE: If you want to use regular expressions and avoid startswith and endswith, you can use this:
import re
rex = re.compile(r'^\s|\s$')
d.update({k: v.strip() for k, v in d.items() if isinstance(v, str) and rex.search(v)})
This version strips the value if it has a leading or trailing whitespace character.
#6
0
I use the following. You can pass any object as an argument, including a string, list or dictionary.
    # strip any type of object
    def strip_all(x):
        if isinstance(x, str):  # if using python2 replace str with basestring to include unicode type
            x = x.strip()
        elif isinstance(x, list):
            x = [strip_all(v) for v in x]
        elif isinstance(x, dict):
            # iterate over a snapshot, since the dict is modified inside the loop
            for k, v in list(x.items()):
                x.pop(k)  # also strip keys
                x[strip_all(k)] = strip_all(v)
        return x
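If you would rather not modify the input, a non-mutating variant is also possible. This is only a sketch: the name `strip_all_copy` and the sample data are invented here, and it returns new lists and dicts instead of changing the ones passed in.

```python
def strip_all_copy(x):
    """Return a stripped copy of x; name and behaviour are illustrative."""
    if isinstance(x, str):
        return x.strip()
    if isinstance(x, list):
        return [strip_all_copy(v) for v in x]
    if isinstance(x, dict):
        # Keys are stripped too; non-string keys pass through unchanged.
        return {strip_all_copy(k): strip_all_copy(v) for k, v in x.items()}
    return x  # numbers, None, etc. are returned as-is


data = {" name ": " Ann ", "tags": [" a", "b "], "meta": {"n": 1}}
result = strip_all_copy(data)
print(result)
```

Building a new dict avoids changing a dictionary while iterating over it, at the cost of copying the whole structure.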
#7
0
Assuming you would like to strip the values of yourDict, creating a new dict called newDict:
newDict = dict(zip(yourDict.keys(), [v.strip() if isinstance(v,str) else v for v in yourDict.values()]))
This code can handle multi-type values, so it will avoid stripping int, float, etc.