I'm relatively new to the Python world, but this seems very straightforward.
Google is yelling at me that this code needs to be optimized:
class AddLinks(webapp.RequestHandler):
    def post(self):
        # Hash the textarea input to generate pseudo-unique value
        hash = md5.new(self.request.get('links')).hexdigest()
        # Separate the input by line
        allLinks = self.request.get('links').splitlines()
        # For each line in the input, add to the database
        for x in allLinks:
            newGroup = LinkGrouping()
            newGroup.reference = hash
            newGroup.link = x
            newGroup.put()
        # testing vs live
        #baseURL = 'http://localhost:8080'
        baseURL = 'http://linkabyss.appspot.com'
        # Build template parameters
        template_values = {
            'all_links': allLinks,
            'base_url': baseURL,
            'reference': hash,
        }
        # Output the template
        path = os.path.join(os.path.dirname(__file__), 'addLinks.html')
        self.response.out.write(template.render(path, template_values))
The dashboard is telling me that this is using a ton of CPU.
Where should I look for improvements?
6 Answers
#1
7
The main overhead here is the multiple individual puts to the datastore. If you can, store the links as a single entity, as Andre suggests. You can always split the links into a list and store them in a ListProperty.
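For the single-entity route, a minimal sketch (the LinkGroup model and its properties here are assumptions for illustration, not from the question):

from google.appengine.ext import db

class LinkGroup(db.Model):
    # All submitted links for one form post, in a single entity
    reference = db.StringProperty()
    links = db.StringListProperty()

# One entity and one put() for the whole submission:
group = LinkGroup(reference=hash, links=allLinks)
group.put()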
If you do need an entity for each link, try this:
# For each line in the input, add to the database
groups = []
for x in allLinks:
    newGroup = LinkGrouping()
    newGroup.reference = hash
    newGroup.link = x
    groups.append(newGroup)
db.put(groups)
It will reduce the datastore round trips to one, and it's the round trips that are really eating into your CPU quota.
#2
3
Looks pretty tight to me.
I see one thing that may make a small improvement: you're calling self.request.get('links') twice.
So adding:
    unsplitlinks = self.request.get('links')
and referencing unsplitlinks in both places could help.
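Put together, the top of post() might then look like this (a sketch based on the question's code):

unsplitlinks = self.request.get('links')
# Hash and split the same string instead of fetching it twice
hash = md5.new(unsplitlinks).hexdigest()
allLinks = unsplitlinks.splitlines()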
Other than that, the loop is the only area I see that would be a target for optimization. Is it possible to prep the data and then add it all to the db at once, instead of doing a db add per link? (I assume the .put() call adds the link to the database.)
#3
2
You can dramatically reduce the interaction between your app and the database by just storing the complete self.request.get('links') in a text field in the database.
- only one put() per post(self)
- the hash isn't stored n times (once per link, which makes no sense and is really a waste of space)
And you put off parsing the text field until someone actually calls the page....
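A minimal sketch of that approach (the LinkBatch model name and its properties are assumptions for illustration):

from google.appengine.ext import db

class LinkBatch(db.Model):
    reference = db.StringProperty()
    # The entire textarea contents, stored unparsed
    raw_links = db.TextProperty()

# In post(self): one entity and one put() per submission
batch = LinkBatch(reference=hash, raw_links=db.Text(self.request.get('links')))
batch.put()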
#4
0
How frequently is this getting called? This doesn't look that bad... especially after removing the duplicate request.
#5
0
Can I query against the ListProperty?
Something like
SELECT * FROM LinkGrouping WHERE links.contains('http://www.google.com')
I have future plans where I would need that functionality.
I'll definitely implement the single db.put() to reduce usage.
#6
0
No, you cannot use something like links.contains('http://www.google.com'); GQL does not support this.
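That said, an equality filter on a list property matches any entity whose list contains the value, so if links were a ListProperty on LinkGrouping, a query along these lines should cover the use case from #5:

SELECT * FROM LinkGrouping WHERE links = 'http://www.google.com'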