I am trying to calculate the origin and offset of variable-size arrays and store them in a dictionary. Below is the likely non-pythonic way I am doing this. I am not sure whether I should be using map, a lambda function, or list comprehensions to make the code more pythonic.
Essentially, I need to cut an array into chunks based on its total size and store the xstart, ystart, x_number_of_rows_to_read, y_number_of_columns_to_read in a dictionary. The total size is variable. I cannot load the entire array into memory and use numpy indexing, or I definitely would. The origin and offset are used to read the array into numpy.
    intervalx = xsize / xsegment  # Get the size of the chunks
    intervaly = ysize / ysegment  # Get the size of the chunks
    # Setup to segment the image storing the start values and key into a dictionary.
    xstart = 0
    ystart = 0
    key = 0
    d = defaultdict(list)
    for y in xrange(0, ysize, intervaly):
        if y + (intervaly * 2) < ysize:
            numberofrows = intervaly
        else:
            numberofrows = ysize - y
        for x in xrange(0, xsize, intervalx):
            if x + (intervalx * 2) < xsize:
                numberofcolumns = intervalx
            else:
                numberofcolumns = xsize - x
            l = [x, y, numberofcolumns, numberofrows]
            d[key].append(l)
            key += 1
    return d
I realize that xrange is not ideal for a port to Python 3.
4 answers
#1
7
This code looks fine except for your use of defaultdict. A list seems like a much better data structure because:
- Your keys are sequential.
- You are storing a list whose only element is another list in your dict.
One thing you could do:
- Use the ternary operator (I'm not sure if this would be an improvement, but it would be fewer lines of code).
Here's a modified version of your code with my few suggestions.
    intervalx = xsize / xsegment  # Get the size of the chunks
    intervaly = ysize / ysegment  # Get the size of the chunks
    # Setup to segment the image storing the start values and key into a dictionary.
    xstart = 0
    ystart = 0
    output = []
    for y in xrange(0, ysize, intervaly):
        numberofrows = intervaly if y + (intervaly * 2) < ysize else ysize - y
        for x in xrange(0, xsize, intervalx):
            numberofcolumns = intervalx if x + (intervalx * 2) < xsize else xsize - x
            lst = [x, y, numberofcolumns, numberofrows]
            output.append(lst)
            # If it doesn't make any difference to your program, the above 2 lines could read:
            # tple = (x, y, numberofcolumns, numberofrows)
            # output.append(tple)
            # This will be slightly more efficient
            # (tuple creation is faster than list creation)
            # and less memory hungry. In other words, if it doesn't need to be a list due
            # to other constraints (e.g. you append to it later), you should make it a tuple.
Now to get your data, you can do offset_list = output[5] instead of offset_list = d[5][0].
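To make the difference in access concrete, here is a small Python 3 sketch comparing the two layouts. The four windows are made-up values for illustration, not the output of the loop above, and the names windows and window are my own.

```python
from collections import defaultdict

# Hypothetical (x, y, numberofcolumns, numberofrows) windows, for illustration only.
windows = [(0, 0, 5, 5), (5, 0, 5, 5), (0, 5, 5, 5), (5, 5, 5, 5)]

# Original layout: sequential integer keys, each mapping to a one-element list.
d = defaultdict(list)
for key, window in enumerate(windows):
    d[key].append(list(window))

# Suggested layout: a plain list of tuples, indexed by position.
output = list(windows)

print(d[2][0])    # the extra [0] unwraps the one-element list
print(output[2])  # direct positional access
```

Both lookups return the same four numbers; the list version simply drops the wrapper that the defaultdict forces on every entry.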
#2
0
Although it doesn't change your algorithm, a more pythonic way to write your if/else statements is:
    numberofrows = intervaly if y + intervaly * 2 < ysize else ysize - y
instead of this:
    if y + (intervaly * 2) < ysize:
        numberofrows = intervaly
    else:
        numberofrows = ysize - y
(and similarly for the other if/else statement).
#3
0
Have you considered using np.memmap to load the pieces dynamically instead? You would then just need to determine the offsets you need on the fly, rather than chunking the array and storing the offsets.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html
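A minimal sketch of what that could look like, assuming the data sits on disk as a raw, uncompressed array of known dtype and shape (np.memmap does not decode image formats). The file name, the 6x8 size, and the uint8 dtype are all placeholders of mine, not details from the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical raw file: a 6x8 uint8 array written to disk in C order.
ysize, xsize = 6, 8
path = os.path.join(tempfile.mkdtemp(), "image.raw")
np.arange(ysize * xsize, dtype=np.uint8).reshape(ysize, xsize).tofile(path)

# Map the file instead of reading it; data is paged in only when sliced.
mm = np.memmap(path, dtype=np.uint8, mode="r", shape=(ysize, xsize))

# Read one window: 2 rows starting at row 2, 4 columns starting at column 4.
x, y, numberofcolumns, numberofrows = 4, 2, 4, 2
chunk = np.array(mm[y:y + numberofrows, x:x + numberofcolumns])

print(chunk.shape)
```

With this approach the slice bounds can be computed at the moment each window is needed, so nothing has to be stored up front.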
#4
0
This is a long one-liner:
    d = [(x, y, min(x + xinterval, xsize) - x, min(y + yinterval, ysize) - y)
         for x in xrange(0, xsize, xinterval)
         for y in xrange(0, ysize, yinterval)]
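For reference, here is the same comprehension as a Python 3 sketch (range in place of xrange), wrapped in a function whose name, chunk_windows, is my own. Note two differences from the loops in the question: edge chunks are clipped with min() rather than merged into the previous chunk, and x is the outer loop, so the windows come out in a different order. The sizes in the example call are arbitrary.

```python
def chunk_windows(xsize, ysize, xinterval, yinterval):
    """Return (x, y, columns, rows) windows that tile an xsize-by-ysize array."""
    return [
        (x, y, min(x + xinterval, xsize) - x, min(y + yinterval, ysize) - y)
        for x in range(0, xsize, xinterval)
        for y in range(0, ysize, yinterval)
    ]


# Example with sizes that do not divide evenly: 10 columns by 7 rows.
windows = chunk_windows(xsize=10, ysize=7, xinterval=4, yinterval=3)

for window in windows:
    print(window)

# The windows should cover every cell exactly once.
covered = sum(columns * rows for _, _, columns, rows in windows)
print(covered == 10 * 7)
```

Because each window is clipped to the array bounds, the areas add up to the full array with no overlap, which is an easy sanity check to run on real sizes.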