在ruby中按批次排序数组

时间:2021-11-03 16:00:41

Sorry if this has been asked before, I am not sure how to even search for it and what I have searched for doesn't yield any useful answer.

对不起,如果之前已经询问过,我不确定如何搜索它,我搜索的内容不会产生任何有用的答案。

Here is my issue, I have a framework that basically manages the jobs that will be submitted to a PBS cluster and each job will need to read from an input file. We are in a case in which we have more than 5k jobs that need to run and there are batches of, lets say, ~30 that read from different files but the rest of them read from a file which is being read by another job.

这是我的问题,我有一个框架,基本上管理将提交到PBS集群的作业,每个作业都需要从输入文件中读取。我们处于这样一种情况,即我们有超过5k个需要运行的作业,并且有批次,比如说,从不同的文件中读取~30,但其余的从同一个工作正在读取的文件中读取。

This could be easily dealt with (although not the best solution buy maybe the quickest one for the timescale we have) by being able to sort the job list by the ID which basically means which file it is going to be reading from, i.e. I would like to sort an array like this

这可以很容易地处理(虽然不是最好的解决方案,也许是我们所拥有的时间刻度中最快的一个),因为它能够通过ID对作业列表进行排序,这基本上意味着它将从哪个文件读取,即我会喜欢这样排序一个数组

a = [1,1,1,2,2,2,3,3,3,4,4,4]

into

a = [1,2,3,4,1,2,3,4,1,2,3,4]

Is there a way of achieving such an ordering in ruby? I could think of an algorithm buy maybe it has already been done and someone knows the answer.

有没有办法在红宝石中实现这样的排序?我可以想到一个算法购买可能它已经完成,有人知道答案。

Thanks!

3 个解决方案

#1


7  

Solution

Thanks to @sagarpandya82 for the original idea and @Cary Swoveland for bug finding!

感谢@ sagarpandya82最初的想法和@Cary Swoveland寻找错误!

Either use 2 methods :

要么使用2种方法:

def safe_transpose_and_flatten(array)
  l = array.map(&:length).max
  array.map{|e| e.values_at(0...l)}.transpose.flatten.compact
end

def sort_by_batches(array)
  safe_transpose_and_flatten(array.sort.group_by{|i| i}.values)
end

Or this one-liner (split over multiple lines for relative readability) :

或者这个单行(为了相对可读性而分成多行):

def sort_by_batches(array)
  array.group_by{|i| i }.values                  # Chunks of equal values,
       .sort_by{|v| -v.size }                    # sorted by decreasing length,
       .reduce(&:zip)                            # transposed,
       .map{|r| r.flatten.compact.sort }.flatten # flattened and sorted
end

Example

a = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
sort_by_batches(a) # => [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

a = [1, 1, 3, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 1, 1]
sort_by_batches(a) # => [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3]

a = [1,2,2,3,3,3]
sort_by_batches(a) # => [1, 2, 3, 2, 3, 3]

Steps

Here are the steps for the second array :

以下是第二个数组的步骤:

[1, 1, 3, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 1, 1] # input
{1=>[1, 1, 1, 1], 3=>[3, 3, 3, 3], 2=>[2, 2, 2], 4=>[4, 4, 4], 5=>[5]} # group_by
[[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2], [4, 4, 4], [5]] # values
[[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2], [4, 4, 4], [5]] # sort_by -length
[[[[[1, 3], 2], 4], 5], [[[[1, 3], 2], 4], nil], [[[[1, 3], 2], 4], nil], [[[[1, 3], nil], nil], nil]] # zip
[[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 4], [1, 3]] # map(&:flatten) and compact
[1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3] # flatten

.reduce(&:zip).map(&:flatten).compact was used at first as a supposedly safe transpose, but it didn't work when the first array was smaller than the others.

.reduce(&:zip).map(&:flatten).compact最初被用作一个所谓的安全转置,但是当第一个数组小于其他数组时,它不起作用。

The first method use this answer for transposing, the one-liner sorts the arrays by decreasing length before using zip.

第一种方法使用这个答案进行转置,单线程在使用zip之前通过减少长度对数组进行排序。

Application to Job class

Here's a very basic Job class as an example :

这是一个非常基本的Job类作为示例:

class Job
  attr_reader :id
  def initialize(id)
    @id = id
  end

  def self.sort_by_batches(jobs)
    safe_transpose_and_flatten(jobs.sort_by{|j| j.id}.group_by{|j| j.id}.values)
  end

  def to_s
    "Job %d" % id
  end
end

jobs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4].map{|i| Job.new(i)}
Job.sort_by_batches(jobs)

It outputs :

它输出:

Job 1
Job 2
Job 3
Job 4
Job 1
Job 2
Job 3
Job 4
Job 1
Job 2
Job 3
Job 4

#2


4  

You could do this with a collation function:

您可以使用整理功能执行此操作:

def collate(input)
  # Split the input array up into chunks of identical values
  # and sort the resulting groups.
  sets = input.group_by { |v| v }.values.sort_by(&:first)

  # Recombine these into a single output array by iterating over
  # each set and transposing values. Any nil values are scrubbed
  # with compact.
  (0...sets.map(&:length).max).flat_map do |i|
    sets.map do |s|
      s[i]
    end
  end.compact
end

You can see this work on some less trivial data:

您可以在一些不太重要的数据上看到这项工作:

input = [1,1,3,2,2,2,3,3,3,4,4,4,5,1,1]

collate(input)
# => [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3]

Here 5 only shows up once.

这里5只出现一次。

#3


4  

Code

def doit(a)    
  b = a.sort.slice_when { |x,y| x != y }
  b.max_by(&:size).size.times.flat_map { |i| b.each_with_object([]) { |c,arr|
    arr << c[i] unless c[i].nil? } } 
end

Example

doit [5, 1, 7, 2, 3, 3, 5, 2, 3, 1, 4]
  #=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3]

Explanation

For the example the steps are as follows.

对于该示例,步骤如下。

a = [5, 1, 7, 2, 3, 3, 5, 2, 3, 1, 4]

c = a.sort
  #=> [1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 7] 
b = c.slice_when { |x,y| x != y }
  #=> #<Enumerator: #<Enumerator::Generator:0x007fb8a99d94c8>:each> 

We can see the elements that are generated by the enumerator b (and passed to the block) by converting it to an array:

我们可以通过将枚举器b转换为数组来查看枚举器生成的元素(并传递给块):

b.to_a
  #=> [[1, 1], [2, 2], [3, 3, 3], [4], [5, 5], [7]] 

Continuing,

c = b.max_by(&:size)
  #=> [3, 3, 3] 
d = c.size
  #=> 3 
e = d.times
  #=> #<Enumerator: 3:times>
e.to_a
  #=> [0, 1, 2] 
e.flat_map { |i| b.each_with_object([]) { |c,arr| arr << c[i] unless c[i].nil? } }
  #=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3] 

Here is the last operation with some included puts statements.

这是包含一些put语句的最后一个操作。

3.times.flat_map do |i|
  puts "i=#{i}"
  b.each_with_object([]) do |c,arr|
    puts "  c=#{c}, c[#{i}]=#{c[i]}, arr=#{arr}"
    arr << c[i] unless c[i].nil?
    puts "  arr after arr << c[#{i}]=#{arr}" unless c[i].nil? 
  end
end

# i=0
#   c=[1, 1], c[0]=1, arr=[]
#     arr after arr << c[0]=[1]
#   c=[2, 2], c[0]=2, arr=[1]
#     arr after arr << c[0]=[1, 2]
#   c=[3, 3, 3], c[0]=3, arr=[1, 2]
#     arr after arr << c[0]=[1, 2, 3]
#   c=[4], c[0]=4, arr=[1, 2, 3]
#     arr after arr << c[0]=[1, 2, 3, 4]
#   c=[5, 5], c[0]=5, arr=[1, 2, 3, 4]
#     arr after arr << c[0]=[1, 2, 3, 4, 5]
#   c=[7], c[0]=7, arr=[1, 2, 3, 4, 5]
#     arr after arr << c[0]=[1, 2, 3, 4, 5, 7]
# i=1
#   c=[1, 1], c[1]=1, arr=[]
#     arr after arr << c[1]=[1]
#   c=[2, 2], c[1]=2, arr=[1]
#     arr after arr << c[1]=[1, 2]
#   c=[3, 3, 3], c[1]=3, arr=[1, 2]
#     arr after arr << c[1]=[1, 2, 3]
#   c=[4], c[1]=, arr=[1, 2, 3]
#   c=[5, 5], c[1]=5, arr=[1, 2, 3]
#     arr after arr << c[1]=[1, 2, 3, 5]
#   c=[7], c[1]=, arr=[1, 2, 3, 5]
# i=2
#   c=[1, 1], c[2]=, arr=[]
#   c=[2, 2], c[2]=, arr=[]
#   c=[3, 3, 3], c[2]=3, arr=[]
#     arr after arr << c[2]=[3]
#   c=[4], c[2]=, arr=[3]
#   c=[5, 5], c[2]=, arr=[3]
#   c=[7], c[2]=, arr=[3]
#=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3] 

#1


7  

Solution

Thanks to @sagarpandya82 for the original idea and @Cary Swoveland for bug finding!

感谢@ sagarpandya82最初的想法和@Cary Swoveland寻找错误!

Either use 2 methods :

要么使用2种方法:

def safe_transpose_and_flatten(array)
  l = array.map(&:length).max
  array.map{|e| e.values_at(0...l)}.transpose.flatten.compact
end

def sort_by_batches(array)
  safe_transpose_and_flatten(array.sort.group_by{|i| i}.values)
end

Or this one-liner (split over multiple lines for relative readability) :

或者这个单行(为了相对可读性而分成多行):

def sort_by_batches(array)
  array.group_by{|i| i }.values                  # Chunks of equal values,
       .sort_by{|v| -v.size }                    # sorted by decreasing length,
       .reduce(&:zip)                            # transposed,
       .map{|r| r.flatten.compact.sort }.flatten # flattened and sorted
end

Example

a = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
sort_by_batches(a) # => [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

a = [1, 1, 3, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 1, 1]
sort_by_batches(a) # => [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3]

a = [1,2,2,3,3,3]
sort_by_batches(a) # => [1, 2, 3, 2, 3, 3]

Steps

Here are the steps for the second array :

以下是第二个数组的步骤:

[1, 1, 3, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 1, 1] # input
{1=>[1, 1, 1, 1], 3=>[3, 3, 3, 3], 2=>[2, 2, 2], 4=>[4, 4, 4], 5=>[5]} # group_by
[[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2], [4, 4, 4], [5]] # values
[[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2], [4, 4, 4], [5]] # sort_by -length
[[[[[1, 3], 2], 4], 5], [[[[1, 3], 2], 4], nil], [[[[1, 3], 2], 4], nil], [[[[1, 3], nil], nil], nil]] # zip
[[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 4], [1, 3]] # map(&:flatten) and compact
[1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3] # flatten

.reduce(&:zip).map(&:flatten).compact was used at first as a supposedly safe transpose, but it didn't work when the first array was smaller than the others.

.reduce(&:zip).map(&:flatten).compact最初被用作一个所谓的安全转置,但是当第一个数组小于其他数组时,它不起作用。

The first method use this answer for transposing, the one-liner sorts the arrays by decreasing length before using zip.

第一种方法使用这个答案进行转置,单线程在使用zip之前通过减少长度对数组进行排序。

Application to Job class

Here's a very basic Job class as an example :

这是一个非常基本的Job类作为示例:

class Job
  attr_reader :id
  def initialize(id)
    @id = id
  end

  def self.sort_by_batches(jobs)
    safe_transpose_and_flatten(jobs.sort_by{|j| j.id}.group_by{|j| j.id}.values)
  end

  def to_s
    "Job %d" % id
  end
end

jobs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4].map{|i| Job.new(i)}
Job.sort_by_batches(jobs)

It outputs :

它输出:

Job 1
Job 2
Job 3
Job 4
Job 1
Job 2
Job 3
Job 4
Job 1
Job 2
Job 3
Job 4

#2


4  

You could do this with a collation function:

您可以使用整理功能执行此操作:

def collate(input)
  # Split the input array up into chunks of identical values
  # and sort the resulting groups.
  sets = input.group_by { |v| v }.values.sort_by(&:first)

  # Recombine these into a single output array by iterating over
  # each set and transposing values. Any nil values are scrubbed
  # with compact.
  (0...sets.map(&:length).max).flat_map do |i|
    sets.map do |s|
      s[i]
    end
  end.compact
end

You can see this work on some less trivial data:

您可以在一些不太重要的数据上看到这项工作:

input = [1,1,3,2,2,2,3,3,3,4,4,4,5,1,1]

collate(input)
# => [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3]

Here 5 only shows up once.

这里5只出现一次。

#3


4  

Code

def doit(a)    
  b = a.sort.slice_when { |x,y| x != y }
  b.max_by(&:size).size.times.flat_map { |i| b.each_with_object([]) { |c,arr|
    arr << c[i] unless c[i].nil? } } 
end

Example

doit [5, 1, 7, 2, 3, 3, 5, 2, 3, 1, 4]
  #=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3]

Explanation

For the example the steps are as follows.

对于该示例,步骤如下。

a = [5, 1, 7, 2, 3, 3, 5, 2, 3, 1, 4]

c = a.sort
  #=> [1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 7] 
b = c.slice_when { |x,y| x != y }
  #=> #<Enumerator: #<Enumerator::Generator:0x007fb8a99d94c8>:each> 

We can see the elements that are generated by the enumerator b (and passed to the block) by converting it to an array:

我们可以通过将枚举器b转换为数组来查看枚举器生成的元素(并传递给块):

b.to_a
  #=> [[1, 1], [2, 2], [3, 3, 3], [4], [5, 5], [7]] 

Continuing,

c = b.max_by(&:size)
  #=> [3, 3, 3] 
d = c.size
  #=> 3 
e = d.times
  #=> #<Enumerator: 3:times>
e.to_a
  #=> [0, 1, 2] 
e.flat_map { |i| b.each_with_object([]) { |c,arr| arr << c[i] unless c[i].nil? } }
  #=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3] 

Here is the last operation with some included puts statements.

这是包含一些put语句的最后一个操作。

3.times.flat_map do |i|
  puts "i=#{i}"
  b.each_with_object([]) do |c,arr|
    puts "  c=#{c}, c[#{i}]=#{c[i]}, arr=#{arr}"
    arr << c[i] unless c[i].nil?
    puts "  arr after arr << c[#{i}]=#{arr}" unless c[i].nil? 
  end
end

# i=0
#   c=[1, 1], c[0]=1, arr=[]
#     arr after arr << c[0]=[1]
#   c=[2, 2], c[0]=2, arr=[1]
#     arr after arr << c[0]=[1, 2]
#   c=[3, 3, 3], c[0]=3, arr=[1, 2]
#     arr after arr << c[0]=[1, 2, 3]
#   c=[4], c[0]=4, arr=[1, 2, 3]
#     arr after arr << c[0]=[1, 2, 3, 4]
#   c=[5, 5], c[0]=5, arr=[1, 2, 3, 4]
#     arr after arr << c[0]=[1, 2, 3, 4, 5]
#   c=[7], c[0]=7, arr=[1, 2, 3, 4, 5]
#     arr after arr << c[0]=[1, 2, 3, 4, 5, 7]
# i=1
#   c=[1, 1], c[1]=1, arr=[]
#     arr after arr << c[1]=[1]
#   c=[2, 2], c[1]=2, arr=[1]
#     arr after arr << c[1]=[1, 2]
#   c=[3, 3, 3], c[1]=3, arr=[1, 2]
#     arr after arr << c[1]=[1, 2, 3]
#   c=[4], c[1]=, arr=[1, 2, 3]
#   c=[5, 5], c[1]=5, arr=[1, 2, 3]
#     arr after arr << c[1]=[1, 2, 3, 5]
#   c=[7], c[1]=, arr=[1, 2, 3, 5]
# i=2
#   c=[1, 1], c[2]=, arr=[]
#   c=[2, 2], c[2]=, arr=[]
#   c=[3, 3, 3], c[2]=3, arr=[]
#     arr after arr << c[2]=[3]
#   c=[4], c[2]=, arr=[3]
#   c=[5, 5], c[2]=, arr=[3]
#   c=[7], c[2]=, arr=[3]
#=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3]