How do I turn rows into repeated column-based data?

Date: 2022-02-03 17:02:36

I'm trying to take a dataset that looks like this:

[Image: sample of the input dataset]

And transform the records into this format:

[Image: the desired output format]

The resulting format would have two columns, one for the old column names and one column for the values. If there are 10,000 rows then there should be 10,000 groups of data in the new format.
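The reshaping described above can be sketched in Ruby (one of the options the question allows). This is a minimal in-memory sketch that assumes the data is already loaded as an array of header names plus an array of rows; `unpivot` is a hypothetical helper name, and the sample rows are taken from the example in answer #1. Each input row turns into one group of `headers.size` pairs, so 10,000 rows give 10,000 groups.

```ruby
# Minimal sketch: reshape rows into [old_column_name, value] pairs.
# `unpivot` is a hypothetical helper; `headers` is the list of old column
# names and `rows` is an array of arrays, one per original record.
def unpivot(headers, rows)
  rows.flat_map do |row|
    # zip pairs each column name with the value in the same position,
    # so every record becomes one group of headers.size pairs
    headers.zip(row)
  end
end

headers = %w[name search_term address code]
rows = [
  %w[Jim jim jim_address 123],
  %w[Bob bob bob_address 124]
]

# Print each pair as a tab-separated line: column name, then value
unpivot(headers, rows).each { |name, value| puts "#{name}\t#{value}" }
```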

I'm open to any approach: Excel formulas, SQL (MySQL), or plain Ruby code would all work for me. What is the best way to tackle this problem?

3 answers

#1


1  

Just for fun:

# Input file format is tab separated values

# name  search_term address code
# Jim jim jim_address 123
# Bob bob bob_address 124
# Lisa  lisa  lisa_address  126
# Mona  mona  mona_address  129


infile = File.open("inputfile.tsv")

headers = infile.readline.strip.split("\t")
puts headers.inspect
of = File.new("outputfile.tsv","w")
infile.each_line do |line|
  row = line.chomp.split("\t")  # drop the trailing newline before splitting
  headers.each_with_index do |key, index|
    of.puts "#{key}\t#{row[index]}"
  end
end

infile.close
of.close



# A nicer way, on my machine it does 1.6M rows in about 17 sec

File.open("inputfile.tsv") do | in_file |
  headers = in_file.readline.strip.split("\t")
  File.open("outputfile.tsv","w") do | out_file |
    in_file.each_line do | line |
      row = line.chomp.split("\t")  # drop the newline so each pair gets exactly one
      headers.each_with_index do | key, index |
        out_file << key << "\t" << row[index] << "\n"
      end
    end 
  end
end
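A variation on the same idea, swapping in Ruby's standard `csv` library with a tab separator instead of splitting lines by hand. This is a sketch, not the answerer's code: `unpivot_tsv` is a hypothetical helper that takes the TSV text as a string and returns the output text, so it can be tried without files.

```ruby
require "csv"

# Sketch: unpivot TSV text using the standard CSV parser.
# `unpivot_tsv` is a hypothetical helper name.
def unpivot_tsv(tsv_text)
  lines = []
  # headers: true turns each data line into a CSV::Row keyed by the first line
  CSV.parse(tsv_text, col_sep: "\t", headers: true).each do |row|
    # CSV::Row#each yields [header, field] pairs in column order
    row.each { |header, field| lines << "#{header}\t#{field}\n" }
  end
  lines.join
end

input = "name\tsearch_term\taddress\tcode\n" \
        "Jim\tjim\tjim_address\t123\n" \
        "Bob\tbob\tbob_address\t124\n"
print unpivot_tsv(input)
```

For large files, `CSV.foreach("inputfile.tsv", col_sep: "\t", headers: true)` yields rows one at a time the same way instead of loading the whole text. One trade-off: the CSV parser treats `"` as a quote character, so a TSV file with stray quotes may raise `CSV::MalformedCSVError`, which the plain `split` version would not.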

#2


8  

You could add an ID column to the left of your data and use a Reverse PivotTable method.

  • Press Alt+D+P to open the PivotTable Wizard, then follow these steps:

    1.  Multiple Consolidation Ranges
    2a. I will create the page fields
    2b. Range: e.g. sheet1!A1:A4
        How Many Page Fields: 0
    3.  Existing Worksheet: H1
    
  • In the PivotTable:

    Uncheck Row and Column from the Field List
    Double-click the Grand Total, as shown:
    

[Image: screenshot of the resulting unpivoted table]
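The ID column is what keeps track of which original record each pair came from once the data is unpivoted. A rough plain-Ruby analogue of that kind of output is sketched below, assuming a generated 1-based ID rather than one read from the sheet; `unpivot_with_id` is a hypothetical helper name, and the sample rows come from the example in answer #1.

```ruby
# Sketch: unpivot while tagging each pair with the ID of its source row.
# `unpivot_with_id` is a hypothetical helper name.
def unpivot_with_id(headers, rows)
  rows.each_with_index.flat_map do |row, i|
    id = i + 1 # assumed 1-based ID, like an ID column numbered from the first data row
    # one [id, old_column_name, value] triple per column
    headers.zip(row).map { |column, value| [id, column, value] }
  end
end

headers = %w[name search_term address code]
rows = [%w[Jim jim jim_address 123], %w[Bob bob bob_address 124]]
unpivot_with_id(headers, rows).each { |triple| puts triple.join("\t") }
```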

#3


0  

source_path = "inputfile.tsv"   # hypothetical path to the tab-separated source file
dest_path   = "outputfile.tsv"  # hypothetical path to the destination file (must differ from the source)

File.open(dest_path, 'a') do |d|       # open the destination file for appending
  File.open(source_path, 'r') do |s|   # open the source file read-only
    headers = s.readline.strip.split("\t")  # grab the first row of the source file to use as headers
    s.each_line do |line|                   # iterate over each remaining line of the source
      current_line = line.chomp.split("\t") # create an array from the current line
      current_line.each_with_index do |cell, index| # iterate over each cell with its column index
        # build each new line as a quoted "header"<TAB>"value" pair and write it
        d.write("\"#{headers[index]}\"\t\"#{cell}\"\n")
      end
    end
  end
end
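If the quoted output format is needed, the standard `csv` library can build each line instead of concatenating strings. This is a sketch under that assumption; `quoted_pair` is a hypothetical helper name. With `force_quotes: true` every field is wrapped in double quotes, and CSV also doubles any `"` inside a value, which the manual concatenation above does not do.

```ruby
require "csv"

# Sketch: build one quoted, tab-separated "header"<TAB>"value" line.
# `quoted_pair` is a hypothetical helper name.
def quoted_pair(header, value)
  # generate_line returns the line with a trailing newline
  CSV.generate_line([header, value], col_sep: "\t", force_quotes: true)
end

print quoted_pair("name", "Jim")
print quoted_pair("address", 'say "hi"') # embedded quotes are doubled
```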
