I'm trying to take a dataset that looks like this:
And transform the records into this format:
The resulting format would have two columns, one for the old column names and one column for the values. If there are 10,000 rows then there should be 10,000 groups of data in the new format.
I'm open to any method: Excel formulas, SQL (MySQL), or plain Ruby code would all work for me. What is the best way to tackle this problem?
3 solutions
#1
1
Just for fun:
# Input file format is tab-separated values
# name search_term address code
# Jim jim jim_address 123
# Bob bob bob_address 124
# Lisa lisa lisa_address 126
# Mona mona mona_address 129
infile = File.open("inputfile.tsv")
headers = infile.readline.strip.split("\t")
puts headers.inspect
of = File.new("outputfile.tsv", "w")
infile.each_line do |line|
  row = line.chomp.split("\t")   # chomp so the last cell has no trailing newline
  headers.each_with_index do |key, index|
    of.puts "#{key}\t#{row[index]}"
  end
end
of.close
infile.close
# A nicer way (the block form closes both files automatically);
# on my machine it does 1.6M rows in about 17 sec
File.open("inputfile.tsv") do |in_file|
  headers = in_file.readline.strip.split("\t")
  File.open("outputfile.tsv", "w") do |out_file|
    in_file.each_line do |line|
      row = line.chomp.split("\t")
      headers.each_with_index do |key, index|
        # end every pair with a newline so each one lands on its own line
        out_file << key << "\t" << row[index] << "\n"
      end
    end
  end
end
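To try the same idea without touching files, here is a minimal sketch that runs the loop over in-memory strings. The method name `unpivot_tsv` and the use of `StringIO` are my own assumptions for illustration, not part of the answer above; the sample rows come from the comment at the top of the answer.

```ruby
require "stringio"

# Hypothetical helper (name is an assumption): reads tab-separated rows from
# `input` and appends one "header<TAB>value" line per cell to `output`.
def unpivot_tsv(input, output)
  headers = input.readline.chomp.split("\t")
  input.each_line do |line|
    # A limit of -1 keeps trailing empty cells, which a plain split would drop.
    row = line.chomp.split("\t", -1)
    headers.each_with_index do |key, index|
      output << key << "\t" << row[index].to_s << "\n"
    end
  end
  output
end

sample = "name\tsearch_term\taddress\tcode\n" \
         "Jim\tjim\tjim_address\t123\n" \
         "Bob\tbob\tbob_address\t124\n"

result = unpivot_tsv(StringIO.new(sample), StringIO.new).string
puts result
```

Because the helper only needs `readline`, `each_line` and `<<`, it should also accept real `File` objects opened as in the answer.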
#2
8
You could add an ID column to the left of your data and use a Reverse PivotTable method.
Press Alt+D+P to open the PivotTable Wizard, then follow these steps:
1. Multiple Consolidation Ranges
2a. I will create the page fields
2b. Range: e.g. sheet1!A1:A4; How Many Page Fields: 0
3. Existing Worksheet: H1
In the PivotTable:
1. Uncheck Row and Column from the Field List
2. Double-click the Grand Total
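For anyone who would rather script it, the reverse-pivot idea (give each row an ID, then emit one record per cell) can be sketched in Ruby roughly as follows. This only illustrates the shape of the result and is not what Excel does internally; the variable names are assumptions, and the sample rows are borrowed from the example data in answer #1.

```ruby
# Sample rows borrowed from the example data in answer #1.
headers = %w[name search_term address code]
rows = [
  %w[Jim jim jim_address 123],
  %w[Bob bob bob_address 124]
]

# Give each row a 1-based ID, then emit one [id, column, value] triple per cell.
long_format = rows.each_with_index.flat_map do |row, i|
  headers.zip(row).map { |column, value| [i + 1, column, value] }
end

# Print the triples as tab-separated lines.
long_format.each { |triple| puts triple.join("\t") }
```

The ID column is what keeps the 10,000 groups distinguishable once every cell sits on its own line.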
#3
0
source_path = "inputfile.tsv"        # placeholder path: the original used `dir` for both files
destination_path = "outputfile.tsv"  # placeholder path: must differ from the source

File.open(destination_path, 'a') do |d|      # open the destination file for appending
  File.open(source_path, 'r') do |s|         # open the source file for reading
    headers = s.readline.strip.split("\t")   # use the first row of the source file as headers
    s.each_line do |line|                    # iterate over each remaining line of the source
      current_line = line.chomp.split("\t")  # split the current line into an array of cells
      current_line.each_with_index do |cell, index|  # visit each cell together with its index
        # build each new line as "header"<TAB>"value" and write it to the destination
        d.write("\"#{headers[index]}\"\t\"#{cell}\"\n")
      end
    end
  end
end
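If quoted output is the goal, Ruby's `csv` library can do the quoting, which also escapes any double quotes inside the values. The following is a sketch under assumptions: the inline `tsv` sample is made up for illustration, and on Ruby 3.4+ `csv` ships as a bundled gem rather than a default one.

```ruby
require "csv"

# Made-up two-column sample, for illustration only.
tsv = "name\tcode\nJim\t123\nBob\t124\n"

# Parse with tab separators, then write one quoted header/value pair per line.
quoted = CSV.generate(col_sep: "\t", force_quotes: true) do |out|
  CSV.parse(tsv, col_sep: "\t", headers: true).each do |row|
    row.each { |header, value| out << [header, value] }
  end
end

puts quoted
```

`force_quotes: true` wraps every field in double quotes, matching the hand-built strings above.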