MapReduce的reduce函数里的key用的是同一个引用

时间:2024-10-06 21:06:14

最近写MapReduce程序,出现了这么一个问题,程序代码如下:


 package demo;

 import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry; import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer; public class ReducerDemo extends Reducer<Text, IntWritable, Text, IntWritable>{ private FileSystem fs = null;
private FSDataOutputStream outs = null;
public Map<Text, Integer> wordNumMap = new HashMap<Text, Integer>(); @Override
protected void setup(Context context)
throws IOException, InterruptedException {
String logFile = context.getConfiguration().get(HdpDemo.LOG_FILE);
fs = FileSystem.get(context.getConfiguration());
if(null != logFile){
int taskId = context.getTaskAttemptID().getTaskID().getId();
logFile += ("_"+taskId);
outs = fs.create(new Path(logFile));
}
} /* public void reduce(Text key, IntWritable value, Context context){ }*/ public void reduce(Text key, Iterable<IntWritable> numberIter, Context context)
throws IOException, InterruptedException {
Text word = key;
Integer currNum = wordNumMap.get(word);
if(null == currNum){
currNum = 0;
}
for(IntWritable num:numberIter){
currNum += num.get();
}
wordNumMap.put(word, currNum); } @Override
protected void cleanup(Context context)
throws IOException, InterruptedException {
for(Entry<Text, Integer> entry : wordNumMap.entrySet()){
IntWritable num = new IntWritable(entry.getValue());
context.write(entry.getKey(), num);
}
outs.close();
} private void log(String content) throws IOException{
if(null != outs){
outs.write(content.getBytes());
}
} }

 

这是个单词统计的reducer类,按理说打印出来的结果应该是如下结果:

world
ccc
of
best
the
is
bbb
james
ddd
hello
aaa

而实际上的打印结果却为:

world:
world:
world:
world:
world:
world:
world:
world:
world:
world:
world:

原因分析如下:

Hadoop的MapReduce框架每次调用reducer的reduce函数,代码中的第39行,每次传入的key都是对同一个地址的引用,导致了插入wordNumMap中的那些key都被修改了。

而如果把第41行的

Text word = key;

改为

Text word = new Text();
word.set(key);

这样结果就正确了,也印证了我的猜测。