当我们有每一位同学的每一科成绩时,我们计算他们的平均成绩,用传统的方法比较麻烦,如果我们用hadoop中MapReduce组件的话就比较简单了。
测试数据如下:
从上面的数据可以看到,计算每一位同学的平均成绩,在map阶段,我们可以用同学的姓名作为key,成绩作为value;在reduce阶段,key值相同的value值相加计算出总成绩,并且计算出科目的数量,然后用总成绩来除以科目数量就可以得出每一位同学的平均成绩了。
代码如下:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Socre {
public static class Map extends Mapper<Object,Text,Text,IntWritable>{
public void map(Object key,Text value,Context context){
//把读入的每一行转换为String
String line=value.toString();
//以“\n”切分数据
StringTokenizer tokenizer=new StringTokenizer(line,"\n");
while(tokenizer.hasMoreElements()){
//以空格切分数据
StringTokenizer tokenizerLine=new StringTokenizer(tokenizer.nextToken());
//获取姓名
String nameString=tokenizerLine.nextToken();
//获取成绩
String scoreString=tokenizerLine.nextToken();
int scoreInt=Integer.parseInt(scoreString);
try {
//key:姓名,value:成绩
context.write(new Text(nameString),new IntWritable(scoreInt));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{
public void reduce(Text key,Iterable<IntWritable> values,Context context){
//计算给同学的总成绩
int sum=0;
//获取科目数目
int count=0;
Iterator<IntWritable> iterable=values.iterator();
while(iterable.hasNext()){
sum+=iterable.next().get();
count++;
}
int avg=(int)sum/count;
try {
context.write(key,new IntWritable(avg));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException{
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(Socre.class);
job.setJobName("avgscore1");
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/avgscoreinput"));
FileOutputFormat.setOutputPath(job, new Path("/avgscoreoutput"));
job.waitForCompletion(true);
}
}
运行结果:
程序还是有没有考虑到的情况,比如成绩时小数的话,应该用double或float来定义变量,但是该程序只是提供一个算法思想。