mapreduce计算平均值

时间:2021-04-25 00:44:51

当我们有每一位同学的每一科成绩时,我们计算他们的平均成绩,用传统的方法比较麻烦,如果我们用hadoop中MapReduce组件的话就比较简单了。
测试数据如下:
mapreduce计算平均值
从上面的数据可以看到,计算每一位同学的平均成绩,在map阶段,我们可以用同学的姓名作为key,成绩作为value;在reduce阶段,key值相同的value值相加计算出总成绩,并且计算出科目的数量,然后用总成绩来除以科目数量就可以得出每一位同学的平均成绩了。
代码如下:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Socre {
    public static class Map extends Mapper<Object,Text,Text,IntWritable>{
        public void map(Object key,Text value,Context context){
            //把读入的每一行转换为String
            String line=value.toString();
            //以“\n”切分数据
            StringTokenizer tokenizer=new StringTokenizer(line,"\n");
            while(tokenizer.hasMoreElements()){
                //以空格切分数据
                StringTokenizer tokenizerLine=new StringTokenizer(tokenizer.nextToken());
                //获取姓名
                String nameString=tokenizerLine.nextToken();
                //获取成绩
                String scoreString=tokenizerLine.nextToken();
                int scoreInt=Integer.parseInt(scoreString);
                try {
                    //key:姓名,value:成绩
                    context.write(new Text(nameString),new IntWritable(scoreInt));
                } catch (IOException e) {
                    e.printStackTrace();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{
        public void reduce(Text key,Iterable<IntWritable> values,Context context){
            //计算给同学的总成绩
            int sum=0;
            //获取科目数目
            int count=0;
            Iterator<IntWritable> iterable=values.iterator();
            while(iterable.hasNext()){
                sum+=iterable.next().get();
                count++;
            }
            int avg=(int)sum/count;
            try {
                context.write(key,new IntWritable(avg));
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException{
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(Socre.class);
        job.setJobName("avgscore1");
        job.setMapperClass(Map.class); 
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/avgscoreinput"));
        FileOutputFormat.setOutputPath(job, new Path("/avgscoreoutput"));
        job.waitForCompletion(true);

    }

}

运行结果:
mapreduce计算平均值
程序还是有没有考虑到的情况,比如成绩时小数的话,应该用double或float来定义变量,但是该程序只是提供一个算法思想。