Some notes on custom output in Hadoop
OutputFormat describes the output-specification for a Map-Reduce job.
To customize output, first extend this abstract class, OutputFormat<K, V>.
Implement its methods:
RecordWriter<KeyBaseDimension, BaseStatsValueWritable> getRecordWriter(TaskAttemptContext context) — the database connection can be opened inside this method.
This method must return a RecordWriter.
Extend the RecordWriter class and implement its write method to persist the records via JDBC; a minimal sketch of both classes follows.
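A minimal sketch, assuming the KeyBaseDimension / BaseStatsValueWritable types mentioned above; the class name JdbcOutputFormat, the configuration keys, and the INSERT statement are made up for illustration:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

public class JdbcOutputFormat extends OutputFormat<KeyBaseDimension, BaseStatsValueWritable> {

    @Override
    public RecordWriter<KeyBaseDimension, BaseStatsValueWritable> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        try {
            // Open the JDBC connection here, once per task (conf keys are hypothetical).
            Connection conn = DriverManager.getConnection(
                    conf.get("output.jdbc.url"),
                    conf.get("output.jdbc.user"),
                    conf.get("output.jdbc.password"));
            return new JdbcRecordWriter(conn);
        } catch (SQLException e) {
            throw new IOException("failed to open jdbc connection", e);
        }
    }

    @Override
    public void checkOutputSpecs(JobContext context) {
        // Nothing to verify for a database sink.
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException {
        // No HDFS output to commit; a FileOutputCommitter over a null path is effectively a no-op.
        return new FileOutputCommitter(null, context);
    }

    public static class JdbcRecordWriter extends RecordWriter<KeyBaseDimension, BaseStatsValueWritable> {
        private final Connection conn;

        public JdbcRecordWriter(Connection conn) {
            this.conn = conn;
        }

        @Override
        public void write(KeyBaseDimension key, BaseStatsValueWritable value)
                throws IOException, InterruptedException {
            // Called once per reduce output record; persist it via JDBC (SQL is illustrative).
            try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO stats (dimension, value) VALUES (?, ?)")) {
                ps.setString(1, key.toString());
                ps.setString(2, value.toString());
                ps.executeUpdate();
            } catch (SQLException e) {
                throw new IOException("failed to write record", e);
            }
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            try {
                conn.close();
            } catch (SQLException e) {
                throw new IOException(e);
            }
        }
    }
}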
As for the write method that is invoked when the reduce side emits output, the implementing class is TaskInputOutputContextImpl:
private RecordWriter<KEYOUT,VALUEOUT> output;
public void write(KEYOUT key, VALUEOUT value) throws IOException, InterruptedException {
output.write(key, value);
}
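For context, the call that reaches this method is the context.write a reducer makes. A minimal, hypothetical reducer using the types from the sketch above:

import java.io.IOException;
import org.apache.hadoop.mapreduce.Reducer;

public class StatsReducer
        extends Reducer<KeyBaseDimension, BaseStatsValueWritable, KeyBaseDimension, BaseStatsValueWritable> {
    @Override
    protected void reduce(KeyBaseDimension key, Iterable<BaseStatsValueWritable> values, Context context)
            throws IOException, InterruptedException {
        for (BaseStatsValueWritable value : values) {
            // Goes through TaskInputOutputContextImpl.write(...) and ends up in RecordWriter.write(...).
            context.write(key, value);
        }
    }
}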
Ultimately, then, it is the RecordWriter's write method that gets called.
An MR utility class for reading HBase on the map side
Use this before submitting a TableMap job; it will set up the job appropriately.
The TableMapReduceUtil class matters here: before submitting a job that reads an HBase table, you can attach a series of filters to the scan.
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filterList.addFilter(
        new SingleColumnValueFilter(EventLogConstants.EVENT_LOGS_FAMILY_NAME_BYTES,
                Bytes.toBytes(EventLogConstants.LOG_COLUMN_NAME_EVENT_NAME),
                CompareOp.EQUAL, Bytes.toBytes(EventLogConstants.EventEnum.BC_SX.alias)));
The relevant overload of initTableMapperJob accepts a list of scans:
public static void initTableMapperJob(List<Scan> scans,
        Class<? extends TableMapper> mapper,
        Class<?> outputKeyClass,
        Class<?> outputValueClass, Job job,
        boolean addDependencyJars,
        boolean initCredentials) throws IOException {
Attach the filter to each Scan before the job is submitted:
List<Scan> scanList = new ArrayList<Scan>();
try {
    conn = ConnectionFactory.createConnection(conf);
    admin = conn.getAdmin();
    String tableName = EventLogConstants.HBASE_NAME_AUDIT_SX + GlobalConstants.UNDERLINE
            + statDate.replaceAll(GlobalConstants.KEY_SEPARATOR, "");
    if (admin.tableExists(TableName.valueOf(tableName))) {
        Scan scan = new Scan();
        // Setting needed when one scan job reads multiple tables:
        // if an application wants to use multiple scans over different tables, each scan must
        // define this attribute with the appropriate table name by calling
        // scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName))
        // static public final String SCAN_ATTRIBUTES_TABLE_NAME = "scan.attributes.table.name";
        scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
        scan.setFilter(filterList);
        scanList.add(scan);
    }
Finally, pass both the job and the scan list in:
TableMapReduceUtil.initTableMapperJob(scanList, AuditorSXMapper.class,
        AuditorDimensionKey.class, Text.class, job, false);
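Tying the two halves of this note together, the same job can then be pointed at a custom JDBC output format like the one sketched at the top; a rough sketch, where JdbcOutputFormat, AuditorSXRunner, and AuditorSXReducer are hypothetical class names:

Job job = Job.getInstance(conf, "auditor-sx");
job.setJarByClass(AuditorSXRunner.class);            // hypothetical driver class
TableMapReduceUtil.initTableMapperJob(scanList, AuditorSXMapper.class,
        AuditorDimensionKey.class, Text.class, job, false);
job.setReducerClass(AuditorSXReducer.class);         // hypothetical reducer
job.setOutputFormatClass(JdbcOutputFormat.class);    // reduce output goes to the database via the custom format
job.waitForCompletion(true);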
Some Storm (Trident) notes. each: stream.each(new Fields(...), fun, new Fields(...)) applies a function or filter to each tuple.
filter: implement the Filter interface's isKeep method (sketches of these operations follow after these notes).
partitionAggregate: aggregation within a partition. Implement Aggregator<T>, where T is the class that holds the aggregation state; put the aggregation logic in aggregate(), and in complete() emit the aggregated value through the TridentCollector: collector.emit(new Values(aggregatedValue)).
A typical key-concatenation function: implement the Function interface's execute method.
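Minimal sketches of the three operation types just mentioned, assuming Storm 1.x+ package names; the field names ("eventName", "day", "contType", "waitingTotalOfPartDay") and class names are illustrative, and each class would normally live in its own file:

import org.apache.storm.trident.operation.BaseAggregator;
import org.apache.storm.trident.operation.BaseFilter;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

// Filter: keep only tuples whose "eventName" field matches (field name is assumed).
class EventNameFilter extends BaseFilter {
    @Override
    public boolean isKeep(TridentTuple tuple) {
        return "BC_SX".equals(tuple.getStringByField("eventName"));
    }
}

// Function: concatenate day and content type into one grouping key (field names assumed).
class DayAndContTypeKey extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String key = tuple.getStringByField("day") + "_" + tuple.getStringByField("contType");
        collector.emit(new Values(key));
    }
}

// partitionAggregate: the type parameter holds the per-partition aggregation state;
// complete() emits the aggregated value for the partition.
class PartitionSum extends BaseAggregator<PartitionSum.SumState> {
    static class SumState {
        long total = 0;
    }

    @Override
    public SumState init(Object batchId, TridentCollector collector) {
        return new SumState();
    }

    @Override
    public void aggregate(SumState state, TridentTuple tuple, TridentCollector collector) {
        state.total += tuple.getLongByField("waitingTotalOfPartDay");
    }

    @Override
    public void complete(SumState state, TridentCollector collector) {
        collector.emit(new Values(state.total));
    }
}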
HBaseMapState.Options optsWait = new HBaseMapState.Options();
// factoryWait: the HBase-backed StateFactory built from optsWait (its construction is not shown in this note).
TridentState amtOfWaitState = partStream
        .project(new Fields("waitingTotalOfPartDay", "dayAndContType"))
        .groupBy(new Fields("dayAndContType"))
        .persistentAggregate(
                factoryWait,
                new Fields("waitingTotalOfPartDay"), new Sum(),
                new Fields("waitingGlobalOfDay"));
persistentAggregate persists the aggregate and performs the global sum across all partitions: the input is each partition's partial total, and the output is the overall total for the day.