Custom UDF

1. Write the UDF class


package hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Documentation for the UDF.
 * name is the name of the UDF.
 * value is the text printed by "desc function xx".
 * extended is the text printed by "desc function extended xx".
 *
 */
@Description(name="uppernum", 
	value="_FUNC_ the input is a num, the output is the UpperNum", 
	extended="Example:\n" + 
		"> select uppernum(1) from src \n")
public class UpperNumUDF extends UDF {

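	/**
	 * A UDF class must define an evaluate method; it may be overloaded several times.
	 * 
	 * The parameters and return value of evaluate must be types that Hive can serialize.
	 * 
	 * Parameters:
	 * 	1) may be Java primitive types or String
	 * 	2) may be Text
	 * 	3) there may be several parameters, and their types may differ
	 * 	4) may be List<?>; a Hive column of type Array<?> is passed as a List
	 * 
	 * Return value:
	 * 	1) a return value is required (evaluate cannot be void); it may be null
	 * 	2) the return value must be a Java primitive type, String, or a Writable
	 * 
	 * A custom UDFMethodResolver can be supplied to control which evaluate method is chosen.
	 * 
	 * @param num the number to convert (0 to 5)
	 * @return the uppercase Chinese numeral, or "ERROR" for any other value
	 */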
	public String evaluate(int num) {
		String str = "ERROR";
		
		switch (num) {
		case 0:
			str = "零";
			break;
		case 1:
			str = "壹";
			break;
		case 2:
			str = "贰";
			break;
		case 3:
			str = "叁";
			break;
		case 4:
			str = "肆";
			break;
		case 5:
			str = "伍";
			break;
		default:
			break;
		}
		
		return str;
	}
	
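	public String evaluate(String num) {
		// 100 is outside 0..5, so unparseable input ends up as "ERROR"
		int intnum = 100;
		try {
			intnum = Integer.parseInt(num);
		} catch (NumberFormatException e) {
			// not a valid integer: keep the fallback value
		}
		return evaluate(intnum);
	}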

}
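
The overload dispatch and the fallback behaviour of the two evaluate methods can be checked without a Hive installation. The following is a minimal sketch that copies the mapping into a plain class. UpperNumLogic is a hypothetical name introduced here, and the class deliberately does not extend UDF so that it compiles without the Hive jars on the classpath.

```java
// Dependency-free mirror of the UpperNumUDF logic, for local checks only.
// UpperNumLogic is a hypothetical name; it is not part of Hive or of the class above.
class UpperNumLogic {

    // Same mapping as evaluate(int): 0..5 map to numerals, anything else to "ERROR".
    static String evaluate(int num) {
        switch (num) {
        case 0: return "零";
        case 1: return "壹";
        case 2: return "贰";
        case 3: return "叁";
        case 4: return "肆";
        case 5: return "伍";
        default: return "ERROR";
        }
    }

    // Same fallback as evaluate(String): input that does not parse keeps the
    // value 100, which is outside 0..5 and therefore yields "ERROR".
    static String evaluate(String num) {
        int intnum = 100;
        try {
            intnum = Integer.parseInt(num);
        } catch (NumberFormatException e) {
            // not a valid integer: keep the fallback value
        }
        return evaluate(intnum);
    }

    public static void main(String[] args) {
        System.out.println(evaluate(2));     // int overload
        System.out.println(evaluate("5"));   // String overload, parses to 5
        System.out.println(evaluate("abc")); // String overload, parse fails
        System.out.println(evaluate(9));     // int overload, out of range
    }
}
```

Running main prints 贰, 伍, ERROR and ERROR, one per line. A check like this only covers the conversion logic; how Hive picks an overload for a given column type still has to be verified inside a Hive session.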

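2. Package the class into a jar and upload the jar to the machine where Hive runs

3. Start Hive and run the following in the Hive session:

1) Load the jar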

hive (test)> add jar ${env:HOME}/udf.jar;

Note: the jar path is not quoted.

2) Create uppernum

hive (test)> create [temporary] function uppernum as 'hive.udf.UpperNumUDF';

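Note: a temporary function disappears when the Hive session ends. You can put the create statement (together with the add jar it depends on) in .hiverc so that it runs every time Hive starts.

3) Use uppernum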


hive (test)> select uppernum(2) from dual;
OK
贰
Time taken: 0.118 seconds, Fetched: 1 row(s)
hive (test)> select uppernum(5) from dual;
OK
伍

4) View the documentation for uppernum


hive (test)> desc function uppernum;
OK
uppernum the input is a num, the output is the UpperNum
Time taken: 0.138 seconds, Fetched: 1 row(s)

hive (test)> desc function extended uppernum;
OK
uppernum the input is a num, the output is the UpperNum
Example:
> select uppernum(1) from src 

Time taken: 0.138 seconds, Fetched: 4 row(s)
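
The output above shows _FUNC_ replaced by the function name. As a rough illustration of how annotation text like this can be read back at runtime, here is a sketch using plain Java reflection. FuncDoc, DocumentedFunc and DescribeSketch are hypothetical stand-ins defined here. This is not Hive's own implementation: Hive's real annotation is org.apache.hadoop.hive.ql.exec.Description, and substituting the annotation's own name value is an assumption made for the sketch.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-in for Hive's @Description; FuncDoc is a hypothetical name used only here.
// RUNTIME retention is needed so that reflection can see the annotation.
@Retention(RetentionPolicy.RUNTIME)
@interface FuncDoc {
    String name();
    String value();
    String extended() default "";
}

// Hypothetical class carrying the same texts as UpperNumUDF's annotation.
@FuncDoc(name = "uppernum",
        value = "_FUNC_ the input is a num, the output is the UpperNum",
        extended = "Example:\n" +
                "> select uppernum(1) from src \n")
class DocumentedFunc {
}

class DescribeSketch {

    // Builds text similar to what "desc function [extended] xx" prints.
    // Assumption: _FUNC_ is replaced with the name stored in the annotation.
    static String describe(Class<?> cls, boolean extended) {
        FuncDoc doc = cls.getAnnotation(FuncDoc.class);
        if (doc == null) {
            return "no documentation";
        }
        String text = doc.value().replace("_FUNC_", doc.name());
        if (extended) {
            text = text + "\n" + doc.extended().replace("_FUNC_", doc.name());
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describe(DocumentedFunc.class, false));
        System.out.println(describe(DocumentedFunc.class, true));
    }
}
```

The point of the sketch is only that value and extended are ordinary strings attached to the class, which is why the short form prints one line and the extended form appends the example text.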

5) Drop the UDF


hive> drop [temporary] function uppernum;
OK
Time taken: 0.221 seconds
