(To be organized) Flume notes ---------- hivelogsToHDFS case ---------- NoClassDefFoundError at runtime

Time: 2024-03-08 14:47:00

1. Scenario: use Flume to collect Hive's logs and land them on HDFS (agent a2, job/file-hdfs.conf; target path /flume/%Y%m%d/%H).
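The command in section 2 references job/file-hdfs.conf, which these notes never show. A minimal sketch of what it presumably contains, assuming the usual exec-source + memory-channel + HDFS-sink pattern (the agent name a2 and the sink path come from this post; the Hive log location and all other values are assumptions):

# agent a2: tail Hive's log and write it to HDFS
a2.sources = r2
a2.channels = c2
a2.sinks = k2

# exec source tailing the Hive log (log path is an assumption)
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/modules/hive/logs/hive.log

# memory channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# HDFS sink bucketing by date and hour (the path discussed throughout this post)
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop-senior02:9000/flume/%Y%m%d/%H
# exec events carry no timestamp header, so let the sink stamp them locally
a2.sinks.k2.hdfs.useLocalTimeStamp = true

# wiring
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2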

 

 

2. Error log

Command: bin/flume-ng agent --name a2 --conf conf/ --conf-file job/file-hdfs.conf

Info: Sourcing environment configuration script /opt/modules/flume/conf/flume-env.sh
Info: Including Hive libraries found via () for Hive access
+ exec /opt/modules/jdk1.8.0_121/bin/java -Xmx20m -cp '/opt/modules/flume/conf:/opt/modules/flume/lib/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application --name a2 --conf-file job/file-hdfs.conf
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:635)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
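Worth checking first whether the missing class is on Flume's classpath at all; a quick sketch, with paths taken from the exec line above:

# any htrace jar in Flume's lib?
ls /opt/modules/flume/lib | grep -i htrace

# org.apache.htrace.SamplerBuilder lives in htrace-core-3.1.0-incubating.jar;
# confirm a candidate jar really contains the class:
unzip -l htrace-core-3.1.0-incubating.jar | grep SamplerBuilder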

 

3. Some improvement

Put the two jars from the screenshot into Flume's lib directory.
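Which two jars the screenshot showed isn't stated in the text; judging by the errors later in these notes they were presumably htrace-core-3.1.0-incubating.jar and commons-io-2.4.jar, i.e. roughly:

cp htrace-core-3.1.0-incubating.jar commons-io-2.4.jar /opt/modules/flume/lib/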

Re-ran Flume: no error this time, but also no activity (see screenshot).

Also started Hive; the /flume/%Y%m%d/%H directory never appeared on HDFS.
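The check itself (exact command assumed) would be something like:

# list whatever Flume has produced, against the NameNode the sink points at
hdfs dfs -ls hdfs://hadoop-senior02:9000/flume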

 

Problem still unsolved!!!

 

4. Further experiments

Removed those two jars, and also shut down the NameNode on machine 02 (the one the sink in the conf points at); then started Flume on machine 01 again. No error occurred, but there was still no flume directory on HDFS.
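For reference, in Hadoop 2.x a single NameNode is normally stopped on its own machine like this (install path assumed):

# on hadoop-senior02
/opt/modules/hadoop-2.7.2/sbin/hadoop-daemon.sh stop namenode
jps   # confirm the NameNode process is gone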

 

Guess at the cause: maybe it doesn't error because the JVM still remembers the earlier variables??

 

 

Problem still unsolved!!!

 

Case 3 runs into the same situation: no flume folder on HDFS.

Added console log output to the command:

 bin/flume-ng agent --conf conf/ --name a3 --conf-file job/dir-hdfs.conf -Dflume.root.logger=INFO,console

This turned up the following error:

 

2019-01-23 06:44:20,627 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:182)] Starting Source r3
2019-01-23 06:44:20,628 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.SpoolDirectorySource.start(SpoolDirectorySource.java:83)] SpoolDirectorySource source starting with directory: /opt/module/flume/upload
2019-01-23 06:44:20,634 (lifecycleSupervisor-1-4) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)] Unable to start EventDrivenSourceRunner: { source:Spool Directory source r3: { spoolDir: /opt/module/flume/upload } } - Exception follows.
java.lang.IllegalStateException: Directory does not exist: /opt/module/flume/upload
        at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:159)
        at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:85)

The cause of the error above: the conf was missing an 's'. The spoolDir was set to /opt/module/flume/upload, while the actual directory is /opt/modules/flume/upload.
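The relevant lines after the fix would look like this (a sketch; agent a3 and source r3 come from the log, property names follow the standard spooldir source):

a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/modules/flume/upload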

 

After the fix, re-ran Flume:

Also copied the NOTICE file into upload/; at this point it showed up in the upload directory (screenshot).

But the Flume log reported this error:

[ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder

2019-01-23 07:05:33,717 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: r3 started
2019-01-23 07:05:34,064 (pool-5-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:324)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2019-01-23 07:05:34,064 (pool-5-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:433)] Preparing to move file /opt/modules/flume/upload/NOTICE to /opt/modules/flume/upload/NOTICE.COMPLETED
2019-01-23 07:05:34,085 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2019-01-23 07:05:34,388 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:231)] Creating hdfs://hadoop-senior02.itguigu.com:9000/flume/upload/20190123/07/upload-.1548198334086.tmp
2019-01-23 07:05:34,636 (hdfs-k3-call-runner-0) [WARN - org.apache.hadoop.util.NativeCodeLoader.<clinit>(NativeCodeLoader.java:62)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-23 07:05:34,837 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:635)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)

After some searching: flume/lib was missing htrace-core-3.1.0-incubating.jar. In a Maven project you would pull it in via mvn install (see http://blog.51cto.com/enetq/1827028); I simply found the jar and copied it into lib/ by hand.
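Hadoop 2.7.x ships this jar itself, so the manual copy can be done straight from the Hadoop installation (install path assumed):

find /opt/modules -name 'htrace-core*.jar'
cp /opt/modules/hadoop-2.7.2/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar /opt/modules/flume/lib/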

That fixed the problem above. Continuing: cp NOTICE upload/, but Flume errored again; log below:

Info: Sourcing environment configuration script /opt/modules/flume/conf/flume-env.sh
Info: Including Hive libraries found via () for Hive access
+ exec /opt/modules/jdk1.8.0_121/bin/java -Xmx20m -cp '/opt/modules/flume/conf:/opt/modules/flume/lib/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application -n a3 -f job/dir-hdfs.conf
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoClassDefFoundError: org/apache/commons/io/Charsets
        at org.apache.hadoop.ipc.Server.<clinit>(Server.java:182)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.<clinit>(ProtobufRpcEngine.java:72)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)

Fix: put commons-io-2.4.jar into flume/lib/.
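Same source as the htrace jar; commons-io-2.4.jar is also bundled under Hadoop's common lib (install path again assumed):

cp /opt/modules/hadoop-2.7.2/share/hadoop/common/lib/commons-io-2.4.jar /opt/modules/flume/lib/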

Ran through the process once more; this time an HDFS IO error appeared, see the log:

2019-01-23 15:48:00,717 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:443)] HDFS IO error
java.net.ConnectException: Call From hadoop-senior01/192.168.10.20 to hadoop-senior02:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
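Connection refused on host:9000 normally means nothing is listening there, so it is worth checking the NameNode on hadoop-senior02 first (a sketch; the HA service ids nn1/nn2 are assumptions from a typical setup):

jps                                # is a NameNode process running at all?
netstat -tlnp | grep 9000          # is anything listening on the RPC port?
hdfs haadmin -getServiceState nn1  # in an HA pair: which NameNode is active?
hdfs haadmin -getServiceState nn2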

(Side note) file-hdfs.conf had its own problems earlier; by now that config is mostly fixed. Running it produced the issue shown in this log:

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2019-01-23 15:59:29,157 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:231)] Creating hdfs://hadoop-senior01/flume/20190123/15/logs-.1548230362663.tmp
2019-01-23 15:59:29,199 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:443)] HDFS IO error
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)

The sink was writing through the standby node of the HA pair, which HDFS rejects; after pointing it at the active NameNode, the Flume run succeeded!

 

Continuing: dir-hdfs.conf still failed. Comparing it against file-hdfs.conf (which now works), dir-hdfs.conf hard-coded port 9000 in the HDFS path; removing the port made it succeed!!! My reading (unverified): host:9000 pins the client to one specific NameNode's RPC address, which breaks whenever that node is standby, while the port-less URI is resolved through the HA client settings in hdfs-site.xml and so follows whichever NameNode is currently active.

a2.sinks.k2.hdfs.path = hdfs://hadoop-senior02/flume/%Y%m%d/%H


Related reading: https://blog.csdn.net/dai451954706/article/details/50449436

https://blog.csdn.net/woloqun/article/details/81350323