第1节 flume：8、flume采集某个文件内容到hdfs上

2、采集文件内容到HDFS

需求分析：

采集需求：比如业务系统使用log4j生成的日志，日志内容不断增加，需要把追加到日志文件中的数据实时采集到hdfs。

同一个日志文件的内容不断增加，hdfs上该文件对应的文件的内容也要同时增加。

根据需求，首先定义以下3大要素

l 采集源，即source——监控文件内容更新 : exec ‘tail -F file’

l 下沉目标，即sink——HDFS文件系统 : hdfs sink

l Source和sink之间的传递通道——channel，可用file channel 也可以用内存channel

定义flume的配置文件

node03开发配置文件

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf

vim tail-file.conf

配置文件内容

agent1.sources = source1

agent1.sinks = sink1

agent1.channels = channel1

# Describe/configure tail -F source1

agent1.sources.source1.type = exec

agent1.sources.source1.command = tail -F /export/servers/taillogs/access_log

agent1.sources.source1.channels = channel1

#configure host for source

#agent1.sources.source1.interceptors = i1

#agent1.sources.source1.interceptors.i1.type = host

#agent1.sources.source1.interceptors.i1.hostHeader = hostname

# Describe sink1

agent1.sinks.sink1.type = hdfs

#a1.sinks.k1.channel = c1

agent1.sinks.sink1.hdfs.path = hdfs://node01:8020/weblog/flume-collection/%y-%m-%d/%H-%M

agent1.sinks.sink1.hdfs.filePrefix = access_log

agent1.sinks.sink1.hdfs.maxOpenFiles = 5000

agent1.sinks.sink1.hdfs.batchSize= 100

agent1.sinks.sink1.hdfs.fileType = DataStream

agent1.sinks.sink1.hdfs.writeFormat =Text

agent1.sinks.sink1.hdfs.rollSize = 102400

agent1.sinks.sink1.hdfs.rollCount = 1000000

agent1.sinks.sink1.hdfs.rollInterval = 60

agent1.sinks.sink1.hdfs.round = true

agent1.sinks.sink1.hdfs.roundValue = 10

agent1.sinks.sink1.hdfs.roundUnit = minute

agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory

agent1.channels.channel1.type = memory

agent1.channels.channel1.keep-alive = 120

agent1.channels.channel1.capacity = 500000

agent1.channels.channel1.transactionCapacity = 600

# Bind the source and sink to the channel

agent1.sources.source1.channels = channel1

agent1.sinks.sink1.channel = channel1

启动flume

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin

bin/flume-ng agent -c conf -f conf/tail-file.conf -n agent1 -Dflume.root.logger=INFO,console

秒客网

第1节 flume：8、flume采集某个文件内容到hdfs上

2、采集文件内容到HDFS

需求分析：

定义flume的配置文件

启动flume

相关文章

第1节 flume：8、flume采集某个文件内容到hdfs上

2、 采集文件内容到HDFS

需求分析：

定义flume的配置文件

启动flume

相关文章

2、采集文件内容到HDFS