Install Mahout and run the 20newsgroups test example, with screen captures showing the process
1. Download the binary distribution and unpack it
http://mirror.bit.edu.cn/apache/mahout/0.8/mahout-distribution-0.8.tar.gz
tar -zxvf mahout-distribution-0.8.tar.gz
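A truncated download is easy to miss until a later step fails, so the unpack step can be guarded. The following is a minimal sketch, not part of the original walkthrough: the function name `unpack_dist` is hypothetical, and it assumes only a standard `tar` with gzip support.

```shell
#!/bin/sh
# unpack_dist TARBALL DEST_DIR
# Hypothetical helper: verify that the archive is readable, then extract it.
unpack_dist() {
    tarball=$1
    dest=$2
    [ -f "$tarball" ] || { echo "missing archive: $tarball" >&2; return 1; }
    # 'tar -tzf' lists the archive without extracting; it fails on a corrupt file.
    tar -tzf "$tarball" >/dev/null 2>&1 || { echo "corrupt archive: $tarball" >&2; return 1; }
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}

# Example (paths taken from the steps in this walkthrough):
# unpack_dist mahout-distribution-0.8.tar.gz /usr/grid
```

Listing the archive first costs one extra pass over the file, but it avoids leaving a half-extracted directory behind.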
2. Configure environment variables
Add the following lines to /etc/profile and /usr/grid/.bashrc:
export MAHOUT_HOME=/usr/grid/mahout-distribution-0.8
export MAHOUT_CONF_DIR=/usr/grid/mahout-distribution-0.8/conf
export PATH=$PATH:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin
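After re-reading the profile (for example by opening a new login shell), it may help to confirm that the variables point at a real installation. This is a small sketch under the assumption that the launcher lives at `$MAHOUT_HOME/bin/mahout`, as it does in the 0.8 binary distribution; `check_mahout_env` is a hypothetical name.

```shell
#!/bin/sh
# check_mahout_env: hypothetical sanity check for the variables exported above.
# Returns 0 when MAHOUT_HOME contains an executable bin/mahout and
# MAHOUT_CONF_DIR is set; otherwise prints the problem and returns 1.
check_mahout_env() {
    if [ -z "${MAHOUT_HOME:-}" ]; then
        echo "MAHOUT_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$MAHOUT_HOME/bin/mahout" ]; then
        echo "no executable launcher at $MAHOUT_HOME/bin/mahout" >&2
        return 1
    fi
    if [ -z "${MAHOUT_CONF_DIR:-}" ]; then
        echo "MAHOUT_CONF_DIR is not set" >&2
        return 1
    fi
    echo "ok: $MAHOUT_HOME"
}

# Example: check_mahout_env || echo "fix the profile before continuing"
```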
3. Start Hadoop.
[grid@h1 data]$ start-all.sh
[grid@h1 data]$ jps
10163 Jps
4178 SecondaryNameNode
3997 NameNode
4260 JobTracker
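Reading the `jps` listing by eye works for one node; a small filter makes the same check repeatable. The sketch below is an illustration only: `missing_daemons` is a hypothetical helper, and which daemons to expect on a given node is an assumption that depends on the cluster layout (the listing above shows NameNode, SecondaryNameNode and JobTracker on this host).

```shell
#!/bin/sh
# missing_daemons EXPECTED...
# Reads `jps` output on stdin (lines of "<pid> <name>") and prints every
# expected daemon name that does not appear. Prints nothing if all are present.
missing_daemons() {
    running=$(awk '{print $2}')
    for name in "$@"; do
        printf '%s\n' "$running" | grep -qxF "$name" || echo "$name"
    done
}

# Example on the master node:
# jps | missing_daemons NameNode SecondaryNameNode JobTracker
```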
4. Check that Mahout is installed correctly by confirming that it lists its algorithms
[grid@h1 data]$ mahout --help
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /user/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/user/grid/hadoop/conf
MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.
14/01/26 09:43:37 WARN driver.MahoutDriver: Unable to add class: --help
14/01/26 09:43:38 WARN driver.MahoutDriver: No --help.props found on classpath, will use command-line arguments only
Unknown program '--help' chosen.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
dirichlet: : Dirichlet Clustering
eigencuts: : Eigencuts spectral clustering
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
fpg: : Frequent Pattern Growth
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
lucene2seq: : Generate Text SequenceFiles from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
meanshift: : Mean Shift clustering
minhash: : Run Minhash clustering
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
[grid@h1 data]$
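Rather than scanning the long listing for one entry, the output can be filtered for a specific program name. The sketch below assumes the listing format shown above (program name at the start of the line, followed by a colon); `has_program` is a hypothetical helper, not a Mahout command.

```shell
#!/bin/sh
# has_program NAME
# Reads the `mahout` program listing on stdin and succeeds if NAME appears
# as a program name, i.e. as the text before the first colon on some line.
has_program() {
    awk -v name="$1" -F: '
        { gsub(/^[ \t]+|[ \t]+$/, "", $1) }   # trim whitespace around the name
        $1 == name { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
# mahout --help 2>&1 | has_program trainnb && echo "trainnb is available"
```

The comparison is exact, so asking for `kmeans` does not match `fkmeans` or `spectralkmeans`.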
Download the test data
wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
5. Copy the test data to HDFS
hadoop fs -mkdir ./testdata
hadoop fs -put ./synthetic_control.data ./testdata
hadoop fs -ls ./testdata
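The three commands above can be wrapped so that re-running the step does not trip over an existing directory. This is a sketch under stated assumptions: `stage_to_hdfs` is a hypothetical name, `HADOOP_CMD` is an invented override hook (defaulting to `hadoop`) that allows a dry run without a cluster, and `hadoop fs -test -d` is assumed to be available in the installed Hadoop version.

```shell
#!/bin/sh
# stage_to_hdfs LOCAL_FILE HDFS_DIR
# Hypothetical wrapper around mkdir/put/ls. HADOOP_CMD is an assumed
# override hook (defaults to `hadoop`) so the function can be dry-run.
stage_to_hdfs() {
    local_file=$1
    hdfs_dir=$2
    hadoop_cmd=${HADOOP_CMD:-hadoop}
    [ -f "$local_file" ] || { echo "no such local file: $local_file" >&2; return 1; }
    # Create the target directory only if it is not already there.
    "$hadoop_cmd" fs -test -d "$hdfs_dir" || "$hadoop_cmd" fs -mkdir "$hdfs_dir" || return 1
    "$hadoop_cmd" fs -put "$local_file" "$hdfs_dir" || return 1
    "$hadoop_cmd" fs -ls "$hdfs_dir"
}

# Example, matching the commands above:
# stage_to_hdfs ./synthetic_control.data ./testdata
```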
6. Run a k-means clustering test
mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
[grid@h1 ~]$ mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf
MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.
14/01/26 00:43:36 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
14/01/26 00:43:36 INFO kmeans.Job: Running with default arguments
14/01/26 00:43:37 INFO common.HadoopUtil: Deleting output
14/01/26 00:43:37 INFO kmeans.Job: Preparing Input
14/01/26 00:43:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/26 00:43:45 INFO input.FileInputFormat: Total input paths to process : 1
14/01/26 00:43:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/01/26 00:43:45 WARN snappy.LoadSnappy: Snappy native library not loaded
14/01/26 00:43:45 INFO mapred.JobClient: Running job: job_201401252331_0013
14/01/26 00:43:46 INFO mapred.JobClient: map 0% reduce 0%
14/01/26 00:44:12 INFO mapred.JobClient: map 100% reduce 0%
14/01/26 00:44:17 INFO mapred.JobClient: Job complete: job_201401252331_0013
14/01/26 00:44:17 INFO mapred.JobClient: Counters: 19
14/01/26 00:44:17 INFO mapred.JobClient: Job Counters
14/01/26 00:44:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=15321
14/01/26 00:44:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/26 00:44:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/26 00:44:17 INFO mapred.JobClient: Launched map tasks=1
14/01/26 00:44:17 INFO mapred.JobClient: Data-local map tasks=1
14/01/26 00:44:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/01/26 00:44:17 INFO mapred.JobClient: File Output Format Counters
14/01/26 00:44:17 INFO mapred.JobClient: Bytes Written=335470
14/01/26 00:44:17 INFO mapred.JobClient: FileSystemCounters
14/01/26 00:44:17 INFO mapred.JobClient: HDFS_BYTES_READ=288495
14/01/26 00:44:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21400
14/01/26 00:44:17 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=335470
14/01/26 00:44:17 INFO mapred.JobClient: File Input Format Counters
14/01/26 00:44:17 INFO mapred.JobClient: Bytes Read=288374
14/01/26 00:44:17 INFO mapred.JobClient: Map-Reduce Framework
14/01/26 00:44:17 INFO mapred.JobClient: Map input records=600
14/01/26 00:44:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=33697792
14/01/26 00:44:17 INFO mapred.JobClient: Spilled Records=0
14/01/26 00:44:17 INFO mapred.JobClient: CPU time spent (ms)=330
14/01/26 00:44:17 INFO mapred.JobClient: Total committed heap usage (bytes)=7929856
14/01/26 00:44:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=376700928
14/01/26 00:44:17 INFO mapred.JobClient: Map output records=600
14/01/26 00:44:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=121
14/01/26 00:44:17 INFO kmeans.Job: Running random seed to get initial clusters
14/01/26 00:44:17 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/01/26 00:44:17 INFO compress.CodecPool: Got brand-new compressor
14/01/26 00:44:18 INFO kmeans.RandomSeedGenerator: Wrote 6 Klusters to output/random-seeds/part-randomSeed
14/01/26 00:44:18 INFO kmeans.Job: Running KMeans with k = 6
14/01/26 00:44:18 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/random-seeds/part-randomSeed Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
14/01/26 00:44:18 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
14/01/26 00:44:18 INFO compress.CodecPool: Got brand-new decompressor
14/01/26 00:44:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/26 00:44:23 INFO input.FileInputFormat: Total input paths to process : 1
14/01/26 00:44:24 INFO mapred.JobClient: Running job: job_201401252331_0014
14/01/26 00:44:25 INFO mapred.JobClient: map 0% reduce 0%
14/01/26 00:44:48 INFO mapred.JobClient: map 100% reduce 0%
.404, 25.369, 21.068, 19.346, 20.055, 23.319, 24.743, 16.394, 16.527, 25.255, 15.532, 23.677, 16.800, 16.444, 24.945, 14.802, 21.979, 17.191, 23.474, 14.164, 24.928, 13.213, 22.669, 14.831, 17.453, 13.798, 22.499, 11.606, 12.931, 15.505, 13.456, 14.295]
1.0: [35.162, 30.783, 33.848, 26.778, 32.632, 30.928, 33.958, 30.005, 32.792, 23.600, 32.514, 31.969, 23.302, 22.740, 26.831, 25.599, 28.648, 23.295, 25.424, 23.333, 23.906, 20.742, 21.021, 19.262, 19.733, 22.366, 24.415, 21.432, 21.119, 28.102, 21.169, 26.818, 25.745, 24.934, 19.991, 22.085, 17.193, 20.809, 18.696, 15.019, 22.573,
1.0: [35.899, 26.672, 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032, 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979, 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553, 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542, 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939, 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846, 16.546, 15.927, 18.084, 17.475]
14/01/26 00:39:32 INFO clustering.ClusterDumper: Wrote 6 clusters
14/01/26 00:39:32 INFO driver.MahoutDriver: Program took 860026 ms (Minutes: 14.333766666666667)
[grid@h1 ~]$
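When comparing runs, the only number usually needed from this log is the total run time on the final `Program took ... ms` line. A minimal sketch for pulling it out of a saved log follows; `run_minutes` is a hypothetical helper, and saving the driver output to a file is an assumption, not something the steps above do.

```shell
#!/bin/sh
# run_minutes: reads a Mahout driver log on stdin and prints the elapsed
# time from the "Program took N ms" line in minutes, to two decimals.
run_minutes() {
    sed -n 's/.*Program took *\([0-9][0-9]*\) ms.*/\1/p' |
        awk '{ printf "%.2f\n", $1 / 60000 }'
}

# Example (kmeans.log is an assumed file holding the output shown above):
# run_minutes < kmeans.log
```

For the run shown above (860026 ms) this prints 14.33.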
7. Test the Bayes classifier
1. Download the dataset and unpack it
http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz
cd data
[grid@h1 data]$ pwd
/usr/grid/data
[grid@h1 data]$ tar -zxvf 20news-bydate.tar.gz
………
20news-bydate-train/talk.religion.misc/84200
20news-bydate-train/talk.religion.misc/84131
20news-bydate-train/talk.religion.misc/84201
20news-bydate-train/talk.religion.misc/84101
20news-bydate-train/talk.religion.misc/84202
20news-bydate-train/talk.religion.misc/84203
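Once the archive is unpacked, a per-group file count is a quick way to confirm that nothing was lost during extraction. The helper below is a hypothetical sketch using only `find` and `wc`; it makes no assumption about how many messages each group should contain.

```shell
#!/bin/sh
# count_per_group DIR
# Prints "<count> <group>" for each immediate subdirectory of DIR,
# counting the regular files anywhere below it.
count_per_group() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        echo "$n $(basename "$d")"
    done
}

# Example:
# count_per_group 20news-bydate-train
```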
2. Build the training set
Data preparation
(1) Unpack 20news-bydate.tar.gz and copy the contents of every subfolder of 20news-bydate into 20news-all, using the following commands:
[grid@h1 data]$ hadoop fs -mkdir 20news-all
[grid@h1 data]$ hadoop fs -ls 20news-all
[grid@h1 data]$ ll
total 14140
-rwxrw-rw-.  1 grid grid 14464277 Jan 26 09:09 20news-bydate.tar.gz
drwxr-xr-x. 22 grid grid     4096 Mar 18  2003 20news-bydate-test
drwxr-xr-x. 22 grid grid     4096 Mar 18  2003 20news-bydate-train
drwxrwxr-x.  6 grid grid     4096 Jan 26 10:41 tmp
drwxrwxr-x. 22 grid grid     4096 Jan 27 00:05 20news-all
drwxrwxr-x.  4 grid grid     4096 Jan 27 00:04 20news-bydate
[grid@h1 mahout-work-grid]$ hadoop fs -put 20news-all /usr/grid/mahout-work-grid/20news-all
Warning: $HADOOP_HOME is deprecated.
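The local merge described in the data-preparation step (every newsgroup folder from both the train and the test split copied into one 20news-all directory) can be sketched as follows. `merge_newsgroups` is a hypothetical helper written for illustration; it assumes the two split directories sit side by side under one root, and it is not the code of the example script itself.

```shell
#!/bin/sh
# merge_newsgroups SRC_ROOT DEST
# Copies every newsgroup folder under SRC_ROOT/20news-bydate-train and
# SRC_ROOT/20news-bydate-test into DEST, so that each group directory in
# DEST holds the messages from both splits.
merge_newsgroups() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for split in 20news-bydate-train 20news-bydate-test; do
        for group in "$src/$split"/*/; do
            [ -d "$group" ] || continue
            # Strip the trailing slash so cp copies the directory itself,
            # not just its contents, on both GNU and BSD cp.
            cp -R "${group%/}" "$dest"/ || return 1
        done
    done
}

# Example:
# merge_newsgroups /usr/grid/data/20news-bydate /usr/grid/data/20news-all
```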
[grid@h1 mahout-work-grid]$ cd $MAHOUT_HOME
[grid@h1 mahout-distribution-0.8]$ ./examples/bin/classify-20newsgroups.sh
Please select a number to choose the corresponding task to run
1. cnaivebayes
2. naivebayes
3. sgd
4. clean -- cleans up the work area in /usr/grid/mahout-work-grid
Enter your choice : 2
ok. You chose 2 and we'll use naivebayes
creating work directory at /usr/grid/mahout-work-grid
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /usr/grid/mahout-work-grid/20news-all
+ mkdir /usr/grid/mahout-work-grid/20news-all
+ cp -R /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/alt.atheism /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.graphics /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.windows.x /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/misc.forsale /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/rec.autos /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/rec.motorcycles /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/rec.sport.baseball /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/rec.sport.hockey /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/sci.crypt /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/sci.electronics /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/sci.med /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/sci.space /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/soc.religion.christian /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.guns /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.mideast /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/talk.religion.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/alt.atheism /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.graphics /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.windows.x /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/misc.forsale /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/rec.autos /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/rec.motorcycles /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/rec.sport.baseball /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/rec.sport.hockey /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/sci.crypt /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/sci.electronics /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/sci.med /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/sci.space /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/soc.religion.christian /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.guns /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.mideast /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/talk.religion.misc /usr/grid/mahout-work-grid/20news-all
+ echo 'Creating sequence files from 20newsgroups data'
Creating sequence files from 20newsgroups data
+ ./bin/mahout seqdirectory -i /usr/grid/mahout-work-grid/20news-all -o /usr/grid/mahout-work-grid/20news-seq -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf
MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.
14/01/27 00:25:06 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/usr/grid/mahout-work-grid/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/usr/grid/mahout-work-grid/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/01/27 00:25:34 INFO input.FileInputFormat: Total input paths to process : 18846
14/01/27 00:25:35 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/01/27 00:25:35 WARN snappy.LoadSnappy: Snappy native library not loaded
14/01/27 00:25:53 INFO mapred.JobClient: Running job: job_201401261418_0001
14/01/27 00:25:54 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:26:41 INFO mapred.JobClient: map 1% reduce 0%
14/01/27 00:26:44 INFO mapred.JobClient: map 2% reduce 0%
14/01/27 00:26:47 INFO mapred.JobClient: map 4% reduce 0%
14/01/27 00:26:50 INFO mapred.JobClient: map 5% reduce 0%
14/01/27 00:26:53 INFO mapred.JobClient: map 7% reduce 0%
14/01/27 00:26:56 INFO mapred.JobClient: map 8% reduce 0%
14/01/27 00:26:59 INFO mapred.JobClient: map 9% reduce 0%
14/01/27 00:27:05 INFO mapred.JobClient: map 10% reduce 0%
14/01/27 00:27:08 INFO mapred.JobClient: map 12% reduce 0%
14/01/27 00:27:11 INFO mapred.JobClient: map 13% reduce 0%
14/01/27 00:27:14 INFO mapred.JobClient: map 14% reduce 0%
14/01/27 00:27:17 INFO mapred.JobClient: map 17% reduce 0%
14/01/27 00:27:20 INFO mapred.JobClient: map 18% reduce 0%
14/01/27 00:27:23 INFO mapred.JobClient: map 19% reduce 0%
14/01/27 00:27:26 INFO mapred.JobClient: map 21% reduce 0%
14/01/27 00:27:29 INFO mapred.JobClient: map 22% reduce 0%
14/01/27 00:27:32 INFO mapred.JobClient: map 25% reduce 0%
14/01/27 00:27:35 INFO mapred.JobClient: map 26% reduce 0%
14/01/27 00:27:38 INFO mapred.JobClient: map 28% reduce 0%
14/01/27 00:27:41 INFO mapred.JobClient: map 30% reduce 0%
14/01/27 00:27:44 INFO mapred.JobClient: map 31% reduce 0%
14/01/27 00:27:47 INFO mapred.JobClient: map 32% reduce 0%
14/01/27 00:27:50 INFO mapred.JobClient: map 34% reduce 0%
14/01/27 00:27:53 INFO mapred.JobClient: map 35% reduce 0%
14/01/27 00:27:56 INFO mapred.JobClient: map 36% reduce 0%
14/01/27 00:27:59 INFO mapred.JobClient: map 38% reduce 0%
14/01/27 00:28:05 INFO mapred.JobClient: map 40% reduce 0%
14/01/27 00:28:08 INFO mapred.JobClient: map 41% reduce 0%
14/01/27 00:28:11 INFO mapred.JobClient: map 42% reduce 0%
14/01/27 00:28:17 INFO mapred.JobClient: map 43% reduce 0%
14/01/27 00:28:24 INFO mapred.JobClient: map 45% reduce 0%
14/01/27 00:28:30 INFO mapred.JobClient: map 46% reduce 0%
14/01/27 00:28:45 INFO mapred.JobClient: map 47% reduce 0%
14/01/27 00:28:54 INFO mapred.JobClient: map 49% reduce 0%
14/01/27 00:29:00 INFO mapred.JobClient: map 51% reduce 0%
14/01/27 00:29:03 INFO mapred.JobClient: map 52% reduce 0%
14/01/27 00:29:06 INFO mapred.JobClient: map 54% reduce 0%
14/01/27 00:29:08 INFO mapred.JobClient: map 56% reduce 0%
14/01/27 00:29:11 INFO mapred.JobClient: map 57% reduce 0%
14/01/27 00:29:14 INFO mapred.JobClient: map 58% reduce 0%
14/01/27 00:29:20 INFO mapred.JobClient: map 59% reduce 0%
14/01/27 00:29:23 INFO mapred.JobClient: map 61% reduce 0%
14/01/27 00:29:26 INFO mapred.JobClient: map 63% reduce 0%
14/01/27 00:29:29 INFO mapred.JobClient: map 65% reduce 0%
14/01/27 00:29:32 INFO mapred.JobClient: map 66% reduce 0%
14/01/27 00:29:35 INFO mapred.JobClient: map 68% reduce 0%
14/01/27 00:29:38 INFO mapred.JobClient: map 70% reduce 0%
14/01/27 00:29:41 INFO mapred.JobClient: map 72% reduce 0%
14/01/27 00:29:44 INFO mapred.JobClient: map 74% reduce 0%
14/01/27 00:29:47 INFO mapred.JobClient: map 76% reduce 0%
14/01/27 00:29:50 INFO mapred.JobClient: map 79% reduce 0%
14/01/27 00:29:53 INFO mapred.JobClient: map 81% reduce 0%
14/01/27 00:29:56 INFO mapred.JobClient: map 84% reduce 0%
14/01/27 00:29:59 INFO mapred.JobClient: map 86% reduce 0%
14/01/27 00:30:02 INFO mapred.JobClient: map 88% reduce 0%
14/01/27 00:30:05 INFO mapred.JobClient: map 91% reduce 0%
14/01/27 00:30:08 INFO mapred.JobClient: map 94% reduce 0%
14/01/27 00:30:14 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:30:23 INFO mapred.JobClient: Job complete: job_201401261418_0001
14/01/27 00:30:23 INFO mapred.JobClient: Counters: 18
14/01/27 00:30:23 INFO mapred.JobClient: Job Counters
14/01/27 00:30:23 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=237675
14/01/27 00:30:23 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:30:23 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:30:23 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:31:27 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:31:27 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:31:27 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:31:27 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:31:27 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/01/27 00:31:27 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:31:27 INFO mapred.JobClient: Bytes Written=27503580
14/01/27 00:31:27 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:31:27 INFO mapred.JobClient: HDFS_BYTES_READ=19202520
14/01/27 00:31:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21795
14/01/27 00:31:27 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27503580
14/01/27 00:31:27 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:31:27 INFO mapred.JobClient: Bytes Read=19202391
14/01/27 00:31:27 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:31:27 INFO mapred.JobClient: Map input records=18846
14/01/27 00:31:27 INFO mapred.JobClient: Physical memory (bytes) snapshot=41730048
14/01/27 00:31:27 INFO mapred.JobClient: Spilled Records=0
14/01/27 00:31:27 INFO mapred.JobClient: CPU time spent (ms)=10280
14/01/27 00:31:27 INFO mapred.JobClient: Total committed heap usage (bytes)=8060928
14/01/27 00:31:27 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1744265216
14/01/27 00:31:27 INFO mapred.JobClient: Map output records=18846
14/01/27 00:31:27 INFO mapred.JobClient: SPLIT_RAW_BYTES=129
14/01/27 00:31:27 INFO vectorizer.SparseVectorsFromSequenceFiles: Creating Term Frequency Vectors
14/01/27 00:31:27 INFO vectorizer.DictionaryVectorizer: Creating dictionary from /usr/grid/mahout-work-grid/20news-vectors/tokenized-documents and saving at /usr/grid/mahout-work-grid/20news-vectors/wordcount
14/01/27 00:31:31 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:31:32 INFO mapred.JobClient: Running job: job_201401261418_0003
14/01/27 00:31:33 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:32:46 INFO mapred.JobClient: map 4% reduce 0%
14/01/27 00:32:49 INFO mapred.JobClient: map 20% reduce 0%
14/01/27 00:32:52 INFO mapred.JobClient: map 41% reduce 0%
14/01/27 00:32:55 INFO mapred.JobClient: map 66% reduce 0%
14/01/27 00:32:58 INFO mapred.JobClient: map 90% reduce 0%
14/01/27 00:33:01 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:33:22 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:33:27 INFO mapred.JobClient: Job complete: job_201401261418_0003
14/01/27 00:33:27 INFO mapred.JobClient: Counters: 29
14/01/27 00:33:27 INFO mapred.JobClient: Job Counters
14/01/27 00:33:27 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:33:27 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=50887
14/01/27 00:33:27 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:33:27 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:33:27 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:33:27 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:33:27 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20520
14/01/27 00:33:27 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:33:27 INFO mapred.JobClient: Bytes Written=2315037
14/01/27 00:33:27 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:33:27 INFO mapred.JobClient: FILE_BYTES_READ=11857906
14/01/27 00:33:27 INFO mapred.JobClient: HDFS_BYTES_READ=27503733
14/01/27 00:33:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15440177
14/01/27 00:33:27 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2315037
14/01/27 00:33:27 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:33:27 INFO mapred.JobClient: Bytes Read=27503580
14/01/27 00:33:27 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:33:27 INFO mapred.JobClient: Map output materialized bytes=3538084
14/01/27 00:33:27 INFO mapred.JobClient: Map input records=18846
14/01/27 00:33:27 INFO mapred.JobClient: Reduce shuffle bytes=0
14/01/27 00:33:27 INFO mapred.JobClient: Spilled Records=849345
14/01/27 00:33:27 INFO mapred.JobClient: Map output bytes=39462740
14/01/27 00:33:27 INFO mapred.JobClient: CPU time spent (ms)=20780
14/01/27 00:33:27 INFO mapred.JobClient: Total committed heap usage (bytes)=264306688
14/01/27 00:33:27 INFO mapred.JobClient: Combine input records=3026242
14/01/27 00:33:27 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
14/01/27 00:33:27 INFO mapred.JobClient: Reduce input records=192904
14/01/27 00:33:27 INFO mapred.JobClient: Reduce input groups=192904
14/01/27 00:33:27 INFO mapred.JobClient: Combine output records=554873
14/01/27 00:33:27 INFO mapred.JobClient: Physical memory (bytes) snapshot=253255680
14/01/27 00:33:27 INFO mapred.JobClient: Reduce output records=93563
14/01/27 00:33:27 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3491995648
14/01/27 00:33:27 INFO mapred.JobClient: Map output records=2664273
14/01/27 00:33:31 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:33:32 INFO mapred.JobClient: Running job: job_201401261418_0004
14/01/27 00:33:33 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:34:35 INFO mapred.JobClient: map 36% reduce 0%
14/01/27 00:34:38 INFO mapred.JobClient: map 93% reduce 0%
14/01/27 00:34:41 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:35:02 INFO mapred.JobClient: map 100% reduce 73%
14/01/27 00:35:05 INFO mapred.JobClient: map 100% reduce 84%
14/01/27 00:35:11 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:35:16 INFO mapred.JobClient: Job complete: job_201401261418_0004
14/01/27 00:35:16 INFO mapred.JobClient: Counters: 29
14/01/27 00:35:16 INFO mapred.JobClient: Job Counters
14/01/27 00:35:16 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:35:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=25499
14/01/27 00:35:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:35:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:35:16 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:35:16 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:35:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=27870
14/01/27 00:35:16 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:35:16 INFO mapred.JobClient: Bytes Written=29314118
14/01/27 00:35:16 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:35:16 INFO mapred.JobClient: FILE_BYTES_READ=29226519
14/01/27 00:35:16 INFO mapred.JobClient: HDFS_BYTES_READ=27503733
14/01/27 00:35:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54594825
14/01/27 00:35:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
14/01/27 00:35:16 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:35:16 INFO mapred.JobClient: Bytes Read=27503580
14/01/27 00:35:16 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:35:16 INFO mapred.JobClient: Map output materialized bytes=27274291
14/01/27 00:35:16 INFO mapred.JobClient: Map input records=18846
14/01/27 00:35:16 INFO mapred.JobClient: Reduce shuffle bytes=27274291
14/01/27 00:35:16 INFO mapred.JobClient: Spilled Records=37692
14/01/27 00:35:16 INFO mapred.JobClient: Map output bytes=27199343
14/01/27 00:35:16 INFO mapred.JobClient: CPU time spent (ms)=18110
14/01/27 00:35:16 INFO mapred.JobClient: Total committed heap usage (bytes)=304148480
14/01/27 00:35:16 INFO mapred.JobClient: Combine input records=0
14/01/27 00:35:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
14/01/27 00:35:16 INFO mapred.JobClient: Reduce input records=18846
14/01/27 00:35:16 INFO mapred.JobClient: Reduce input groups=18846
14/01/27 00:35:16 INFO mapred.JobClient: Combine output records=0
14/01/27 00:35:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=298504192
14/01/27 00:35:16 INFO mapred.JobClient: Reduce output records=18846
14/01/27 00:35:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3491328000
14/01/27 00:35:16 INFO mapred.JobClient: Map output records=18846
14/01/27 00:35:23 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:35:24 INFO mapred.JobClient: Running job: job_201401261418_0005
14/01/27 00:35:25 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:35:53 INFO mapred.JobClient: map 20% reduce 0%
14/01/27 00:35:56 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:36:14 INFO mapred.JobClient: map 100% reduce 84%
14/01/27 00:36:20 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:36:25 INFO mapred.JobClient: Job complete: job_201401261418_0005
14/01/27 00:36:25 INFO mapred.JobClient: Counters: 29
14/01/27 00:36:25 INFO mapred.JobClient: Job Counters
14/01/27 00:36:25 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:36:25 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=20908
14/01/27 00:36:25 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:36:25 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:36:25 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:36:25 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:36:25 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=23135
14/01/27 00:36:25 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:36:25 INFO mapred.JobClient: Bytes Written=29314118
14/01/27 00:36:25 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:36:25 INFO mapred.JobClient: FILE_BYTES_READ=29059398
14/01/27 00:36:25 INFO mapred.JobClient: HDFS_BYTES_READ=29314269
14/01/27 00:36:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58163213
14/01/27 00:36:25 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
14/01/27 00:36:25 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:36:25 INFO mapred.JobClient: Bytes Read=29314118
14/01/27 00:36:25 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:36:25 INFO mapred.JobClient: Map output materialized bytes=29059398
14/01/27 00:36:25 INFO mapred.JobClient: Map input records=18846
14/01/27 00:36:25 INFO mapred.JobClient: Reduce shuffle bytes=0
14/01/27 00:36:25 INFO mapred.JobClient: Spilled Records=37692
14/01/27 00:36:25 INFO mapred.JobClient: Map output bytes=28984080
14/01/27 00:36:25 INFO mapred.JobClient: CPU time spent (ms)=10800
14/01/27 00:36:25 INFO mapred.JobClient: Total committed heap usage (bytes)=293572608
14/01/27 00:36:25 INFO mapred.JobClient: Combine input records=0
14/01/27 00:36:25 INFO mapred.JobClient: SPLIT_RAW_BYTES=151
14/01/27 00:36:25 INFO mapred.JobClient: Reduce input records=18846
14/01/27 00:36:25 INFO mapred.JobClient: Reduce input groups=18846
14/01/27 00:36:25 INFO mapred.JobClient: Combine output records=0
14/01/27 00:36:25 INFO mapred.JobClient: Physical memory (bytes) snapshot=282771456
14/01/27 00:36:25 INFO mapred.JobClient: Reduce output records=18846
14/01/27 00:36:25 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3491999744
14/01/27 00:36:25 INFO mapred.JobClient: Map output records=18846
14/01/27 00:36:25 INFO common.HadoopUtil:Deleting /usr/grid/mahout-work-grid/20news-vectors/partial-vectors-0
14/01/27 00:36:25 INFOvectorizer.SparseVectorsFromSequenceFiles: Calculating IDF
14/01/27 00:36:30 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:36:30 INFO mapred.JobClient: Running job: job_201401261418_0006
14/01/27 00:36:31 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:37:05 INFO mapred.JobClient: map 14% reduce 0%
14/01/27 00:37:08 INFO mapred.JobClient: map 49% reduce 0%
14/01/27 00:37:11 INFO mapred.JobClient: map 83% reduce 0%
14/01/27 00:37:14 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:37:32 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:37:37 INFO mapred.JobClient: Job complete: job_201401261418_0006
14/01/27 00:37:37 INFO mapred.JobClient: Counters: 29
14/01/27 00:37:37 INFO mapred.JobClient: Job Counters
14/01/27 00:37:37 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:37:37 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=30299
14/01/27 00:37:37 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:37:37 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:37:37 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:37:37 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:37:37 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=16561
14/01/27 00:37:37 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:37:37 INFO mapred.JobClient: Bytes Written=1890073
14/01/27 00:37:37 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:37:37 INFO mapred.JobClient: FILE_BYTES_READ=4880816
14/01/27 00:37:37 INFO mapred.JobClient: HDFS_BYTES_READ=29314270
14/01/27 00:37:37 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6234855
14/01/27 00:37:37 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1890073
14/01/27 00:37:37 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:37:37 INFO mapred.JobClient: Bytes Read=29314118
14/01/27 00:37:37 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:37:37 INFO mapred.JobClient: Map output materialized bytes=1309902
14/01/27 00:37:37 INFO mapred.JobClient: Map input records=18846
14/01/27 00:37:37 INFO mapred.JobClient: Reduce shuffle bytes=1309902
14/01/27 00:37:37 INFO mapred.JobClient: Spilled Records=442189
14/01/27 00:37:37 INFO mapred.JobClient: Map output bytes=31005336
14/01/27 00:37:37 INFO mapred.JobClient: CPU time spent (ms)=12610
14/01/27 00:37:37 INFO mapred.JobClient: Total committed heap usage (bytes)=264384512
14/01/27 00:37:37 INFO mapred.JobClient: Combine input records=2838839
14/01/27 00:37:37 INFO mapred.JobClient: SPLIT_RAW_BYTES=152
14/01/27 00:37:37 INFO mapred.JobClient: Reduce input records=93564
14/01/27 00:37:37 INFO mapred.JobClient: Reduce input groups=93564
14/01/27 00:37:37 INFO mapred.JobClient: Combine output records=348625
14/01/27 00:37:37 INFO mapred.JobClient: Physical memory (bytes) snapshot=249851904
14/01/27 00:37:37 INFO mapred.JobClient: Reduce output records=93564
14/01/27 00:37:37 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3491995648
14/01/27 00:37:37 INFO mapred.JobClient: Map output records=2583778
14/01/27 00:37:38 INFO vectorizer.SparseVectorsFromSequenceFiles: Pruning
14/01/27 00:37:40 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:37:41 INFO mapred.JobClient: Running job: job_201401261418_0007
14/01/27 00:37:42 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:38:12 INFO mapred.JobClient: map 38% reduce 0%
14/01/27 00:38:15 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:38:33 INFO mapred.JobClient: map 100% reduce 66%
14/01/27 00:38:36 INFO mapred.JobClient: map 100% reduce 85%
14/01/27 00:38:45 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:38:50 INFO mapred.JobClient: Job complete: job_201401261418_0007
14/01/27 00:38:50 INFO mapred.JobClient: Counters: 29
14/01/27 00:38:50 INFO mapred.JobClient: Job Counters
14/01/27 00:38:50 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:38:50 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=26595
14/01/27 00:38:50 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:38:50 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:38:50 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:38:50 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:38:50 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24054
14/01/27 00:38:50 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:38:50 INFO mapred.JobClient: Bytes Written=28689283
14/01/27 00:38:50 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:38:50 INFO mapred.JobClient: FILE_BYTES_READ=9597304
14/01/27 00:38:50 INFO mapred.JobClient: HDFS_BYTES_READ=29314270
14/01/27 00:38:50 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15430363
14/01/27 00:38:50 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
14/01/27 00:38:50 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:38:50 INFO mapred.JobClient: Bytes Read=29314118
14/01/27 00:38:50 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:38:50 INFO mapred.JobClient: Map output materialized bytes=7692467
14/01/27 00:38:50 INFO mapred.JobClient: Map input records=18846
14/01/27 00:38:50 INFO mapred.JobClient: Reduce shuffle bytes=7692467
14/01/27 00:38:50 INFO mapred.JobClient: Spilled Records=37692
14/01/27 00:38:50 INFO mapred.JobClient: Map output bytes=28984080
14/01/27 00:38:50 INFO mapred.JobClient: CPU time spent (ms)=16230
14/01/27 00:38:50 INFO mapred.JobClient: Total committed heap usage (bytes)=306655232
14/01/27 00:38:50 INFO mapred.JobClient: Combine input records=0
14/01/27 00:38:50 INFO mapred.JobClient: SPLIT_RAW_BYTES=152
14/01/27 00:38:50 INFO mapred.JobClient: Reduce input records=18846
14/01/27 00:38:50 INFO mapred.JobClient: Reduce input groups=18846
14/01/27 00:38:50 INFO mapred.JobClient: Combine output records=0
14/01/27 00:38:50 INFO mapred.JobClient: Physical memory (bytes) snapshot=299081728
14/01/27 00:38:50 INFO mapred.JobClient: Reduce output records=18846
14/01/27 00:38:50 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3492966400
14/01/27 00:38:50 INFO mapred.JobClient: Map output records=18846
14/01/27 00:38:54 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:38:55 INFO mapred.JobClient: Running job: job_201401261418_0008
14/01/27 00:38:56 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:39:24 INFO mapred.JobClient: map 86% reduce 0%
14/01/27 00:39:27 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:39:45 INFO mapred.JobClient: map 100% reduce 86%
14/01/27 00:39:51 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:39:56 INFO mapred.JobClient: Job complete: job_201401261418_0008
14/01/27 00:39:56 INFO mapred.JobClient: Counters: 29
14/01/27 00:39:56 INFO mapred.JobClient: Job Counters
14/01/27 00:39:56 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:39:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=20854
14/01/27 00:39:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:39:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:39:56 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:39:56 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:39:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=23724
14/01/27 00:39:56 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:39:56 INFO mapred.JobClient: Bytes Written=28689283
14/01/27 00:39:56 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:39:56 INFO mapred.JobClient: FILE_BYTES_READ=28437750
14/01/27 00:39:56 INFO mapred.JobClient: HDFS_BYTES_READ=28689445
14/01/27 00:39:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56919517
14/01/27 00:39:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
14/01/27 00:39:56 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:39:56 INFO mapred.JobClient: Bytes Read=28689283
14/01/27 00:39:56 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:39:56 INFO mapred.JobClient: Map output materialized bytes=28437750
14/01/27 00:39:56 INFO mapred.JobClient: Map input records=18846
14/01/27 00:39:56 INFO mapred.JobClient: Reduce shuffle bytes=0
14/01/27 00:39:56 INFO mapred.JobClient: Spilled Records=37692
14/01/27 00:39:56 INFO mapred.JobClient: Map output bytes=28362505
14/01/27 00:39:56 INFO mapred.JobClient: CPU time spent (ms)=10160
14/01/27 00:39:56 INFO mapred.JobClient: Total committed heap usage (bytes)=292847616
14/01/27 00:39:56 INFO mapred.JobClient: Combine input records=0
14/01/27 00:39:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
14/01/27 00:39:56 INFO mapred.JobClient: Reduce input records=18846
14/01/27 00:39:56 INFO mapred.JobClient: Reduce input groups=18846
14/01/27 00:39:56 INFO mapred.JobClient: Combine output records=0
14/01/27 00:39:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=282537984
14/01/27 00:39:56 INFO mapred.JobClient: Reduce output records=18846
14/01/27 00:39:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3492261888
14/01/27 00:39:56 INFO mapred.JobClient: Map output records=18846
14/01/27 00:39:56 INFO common.HadoopUtil: Deleting /usr/grid/mahout-work-grid/20news-vectors/tf-vectors-partial
14/01/27 00:39:56 INFO common.HadoopUtil: Deleting /usr/grid/mahout-work-grid/20news-vectors/tf-vectors-toprune
14/01/27 00:40:00 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:40:00 INFO mapred.JobClient: Running job: job_201401261418_0009
14/01/27 00:40:01 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:40:33 INFO mapred.JobClient: map 59% reduce 0%
14/01/27 00:40:36 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:40:54 INFO mapred.JobClient: map 100% reduce 83%
14/01/27 00:41:00 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:41:05 INFO mapred.JobClient: Job complete: job_201401261418_0009
14/01/27 00:41:05 INFO mapred.JobClient: Counters: 29
14/01/27 00:41:05 INFO mapred.JobClient: Job Counters
14/01/27 00:41:05 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:41:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=20673
14/01/27 00:41:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:41:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:41:05 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:41:05 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:41:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=22591
14/01/27 00:41:05 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:41:05 INFO mapred.JobClient: Bytes Written=28689283
14/01/27 00:41:05 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:41:05 INFO mapred.JobClient: FILE_BYTES_READ=30342579
14/01/27 00:41:05 INFO mapred.JobClient: HDFS_BYTES_READ=28689427
14/01/27 00:41:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56921481
14/01/27 00:41:05 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
14/01/27 00:41:05 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:41:05 INFO mapred.JobClient: Bytes Read=28689283
14/01/27 00:41:05 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:41:05 INFO mapred.JobClient: Map output materialized bytes=28437750
14/01/27 00:41:05 INFO mapred.JobClient: Map input records=18846
14/01/27 00:41:05 INFO mapred.JobClient: Reduce shuffle bytes=28437750
14/01/27 00:41:05 INFO mapred.JobClient: Spilled Records=37692
14/01/27 00:41:05 INFO mapred.JobClient: Map output bytes=28362505
14/01/27 00:41:05 INFO mapred.JobClient: CPU time spent (ms)=10990
14/01/27 00:41:05 INFO mapred.JobClient: Total committed heap usage (bytes)=305639424
14/01/27 00:41:05 INFO mapred.JobClient: Combine input records=0
14/01/27 00:41:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=144
14/01/27 00:41:05 INFO mapred.JobClient: Reduce input records=18846
14/01/27 00:41:05 INFO mapred.JobClient: Reduce input groups=18846
14/01/27 00:41:05 INFO mapred.JobClient: Combine output records=0
14/01/27 00:41:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=296620032
14/01/27 00:41:05 INFO mapred.JobClient: Reduce output records=18846
14/01/27 00:41:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3492110336
14/01/27 00:41:05 INFO mapred.JobClient: Map output records=18846
14/01/27 00:41:08 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:41:08 INFO mapred.JobClient: Running job: job_201401261418_0010
14/01/27 00:41:09 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:41:47 INFO mapred.JobClient: map 16% reduce 0%
14/01/27 00:41:53 INFO mapred.JobClient: map 21% reduce 0%
14/01/27 00:41:56 INFO mapred.JobClient: map 28% reduce 0%
14/01/27 00:41:58 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:42:19 INFO mapred.JobClient: map 100% reduce 85%
14/01/27 00:42:25 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:42:31 INFO mapred.JobClient: Job complete: job_201401261418_0010
14/01/27 00:42:31 INFO mapred.JobClient: Counters: 29
14/01/27 00:42:31 INFO mapred.JobClient: Job Counters
14/01/27 00:42:31 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:42:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=41166
14/01/27 00:42:31 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:42:31 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:42:31 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:42:31 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:42:31 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=22239
14/01/27 00:42:31 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:42:31 INFO mapred.JobClient: Bytes Written=28689283
14/01/27 00:42:31 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:42:31 INFO mapred.JobClient: FILE_BYTES_READ=28437750
14/01/27 00:42:31 INFO mapred.JobClient: HDFS_BYTES_READ=28689434
14/01/27 00:42:31 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56919905
14/01/27 00:42:31 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
14/01/27 00:42:31 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:42:31 INFO mapred.JobClient: Bytes Read=28689283
14/01/27 00:42:31 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:42:31 INFO mapred.JobClient: Map output materialized bytes=28437750
14/01/27 00:42:31 INFO mapred.JobClient: Map input records=18846
14/01/27 00:42:31 INFO mapred.JobClient: Reduce shuffle bytes=28437750
14/01/27 00:42:31 INFO mapred.JobClient: Spilled Records=37692
14/01/27 00:42:31 INFO mapred.JobClient: Map output bytes=28362505
14/01/27 00:42:31 INFO mapred.JobClient: CPU time spent (ms)=11410
14/01/27 00:42:31 INFO mapred.JobClient: Total committed heap usage (bytes)=292954112
14/01/27 00:42:31 INFO mapred.JobClient: Combine input records=0
14/01/27 00:42:31 INFO mapred.JobClient: SPLIT_RAW_BYTES=151
14/01/27 00:42:31 INFO mapred.JobClient: Reduce input records=18846
14/01/27 00:42:31 INFO mapred.JobClient: Reduce input groups=18846
14/01/27 00:42:31 INFO mapred.JobClient: Combine output records=0
14/01/27 00:42:31 INFO mapred.JobClient: Physical memory (bytes) snapshot=281976832
14/01/27 00:42:31 INFO mapred.JobClient: Reduce output records=18846
14/01/27 00:42:31 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3491901440
14/01/27 00:42:31 INFO mapred.JobClient: Map output records=18846
14/01/27 00:42:31 INFO common.HadoopUtil: Deleting /usr/grid/mahout-work-grid/20news-vectors/partial-vectors-0
14/01/27 00:42:31 INFO driver.MahoutDriver: Program took 715875 ms (Minutes: 11.93125)
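The vectorization run that ends here logged "Calculating IDF" and "Pruning" stages, and its tfidf-vectors output is what the later steps consume. As a rough illustration of what a TF-IDF weight is, here is a minimal Python sketch using the textbook formula tf * log(N / df). This is an assumption for illustration only: Mahout's own weighting formula may differ, and the function name `tfidf` and the toy documents are invented, not taken from the experiment.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts inside this document
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


# Hypothetical toy corpus, three tiny "documents".
toy_docs = [["car", "engine", "car"], ["engine", "space"], ["space", "orbit"]]
vectors = tfidf(toy_docs)
for vec in vectors:
    print(vec)
```

Terms that occur in few documents (such as "car" here) receive a larger weight than terms spread over many documents, which is the property the classifier later relies on.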
+ echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
Creating training and holdout set with a random 80-20 split of the generated vector dataset
+ ./bin/mahout split -i /usr/grid/mahout-work-grid/20news-vectors/tfidf-vectors --trainingOutput /usr/grid/mahout-work-grid/20news-train-vectors --testOutput /usr/grid/mahout-work-grid/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf
MAHOUT-JOB: /usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.
14/01/27 00:42:43 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
14/01/27 00:42:44 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/usr/grid/mahout-work-grid/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/usr/grid/mahout-work-grid/20news-test-vectors], --trainingOutput=[/usr/grid/mahout-work-grid/20news-train-vectors]}
14/01/27 00:42:48 INFO utils.SplitInput: part-r-00000 has 162419 lines
14/01/27 00:42:48 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
14/01/27 00:42:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/01/27 00:42:48 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/01/27 00:42:48 INFO compress.CodecPool: Got brand-new compressor
14/01/27 00:42:48 INFO compress.CodecPool: Got brand-new compressor
14/01/27 00:43:01 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11205, test: 7641 starting at 0
14/01/27 00:43:01 INFO driver.MahoutDriver: Program took 17995 ms (Minutes: 0.29991666666666666)
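Note that the echo message speaks of an 80-20 split, while the command actually passes --randomSelectionPct 40 and the log reports 11205 training and 7641 test vectors. The short Python check below re-derives the figures from the numbers printed in the log; the variable names are only illustrative.

```python
total_docs = 18846                     # "Map input records" of the vectorization jobs
train_docs, test_docs = 11205, 7641    # from the final SplitInput summary line
lines, pct = 162419, 40                # line count and --randomSelectionPct from the log

# The two subsets should account for every input document.
assert train_docs + test_docs == total_docs

test_fraction = test_docs / total_docs
expected_test_lines = round(lines * pct / 100)

print(f"test fraction: {test_fraction:.4f}")
print(f"expected test lines at {pct}%: {expected_test_lines}")
```

The computed test-line target agrees with the "test split size is 64968" message, and the realized test fraction is close to 40%, not 20%.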
+ echo 'Training Naive Bayes model'
Training Naive Bayes model
+ ./bin/mahout trainnb -i /usr/grid/mahout-work-grid/20news-train-vectors -el -o /usr/grid/mahout-work-grid/model -li /usr/grid/mahout-work-grid/labelindex -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf
MAHOUT-JOB: /usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.
14/01/27 00:43:13 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
14/01/27 00:43:14 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/usr/grid/mahout-work-grid/20news-train-vectors], --labelIndex=[/usr/grid/mahout-work-grid/labelindex], --output=[/usr/grid/mahout-work-grid/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/01/27 00:43:14 INFO common.HadoopUtil: Deleting temp
14/01/27 00:43:14 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/01/27 00:43:14 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/01/27 00:43:14 INFO compress.CodecPool: Got brand-new decompressor
14/01/27 00:43:21 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:43:22 INFO mapred.JobClient: Running job: job_201401261418_0011
14/01/27 00:43:23 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:43:56 INFO mapred.JobClient: map 39% reduce 0%
14/01/27 00:43:59 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:44:20 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:44:25 INFO mapred.JobClient: Job complete: job_201401261418_0011
14/01/27 00:44:25 INFO mapred.JobClient: Counters: 29
14/01/27 00:44:25 INFO mapred.JobClient: Job Counters
14/01/27 00:44:25 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:44:25 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=27091
14/01/27 00:44:25 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:44:25 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:44:25 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:44:25 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:44:25 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18002
14/01/27 00:44:25 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:44:25 INFO mapred.JobClient: Bytes Written=2727579
14/01/27 00:44:25 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:44:25 INFO mapred.JobClient: FILE_BYTES_READ=1409402
14/01/27 00:44:25 INFO mapred.JobClient: HDFS_BYTES_READ=12578676
14/01/27 00:44:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2862959
14/01/27 00:44:25 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2727579
14/01/27 00:44:25 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:44:25 INFO mapred.JobClient: Bytes Read=12578537
14/01/27 00:44:25 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:44:25 INFO mapred.JobClient: Map output materialized bytes=1408720
14/01/27 00:44:25 INFO mapred.JobClient: Map input records=11205
14/01/27 00:44:25 INFO mapred.JobClient: Reduce shuffle bytes=1408720
14/01/27 00:44:25 INFO mapred.JobClient: Spilled Records=40
14/01/27 00:44:25 INFO mapred.JobClient: Map output bytes=16592779
14/01/27 00:44:25 INFO mapred.JobClient: CPU time spent (ms)=9950
14/01/27 00:44:25 INFO mapred.JobClient: Total committed heap usage (bytes)=264400896
14/01/27 00:44:25 INFO mapred.JobClient: Combine input records=11205
14/01/27 00:44:25 INFO mapred.JobClient: SPLIT_RAW_BYTES=139
14/01/27 00:44:25 INFO mapred.JobClient: Reduce input records=20
14/01/27 00:44:25 INFO mapred.JobClient: Reduce input groups=20
14/01/27 00:44:25 INFO mapred.JobClient: Combine output records=20
14/01/27 00:44:25 INFO mapred.JobClient: Physical memory (bytes) snapshot=258179072
14/01/27 00:44:25 INFO mapred.JobClient: Reduce output records=20
14/01/27 00:44:25 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3493203968
14/01/27 00:44:25 INFO mapred.JobClient: Map output records=11205
14/01/27 00:44:28 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:44:28 INFO mapred.JobClient: Running job: job_201401261418_0012
14/01/27 00:44:29 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:44:56 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:45:11 INFO mapred.JobClient: map 100% reduce 100%
14/01/27 00:45:16 INFO mapred.JobClient: Job complete: job_201401261418_0012
14/01/27 00:45:16 INFO mapred.JobClient: Counters: 29
14/01/27 00:45:16 INFO mapred.JobClient: Job Counters
14/01/27 00:45:16 INFO mapred.JobClient: Launched reduce tasks=1
14/01/27 00:45:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=14633
14/01/27 00:45:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:45:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:45:16 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:45:16 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:45:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13687
14/01/27 00:45:16 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:45:16 INFO mapred.JobClient: Bytes Written=902324
14/01/27 00:45:16 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:45:16 INFO mapred.JobClient: FILE_BYTES_READ=365663
14/01/27 00:45:16 INFO mapred.JobClient: HDFS_BYTES_READ=2727705
14/01/27 00:45:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=776951
14/01/27 00:45:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=902324
14/01/27 00:45:16 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:45:16 INFO mapred.JobClient: Bytes Read=2727579
14/01/27 00:45:16 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:45:16 INFO mapred.JobClient: Map output materialized bytes=365655
14/01/27 00:45:16 INFO mapred.JobClient: Map input records=20
14/01/27 00:45:16 INFO mapred.JobClient: Reduce shuffle bytes=365655
14/01/27 00:45:16 INFO mapred.JobClient: Spilled Records=4
14/01/27 00:45:16 INFO mapred.JobClient: Map output bytes=902198
14/01/27 00:45:16 INFO mapred.JobClient: CPU time spent (ms)=3740
14/01/27 00:45:16 INFO mapred.JobClient: Total committed heap usage (bytes)=272609280
14/01/27 00:45:16 INFO mapred.JobClient: Combine input records=2
14/01/27 00:45:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=126
14/01/27 00:45:16 INFO mapred.JobClient: Reduce input records=2
14/01/27 00:45:16 INFO mapred.JobClient: Reduce input groups=2
14/01/27 00:45:16 INFO mapred.JobClient: Combine output records=2
14/01/27 00:45:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=233705472
14/01/27 00:45:16 INFO mapred.JobClient: Reduce output records=2
14/01/27 00:45:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3492974592
14/01/27 00:45:16 INFO mapred.JobClient: Map output records=2
14/01/27 00:45:17 INFO driver.MahoutDriver: Program took 124098 ms (Minutes: 2.0683)
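In the first training job the counters show 11205 input vectors reduced to 20 output records, which is consistent with the feature vectors being summed per newsgroup label. The sketch below shows that idea with a textbook multinomial naive Bayes in Python: sum the feature weights per label, then smooth with alpha = 1.0 (the --alphaI value in the log). It is an illustrative assumption, not Mahout's implementation; `train_nb`, `classify`, and the toy data are hypothetical, and class priors are omitted for brevity.

```python
import math
from collections import defaultdict


def train_nb(examples, alpha=1.0):
    """examples: iterable of (label, {feature: weight}). Returns per-label log-weights."""
    sums = defaultdict(lambda: defaultdict(float))
    vocab = set()
    # Step 1: sum feature weights per label (one summed vector per label).
    for label, vec in examples:
        for feat, weight in vec.items():
            sums[label][feat] += weight
            vocab.add(feat)
    # Step 2: convert the sums into smoothed log-probabilities.
    model = {}
    for label, feats in sums.items():
        total = sum(feats.values()) + alpha * len(vocab)
        model[label] = {f: math.log((feats.get(f, 0.0) + alpha) / total) for f in vocab}
    return model


def classify(model, vec):
    """Pick the label with the highest weighted log-likelihood (no priors)."""
    return max(model, key=lambda lb: sum(w * model[lb].get(f, 0.0) for f, w in vec.items()))


# Hypothetical toy training set with two labels.
toy = [
    ("rec.autos", {"car": 2.0, "engine": 1.0}),
    ("sci.space", {"orbit": 2.0, "engine": 1.0}),
    ("rec.autos", {"car": 1.0}),
]
model = train_nb(toy)
print(classify(model, {"car": 1.0}))
print(classify(model, {"orbit": 1.0}))
```

The number of entries in `model` equals the number of labels, mirroring the 20 reduce output records seen in the log for the 20 newsgroups.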
+ echo 'Self testing on training set'
Self testing on training set
+ ./bin/mahout testnb -i /usr/grid/mahout-work-grid/20news-train-vectors -m /usr/grid/mahout-work-grid/model -l /usr/grid/mahout-work-grid/labelindex -ow -o /usr/grid/mahout-work-grid/20news-testing
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf
MAHOUT-JOB: /usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.
14/01/27 00:45:28 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
14/01/27 00:45:28 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/usr/grid/mahout-work-grid/20news-train-vectors], --labelIndex=[/usr/grid/mahout-work-grid/labelindex], --model=[/usr/grid/mahout-work-grid/model], --output=[/usr/grid/mahout-work-grid/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/01/27 00:45:31 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:45:31 INFO mapred.JobClient: Running job: job_201401261418_0013
14/01/27 00:45:33 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:45:59 INFO mapred.JobClient: map 36% reduce 0%
14/01/27 00:46:02 INFO mapred.JobClient: map 61% reduce 0%
14/01/27 00:46:05 INFO mapred.JobClient: map 84% reduce 0%
14/01/27 00:46:11 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:46:16 INFO mapred.JobClient: Job complete: job_201401261418_0013
14/01/27 00:46:16 INFO mapred.JobClient: Counters: 20
14/01/27 00:46:16 INFO mapred.JobClient: Job Counters
14/01/27 00:46:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29672
14/01/27 00:46:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:46:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:46:16 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:46:16 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:46:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/01/27 00:46:16 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:46:16 INFO mapred.JobClient: Bytes Written=2110958
14/01/27 00:46:16 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:46:16 INFO mapred.JobClient: FILE_BYTES_READ=3657578
14/01/27 00:46:16 INFO mapred.JobClient: HDFS_BYTES_READ=12578676
14/01/27 00:46:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22400
14/01/27 00:46:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2110958
14/01/27 00:46:16 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:46:16 INFO mapred.JobClient: Bytes Read=12578537
14/01/27 00:46:16 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:46:16 INFO mapred.JobClient: Map input records=11205
14/01/27 00:46:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=58740736
14/01/27 00:46:16 INFO mapred.JobClient: Spilled Records=0
14/01/27 00:46:16 INFO mapred.JobClient: CPU time spent (ms)=13350
14/01/27 00:46:16 INFO mapred.JobClient: Total committed heap usage (bytes)=26341376
14/01/27 00:46:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1744363520
14/01/27 00:46:16 INFO mapred.JobClient: Map output records=11205
14/01/27 00:46:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=139
14/01/27 00:46:17 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 11117 99.2146%
Incorrectly Classified Instances : 88 0.7854%
Total Classified Instances : 11205
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
476 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 476 a = alt.atheism
0 565 0 2 1 2 1 0 0 0 0 1 0 0 0 0 0 0 0 0 | 572 b = comp.graphics
0 6 530 21 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 | 560 c = comp.os.ms-windows.misc
0 0 0 587 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 | 589 d = comp.sys.ibm.pc.hardware
0 0 1 1 562 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 | 565 e = comp.sys.mac.hardware
0 1 0 1 0 580 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 582 f = comp.windows.x
0 0 0 1 0 0 565 1 0 0 0 0 2 0 0 0 0 0 0 0 | 569 g = misc.forsale
0 0 0 0 0 0 2 591 0 0 0 0 1 0 0 0 0 0 0 0 | 594 h = rec.autos
0 0 0 0 0 0 1 1 579 0 0 0 0 0 0 0 0 0 0 0 | 581 i = rec.motorcycles
0 0 0 0 0 0 0 0 0 616 1 0 0 0 0 0 0 0 0 0 | 617 j = rec.sport.baseball
0 0 0 0 0 0 0 0 1 0 591 0 0 0 0 0 0 0 0 0 | 592 k = rec.sport.hockey
0 0 0 0 0 0 0 0 0 0 0 591 0 0 0 0 0 0 0 0 | 591 l = sci.crypt
0 0 0 5 1 0 2 0 0 0 0 0 596 0 0 0 0 0 0 0 | 604 m = sci.electronics
1 1 0 0 0 0 0 0 0 0 0 0 1 589 1 0 0 0 0 0 | 593 n = sci.med
0 0 0 0 0 0 0 0 0 0 0 0 0 0 579 0 0 0 0 0 | 579 o = sci.space
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 601 1 0 0 0 | 602 p = soc.religion.christian
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 584 0 0 0 | 586 q = talk.politics.mideast
0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 542 0 0 | 544 r = talk.politics.guns
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 3 348 2 | 361 s = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 2 0 445 | 448 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.9858
Accuracy 99.2146%
Reliability 94.437%
Reliability (standard deviation) 0.2168
14/01/27 00:46:17 INFO driver.MahoutDriver: Program took 49046 ms (Minutes: 0.8174333333333333)
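The summary figures of the self-test can be re-derived from the counts printed above. The Python sketch below recomputes the overall accuracy and, for three rows of the confusion matrix, the per-class recall (diagonal entry divided by the row total). The numbers are copied from the log; the dictionary layout and variable names are only illustrative.

```python
correct, total = 11117, 11205          # from the Summary block
accuracy = 100.0 * correct / total

# (diagonal entry, row total) for a few rows of the confusion matrix above.
rows = {
    "alt.atheism": (476, 476),
    "comp.os.ms-windows.misc": (530, 560),
    "talk.religion.misc": (348, 361),
}
recall = {label: diag / row_total for label, (diag, row_total) in rows.items()}

print(f"accuracy: {accuracy:.4f}%")
for label, value in recall.items():
    print(f"{label}: recall {value:.4f}")
```

Because this run tests on the same vectors the model was trained on, the resulting accuracy is an optimistic figure; the holdout test that follows is the more meaningful measurement.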
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /usr/grid/mahout-work-grid/20news-test-vectors -m /usr/grid/mahout-work-grid/model -l /usr/grid/mahout-work-grid/labelindex -ow -o /usr/grid/mahout-work-grid/20news-testing
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf
MAHOUT-JOB: /usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.
14/01/27 00:46:31 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
14/01/27 00:46:32 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/usr/grid/mahout-work-grid/20news-test-vectors], --labelIndex=[/usr/grid/mahout-work-grid/labelindex], --model=[/usr/grid/mahout-work-grid/model], --output=[/usr/grid/mahout-work-grid/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/01/27 00:46:32 INFO common.HadoopUtil: Deleting /usr/grid/mahout-work-grid/20news-testing
14/01/27 00:46:37 INFO input.FileInputFormat: Total input paths to process : 1
14/01/27 00:46:37 INFO mapred.JobClient: Running job: job_201401261418_0014
14/01/27 00:46:38 INFO mapred.JobClient: map 0% reduce 0%
14/01/27 00:47:08 INFO mapred.JobClient: map 46% reduce 0%
14/01/27 00:47:11 INFO mapred.JobClient: map 82% reduce 0%
14/01/27 00:47:17 INFO mapred.JobClient: map 100% reduce 0%
14/01/27 00:47:22 INFO mapred.JobClient: Job complete: job_201401261418_0014
14/01/27 00:47:22 INFO mapred.JobClient: Counters: 20
14/01/27 00:47:22 INFO mapred.JobClient: Job Counters
14/01/27 00:47:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=26885
14/01/27 00:47:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/27 00:47:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/27 00:47:22 INFO mapred.JobClient: Launched map tasks=1
14/01/27 00:47:22 INFO mapred.JobClient: Data-local map tasks=1
14/01/27 00:47:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/01/27 00:47:22 INFO mapred.JobClient: File Output Format Counters
14/01/27 00:47:22 INFO mapred.JobClient: Bytes Written=1439470
14/01/27 00:47:22 INFO mapred.JobClient: FileSystemCounters
14/01/27 00:47:22 INFO mapred.JobClient: FILE_BYTES_READ=3657578
14/01/27 00:47:22 INFO mapred.JobClient: HDFS_BYTES_READ=8630847
14/01/27 00:47:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22398
14/01/27 00:47:22 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1439470
14/01/27 00:47:22 INFO mapred.JobClient: File Input Format Counters
14/01/27 00:47:22 INFO mapred.JobClient: Bytes Read=8630709
14/01/27 00:47:22 INFO mapred.JobClient: Map-Reduce Framework
14/01/27 00:47:22 INFO mapred.JobClient: Map input records=7641
14/01/27 00:47:22 INFO mapred.JobClient: Physical memory (bytes) snapshot=57962496
14/01/27 00:47:22 INFO mapred.JobClient: Spilled Records=0
14/01/27 00:47:22 INFO mapred.JobClient: CPU time spent (ms)=9810
14/01/27 00:47:22 INFO mapred.JobClient: Total committed heap usage (bytes)=24989696
14/01/27 00:47:22 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1744236544
14/01/27 00:47:22 INFO mapred.JobClient: Map output records=7641
14/01/27 00:47:22 INFO mapred.JobClient: SPLIT_RAW_BYTES=138
14/01/27 00:47:23 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 6921 90.5771%
Incorrectly Classified Instances : 720 9.4229%
Total Classified Instances : 7641
=======================================================
Confusion Matrix
-------------------------------------------------------
a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
292 0   0   1   0   0   0   0   0   1   1   1   0   1   1   7   0   0   17  1   | 323  a = alt.atheism
0   335 7   17  6   18  7   1   0   0   0   2   5   1   2   0   0   0   0   0   | 401  b = comp.graphics
1   25  240 91  23  28  3   1   0   0   0   0   9   0   1   0   0   0   1   2   | 425  c = comp.os.ms-windows.misc
1   5   1   352 16  3   7   1   0   0   1   0   6   0   0   0   0   0   0   0   | 393  d = comp.sys.ibm.pc.hardware
0   2   0   9   372 1   5   0   0   0   0   1   7   1   0   0   0   0   0   0   | 398  e = comp.sys.mac.hardware
0   25  2   7   2   365 1   0   0   0   0   0   2   0   2   0   0   0   0   0   | 406  f = comp.windows.x
0   1   1   20  6   0   352 7   1   1   2   0   11  1   1   1   0   1   0   0   | 406  g = misc.forsale
0   1   0   1   3   1   6   368 7   0   0   0   5   0   1   0   0   2   0   1   | 396  h = rec.autos
0   1   0   2   0   2   6   6   394 0   1   0   0   1   0   0   0   1   0   1   | 415  i = rec.motorcycles
0   0   0   0   3   0   1   1   0   365 5   0   1   1   0   0   0   0   0   0   | 377  j = rec.sport.baseball
0   0   1   0   0   0   0   1   1   2   395 0   1   1   0   0   0   0   0   5   | 407  k = rec.sport.hockey
0   3   1   0   1   3   1   0   0   0   0   385 0   2   0   0   0   3   0   1   | 400  l = sci.crypt
0   2   0   13  3   2   3   2   0   0   0   0   350 1   3   0   0   1   0   0   | 380  m = sci.electronics
1   2   1   2   3   0   1   3   2   1   0   0   3   369 5   0   0   2   0   2   | 397  n = sci.med
1   3   0   0   2   0   0   2   0   1   0   1   2   4   389 0   2   0   1   0   | 408  o = sci.space
3   0   0   1   0   0   0   0   0   1   1   0   0   1   0   383 0   2   3   0   | 395  p = soc.religion.christian
0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   2   347 0   1   1   | 354  q = talk.politics.mideast
0   0   0   0   0   0   1   0   1   1   0   2   0   0   1   0   0   355 0   5   | 366  r = talk.politics.guns
27  1   0   0   0   1   0   0   0   0   0   0   0   0   1   9   3   6   216 3   | 267  s = talk.religion.misc
1   0   0   0   0   0   0   0   0   0   1   1   0   0   5   1   5   14  2   297 | 327  t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.8776
Accuracy 90.5771%
Reliability 86.2922%
Reliability (standard deviation) 0.2174
14/01/27 00:47:23 INFO driver.MahoutDriver: Program took 51646 ms (Minutes: 0.8607666666666667)
[grid@h1 mahout-distribution-0.8]$
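
The Summary figures in the output above can be cross-checked against the confusion matrix. The following is a minimal sketch in Python, not part of Mahout: the names `LABELS`, `MATRIX`, `accuracy` and `per_class_recall` are made up for this illustration, and the numbers are copied by hand from the log. It assumes each row is the actual newsgroup and each column is the class the model assigned (as the "Classified as" header suggests), so the diagonal holds the correctly classified instances.

```python
# Labels a..t, in the order printed by testnb.
LABELS = [
    "alt.atheism", "comp.graphics", "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware", "comp.windows.x",
    "misc.forsale", "rec.autos", "rec.motorcycles", "rec.sport.baseball",
    "rec.sport.hockey", "sci.crypt", "sci.electronics", "sci.med",
    "sci.space", "soc.religion.christian", "talk.politics.mideast",
    "talk.politics.guns", "talk.religion.misc", "talk.politics.misc",
]

# Confusion matrix copied from the log (assumed: row = actual, column = predicted).
MATRIX = [
    [292, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 7, 0, 0, 17, 1],
    [0, 335, 7, 17, 6, 18, 7, 1, 0, 0, 0, 2, 5, 1, 2, 0, 0, 0, 0, 0],
    [1, 25, 240, 91, 23, 28, 3, 1, 0, 0, 0, 0, 9, 0, 1, 0, 0, 0, 1, 2],
    [1, 5, 1, 352, 16, 3, 7, 1, 0, 0, 1, 0, 6, 0, 0, 0, 0, 0, 0, 0],
    [0, 2, 0, 9, 372, 1, 5, 0, 0, 0, 0, 1, 7, 1, 0, 0, 0, 0, 0, 0],
    [0, 25, 2, 7, 2, 365, 1, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0],
    [0, 1, 1, 20, 6, 0, 352, 7, 1, 1, 2, 0, 11, 1, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 3, 1, 6, 368, 7, 0, 0, 0, 5, 0, 1, 0, 0, 2, 0, 1],
    [0, 1, 0, 2, 0, 2, 6, 6, 394, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 3, 0, 1, 1, 0, 365, 5, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 1, 2, 395, 0, 1, 1, 0, 0, 0, 0, 0, 5],
    [0, 3, 1, 0, 1, 3, 1, 0, 0, 0, 0, 385, 0, 2, 0, 0, 0, 3, 0, 1],
    [0, 2, 0, 13, 3, 2, 3, 2, 0, 0, 0, 0, 350, 1, 3, 0, 0, 1, 0, 0],
    [1, 2, 1, 2, 3, 0, 1, 3, 2, 1, 0, 0, 3, 369, 5, 0, 0, 2, 0, 2],
    [1, 3, 0, 0, 2, 0, 0, 2, 0, 1, 0, 1, 2, 4, 389, 0, 2, 0, 1, 0],
    [3, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 383, 0, 2, 3, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 2, 347, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 2, 0, 0, 1, 0, 0, 355, 0, 5],
    [27, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 9, 3, 6, 216, 3],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 5, 1, 5, 14, 2, 297],
]


def accuracy(matrix):
    """Return (correct, total, fraction correct) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


def per_class_recall(matrix, labels):
    """Return {label: diagonal cell / row sum}, i.e. recall per actual class."""
    return {
        label: row[i] / sum(row)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


if __name__ == "__main__":
    correct, total, acc = accuracy(MATRIX)
    print(f"correct={correct} total={total} accuracy={acc:.4%}")
    recall = per_class_recall(MATRIX, LABELS)
    # List the classes from weakest to strongest recall.
    for label, value in sorted(recall.items(), key=lambda item: item[1]):
        print(f"{value:.4f}  {label}")
```

The diagonal sum and the grand total reproduce the "Correctly Classified Instances" and "Total Classified Instances" lines of the Summary. The per-class view also shows where the errors concentrate, for example the 91 comp.os.ms-windows.misc posts that were classified as comp.sys.ibm.pc.hardware.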