Big Data Visualization Tool: GraphBuilder Demo

Date: 2023-02-12 03:48:02

Intel recently open-sourced the code of a beta version of GraphBuilder.

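GraphBuilder, developed by Intel Labs, is the first scalable open-source Java library aimed at big data. It builds large data sets into graphs, that is, network structures that reflect the relationships among data items, so that scientists and data analysts in industry and academia can analyze large data sets quickly.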

GraphBuilder scales by using the MapReduce parallel programming model. Its main components and their relationship to Hadoop MapReduce are shown in the figure below.

[Figure: main components of GraphBuilder and their relationship to Hadoop MapReduce]

The GraphBuilder source code is released under the Apache 2 license and can be obtained from the official website.

1. Download the GraphBuilder source code from the official website

https://01.org/graphbuilder/

wget https://01.org/graphbuilder/sites/default/files/downloads/graphbuilder-1.0.tar_1.gz

2. Extract and build the GraphBuilder source code

tar zvxf graphbuilder-1.0.tar_1.gz

cd graphbuilder

mvn package

.............................................

[INFO] Reading assembly descriptor: hadoop-job.xml
[INFO] Building jar: /usr/grid/graphbuilder/target/graphbuilder-1.0.0-SNAPSHOT-hadoop-job.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:49.505s
[INFO] Finished at: Thu Dec 05 22:19:04 CST 2013
[INFO] Final Memory: 17M/66M
[INFO] ------------------------------------------------------------------------
[grid@localhost graphbuilder]$ 


The build output shows that /usr/grid/graphbuilder/target/graphbuilder-1.0.0-SNAPSHOT-hadoop-job.jar was generated.
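
Before submitting a job, it can be useful to confirm that the jar really contains the driver class that is run later in this walkthrough. The following is a minimal Python sketch, not part of GraphBuilder: the helper name `jar_contains_class` is hypothetical, and the check relies only on the fact that a jar is a zip archive in which classes are stored as `package/path/ClassName.class` entries.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) holds the given class.

    class_name is a fully qualified Java class name; inside the jar it
    is stored as a path with '/' separators and a '.class' suffix.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For this build, one would call it with the jar path reported by Maven and the class name `com.intel.hadoop.graphbuilder.demoapps.wikipedia.linkgraph.LinkGraphEnd2End`.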

3. Download the Wikipedia sample file and decompress it:

[grid@localhost graphbuilder]$ wget http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2
--2013-12-05 22:23:31--  http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2
Resolving dumps.wikimedia.org... 208.80.152.185
Connecting to dumps.wikimedia.org|208.80.152.185|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 43533584 (42M) [application/x-bzip]
Saving to: “enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2”


100%[================================================================>] 43,533,584  24.0K/s   in 26m 35s


2013-12-05 05:17:24 (26.7 KB/s) - “latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2” saved [43533584/43533584]



[root@hadoop graphbuilder]# bzip2 -d enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2
[root@hadoop graphbuilder]# ll
total 149880
drwxrwxr-x. 3 1001 1001      4096 Jun 28 20:30 demoapps
drwxrwxr-x. 6 1001 1001      4096 Jun 28 20:30 doc
-rw-r--r--. 1 root root 153424297 Dec  5 05:41 enwiki-latest-pages-articles1.xml-p000000010p000010000
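
The decompression step can also be done without the `bzip2` command-line tool. Below is a minimal sketch using Python's standard `bz2` module; it streams the data in chunks so the roughly 150 MB of XML never has to fit in memory. The function name `decompress_bz2` and the chunk size are illustrative choices. Unlike `bzip2 -d`, this leaves the compressed file in place.

```python
import bz2
import shutil


def decompress_bz2(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress src_path (.bz2) into dst_path.

    Returns the number of decompressed bytes written.
    """
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        return dst.tell()
```

The returned byte count can be compared with the size shown by `ll` as a quick check that the file was written completely.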


4. Start Hadoop

[grid@localhost graphbuilder]$ start-all.sh
Warning: $HADOOP_HOME is deprecated.


starting namenode, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-namenode-h3.out
localhost: starting datanode, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-jobtracker-h3.out
localhost: starting tasktracker, logging to /usr/grid/hadoop/libexec/../logs/hadoop-grid-tasktracker-localhost.localdomain.out
[grid@localhost graphbuilder]$ 
[root@hadoop sbin]# jps
4055 DataNode
4448 NodeManager
4358 ResourceManager
3968 NameNode
4741 Jps
4190 SecondaryNameNode


[root@hadoop bin]# hadoop fs -ls /
Found 4 items
drwxr-xr-x   - root supergroup          0 2013-12-01 21:41 /home
drwxr-xr-x   - root supergroup          0 2013-12-01 21:38 /test
drwxr-xr-x   - root supergroup          0 2013-12-01 21:51 /tmp
drwxr-xr-x   - root supergroup          0 2013-12-01 21:52 /tmp-output
[root@hadoop bin]# hadoop fs -mkdir /user/
[root@hadoop bin]# hadoop fs -mkdir /user/wiki-input

[root@hadoop ~]# hadoop dfs -copyFromLocal enwiki-latest-pages-articles1.xml-p000000010p000010000 /user/wiki-input
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[grid@localhost ~]$ hadoop jar /usr/grid/graphbuilder/target/graphbuilder-1.0.0-SNAPSHOT-hadoop-job.jar com.intel.hadoop.graphbuilder.demoapps.wikipedia.linkgraph.LinkGraphEnd2End 3 /user/wiki-input /user/en-wiki-articles-output 2
Warning: $HADOOP_HOME is deprecated.


13/12/05 22:52:04 INFO docwordgraph.CreateWordCountGraph: ========== Creating Graph ================
13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: =========== Job: Create initial graph from raw data ===========
13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: input: /user/wiki-input
13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: Output = /user/en-wiki-articles-output/graph_raw
13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: Inputformat = com.intel.hadoop.graphbuilder.demoapps.wikipedia.WikiPageInputFormat
13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: GraphTokenizer = com.intel.hadoop.graphbuilder.demoapps.wikipedia.linkgraph.LinkGraphTokenizer
13/12/05 22:52:07 INFO mapreduce.CreateGraphMR: ==================== Start ====================================
13/12/05 22:52:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/05 22:52:08 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/05 22:52:09 INFO mapred.JobClient: Running job: job_201312052240_0001
13/12/05 22:52:10 INFO mapred.JobClient:  map 0% reduce 0%
13/12/05 22:53:04 INFO mapred.JobClient:  map 1% reduce 0%
13/12/05 22:53:07 INFO mapred.JobClient:  map 3% reduce 0%
13/12/05 22:53:10 INFO mapred.JobClient:  map 6% reduce 0%
13/12/05 22:53:13 INFO mapred.JobClient:  map 8% reduce 0%
13/12/05 22:53:16 INFO mapred.JobClient:  map 11% reduce 0%
13/12/05 22:53:19 INFO mapred.JobClient:  map 13% reduce 0%
13/12/05 22:53:22 INFO mapred.JobClient:  map 17% reduce 0%
13/12/05 22:53:25 INFO mapred.JobClient:  map 21% reduce 0%
13/12/05 22:53:28 INFO mapred.JobClient:  map 23% reduce 0%
13/12/05 22:53:31 INFO mapred.JobClient:  map 26% reduce 0%
13/12/05 22:53:33 INFO mapred.JobClient:  map 27% reduce 0%
13/12/05 22:53:36 INFO mapred.JobClient:  map 29% reduce 0%
13/12/05 22:53:39 INFO mapred.JobClient:  map 33% reduce 0%
13/12/05 22:53:42 INFO mapred.JobClient:  map 36% reduce 0%
13/12/05 22:53:46 INFO mapred.JobClient:  map 39% reduce 0%
13/12/05 22:53:49 INFO mapred.JobClient:  map 43% reduce 0%
13/12/05 22:53:52 INFO mapred.JobClient:  map 46% reduce 0%
13/12/05 22:53:55 INFO mapred.JobClient:  map 49% reduce 0%
13/12/05 22:53:58 INFO mapred.JobClient:  map 50% reduce 0%
13/12/05 22:54:01 INFO mapred.JobClient:  map 51% reduce 0%
13/12/05 22:54:04 INFO mapred.JobClient:  map 55% reduce 0%
13/12/05 22:54:07 INFO mapred.JobClient:  map 58% reduce 0%
13/12/05 22:54:10 INFO mapred.JobClient:  map 62% reduce 0%
13/12/05 22:54:13 INFO mapred.JobClient:  map 65% reduce 0%
13/12/05 22:54:16 INFO mapred.JobClient:  map 66% reduce 0%
13/12/05 22:54:31 INFO mapred.JobClient:  map 83% reduce 0%
13/12/05 22:54:34 INFO mapred.JobClient:  map 85% reduce 0%
13/12/05 22:54:37 INFO mapred.JobClient:  map 95% reduce 22%
13/12/05 22:54:40 INFO mapred.JobClient:  map 100% reduce 22%
13/12/05 22:54:46 INFO mapred.JobClient:  map 100% reduce 33%
13/12/05 22:54:52 INFO mapred.JobClient:  map 100% reduce 70%
13/12/05 22:54:55 INFO mapred.JobClient:  map 100% reduce 72%
13/12/05 22:54:58 INFO mapred.JobClient:  map 100% reduce 75%
13/12/05 22:55:01 INFO mapred.JobClient:  map 100% reduce 80%
13/12/05 22:55:04 INFO mapred.JobClient:  map 100% reduce 84%
13/12/05 22:55:07 INFO mapred.JobClient:  map 100% reduce 88%
13/12/05 22:55:11 INFO mapred.JobClient:  map 100% reduce 92%
13/12/05 22:55:14 INFO mapred.JobClient:  map 100% reduce 97%
13/12/05 22:55:19 INFO mapred.JobClient:  map 100% reduce 100%
13/12/05 22:55:24 INFO mapred.JobClient: Job complete: job_201312052240_0001
13/12/05 22:55:24 INFO mapred.JobClient: Counters: 32
13/12/05 22:55:24 INFO mapred.JobClient:   Job Counters 
13/12/05 22:55:24 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/05 22:55:24 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=268128
13/12/05 22:55:24 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/05 22:55:24 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/05 22:55:24 INFO mapred.JobClient:     Launched map tasks=3
13/12/05 22:55:24 INFO mapred.JobClient:     Data-local map tasks=3
13/12/05 22:55:24 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=59264
13/12/05 22:55:24 INFO mapred.JobClient:   File Input Format Counters 
13/12/05 22:55:24 INFO mapred.JobClient:     Bytes Read=153510857
13/12/05 22:55:24 INFO mapred.JobClient:   com.intel.hadoop.graphbuilder.preprocess.mapreduce.CreateGraphReducer$CREATE_GRAPH_COUNTER
13/12/05 22:55:24 INFO mapred.JobClient:     NUM_EDGES=662471
13/12/05 22:55:24 INFO mapred.JobClient:     NUM_VERTICES=360189
13/12/05 22:55:24 INFO mapred.JobClient:   File Output Format Counters 
13/12/05 22:55:24 INFO mapred.JobClient:     Bytes Written=28502484
13/12/05 22:55:24 INFO mapred.JobClient:   FileSystemCounters
13/12/05 22:55:24 INFO mapred.JobClient:     FILE_BYTES_READ=94383675
13/12/05 22:55:24 INFO mapred.JobClient:     HDFS_BYTES_READ=153511323
13/12/05 22:55:24 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=144767981
13/12/05 22:55:24 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28502484
13/12/05 22:55:24 INFO mapred.JobClient:   Map-Reduce Framework
13/12/05 22:55:24 INFO mapred.JobClient:     Map output materialized bytes=50292609
13/12/05 22:55:24 INFO mapred.JobClient:     Map input records=6299
13/12/05 22:55:24 INFO mapred.JobClient:     Reduce shuffle bytes=50292609
13/12/05 22:55:24 INFO mapred.JobClient:     Spilled Records=4670039
13/12/05 22:55:24 INFO mapred.JobClient:     Map output bytes=47049007
13/12/05 22:55:24 INFO mapred.JobClient:     Total committed heap usage (bytes)=645189632
13/12/05 22:55:24 INFO mapred.JobClient:     CPU time spent (ms)=106060
13/12/05 22:55:24 INFO mapred.JobClient:     Map input bytes=153510857
13/12/05 22:55:24 INFO mapred.JobClient:     SPLIT_RAW_BYTES=453
13/12/05 22:55:24 INFO mapred.JobClient:     Combine input records=0
13/12/05 22:55:24 INFO mapred.JobClient:     Reduce input records=1621723
13/12/05 22:55:24 INFO mapred.JobClient:     Reduce input groups=1017821
13/12/05 22:55:24 INFO mapred.JobClient:     Combine output records=0
13/12/05 22:55:24 INFO mapred.JobClient:     Physical memory (bytes) snapshot=725356544
13/12/05 22:55:24 INFO mapred.JobClient:     Reduce output records=1022660
13/12/05 22:55:24 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1517850624
13/12/05 22:55:24 INFO mapred.JobClient:     Map output records=1621723
13/12/05 22:55:24 INFO mapreduce.CreateGraphMR: =================== Done ====================================
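
The figures that matter most here, such as NUM_EDGES and NUM_VERTICES, are buried among the other JobClient counters. Assuming the console output has been saved to a text file, a short script can pull them out. This is a minimal sketch: the function name `parse_counters` and the regular expression are illustrative and are not part of Hadoop or GraphBuilder; they only assume the `NAME=value` line format visible in the log above.

```python
import re

# Matches counter lines such as
# "13/12/05 22:55:24 INFO mapred.JobClient:     NUM_EDGES=662471"
COUNTER_RE = re.compile(r"INFO mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def parse_counters(lines):
    """Return {counter name: int value} for JobClient counter lines."""
    counters = {}
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


sample = [
    "13/12/05 22:55:24 INFO mapred.JobClient:     NUM_EDGES=662471",
    "13/12/05 22:55:24 INFO mapred.JobClient:     NUM_VERTICES=360189",
    "13/12/05 22:55:24 INFO mapred.JobClient:  map 100% reduce 100%",
]
counters = parse_counters(sample)
print(counters["NUM_EDGES"], counters["NUM_VERTICES"])  # 662471 360189
```

Counter names repeat from job to job, so when a log covers several jobs, as this one does, later values overwrite earlier ones unless the lines are first split per job.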


13/12/05 22:55:24 INFO docwordgraph.CreateWordCountGraph: ========== Done creating graph ================
13/12/05 22:55:24 INFO linkgraph.LinkGraphEnd2End: Create graph finished in : 200 seconds
13/12/05 22:55:24 INFO linkgraph.NormalizeGraphIds: ========== Normalizing Graph ============
13/12/05 22:55:24 INFO mapreduce.HashIdMR: ====== Job: Create integer Id maps for vertices ==========
13/12/05 22:55:24 INFO mapreduce.HashIdMR: Input = /user/en-wiki-articles-output/graph_raw/vdata
13/12/05 22:55:24 INFO mapreduce.HashIdMR: Output = /user/en-wiki-articles-output/graph_norm
13/12/05 22:55:24 INFO mapreduce.HashIdMR: ==========================================================
13/12/05 22:55:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/05 22:55:26 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/05 22:55:27 INFO mapred.JobClient: Running job: job_201312052240_0002
13/12/05 22:55:28 INFO mapred.JobClient:  map 0% reduce 0%
13/12/05 22:55:46 INFO mapred.JobClient:  map 100% reduce 0%
13/12/05 22:56:01 INFO mapred.JobClient:  map 100% reduce 76%
13/12/05 22:56:04 INFO mapred.JobClient:  map 100% reduce 84%
13/12/05 22:56:07 INFO mapred.JobClient:  map 100% reduce 91%
13/12/05 22:56:13 INFO mapred.JobClient:  map 100% reduce 100%
13/12/05 22:56:18 INFO mapred.JobClient: Job complete: job_201312052240_0002
13/12/05 22:56:18 INFO mapred.JobClient: Counters: 29
13/12/05 22:56:18 INFO mapred.JobClient:   Job Counters 
13/12/05 22:56:18 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/05 22:56:18 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16669
13/12/05 22:56:18 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/05 22:56:18 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/05 22:56:18 INFO mapred.JobClient:     Launched map tasks=1
13/12/05 22:56:18 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=25850
13/12/05 22:56:18 INFO mapred.JobClient:   File Input Format Counters 
13/12/05 22:56:18 INFO mapred.JobClient:     Bytes Read=7029652
13/12/05 22:56:18 INFO mapred.JobClient:   File Output Format Counters 
13/12/05 22:56:18 INFO mapred.JobClient:     Bytes Written=12210267
13/12/05 22:56:18 INFO mapred.JobClient:   FileSystemCounters
13/12/05 22:56:18 INFO mapred.JobClient:     FILE_BYTES_READ=18381646
13/12/05 22:56:18 INFO mapred.JobClient:     HDFS_BYTES_READ=7029788
13/12/05 22:56:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27617033
13/12/05 22:56:18 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=12210267
13/12/05 22:56:18 INFO mapred.JobClient:   Map-Reduce Framework
13/12/05 22:56:18 INFO mapred.JobClient:     Map output materialized bytes=9190820
13/12/05 22:56:18 INFO mapred.JobClient:     Map input records=360189
13/12/05 22:56:18 INFO mapred.JobClient:     Reduce shuffle bytes=9190820
13/12/05 22:56:18 INFO mapred.JobClient:     Spilled Records=1080567
13/12/05 22:56:18 INFO mapred.JobClient:     Map output bytes=8470422
13/12/05 22:56:18 INFO mapred.JobClient:     Total committed heap usage (bytes)=177016832
13/12/05 22:56:18 INFO mapred.JobClient:     CPU time spent (ms)=17650
13/12/05 22:56:18 INFO mapred.JobClient:     Map input bytes=7029652
13/12/05 22:56:18 INFO mapred.JobClient:     SPLIT_RAW_BYTES=136
13/12/05 22:56:18 INFO mapred.JobClient:     Combine input records=0
13/12/05 22:56:18 INFO mapred.JobClient:     Reduce input records=360189
13/12/05 22:56:18 INFO mapred.JobClient:     Reduce input groups=360189
13/12/05 22:56:18 INFO mapred.JobClient:     Combine output records=0
13/12/05 22:56:18 INFO mapred.JobClient:     Physical memory (bytes) snapshot=241008640
13/12/05 22:56:18 INFO mapred.JobClient:     Reduce output records=720378
13/12/05 22:56:18 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=760651776
13/12/05 22:56:18 INFO mapred.JobClient:     Map output records=360189
13/12/05 22:56:18 INFO mapreduce.HashIdMR: =======================Done =====================
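
The normalization stage replaces raw vertex ids with integer ids: the job above creates the id map, and the jobs that follow apply it to the edges. The idea can be sketched on a single machine in a few lines of Python. This is only an illustration of the concept; the function names, the toy data, and the sequential id assignment are assumptions and do not reflect how HashIdMR actually assigns ids.

```python
def build_id_map(raw_ids):
    """Assign consecutive integer ids to raw ids in first-seen order."""
    id_map = {}
    for raw in raw_ids:
        if raw not in id_map:
            id_map[raw] = len(id_map)
    return id_map


def normalize_edges(edges, id_map):
    """Rewrite (source, target) pairs of raw ids as pairs of integer ids."""
    return [(id_map[src], id_map[dst]) for src, dst in edges]


# Hypothetical toy link graph: each pair is (linking page, linked page).
edges = [("Anarchism", "Autism"), ("Autism", "Albedo"), ("Albedo", "Anarchism")]
vertices = [v for edge in edges for v in edge]
id_map = build_id_map(vertices)
print(normalize_edges(edges, id_map))  # [(0, 1), (1, 2), (2, 0)]
```

Working with compact integer ids instead of page titles keeps the edge records small, which is consistent with the edge data shrinking after normalization in the log.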


13/12/05 22:56:18 INFO mapreduce.SortDictMR: ========== Job: Partition the map of rawid -> id ===========
13/12/05 22:56:18 INFO mapreduce.SortDictMR: Input = /user/en-wiki-articles-output/graph_norm/vidmap
13/12/05 22:56:18 INFO mapreduce.SortDictMR: Output = /user/en-wiki-articles-output/graph_norm/temp/partitionedvidmap
13/12/05 22:56:18 INFO mapreduce.SortDictMR: ======================================================
13/12/05 22:56:18 INFO mapreduce.SortDictMR: Partition on rawId.
13/12/05 22:56:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/05 22:56:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/12/05 22:56:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f8b9044d8df9b4d2b6641db7658aab3cf8]
13/12/05 22:56:19 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/05 22:56:19 INFO mapred.JobClient: Running job: job_201312052240_0003
13/12/05 22:56:20 INFO mapred.JobClient:  map 0% reduce 0%
13/12/05 22:56:40 INFO mapred.JobClient:  map 100% reduce 0%
13/12/05 22:56:55 INFO mapred.JobClient:  map 100% reduce 100%
13/12/05 22:57:12 INFO mapred.JobClient: Job complete: job_201312052240_0003
13/12/05 22:57:12 INFO mapred.JobClient: Counters: 30
13/12/05 22:57:12 INFO mapred.JobClient:   Job Counters 
13/12/05 22:57:12 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/05 22:57:12 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=26847
13/12/05 22:57:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/05 22:57:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/05 22:57:12 INFO mapred.JobClient:     Launched map tasks=2
13/12/05 22:57:12 INFO mapred.JobClient:     Data-local map tasks=2
13/12/05 22:57:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=26028
13/12/05 22:57:12 INFO mapred.JobClient:   File Input Format Counters 
13/12/05 22:57:12 INFO mapred.JobClient:     Bytes Read=9082303
13/12/05 22:57:12 INFO mapred.JobClient:   File Output Format Counters 
13/12/05 22:57:12 INFO mapred.JobClient:     Bytes Written=0
13/12/05 22:57:12 INFO mapred.JobClient:   FileSystemCounters
13/12/05 22:57:12 INFO mapred.JobClient:     FILE_BYTES_READ=11240848
13/12/05 22:57:12 INFO mapred.JobClient:     HDFS_BYTES_READ=9082579
13/12/05 22:57:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22633705
13/12/05 22:57:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=9079676
13/12/05 22:57:12 INFO mapred.JobClient:   Map-Reduce Framework
13/12/05 22:57:12 INFO mapred.JobClient:     Map output materialized bytes=11240854
13/12/05 22:57:12 INFO mapred.JobClient:     Map input records=360189
13/12/05 22:57:12 INFO mapred.JobClient:     Reduce shuffle bytes=5639237
13/12/05 22:57:12 INFO mapred.JobClient:     Spilled Records=720378
13/12/05 22:57:12 INFO mapred.JobClient:     Map output bytes=10520448
13/12/05 22:57:12 INFO mapred.JobClient:     Total committed heap usage (bytes)=345362432
13/12/05 22:57:12 INFO mapred.JobClient:     CPU time spent (ms)=6730
13/12/05 22:57:12 INFO mapred.JobClient:     Map input bytes=9079676
13/12/05 22:57:12 INFO mapred.JobClient:     SPLIT_RAW_BYTES=276
13/12/05 22:57:12 INFO mapred.JobClient:     Combine input records=0
13/12/05 22:57:12 INFO mapred.JobClient:     Reduce input records=360189
13/12/05 22:57:12 INFO mapred.JobClient:     Reduce input groups=64
13/12/05 22:57:12 INFO mapred.JobClient:     Combine output records=0
13/12/05 22:57:12 INFO mapred.JobClient:     Physical memory (bytes) snapshot=440111104
13/12/05 22:57:12 INFO mapred.JobClient:     Reduce output records=0
13/12/05 22:57:12 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1182158848
13/12/05 22:57:12 INFO mapred.JobClient:     Map output records=360189
13/12/05 22:57:12 INFO mapreduce.SortDictMR: ======================= Done ==========================


13/12/05 22:57:12 INFO mapreduce.SortEdgeMR: ==== Job: Partition the input edges by hash(sourceid) =========
13/12/05 22:57:12 INFO mapreduce.SortEdgeMR: Input = /user/en-wiki-articles-output/graph_raw/edata
13/12/05 22:57:12 INFO mapreduce.SortEdgeMR: Output = /user/en-wiki-articles-output/graph_norm/temp/partitionededata
13/12/05 22:57:12 INFO mapreduce.SortEdgeMR: ===============================================================
13/12/05 22:57:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/05 22:57:13 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/05 22:57:13 INFO mapred.JobClient: Running job: job_201312052240_0004
13/12/05 22:57:14 INFO mapred.JobClient:  map 0% reduce 0%
13/12/05 22:57:35 INFO mapred.JobClient:  map 100% reduce 0%
13/12/05 22:57:56 INFO mapred.JobClient:  map 100% reduce 100%
13/12/05 22:58:01 INFO mapred.JobClient: Job complete: job_201312052240_0004
13/12/05 22:58:01 INFO mapred.JobClient: Counters: 30
13/12/05 22:58:01 INFO mapred.JobClient:   Job Counters 
13/12/05 22:58:01 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/05 22:58:01 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=39329
13/12/05 22:58:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/05 22:58:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/05 22:58:01 INFO mapred.JobClient:     Launched map tasks=2
13/12/05 22:58:01 INFO mapred.JobClient:     Data-local map tasks=2
13/12/05 22:58:01 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13465
13/12/05 22:58:01 INFO mapred.JobClient:   File Input Format Counters 
13/12/05 22:58:01 INFO mapred.JobClient:     Bytes Read=21476129
13/12/05 22:58:01 INFO mapred.JobClient:   File Output Format Counters 
13/12/05 22:58:01 INFO mapred.JobClient:     Bytes Written=21472832
13/12/05 22:58:01 INFO mapred.JobClient:   FileSystemCounters
13/12/05 22:58:01 INFO mapred.JobClient:     FILE_BYTES_READ=50895706
13/12/05 22:58:01 INFO mapred.JobClient:     HDFS_BYTES_READ=21476401
13/12/05 22:58:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=76410296
13/12/05 22:58:01 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=21472832
13/12/05 22:58:01 INFO mapred.JobClient:   Map-Reduce Framework
13/12/05 22:58:01 INFO mapred.JobClient:     Map output materialized bytes=25447850
13/12/05 22:58:01 INFO mapred.JobClient:     Map input records=662471
13/12/05 22:58:01 INFO mapred.JobClient:     Reduce shuffle bytes=12729709
13/12/05 22:58:01 INFO mapred.JobClient:     Spilled Records=1987413
13/12/05 22:58:01 INFO mapred.JobClient:     Map output bytes=24122803
13/12/05 22:58:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=369000448
13/12/05 22:58:01 INFO mapred.JobClient:     CPU time spent (ms)=10870
13/12/05 22:58:01 INFO mapred.JobClient:     Map input bytes=21472832
13/12/05 22:58:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=272
13/12/05 22:58:01 INFO mapred.JobClient:     Combine input records=0
13/12/05 22:58:01 INFO mapred.JobClient:     Reduce input records=662471
13/12/05 22:58:01 INFO mapred.JobClient:     Reduce input groups=64
13/12/05 22:58:01 INFO mapred.JobClient:     Combine output records=0
13/12/05 22:58:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=453345280
13/12/05 22:58:01 INFO mapred.JobClient:     Reduce output records=662471
13/12/05 22:58:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1138835456
13/12/05 22:58:01 INFO mapred.JobClient:     Map output records=662471
13/12/05 22:58:01 INFO mapreduce.SortEdgeMR: =================== Done ====================================
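
The job title above says the edges are partitioned by hash(sourceid). A minimal sketch of that general technique follows. The CRC32 hash, the partition count, and the function name `partition_edges` are assumptions chosen for illustration; they are not GraphBuilder's actual partitioner.

```python
import zlib
from collections import defaultdict


def partition_edges(edges, num_partitions):
    """Group (source, target) edges into buckets keyed by hash(source).

    CRC32 is used instead of Python's built-in hash() so that string
    ids land in the same bucket on every run.
    """
    buckets = defaultdict(list)
    for src, dst in edges:
        key = zlib.crc32(str(src).encode("utf-8")) % num_partitions
        buckets[key].append((src, dst))
    return dict(buckets)


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
parts = partition_edges(edges, num_partitions=4)
# All edges that share a source id end up in the same bucket.
for bucket, members in sorted(parts.items()):
    print(bucket, members)
```

Because every edge with a given source id lands in the same bucket, a later step can translate the ids of one bucket using only the matching slice of the id dictionary.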


13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: ============= Job: Normalize Ids in Edges ====================
13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: Input = /user/en-wiki-articles-output/graph_norm/temp/partitionededata
13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: Output = /user/en-wiki-articles-output/graph_norm/edata
13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: Dictionary = /user/en-wiki-articles-output/graph_norm/temp/partitionedvidmap
13/12/05 22:58:01 INFO mapreduce.TransEdgeMR: ===============================================================
13/12/05 22:58:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/05 22:58:02 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/05 22:58:02 INFO mapred.JobClient: Running job: job_201312052240_0005
13/12/05 22:58:03 INFO mapred.JobClient:  map 0% reduce 0%
13/12/05 22:58:23 INFO mapred.JobClient:  map 68% reduce 0%
13/12/05 22:58:26 INFO mapred.JobClient:  map 99% reduce 0%
13/12/05 22:58:29 INFO mapred.JobClient:  map 100% reduce 0%
13/12/05 22:58:44 INFO mapred.JobClient:  map 100% reduce 86%
13/12/05 22:58:50 INFO mapred.JobClient:  map 100% reduce 100%
13/12/05 22:58:55 INFO mapred.JobClient: Job complete: job_201312052240_0005
13/12/05 22:58:55 INFO mapred.JobClient: Counters: 30
13/12/05 22:58:55 INFO mapred.JobClient:   Job Counters 
13/12/05 22:58:55 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/05 22:58:55 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=39357
13/12/05 22:58:55 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/05 22:58:55 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/05 22:58:55 INFO mapred.JobClient:     Launched map tasks=2
13/12/05 22:58:55 INFO mapred.JobClient:     Data-local map tasks=2
13/12/05 22:58:55 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=19826
13/12/05 22:58:55 INFO mapred.JobClient:   File Input Format Counters 
13/12/05 22:58:55 INFO mapred.JobClient:     Bytes Read=21476129
13/12/05 22:58:55 INFO mapred.JobClient:   File Output Format Counters 
13/12/05 22:58:55 INFO mapred.JobClient:     Bytes Written=8911036
13/12/05 22:58:55 INFO mapred.JobClient:   FileSystemCounters
13/12/05 22:58:55 INFO mapred.JobClient:     FILE_BYTES_READ=40407036
13/12/05 22:58:55 INFO mapred.JobClient:     HDFS_BYTES_READ=39777374
13/12/05 22:58:55 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=60678227
13/12/05 22:58:55 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=8911036
13/12/05 22:58:55 INFO mapred.JobClient:   Map-Reduce Framework
13/12/05 22:58:55 INFO mapred.JobClient:     Map output materialized bytes=20203515
13/12/05 22:58:55 INFO mapred.JobClient:     Map input records=662471
13/12/05 22:58:55 INFO mapred.JobClient:     Reduce shuffle bytes=10183173
13/12/05 22:58:55 INFO mapred.JobClient:     Spilled Records=1987413
13/12/05 22:58:55 INFO mapred.JobClient:     Map output bytes=18878543
13/12/05 22:58:55 INFO mapred.JobClient:     Total committed heap usage (bytes)=362643456
13/12/05 22:58:55 INFO mapred.JobClient:     CPU time spent (ms)=18980
13/12/05 22:58:55 INFO mapred.JobClient:     Map input bytes=21472832
13/12/05 22:58:55 INFO mapred.JobClient:     SPLIT_RAW_BYTES=306
13/12/05 22:58:55 INFO mapred.JobClient:     Combine input records=0
13/12/05 22:58:55 INFO mapred.JobClient:     Reduce input records=662471
13/12/05 22:58:55 INFO mapred.JobClient:     Reduce input groups=64
13/12/05 22:58:55 INFO mapred.JobClient:     Combine output records=0
13/12/05 22:58:55 INFO mapred.JobClient:     Physical memory (bytes) snapshot=462606336
13/12/05 22:58:55 INFO mapred.JobClient:     Reduce output records=662471
13/12/05 22:58:55 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1139138560
13/12/05 22:58:55 INFO mapred.JobClient:     Map output records=662471
13/12/05 22:58:55 INFO mapreduce.TransEdgeMR: ========================= Done ===============================
13/12/05 22:58:55 INFO linkgraph.NormalizeGraphIds: ========== Done normalizing graph ============
13/12/05 22:58:55 INFO linkgraph.LinkGraphEnd2End: Normalize graph finished in : 211 seconds
13/12/05 22:58:55 INFO linkgraph.PartitionGraph: ========== Partitioning Graph ============
13/12/05 22:58:58 INFO edge.EdgeIngressMR: ===== Job: Partition edges and create vertex records =========
13/12/05 22:58:58 INFO edge.EdgeIngressMR: input: /user/en-wiki-articles-output/graph_norm/vdata,/user/en-wiki-articles-output/graph_norm/edata
13/12/05 22:58:58 INFO edge.EdgeIngressMR: output: /user/en-wiki-articles-output/graph_partitioned/edges
13/12/05 22:58:58 INFO edge.EdgeIngressMR: numProc = 3
13/12/05 22:58:58 INFO edge.EdgeIngressMR: subpartPerPartition = 8
13/12/05 22:58:58 INFO edge.EdgeIngressMR: keyclass = generatedclass.MyIngressJobKey0
13/12/05 22:58:58 INFO edge.EdgeIngressMR: valclass = generatedclass.MyIngressJobVal0
13/12/05 22:58:58 INFO edge.EdgeIngressMR: ingress = constrainedrandom
13/12/05 22:58:58 INFO edge.EdgeIngressMR: gzip = false
13/12/05 22:58:58 INFO edge.EdgeIngressMR: ===============================================================
13/12/05 22:58:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/05 22:58:59 INFO mapred.FileInputFormat: Total input paths to process : 2
13/12/05 22:58:59 INFO mapred.JobClient: Running job: job_201312052240_0006
13/12/05 22:59:00 INFO mapred.JobClient:  map 0% reduce 0%
13/12/05 22:59:18 INFO mapred.JobClient:  map 34% reduce 0%
13/12/05 22:59:24 INFO mapred.JobClient:  map 37% reduce 0%
13/12/05 22:59:27 INFO mapred.JobClient:  map 38% reduce 0%
13/12/05 22:59:36 INFO mapred.JobClient:  map 41% reduce 0%
13/12/05 22:59:39 INFO mapred.JobClient:  map 43% reduce 0%
13/12/05 22:59:42 INFO mapred.JobClient:  map 47% reduce 0%
13/12/05 22:59:51 INFO mapred.JobClient:  map 49% reduce 0%
13/12/05 22:59:54 INFO mapred.JobClient:  map 53% reduce 0%
13/12/05 22:59:57 INFO mapred.JobClient:  map 54% reduce 0%
13/12/05 23:00:09 INFO mapred.JobClient:  map 55% reduce 0%
13/12/05 23:00:12 INFO mapred.JobClient:  map 57% reduce 0%
13/12/05 23:00:15 INFO mapred.JobClient:  map 64% reduce 0%
13/12/05 23:00:19 INFO mapred.JobClient:  map 72% reduce 11%
13/12/05 23:00:22 INFO mapred.JobClient:  map 76% reduce 11%
13/12/05 23:00:25 INFO mapred.JobClient:  map 77% reduce 11%
13/12/05 23:00:31 INFO mapred.JobClient:  map 78% reduce 11%
13/12/05 23:00:34 INFO mapred.JobClient:  map 84% reduce 11%
13/12/05 23:00:37 INFO mapred.JobClient:  map 91% reduce 11%
13/12/05 23:00:40 INFO mapred.JobClient:  map 95% reduce 11%
13/12/05 23:00:43 INFO mapred.JobClient:  map 97% reduce 11%
13/12/05 23:00:55 INFO mapred.JobClient:  map 99% reduce 11%
13/12/05 23:00:58 INFO mapred.JobClient:  map 100% reduce 11%
13/12/05 23:02:29 INFO mapred.JobClient:  map 100% reduce 22%
13/12/05 23:02:35 INFO mapred.JobClient:  map 100% reduce 76%
13/12/05 23:02:38 INFO mapred.JobClient:  map 100% reduce 80%
13/12/05 23:02:41 INFO mapred.JobClient:  map 100% reduce 84%
13/12/05 23:02:44 INFO mapred.JobClient:  map 100% reduce 88%
13/12/05 23:02:47 INFO mapred.JobClient:  map 100% reduce 92%
13/12/05 23:02:50 INFO mapred.JobClient:  map 100% reduce 97%
13/12/05 23:02:56 INFO mapred.JobClient:  map 100% reduce 100%
13/12/05 23:03:02 INFO mapred.JobClient: Job complete: job_201312052240_0006
13/12/05 23:03:02 INFO mapred.JobClient: Counters: 30
13/12/05 23:03:02 INFO mapred.JobClient:   Job Counters 
13/12/05 23:03:02 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/05 23:03:02 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=379790
13/12/05 23:03:02 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/05 23:03:02 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/05 23:03:02 INFO mapred.JobClient:     Launched map tasks=3
13/12/05 23:03:02 INFO mapred.JobClient:     Data-local map tasks=3
13/12/05 23:03:02 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=182772
13/12/05 23:03:02 INFO mapred.JobClient:   File Input Format Counters 
13/12/05 23:03:02 INFO mapred.JobClient:     Bytes Read=12041935
13/12/05 23:03:02 INFO mapred.JobClient:   File Output Format Counters 
13/12/05 23:03:02 INFO mapred.JobClient:     Bytes Written=27761437
13/12/05 23:03:02 INFO mapred.JobClient:   FileSystemCounters
13/12/05 23:03:02 INFO mapred.JobClient:     FILE_BYTES_READ=64995481
13/12/05 23:03:02 INFO mapred.JobClient:     HDFS_BYTES_READ=12042346
13/12/05 23:03:02 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=96983574
13/12/05 23:03:02 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=27761437
13/12/05 23:03:02 INFO mapred.JobClient:   Map-Reduce Framework
13/12/05 23:03:02 INFO mapred.JobClient:     Map output materialized bytes=31893712
13/12/05 23:03:02 INFO mapred.JobClient:     Map input records=1022660
13/12/05 23:03:02 INFO mapred.JobClient:     Reduce shuffle bytes=31893712
13/12/05 23:03:02 INFO mapred.JobClient:     Spilled Records=2210272
13/12/05 23:03:02 INFO mapred.JobClient:     Map output bytes=67430035
13/12/05 23:03:02 INFO mapred.JobClient:     Total committed heap usage (bytes)=633384960
13/12/05 23:03:02 INFO mapred.JobClient:     CPU time spent (ms)=110260
13/12/05 23:03:02 INFO mapred.JobClient:     Map input bytes=12041627
13/12/05 23:03:02 INFO mapred.JobClient:     SPLIT_RAW_BYTES=411
13/12/05 23:03:02 INFO mapred.JobClient:     Combine input records=2747113
13/12/05 23:03:02 INFO mapred.JobClient:     Reduce input records=725283
13/12/05 23:03:02 INFO mapred.JobClient:     Reduce input groups=360192
13/12/05 23:03:02 INFO mapred.JobClient:     Combine output records=1124797
13/12/05 23:03:02 INFO mapred.JobClient:     Physical memory (bytes) snapshot=777744384
13/12/05 23:03:02 INFO mapred.JobClient:     Reduce output records=360189
13/12/05 23:03:02 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1516822528
13/12/05 23:03:02 INFO mapred.JobClient:     Map output records=2347602
13/12/05 23:03:02 INFO edge.EdgeIngressMR: ================== Done ====================================


13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: ====== Job: Distributed Vertex Records to partitions =========
13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: input: /user/en-wiki-articles-output/graph_partitioned/edges/vrecord
13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: output: /user/en-wiki-articles-output/graph_partitioned/vrecords
13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: numProc = 3
13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: gzip = false
13/12/05 23:03:02 INFO vrecord.VrecordIngressMR: ==============================================================
13/12/05 23:03:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/05 23:03:03 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/05 23:03:03 INFO mapred.JobClient: Running job: job_201312052240_0007
13/12/05 23:03:04 INFO mapred.JobClient:  map 0% reduce 0%
13/12/05 23:03:27 INFO mapred.JobClient:  map 100% reduce 0%
13/12/05 23:03:45 INFO mapred.JobClient:  map 100% reduce 66%
13/12/05 23:03:48 INFO mapred.JobClient:  map 100% reduce 77%
13/12/05 23:03:54 INFO mapred.JobClient:  map 100% reduce 88%
13/12/05 23:04:03 INFO mapred.JobClient:  map 100% reduce 100%
13/12/05 23:04:08 INFO mapred.JobClient: Job complete: job_201312052240_0007
13/12/05 23:04:08 INFO mapred.JobClient: Counters: 32
13/12/05 23:04:08 INFO mapred.JobClient:   Job Counters 
13/12/05 23:04:08 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/05 23:04:08 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33341
13/12/05 23:04:08 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/05 23:04:08 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/05 23:04:08 INFO mapred.JobClient:     Launched map tasks=2
13/12/05 23:04:08 INFO mapred.JobClient:     Data-local map tasks=2
13/12/05 23:04:08 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=32004
13/12/05 23:04:08 INFO mapred.JobClient:   File Input Format Counters 
13/12/05 23:04:08 INFO mapred.JobClient:     Bytes Read=27762064
13/12/05 23:04:08 INFO mapred.JobClient:   File Output Format Counters 
13/12/05 23:04:08 INFO mapred.JobClient:     Bytes Written=35581968
13/12/05 23:04:08 INFO mapred.JobClient:   FileSystemCounters
13/12/05 23:04:08 INFO mapred.JobClient:     FILE_BYTES_READ=38336865
13/12/05 23:04:08 INFO mapred.JobClient:     HDFS_BYTES_READ=27762368
13/12/05 23:04:08 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=76739897
13/12/05 23:04:08 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=35581968
13/12/05 23:04:08 INFO mapred.JobClient:   com.intel.hadoop.graphbuilder.partition.mapreduce.vrecord.VrecordIngressReducer$COUNTER
13/12/05 23:04:08 INFO mapred.JobClient:     OWN_VERTICES=360189
13/12/05 23:04:08 INFO mapred.JobClient:     VERTICES=459172
13/12/05 23:04:08 INFO mapred.JobClient:   Map-Reduce Framework
13/12/05 23:04:08 INFO mapred.JobClient:     Map output materialized bytes=38336871
13/12/05 23:04:08 INFO mapred.JobClient:     Map input records=360189
13/12/05 23:04:08 INFO mapred.JobClient:     Reduce shuffle bytes=19050294
13/12/05 23:04:08 INFO mapred.JobClient:     Spilled Records=918344
13/12/05 23:04:08 INFO mapred.JobClient:     Map output bytes=37418515
13/12/05 23:04:08 INFO mapred.JobClient:     Total committed heap usage (bytes)=390500352
13/12/05 23:04:08 INFO mapred.JobClient:     CPU time spent (ms)=22210
13/12/05 23:04:08 INFO mapred.JobClient:     Map input bytes=27761437
13/12/05 23:04:08 INFO mapred.JobClient:     SPLIT_RAW_BYTES=304
13/12/05 23:04:08 INFO mapred.JobClient:     Combine input records=0
13/12/05 23:04:08 INFO mapred.JobClient:     Reduce input records=459172
13/12/05 23:04:08 INFO mapred.JobClient:     Reduce input groups=3
13/12/05 23:04:08 INFO mapred.JobClient:     Combine output records=0
13/12/05 23:04:08 INFO mapred.JobClient:     Physical memory (bytes) snapshot=471478272
13/12/05 23:04:08 INFO mapred.JobClient:     Reduce output records=459175
13/12/05 23:04:08 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1139294208
13/12/05 23:04:08 INFO mapred.JobClient:     Map output records=459172
13/12/05 23:04:08 INFO vrecord.VrecordIngressMR: ==========================Done===============================

13/12/05 23:04:08 INFO linkgraph.PartitionGraph: ========== Done partitioning graph ============
13/12/05 23:04:08 INFO linkgraph.LinkGraphEnd2End: Partition graph finished in : 312 seconds
13/12/05 23:04:08 INFO linkgraph.LinkGraphEnd2End: Total flow time : 723 seconds
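
The counters printed in the log above can be pulled out programmatically for a quick sanity check. The following is a minimal sketch in Python and is not part of GraphBuilder. The function name `parse_counters` and the regular expression are my own assumptions based on the line format shown in this log. Reading `VERTICES / OWN_VERTICES` as a vertex replication factor across partitions is also only an interpretation of the counter names; the log itself does not confirm it.

```python
import re

# Matches Hadoop 1.x JobClient counter lines as they appear in the log above, e.g.
# "13/12/05 23:04:08 INFO mapred.JobClient:     VERTICES=459172"
# (pattern is an assumption derived from this log, not an official format spec)
COUNTER_RE = re.compile(r"INFO mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: integer value} for every counter line in log_text.

    If the text covers several jobs, a later job's value overwrites an
    earlier one with the same counter name.
    """
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


# Three lines copied from the second job (job_201312052240_0007) above.
sample = """\
13/12/05 23:04:08 INFO mapred.JobClient:     OWN_VERTICES=360189
13/12/05 23:04:08 INFO mapred.JobClient:     VERTICES=459172
13/12/05 23:04:08 INFO mapred.JobClient:     Reduce input groups=3
"""

counters = parse_counters(sample)

# Hypothetical interpretation: vertex records per distinct vertex.
replication = counters["VERTICES"] / counters["OWN_VERTICES"]
print(f"vertex records per owned vertex: {replication:.3f}")
```

Note that `OWN_VERTICES=360189` equals the `Reduce output records` of the preceding edge job and the `Map input records` of this one, which is at least consistent with it counting distinct vertices.
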
After the job completed, the output directory in HDFS contained three subdirectories:

[grid@localhost ~]$ hadoop dfs -ls /user/en-wiki-articles-output
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x   - grid supergroup          0 2013-12-05 22:58 /user/en-wiki-articles-output/graph_norm
drwxr-xr-x   - grid supergroup          0 2013-12-05 23:03 /user/en-wiki-articles-output/graph_partitioned
drwxr-xr-x   - grid supergroup          0 2013-12-05 22:55 /user/en-wiki-articles-output/graph_raw

How exactly to visualize this output is something I am still studying.