Both HDFS reads and writes are transparent to the user: the user does not care how the data is actually read out or written in, only that a result comes back saying the data was read or written. How the read or write happens internally is not the user's concern.
Supplement:
Read: hdfs dfs -ls / = hdfs dfs -ls hdfs://hadoop001:9000/
hdfs dfs -ls /    here / is the root directory of the HDFS file system, not the Linux disk
→
hdfs dfs -ls hdfs://hadoop001:9000/    where hdfs://hadoop001:9000 is a parameter that comes from the configuration in core-site.xml
→
hdfs dfs -ls    lists the home path of the user running the command: /user/<username>/
hdfs dfs -ls = hdfs dfs -ls /user/hadoop
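For reference, a minimal sketch of where the hdfs://hadoop001:9000 prefix comes from; this assumes the property is named fs.defaultFS (on older releases it may appear as fs.default.name):
<!-- core-site.xml (sketch) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop001:9000</value>
  </property>
</configuration>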
Hadoop put permission analysis
Uploading a file: put a file from the local file system into HDFS: hdfs dfs -put <local file> <target HDFS path>
If the file already exists, or the upload fails because of a permission problem, an error is thrown.
→ If the file already exists, a "file exists" error is thrown
→ If it is a permission problem, an error like this is thrown:
put: Permission denied: user=root, access=WRITE, inode="/":hadoop:supergroup:drwxr-xr-x
In this case:
Recommendations:
1. Switch to the user that owns the directory: su - hadoop
2. Force-change the owner and group: only when production controls are loose, or for testing/learning
→
In production the permissions of the root directory are generally not changed; instead the file is uploaded into a newly created directory, and the owner and group of that directory are changed (a sketch follows the demo below)
As the root user, upload a file into a directory created by the hadoop user:
[root@hadoop001 hadoop-2.6.0-cdh5.7.0]# bin/hdfs dfs -put NOTICE.txt /hahadata
18/10/13 11:12:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
No error was thrown!
Switch back to the hadoop user and check whether the newly created directory contains the uploaded file: it does
This solves the permission problem for the upload, but it is generally not done this way in production!
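A minimal sketch of how the hadoop user could prepare such a directory beforehand; the directory name /hahadata comes from the demo above, while the chown/chmod choice is an assumption (the notes do not show which was used):
# run as the hadoop user (the HDFS superuser in this setup)
hdfs dfs -mkdir /hahadata                  # create a dedicated directory
hdfs dfs -chown root:root /hahadata        # hand ownership to the uploading user
# or, alternatively: hdfs dfs -chmod 777 /hahadata
# after that, root can run: bin/hdfs dfs -put NOTICE.txt /hahadata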
Common HDFS commands:
Command help: hdfs dfs (press Enter)
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>     specify a ResourceManager
-files <comma separated list of files>     specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>     specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>     specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
hdfs dfs -ls / : list the files and directories under /
hdfs dfs -put a.txt / : upload the file a.txt to the / directory
hdfs dfs -get /a.txt ./ : download the file a.txt to ./ (the current local directory)
hdfs dfs -copyFromLocal a.txt / : upload the file a.txt to the / directory
hdfs dfs -copyToLocal /a.txt ./ : download the file a.txt to ./ (the current local directory)
hdfs dfs -cat /a.txt : print the contents of a file
hdfs dfs -rm /a.txt : delete a file
hdfs dfs -rm -r /dirname : delete a directory (the older -rmr form is deprecated)
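A quick round-trip sketch tying these commands together; the file name a.txt comes from the list above, and the file contents are made up for illustration:
echo "hello hdfs" > a.txt         # create a small local test file
hdfs dfs -put a.txt /             # upload it to the HDFS root
hdfs dfs -ls /                    # it should now appear under /
hdfs dfs -cat /a.txt              # print its contents from HDFS
hdfs dfs -get /a.txt ./a.copy     # download it back under a new local name
hdfs dfs -rm /a.txt               # clean up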
Note: hadoop fs is equivalent to hdfs dfs, because both invoke the same class
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cat bin/hadoop | grep fs
if [ "$COMMAND" = "fs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cat bin/hdfs
elif [ "$COMMAND" = "dfs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
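Since both scripts dispatch to org.apache.hadoop.fs.FsShell, the two forms below are interchangeable (shown only as an illustration):
hadoop fs -ls /
hdfs dfs -ls /     # same result as the line above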
Two commands worth noting:
(1)dfsadmin
dfsadmin run a DFS admin client
Reports the health status of the HDFS cluster (-report)
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
[-report [-live] [-dead] [-decommissioning]]
[-safemode <enter | leave | get | wait>]
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]
[-setSpaceQuota <quota> <dirname>...<dirname>]
[-clrSpaceQuota <dirname>...<dirname>]
[-finalizeUpgrade]
[-rollingUpgrade [<query|prepare|finalize>]]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshCallQueue]
[-refresh <host:ipc_port> <key> [arg1..argn]
[-reconfig <datanode|…> host:ipc_port <start|status|properties>]
[-printTopology]
[-refreshNamenodes datanode_host:ipc_port]
[-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
[-setBalancerBandwidth <bandwidth in bytes per second>]
[-fetchImage <local directory>]
[-allowSnapshot <snapshotDir>]
[-disallowSnapshot <snapshotDir>]
[-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
[-getDatanodeInfo <datanode_host:ipc_port>]
[-metasave filename]
[-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
[-help [cmd]]
[hadoop@hadoop001 ~]$ hdfs dfsadmin -report
18/10/13 11:46:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Configured Capacity: 42139451392 (39.25 GB)
Present Capacity: 33108271104 (30.83 GB)
DFS Remaining: 33107415040 (30.83 GB)
DFS Used: 856064 (836 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Live datanodes (1):
Name: 172.31.95.246:50010 (hadoop001)
Hostname: hadoop001
Decommission Status : Normal
Configured Capacity: 42139451392 (39.25 GB)
DFS Used: 856064 (836 KB)
Non DFS Used: 9031180288 (8.41 GB)
DFS Remaining: 33107415040 (30.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 78.57%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 13 11:46:35 CST 2018
All of this information can also be seen in the web UI on port 50070, which shows the health status of the HDFS cluster
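As a quick command-line check (just a sketch; the field names are taken from the report output above):
hdfs dfsadmin -report | grep "DFS Used%"        # overall and per-datanode usage
hdfs dfsadmin -report | grep "Live datanodes"   # number of datanodes that are alive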
(2) Safe mode: a protection mechanism
hdfs dfsadmin -safemode <enter | leave | get | wait>
If the log says the NN (NameNode) is in safemode, it is read-only, and it needs to leave it: hdfs dfsadmin -safemode leave
1. If safe mode was entered manually, the cluster is read-only and maintenance can be done; when maintenance is finished: hdfs dfsadmin -safemode leave
2. If it is an abnormal situation, e.g. HDFS is damaged (or the deployment is broken), the NN enters safe mode automatically
a. If manually running hdfs dfsadmin -safemode leave succeeds, normal operations such as -put can resume
b. If manually running hdfs dfsadmin -safemode leave does not succeed, read the NN log carefully and analyze where the problem is
→ it may be corrupt blocks (a short sketch of this workflow follows)
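A short sketch of the safe-mode workflow described above, run as the HDFS superuser; hdfs fsck is an extra command (not shown in these notes) that helps locate corrupt or missing blocks:
hdfs dfsadmin -safemode get       # check whether safe mode is ON or OFF
hdfs dfsadmin -safemode enter     # manual entry: cluster becomes read-only for maintenance
hdfs dfsadmin -safemode leave     # leave safe mode after maintenance
# if leaving does not succeed, inspect the NameNode log and check block health, e.g.:
hdfs fsck /                       # reports missing / corrupt blocks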