[Distributed Storage] GlusterFS failing to mount at boot with Ubuntu 14.04

Time: 2023-03-08 16:41:28

I previously asked about mounting GlusterFS at boot on an Ubuntu 12.04 server, and the answer was that this is buggy in 12.04 but works in 14.04. Curious, I tried it on a virtual machine running on my laptop, and with 14.04 it did work. Since this is critical for me, I upgraded my running servers to 14.04, only to discover that GlusterFS does not mount localhost volumes automatically there either.

This is a Linode server, and /etc/fstab looks like this:

# <file system> <mount point>          <type>    <options>                 <dump>  <pass>
proc /proc proc defaults 0 0
/dev/xvda / ext4 noatime,errors=remount-ro 0 1
/dev/xvdb none swap sw 0 0
/dev/xvdc /var/lib/glusterfs/brick01 ext4 defaults 1 2
koraga.int.example.com:/public_uploads /var/www/shared/public/uploads glusterfs defaults,_netdev 0 0
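Each fstab entry has six whitespace-separated fields (device, mount point, type, options, dump, pass), and the `_netdev` option on the GlusterFS line tells mount(8) that the filesystem needs network access and should not be mounted until networking is up. As a quick sanity check, here is a minimal Python sketch that lists the glusterfs entries and their options; the names `FstabEntry`, `parse_fstab` and `glusterfs_entries` are hypothetical helpers for illustration, not part of any GlusterFS tooling.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: list[str]
    dump: int
    passno: int


def parse_fstab(text: str) -> list[FstabEntry]:
    """Parse fstab-formatted text, skipping blank lines and comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # dump and pass are optional in fstab; pad them with "0".
        fields += ["0"] * (6 - len(fields))
        device, mount_point, fs_type, opts, dump, passno = fields[:6]
        entries.append(FstabEntry(device, mount_point, fs_type,
                                  opts.split(","), int(dump), int(passno)))
    return entries


def glusterfs_entries(text: str) -> list[FstabEntry]:
    """Return only the entries whose filesystem type is glusterfs."""
    return [e for e in parse_fstab(text) if e.fs_type == "glusterfs"]
```

On the server itself this could be called as `glusterfs_entries(open("/etc/fstab").read())`; for the file above it should report a single entry whose options include `_netdev`.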

The boot output looks like this (around the network-mount stage, which is where the only failures occur):

* Stopping Mount network filesystems [ OK ]
* Starting set sysctls from /etc/sysctl.conf [ OK ]
* Stopping set sysctls from /etc/sysctl.conf [ OK ]
* Starting configure virtual network devices [ OK ]
* Starting Bridge socket events into upstart [ OK ]
* Starting Waiting for state [fail]
* Stopping Waiting for state [ OK ]
* Starting Block the mounting event for glusterfs filesystems until the network interfaces are running [fail]
* Starting Waiting for state [fail]
* Starting Block the mounting event for glusterfs filesystems until the network interfaces are running [fail]
* Stopping Waiting for state [ OK ]
* Starting Signal sysvinit that remote filesystems are mounted [ OK ]
* Starting GNU Screen Cleanup [ OK ]

I believe the log file /var/log/glusterfs/var-www-shared-public-uploads.log contains the main clue, as it is the only one that really differs between this server, where mounting fails, and my local virtual server, where it works:

[2014-07-10 05:51:49.762162] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.1 (/usr/sbin/glusterfs --volfile-server=koraga.int.example.com --volfile-id=/public_uploads /var/www/shared/public/uploads)
[2014-07-10 05:51:49.774248] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-07-10 05:51:49.774278] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-07-10 05:51:49.775573] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 192.168.134.227:24007 failed (Connection refused)
[2014-07-10 05:51:49.775634] E [glusterfsd-mgmt.c:1601:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: koraga.int.example.com (No data available)
[2014-07-10 05:51:49.775649] I [glusterfsd-mgmt.c:1607:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2014-07-10 05:51:49.776284] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23) [0x7f6718bf3f83] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x90) [0x7f6718bf7da0] (-->/usr/sbin/glusterfs(+0xcf13) [0x7f67192bbf13]))) 0-: received signum (1), shutting down
[2014-07-10 05:51:49.776314] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/var/www/shared/public/uploads'.
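The telling line is `connection to 192.168.134.227:24007 failed (Connection refused)`: when the client tried to fetch the volfile, nothing was accepting connections on glusterd's management port 24007 yet, which suggests the mount ran before the local GlusterFS server was up. One way to test that timing hypothesis is a small probe that polls the port until it accepts a TCP connection. This is a minimal Python sketch, assuming that glusterd listening on 24007 is a sufficient readiness signal; `port_open`, `wait_for_port` and the timeout defaults are hypothetical.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (nothing listening yet),
        # timeouts and name-resolution failures.
        return False


def wait_for_port(host: str, port: int, deadline_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Poll host:port until it accepts a connection or deadline_s elapses.

    Assumption (hypothetical): an accepting port 24007 means glusterd is
    ready to serve volfiles.
    """
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if port_open(host, port):
            return True
        time.sleep(interval_s)
    return False
```

For example, `wait_for_port("koraga.int.example.com", 24007, deadline_s=60)` run early in boot would show whether glusterd only becomes reachable some time after the mount attempt.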

The status of the volume is:

Volume Name: public_uploads
Type: Distribute
Volume ID: 52aa6d85-f4ea-4c39-a2b3-d20d34ab5916
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: koraga.int.example.com:/var/lib/glusterfs/brick01/public_uploads
Options Reconfigured:
auth.allow: 127.0.0.1,192.168.134.227
client.ssl: off
server.ssl: off
nfs.disable: on

If I run mount -a after booting up, the volume is mounted correctly:

koraga.int.example.com:/public_uploads on /var/www/shared/public/uploads type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
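Since the same mount succeeds once the system is fully up, a crude stopgap (not a fix for the underlying ordering problem) is to retry the mount a few times late in boot. This is a minimal Python sketch; `run_with_retries` and its defaults are hypothetical, and calling it from a late boot script such as /etc/rc.local is an assumption, not something the post tested.

```python
import subprocess
import time


def run_with_retries(cmd: list[str], attempts: int = 5,
                     delay_s: float = 2.0) -> bool:
    """Run cmd until it exits with status 0 or attempts run out.

    Returns True on the first successful run, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Give glusterd more time to come up before the next try.
            time.sleep(delay_s)
    return False
```

Run as root, `run_with_retries(["mount", "/var/www/shared/public/uploads"])` would reuse the options already in /etc/fstab for that mount point.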

A couple of related log files show this:

/var/log/upstart/mounting-glusterfs-_var_www_shared_public_uploads.log:

start: Job failed to start

/var/log/upstart/wait-for-state-mounting-glusterfs-_var_www_shared_public_uploadsstatic-network-up.log:

status: Unknown job: static-network-up
start: Unknown job: static-network-up

but my test server shows exactly the same messages, so I don't think this is relevant.

Any ideas what's wrong now?

Update: I tried changing WAIT_FOR from static-network-up to networking. The mount still failed, but all the [fail] messages at boot disappeared. These are the contents of the log files under these conditions:

/var/log/glusterfs/var-www-shared-public-uploads.log contains:

wait-for-state stop/waiting

/var/log/upstart/wait-for-state-mounting-glusterfs-_var_www_shared_public_uploadsstatic-network-up.log contains:

start: Job is already running: networking

/var/log/glusterfs/var-www-shared-public-uploads.log contains:

[2014-07-11 17:19:38.000207] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.1 (/usr/sbin/glusterfs --volfile-server=koraga.int.example.com --volfile-id=/public_uploads /var/www/shared/public/uploads)
[2014-07-11 17:19:38.029421] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-07-11 17:19:38.029450] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-07-11 17:19:38.030288] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 192.168.134.227:24007 failed (Connection refused)
[2014-07-11 17:19:38.030331] E [glusterfsd-mgmt.c:1601:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: koraga.int.example.com (No data available)
[2014-07-11 17:19:38.030345] I [glusterfsd-mgmt.c:1607:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2014-07-11 17:19:38.030984] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23) [0x7fd9495b7f83] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x90) [0x7fd9495bbda0] (-->/usr/sbin/glusterfs(+0xcf13) [0x7fd949c7ff13]))) 0-: received signum (1), shutting down
[2014-07-11 17:19:38.031013] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/var/www/shared/public/uploads'.

Update 2: I also tried this in the upstart file:

start on (started glusterfs-server and mounting TYPE=glusterfs)

but then the machine failed to boot (I don't know why yet).

References:

http://serverfault.com/questions/611462/glusterfs-failing-to-mount-at-boot-with-ubuntu-14-04/