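On Linux systems, frequent read and write operations can easily lead to faults such as a filesystem going read-only or the disk becoming damaged. The approaches below show how to diagnose and fix them:
1. How to determine that a disk already has a fault
Method 1 (save the following as Repair.sh and run it):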
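#!/bin/sh
# Check the largest filesystem for recorded errors and repair it if needed.
cd /root || exit 1
# Device of the largest filesystem: skip the df header, sort by size, take the last row.
DiskFlag=`/bin/df -kP | /bin/awk 'NR>1 {print $1"\t"$2}' | /bin/sort -k 2 -n | /bin/awk 'END{print $1}'`
# Count occurrences of "clean with errors" in the superblock information.
num=`tune2fs -l "$DiskFlag" | grep -c "clean with errors"`
echo "$num"
if [ "$num" -lt 1 ]; then
    date >> RepairDisk.log
    echo "System Is OK ! " >> RepairDisk.log
    echo >> RepairDisk.log
    exit 0
else
    # printf is used because "echo -e" is not portable under /bin/sh.
    printf '\033[0;31;1m Repairing Operating System!\033[0m\n'
    date >> RepairDisk.log
    echo "Start Repairing Disk ! " >> RepairDisk.log
    # Repair. The original hardcoded /dev/sda6 here; the detected device is used instead.
    # Running fsck on a mounted filesystem can cause further damage.
    fsck.ext3 -y "$DiskFlag" >> RepairDisk.log
    echo "Repairing Disk End! " >> RepairDisk.log
    date >> RepairDisk.log
fi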
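==== The script above finds the largest partition and repairs it automatically when a fault is detected. This kind of repair is especially effective for logical (filesystem-level) damage.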
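The detection step can be isolated from the repair step. The following is a minimal sketch, assuming the `Filesystem state:` field printed by `tune2fs -l`; the helper names `fs_state` and `needs_repair` are hypothetical and not part of the original script.

```shell
# fs_state: read `tune2fs -l` output on stdin and print the value of the
# "Filesystem state:" field (for example "clean" or "clean with errors").
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# needs_repair: succeed (exit 0) only when the state mentions recorded errors.
needs_repair() {
    case "$1" in
        *"with errors"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with captured output instead of a real device.
state=$(printf 'Filesystem state:         clean with errors\n' | fs_state)
if needs_repair "$state"; then
    echo "repair needed: $state"
fi
```

On a real system the input would come from something like `tune2fs -l /dev/sda6 | fs_state`, run as root.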
Method 2 (check the mount information to determine whether a disk has gone read-only; when it has, its entry shows ro instead of rw):
cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0
/dev /dev tmpfs rw 0 0
/proc /proc proc rw 0 0
/sys /sys sysfs rw 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/sda2 /b ext3 rw,data=ordered 0 0
/dev/sda1 /boot ext3 rw,data=ordered 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
/etc/auto.misc /misc autofs rw,fd=7,pgrp=2664,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,fd=13,pgrp=2664,timeout=300,minproto=5,maxproto=5,indirect 0 0
/dev/sda6 /usr/share/TSMIS ext3 rw,data=ordered 0 0
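Scanning this listing by eye is error-prone, so the check can be scripted. The sketch below assumes the standard /proc/mounts layout (device, mount point, type, options); the function name `list_ro_mounts` is hypothetical.

```shell
# list_ro_mounts: print "device mountpoint" for every entry whose option
# list (4th field) contains the exact option "ro".
# Reads the file given as $1, defaulting to /proc/mounts; "-" means stdin.
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") { print $1, $2; break }
        }
    }' "${1:-/proc/mounts}"
}

# Example with two sample entries fed on stdin.
printf '%s\n' \
    '/dev/sda1 /boot ext3 rw,data=ordered 0 0' \
    '/dev/sda6 /usr/share/TSMIS ext3 ro,data=ordered 0 0' | list_ro_mounts -
```

Calling `list_ro_mounts` with no argument checks the live system. Splitting the option list avoids false matches on options that merely contain the letters "ro".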
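Method 3 (determine whether there is a hardware fault; the first two commands perform only a read-only test):
- # badblocks /dev/sda1 scan the disk for bad blocks at the physical level
- # badblocks -v /dev/sda1 same as above, with verbose output while running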
- Checking blocks 0 to 200781
- Checking for bad blocks (read-only test): done
- Pass completed, 0 bad blocks found.
To show progress as well:
- # badblocks -vsn /dev/sda1 check for bad blocks, non-destructive
- Checking for bad blocks in non-destructive read-write mode
- From block 0 to 200781
- Testing with random pattern: Pass completed, 0 bad blocks found.
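For unattended checks, the summary line can be parsed into a number. This is a minimal sketch that assumes the "Pass completed, N bad blocks found." wording shown in the transcripts above; the function name `bad_block_count` is hypothetical.

```shell
# bad_block_count: read badblocks -v output on stdin and print the number
# taken from the "Pass completed, N bad blocks found." summary line.
bad_block_count() {
    sed -n 's/^.*Pass completed, \([0-9][0-9]*\) bad blocks found.*$/\1/p'
}

# Example with captured output instead of a real scan.
count=$(printf '%s\n' \
    'Checking blocks 0 to 200781' \
    'Checking for bad blocks (read-only test): done' \
    'Pass completed, 0 bad blocks found.' | bad_block_count)
echo "bad blocks: $count"
```

On a real system this would be fed with something like `badblocks -v /dev/sda1 2>&1 | bad_block_count`, since badblocks writes its status messages to standard error.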
Method 4 (destructive test; it wipes all data on the disk):
Warning: this command erases all data on the disk partition.
- # badblocks -vsw /dev/sda1 check for bad blocks, destructive
- Checking for bad blocks in read-write mode
- From block 0 to 200781
- Testing with pattern 0xaa: done
- Reading and comparing: done
- Testing with pattern 0x55: done
- Reading and comparing: done
- Testing with pattern 0xff: done
- Reading and comparing: done
- Testing with pattern 0x00: done
- Reading and comparing: done
- Pass completed, 0 bad blocks found.
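Because the write test destroys data, it is worth confirming that the target partition is not mounted before starting. The following is a minimal sketch based on /proc/mounts; the function name `is_mounted` is hypothetical, and it only matches the device name exactly as listed (a root filesystem shown as /dev/root, as in the listing above, would not match its /dev/sdaN name).

```shell
# is_mounted: succeed (exit 0) when device $1 appears in the first column
# of the mounts file $2 (default /proc/mounts; "-" means stdin).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit found ? 0 : 1 }' \
        "${2:-/proc/mounts}"
}

# Example with a sample entry fed on stdin.
dev=/dev/sda1
if printf '/dev/sda1 /boot ext3 rw,data=ordered 0 0\n' | is_mounted "$dev" -; then
    echo "$dev is mounted; refusing destructive test"
else
    echo "$dev is not mounted"
fi
```

A wrapper script could call `is_mounted "$dev"` against the live /proc/mounts and run `badblocks -vsw` only when it fails.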
Method 5 (for an ext3 filesystem, fsck can be used to check it):
- # fsck -TVy /dev/sda1
- [/sbin/fsck.ext3 (1) -- /mnt/mymount] fsck.ext3 -y /dev/sda1
- e2fsck 1.39 (29-May-2006)
- Couldn't find ext2 superblock, trying backup blocks...
- Resize inode not valid. Recreate? yes
- mypart was not cleanly unmounted, check forced.
- Pass 1: Checking inodes, blocks, and sizes
- Pass 2: Checking directory structure
- Pass 3: Checking directory connectivity
- Pass 4: Checking reference counts
- Pass 5: Checking group summary information
- Free blocks count wrong for group #0 (3552, counted=3553).
- Fix? yes
- Free blocks count wrong (188777, counted=188778).
- Fix? yes
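When fsck is run from a script, its exit status says more than the log text: it is a bit mask in which 1 means errors were corrected, 2 a reboot is advised, 4 errors remain uncorrected, 8 an operational error, 16 a usage error, 32 canceled by the user, and 128 a shared-library error. The sketch below decodes it; the function name `describe_fsck_status` is hypothetical.

```shell
# describe_fsck_status: print one line per bit set in an fsck exit status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "filesystem errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "system should be rebooted"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "filesystem errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    if [ $((status & 16)) -ne 0 ]; then echo "usage or syntax error"; fi
    if [ $((status & 32)) -ne 0 ]; then echo "canceled by user request"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "shared-library error"; fi
    return 0
}

# Example: status 3 combines "corrected" (1) and "reboot advised" (2).
describe_fsck_status 3
```

In a repair script the argument would be `$?` captured immediately after the fsck call.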
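Solutions:
1. Tune the mount options, for example the journaling options and not updating file access times (for example noatime).
2. tune2fs -c 5 /dev/sda1 forces a filesystem check after every 5 mounts (in practice, after several reboots).
3. Disable the drive's write cache, especially in environments with unstable power: hdparm -W 0 /dev/sda6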
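To confirm that the `tune2fs -c` setting in item 2 took effect, the current and maximum mount counts can be read back from `tune2fs -l`. This is a minimal sketch that assumes the "Mount count:" and "Maximum mount count:" field names in that output; the function name `mount_counts` is hypothetical.

```shell
# mount_counts: read `tune2fs -l` output on stdin and print
# "<mount count> <maximum mount count>" on one line.
mount_counts() {
    awk -F: '
        /^Mount count:/         { cur = $2 }
        /^Maximum mount count:/ { max = $2 }
        END {
            gsub(/[ \t]/, "", cur)
            gsub(/[ \t]/, "", max)
            print cur, max
        }
    '
}

# Example with captured output instead of a real device.
printf '%s\n' \
    'Mount count:              3' \
    'Maximum mount count:      5' | mount_counts
```

On a real system the input would come from `tune2fs -l /dev/sda1 | mount_counts`; a check is forced once the first number reaches the second.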