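Symptom:
After KVM was installed on the FE node, the host gained a virtual NIC with an extra address (192.168.16.1, as the log below shows). On restart, FE could no longer open its metadata and failed with the following error: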
2023-02-15 09:53:22,082 WARN (UNKNOWN 10.172.128.77_9015_1667993591722(-1)|1) [BDBJEJournal.open():319] catch exception, retried: 0
com.sleepycat.je.EnvironmentFailureException: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.3.7) 10.172.128.77_9015_1667993591722(1):/opt/starRocks/fe/meta/bdb Feeder: 10.172.128.69_9015_1667997292760(3). Conflicting hostnames for replica id: 10.172.128.77_9015_1667993591722(1) Feeder thinks it is: 10.172.128.77 Replica is configured to use: 192.168.16.1 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:827) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:781) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.log.LogManager.getLogEntryHandleFileNotFound(LogManager.java:932) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.DiskOrderedScanner.fetchEntry(DiskOrderedScanner.java:2062) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.DiskOrderedScanner.fetchAndProcessBINs(DiskOrderedScanner.java:1634) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.DiskOrderedScanner.scanSerial(DiskOrderedScanner.java:783) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:703) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:2266) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.count(Database.java:1910) ~[je-7.3.7.jar:7.3.7]
at com.starrocks.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:225) ~[starrocks-fe.jar:?]
at com.starrocks.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:309) [starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.open(EditLog.java:841) [starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.initialize(Catalog.java:884) [starrocks-fe.jar:?]
at com.starrocks.StarRocksFE.start(StarRocksFE.java:111) [starrocks-fe.jar:?]
at com.starrocks.StarRocksFE.main(StarRocksFE.java:66) [starrocks-fe.jar:?]
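The key part of this error is the pair of addresses in the "Conflicting hostnames" message. As a minimal sketch, the helper below pulls both addresses out of a log file. The function name extract_conflict is hypothetical, and it assumes the message sits on a single line in the log file, as BDB JE writes it.

```shell
# Hypothetical helper: print the address the feeder expects and the address
# the replica is configured with, taken from a "Conflicting hostnames" line.
# Assumes the whole message is on one line in the log file passed as $1.
extract_conflict() {
    sed -n -E 's/.*Feeder thinks it is: ([0-9.]+) Replica is configured to use: ([0-9.]+).*/feeder_view=\1 replica_config=\2/p' "$1"
}

# Usage (the log path is an assumption; adjust to your deployment):
# extract_conflict /opt/starRocks/fe/log/fe.log
```

For the log above this prints feeder_view=10.172.128.77 replica_config=192.168.16.1, which shows that the restarted FE picked the new virtual NIC address instead of the one the other FEs know it by.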
Solution:
1. Go to the FE configuration directory
cd /opt/starRocks/fe/conf
2. Open the FE configuration file
vim fe.conf
3. Set the IP of this host that FE should bind to
priority_networks = 10.172.128.77/32
4. Restart FE from the FE home directory (the parent of conf)
cd ..
./bin/start_fe.sh --daemon
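To double-check the edit from step 3, a small helper can print the value FE will read from fe.conf. This is only a sketch: get_priority_networks is a hypothetical name, and it assumes the plain key = value format used above, ignoring commented-out lines.

```shell
# Hypothetical helper: print the active priority_networks value from the
# fe.conf file passed as $1. Lines starting with '#' are ignored; if the key
# appears more than once, the last assignment is printed.
get_priority_networks() {
    grep -E '^[[:space:]]*priority_networks[[:space:]]*=' "$1" \
        | tail -n 1 \
        | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//'
}

# Usage (path is an assumption based on the log above):
# get_priority_networks /opt/starRocks/fe/conf/fe.conf
# ip -4 -o addr show    # list the host's IPv4 addresses for comparison
```

The printed value should cover exactly one of the host's addresses, here 10.172.128.77.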
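Cause: the host has more than one NIC IP, so at startup FE selected the wrong address (the new virtual NIC's). The failed FE must use the IP through which it communicates with the other FEs, and priority_networks pins that address.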
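priority_networks is written in CIDR notation, so which local address qualifies comes down to a prefix match. The sketch below illustrates that match in plain shell arithmetic. It is not StarRocks code, and the function names ip_to_int and ip_in_cidr are made up for this example.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    _old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$_old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if address $1 falls inside CIDR block $2.
ip_in_cidr() {
    _net=${2%/*}         # network part, e.g. 10.172.128.77
    _bits=${2#*/}        # prefix length, e.g. 32
    if [ "$_bits" -eq 0 ]; then
        _mask=0
    else
        _mask=$(( (0xFFFFFFFF << (32 - $_bits)) & 0xFFFFFFFF ))
    fi
    [ $(( $(ip_to_int "$1") & $_mask )) -eq $(( $(ip_to_int "$_net") & $_mask )) ]
}

# Usage with the addresses from this incident:
# ip_in_cidr 10.172.128.77 10.172.128.77/32 && echo "matches"
# ip_in_cidr 192.168.16.1  10.172.128.77/32 || echo "does not match"
```

With a /32 entry only 10.172.128.77 matches, so the KVM address 192.168.16.1 can no longer be selected.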