After rebooting the machine, I could not reconnect to it via ssh as usual. I then checked the machine physically and found the reason:
LNetError: 131-3: Received notification of device removal
Please shutdown LNET to allow this to proceed
INFO: task reboot:30396 blocked for more than 120 seconds.

The key hint is "Please shutdown LNET to allow this to proceed". It seems that LNET (part of the Lustre service) has to be shut down before the reboot command is executed, so I ran the commands below to stop the Lustre service:
# umount /lustre
# lustre_rmmod
Now, everything is OK.
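To avoid forgetting this step, the two commands above can be wrapped in a small helper. The following is only a minimal sketch: the function name `stop_lustre` and the `DRY_RUN` variable are made up for illustration, and `/lustre` is simply the mount point used in this post. It assumes it is run as root on a client where `lustre_rmmod` is installed.

```shell
#!/bin/sh
# Sketch only: stop Lustre before a reboot.
# stop_lustre and DRY_RUN are hypothetical names, not part of Lustre.

stop_lustre() {
    mnt="${1:-/lustre}"   # mount point; defaults to the one used in this post

    # With DRY_RUN=1 the commands are printed instead of executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    else
        run=""
    fi

    # Unmount the Lustre file system first, then unload the Lustre/LNet modules.
    $run umount "$mnt" || return 1
    $run lustre_rmmod || return 1
}
```

Setting `DRY_RUN=1` before calling `stop_lustre /lustre` prints the two commands without running them; a real run would be something like `stop_lustre /lustre && reboot`, so the reboot is only issued if both steps succeed.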