Issues compiling PyTorch on Jetson TX1

Date: 2021-10-28 05:12:27

1. c++: internal compiler error: Killed (program cc1plus)

Reason: the compiler process was killed because the system ran out of memory. Add a swap file before building.
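
The check below is a minimal sketch, not part of the original notes. It reads /proc/meminfo to see how much RAM plus swap the board has and, if the total is under a threshold, prints the usual Linux commands for creating and enabling a swap file. The 8 GiB threshold, the /swapfile path, the 8G size, and the helper names are all assumptions chosen for illustration; adjust them for your board and storage.

```python
from pathlib import Path

# Assumed values for illustration only; tune them for your board.
MIN_TOTAL_KIB = 8 * 1024 * 1024   # want RAM + swap of at least 8 GiB
SWAPFILE = "/swapfile"            # hypothetical location for the swap file
SWAP_SIZE = "8G"                  # hypothetical swap size


def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields, with values in KiB."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def needs_swap(fields, min_total_kib=MIN_TOTAL_KIB):
    """True if RAM plus existing swap is below the chosen threshold."""
    total = fields.get("MemTotal", 0) + fields.get("SwapTotal", 0)
    return total < min_total_kib


def swapfile_commands(path=SWAPFILE, size=SWAP_SIZE):
    """Standard commands to create and enable a swap file."""
    return [
        "sudo fallocate -l {} {}".format(size, path),
        "sudo chmod 600 {}".format(path),
        "sudo mkswap {}".format(path),
        "sudo swapon {}".format(path),
    ]


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        if needs_swap(parse_meminfo(meminfo.read_text())):
            print("RAM + swap looks low for this build; consider running:")
            print("\n".join(swapfile_commands()))
        else:
            print("RAM + swap is above the threshold.")
```

The script only prints the commands; it does not run them, so nothing on the system changes until you execute them yourself.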

2. NCCL issues

/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllReduce'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclGetErrorString'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclGroupEnd'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclGroupStart'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclBcast'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclCommDestroy'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduceScatter'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclCommInitAll'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllGather'
/home/ubuntu/Project/pytorch/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduce'
collect2: error: ld returned exit status
caffe2/CMakeFiles/utility_ops_gpu_test.dir/build.make:: recipe for target 'bin/utility_ops_gpu_test' failed
make[]: *** [bin/utility_ops_gpu_test] Error
CMakeFiles/Makefile2:: recipe for target 'caffe2/CMakeFiles/utility_ops_gpu_test.dir/all' failed
make[]: *** [caffe2/CMakeFiles/utility_ops_gpu_test.dir/all] Error
make[]: *** Waiting for unfinished jobs....
The dependency target "nccl_external" of target "gloo_cuda" does not exist.
Call Stack (most recent call first):
CMakeLists.txt: (include)
This warning is for project developers. Use -Wno-dev to suppress it.

Solution: https://devtalk.nvidia.com/default/topic/1042821/jetson-tx2/pytorch-install-with-python3-broken/post/5291480/#5291480

CMakeLists.txt: change the NCCL option to 'Off'.
setup.py: add USE_NCCL = False after the environment-parameters header, as in this excerpt:

################################################################################
# Parameters parsed from environment
################################################################################
USE_NCCL = False
VERBOSE_SCRIPT = True
RUN_BUILD_DEPS = True
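
As an alternative to editing setup.py, later PyTorch releases read USE_NCCL from the environment. Whether the checkout used in these notes honors that variable is an assumption, so treat the following as a sketch of the general pattern rather than a confirmed fix. The helper names build_env and build_pytorch are hypothetical.

```python
import os
import subprocess
import sys


def build_env(base=None):
    """Copy the environment and switch NCCL off for the build."""
    env = dict(os.environ if base is None else base)
    # Assumption: this PyTorch version reads USE_NCCL from the environment.
    env["USE_NCCL"] = "0"
    return env


def build_pytorch(source_dir):
    """Run `python setup.py install` in source_dir with NCCL disabled."""
    subprocess.run(
        [sys.executable, "setup.py", "install"],
        cwd=source_dir,
        env=build_env(),
        check=True,
    )


# Example call, using the checkout path that appears in the log above:
# build_pytorch("/home/ubuntu/Project/pytorch")
```

Setting the variable per build leaves the source tree unmodified, which makes it easier to update the checkout later.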

3. package is in a very bad inconsistent state

Solution: reinstall the affected package.

sudo apt-get -f --reinstall install <your package>
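
If it is not obvious which package to reinstall, the sketch below lists packages that dpkg has flagged as needing reinstallation. It relies on the `${db:Status-Abbrev}` field of dpkg-query, whose third character is `R` when the reinstall-required flag is set. The helper names are hypothetical, and this is an illustration rather than part of the original fix.

```python
import subprocess


def broken_packages(status_text):
    """Return package names whose dpkg error flag is 'R' (reinstall required)."""
    broken = []
    for line in status_text.splitlines():
        # Each line is the 3-character status abbreviation, a space, then the name.
        abbrev, name = line[:3], line[4:].strip()
        if len(abbrev) == 3 and abbrev[2] == "R" and name:
            broken.append(name)
    return broken


def query_dpkg():
    """Ask dpkg-query for every package's status abbreviation and name."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${db:Status-Abbrev} ${Package}\\n"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Example: print(broken_packages(query_dpkg()))
```

Each name it returns can be substituted for <your package> in the command above.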