Author: 李繼武
1
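Purpose of this document
CDSW has supported GPUs since version 1.1.0; see Fayson's earlier article:
How to use GPUs for deep learning in CDSW
The latest CDSW GPU support page lists the corresponding NVIDIA driver, CUDA, and TensorFlow versions, as shown below:
Note that the CUDA version listed there is 9.2, while the officially released prebuilt TensorFlow binaries are still built against CUDA 9.0. To run TensorFlow on GPUs in the CDSW environment, which requires CUDA 9.2, TensorFlow has to be compiled from source by hand.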
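Because the whole build depends on the installed toolkit really being CUDA 9.2, it can help to check that mechanically once the toolkit is installed. The following is a minimal sketch, assuming the usual `release X.Y` wording in the output of `nvcc -V`; the function names `cuda_release` and `check_cuda_release` are hypothetical helpers, not part of CUDA.

```shell
# Extract the "major.minor" CUDA release from `nvcc -V` style text on stdin.
# Assumes a line such as: "Cuda compilation tools, release 9.2, V9.2.148".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Compare the release found on stdin with the wanted one ($1).
check_cuda_release() {
    wanted="$1"
    found="$(cuda_release)"
    if [ "$found" = "$wanted" ]; then
        echo "OK: CUDA $found"
    else
        echo "MISMATCH: found '${found}', wanted '${wanted}'"
        return 1
    fi
}
```

Typical use would be `nvcc -V | check_cuda_release 9.2`.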
This document walks through building TensorFlow 1.12 and TensorFlow 1.8 as examples, with CUDA 9.2 and cuDNN 7.2.1.
2
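Install the packages and environment needed for the build
This part is identical for both versions.
1. Add JDK 1.8 to the environment variables
2. Run the following commands to install the dependency packages
yum -y install epel-release
yum -y install numpy
yum -y install python-devel
yum -y install python-pip
yum -y install python-wheel
yum -y install gcc-c++
pip install --upgrade pip enum34
pip install keras --user
pip install mock
If a package is not available during installation, download it from the address below and set up a local yum repository: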
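3. Download and install CUDA 9.2
Download the CUDA 9.2 installer from the address below:
;target_arch=x86_64&target_distro=RHEL&target_version=7&target_type=runfilelocal
Select the runfile (local) version:
Upload it to the server:
Make the file executable and run it:
chmod +x cuda_9.2.148_396.37_linux.run
./cuda_9.2.148_396.37_linux.run
Add CUDA to the environment variables: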
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Running the following commands should print the CUDA version:
source /etc/profile
nvcc -V
4. Download and install cuDNN v7.2.1
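Download cuDNN v7.2.1 from the address below (registration is required before downloading):
Upload it to the CUDA installation directory /usr/local/cuda on the server and extract it there:
tar -zxvf cudnn-9.2-linux-x64-v7.2.1.38.tgz
In that directory, run the following commands to add cuDNN to the CUDA libraries:
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Go to the lib64 directory and create a symbolic link:
cd /usr/local/cuda/lib64
ln -s stub libcuda.
3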
Install the build tool bazel
Different TensorFlow versions need different bazel versions to build; a bazel release that is too new sometimes fails with errors.
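The pairing used in this document (bazel 0.19.2 for TensorFlow 1.12, bazel 0.13.0 for TensorFlow 1.8) can be written down as a small lookup so that a build script fails early on an untested combination. This is only a sketch: `bazel_version_for` is a hypothetical helper, and it knows only the two pairs named here.

```shell
# Print the bazel version this document uses for a given TensorFlow version.
# Only the two combinations covered in this document are known.
bazel_version_for() {
    case "$1" in
        1.12|1.12.*) echo "0.19.2" ;;
        1.8|1.8.*)   echo "0.13.0" ;;
        *)
            echo "no bazel version recorded for TensorFlow $1" >&2
            return 1
            ;;
    esac
}
```

For example, `bazel_version_for 1.8.0` prints `0.13.0`, while an unlisted version returns a non-zero status.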
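A. TensorFlow 1.12 uses bazel 0.19.2:
1. Download bazel-0.19.2:
wget
2. Make it executable and run it:
chmod +x bazel-0.19.2-in
./bazel-0.19.2-in --user
The --user flag installs Bazel into the $HOME/bin directory and sets the .bazelrc path to $HOME/.bazelrc. Run the installer with --help to see the other installation options.
The following prompt indicates a successful installation:
Because the installer was run with the --user flag above, the Bazel executable is installed in the $HOME/bin directory. It is a good idea to add this directory to the default path, as follows:
export PATH=$HOME/bin:$PATH
B. TensorFlow 1.8 uses bazel 0.13.0:
1. Download bazel-0.13.0
wget https://github.com/bazelbuild/bazel/releases/download/0.13.0/bazel-0.13.0-in
The remaining steps are the same as for installing bazel-0.19.2 above.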
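Both Bazel installs end by putting $HOME/bin on the PATH. If that export is added to a profile script that gets sourced repeatedly, the same directory piles up in PATH. A minimal sketch of an idempotent variant follows; `path_prepend` is a hypothetical helper name.

```shell
# Print the PATH-style list in $2 with directory $1 prepended,
# unless $1 is already one of its entries.
path_prepend() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}
```

It could be used as `PATH="$(path_prepend "$HOME/bin" "$PATH")"; export PATH`.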
4
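Download the TensorFlow source code
A. Download the latest TensorFlow:
git clone --recurse-submodules
This command creates a tensorflow directory under the current directory and downloads the latest TensorFlow source code into it:
At the time of writing, the latest TensorFlow version was 1.12.
B. Download TensorFlow 1.8:
wget /archive/v1.8.0.tar.gz
Extract it into the current directory:
tar -zxvf v1.8.0.tar.gz
5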
Configure tensorflow
The configuration differs slightly between the two versions.
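The answers given interactively in this section can, as far as I know, also be supplied through environment variables that TensorFlow's configure script reads before prompting. The sketch below assumes the variable names used by configure.py in the 1.x releases; verify them against the configure.py in your own checkout. The function name `set_tf_configure_env` is hypothetical, and the values simply repeat the choices made in this document.

```shell
# Export the choices made in this document so that ./configure can take
# them from the environment. Variable names are assumed from TensorFlow
# 1.x configure.py; prompts without a matching variable are still asked.
set_tf_configure_env() {
    export PYTHON_BIN_PATH=/usr/bin/python
    export TF_NEED_CUDA=1
    export TF_CUDA_VERSION=9.2
    export CUDA_TOOLKIT_PATH=/usr/local/cuda
    export TF_CUDNN_VERSION=7.2.1
    export CUDNN_INSTALL_PATH=/usr/local/cuda
    export TF_NEED_TENSORRT=0
    export TF_CUDA_CLANG=0
    export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
    export TF_NEED_MPI=0
    export TF_ENABLE_XLA=0
    export TF_NEED_OPENCL_SYCL=0
    export TF_SET_ANDROID_WORKSPACE=0
}
```

Calling `set_tf_configure_env` before `./configure` in the source directory should reduce the number of questions asked; the interactive sessions in this section remain the reference for what each answer means.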
A. TensorFlow 1.12
Go to the TensorFlow 1.12 source directory, run ./configure, and answer the prompts:
[root@cdh4 tensorflow]# ./configure
Extracting Bazel installation...
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: cc8b0ee2-5e84-4995-ba12-2c922ee3646b
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
  /usr/lib
  /usr/lib64
Please input the desired Python library path to use. Default is [/usr/lib]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 9.2
Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.2.1
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the locally installed NCCL version you want to use. [Default is to use ]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: .
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]:
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
  --config=mkl # Build with MKL support.
  --config=monolithic # Config for mostly static monolithic build.
  --config=gdr # Build with GDR support.
  --config=verbs # Build with libverbs support.
  --config=ngraph # Build with Intel nGraph support.
  --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
  --config=noaws # Disable AWS S3 filesystem support.
  --config=nogcp # Disable GCP support.
  --config=nohdfs # Disable HDFS support.
  --config=noignite # Disable Apacha Ignite support.
  --config=nokafka # Disable Apache Kafka support.
  --config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
B. TensorFlow 1.8
Go to the TensorFlow 1.8 source directory, run ./configure, and answer the prompts:
[root@cdh2 ]# ./configure
WARNING: Running Bazel server needs to be killed, because the startup options are different.
You have bazel 0.13.0 installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
  /usr/lib
  /usr/lib64
Please input the desired Python library path to use. Default is [/usr/lib]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2
Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.2.1
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: .
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tool for more details.
  --config=mkl # Build with MKL support.
  --config=monolithic # Config for mostly static monolithic build.
Configuration finished
6
Compile tensorflow
Both versions are built with the command below:
bazel build --config=opt --config=cuda --config=monolithic //tensorflow/tools/pip_package:build_pip_package
Note: this command must be run from the TensorFlow source directory.
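Since the build only works from the source directory, a script can check for that before starting a long build. The sketch below assumes that the root of a TensorFlow checkout contains a WORKSPACE file, the configure script, and a tensorflow subdirectory; `is_tf_source_root` is a hypothetical helper name.

```shell
# Return 0 if directory $1 (default: current directory) looks like the
# root of a TensorFlow source checkout.
is_tf_source_root() {
    dir="${1:-.}"
    [ -f "$dir/WORKSPACE" ] && [ -f "$dir/configure" ] && [ -d "$dir/tensorflow" ]
}
```

A build script might then start with `is_tf_source_root || { echo "run this from the TensorFlow source directory" >&2; exit 1; }` before invoking the bazel build command.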
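The build starts:
Wait for the build to finish; it takes quite a while. The prompt below indicates that the build succeeded.
After the build finishes, run the following command:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
When it completes, the built TensorFlow package can be found in the /tmp/tensorflow_pkg directory: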
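The exact wheel file name depends on the TensorFlow and Python versions, so a script may prefer to pick up whatever wheel was written most recently instead of hard-coding the name. A minimal sketch, with `newest_wheel` as a hypothetical helper name (it relies on the `-nt` file test, which bash and dash support):

```shell
# Print the most recently modified .whl file in directory $1.
# Returns non-zero and prints nothing if the directory has no wheel.
newest_wheel() {
    newest=""
    for f in "$1"/*.whl; do
        [ -e "$f" ] || continue          # glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For the output directory used here, that would be `newest_wheel /tmp/tensorflow_pkg`.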
Note: running out of disk space or memory during the build causes it to fail. Insufficient memory may produce the error below, which can be resolved by setting up a swap area.
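Before starting the build, it is possible to check how much memory plus swap the machine has and decide whether a swap file is needed. The sketch below parses /proc/meminfo style text; the function names are hypothetical, and the required amount is left to the caller because this document does not state a figure.

```shell
# Read /proc/meminfo style text on stdin and print MemTotal + SwapTotal in MiB.
total_mem_and_swap_mib() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { print int(kb / 1024) }'
}

# Return 0 if memory plus swap (read from stdin) is below $1 MiB.
needs_more_swap() {
    have="$(total_mem_and_swap_mib)"
    [ "$have" -lt "$1" ]
}
```

On a Linux host this would be run as `total_mem_and_swap_mib < /proc/meminfo`, or `needs_more_swap 8192 < /proc/meminfo` with a threshold of your own choosing (8192 here is only an illustrative value).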
Set up the swap area:
sudo mkdir -p /var/cache/swap
sudo dd if=/dev/zero of=/var/cache/swap/swap0 bs=1M count=1024
sudo chmod 0600 /var/cache/swap/swap0
sudo mkswap /var/cache/swap/swap0
sudo swapon /var/cache/swap/swap0
When the build has finished, remove the swap area:
sudo swapoff /var/cache/swap/swap0
sudo rm -rf /var/cache/swap/swap0
7
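Verification
The verification below uses one of the two builds as the example:
1. Install the built TensorFlow package:
sudo pip install /tmp/tensorflow_pkg/-cp27-none-linux_x86_64.whl
2. After the installation succeeds, open the Python interactive interpreter, import tensorflow, and check its version and path:
Note: when testing, do not run import tensorflow from inside the tensorflow source directory; Python may pick up the package directory in the source tree instead of the installed package.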
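One way to avoid importing from the source tree by accident is to run the check from a neutral directory. The sketch below wraps any command in a subshell that first changes to `/`; `run_outside_source_tree` is a hypothetical helper name, and the Python one-liner in the usage note assumes the wheel has been installed.

```shell
# Run the given command from "/" in a subshell, so the caller's working
# directory is unchanged and the TensorFlow source tree is not on the
# default import path.
run_outside_source_tree() {
    ( cd / && "$@" )
}
```

The version and path check could then be run as `run_outside_source_tree python -c 'import tensorflow as tf; print(tf.__version__); print(tf.__path__)'`.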
Original article; reposting is welcome. When reposting, please credit the source: the WeChat public account Hadoop實操.