Fedora 35 and cuda: How to get GPU tf to work?

I would like to use GPU-accelerated tf/keras on Fedora 35. For this I installed cuda-11-5 from RPM Fusion and created a dedicated (ana)conda env ‘automl’ on a fresh Miniconda install.

CUDA seems to be up and running:

$ nvidia-smi 
Sun Nov  7 12:11:29 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 30%   35C    P8    15W / 160W |    904MiB /  5926MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4480      G   /usr/libexec/Xorg                 431MiB |
|    0   N/A  N/A      5292      G   /usr/bin/kwin_x11                  92MiB |
|    0   N/A  N/A      5349      G   /usr/bin/plasmashell              137MiB |
|    0   N/A  N/A      5786      G   /usr/bin/nextcloud                 11MiB |
|    0   N/A  N/A      5841      G   ...akonadi_archivemail_agent        2MiB |
|    0   N/A  N/A      5849      G   .../akonadi_mailfilter_agent        2MiB |
|    0   N/A  N/A      5853      G   ...n/akonadi_sendlater_agent        2MiB |
|    0   N/A  N/A      5854      G   ...nadi_unifiedmailbox_agent        2MiB |
|    0   N/A  N/A      5969      G   /usr/lib64/firefox/firefox        213MiB |
|    0   N/A  N/A      8017      G   /usr/lib64/firefox/firefox          1MiB |
+-----------------------------------------------------------------------------+

However:

$ conda activate automl
(automl) [tpasch@redsnapper automl-in-action-notebooks]$ python 
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2021-11-07 12:13:43.836762: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13378765430595217633
]
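As an aside, a quicker way to run the same check on a TF 2.x install (assuming tensorflow is importable in the active env) is:

```python
# Minimal GPU-visibility check; like device_lib.list_local_devices()
# but restricted to GPUs. An empty list means TF cannot see the GPU.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(gpus)
```

Here it prints an empty list, matching the CPU-only output above.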

Is there anyone here who is already using Fedora 35 with tf-gpu/cuda?

I don’t know if it will help, but try

conda install -c anaconda tensorflow-gpu==2.4.1

or Build from source  |  TensorFlow, which is the way that always works. It’s probably not a Fedora issue but a tensorflow dependency issue.

I’ve finally found the solution (it has nothing to do with a missing tensorflow-gpu or keras-gpu package)!

You do indeed need to build tf yourself, which I did by following the instructions at build from source for the docker method (though I used podman instead of docker). (This is exactly what @akza suggested.)

In this case, you use a docker image as the build environment:

mkdir docker-tensorflow
cd docker-tensorflow
podman run --gpus all -it -w /tensorflow_src  -v $PWD:/mnt:z -e HOST_PERMS="$(id -u):$(id -g)" tensorflow/tensorflow:devel-gpu bash

and then follow the instructions given in the link to do a GPU build on docker. Building tf took several hours on my machine. At the end you can quit the docker container, and:

$ ls 
tensorflow-2.8.0-cp38-cp38-linux_x86_64.whl
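For reference, the commands run inside the container roughly follow the build-from-source guide. This is a sketch, not a verified recipe — the exact flags and script paths can differ between TF versions:

```shell
# Inside the tensorflow/tensorflow:devel-gpu container (paths per that image).
./configure                                   # accept the CUDA-related defaults
bazel build --config=opt --config=cuda \
    //tensorflow/tools/pip_package:build_pip_package
# Write the wheel to /mnt (the host directory mounted by the podman command above)
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /mnt
chown "$HOST_PERMS" /mnt/tensorflow-*.whl     # give the wheel back to your host user
```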

You get a tensorflow wheel for Python 3.8 (because that is the version installed in the docker image). Hence:

conda create --name tf
conda activate tf
conda install python=3.8
pip install tensorflow-2.8.0-cp38-cp38-linux_x86_64.whl

will install tf in a newly created (ana)conda env ‘tf’.

Trying to use it, you will probably run into:

$ python 
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-11-07 16:22:21.912950: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-11-07 16:22:21.912969: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
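Before fixing it, you can confirm what the loader sees. This stdlib-only sketch simply scans each LD_LIBRARY_PATH entry for the library TensorFlow failed to dlopen:

```python
import glob
import os

# Collect the non-empty entries of LD_LIBRARY_PATH and look for libcudnn.so.8
search_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
hits = [p for d in search_dirs for p in glob.glob(os.path.join(d, "libcudnn.so.8*"))]
print(hits if hits else "libcudnn.so.8 not found on LD_LIBRARY_PATH")
```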

The solution is to install the latest version of cuDNN. I unpacked the distribution under /opt and:

export LD_LIBRARY_PATH=/opt/cuda/lib64:$LD_LIBRARY_PATH
$ python
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:57:06) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2021-11-07 17:40:55.694335: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-07 17:40:55.748089: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-07 17:40:56.212687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 3327 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13785827214071798721
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3489464320
locality {
  bus_id: 1
  links {
  }
}
incarnation: 11688225645642055078
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5"
xla_global_id: 416903419
]
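To avoid exporting LD_LIBRARY_PATH in every new shell, one option (assuming the same /opt/cuda location) is to store the variable on the conda env itself:

```shell
# conda re-exports variables set this way on every `conda activate tf`
conda activate tf
conda env config vars set LD_LIBRARY_PATH=/opt/cuda/lib64:$LD_LIBRARY_PATH
conda deactivate && conda activate tf   # re-activate so the variable takes effect
echo $LD_LIBRARY_PATH
```

`conda env config vars set` needs conda 4.8 or later; check with `conda --version`.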