NVIDIA Optimus: A tale of two graphics cards

This is a short summary of how I got the NVIDIA graphics card working on my Debian Testing installation. I've cooked up this post because it may help others (and a future me) while configuring Debian.

L’Entrée

Most Linux users are familiar with the hassle of getting NVIDIA GPUs to work with their distributions. I bought a laptop that happens to use a technology that is new, at least to me: NVIDIA Optimus. It pairs an integrated Intel graphics card with a dedicated NVIDIA GPU, each independent of the other, so graphics can be rendered selectively on either card according to performance needs. This optimizes the use of resources, which matters most on laptops like mine.

You thought having only an NVIDIA graphics card was troublesome? Unsurprisingly, making Optimus work is no exception.

Main course

First of all, check that your system is Optimus compatible by listing the graphics cards you have:

$ lspci | grep -i VGA
# Should output something like
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)

And then

$ lspci | grep -i 3D
# Should output something like
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1)

So, you have two graphics units: in my case, an Intel graphics card and the NVIDIA GPU.
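The two checks above can be folded into one small helper. This is only a sketch: has_optimus_pair is a hypothetical name, and it assumes the integrated card is listed as an Intel "VGA compatible controller" and the discrete one as an NVIDIA "3D controller", as in the output above. Other laptops may label the cards differently.

```shell
#!/bin/sh
# Sketch: decide from lspci output whether the machine looks like an
# Optimus laptop. has_optimus_pair is a hypothetical helper name; it
# reads lspci output on stdin and succeeds only if both cards are seen.
has_optimus_pair() {
    awk '
        /VGA compatible controller/ && /Intel/ { intel = 1 }
        /3D controller/ && /NVIDIA/             { nvidia = 1 }
        END { exit !(intel && nvidia) }
    '
}

# Usage on a real system:
#   lspci | has_optimus_pair && echo "hybrid Intel/NVIDIA graphics found"
```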

Spoiler alert: installing just the NVIDIA drivers might work, but it misses the point of having an Optimus system. As a fallback I've been running manually installed NVIDIA drivers on my laptop, and that is not the most energy-saving solution.

To check the GLX implementation we need to install mesa-utils, since it includes the glxinfo command:

$ glxinfo | grep OpenGL
# Should output something like
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.2.3
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 19.2.3
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.2.3
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:
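Of all those lines, the renderer string is the one that tells which GPU is doing the work. A minimal sketch for pulling it out follows; gl_renderer is a hypothetical helper name, and it assumes the "OpenGL renderer string:" label appears exactly as in the output above.

```shell
#!/bin/sh
# Sketch: print only the renderer name from glxinfo output read on stdin.
# gl_renderer is a hypothetical helper name, not part of mesa-utils.
gl_renderer() {
    sed -n 's/^OpenGL renderer string: //p'
}

# Usage on a real system:
#   glxinfo | gl_renderer
```

The same filter can be applied to any command that prints glxinfo-style output, which makes it easy to compare renderers later on.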

Now, let's install Bumblebee with the NVIDIA drivers. Bumblebee aims to provide support for NVIDIA Optimus laptops on GNU/Linux distributions.

sudo apt install nvidia-driver bumblebee-nvidia primus libgl1-nvidia-glx nvidia-smi

Now we can check that optirun hands execution over to the NVIDIA card:

optirun glxinfo | grep OpenGL

I won't tell you that things worked right away. The first time I ran this I got:

$ optirun glxinfo | grep OpenGL
[23794.793713] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver

[23794.793750] [ERROR]Aborting because fallback start is disabled.

Here's where the adventure begins, and I am really grateful to the people of the Arch Linux community (see the resources below).

Dessert

Here comes the tricky part. I'll be reproducing almost ad litteram what I found on the Arch Linux forum. There is not really much to elaborate on here.

To verify the performance, we need to install two additional things:

# apt install tlp powertop

And perform the following configurations

/etc/default/tlp

Add the GPU to TLP through the RUNTIME_PM_BLACKLIST variable in /etc/default/tlp. For that we need the PCI slot of the NVIDIA card (the 01:00.0 at the start of its lspci line):

RUNTIME_PM_BLACKLIST="01:00.0"
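Rather than copying the slot by hand, it can be read from lspci. The following is a sketch under the same assumption as before, namely that the NVIDIA card is listed as a "3D controller"; nvidia_slot is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the PCI slot (first field) of the first NVIDIA
# "3D controller" line found in lspci output read on stdin.
# nvidia_slot is a hypothetical helper name.
nvidia_slot() {
    awk '/3D controller/ && /NVIDIA/ { print $1; exit }'
}

# Usage on a real system (prints the line to put into /etc/default/tlp):
#   printf 'RUNTIME_PM_BLACKLIST="%s"\n' "$(lspci | nvidia_slot)"
```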

/etc/bumblebee/bumblebee.conf

Driver=nvidia

And in the [driver-nvidia] section:

PMMethod=none

/etc/tmpfiles.d/nvidia_pm.conf

Allow the GPU to power off on boot:

w /sys/bus/pci/devices/0000:01:00.0/power/control - - - - auto
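Once the rule above has been applied (after the next boot, for instance), the setting can be read back from sysfs. A small sketch follows; pci_power_control is a hypothetical helper name, and its second argument exists only so the function can be tried against a test directory instead of the real /sys tree.

```shell
#!/bin/sh
# Sketch: print the runtime power-management setting ("auto" or "on")
# of a PCI device. pci_power_control is a hypothetical helper name.
#   $1: PCI address including the domain, e.g. 0000:01:00.0
#   $2: devices directory (optional, defaults to /sys/bus/pci/devices)
pci_power_control() {
    dir="${2:-/sys/bus/pci/devices}"
    cat "$dir/$1/power/control"
}

# Usage on a real system:
#   pci_power_control 0000:01:00.0
```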

/etc/X11/xorg.conf.d/01-noautogpu.conf

Section "ServerFlags"
	Option "AutoAddGPU" "off"
EndSection

/etc/X11/xorg.conf.d/20-intel.conf

Section "Device"
 Identifier  "Intel Graphics"
 Driver      "modesetting"
EndSection

Create blacklist files

/etc/modprobe.d/blacklist.conf

blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
blacklist nv
blacklist nvidia
blacklist nvidia-drm
blacklist nvidia-modeset
blacklist nvidia-uvm
blacklist ipmi_msghandler
blacklist ipmi_devintf 

/etc/modprobe.d/disable-ipmi.conf

These modules are loaded together with nvidia and block its unloading. I do not need IPMI, so I simply disabled this functionality.

install ipmi_msghandler /usr/bin/false
install ipmi_devintf /usr/bin/false

/etc/modprobe.d/disable-nvidia.conf

install nvidia /bin/false

Create GPU management scripts

The GPU management scripts were created by tyrells; manipulation of the blacklist config was added to them.

Create the following two management scripts. Creating aliases for them is recommended.

enableGpu.sh

#!/bin/sh
# allow to load nvidia module
mv /etc/modprobe.d/disable-nvidia.conf /etc/modprobe.d/disable-nvidia.conf.disable

# remove NVIDIA card (currently in power/control = auto)
echo -n 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
sleep 1
# change PCIe power control
echo -n on > /sys/bus/pci/devices/0000\:00\:01.0/power/control
sleep 1
# rescan for NVIDIA card (defaults to power/control = on)
echo -n 1 > /sys/bus/pci/rescan

disableGpu.sh

#!/bin/sh
# unload the nvidia modules, dependent modules first
modprobe -r nvidia_drm
modprobe -r nvidia_uvm
modprobe -r nvidia_modeset
modprobe -r nvidia

# change NVIDIA card power control
echo -n auto > /sys/bus/pci/devices/0000\:01\:00.0/power/control
sleep 1
# change PCIe power control
echo -n auto > /sys/bus/pci/devices/0000\:00\:01.0/power/control
sleep 1

# lock system from loading nvidia module
mv /etc/modprobe.d/disable-nvidia.conf.disable /etc/modprobe.d/disable-nvidia.conf
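To see at a glance which state the two scripts left the system in, a small status helper can be handy. This is a sketch, not part of the original scripts: all three function names are hypothetical, and the optional arguments exist only so the functions can be pointed at test files instead of the live system.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original scripts).

# Succeeds if the nvidia kernel module is loaded.
#   $1: modules list (optional, defaults to /proc/modules)
nvidia_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}"
}

# Succeeds if the modprobe lock file used by the scripts is in place.
#   $1: modprobe configuration directory (optional, defaults to /etc/modprobe.d)
nvidia_locked() {
    [ -f "${1:-/etc/modprobe.d}/disable-nvidia.conf" ]
}

# Prints one line such as "module=unloaded lock=active".
gpu_state() {
    if nvidia_loaded "${1:-}"; then module=loaded; else module=unloaded; fi
    if nvidia_locked "${2:-}"; then lock=active; else lock=inactive; fi
    printf 'module=%s lock=%s\n' "$module" "$lock"
}

# Usage on a real system:
#   gpu_state
```

After disableGpu.sh one would expect the module to be unloaded and the lock active; after enableGpu.sh the lock should be inactive.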

Create a service that locks the GPU on shutdown

A service that locks the GPU on shutdown or restart is necessary for the case where the GPU was not disabled with the disableGpu.sh script. Otherwise, on the next boot nvidia will be loaded together with the ipmi modules (even though we have a blacklist with install commands for them) and it will not be possible to unload them.

/etc/systemd/system/disable-nvidia-on-shutdown.service

[Unit]
Description=Disables Nvidia GPU on OS shutdown

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash -c "mv /etc/modprobe.d/disable-nvidia.conf.disable /etc/modprobe.d/disable-nvidia.conf || true"

[Install]
WantedBy=multi-user.target

Enabling

Reload the systemd daemon and enable the service:

systemctl daemon-reload 
systemctl enable disable-nvidia-on-shutdown.service

Now, after rebooting the system, I ran:

# chmod 755 enableGpu.sh disableGpu.sh

and then tested by running:

$ nvidia-smi 
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Now, the moment of truth. Let's see how it goes after running:

# ./enableGpu.sh
$ nvidia-smi 
Tue Nov 12 17:48:28 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   56C    P0    N/A /  N/A |      0MiB /  4040MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |

Voilà! Now the GPU can be enabled or disabled as needed.

Resources

  1. Install Tensorflow with GPU support on Debian Sid
  2. [SOLVED] Dell XPS 9570 bbswitch not working, Nvdia won’t power off/on
  3. Error: Could not enable discrete graphics card
  4. Bumblebee