Multiple GPUs Not Showing Up in CUDA

We recently upgraded our server, adding six Tesla C1060 GPUs alongside our existing GeForce GTX 580. Once they were installed, however, we noticed that CUDA was not recognizing the new cards:

$ deviceQuery --noprompt | grep "^Device"
Device 0: "GeForce GTX 580"

However, the driver had detected all seven GPUs, and the device nodes had been set up:

$ nvidia-smi -a | grep "^GPU"
GPU 0000:08:00.0
GPU 0000:0A:00.0
GPU 0000:0D:00.0
GPU 0000:8B:00.0
GPU 0000:8D:00.0
GPU 0000:96:00.0
GPU 0000:98:00.0

$ sudo lspci | egrep "3D controller|VGA compatible"
08:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
0a:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
0d:00.0 VGA compatible controller: nVidia Corporation GF110 [GeForce GTX 580] (rev a1)
10:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)
8b:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
8d:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
96:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
98:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)

$ ls -lha /dev/nv*
crw-rw-rw- 1 root root 195, 0 2012-05-17 16:29 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 2012-05-17 16:29 /dev/nvidia1
crw-rw-rw- 1 root root 195, 2 2012-05-17 16:29 /dev/nvidia2
crw-rw-rw- 1 root root 195, 3 2012-05-17 16:29 /dev/nvidia3
crw-rw-rw- 1 root root 195, 4 2012-05-17 16:29 /dev/nvidia4
crw-rw-rw- 1 root root 195, 5 2012-05-17 16:29 /dev/nvidia5
crw-rw-rw- 1 root root 195, 6 2012-05-17 16:29 /dev/nvidia6
crw-rw-rw- 1 root root 195, 255 2012-05-17 16:29 /dev/nvidiactl
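
The rw-rw-rw- mode on these nodes matters, because an unprivileged user needs read and write access to them to run CUDA programs. As a rough sketch, a check like the following could flag any node that is missing those bits. The function name nodes_missing_666 is made up for illustration and is not part of any NVIDIA tool.

```shell
# Hypothetical helper: print each given path that is NOT readable and
# writable by user, group and other (that is, lacks any of the 666 bits).
nodes_missing_666() (
    # With no arguments there is nothing to check.
    [ "$#" -gt 0 ] || exit 0
    # -prune keeps find from descending into directories;
    # "! -perm -666" matches a path missing at least one of the rw bits.
    find "$@" -prune ! -perm -666 -print
)

# Example (not run here): nodes_missing_666 /dev/nvidia*
```

An empty result means every node passed the check, which is what we would expect from the listing above.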

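Many users on various forums had a similar problem, which they solved by setting 666 permissions on the /dev/nvidia* device nodes, but those were already correct in our setup. Luckily, we found the resolution at http://ambermd.org/gpus/ instead. The CUDA_VISIBLE_DEVICES environment variable was set to 0, so CUDA applications could see only device 0: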

$ echo $CUDA_VISIBLE_DEVICES
0

$ export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6"
$ deviceQuery --noprompt | grep "^Device"
Device 0: "GeForce GTX 580"
Device 1: "Tesla T10 Processor"
Device 2: "Tesla T10 Processor"
Device 3: "Tesla T10 Processor"
Device 4: "Tesla T10 Processor"
Device 5: "Tesla T10 Processor"
Device 6: "Tesla T10 Processor"
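
Rather than hard-coding the list, a script could derive it from the device nodes. This is only a sketch: it assumes one /dev/nvidiaN node per GPU, as in the listing earlier, and the helper names cuda_device_list and count_nvidia_nodes are hypothetical.

```shell
# Hypothetical helper: build "0,1,...,N-1" for N devices.
cuda_device_list() (
    count=$1
    list=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Add a comma separator before every index except the first.
        list="${list}${list:+,}${i}"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
)

# Hypothetical helper: count per-GPU nodes (nvidia0, nvidia1, ...) in a
# directory, skipping nvidiactl. Assumes one node per GPU.
count_nvidia_nodes() (
    dir=${1:-/dev}
    n=0
    for node in "$dir"/nvidia[0-9]*; do
        # An unmatched glob stays literal, so confirm the path exists.
        if [ -e "$node" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
)

# Only export when at least one node exists: an empty value would hide
# every GPU from CUDA.
n=$(count_nvidia_nodes)
if [ "$n" -gt 0 ]; then
    export CUDA_VISIBLE_DEVICES="$(cuda_device_list "$n")"
fi
```

Running `unset CUDA_VISIBLE_DEVICES` should have the same effect on a machine like ours, since CUDA exposes all devices when the variable is not set.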

We never did find where this variable was being set, but now we know to include the export in our scripts.
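
One way to hunt for the culprit is to search the usual shell startup files for the variable name. The sketch below assumes bash-style startup files in common locations, which may not match every distribution, and find_cuda_visible_devices is a hypothetical helper name.

```shell
# Hypothetical helper: list every line that mentions CUDA_VISIBLE_DEVICES
# in the given files or directories.
find_cuda_visible_devices() (
    # With no arguments, fall back to common startup files (an assumption;
    # adjust the list for your distribution and shell).
    if [ "$#" -eq 0 ]; then
        set -- /etc/profile /etc/profile.d /etc/environment \
               /etc/bash.bashrc "$HOME/.bashrc" "$HOME/.bash_profile" \
               "$HOME/.profile"
    fi
    # -r: recurse into directories, -s: stay quiet about missing paths,
    # -n: print line numbers, -H: always print the file name.
    grep -rsnH "CUDA_VISIBLE_DEVICES" "$@"
)

# Example (not run here): find_cuda_visible_devices
```

If nothing turns up, the variable may be coming from somewhere this search does not cover, such as a job scheduler or a wrapper script.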
