NVIDIA’s latest Tesla accelerator is without a doubt the most powerful GPU available. With almost 3,000 CUDA cores and 12GB GDDR5 memory, it wins in practically every* performance test you’ll see. As with the “Kepler” K20 GPUs, the Tesla K40 supports NVIDIA’s latest SMX, Dynamic Parallelism and Hyper-Q capabilities (CUDA compute capability 3.5). It also introduces professional-level GPU Boost capability to squeeze every bit of performance your code can pull from the GPU’s 235W power envelope.
Maximum GPU Memory and Compute Performance: Tesla K40 GPU Accelerator
Integrated in Microway NumberSmasher GPU Servers and GPU Clusters
Specifications
- 2880 CUDA GPU cores (GK110b)
- 4.29 TFLOPS single-precision; 1.43 TFLOPS double-precision (at the 745 MHz base clock)
- 12GB GDDR5 memory
- Memory bandwidth up to 288 GB/s
- PCI-E x16 Gen3 interface to system
- GPU Boost for increased clock speeds
- Supports Dynamic Parallelism and Hyper-Q features
- Active and Passive heatsinks available for installation in workstations and specially-designed GPU servers
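The floating-point figures in the list above follow from the core count and the clock speed. As a rough sketch, assuming each CUDA core completes one fused multiply-add (2 floating-point operations) per cycle and that GK110 runs double precision at one third of the single-precision rate, the theoretical peaks at the 745 MHz base clock and the 810 MHz and 875 MHz GPU Boost clocks can be estimated as follows. The function name `peak_tflops` is illustrative only.

```python
# Theoretical peak throughput of a Tesla K40, derived from the spec list.
# Assumptions (not stated in the article): 2 FLOPs per core per cycle (FMA)
# and a 1:3 double-to-single precision ratio for GK110.

CUDA_CORES = 2880
FLOPS_PER_CORE_PER_CYCLE = 2   # one fused multiply-add
DP_TO_SP_RATIO = 1 / 3         # assumed GK110 ratio


def peak_tflops(clock_mhz, cores=CUDA_CORES):
    """Return (single, double) precision peak TFLOPS at the given clock."""
    single = cores * FLOPS_PER_CORE_PER_CYCLE * clock_mhz * 1e6 / 1e12
    return single, single * DP_TO_SP_RATIO


if __name__ == "__main__":
    for clock in (745, 810, 875):
        sp, dp = peak_tflops(clock)
        print(f"{clock} MHz: {sp:.2f} TFLOPS single, {dp:.2f} TFLOPS double")
```

Under these assumptions the 745 MHz result is consistent with the single- and double-precision figures in the specifications, and the higher GPU Boost clocks scale the peak proportionally.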
The new GPU also leverages PCI-E 3.0 to achieve over 10 gigabytes per second transfers between the host (CPUs) and the devices (GPUs):
[root@node3 tests]# ./gpu_bandwidthTest --memory=pinned --device=0
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Tesla K40m
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432                 10038.7

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432                 10046.7

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432                 202665.0

Result = PASS
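To put the measured transfer rates in context, they can be compared with the theoretical limit of a PCI-E 3.0 x16 link. The sketch below assumes 8 GT/s per lane with 128b/130b line encoding, ignores packet-level protocol overhead, and assumes that the "MB/s" reported by the test means 10**6 bytes per second (the tool may instead use 2**20). The function names are illustrative.

```python
# Compare the measured pinned-memory transfer rates with the theoretical
# PCI-E 3.0 x16 limit. Assumptions: 8 GT/s per lane with 128b/130b encoding,
# and that bandwidthTest's "MB/s" means 10**6 bytes per second.

GT_PER_SEC_PER_LANE = 8e9          # PCI-E 3.0 raw signalling rate
ENCODING_EFFICIENCY = 128 / 130    # 128b/130b line encoding
LANES = 16


def pcie3_peak_bytes_per_sec(lanes=LANES):
    """One-direction link rate in bytes/s, before packet header overhead."""
    return GT_PER_SEC_PER_LANE * ENCODING_EFFICIENCY * lanes / 8


def link_efficiency(measured_mb_per_sec, lanes=LANES):
    """Fraction of the theoretical link rate achieved by a measured transfer."""
    return measured_mb_per_sec * 1e6 / pcie3_peak_bytes_per_sec(lanes)


if __name__ == "__main__":
    peak_gb = pcie3_peak_bytes_per_sec() / 1e9
    print(f"Theoretical x16 Gen3 peak: {peak_gb:.2f} GB/s")
    for label, mb_s in (("Host to Device", 10038.7),
                        ("Device to Host", 10046.7)):
        print(f"{label}: {link_efficiency(mb_s):.1%} of peak")
```

With these assumptions the theoretical peak is a little under 16 GB/s per direction, so the measured transfers reach roughly two thirds of the link's raw capacity.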
Technical Details
Here is the full list of capabilities reported by NVIDIA’s SMI tool. Memory error detection and correction (ECC) is supported on all components of the Tesla GPU. Notice that GPU Boost allows the top CUDA core clock frequency to be set to 745 MHz, 810 MHz or 875 MHz:
[root@node3 ~]# nvidia-smi -a -i 0

==============NVSMI LOG==============

Timestamp : Mon Nov 11 21:42:13 2013
Driver Version : 325.15

Attached GPUs : 3
GPU 0000:02:00.0
    Product Name : Tesla K40m
    Display Mode : Disabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 128
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 032391304xxxx
    GPU UUID : GPU-3964f3ae-5ee0-2afc-5d93-9f1edd2axxxx
    VBIOS Version : 80.80.24.00.06
    Inforom Version
        Image Version : 2081.0202.01.04
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    PCI
        Bus : 0x02
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102310DE
        Bus Id : 0000:02:00.0
        Sub System Id : 0x097E10DE
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : N/A
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 11519 MB
        Used : 69 MB
        Free : 11450 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Retired Pages
        Single Bit ECC : 0
        Double Bit ECC : 0
        Pending : No
    Temperature
        Gpu : 26 C
    Power Readings
        Power Management : Supported
        Power Draw : 19.49 W
        Power Limit : 235.00 W
        Default Power Limit : 235.00 W
        Enforced Power Limit : 235.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 235.00 W
    Clocks
        Graphics : 324 MHz
        SM : 324 MHz
        Memory : 324 MHz
    Applications Clocks
        Graphics : 745 MHz
        Memory : 3004 MHz
    Default Applications Clocks
        Graphics : 745 MHz
        Memory : 3004 MHz
    Max Clocks
        Graphics : 875 MHz
        SM : 875 MHz
        Memory : 3004 MHz
    Compute Processes : None
[root@node3 ~]# nvidia-smi -q -d SUPPORTED_CLOCKS -i 0

==============NVSMI LOG==============

Timestamp : Mon Nov 11 21:42:45 2013
Driver Version : 325.15

Attached GPUs : 3
GPU 0000:02:00.0
    Supported Clocks
        Memory : 3004 MHz
            Graphics : 875 MHz
            Graphics : 810 MHz
            Graphics : 745 MHz
            Graphics : 666 MHz
        Memory : 324 MHz
            Graphics : 324 MHz
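The supported clock pairs listed above are the values that can be requested as application clocks. As a hedged sketch, the helper below (a hypothetical `application_clocks_command`, not part of any NVIDIA tool) checks a requested memory/graphics pair against that list and builds the matching `nvidia-smi` invocation using its `-ac` (applications clocks) option. It only constructs the argument list. Actually running the command typically requires root privileges and a driver that supports application clocks.

```python
# Build an nvidia-smi command that selects one of the supported clock pairs.
# The table is copied from the SUPPORTED_CLOCKS output above; the helper
# name and structure are illustrative assumptions.

SUPPORTED_CLOCKS = {
    3004: (875, 810, 745, 666),  # memory MHz -> allowed graphics MHz
    324: (324,),
}


def application_clocks_command(gpu_index, memory_mhz, graphics_mhz):
    """Return the argv list for `nvidia-smi -i <gpu> -ac <mem>,<graphics>`."""
    allowed = SUPPORTED_CLOCKS.get(memory_mhz)
    if allowed is None or graphics_mhz not in allowed:
        raise ValueError(
            f"unsupported clock pair: {memory_mhz},{graphics_mhz} MHz"
        )
    return ["nvidia-smi", "-i", str(gpu_index),
            "-ac", f"{memory_mhz},{graphics_mhz}"]


if __name__ == "__main__":
    # Request the highest GPU Boost clock on GPU 0.
    print(" ".join(application_clocks_command(0, 3004, 875)))
```

The returned list can be passed to `subprocess.run` on a system where the change is permitted. Validating against the reported table first avoids requesting a pair the board does not offer.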
NVIDIA deviceQuery on Tesla K40
The output below, from the CUDA 5.5 SDK samples, shows additional details of the architecture and capabilities of the Tesla K40 GPU accelerators.
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla K40m"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 12288 MBytes (12884705280 bytes)
  (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
  GPU Clock rate:                                876 MHz (0.88 GHz)
  Memory Clock rate:                             3004 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Tesla K40m
Result = PASS
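Several headline specifications can be cross-checked against the deviceQuery fields above. The sketch below multiplies out the multiprocessor and core counts and estimates peak memory bandwidth from the memory clock and bus width. It assumes the reported 3004 MHz GDDR5 memory clock moves data twice per clock (hence the factor of 2). The constant and function names are illustrative.

```python
# Derive headline specs from the deviceQuery fields above.
# Assumption: the 3004 MHz memory clock transfers data twice per clock
# (GDDR5 reported at half its effective data rate), hence the factor of 2.

MULTIPROCESSORS = 15
CORES_PER_MP = 192
MEMORY_CLOCK_MHZ = 3004
MEMORY_BUS_WIDTH_BITS = 384
TRANSFERS_PER_CLOCK = 2  # assumed double data rate


def total_cuda_cores(mps=MULTIPROCESSORS, cores_per_mp=CORES_PER_MP):
    """Total CUDA cores across all multiprocessors."""
    return mps * cores_per_mp


def peak_memory_bandwidth_gb_s(clock_mhz=MEMORY_CLOCK_MHZ,
                               bus_bits=MEMORY_BUS_WIDTH_BITS):
    """Theoretical peak memory bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_bits / 8
    return clock_mhz * 1e6 * TRANSFERS_PER_CLOCK * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print(f"CUDA cores: {total_cuda_cores()}")
    print(f"Peak memory bandwidth: {peak_memory_bandwidth_gb_s():.1f} GB/s")
```

Under that assumption the result matches the 2880 cores and the memory bandwidth of up to 288 GB/s quoted in the specifications.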
*Caveat on Tesla K40 performance: applications that rely on very specific, memory-intensive single-precision floating-point and/or integer math may be better served by the NVIDIA Tesla K10 GPU Accelerator with 8GB GDDR5 memory. Please speak with one of our GPU experts.
Additional Tesla K40 Information
To learn more about the differences between the Tesla K40 and other versions of the Tesla product line, please review our In-Depth Comparison of NVIDIA Tesla “Kepler” GPU Accelerators.