
In-Depth Comparison of NVIDIA Tesla “Maxwell” GPU Accelerators

This article provides in-depth details of the NVIDIA Tesla M-series GPU accelerators (codenamed “Maxwell”). “Maxwell” GPUs improve upon the previous-generation “Kepler” architecture, although they do not necessarily replace all “Kepler” models.

Important changes available in the “Maxwell” GPU architecture include:

  • Energy efficiency – Maxwell GPUs deliver nearly twice the power efficiency of Kepler GPUs.
  • SMM architecture – the Maxwell Streaming Multiprocessor (SMM) is designed for power-efficient performance, delivering roughly 40% higher performance per CUDA core. Each SMM contains 128 CUDA cores (down from 192 cores in a Kepler SMX).
  • Larger, dedicated shared memory in each SMM. The L1 cache is now separate from Shared Memory (they competed for space on Kepler).
  • Larger L2 caches are available on Maxwell GPUs (ranging from 2MB to 3MB, which is two to four times the size of L2 on Kepler).
  • Reduced latencies on GPU instructions improve utilization and throughput. Furthermore, the throughput of many Integer instructions has been improved.
  • Shared memory atomics improve upon Kepler’s device memory atomics by allowing threads to perform atomic operations on locations in shared memory.
  • Maximum active thread blocks are increased from 16 to 32 per SMM.
  • Dual NVENC H.264 encoders for increased throughput of video workloads. H.265 support is also added.
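The jump from 16 to 32 resident thread blocks per multiprocessor has a practical consequence for kernel launch configurations: smaller thread blocks can now fully occupy an SMM. A rough back-of-the-envelope sketch (the limits are NVIDIA's published figures; the helper function is our own illustration, not a CUDA API):

```python
# Occupancy sketch using the published per-multiprocessor limits.
MAX_THREADS_PER_SM = 2048   # resident threads per SM/SMM (Kepler and Maxwell)
MAX_BLOCKS_KEPLER = 16      # resident thread blocks per Kepler SMX
MAX_BLOCKS_MAXWELL = 32     # resident thread blocks per Maxwell SMM

def min_block_size_for_full_occupancy(max_threads, max_blocks):
    """Smallest thread-block size that can still fill a multiprocessor
    with the maximum number of resident threads."""
    return max_threads // max_blocks

print(min_block_size_for_full_occupancy(MAX_THREADS_PER_SM, MAX_BLOCKS_MAXWELL))
# Maxwell: 64-thread blocks can reach full occupancy
print(min_block_size_for_full_occupancy(MAX_THREADS_PER_SM, MAX_BLOCKS_KEPLER))
# Kepler: the same limit required 128-thread blocks
```

In other words, kernels that favor small thread blocks (64 threads) no longer leave half of each multiprocessor's thread capacity idle, assuming registers and shared memory do not become the limiting resource first.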

“Maxwell” Tesla GPU Specifications

The table below summarizes the features of the available Tesla GPU Accelerators. To learn more about any of these products, or to find out how best to leverage their capabilities, please speak with an HPC expert.

| Feature                              | Tesla M40                                | Tesla M60                               |
|--------------------------------------|------------------------------------------|-----------------------------------------|
| GPU Chip(s)                          | Maxwell GM200                            | 2x Maxwell GM204                        |
| Recommended Workload                 | Machine Learning & Single-Precision apps | Virtualized Desktops (VDI)              |
| Peak Single Precision (GPU Boost)    | 6.84 TFLOPS                              | 9.64 TFLOPS (both GPUs combined)        |
| Peak Double Precision (GPU Boost)    | 0.213 TFLOPS                             | 0.301 TFLOPS (both GPUs combined)       |
| Onboard GDDR5 Memory¹                | 12 GB or 24 GB                           | 16 GB (8 GB per GPU)                    |
| Memory Bandwidth¹                    | 288 GB/s                                 | 160 GB/s per GPU                        |
| L2 Cache                             | 3 MB                                     | 2 MB per GPU                            |
| PCI-Express Generation               | 3.0                                      | 3.0                                     |
| Achievable PCI-E transfer bandwidth  | 12 GB/s                                  | 12 GB/s                                 |
| # of SMM Units                       | 24                                       | 32 (16 per GPU)                         |
| # of CUDA Cores                      | 3072                                     | 4096 (2048 per GPU)                     |
| Memory Clock                         | 3004 MHz                                 | 2505 MHz                                |
| GPU Base Clock                       | 948 MHz                                  | 899 MHz                                 |
| GPU Boost Support                    | Yes – Dynamic                            | Yes – Dynamic                           |
| GPU Boost Clocks                     | 23 levels between 532 MHz and 1114 MHz   | 25 levels between 532 MHz and 1177 MHz  |
| Compute Capability                   | 5.2                                      | 5.2                                     |
| Workstation Support                  |                                          |                                         |
| Server Support                       | Yes                                      | Yes                                     |
| Wattage (TDP)                        | 250W                                     | 300W                                    |

1. Measured with ECC disabled. Memory capacity and performance are reduced by 6.25% with ECC enabled.
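The peak figures in the table can be sanity-checked from the core counts and top boost clocks (each CUDA core retires one fused multiply-add, i.e. two FLOPs, per cycle), and the ECC footnote can be verified the same way. A quick worked calculation:

```python
# Sanity check of the peak single-precision figures and the ECC footnote.
# Each CUDA core performs one FMA (2 FLOPs) per clock cycle.

def peak_sp_tflops(cuda_cores, boost_clock_mhz):
    """Peak single-precision throughput at the given boost clock."""
    return cuda_cores * 2 * boost_clock_mhz / 1e6  # MHz -> TFLOPS

m40 = peak_sp_tflops(3072, 1114)        # top GPU Boost clock from the table
m60 = 2 * peak_sp_tflops(2048, 1177)    # two GM204 GPUs combined

print(round(m40, 2))  # -> 6.84 TFLOPS
print(round(m60, 2))  # -> 9.64 TFLOPS

# ECC on GDDR5 reserves a fraction of memory for check bits, which is why
# capacity and bandwidth drop by 6.25% (1/16) when ECC is enabled:
print(12 * (1 - 0.0625))  # -> 11.25 GB usable on a 12 GB Tesla M40
```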

Comparison between “Kepler” and “Maxwell” GPU Architectures

| Feature                            | Kepler GK104             | Kepler GK110(b)          | Kepler GK210             | Maxwell GM200           | Maxwell GM204           |
|------------------------------------|--------------------------|--------------------------|--------------------------|-------------------------|-------------------------|
| Compute Capability                 | 3.0                      | 3.5                      | 3.7                      | 5.2                     | 5.2                     |
| Threads per Warp                   | 32                       | 32                       | 32                       | 32                      | 32                      |
| Max Warps per SM                   | 64                       | 64                       | 64                       | 64                      | 64                      |
| Max Threads per SM                 | 2048                     | 2048                     | 2048                     | 2048                    | 2048                    |
| Max Thread Blocks per SM           | 16                       | 16                       | 16                       | 32                      | 32                      |
| 32-bit Registers per SM            | 64 K                     | 64 K                     | 128 K                    | 64 K                    | 64 K                    |
| Max Registers per Thread Block     | 64 K                     | 64 K                     | 64 K                     | 64 K                    | 64 K                    |
| Max Registers per Thread           | 63                       | 255                      | 255                      | 255                     | 255                     |
| Max Threads per Thread Block       | 1024                     | 1024                     | 1024                     | 1024                    | 1024                    |
| L1 Cache Configuration             | split with shared memory | split with shared memory | split with shared memory | 24 KB dedicated L1 cache | 24 KB dedicated L1 cache |
| Shared Memory Configurations       | 16 KB + 48 KB L1, 32 KB + 32 KB L1, or 48 KB + 16 KB L1 (64 KB total) | 16 KB + 48 KB L1, 32 KB + 32 KB L1, or 48 KB + 16 KB L1 (64 KB total) | 16 KB + 112 KB L1, 32 KB + 96 KB L1, or 48 KB + 80 KB L1 (128 KB total) | 96 KB dedicated         | 96 KB dedicated         |
| Max Shared Memory per Thread Block | 48 KB                    | 48 KB                    | 48 KB                    | 48 KB                   | 48 KB                   |
| Max X Grid Dimension               | 2^32 - 1                 | 2^32 - 1                 | 2^32 - 1                 | 2^32 - 1                | 2^32 - 1                |
| Hyper-Q                            | No                       | Yes                      | Yes                      | Yes                     | Yes                     |
| Dynamic Parallelism                | No                       | Yes                      | Yes                      | Yes                     | Yes                     |
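Several of the per-SM limits above follow directly from compute capability, which is often the most convenient key when reasoning about a kernel's resource budget across chips. A small illustrative lookup (the data structure and helper are our own, not part of the CUDA toolkit; the values are taken from the comparison above):

```python
# Per-SM limits keyed by compute capability -- an illustrative lookup,
# not a CUDA API. "smem_kb" is the largest shared-memory configuration.
SM_LIMITS = {
    "3.0": {"blocks": 16, "regs_k": 64,  "smem_kb": 48},   # GK104
    "3.5": {"blocks": 16, "regs_k": 64,  "smem_kb": 48},   # GK110(b)
    "3.7": {"blocks": 16, "regs_k": 128, "smem_kb": 112},  # GK210
    "5.2": {"blocks": 32, "regs_k": 64,  "smem_kb": 96},   # GM200 / GM204
}

def max_resident_warps(threads_per_sm=2048, warp_size=32):
    """Max resident warps per SM -- identical across every chip listed."""
    return threads_per_sm // warp_size

print(max_resident_warps())        # -> 64 warps per SM on all five chips
print(SM_LIMITS["5.2"]["blocks"])  # -> 32 resident blocks on Maxwell
```

Note that while GK210 doubled the register file and shared memory per SM, the per-thread-block limits (48 KB shared memory, 64 K registers) stayed the same; the extra capacity serves more concurrent blocks rather than larger ones.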

Additional Tesla “Maxwell” GPU products

NVIDIA has also released Tesla M4, Tesla M6, and Tesla M10 GPUs. These products are primarily for embedded and hyperscale deployments. These models are not expected to be used in the HPC space.
