Nvidia Support:
"The Windows WDDM memory manager (controlled by Microsoft, not NVIDIA) does virtual memory management on the GPU memory, and this applies to both CUDA and gaming. The WDDM memory manager may swap out even CUDA allocations to system memory, according to its own heuristics.
You don't have any direct control over this.
You can influence the behavior by not just allocating the memory using cudaMalloc, but also writing CUDA code that accesses the memory. When the CUDA kernel is running, the WDDM memory manager will ensure that the needed allocations are physically resident in GPU memory, not "swapped out" to system memory, which is used as a backing store.
However, when a CUDA kernel is running in a WDDM environment, many other GPU activities (including display updates) are "frozen" until the kernel completes. And if you run the kernel too long, you will hit a WDDM TDR timeout.
So there is no simple method to do this. You could experiment with trying to run a short CUDA kernel (say, 100ms or less, in duration) once every second or so. But this is just playing games with the WDDM memory manager, and at some point your attempt to evaluate game behavior is going to be influenced by this, apart from any memory considerations."
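For anyone who wants to try the experiment support describes, here is a rough sketch of what "a short kernel once every second" could look like. The names touch_pages, touch_allocation and keep_touching, the 4096-byte stride, the 64x256 launch size and the one-second pause are my own assumptions for illustration, not anything from NVIDIA. It only makes a kernel reference the allocation periodically; whether WDDM then keeps the buffer resident is still up to its heuristics.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>

// Reads one byte every `stride` bytes so the kernel references the whole
// allocation while doing very little work (hypothetical helper).
__global__ void touch_pages(const unsigned char* buf, size_t bytes,
                            size_t stride, unsigned int* sink)
{
    size_t i = (static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x) * stride;
    const size_t step = static_cast<size_t>(gridDim.x) * blockDim.x * stride;
    unsigned int acc = 0;
    for (; i < bytes; i += step) {
        acc += buf[i];
    }
    // Publish the sum so the compiler cannot optimize the reads away.
    atomicAdd(sink, acc);
}

// Launches one touch pass and reports its wall-clock duration in milliseconds.
cudaError_t touch_allocation(const void* d_buf, size_t bytes,
                             unsigned int* d_sink, double* elapsed_ms)
{
    const size_t stride = 4096;  // assumed granularity, purely illustrative
    const auto t0 = std::chrono::steady_clock::now();
    touch_pages<<<64, 256>>>(static_cast<const unsigned char*>(d_buf),
                             bytes, stride, d_sink);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    const auto t1 = std::chrono::steady_clock::now();
    if (elapsed_ms != nullptr) {
        *elapsed_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    return err;
}

// Touches the allocation `rounds` times, pausing about one second after each pass.
cudaError_t keep_touching(const void* d_buf, size_t bytes, int rounds)
{
    unsigned int* d_sink = nullptr;
    cudaError_t err = cudaMalloc(reinterpret_cast<void**>(&d_sink), sizeof(unsigned int));
    if (err != cudaSuccess) {
        return err;
    }
    err = cudaMemset(d_sink, 0, sizeof(unsigned int));
    for (int r = 0; r < rounds && err == cudaSuccess; ++r) {
        double ms = 0.0;
        err = touch_allocation(d_buf, bytes, d_sink, &ms);
        if (err == cudaSuccess) {
            std::printf("touch pass %d took %.2f ms\n", r, ms);
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }
    cudaFree(d_sink);
    return err;
}
```

You would call keep_touching(d_buf, bytes, rounds) on a buffer obtained from cudaMalloc while the game or benchmark runs, and check the printed durations to confirm each pass stays well under the roughly 100 ms mentioned above. As support points out, this itself perturbs what you are trying to measure.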
@Phil: Might this be interesting for the article as well?!