cuda - New issue stall reasons in NVIDIA Nsight Visual Studio Edition 4.1 RC1
In the 4.1 RC1 version, the issue stall reasons have been divided into 9 types instead of 6. "Data Request" was removed, and "Memory Throttle", "Memory Dependency", "Constant Miss" and "Pipe Busy" were added.
However, the NVIDIA Nsight Visual Studio Edition 4.1 User Guide (where you are redirected by clicking the blue icon on the upper right-hand side of the Issue Stall Reasons chart) still shows the NVIDIA Nsight Visual Studio Edition 4.0 content and has not been updated.
I am wondering what the new issue stall reasons mean, and what are some ways to reduce them.
4.1:
4.0:
The following reasons were removed in Nsight 4.1:
Data Request - Removed in Nsight 4.1. A warp reported a data request stall when it was unable to issue a data request.
The following reasons were added in Nsight 4.1:
Memory Throttle - A warp reports a memory throttle stall when it is blocked from issuing an instruction due to a lack of resources in the memory data path. If this reason is high, try to resolve memory coalescing issues (address divergence) that lead to replays, or try to merge multiple memory accesses into a single vector access. On CC 5.x devices this reason may also be high at the end of the kernel if threads exit with many outstanding memory stores.
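As a rough illustration of the "single vector access" suggestion, here is a minimal CUDA sketch, assuming a plain element-wise copy. The kernel names `copy_scalar` and `copy_vec4` and the host helper `check_copy_vec4` are hypothetical, and whether the vector form actually lowers the Memory Throttle percentage for a given kernel has to be confirmed in the profiler.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Scalar version: each thread issues one 4-byte load and one 4-byte store.
__global__ void copy_scalar(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Vector version: each thread issues one 16-byte load and one 16-byte store,
// so the same data moves with a quarter of the memory instructions.
// Assumes n is a multiple of 4 and the pointers are 16-byte aligned
// (pointers returned by cudaMalloc satisfy this).
__global__ void copy_vec4(const float4* in, float4* out, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) out[i] = in[i];
}

// Hypothetical host helper: runs both kernels and checks both reproduce the input.
bool check_copy_vec4() {
    const int n = 1 << 20, threads = 256;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_s(n, -1.0f), h_v(n, -1.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_s = nullptr, *d_v = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_s, bytes) == cudaSuccess &&
              cudaMalloc(&d_v, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
        copy_scalar<<<(n + threads - 1) / threads, threads>>>(d_in, d_s, n);
        const int n4 = n / 4;
        copy_vec4<<<(n4 + threads - 1) / threads, threads>>>(
            reinterpret_cast<const float4*>(d_in),
            reinterpret_cast<float4*>(d_v), n4);
        ok = cudaDeviceSynchronize() == cudaSuccess;
        cudaMemcpy(h_s.data(), d_s, bytes, cudaMemcpyDeviceToHost);
        cudaMemcpy(h_v.data(), d_v, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in); cudaFree(d_s); cudaFree(d_v);
    return ok && h_s == h_in && h_v == h_in;
}
```

Both kernels are already coalesced; the difference is only the number of memory instructions issued per byte moved, which is the resource this stall reason is about.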
Memory Dependency - A warp reports a memory dependency stall when its next instruction cannot be issued due to a dependency on a memory or texture operation. If this reason is high, try to (a) improve memory coalescing, (b) improve memory level parallelism, (c) move frequently accessed data closer to the SM (such as into shared memory), (d) try computing data instead of loading it, or (e) on CC 3.5 try using LDG.
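Suggestions (c) and (e) can be sketched together, assuming a 3-point 1D stencil as the workload. Each input element is loaded from global memory once per block through `__ldg` (available on CC 3.5 and later) and then reused from shared memory. The names `stencil3_shared`, `check_stencil3` and the `BLOCK` size are hypothetical choices for this example, not something taken from Nsight.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 256  // assumed block size; the kernel must be launched with it

// Sums each element with its left and right neighbours (zeros outside the array).
__global__ void stencil3_shared(const float* __restrict__ in, float* out, int n) {
    __shared__ float tile[BLOCK + 2];           // block data plus one halo cell per side
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    int l = threadIdx.x + 1;

    // One read-only global load per element, staged in shared memory.
    tile[l] = (g < n) ? __ldg(&in[g]) : 0.0f;
    if (threadIdx.x == 0)
        tile[0] = (g > 0 && g - 1 < n) ? __ldg(&in[g - 1]) : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        tile[l + 1] = (g + 1 < n) ? __ldg(&in[g + 1]) : 0.0f;
    __syncthreads();

    // The three reads below hit shared memory instead of global memory.
    if (g < n) out[g] = tile[l - 1] + tile[l] + tile[l + 1];
}

// Hypothetical host helper: compares the kernel against a CPU reference.
bool check_stencil3() {
    const int n = 1000;                          // deliberately not a multiple of BLOCK
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n, -1.0f), h_ref(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 7);
    for (int i = 0; i < n; ++i) {
        float left  = (i > 0)     ? h_in[i - 1] : 0.0f;
        float right = (i + 1 < n) ? h_in[i + 1] : 0.0f;
        h_ref[i] = left + h_in[i] + right;
    }

    float *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
        stencil3_shared<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);
        ok = cudaDeviceSynchronize() == cudaSuccess;
        cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in); cudaFree(d_out);
    return ok && h_out == h_ref;
}
```

With a stencil this small the hardware caches may already absorb most of the reuse, so treat the sketch as a pattern rather than a guaranteed speedup.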
Constant Miss - A warp reports a constant miss stall if the warp tries to access constants that are not resident in the cache. If this reason is high, try to locate the constants closer to each other in the same region; if the kernel uses indexed constants in different regions, try moving the constants to global memory, or try computing the constant.
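A minimal sketch of the "move indexed constants to global memory" suggestion, assuming a lookup table that each thread indexes with a different value. The names `c_table`, `lookup_constant`, `lookup_global`, `check_lookup` and the table size are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TABLE 1024  // assumed table size (4 KB of floats)

__constant__ float c_table[TABLE];

// Threads of a warp index different parts of the constant table.
__global__ void lookup_constant(const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = c_table[idx[i]];
}

// Same lookup, with the table placed in ordinary global memory.
__global__ void lookup_global(const float* __restrict__ table,
                              const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = table[idx[i]];
}

// Hypothetical host helper: checks that both variants return the same values.
bool check_lookup() {
    const int n = 1 << 16, threads = 256, blocks = (n + threads - 1) / threads;
    std::vector<float> h_table(TABLE), h_c(n, -1.0f), h_g(n, -2.0f), h_ref(n);
    std::vector<int> h_idx(n);
    for (int i = 0; i < TABLE; ++i) h_table[i] = 0.5f * i;
    for (int i = 0; i < n; ++i) {
        h_idx[i] = (i * 37) % TABLE;            // scattered, thread-dependent index
        h_ref[i] = h_table[h_idx[i]];
    }

    float *d_table = nullptr, *d_c = nullptr, *d_g = nullptr;
    int* d_idx = nullptr;
    bool ok = cudaMalloc(&d_table, TABLE * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_idx, n * sizeof(int)) == cudaSuccess &&
              cudaMalloc(&d_c, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_g, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        cudaMemcpyToSymbol(c_table, h_table.data(), TABLE * sizeof(float));
        cudaMemcpy(d_table, h_table.data(), TABLE * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
        lookup_constant<<<blocks, threads>>>(d_idx, d_c, n);
        lookup_global<<<blocks, threads>>>(d_table, d_idx, d_g, n);
        ok = cudaDeviceSynchronize() == cudaSuccess;
        cudaMemcpy(h_c.data(), d_c, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaMemcpy(h_g.data(), d_g, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_table); cudaFree(d_idx); cudaFree(d_c); cudaFree(d_g);
    return ok && h_c == h_ref && h_g == h_ref;
}
```

Constant memory works best when all threads of a warp read the same address, so a table that is indexed per thread is a natural candidate for global memory. Which variant stalls less for a real kernel should be checked in the Issue Stall Reasons experiment.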
Pipe Busy - A warp reports a pipe busy stall if the data path required for the next instruction is busy and cannot issue the warp. If this reason is high, look at the pipe utilization experiment and try to reduce the use of any pipe with high utilization. Avoiding lower-throughput instructions can also help.
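One common source of lower-throughput instructions is an accidental double-precision literal in otherwise single-precision code, since FP64 throughput is much lower than FP32 throughput on many GPUs. The sketch below assumes that situation; the names `scale_double_literal`, `scale_float_literal` and `check_scale` are hypothetical, and it is only one example of an instruction mix that can load a pipe.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The literals 0.5 and 1.0 are doubles, so x is promoted and the arithmetic
// is carried out in double precision before converting back to float.
__global__ void scale_double_literal(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5 + 1.0;
}

// The f suffix keeps the whole expression in single precision.
__global__ void scale_float_literal(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5f + 1.0f;
}

// Hypothetical host helper: checks that both kernels give the same result
// on inputs that are exactly representable (small integers).
bool check_scale() {
    const int n = 1 << 16, threads = 256, blocks = (n + threads - 1) / threads;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_d(n, -1.0f), h_f(n, -2.0f), h_ref(n);
    for (int i = 0; i < n; ++i) {
        h_in[i] = static_cast<float>(i % 100);
        h_ref[i] = h_in[i] * 0.5f + 1.0f;
    }

    float *d_in = nullptr, *d_d = nullptr, *d_f = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_d, bytes) == cudaSuccess &&
              cudaMalloc(&d_f, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
        scale_double_literal<<<blocks, threads>>>(d_in, d_d, n);
        scale_float_literal<<<blocks, threads>>>(d_in, d_f, n);
        ok = cudaDeviceSynchronize() == cudaSuccess;
        cudaMemcpy(h_d.data(), d_d, bytes, cudaMemcpyDeviceToHost);
        cudaMemcpy(h_f.data(), d_f, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in); cudaFree(d_d); cudaFree(d_f);
    return ok && h_d == h_ref && h_f == h_ref;
}
```

The pipe utilization experiment mentioned above is the place to confirm which pipe is actually saturated before rewriting arithmetic like this.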