Wmma 3 dawn of mma

2/13/2023

Wmma::mma_sync( acc_frag, a_frag, b_frag, acc_frag ) Wmma::load_matrix_sync( a_frag, d_A, 16 ) _global_ void test_wmma( _half* d_A, _half* d_B, _half* d_C ) In the hope that someone here can help me understand what I am doing wrong, I will post a small repro-case here. I am attempting to use the tensor cores efficiently in a custom DL inference kernel, but I get very poor performance.

0 Comments

Wmma 3 dawn of mma

Leave a Reply.

Author

Archives

Categories