
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency

Figure 7: Measured speedup versus theoretical speedup at varying sparsity levels for a GPT-3 layer 12k x 12k matrix multiplication (MatMul) (Lie, 2021).
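The caption compares measured speedup against a theoretical one but does not spell out how the theoretical figure is obtained. A minimal sketch follows, assuming the usual idealization that MatMul cost scales with the fraction of nonzero weights, so that a sparsity level s gives an ideal speedup of 1 / (1 - s). The function name `theoretical_speedup` and the sampled sparsity levels are illustrative assumptions, not values taken from the figure.

```python
def theoretical_speedup(sparsity: float) -> float:
    """Ideal speedup over a dense MatMul.

    Assumption: compute cost is proportional to the fraction of nonzero
    weights, so skipping every zero is free. Measured speedup on real
    hardware is generally lower than this bound.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in the range [0, 1)")
    return 1.0 / (1.0 - sparsity)


if __name__ == "__main__":
    # Illustrative sparsity levels only; the figure's own sample points
    # are not recoverable from the caption.
    for s in (0.0, 0.5, 0.75, 0.9):
        print(f"sparsity={s:.2f}  theoretical speedup={theoretical_speedup(s):.2f}x")
```

Any gap between a measured curve and this bound reflects overheads that the idealization ignores, such as indexing the nonzero entries and irregular memory access.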
