PAI & TVM Meetup - Shanghai 20191116
计算平台事业部 (Computing Platform Business Unit)
Agenda:
- TensorCore AutoCodeGen in TVM
- FP16 Mixed-Precision Training on PAI
- INT8 Inference on PAI-Blade
[Slide residue: TensorCore AutoCodeGen — background on matching fp16/int8 matmuls across local/shared/global memory scopes; reported speedups on T4 of 1.26X, 1.51X, 1.30X, and 1.21X; bar chart comparing a cuBLAS INT8 baseline against TVM INT8, INT4, and INT1 kernels at shapes such as (512, 64, 512)]
26 pages | 5.82 MB | 5 months ago
Facebook -- TVM AWS Meetup Talk
- Sparse weight matrices - not a new idea, cf. WaveRNN, Sparse Transformers, etc.
- Reduce precision with int8/float16 - very helpful for keeping the model in core-private L1 dcaches
- Use rational approximations
11 pages | 3.08 MB | 5 months ago
2 results in total