A paper on 16-bit floating-point training is up on arXiv

This work was done during a summer 2020 internship at Cerebras Systems, and the paper can be found here.

Di Wu
PhD student

A Wisconsin Badger in Computer Architecture!