Smart Data Mover for Ascend

Hardware-software co-design for efficient processing of machine learning tasks on Ascend. Internship project at Huawei Zürich Research Center.

I completed a 6-month full-time internship at Huawei Zürich Research Center, where I worked on the Da Vinci Architecture.

Da Vinci Architecture in the Huawei Ascend AI processor. Image from: Huawei Atlas AI Computing Solution, Springer.

My work focused on accelerating machine learning by exploiting the sparsity found in real-world workloads. I developed software operators for both the GPU (CUDA) and Ascend (TIK) platforms, as well as hardware extensions for the data mover in the Da Vinci Architecture.

On Ascend, using the custom-designed hardware components and software operators, I demonstrated up to a 7x speedup for the convolution operators in EfficientDet on the MOT17 task, and up to a 10% speedup for the matrix-matrix multiplication operator in an internal recommender model without any precision loss.
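
As a generic illustration of how sparsity can save work in a matrix-matrix multiplication, here is a minimal pure-Python sketch that skips all-zero tiles of the left operand. It is not the operator or hardware extension built during the internship. The function names, the tile size, and the example matrices are assumptions made for this sketch only.

```python
def tile_is_zero(a, r0, r1, c0, c1):
    """Return True if every entry in rows r0..r1-1, cols c0..c1-1 of a is zero."""
    return all(a[i][j] == 0 for i in range(r0, r1) for j in range(c0, c1))


def tiled_sparse_matmul(a, b, tile=2):
    """Compute a @ b tile by tile, skipping all-zero tiles of a.

    Returns the product and the number of tiles of a that were skipped.
    (Hypothetical helper for illustration; not an Ascend or CUDA API.)
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    skipped = 0
    for i0 in range(0, m, tile):
        i1 = min(i0 + tile, m)
        for k0 in range(0, k, tile):
            k1 = min(k0 + tile, k)
            if tile_is_zero(a, i0, i1, k0, k1):
                # A zero tile contributes nothing to c, so it needs
                # neither to be loaded nor to be multiplied.
                skipped += 1
                continue
            for i in range(i0, i1):
                for kk in range(k0, k1):
                    aik = a[i][kk]
                    for j in range(n):
                        c[i][j] += aik * b[kk][j]
    return c, skipped


# Example: a 4x4 left operand with two all-zero 2x2 tiles.
a = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 6],
]
b = [
    [1, 0],
    [0, 1],
    [2, 0],
    [0, 2],
]
c, skipped = tiled_sparse_matmul(a, b, tile=2)
print(c)        # [[1, 2], [3, 4], [0, 0], [10, 12]]
print(skipped)  # 2
```

In this integer example, the skipped tiles contribute only zeros, so the result matches a dense multiplication exactly while half of the tiles of the left operand are never touched.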