🚗 Object-Detection Models: Autonomous Vehicles
During my internship at UntetherAI (now acquired by AMD) in Toronto, I focused on adapting deep learning models to run efficiently on the company’s custom AI accelerator chip. Unlike GPUs, this chip is optimized for extremely high-throughput inference but does not support every operation found in standard machine learning frameworks. As a result, many off-the-shelf models could not be deployed directly.
My main responsibility was to restructure ONNX graphs — essentially the computational blueprints of these models — so that unsupported operations were either removed or handled outside the chip. By extracting subgraphs and reconnecting them in a way the hardware could process, I enabled full on-chip execution for networks such as YOLOv11 and ResNet-50.
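The core idea of that restructuring can be sketched in plain Python. This is a toy illustration, not the ONNX API or Untether's SDK: nodes are simple tuples, and the set of unsupported ops is an assumed example. It shows the essential step of partitioning a graph into an on-chip portion and a host-side portion, and identifying the boundary tensors that must cross between them.

```python
# Toy sketch of graph partitioning around unsupported ops.
# Nodes are (name, op_type, inputs, outputs) tuples; UNSUPPORTED is a
# hypothetical set of ops that must run on the host, not the accelerator.

UNSUPPORTED = {"NonMaxSuppression", "TopK"}  # assumed examples

def partition(nodes):
    """Split nodes into on-chip and host-side groups, and find the
    boundary tensors that must be streamed back to the CPU."""
    on_chip = [n for n in nodes if n[1] not in UNSUPPORTED]
    host = [n for n in nodes if n[1] in UNSUPPORTED]
    produced_on_chip = {t for n in on_chip for t in n[3]}
    consumed_by_host = {t for n in host for t in n[2]}
    boundary = sorted(produced_on_chip & consumed_by_host)
    return on_chip, host, boundary

# A miniature YOLO-style tail: conv layers on chip, NMS on the host.
nodes = [
    ("conv1", "Conv",              ["image"], ["feat"]),
    ("head",  "Conv",              ["feat"],  ["boxes"]),
    ("nms",   "NonMaxSuppression", ["boxes"], ["dets"]),
]
on_chip, host, boundary = partition(nodes)
print(boundary)  # ['boxes'] — the tensor handed back to the CPU for NMS
```

In the real workflow this partitioning happens on ONNX graphs (for example via subgraph extraction utilities), but the boundary-tensor bookkeeping is the same idea.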
Another important piece of my work was handling data conversion between the CPU and the accelerator. Since the chip only accepts certain numeric formats, I added streaming quantization and dequantization callbacks that automatically converted data at the host–device boundary. This allowed inference pipelines to run seamlessly without manual intervention, even when models produced or required unsupported data types.
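A minimal sketch of such a boundary conversion, assuming symmetric per-tensor int8 quantization (the callback names and the int8 format are illustrative, not Untether's SDK):

```python
# Hedged sketch of host–device data conversion: quantize floats to int8 on
# the way into the accelerator, dequantize on the way out.

def make_quant_callbacks(scale):
    def quantize(values):           # host float -> device int8
        return [max(-128, min(127, round(v / scale))) for v in values]
    def dequantize(qvalues):        # device int8 -> host float
        return [q * scale for q in qvalues]
    return quantize, dequantize

quantize, dequantize = make_quant_callbacks(scale=0.05)
q = quantize([1.0, -0.5, 6.4])   # 6.4 / 0.05 = 128, clipped to int8 max
print(q)                         # [20, -10, 127]
print(dequantize(q)[0])          # 1.0
```

Installed as callbacks at the host–device boundary, conversions like these run automatically on every transfer, which is what lets pipelines stream data without manual format handling.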
To test these modifications in real-world scenarios, I built complete object detection pipelines in both Python and C++. These pipelines achieved up to 1,300 frames per second, which is significantly faster than conventional GPU baselines.
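A throughput figure like this is typically measured by timing many frames through the full pipeline and dividing by wall-clock time. A minimal sketch, where `run_pipeline` is a stand-in for the real detection pipeline rather than any actual API:

```python
import time

def run_pipeline(frame):
    # Placeholder for preprocess -> accelerator inference -> postprocess.
    return [(0, 0, 10, 10, 0.9)]  # dummy detection (x1, y1, x2, y2, score)

def measure_fps(frames):
    """End-to-end frames per second over a batch of frames."""
    start = time.perf_counter()
    for frame in frames:
        run_pipeline(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

fps = measure_fps(frames=[None] * 1000)
print(f"{fps:.0f} FPS")
```

Measuring end to end (including host-side pre- and post-processing) matters: a chip-only number can hide data-transfer and conversion costs at the boundary.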
I also developed automated evaluation tools using COCO-style mean Average Precision (mAP) metrics, along with visualization utilities that displayed bounding boxes and confidence scores directly on images. Together, these tools made it possible to quickly check both the accuracy and speed of a model after deployment on the accelerator.
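The core of such an evaluation is intersection-over-union between boxes plus a precision accumulation over score-ranked detections. The sketch below is a simplified single-class, single-threshold version; real COCO mAP additionally averages over classes and over IoU thresholds from 0.50 to 0.95:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def average_precision(detections, truths, thresh=0.5):
    """detections: list of (box, score); truths: list of boxes.
    Greedy matching by descending score; AP approximated as the mean
    precision at each newly recalled ground-truth box."""
    detections = sorted(detections, key=lambda d: -d[1])
    matched, tp, precisions = set(), 0, []
    for i, (box, _) in enumerate(detections, start=1):
        unmatched = [j for j in range(len(truths)) if j not in matched]
        best = max(unmatched, key=lambda j: iou(box, truths[j]), default=None)
        if best is not None and iou(box, truths[best]) >= thresh:
            matched.add(best)
            tp += 1
            precisions.append(tp / i)
    return sum(precisions) / len(truths) if truths else 0.0

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [((0, 0, 10, 10), 0.9), ((21, 21, 30, 30), 0.8), ((50, 50, 60, 60), 0.7)]
print(average_precision(dets, gt))  # 1.0: both boxes found before any FP
```

The visualization side is comparatively simple (drawing each surviving box and its score onto the frame); the value of automating both is that a single command answers "is this model still accurate, and how fast is it?" after every deployment.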
Because many teams were interested in bringing new models onto Untether’s chip, I also worked on making this process repeatable. I built a modular framework for object detection that reduced the amount of custom code required for each model. This framework, combined with detailed SDK documentation and step-by-step user guides I wrote, helped internal engineers and customers onboard new networks more easily.
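One common way to structure such a framework is a registry pattern: each network contributes only its model-specific pre- and post-processing, and a shared runner handles the rest. The sketch below is illustrative; the class and function names are assumptions, not the framework's real API:

```python
# Hedged sketch of a modular detection framework: per-model glue code shrinks
# to a small registered class.

REGISTRY = {}

def register(name):
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap

class Detector:
    def preprocess(self, frame): return frame
    def postprocess(self, raw):  return raw

@register("yolov11")
class YoloV11(Detector):
    def preprocess(self, frame):
        return [v / 255.0 for v in frame]        # normalize to [0, 1]
    def postprocess(self, raw):
        return [d for d in raw if d[4] >= 0.5]   # drop low-confidence boxes

def run(name, frame, infer):
    """Shared runner: look up the model, then preprocess -> infer -> postprocess."""
    model = REGISTRY[name]()
    return model.postprocess(infer(model.preprocess(frame)))

# Dummy "inference" standing in for the accelerator call:
dets = run("yolov11", frame=[255, 128],
           infer=lambda x: [(0, 0, 10, 10, 0.9), (1, 1, 2, 2, 0.1)])
print(dets)  # [(0, 0, 10, 10, 0.9)]
```

With this shape, onboarding a new network means writing one small class rather than duplicating an entire pipeline, which is what makes the process repeatable across teams.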
Connection to Autonomous Vehicles
At the time of my internship, UntetherAI was engaged in a partnership with General Motors to integrate its accelerator into future GM autonomous vehicles. The idea was to provide a dedicated chip capable of running perception models — such as pedestrian detection, traffic light recognition, and vehicle tracking — at the scale required for autonomous driving.
My contributions tied directly into this effort by showing how standard computer vision models could be modified to run at high speed on the accelerator. The evaluation and visualization tools I built also helped demonstrate that the models met accuracy requirements while still delivering low-latency performance.