Using Robust Networks to Inform Lightweight Models in Semi-Supervised Learning for Object Detection

Object detection algorithms commonly trade accuracy for speed, or vice versa. To meet our application's real-time requirement, we use a Single Shot MultiBox Detector (SSD) model. This architecture meets our latency requirements, but it needs a large amount of training data to reach an acceptable accuracy level. More robust network architectures, such as Regions with CNN features (R-CNN), are unusable for our end application, yet they offer an important advantage over SSD models: they can be trained more reliably on small datasets. We fine-tune R-CNN models on a small number of hand-labeled examples, then run inference on the remaining unlabeled data to create new, larger training datasets. We show that these inferred labels benefit the training of lightweight models. The inferred datasets are imperfect, and we explore several ways of handling their errors: hand-correcting mislabeled data, discarding poor examples, and simply ignoring the errors. Finally, we compare the total cost of this workflow, measured in human and computer time, against a hand-labeling baseline.
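The three error-handling options named in the abstract can be illustrated with a small sketch. The abstract does not say how poor examples are identified, so this sketch assumes the teacher model's confidence score is used as the signal. The `Detection` type, the `triage_pseudo_labels` function, and both threshold values are hypothetical names and numbers chosen for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One box predicted by the teacher (R-CNN) model on an unlabeled image."""
    image_id: str
    label: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    score: float  # teacher confidence in [0, 1]


def triage_pseudo_labels(
    detections: List[Detection],
    keep_threshold: float = 0.8,    # assumed value, not from the paper
    review_threshold: float = 0.5,  # assumed value, not from the paper
) -> Dict[str, List[Detection]]:
    """Sort teacher detections into three buckets by confidence.

    keep    -> used directly as training labels for the lightweight model
    review  -> sent to a human for correction
    discard -> dropped from the new training set
    """
    buckets: Dict[str, List[Detection]] = {"keep": [], "review": [], "discard": []}
    for det in detections:
        if det.score >= keep_threshold:
            buckets["keep"].append(det)
        elif det.score >= review_threshold:
            buckets["review"].append(det)
        else:
            buckets["discard"].append(det)
    return buckets


if __name__ == "__main__":
    example = [
        Detection("a.jpg", "car", (0.0, 0.0, 10.0, 10.0), 0.95),
        Detection("a.jpg", "car", (5.0, 5.0, 20.0, 20.0), 0.60),
        Detection("b.jpg", "person", (1.0, 1.0, 4.0, 9.0), 0.20),
    ]
    for name, items in triage_pseudo_labels(example).items():
        print(name, [d.image_id for d in items])
```

Under these assumptions, setting both thresholds to 0.0 corresponds to ignoring errors (every detection is kept), while setting `review_threshold` equal to `keep_threshold` leaves the review bucket empty and corresponds to discarding poor examples without any human correction.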


October 2019 - Applied Imagery Pattern Recognition Workshop

March 2020 - Poster, GPU Technology Conference


Pittsburgh, PA