Automated road surveillance has become increasingly crucial as road crashes are now the 8th leading cause of death worldwide. The World Health Organization's 2018 report on road safety estimates that road crashes lead to 1.35 million fatalities and affect 50 million people every year. A more recent World Bank report (2021) notes that more than 50% of road fatalities involve two-wheeler vehicles, and that riding without a helmet and triple riding (more than two riders) are common contributing violations. Studies carried out in Asian countries likewise attribute a significant share of road fatalities to two-wheeler vehicles. Motivated by the worldwide need to keep road-safety research up to date, we work on problems such as counting motorcycle violations and street trees, open-world object detection, self-supervised image deraining, domain adaptation, and incremental learning in the field of autonomous navigation.
1. CVPR UG2+ 2022: Detecting, Tracking and Counting Motorcycle Rider Traffic Violations on Unconstrained Roads
We work on the problem of recognizing license plates and street signs automatically, particularly in challenging conditions such as chaotic traffic. We leverage state-of-the-art text spotters to generate a large amount of noisy labeled training data, which is then filtered using a pattern derived from domain knowledge. We augment the training and testing data with interpolated boxes and annotations, which makes both training and evaluation more robust, and we further use synthetic data during training to increase its coverage. We train two different models for recognition. Our baseline is a conventional Convolutional Neural Network (CNN) encoder followed by a Recurrent Neural Network (RNN) decoder. As our first contribution, we bypass the detection phase by augmenting the baseline with an attention mechanism in the RNN decoder. Next, we make the model trainable end-to-end on full scenes containing license plates by incorporating an Inception-based CNN encoder, which makes the model robust to multiple scales. We achieve improvements as large as 3.75% at the sequence level over the baseline model. We present the first results of using multi-headed attention models for text recognition in images and illustrate the advantages of multiple heads over a single head, observing further gains as large as 7.18%. We also experiment with multi-headed attention models on the French Street Name Signs (FSNS) dataset and a new Indian street dataset that we release for experiments. We observe that models with multiple attention masks perform better than the single-headed attention model on three datasets of varying complexity. Our models also outperform state-of-the-art results on the FSNS dataset and the IIIT-ILST Devanagari dataset.
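To make the encoder-decoder design above concrete, here is a minimal PyTorch sketch of a small CNN encoder whose flattened feature map is attended to at every decoding step by a GRU decoder through multi-head attention (setting num_heads=1 recovers the single-head baseline). The class names, layer sizes, vocabulary size, and toy inputs are illustrative assumptions, not our released implementation.

```python
# Minimal sketch (illustrative, not the released code) of a CNN encoder + attention RNN decoder.
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Small convolutional backbone; flattens the feature map into a sequence of features."""
    def __init__(self, d_model=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, images):                     # images: (B, 3, H, W)
        f = self.backbone(images)                  # (B, C, H', W')
        return f.flatten(2).permute(0, 2, 1)       # (B, H'*W', C): one feature per spatial location

class AttnRNNDecoder(nn.Module):
    """GRU decoder that attends over the encoder features at every step."""
    def __init__(self, vocab_size, d_model=256, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # num_heads=1 gives the single-head baseline; num_heads>1 gives multi-headed attention.
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.gru = nn.GRUCell(2 * d_model, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, enc_feats, targets):         # targets: (B, T) token ids (teacher forcing)
        b, t = targets.shape
        h = enc_feats.mean(dim=1)                  # initialize the hidden state from the features
        logits = []
        for step in range(t):
            emb = self.embed(targets[:, step])
            # attend to the encoder features using the current hidden state as the query
            ctx, _ = self.attn(h.unsqueeze(1), enc_feats, enc_feats)
            h = self.gru(torch.cat([emb, ctx.squeeze(1)], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)          # (B, T, vocab)

# toy usage: a batch of 2 cropped plate images and 8-step target sequences
enc, dec = CNNEncoder(), AttnRNNDecoder(vocab_size=40)
images = torch.randn(2, 3, 64, 192)
targets = torch.randint(0, 40, (2, 8))
print(dec(enc(images), targets).shape)             # torch.Size([2, 8, 40])
```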
1. CBDAR 2021 Best Paper and MDPI Journal of Imaging 2022: Improving Scene Text Recognition for Indian Languages with Transfer Learning and Font Diversity
Optical Character Recognition (OCR) is the process of converting document images into an editable electronic format. This has many advantages, such as data compression, enabling search and edit operations on the images/text, and creating databases for other applications like Machine Translation, Speech Recognition, and enhancing dictionaries and language models.
OCR in Indian languages is quite challenging due to the richness of their inflections.
Using open-source and commercial OCR systems, we have observed Word Error Rates (WER) of around 20-50% on printed documents in four different Indic languages. Moreover, even a highly accurate OCR system with accuracy as high as 90% is of limited use unless it is aided by a mechanism to identify errors. We therefore started with the problem of developing "OpenOCRCorrect", an end-to-end framework for error detection and correction in Indic OCR. In our ICDAR 2017 conference paper, our models outperform state-of-the-art results on error detection in Indic OCR for six Indic languages with varied inflections, and we address the out-of-vocabulary problem in error correction for Indic OCR. We further improve these results with the help of sub-word embeddings in our ICDAR 2019 conference paper.
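As a rough illustration of why sub-word embeddings help with out-of-vocabulary OCR errors, the sketch below embeds every word as the average of hashed character n-gram vectors (fastText-style), so even a corrupted word that never occurs in the lexicon still receives a comparable vector, and a word is flagged when no lexicon word is sufficiently similar to it. The n-gram range, bucket count, random (untrained) embeddings, similarity threshold, and function names are illustrative assumptions, not the OpenOCRCorrect pipeline.

```python
# Illustrative sketch: character n-gram (sub-word) embeddings for OOV-robust OCR error detection.
import zlib
import numpy as np

DIM, BUCKETS, NGRAMS = 64, 20000, (2, 3, 4)
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(BUCKETS, DIM))   # stand-in for trained n-gram embeddings

def char_ngrams(word):
    w = f"<{word}>"                                          # boundary markers, fastText-style
    return [w[i:i + n] for n in NGRAMS for i in range(len(w) - n + 1)]

def embed(word):
    """Average the embeddings of the word's character n-grams (also works for unseen words)."""
    idx = [zlib.crc32(g.encode("utf-8")) % BUCKETS for g in char_ngrams(word)]
    return ngram_table[idx].mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# tiny clean-text lexicon standing in for a real corpus of correct words
lexicon = ["राम", "सीता", "वन", "गमन"]
lex_vecs = [embed(w) for w in lexicon]

def looks_erroneous(ocr_word, threshold=0.7):
    """Flag a word as a likely OCR error if no lexicon word is n-gram similar to it."""
    v = embed(ocr_word)
    return max(cosine(v, lv) for lv in lex_vecs) < threshold

for w in ["राम", "रााम"]:                                    # a clean word vs. a corrupted form
    print(w, "-> suspicious" if looks_erroneous(w) else "-> looks fine")
```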
1. CVPRW 2020: An OCR for Classical Indic Documents Containing Arbitrarily Long Words
a. Our team "CLAM" secured 2nd position in Multilingual PostOCR Competetion at ICDAR'19. Our model achieved highest corrections of 44% in Finnish, which is significantly higher than overall topper (8% in Finnish). Final report and poster available.
3. ICDAR 2019: Sub-word Embeddings for OCR Corrections in Highly Fusional Indic Languages