ROHIT SALUJA

Road Safety and Robust Multilingual OCR:

Road Safety:

Automated road surveillance has become increasingly crucial as road crashes have become the 8th leading cause of death worldwide. A World Health Organization study (2018) on road safety claims that violations lead to 1.35 million fatalities and affect 50 million people yearly. Another recent report by the World Bank (2021) mentions that more than 50% of road fatalities involve two-wheeler vehicles, and shows that no-helmet and triple-riding (more than two riders) violations are common causes. Studies carried out in Asian countries also attribute a significant share of road fatalities to two-wheeler vehicles. Motivated by the worldwide need to regularly update research on road safety, we work on problems like counting motorcycle violations and street trees, open-world object detection, self-supervised image deraining, domain adaptation, and incremental learning in the field of autonomous navigation.

1. CVPR, UG2+ 2022: Detecting, Tracking and Counting Motorcycle Rider Traffic Violations on Unconstrained Roads

a. You can read the paper here

b. Source code: here

c. Demo video:


2. WACV 2022: "FLUID: Few-Shot Self-Supervised Image Deraining"

a. You can read the paper here

b. Demo video coming soon.

3. WACV 2022: To miss-attend is to misalign! Residual Self-Attentive Feature Alignment for Adapting Object Detectors

a. You can read the paper here

b. Demo video coming soon.

4. WACV 2022: Multi-Domain Incremental Learning for Semantic Segmentation

a. You can read the paper here

b. Demo video coming soon.

5. NeurIPS, Machine Learning for Autonomous Driving 2021: "ORDER: Open World Object Detection on Road Scenes"

a. You can read the paper here

b. Demo video coming soon.

6. arXiv 2021: Evaluating Computer Vision Techniques for Urban Mobility on Large-Scale, Unconstrained Roads

a. You can read the paper here

b. Demo video coming soon.

7. ICVGIP 2021: Automatic Quantification and Visualization of Street Trees

a. You can read the paper here

b. Source code: here

c. Demo video:

Photo OCR:

We work on automatically recognizing license plates and street signs, particularly in challenging conditions such as chaotic traffic. We leverage state-of-the-art text spotters to generate a large amount of noisily labeled training data, which is subsequently filtered using a pattern derived from domain knowledge. We augment the training and testing data with interpolated boxes and annotations, which makes both training and testing more robust. We further use synthetic data during training to increase the coverage of the training data. We train two different models for recognition. Our baseline uses a conventional Convolutional Neural Network (CNN) encoder followed by a Recurrent Neural Network (RNN) decoder. As our first contribution, we bypass the detection phase by augmenting the baseline with an attention mechanism in the RNN decoder. Next, we enable end-to-end training on scenes containing license plates by incorporating an Inception-based CNN encoder that makes the model robust to multiple scales. We achieve improvements of up to 3.75% at the sequence level over the baseline model. We present the first results of using multi-headed attention models for text recognition in images and illustrate the advantages of multiple heads over a single head. We observe even larger gains, of up to 7.18%, by incorporating multi-headed attention. We also experiment with multi-headed attention models on the French Street Name Signs (FSNS) dataset and on a new Indian Street dataset that we release for experiments. We observe that models with multiple attention masks perform better than the model with single-headed attention on three datasets of varying complexity. Our models also outperform state-of-the-art results on the FSNS dataset and the IIIT-ILST Devanagari dataset.
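As a rough illustration of filtering noisy text-spotter output with a domain pattern and of scoring at the sequence level, here is a minimal Python sketch. The regular expression (a two-letter state code, one or two digits, up to three letters, then four digits) is only an assumed approximation of an Indian license plate format, and the function names `normalize`, `filter_noisy_labels` and `sequence_accuracy` are hypothetical; the pattern and pipeline actually used in the paper may differ.

```python
import re

# Assumed plate format, for illustration only:
# 2 letters, 1-2 digits, 0-3 letters, 4 digits (e.g. MH12AB1234).
PLATE_PATTERN = re.compile(r"[A-Z]{2}[0-9]{1,2}[A-Z]{0,3}[0-9]{4}")


def normalize(text):
    """Drop spaces and punctuation, and upper-case the remaining characters."""
    return re.sub(r"[^A-Za-z0-9]", "", text).upper()


def filter_noisy_labels(predictions):
    """Keep only (image_id, text) pairs whose normalized text matches the pattern."""
    kept = []
    for image_id, text in predictions:
        candidate = normalize(text)
        if PLATE_PATTERN.fullmatch(candidate):
            kept.append((image_id, candidate))
    return kept


def sequence_accuracy(predicted, reference):
    """Fraction of strings that match their reference exactly (sequence level)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


if __name__ == "__main__":
    noisy = [
        ("img_a", "MH 12 AB 1234"),  # valid once spaces are removed
        ("img_b", "HELLO"),          # not a plate
        ("img_c", "dl-3c-ab-0001"),  # valid once normalized
        ("img_d", "MH12AB12"),       # too few trailing digits
    ]
    print(filter_noisy_labels(noisy))
```

A filter of this kind trades recall for precision: plates in unusual formats are discarded along with genuine spotter errors, which is acceptable when the goal is a cleaner training set rather than a complete one.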

1. CBDAR 2021 Best Paper and MDPI Journal of Imaging 2022: Improving Scene Text Recognition for Indian Languages with Transfer Learning and Font Diversity

a. You can read the papers here: CBDAR and MDPI

b. Dataset: here

c. Source code: here

2. CBDAR 2021: CATALIST: CAmera TrAnsformations for multi-LIngual Scene Text recognition

a. You can read the paper here

b. Dataset: here

3. ICDAR 2019: OCR-On-the-go

a. You can read the paper here

b. Source Code for the paper is here

c. Dataset can be requested via email at rohitsaluja22@gmail.com

d. Demo video for our ALPR model is here


4. ICDAR-OST 2019: StreetOCRCorrect

a. You can read the paper here

b. Demo video for our framework is here

c. Source code for our framework is available here

Document OCR:

Optical Character Recognition (OCR) is the process of converting document images into an editable electronic format. This has many advantages, such as data compression, enabling search or edit options in the images/text, creating databases for other applications like Machine Translation and Speech Recognition, and enhancing dictionaries and language models. OCR in Indian languages is quite challenging due to their richness in inflections.

Using open-source and commercial OCR systems, we have observed Word Error Rates (WER) of around 20-50% on printed documents in four different Indic languages. Moreover, even a highly accurate OCR system, with accuracy as high as 90%, is not useful unless aided by a mechanism to identify errors. So, we started with the problem of developing "OpenOCRCorrect", an end-to-end framework for error detection and correction in Indic OCR. Our models outperform state-of-the-art results in “Error Detection in Indic-OCR” for six Indic languages with varied inflections, and we have solved the Out of Vocabulary problem for “Error Correction in Indic-OCR” in our ICDAR 2017 conference paper. We further improve the results with the help of sub-word embeddings in our ICDAR 2019 conference paper.
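For readers unfamiliar with the metric, the sketch below shows one common way to compute Word Error Rate: the word-level edit distance between the OCR output and the ground truth, divided by the number of ground-truth words. This is an assumed, standard definition; the exact evaluation protocol behind the 20-50% figures is not stated here, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_item
                curr[j - 1] + 1,     # insert hyp_item
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One substituted word and one missing word out of four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because the distance is computed over whitespace-separated words, a single wrong character in a long inflected word counts as a full word error, which is one reason word-level scores on highly inflected languages can look much worse than character-level ones.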

1. CVPRW 2020: An OCR for Classical Indic Documents Containing Arbitrarily Long Words

a. You can read the paper here

b. Source code: here

c. Demo video coming soon

2. ICDAR 2019: Post-OCR Competition

a. Our team "CLAM" secured 2nd position in the Multilingual Post-OCR Competition at ICDAR'19. Our model achieved the highest corrections of 44% in Finnish, significantly higher than the overall winner (8% in Finnish). The final report and poster are available.


3. ICDAR 2019: Sub-word Embeddings for OCR Corrections in Highly Fusional Indic Languages

a. You can read the paper here


4. ICDAR 2017: Error Detection and Corrections in Indic OCR using LSTMs

a. You can read the paper here

b. Dataset can be requested via email at rohitsaluja22@gmail.com


5. ICDAR-OST 2017: OpenOCRCorrect

a. You can read the paper here

b. Source code for our framework is available here

c. Demo video: