We work in the area of video content analysis. The main objective of our research is to address issues in representing videos so as to support video classification and video abstraction for automatic indexing and retrieval. Our important studies include: (a) exploring features to describe video content, (b) video genre classification, (c) activity recognition, (d) surveillance video analysis, (e) expression recognition, and (f) active video understanding for recommendation, using techniques such as compressive sensing, deep learning, and kernel methods.


An important issue in video content analysis is the choice of features that are compact, descriptive of the content, and robust to illumination changes and camera/object motion. Video classification is the task of categorizing a given video into one of a set of predefined classes. Pattern recognition techniques such as auto-associative neural network models, support vector machines, hidden Markov models, sparse coding, and dictionary learning are used to capture class distributions for video classification and event detection. Compressive sensing exploits sparsity to obtain compact representations. Deep learning refers to a family of algorithms that learn layered models of the input, in which each layer corresponds to a distinct level of abstraction and higher-level concepts are composed from lower-level ones. We are exploring compressive sensing, deep learning, and kernel methods for video classification and activity recognition in videos. Video summarization refers to creating a shorter extract that preserves the essential message of the original video.
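As an illustrative sketch only (not a description of our actual systems), the following Python snippet shows how sparsity yields a compact representation: frame descriptors are encoded over a learned dictionary so that each frame is represented by a handful of non-zero coefficients. The descriptor dimension, dictionary size, and sparsity level are arbitrary placeholder choices.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy stand-in for frame descriptors: 500 frames x 64-dim features.
rng = np.random.RandomState(0)
X = rng.randn(500, 64)

# Learn an overcomplete dictionary; each frame is then encoded by a
# sparse code with at most 5 non-zero coefficients (the compact form).
dico = DictionaryLearning(n_components=128,
                          transform_algorithm="omp",
                          transform_n_nonzero_coefs=5,
                          max_iter=10,
                          random_state=0)
codes = dico.fit_transform(X)           # shape (500, 128), mostly zeros
print((codes != 0).sum(axis=1).mean())  # average non-zeros per frame
```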

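In the same spirit, a minimal keyframe-based summarization sketch, assuming OpenCV and scikit-learn are available: frames are described by colour histograms and clustered, and the frame nearest each cluster centre is kept as a keyframe. Practical summarizers use far richer features and temporal constraints; this only illustrates the idea.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def keyframe_summary(video_path, n_keyframes=5):
    """Pick n_keyframes representative frames via k-means on colour histograms."""
    cap = cv2.VideoCapture(video_path)
    frames, feats = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        # 3D colour histogram as a simple global frame descriptor
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256]).flatten()
        feats.append(hist / (hist.sum() + 1e-8))
    cap.release()
    feats = np.asarray(feats)
    km = KMeans(n_clusters=n_keyframes, n_init=10, random_state=0).fit(feats)
    # Keep the frame closest to each cluster centre as a keyframe
    return [frames[int(np.argmin(np.linalg.norm(feats - c, axis=1)))]
            for c in km.cluster_centers_]
```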

Video content analysis is especially useful for traffic management in smart cities, where large amounts (hours or days) of video footage must be analysed to locate abnormalities such as the appearance of restricted objects, accidents, snatch thefts, traffic rule violations, and a variety of other threats. Traditional computer vision techniques cannot analyse such a huge amount of visual data generated in real time. Hence there is a need for visual big data analytics, which involves processing and analysing large streams of images and videos to find semantic patterns useful for interpretation. We investigate scalable implementations of several visual computing methods on GPUs. We also explore efficient, real-time detection of motorcyclists riding without a helmet and accident detection in a city-scale CCTV surveillance network, using compact deep-net models and edge computing frameworks; this work has resulted in patents.
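As a hedged sketch of how the detection stage of such a pipeline might be structured (this is not our patented method), the snippet below uses a generic pretrained COCO detector from torchvision as a stand-in; a deployed system would instead use a compact model trained on motorcyclist/helmet data and run on edge hardware. The score threshold and the downstream helmet classifier are assumptions for illustration.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Generic COCO detector as a stand-in; a deployed system would use a
# compact, possibly pruned/quantised net trained on motorcyclist and
# helmet / no-helmet classes, running on edge devices near the cameras.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

COCO_MOTORCYCLE = 4  # COCO category id for 'motorcycle'

@torch.no_grad()
def motorcyclist_regions(frame_rgb, score_thresh=0.7):
    """Return boxes of detected motorcycles; each box would then be
    cropped and passed to a helmet/no-helmet classifier (not shown)."""
    out = model([to_tensor(frame_rgb)])[0]
    keep = (out["labels"] == COCO_MOTORCYCLE) & (out["scores"] > score_thresh)
    return out["boxes"][keep].cpu().numpy()
```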


We also plan to broaden our research in video activity recognition and understanding by participating in ongoing and upcoming competitions such as the THUMOS challenge for action localization and the YouTube-8M video understanding challenge sponsored by Google. In addition, we plan to collaborate with industry partners Sellomni and The Hook to develop social video recommender systems with active video content understanding and community interest detection.
