Snehasis Mukherjee
Eminent Speaker
Short CV: Snehasis Mukherjee is an Associate Professor at Shiv Nadar Institution of Eminence, Delhi NCR, India. He obtained his PhD in Computer Science from the Indian Statistical Institute in 2012. He worked as a Postdoctoral Fellow at NIST, Gaithersburg, USA for 2 years, and then spent 6 years as an Assistant Professor at IIIT Sricity, until April 2020. He has authored several peer-reviewed research papers in reputed journals and conferences. Snehasis is an Associate Editor of the Springer journal SN Computer Science. He is an active reviewer for several reputed journals such as IEEE Trans. NNLS, IEEE Trans. CSVT, IEEE Trans. IP, IEEE Trans. ETCI, IEEE Trans. HMS, IEEE Trans. Cyb, IEEE CIM, CVIU, Pattern Recognition, Neural Networks, Neurocomputing, and many more. He has chaired sessions at several prestigious conferences such as ICARCV, ICVGIP, and NCVPRIPG. His research areas include Computer Vision, Machine Learning, Image Processing, and Graphics.
Title of Talk 1: Engineering on Self-Supervised Networks for Training with Less Data
Synopsis: This talk will focus on improving the efficacy and effectiveness of deep self-supervised CNN models for image classification. Despite recent advances in deep learning-based image classification, a huge amount of training data is required to avoid overfitting. Moreover, supervised deep learning models require labeled datasets for training, and preparing such a huge amount of labeled data takes a lot of human effort and time. In this scenario, self-supervised models are becoming popular because of their ability to learn even from unlabeled datasets. However, the efficient transfer of knowledge learned by self-supervised models to a target task remains an unsolved problem. First, this talk will discuss a method for the efficient transfer of knowledge learned by a self-supervised model to a target task. Hyperparameters of the Fully Connected (FC) layers, such as the number of layers, the number of units in each layer, the learning rate, and the dropout rate, are automatically tuned using a Bayesian optimization technique called the Tree-structured Parzen Estimator (TPE) algorithm. In the second problem, we extend the concept of automatically tuning (autotuning) the hyperparameters, as proposed in the first problem, to the CNN layers. This work uses a pre-trained self-supervised model to transfer knowledge for image classification. We propose an efficient Bayesian optimization-based method for autotuning the hyperparameters of the self-supervised model during the knowledge transfer. The proposed autotuned image classifier consists of a few CNN layers followed by an FC layer; finally, a softmax layer produces the class probability scores for the images. In the third problem, in addition to the challenges mentioned in the second problem, we further address parameter overhead and prolonged GPU usage.
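The autotuning loop described above can be illustrated with a minimal sketch. The search space below (layer count, units, learning rate, dropout) uses illustrative values, not those from the talk; a real implementation would use a TPE library such as Hyperopt, and plain random search stands in here only to keep the example self-contained. The objective is likewise a toy stand-in for validation loss.

```python
import math
import random

# Hypothetical search space for the FC-layer hyperparameters (illustrative values).
SPACE = {
    "num_layers": [1, 2, 3],
    "units": [64, 128, 256, 512],
    "learning_rate": (1e-4, 1e-1),  # sampled log-uniformly
    "dropout": (0.0, 0.5),
}

def sample(space):
    """Draw one random configuration from the search space."""
    lo, hi = space["learning_rate"]
    return {
        "num_layers": random.choice(space["num_layers"]),
        "units": random.choice(space["units"]),
        "learning_rate": math.exp(random.uniform(math.log(lo), math.log(hi))),
        "dropout": random.uniform(*space["dropout"]),
    }

def tune(objective, n_trials=30, seed=0):
    """Return (best_loss, best_config). TPE would bias sampling toward
    promising regions of the space; random search keeps the sketch simple."""
    random.seed(seed)
    trials = [(objective(c), c) for c in (sample(SPACE) for _ in range(n_trials))]
    return min(trials, key=lambda t: t[0])

# Toy objective; in practice this would train the classifier head on the
# target dataset and return its validation loss.
best_loss, best_cfg = tune(lambda c: abs(c["learning_rate"] - 1e-2) + c["dropout"])
```

In the actual method, each trial would train the FC head with the sampled configuration and score it on held-out data, so the tuner needs far fewer trials than an exhaustive grid.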
This work proposes a method to address three challenges in image classification: the limited availability of labeled training data, parameter overhead, and prolonged GPU usage during training. We introduce an improved transfer learning approach for a target dataset, in which we take the learned features from a self-supervised model after minimizing its parameters by removing the final layer. The learned features are then fed into a CNN classifier, followed by a multi-layer perceptron (MLP), where the hyperparameters of both the CNN and the MLP are automatically tuned (autotuned) using a Bayesian optimization-based technique. Further, we reduce the GFLOPs measure by limiting the search space for the hyperparameters, without compromising performance. Our ongoing work aims to transfer knowledge from a larger self-supervised model to a smaller model through knowledge distillation. The approach includes hyperparameter tuning of the MLP classifier in the student model, as well as of the balancing factor and temperature, which play a crucial role in the knowledge distillation process. To enhance the effectiveness of the distillation, the ongoing work introduces a loss function that is a linear combination of the hard-target loss, the soft-target loss, and the Barlow Twins loss. This approach is expected to improve the accuracy and efficiency of the knowledge transfer process. In the future, we plan to prune CNN layers and filters of the self-supervised model that do not contribute to the training process.
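The balancing factor and temperature mentioned above appear in the standard distillation loss, sketched below for a single example. The `alpha` and `temperature` values are illustrative; the Barlow Twins term of the proposed combined loss is omitted here, since it operates on embedding vectors rather than logits.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=4.0):
    """Linear combination of the hard-target and soft-target losses,
    weighted by the balancing factor alpha."""
    # Hard-target term: cross-entropy between the student prediction
    # and the true label.
    p = softmax(student_logits)
    hard = -math.log(p[label] + 1e-12)
    # Soft-target term: KL divergence between the softened teacher and
    # student distributions; T^2 keeps its scale comparable to the hard term.
    qt = softmax(teacher_logits, temperature)
    qs = softmax(student_logits, temperature)
    soft = sum(t * (math.log(t + 1e-12) - math.log(s + 1e-12))
               for t, s in zip(qt, qs))
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft

teacher = [2.0, 0.5, -1.0]
loss_matched = distillation_loss(teacher, teacher, label=0)       # student == teacher
loss_mismatched = distillation_loss([0.0, 2.0, 0.0], teacher, 0)  # disagreeing student
```

A student that reproduces the teacher's logits incurs only the hard-target term; a disagreeing student is penalized by both terms, which is what drives the distillation.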
Title of Talk 2: Egocentric Activity Recognition by Subject-Action Relevance
Synopsis: Human Activity Recognition in videos has been well studied over the last two decades. However, recognizing human activity in challenging videos with unusual, abrupt motion and background disturbances remains an unsolved problem. Egocentric videos are captured by a camera placed on the performer's head, and the unexpected head movements of the wearer make recognition in such videos challenging. In the absence of gesture information about the performer, the motion of the surrounding objects is the only cue for recognition. However, the surrounding objects may change across different videos of the same activity, leading to high intra-class variation. This intra-class variation makes it difficult to apply learning-based algorithms, and the extreme camera jerk makes the recognition task even more challenging. In this scenario, the talk will feature a few published articles on the topic and show how the published methods address the challenges involved in Egocentric Activity Recognition. The method presented in this talk establishes a relationship between the subject (the object the wearer interacts with) and the action. The talk will conclude with a few possible directions for future research on this topic.
Title of Talk 3: Analyzing Motion in Videos: Past, Present and Future
Synopsis: Motion analysis is an important task in video content analysis. In recent years, there has been significant progress in analyzing the motion in a video. This talk will start with the traditional optical flow method, followed by dense optical flow. Next, it will briefly discuss a few optical flow-based methods for motion analysis that were proposed before the deep learning era. The talk will then cover deep learning-based approaches in detail, along with recent advances in the field. Finally, the talk will conclude with future research directions and applications of research on this topic.
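The traditional optical flow formulation the talk opens with can be illustrated with a minimal sketch: a single Lucas-Kanade-style least-squares estimate of one global translation between two frames. The synthetic sinusoidal pattern, grid size, and shift below are illustrative choices, not material from the talk.

```python
import math

def lucas_kanade_global(f1, f2):
    """Estimate a single (u, v) translation between two frames by solving
    the optical-flow normal equations over the whole image."""
    h, w = len(f1), len(f1[0])
    a11 = a12 = a22 = b1 = b2 = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (f1[y][x + 1] - f1[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (f1[y + 1][x] - f1[y - 1][x]) / 2.0  # spatial gradient in y
            it = f2[y][x] - f1[y][x]                  # temporal gradient
            a11 += ix * ix; a12 += ix * iy; a22 += iy * iy
            b1 -= ix * it;  b2 -= iy * it
    # Solve the 2x2 system [a11 a12; a12 a22] (u, v)^T = (b1, b2)^T.
    det = a11 * a22 - a12 * a12
    u = (a22 * b1 - a12 * b2) / det
    v = (a11 * b2 - a12 * b1) / det
    return u, v

# Smooth synthetic pattern and a copy of it shifted by (1.0, 0.5) pixels.
def pattern(x, y):
    return math.sin(0.15 * x) + math.cos(0.1 * y)

N = 40
frame1 = [[pattern(x, y) for x in range(N)] for y in range(N)]
frame2 = [[pattern(x - 1.0, y - 0.5) for x in range(N)] for y in range(N)]
u, v = lucas_kanade_global(frame1, frame2)
```

Dense optical flow applies the same brightness-constancy idea per pixel over local windows instead of one global window, which is where methods such as Farneback's (and later the deep learning approaches the talk covers) pick up.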
Snehasis Mukherjee
Qualifications: PhD from Indian Statistical Institute
Title: Associate Professor
Affiliation: Shiv Nadar Institution of Eminence, Delhi NCR
LinkedIn:
Twitter/X:
Facebook:
Instagram:
Email:
About the speaker: