I am currently a Senior Software Engineer at Google, where I work on multimodal post-training of Gemini. I was one of the core contributors to the Gemini launch that placed the model at the top of the LMSYS Vision leaderboard.
Before Google, I worked at Waymo on the Perception team, where I improved the data efficiency of various perception models. I graduated with a Master's degree in Computer Science from Stanford University in 2021.
I am interested in computer vision and deep learning, specifically large vision-language models, post-training RL algorithms, generative models, self-supervised learning, and active learning.
During my Master's, I was a Research Assistant with Professor Stefano Ermon, where I pursued research in generative models and self-supervised learning. One of my projects won the Best Paper Award at ICLR 2022.
I was also a Course Assistant for CS 330, "Deep Multi-Task and Meta Learning," taught by Professor Chelsea Finn.
Before starting my Master's, I worked at Adobe India as a Member of Technical Staff-2, where I built a deep-learning-based visual search product for clothing recommendation. This work won the Best Paper Award at a CVPR 2019 workshop. I was also involved in several other research projects during that time.
I did my undergraduate studies at the Indian Institute of Technology Kharagpur, with a major in Electronics and Electrical Communication Engineering and a minor in Computer Science.
Built a 2D grid environment in which an agent performs tasks specified by natural-language instructions. Developed a new mechanism for fusing visual and textual features to solve the task.
The proposed method outperformed the state of the art in both speed and task performance, in the 2D environment as well as in a 3D environment.
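The exact fusion mechanism is not described here; a common baseline for conditioning visual features on language is FiLM-style modulation, where the sentence embedding produces a per-channel scale and shift applied to the visual feature map. A minimal NumPy sketch of that idea (all names, shapes, and the linear projections are illustrative assumptions, not the method from this project):

```python
import numpy as np

def film_fusion(visual, text_emb, W_gamma, W_beta):
    """FiLM-style fusion: a text embedding modulates visual channels.

    visual:   (C, H, W) visual feature map
    text_emb: (D,) sentence embedding
    W_gamma, W_beta: (C, D) learned projection matrices (random here)
    """
    gamma = W_gamma @ text_emb  # per-channel scale, shape (C,)
    beta = W_beta @ text_emb    # per-channel shift, shape (C,)
    # broadcast scale/shift over the spatial dimensions
    return gamma[:, None, None] * visual + beta[:, None, None]

rng = np.random.default_rng(0)
C, H, W, D = 8, 5, 5, 16
fused = film_fusion(rng.normal(size=(C, H, W)),
                    rng.normal(size=D),
                    rng.normal(size=(C, D)),
                    rng.normal(size=(C, D)))
print(fused.shape)  # (8, 5, 5)
```

The fused map keeps the visual feature shape, so it can feed directly into the downstream policy or prediction head.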
Proposed a hybrid model that serves both as a generative model and as a representation-learning model.
Trained an autoencoder with contrastive learning to learn good representations, then trained a diffusion model over the resulting latent space to obtain a generative model.
Paper accepted at ICLR 2021 Workshop: Neural Compression: From Information Theory to Applications.
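The two stages above can be sketched with their core computations: an InfoNCE-style contrastive loss on latents, and the Gaussian forward-noising step that latent diffusion trains against. This is a schematic NumPy sketch under assumed names and shapes, not the paper's implementation (the encoder and denoiser networks are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: a contrastive loss pulls together the latents of two
# augmented views of the same image (positives on the diagonal).
def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss between two batches of latents, L2-normalized."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (B, B) similarity matrix
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Stage 2: diffusion in latent space corrupts z at noise level
# alpha_bar_t; a denoiser (not shown) learns to predict eps.
def noisy_latent(z, alpha_bar_t):
    eps = rng.normal(size=z.shape)
    z_t = np.sqrt(alpha_bar_t) * z + np.sqrt(1.0 - alpha_bar_t) * eps
    return z_t, eps

z = rng.normal(size=(4, 32))                       # batch of latents
loss = info_nce(z + 0.01 * rng.normal(size=z.shape), z)
z_t, eps = noisy_latent(z, alpha_bar_t=0.5)
print(z_t.shape)  # (4, 32)
```

Training the diffusion model on latents rather than pixels is what lets one encoder serve both roles: its latents are the representation, and the latent diffusion model makes the pair generative.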
Analyzed models adversarially trained with small perturbations; such models have interpretable gradients without incurring a significant drop in performance on clean images.
Used these models for zero-shot transfer and weakly supervised object localization, achieving significant gains in performance.
Paper accepted at ICLR 2020 Workshop: Towards Trustworthy ML.
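The "small perturbation" regime refers to adversarial training with a tight L-infinity bound eps on the attack. A minimal NumPy sketch of the perturbation step for a binary logistic-regression model (FGSM here for brevity; the model, names, and eps are illustrative assumptions, and the actual work presumably used deep networks):

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps):
    """One FGSM step for binary logistic regression.

    Moves x in the direction that increases the cross-entropy loss,
    with each coordinate bounded by eps (the L-infinity ball).
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability
    grad_x = (p - y) * w                    # d(cross-entropy)/dx, in closed form
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
x, y = rng.normal(size=8), 1.0
x_adv = fgsm_perturb(x, y, w, b, eps=0.05)
print(np.max(np.abs(x_adv - x)))  # each coordinate moved by at most eps
```

Adversarial training then minimizes the loss on such perturbed inputs; with a small eps, this regularizes input gradients toward perceptually aligned directions while barely hurting clean accuracy.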
Proposed a multi-conditional image generation pipeline that generates an image combining the shape of the first input image with the texture of the second.
Paper accepted at NeurIPS Workshop on Machine Learning for Creativity and Design 3.0, 2019.