Abhishek Sinha

I am currently a Senior Software Engineer at Google DeepMind, working on native image generation in Gemini. I was a core contributor to launching this feature, enabling the text-to-image, image-editing, and interleaved generation capabilities now available in AI Studio and the Gemini app. Before that, I worked on multimodal post-training of Gemini and was a core contributor to the Gemini 1.5 launch, helping it reach the top of the LMSYS Vision leaderboard.

Before Google, I worked at Waymo on the Perception team, improving data efficiency for perception models. I graduated with a Master's degree in Computer Science from Stanford University in 2021.

During my Master's, I was a Research Assistant under Professor Stefano Ermon, researching generative models and self-supervised learning. One of my projects won the Best Paper Award at ICLR 2022. I was also a Course Assistant for CS 330: Deep Multi-Task and Meta-Learning, taught by Professor Chelsea Finn.

Before my Master's, I worked at Adobe India as a Member of Technical Staff-2 on a deep learning-based visual search product for clothing recommendation. This work won the Best Paper Award at a CVPR 2019 workshop.

I completed my undergraduate studies at Indian Institute of Technology Kharagpur, majoring in Electronics and Electrical Communication Engineering with a minor in Computer Science.

I am interested in large vision-language models, generative models, post-training RL algorithms, self-supervised learning, and active learning.

Selected Publications

Comparing Distributions by Measuring Differences that Affect Decision Making

Proposed a way to measure the discrepancy between two probability distributions based on optimal decision loss. Our approach outperformed prior methods for two-sample tests across a range of datasets. The proposed divergence can also be used for feature selection, sample-quality evaluation, and even studying the effects of climate change.

Best Paper Award at ICLR 2022.
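As a rough illustration of the idea (not the paper's full method), consider scalar samples under squared loss: the optimal decision is the mean, the incurred decision loss is the variance, and the divergence compares the loss on the pooled mixture against the average loss on each sample alone. Function names here are illustrative:

```python
import numpy as np

def decision_divergence(x, y):
    """Toy decision-loss divergence between two equal-sized scalar samples.

    Under squared loss the optimal decision is the mean and the resulting
    loss is the variance, so we compare the variance of the pooled sample
    with the average variance of the individual samples.
    """
    pooled = np.concatenate([x, y])
    return np.var(pooled) - 0.5 * (np.var(x) + np.var(y))
```

For equal-sized samples this reduces to `(mean(x) - mean(y))**2 / 4`, so it is nonnegative and zero exactly when the sample means match.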

D2C: Diffusion-Denoising Models for Few-shot Conditional Generation

Developed a single model that both learns rich latent representations and samples images from that latent space. Added a contrastive loss on top of a VAE to learn good representations, and learned a strong prior over the VAE's latent space using diffusion models. The model enables few-shot conditional generation tasks, such as conditional image manipulation with limited examples.

Paper accepted at NeurIPS 2021.

Negative Data Augmentation

Proposed a new GAN training objective that incorporates negative data augmentation. Obtained significant gains in conditional and unconditional image generation, and in anomaly detection using the discriminator. Also incorporated negative augmentations into contrastive learning approaches for images and videos, achieving gains in linear classification.

Paper accepted at ICLR 2021.

YouTube video summarizing the paper.
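One common way to construct a negative augmentation is to shuffle image patches: the result stays locally realistic but becomes globally implausible, exactly the kind of out-of-support sample a discriminator should reject. A minimal sketch (function name and grid size are illustrative):

```python
import numpy as np

def jigsaw_negative(img, grid=2, rng=None):
    # img: HxWxC array; split into grid x grid patches and permute them
    # to produce a globally implausible "negative" sample.
    rng = rng or np.random.default_rng(0)
    h, w = img.shape[0] // grid, img.shape[1] // grid
    patches = [img[i * h:(i + 1) * h, j * w:(j + 1) * w]
               for i in range(grid) for j in range(grid)]
    order = rng.permutation(len(patches))
    rows = [np.concatenate([patches[order[r * grid + c]] for c in range(grid)], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0)
```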

Charting the Right Manifold: Manifold Mixup for Few-shot Learning

Used self-supervision techniques (rotation and exemplar), followed by manifold mixup, for few-shot classification tasks. The proposed approach beat the then state-of-the-art accuracy on the mini-ImageNet, CUB, and CIFAR-FS datasets by 3-8%.

Paper accepted at WACV 2020.
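The manifold mixup step interpolates the hidden representations and labels of two examples; a minimal sketch (names and the Beta parameter are illustrative):

```python
import numpy as np

def manifold_mixup(h1, h2, y1, y2, alpha=2.0, rng=None):
    # Interpolate hidden features and (one-hot) labels with a Beta-sampled weight.
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * h1 + (1 - lam) * h2, lam * y1 + (1 - lam) * y2
```

In the paper's setting, this interpolation is applied at a randomly chosen hidden layer of the network during training.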

Attention Based Natural Language Grounding By Navigating Virtual Environment

Built a 2D grid environment in which an agent performs tasks based on natural-language instructions. Developed a new mechanism for fusing visual and textual features to solve the task. The proposed method outperformed the state of the art in both speed and performance in the 2D environment as well as a 3D environment.

Paper accepted at WACV 2019.
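A generic attention-based fusion of this kind (a sketch in the same spirit, not the paper's exact mechanism; names are illustrative) scores each spatial visual feature against a sentence embedding and returns their weighted sum:

```python
import numpy as np

def attention_fuse(visual, text_q):
    # visual: (N, d) spatial feature vectors; text_q: (d,) sentence embedding.
    # Softmax over attention scores, then pool the visual features.
    scores = visual @ text_q / np.sqrt(visual.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ visual  # (d,) fused feature
```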

Hybrid Mutual Information Lower-Bound Estimators For Representation Learning

Proposed a hybrid model that can be used both as a generative model and as a representation learning model. Trained an autoencoder with contrastive learning to learn good representations, followed by a diffusion model over the latent space to obtain a generative model.

Paper accepted at ICLR 2021 Workshop: Neural Compression: From Information Theory to Applications.

On the Benefits of Models with Perceptually Aligned Gradients

Analyzed models adversarially trained with small perturbations. Such models have interpretable gradients without incurring a significant drop in performance on clean images. Used these models for zero-shot transfer and weakly supervised object localization, achieving significant performance gains.

Paper accepted at ICLR 2020 Workshop: Towards Trustworthy ML.

Selected Projects

Face analyzer tool

Built a face analyzer tool that uses deep learning to give users an objective analysis of their facial appearance.

The project won the Microsoft AI Hackathon held at IIT Kharagpur.

Autonomous snake game using DQN

Implemented deep Q-learning (DQN) for the snake game. Built the game in Pygame so that it could then be controlled by a deep learning agent.
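The core of a DQN update is the Bellman target for the chosen action; a minimal sketch (names are illustrative; the Q-network itself is omitted):

```python
import numpy as np

def dqn_target(reward, next_q_values, done, gamma=0.99):
    # Bellman target: r + gamma * max_a' Q(s', a'); no bootstrap at episode end.
    return reward + gamma * (1.0 - float(done)) * np.max(next_q_values)
```

During training the agent minimizes the squared difference between Q(s, a) and this target over mini-batches sampled from a replay buffer.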