Abhishek Sinha

I am currently a Senior Software Engineer at Google, where I work on multimodal post-training of Gemini. I was one of the core contributors to the Gemini launch that placed the model at the top of the LMSYS Vision leaderboard. Before Google, I worked at Waymo on the Perception team, where I improved the data efficiency of various perception models. I graduated with a Master's degree in Computer Science from Stanford University in 2021. I am interested in computer vision and deep learning, specifically in large vision-language models, post-training RL algorithms, generative models, self-supervised learning, and active learning.

During my Master's, I was a Research Assistant under Professor Stefano Ermon, where I pursued research in generative models and self-supervised learning. One of my projects won the Best Paper Award at ICLR 2022. I was also a Course Assistant for CS 330, "Deep Multi-Task and Meta Learning", taught by Professor Chelsea Finn.

Prior to starting my Master's, I worked at Adobe India as a Member of Technical Staff 2. I worked on a deep learning-based visual search product for clothing recommendation; this work won the Best Paper Award at a CVPR 2019 workshop. I was also involved in several other research projects during that time. I completed my undergraduate degree at the Indian Institute of Technology Kharagpur with a major in Electronics and Electrical Communication Engineering and a minor in Computer Science.

Email  /  LinkedIn  /  Resume  /  Google Scholar

Selected Publications

Comparing Distributions by Measuring Differences that Affect Decision Making

Proposed a way to measure the discrepancy between two probability distributions based on optimal decision loss.

Our approach outperformed prior methods on two-sample tests across different datasets.

The proposed divergence can also be used for feature selection, sample quality evaluation, and even studying the effects of climate change.

Best Paper Award at ICLR, 2022.


D2C: Diffusion-Denoising Models for Few-shot Conditional Generation

Developed a single model that can both learn rich latent representations and sample images from that latent space.

Added a contrastive loss on top of a VAE to learn good representations, and learned a strong prior over the VAE's latent space using diffusion models (a toy sketch of this combination follows below).

Our model allows us to perform few-shot conditional generation tasks, such as conditional image manipulation with limited examples.

Paper accepted at NeurIPS, 2021.
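
A minimal PyTorch sketch of that recipe, assuming a toy fully-connected autoencoder and an InfoNCE-style contrastive loss between two augmented views; the network sizes, loss weighting, and the omitted latent diffusion prior are illustrative assumptions rather than the paper's exact setup.

import torch
import torch.nn.functional as F
from torch import nn

class TinyAutoencoder(nn.Module):
    # Toy stand-in for the encoder/decoder pair.
    def __init__(self, dim=3 * 32 * 32, latent=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(dim, latent))
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z).view_as(x)

def info_nce(z1, z2, tau=0.1):
    # Contrastive loss between two augmented views of the same batch;
    # matching indices are positives, everything else is a negative.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

model = TinyAutoencoder()
x1 = torch.rand(8, 3, 32, 32)   # augmented view 1 of an image batch
x2 = torch.rand(8, 3, 32, 32)   # augmented view 2 of the same batch

z1, recon = model(x1)
z2, _ = model(x2)
loss = F.mse_loss(recon, x1) + info_nce(z1, z2)
loss.backward()
# In D2C a diffusion model is then trained over z to serve as the prior
# (omitted here); sampling draws z from that prior and decodes it.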


Negative Data Augmentation

Proposed a new GAN training objective to incorporate negative data augmentation (a rough sketch follows below).

Obtained significant gains in conditional and unconditional image generation, and in anomaly detection using the discriminator.

Incorporated negative augmentations into contrastive learning approaches for images and videos, achieving gains in linear classification.

Paper accepted at ICLR, 2021.

YouTube video summarizing the paper.
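
A rough sketch of the GAN side of the idea, assuming patch shuffling as the negative augmentation (one of several possible choices) and a non-saturating discriminator loss; the function names, loss weighting, and augmentation are illustrative, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def shuffle_patches(x, patch=8):
    # Negative augmentation: jigsaw-shuffle non-overlapping patches so the
    # image keeps local statistics but loses global structure.
    b, c, h, w = x.shape
    p = x.unfold(2, patch, patch).unfold(3, patch, patch)
    p = p.contiguous().view(b, c, -1, patch, patch)
    p = p[:, :, torch.randperm(p.size(2))]
    p = p.view(b, c, h // patch, w // patch, patch, patch)
    return p.permute(0, 1, 2, 4, 3, 5).contiguous().view(b, c, h, w)

def discriminator_loss(disc, real, fake):
    neg = shuffle_patches(real)                  # negatives derived from real images
    loss_real = F.softplus(-disc(real)).mean()   # real images -> "real"
    loss_fake = F.softplus(disc(fake)).mean()    # generator samples -> "fake"
    loss_neg = F.softplus(disc(neg)).mean()      # negative augmentations -> "fake"
    return loss_real + 0.5 * (loss_fake + loss_neg)

disc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))
real, fake = torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32)
loss = discriminator_loss(disc, real, fake)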


Introspection: Accelerating Neural Network Training By Learning Weight Evolution

Developed an algorithm to speed up the training of deep neural networks by predicting future weight values (a simplified sketch follows below).

Achieved 20% and 40% reductions in training time on the CIFAR-10 and ImageNet datasets, respectively.

Paper accepted at ICLR, 2017.
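
A simplified sketch of the update pattern, assuming a linear extrapolation along each weight's recent trajectory; the paper instead learns a small predictor network, so treat the step below purely as an illustration of where such a jump slots into training.

import torch

@torch.no_grad()
def introspection_step(model, history, scale=2.0):
    # history maps parameter name -> its value at an earlier checkpoint.
    for name, p in model.named_parameters():
        past = history.get(name)
        if past is not None:
            p.add_(scale * (p - past))        # jump forward along the weight's trajectory
        history[name] = p.detach().clone()    # refresh the stored checkpoint

model = torch.nn.Linear(10, 2)
history = {}
introspection_step(model, history)   # first call only records a checkpoint
# ...run normal SGD/Adam training, then call introspection_step() again
# every few thousand steps to jump the weights ahead.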


Charting the Right Manifold: Manifold Mixup for Few-shot Learning

Used self-supervision techniques (rotation and exemplar), followed by manifold mixup, for few-shot classification tasks (a minimal mixup sketch follows below).

The proposed approach beat the state-of-the-art accuracy on the mini-ImageNet, CUB, and CIFAR-FS datasets by 3-8%.

Paper accepted at WACV, 2020.
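
A minimal sketch of the manifold mixup ingredient, assuming a toy two-layer classifier over precomputed features; the mixing layer, Beta parameter, and feature dimensions are illustrative, and the rotation/exemplar self-supervision stages are omitted.

import torch
import torch.nn.functional as F
from torch import nn

class MixupNet(nn.Module):
    def __init__(self, in_dim=640, hidden=256, classes=5):
        super().__init__()
        self.body = nn.Linear(in_dim, hidden)
        self.head = nn.Linear(hidden, classes)

    def forward(self, x, alpha=2.0):
        h = F.relu(self.body(x))
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        perm = torch.randperm(x.size(0))
        h_mix = lam * h + (1 - lam) * h[perm]   # interpolate in the hidden ("manifold") space
        return self.head(h_mix), perm, lam

x, y = torch.randn(16, 640), torch.randint(0, 5, (16,))
logits, perm, lam = MixupNet()(x)
# Labels are mixed with the same coefficient as the hidden states.
loss = lam * F.cross_entropy(logits, y) + (1 - lam) * F.cross_entropy(logits, y[perm])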


Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models

Analyzed adversarially trained models for vulnerability to adversarial perturbations at the latent layers.

The proposed algorithm achieved state-of-the-art adversarial accuracy against strong adversarial attacks.

Paper accepted at IJCAI, 2019.


Attention Based Natural Language Grounding By Navigating Virtual Environment

Built a 2D grid environment in which an agent performs tasks specified by natural language sentences. Developed a new mechanism for fusing visual and textual features to solve the task.

The proposed methodology outperformed the state of the art in both speed and performance in the 2D environment as well as in a 3D environment.

Paper accepted at WACV, 2019.


Hybrid Mutual Information Lower-Bound Estimators For Representation Learning

Proposed a hybrid model that can be used both as a generative model and as a representation learning model.

Trained an autoencoder with contrastive learning to learn good representations, followed by a diffusion model over the latent space to obtain a generative model.

Paper accepted at ICLR 2021 Workshop: Neural Compression: From Information Theory to Applications.


Powering Robust Fashion Retrieval with Information Rich Feature Embeddings

Proposed grid-based training of siamese networks, allowing the network to observe multiple positive and negative image instances simultaneously (a generic multi-positive contrastive sketch follows below).

Best Paper Award at CVPR Workshop, 2019.
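
A speculative sketch of what such grid-style training can look like: one batch is laid out as a grid of items x views, so each anchor sees several positives and many negatives at once. The multi-positive contrastive loss below is a generic choice and may differ from the paper's exact objective.

import torch
import torch.nn.functional as F

def grid_contrastive_loss(emb, item_ids, tau=0.1):
    # emb: (N, D) image embeddings; item_ids: (N,) product identity per image.
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t() / tau
    sim.fill_diagonal_(float("-inf"))                      # ignore self-similarity
    pos = item_ids.unsqueeze(0) == item_ids.unsqueeze(1)   # same product -> positive pair
    pos.fill_diagonal_(False)
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    return -log_prob[pos].mean()

emb = torch.randn(12, 64)                        # e.g. a 4x3 grid: 4 products, 3 views each
item_ids = torch.arange(4).repeat_interleave(3)
loss = grid_contrastive_loss(emb, item_ids)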


On the Benefits of Models with Perceptually Aligned Gradients

Analyzed models adversarially trained with small perturbations. Such models have interpretable gradients without incurring a significant drop in performance on clean images.

Used these models for zero-shot transfer and weakly supervised object localization tasks, achieving significant gains in performance.

Paper accepted at ICLR 2020 Workshop: Towards Trustworthy ML.


cFineGAN: Unsupervised multi-conditional fine-grained image generation

Proposed a multi-conditional image generation pipeline that generates an image containing the shape of the first input image and the texture of the second.

Paper accepted at NeurIPS Workshop on Machine Learning for Creativity and Design 3.0, 2019.


Improving Classification Performance of Support Vector Machines via Guided Custom Kernel Search

Used a modification of neural architecture search to discover a kernel function for SVMs on the MNIST dataset (a rough stand-in sketch follows below).

Paper accepted at GECCO, 2019.
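
A rough, hypothetical stand-in: scikit-learn's SVC accepts a callable kernel, so a search loop can propose kernel parameterizations and keep whichever validates best. The random search over an RBF/polynomial mixture below only illustrates where a guided, architecture-search-style proposal mechanism would plug in, and the digits dataset stands in for MNIST.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def make_kernel(gamma, degree, mix):
    # Convex mixture of an RBF kernel and a polynomial kernel.
    def kernel(a, b):
        sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2 * a @ b.T
        return mix * np.exp(-gamma * sq) + (1 - mix) * (a @ b.T + 1.0) ** degree
    return kernel

X, y = load_digits(return_X_y=True)
Xtr, Xval, ytr, yval = train_test_split(X / 16.0, y, random_state=0)

best_acc, best_params = 0.0, None
rng = np.random.default_rng(0)
for _ in range(5):   # the paper guides these proposals; here they are random
    params = (rng.uniform(0.001, 0.1), int(rng.integers(2, 4)), rng.uniform(0.0, 1.0))
    acc = SVC(kernel=make_kernel(*params)).fit(Xtr, ytr).score(Xval, yval)
    if acc > best_acc:
        best_acc, best_params = acc, params
print("best validation accuracy:", best_acc, "with params:", best_params)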

Selected Projects

Face analyzer tool

Built a face analyzer tool utilizing deep learning techniques to provide users with an unbiased analysis of their facial appearance.

The project was the winner of the Microsoft AI Hackathon competition held at IIT Kharagpur.


Autonomous snake game using DQN

Implemented deep Q-learning for the snake game. Built the game in pygame so that it could be controlled by a deep learning agent (a condensed DQN sketch follows below).
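
A condensed sketch of the DQN pieces such a project needs: a small Q-network over a hand-crafted state vector, an experience-replay buffer, and the TD-target update. The pygame snake environment itself is omitted, and the state size (11), action count (3), and hyperparameters are assumptions for illustration.

import random
from collections import deque

import torch
import torch.nn.functional as F
from torch import nn

q_net = nn.Sequential(nn.Linear(11, 64), nn.ReLU(), nn.Linear(64, 3))        # actions: straight/left/right
target_net = nn.Sequential(nn.Linear(11, 64), nn.ReLU(), nn.Linear(64, 3))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)   # holds (state, action, reward, next_state, done) tensors
gamma = 0.99

def act(state, eps=0.1):
    # Epsilon-greedy action selection over Q-values.
    if random.random() < eps:
        return random.randrange(3)
    with torch.no_grad():
        return q_net(state).argmax().item()

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q = q_net(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# During play: after every environment step, append
# (state, action_as_long_tensor, reward, next_state, done_as_float) to `replay`,
# call train_step(), and periodically copy q_net's weights into target_net.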


Inspired by this website.