cv | Digbalay Bose

Basics

Name	Digbalay Bose
Label	Ph.D. Candidate
Email	dbose@usc.edu/digbose92@gmail.com
Url	https://digbose92.github.io/

Work

2023.05 - 2023.08
Computer Vision and Graphics Intern

NVIDIA Maxine AI
- Developed end-to-end deep learning models for controllable portrait video animation as part of NVIDIA Maxine.
2022.05 - 2022.08
Software Engineering Intern

NVIDIA Maxine AI
- Developed end-to-end visual and audio-visual deep learning models for high-fidelity facial animation of avatars as part of Maxine ARSDK.
2016 - 2018
Research Software Engineer

IBM Research India
- Developed an end-to-end soil moisture extraction system from satellite images by incorporating image interpolation techniques as part of the IBM Geospatial Analytics suite.
- Developed explainable deep learning models in the domains of image classification and visual search as part of a retail and operations effort.
2013.05 - 2013.07
Research Intern

Indian Statistical Institute, Kolkata
- Developed a key recovery scheme based on the properties of slid pairs for the stream cipher Salsa20.

Education

2018 - 2024

Los Angeles, CA
Ph.D.

University of Southern California

Ming Hsieh Department of Electrical and Computer Engineering
- Grounding Natural Language
- Advanced Computer Vision
- Affective Computing
- Mathematics of High Dimensional Data
2014 - 2016

Mumbai, India
M.Tech

Indian Institute of Technology, Bombay

Electrical Engineering
- Matrix Computations
- Machine Learning
- High-performance Computing
- Optimal Control
2010 - 2014

Kolkata, India
Bachelor of Engineering

Jadavpur University

Electronics and Telecommunication Engineering

Skills

	Languages
	Python
	C
	C++
	R
	JavaScript
	HTML
	Bash

	Machine Learning Frameworks
	PyTorch
	TensorFlow
	Keras
	Caffe
	Scikit-learn
	OpenCV

	Software
	Maya
	Blender
	VTK

Languages

	English
	Fluent

	Hindi
	Fluent

	Bengali
	Native speaker

Projects

2022 - 2023
Automated analysis of advertisement videos

Introduced a large-scale advertisement benchmark dataset (MM-AU) and multimodal models for semantic video understanding tasks.
- Keywords: multimodal learning, media understanding, advertisements
- Work published in ACM MM 2023 proceedings.
2022 - 2022
Context-driven human affect perception

Developed a multimodal context fusion module for emotion recognition in context-driven scenarios.
- Keywords: multimodal fusion, emotion recognition
- Work published in ICASSP 2023 proceedings.
2022 - 2023
Multimodal federated learning

Co-developed multimodal benchmark tasks and baseline models for federated learning applications.
- Keywords: multimodal fusion, federated learning
- Work done in collaboration with Amazon Alexa AI.
- Work published in KDD 2023 proceedings.
2021 - 2022
Visual scene understanding

Proposed a large-scale weakly labeled movie-centered scene dataset (MovieCLIP) and knowledge transfer to scene and genre classification tasks across diverse domains.
- Keywords: visual scene recognition, automatic labeling
- Work done in collaboration with Google Research.
- Work published in WACV 2023 proceedings.
2021 - 2022
Automated analysis of facial paralysis patients

Developed a facial-landmark-based video pipeline involving novel asymmetry measures for predicting standardized scores in a mixed-effects modeling setup.
- Keywords: facial landmarks, automated analysis, linear mixed-effects model
- Work done in collaboration with Keck School of Medicine and Pacific Neuroscience Institute.
- Work published in Facial Plastic Surgery & Aesthetic Medicine.
2021 - 2021
Understanding emotion perception in artwork

Developed multimodal transformer-based architectures with configurable image features for evoked emotion recognition in art images.
- Keywords: multimodal transformer, emotion recognition, art images
- Work published in ICCV CLVL workshop 2021.
2020 - 2023
Computational analysis of gender portrayal in media

Analyzed emerging trends in TV shows and advertisements across the dimensions of age, perceived skin tone, and gender.
- Keywords: representation in media, media understanding
- Work done in collaboration with the Geena Davis Institute on Gender in Media and Google Research.
