Digbalay Bose
Research Scientist at Adobe Research, Bengaluru. I work on multimodal document understanding.
currently
Building multimodal document understanding systems at Adobe Research.
open to
Collaborations on multi-agent systems, multimodal document understanding, and
controllable generation.
01 about
I build agentic pipelines that mine multimodal information from document collections and synthesize it into new media — videos, podcasts, slide decks.
My work sits at the intersection of vision-language modeling, controllable image/video generation, and multi-agent systems. I am interested in building systems that reason over heterogeneous inputs (text, images, audio, video) without requiring massive supervision.
read more on the research
02 by the numbers
19
papers
9
venues
8
first-author
1
us patent
publications per year · 2016–2025
03 selected work
- Interspeech '25
  Can Multimodal Foundation Models Help Analyze Child-Inclusive Autism Diagnostic Videos?
  A. Kommineni, T. Feng, D. Bose, S. Narayanan
- ICMI '24
  Can Text-to-image Models Assist Multi-modal Learning with Visual Modality Missing?
  T. Feng, D. Yang, D. Bose, S. Narayanan
- ACM MM '23
  MM-AU: Towards Multimodal Understanding of Advertisement Videos
  D. Bose, R. Hebbar, T. Feng, K. Somandepalli, A. Xu, S. Narayanan
- WACV '23
  MovieCLIP: Visual Scene Recognition in Movies
  D. Bose, R. Hebbar, K. Somandepalli, et al.
04 recent notes
- 2025-06-13 · award · Recognized as an outstanding reviewer at CVPR 2025.
- 2025-01-27 · new role · Joined Adobe Research, Bengaluru as a Research Scientist.
- 2024-12-18 · thesis · Defended my Ph.D. thesis: Multimodal Perception Guided Computational Media Understanding.