Digbalay Bose

currently

Building multimodal document understanding systems at Adobe Research.

open to

Collaborations on multi-agent systems, multimodal document understanding, and controllable generation.

contact

Send email

01 about

I build agentic pipelines that mine multimodal information from document collections and synthesize it into new media such as videos, podcasts, and slide decks.

My work sits at the intersection of vision-language modeling, controllable image and video generation, and multi-agent systems. I am interested in building systems that reason over heterogeneous inputs (text, images, audio, and video) without requiring massive supervision.

02 by the numbers

papers

venues

first-author

US patent

publications per year · 2016–2025

03 selected work (19 in total)

Interspeech '25

Can Multimodal Foundation Models Help Analyze Child-Inclusive Autism Diagnostic Videos?

A. Kommineni, T. Feng, D. Bose, S. Narayanan

ICMI '24

Can Text-to-image Models Assist Multi-modal Learning with Visual Modality Missing?

T. Feng, D. Yang, D. Bose, S. Narayanan

ACM MM '23

MM-AU: Towards Multimodal Understanding of Advertisement Videos

D. Bose, R. Hebbar, T. Feng, K. Somandepalli, A. Xu, S. Narayanan

WACV '23

MovieCLIP: Visual Scene Recognition in Movies

D. Bose, R. Hebbar, K. Somandepalli, et al.

04 recent notes

2025-06-13

award

Recognized as an outstanding reviewer at CVPR 2025.

2025-01-27

new role

Joined Adobe Research, Bengaluru, as a Research Scientist.

2024-12-18

thesis

Defended my Ph.D. thesis, "Multimodal Perception Guided Computational Media Understanding."
