My research focuses on building intelligent, human‑centered systems at the intersection
of computer vision, natural language processing, and human–computer interaction.
I develop multimodal AI models that integrate visual, linguistic, and motion-based signals
to enable natural, interpretable, and accessible interaction between humans and machines.
A central theme of my work is multimodal perception and interaction,
spanning sign language technologies, AI‑driven human–computer interfaces, human activity
understanding, and generative models for expressive human representation.
I place particular emphasis on real‑time systems, robustness in unconstrained environments,
and interpretability for safety‑critical and accessibility-oriented applications.
Key Contributions
- CVPR 2026 (Main): Real-time vision-based fingertip contact detection for AI-driven human–computer interaction
- ACL 2026 (Findings): Personalized emotion visualization and interpretable multimodal NLP systems
- High-impact survey contributions in talking face generation and human-centered generative AI
- Multimodal learning frameworks for sign language understanding and accessibility
Featured Projects
- Vision-based Fingertip Contact Detection for AR/VR Interfaces
  A real-time RGB-based system that fuses monocular depth estimation and motion cues
  to achieve millimeter-level fingertip contact detection without dedicated depth sensors.
  (CVPR 2026, Main Conference)
- Sentimentogram: Personalized Emotion Visualization Framework
  A human-centered multimodal NLP system that learns individual emotion-visualization
  preferences via interpretable fusion and minimal user feedback.
  (ACL 2026, Findings)
AI-driven Human–Computer Interaction
Developing natural, real-time, vision-based interaction systems for AR/VR
and accessibility-focused human–computer interfaces.
Hand & Finger Tracking
Virtual Keyboards
AR/VR/XR
Real-time Vision
Accessibility
Featured Project
- Vision-based Fingertip Contact Detection for AR/VR Interfaces
  A real-time RGB-only system that fuses monocular depth estimation and motion cues
  to achieve millimeter-level fingertip contact detection without dedicated depth sensors.
  Validated in interactive VR keyboard scenarios.
  (CVPR 2026, Main Conference)
Generative AI for Human Interaction
Human-centered generative and multimodal AI systems for expressive communication,
interactive visualization, and accessibility-oriented applications.
Multimodal NLP
Affective Computing
Interpretability
Personalization
Human-centered AI
Featured Project
- Sentimentogram: Personalized Emotion Visualization Framework
  A human-centered multimodal NLP framework that learns individual emotion-visualization
  preferences from minimal user feedback, supported by interpretable audio–text fusion
  and controlled human studies.
  (ACL 2026, Findings)
Multimodal Sign Language Technologies
Developing systems for automatic sign language recognition and translation. This research combines multiple input modalities, including RGB video, skeletal keypoints, and depth information, to achieve robust sign language understanding.
Real-time Translation
Cross-linguistic Corpora
Wearable Sensors
Keypoint Vectorization
Vision Transformers
Multimodal Fusion
Key Contributions
- PhD Thesis: "Advancing Sign Language Recognition: A Multimodal Deep Learning Framework with Keypoint Vectorization"
- Deep learning pathways for automatic sign language processing (Pattern Recognition, IF: 9.84)
- Interpretable sign language recognition systems
Advanced Human Activity Recognition
Researching approaches to recognizing and classifying human activities across several sensing modalities, with a particular focus on Doppler radar-based recognition for privacy-preserving applications and skeleton-based pose estimation for sports analytics.
Doppler Radar Analysis
Privacy-preserving Surveillance
Healthcare Monitoring
3D CNN
Skeleton-based Recognition
Sports Analytics
Key Contributions
- DDC3N: Doppler-driven convolutional 3D network for human action recognition (IEEE Access)
- Privacy-preserving human identification in CCTV data
- Athletes' action recognition through skeleton-based pose estimation
Generative AI for Human Interaction
Exploring generative models for realistic human face and body synthesis. Research includes talking face generation, speech-to-face synthesis, and 3D facial animation for virtual avatars and accessibility applications.
3D Facial Animation
Speech-to-Face Synthesis
GANs
Diffusion Models
Talking Head Generation
Virtual Avatars
Key Contributions
- Talking human face generation: a survey (Expert Systems with Applications, IF: 9.29)
- Generative adversarial networks and their application to 3D face generation: a survey (Image and Vision Computing)
- Research on diffusion models and LLMs for sign language synthesis
AI-driven Human–Computer Interaction
Developing natural and intuitive interfaces for human–computer interaction. Research includes hand and finger detection for virtual input devices, AR/VR interfaces, and accessibility tools.
Hand Detection
Finger Tracking
Virtual Keyboards
AR/VR/XR
Smart Glasses
Gesture Recognition
Key Contributions
- Human pose, hand, and mesh estimation using deep learning: a survey (Journal of Supercomputing, IF: 3.96)
- VR keyboard project at KAIST SpaceTop Research Center (ITRC)
- Research on natural interaction paradigms for accessibility
Medical AI & Smart Healthcare
Applying AI and deep learning techniques to medical imaging and healthcare applications. Focus on improving diagnostic accuracy and enabling smart healthcare monitoring systems.
Medical Image Processing
Smart Healthcare
Diagnostic AI
Health Monitoring