Automated Model Discovery via Multi-modal & Multi-step Pipeline
Lee Jung-Mok, Nam Hyeon-Woo, Moon Ye-Bin, Junhyun Nam, Tae-Hyun Oh
NeurIPS 2025, IPIU 2025 (Best Poster Award!!)
[project page]
[arxiv]
[code]
We present a multi-modal & multi-step pipeline for effective automated model discovery, driven by multimodal LLM agents. We frame the interpretation of time-series data as Gaussian Process kernel discovery.
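For a rough sense of the underlying search, here is a minimal compositional kernel-search sketch built on scikit-learn's Gaussian Process tools; the base kernels, greedy one-step expansion, and log-marginal-likelihood scoring are illustrative assumptions, not our pipeline's actual agent-guided implementation.

```python
# Minimal compositional GP kernel search (illustrative sketch, not the paper's pipeline).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, RationalQuadratic, WhiteKernel

# Toy time series: periodic signal plus a slow trend and noise.
X = np.linspace(0, 10, 200)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * X[:, 0] + 0.05 * np.random.randn(200)

base_kernels = [RBF(), ExpSineSquared(), RationalQuadratic()]

def score(kernel):
    """Fit a GP and return the log marginal likelihood as a model-selection score."""
    gp = GaussianProcessRegressor(kernel=kernel + WhiteKernel(), normalize_y=True)
    gp.fit(X, y)
    return gp.log_marginal_likelihood(gp.kernel_.theta)

# Pick the best single base kernel, then try one greedy sum/product expansion,
# standing in for the multi-step, agent-guided search described above.
best_kernel, best_score = None, -np.inf
for k in base_kernels:
    s = score(k)
    if s > best_score:
        best_kernel, best_score = k, s

current = best_kernel
for k in base_kernels:
    for combined in (current + k, current * k):
        s = score(combined)
        if s > best_score:
            best_kernel, best_score = combined, s

print("Selected kernel:", best_kernel, "| log marginal likelihood:", best_score)
```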
Training-free Multimodal Embedding for Structure-Aware Retrieval of Scalable Vector Graphics and Images
Kyeongseon Kim, Baek Seong-Eun, Lee Jung-Mok, Tae-Hyun Oh
WACV 2026 (Round 1 Acceptance!!)
[project page]
[arxiv]
[code]
We propose the first training-free multimodal embedding method that uses a Multimodal Large Language Model (MLLM) to project text, images, and SVG code into an aligned space.
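Once every modality is embedded, retrieval reduces to nearest-neighbour search in the shared space. Below is a minimal cosine-similarity sketch in which a small text LM's mean-pooled hidden states stand in for the MLLM-based embedding; the model name, pooling choice, and toy SVG corpus are assumptions for illustration only.

```python
# Retrieval in a shared embedding space (illustrative stand-in, not the paper's method).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # placeholder small LM; the paper uses a multimodal LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden states into a single embedding vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# Rank SVG documents against a text query by cosine similarity.
query = "a red circle inside a blue square"
svg_corpus = [
    '<svg><rect width="80" height="80" fill="blue"/><circle cx="40" cy="40" r="20" fill="red"/></svg>',
    '<svg><line x1="0" y1="0" x2="100" y2="100" stroke="green"/></svg>',
]
q = embed(query)
scores = [torch.cosine_similarity(q, embed(svg), dim=0).item() for svg in svg_corpus]
print(sorted(zip(scores, svg_corpus), reverse=True)[0])
```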
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin*, Oh Hyun-Bin*, Lee Jung-Mok, Arda Senocak, Joon Son Chung, Tae-Hyun Oh
ICLR 2025
[project page]
[arxiv]
[code]
We introduce a comprehensive audio-visual hallucination benchmark specifically designed to evaluate the perception and comprehension capabilities of audio-visual LLMs.
Ongoing Projects
These include coursework, side projects, and unpublished research.
Efficient Hyper-Parameter Search for LoRA via Language-aided Bayesian Optimization
Baek Seong-Eun, Lee Jung-Mok, Kim Sung-Bin, Tae-Hyun Oh
ICLR submission
2025-09-20
We present an approach that integrates LLMs with Bayesian Optimization to efficiently tune LoRA hyperparameters. This method enables adaptive search over the hyperparameter space, improving fine-tuning performance and computational efficiency.
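A minimal sketch of the Bayesian Optimization loop over LoRA hyperparameters using scikit-optimize is shown below; `finetune_and_evaluate` is a hypothetical placeholder (replaced by a synthetic surrogate so the sketch runs), and the language-aided prior is only indicated in comments.

```python
# Bayesian Optimization over LoRA hyperparameters (illustrative sketch).
import math
from skopt import gp_minimize
from skopt.space import Integer, Real

# In the language-aided variant, an LLM would propose or narrow these ranges
# (and warm-start points) from a textual description of the task and model.
search_space = [
    Integer(4, 64, name="lora_rank"),
    Real(8.0, 64.0, name="lora_alpha"),
    Real(1e-5, 1e-3, prior="log-uniform", name="learning_rate"),
]

def finetune_and_evaluate(rank, alpha, lr):
    """Hypothetical placeholder for a real LoRA fine-tuning + validation run.
    A smooth synthetic surrogate keeps this sketch runnable end to end."""
    return (math.log(rank) - math.log(16)) ** 2 + (alpha / rank - 2.0) ** 2 + (math.log10(lr) + 4) ** 2

def objective(params):
    lora_rank, lora_alpha, learning_rate = params
    return finetune_and_evaluate(lora_rank, lora_alpha, learning_rate)

result = gp_minimize(objective, search_space, n_calls=20, random_state=0)
print("Best hyperparameters:", result.x, "| best validation loss:", result.fun)
```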
Does Visual Arrow Improve Motion Comprehension in Multimodal Large Language Models?
Baek Seong-Eun, Nam Hyeon-Woo, Lee Jung-Mok, Tae-Hyun Oh
ongoing
2025-03-20
In ViMP, we hand-draw a visual arrow on a sampled frame of a video. ViMP conveys motion information through visual context, reducing the need for multiple frames.
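For context, overlaying a motion arrow on a frame is straightforward with OpenCV; the frame and coordinates below are synthetic examples, and in ViMP the arrows are drawn by hand rather than generated.

```python
# Overlay a motion arrow on a single video frame (illustrative only).
import cv2
import numpy as np

frame = np.zeros((360, 640, 3), dtype=np.uint8)  # stand-in for a sampled video frame
start, end = (100, 180), (400, 120)              # arrow from the object's old to new position

cv2.arrowedLine(frame, start, end, color=(0, 0, 255), thickness=4, tipLength=0.1)
cv2.imwrite("frame_with_arrow.png", frame)
```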
SMILE-NEXT: An Enhanced Multimodal Dataset and Model for Laughter Understanding
Lee Jung-Mok, Kim Sung-Bin, Lee Hyun, Tae-Hyun Oh
ongoing
2024-10-15
[project page]
[code]
We introduce SMILE-NEXT, a comprehensive corpus combining audio, visual, and textual cues for laughter understanding across diverse contexts. We also propose the Laugh Expert MoE, a lightweight yet expressive architecture that efficiently models laughter perception.
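A minimal mixture-of-experts layer in PyTorch, illustrating the gating-plus-experts pattern behind the Laugh Expert MoE; the dimensions, expert count, and soft softmax routing are assumptions for this sketch, not the actual architecture.

```python
# Minimal mixture-of-experts layer (illustrative; not the actual Laugh Expert MoE).
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim: int = 256, num_experts: int = 4, hidden: int = 512):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # router scoring each expert per input
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). Soft routing: weight every expert by the gate's softmax.
        weights = torch.softmax(self.gate(x), dim=-1)                  # (batch, num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, num_experts, dim)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)         # (batch, dim)

# Fused audio-visual-text features would be fed through such a layer.
features = torch.randn(8, 256)
print(MoELayer()(features).shape)  # torch.Size([8, 256])
```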