😆 SMILE-Next:
Teaching Large Language Models to Detect, Classify, and Reason about Laughter

Accepted at ACL 2026 (Main, Oral Presentation)
¹KAIST, ²POSTECH


SMILE-Next defines three laughter-related tasks:
Laughter Detection, Laughter Classification, and Laughter Reasoning.

Abstract

Laughter is a complex social signal that conveys communicative intent beyond amusement. While prior work has focused on isolated laughter analysis tasks, a comprehensive understanding of laughter in real-world scenarios remains underexplored. We introduce SMILE-Next, a dataset for real-world laughter understanding with multimodal textual representations and question–answer annotations across three tasks: laughter detection, laughter type classification, and laughter reasoning. Building on this dataset, we propose a laughter expert LLM that leverages disentangled multimodal textual cues, together with a Mixture-of-Laugh-Experts framework and laughter-specific self-instruction for task-adaptive specialization. Experimental results show that the combination of our proposed components substantially outperforms multimodal LLM baselines, advancing robust real-world laughter understanding.

SMILE-Next: Real-World Laughter Dataset

Dataset Statistics

Figure 1. Statistics of the SMILE-Next dataset across three laughter understanding tasks: detection, classification, and reasoning.

Laughter-tailored Self-Instruct

Figure 2. Task frequency distribution of the SMILE-Next dataset across three laughter understanding tasks.

Examples of Generated Self-Instruction Instances

Evaluating task: Rate each laugh in the scene based on intensity and context, and determine whether it was genuine or forced.
Input: During a tense board meeting at her office, Sarah tries to lighten the mood with a faint chuckle after the boss makes a dry joke. Her co-workers don't seem to respond much.
Answer: Forced, low intensity

Correlation task: Derive the relationship between the acoustic features and the intensity or type of laugh.
Input: Acoustic feature: irregular pace, variation in pitch (Laughter)
Answer: This could suggest a nervous laughter or possibly a fake laugh.

Table 1. Examples of laughter-specific self-instruction instances generated for the SMILE-Next dataset.

Mixture-of-Laugh-Experts

Figure 3. Architecture of the Mixture-of-Laugh-Experts (MoLE) framework for laughter-adaptive specialization.
Figure 4. Router activation weights across laughter understanding tasks, showing task-adaptive expert selection.
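
This page does not spell out how the MoLE router is implemented. As a rough illustration of what "router activation weights" usually mean in a softmax-gated mixture of experts, the sketch below computes one weight per expert and blends the expert outputs accordingly. It is a generic toy version, not the authors' implementation: the names `router_weights`, `mixture_output`, `gate`, and `experts` are all hypothetical, and the dense softmax gate is an assumption.

```python
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def router_weights(x: Sequence[float],
                   gate: Sequence[Sequence[float]]) -> List[float]:
    """Return one activation weight per expert (hypothetical linear gate).

    `gate` holds one row of weights per expert; each expert's score is the
    dot product of its row with the input features `x`.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate]
    return softmax(scores)


def mixture_output(x: Sequence[float],
                   gate: Sequence[Sequence[float]],
                   experts: Sequence[Callable[[Sequence[float]], List[float]]]
                   ) -> List[float]:
    """Blend expert outputs using the router weights (assumed dense mixing)."""
    weights = router_weights(x, gate)
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs))
            for d in range(dim)]


if __name__ == "__main__":
    # Toy setup: two experts, two input features.
    features = [1.0, 0.0]
    gate_rows = [[5.0, 0.0], [0.0, 0.0]]  # expert 0 is favored for this input
    toy_experts = [lambda v: [2 * a for a in v], lambda v: [-a for a in v]]
    print(router_weights(features, gate_rows))
    print(mixture_output(features, gate_rows, toy_experts))
```

Under this reading, a plot like Figure 4 would show the averaged weights returned by something like `router_weights` for inputs from each task.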

Results

Quantitative Results

Table 2. Quantitative comparison of SMILE-Next against multimodal LLM baselines across laughter detection, classification, and reasoning tasks.

Qualitative Results

Figure 5. Qualitative examples of SMILE-Next predictions across laughter detection, classification, and reasoning tasks.