Jimin Yeom

Research Statement · 2026

I work on trustworthy and interpretable AI. My goal is to understand how models learn, and to understand it well enough that we can build models we are willing to trust. I am a master's student at Hanyang University, advised by Sungyoon Lee.

The objective I keep coming back to.

The two perturbations, δ in the input and η in the parameters, are usually studied in different rooms: one as adversarial robustness, the other as training dynamics. I think they belong on the same page. Studying them jointly is a useful lens on robustness, on generalization, and on what a model is actually doing when it generalizes.

Concretely, I am drawn to questions like: why does the robustness–accuracy trade-off persist? Why does catastrophic overfitting appear so abruptly? When does in-context learning behave like an implicit optimizer, and when does it not? And what is actually being forgotten in model unlearning?

Research Map

Trustworthy AI

Adversarial Robustness
- Robustness–Accuracy Trade-off
- Catastrophic Overfitting
LLM Jailbreaking
- Certifiable Defense

Interpretable AI

In-Context Learning
Linear Transformer
Model Unlearning
Reasoning

Currently

Edge of Stability in Adversarial Training — how the EoS regime interacts with catastrophic overfitting.
Reward Hacking — mechanisms behind reward hacking in alignment-tuned models, and training-dynamics-aware ways to mitigate it.

For papers, see publications. For the long form, see my CV. For anything else, email works.