Jimin Yeom
Hanyang University · M.S. in Computer Science
Research Statement · 2026

I work on trustworthy and interpretable AI. My goal is to understand how models learn well enough that we can build the ones we are willing to trust. I am a master's student at Hanyang University, advised by Sungyoon Lee.

The objective I keep coming back to.

The two perturbations, δ in the input and η in the parameters, are usually studied in different rooms — one as adversarial robustness, the other as training dynamics. I think they belong on the same page. Studying them jointly is a useful lens on robustness, generalization, and what a model is really doing when it generalizes.
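One way to put the two on the same page is a nested min–max objective. This is a sketch, not a fixed formulation: the choice of norms, the radii ε and ρ, and whether the two maximizations are nested or handled separately are all open design decisions, and the notation (f, ℓ) is illustrative.

```latex
% Sketch of a joint objective over both perturbations:
%   delta perturbs the input x, eta perturbs the parameters theta.
\min_{\theta} \;
  \mathbb{E}_{(x,y)}
  \left[
    \max_{\|\delta\| \le \varepsilon} \;
    \max_{\|\eta\| \le \rho} \;
    \ell\big( f_{\theta + \eta}(x + \delta),\, y \big)
  \right]
```

Setting ρ = 0 recovers standard adversarial training; setting ε = 0 recovers a sharpness-aware (flat-minima) objective, which is one way to see why the two perturbations belong in the same analysis.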

Concretely, I am drawn to questions like: why does the robustness–accuracy trade-off persist? Why does catastrophic overfitting appear so abruptly? When does in-context learning behave like an implicit optimizer, and when does it not? And what is actually being forgotten in model unlearning?

Research Map
Trustworthy AI
  • Adversarial Robustness
    — Robustness–Accuracy Trade-off
    — Catastrophic Overfitting
  • LLM Jailbreaking
    — Certifiable Defense
Interpretable AI
  • In-Context Learning
  • Linear Transformer
  • Model Unlearning
  • Reasoning
Currently
  • Edge of Stability in Adversarial Training — how the EoS regime interacts with catastrophic overfitting.
  • Reward Hacking — mechanisms behind reward hacking in alignment-tuned models, and training-dynamics-aware ways to mitigate it.

For papers, see publications. For the long form, CV. For anything else, email works.