Jimin Yeom
Hanyang University · M.S. in Computer Science
Research Statement · 2026

I work on trustworthy and interpretable AI. My research goal is to understand how models learn (a.k.a. training dynamics) well enough that we can build models we are willing to trust. I am a master's student at Hanyang University, advised by Sungyoon Lee.

The objective I keep coming back to.

The two perturbations, δ in the input space and η in the parameter space, are usually studied in different rooms: one as adversarial robustness, the other as training dynamics. I think they belong on the same page. Studying them jointly is a useful lens on robustness, generalization, and what a model is really doing when it generalizes.
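
As a rough sketch of what putting both perturbations on the same page could look like, the loss can be written with δ and η applied at once. The symbols f, θ, ℓ, ε, and ρ are illustrative assumptions here, not fixed definitions:

```latex
% Illustrative notation only: f_\theta is the model, \ell the per-example loss,
% \epsilon and \rho are assumed budgets for the two perturbations.
\mathcal{L}(\theta, \eta, \delta)
  = \ell\big(f_{\theta + \eta}(x + \delta),\, y\big),
\qquad \|\delta\| \le \epsilon, \quad \|\eta\| \le \rho
```

Setting η = 0 leaves only the input-space perturbation familiar from adversarial robustness. Setting δ = 0 leaves only a parameter-space perturbation, for example a single update step.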

Concretely, I am drawn to questions like: Why does the robustness–accuracy trade-off persist? Why does catastrophic overfitting appear so abruptly? When does in-context learning behave like an implicit optimizer, and when does it not? And which components of training dynamics cause reward hacking?

Research Map
Trustworthy AI
  • Adversarial Robustness
    • Robustness–Accuracy Trade-off
    • Catastrophic Overfitting
  • LLM Jailbreaking
    • Certifiable Defense
    • Continuous Defense
Interpretable AI
  • In-Context Learning
    • Linear Transformer
  • Model Unlearning
  • Reasoning
  • Reward Hacking
Currently
  • Edge of Stability in Adversarial Training — how the EoS regime interacts with catastrophic overfitting.
  • Reward Hacking — mechanisms behind reward hacking in alignment-tuned models, and training-dynamics-aware ways to mitigate it.

For papers, see publications. For the long form, CV. For anything else, email works.