Preference Optimization Beyond DPO: From Noisy Feedback to Conflicting Objectives

SPEAKER

Professor Xi Chen
Professor of Technology, Operations, and Statistics
Stern School of Business
New York University

ABSTRACT

Preference optimization has emerged as a leading paradigm for aligning large language models (LLMs), offering a scalable alternative to reinforcement learning from human feedback. Yet two key challenges remain: preference data are often noisy, and real-world alignment objectives frequently conflict, requiring trade-offs among attributes such as helpfulness, harmlessness, conciseness, and quality.

In this talk, I present two recent advances that address these challenges through new optimization perspectives.

First, I introduce ComPO (Comparison-Oracle Preference Optimization), published at NeurIPS 2025. Existing DPO-style methods largely ignore ambiguous or noisy preference pairs. ComPO instead leverages such pairs through a comparison-oracle framework and zeroth-order stochastic optimization, extracting additional learning signals that improve alignment performance without relying on explicit reward models.
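To make the comparison-oracle idea concrete, here is a minimal sketch of generic comparison-based zeroth-order optimization on a toy objective. It assumes the optimizer never sees loss values. It only receives a binary answer to "is point y preferred over point x?", and it estimates a descent direction by summing random unit probes weighted by those answers. The function names (`comparison_oracle`, `estimate_direction`, `comparison_descent`), the toy quadratic, and all parameter values are hypothetical illustrations. This is not ComPO's algorithm, which applies the comparison-oracle view to LLM preference pairs.

```python
import numpy as np

# Generic comparison-oracle zeroth-order sketch (hypothetical names); not ComPO.


def comparison_oracle(f, x, y):
    """Return +1.0 if y is preferred over x (lower loss f), else -1.0.

    The optimizer only ever sees this sign, never the value of f.
    """
    return 1.0 if f(y) < f(x) else -1.0


def estimate_direction(f, x, rng, num_dirs=20, mu=1e-3):
    """Estimate a unit descent direction from pairwise comparisons only."""
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)  # random unit probe direction
        # Add +u if the nudged point x + mu*u is preferred, -u otherwise.
        g += comparison_oracle(f, x, x + mu * u) * u
    norm = np.linalg.norm(g)
    # Comparisons carry no magnitude information, so keep only the direction.
    return g / norm if norm > 0 else g


def comparison_descent(f, x0, steps=200, lr=0.05, seed=0):
    """Fixed-step descent driven purely by comparison-oracle feedback."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + lr * estimate_direction(f, x, rng)
    return x
```

For example, with `f = lambda x: float(np.sum((x - np.ones(5)) ** 2))`, calling `comparison_descent(f, np.zeros(5))` moves the iterate toward the all-ones minimizer even though only pairwise preferences are ever queried. The fixed, normalized step is a deliberately simple choice for illustration.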

Second, I present RACO (Reward-free Alignment for Conflicting Objectives), an ICML 2026 Spotlight paper. RACO tackles multi-objective alignment by directly optimizing multiple preference objectives simultaneously. Using a clipped conflict-aware gradient method, it achieves stable convergence to Pareto-optimal trade-offs and consistently improves performance on benchmarks involving competing objectives.
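One standard way to handle two conflicting gradients is the min-norm step from the multiple-gradient descent algorithm (MGDA). It picks the convex combination of the two gradients with the smallest norm, clipping the mixing weight to [0, 1]. Whenever that direction is nonzero, stepping against it decreases both objectives to first order, and iterating it drives the parameters toward a Pareto-stationary point. The sketch below is a generic illustration under these assumptions, with hypothetical names (`min_norm_direction`, `pareto_descent`) and toy quadratic objectives. It is not RACO's clipped conflict-aware update, whose details are not given here.

```python
import numpy as np

# Generic two-objective min-norm (MGDA-style) step with a clipped weight.
# Hypothetical helper names; this is not RACO's update rule.


def min_norm_direction(g1, g2):
    """Return (alpha, d) where d = alpha*g1 + (1-alpha)*g2 has minimal norm.

    alpha is clipped to [0, 1] so d stays in the convex hull of g1 and g2.
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:  # identical gradients: no conflict to resolve
        return 0.5, g1.copy()
    alpha = float((g2 - g1) @ g2) / denom  # unconstrained minimizer
    alpha = min(max(alpha, 0.0), 1.0)      # clip to a valid convex weight
    return alpha, alpha * g1 + (1.0 - alpha) * g2


def pareto_descent(grads, x0, steps=500, lr=0.1, tol=1e-8):
    """Step along -d until no common descent direction remains."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g1, g2 = grads(x)
        _, d = min_norm_direction(g1, g2)
        if np.linalg.norm(d) < tol:  # Pareto-stationary point reached
            break
        x = x - lr * d
    return x
```

With f1(x) = ||x - a||^2 and f2(x) = ||x - b||^2, the Pareto set is the segment between a and b. Starting from (1.5, 1.0) with a = (1, 0) and b = (0, 1), the iterates settle at (0.75, 0.25), the projection of the start point onto that segment.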

Together, these works highlight a broader shift from reward modeling toward direct preference optimization, showing how modern optimization techniques can make alignment more robust to noisy feedback and more effective in balancing competing objectives.

BIOGRAPHY

Xi Chen is the Andre Meyer Full Professor at NYU Stern School of Business and an affiliated faculty member of the Courant Institute of Mathematical Sciences and NYU’s Center for Data Science. He earned his Ph.D. in Computer Science from Carnegie Mellon University and completed a postdoctoral fellowship at UC Berkeley, advised by Michael I. Jordan.

Xi has held senior industry roles, including a full-time science leadership position at Amazon Ads (2021–2023), where he led forecasting, pricing, recommendation, and delivery systems for a multi-billion-dollar video advertising marketplace.

He is a Fellow of the IMS (Institute of Mathematical Statistics) and the ASA (American Statistical Association), has published more than 120 papers across AI, statistics, and operations research, and has been named to the Forbes 30 Under 30 (Science) and Poets & Quants 40 Under 40 lists.
