Courts of Tomorrow: Evidence from a Nationwide Introduction of Generative AI
Professor Sultan Mehmood
Associate Professor of Economics
New Economic School
We present the first large-scale field experiment on integrating generative AI into a national judicial system. In partnership with Pakistan’s judiciary, we developed \textit{JudgeGPT}, a generative AI assistant custom-built for Pakistan’s trial courts, and randomized 1,559 judges into one of three arms: (i) access to \textit{JudgeGPT} with targeted training tailored to the tool; (ii) access to \textit{JudgeGPT} with placebo training in technology and law; and (iii) a control group that received the placebo training but no access to the AI assistant. Treated judges are more likely to use generative AI and to support its broader adoption in the judiciary. Using an LLM-based, lawyer-validated measure of opinion quality, we find that \textit{JudgeGPT} improves writing quality when paired with targeted training but lowers it when provided without such training. Administrative records also point to sizable productivity gains: a one-standard-deviation increase in the share of judges assigned to \textit{JudgeGPT} plus targeted training raises case resolutions by roughly 1,100 cases per district-year. Consistent with this pattern, usage data from \textit{JudgeGPT} suggest that targeted training shifts judges away from open-ended legal search and toward more structured writing tasks, where the tool is likely more useful. We find little evidence that either treatment amplifies gender or ethnic bias in case outcomes or judicial language. Overall, the results suggest that generative AI can raise public-sector productivity, but that these gains may depend on targeted training that directs use toward tasks for which the tool is better suited.
