2024-06-29, 16:20–16:45 (US/Pacific), Barn
As Large Language Models (LLMs) gain trust across various sectors for tasks ranging from generating text to solving complex queries, their influence continues to expand. Yet this trust is shadowed by significant risks, including the subtle but serious threat of data poisoning. This talk will delve into how deceptively crafted data can infiltrate an LLM's training set, leading these models to propagate errors, biases, or outright fabrications and undermining the integrity of their outputs.
While various algorithms and approaches have been designed to mitigate these risks, this session will focus on the Rank-One Model Editing (ROME) algorithm. ROME is notable for its ability to edit an LLM's knowledge in a targeted manner after training, providing a means to recalibrate AI outputs. However, the same mechanism is open to misuse: it can be employed to embed false narratives deep within a model.
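To make that mechanism concrete, here is a minimal sketch of the kind of rank-one weight update ROME builds on, written in PyTorch. The function name, variable names, and the simplified closed form are illustrative assumptions for this abstract, not the actual ROME implementation (which, among other things, estimates a key covariance statistic over many prompts):

```python
# Minimal sketch of a rank-one weight edit in the spirit of ROME.
# Assumptions: W is a single MLP projection matrix, k is the "key" vector
# (the subject's hidden representation), and v is the desired "value"
# (the representation of the new fact). The real ROME algorithm also uses
# a covariance statistic estimated over many keys; this toy version omits it.
import torch

def rank_one_edit(W: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Return a copy of W changed by a rank-one term so that W_new @ k == v."""
    residual = v - W @ k                         # what the layer gets wrong for this key
    update = torch.outer(residual, k) / (k @ k)  # rank-one correction aligned with k
    return W + update

# Toy usage: a random layer, key, and target value.
torch.manual_seed(0)
W = torch.randn(8, 16)
k = torch.randn(16)
v = torch.randn(8)
W_edited = rank_one_edit(W, k, v)
print(torch.allclose(W_edited @ k, v, atol=1e-4))  # True: the edited layer now maps k to v
```

The point is that exactly this kind of surgical update can correct a model's knowledge or quietly plant a falsehood, depending entirely on who chooses k and v.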
Key Discussion Points:
- Why People Trust LLMs: Exploring the reasons behind the widespread trust in LLMs and the associated risks.
- The Art of Data Poisoning: A closer look at how maliciously crafted data is inserted into training sets and its profound impact on model behavior.
- Focus on ROME: Discussing how the Rank-One Model Editing algorithm can both safeguard against and potentially contribute to the corruption of LLMs.
- Ethical Considerations: Reflecting on the ethical implications of manipulating the knowledge within LLMs, which requires not just technical skill but also wisdom and responsibility.
This presentation is designed for data scientists, AI researchers, and Python enthusiasts interested in understanding the vulnerabilities of LLMs and the tools available to protect these systems. While acknowledging other algorithms and methods, this talk will provide a quick demonstration of ROME, offering insights into its utility and dangers.
As people continue to integrate LLMs into everything, we must remain vigilant against the risks of data manipulation. This session challenges us to consider whether we are paying enough attention to these threats, or if we are, metaphorically, just fiddling while Rome burns—allowing foundational trust in data to erode.
Join me in this exploration of ROME, where we navigate the fine balance between correcting and corrupting the digital minds that are—whether we like it or not—becoming an integral part of our technological landscape.
Dr Paris Buttfield-Addison is co-founder of Secret Lab Pty. Ltd., a game development studio, and Yarn Spinner Pty. Ltd., an interactive narrative tools provider, both based in beautiful Hobart, Tasmania, Australia. Secret Lab is best known for the BAFTA- and IGF-winning Night in the Woods and the Qantas Joey Playbox. Yarn Spinner builds the wildly popular open source YarnSpinner narrative game framework. Paris formerly worked as a software engineer and product manager for Meebo, which was acquired by Google. He holds a degree in medieval history and a PhD in Computer Science, and has written more than 30 technical books on machine learning, programming, and game development, mostly for O’Reilly Media. He can be found on Elon's Hell Site as @parisba, on Mastodon at @parisba@cloudisland.nza, and online at http://paris.id.au