Stefano Zuffi

Reasoning about alignment and the shape of AI-related risks, and taking action on the concrete ones already here.

now

What does 'simplicity' mean for neural networks? There are two senses to separate: how complex a property is to represent in an input tensor (a priori), and how complex an internal mechanism must be to detect and exploit it for reward (a posteriori). The project aims to formalize the first with descriptive complexity theory and the second with singular learning theory, and to study how the two interact, with simplicity bias and goal misgeneralization as the motivating phenomena.

Arguments Map

Building a comprehensive map of the arguments around AI safety and AI risk: their assumptions, their sources, and how claims relate to one another. The aim is to make it easy for anyone to locate where they stand and what follows from that position. Currently in Obsidian; will become its own site. This is one of the main activities I decided to pursue at AFFINE 2026.

Field-Building

While I build my own theoretical and practical background in AI safety, the way I can have the most impact is by helping shape the landscape for those whose work can already move the field. Founding team member of Safe AI Netherlands (SAIN). Active in AI Safety Amsterdam (AISA). Working to seed an AI safety community in Milan.

→ all projects

writing

2026-04-01 Inner and Outer Alignment: a survey. An intro to the inner/outer alignment distinction.

→ all posts

trajectory

2019 – 2022 BA in Philosophy, Catholic University of the Sacred Heart, Milan. Thesis on GL modal calculus and its applications in recursion theory.

2022 – 2025 MSc in Logic and Mathematics, University of Amsterdam. Coursework in set theory, model theory, modal logic, abstract algebra, probability. Thesis: Arbitrary Terms with no Arbitrary Objects — semantics of arbitrary reference, proposing a “quasi-referential” account.

2025 Interned as a data analyst and picked up the basics of web development. Useful for learning about digital infrastructure, but the questions that pulled me in were elsewhere.

Late 2025 – present Pivoted to AI safety full-time. Self-taught ML and deep learning fundamentals; completed BlueDot Impact’s Technical AI Safety course; working through ARENA’s mechanistic interpretability curriculum. Active in AISA (Jan 2025 –). Founding team of SAIN (Apr 2026 –). AFFINE Superintelligence Alignment Seminar fellow (May 2026).