AI Alignment: Then and Now

July 7, 2023

On February 22nd, 2023, I gave an introductory talk on the field of AI Alignment to the AI Journal Club at Northwestern University. Usually, we discuss a single paper each week, but I felt it was necessary to deviate from that pattern due to the limited awareness of the field, even among AI researchers. Dissecting a single alignment paper for an audience unfamiliar with the fundamental problem at hand would have been akin to teaching someone about assembly-level code optimizations when they were just learning to use a computer.

The 45 minutes assigned to me were hardly enough to cover even the basics of the control problem and why attempting to solve it is crucial. Still, I did my best to expose my audience to at least the core thrusts of alignment theory. I was pleasantly surprised at how well the content was received; though some were inevitably skeptical of the purported magnitude of the risk, the ensuing conversations were thought-provoking and constructive, and I thoroughly enjoyed them.

I’ve embedded the presentation slides below for archival purposes and easy access.

Note: several slides have speaker notes.

I gave my presentation less than five months ago, and the climate surrounding alignment has changed tremendously in that short time. Back then, the still-fresh hype around generative AI meant that anyone calling for regulation, or even restraint, was mocked or dismissed as a Luddite. This was hardly new for those of us who have long believed that superintelligent AI might pose an existential risk to humanity. What was new, however, was the volume of the pushback, which came from all corners of the internet, not just from skeptical AI researchers.

Fast-forward to July 5th, 2023, and OpenAI has made four separate announcements regarding AI Safety and Alignment, the first coming out just two days after my talk. In the latest one, they’ve put their money where their mouth is, announcing a new “Superalignment” team whose goal is to “solve the core technical challenges of superintelligence alignment in four years.”

How well they fare, only time will tell. But this is the first time in a while that I’ve felt hopeful about our chances in this uphill battle.

Obligatory May 2024 Edit: Never mind; putting all my eggs in the CAIS and Anthropic baskets for now.