AXRP · Civilisational risk and strategy

Alan Chan on Agent Infrastructure

Why this matters

This episode strengthens first-principles understanding of alignment risk and of the strategic conditions that shape safe outcomes, here through the question of how shared infrastructure around AI agents complements aligning the models themselves.

Summary

This conversation examines core safety through the lens of agent infrastructure, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

Mixed · Technical · Medium confidence · Transcript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forward · Mixed · Opportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).

Start → End

Across 22 full-transcript segments: median 0 · mean -3 · spread -17 to 0 (p10–p90 -10 to 0) · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.

Slice bands
22 slices · p10–p90 -10 to 0

Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.

  • Emphasizes alignment
  • Emphasizes safety
  • Full transcript scored in 22 sequential slices (median slice 0); a scoring sketch follows this list.
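The slice figures above come from a lexicon pass over sequential transcript slices. As a minimal sketch of that kind of pipeline (the term lists and the 22-slice split below are stand-ins; the site's actual lexicon and slicer are not published on this page):

```python
from statistics import mean, median

# Stand-in lexicons: the site's real word lists are not shown on this page.
RISK_TERMS = {"risk", "misuse", "worm", "overreach", "attack"}
OPPORTUNITY_TERMS = {"win", "enable", "coordinate", "recover", "scale"}

def score_slice(text: str) -> int:
    """Score one slice on the risk (negative) to opportunity (positive) axis."""
    words = text.lower().split()
    return sum(w in OPPORTUNITY_TERMS for w in words) - sum(w in RISK_TERMS for w in words)

def slice_stats(transcript: str, n_slices: int = 22) -> dict:
    """Split the transcript into sequential slices and summarize their scores,
    mirroring the 'median slice' and mean figures reported above."""
    words = transcript.split()
    size = max(1, len(words) // n_slices)
    scores = [score_slice(" ".join(words[i:i + size]))
              for i in range(0, len(words), size)][:n_slices]
    return {"median": median(scores), "mean": mean(scores), "scores": scores}
```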

Episode deep read

Pilot template v1 · updated Apr 2, 2026

A tight AXRP interview on how society can shape a world where software agents act on our behalf—through shared infrastructure, not only model alignment.

  1. Scene-setting & who is speaking · ~0%–7%

    Workshop context, Alan Chan’s GovAI / Mila background, and why talking to peers at the event actually informs the work.

  2. Why agents change the stakes · ~7%–20%

    Agents do things in the world (email, transactions, computer use). Capability growth raises questions about interventions and a ladder of harms—from nuisance spam toward more serious misuse.

  3. Agent infrastructure in plain terms · ~20%–42%

    Infrastructure as large-scale systems that enable functions—analogous to IP, HTTPS, PKI. Examples: knowing which agent is whose, tracing bad messages, accountability after incidents.

  4. A trichotomy (with traffic as intuition) · ~42%–64%

    Three families: make the environment legible to well-intentioned agents; add friction so unsafe actions are harder; leave traces and identifiers so bad behavior can be handled after the fact.

  5. Policy hooks & reversibility · ~64%–80%

    IDs as a plausible ‘easy win’ tied to transparency norms and EU AI Act–style disclosure; scaling intensity with evidence; tension when infrastructure includes hard-to-reverse switches. (A record-shaped sketch of such an ID follows this list.)

  6. Channels, isolation, and control research · ~80%–93%

    Agent channels / highways to isolate agent traffic; comparison to internal AI-control protocols as something that could become shared, downloadable infrastructure between parties.

  7. Publications & sign-off · ~93%–100%

    Where to find IDs-for-systems and visibility work; forthcoming paper; credits.
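Segments 3 and 5 describe agent IDs whose attached information scales with evidence. As a concrete illustration of that idea (all field names here are hypothetical; the papers discussed do not prescribe a schema), an ID record with evidence-scaled disclosure might look like:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentID:
    """Hypothetical agent ID record; field names are illustrative only."""
    agent_id: str                            # stable identifier for the agent
    provider: str                            # who deployed it
    system_card_url: Optional[str] = None    # richer disclosure, if attached
    incident_history: list = field(default_factory=list)  # traced past incidents
    certifications: list = field(default_factory=list)    # future attestations
    accountable_user: Optional[str] = None   # only under a stronger evidence regime

def disclosure_view(record: AgentID, evidence_level: int) -> dict:
    """Scale disclosure intensity with evidence, per the second heuristic in
    the interview: level 0 reveals only that an agent is acting; higher
    levels add provenance and accountability information."""
    view = {"agent_id": record.agent_id, "is_agent": True}
    if evidence_level >= 1:
        view["provider"] = record.provider
        view["system_card_url"] = record.system_card_url
    if evidence_level >= 2:
        view["incident_history"] = record.incident_history
        view["accountable_user"] = record.accountable_user
    return view
```

The point of the shape is Chan's reversibility test: everything past the bare identifier can be added or dropped as evidence changes, unlike a baked-in kill switch.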

Risk-forward thread

The through-line is institutional: what happens when autonomous systems can act at scale, who is liable, and whether transparency and traceability can be misused (e.g. kill switches baked in by mandate). The discussion stays analytical—threat models, worm-like failures, government overreach—but those frames pull the conversation toward caution and governance design rather than cheerleading for capability.

Opportunity-forward thread

Chan frames infrastructure as complementary to aligning the model itself: shared protocols, IDs, and separated channels can help responsible builders and users coordinate, recover from incidents, and scale assurance—similar to how traffic systems help both cautious drivers and everyone else. Easy wins like disclosure and identifiers are pitched as politically feasible steps that still improve legibility.
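The channel idea is concrete enough to sketch: separate agent and human endpoints, where the agent side can be closed during an incident (say, a propagating text worm) without touching human traffic. Class and method names below are invented for this sketch, not taken from the paper:

```python
class ChannelRouter:
    """Toy router splitting agent traffic from human traffic, so the agent
    channel can be shut during incident response. Illustrative only."""

    def __init__(self) -> None:
        self.agent_channel_open = True

    def route(self, request: dict) -> str:
        # Identification infrastructure (e.g. an agent ID on the request) is
        # what makes this split enforceable in the first place.
        if request.get("agent_id") is None:
            return "served via human API"
        if not self.agent_channel_open:
            raise PermissionError("agent channel closed during incident response")
        return "served via agent API"

    def shut_agent_channel(self) -> None:
        """Incident response: e.g. a worm is propagating between agents."""
        self.agent_channel_open = False
```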

The catalog headline score sits in the balanced band with a slight risk lean: metadata plus transcript-wide slice statistics pick up governance, alignment, and safety vocabulary while most of the middle is neutral technical design talk. Use the strip for where language spikes; use this deep read for the argument structure those spikes belong to.

Stored text is YouTube auto-captions—good for navigation and rough lexicon scoring, but expect garbled words and dropped nuance. Prefer AXRP’s publisher transcript for exact quotations.

Editor note

A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.

ai-safety · axrp · core-safety · technical

Play on sAIfe Hands

Episode transcript

YouTube captions (auto or uploaded) · video 1OQyGH-IlM4 · stored Apr 2, 2026 · 645 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/alan-chan-on-agent-infrastructure.json when you have a listen-based summary.
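If you want to stub that file out, a minimal skeleton might look like the following; the field names are guessed from what this page displays (summary, lens, confidence) and are not a documented schema:

```python
import json
from pathlib import Path

# Hypothetical skeleton for a listen-based assessment; guessed fields only.
assessment = {
    "slug": "alan-chan-on-agent-infrastructure",
    "summary": "",         # one-paragraph listen-based summary goes here
    "lens": "Technical",
    "confidence": "medium",
    "listened_on": None,   # ISO date once someone has actually listened
}

path = Path("content/resources/transcript-assessments/"
            "alan-chan-on-agent-infrastructure.json")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(assessment, indent=2))
```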

Full transcript
Daniel Filan: Hello, everyone. This is one of a series of short interviews that I've been conducting at the Bay Area Alignment Workshop, which is run by FAR AI. Links to what we're discussing are, as usual, in the description. A transcript is, as usual, available at axrp.net, and as usual, if you want to support the podcast, you can do so at patreon.com/axrpodcast. Well, let's continue to the interview. So, hello, Alan Chan.

Alan Chan: Hey.

Daniel Filan: For people who don't know who you are, can you say a little bit about yourself?

Alan Chan: Yeah, so I'm a research fellow at the Centre for the Governance of AI. I'm also a PhD student at Mila in Quebec, but I'm about to defend in the next few months. My background is in math and machine learning, but starting about two (or one and a half) years ago I started doing a lot more AI governance stuff, and joined GovAI, where I've been for the past year now.

Daniel Filan: Okay, cool. At the moment we're at this alignment workshop being run by FAR AI. I'm wondering: how are you finding it?

Alan Chan: It's pretty interesting. I think the main barrier I have to get over internally, in my head, is "oh, I really would just like to chill and not really do much", or "get other sorts of work done". But I'm finding it is actually very helpful to talk to people about the work they're doing, because it actually does inform my own work. So it's way better than I expected.

Daniel Filan: Let's talk a little bit about the work you're doing. I understand that you've been thinking about agent infrastructure. Can you tell us a little bit about what's going on there?

Alan Chan: Yeah. So maybe let's back up and think about agents in general. The story here is something like: we're building systems that can not only tell us what to do in our lives, or give us nice advice; we're also building systems that can actually act in the world. So Claude computer use: it can interact with your computer, maybe do transactions for you, maybe send emails for you. And as AI systems get better and better, the question is: what does this world look like, where we have agents interacting and mediating a bunch of our digital interactions? What sorts of interventions do we need to handle what kinds of risks that could come up? So that's the question with agents in general. A lot of my work in the past year has focused on thinking about what interventions we could try to prepare in advance that would be relatively robust to a range of threat models. And when I say "threat models", I mean: maybe right now we're worried about agents that could spam-call people. That's just really annoying, and it also disproportionately affects people who are older or not as well educated. As agents get more capable, maybe we become more worried about agents carrying out cyberattacks, or being able to act autonomously in a biolab, and stuff like that.

Daniel Filan: And just to help me: when you say "threat model", you mean a bad thing that could happen, rather than a bad capability, or a bad way a training run could go?

Alan Chan: Yeah, yeah. So one question for me, because I have a lot of uncertainty over how fast capabilities will improve and whether things will plateau: one way I've tried to deal with this in the past year is to think about what sorts of things might be palatable to people across a very wide range of beliefs, particularly drawing upon things that have worked in other kinds of domains.

Daniel Filan: And I guess your answer is agent infrastructure.

Alan Chan: Yeah, I think this is one answer.

Daniel Filan: So what is that?

Alan Chan: I guess infrastructure writ large is just any large-scale technical system that enables particular kinds of functions. If we think about the internet: what is internet infrastructure? It's IP protocols, it's HTTPS, it's public-key cryptography that allows you to interact with people in a secure way online. So that's infrastructure. Thinking about agents, what are some examples of agent infrastructure? Some of the stuff I've been working on in the past year includes: okay, your agent is interacting with my agent; it'd be really good if I knew who the agent was and who it belonged to. So maybe you want some sort of ID system as infrastructure. Or, one other example: I interacted with your agent, I sent your agent some sort of message, and it turned out this message was actually a worm or something, and authorities want to be able to trace that worm back to somebody and somehow take it off the internet. So you need systems to be able to do that kind of thing as well.

Daniel Filan: I guess an off switch is the sort of thing for that particular example. A question I have in the back of my mind is: why is this more acceptable, or why does this make sense for a wider range of beliefs, than interventions on the agents themselves? And it sounds like it's because a lot of these things are just tracking what agents are doing: having a better sense of "okay, this agent was at this place doing this thing", and now if a bad thing happened, we can deal with it. Is that roughly the right idea, or is there something else going on?

Alan Chan: So I'm not claiming that it's more robust to sources of risk than interventions on the agents themselves. I guess there are two things here. The first is that I think interventions on agents themselves can also be robust to this type of risk. For example, alignment: I think alignment has become a pretty widely accepted meme nowadays, just because people think, number one, it is very useful to not have AI systems that want to kill everybody, but it is also actually very useful if you can align a system well enough that the developer can prevent somebody from using an AI system to generate spam, or something like this. So I'm not making any claim about the robustness of interventions on the agents themselves. My claim is more that interventions on the agents themselves seem useful and probably necessary, but it seems good if we also thought about additional interventions: a sort of Swiss cheese-type model. Why might interventions on the agent itself not work out? Maybe people just don't adopt them because of competitive-dynamics-type reasons. Maybe they don't actually generalize that well, and we don't understand generalization that well. So it'd be good to have things in the environment that can stop the agent even if the agent itself is behaving poorly.

Daniel Filan: Gotcha. So do I understand the perspective right, that it's a combination of "here's a domain of things we could do that have previously not been thought about", and also, within that domain, "let's think about things that are more robust to different threat models"?

Alan Chan: Yes. Sort of: let's do a whole-of-society approach, or something like this, to handling AI risk. And if you push me, I guess the heuristic I'm operating under is something like: in a lot of other domains in life, we do things on the agent itself in addition to things outside it. If you think about traffic safety, for instance: it is very important to train very safe drivers, but we also recognize that drivers can't act safely in a perfect way all of the time. That's why, in some jurisdictions, we have stuff like roundabouts, because they're actually safer than regular intersections.

Daniel Filan: The traffic analogy is interesting, because I can think of a few types of things that are done in traffic to improve safety. One kind of thing is just making the situation more legible to drivers: making sure the lines on the road are bright, making sure there's not one of these weird kinds of corners where you can't see the cars coming ahead of you. So that's an intervention to make the situation more legible to the relevant agents. There's a second kind of intervention, which is to kind of literally slow down the agents: physical things that mean people can't do their stuff in a dangerous way, but can do stuff in a safe way. And a third kind of infrastructure in the driving realm is basically identification, so that you can catch bad drivers and punish them: license plates being the major one of these, speed cameras being another. So I guess this is a proposed trichotomy of things you could do with agent infrastructure. I'm wondering: what do you think of the trichotomy, and which ones do you think make more or less sense in the AI realm?

Alan Chan: Sorry, just so that I have things clear: the third one was identification-type stuff?

Daniel Filan: Yeah. So: make the relevant domain more legible to the relevant agents; somehow physically prevent the relevant agents from doing bad stuff; and identification, leaving traces, so that if a thing does do a bad thing, you can punish it or deal with the behaviour later.

Alan Chan: I see. It's a very interesting question, and I like this trichotomy. The third thing, I think, works even if your agents are bad, because then other agents can refuse to interact with your bad agent. The first thing, I think, works if you have agents that are in some sense law-abiding, or can respond to positive incentives. What was your traffic example again?

Daniel Filan: The traffic example was literally making sure the lines on the road are, you know, not faded; replacing the dingy old traffic lights.

Alan Chan: I see. So this could be something like: supposing you had agents that were good at following instructions and/or the law, before every economic transaction with another agent you just put in the context "here are the relevant economic laws", or "here are the relevant economic facts, how to behave judiciously", and "here are the legal obligations you have to your user, as a fiduciary", or something.

Daniel Filan: Or maybe it's not even legal. I'm sure there are things you do with roads that help you drive more considerately; it doesn't even have to be legal, just things you can do. Sometimes I behave inappropriately because I have a misread of the situation. I know that happens to me sometimes; maybe sometimes it happens to well-intentioned AI agents.

Alan Chan: Yeah, making various facts more legible.

Daniel Filan: And what could such facts be? Like, how scammy some other counterparty is. If I have my own AI assistant and it's negotiating with some other AI assistant, you could imagine some infrastructure that makes sure my AI assistant knows that the other negotiating AI has gotten tons of bad reviews. Somehow that's making the world more legible to my AI, helping it be a better player and not accidentally send all my money to some scam.

Alan Chan: Yeah. There is an interesting way in which these categories blend, because you could imagine a law or instruction being something like "follow the relevant laws when you engage with the other agent, but if the other agent is doing something shady, please report it to some sort of agency". Maybe this feels a little scummy, talking about it in the abstract, but I could imagine some sort of system like this. What was the second part of the trichotomy again?

Daniel Filan: So there was "make the world more legible to the agents"; there was "physically either make it impossible, or make it difficult, for the agents to behave unsafely". Speed bumps would be one example of this on roads, or things that prevent you from going too fast, or things that ensure you look in the right direction. I'm not sure I have tons of examples of this. I don't drive a car, so my lack of knowledge is showing right now, but speed bumps I'm aware of.

Alan Chan: I haven't driven a car in a very long time either. I guess the direct analogy to speed bumps is rate limits, which we already sort of do to prevent denial-of-service attacks, so maybe that's relevant across more domains. Another analogy is limiting the affordances of an agent. I guess there's some ambiguity about whether it's the developer doing this, or the agent you're interacting with, but there is a difference between "I'm just going to give my agent access to my computer and my terminal, it's going to have sudo access, you know, computer use, let's just let it loose on everything" versus "we're really only going to give your agent access to specific kinds of APIs; we've tested them very thoroughly; the functions are very well defined".

Daniel Filan: So there's a whole host of things you could potentially do in the agent infrastructure space. I'm wondering: are there specific examples of things you're particularly keen on?

Alan Chan: I guess this is the thing I'm pretty confused about: what to prioritize. Some heuristics to think through are: one, what are more or less easy wins, especially things that you might be able to get in through existing efforts or existing legislation? One example of this is IDs for agents. At the most basic level, just some sort of identifier, but you could imagine attaching more information to the identifier, like an agent or system card, or a history of incidents, or some sort of certification that we might develop in the future. The reason I think this is a relatively easy win is, firstly, there is just a general consensus, I think, about transparency around where AI systems are being used. To the extent that exists, it seems very useful to tell a user or another party "you are actually interacting with an agent, and maybe this is the agent identifier", so that you can tell OpenAI, or whoever, that this agent caused the problem, if something went wrong. So there's already the social consensus. The next thing is that I think it's quite plausible you could get this through in legislation. In the AI Act (I forget if it's Article 50 or something) there's a section saying providers of general-purpose AI systems have to make sure that when natural persons interact with those systems, they know they're interacting with general-purpose AI systems. And to me, a read of that is: let's have some sort of identification for these kinds of systems. So it seems like an easy win. I guess the question, though, is: even if it is an easy win, is it actually a useful thing to do? Which gets to my second heuristic: let's try to identify pieces of infrastructure that can be scaled up or down in, I'm going to call it intensity, or severity, according to what evidence we have before us right now. If you don't actually think that agents are going to cause a bunch of problems, if you think it's mostly going to be fine, I think it's totally fair for you to say "maybe I'm fine with having an ID that has some identifier, but I really don't want it to have any user-identifying information; I just want the basic 'somebody knows that an agent is doing something'". But if we did get substantial evidence that these agents are just generalizing all the time, and they're finding ways of getting around the limitations on their affordances and getting a bunch of new tools, then maybe somebody's more likely to say "okay, we really have to, number one, make sure that some user at the end of the day is going to be accountable for spinning up this agent; and number two, we also want to attach information to this agent about the affordances it has available, so that other people can say 'why does this agent have so many tools? Maybe I'm going to be a little suspicious and not want to interact with it'".

Daniel Filan: That makes a lot of sense. Do you have examples of infrastructure that fails, or proposed things you could do that would fail, that second test, where you couldn't easily scale them up or down? Just to help me think about it.

Alan Chan: Yeah, this is a good prompt. I guess it depends on how you individuate infrastructure. One example: suppose you had some sort of ID system, and you built some sort of kill switch, so that some actor was able to just unilaterally kill an agent corresponding to a particular ID. It seems hard to get rid of that if you force OpenAI and others to build it into their agents; it seems liable that the government might misuse it in some sort of way. So maybe what I'm alluding to here is that there are steps you can take, and then there is, in some sense, a point of no return.

Daniel Filan: Yeah, fair enough. So identification for agents: it seems like that's a thing you're something of a fan of, or at least interested in right now. Are there other things?

Alan Chan: Yeah. One thing, we call it "agent channels" or "agent highways" in the paper, is the idea that you probably want to incentivize some isolation of agent traffic. It depends on how you operationalize it: it could be from human traffic, or from traffic from other software systems in general. But the reason is that you might want to be able to shut down agent-specific channels and minimize the impacts of that on people just going about their daily lives. So, for example, if some agent spread a text worm over a channel, you want to be able to just shut that down and not have that text worm infect other people's AI assistants.

Daniel Filan: Gotcha. And by "channel" you mean, like, the medium that the agent is using?

Alan Chan: There are a variety of ways of operationalizing this. The easiest way is: let's just have separate APIs for services. So for Airbnb, there's an agent API versus the regular API, and maybe eventually Airbnb realizes "wow, actually most of our traffic is coming from agents; let's just shut down the regular API", or something like this. That's maybe the easiest way, and there are of course some feasibility problems and questions here. You can even go a level more and ask: should we have separate IP addresses for agents, or even separate internet hardware running agent stuff, so that we can go in and shut down those servers specifically and have the rest of the human internet keep running?

Daniel Filan: It potentially seems like a win-win. Suppose you're really bullish on agents: you might want to say "hey, I don't want humans cluttering up my beautiful agent channel".

Alan Chan: Yeah, totally. I think there is definitely an argument you can make to people who do want to develop and deploy agents that this is also a win for them.

Daniel Filan: One thing this is kind of reminding me of is Redwood Research's work on AI control, just making sure AIs aren't able to pull any funny business, whereas this seems like sort of an externalized version of control. I'm wondering: do you think that's an apt comparison, and what do you think the similarities and differences to that approach are?

Alan Chan: I'm a huge fan. I will say, though, I am not fully read up on everything that they are doing at this particular moment. Do you happen to be?

Daniel Filan: No. Sorry, I haven't been productive enough at this workshop.

Alan Chan: I guess my understanding is something like: the setup is, we have some trusted models, we have this bigger model that's potentially not trusted, and we want to understand what we can do to either constrain what this bigger model can do, or have some sort of assurance about what the big model's behaviour is. And the connection: I could totally imagine that if you come up with some really good protocol for making sure that your smaller models can say something meaningful about the larger models, this thing could just be infrastructure. You scale it up, and everybody just downloads this protocol, or something. So how this could work is: I have my GPT-6 agent, and it's interacting with your GPT-7 agent, and just to make sure that I'm not getting fooled or something like this, I take an AI control protocol and I just use it.

Daniel Filan: Right. And a lot of control is just "okay, we're going to make sure we have the right internal API, so that this model submits commands to this thing, and this model gets to gate execution of those commands in this place". Somehow it seems like a similar flavour. So, you've mentioned this paper. If listeners are interested to read your thoughts about agent infrastructure, where and when can they do that?

Alan Chan: Hopefully this paper will be out soon. It's been in the works for quite a while (I feel like I started writing it in June-ish), and it's gone through a lot of revisions, and is still going through some sort of feedback. Hopefully by the end of the month, by December, you should be able to just go to my website, or go on arXiv, and find it. Otherwise, I do have some agent infrastructure stuff already out, maybe not as cohesive a vision, maybe more high-level stuff.

Daniel Filan: And where's the stuff that's already out?

Alan Chan: You can check out my Google Scholar. "Visibility into AI Agents" was the most recent thing. Or sorry, "IDs for AI Systems": we actually try to sketch out how this ID system could work in that paper. And then "Visibility into AI Agents" is another paper that's a bit more high-level, trying to sketch out what other stuff we might need to help governments and civil society know what's going on with agents.

Daniel Filan: Great. Well, thanks very much for chatting with me today.

Alan Chan: Thank you, it was a pleasure.

Daniel Filan: This episode was edited by Kate Brunotts, and Amber Dawn Ace helped with transcription. The opening and closing themes are by Jack Garrett. Financial support for this episode was provided by the Long-Term Future Fund, along with patrons such as Alexey Malafeev. To read a transcript of the episode, or to learn how to support the podcast yourself, you can visit axrp.net. Finally, if you have any feedback about this podcast, you can email me at feedback@axrp.net.
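The control comparison at the end of the interview has a simple shape worth pinning down: an untrusted model proposes commands, a trusted model gates their execution, and Chan's suggestion is that such a protocol could be packaged as shared, downloadable infrastructure. A minimal sketch of that shape (illustrative only; this is not Redwood Research's actual protocol):

```python
from typing import Callable

def gated_execution(propose: Callable[[], str],
                    trusted_verdict: Callable[[str], bool],
                    execute: Callable[[str], None],
                    max_steps: int = 10) -> None:
    """Untrusted model proposes, trusted model gates. Illustrative only."""
    for _ in range(max_steps):
        command = propose()                 # untrusted model suggests an action
        if not trusted_verdict(command):    # trusted model reviews it
            print(f"blocked: {command!r}")  # flag or escalate instead of running
            continue
        execute(command)                    # only vetted commands run
```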

Related conversations

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -0 · 108 segs

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -5 · 133 segs

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -4 · 72 segs

AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -6 · avg -7 · 120 segs

Counterbalance on this topic

Ranked with the mirror rule in the methodology: picks sit closer to the opposite side of your score on the same axis (lens alignment preferred). Each card plots you and the pick together.

Mirror pick 1

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64 · This pick -10.64 · Δ 0

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -0 · 108 segs

Mirror pick 2

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64 · This pick -10.64 · Δ 0

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -5 · 133 segs

Mirror pick 3

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -10.64 · This pick -10.64 · Δ 0

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -4 · 72 segs