How AI Can Help Humanity Reason Better (with Oly Sourbut)
Why this matters
This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.
Summary
This conversation examines core safety questions through How AI Can Help Humanity Reason Better (with Oly Sourbut), surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 92 full-transcript segments: median 0 · mean 0 · spread −11 to 9 (p10–p90 0–0) · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Society lens. Evidence mode: interview. Confidence: high.
- Emphasizes alignment
- Emphasizes safety
- Full transcript scored in 92 sequential slices (median slice 0).
Editor note
A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.
Play on sAIfe Hands
Episode transcript
YouTube captions (auto or uploaded) · video BTe7kczm2oc · stored Apr 2, 2026 · 2,356 caption segments
Captions are an imperfect primary source: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/how-ai-can-help-humanity-reason-better-with-oly-sourbut.json when you have a listen-based summary.
In the medium to long term, automation will just win out eventually in most cases, because it's always going to be more efficient to have the tireless, doesn't-get-sick AI systems that can just kind of run 24/7. But that of course comes with all kinds of costs. There are some really important decisions to be made by individuals, by groups, by societies about where we actually go next and how it is that we're directing that. If we can in the meantime enable better decisions to be made by individuals, by societies, then hopefully even if it's the case that everything eventually gets handed off to AI, we'll be in a better position to trust that and to know that it's going to be trustworthy, and as a society kind of endorse that and move in that direction in a way that we think is wise. If your system has systematic blind spots, then you might expect that it could perhaps surreptitiously or even inadvertently surface a biased summary of the situation that could lead you to systematically biased decisions. Far too often things are just sort of chaotic and confusing. People fail to coordinate. People fail to understand the consequences of their actions. They even fail to understand what their options even are in the first place. And each of these problems we think is remediable to some extent with support from tools, many of which would incorporate AI.

>> Oly, welcome to the Future of Life Institute podcast.

>> Hi guys. Nice to be here.

>> Great. All right. Do you want to introduce yourself?

>> Sure. Yeah, I'm Oly. I work at the Future of Life Foundation. One way to think about FLF is that it's kind of like a little spinout from FLI. We take a slightly different strategy. We're looking at being a kind of accelerator, I think an accelerator is the right term, for projects that might be neglected in making the future go well, and especially we've got a big focus on AI right now, as everyone has. This is a hot topic right now.

>> Mhm. So we can categorize these tools into, say, three categories: epistemics, coordination, and risk-targeted applications.

>> Yeah. So one area that we're really interested in is what we've called AI for human reasoning, and this is one large focus for FLF right now. It's not the only focus, but it's an important one. We have some other back-burner priorities that we're trying to work on as well. But when we say human reasoning, what do we mean by human? Well, both individuals, but also groups, and all the way up to large societies and even humanity as a whole. And then by reasoning, we're referring to kind of the whole decision-making cycle. So from making observations, coming to understandings, modeling the world, and with groups, we're talking about communicating, and then through to making decisions and even acting together and coordinating and this kind of thing. So reasoning is supposed to encompass this whole thing. And part of why we think this is an important package of things to consider together is that there are really important synergies. So when you understand things better, you can come to better decisions. When you can come to better decisions and understand each other better, you can coordinate better, and so on. There's all these interesting synergies.
So the other reason we think it's important right now is of course the world is only getting more complicated, and enabling individuals and groups and societies to reason better about the options we have in front of us for our near and long-term future is going to be really important to make sure that that goes well. Very often things just kind of meander around. It feels like things are happening by accident, or it feels like things aren't really being chosen in what we might think of as a wise way. And often things go in directions that really not anyone wants, and on its face that's paradoxical, and we think it's because as a society we need to elevate this ability to do reasoning.

>> Mhm. What's a good example of a project that relates to AI for human reasoning?

>> Yeah. So I guess I can give a couple of examples and then talk about how we've mapped the space a little bit more generally; we've got some initiatives that we're trying to get started and some that we're already supporting. So, a great example of something already existing here: have you heard of Community Notes on Twitter? I can give a rough explanation, but you can think of things like fact-checking and adding context. Historically, they've been performed in this kind of centralized or roughly centralized fashion. You've got these media broadcasters. They're relatively few. They're relatively long-lived, and they therefore have this kind of reputation and identity. They're sort of trackable by states and so on, and there are some rules they have to follow. Maybe they publish retractions. Maybe there are certain kinds of standards they have to follow with regards to their evidence and so on. So this kind of centralized fact-checking happened, and then of course social media blew that open. Now everyone is a publisher, and so we're in this slightly different world, and for a while people tried to apply the same kind of centralized fact-checking mindset, and this wasn't really scalable in practice. Now Community Notes is one of several innovations in that space. The first principle is: can we crowdsource fact-checking? And it's not just fact-checking, it's adding useful context, things that people might find useful in the context of some kind of social media post. Now, how does Community Notes achieve its kind of trust? Because of course this fact-checking is quite a powerful and important duty in a way, or like, providing context is really important, and people need to be able to trust that. And merely having it be vote-based, you could imagine that being quite swingy, or it could be dominated by some factions, or manipulable. And so Community Notes achieves this kind of trust using a bridging algorithm, which means it looks for proposed notes that people have written which achieve some kind of consensus of being useful according to an inferred axis of usual disagreement. What that ends up being in practice is that the principal component of disagreement is usually left-to-right politics, but not always. But that gives you this kind of gold standard: this note is considered useful by a broad segment of society. It bridges gaps. It's useful in that sense. So this is Community Notes. It's crowdsourced in that sense. Now one project that we're supporting is ways of accelerating that.
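The bridging idea described above can be sketched as a small matrix-factorization model, much simpler than the production Community Notes scorer but in the same spirit: ratings decompose into a rater bias, a note intercept, and a one-dimensional factor that absorbs factional agreement, so a note whose intercept stays high is one found helpful across the inferred divide. The data, dimensions, and hyperparameters below are made up for illustration.

```python
import numpy as np

# Toy ratings matrix: rows = raters (two loose factions), cols = candidate
# notes, 1 = rated "helpful", 0 = "not helpful", NaN = not rated.
# Hypothetical data for illustration only.
R = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, np.nan],
    [1, np.nan, 1, 0],
    [0, 1, 1, 0],
    [0, 1, np.nan, 0],
    [np.nan, 1, 1, 0],
], dtype=float)

def bridge_scores(R, dim=1, lr=0.02, reg=0.05, epochs=4000, seed=0):
    """Fit r_un ~ mu + b_u + b_n + f_u . f_n by gradient descent.

    b_n is each note's intercept after the factor axis (in practice often
    roughly left-right politics) absorbs factional agreement; a high
    intercept means the note is found helpful across the divide.
    """
    rng = np.random.default_rng(seed)
    n_u, n_n = R.shape
    mask = ~np.isnan(R)
    mu = np.nanmean(R)
    b_u, b_n = np.zeros(n_u), np.zeros(n_n)
    f_u = rng.normal(0, 0.1, (n_u, dim))
    f_n = rng.normal(0, 0.1, (n_n, dim))
    for _ in range(epochs):
        pred = mu + b_u[:, None] + b_n[None, :] + f_u @ f_n.T
        err = np.where(mask, R - pred, 0.0)   # ignore unrated cells
        b_u += lr * (err.sum(axis=1) - reg * b_u)
        b_n += lr * (err.sum(axis=0) - reg * b_n)
        f_u += lr * (err @ f_n - reg * f_u)
        f_n += lr * (err.T @ f_u - reg * f_n)
    return b_n

print(bridge_scores(R).round(2))  # note 2 (liked by both factions) scores highest
```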
So a real issue with Community Notes is, you know, some misleading or confusing or unhelpful message can make it around the world, go viral, in minutes or hours, and the trouble is this Community Notes process of course takes time. Someone has to notice and flag this. Then it has to enter a queue or a pool, and the community of people that contribute notes have to then notice this. They have to do some research. They have to put together a potential context that they want to provide. Then that has to go into this voting algorithm, and we have to infer, is it a useful note, and all this kind of thing. So this can take hours or even days, and that's unfortunate if this thing has already gone around. You know, there's the classic saying: the lie makes its way around the world before the truth has got its boots on. So one thing that we're supporting there is, can we accelerate that through tools which support note makers and note graders? And one way that you can accelerate that is to actually have AI perform some of the research or even drafting of notes, but of course maintaining the gold standard of the voting and the rating and this kind of thing, and there's an interesting design tension there. So that's one direction we're supporting, and that comes under the umbrella that we're calling collective epistemics. It's a sort of platform-focused intervention there. We could talk about other kinds of interventions there.

As another quick example, another thing we're supporting is various initiatives in scenario planning. You may have come across deep research. This is a kind of form factor that various AI companies have put their products into, where language models have been trained to do not only surface-level web search but also to jump a little bit deeper, doing something like a literature review and then putting that together into some kind of context. So you can see that there's potential there for these quite deep, agentic workflows, using LLMs in structured ways with appropriate scaffolding and tuning and so on. And they're able to support these kind of exploratory, investigative processes. They're not replacing them. It's not like you would just fire and forget, or, you know, trust the output entirely, but it's able to inform an analyst or someone like that. And so for scenario planning, and this kind of other institutional decision-making, and perhaps outside of institutions as well, we think there's a lot of potential there. So we've got a couple of people that we're supporting working on that kind of thing as well. This scenario planning thing is sort of close to my heart, because certainly I would have been a customer about a year ago, or a bit longer now, when I was working at the AI Safety Institute in the UK. I was just the kind of person who would have absolutely loved to have this support, if it was well made into a quality tool. And I have several friends who are still working in that context, and there are people working in similar contexts in other countries. So this really is elevating, augmenting decision-making, and this is another broad umbrella of human reasoning that we think is really important.

>> Yes. So one reason why this approach is interesting is because we're interested in keeping people informed and people in the loop while they're making decisions.
And so I see both of these examples you just mentioned as trying to empower people to understand what's going on while it's happening, and so not handing off decisions fully to AI, but trying to use AI to empower ourselves. How do you see that going forward? Do you think that approaches like these can scale as AI scales, at the tempo that AI is progressing at?

>> Yeah. So this is really thorny, and we do give some thought to this. One way you can look at it is, realistically, perhaps we should expect that in the medium to long term automation will just win out eventually in most cases, because it's always going to be more efficient to have, you know, the tireless, doesn't-get-sick AI systems that can just run 24/7. And furthermore they can think a thousand times faster, and perhaps, you know, they can think better. I think this is definitely on the table as well. But that of course comes with all kinds of costs, because once you've taken humans out of the loop, and I'm not saying anything new here, the trust is maybe harder to get, and then also over time you might lose the ability to oversee and understand what's being done. So keeping humans in the loop is one phrase. I actually really like a phrase that Audrey Tang has publicized a little bit, which is: why can't we have machines in the human loop? And that is kind of the aesthetic we're going for here. There are innumerable ways we can imagine; we already have tools which are part of our individual systems that augment us, part of our collective systems that augment us. Why stop that? Why end that process? We have all these interesting building blocks now. LLMs are the most obvious one, and they enable all kinds of design. The design space is massively opened up right now, and it's really underexplored and underexploited. And so we think that there's much room to improve that. But maybe you think this is only a temporary measure. I think that could be the case, but even in that case there are some really important decisions to be made, by individuals, by groups, by societies, about where we actually go next and how it is that we're directing that, when we're making various kinds of transitions and which transitions we're making. I think a lot of people think of a kind of linear timeline, you know, when are things going to happen, and then it's just bish bash bosh, ABC happens. But actually the sequence of things, and even the choices we have about what parts of the tech tree we explore, I think these are genuinely open. I don't think it's just naive to think that there are actually decisions to be made. And so if we can in the meantime enable better decisions to be made by individuals, by societies, about their own lives... The two examples I've given, you can see they're very augmentative. They're not kind of preachy. They're not giving a specific position, but they're supporting reasoning and helping people to come to better decisions.
And if we can do that in the meantime, then hopefully even if it's the case that everything eventually gets handed off to AI, we'll be in a better position to trust that and to know that it's going to be trustworthy, and as a society kind of endorse that and move in that direction in a way that we think is wise, let's say.

>> Mhm. There's a lot of talk about chatbots becoming AI agents instead. So how much do you think this is about the form factor or the interface, how you're interacting with the AI models? For us to stay informed when we make decisions, and to make better decisions, do we need something different than a chatbot? And specifically, do we need to avoid agents, or do we just need to use agents differently?

>> Yeah, this is a good question. Again, I don't think it's a dichotomy necessarily. So you already have chatbots which have, you know, tool integrations. Is that an agent? It's kind of an agent. The form factor, the interface, at least to the human, is still chat. Sometimes there's this whole massive wall of text that you get when it's not doing a good job of keeping you informed about detail. Sometimes you just get this annoying wall of text, and that can be overwhelming. Other times you get too little detail, like maybe it's compressing far too much. But the form factor is kind of the same, at least on the human side. But it's using tools. It's going and making web searches and so on. And then, I mean, you can build any kind of tool integrations, and we did a lot of this at AISI, and at that point you're stepping towards something like an agent, but it's not autonomous. It's not living on its own. It's not out in the world. And that would be the really extreme end of agency, where maybe this thing is self-sustaining in some way, maybe even replicating. These kinds of scenarios can get, I guess, scary. So moving along that, and maybe it's not even one axis either, but I'm painting it as if it's one axis. Certainly moving along that axis, the stakes get higher. The capacity for oversight goes down, and hopefully people are going to make the right trade-offs about where to be on that axis. So let's think about deep research. That's kind of an agentic workflow. You're kicking off an agent. It's a kind of contained agent. It has various affordances, for search primarily, but also to do sequential search, and to perform searches on the basis of findings it's already made. You've got various other kinds of limited agents. People use agents for coding. They often have access to some kind of sandboxed code environment. They're able to make edits to files. They're able to maybe pull from APIs. They're able to load libraries and so on. This is another kind of somewhat contained agent. And then maybe the oversight you have there is, in principle you could look at every single commit it's made and every single edit it's made. But in practice, you're probably not going to. You're going to look at some kind of summary. So yeah, moving along that agent axis, at least to a first approximation, it's enabling more to get done between human check-ins, but the exact trade-off there is that it's potentially reducing oversight.
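A minimal sketch of the contained, tool-using agent loop described here, with an audit log of every step and a human check-in before higher-stakes actions. The `call_model` stub, the toy tools, and the high-stakes rule are all hypothetical stand-ins, not any vendor's API.

```python
import json

TOOLS = {
    "search": lambda query: f"(stub) top results for {query!r}",
    "write_report": lambda text: f"(stub) published report ({len(text)} chars)",
}
HIGH_STAKES = {"write_report"}   # tool calls that pause for a human check-in
AUDIT_LOG = []                   # every step is recorded for later scrutiny

def call_model(history):
    # Stand-in for a real LLM call, scripted so the sketch runs end to end:
    # first request a search, then return a final answer.
    if not any(step["type"] == "tool_result" for step in history):
        return {"type": "tool_call", "tool": "search",
                "args": {"query": "scenario X"}}
    return {"type": "final",
            "answer": "Summary of scenario X, citing the search result."}

def run_agent(task, max_steps=5):
    history = [{"type": "task", "content": task}]
    for _ in range(max_steps):
        step = call_model(history)
        AUDIT_LOG.append(step)
        if step["type"] == "final":
            return step["answer"]
        if step["tool"] in HIGH_STAKES:  # machine-in-the-human-loop check-in
            if input(f"Allow {step['tool']}? [y/N] ").lower() != "y":
                history.append({"type": "tool_result",
                                "content": "denied by human"})
                continue
        result = TOOLS[step["tool"]](**step["args"])
        history.append({"type": "tool_result", "content": result})
    return "step budget exhausted"

print(run_agent("Research scenario X and summarise it"))
print(json.dumps(AUDIT_LOG, indent=2))  # the oversight trail
```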
And I think there's a lot of design that can be done around enabling the really important pieces of information to be surfaced, and also attempting to guarantee that appropriate check-ins happen at appropriate points, because the longer-running the tasks, the more the agent's going to run into all kinds of small decisions, and perhaps larger decisions as well. I mean, at some stage maybe you're imagining things like partially autonomous corporations, where you've got various kinds of functions being fulfilled by AI systems. They don't need to be agents, though. I think people imagine, you know, a drop-in remote worker, this is a phrase that gets thrown around. People imagine this fully-fledged, self-contained agent, but it doesn't need to be. It can be a kind of agentic workflow. It can maintain context over time. It can perhaps learn over time, but it doesn't need to have every single affordance under the sun. And the oversight there can be entirely reasonable if it's well-designed. And that's the big crux there. If it's well designed, how do you know it's well designed? There's all these kinds of questions.

>> Do you think we are ready to work with the models as they are now? So for example, we can do a lot of scaffolding, we can implement a lot of tools into a model, but if the model is not aligned with our interests, it might go badly anyway. So you can imagine the model presenting information selectively to us and so pushing us to make a certain decision. Is that something that's potentially solvable with extras on top of the model, scaffolding, tools, stuff like that, or is that a deeper issue that has to be solved before we can begin integrating these models?

>> I think that's a great question. So maybe I have a few thoughts about that. One is, yes, absolutely, we should get the guarantees that we need to get about the inclinations of the models we're putting into these systems: to be thorough, to produce legible outputs, to not be biased in certain ways, to not have blind spots, and this kind of thing. A lot of these we might call epistemic qualities or virtues. So in fact, one of the pieces of our human reasoning puzzle is: can we ensure that the AI components we're putting into these tools for human reasoning, in particular LLMs as the particularly salient case, are virtuous in these ways that we think are necessary, not only as conversation partners and as agents, but as building blocks for tools that we might want to build downstream of that? So let's go back to this scenario planning example, deep-research-esque but geared towards scenario planning. If your system has systematic blind spots, or even if it's scheming and it wants to hide certain parts of the mechanics of the world, or of the specifics of the situation, which is a much more pernicious thing, then you might expect that it could perhaps surreptitiously or even inadvertently surface a biased summary of the situation, and that could lead you to systematically biased decisions. I guess, to think concretely: suppose there were some kind of political shenanigans going on, and there were particular parties that were behaving shadily.
You could imagine a sufficiently blind-spotted or sufficiently scheming model that was part of a system like that. You can imagine it just discarding such references before they bubble up into the whole system, and that could give you these blind spots. So yeah, one way we can potentially counteract that is building in these guarantees, and that's a strong word, I perhaps need to soften that, but building in some level of incentives for the developers, and also benchmarking and testing and so on, for these epistemic qualities like thoroughness, legibility, and so on. In contemporary AI, very often the way that you make progress is by having benchmarks, by having testing suites and this kind of thing. So one way we're hoping to incentivize that whole area is to enable people to build really great environments for testing these epistemically virtuous properties, like thoroughness, legibility, skepticism, this kind of thing. There are already examples of that. So you may have heard of retrieval-augmented generation, RAG. This is a very common way of scaffolding up an LLM: you provide a corpus of extra materials, maybe some materials on a particular domain. Those are indexed sort of semantically, and then the model, rather than doing a web search, can do a search into this index and retrieve content, and then that content enters the model context, and the subsequent answer or generation can incorporate it. And there are already evaluations that people build: well, how grounded are the generations of systems which are using these retrieval-augmented components? I can imagine that's a microcosm of this kind of epistemic virtue of being thorough, or something like that. And so you can imagine really expanding the ways that people are looking at this. A big part of that is things like being biased, or being particularly sensitive to framing, or treating particular people or characters or institutions preferentially or dispreferentially. For these kinds of virtues, again, you can produce evaluations that at least give some confidence, I'm not going to use the word guarantee again, that the systems, when deployed in other important contexts, are not going to have these kinds of pernicious properties.

>> Mhm.

>> Another thing is, again, thinking about RAG, retrieval-augmented generation, and thinking about web search as a kind of augmentation there, and the importance of legibility of the inputs which have been gathered for a given downstream generation task. I think that's an important thing as well, that kind of auditability and traceability. And then knowing that the corpora being drawn from, whether it's the web search indexes or these RAG indexes, are themselves well constructed, well structured, and able to provide these auditability and legibility guarantees, that can also enable these systems. So then it's not about the LLM itself, it's about the overall system and those other components, but knowing that those things are well constructed can enable, ideally, a more trustworthy downstream system overall.
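A skeletal version of the RAG pipeline and groundedness evaluation just described, assuming a TF-IDF index in place of a real semantic embedding and a stubbed `generate` step in place of an LLM call; the corpus and the token-overlap groundedness check are deliberately naive.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus standing in for a domain document set.
corpus = [
    "Community Notes uses a bridging algorithm to score proposed notes.",
    "Retrieval-augmented generation indexes a corpus and retrieves passages.",
    "Scenario planning explores several plausible futures, not one forecast.",
]

vectorizer = TfidfVectorizer().fit(corpus)
index = vectorizer.transform(corpus)   # the "semantic" index, crudely

def retrieve(query, k=2):
    # Rank passages by cosine similarity to the query.
    scores = cosine_similarity(vectorizer.transform([query]), index)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def generate(query, passages):
    # Stand-in for the LLM: echoes its sources, so trivially grounded.
    return " ".join(passages)

def groundedness(answer, passages):
    # Naive eval: fraction of answer tokens appearing in a retrieved passage.
    support = set(" ".join(passages).lower().split())
    tokens = answer.lower().split()
    return sum(t in support for t in tokens) / max(len(tokens), 1)

passages = retrieve("how does retrieval-augmented generation work?")
answer = generate("how does retrieval-augmented generation work?", passages)
print(passages)
print("groundedness:", round(groundedness(answer, passages), 2))
```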
So this gets to another direction that we're considering as well, and this is just one interface into that, but I recently wrote a blog post with my colleague Ben where we talk about the concept of a full epistemic stack. This is where we're asking questions like: what is it that actually constitutes epistemic practice, and how does that happen between people? How does a community have good epistemic practices? And it's things like legibility of citations and sources, what we might call provenance of ideas and data, and so on. If you think about science, you have some of this. You have citations, but sometimes they're very crude or rough. And then those citations have citations, and they make claims, and maybe those claims are backed up by some of the evidence they're pointing to, or some of the reasoning or the research performed in that piece of work. And then, you know, the conclusion gets distorted. You get this, I think in America it's called the telephone game, in Britain we say Chinese whispers: there's this process where the information gets distorted. So enabling that to work better is one thing. And then there's also the process of discourse, where a position is put forward, there are reasons to be uncertain or skeptical, and then other investigation is performed, other insights and positions are brought in, and the dialogue moves forward and the state of the discourse expands and improves. Enabling that to happen in a more fluent and perhaps auditable, legible way is important. This is a little bit more nascent. I realize I'm talking in various kinds of abstractions, I'm waving my hands, which is a sign that I'm maybe not being concrete enough. You can read the blog post if you're interested. But this is a system where you can imagine building lots of tools and processes, many of which are AI-augmented, which enable us to build up these structures and these corpora and this metadata and annotations and so on, which can then themselves be fed into downstream AI systems. So it's using LLMs to build the data sets and structures which can enable AIs and other systems to be more epistemically virtuous downstream.

>> So how much do we need to improve on the human side of the full epistemic stack? If we imagine that stack to be a collaboration between AI and humanity, how much of that is on us? For example, there's a massive difference between naively using an LLM to try to produce some output, and using best practices, doing everything you can to get the most out of the current models. Do we need some form of training? Or could this be integrated into the scaffolding in some way, such that however simple and naive your prompt is, the system will perhaps ask additional questions of you? Some systems already do that. Do we need to improve in order for this system to work?

>> I think ideally you're not asking everyone to suddenly become an expert in, you know, best practices of using AI. But of course people will learn and adapt, and the best practices will spread. I guess a slightly different spin on your question: when you say do we need to improve? Yes, absolutely.
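The "provenance of ideas and data" notion from the epistemic stack discussion above could be made concrete as a small claim/evidence structure, in which an offhand tweet and a full paper land in the same graph, each with an auditable excerpt and source snapshot. The field names here are illustrative, not a proposed standard from the blog post.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    url: str
    kind: str         # "paper", "tweet", "dataset", ...
    retrieved: str    # ISO date the snapshot was taken, for auditability

@dataclass
class Evidence:
    source: Source
    excerpt: str      # the quoted span, so support can be checked directly
    strength: str     # "direct", "indirect", "anecdotal", ...

@dataclass
class Claim:
    text: str
    author: str
    supports: list = field(default_factory=list)   # Evidence for the claim
    disputes: list = field(default_factory=list)   # Evidence against it

# Hypothetical usage: one claim backed by one paper snapshot.
paper = Source("https://example.org/paper", "paper", "2026-04-02")
claim = Claim("Bridging-based ranking surfaces broadly useful notes",
              "analyst@example")
claim.supports.append(
    Evidence(paper, "notes rated helpful across the inferred divide ...",
             "direct"))
print(claim)
```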
Like the whole point of this AI for human reasoning is to enable us to improve in the ways that we might endorse if we were able to coordinate better, or if we were able to communicate better. But is it necessary for us to improve in our practices for how we're even using these systems? I think part of the point of things like the full epistemic stack, the way we're conceiving it at least, is that it's meant to be pretty adversarially robust. It's meant to be pretty robust to various scales of contribution. So whether it's an offhand remark on Twitter or a full scientific paper, I'm thinking ultimately that can be incorporated into the same structure. It's all discourse, and it all has some degree of evidence and sourcing and so on. On Twitter, maybe the evidence is usually pretty implicit. Occasionally people provide links to articles or to papers when they're making a point, and then that article itself or that paper itself has some context, and so on. So there's some structure there. And then scientific papers, I mean, they're a bit better, right? But we've all read papers where we think the citations, or some of us have written papers where we think the citations, could be better. So yeah, absolutely, the way we're conceiving of it is: can we be as robust as possible to these varying scales of contributions? Adversariality is a really important part of this. So the Community Notes example is a case where it's supposed to be as adversarially robust as possible, because of course in a large community of communicating individuals, people have different priorities. Sometimes they compete, and sometimes it benefits me for you to believe something that maybe I don't actually believe, and so, you know, this is deception, and this is a huge problem. But we think an epistemic stack can be constructed which is at least largely robust to that kind of thing.

>> Mhm.

>> Improving practices, yeah. There is a kind of bias, I've forgotten the name that people use for it, but there's a certain kind of overly trusting approach to using automated systems and computers in general. I don't know how widespread this bias still is today. I do wonder if some of the research done on that may be older. I don't know what the latest and greatest research on that is now. I would hope at least that people have learned to be a bit more skeptical, but I'm not sure.

>> Yeah, it might be increasingly tempting for anyone basically working at a computer to hand over more and more tasks to these systems, and perhaps to hand them over in a way that is not fully thought out. So maybe you have to present something to your boss tomorrow and you haven't really prepared anything, and so you throw something together using a model, and this is not really how you get the most out of these systems. But again, I'm thinking this is sort of a human weakness. And so are people really trying to get the most out of these systems as they currently exist, or is the bottleneck perhaps what we are trying to do with them, as opposed to what they can become and can do?

>> Yeah, that is interesting. So you said there it might be tempting to have these lazy uses. I think in some cases it might even be imperative, if people are under a lot of pressure.
There might even be sort of selection pressure, like hiring and firing and this kind of thing can even come into play, and that can happen at different levels as well. Yeah, I believe in that as a pressure. I think it's really concerning. When you say, are people trying to get the best out of these things? I mean, this is the other fascinating thing about LLMs: as people learn how to prompt them and how to tool them up and how best to give them the context they need, it often turns out that a given LLM from six months ago can actually do some stuff that people had thought it couldn't do, as long as it's provided the right context and tools. This isn't shocking, but it remains the case: even as the latest models come out, scaffolding improvements continue to reveal capabilities that are new. So in some sense people are trying. Could they try harder? Probably. I think maybe what you're asking is, are most people trying, or can we expect most people to try hard to get really, I'm going to use the phrase epistemically virtuous again, really epistemically virtuous outputs out of these systems when they're under all these pressures? No, I don't think it's fair or even sensible to expect everyone to be doing that. But what we can hope to do is enable people to make the best choices about which tools to be using, and to be using the tools which are going to equip them best to understand the situation and to make better decisions, rather than the tools which are maybe biased, or maybe just aren't helping that person to understand, they're just producing slop or whatever.

>> Mhm. And the way you enable people to use better tools is both to make the better tools and also to enable people to evaluate which tools are better.

>> Mhm.

>> And perhaps we can then have norms and rules, if you are high up in government or high up in corporations, for how to use these tools best or when they should be used.

>> I can imagine that happening, yeah. Certainly there are already rules for all kinds of things. There are already auditing rules for various kinds of high-stakes business functions. And there are already, at least in the UK, I'm most familiar with the UK context, but I think it's the case in other countries as well, all kinds of auditing and scrutiny rules for political decision makers. So we already have these processes. So it's imaginable that that could extend to the use of various kinds of AI tools and so on. Whether we can expect or want that to happen on a more broad basis, I think that's perhaps less likely. With the kind of epistemic virtue evaluations we're imagining, we see that primarily applying normative pressure to developers, and enabling people to see at an easy glance benchmarks which can tell them: well, this system is the most honest system; this system produces the most legible outputs; this kind of thing. And at that point there's pressure on the developers, and you're enabling consumers and users to equip themselves in the way that they're going to be least misled. And we're not currently imagining that entering into any kind of regulatory regime.
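The "easy glance" benchmark idea could be as simple as aggregating per-virtue evaluation scores into a comparison, as in this sketch; the systems, virtues, and numbers are entirely made up.

```python
# Hypothetical per-virtue eval results for two fictional systems.
scores = {
    "system-a": {"honesty": 0.91, "legibility": 0.74, "thoroughness": 0.80},
    "system-b": {"honesty": 0.83, "legibility": 0.88, "thoroughness": 0.77},
}

def leaderboard(scores, virtue):
    # Rank systems by one epistemic virtue, best first.
    return sorted(scores, key=lambda s: scores[s][virtue], reverse=True)

for virtue in ("honesty", "legibility", "thoroughness"):
    best = leaderboard(scores, virtue)[0]
    print(f"most {virtue}: {best} ({scores[best][virtue]:.2f})")
```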
You could imagine, though, if it turns out that best practices make it relatively easy to construct these systems which are on the whole epistemically virtuous, you can imagine it being, you know, mandatory or semi-mandatory. Like, well, why not just use those best practices? Why not actually make sure that your systems are honest and legible? It may not even be necessary at that point to regulate. It might just be, why wouldn't you do that? Of course, there's always this kind of long tail of developers, especially with open source and fine-tuning and this kind of thing; basically anyone can make their own system. If they want to make a lying bot, then they can do that, and it's pretty hard to stop them. But people can already go around and lie. They're not going to get a reputation for honesty that way. So maybe that's how we defend ourselves.

>> It seems to me that at the moment you can also become dumber by using AI more. Your output can also become worse. But it depends on what you're producing, right? If you're a lawyer producing a standard contract, an NDA or something, that might be doable by models. If you're producing a very high-stakes, complex contract between, you know, billion-dollar corporations, that's a different story, and you would get fired for AI-generating that. How do you see it? Is it just, the higher the stakes, the more you have to think about whether you should use models, and perhaps you should not? But then it seems that the higher the stakes are, the more we should want these tools to help us, and we shouldn't just...

>> Yeah. Yeah. For sure.

>> Yeah.

>> I think this gets back to the sort of, aesthetic is the wrong word, but the point of AI for human reasoning is to enable and augment human reasoning, to have machines in the human loop, in Audrey Tang's words, and to maintain those guarantees that people are at least understanding what's going on. But I want to say a few things about that. One thing that I think is maybe pernicious is this: there's a process in many industries where junior people come in, they do their time doing these somewhat rote tasks, they're learning by doing, it's a kind of apprenticeship, they're developing the skills that they then become more fluent in, which enable them to take the more expansive tasks on, and so on. This is a progression of apprenticeship and mastery, and it happens in all kinds of domains and disciplines. And we're now entering the regime where the more senior, more experienced people can actually, instead of delegating to this apprentice who's going to take many years or months to train, just delegate some of these tasks to AI. They themselves have been augmented and elevated in this way. They now have an army of apprentices.
But now the human apprentices are no longer on that ladder that's enabling them to develop whatever it is in that discipline which they need to become a master, which can include these skills, these kinds of drills almost, but it can also include, to some extent, networking as well, which is just, you know, a part of people's context, isn't it? So that's one pernicious thing here, which is a little bit of a tangent from what you asked. But I think this happens at the societal level, and then there's the individual level, where people's skills are kind of eroding. Because there's a feeling I've noticed in myself a few times recently where I'm like, oh, I'm sure that this thing I'm trying to write, I'm pretty sure, you know, Claude or Gemini or whatever could write this. And there's this urge to maybe go do that. And I've tried it a few times. But for me, I tend to find that for producing the actual content, it's still preferable to be primarily doing that myself, even if I'm quite heavily making use of AI tools for giving me extra context and research and so on. There are interesting findings in both directions for coding, actually. I've used some of these tools myself. I'm a bit behind the times, so I'm interested to try the latest and greatest. But certainly what I found was that as a kind of enriched autocomplete, it can be very useful. You can move a lot faster, actually, because you know what it is that you're about to produce, and if you're able to describe that in a comment, or if there's some preceding context which makes it obvious. There's so much code which is boilerplate. You can just tab through that, and you basically get something that's about right. It's not the exact thing you were about to write, but far fewer keystrokes, you know, and you're sped along your way. That's really useful. And I've tried a few of these more full hands-off vibe-coding things, and I found, I'm sure there's a skill to develop there for me to be able to do that properly, but I've tended to find the outputs are appealing in certain ways, but they're definitely not the thing I was trying to get at. And then there's a whole load of effort to refine that, like, no, please do this, please do this, and that's effortful. But I'm sure the skilled vibe coders are a little bit better at this. There are interesting findings in both directions there, because there's definitely some research which suggests, and definitely anecdotal evidence of people saying, that they're being accelerated, you know, 10 or 100 times faster or whatever, by use of these coding tools. And then there's been some research that people's perceived velocity is much improved, but so far their actual velocity isn't. I absolutely think in principle, and perhaps quite soon, the velocity really will be sped up. It may even be that we're at that stage already. But it's interesting that we're kind of on the cusp there right now. And this is a case where, as a coder, the more hands-off you get, potentially the more you're losing access to those low-level drills which are keeping you in touch with the process. I'm sure this is true in other disciplines as well, but I'm most familiar with coding in that sense.
And so the same thing I was describing at the level of a firm or a society, where the apprentices are losing out because they're not able to gain those skills, because those are being delegated out to AI, you've also got within an individual: the drills that, by necessity, you're continuing to carry out in the course of your operations are keeping you sharp to some extent, and if you're delegating those out, they slowly atrophy. I think that's how it works. So for many people, the progress of their career possibly includes moving up into this more managerial role, and if that's the case, you can continue to be quite effective even if your low-level coding skills atrophy. I think that's actually fine. Many coding managers, in the first year or two of being primarily a manager, are lamenting, oh, they don't let me write code anymore, you know. I've met lots of people like that. And then they find their feet as a manager and actually become really effective, as a consultant, with this kind of more high-level vision. So as an individual, it's maybe okay if you've got off the ladder. But as a society, it's maybe harmful there.

>> At a societal level, is this just another step up in abstraction that has also happened historically? So I guess people now are worse at doing computation, like doing calculations by hand...

>> Mental arithmetic.

>> Yeah. Writing very low-level code. And so maybe we just don't need this entry-level training anymore, because everyone has now stepped up one level of abstraction.

>> Right, hardly anyone writes assembly code anymore. Yep. Hardly anyone writes assembly. Very few people write C, you know, and this was at one time considered a high-level programming language. So yeah, there's this progression there. Is it the same or is it analogous? In some sense you could say it's literally the same, where now we're programming in English, and this is the highest level of all, and that's being translated into, in practice, maybe an intermediate-level programming language like Python, which again has historically been considered a very high-level programming language, and then under the hood maybe that's being compiled down, you know, eventually into something like assembly. So there's this kind of stack. Who knows what language we'll be programming in next; it would be like gestures or dance or something. So in some sense it's literally the same process, and, you know, it's got us this far, so maybe we shouldn't be scared of it. I think that's kind of right. I mean, English is much squishier, though. So far all of these intermediate stages have fairly well-defined semantics and so on, whereas English is fuzzier. But maybe that just enables more flexibility in another sense, especially when we're not talking about coding in particular. It's more analogous than literally the same thing. But by building tools which enable us to carry out operations which would have been laborious previously, sometimes much faster or certainly at less expense,
we're able to not only do the same kinds of activities more quickly, we're potentially able to creatively move beyond them and open up new prospects, because we're able to access, reason about, and actually achieve these lower-level activities faster and more fluently, and then compose them into bigger things. I've forgotten who said it, but there's some quote on this which is something like: society progresses by the number of operations it can perform kind of atomically, or something. I'm probably mangling that. But I think some cyberneticist or early computer pioneer said this kind of thing. I believe in that to some extent. And certainly for AI for human reasoning, we're hoping that the systems that we're supporting and building can be tools which enable people to perform important operations more effectively, or, you know, enable operations at all, but without just handing off entirely and sitting back and not really understanding what's going on. But there's a gradient there, isn't there?

>> Yeah. If we think about English as the next programming language, or as the next prompting language, the next way of expressing what you want from these models, it's true that it's much less precise. It's much more high-level. It's also much more expressive. You can say much more in English than you can in, say, Python. Is this a potential security issue, where what you say in English can be understood in so many different ways that it's difficult for the model to actually produce the kind of code or the kind of legislation or whatever it is you're trying to achieve?

>> Yeah, I think so. There are pros and cons here. I do think it means people need to be careful and not just naively adopt whatever thing gets spat out at the end, at least in high-stakes situations. The law-making case is pretty interesting, actually, or even various kinds of judgment and decision-making, because these things can be repeatable. Software is repeatable in a way which really no other process is. Because it can be repeatable, it can be auditable, and more transparent in principle than a process which involves a bunch of messy context being put into a bunch of decision makers who then go into a room behind closed doors, come to some decision, which is then brought out on a stone tablet. This is a very illegible process, and in principle more parts of it could be made legible, and that might actually be a real benefit to society. Because, going back to these functions I mentioned in political scrutiny and auditing, it's going to be really important: why is it that people are coming to, or at least what were the considerations which went into, the decision being made there? And that could be made more legible.

>> Actually, on that note, if you have a system that recommends a decision, say it recommends a decision on what to do in a war operation, and you ask that system to explain why it came to that decision, do you think current models are good at explaining their own decision-making, or are they sort of confabulating and hallucinating whatever actually led them to that decision?

>> Yeah, I think the jury's out on that.
I've seen a fair bit of evidence that these post hoc rationalizations are, as they are with humans, often basically confabulated. Even if there's no attempt at deception, it's sometimes hard to have introspective access to the reasons that you came to various decisions. And so I don't think that's necessarily the best way: the decision was made, and then you ask, well, why was that decision made? What's much better is if you can, by structure, make it so that as many as possible of the inputs into that decision are legible and potentially scrutinizable after the fact. And so this looks more like: what's the corpus being drawn from for extra context in this decision? What's the reasoning trace preceding the decision? What are the various other kinds of inputs? What does the training data look like? These kinds of things can be scrutable. In many cases it might be kind of a big data problem, especially if we're talking about training data. But if we're just talking about things like coming to a judicial decision, you can look at what all the inputs were that went in there, and at least that tells you what was salient and in scope for that decision. What you don't know then is what else is being brought into latent context by the training and so on into the model. So there's that gap there. I don't exactly know the state of the art in terms of figuring out which training inputs were most relevant for a given decision or a given output. I know there's work being done on that. So yeah, there's maybe hope to make this kind of thing a little bit more scrutable as well. But of course, it's the same for humans: you can ask a human why they came to a decision, but you don't know whether what they're saying is either honest or factual.

>> Mhm. A big question here, and I guess you get asked this a lot, is: is there demand for these tools? Do people actually want to get better at thinking and get better at citing their sources and so on? We could imagine right now a form of social media that works much better than what we have, where whenever you're making a claim, you're asked to support that claim. But it seems that people are interested in playing other games, and maybe you're not optimizing for finding truth, you're optimizing for what's most interesting, what's most engaging, and so on.

>> Yeah.

>> Is there a demand problem?

>> There is absolutely a demand problem. This is very central, actually. I don't think I've touched on this yet, but definitely that's pretty central to FLF strategy. Probably the reason I haven't touched on it yet is because, among the staff of FLF, I'm probably the least experienced in dealing with distribution and demand and so on.
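The structural-legibility point made a moment ago, recording the inputs to a decision rather than asking the model to rationalize afterwards, might look something like the sketch below: a decision record bundling the retrieved inputs and reasoning trace, with a digest so auditors can detect later edits. All fields and names are illustrative.

```python
import hashlib
import json
import time

def decision_record(question, retrieved, reasoning_trace, recommendation):
    # Bundle everything that was in scope for the decision at the time.
    record = {
        "timestamp": time.time(),
        "question": question,
        "retrieved_inputs": retrieved,       # corpus items actually consulted
        "reasoning_trace": reasoning_trace,  # the trace preceding the call
        "recommendation": recommendation,
        "model": "example-model-v1",         # hypothetical identifier
    }
    # Digest over the sorted record lets auditors detect post hoc edits.
    record["digest"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record

rec = decision_record(
    "Approve plan B?",
    ["memo-12", "forecast-7"],
    "Plan B dominates on criteria 1 and 3 ...",
    "approve",
)
print(rec["digest"][:16])
```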
I appreciate the need, and yeah, I have some context there, but some of my colleagues are really skilled at figuring out who has the burning need, who hasn't recognized it yet, what form factors are going to resonate with people, what workflows already exist which we could integrate with and thereby achieve some distribution. Now, you also ask, is there demand? Yes, there's demand. There's a demand challenge, but there is demand. It varies depending on the tool and on the context. But certainly, I was in Parliament recently, and I was talking to an MP who was interested in what work I was doing, and one of the things I said was, you know, one of our ambitions is we'd really like to improve institutional decision-making capacity. And he winced. He looked very pained. He thoroughly understood that this was a real need, an urgent need. And so he's a prototypical example of someone who really knows that there is a need, and if something was sufficiently good and trustworthy, I believe he and people in his kind of reference class would want to use it. I also know my previous self, when I worked in the government, would have used such tools if they were good and trustworthy, and I know some of my former colleagues who still work there would today. So there is this demand, certainly in the more institutional context. I'm sure there are other kinds of contexts. Certainly things like investment decisions and that kind of discipline, there's a lot of demand there for tooling as well, though whether we specifically want to differentially enable better investment decisions, I think that's not high on our priority list, but certainly enabling better decision-making. I also think there is a more widespread demand for better sense-making as a society. I believe this somewhat tentatively, but I actually have a fairly strong feeling of it. Of course, I have my own sort of biased and filtered view onto what society even is, and who is in society, and the people I meet and so on, and that will be pretty biased. But I also get the sense that there's a kind of zeitgeist, maybe that's too strong a word, but there's a latent desire and a recognition that we need better sense-making. Part of the trouble is, I think, people don't diagnose what that means for them and for their communities. But I do think, with the right integration... Community Notes is a great example. This has really taken off, and the desire is there, and people use it, and in part that's because it's integrated into a workflow that people are already using, namely browsing Twitter, but also other platforms. And this is another thing we're trying to work on: can we get systems like Community Notes adopted on other platforms? When they're there, people use them and benefit from them and are glad that they're there. So sometimes it's not necessarily a proactive demand, but a recognition of a need, a kind of latent recognition of a need, which then, if the right form factor and workflow integration resonates with that need, people will adopt potentially quite quickly. I don't know.
There's definitely some largeish segment of the online population, on Twitter at least, who are really hungry for better sense-making, and they maybe have their own interpretation of what that means. But if you can produce tools which can enable people who have that hunger to better see through deception, to better gain context on the decisions they need to make, to better gain context on the decisions society needs to make, then there's some demand there. But it is a real problem. So absolutely central to any project that we're running, and any one that we're supporting, is: how are we going to get distribution here, in a way that's sustainable and that's actually helping people?

>> And how important is trust for that? It seems like if you don't trust a tool, you might not be inclined to use it, depending on where the tool is coming from. So say if a government published a tool for online sense-making, a lot of people would be skeptical of using that to develop their views.

>> Yeah. Well, one antidote here historically has been open source. So if you can see the source code in principle, of course most people don't actually go and look at the source code at all, but if you can in principle see the source code, or if your friend can in principle go and see the source code, or even if your AI assistant that you trust can go and scrutinize the source code, then you have some more reason to at least believe the thing isn't deliberately manipulating you, or doesn't have some kind of back door or secret inclination. This is harder of course for things like, and by the way, I think it's really unhelpful to use the term open source for ML components like language models, because they're not really open source. If you publish the weights of something, it's more like publishing a binary; it's pretty inscrutable. But yeah, open source is good, and the trouble is, of course, as soon as you've got ML components, it's much harder to scrutinize them, and it has definitely been demonstrated that it's possible to hide biases in them, through training or even a lightweight fine-tuning on top at the end. So this gets a bit harder. So yes, if some institution or organization publishes an ML component of such a system, even if they make it open source in this sense, which is more like open binary, then that trust problem remains. And this is where we go back to: well, are there ways that we can generalizably evaluate for non-bias, honesty, all these kinds of epistemic virtues that you might want from such a component? And then can you trust plugging that into your open source software, which is a scaffold, and so on. Maybe the trust is able to be built in this piecemeal way, which adds up to some level of justified trust. But yeah, trust is a difficult problem for the kind of epistemic stack concept that I was very vaguely gesturing at, that Ben and I have recently written on. One of the ways that we would hope to achieve trust would be through potentially open-sourcing various tools that feed into the system. And another source of trust is participatory input, or participatory input in principle.
If you look at Wikipedia, almost no one actually bothers to contribute to Wikipedia; I learned that I'm extremely rare in doing so. But you know that you can in principle go and look at the history of edits. You know that you can in principle go and look at the conversation about a given page. You know that you can even, in principle, contribute portions or edits to the page. And I think that gives a certain amount of trust as well, although under intense adversarial pressure, even that kind of participation may not be sufficient to justify trust for some people. So there are various ways of building trust, and a lot of them don't come through... I think the naive way is just: a nice friendly company created this nice friendly AI, so you trust it. That's not worth nothing, but there are so many other ways we can build layers of trust, and we should be doing that. I think a lot of those other ways are neglected, and I've named a few of them.
>> Yeah. Wikipedia is an interesting example. It's sort of the crowning achievement of human knowledge creation, but on the one hand anyone can participate, while on the other hand Wikipedia as it exists is quite elitist in a sense. There's a hierarchy: you need to be a trusted contributor for your contributions to actually count, and as you mentioned, very few people contribute almost all of the content. Is that how you see the default state of human sensemaking? That there's an elite, and it's not actually everyone participating but almost no one, with everyone else being a consumer of information rather than a producer?
>> Yeah, there's been a really interesting trajectory on the internet. My understanding is that on the early internet, the power law was much less sharp with regard to who was a contributor and who was a consumer. Nowadays a lot of people just scroll TikTok, scroll Twitter, and are basically consuming, occasionally messaging people or something like that. It is interesting, and yes, Wikipedia has this pretty sharp power law of contribution, and all these things do; Stack Overflow is similar, I'm sure. I think there's some inevitability to that, and perhaps even some desirability, because naively at least, we don't necessarily want more democracy in all contexts. I don't want to be responsible for contributing to a topic I have no understanding of. Why would I want to do that? What I want to know is that the people who do understand it are honestly putting forth their best effort at contributing. And that's a hard problem, because of course there's a kind of principal-agent separation. I'm the principal who wants to understand something I don't currently understand, and I'm trusting the agent, the current expert, to honestly put forth their description of that thing. This just is a tricky problem. And often checks-and-balances-type arrangements, scrutiny, auditability, and so on enable a bit more trust of a pool of experts rather than a mere single expert.
It's harder for a pool of experts to collude to forward all of their interests at once, whereas perhaps an individual could distort things subtly in a way that's less traceable. So Wikipedia has this property where at least most pages are contributed to by some number of experts at once. But yeah, it's difficult. I think the power law of contribution is probably here to stay in a big way. It might well be that for many things, the largest contributors come to be AI systems, or outputs which are largely AI-generated, and you'd hope those were well scaffolded and sufficiently scrutable and legible, and that their outputs were auditable after the fact: you can look at what sources went into this, and all that kind of thing. It remains the case that even with these steep power laws of contribution, things like Wikipedia are to some extent a gold standard of public sensemaking. Wikipedia does suffer from some of the problems I've already identified for what we might call legacy epistemic systems, in that it's slow. It struggles to update in real time; that's difficult. For many topics that's absolutely fine. If it's a historical biography or something like that, there's not going to be much new evidence every minute. But for other things it's important to keep up to date, and systems like Wikipedia just can't really do that very well right now. There are various incremental ways we can imagine improving that. In fact, one of the other visions we have for an epistemic stack is to be a kind of foundation for an improved Wikipedia. Perhaps literally Wikipedia itself could ingest from more structured, metadata-annotated epistemic structures of citations and evidence and so on, but also discourse. Because Wikipedia at its best on a given topic... if something is kind of sealed, done, known science, where we're confident on the whole that something is the case, then the Wikipedia page is just going to describe that situation, and then it might describe some history. In the ideal case, a Wikipedia page is making legible the provenance structure: how did we get to this knowledge in the first place? There's a bit of history, maybe it describes some of the characters as well, and then it describes a summary of the concluded position on the topic. If it's not a done deal, if it's an active debate right now, then again in an ideal case, the Wikipedia page will present a selection, ideally all sides of that debate which have some degree of merit: why those camps are coming to their positions and what their evidence is, why the other camps are coming to the other positions, where the remaining uncertainties are, and where evidence is maybe lacking. So in the ideal case, the Wikipedia-type format is able to encompass both solid science and active debate.
>> Yeah. And much of this will probably be written by AI. Will it also be read by AI? Will it be ingested by my agent, which then presents it to me in a way that's easier for me to understand, or perhaps quicker for me to digest?
>> Yeah, for sure.
Whether it's got this intermediate form of something like a Wikipedia page, or whether you've just got the slightly more raw corpus, the claims, their evidence, the citations, the positions, and the way the discourse maps together, there's a bunch of structure there that you can imagine ingesting directly into a chatbot interface or directly into a Wikipedia-like view. Actually, I tend to quite like reading Wikipedia. It's a nice, familiar format. As I mentioned, often I want to see a summary, then maybe something about how we got to that position, then whether there are competing claims or areas of uncertainty, and so on. It's a nice layout. And maybe if I said to my chatbot, hey, on topic X, can you give me a Wikipedia-like summary of this thing? Today it would have to just do some web search and do its best, but if better data annotations were available, it could query those, bring them into context, and produce my kind of customized version. And maybe it knows that I'm quite good at calculus but struggle with economics, so it can explain this topic in a little more detail, or something like that.
>> Mhm.
>> So yes, absolutely, there are many ways you can imagine AI being a big consumer of epistemic content in the future, but hopefully as an intermediate to humans and helping them understand, rather than just, I don't know, milling around doing its own AI things.
>> Mhm. Can you paint us a picture of the ideal state, the ideal vision, for AI for human reasoning? What does the future look like in, say, 10 years if this project really takes off?
>> Okay. Yeah, that's kind of difficult. So let me first give an outline of one way of thinking about the map of human reasoning. One concept I often go back to is the OODA loop, which you may be familiar with: observe, orient, decide, act. Something similar is raised in all kinds of disciplines, but this comes from a US military context, and it's a decision-making cycle. You see it in reinforcement learning, in cybernetics, in business cycles as well. And as well as that rapid decision cycle we all go through in individual decision-making moments, we're also going through learning and growth. So it's kind of OODA-L, L for learning, and I think that encompasses a lot of things. I often use this lens to break down what I actually mean when I say reasoning: I mean all of these things. And then humans: we're individuals, we're groups, we're societies. So you've got two axes there. This is how I think about the very basics of the map: you've got your OODA-L on one axis and your individuals, groups, and societies on the other. And you can start to ask: what are the broad clusters of activities that we need to satisfy to be doing these things really well? We've talked a little bit about collective epistemics.
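To make that grid easier to hold in mind, here is a minimal sketch, in Python, of the two-axis map as a data structure. It is purely illustrative: the stage and scale names follow the conversation, but the cell assignments are guesses of ours, not FLF's official taxonomy.

# A two-axis map of human reasoning: OODA-L stages by scale of "human".
from itertools import product

STAGES = ["observe", "orient", "decide", "act", "learn"]  # OODA-L
SCALES = ["individual", "group", "society"]

# Each (stage, scale) cell is a place where a reasoning-support tool could help.
reasoning_map = {cell: [] for cell in product(STAGES, SCALES)}

# Rough placements drawn from the conversation (illustrative, not canonical).
reasoning_map[("observe", "society")].append("collective epistemics")
reasoning_map[("orient", "society")].append("collective epistemics")
reasoning_map[("decide", "individual")].append("decision support / forecasting")
reasoning_map[("act", "group")].append("coordination")
reasoning_map[("act", "society")].append("institution design")

for (stage, scale), activities in reasoning_map.items():
    if activities:
        print(f"{stage:>8} x {scale:<10} -> {activities}")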
I think collective epistemics inhabits the observe and orient, what we might call the understanding, side of the OODA-L process, and it's both an individual thing and a collective thing. Collective epistemics is actually largely about individuals as a communicating network: how is it that we come to make sense of things? So collective epistemics carves off that part of the map. Then there's decision-making, which is enriching your understanding and making decisions that you endorse, and this happens primarily at the individual level, but also in small groups and so on. And then a big piece we haven't talked much about at all is coordination, which I think is really essential to this human reasoning vision. Very broadly, coordination encompasses the collective part, and it's not only understanding and making decisions; it's also being able to act together, and that's a whole part of coordination which is difficult. For collective epistemics, there are various aspects we can imagine improving: the way we understand pieces of information coming our way; how we source those pieces of information, which is a more network-ish question; the way we relate to other communicators, like how do we understand this person's motives, do they have conflicts of interest, and so on; and the way we share information between ourselves, again a network question. For decision-making, we can imagine improving our abilities to model the world, to understand systems, to forecast, to predict, to understand the consequences of what we might decide to do. We can also imagine improving our ability to reflect and understand what our preferences even are. And then coordination. When I used to think about coordination, I think I overemphasized the commitment challenges. In game theory 101 this is often presented as: how do you credibly commit to the good thing? That's a really important aspect, and there's probably a lot of potential for AI to help with it, and we could talk about that. But there are other aspects of coordination which, at least by previous me, were not really thought about sufficiently, and there are other researchers far further along that journey than me. Other aspects of coordination include just finding the right people you need to be in contact with, or recognizing a counterparty in some kind of interaction. We have primitive versions of this now: word of mouth, people telling each other about contacts they might want to know, introducing each other to friends, and so on. We have LinkedIn now, which is possibly the frontier of what technological society has mustered in terms of this kind of networking. But it's really important, and one of the high-stakes coordination questions, I think, is how larger communities and societies can coordinate to better navigate a confusing and risky future with technology. A lot of that won't necessarily fully come down to, but might really benefit from, being able to connect the right people together.
And I can 100% imagine ways for AI systems to do that much better than we're able to today. Already the internet kind of enables this through web search: if I Google something, maybe I find a blog post someone wrote, and that's really useful, and then I'm connected with that person a little bit. But it's crude and lossy and accidental. So that's networking and connecting. Another really important aspect of coordination is surfacing group wisdom and group will. There are already processes like polling, focus groups, these kinds of things where people get together and try to surface a kind of group wisdom, even just having a team meeting. I think this struggles to scale. There's a professor, James Fishkin, I think perhaps at Stanford, I'm not sure. He has a deliberative democracy group, and they point to a trilemma: it's difficult to achieve all at once nuanced, quality discourse; political equality; and broad participation.
>> You can imagine getting two of those at once, right? You can imagine getting an expert group together.
>> You can imagine doing a poll. It's widespread, but it's very much not nuanced, and so on. And are there ways we can get more of all three of those things in larger groups? I 100% think there are. There are already interesting tools that some of FLF's fellows have created and are going on to continue creating. I think there's a lot of potential for incorporating AI to improve how we elicit the group wisdom and group will of larger groups. This is a really interesting thing. Another aspect of coordination is how we even reach negotiated agreement, especially in cases where there are misaligned priorities as well as some aligned incentives. Here, at least naively, you can minimally imagine just throwing a huge amount of reasoning and compute at the problem: can we simply discover better options just by reasoning our way there? I think that's not nothing. There's probably much improvement available there as well, but it's less clear in my mind. The process of negotiation is itself a very thorny and actually quite complex process, and it's not yet clear to me how to better facilitate it with AI, other than this kind of naive exploration of possibilities.
>> But it might just become much cheaper to negotiate using AI.
>> Right, yeah.
>> It's just much cheaper to send a bunch of messages back and forth, and maybe my agent can take my preferences and negotiate on my behalf, and there can be a whole workflow there that I'm not involved in; it would be too costly for me to negotiate directly. But yeah, I could definitely see a world in which we can negotiate over things that now seem way too trivial to negotiate over.
>> Exactly. So we can enable more negotiations to happen, at scale and so on. I think this is kind of exciting. It's pretty unclear to me what it unlocks, in part because we're probably talking about millions or more extra micro-negotiations in principle being unlocked by this kind of capability, and I struggle to imagine millions of things at once.
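To give a mechanical picture of what agents negotiating on our behalf could mean, here is a toy sketch of delegated micro-negotiation: two delegate agents with private reservation prices converge by alternating concessions. The function, its opening offers, and the concession protocol are all invented for illustration; real AI negotiation delegates would be far richer than this.

# Toy delegated negotiation: alternating concessions within private limits.

def negotiate(buyer_max: float, seller_min: float,
              step: float = 1.0, max_rounds: int = 1000):
    """Return an agreed price, or None if there is no zone of agreement."""
    offer_buy, offer_sell = 0.0, 20.0  # arbitrary opening positions for the toy
    for _ in range(max_rounds):
        if offer_buy >= offer_sell:        # offers have crossed: deal struck
            return (offer_buy + offer_sell) / 2
        # Each delegate concedes a little, but never past its principal's limit.
        offer_buy = min(offer_buy + step, buyer_max)
        offer_sell = max(offer_sell - step, seller_min)
        if offer_buy == buyer_max and offer_sell == seller_min \
                and offer_buy < offer_sell:
            return None                    # both at their limits: no overlap
    return None

print(negotiate(buyer_max=12.0, seller_min=8.0))  # 10.0: deal struck
print(negotiate(buyer_max=5.0, seller_min=8.0))   # None: no deal possible

The point of the sketch is only that the loop is cheap: once the per-negotiation cost approaches zero, many negotiations that were never worth a human's time become feasible.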
So there's the question: what does that really enable in society? I think it enables a certain kind of flourishing. It probably unlocks some economic potential. It maybe also enables better quality of life in certain ways, like better communities and this sort of thing. It's pretty hard to envisage quite how potent that could be. My rough sense is that it doesn't massively elevate human flourishing, but I could be wrong. I think there's a chance that it does.
>> Why are you skeptical that it might be powerful?
>> I think because we have limited negotiation capacity and budget and bandwidth, and I suspect that, at least in societies which are functioning reasonably well, we currently allocate that budget and attention moderately efficiently, and are therefore doing the negotiations that get us the most bang for the buck. So unless the tail of less salient, lower-priority negotiations is really quite fat and adds up to a huge amount, I'm skeptical that we get really transformative change there. This is a rough model and I don't believe it 100%, so there's definitely room for it to be really massively transformative. I think some people do believe that, and I wouldn't discourage people from being really excited by it. It is potentially really exciting. Another aspect of coordination which I think I used to neglect: we notice that a huge amount of our interaction is iterated. Even if we're not interacting with the same people over and over again, the world keeps ticking over; it's a very temporally extended process we live in. Plus, we've already talked about networking: there's this very complex and dynamic network of interacting and communicating people, forming and adapting and growing and shrinking coalitions and groups. This whole complex system produces institutions. It produces constitutions and norms and so on. And because coalitions, constitutions, institutions, and norms are so important for framing future interactions, cooperation, and competition, they themselves become an object of interaction and communication. Very often, when we're fighting, we're actually fighting over what the norms are by which we should later be coordinating. To make this concrete: pretty high-stakes interactions and negotiations might themselves be about trade agreements, which are meant to frame a huge number of downstream interactions. So very often, interaction is about interaction; coordination is about coordination. This is a property of the complex system we live in, and I think it's also because humans are that kind of creature: we care about coalitions, we care about institutions. And because of that, when we're asking how coordination can be improved, I think it's really worth giving some emphasis to how we create and perhaps even design institutions and constitutions and that kind of thing.
>> I think especially as new kinds of technological capabilities enter the world, there will be questions about how we manage them. What are the institutions? What are the norms? How is society going to manage this, whether at an international level, a national level, or smaller community levels? That will itself be a process of negotiation about institutions and constitutions, so this looks particularly high-stakes to me as well. And I think there's room for AI to really elevate how we do that, because very often the institutions we set up, and the processes we elevate for how we then coordinate and apply rules and norms, are at best prone to unintended consequences, or inefficacious. At worst, they can be captured; they can become corrupted. Now, we as humans have a huge track record of building successful and failed institutions, so we actually have a lot of data there. Minimally, you can imagine equipping people with some kind of red-team constitution-expert AI which is able to say: oh, have you considered that in ancient Greece they tried something like this and it fell horribly flat and they all died? Or: have you considered this nearby example, where in India they managed this kind of water resource in such-and-such a way, and it was really effective and lasted for decades? So there's at least that track record. And then some other interesting capabilities from computation and AI: potentially being able to simulate and red-team, to proactively try to break institution designs and see how corruptible they are, how capturable they are. Language models are actually able to bring quite semantically rich scrutiny to this kind of thing, in a way that we probably wouldn't have been able to do before. This might be somewhat computationally expensive, but I don't think intractably so. And especially for high-stakes decisions, you can imagine this becoming a standard move we make: well, can we red-team this? How capturable is it? What are the vulnerabilities we haven't yet seen?
>> I mean, one feature of constitutions as they exist now is that they resist change. They carry the preferences of past generations into our present world. So you wouldn't be allowed to brainstorm a new constitution, because the previous constitution is there to prevent you from making any changes that actually matter. At what level are you imagining that we would be free to design new rules and new constitutions?
>> Yeah, that's a really good question. I think I use this word constitution a bit loosely, and I should make clearer that I'm doing that. I'm including things like charters for organizations and companies, and I'm also including the more flexible parts of lawmaking and that kind of thing. The word constitution often gets used for the absolutely foundational principles of a society, and yes, you're right about the very biggest of these things, but I'm using it a little more flexibly than that.
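The red-teaming move described above is concrete enough to sketch in code. Below is a minimal loop assuming only a generic text-in, text-out model call; the ask_model function and the prompt wording are invented here, and none of this is FLF's actual tooling.

# A sketch of iteratively red-teaming an institution design with a language model.

def ask_model(prompt: str) -> str:
    """Placeholder: wire this up to whatever language model client you use."""
    raise NotImplementedError

def red_team_design(design: str, rounds: int = 5) -> list[str]:
    """Collect distinct candidate failure modes for a proposed design."""
    findings: list[str] = []
    for _ in range(rounds):
        finding = ask_model(
            "You are red-teaming a proposed institutional design.\n"
            f"Design:\n{design}\n\n"
            f"Failure modes already found: {findings or 'none'}\n"
            "Describe one new way this design could be captured, corrupted, "
            "or produce unintended consequences."
        )
        findings.append(finding)
    return findings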
But of course, there are these constitutional moments. Amendments do get made, even to the absolutely foundational documents. We wouldn't necessarily want that to be happening much faster; often these things have lived a long time because they are at least reasonably well-crafted. But they do need amending from time to time, so there are constitutional moments even for the most foundational things. Another thing, and I think about this at the societal level, but you can imagine it at a team or organizational level as well: in the way we do institutions right now, there's often quite a tension between agility on the one hand and centralization on the other. Or rather, agility often comes with centralization in a way that you may not want: you may want the agility but fear the centralization, because it's potentially capturable or corruptible. The example here is coups and that kind of thing. There are usually violent aspects at one stage or another, but very often a coup is actually a gradual process: a real or perceived need for agility and executive action gets manifested in a relatively small number of people, who promise, perhaps only implicitly, that they will lay down these emergency powers when the time comes, and then of course they never do, because the power has been centralized there, and at that point there's a kind of moral hazard, even if they were well-meaning to begin with. And this is very blue-sky, one of the loosest thoughts I have, but I think to myself: we have a bunch of new tech. Is there a way we can apply it to getting a bit more of the agility with a bit less of the fearsome parts of centralized power? I don't know what that looks like. For all the things I've described up to this point, I have a moderately concrete vision in my head of at least one instantiation; for this one I kind of don't, but I have a sense that it's maybe possible. It is a little bit utopian. So, you asked my vision for what AI for human reasoning really enables some years down the line, if it goes really well. In one sentence, perhaps, and now I'm probably going to accidentally insert a full stop as I speak, but anyway, in one sentence it might look something like: a society where it feels like the decisions being made about the important things, at the large scale and at the local, smaller scales, are being made with more deliberateness and wisdom, and with more alignment and adherence to the people involved, their interests, and their needs. I think this is what we want from human reasoning, and what we believe is in principle possible. And it's what we look around us and see is often not the case. We see instead that the world is sort of confusing; it's not clear that anyone really decides much of what happens. We wouldn't want it to be the case that individuals or oligarchs decided all of what happened. What we would want is for affected people, their interests, principles, and values, to be reflected in what ends up actually happening. And far too often, things are just sort of chaotic and confusing. People fail to coordinate.
People fail to understand the consequences of their actions. They even fail to understand what their options are in the first place. And each of these problems, we think, is remediable to some extent with support from tools, many of which would incorporate AI. We don't think this is going to happen by default. The chatbots, or the imagined future where everyone has a personal agent: what reason does that have to actually solve this problem? I don't think it does. It helps some things and exacerbates others, whereas with judicious design of tools and technology that support human reasoning, I think there's a lot more room to imagine things getting better in that direction.
>> What's the best way for listeners to contribute to this project? Where should they begin? How can they see whether they have the interest or the skills necessary to contribute?
>> Yeah. So there's aiforhumanreasoning.com. There's also flf.org, where you can get in touch with us and see some of our other initiatives, which include the AI for Human Reasoning fellowship and other work we're doing in human reasoning. aiforhumanreasoning.com currently has a summary of the fellowship we ran in the latter part of 2025: it describes some of the fellows we hosted, some of the projects they've worked on, and some of their ambitions. It's likely we'll update that page with other content as we produce it, so that's probably a good homepage for the human reasoning parts in particular. If you want to learn more about FLF and our other initiatives and priority areas, you can go to flf.org. And I blog intermittently. I keep thinking I ought to blog more; it's one of these serendipity generators which I think are really important, just putting ideas out there. Earlier I spoke about the way the internet sometimes facilitates connections you wouldn't have imagined possible, through things like stumbling across someone's blog post where they mention something that sounds really interesting, and now you want to go and talk to them about it. So I believe in blogging, and I sometimes blog. I recently blogged, with my colleague Ben, about one of the more design- and engineering-heavy parts of human reasoning that we're imagining, and I intend and expect to blog a bit more about that and other things as well.
>> And what's the URL for that?
>> Oh, my blog is oliversourbut.net.
>> Great. Perfect. All right, Oly. Thanks for chatting with me. It's been great.
>> Yeah, it's great to be here.