Negotiable Reinforcement Learning with Andrew Critch
Why this matters
Auto-discovered candidate. Editorial positioning to be finalized.
Summary
Auto-discovered from AXRP. Editorial summary pending review.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 46 full-transcript segments: median 0 · mean -2 · spread -30–0 (p10–p90 -7–0) · 2% risk-forward, 98% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.
- Emphasizes AI safety
- Full transcript scored in 46 sequential slices (median slice 0).
Editor note
Auto-ingested from daily feed check. Review for editorial curation.
Play on sAIfe Hands
Episode transcript
YouTube captions (auto or uploaded) · video dEwr00sVELI · stored Apr 2, 2026 · 1,425 caption segments
Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/negotiable-reinforcement-learning-with-andrew-critch.json when you have a listen-based summary.
Show full transcript
Hello everybody. Today we're going to be talking to Andrew Critch. Andrew Critch got his PhD in algebraic geometry at UC Berkeley. He's worked at Jane Street as an algorithmic trader and at the Machine Intelligence Research Institute as a researcher, and he also co-founded the Center for Applied Rationality, where he was a curriculum designer. Currently he's a research scientist at UC Berkeley's Center for Human-Compatible AI. Today the paper we're going to be talking about is "Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making". The authors are Nishant Desai, Andrew Critch, and Stuart Russell. Hello, Andrew.

Hi, nice to be here.

Nice to have you here. So I guess my first question about this paper is: what problem is it solving?

Right. So when I quote-unquote solve a research problem, I'm usually trying to do two different things. One is that I'm answering an actual, formally specifiable math question, or maybe a computer science question. But the other thing is that I'm trying to draw attention to an area. For me, the purpose of this paper was to draw attention to an area that I felt was neglected, so that's the meta-level research problem it's trying to solve. And the object-level problem it solves is that it demonstrates and explicates what a Pareto-optimal sequential decision-making procedure must look like when the people it's making decisions for have different beliefs.

OK, cool. So one thing that jumped out at me, an area I'm interested in, is the disagreement between the two principals that this sequential decision-making policy is serving. I'm interested in this because there's a line of work, starting with Aumann, that basically says persistent disagreements between two people are irrational: as long as people can communicate, they shouldn't really disagree about anything. There's Aumann's agreement theorem, which got followed up by "Are Disagreements Honest?" by Tyler Cowen and Robin Hanson, and the paper "Uncommon Priors Require Origin Disputes" by Robin Hanson. I'm wondering what you think about this line of work, and in general about the setup of two people disagreeing. Isn't that kind of crazy?

That's a great question. A lot of the things you just said really triggered me, like "oh no, it's irrational to disagree". So first of all, I think Aumann's line of work is really important, and things that build on it are a good area of inquiry; there are just not that many people who think about how beliefs work in an intersubjective, formalized setting. But there are a number of problems with trying to assert that disagreement is irrational at the individual level. And I say "at the individual level" because rationality is a descriptor of a system, and you can have a system where each individual is rational but the system as a whole is not, in many different senses. The first thing is that Aumann's agreement theorem applies when the two parties have a common prior, which seems extremely unrealistic for the real world; I'll say more about what that means. They're also required to have common knowledge of the state of their disagreement, or the state of their beliefs, which I think is also very unrealistic for the real world.
And I don't mean "unrealistic" in the sense that you can only get 0.9 on some agreement metric when the theorem requires a 1; I mean you can only get a 3 when the theorem requires a 1. So I think the assumptions of Aumann are robustly wrong as a description of the real world, but they're an important technical starting point: you describe a scenario under simple assumptions, and Aumann's assumptions were simplifying, and then we have to keep complexifying those assumptions to get a real understanding of how beliefs between agents work.

So I've heard a lot of people say that the common prior assumption is not realistic for humans. Why do you think the common knowledge of disagreement is also not realistic among humans? I think I often have disputes with people where we know that we have the dispute.

Right. So first I want to address the prior thing, even though other people have addressed it. Common prior: I mean, what is my prior? Is it something I was born with as a baby? Is it something in my DNA? There are many different ways of conceiving of a human as having had a prior and then some updates, and in, I think, any of the reasonable conceptions of a human as a Bayesian updating agent, the prior is a pretty old thing that they've had for a long time, and it comes from before they've had a chance to interact with a lot of other people. So I don't think people have equal priors: they're genetically different, they're culturally different. And even as adults, we've maybe only interacted with our own culture, and I think that's deeply bubbling for people, if I can use the word "bubble" as a verb. It sequesters people. Even as an educated adult, you haven't interacted with educated adults from other cultures who've read a lot and seen a lot; and I say "educated" because that's how you get information. If you have less education, you have a different prior as well. So I think it's a big deal, and if we're going to start talking about how AI is going to benefit humanity, we need to be thinking about people having different beliefs about whether it's beneficial and what is beneficial.

And then, separately, the common knowledge of disagreement thing. First of all, I would call into question your experience that you really have had full common knowledge of disagreement with a person. There's always this uncertainty: how do you know you were using words in the same way as them? You talk to them for a while and you gain some evidence that you're using words in the same way, and if you're a careful thinker and you engage carefully in discourse, you check that you're using words in the same way. But you only ran that check for so long, and what if you're using concepts in a different way too? Let's say you had a debate with somebody about whether people should have privacy from AI systems. OK, you talk for a while about what "privacy" means, you talk for a while about what "should" means, and then you ground it out to some kind of empirical prediction, like: I think if people don't have this kind of privacy, they will end up distressed in the following way. And then the other person in the argument says: I predict they will not end up distressed. Now you're satisfied that you've made progress, and it's good to be satisfied that progress was made, but have you grounded out what "distress" means? Eventually you just go home. You've done a good job today, you made progress with this interlocutor, and you still disagree, and you don't know for sure whether the concepts you're using are the same as the concepts they're using. I think that's profoundly important. If you didn't settle what you meant by "distress", that can be an important difference in culture, for example: maybe for you, something that makes you a little sweaty and makes your mind go faster counts as distress, whereas for the other person it doesn't, and now you have to ground that out as well. In my experience, whenever there appears to be a persistent disagreement, if you talk longer you can always uncover some kind of confusion or miscommunication or difference in information, such that prior to that uncovering you were both deluded as to the nature of the disagreement. So that's why I call into question the idea that you really have common knowledge of the disagreement: because I think you are probably both deluded as to its nature when you still disagree.
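To make the common-prior point concrete, here is a minimal sketch, our illustration rather than anything from the paper or the episode: two Bayesian agents see the same evidence about a coin. With a shared prior their posteriors coincide exactly; with different priors, both update correctly yet keep disagreeing. The 0.8-bias hypothesis, the flip counts, and the prior values are all arbitrary choices for the example.

```python
# Minimal sketch (illustrative, not from the paper): two Bayesian agents
# judge whether a coin is biased (0.8 heads) or fair, from the same data.
# With a common prior their posteriors coincide exactly; with different
# priors, both update correctly yet keep disagreeing.

def posterior(prior_biased: float, heads: int, tails: int) -> float:
    """P(coin is 0.8-heads biased | data), against a fair-coin alternative."""
    like_biased = (0.8 ** heads) * (0.2 ** tails)
    like_fair = 0.5 ** (heads + tails)
    joint = prior_biased * like_biased
    return joint / (joint + (1 - prior_biased) * like_fair)

data = (7, 3)  # both agents see the same 7 heads, 3 tails

# Common prior: no disagreement survives the shared evidence.
print(posterior(0.5, *data), posterior(0.5, *data))  # ~0.632, ~0.632

# Different priors: a gap persists despite identical, correct updating.
print(posterior(0.9, *data), posterior(0.1, *data))  # ~0.939, ~0.160
```

Critch's further point, of course, is that real humans are not even this kind of updater in the first place.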
Hmm, that's interesting. That seems plausible to me. Now that you've talked about the common prior assumption, I actually want to talk about that a little. I hope our listeners are interested in the philosophy of Bayesian disagreement. So when I think of the priors for a person, and how to make sense of them in terms of Bayesian reasoning, it seems like this involves some amount of epistemological knowledge that you can update. For instance, I think it's possible that at some point in my life I didn't know about the modus ponens inference rule, or maybe some other inference rules, and now I do, and those have become part of my prior, where what I mean by "prior" is: OK, today, how do I form beliefs based on everything I know from the past? And that might be a little different tomorrow, because I'll have some different conception of, say, simplicity priors. It sounds strange to say this, but I do want to say that I think you should be able to change your prior over time, and if you can do this, then if somebody comes from a different culture where they understand things differently, hopefully you can reason it out with them by some means other than Bayesian updating; hopefully, otherwise it's layers spawning layers. I just wanted to address what "prior" should mean in this kind of context, because under this different conception it becomes maybe a bit more realistic.

Cool. So before I get into that, I do want to say why I care about this. The reason is that I'm hoping we can make progress in AI that makes it easier for people with diverse backgrounds and beliefs, not just diverse preferences but diverse beliefs, to share control of AI systems.
And the reason I want us to be able to share control of systems is twofold. One, I think it's just fair: if you have very powerful systems and powerful technologies, it's more fair to share them. And the other is that if you can share things, you don't have to fight over them, so that decreases the likelihood of conflict over powerful technological artifacts in the future, and I think there's quite a lot of societal and potentially existential risk that comes from that. So that's the source of my interest here. I think there are many other reasons to be interested in making AI compatible with diverse human beliefs, and in making it possible to negotiate for control of the system. But I just want to flag that while I'm going down this rabbit hole with you, that's what's steering me, and if I say something that seems important to your question, I'm also filtering it for importance to: is this going to matter to the future governance of technology?

OK, so the first thing is that, yes, I agree with you that people can change their priors. But a Bayesian agent quote-unquote "changes its prior" when it updates, and I think you mean something more nuanced than that, which is that you can think longer and decide that your prior at the beginning of time ought to have been something else. My statement that you're responding to is that a reasonable conception of a human being as a Bayesian agent has the prior as something that has existed for a long time, or that came into existence a long time ago. And that "reasonable conception of a human as a Bayesian" is doing a lot of work, because humans aren't Bayesian agents. In fact, physical agents are not Bayesian agents, because you have to do a lot of computation to be a Bayesian agent; in fact, you have to do infinite computation. This is where my interest in logical uncertainty, and, if you've heard of it, logical induction, comes from. So I would argue that when you're changing your prior there, you're changing which Bayesian agent you are; you're not being a Bayesian agent in that moment.

OK, that seems fair.

And for listeners who've never thought about that, why is that important? Well, I think there are ethical questions that can be resolved by thinking, and there are ethical questions that can't. I think Rawls's veil of ignorance is an example of an ethical principle that helps you figure things out by thinking longer and harder about "what if I were somebody else?". You already knew everything you needed to know about those people to start realizing some of the things you should do to be fair, but you have to think about it. In the same way, I think that's going to apply in the governance of AI, and for that reason I think it is going to be important not to treat people like Bayesians, because people are entities, and computers are entities, that change what they believe merely by thinking, even without making further observations. That's a major shortcoming of the negotiable reinforcement learning framework, which is only alluded to in the older arXiv draft with just me on it, which says that naturalized agency is going to be key future work. And I do not think the paper addresses that well at all.

Yeah, I think that's a good point.
I'm going to tack back a little to the paper, or to the literature on Bayesian agents cooperating and such. The related work section of this paper has a lot of interesting material on social choice theory; there's a rich body of literature both on social choice theory and on whether reasoners can disagree. A reader might find it a little surprising that this work hadn't already been done, at least that the main theorem in this paper hadn't already been proven. I was wondering if you have thoughts about why it took until... was it 2018 this got published? 2019? 2017?

The NeurIPS version is 2018, but the theorem you're referencing was proven in 2017.

OK, so why do you think it took until 2017?

This is something I grapple with deeply. I mean, for me, "how do agents with different beliefs get along?" is a pretty basic question. It has been analyzed a little, like you said, by Aumann and people building on Aumann; do a Google Scholar search for people who cite Aumann and you'll get a lot of interesting thoughts about that. But there's not really much looking at sequential decision-making. You get these really static analyses, like: imagine you're at the end of eternity and you've reached common knowledge of disagreement, and at the beginning of eternity you had a common prior; now Aumann's theorem applies. But it's a fixed moment. It's not something evolving over time, and things evolving over time are more complicated than things that are static. So if I were going to guess: you've got people who work on sequential decision-making, which is reinforcement learning people and operations research people, and then you've got people who think a lot about beliefs and what a belief is, and there hasn't been that crossover of "OK, what happens when you put sequential decision-making and belief disagreement together?".

Yeah, I guess it's related to how, in statistical mechanics, it's much easier to come up with the theory of equilibrium statistical mechanics than non-equilibrium statistical mechanics; humanity got the equilibrium theory way before we got a good non-equilibrium theory.

Yeah, and I think there are a lot of things like that in analyses of multi-agent interactions. Game theory in general is all about equilibria, not about how you get there. There's some research on that, but I think it's going to take a lot of hard work still to figure out.

Oh, do you have examples of these kinds of non-equilibrium problems that maybe our listeners could help solve?

Well, there are a lot of games where finding the Nash equilibria is NP-hard. That means in particular that, if you take two agents playing against each other as a computation, they're not going to be finding a Nash equilibrium unless they've got enough compute to solve an NP-hard problem, which they probably don't. Just Google "NP-hard Nash equilibria" and you'll see how many Nash equilibria just aren't really going to happen.
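As a concrete illustration of what "finding a Nash equilibrium" even asks, here is a sketch of ours, not from the episode: brute-force checking of pure-strategy equilibria is easy in a tiny bimatrix game, but the search space, and especially the mixed-strategy generalization, blows up quickly, which is the intuition behind the hardness results Critch gestures at. The payoff numbers below are the standard prisoner's dilemma, chosen just for illustration.

```python
# Sketch (not from the paper): brute-force search for pure-strategy
# Nash equilibria in a two-player bimatrix game. payoffs[i][j] gives
# (row player's payoff, column player's payoff). Easy for a 2x2 game;
# the general problem over mixed strategies is what hardness results
# are about.

def pure_nash(payoffs):
    rows, cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            row_best = all(payoffs[i][j][0] >= payoffs[k][j][0] for k in range(rows))
            col_best = all(payoffs[i][j][1] >= payoffs[i][k][1] for k in range(cols))
            if row_best and col_best:  # neither player gains by deviating alone
                equilibria.append((i, j))
    return equilibria

# Prisoner's dilemma: action 0 = cooperate, action 1 = defect.
pd = [[(3, 3), (0, 5)],
      [(5, 0), (1, 1)]]
print(pure_nash(pd))  # [(1, 1)]: mutual defection
```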
OK, so with that out of the way, let's get into the details of the paper. We're talking about different principals who somehow have to negotiate over a policy that's going to act in the world. One theorem that bears on this is Harsanyi's utilitarianism theorem: there's you and me, and perhaps we're electing a government or something, and Harsanyi's utilitarianism theorem basically says that what the government should optimize is a weighted linear combination of our utility functions. In your paper you prove something that isn't that theorem. Could you tell us a little more about why that theorem doesn't apply, or why you can't just use that result?

Sort of answering that just is the theorem, but I can try to give an intuitive version of it. First of all, the theorem is pretty easy; it's just linear algebra. I don't think it's a deep fact. I think the only thing special about it is bothering to think about it.

You also need to know a bit about convex geometry, I guess.

A tiny bit, but really, if you just draw a picture, it's kind of clear. So first let's talk about what Harsanyi's theorem says. Harsanyi's theorem is a brilliant theorem. It really simplifies the number of different ways you could imagine aggregating people's preferences, by showing that basically many, many different reasonable ways of doing it are all equivalent to just giving a linear weight to each person's preferences and then maximizing that weighted sum. So that's cool, and it's a little counterintuitive to me, in the sense that it used to feel intuitive to me that there ought to be more ways of aggregating preferences that feel compelling but that aren't linear combinations. A lot of things that felt different from linear combinations to me just turned out to be linear combinations, so that was kind of cool. I'm not sure I can even remember what they were now, because my brain has compressed them into the linear-combination bucket. But this key assumption of having the same beliefs is a key assumption of Harsanyi's theorem, and it's not even explicitly stated: the facts are just lurking in the background, and it's kind of assumed that everybody has access to the facts. In reality we don't have access to facts; we have beliefs, and we have things we do to update our beliefs and get better information.

By the way, still putting off answering your question, I'm going to say that the paper is not normative. You can take Harsanyi's theorem as being normative, and maybe he intended it to be normative, but I use it not normatively but just descriptively: look, all these things you might do are all just linear combinations of preferences; that's a nice simplifying fact. In the same way, I don't take the negotiable reinforcement learning result, or the "toward negotiable reinforcement learning" result, as normative, because actually I think there are a lot of bad outcomes that result from the dynamics described in it. It's simultaneously a negative result; I don't think that's exactly how you should do things.
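For reference, here is the shape of the Harsanyi aggregation result discussed above, a standard paraphrase we are supplying rather than a quotation from the paper: under his rationality axioms, any social preference that respects unanimous individual preferences must maximize a fixed weighted sum of the individuals' expected utilities.

```latex
% Harsanyi-style aggregation: social utility is a fixed weighted sum of
% the n individuals' utility functions, with weights w_i >= 0.
U_{\text{social}}(x) = \sum_{i=1}^{n} w_i \, U_i(x),
\qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\Big[ \sum_{i=1}^{n} w_i \, U_i \Big].
```

The assumption doing quiet work here, as Critch notes, is that the expectation is taken under a single shared belief about the world.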
So I'm going to answer a slightly different question, which is: why would you do it that way? If you were accounting for differences in beliefs as described in the paper, why would you be doing it? The reason is this. Let's say you and I are deciding to do a podcast together: you're going to interview me. That's a negotiation. We've got to decide the time, we've got to decide whether I'm comfortable with the recording tools you're using, all that kind of stuff. And once that's decided, we go ahead and do the thing; we execute the sequential decision-making that is a podcast interview together. But before we do that, there's always the possibility that the negotiation could just fail. It could fall through, and we go back to what people call the best alternative to a negotiated agreement, or BATNA. My BATNA today, if I hadn't done this podcast, was to write some things up; I was going to do some writing. I don't know what your BATNA was, but if this had failed, maybe you would have just interviewed somebody else today.

So if you want to maximize the probability that two people are going to choose to cooperate, and execute a literally cooperative sequence of decisions, you want them to be able to find a plan that they both like more than their BATNAs. We both liked the idea of this podcast today more than the other stuff we were going to do, so now we're doing it. And if there's any Pareto suboptimality, meaning opportunities to improve the plan for you without making it worse for me, or for me without making it worse for you, then there's a chance that the plan is needlessly below your BATNA. If we're in the midst of making a plan and we're crappy at planning together, we're bad at negotiating, such that the plan we have is Pareto-suboptimal, there's a risk that the plan is going to be below your BATNA and you're going to bail, needlessly: we could just bump the plan up. If you treat your BATNA as a random variable, or a mediator were to treat both of our BATNAs as random variables, there's a chance those random variables are going to sit below the best negotiated plan available but above a needlessly worse one. So for me, Pareto optimization is related to, or subservient to, maximizing the probability that the negotiators will succeed in coming up with a cooperative plan sufficiently appealing that they choose to cooperate. And that matters because I think AI governance is going to require people to cooperate a lot, and negotiate a lot in the course of that.
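A minimal sketch of the two checks in that argument, with made-up plans and utility numbers of our own: whether a plan clears both parties' BATNAs, and whether it is Pareto-dominated by another viable plan.

```python
# Sketch (our illustration, not from the paper): two principals score
# candidate plans; a plan is viable only if it beats both BATNAs, and
# it is Pareto-dominated if another plan is at least as good for both
# parties and strictly better for at least one.

plans = {          # (utility to A, utility to B); illustrative numbers
    "podcast_9am": (6, 4),
    "podcast_2pm": (6, 7),   # Pareto-improves on 9am: better for B, same for A
    "no_deal":     (5, 5),   # the BATNAs: what each party gets if talks fail
}
batna = plans["no_deal"]

def dominates(p, q):
    return all(a >= b for a, b in zip(p, q)) and p != q

viable = {name: u for name, u in plans.items()
          if u[0] >= batna[0] and u[1] >= batna[1]}
undominated = {name: u for name, u in viable.items()
               if not any(dominates(v, u) for v in viable.values())}
print(undominated)  # {'podcast_2pm': (6, 7)}: clears both BATNAs, undominated
```

Here the 9 a.m. plan would needlessly lose principal B, which is exactly the failure mode Critch says Pareto optimization exists to avoid.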
Now, if you want to maximize entirely for cooperation, and not for other important principles like, say, fairness, one thing you might do, intentionally or inadvertently, is exploit people's differences in beliefs to implicitly have them bet against each other with the policy. So, I guess you're using Zencastr. Let's say I don't know much about Zencastr, but I think it's going to be fine, because most software companies are reasonably careful with their data. And you know all about them, and you happen to know that Zencastr is a terrible company that doesn't respect anybody's privacy, but you know that I don't know that. People have studied this dynamic, by the way; bargaining with asymmetric information is not new. But if we Pareto-optimize, we end up with this plan: show up on Zencastr. You want me to do the podcast, and I want to do it subject to privacy constraints, and I don't know about the problems, so I sign up to do it. Then later I find out: oh no, my privacy is being violated by Zencastr, and someone downloaded the data and recovered my neighbors' conversations from tiny trace audio, so now their privacy has been violated too. I've been penalized for having incorrect beliefs about how Zencastr was going to turn out for me. The single-shot version of that is just making a bet. It's like we made two bets at once: you bet that you would like the plan, and I bet that I would like the plan, and I lost my part of that bet, because Zencastr turned out bad.

The interesting thing is what happens when that bet-settling becomes a continuous process that happens for the rest of forever, which is what you see in an AI system that is ex ante (meaning, before it runs) subjectively Pareto-optimal with respect to the principals it's serving. Every timestep, for the rest of eternity, it settles a little bet between the principals who created it, or who agreed to defer to it. If one of the principals had very inaccurate beliefs about what the AI system was going to observe, that principal's priority, the weight it gets in the system's judgment, goes down and down and down, because every second it's losing a bet over how much control that principal is going to have over the AI, or how much the AI is going to choose to serve that principal's values. So in the same way that I could lose one bet with you over how good Zencastr is going to be, I could actually lose a whole series of bets with you, every second, about how Zencastr is going to turn out. If you've got really accurate beliefs about the world, you're going to win all those bets, and our cooperation is going to end up great for you and worse for me. But it was my willingness to bet on my own false beliefs that caused me to cooperate with you in the first place; if I hadn't been deluded about Zencastr's ethics, I might have just not done the podcast. And maybe that's ethically the right thing to have happen. But with AI, I worry that fragmentation could be quite bad if it leads to war: there are physical wars, and then there are standards wars, companies fighting over which standards are going to be important because they're fragmenting. I think that can cause a lot of chaos and waste a lot of attention, and if it's physical wars it could actually get people killed, if countries are fighting over AI technology the way you might see companies fighting over oil as a resource.

So I guess I want to point out an important trade-off. This paper points out a trade-off between fairness and cooperation, which is that cooperating ex ante rewards the people with more accurate beliefs upon entry into the cooperation, and it's unfair to the people who turned out to have had wrong conceptions of what was going on.
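The bet-settling dynamic just described can be sketched in a few lines. This is our paraphrase of the mechanism, not code from the paper: each principal's weight gets rescaled by how much probability their beliefs assigned to what was actually observed, so a chronically wrong principal loses influence over the policy. The starting weights and per-observation likelihoods below are invented for the example.

```python
# Sketch of the ex-ante-Pareto weight dynamic described above (our
# paraphrase, not the paper's code). Each principal i starts with a
# negotiated weight w_i; after every observation, w_i is rescaled by
# the probability P_i(obs) that i's beliefs assigned to it. The policy
# then acts to maximize sum_i w_i * (i's expected utility).

weights = {"A": 0.5, "B": 0.5}          # negotiated starting weights

# Hypothetical per-step likelihoods under each principal's world model:
# A expects the recording service to behave well; B knows it won't.
likelihoods = {"A": [0.9, 0.2, 0.1], "B": [0.6, 0.8, 0.9]}

for step in range(3):
    for name in weights:
        weights[name] *= likelihoods[name][step]   # settle this step's bet
    total = sum(weights.values())
    weights = {n: w / total for n, w in weights.items()}  # renormalize
    print(step, weights)
# A's weight decays toward 0: the policy increasingly serves B's values.
```

Renormalizing doesn't change which action maximizes the weighted sum; it just makes the decay of the losing principal's influence visible.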
So, your original question was: why should we use this? Well, if cooperation...

The original question, I think, was: why doesn't the Harsanyi aggregation theorem apply, and why should we use the belief-updating rule instead, or something.

Yeah, something like that. And it's more like: in a meta-problem where fairness and cooperation are two different principles you want to serve, and you're trying to invent a negotiation framework that's pretty good for cooperation and pretty good for fairness, I would say Harsanyi's approach is Pareto-suboptimal, because you can get more cooperation. But you should also be adding fairness as a constraint to Harsanyi. So I don't know the answer yet for what I would personally subscribe to as the right way. I'm not so sure that no one will ever find a way that I'll look at and say, "that's actually the right way, let's do it"; I'm not so anti-realist about that moral judgment. But I don't have a strong view on it right now.

OK, there are a few questions I could ask from there. First of all, I'll ask a quick technical question: in this theorem you assume that policies can be stochastic. Can you say a little about what exactly you mean by that assumption? Because I think it's slightly different from what readers might expect.

Oh, I just mean that at every timestep you can randomize what you're going to do. The AI system, at every timestep, is flipping a coin, and its policy is just what the weight of that coin is. It can also randomize at the outset if it wants: it can choose a random seed at the beginning, to choose between two different random policies, so it has a memory, in a sense. There are a few different ways of formalizing it. One is that it generates a random seed at the beginning of time and then remembers that seed for the rest of time; or you could just say that it can remember everything it has previously done, including that initial coin flip. And that's the formalization I adopted, because it's stronger: it proves a stronger theorem.
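A minimal sketch of that formalization, our reading of the description above rather than the paper's notation: a policy that draws a seed once at time zero, remembers its full history, and randomizes at every step. The particular coin-weight rule is an arbitrary placeholder.

```python
# Sketch (our illustration of the formalization Critch describes): one
# random draw at the outset, a weighted coin flip at every step, and
# the whole history remembered by the policy.
import random

class StochasticPolicy:
    def __init__(self):
        self.seed = random.random()   # one-time randomization at t = 0
        self.history = []             # policy remembers everything it did

    def act(self, observation: str) -> str:
        # Arbitrary illustrative rule: the coin's weight depends on the
        # initial seed and on how much history has accumulated.
        p = min(1.0, 0.5 * self.seed + 0.1 * len(self.history))
        action = "a1" if random.random() < p else "a2"
        self.history.append((observation, action))
        return action

policy = StochasticPolicy()
print([policy.act(obs) for obs in ["o1", "o2", "o3"]])
```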
OK, so discussing this theorem: if I think about institutions where people can be rewarded or punished based on what they know, it seems like we already have a few of those that people are broadly OK with. For instance, the stock market...

I'll just take exception to the claim that people are broadly OK with the stock market, but I'll agree that the stock market is in fact still happening.

I think people are OK with some individual trades. If you and I trade an equity, the person who knows more about the future value of that equity has an edge on that trade, and, I guess I haven't seen polling, but my assumption is that most people are OK with the idea that I can trade an equity with you. But maybe you don't think that.

Claims about what most people think are OK are a little bit dangerous, and I'm a little uncomfortable making them.

How are they dangerous?

Well, they can force people into equilibria that they didn't want to be in. You're in a room full of 30 people and somebody says, "well, we're all clearly OK with the meetings being at 6 a.m. every week, right?", and there's this brief pause, and then now the meetings are at 6 a.m. every week, and a bunch of people objected, but they didn't know everybody else would have objected, so they didn't object. So when you say "everybody agrees blah", if I can think of some people who don't agree with blah, I'm hesitant to get on the "everyone agrees with blah" train, because I'm oppressing that view if I do. So I'm not going to get on board with the "everybody's OK with the stock market" claim, and I might not even get on board with "everyone's OK with asymmetric-information trades", although I would agree more people are OK with that. Education is a human right now; I could imagine a future where informed trade is a right, and not think that's ridiculous. I mean, we'd create a lot more work for the economy to do, producing information for people whenever they enter a trade, but I think that's a tenable position. I know people who think that bargaining without transparency is just bad and wrong, and I'm like, yeah, I don't know; I don't want to close the book on that by saying everybody already agrees it's fine.

Yeah, to follow this tangent a bit: I think one argument against ensuring that every trade is transparent is that it might sometimes impose a really high communicative overhead, for instance where a lot of tacit knowledge is involved.

Exactly, yeah.

Especially where, like: suppose we're going to have a podcast today. You know more, potentially, about what you'd be like on a podcast than I do, and a lot of that knowledge is implicit; it's based on your experience of talking to people over a lifetime, which I don't have. I'm not even sure I know what it would look like for that trade to be fully informed.

I'm definitely comfortable saying that some people, sometimes, are OK with asymmetric-information trades, and I was OK with this one, and I guess you are too.

Yeah, all right. Hmm.

But I think that was just a prelude to some other claim you were going to make, and I think you can still make that subsequent claim without couching it in "everybody agrees that asymmetric-information trades are fine".

Yeah, I guess the claim that I might make is: we have institutions that have asymmetric-information trades, or at least trades where participants believe different things when they make the trade, and those institutions seem like they basically work.

I don't want to undercut that, but flag it for minor potential disagreements. Go on.

So if I think about the stock market and the mating market (well, one of them is more like a market than the other), these are two cases where people make asymmetric-information trades, or at least trades with different beliefs. The stock market, as far as I can tell, seems to successfully serve the purpose of predicting itself in the future, and the mating market seems...

Sorry, what do you mean by "mating market"?
I mean the market by which humans pair off and become romantic partners, and maybe pair-bond for life.

OK.

The mating market seems roughly successful in getting a pretty large majority, though not everybody, a romantic partner eventually. And importantly, both of these seem like stable institutions. They're probably not literally above everybody's BATNA, but they're above most people's BATNAs; they're roughly doing what they're trying to do, and we're not seeing really big revolts against them.

I mean, there have been revolts against the stock market.

Yeah. And I guess that's my counterpoint.

That's a decent counterpoint. Hmm.

Which revolts are you thinking of, specifically? When I think of revolts in that class, I'm not sure I can think of ones where a stock market actually existed and...

What's Occupy Wall Street, right? Let's just take the meme, "we are the 99%". What happens in a world where 1% of people acquire the resources necessary to make the best predictions about where other resources are going to go, and gain so much advantage from that that they just dominate the exchange of resources, to the point of accumulating most of the resources under the control of a small minority of people? Now, is that good or is that bad? Well, it has properties. It rewards effort that makes people and institutions better at prediction, so there's an incentive there to get better at prediction. But it's also a little incestuous, in the sense that these people and institutions are just predicting each other. Like you said, the stock market is good at predicting itself. But what is the stock market doing? It's ensuring efficiency of trade among the owners of a very large number of business activities. And is that good? Well, I don't know; maybe constantly changing the ownership of very large, powerful entities creates a diffusion of responsibility, where you can just rotate board members and CEOs in and out when things go wrong. So it's not clear that the stock market is a good thing for all the people who didn't end up in control of it. And I mean, I worked in finance, so I think the stock market does some good; I don't think it's all bad, and I wouldn't erase the stock market right now if I had an anarchy button. But I think it has some problems, and I think many people agree that it has some problems, and one of them is that it just heaps resources onto people who are good at predicting it, or institutions that are good at predicting it, and it really leaves out people who don't have those big, powerful institutions behind them to help them make their financial decisions.

I mean... so, does this feature of the stock market generalize to the other thing we're talking about? The one that I'm about to say.

OK, sorry, go on.

The stock market does have this feature where you can just buy the whole market, and then if you're more willing to wait for resources than the market, quote-unquote, is, you can sort of just buy the market weight and eventually get reasonably wealthy from doing that, right?
You're just saying that stock market value has gone up over time, and index funds help protect you from the adverse selection of choosing which things to own.

Yeah. And I guess I'm saying that even if I don't know much about individual stocks, even if I have very little information about which companies are doing what, or where profit is being generated, I can buy an index fund.

Right, but you'll never become a billionaire by buying index funds.

That seems correct, unless I'm a 999-millionaire to start with.

Yeah, I didn't mean to dock you as a non-billionaire there; I just know you're not listed on any of the public registries of billionaires. You could still be a billionaire.

Yeah, you don't know how much my shell companies have. So I guess the analogy is: if we made AI that literally rewarded people who knew stuff, this would reward some kind of insider trading, or fooling others or something, and this would not be a stable situation. Is that a summary of a thing that you think?

I wouldn't say it's a summary, but it is a thing that I believe, yeah. And it's not really addressed by the NRL paper, NRL being negotiable reinforcement learning. It's more that the NRL paper is pointing out that if you Pareto-optimize for cooperation, if you Pareto-optimize ex ante, you get this bet-settling outcome, and the paper doesn't really explore very much about the unfairness of that outcome. There's a little bit in there, but that's more of a future-work type of thing that I hope people will think about.

OK. So, speaking of that: I guess you wrote this paper because you thought it would have consequences in the world, or I gather that you did.

Yeah.

Can you say a bit more about what things you hope will happen because you wrote this paper?

Yeah. Just proximately, I hope researchers, AI students, in fact industry folks who are building sequential decision-making systems, will take an interest in differences in belief as an interesting bearing on what happens with the system. I hope they can see: wow, there's something different about how belief bears on the system from the way preference bears on the system, or how belief ought to bear on it versus how preference ought to bear on it. Preferences, you just sort of leave as they are, or you try not to disturb them; with beliefs, you have this opportunity to share information and update each other's beliefs, for example, which is completely missing from the paper. I'd like to see people designing mechanisms for individuals and institutions who govern powerful AI systems to share information with each other, so it evens the playing field and they all have the same information. There might have to be a small benefit to the institution that had more information at first, or something, to sort of sell their information to the other institutions. But I'm hoping we don't lock the entire future into some technological equilibrium that really disenfranchises a massive number of people, or a massive number of different value systems, that just didn't manage to have a say on what AI does. And to an extent, there are a lot of people thinking about fairness and accountability in AI, and transparency, at least to engineers, so there's a cluster, fairness, accountability, transparency, that really appeals to me here.
So maybe, if people with those interests could think more about differences in belief and how that's going to play out in sequential decision-making: policies are going to run for a long time; how should that play out?

OK. So a follow-on from that: I see this as similar to the social choice literature, and, I'm not sure you actually said this, but if it's true that you're hoping this research will eventually help facilitate bargaining, and thinking about how we actually get people to cooperate over the creation of powerful artificial intelligence: do you think the existing social choice literature has, analogously, done a good job at fostering cooperation?

Oh, that's interesting. I have wondered this, and I don't know. I mean, Aumann and Schelling and people like that were commissioned by the RAND Corporation to try to devise nuclear disarmament protocols, and they tried, and they admittedly failed: their writings on this say, look, we tried, we couldn't come up with anything. And they seem to have earnestly tried. I don't think they were lazy; I mean, maybe they were, they just didn't seem that way to me; maybe they're just brilliant and can have good ideas while being lazy. So in a sense I'm going to say no: there were things that the world called on mathematicians and game theorists and decision theorists to figure out that they didn't figure out. And I think we're not done making that call. That happened during the Cold War, right? It was time to make that call, and some of the greatest minds came together to think about disarmament, about how you gradually deescalate a threatening situation between nations. But there hasn't been that much work since; there hasn't been that flurry of brilliance in the area of how to foster peace and cooperation as there was back then, in the Cold War. And I think, with the advent of increasingly capable AI technology, we're going to see more and more brilliant people taking an interest in how to maintain peace and harmony in a world with that much capability. So I'm half making a prediction and half making a bid that says: let's have lots of smart people revisit these foundational questions about how to achieve cooperation, and see if we can do better than the seventies.

Yeah. It's interesting that we stopped, because it's not as if we don't currently live in a world where many countries have a lot of nuclear weapons, or where many countries disagree about who gets what bit of land, right?

It's true. It's not as if we don't live in that world.

Yeah. So, another question on consequences. At NeurIPS in 2020, papers are supposed to have a broader impact statement, and the broader impact statement is supposed to include how the research could have negative consequences. If there were a plausible way your research ended up making the world worse, by means other than just opportunity cost (there was something else people could have done that was great, but instead they paid attention to yours), like it sort of actively made the world worse: how do you think that would have happened?
Yeah, I guess I've alluded to it, right? Someone just grabs the formula from negotiable reinforcement learning and runs it, and then a bunch of people end up unwittingly signed on to a protocol that's exciting to all of them at the start, because according to their own individual beliefs it looks great, but some of the people's beliefs about how it's going to go down are wrong, and they end up getting really screwed over. And I don't want that to happen. So, do I have a theorem for how not to do it, or for what's the correct balance between getting everybody together versus making sure everybody's signed on to something fair? No, I don't. But that's another potential piece of future work; maybe there's an interesting boundary there between fairness and unity to be hugged. But yeah, if it goes wrong, that would be my guess, though that doesn't seem likely. I mean, it's just one idea; who's going to use this? But maybe it's the mode of the distribution of unlikely ways this idea could end up having a large negative effect.

OK. So, speaking of consequences: the paper's been out for a while; how's the reception been?

Yeah, there have been a bunch of people who came up to me to take an interest in it. It seems like there are little pockets of interest in it, but they're isolated, or not even pockets: there hasn't been a whole lab of people who all got interested in it, just a person from this group, a person from that group, going "oh, this is interesting". And I think it's hard for people to stay motivated on projects when the rest of their working environment is not adequately obsessed with them, or something. So there hasn't been a flourish of follow-up work on it. Maybe now's the time for that; maybe today, maybe this podcast. I also have more free time now; I just finished up a giant document. Maybe now is the time that we can try to get a cluster of people working together to solve the next problems in what I would call machine-implementable social choice. But it hasn't happened yet; we'll see.

All right. So, speaking of that: do you have any final things you'd like to say, or, if people are interested in following your research, how should they do so?

Oh yeah. Well, I guess I don't have a Twitter account, if that's what you're asking, but thanks for asking. The easiest way to notice if I write something is just to subscribe to me on Google Scholar; you can make Google Scholar alerts that tell you if someone publishes a paper, so that should work. I mean, I have a website, but that's a more active, attention-intensive way of keeping track of what I do compared to Google Scholar. And maybe I'll get a Twitter account someday. It won't be this year, though, I think. I could lose that bet, but it won't be in the next several months, that's for sure.

OK. Well, thanks for talking with me, Andrew, and to the listeners: thanks for listening, and I hope you'll join us again.