
AXRP · Civilisational risk and strategy

Infra-Bayesianism with Vanessa Kosoy

Why this matters

Auto-discovered candidate. Editorial positioning to be finalized.

Summary

Auto-discovered from AXRP. Editorial summary pending review.

Perspective map

Mixed · Governance · Medium confidence · Transcript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.


Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forward · Mixed · Opportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).


Across 60 full-transcript segments: median 0 · mean -2 · spread -100 (p10–p90 -70) · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.

Slice bands
60 slices · p10–p90 -70

Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.

  • Emphasizes safety
  • Emphasizes AI safety
  • Full transcript scored in 60 sequential slices (median slice 0).

Editor note

Auto-ingested from daily feed check. Review for editorial curation.

ai-safety · axrp


Episode transcript

YouTube captions (auto or uploaded) · video HB7v6brXsfI · stored Apr 2, 2026 · 2,221 caption segments

Captions are an imperfect primary source: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/infra-bayesianism-with-vanessa-kosoy.json when you have a listen-based summary.

before the interview i have a quick announcement to make in order to make this podcast better i've released a survey to get feedback from new listeners if you have a few minutes to spare i greatly appreciate it if you could fill it out it's linked in the podcast description as well as on axrp.net now on to the main show hello everybody today i'm going to be talking to vanessa kosoy she is a research associate at the machine intelligence research institute she's worked for over 15 years in software engineering about five years ago she started ai alignment research and is now doing that full time she's authored three papers but today we're going to be talking about her sequence of posts on infra-bayesianism that was co-authored by alex appel so vanessa welcome to axrp thank you daniel all right so i guess the first question is in a nutshell what is infra-bayesianism uh infra-bayesianism is a mathematical framework that is uh meant to deal with the problem of non-realizability in reinforcement learning or like in the theory of reinforcement learning um so the problem of non-realizability is that you have a certain hypothesis space that your algorithm is trying to learn like it is trying to learn which hypothesis is correct but the real world is not described by you know the real world is described by none of those hypotheses because it's much more complex than any model could possibly capture and most of theoretical research or virtually all theoretical research in reinforcement learning focused on the case assuming that the world can be exactly described by one of the hypotheses and like that's where infra-bayesianism comes in and kind of explains what happens when we drop this assumption okay so um i guess how how big a problem i gather that this work is motivated by like ai alignment or ensuring that um you know when we create like really smart ais in the future they're gonna do what we want how big a problem do you think non-realizability is for this well there are several reasons why uh i'm interested in this in the context of ai alignment um one reason is that uh i mean one reason is just deconfusion right like trying to understand uh what does it mean for a reinforcement learning system to operate in a non-realizable setting and like how to think about this mathematically the other reason is because if ultimately we want to have algorithms that satisfy formal guarantees if you want alignment to be like provable in some mathematical model then non-realizability is one of the issues we're going to have to deal with in that context and a third reason or it's like a third group of reasons maybe is that there are multiple questions uh related to deconfusing ai that kind of um kind of hit this non-realizability obstacle for example the question of embedded agency which miri has been thinking about for some time and and like promoting as one of the important problems is something that's closely related to non-realizability because if your what is this what is embedded agency i should ask right so embedded agency talks about agents that are part of the physical world right so like the classical uh approach to reinforcement learning is kind of viewing an agent as just two channels input and output and there's no physical world anywhere in the picture and embedded agency is trying to understand how to how to fix that because uh for example human values are not necessarily expressible in terms of human inputs and outputs right so how do you think of that and because of
various failure modes that can happen because we are not taking into account in our models that the ai is actually part of the environment and not just like a completely separate thing and this is closely related to non-realizability and indeed i have a program for how to use infra-bayesianism how to apply it to uh understand embedded agency and that's like one thing another thing is reflection for example agents that do self-improvement that also very quickly leads you to problems of non-realizability because in reflection an agent should be thinking about itself right but it cannot have like a perfect model of itself why can't it have a perfect model of itself well it it it's sort of this kind of paradox because then it enters a sort of infinite loop right it's like why you cannot predict what you would do like nobody can tell you a prediction of what you would do because maybe you will listen to this prediction and do the opposite or something like that or just you can think about it just complexity theoretically like the space of hypotheses the agent is working with their computational complexity is lower than the computational complexity of the agent itself because it needs to somehow work with all of those hypotheses so the agent just is too big to fit into its own hypothesis space uh it's really kind of similar to like the self-referential paradoxes we have in mathematics like in logic or other places yeah but but you can also view it as a special case of non-realizability so infra-bayesianism can help you there also and another thing is decision theory and this is something that we actually wrote about in the sequence that infra-bayesianism basically solves all the paradoxes like newcomb's problem you know the other types of problems that miri were writing about like counterfactual mugging and so on you know the sort of problems uh that so-called updateless decision theory or something that's called functional decision theory was invented for but never fully mathematically formalized infra-bayesianism has solved those problems or at least a very large class of those problems in a way that's completely mathematically formal yeah so we see that there's like a whole range of different applications yeah just just uh getting on that last part for a second for those who might not be familiar what what is decision theory like what's the what type of thing is decision theory trying to be a theory of yeah so decision theory is the field that talks about um it talks about how do we think of making rational decisions mathematically and i guess it began with the work of or at least the mathematical side began with the work of von neumann and morgenstern where they proved that expected utility maximization is like always the correct way to make decisions under some reasonable assumptions and it developed from from there so so like game theory and economics are all kind of offsprings of decision theory in some sense and uh specifically miri and yudkowsky wrote about some interesting paradoxes um like the newcomb-like paradoxes that kind of that are a problem for classical decision theory and it showed that our like understanding of decision theory is lacking in these sorts of examples so so what is the newcomb paradox okay so the newcomb problem is um is the following setting uh you're playing a game with some entity called omega and omega is something very powerful it it is omega is very smart or it just has a huge computer and it's so powerful that it can simulate you and it can predict everything
you do right and that's that's completely like in theory that's completely possible because we know that our brains operate on the laws of physics so if you have a big enough computer you could in theory predict everything that a person would do and then omega offers you a choice you have two boxes and you need to choose and there is money inside of them and you need to choose either the first box or both the first books and the second box together and this seems like a really silly question like obviously if you can choose only a or a plus b then a plus b is always better than a but there is a catch and the catch is that omega predicted what you would choose and if you chose to take only the first box then it put a million dollars in that box beforehand before the experiment the game even started if you chose to take both boxes then the first box is going to be empty and the second is going to have a thousand dollars so the result is what happens when we see someone picking the first box like only the first box they walk away with a million dollars and when we see someone picking two boxes they walk away only with a thousand dollars so it seems like in some sense it is better to pick only one box however classical approaches to decision theory like causal decision theory which is like popular in philosophy they would advise us to take both boxes so something is wrong here and and philosophers have kind of debated these questions but nobody until now gave a completely formal decision theory that kind of explains you know why formally why you need to choose one box in this situation okay i should say i think evidential there's this thing called evidential decision theory that does choose one box in this scenario is that right right you have a financial decision theory which does choose one box but then it fails in other situations so it's like yeah and like here the idea is if you want to succeed in all of like all possible setups in which there is something predicting you and interacting with you you want to be successful in all of this so for evidential theory for example there is the sore blackmail problem in which evidential decision theory fails and causal decision theory succeeds and then there is a counterfactual mugging problem in which both evidential decision theory and causal decision theory both fail on the other hand uh updater's decision theory is supposed to succeed in all of them only to ablate this decision theory was not formally mathematically defined until information is okay so just to give listeners i'm still gonna ask preliminary questions um just to give listeners a better sense of um more of a sense i guess of what we're gonna what problems we're gonna be trying to solve um what is this counterfactual mugging thing which uh classical decision theories don't do well in but your new updateless decision theory inside of infrabasianism is going to do well in all right so counterfactual mugging is another game with omega and like before omega can always predict everything you will do and this time what happens is that omega flips a fair coin and if the coin falls on heads then omega comes to you and asks you for a hundred dollars and you can either agree to get to give them the hundred dollars or you can refuse on the other hand if the coin falls on tails then omega might give you a million dollars but they only give you the million dollars if in the counter factual scenario in the scenario where the coin falls heads you would agree to give them the 100 so in this case 
if you're the type of agent that would agree to give omega 100 your expected uh profit in this whole scenario is like 500 000 minus you know 50 dollars and if you're the type of agent who would refuse then your expected profit is zero which is much worse but all like this classical decision theory would refuse to to pay the 100 yeah and this is basically as i understand it because like at the time they're asked the hundred dollars basically like classical decision theories just say well i can either pay you a hundred dollars and get nothing or i cannot pay you 100 and don't not lose anything and so from now on my life is good if i don't pay you and they're just like not considering about what would have happened in this like alternate universe yeah exactly like what what happens is that in the scenarios classical decision theory they would choose to pre-commit to paying the 100 if they could but if they cannot pre-commit they have no like they have no intrinsic mechanism of doing so they need like some kind of an external crutch that would allow them to to to to make it through those scenarios okay so to summarize all of that it seems like the point of informationism my read is that it's going to do two things for us firstly it's going to let us have um something like a good i don't know a good epistemology or a good decision theory about worlds that we could be in where where like the real world isn't one of the worlds that we are like explicitly considering um but like we can still do okay in it and secondly it's going to let us choose actions it's going to give us like a decision theory that is going to solve these problems like we're going to we're going to be expected to do well in these problems where some other agent is simulating us and figuring out like what we are going to do or what we would have done how things turns out differently is that a fair summary of like what we want to get out of informationism it's it's sort of a fair summary i would say that the second is really a special case of the first because when we have agents that are predicting us then it means that we cannot predict them right because otherwise we have this against a self-referential paradox so agents that are if our environment contains agents that are predicting us then it means that necessarily our environment our environment is not described by one of the hypotheses we are explicitly considering so like the ability to deal with worlds that are more complex and your your ability to to you know fully described is what gives you the power to sort of deal with this and you combine scenarios okay so one thing that people will notice when they start reading um these posts is there's a lot of there's a lot of like math defined at the post right it's not just uh we're gonna have some we're gonna have like bog standard probability distributions and go from there um why why do you need this new math why can't we just do all of this with normal probability theory that many listeners know and love well um the thing is that like normal probability theory is kind of the problem here the whole problem with non-realizability is that bayesian uh probability theory bayesian epistemology doesn't really work with non-realizability like when your environment is precisely described by one of the hypotheses in your prior then we have theorems like the merging of opinions theorem which shows that your uh predictions about what will happen will converge to what actually happens but if your environment does not like if your 
prior is miss specified that's what some something that something it's called specified prior problem in statistics then there are no guarantees about what your bayesian reasoning will give you there are sometimes guarantees you can prove if you assume the environment is ergotic but that's also a very strong assumption that does not hold in realistic cases so bayesian probability reaches oh um what yeah when you say if the world is ergotic what what does it mean for a world to be ergotic how could i know if my world was or was not yeah so ergotic is like i want try to give the formal definition but roughly speaking it just means that uh everything converges to stationary probability distribution that's what our god it means so like either thinking of something like a markov chain with a finite number of states then it always converges to stationary probability distribution after sufficiently you know sufficient time and like in physics also like when systems reach thermodynamic equilibrium they converge into a stationary distribution and it's true that eventually everything reaches from a dynamic equilibrium but in the real world all the interesting things happen when we're not at thermodynamic equilibrium right and like when we reach thermodynamic equilibrium that will be like the heat f of the universe so that's like not a very interesting like regime so okay now that you so if we can't use normal probabilities can you give us a sense of like roughly what are the things we're going to be working with if we're going to try to be infrabations yeah so and maybe this is going to explain what why why we're using the sub the prefix infra yeah so informationism um well information is first of all it's built upon something called imprecise probability so i haven't really invented it from scratch i took something called uh imprecise probability although when i had the idea i did not really read much about the precise probability i i probably just heard the general idea vaguely sober when i started thinking about this but anyway so we we took some concept called imprecise probability which is something uh which is already known in decision theory in some contexts and is used for mathematical economics for example and we apply this to the theory of reinforcement learning and we also generalized it in certain ways so our main novelty is this is like creating this connection and imprecise probability what it does is it says well instead of just having one probability distribution let's have a convex set so formally a closed compact set of probability distributions so for example just to like think of a very simple example suppose we have our probability space has just two elements right so a probability distribution is just a single probability between zero and one so what imprecise probability tells us is instead of using a probability use an interval of probabilities like you have some interval between i don't know maybe your your interval is between 0.3 and 0.45 or something and when you go to larger probability spaces it becomes something more interesting because you have not just like intervals but you have some kind of convex bodies inside the space of probability distributions and then we did a further generalization on top of that which kind of replaces it with uh concave functionals in the space of function but but if you want to just understand the basic intuition then you can be just thinking of this kind of sense of probability distributions okay and i guess this gets to a question that i 
have which is is the fact that we're dealing with these convex sets of distributions because that's the main idea and i'm wondering how that lets you deal with non-realizability because like just because it like it seems to me that if you have a convex set of probability distributions standard pageanism you could just you know have a mixture distribution over all of that convex set and you know you'll you'll do well on things that are inside your convex set but you'll do poorly on things that are outside your convex set yeah could you give me a sense of like how maybe this isn't the thing that helps you deal with non-realizability but if it is how does it so the thing is a kind of accept you can think of it as some property that you think the world might have right like suppose just let's think of a trivial example suppose your world is a sequence of bits so just an infinite sequence of bits and one hypothesis you might have about the world is maybe all the even bits are equal to zero so this hypothesis doesn't tell us anything about odd bits it's only a hypothesis about the even bits and it's very easy to describe it as such a convex set which is considered all probability distributions that predict uh that the odd bits will be zero with probability one and without saying anything at all like the even bits they can be anything the behavior there can be anything okay so what happens is if instead of considering this kind of accept you consider some uh distribution on this convex set then you always get something which makes concrete predictions about even bits like it's important to think you can think about it in terms of computational complexity or the probability distributions that you can actually work with have bounded computational complexity because here you have bounded computational complexity therefore as long as you're assuming a partic a probability distribution a specific probability distribution or it can be a prior over distributions but that's like just the same thing you know you can also average them get one distribution it's like it's like you're assuming that the world has certain low computational complexity so one way to to think of it is that bayesian agents have a dogmatic belief that the world has low computational complexity they like believe this fact with probability one because all the hypotheses have lot of computational complexity so like you're assigning probability one to this fact and you know and this is a wrong fact and like when you're assigning probability you want to something wrong then it's not surprising to iran into trouble right like even bayesianists know this but but they kind of help it because you know it's just nothing you can do invasionism to avoid it so with infra patients you can have some properties of the world like some aspects of the world can have low computational complexity and other aspects of the world can have high complexity or they can even be uncomputable like with with this example with the bits your your hypothesis it says that the odd bits are zero the even bits they can be uncomputable they can be like the halting oracle or whatever you're not trying to to have a prior over there because you know that you will fail or at least you know that you might fail that's why you have like different hypothesis in your prior it is the idea that like thinking about the infinite bit string example right is the idea that like like in my head i'm gonna think about okay there's one convex set of like all the distributions where all of 
the even bits are zero you know maybe like there's another hypothesis in my head that says like all the even bits are one maybe there's a third that says that all of the even bits maybe they alternate or maybe they like spell out the digits the binary digits of pi or you know they're all zero until the trillionth one and then their one is the idea that like what i'm gonna do is i'm gonna have like a variety of these convex sets in my head and the the hope is i'm gonna hit the right convex set uh so basically it works just like bayesianism in the sense that you have a lot of you have a prior over hypothesis so just invasionism every hypothesis is a probability distribution whereas in informationism every hypothesis is a convex set of probability distributions and unlike you have a prior over those and if the real world happens to be inside some of those sets then you will learn this fact and exploit it yeah so getting so think about the connection to the mathematical theory so so in the mathematical theory you have these things called sa measures and you have like these sets of these uh closed convex cones which people can look up but these sets of sa measures and they're like minimal ones in the space is the idea that the the minimal things in these coins these are the like convex sets and yeah i i guess i'm still trying to get a sense of like just exactly what the connection is between um this description and the theory yeah so the thing is that what i describe is what we call in one of the latest posts we called it crisp infra distributions so a convex set of probability distribution that's the like in some sense the simplest type or like one of the simplest types of infrared distributions you can consider but there are also more general objects we can consider and the reason we introduce this more general objects is to have a dynamically consistent update group because like in ordinary evasionism you know you have some some beliefs and then like some event happens and you update your belief and you have a new belief and like you have you know the bias theorem which tells you how you should do that and we were thinking about okay how should we do that for informationism and we started with just sets of we started with just those convex sets of probability distributions which are just like something taken from imprecise probability and with those objects you do not have a dynamically consistent um update rule so you can still work in the sensitive like just decide all of your policy in advance and follow it but there's no way to do updates well um like the naive way to do updates is by sort of updating every distribution in your set but that turns out to be not dynamically consistent so it's like the behavior it prescribes after updating is not the same behavior it's prescribed before updating because like the decision rule you're using with those config sets is the maximum decision rule so you're trying to kind of uh you're trying to maximize your worst case expected utility where like the worst is taking over this kind of accept and then when there are several things that can happen then your optimal policy might be something that kind of tries to hatch your bats something that tries to okay i'm going to choose a policy session in either of those branches i will do to poorly but then when you're just in one of those branches and you're updating you're kind of throwing all those other branches to the garbage and forgetting about them yours like new optimal policy is going to be something 
different so that's like like the counter factual mugging that we discussed before is like exactly the perfect example of that where like just like updating causes you to kind of disagree with your a priori optimal policy so in order to have a dynamically consistent object rule you need some object which is some mathematical object which is more general than just those kind of extensive probability distributions okay and so the object you have are these convex sets of like souped-up probability distributions is my rough understanding yeah so there are like two ways to think about it uh which are related by uh legendary fanshawe duality uh yeah the way we introduce them initially in the sequence is we consider so instead of just thinking of probability distributions think of measures that uh like measures that might not sum to one and you also have like this constant term like you're thinking of a probability distribution that someone multiplied by a constant and also kind of added a constant which is like not part of the distribution but it's like added to your expected utility and then you have a convex set of those things and like with those things you can have um you can have dynamic consistency because this constant term is kind of keeping track of the expected utility in counter factual scenarios and the multiplicative term is keeping track of the probability that you you like your a priori probability to end up in the branch in which you're added up but there's also another way which i personally think more elegant to think about it where instead of dealing with those things you're just thinking of um you're just thinking of expected value as a function of functions right so like what what what do we have what are the probability distributions even for therefore taking like you can take it probability distributions are just gadgets for taking expected values and in classical probability theory these gadgets are always linear functionals so every probability distribution gives you a linear functional on your space of functions like the space of things that you might want to take an expected failure and concretely this this functional is like taking a function from like from your i don't know event set to the real numbers and um this functional is taking okay on average what's the value of this function given my distribution over over like what the what the true event might be um and this ends up being linear in what the function from events to real numbers is just to clarify that for the listeners yeah exactly um so in classical probabilities here there's like a classical theory which says that you can just think of probability distributions as linear functionals like you can literally just like consider all linear functionals they're continuous like there are some technical mathematical conditions they should be continuous in some topology and it should be positive so like if your function is positive the expected action is also should be positive but then you can just like define you can define probability distributions as functions it's just like equally well equally good way to define them and then the way you go from classical probability theory to informationism is instead of considering linear functions you consider functionals that are monotonic and concave so monotonic means that you know you have a bigger function it should give you a bigger expected value and concave is you know is concave when you're averaging several functions you like your value should only become higher 
and like just you can consider just all of those concave monotonic functionals and those are your infra distributions so the fact they're concave you can intuitively think of it as a way of like making risk averse decisions basically because that's like what corresponds to this maximin rule and like that's why those things are also used in economics sometimes because like you want to be risk averse okay so we have these um infra distributions which are kind of the same thing as um these concave monotonic functionals yeah in this case where i'm not sure i i'm just like there's an infinite sequence of bits and i have no idea what the even bits are going to be but i have some guess about like you know the odd bits maybe they're all zeros maybe they're all ones i have like a variety of hypotheses for what the odd bits might be what does my infrared distribution look like right so every every hypothesis is like a particular infra distribution like if your hypothesis is like in example with sequences of bits you would have like hypotheses that only say things about even bits you would have hypothesis that only thinks about the other beds you would have hypotheses that say think about all bids you would have hypothesis that only that you know things that the sore of every bit with the next bit is something or you know whatever and like and and basically like the convict said if you're thinking in terms of kind of accepts then your convict set is just like the convict set of all things that have a particular property like for example if if you're thinking of like okay my hypothesis is all even bits are zero then your convex set is the set of all distributions that um assign probability one to the even bits being zero so like in this more general thing you have sa distribution so on but that's like just a technicality that just means you need to like take your set of distribution and close it by taking the minkowski sound with some cone in in this bonus space but that's like really just a technical thing okay so so it's the idea that i have that like for each hypothesis for each specific hypothesis i might have i have this like con i have this convex set of distributions and i'm gonna be doing this like maximum thing within that contact set of distributions but like is the idea that i'm gonna be basically just evasion over these hypotheses and how the hypotheses internally behave um is determined by this like maximum like strange update type thing uh sort of yeah i mean like there are there are multiple ways you can think about it one way you can think about it is that your expected utility your like prior expected utility is take expectation over your prior over hypothesis and then like for each hypothesis take minimum of expected values inside this convex set or you can equivalently just take all of those convex sets and combine them into one convex set like you can literally just take like okay let's let's assume we're we we choose some point inside each of those sets and then we average those points with uh uh with our prior and like the different ways of choosing those points they give you different ways of they give you different distributions and so you get a new convex set so that's just like an ordinary patient is right you have a set of hypotheses each hypothesis is a probability distribution and you can also and you have a prior over them or you can just like average them all using this prior and get just a single probability distribution so there's like two ways of looking at the same 
thing all right cool hopefully this gives listeners something of an overview of informationism and there's i don't know if they want to learn the maths of it um they can read the posts for the mathematical theory going forward a little bit so it seems like this is a theory of you have a single um a single agent that's inside a big scary environment that's uh really confusing and complicated and the agent doesn't know everything about this environment i think in ai we we like to think of this uh i don't know some kind of progression from a thing that's reasoning about the world to the thing that's acting about the world to the situation where you have multiple agents in the world that are jointly acting and you know their decisions affect each other like game theory um so i'm wondering is there is there in for a game theory yet and if they're oh or if there isn't total infra game theory has there been any progress made towards creating it yeah so that's a good question um in fact one of the one of the very interesting applications of informationism is exactly to multi-agent scenarios because multi-agent scenarios are another example where ordinary bayesianism runs into trouble because of this relatability assumption right if we have multiple agents and each of those agents is sort of trying to understand the other agent trying to you know predict it or whatever then we again get the self-referential products like we can have a situation where agent a is more powerful than agent b and therefore agent a is able to have b inside its hypothesis space but we usually cannot have a situation where both of them have each other in their hypothesis space so there are like sometimes a way you can you can kind of have some trick to go like there are reflective oracles which is a work by miri that sort of tries to solve it and have agents that do have each other inside each other's hypothesis space but it's very fragile in the sense that it kind of requires your agents to be synchronized about what type of of reflective oracle like what type of prior they have they need to be like synced up in advance in order to be able to to to kind of close this loop which is a weird assumption um and information is might give you a way around it because information is precisely designed to deal with situations where um you know your environment is not precisely describable and in fact it does give you some results so you can immediately for example it's just trivial to see that in zero some games information is gives you optimal performance because well that's kind of trivial because information is sort of pessimistic so it's it imagines itself playing some kind of zero sum game but i well i haven't really written this is like some uh open problems some kind of area where i wanted to work but i haven't really developed it yet but it seems like for example one result that i believe you can you should be able to prove is that information agents that are playing uh a game in a non-cooperative setting will converge to only playing realizable strategies and then they will then you will have a guarantee that your payoff will be at least the maximum payoff inside a space of non-realizable strategies so that's already non-trivial guarantee that you cannot easily get with patients okay and yeah i guess related to the open work um so you late last year um these results were put online i'm wondering how the reception's been and what open problems you have and how how much development there's been on those yeah so by the way 
did i say realizable i meant rationalizable okay uh yeah so currently we continue to to to work on developing the theory we have well we have a ton of open research directions so like i said we want to apply to embedded agency and we have another post coming up which kind of gives some more details about like decision theoretic aspects of it and the game theory thing is like another direction and another direction by the way which we haven't discussed is the relation of this logic because there are also some very intriguing ways to make connections between informationism and logic and like in some sense like my my um i have this sort of thesis that says that informationism is in some sense a synthesis of probability theory and logic or or of you can think of it as a synthesis of inductive reasoning and deductive reasoning and yeah so we've been working on um a number of things uh also on like proving uh concrete regret bonds because in reinforcement learning theory what you ultimately want is you want to have like specific quantitative regret buttons that give you like specific conversions rates and we have derived some for some toy settings but we we want to have regret buttons for some more interesting uh settings so yeah so like there are a lot of direction there's work we're doing there are probably multiple papers we're going to write on the topic so i think the first paper will be some paper that will have like some basics of the formalism and proofs regret bounds under in some like basic assumptions like uh information bandits or information mdps but yeah there's definitely a whole avenue of research to be down there okay yeah i could you say a bit more about the connection to logic because this wasn't quite apparent to me when i was reading the posts yeah the logic connection is something that i think we haven't really talked much about in the post so far so the basic idea is is kind of simple like once you have kind of access once you have convex sets that you have a natural operation of intersecting those sets and there's another natural operation of taking the convex hall of those sets so in other words the sense form what's called a mathematics lattice like some sets are inside other sets and like you have a joint and and like a meat like you have the least upper bound and uh like the um the maximum lower bond which is basically just you can intersect and you can union it's intersect and convince hull because because there have to be comics all right yes yeah so and that gives you assertive logic and but and yeah and actually you can think of ordinary logic as sort of embedded in that thing because if you have some set then to every subset of the of it you can correspond an infrared distribution which is just like all distributions supported on the subset and then your operations of intersection and convex hall correspond to just intersection and union but but you also have like things that do not correspond to any subset and you can think of those things as some kind of a logical disjunction and conjunction but it's not distributed so that's like not classical logic it's some kind of weird logic and you can also define existential and universal quantifiers that kind of play well with this thing and then you can start like and then what's nice about it is that you can kind of use this sort of logic to construct your information hypothesis so like if your hypot if your hypothesis space for example consists of what we call infra palm dps which is like the information equivalent 
of pom dps then you can you can use the language of this information logic to specify hypotheses and that's really interesting because maybe when you do that you have some useful algorithms for how to optimize uh for like how to control those hypotheses and how to learn those apotis which you don't have for just you know arbitrary mdp instead do not have any structure and there might also be algorithms for solving problems in this information logic that do not exist for classical logic because classical logic is often intractable you know like propositional logic is already empty complete and like first order logic is is is not not computable but but like information logic well we haven't really proved anything about that but like there is like some hope that it's more computationally tractable in in some uh like under some assumptions so when we were talking about informationism there are kind of two things that informed bayesianism basically promised us right um so the first thing was that it was going to deal with the problem of non-realizability in environments like maybe we just can't imagine the true environment but we still want to learn some things about it and the second was to do is going to help us solve these decision problems um like uh like newcomb's problem where you know you can one box or two box and if somebody predicts like depending on the prediction of what you'll do um one might be better than the other um and also counter factual mugging where uh we you know somebody flips a coin and if it lands tails then depending on what we would have done if it lacked heads we can be better or worse off um so these are problems where um the environment is kind of simulating you and will make your life better or worse depending on your policy in other states so how does infrabasianism help us solve these decision problems right so what happens is that uh like what does it mean to solve a problem it means that the agent can build some a model of the problem which will lead it to taking the right actions the actions that will give it maximal utility so the question really becomes like the usual way uh those problems are considered is by starting with something like a causal diagram but here we are kind of taking a step back and saying okay suppose that the agent encounters this situation and the agent tries to understand what is happening it is trying to learn to build some model of it so the easiest way to imagine how like this thing can can happen is in an iterated setting when you're playing the same game over and over or that's newcomers problem or contrafactual monkey and then you can look okay given that my agent is in this iterated setting what sort of model will it converge to and what sort of behavior will it converge to so we're not assuming uh something like a causal diagram description instead we're letting the agent learn whatever it's going to learn in that situation and what happens in situations with uh predictors the predictor agent is there there is always uh this kind of model available which says uh well there is something in the environment which is doing those things and i have so good sort of knighting uncertainty about what it's going to do so that means that in my convex set of probability distributions so i'm reminding that like in informationism our hypothesis were convex sets of probability distributions so in our convex set we're allowing this this predictor this amica to make whatever predictions it wants because there's no way to kind of directly say 
well it's going to predict what i will do that's like not you know the standard it's like not a legitimate type of hypothesis like the standard way you build hypothesis in reinforcement learning right like hypothesis and reinforcement learning or you take an action and you see an observation then you take action so so this omega can do whatever it wants but then there arrives some moment in time when its prediction is tested so omega made some predictions wrote them down somewhere where we cannot see and then doing things according to those predictions and then the prediction is getting tested and at this point well what happens is that if this predictions turns out to be false then uh we can imagine that what happens is a transition to some state of infinite utility so that sounds weird at first but but like why so why is state of infinite utility well first of all notice that this is consistent with observations because the agent will never see a situation in which a prediction is falsified because by assumption the predictor is a good predictor so this will never actually happen so it is consistent from the agent's perspective to assume this but once we assume this then the optimal policy becomes behave as if those predictions will be true why because we're always planning for the worst case and the predictions becoming false can never be the worst case because if the predictions are false if they are falsified then we end up with us in the state of infinite utility so that's can never be the worst case therefore the worst case is always going to be uh when omega actually predicts correctly and our policy is going to be the optimal policy given that omega predicts correctly which is the unity policy now there is another thing where you can develop it so initially i just like introduced this idea as a kind of ad hoc thing but then we noticed that and and that's like still in the upcoming post that we're going to to publish soon then we kind of noticed that if you also allow convex sets which are empty so kind of beliefs that kind of describe the notion of contradiction like something which is you know impossible and your hypothesis are infra mdps so that's like the information version of mdps then you can instead of literally transition to a state of infinite utility you can use a transition or the transition kernel is an empty set and that's like you're just the information representation of an event that is impossible like an event which is a contradiction which does model this hypothesis for bits from ever happening and this kind of explains why you have infinite utility because our utility is always taking the minimum over this convex set but if the convex set is empty then you get the minimum with infinity all right that um when when you were saying that i i was reflecting on how the infinite utility thing seems like kind of a hack but but the nice thing about that empty transition kernel thing is it's sort of more naturally expresses your notion of impossibility right like nothing can happen if you do that yeah exactly so the way this setup kind of works is you're talking about situations where the predictor has like some idea of what you're going to do and then there's some there's some possibility that the prediction is falsified right um and if you act contrary to what the predictor thinks then the then you get infinite utility and so um you don't you know the murphy you know the the laws of minimization never pick the environment where the predictor gets wrong so one in the 
original posts one thing you kind of talk about as a challenge to this is transparent newcomb's problem so in transparent newcomb's problem there's box a and box b box a definitely contains one thousand dollars um box b either contains zero dollars if um the predictor thinks you're going to take both boxes or it contains a million dollars if the predictor thinks you're only going to take box b um but the difference to normal new crimes problem is that when the agent walks in the agent can just see what's inside box b um and so the agent already knows which way the predictor chose and in your post you sort of describe how this poses a bit of a challenge right because um there's no possible way for the predictor to know like if the protector guess is wrong and thinks that you would that if the box were full then you would take both boxes and therefore the predictor makes both makes the box be empty then like the predictor will never know what you will do in the world where both boxes are full um can you describe how how you think about that kind of situation and if there's anything informationism can do to succeed there yeah so this is correct so what we discovered is that there is a certain condition which we called cellular causality which kind of selects or kind of restricts the types of problems where this uh the this thing can work and um yeah and yeah so so the other consulting condition it basically sends out um well it basically says that whatever happens cannot be affected by your choices in a counterfactual which happens with probability zero and that's why like it doesn't work in transparent income well the uh when the outcome depends on your action when the box is full in the version of transparent boxes when the outcome depends on your action when the box is empty everything actually does work and right because like if if what happens is uh like if the predictor conditions its actions and wherever the box is empty and i decide to one box then the prediction is is not allowed to show me the empty box because because then i will one box and will falsify its prediction so that that's something that we call kind of effective soda causality which means that those controversial kind of affect you but only in one direction so one way to look at it is you can look at it as a sort of a fairness condition like one of the uh debates in decision theory is like which kind of decision problems should even be considered fair like in which kind of decisions problems we should be expected to succeed because obviously if you don't assume anything then you can always uh invent something which is uh kind of designed to fail your particular agent because you can say okay a mega if it says an agent of this type it it does something bad and if it is an agent of a different type it does something good so and and this is this is a something which was used to defend city but but then like elizaritkowski in his paper and timeless decision theory writes that yeah of course there should be some fairness condition but that does not justify too boxing newcomb's problem because newcomb's problem the outcome only depends on the action you actually take not on the algorithm used to arrive the sections and this was a kind of uticoscis proposal of what uh the fairness conditions should actually be so in seoda and in uh informationism in like the straightforward version of informationism the fairness condition that kind of informationism is able to deal with is uh the environment is not allowed to punish you for 
things you do in counterfactuals that happen with probability zero so the environment can punish you for things you will do in the future or things you will do in a counterfactual but not for things that you will do in a counterfactual that can never act this like has a probability zero so it's like it's not even a real counterfactual in some sense uh so the way you can like what what can you do with this like one thing you can just accept that this is a good in a fairness condition and maybe it is i i'm not sure like it's it's hard to to you know to say definitively uh another thing is you can try to find some way around this and we found basically uh two different ways around this how to make information and succeed even in those situations but both of them are kind of a little hacky so like one of them is uh introducing the thing well called we called it survives so you kind of do a formalist where you allow infinite decimal probabilities and then like those counterfactuals never actually happen they they get assigned an infinite decimal probability and then it kind of works but it's it's really complicates the mathematics a lot uh another thing you can do to kind of try to solve it is assume exploration so just like assume uh one thing to notice is that if we take this transparent nikon problem but assume some noise so assume that there is some probability epsilon such that uh the box will come out full no matter what right so like omega sometimes like just randomizes the outcome then immediately this problem becomes effectively the causal so because the contrafraction no longer has probability literally zero everything is fine so that's just in baseline information but what you can do to kind of use this is you can just add kind of randomization on purpose you can make your agent be be a little noisy so that's like basically a sort assertive exploration making to sometimes take random actions which are kind of not the action it intended to do so like there's the action the agent intended to do and then like what actually came out and if you kind of add this noise then the transparent newcomb problem effectively becomes equivalent to the noisy transparent snicker problem and you again succeed solving it model like some small penalty for the noise and there is some arguing here that maybe this is a good solution because we know that in learning algorithms we often need to add noise anyway in order to have exploration so maybe you can kind of justify it from this angle but uh so i don't know maybe it's a good solution but i'm not sure when when you say that exploration helps so i see how um like if you think of omega the uh the predictor kind of just randomly changing whether you get like um whether box b is full or empty i see why that um that kind of exploration is going to help you um in this um transparent newcomb scenario but when agent is um randomly you know might randomly um you know one box instead of two bucks or vice versa i don't see how that helps because like if if like with high probability you're going to pick uh both boxes when box b is full and then you know maybe i guess it's not 100 probability because you know you're gonna explore a bit but with really high probability you're going to do that doesn't omega put you in a room where box b is empty and then you probably and then it actually like just doesn't care what you do in the worlds where box b is empty so it never gets falsified okay so what happens with exploration is that the hypothesis the agent constructs of 
the environment it says the following thing uh well there is some biased coin which is tossed somewhere and this biased coin is kind of applied to my it's kind of sword with the action i intended to take and like produces the action that i actually end up taking and then when omega's predictions are tested they are tested against the xor so omega doesn't know when omega is making its prediction it has to make the prediction like before it sees the the the outcome of the coin so it it makes a prediction and then like the coin it's stored with the action you're taking you intended to get and then like the result is tested against omega's prediction so now if i decide to one worked then omega has to uh kind of predict that omega has has to predict it i will one box because if if it predicts it i will two box then there is some probability that like the coin will uh you know the the coin will will flip my action and uh and omega's predictions will be falsified but but how sorry i i guess i'm missing something but um don't you okay part of my struggle is like wait don't you like never end up in the world where the box is full but then it's like uh then i remember like oh okay we're actually like taking the minimum over a bunch of things so that's fine but um if omega thinks that you're going to take both boxes in the in the world where box b is full and your plan is to just take one box but with some probability um it actually takes both boxes doesn't that but because the random thing that that's happening is confirming omega's prediction right not disconfirming it so what happens is like again your model like from the agent's perspective the model of the environment is as follows there is a bias coin which the thermites whatever the noise is going to happen or not and then omega uh makes a prediction of what action the agent will take okay what it like the intended action is omega predicts the intended action then the intended action then the coin is flipped the result of the coin is sort with the intended action to get the to give us the actual action and then the box is either filled or stays empty according to the result of this xor ah okay now now it makes sense how that would work um but you mentioned that um you mentioned that you thought that this was um not necessarily a um satisfactory approach right i mean it works but it just seems like a little hacking a little like not like it doesn't feel uh super it doesn't feel like there's like some deep philosophical justification for for for doing this and maybe there is and i just don't see it but i don't know so and is it right that um as long as we're in pseudocausal environments um the agent does the uh the quote unquote edt prescriptions and it kind of gets high expected utility in in all those environments is that correct yeah exactly so in in absolute causal situations like if you take a decision problem um if you take like some finite decision problem where there is some omega that can predict the agent's action in any counterfactual it chooses then and make it like an eater in and make an iterated setting out of it then assuming that like the relevant hypothesis is in in our infra prior the agent will always converge to the udt payoff so it will converge to the policy that has the a priori maximum expected utility given uh you know given that the predictor predicts correctly so one thing you mentioned is that uh when people were coming up with these problems they were kind of thinking of it in in terms of some kind of causal graph 
so one thing you mentioned is that when people were coming up with these problems they were kind of thinking of it in terms of some kind of causal graph and like you know we're gonna have things that are reasoning about programs or such and the infra-bayesian approach like really has kind of a different way of thinking about these problems and it still does well on the ones that have been come up with except i guess for this transparent newcomb problem barring like you know these changes to the theory i'm wondering like do you think there are other problems out there that are kind of i don't know somehow like similar to newcomb and counterfactual mugging from the perspective where you're thinking about these like causal graphs and agents reasoning about agents but that infra-bayesianism wouldn't be able to do well on i don't think so i think that this approach is kind of very general i think that well i think that it's true that miri initially when it started thinking about those problems they kind of started thinking about them in a particular language like using logic or like thinking about programs or something i think infra-bayesianism is actually the correct language to use in any situation where you have something reflective like yeah like the original motivation for infra-bayesianism is when your environment is too complicated to describe exactly right so that's kind of the same this is in some sense the problem of logical uncertainty right like just phrased in a different way like logical uncertainty was about okay maybe we have uncertainty that comes from the fact that we are bounded agents and not from lack of information and here this is exactly what we're talking about like we are bounded agents so we cannot describe the world precisely and i think that kind of all of those problems should be kind of addressed using this language but there are also connections to those approaches in the sense that like for example logic well i think i mentioned before that there are connections between logic and infra-bayesianism there is like some kind of infra-bayesian logic which you can define and well and also there is some extension of infra-bayesianism that i call turing infra-bayesian reinforcement learning where you have this infra-bayesian agent and it also interacts with some thing which i'm calling the envelope where it can just run arbitrary programs and it treats this thing this kind of computer as part of its environment and this actually allows you to use infra-bayesianism to kind of learn things about programs or like you can think of it as like learning things about mathematics in some sense or like at least the part of mathematics that can be formulated as writing finite programs and it's like a little related to problems like decision problems that involve logical coins for example right like for example the counterfactual mugging the original counterfactual mugging involves an actual physical coin but then there's also the version called logical counterfactual mugging where you use a coin which is actually pseudorandom instead of truly random and then like this turing infra-bayesian reinforcement learning can allow you to deal with that in a rather elegant way so yeah so i kind of think that for all of this class of questions infra-bayesianism serves as kind of the starting point of like what's the correct language to think about it
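Since the counterfactual mugging comes up again here, a quick sketch of its payoff arithmetic may help readers who are new to it. The dollar amounts below ($100 demanded on one coin outcome, $10,000 paid on the other if the predictor expects compliance) are the conventional illustrative figures from the decision-theory literature, not numbers from this conversation, and the coin is modelled as ordinary randomness rather than the logical (pseudorandom) variant Vanessa mentions.

```python
# Toy sketch of the counterfactual mugging payoff structure referenced above.
# The dollar amounts are conventional illustrative figures, not numbers from
# this conversation, and the coin here is ordinary randomness rather than the
# logical (pseudorandom) variant being discussed.
P_HEADS = 0.5       # a fair coin
ASK = 100           # the predictor asks you for $100 if the coin lands tails
REWARD = 10_000     # it pays you $10,000 on heads, but only if it predicts
                    # that you would have paid up on tails

def expected_value(pays_when_asked: bool) -> float:
    """Ex ante expected value of committing (or not) to pay when asked."""
    tails_branch = -ASK if pays_when_asked else 0
    heads_branch = REWARD if pays_when_asked else 0
    return P_HEADS * heads_branch + (1 - P_HEADS) * tails_branch

print("commit to pay:", expected_value(True))    # 4950.0
print("refuse to pay:", expected_value(False))   #    0.0
```

Committing in advance to pay comes out ahead ex ante ($4,950 versus $0), which is the flavour of updateless reasoning these frameworks try to recover.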
so i guess one kind of strange question i have suppose like somebody listens to this podcast and they feel very inspired they're like i'm gonna be an infra-bayesian now that's just how i'm gonna live my life but you know they spent the past however many years perhaps they were a causal decision theorist with classical bayesianism maybe they used knightian uncertainty with evidential decision theory who knows what they were doing but they weren't doing proper infra-bayesianism so in this situation how should they think of themselves should they think that this minimization over all the expected values should that start like at the day they were born or the day that they converted to infra-bayesianism or like the day they first heard about infra-bayesianism but like before they converted and are there any guarantees about how well you're going to do if you convert to infra-bayesianism you know like as a midlife crisis instead of on the day you were born this is an interesting question i haven't really thought about it yeah i guess that my intuitive response would be sort of choosing a policy such that your kind of a priori expected utility like expected utility in the infra-bayesian sense is maximal so that would be like imagining that you were an infra-bayesian since you were born but for some reason doing something different because that's also kind of yeah that's like how infra-bayesian updates work right so like the update rule we have in infra-bayesianism i think we haven't discussed it but there's something interesting about the update rule which is as opposed to bayesianism the update rule actually depends on your utility function and your policy in the counterfactuals so like the way you should update if a certain event happens depends on like what you would do if this event did not happen or like what you did before this event happened and i think that this kind of fits well with your question because if you're updating according to this rule then kind of automatically you're behaving in a way that's sort of optimal from an updateless perspective like from the perspective where like you should have been an infra-bayesian since you were born but you weren't and this is your constraint okay do you think you're going to get some kind of infra-bayesian optimality guarantee where like if i convert to infra-bayesianism at age 40 then my utility is going to be kind of the best you could have done you know given that you followed your like old foolish ways before age 40 or are you not gonna be able to get anything like that yeah i think that you will for like the usual reasons like basically you're a learning agent so you learn things and you eventually converge to something i mean it kind of depends on what optimality guarantees you have right like in learning theory to have optimality guarantees we usually assume things like that the environment is reversible like you cannot do something which kind of shoots yourself in the foot irreversibly and like that's a different question of how do you deal with that but like obviously if you kind of did something irreversible before then you're not going to be able to reverse it but i think that kind of given that you already did it you're going to have kind of the same guarantees that you usually have but by the way when you started asking this question it made me think about something different which is like how do we apply this to kind of rationality for humans right because like i kind of came up with this in
the context of ai but then you can say okay all the things that we say about rationality like calibrating your beliefs or like making bets or whatever how do you apply infra-bayesianism there and i'm actually not sure i think that i just haven't like thought a lot about this topic but i think it is kind of another interesting topic that somebody should think about yeah so i guess now that listeners have heard the basics of or the idea of what infra-bayesianism is supposed to be about and heard some tantalizing things about a spooky update rule i encourage them to have a look at the posts maybe it'll be a paper someday if listeners enjoyed this podcast and they're interested in like following you and your work more what should they do in order to keep up to date oh so the easiest way to follow me is just to follow my user on the alignment forum or on lesswrong which is just called vanessa kosoy and of course if someone wants to discuss something specific with me then they are always welcome to send me an email and my email address is also very easy to remember it's vanessa at intelligence.org all right well thanks for talking with me today and to the listeners i hope you join us again next time this episode was edited by finan adamson
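The "taking the minimum over a bunch of things" and the "a priori expected utility in the infra-Bayesian sense" that come up in the conversation both point at the maximin decision rule at the core of the framework: evaluate a policy by its worst-case expected utility over the set of hypotheses, then pick the policy that maximises that worst case. The sketch below illustrates only that rule; the two hypotheses and the payoff numbers are made-up toy values, and the real framework works with convex sets of a-measures and full policies over time rather than a small payoff table.

```python
# A minimal sketch of the maximin ("worst case over the hypothesis set")
# decision rule at the core of infra-Bayesian expected utility.  The two
# hypotheses and all payoff numbers below are made-up toy values for
# illustration only.

PAYOFFS = {
    # policy -> expected utility under each hypothesis the agent entertains
    "one-box": {"omega predicts accurately": 1_000_000, "omega fills box at random": 500_000},
    "two-box": {"omega predicts accurately":     1_000, "omega fills box at random": 501_000},
}

def infra_value(policy: str) -> float:
    """Value of a policy under the maximin rule: its worst-case expected utility."""
    return min(PAYOFFS[policy].values())

best_policy = max(PAYOFFS, key=infra_value)
print(best_policy, infra_value(best_policy))  # one-box 500000
```

Under these toy numbers one-boxing is maximin-optimal because its worst case (500,000) beats two-boxing's worst case (1,000), even though two-boxing looks slightly better under the "random" hypothesis.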

Related conversations

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread


Med 0 · avg -5 · 133 segs

AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread


Med -6 · avg -7 · 120 segs

AXRP

11 Apr 2024

AI Control with Buck Shlegeris and Ryan Greenblatt

This conversation examines technical alignment through AI Control with Buck Shlegeris and Ryan Greenblatt, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread


Med -6 · avg -9 · 174 segs

Future of Life Institute Podcast

7 Jan 2026

How to Avoid Two AI Catastrophes: Domination and Chaos (with Nora Ammann)

This conversation examines core safety through How to Avoid Two AI Catastrophes: Domination and Chaos (with Nora Ammann), surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread


Med 0 · avg -3 · 85 segs

Counterbalance on this topic

Ranked using the mirror rule described in the methodology: picks sit closer to the opposite side of the axis from your own score (matching lens preferred). Each card plots your position and the pick together.