Library / In focus

Back to Library
80,000 Hours PodcastCivilisational risk and strategy

Dario Amodei on OpenAI and how AI will change the world for good and ill

Why this matters

This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.

Summary

This conversation examines core safety through Dario Amodei on OpenAI and how AI will change the world for good and ill, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

MixedTechnicalMedium confidenceTranscript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forwardMixedOpportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).

StartEnd

Across 96 full-transcript segments: median -6 · mean -5 · spread -243 (p10–p90 -140) · 2% risk-forward, 98% mixed, 0% opportunity-forward slices.

Slice bands
96 slices · p10–p90 -140

Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.

  • - Emphasizes alignment
  • - Emphasizes safety
  • - Full transcript scored in 96 sequential slices (median slice -6).

Editor note

A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.

ai-safety80000-hourscore-safetytechnical

Play on sAIfe Hands

Uses the global player with queue, progress, speed control, and persistent playback.

Episode transcript

YouTube captions (auto or uploaded) · video Otz0AZ0IFi0 · stored Apr 8, 2026 · 2,778 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/dario-amodei-on-openai-and-how-ai-will-change-the-world-for-good-and-ill.json when you have a listen-based summary.

Show full transcript
hi I'm Robert woodland director of research at 80,000 hours and welcome to the podcast if you want to make sure you never miss an episode from us you could subscribe by searching for 80,000 hours and whatever app you use to get podcasts that way you can also speed up the episodes which is how I much prefer to listen to interviews next week I'm scheduled to speak with Alex Gordon Brown about working in quantitative trading in order to and give which I expect to be very engaging today's conversation really goes into the weeds and I learned a great deal from it if you're looking for personal advice on how to pursue a career in technical a I research stick around because we get to that in a second half I apologize for the audio quality on my end I think we'll have that fixed up by next time if you'd like to offer any feedback on the podcast please do email me at Rob at 80,000 hours org we're still figuring out how we can best use podcasts to help our readers and I'll try to respond to everyone without further ado here's my conversation with Dario amedy today I'm speaking with Dario amedy a research scientist at open AI in San Francisco prior to working at open AI dario worked at Google and Baidu and helped to lead the project that developed deep speech - which was named one of ten breakthrough technologies of 2016 by MIT Technology Review Dario holds a PhD in physics from Princeton University where he was awarded the Hearst foundation doctoral thesis prize Dario is also the co-lead author of the paper concrete problems in AI safety which lays out in simple terms the problems we face in making AI systems safe today thanks for coming on the show Dario hi so we plan to talk about the motivations behind technical AI safety research the concrete problems paper and how someone can pursue a career in this for themselves but first we're at the open AI officers here in SF so tell us a bit about open AI and how you ended up actually working here so open AI is a non-profit AI research lab it was originally founded by Elon Musk Sam Sam Altman and a few other folks and generally were working on you know on kind of following the gradient to more general artificial intelligence and and making it safe so I around what was it July of last year so about about a year ago which was a few months after after after it started and I came here because you know there were a number of you know I thought there were a number of really talented researchers here and it was kind of a good environment in which to think about safety in the context of AI research that's already being done openly I was only found at about 18 months ago it was about about 18 months ago yeah and how many stuff does it have now so I think there's a last I counted about 55 people here right so this has been difficult hiring that many people that quickly I mean it's you know I've actually never never worked at a start-up before you know or our CTO Greg Greg Brockman was previously CTO of a startup called stripe which now now has you know around a thousand people or so so you know it's it's it's definitely hard he's really good at it it's not you know it's not something that I've been super involved in except on the safety side yeah I guess it's the it's the Bay Area way to go explosive growth in organizations and what's the budget like so I don't know if I can give exact numbers in the budget the main donors at this point are you on musk Sam Altman and Dustin Moskovitz threw up and fell cool and it's what you do pretty similar to what's going on a deep mind or there are the important differences you know I would say that the the general kind of the general kind of research agenda at at open AI in its kind of focus on reinforcement learning and in you know learning learning across learning across many environments and trying to push forward the boundaries of what's done instead of just focusing on the supervised machine learning I would say that's a you know that's very similar to deep mind and probably is one thing that sets open AI and deep mind apart from from other institutions we both have kind of a similar focus on safety we both have we both have safety teams you know I would say open AI is kind of trying to be a smaller or smaller organization that you know focuses on hiring just you know just the people that we want the most and and that's that's kind of been being been been one of the big difference there's probably some differences in in in culture as well a little bit a little bit intangible and and hard to describe but I think I think generally our our view of you know our view of how how AI works and what what to build in in in in nai and and the focus on safety are actually pretty similar between the two organizations okay you studied physics or I did a PhD in physics and then you switched into AI or was your physics so my yeah my my physics work was in particular I specialized in in biophysics so as I was thinking about models coming from kind of statistical physics and applying them to kind of kind of model the brain and then then also using using techniques from physics and electronics to make measurements to try and validate those models so you know I come from a physics background but I've been thinking about intelligence for quite a while and how how intelligence worked and I think you know when I did my PhD I wanted to understand that by understanding the brain but you know by the time I was done with it and by the time I did did a short postdoc you know a I was starting to get to the point where it was really working in a way that it you know hadn't worked when I when I started my PhD and so so I felt like you know maybe maybe was starting to be the case the best way to understand intelligence would be to actually directly work on on building parts of it rather than rather than kind of studying the messiness of the brain Ron so that was what kind of led to that switche you want to give a quick pitch for why what kind artificial intelligence is so important yeah I mean you know I think you know you can you can give you know kind of the standard standard arguments that a lot of people are familiar with which is that you know if you think about any technology that you know that humans have created what's allowed us to create that technology you know sanitation flight medicine you know improve improvements in improvements in human health you know provements inability to feed the world you know all this has been generated by our by our intelligence and our intelligence is relatively fixed and so if we were able to build something that was able to match or exceed our intelligence then you know that that would really be increasing the engine that produces a lot of the great things that we do and you know and ultimately maybe maybe immediately maybe it would take a long time would give us much more complete control over our own biology and neuroscience could could make us whoever and whatever we want to be could end conflict war disease that that sort of stuff that that sounds a little utopian but I think if we if we push this technology far enough and and all goes well then that'll be the result either either you know either immediately when we build it or over over a somewhat longer over somewhat longer period of time but I don't see any reason why those things why those things can't happen so I think that's the basic reason to to work on a I mean you know as I've kind of written there there there are these safety issues where we can imagine situations in which it doesn't actually it doesn't actually go well and so to the extent that that's a risk that's also risk that we can reduce and we can also have leverage by focusing particularly on reducing that risk so on both the positive side and the negative side it seems like quite a lot of like there's there's there's just a huge amount of there's a huge amount of leverage to be had right so the previous stuff I was doing was was was in biology and you know it's it's great you can you can help people you can try and try and cure some disease but this feels like it's more even more getting to the root of problems what does the name open AI mean does that relate to the two approaches the organizations taping yeah so I I wasn't actually present at the time that open AI was founded or the the name was chosen so I kind of wasn't the one who picked the name I think there's been a fair amount of misunderstanding I think there's one group of people who think it's kind of all about like like open source and releasing open tools and there's and there's kind of kind of another set of people who I don't think many people think this anymore but who for a while kind of thought that it was about kind of like you know like making uh making an AGI without any safety precautions and just having a copy to everyone solve solve solve safety problems so you know yeah these were these were kind of two to two early misconceptions that were around long before I I joined I joined open AI my my understanding is that it's meant to indicate the idea that open AI wants the benefits of AI technology to be widely distributed I think so they go to the owners assuming that kind of safety safety and control problems are solved and we build a GI there's then a question about well you know who owns it what happens with it what world do we live in after its created and again this is you know I wasn't I wasn't the one who who named this or or set the set the specific mission statement but I think anyone's intention with it was trying to think ahead to you know given given that we built given that we've built an AGI and it's you know it's not it's not it's not wildly unsafe how are its benefits distributed throughout throughout humanity and I think openness is intended to indicate the idea that you know these these benefits should accrue to everyone right that's my understanding AI is a non-profit it is a nonprofit yeah if you developed a really profitable AI how does that work it open a I becomes incredibly rich and then like gives out the money to everyone yeah I mean personally I've personally I've no interest in getting getting rich yeah from from from AGI I mean I think it would do so many interesting and wonderful things to humanity that you know I think that yeah the meet the meaning of money would would change quite a lot and even maybe the psychological motivations that would want me to get get get a larger share are things I could change and might want to change so you know in you know right like in in in many ways like you know shares shares in terms of money or maybe not not the white way to think about it but I think you know there's there's all kinds of stuff that could happen when when when AGI happens and so some of the things I some of the things I think about or you know where where where that could where that could go or what that could mean and you know the the summary is we don't we don't we don't know very much because it's it's something we haven't done yet so a lot of it's a lot of its speculation what do you research here today I so I I mainly work on safety so you know we have we have a safety team that so far myself and Paul Paul Cristiano so Paul was a co-author of mine on the concrete problems paper and you know has has also written a lot online on his blog about about AI oh I'm n is probably one of the people who's you know I think I think I think you know done the most to - you know kind of promote clear thinking about the problem and tie it tie it to current AI we have a third person joining in a few weeks who I'm who I'm super excited about so you know we're kind of trying to build up a team that focuses on on technical safety we also do kind of a little bit a little bit of strategy stuff which is you know how do we get different organizations that are working on AI to cooperate with each other how do we how do we cooperate with with policy makers on questions like these so you know we're also thinking we're also thinking a little bit about about about those issues but but mainly technical safety I also do some stuff that's not not strictly technical safety but is is generally done to kind of stay stay up to date on where where where AI is is is is currently going so I did some work on a transfer learning a while ago that was you know kind of really a little bit little bit safety motivated but you know trying to make environments that are broad enough that it's possible to see kind of a distributional shift or out of distribution problems so that's kind of the range of stuff I work on what's the organizational culture like you know what kind of people does they play I attract yeah I mean I think we've generally been very selective in who we pick so I think generally it's it's people who you know are very kind of very talented machine learning researchers but you know but but also people who you know I would say not not everyone but a large large faction of people here you know really really really do think in terms of eventually getting to to to AGI and you know and and and at least some people you know a significant fraction are you know quite quite interested in or at least at least supportive of you know safety work related to that or related to what to what we do now there's kind of a wide wide range of beliefs on you know how to work on safety how possible it is to work on safety from our from our from our current from our current vantage point so there's kind of a wide distribution of views but broadly people are pretty supportive right and openly I recently moved away from software development is that right to more focus on machine learning so that's not not quite right it's we so I think what you're referring to is we had a project called universe which actually I was somewhat involved in on the machine learning side and the idea of that project was to make a lot of environments that agents could learn using and the way we did this was using something called the VNC protocol to connect to kind of to connect directly to a browser or through pixels and so that would allow you to play kind of thousands of flash games and navigate web tasks it turned out to be the case so so I was actually really excited about this because I saw it as a testbed to study safety right if you have 100 flash racing games you can train agent on one flash racing game and then see how it behaves badly when you transfer it to another flash racing game you can kind of study some of these open world problems where an agent has a very wide space to explore and a wide range of actions it could take and so this is one thing that have our researchers have been working on is teaching computers to play computer games really well yeah yeah this is this is a superhuman level yeah this is this is so deep mind has worked on this with Atari games and so we were kind of taking it to another level with you know with kind of any any game you can find you can find on the Internet so this this ended up being a project that I think could could probably be described as a little bit ahead of its time so you know it turned out that in order to connect this way we needed all the different workers who were applying the our algorithm to be asynchronous with one another and for reasons that were kind of complicated and we only when we figured out we only figured out later actually such asynchronous communication was really hard to make it play well with ml and it it led to a lot of complexity um so we're you know were to some extent de-emphasizing that project now but we've were kind of you know we're actually trying to move to doing the same thing with more City more synchronous environment some so so basically kind of the same idea but you know but more you know more more in a way that's more amenable to ml benchmarking and to measuring how well we're doing and doesn't have this kind of hard to interact with property so it's it's more kind of like we we made a tool and you know it was a good first attempt at something ambitious but it wasn't it wasn't quite the right tool so so now we're working on you know kind of changing it to a version to a version that's better you know I wouldn't say we've gone gone away from software engineering so much as we've been experimenting with hat with how to produce tools and takes a few iterations to get that right okay so turning now to the broader issue of super human AI development what do you see is the potential dangers here why should anyone were you worried about this yeah yeah so I mean I you know my my attitude to start off with has has always been you know although although I do think about I do think about AGI which is a term I I prefer to use than super intelligence because I think I think no one knows whether you know a machine will rocket past human level or not so so that's something that that could happen or not but but AGI is something that I think definitely will will eventually happen so I kind of prefer prefer to talk in terms of that and even even even within safety you know in in concrete problems I've kind of explicitly tried to think in terms of not how powerful the systems are but conceptually what can go wrong with them so the same kind of thing could go wrong with an AGI as could go wrong with you know a very simple agent playing a video game or robot cleaning your house right if it if it has the wrong if it has the wrong objective function if you don't specify its goal correctly you can do something unpredictable when therefore dangerous and so you know in general when I talk about safety I talk about safety kind of generically whether it's in powerful systems or very very weak systems that all that said with respect to powerful systems in particular I mean I think there there is a possibility that if we either do a bad job specifying the goals of complex systems or just they're unreliable in the way that self-driving cars are unreliable you know a self-driving car has to have a very high standard of safety in order to you know trust it to drive on the road right for for almost a decade now we've had self-driving cars that are 99.9 percent safe but that's not enough we need them 99.999% safe and so with AGI which is something that you know it's going to take a lot more novel strategies than self-driving cars the space in which it operates is a lot broader than self-driving cars if you just transpose that kind of safety testing from you know self self-driving cars to to general intelligence then kind of even with all the controls you put on and even with all the safety standards it's clear that kind of at the very least we're gonna have a big challenge in making sure that something doesn't go wrong and you know if something something does go wrong it you know it it would be easy for you know or might be easy for a large amount of harm to be done relatively quickly right so you know you have a you're you're a GI controlling the stock market or the economy or something and it just doesn't know how to do it very well yet and something goes wrong and it takes it takes a long time to unwind that so there's kind of a long tail of you know things of varying degrees of badness that could happen you know I think I think you know at the at the at the extreme end is the kind of Nick Nick Bostrom style fear that an AGI could destroy humanity and I can't see any reason in principle why that couldn't happen if it was sufficient it was sufficiently powerful and you know like safety had been handled sufficiently badly that is that's definitely something that that can happen you know so you know it's I think there are folks at places like Miri who say that this is the default outcome or this is a you know like like you know really really likely to happen or there's almost no way to avoid it or you have to solve some incredibly hard math problem to avoid it I don't generally agree with with any of those things but I think I think this is a possible outcome and at the very least as a tail risk we should we should we should take it seriously I think you know an another thing another thing I'm worried about is that you know the wrong even if we manage to make superhuman AAA I say for an AGI safe then you know the it might it might be used for the wrong ends yeah deliberately use the wrong ends by you know by a disturbed individual or an organization whose views are not aligned with humanity or nation state whose views are not not not aligned with humanity so that that's in my mind the range of risks right so do you think there's much of a chance that there that the risks are being overblown here and in fact we're just gonna end up delaying something that could be incredibly useful and make life a lot better so I mean you know I it may it may very well turn out maybe it's more than 50% chance that as we get closer and closer to AGI then you know kind of it becomes kind of clearer how to make something safe maybe it's just like you know the goals are specified in a way that's that's kind of very very cordoned off from from the tasks that are done you know that that that we that there are certain you know problems of nature like you know scanning brains or something that we need a eyes to do for us in order to get you know in order to gain control over our our biology or control over resources and and then there are human values and maybe there can be an efficient division of labor where there isn't much there isn't much confusion or maybe safety problems are just a bunch of research in machine that they're just a corner of machine learning research where we haven't we haven't done much yet and so we haven't tried and so you know I can I can think of lots of ways and maybe it's even the most probable way where things turn out totally fine but you know I would I wouldn't want to count on it yeah I wouldn't any of those world say the risk was was overblown right so you know it's like um you know suppose suppose you have a fire alarm and someone's cooking a barbecue and you know it's it's smoke and you know you wouldn't you wouldn't call like installing the fire arm overblown right it's just sometimes you'll have a fire and sometimes you won't but you know like installing the installing the firearm is the right the right the right course of action so yeah I think of this I think of this as a precaution and you know I don't really think of it yeah I don't think of anything I do is slowing down the rate of AI progress or at least I'm not trying to do that you know I I think of it as broadening the scope of AI progress and thinking about AI in a more kind of interaction and human centered way if you know if anything maybe it accelerates progress a little bit although that's probably probably probably a minor effect but you know if if if people are worried about progress being slowed down III don't believe anything I do is doing mass causing that yeah so yeah how much of open areas work is focused on on these kinds of problems it's not like 5% of the staff but I guess other people are worried about it - yeah I mean I think you know I think I think broadly most people that open a I are worried about you know or you know at least think these issues are worth thinking about um but you know that's different from you know who is who is actively you know doing their technical work on it so you know I would say it's you know three three or four people now and you know I'm hoping that that that grows somewhat and you know we're actively looking for really talented people but you know I think opening eyes and institution has the the general idea that you know you have to in order to in order to work on AI safety you have to be at the forefront of AI and that you know if also if you're at the forefront of AI you have a better ability to implement AI safety in the final system that's built so you know many people are kind of interested in safety in the long run but I think until recently and even so now I think many people here don't don't you know don't know if there's a way to work on safety right now they're they're kind of skeptical that you're able to work on safety right now with with kind of concrete work and so I've been kind of trying to change that with with concrete problems and with this recent paper that that that that Paul and I wrote on kind of learning complex human human preferences you know we're trying to show that there's concrete work that can be done and that's kind of had a variety of reactions some people are like yes this is exactly what you meant with safety work now I see how it can be done some people are like well that's good machine learning work I don't actually see how it connects to to AGI and so then you know we'll try and write another paper and say okay this is how you know this this is the line we're drawing and this is how we think it it gets us there and you know it actually could turn out that this is mostly just mostly just ml work and the final systems we build are different enough that for whatever reason this ends up not being relevant to safety but again you know I'm pretty happy in that world if there was nothing concrete it was possible to work on in safety and I instead ended up you know doing a doing a different direction machine learning then then then you know that that that ends up being fine it will turn out then it will turn out we couldn't have worked on safety until later and then we'll work on safety later whereas in the world where it does matter it's it's really great and really impactful to get a head start on it yeah so I'm pleased to get your view on this debate that I've seen online yeah so you have this contrast some people like perhaps boström could be accused of this and in the book super intelligence yeah of talking as though once we have a superhuman AI then it will get like very much smarter very quickly and it could potentially just like solve all of these problems it could solve war you know so aging like of all of our health issues but I've seen some people criticizing this online saying you just think this cuz you're a bunch of nerds and you think that thinking is the way to is to where there's a way to change the world the way that everything gets done but it's but it's not gonna be so simple even if you had a very intelligent machine it wouldn't necessarily be able to solve those problems to ever give a view on that debate yeah I mean I think I actually so I think yeah not quite i rephrase the debate a little bit to I think there's an interesting technical question of like let's say I built an artificial general intelligence tomorrow and because it's software let's say I made a hundred thousand of them you know how much does that fundamentally change our our society and our technological capability so you know a lot of it is just you know you can look at individuals throughout throughout history that you know that managed to discover a lot more than than than other other other individuals right you know you look at you know you look at you know yeah fun fun you know fun fun fun knowing men or you know or Einstein or one of these figures who just you know managed to be leaps and bounds ahead of others and the question is like what's what's the ceiling on that um you know if we if we if we you know invented AGI tomorrow would it take like a couple days to scan all of our brains into into you know into software upgrade us give us indefinite life extension or you know would it just be kind of like oh it's it's you know it's it's more humans to talk to um and and I think I think it's actually complicated I think some people act like it's obvious one way or another but it's it's not really something I it's not really something I have a lot of certainty on in part because I think you know modern science has experienced a lot of kind of diminishing labor like like machine returned diminishing return like the depletion of low-hanging fruit and so you know it could turn out that like solving biology is just this kind of exponentially complicated combinatorial problem or it's limited by data and experiments of course maybe the jeans will like allow us to do the experiments much much faster then there's some limit on the physical reaction time of the biological systems um when you when you put it all together do we get zoom you know to do something much much much faster than we ever could or would do we get just some kind of mild acceleration of what of what humans can be doing I I feel like many people act as if as if the answer is obvious but as someone with a with a background in biology even even thinking about all the directions in which machines can optimize it my my guesses machines could probably make things happen pretty fast but I think there's huge uncertainty here and I don't really think anyone knows what they're talking about on this question yeah because my background is in is in economics and I an imagine if you had an incredibly smart AI and it was trying to figure out macroeconomics like I understand understand recessions and and booms and busts cycle I suppose it could have like what sort of conceptual breakthroughs but you can only take the measurements so quickly and you can't really run experiments so it could end up being that you know the processing of the data that we get is extremely good and very fast but then then the data on the code comes in so quickly learn there's there's some subtle stuff which is like I wouldn't be surprised if for example are really powerful I wouldn't be able to understand our macroeconomic systems because of this data issue but it would be able to design a better Accra macroeconomic system from first per so it's it's weird there's there's some stuff I feel like you know you just you just redesign it and you can do it much better and there's a there's there's other stuff that um you know that that it is just it's just really difficult so I find this puzzle you know I I don't I don't really have any agnostic yeah I'm pretty agnostic on and I don't really I don't really have a good answer on the kind of like nerds think I think a I can solve solve everything a question I mean I think there are some deep set problems in in in human nature but you know and and so you know just just solving resource constraints is and isn't gonna solve war we've you know probably in some ways already solved solved solved resource constraints but you know maybe maybe maybe having having true AI will allow us to redefine what it what it means to be human and ultimately will to me well will elevate ourselves above our petty human bickering or maybe the petty human victory I'm half therapist will prevent us from being able to elevate elevate ourselves so we'll be stuck in levels of pain dickering perhaps yeah I don't actually know it's it's it's actually very hard to know yeah so we've talked a mentioned a few times this paper concrete problems in air safety so let's dive into that but before we discuss those problems what was your impetus for writing it so I mean you know I kind of had been aware of the work of the AI safety community for a while but had you know had in general felt that I wasn't particularly happy with the way they were phrasing things it didn't seem like what they were describing was was actionable and there wasn't a lot of tie to like you know AG I was generally discussed in these very abstract terms as you know how it having a utility function and kind of having incentives to do this or that and you know discussing things at these very abstract level I couldn't help but feel there were a lot of implicit assumptions that were not really being discussed at the same time you know the the mainstream machine learning community which I you know been been a part of for about a year and a half having a lot of experience with speech recognition systems one thing that I you know I found about neural nets is that they're very powerful but they're very unevenly powerful so you know the the key example I gave early on was you know you can contain a speech recognition system on you know 10,000 hours of American accented data and then it's with for something with an American accent it gets it perfectly then you give it someone with a British accent or an Indian accent or something and it just does terribly on it and you know of course if you train it on enough diversity of accents then start to generalize and better but you know generally when we build engineering systems that kind of you know silent random failure you know it's it's not something that that we see the desirable property in in systems we build particularly safety critical systems so you know the the idea that you know fixing those kind of problems was was not just kind of a one-by-one thing where we're like oh we're using a neural net again in a self-driving car let's statistical test for everything we can get or you know we're using a neural net now in a drone let's let's make sure it doesn't shoot someone that you know that that we could have principles behind what what what gives us guarantees on the behavior of a system or at least what what gives the statistical guarantees that seems super interesting to me and it really didn't seem like seemed like very uh it seemed like very very few people were we're actually working on it so I yeah so I you know me and some of my colleagues Chris Chris all at Google Paul Cristiano who's now at open AI Jacob Jacob Steinhardt at Stanford John John Shulman here and and MNA who is another another Google or kind of you know you know had had all fought a little a little bit about this problem and so you know we decided to kind of get together and write down all of our ideas in in a paper that would you know kind of lay out an agenda for what why we think this is a thing and in particular I think um you know I felt that the machine learning community as a whole was like a little bit confused I think that they largely thought a I safety was about you know fears that that a eyes would kind of like malevolently rise up and attack their their creators um and and and even when they didn't think it was about that they worried that the people who talked about AI safety will feed into fears that it's about that and and so I you know I felt I felt like this was kind of a silly state of affairs and that you know like of course we can do research on making systems safer and more reliable that that doesn't kind of trade on these fears and and in particular we can even do research that ultimately points towards AGI I think the important thing is that you know we shouldn't go around with every other word we say bean bean AGI in particular like the research itself shouldn't be specific to AGI you can't really research AGI now because we can't build an AGI so you know a very I think a very standard technique when when doing you know research on a topic is you know if you want to think about a topic that's kind of abstract or in the future then come up the short-term bridge to it that lets you think about something conceptually similar in a way you can empirically test now and so that that was the general philosophy behind the paper and the philosophy behind the follow-ups that we we and others have have done to you know to implement the research agenda described in the paper yeah this so what are some of the concrete problems so I can I can I can I can I can go into them briefly you know I think we we kind of made a distinction between problems that we late to what happens if you if you don't have the right objective function and what happens if you do have the right objective function but something goes wrong in in the process of learning or training the system so you know not not having the wrong objective function the extreme version of that is kind of what you know what's what's what's talked about and you know kind of the classical AGI safety stuff which is you know you want to specify a goal and you you know you kind of for whatever reason you know you had some simple instantiation of the goal and it kind of ends up not not quite being the the right thing so you know we call that the Janie problem yeah yeah we call that reward hacking and so you know a few months ago using a an environment in the now emphasized universe program I had a kind of an example of like a boat race where you know the boats supposed to the boats supposed to you know go around and do a few laps and you know what it's trying to do is finish the races as fast as possible yeah but you know the only way it's able to get points and you kind of can't change this because it's the way that the game is programmed is you know you kind of you get points as you pass targets along the right but it turns out there's this little Lagoon with all these targets and the targets also give you turbo so they make you go faster and faster so you can just loop around in this little a little tiny Lagoon and not finish the race since all right so it's like you know in one sense you shouldn't be surprised it's the correct solution it's how you get those points but the idea is that you know the the mapping from well this is the reward function to this is the behavior that it leads to it's a very twisted mapping yeah and so the point is that it's not what we have intended it's not well you would have intended to be so the the the the the lesson of that is that it's it's very easy to make small changes in the reward space and have that lead to big differences in the behavior space and also for that mapping to be very opaque for you to look at a reward space think you know what it means and in actuality it it leads to something very very different then then what you kind of you kind of you kind of would have would have expected so you know we we called that generalized reward hacking and then kind of there's another problem called called the called negative side-effects which is a little related to that which is just that if you're if you're if your reward function relates to a few things in your environment and your environment is very big then there's kind of kind of a lot of ways for you to do destructive things so it's one particular way in which it's easy to specify the wrong the wrong reward function because you haven't put in side constraints yeah you haven't you haven't explicitly put in the ten thousand you know ten thousand other things other things you care about then there was this thing called called called scalable supervision which is you know if you're you're a human trying to specify a goal to a to a machine learning system even if you have a clear idea of what it is that that needs to be done then you know you know you don't have enough time to control every and you know to control or give feedback on every action that an AI system does and and you know and therefore you know limits to your ability to control and supervise can lead to a system behaving in a way you hadn't you hadn't intended because it's interpolating in the wrong way um so those are kind of the problems with you know like the you know the the classical AI safety type problems of like you had the wrong you know you you somehow you gave yourself the wrong goal in a way that was hard to understand and then the you know the the kind of more technical issue problems that relate to you know your system was trying to do the right thing but something went wrong it's things more like this thing we call distributional shift which is when your training set is different from your testing set and so you know the classical example of this is like you know when when when when I was at Google there was there was an incident where you know Google's photo captioning system had been you know they had this photo capturing system that was trained on a lot of photos and it turned out that mode the photos were you know statistically biased to be photos of Caucasian people and you know and there were also a lot of a lot of like animals and monkeys in it so unfortunately the system you know the system reacted you know when when when a black person took took a picture of themselves in the photo it tagged them as a gorilla because it had only seen humans with white skin so this was of course incredibly offensive and Google had to apologize for it and you know they even had you know that they'd even kind of fought it this a little bit ahead of time but the neural net ended up being so screwed up that it didn't even warn them that you know that it was kind of in a region of the state space where the it was kind of kind of dangerous yeah so you know it's because the algorithm has no concept of it's just a statistical learning system you know it doesn't doesn't know about it doesn't know about racism it doesn't know about racial slurs it doesn't know about what's offensive you know it's just it's just it's just it's just a learning algorithm and it just learned from the data was given and you know the there turned out to be some problems with the data that it was given and there's another view some problems with the algorithm and and so you know it just it just you know kind of innocently produced this extremely offensive result and and you know this is the world of neural nets is is is full of this you know I think something related to this distributional shift is uh adversarial examples which you know my colleague in Goodfellow works on a lot which is kind of when you intentionally an adversary really try to disrupt an input to a machine learning system and make a very small change to it that kind of causes something bad to happen they're a little complementary adversarial examples is like a small but carefully chosen like you know perturbation to it whereas a love distribution is this kind of like holistic perturbation to it so resistance against those two is is kind of it's it's it's it's separate right you're talking about two orthogonal directions and in the perturbation space but you know these are these are all issues with you know making sure that when you when you train something that um it you know it behaves in a new environment the way you would intend it to behave or if it goes wrong but it fails gracefully and we haven't put a lot of I mean we you know we put some work into this area we site a lot of papers in the in in the concrete problems paper but I think you know you know kind of relative to the stampede of work in in in in mainline a I I'd like to see more of this stuff hmm so I think you did an interview with the future of life Institute where I think you talked about this paper about yeah for about half an hour people who are interested can go and listen to that and you'll get more detail on each of those different five problems but yeah how do these problems tie together that the long-term concerns with with the short-term ones that we have today yeah so I think I think the attempt was kind of to come up with some conceptual problems that relate to both that have long term and short term versions right so you know with something like the distributional shift the short-term version of it is something like the gorilla the long-term version would be something like well I've you know I've trained an AGI in a simulation and then I put it in the real world and a lot of things are different so does it does it break a lot of stuff without meaning - yeah um and you know the super intelligent version of it is you know it's like whatever but Marx trains Morris a more extreme it's building a Dyson Sphere it's never built a Dyson Sphere before like something go wrong you know just whatever whatever outlandish thing you can think of so um you know I think I think I think the point and the explicit strategy was that you know people often kind of contrast long term versus short term approaches as if working on short term safety and long term safety or like different topics and like they trade off against each other what I'd rather do is have a thread running from long term to short term things where you kind of you kind of identify what the fundamental problems are right and then then you work on them on on short-term problems and then as the system's get more powerful you update your techniques and okay it creates this kind of more symbiotic where you're following along you know I think safety shouldn't be anything different from reinforcement learning right reinforcement learning is a general paradigm for learning systems you can use to do something as simple as walk across a glitter grid all the way up to plane go all the way up to you know perhaps built you know building a building in a system that's as intelligent as humans probably wouldn't literally use reinforcement learning but you know the it's the reinforcement learning is a general paradigm that runs from things that are very simple the things they're very complicated so I guess the idea was to do the same thing for safety come up come up with some some general principles that will will care well you know will carry across towards very powerful systems not and I wouldn't say these problems you know tell you everything that could go wrong with powerful systems I think they're almost certainly things that are very specific to powerful systems but you know my my general view is I'm much less confident in our ability to identify those problems maybe we can some people are trying but you know let's uh my view is just you know there's there's a lot there was seemed like there was a lot sitting on the table let's identify the problems we can identify let's work on them and then then then whatever whatever's left we either have to work on them very late in the process or you know or maybe someone can identify them but but that seems like the higher hanging fruit yeah so so the hope is that in order to solve the long-term problems you want to find cases that are similar today where you can get feedback on whether it's actually helping yeah yeah exactly i think that i think that there's kind of a magic of empiricism because it's it's very easy to kind of engage in long chains of reasoning about a topic that that kind of don't get tied back to to reality of course the risk of working on short-term stuff is that it it doesn't matter it doesn't generalize and so you know the compromise i've come up with is try and think of things that are conceptually general and then try then try to try them into into empirics yeah so to that end has opening I made any noticeable progress on these problems or other problems yeah so um you know I think about about three weeks ago Paul Cristiano and I and taught Tom Brown here and three people at deep mind including a young young leek a Martin Mill Jack and Shane Lake came out with a paper called deep deep reinforcement learning from human preferences so this kind of works on the reward hacking scalable supervision side of things so the way this paper works is you know normally you have arena or any algorithm it has a goal or a reward function and you um you know the the agent acts to maximize that reward function so this works pretty pretty well for you know for something like you know like chess or go where the behaviors are incredibly complicated but evaluating the goal is pretty easy right you know with go it's like you know are you in a winning position you have more territory with chess it's have you checkmated the king or have you been checkmated and so it's really easy to evaluate these simple goals with a script and so you can you can run the algorithm through you know millions or even hundreds of millions of games and you know the goal evaluation is easy but you know most of the stuff that we do in in real life the goal is complicated it's like you know care carry on the conversation or you know be be effective personal assistant to you know to to a human which which means scheduling things for them making their life easier but you know not emailing all their private information to their boss or whatever um you know there's a lot of like context-sensitive stuff which is you know which is kind of part of what makes some you know as part of what leads to safety problems if I take a complicated set of goals like that and I try and forced it into the framework of a hard-coded reward function it's gonna lead to something that makes everyone unhappy um because you know it just the two things don't don't fit together right you know that it's maximizing on one dimension and then fails and all the others or or just that the intrinsic number of bits of complexity in something like hold a good dialogue is is very high and so if I try and program that in I'm either going to be programming for a very long time in which case I'll probably make an error or if I try and make what I program simple then you know then it's you know it's it's there's just not going to be enough bits of information in it to fit the actual complex nature of the goal so I'm either gonna be very error-prone or I'm just not going to be capable of learning but what I need to learn and so that that's you know that's that's why people talk about you know like strategies for absorbing values and things like this so um what what our paper basically does to address this is it replaces the fixed reward function with a neural net based model of the humans reward so the idea is you have a reinforcement learning agent that's learning and and you know it starts acting randomly and every once in a while it gives it gives some examples of its behavior to a human so it'll come out with two video clips and you know the human looks at the video clips and says is the left better or is the right better the human says left is better or right is better so you know if it's if it's plane plane pawned or something then you know if the left is you know point got scored on you and the right is you scored a point then the human will say the right is better and then you know the the agent builds a model of what reward function would lie behind the humans expressed preferences so the reward function becomes something kind of implicit and learned observed from from the humans behavior and then the oral agent gets to work saying yep this is what I think the humans goal is I'm gonna go and try and maximize this but then it comes back to you and it gives you more examples of behavior and then and then you know the human decides and those and so over over time it you know the human is given kind of more and more subtly different examples of behavior and the reward predictor in response learns to discriminate them and get some more refined understanding of what the human prefers and then the our logarithm then tries to maximize that and then the consequences of its behavior are then given back to the human so it's this kind of so I kind of three steps like the human and the AI that's trying to figure out what the humans optimizing for and then the thing that does that but then like most of the time it's asking the intermediate AI is that right yeah that's the time it's feeding it back to do this intermediate model of what the human wants every so often three parts you have a model of what the human wants you have the ARA algorithm that's maximizing that model and you have the human that feeds that that that trains that trains the model but also the RL algorithm feeds back to the human so that it basically whatever the RL algorithm has learned to do goes back to the humans they so you know it basically says okay is this what you wanted or of the things I'm now doing which do you want more so it's this it's this kind of gradual preference elicitation which kind of helps to get around the you know well if you get things wrong by a little then you get the wrong behavior it's kind of unfolding behavior in real time and incrementally showing you the consequences of you know of the behavior that you've seen so this you know by no means does this you know solve all safety problems you know it's just it's just you know it's just you know one little bit of progress on one one brick in the wall yeah and one one one safety problem but um you know this is an example of the sort of sort of thing I'm talking about and so you know we use this both to solve ML tasks that you couldn't solve before because the reward functions were too hard to specify and then kind of the impact on safety is is is obvious because you know it allows us to specify goals more easily now there's all kinds of other problems you can have with it it has it you know it has to has to scale there are other safety problems you don't want AI systems tricking you there's there's there's so much but this this kind of thing is you know the there's an example of what we did and I think you know we're gonna try and do a lot more of it so how many times do you have to get feedback from the human to solve these problems is it a reasonable number yeah so it depends on the task but you know on some of these on some of these Atari games which you know take about ten million time steps to learn um usually you know a human has to give feedback a few a few thousand times okay um so you know less than less than one percent or a tenth of a percent the human actually has to pay attention to we managed to train this kind of simulated like little noodle robot to do a backflip um with a few hundred a few hundred time steps so we you know that's kind of the human clicking for about thirty minutes or so but we're trying to get that number down because so this is the learning from human preferences page yes yeah okay we'll put up a link to that and you can take a look at this this little worm thing here that that learns to jump progressively from just flailing around yeah a little bit of media coverage my favorite headline was what this backflipping noodle can teach you about AI safety seems quite a bit here's I think it was here here's what this backflipping noodle can teach you about about AI safety but that's some good click that yes so apart from those five issues that you talked about in the paper what do you think of some other kind of important problems in the open problems in the field yeah so one thing we didn't discuss in the paper is the issue of transparency of neural nets so this is kind of trying to figure out why a neural net does what it does which you know you could eventually extend to why is a reinforcement learning system taking the options it takes right it just kind of has a policy it you know it's in a situation that runs a bunch of things through its neural nets and it says you know I'm gonna move left or I'm gonna bend my joint and doesn't really have much explanation for what it does so if we could explain why you know break down the decisions made by made by neural neural nets then you know that that could could help with feedback could help with making sure that systems do what we want them to do and that they're not doing the right thing for the wrong reasons which might mean they would do the wrong thing in another circumstance yeah so you know I think that's a pretty pretty pretty important problem my co-author on the on the paper of Chris all I did a lot of work in that area with a deep dream which is you know just kind of like all the back propagated images generated by neural nets that was originally designed to be a way to to visualize what maximally activates a given neuron within within a neural net so it was it was initially a transparency technique and and you know so that's an area that that Chris is very excited about so that that's another area I think we should work on I mentioned before adversarial examples I think that's uh I think that's an area it's already getting a decent amount amount of attention but probably should get more like everything in safety should get more um so you know I think I think that's that's an area we should work on and also that has like short-term safety implications like you know someone could like you know sabotage a self-driving car with adverse cell examples and we certainly certainly put them on that we can't have that yeah yeah interesting so is that a problem for the rollout of of self-driving cars now that someone might put up a sign I'm confused I'm not I'm not the expert on it and I I definitely don't want to give anyone any ideas about how to do that I do it would certainly end up yeah criminal I would think right it'd be extremely it'd be extremely legal so I don't I don't actually know the details of whether whether that's yeah that's feasible or not and wouldn't discuss them if I did of course yeah so as we as we progressively work towards being able to control you know yeah I I is the were developing tea do you think it's going to be possible for people to understand the solutions that we've developed so here you've discussed this like three-step process but she like trained yeah machine reinforcement learning algorithm to understand the humans and then that trains the machine learning algorithm on the other side yeah so I can kind of understand that there's other like big breakthroughs in history that like you can kind of get like you know quantum physics like it's a particle and half a wave and you leave grass but do you think it's gonna look like that it would just be like impossible technical details itself for the safety I guess the the way that we're going to get machine learning or like other AI technologies to do what we want rather than flip out I mean I I guess there's two possible questions one is are we going to understand at a very granular level every decision that's made and then there's are we gonna understand the principles by which the system operates I think we'd better understand the principles by which this is some operates if we if we don't understand those I I don't know how we can build them and if we did build them I would definitely worry about their safety so I think it's it's realistic to understand the basic principles on which something is built um but then there's a question of on what level of abstraction do we do we understand it right the principles on which you know a visual neural net are built are very simple it's you know backpropagation and alternating linear and nonlinear components that's pretty much all there is to understand so then the question is you know how much do we know about what goes on inside the neural net and that's kind of the question of transparency so I'm I'm optimistic that we'll gain a better understanding of transparency inside neural nets then the question is how does that actually help us on safety how do we actually use it there's a lot going on inside neural nets even if we could individually understand every piece of them how does that actually how does it actually help us there's more units than I can you know that I can read and understand so I have to have some way of transducing that into something actionable like correcting bad behaviors or something so so somehow that component of it has to fall into place as well and I don't know yet how that's going to happen I don't I don't I don't know if it's possible but I think it's it's a urgently important research area yeah all right so let's turn now to how someone might be actually actually able to pursue a career in AI safety one of the natural paths to getting a job at open area or other similar organizations yeah so so I think my my advice is gonna be you know focused on you know kind of the kind of AI safety work that I am excited about so for example you know Miri does some safety work that's kind of more based on you know mathematics and and formal logic and so you know you if you wanted to do that you'd need kind of kind of different background but the safety work that that I'm most excited about I think you know just you know it sounds obvious but the two things you most need are you know an extremely strong background in machine learning and you know a real deep deep interest in in in in in AI safety you know I think to break those down I think the first one is you know certainly at open AI you know we really try hard to you know to have a really high bar for hiring people so I'm you know you know just because someone wants to work on safety doesn't mean that we you know lower the Machine machine learning bar you know at all we have a lot of people here who are who are very good so you know going to you know get a PhD and in in in in machine learning going going going to get a PhD with you know you know trying to work with the vet the best people you can work with do the most groundbreaking work you can do there's there's kind of you know no no no ceiling to you know how much of this you know helps you know my sense has been that people who have a deeper understanding of machine learning um if they're interested in AI safety also also tend to really grasp grasp AI safety issues are better provided they think about them and that's that's the second component which is you know I want people who really have a deep interest in in safety not just you know oh you know oh you know it would be good if if systems you know didn't didn't you know if we go to self-driving cars didn't but have have kind of a broad view of you know where we're going with with AI which could be totally different from my vision might might not involve and might not involve AGI but this general idea that we want to build machines that that do what humans want and carry out the human will I think that you know I think that idea is is is is is is a broad one and I want people working on safety to have a broad broad view of that issue so you know in the EA community I don't think the second second one is is is is is lacking there are many people who are passionate about the second one so I think I think the limiting factor is just a very strong machine learning talent right yeah so we just wrote a career review of doing a machine learning PhD which yeah for which we'll put up a put up a link to and you can ever read it is a machine learning or bust other other other options like other PhDs that people could do that there could be relevant like computer science my PhD wasn't wasn't in machine learning right so so you know and we have a number of people here who you know how backgrounds in you know neuroscience or another area of computer science or mathematics or physics so it's entirely possible if you happen to be educated another another area to to go into this field but I think going forward if you're if you're a young student I don't particularly see a case for doing a PhD in another field if what you want to do is is machine learning I mean I guess I guess I'm saying it's it's pretty easy to convert skills and related areas and you know sometimes it gives you perspectives that you don't have but if you want to do machine learning you should should get a PhD in machine learning you know I think another thing I'd add is you know we do have some people working here who don't don't have PhDs um you know my my co-author crystal actually never never even went to college right Wow he you know he just he just straight she just straight up went up to went to Google and you know he you know he had to do a lot to prove himself right the the level of technical ability you need to show is not lower it's even higher when you don't have the educational background but it's it's totally possible so you know yeah I would I would say the most important thing is to being able to do a lot of impressive and creative creative machine machine learning work I would I would even go so far as to say it's not it's not my expertise but even even the people doing safety work that doesn't involve machine learning I get I get pretty nervous when they don't have a strong background in machine learning because even if they think that a machine learning system can't can't be made safe they should they should know enough to understand why why they think that's the case and what they think the alternatives are so that includes I guess people doing like mathematical research yeah philosophy research I if it if it relates to if it relates to AI safety I would say that even those people I would encourage them to learn as much machine learning as possible if only because they should understand approaches that they're that they're critiquing yeah is it fair to say that you think that the approach you're taking where you study machine learning and try to actually improve AGI is is is the best way to make AGI safe that I mean you'd rather see someone do that than go into these kind of other adjacent areas you know it's it's a little complicated because I think that as systems get more complicated there may be ways in which we kind of combine neural nets with formal reasoning so there's been been some work by my friend Jeffery Irving and some of his colleagues on doing theorem proving so basically using neural nets to select the lemmas to be used for next theorem so you know if you take that far enough you can imagine versions of you know reasoning systems that basically you know they traverse some well-defined reasoning graph and you know they make kind of logical conclusions that are tractable but it's all driven up the bottom by neural nets driven kind of intuition or the neural Nets decide what what conclusions you draw and and where your thinking goes and so I think this is how humans do symbolic tasks like you know like physics or math or anything like that right we're we're all nets at the bottom and then we have a layer on top of that that is kind of we use those neural nets to represent symbolic reasoning and computer could probably do that even better because it can make sure that it never makes a mistake in this in this symbolic reasoning that the symbolic reasoning engine is there so you know you can imagine having formal guarantees on that kind of formal formal formal reasoning but I think I think when we get to that it'll look different from the way things are currently being done so I'm really not against using formal formal formal reasoning methods and using mathematics but I I think it'll be possible to do that work more productively once we understand how it fits in with with current systems I actually actually don't know maybe the stuff that's being done now is uh is is is is is productive but I'm pretty suspicious of anything where you can't get that tight empirical feedback loop because I think it's really easy for people to fool themselves so it sounds like the Kay piece of advice is to a PhD in machine learning so um where might like what universities can people go to like what supervised yeah II suggest so I do want to repeat again that uh it's you know there are ways not not to do a PhD and in particular you know number of people go to PhD for for a year or two and then do an internship here or a deepmind or somewhere and you know then then get hired so even partial PhD can work but uh you know that said I mean you know I think the usual suspects are places like Stanford Berkeley you know Cambridge or Oxford in in the UK you know Montreal yeah actually been Joe's group is uh you know pretty pretty well known for you know for doing a lot of good stuff and then you know they're they're kind of a number of other places right and definitely you know the PhD pools are where we hire you know a lot you know uh a lot of our folks cuz you know we know we know a lot of the we know a lot of the the relevant professors but you know but again we have some people here who didn't didn't didn't do the PhD work I think I think the most important thing is being able to keep up with the literature and make creative original discoveries that you know that are our novel and that's day on pace with what what everyone else is doing if you can do that then that's the best thing when when I talk to people who I want to switch into machine learning from from like another field the advice I always give them is just get every possible model you can if you're trying to learn supervised learning get all the imagenet models like implement them yourself read a paper implement it read a paper implement it same for supervised learning same for generative models you just kind of get this knack for it after you've done it for awhile and it's it's really just kind of practic hands-on experience and you you just kind of get a you get a sense for these things once you've done them for a while and you also find out quickly how good you are yeah so I might be showing my navigator to ask this but the machine learning is kind of only one way that you can approach AI right there's like other paradigms this is this has historically been the case it's it's gotten the little bits I think complicated you know in in the old days we had kind of things like expert systems that were based on kind of purely on logical reasoning and you know generally we found that those systems were very very very brittle because you know they couldn't represent the kind of high dimensional space that we see so for example a vision system that's based purely on rules it's it's difficult because you know like if I'm trying to identify a face or an object I mean you know I'm trying to identify these kind of blah blah base and so you know in some sense these problems are inherently statistical and so you know the the rule-based systems actually don't don't end up working all that well so but you know historically you know there were statistical systems and there were rule-based systems you know I think we can say now that the statistical systems have pretty pretty decisively pretty decisively one I you know I sometimes hear people say things like I don't think you know AGI could be built using machine learning or I don't think age I could be told safely using machine learning or something I think when people say that they aren't really thinking kind of carefully about about the alternatives so I'm quite sure that kind of like a pure rule-based system is is not is just not gonna work because of the thing I said that you know you're you have to ground what you're doing and sensory information and the the sensory information is just inherently statistical and fuzzy so you know pure rule-based systems I think are not going to work what could happen is the thing I described before where you have machine learning systems you know deep deep neural Nets being used to drive logical reasoning systems so that would be a hybrid of the two but you know people are working that that that would be considered within the field of machine machine so you know I think we'll often in the future of machine learning may include logical reasoning processes but you know they'll they'll be kind of they'll be kind of at a higher layer and so you know what what ends up you know happening it will kind of involve reasoning in the same sense that a go playing system involves go right but it won't really be like those rule-based systems that that doubt that we had before I mean you can't be 100% percent sure of this but I think just the basic argument that like percepts are these statistical blobs and so you have to use a sista chol system at least at the beginning to measure them and then then whatever concept you draw from them are kind of fun you end up being fuzzy statistical concepts and that if you want to bring if you want to bring those back to logical reasoning the reasoning has to exist on kind of a plane that's abstracted from that yeah so there's not some other AI paradigm that people should be doing if you know deep studying yeah I mean I don't again like you know in in history of AI there were a lot of a lot of rule-based systems but you know then then there were critiques written of them so I think I forget the guys name Hubert Hubert Dreyfus or something who basically he was like this continental philosopher who who wrote this critique that people found really hard to understand but what what it was really saying was just uh you know percepts or these statistical blobs and if you make a rule based system you're not gonna you're always gonna make mistakes and your system is always gonna be brittle and wrong a lot okay so assuming someone someone has been studying you know machine learning or one of these other related areas that that's a potential path in it is there a natural path from from studying to actually getting a job at open a aia or another organization or do other intermediate steps that people have today yeah I mean I think you know again PhD students who you know who do impressive first author work on papers or people who you know we're generally very excited to to interview so you know if ya if you're if you're in a good ph.d program and you do some work then I definitely you know do some good work I definitely encourage you to apply here for people who are relatively early in their careers or come from another background there's a program at Google called the what's it called the the brain residency program that allows you to kind of study machine learning with experts for a year and then that kind of allows you to you know to like you know to train your skills and we've had a number of kind of residents who have applied here or elsewhere yeah and so that ends up being a good thing yeah so yes speaking of which this so there's a bunch of different organizations right so so the zip nai there's google deepmind yeah brain yep Carius was another company maybe more in the past and there's the human compatible AI yep Oakley you know like go through a couple of these so um you know as I said opening I in deep mind are probably the most focused on reinforcement learning and and that probably spend the most time thinking about about AG I know everyone here does but you know it's it's a focus here more than it is elsewhere a Google brain is where I was before before coming here that was kind of the original research group at Google and they're I would say a more kind of decentralized group that works on a wider wider set of topics yeah and you know the Chris all of their thinks about safety there is you mentioned yeah the senator for a human compatible AI is Stuart Stuart Russell's group so you know we collaborate with them some we have some some interns from there come here so you know I think you know Stuart's been someone who's been thinking about safety for a while so that's that's another good place to work on it vicarious to my knowledge doesn't think about safety oh I'll be wrong but yeah and other other other other groups that we've missed here it is pretty small are there any government research projects or not nothing anything in China not that I'm not that I'm aware of I mean you know of course there's there's there's Mirian FHI all right yeah so the future of humanity Institute and the machine intelligence Research Institute you know the so they're but they're less doing machinery less on machine learning although I think Stuart Armstrong @fh I collaborated a bit with with deep mind and something that I think was was like broadly machine learning related this was like studying interrupter incorrigibility or something like that yeah so yeah okay yeah so we we talk to people reasonably often who would be interested in in doing like a job like like what you're doing is there any way that they can get indicators early on about whether whether that's possible or whether they're just kind of wasting their time and and should look at other options be a good fit either they don't maybe have the you know machine learning chops or that culturally they're not gonna be a good fit or something at some other reason so uh I think I think the thing I mentioned about implementing lots of models very very very quickly if you want to know whether whether you're good a way that is a good proxy for how well you'll do in in in grad school as well as for the tests we give when people apply here is a you know fine find a machine learning model that's described in a recent paper implement it and try and get it to work quickly if this is a painful process for you and you you really don't like doing it then you aren't gonna like you know you aren't gonna like any the research that we do either on AI safety or other or other or other or other AI stuff if you find you can do this quickly and or you really really like you really really like doing it you find it addictive then you know that's an indicator that this is something this might be something you really want to do I wouldn't you know I wouldn't I wouldn't worry about the cultural stuff like if you're um if you're skilled in this area and passionate about this area then I don't think you'll have I'd oh yeah I don't think it'll be a barrier I don't think I don't think you'll have I don't think you'll have any problems we try and be as open and welcoming as we can I mean we don't we don't we don't we don't have the we'd have the luxury of selecting you want anything other than their favorite TV shows yeah yeah I mean why it's you know that's just that's that's just that's just wasteful and pointless yeah absolutely so how early can people do that test can they do that as an undergraduate or like a PhD level thing I mean people can do it in high school like you know if you know you meet a seventeen-year-old or like how do you get into machine learning I'm like just go home and implement these models you know you know you actually don't need any kind of formal education you probably need a thousand dollars to buy to buy a GPU I have I have kind of considered various times like you know she we have like a like a kind of like grant program where it's like you know if you're like 17 years old and you want to get into machine learning I'll just like buy you a GPU mm-hm yeah yeah why not yeah interested in AI safety I mean you know that thousand thousand dollars you know most most most you know adults living in the developed world can can the for two thousand dollars but most seventeen year olds might not be able to so you know if they don't already have access to one you know might be a good way to get people started early yeah so okay so if there's a 17 year old listing here who wants to go and vote that machine like what should they Google for what should they start reading I mean you know you know because I'm from the kind of you know opening I slash deep mind direction research you know thinking about kind of reinforcement reinforcement learning you know trying to so what's the difference reinforcement learning and machine learning so uh so you know machine learning is kind of the broader topic and within it there are several different areas there's supervised learning which is where you kind of try and predict some data that's been labeled so an example of a supervised learning problem would be like you know you're given images and they correspond to objects so this is an image of a dog this is an image of a cat this is an image of a computer and you train the network on lots of pairs of here's the image here's what it is and then it learns over time to map the two to each other supervised learning has this kind of static quality where it's kind of like a one-off you're trying to like you know predict one thing from another reinforcement learning is is more a setup where you're interacting in a more intertwined way with an environment so you know it's the game of go is like this you make a move and then the opponent or your environment makes a move and then you know and then you make a move again and overall you're trying to win the game and you know the the reward or figuring out whether whether you've won the game or how well you're doing can be you know delayed by it by a long time and so the reason I focus a lot on reinforcement learning and you know why I open the eye focuses a lot of them is that you know reinforcement learning and things things like it extended versions of reinforcement learning seem like a better fit for what intelligent agents do in general right you know often often I have very long range goals right I'm you know trying to get an education try trying to get a PhD try and try and trying to have career you know try trying to start a family or something and these are all things that unfold over over years and involve interacting with my environment in this very very complicated way so our reinforcement learning is kind of the only paradigm we have that even comes close to capturing this okay sorry I cut you off we will figure out what the seventeen-year-old should read to get that foot in the door um so lots of papers in in reinforcement learning I'd read about what's called DQ n just Google DQ n it's not a common acronym deep deep Q learning that this was a paper done by deep mind in 2013 policy gradients on particular a3 C and just kind of follow the trail of recent reinforcement learning things that have showed up on archive um just go to our machine learning on reddit and look at some some kind of you know recent papers and the deep deep neural net literature look at them try and re-implemented get results as good as the results that others get it's really pretty self contained and you don't need that much help if you're having trouble getting started implementing them then you can start by for many popular paper's like DQ n you can find an existing implementation then start with that and try to fiddle with it to see if you can make it better yeah and what kind of program are you running presumably you're not doing this in Excel what's the no it's you know typically you know you'll use the the typical is you know Python with tensor flow so tensor tensor flow is this this tool that the Google brain team made for um you know for doing general computations but in particular deep deep neural net computations and so you'll find a large fraction of the stuff is implemented in in in tensor flow or some some similar framework so you know Python is pretty easy to learn tensor flow is pretty easy to learn and so you know read some tensor flow code of some stuff that's been implemented learn tensor flow and implement some stuff yourself okay great yes so what's the what's the range of roles available is it yeah how do they vary okay so you're talking about within machine learning or within safety-related I guess mostly within safety I think okay so I mean you know I can I can talk about what's being done here at open a I I would say there's two main directions there's the technical safety stuff and there's you know the the kind of you know policy side of things so on the technical safety you know there's not a lot of people working on it yet but you know the human preferences paper that that I showed you is a good example of it a lot of the papers we cite in concrete problems are good examples of this work deepmind has had some recent good examples of of safety work I think the skill set as I said it's very similar to the typical machine learning skill set but you know you should also be willing to work in a field that has relatively sparsely populated literature which means coming up with your own ideas or you know or working very closely with someone who's you know one of the people generating the the ideas in the field yeah so you know that has some downsides in that you have to set more of your own direction but it also has some upsides and that you can be one of the first people in a totally new field and you know I that's what excited me about writing writing concrete problems right I'm like you know I could work on something that 200 other people worked on or I could try and set a new direction and maybe maybe it won't be exciting at all and and you know oh well at least I did something interesting or you know maybe it's you know maybe it turns out to be to be really exciting and that's like a bet that I that I'm perfectly happy to take okay so what would you recommend to someone who was considering entering the AI safety industry doing Michelle machine-learning work but they're worried that they're not gonna have such great long-term career options elsewhere especially compared perhaps to doing machine learning work in a more commercial way with less of a safety focus or just going into whatever kind of pays the most or has the best career so I think ml work is so hot right now that you know anyone who goes into it particularly on the fundamental research side it's easy to transition to applications and I think the kind of safety work that we're doing has many of the same skills as you know any other Arian machine learning even though the subject matter is very different so I think you know someone who does that is going to be in a very strong position to do very well in the future and you know I think it's it's probably you know even if you're going into it for altruistic reasons it probably just also happens to be one of the most secure you know career financially secure career career is you could you could go into I mean you know we recently had someone leave open AI who became head of head of AI at Tesla like head of head of all of AI reporting directly to Elon Musk so you know I mean III myself want a stay at open AI and work on I work on safety you know I want to keep keep working on the research end all the way until until we get a GI whenever whenever that is but if I didn't want to do that if I if I if I if I wanted to leave you know there was like plenty of wonderful things that that that that I can do and you know the same will be true of other people who come here right it is it more of a concern for people who would say be working at Mary doing doing non machine learning safety research yeah I mean I think you know most of the people there are you know smart people who who either had or could have you know really great careers as as a software engineer so they probably have you know great great great options as well I generally get the sense you know people who go to Mira are really passionate about Mary's mission in particular and tend tend tend to worry about this less yeah but yeah I mean it you know it's it's it's it's that the the amount of kind of buzz and hype is definitely not as high as it is for you know the machine learning for letting approach right so yeah pretty often we talk to people who were saying in the mid 20s and they did a fairly quantitative degree maybe like economics or you know logic but but they don't know machine learning in particular is it possible for them to retrain to get into this or is it just over for them at 25 yeah so I mean you know my my own example is you know until I was a 28 or 29 or something like I I hadn't done any any machine learning so it's it's definitely possible to do this my my main advice is the same advice that I'd give to you know like the the 17 year old that we talked about earlier which is just just implement as many models as you can as quickly as you can just to see if you know if you have the knack for it and you really enjoy doing it because this is gonna be you know greater than 50% of what of what of what your job consists of just you know knowing how to have the real intuition to implement neural net models and have them work how to put together new architectures that do new things you know first implementing these papers and then tweaking them so that's a really cheap way to to find out whether whether you know this is a career that you're good at and that you'll enjoy I wouldn't recommend like going back and doing another PhD in machine learning I think you know once you have a PhD like there are some positions where Google wants you for instance wants you to have a PhD when they hire you but you know they don't really care what what area it's in they do want to know that you're like committed to some new area that you want to go into but you know it's more important to like you know for the plate for the places that care about you know whether you have a PhD which we don't care that much but even though the places that care think it's more important you have a PhD even that you have it in some particular field right right so if you already have a PhD and philosophy then you should just go and learn MLG do some kind of internship somewhere yeah I would say learn learn I'm al you know implement a bunch of models then go do an internship or the brain residency program at Google or you know come to an internship with us or a deepmind all these all these are all these are viable options and each step gives you a better idea of whether this this you know whether this career path is really for you hmm and so really it's just the case that someone who did an undergraduate degree in economics can kind of jump in and try to run machine learning try try to you know train machine learning on their computer yeah I mean I think there's if you know how to program Python and you can learn tensorflow quickly then and um you know it's it's it's a very it's a very empirical field so you know of course there's lots of kind of hidden knowledge that researchers know that they you know that they kind of tell each other but that like you know it's hard to express in the papers and so you won't pick up on everything but you can certainly get started this way and then you know talking to people about about models models you've implemented talking to professional researchers you know to get a sense of you know what's an exciting thing to work on next you know that's kind of enough to get you started yeah and so how does working at opening eye or Google had to a machine-learning role in academia so I tend I'm a bit biased but I generally tend to give people the advice to come to to come to industrial labs I think one reason is the industrial labs have gotten you know and by industrial labs I kind of mean open AI even though it's not for profit but just kind of the large non-academic research centers they tend to have more resources more compute and in part because of this they've you know I think they've been winning the talent war recently so you know I think it's don't make sense to you know to to go do to go do to go do a PhD I think you know stay in academia your whole life I mean I guess if you become a professor then it becomes a lot easier to collaborate with with the industrial labs so you know both we and deepmind have you know people who were our professors and spend part of their time there and part of their time here so it's all it's it's kind of all very feasible and there's a lot of mobility between the two but in general I felt you know many people will disagree with me but I felt that you know the most groundbreaking work has tended to happen at the industrial labs and you know over the last couple of years at least yeah is there any effort to change that our universities trying to catch up or is it just too expensive you know there's yoshua bengio group in in in in montreal is you know is is quite large is one of the few you know major figures and deploring who's kind of resisted the pressure to go into the industrial world and his lab does you know a lot of a lot of great work Pedro BIAL at Berkeley personally on that at Stanford and just kind of a number of others including folks who you know don't who do work that's not necessarily related to deep learning so you know there's there's a lot of there's a lot of interesting work work everywhere um but at least the kind of safety that that I work on tends to play best with kind of the cutting edge of m/l work and explicitly tries to keep up with the cutting edge of m/l work yeah it reminds me given the cost how how is happening I thought it is it just donations are you also excelling no no we're nonprofit you know I think think I mentioned earlier the the major donors are you on musk Sam Altman and Dustin Moskovitz at this point yes it's just donations and it's just donations yes yeah interesting do you think is it possible for you to sell things to get extra computational power if you need or like start selling services or is that it's not a legal issue I'm not yeah I'm not I'm not an expert in this I think yeah I'm not sure you can sell stuff if you're a non-profit right right yeah so is the is the work frustrating because you're not sure whether your solutions actually exists and you can you know beat your head against the wall for quite a while before you you figure out well maybe that there isn't even a way of solving it the way that you thought yeah so I think actually that's the case in any any area of machine learning where you're trying to do original research if you're trying to do something worthwhile then you don't already know if it can be done and you have to try stuff that seems crazy and you know it might not it might not work so it's true of any area I think it's especially true of an area that's very new like like like like a I safety and so yeah I definitely agree that uh you know one of the trade-offs for working in nai safety is like you know on one hand you have this exciting ability to work on you know a new field that's just starting and they could be very impactful but at the same time you know no one's kind of defined you know what what's what successful work looks like you know we're still kind of laying out what the problem is what the problems are and what the work is that needs to be done so you know I think it definitely you know kind of requires an attitude of being willing to you know kind of you know define do more to define problems yourself and you need to be kind of kind of more creative instead of doing something that's you know that's an incremental improvement on the thing the thing that was done last but you know but that that's you know I to me that's a good property yeah yeah so so is it is it a good kind of role for somebody was a little great and you know willingness to persist with things despite adversity is that yeah I mean I think the pioneering area yeah I mean I think that I think that quality is is useful in any in any you know important or original work and it is here as well yeah all right so so turning that's a non machine learning approaches to tackling you know the problems in AI safety what kind of non technical approaches did you see is promising I interviewed Miles Brundage at the future of humanity Institute recently you have a view on any of the of the AI policy topics that we spoke about yeah so I don't I don't know precisely what what you guys spoke about but um you know you know I spend a little bit of my time speaking about the relevant policy issues so I think you know you know if if the humanity at some point builds builds AGI then we're gonna have to think about you know both how to handle safety issues as we're building it um and some of the coordination issues that are going to come up with respect to safety and also the question of you know who uses it what it's used for um you know one one example of this is you you can imagine that um you know maybe if it's possible a really good way to build AGI would be to build an AGI instead of doing anything with it in the world um try and if it's possible first develop a capability to have it advise you on the situation you've put Humanity in by building it right and say look like we've just opened this can of worms by creating you can you analyze our strategic situation and say what we should do because we're kind of aware that you know if we if we don't if we don't use you in the right way or we hand you to the wrong person then it could be really bad for Humanity yeah so if we were able to turn the problem in on itself that way that would be really good and that's partially that the AI like the world safe for AI yeah it's it's partially a technical question and it's partially a policy question which is how do we get ourselves in a situation where we can do that I I think that um you know there's a lot of players there's gonna be more AI organizations you know government actors will someday have something to say about about about AI I mean they already have something to say about AI someday they'll have something to say about about AGI you know when when when we're more in the world where AGI is is going to happen and so what what strategies should we take towards towards you know towards all of the actors how do how do how do we make sure that when everything is put together it leads to a good outcome and you know and any there anything we can do today to deal with these you know these kind of distant distant problems and so those are those are kind of the set of policy issues that we tend to think about you know there's also some more thought on kind of short-term policy issues you know so how you know how can we you know how can we get you know people to think about more mundane issues of safety you know should the government regulate things should um you know you know what should policies beyond self self-driving cars and stuff what should policies beyond you know automation and job creation we do some of that stuff but a lot of people think about that so so we tend to focus more on the long range stuff not you know it's like less actionable but there aren't that many people thinking about it so you know we might as well do whatever thinking we can we can on it which which might be there's nothing Beck and it can actually be done but we want to at least consider yeah so do you have any thoughts on how we can ensure that you know all of the players kind of cooperate and avoid having an arms race where they just try to improve their machine learning techniques really quickly without regard to safety since you're collaborating a great deal with with deepmind yeah yeah this was in in part you know motivated by the idea of the organization's working together so you know it helps that you know you know myself and you know some of the some of the founders of deepmind have known each other for a while we all you know we all kind of you know think about AGI and you know and and think that that safety issues are important so you know you know when when when you know people at the major organizations are friends with each other and work to actively collaborate then that reduces the probability of any kind of conflict because you know people people know each other there isn't you know fear fear fear uncertainty and you know if there's there's a disagreement we can we can work it out so then the question is how does that scale to their being being a lot of how does that scale there being a lot of organizations yeah and you know how is that scale to others who get involved once they see how powerful AI is can we make them cooperate as well and my hope is that we can but no it's it's not you know it's not it's not an easy thing yeah so do you think it would be a good or bad thing if AI were developed sooner so there's been the kind of this explosion of investment yeah in machine learning and improvement is this something that we should be pleased about or concerned about or just kind of neutral we're not sure whether it's good it's it's it's kind of hard to say I mean the obvious bad thing is if you're like if you're like really afraid that it'll be safety problems with with with AGI then then you might think it was a bad thing and a lot of people think it's a bad thing for that reason I mean my my my view is you know we're kind of relatively early in the game and I think there's a substantial probability that you know the gloomy analyses are really misunderstanding the safety problem and how the how the safety problem how the safety problem works you know some counter risks to worry about her that something bad happens to the world in the meantime while we're trying to develop AGI or that you know or that that AGI is is used in a bad way and you know I guess I guess a couple years ago I often made the argument that we were in a relatively peaceful geopolitical time so good that if AGI would be built but I'm starting to wonder a little bit last year so that maybe we're not in such a peaceful geopolitical state as we're recording this everyone is flipping out about North Korea to developing into content oh yes I mean these things are pretty deeply concerning you know we we there's been a lot of political instability in the rest Western world and in the last year and you know aside from the usual reasons why this might make me unhappy it's it's made me unhappy because it creates a less stable political environment in which in which AGI would happen so so I don't know I mean I I will say I you know I think I think I think we're better off if AGI is developed in a stable political environment with leaders who are intelligent and have reasonable views and so I'd like that to happen and I no longer know whether that means that AGI whether that means AGI should happen soon or that it should happen in a long time I guess it depends whether the current trends the current trends that were seen in the last year continue or if they're only a blip if they're only a blip then you know that doesn't matter in a few years we're back to where we are before but if if we're on a general trend in like if in a bad in a bad action and you know then then maybe maybe it's bad to wait too long yeah it's a difficult thing to yeah so so I think it's I think it's I think it's pretty complicated and I think you know I think pure pure safety considerations tell us that it's always it's always good to have more time although you know at the same time some of the hardest safety problems to solve may be problems we can't solve until you know the last couple of years until we build a GI so so in that case you know delay delay doesn't really help us it just kind of delays the crunch period that we'll have to face because it's like someone trying to finish an essay by a particular deadline if they know they're only gonna do it the night before the deadline it doesn't much matter when the deadline is so III think I think it's a complicated question I mean my it's it's not a variable that I have a lot of control over it's it's happening at the field level so I prefer to try to control variables that I have something you can change and one thing I have control over is that it seems like there's at least some safety work that can be done now and so I'd like to do it and it seems like there are some ways that different AI organizations are not collaborating now that you know that we can encourage them to collaborate and so I've also been been working on that and and I think those uh I think those efforts have been successful and so I feel like you know it's been good to cause things to happen that wouldn't have caused otherwise but then there are all these other things that I feel like I have no control for whatsoever yeah well I've taken up an awful lot of your time here and I'm sure you have have to get back to your - research yeah hit these deadlines so is there anything you'd like to say to people who are considering you know following your example and doing this kind of research before we finish yeah I mean you know we're of course you know hiring for you know for very talented machine learning people who care care a lot about about AI safety so you know you know we welcome welcome applications ahead open ai we collaborated a lot with the deep mine safety people so you know I'm always you know as part of this collaborative spirit you know III you know I think that's a really great team as well and people should should should should should apply their there as well it's kind of convenient to have a place that's in Europe and a place that's in the US so you know I think I think there's a lot of good work going on at you know several different places yeah yeah I'll just add 80,000 hours has been doing a whole lot of research into this question of you know how can we positively shape the development of artificial intelligence and we're coaching some some people to try to help them get get jobs at places like like open AI so if you feel like you're in a good position to do that and then fill out the at the application on our website we think it's one of the most high-impact roles that someone could take if they're able to do it which is one of the reasons why we've you know we've looked into it so so much hopefully over the next few years we'll see quite a lot more people go into this field and it won't be so neglected and we can we can be a bit more relaxed about it it's been fantastic to have you on the show darío yes we can we can check back in a couple years and find out what open air has been up to and hopefully you've found lots of lots of new talented people to work in the area Oh fantastic all right thanks so much [Music]

Related conversations

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -0 · 108 segs

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -5 · 133 segs

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -4 · 72 segs

AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -6 · avg -7 · 120 segs

Counterbalance on this topic

Ranked with the mirror rule in the methodology: picks sit closer to the opposite side of your score on the same axis (lens alignment preferred). Each card plots you and the pick together.

Mirror pick 1

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -14.36This pick -10.64Δ +3.719999999999999
This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -0 · 108 segs

Mirror pick 2

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -14.36This pick -10.64Δ +3.719999999999999
This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -5 · 133 segs

Mirror pick 3

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -14.36This pick -10.64Δ +3.719999999999999
This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -4 · 72 segs