
Future of Life Institute Podcast · Civilisational risk and strategy

What Happens When Insiders Sound the Alarm on AI? (with Karl Koch)

Why this matters

This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.

Summary

This conversation examines core safety questions through What Happens When Insiders Sound the Alarm on AI? (with Karl Koch), surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

Mixed · Technical · Medium confidence · Transcript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forward · Mixed · Opportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
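As a concrete illustration of the tinting rule described above, here is a minimal sketch in Python. The specific RGB values for the amber and cyan endpoints, and the assumption that scores are normalised to a −100…100 range, are illustrative choices, not the site's actual implementation:

```python
def lerp(a, b, t):
    """Linearly interpolate between two RGB triples, t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

# Assumed palette endpoints (hypothetical values).
AMBER = (255, 191, 0)
CYAN = (0, 183, 194)
WHITE = (255, 255, 255)

def tint(score, lo=-100, hi=100):
    """Map a slice score to an RGB tint on the amber -> cyan -> white strip.

    Scores at `lo` (most risk-forward) map to amber, the midpoint maps to
    cyan, and scores at `hi` (most opportunity-forward) map to white.
    """
    t = (max(lo, min(hi, score)) - lo) / (hi - lo)  # normalise to [0, 1]
    if t <= 0.5:
        return lerp(AMBER, CYAN, t * 2)        # risk-forward half of the strip
    return lerp(CYAN, WHITE, (t - 0.5) * 2)    # opportunity-forward half
```

With these assumed endpoints, a maximally risk-forward slice comes out amber, a neutral slice cyan, and a maximally opportunity-forward slice white.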


Across 72 full-transcript segments: median 0 · mean −3 · spread −31 to 0 (p10–p90 −10 to 0) · 4% risk-forward, 96% mixed, 0% opportunity-forward slices.

Slice bands
72 slices · p10–p90 −10 to 0

Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.

  • Emphasizes alignment
  • Emphasizes safety
  • Full transcript scored in 72 sequential slices (median slice 0).
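The headline figures above (median, mean, p10–p90, and band percentages) can be reproduced from per-slice scores with a short aggregation routine. This is a hypothetical sketch: the `slice_stats` name, the ±20 band threshold for "mixed", and the nearest-rank percentile approximation are assumptions, not the library's actual scoring code:

```python
from statistics import median, mean

def slice_stats(scores, band=20):
    """Aggregate per-slice scores into a headline summary.

    `band` is the assumed half-width of the 'mixed' zone: slices scoring
    below -band count as risk-forward, above +band as opportunity-forward.
    Percentiles use a simple nearest-rank approximation.
    """
    s = sorted(scores)
    n = len(s)
    p10, p90 = s[int(0.1 * (n - 1))], s[int(0.9 * (n - 1))]
    return {
        "median": median(s),
        "mean": mean(s),
        "p10_p90": (p10, p90),
        "risk_forward": sum(x < -band for x in s) / n,
        "mixed": sum(-band <= x <= band for x in s) / n,
        "opportunity_forward": sum(x > band for x in s) / n,
    }
```

Feeding in the 72 real slice scores would recover the summary line shown for this episode.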

Editor note

A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.

ai-safety · fli · core-safety · technical


Episode transcript

YouTube captions (auto or uploaded) · video oOAlvjftCZY · stored Apr 2, 2026 · 1,934 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/what-happens-when-insiders-sound-the-alarm-on-ai-with-karl-koch.json when you have a listen-based summary.

Usually when we think about whistleblowing, most people think of an Edward Snowden type category, but that's not necessarily what we want to talk about. What we really care about is insiders being able to spot issues, evaluate them, and if there's concern, have them addressed. Whistleblowing is more important because if companies hold back certain knowledge, or prevent people in organizations from saying certain things, then who do you go to? Who's the fallback? That's going to be the insider, the whistleblowing side. If you feel like, okay, we're actually not hitting certain alignment targets right now and we're using misaligned models to try and align the models of the future, probably good to speak up now. If you violate the California state whistleblower protection provisions, you have to pay a fine of $10,000 as a company, which, you know, is probably like a five-second burn for most companies these days. Not a terrible deterrent, right?

Karl, welcome to the Future of Life Institute podcast.
>> Thank you very much for having me, guys.
>> All right, let's start with: why did you decide to begin the AI whistleblower initiative?
>> Oh, a big question at the start. So, I've been involved with the AI safety community, or whatever you want to call it these days, since around 2016. I was a volunteer researcher at the Future of Humanity Institute, then did an AI safety research camp (how very different timelines were back in the day), worked in consulting and startups for a while, and yeah, I was kind of surprised by the timelines, as I think many were. Then as ChatGPT rolled around, I felt okay, now it's really time to move back into the space and do something tractable that seems robustly good across a wide variety of scenarios and futures, and relatively quickly settled on the topic of transparency, as just a thing that if we have more of, that would be very good.
Uh, looked into compute traceability for a while. I think L time did some great research on that; now I think Lucid Computing is looking more into that as well. But the whistleblowing topic came up quite quickly. So this was mid-2023, I believe, when we started looking more seriously into the topic. People had been thinking about it since the early days; I think for the same reason that we now think whistleblowing is extremely important in the AI space, people had already been thinking about it in like 2017, 2018, but we just noticed nobody was uniquely focused on this exact topic. In the beginning, we thought we were way too early, as the world still seemed a bit more rosy back in the day, with, I think, the superalignment team being set up on the OpenAI side, for example. But then the first cracks started to show around the OpenAI board drama, where it seemed, okay, maybe there is dissent here internally and this is not being resolved super well. And then of course with Daniel's big disclosure around the right to warn around mid-2024, and especially earlier in 2024 with Leopold Aschenbrenner's departure, it became clear that this is an extremely important topic to work on, and that's how we got into it.
>> Yeah. So you can do transparency in many ways. Why whistleblowing? What are the unique advantages of whistleblowing specifically?
>> Yeah. So I think one way to think about it is probably as a backstop mechanism. There's the Swiss cheese model of control, where you have a bunch of basically different mechanisms to make sure things that we want to go well do go well. Transparency is certainly somewhere along those lines, whether that's self-imposed transparency obligations.
Or whether that's the regulator side; there's a bunch of other things you can do as well to make sure technologies are developed in a responsible manner. And whistleblowing is often used as the final one: if you picture a bunch of slices of Swiss cheese with holes in them where things can get through, it often sits towards the end, where if a bunch of other control mechanisms fail, then you have to rely on that. That's probably one angle to think about it. Another thought here is how much you actually believe, for example, that self-imposed transparency commitments actually hold up well.
>> Um, that's one way to think about it. And then, probably from a systemic level, another reason why we should really care about whistleblowing in particular is just how feasible the mechanisms really are. If it's not necessarily clear what sort of risks will arise, maybe over the coming months, years, decades, depending on your timelines, how confident can we be that we're going to be able to catch all of those things?
>> Mhm. And if the vast majority, or let's say a majority, of the most highly skilled people work in those private companies themselves, what sort of regulatory capacity would you need to actually be able, for example, to check whether things are actually going in the direction we want them to go?
>> Mhm.
And so if you have this massive information asymmetry, and risks pop up maybe in areas where you don't expect them, whether that's for example on the child safety side, with the horrible chatbot suicide story recently, or around situational awareness of models, or to what extent models really, in practice, carry over misalignment into future models that they help work on: who can spot those things best? That's most likely going to be insiders. So it really matters both that companies internally are really strong at spotting and rectifying issues, but necessarily also, for that to happen to a strong extent, having strong whistleblower protections, making sure individuals who want to speak up in the public interest are empowered.
>> And what's the state of whistleblower protections today in the AI industry? Where are we?
>> Yeah, um, another big question. So probably a good way to think about this is in terms of framing it into different channels. Usually when we think about whistleblowing, most people think of an Edward Snowden type category, but that's not necessarily what we want to talk about. What we really care about is insiders being able to spot issues, evaluate them, and if there's concern, have them addressed. So there is the internal side: to what extent can people speak up internally and see those issues addressed? There's then the external side to regulators: how much regulatory capacity is there? What protections are there for people speaking up? And then there's the public side: speaking up to the public about risks, concerns, misbehavior on the internal side.
So um, we actually launched a campaign somewhat earlier this year, at the National Whistleblower Day event in Washington DC, where we called upon frontier AI companies to publish their internal whistleblowing policies, to make it clear what protections are actually there. At the moment we only have OpenAI, who published their policy following, you know, them trying to suppress speaking up via their extensive nondisclosure, actually non-disparagement, agreements, which Daniel uncovered. After that they published their policy; maybe we can go a bit more into detail on that later. We'll actually publish a pretty in-depth evaluation of their policy in a while, and it doesn't look fantastic. On the one hand it is, I think, very commendable that OpenAI published this, and maybe there's a chance they just haven't had much time to go too deep into that process yet, or it's not a high priority. I think there's a strong interest that companies have to improve these policies for themselves, because there are plenty of empirically proven benefits of having really strong internal whistleblowing channels, for a variety of reasons. But for example, on the OpenAI side, one tidbit maybe I can already share: the dedicated team, as it sounds at the moment (it's a bit confusing), for evaluating internal whistleblowing claims is the legal team, which, as it seems, is also not directly governed by the board, which would be best practice for independence reasons; and a legal team is generally considered worse practice. They also provide a bunch of other channels you can go to, but it's not really clear which case will go to whom.
Um, and a legal team is bad because there are plenty of examples in the past where a whistleblower goes internally to the legal team, whose job is to protect the company, and the legal team immediately opens an attorney-client privileged case around the whistleblower: basically putting attorney-client privilege between the company and the company's attorney, as a litigation risk for that whistleblower. And then if, for example, a whistleblower claims retaliation down the line, in discovery you cannot actually see what the internal conversation within the company about that whistleblowing claim was, because it's attorney-client privileged. So that doesn't look great. It might again just be the case that they thought, okay, let's quickly throw something together here, and this is the most obvious thing they came up with; and we'd be super happy actually to work with them as well and make sure they do this better. But that doesn't look great. Um, for the other companies, we simply don't really know the state. We've run a survey with insiders in the past, trying to understand; of course, we talk to insiders quite a bit here and there to understand how they feel, and of course you can look at past precedent and how well concerns have been handled. Um, I can give one more example on the Google side. When it comes to internal whistleblowing, we've seen Trillium Asset Management, which is an activist investor, specifically call upon Google to improve its internal whistleblowing, under the claim basically that whistleblowers protect shareholders. They don't necessarily protect executives, depending on how well you've handled rectification of concerns and misconduct before. Um, there are also cases where Google has retaliated against whistleblowers.
Um, this is a somewhat contested case, but Satrajit Chatterjee raised concerns internally around research practices, was then let go, and the case was settled for wrongful termination. Make of that what you will. And for the other organizations, I could share stuff with you that I hear from people. I think there are definitely degrees to which companies openly have conversations internally, which, you know, is the biggest impact factor: how comfortable do people feel with saying, ah, I disagree with this. And probably avid listeners of your podcast can imagine which frontier companies have really strong internal dissent cultures, which actually celebrate openly disagreeing with leadership, for example, and having leadership
>> (at least that's the way it's perceived by many insiders) kind of address concerns, and which companies maybe don't.
>> Yeah. And so I think, probably, in terms of the internal state: not looking too great on the internal side.
>> Yeah. So you mentioned that whistleblowing can be in the interest of the company's mission at large, even though it can be contrary to the interests of specific executives. How are the AI companies reacting to these calls for whistleblower policies? Are they perceiving it as something that's in their interest, or are they perceiving it as something aggressive? Because it seems like becoming a whistleblower is an adversarial action against your own company.
>> It is to an extent, right? Especially um, if you have an internal process and leadership, for example, says, "No, this is fine."
>> Yeah.
>> Right. And then you say, "Actually, no, I think this is not fine, and I'm potentially even going to go to a regulator with it." Yeah. Surely that's an adversarial process to an extent, if that's sort of the mindset of the company.
>> Yeah. Right.
Um, again, the reception is probably mixed across companies. We definitely know that there was pushback on SB53, on the whistleblower protection side; companies were not happy to have that be as broad as possible. Probably the usual suspects listeners can imagine were opposed to it, and maybe the usual other suspects were okay with it.
>> Right. Um, when it comes to the actual reaction to our call, we haven't seen a terrible amount yet. We've heard from some companies; it's even been posted on internal Slack channels and discussed openly. I think in at least one instance there was even positive reception from leadership, saying, oh yeah, this seems like a good thing to do. Um, we haven't really seen more publications of whistleblowing policies since then. Maybe a function of just priorities, and maybe a function of, yeah, to our employees we say this sounds great, but maybe behind closed doors, let's rather not.
>> Not sure if that answers the question, but
>> No, it does, it does. How far would you say we are from the optimal state? Um
>> So where the optimal state would be, say, your preferred policies, both internally and externally, surrounding AI whistleblowers.
>> So, um, I think the most important thing is to have these strong legal protections, which means they should be harmonized and very clear. Yeah. And covering a really broad range of risks. Ideally, this is both for regular business but also on the national security side. Yeah. There's an element here of also moving culture towards how we view whistleblowers. So for one, obviously the clearest implication of having really strong whistleblower protections is that you can go to a regulator, right, without having to worry too much, at least about losing your job; in case that happens, you have really strong protections against retaliation.
We can go a bit more into depth in a second on what that exactly means. But on a high level, there's that. Then also having strong incentives, further than protections; we can talk about bounties a bit more later as well. This has been extremely successful in the SEC whistleblowing program. Then you also want to have really strong enforcement if companies actually violate whistleblower protections. So for example, if they try to prevent people internally from speaking up or going to a regulator, you want to have high fines for that. This is especially an issue in California, for example. If you violate the California state whistleblower protection provisions, you have to pay a fine of $10,000 as a company, which, you know, is probably like a five-second burn for most companies these days. Not a terrible deterrent, right? I think SB53, for example, has the $1 million fine for violations of the main body of SB53, but the whistleblower protections are part of the labor code, so they don't really fall under that. That would be something we'd really want to see, I think. And then, really importantly, there's what the actual program looks like: how strong are your protections? What channels can you go through? What can you report on? Which is extremely important, but then also the case handling itself. Um, we ran a survey with insiders, and I think the most unanimous answer we got was to the question: how strong is your trust in government to understand and act well on concerns that are brought to it? And that was extremely low.
Um, I of course don't want to, not quite sure what the English term is, but, you know, speak this into existence either; there is plenty of evidence of regulators, also in new fields, being able to handle cases. But I think there's some legitimate concern here. If, I don't know, we are seeing a threshold potentially being crossed in, let's say, an internal eval, and there's just disagreement internally about whether this falls under anything; maybe there isn't really a clear regulation yet, which is the big issue in the space, around what is acceptable and what is not. Can you go to a regulator with this and say, huh, this seems really bad to me, and will they actually help you understand whether it is actually bad or not? And ideally also have that knowledge gathered somewhere. So there is a case to be made for having plenty of different channels that insiders can go to, just so you can pick and choose and feel like, this is going more in the direction I want, this is not. But ideally, of course, you want to collect cases in one place, where a picture then starts to form: are we seeing maybe emergent risks across a bunch of different companies, where the individual reports maybe wouldn't raise too many concerns, but on the regulator side, if you see, okay, wow, there are four, five, six different companies, with people from each raising this sort of concern, we should really be concerned. Um, that's the regulator side; I think that's what we want to see. Maybe a bit more context on the national security side, because the moment you work with classified information, on DoD contracts for example, things get a lot messier. Here we want to see something similar, especially with strong congressional oversight on national security relevant cases, and that is not looking good at all at the moment. Um, but maybe stepping back a little bit.
So let's say you have all these strong legal protections; that will also lead to internal channels becoming a lot stronger. Ideally that's part of, for example, a regulation putting this into place. The proposed federal AI whistleblower protection act does have provisions for internal channels. So did SB53; it's implied, it's already in the California state whistleblower protection law, because you're protected for speaking up internally, but it doesn't actually mandate what a good internal process looks like. So, for example, you get problems like the legal counsel being the recipient.
>> And
>> And you think this is the direction it will go? So first you get the strong legal protections, and then those protections change internal culture. Because the story we've heard from the companies sometimes is: we'll begin with internal mechanisms, those will then serve as a test bed, we'll see what actually works, and then we'll have actual legal requirements. But you think the other way around is more plausible, or better perhaps?
>> Yes. So I think one is better than the other, but they're not exclusive. Of course, even if you're not legally mandated, you can have strong internal whistleblower protections already, and you can have a really strong process, and it's in the interest of the companies to have such very strong processes. For example (speaking from the interest of a company now), to not have a lot of leaks happen outside the organization, where maybe the organization would have said: if you had told us this more clearly internally, we would have actually fixed it. But if there's no trust that if you speak up internally it will even be fixed,
>> then that's not going to happen, right? So there's an extremely strong business case to be made for having strong internal speak-up cultures. And of course, we want to see more of that.
And that's also why we're pushing for stronger internal policies even in the absence of stronger legal protections. But, to tie it together a little bit: internal whistleblowing policies, especially in the US, given that at-will employment is prevalent. If, for example, a company, and let's take the OpenAI policy as the example, says, yes, you can report on all of these various forms of risk and we promise not to retaliate, that maybe reads really well to the average employee: okay, maybe this is even like a contract that we're entering here; they told me they're not going to do this. However, anything that is reported in an internal channel that goes beyond what is already protected by the law is therefore purely a voluntary commitment by the company. And given that essentially all companies in California use disclaimers in their employee handbooks and contracts, basically saying nothing outside of this employment contract is in any way an agreement between us, it basically means it doesn't hold. And insiders keep potentially running into this trap, thinking, oh, cool, I'm protected now, when in fact they're actually not. So what can you do to counter this? For one, at the minimum, I think companies should be transparent in their whistleblowing policy, saying: FYI, be aware this is not actually a binding contract. The next level is they could turn it into a binding contract; as I said, it's not market standard practice, but this would be an extremely strong signal, for example, that a company could send to say we actually care about speaking up. And then obviously the strongest way to make this happen, to really improve internal protections, is to have strong external protections. I think we've seen plenty of empirical evidence here in Europe.
We've seen it since the introduction of the whistleblowing directive, but we've also seen it in the US, for example: the moment you have pressure and you have a really good channel that goes external, the companies basically move into a mode of, okay, now we have to really step up our game internally. And there's strong empirical evidence, from asking companies to what extent this internal protection has actually improved things for them: it dramatically has, both in terms of detecting more misconduct and in terms of preventing more misconduct in the future. This was a survey with more than a thousand companies or so in Europe, where a lot of opposition was there before; people were like, oh no, this is going to, you know, destroy our sharing culture internally. That is not actually the case, because one way you can go, of course, is to fight it and try to isolate people and isolate knowledge, or you embrace it and say, okay, this is going to become really important for us, and then you actually handle it well internally; and if you don't handle it well, yes, there are going to be consequences. But basically, if you go with the incentives, it makes things better for everybody. And we've seen the same in the US, I think, with the SEC; there's been a pretty good paper out on the deterrence effect of the SEC whistleblowing program. Maybe in a little nutshell: probably one of the strongest whistleblowing programs in the world, which allows people to report violations of SEC regulation, but also, for example, if companies try to prevent people from speaking to the SEC, you can go there. Extremely strong anonymity and confidentiality provisions, which is critically important, with a strong track record. They're also very strong at actually investigating well, so not alerting companies too much when there's been a whistleblower report, which is very important. And then they also give out bounties, so percentages
of the recovery of the SEC if there's actually a case. And you actually qualify for protections under the SEC relatively quickly: you just need to have some reasonable connection, some good-faith belief. So if you just feel like there could be something here, you don't need to prove anything, which is sort of the gold standard. And there was a great paper out that basically showed that there was a significant decrease in misconduct, so a strong deterrence effect on violations of securities law through this, primarily via improved internal functions, but also via the actual SEC becoming active: I think they've uncovered more than $6.3 billion in fines handed out since, I think, 2010 or so. It's an extremely successful program. So there's the intervention effect and, importantly, also the deterrence effect.
>> And I think counting on all companies doing this voluntarily is not the right path.
>> No. So if you're a whistleblower, or a potential one, who's looking at some eval metric, say, and thinking, okay, maybe you have a disagreement with leadership about whether this is important, whether this is something the public needs to know about: where do you begin? Do you begin by contacting outside counsel? Do you find a lawyer who can talk to you about this under confidential terms? Yeah. Where do you start?
>> Yeah, a good question. So, it's a very tricky situation. It always depends on the specific situation of the individual.
>> Mhm.
>> Um, a big indicator is probably: are you alone in this? Are there other people who share the same concern, maybe? And then, how comfortable do you feel about sharing your concern internally? It's probably a bit of a function of that.
If you're the only person concerned about this, it makes it easier for the company to figure out that you are the one raising the concern; if it's a group of people, that spreads the knowledge, and the question of who took action here, across a wider variety of people. But this very much depends on the individual situation and the trust. So it's definitely not a black-and-white recommendation to say: talk to all of your colleagues about this, or address this internally. First of all, it depends highly on the situation, like how much you feel you can trust leadership, if you're already in a situation where it's maybe clear that there's definitely something bad going on, or there's maybe an internal eval result that says one thing but your company needs to launch and therefore they hide it; that seems pretty bad. Generally, what we always recommend, much, much earlier than you would possibly think about it: get legal counsel. Speaking directly to other people in the community can be very risky, and there have been cases where something then comes out, maybe as part of a discovery process; it can be quite risky. So find legal counsel. People are often put off by this for various reasons. One is of course the cost side, one is the knowledge side: will those people actually understand what my concern is? Um, one thing maybe I can share here, one offering that we have that would help in this situation, is something called Third Opinion. This is something where both insiders can approach us directly, or their attorneys can approach us. And the concept here is basically that instead of coming out with confidential information directly, maybe an insider is still trying to clarify whether they even have a concern. They can reach out anonymously to us; we have a Tor-hosted contact form. There are details on the FAQ page, on the Third Opinion subpage of our website, awe.org.
There are also details on what the technology architecture looks like and how we try to make it as safe as possible. You can reach out anonymously with a question around your concern. So, without actually divulging any confidential information, describe the question where the answer would help you understand whether your concern is actually legitimate or not. What we then do is collaborate with the insider via this tool: the insider gets a code to log back in, because we don't collect any data or emails or anything like that, and then we try to identify who would be the right independent expert, for example from academia, to help provide an opinion on this question. Then we provide this anonymity shield: we go out with the questions to the experts, if the experts confirm terms and conditions around confidentiality, and bring back the answers to the insider. And if the insider is not concerned after this, great, then everything can go on; if they are still concerned, then we connect them to whistleblower support organizations and pro bono legal counsel. This is another extremely important message I'd want to place here with listeners: there are great support organizations out there. For example, The Signals Network is one of them, or psst.org, who I'd recommend every listener of this podcast take a look at. They've got many, many years of experience supporting whistleblowers, also in tech: Frances Haugen, for example, a very impactful case in the late 2010s, was supported by them, both on the legal side with pro bono legal advice, but also, for example, with media training in case it does actually move towards a public disclosure, strategizing around the smartest disclosure routes, getting psychological support, maybe even safe housing if it's in a really, really sensitive domain.
And we can basically either go through the Third Opinion flow, put together the expert evaluation, and then also supplement the lawyers down the line with these experts, to make sure they actually understand the case and have the context they need under attorney-client privilege, as far as that's feasible; there are some limitations on the extent to which you can involve outside parties and experts. That's one angle; the other is that one can also just reach out to us directly and anonymously. If you want a consultation on who the best support organization might be, you can also check out the contact hub on our website for profiles of different organizations: why do they care about AI whistleblowing, what is their experience in the space, so you can find the right support organization for you. I think most insiders realize way too late that they may already be in a whistleblowing situation, because technically, for example under California law, >> the moment you raise a concern internally you are potentially already protected, >> and most people think: oh, I'm just raising questions internally, or maybe I'm sending an email to a superior. Potentially you're already in the space. The moment you start exposing yourself to risk, you're already in the space, and it may be wise to think about things carefully. >> Yeah. How do you evaluate whether whistleblowing is the right choice? Of course that's a broad question, but I could easily see people being both overconfident and underconfident that whistleblowing is the right choice. You might think this is a high-stakes situation. Maybe you think your boss is smarter than you or knows more than you, and so you think you're wrong about this even though you're right.
You can also imagine that you see something and perhaps you don't have the broader context of why this actually isn't an issue. You want to avoid both failure modes. >> Do you have resources, do you have this kind of wisdom on how to think about the situation? >> Nice. It's a very good question, and a very tricky one, going case by case. By default, the most basic framework is: what's the impact of disclosure versus the personal risk you're exposed to? On the personal risk side, the big question is to what extent it is feasible for you to stay anonymous. Anonymity is just the greatest protection. And to what extent would that affect, for example, the impact of the story? A good example here is probably the recent Meta disclosures around their GenAI policies and their, if you can call it that, ethical framework around explicit conversations with minors. I don't have the details here and can't vouch for them, but this was a document that was widely circulated internally and then anonymously shared with news outlets: an impactful story, and hopefully no dramatic consequences for the individual who provided it.
So generally the framework is: if you can stay anonymous, that is the best way to go. Whether that's feasible in the individual situation is extremely context-dependent. Are you maybe even part of the leadership team, one of the top ten people within your company, and only those ten have access to the knowledge? Then it could be very difficult. There is also a fair consideration around: if there's going to be retaliation down the line, do I want to retain my position, to maybe make positive impact on the margin over the coming months and years, or is this the one where I take that risk? It's a tricky consideration, and unfortunately it's extremely difficult to give a blanket answer, much as I'd love to give you one. And since you mentioned the wisdom piece: this is actually one we have on the roadmap for Q1, to hopefully publish something that strikes the right level, neither too micro nor too promotional. >> Do you think it's realistic to stay employed at a company after you've become a whistleblower? Do you think you could be impactful, do good work, and push the company in a good direction? Or would you perhaps still be formally employed but put on some team that does nothing, formal employment without any actual impact? >> Is it possible? Yes. >> Yeah. >> How likely is it? Challenging. It depends again massively on the case. >> Yeah. >> If you're for example uncovering financial fraud and the individuals responsible are let go, or there's at least some disciplinary action, then yes, staying seems feasible.
If you're for example in direct conflict with a senior executive and that executive stays on, it's probably going to be quite difficult. This is under the assumption that there is retaliation and you don't manage to stay anonymous throughout. By default, unfortunately, the recommendation is for people to seriously consider that they're not going to be able to stay anonymous. That's the reality you have to factor in. The SEC program, for example, has an extremely strong track record of maintaining anonymity and confidentiality, at least on the SEC's side, but there are plenty of examples where, even though the SEC didn't leak the name in any way, shape, or form, the companies still figured it out. >> Maybe an alternative way to phrase this is: can you have a career after your whistleblowing activity? There the answer, again context-dependent, points more towards yes. For example, at least some previous Google whistleblowers did leave Google, but that was around a wrongful-termination case for speaking up internally anyway, and they're in great employment now. On the other hand you have Timnit Gebru, for example. It's quite common for people to move more into the research space afterwards, or to the advocacy side. With the recent Right to Warn case with the OpenAI employees, we've seen people go to Anthropic, multiple of them by now actually, or for example Daniel Kokotajlo starting the AI Futures Project, which as far as I'm aware he's quite happy with. So I don't think we should consider this a step down, although I don't know; that's just my impression.
What's important here for the community, and what we want to see more of, is making sure the support ecosystem is there to catch people afterwards, and culturally to say: we want to support these people and celebrate whistleblowers who speak up in the public interest, to make sure they land well afterwards. >> Yeah, because in a general sense we probably want more transparency and more insight into what's happening at these companies, and therefore, on the margin, we want to encourage more people to think about, or at least consider, whistleblowing. How do you gather the courage to actually do that if you're in a cushy job? You're potentially going to become a person known to the public; you're going to lose your job. Maybe your network is in the company. Maybe you feel like you're betraying people, on the emotional side of this question, even if you're convinced it's the right thing to do. How do you find the courage? >> Again, a very tough question. For one, I'd probably try to reframe it a little bit: >> we don't want to rely on courage. >> Yeah. We want to build the right systems so that the barriers drop as much as possible. We talked before about the legal side of things, right?
Once something becomes codified into law, it immediately becomes more of a standard process, a standard thing. I can actually give another example here, of somebody who worked in finance on Wall Street before moving to a big tech company. They were shocked: because the SEC programs are so strong and compliance is such an integral part of the culture in finance, using something like an SEC whistleblowing channel would not be seen as out of the ordinary at all, while in big tech there just isn't that strong a compliance culture, that same sense of "yes, we actually care about adhering to the law." That's not yet the case there. So for one, the legal-protection side is going to do a lot, and so is transforming the culture, making people see: this is what is expected of me, this is in fact socially approved, people want me to do this. So there's a strong angle there. Strengthening the support ecosystem is another big one. And talking about it with colleagues. Another recommendation for insiders that I maybe didn't mention before: start talking about it now, and start taking notes as an insider. Start thinking about: do I see people raising concerns? How is that going for them? Talk to other people, well before you come into the concrete situation, about how their experiences have been. How do you feel at the moment? When you raise dissent, is it being taken seriously or not? If you're the manager of a team, make sure that happens: talk to them about whether they feel comfortable raising concerns. As a side note, it's also just going to make your team happier if you do that.
So I think building the right systems is the most important thing on the practical side. The people who do become whistleblowers are primarily driven by the feeling that this is just really important: either the public needs to know, or this issue has to be rectified in some way or another. And the consequences may not be as bad as calculated. This is not advice; I would always say prepare for the worst, but there are support ecosystems out there. I can actually share something else here: there's going to be an AI whistleblower defense fund launched in the coming weeks, which we're going to promote as well. There are already smaller funds from The Signals Network and Psst, but this one is going to be offered by Legal Advocates for Safe Science and Technology, LASST, run by Tyler Whitmer, an incredible organization, and it's going to be focused specifically on funding defense and strategic litigation for AI whistleblowers. So again, this is another example of bringing down the barriers a bit, making sure people don't have to rely only on courage >> to do it. If we see whistleblower policies from all of the frontier companies, how do we know whether these policies are sincere or whether they're published for PR reasons? Is there anything we can look at when we evaluate them? How do we know whether they are fake, so to speak, or whether they will actually protect whistleblowers in the end? >> Yeah, there are probably a few different elements. For one, on the internal whistleblowing system in general, going beyond the policy: what you'd really want to look for is to what extent the company manages its internal whistleblowing process as an actual business process.
>> If you really care about any business process, like marketing, you measure, >> you measure, improve, and repeat, all the time. That's what you expect to see when a company cares about a process. If we see companies not doing that, that probably tells us to an extent how much they care. At the moment I don't think we have evidence of any of the frontier companies doing this. Level one is just the published policy, but that only tells you so much. Level two is how much they actually manage this process, ideally publicly. Do they share, for example, how many reports they actually received? >> Yeah. >> How did that number develop over time? What percentage is anonymous, maybe as an indicator of trust? How many retaliation claims were there internally? How were those resolved? What actions have they taken? Things like that really matter. So that's the first thing: do we see credible evidence of companies taking this seriously, by measuring and by being transparent about that measurement? Then the policy itself has probably two or three elements. One is the degree of effort that has been put in. If something looks slapped together, that again tells you how seriously they take it. If it looks like they've actually thought about it, run stakeholder sessions to shape it, that tells you they take it seriously. Of course, you can still fake that, >> right? >> At least on the policy side, to an outsider. On the inside, you'll probably notice: does this seem like a PR thing, or do they actually care? Does leadership actually promote this on a regular basis? Do they celebrate people internally who speak up?
Do they maybe take a minute or two at their town halls to say: we want to quickly highlight person ABC from department XYZ who, of course with their consent, raised this thing through the internal whistleblowing hotline; we found there actually was a problem, and this led to us changing our ways, and that's a great thing. Or maybe even: this person came forward and we didn't agree; we evaluated it and concluded, actually no, this was wrong, and therefore we didn't take those actions, and here's the floor to the person who reported it, you can make your case again if you want. That would be really great to see, though from the outside you can't. When it comes to the policies themselves and their structure, the most important thing, outside of the measuring and monitoring, is probably the governance setup. What you really care about is having this function be independent: you want to make sure there's actually a person there you can trust to act in your interest as a whistleblower, who is competent and trustworthy. One way to get that is a good governance setup. Does the whistleblowing function report to the chief executive? Maybe not ideal. You'd probably rather look at the audit committee, which is the classic answer, or the board. If you have a more exotic governance setup, like Anthropic, maybe the Long-Term Benefit Trust would be a really good host for something like that. On the OpenAI side, probably under the nonprofit, if it's still going to exist in the future; let's see. Those are the things you probably want to look for. Then you want to look at who the recipients are, again from the independence side.
Something like "the legal team" is probably the worst recipient you could possibly go for. You want to look at this both as a signal for whether they care and for whether this is a trap. The ultimate level of independence would be actual law, so that the channel is fully independent from the company. >> And you could imagine, say, a government service you can go to... >> Oh yeah, that would be the external-regulator side. Absolutely. If you want, we can also talk a little more about what that would look like, what the ideal policy would be on the external side. >> Yeah, I think that's an interesting question. How different would that look from internal policies? Imagine a service you can go to and say: I'm looking at this email and I'm worried, and my concerns are perhaps not being taken seriously. >> Yeah. >> Where would that fit into, say, US law or EU law? >> Ideally that should be the core of the external regulator setup as well. Maybe we can start on the EU side. We'd be very excited to see the EU AI Office establish a centralized whistleblowing reporting channel for people to report, for example, violations of the EU AI Act; that should be exactly its function. The function should be, first of all, educating insiders about their rights, and then making very clear to them what the process looks like.
What are the confidentiality provisions? That matters a lot, obviously: overcommunicating on that side, making sure people feel comfortable, and then staying in touch with the whistleblower. Ideally there's a hotline where they can understand the process: what does it look like, at what point does it start to move outside of my control, at what point, when I ask about this, is the answer going to be "okay, we're starting enforcement now," >> and at what point can you pull back, >> or is it "actually, never mind," and is that even feasible? Explaining things like that really matters, and then staying in constant communication, which you can do anonymously as well. This is exactly how we want to see it: as an insider, you can go to the EU AI Office and say, I think there is something here, can you help me understand whether you also think there is something here, and then the office works with your consent on when to proceed. In fact, the EU Whistleblower Directive places requirements on member states to come back within seven days with an acknowledgement of receipt and then provide updates at the latest every three months. If they don't, there's a right to go public, and you're still protected from retaliation. Europe has that; the States, unfortunately, do not. So if I could put a few things on my wish list for EU and US regulation on AI, that would be one. On the US side, the feedback loop is not as strong, unfortunately: there are no commitments, no guarantees that regulators have to come back to the whistleblower, which matters a lot for the whistleblower's peace of mind, because they take this risk because they care a lot; otherwise they wouldn't take it at all.
Feedback also prevents a situation where the regulator thinks there's actually not much here but the whistleblower goes to the public anyway and causes pain for no reason, although overall I'm less concerned about that: the moment people are willing to speak up about something and risk their career, they probably have really good reasons to do so. Not always, but usually. This also relates to the capability to evaluate cases. As mentioned, the proposed federal AI whistleblower protection act, which we endorse, provides a multitude of channels, which can be good, but you probably want some center of excellence, some way for all of these different recipients to understand how to evaluate these reports, if you go that route. You can instead go the European route and channel everything into one place, which has the expertise benefit, but then you don't have as many channels to try as an insider, to see whether one of them thinks there's something here. On the US side, because it's more decentralized and fragmented across a bunch of different laws, maybe you fall under this one or that one, it would be important to have some strong way for recipients of whistleblowing reports to access knowledge and quickly evaluate whether there's actually a case or not. >> Yeah. What do you think of alternatives to whistleblowing? Here I'm thinking about evaluations done by external organizations, and red teaming in particular. What are the strengths and weaknesses of those alternatives, and what does whistleblowing provide in addition to those methods? >> Yeah. Both of these are very important.
We do not have mandatory third-party testing >> Mhm. >> in the US; in the EU, I think it's going to come with the introduction of the AI Act. Naturally, that means eval providers are in a bit of a strange position: they're there to provide oversight, but they're still bound by NDAs, and the companies let them see whatever they want them to see. >> It's still a great thing. I'd much rather live in a world like right now, where we have a METR or an Apollo Research that work with the companies and uncover new things, maybe almost as an extended workbench at times. They also sometimes have freedoms, and this is not from them, by the way, these are just my musings, I'm not quoting anybody: sometimes the freedom to publish things as part of system cards, sometimes to talk about things they see, but sometimes not, because they're still bound by NDAs. So it's good to have them, fantastic in fact. It would be better if they were actually protected, or had stronger rights. In the world we're probably in right now, whistleblowing is more important, because if companies hold back certain knowledge, or prevent eval organizations from saying certain things, who do you go to? Who's the fallback? That's going to be the insider, the whistleblowing side. >> Mhm. >> Maybe an important note here as well: under European law, evaluation providers would be covered by whistleblowing protections if they, for example, spot a violation of the EU AI Act. Under US law they're not, at least in the vast majority of cases. In SB 53, for example, protection extends only to employees; California whistleblower protection for any violation, in fact, extends only to employees.
The argument companies may be making here is: we're working with these organizations voluntarily, and the moment they get whistleblower protections, we're just not going to involve them anymore. I don't really think that holds, based on a bunch of precedents from the past: there is value these organizations provide, and companies will keep working with them. And again, you can ask: which way do you want to go? The way of being responsible and compliant, or the other way? There's also another option, which we would really favor if that is truly the breaking point: you can still provide protections to evaluation companies or organizations >> to collaborate in investigations, and prevent retaliation for collaborating in investigations, without giving them the full rights to report under, let's say, SB 53. That's something we would have really liked to see. >> Yeah. >> That's maybe a bit too in-depth now. If we move into a future where things like these do become mandatory, then there hopefully is less need for whistleblowing, >> because we're going to catch more and more risks earlier, in the Swiss-cheese model. Ideally we'd like to never need a single whistleblower over the next 10 years; we'd want everything caught well before that point. Red teaming, whether internal or external, probably falls into the same category of concerns. I'm not sure if people are aware of the Nathan Labenz case: he was part of a red team, spoke up about his concerns, and was excluded from the red teaming. >> Right. >> So not a great indicator there either. It seems like a tempting option for companies, because in some sense it solves their problem, right?
This is like when a legal department immediately begins treating an employee who's thinking about whistleblowing as an adversary. There's this tension where companies, in trying to protect themselves, might be suppressing valuable information. I know I've asked this before, but is that a tension you see resolving, or what would you say to the companies to convince them that this is at least partly in their interest too? >> I think the business case for stronger internal speak-up cultures is relatively clear, at least empirically; there have been great studies on this. If you want really productive researchers, they want a lot of context and they want to feel comfortable voicing their opinions. >> And this is how you get a strong internal speak-up culture. Yes, you can try to brute-force it from the top, but having structures in place really does help. Uncovering misconduct early, so that it doesn't reach a regulator or lead to a major PR crisis, and so you can address it well, also really matters. So there is a business case to be made. Of course, we also have to stay realistic: there are plenty of companies, especially in big tech, that don't particularly care whether they're violating the law and are not erring on the side of responsibility and safety, but primarily want to boost their share price, and they see anybody speaking up against what executives have decided as, at minimum, a nuisance, and at maximum, somebody to be actively fought. >> There have been plenty of cases of companies acting quite terribly here.
Listeners can familiarize themselves with the case of Ashley Gjøvik, the Apple whistleblower, and the retaliation she has experienced since her whistleblowing. And Meta has been quite aggressive in coming after people who speak up in the public interest, for example trying to block the promotion of Sarah Wynn-Williams's book. So there are plenty of benefits to improving internal speak-up culture, but once company leadership has set their mind that this is not the path they want to go down, that's the reality we have to work with. >> Yeah. Imagine the world moving at a faster and faster pace: the pace of research, model releases, everything might move quite quickly, and it might happen sooner than the mainstream thinks. How do you think about whistleblowing in short-timeline scenarios? Say you're working at one of these companies, on the safety team, if there is such a safety team, and you begin noticing some problem, then perhaps another problem, then perhaps a bigger problem. When should you spend your political capital, your whistleblower capital? How should you decide when to blow the whistle?
>> Okay, I'll treat this as a scenario, not the status quo. Say there's been a major breakthrough announced maybe a month ago, perhaps an architecture change, perhaps something about recursive self-improvement in some shape or form. We're noticing a step change: the METR line keeps going up on length of task completion, and a large share of problems on the AI research side are maybe not at 20 hours of work but at eight, >> or something like that, so suddenly we can unlock a bunch of capability gains and just ride much faster. >> That's the sort of scenario I'm picturing. Generally, I think it depends on how intense the race already is at that point. Are we in full-blown race mode, or is there still significant doubt, and it's mostly your organization that feels like, okay, we've cracked it now? Suppose you're in full-on race mode and everybody's already all-in, so no lab with any chance of getting there has any trouble raising funding; we're talking hundreds of billions potentially flowing to whichever champions investors have picked. >> Then, and this relates to the upcoming research piece I mentioned before, there might be reason to be less concerned about the information you publish accelerating arms races, because you're already there.
On the way to that point, though, you would want to think twice about publishing information, at least to the public or to other racers, that says: we're actually accelerating dramatically at the moment. >> Mhm. >> Information on safety risks, or on things not going well, is probably usually decelerationist. Caveats here: there are cases where that doesn't hold; for example, if the leader publishes something about not getting ahead, others may think, okay, now we really have to race to catch up, we have a fighting chance. >> So there's probably something around the content of the information where your considerations change. There might also be something around how much trust you should have in the information being accurate and true: today you really care about avoiding boy-who-cried-wolf situations. >> Yeah. >> In full-on race mode, maybe that's less of a concern. If capabilities grow accordingly, the public is going to notice, regulators are going to notice, everybody's probably going to get more and more concerned. And lastly, in combination with this: what's your forecast of how well we're tracking? Do you think we've actually solved a bunch of alignment problems, and that the leadership of the company developing this, as well as the government-level oversight of that company, is trustworthy, so you trust them to make the transition well? >> Mhm. >> If that's roughly your expectation, then weigh the probability mass of the impact your disclosure would have on what the future looks like; you probably want to be a bit more cautious.
>> If you think we're heading straight for catastrophe, you may be interested in rolling the dice a bit more on what sort of information you publish, especially if you're already in full-on arms-race momentum. Again, the caveat: this is a current state of thinking rather than the research we'll spend a lot more time on in Q1 and Q2 next year.
>> Yeah. Another consideration on the individual side, I think you mentioned, is political capital.
>> Mhm. In general, no major change from the situation right now, but a few considerations. If you're not extremely senior and you see more of your research being automated, you may be a lot more powerful today than you will be in three months. And if you feel we're actually not hitting certain alignment targets right now, and we're using misaligned models to try to align the models of the future, it's probably good to speak up now. But that's a core content consideration. On the individual support side, you could probably expect support from, let's say, the responsible-AI or safety community to be even stronger than it is today. I mentioned the defense fund before, but if we live in a world where the writing is on the wall, I would expect a lot more funding to flow into the support ecosystem.
>> Mhm.
>> Whether that's defending against legal cases brought forward by a company or getting you really good safe housing, these sorts of things. One more consideration on the personal side for the insider: already today, frontier companies have internal security teams tasked with making sure no information gets out.
>> Mhm.
>> That is going to step up again.
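The expected-value framing of the disclosure decision above can be sketched as a toy calculation. This is an editorial illustration only: every probability and impact number below is an invented assumption, not an estimate from the episode.

```python
# Toy expected-value sketch of the insider's disclosure decision.
# All probabilities and impact scores are illustrative assumptions.

def expected_value(p_good: float, impact_if_good: float,
                   impact_if_bad: float) -> float:
    """Probability-weighted impact of a choice across two outcomes."""
    return p_good * impact_if_good + (1 - p_good) * impact_if_bad

# World A: you trust leadership and oversight to manage the transition
# well (p_good = 0.8). Disclosure mostly risks accelerating the race.
ev_disclose_trusted = expected_value(0.8, impact_if_good=-2, impact_if_bad=5)
ev_silent_trusted = expected_value(0.8, impact_if_good=1, impact_if_bad=-3)

# World B: you think we're heading for catastrophe by default
# (p_good = 0.2). "Rolling the dice" on disclosure looks better.
ev_disclose_doom = expected_value(0.2, impact_if_good=-2, impact_if_bad=5)
ev_silent_doom = expected_value(0.2, impact_if_good=1, impact_if_bad=-3)

print("trusted world:", ev_disclose_trusted, "vs", ev_silent_trusted)
print("catastrophe world:", ev_disclose_doom, "vs", ev_silent_doom)
```

Under these made-up numbers, staying quiet wins in the trusted world while disclosure wins in the catastrophe world, which is the shape of the argument being made: your forecast of the default trajectory drives how much disclosure risk is worth taking.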
>> Especially the moment we move in the direction of national-security classification. If that's the path we go down, then yes, things of course get hairier.
>> Yeah. How difficult would you expect things to get for a potential whistleblower if we move into a world in which AI is recognized by governments as a national security concern, and we have some form of semi-nationalization of the frontier companies? My guess is it suddenly becomes a very difficult situation to be a whistleblower. Or does it potentially become easier, because the government might also implement a process for whistleblowing? What do you think is most likely?
>> Nice. Here too we have a research project slowly starting at the moment, especially around classification creep: which areas we're most concerned about if certain kinds of projects were classified,
>> Yeah.
>> and what the implications would be for whistleblower protections. As I said, we're only kicking off now, but overall: significantly more difficult.
>> Okay.
>> It depends very much on the future state of the administration. For example, somewhat recently Tulsi Gabbard removed the acting counsel of the Intelligence Community Inspector General, the office meant to receive whistleblowing reports on classified information, which basically feeds congressional oversight. The move, as it looks from here, replaced the acting counsel with an adviser who reports directly to Gabbard. So a mechanism meant to be independent oversight is going back into the executive branch. This does not look good.
Talking to members of the intelligence community, there's been a bit of a gutting, both of the actual individuals looking at internal whistleblowing claims within the intelligence community or claims based on classified information, and of the independence of the oversight mechanisms themselves. If something like this were to continue, that would not be good in many different ways. Already before this, raising concerns based on classified information did not go super well, and it tended to go well only when it was in the interest of the program. There's a reason we got an Edward Snowden disclosure that went to the public.
>> Mhm.
>> I think it would be quite concerning if we saw massive overclassification of, for example, frontier research and deployment. There are hopefully things to be done here, but to an extent we just have to hope that the people handling reports on classified information are going to do it well.
>> Is there anything we can do to prepare for that? Anything you could put in place so that information potentially relevant for the public to know is not classified? This is of course difficult to do, but is there anything we can do to prepare for such a situation?
>> I will tell you after the research project.
>> Yeah, that makes sense. I want to raise a hypothetical here. Say we have AI models that are increasingly capable of automating AI research. Would it be possible to implement whistleblower policies for the AI researchers themselves, perhaps implemented in their training, their post-training, their model spec, or anything like that, so that the AI researchers themselves could inform the public if necessary?
>> Sounds interesting. Not quite sure on the technical side.
>> If you trust that the model actually does what it is meant to do,
>> Yeah.
>> and something like that is in the model spec, and we have good evidence that it will in fact do it, that seems like a good thing. It wouldn't have to go to the public; it could go to a regulator or alert other relevant recipients. It would be able to take some kind of qualified action.
>> Yeah. I guess one problem with this suggestion is that to some extent it assumes alignment is solved, right? It assumes we can get our values around whistleblowing into these models. But it would be an interesting additional layer. You talked about the Swiss cheese model; this would potentially be an additional layer of security if we could get it to work.
>> On the technical side I'm probably not sufficiently qualified to speak. For example, to what extent do models actually have introspection capabilities?
>> Mhm. Right.
>> Or it wouldn't necessarily have to be introspection. You could imagine one generation of models working on the next generation, so you would have the full code available.
>> True, fair point. It sounds like a nice thing to have, though probably a bit outside our scope. As per usual, I'd say not to rely too much on it.
>> Yeah.
>> But sure, it seems like a good additional layer. I think somebody else has written on this before.
>> Yeah, fantastic. Okay, that's all of the questions I had for you. Thanks a lot for chatting with me.
>> Thank you very much, guys.

Related conversations

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

