Library / In focus

Back to Library
Future of Life Institute PodcastTechnical alignment and control

Can AI Do Our Alignment Homework? (with Ryan Kidd)

Why this matters

Frontier capability progress is outpacing confidence in control; this episode focuses on methods that can close that reliability gap.

Summary

This conversation examines technical alignment through Can AI Do Our Alignment Homework? (with Ryan Kidd), surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

Risk-forwardTechnicalHigh confidenceTranscript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forwardMixedOpportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).

StartEnd

Across 121 full-transcript segments: median -6 · mean -6 · spread -399 (p10–p90 -160) · 7% risk-forward, 93% mixed, 0% opportunity-forward slices.

Slice bands
121 slices · p10–p90 -160

Risk-forward leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: high.

  • - Emphasizes alignment
  • - Emphasizes control
  • - Full transcript scored in 121 sequential slices (median slice -6).

Editor note

A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.

ai-safetyalignmentflitechnical-alignmenttechnical

Play on sAIfe Hands

Episode transcript

YouTube captions (auto or uploaded) · video 7pRgV0yFOpw · stored Apr 2, 2026 · 3,065 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/can-ai-do-our-alignment-homework-with-ryan-kidd.json when you have a listen-based summary.

Show full transcript
Welcome to the Future of Life Institute podcast. This episode is a crossost from the Cognitive Revolution podcast featuring Nathan Leens interviewing Ryan Kit. Ryan is the co-executive director of Matts, which is one of the largest AI safety research talent pipelines in the world. Please enjoy. >> Ryan Kidd, co-executive director at Matts. Welcome to the Cognitive Revolution. >> Thanks so much. I'm I'm glad to be here. I'm excited for this conversation. So, I've mentioned a couple times that I've been a personal donor to Matt's. Um, I think it's actually the first time we've ever met and spoken, but your reputation certainly precedes you and I've seen a lot of great work come out of the program and a lot of great reviews and in my, you know, research also as part of the um survival and flourishing fund recommener group. Got a lot of great uh commentary on the importance of maths as a talent pipeline into the AI safety research field. So big supporter of your work from afar and I appreciate the fact that you guys have come on as a sponsor of the podcast recently as well. This conversation not technically a part of that deal but you know we're in one of these open AI style circular flow of funds sorts of things where we're um somehow both inflating one another's revenue but >> I like to think we didn't buy our way onto the podcast. [laughter] >> Yeah. No, the enthusiasm is definitely is real because I' i've heard so many great things over time. So excited to get into this. I thought we would maybe just start with kind of big picture from your perspective and I think you know having watched some of your previous talks I know that you play sort of a portfolio strategy where you're not like you know I have a very specific narrow prediction and I'm trying to you know maximize the value of this organization or this program for that you know very hypersp specific prediction. It seems like you're more kind of saying, well, there's a lot of uncertainty out there in the space and we're going to try to be valuable across kind of a range of those scenarios as much as we can be. With that said, you know, you can kind of speak on behalf of yourself or on behalf of mentors or the community as a whole. Like where are you guys right now? What where are we in terms of timelines so to speak? And how has your strategy evolved over the last year or so as we've gained more information on where we are relative to the singularity? >> Yeah. Okay. So, I don't like to have opinions here or I don't like to have opinions very loudly. And the reason for that, I think, is because as you say, we are somewhat like a hedge fund or something or or or maybe an index fund, right? Uh more more likely, which is to say like we have a broad portfolio. We adopt a bunch of different theories of change is valid and we try and like, you know, have our thumb in a hundred pies. So I would say like in terms of Mass's institutional opinion on this definitely we we we tend to go with like things like metaculus and prediction markets and uh the forecasting research institute FRRI uh their their predictions and so on. So the current metaculus prediction for strong AGI I think it's called which is I think you can ignore most of the requirements of the test and just look at one of them which is the 2-hour adversarial Turing test that's some predicted somewhere around mid 2033. Okay. So I think that is probably the best button we have for when AGI of that nature occurs. Now we recently just had uh two days ago or three days ago perhaps dropped this new AI futures project dropped this new report which two Matts fellows were uh uh one is a lead author one was a contributing author on. So very excited about that and that just was updating their model and I think they predicted something between 2031 or 2030 to 2032 depending upon how you you know like define AGI. They broke it down to all these like automated coders like they can like do all the the coding stuff. They you know the these top expert dominating AI across all these fields and so on. So I think I don't know somewhere around 2033 seems like a decent bet. But also we had you know Nathan Young recently compiled all these different like forecasting platforms. a metaculous manifold, another metaculous pole that was like for weak AGI. It was like a little bit less good at Turing tests and all these other like I don't know some couchy thing whether opening I will achieve it and he he came out with an average of 2030. Now I don't know I still like the metaculus 2033 but like I wouldn't bet against 2030 in terms of nearness of AGI. As for super intelligence, complicated, right? Could be 6 months or less after could be a very hard takeoff after this AGI thing. Uh if it's like a very software only singularity scenario where you don't need a big bunch of hardware scale up, you're aren't limited by compute. It's just recursive self-improvement or something algorithmic improvement. AIS are improving the algorithms and training eyes and it's like, wow, that's a fast feedback loop, right? Or you might need a lot more experimentation, right? uh you might need massive hardware scaleups. Uh you might need like just staggeringly more compute than exists in the world in which case that could take you a decade right to get your your singularity. So I currently think that 2033 is a decent central estimate in terms of the median for what we're preparing for. But obviously 20% chance by 2028. I think that's the metaculous uh uh prediction. That's a lot, right? So, we should definitely be considering scenarios that are sooner, right? And particularly, I think the sooner AGI happens, the more dangerous it might be, right? The less time we have to do critical technical research to prepare, the less time we have to implement policy solutions. And I don't know, like if it's happening during a transition period for US government, could be even wilder. So I would say like median bet on 2033ish but like really care a lot about the impacts of AI like frontload uh your concern to pre2033 scenarios and I think that Matt's mentors I don't we haven't surveyed them but I think if they were if we were to pull them we'd get something similar. Yeah, I always say I'd rather prepare for the shorter timelines and then have a little extra time as a and I'm sure we'll find ways, you know, that we'll still need it, but it does seem wise to me to play to that sort of, you know, first quartile possible, you know, range of outcomes. Is there still any room in the in the program or in the community? Like if I showed up and I was like a 2063 guy, would I be sort of out on an island on my own and you know are there any work streams going on right now in the sort of brain computer interface or sort of deep totalizing obviously interpretability has you know different flavors right but with the recent turn toward more like pragmatic interpretability I wonder if there's any space left for the sort of you know we know we really want to understand everything kind of interpretability or if that has kind of been understood generally as like eh it's probably going to take too long for us to really be excited about pushing it right now. >> Yeah, it's a good question. I actually think there is plenty of room for this and here's why. the mainline kind of meta strategy that the AI safety community seems to be pursuing on the whole we're talking in terms of funding in terms of uh uh sheer like numbers of people and resources deployed not necessarily in terms of like less strong post written or something right but in terms of like resources deployed is this AI control strategy which is where like basically you build uh well perhaps perhaps better called alignment MVP which is a term coined by Yan Leaky former head of super alignment openai now co-lead of alignment science and anthropic and alignment MVP is an AI system that is a minimum viable product for accelerating the pace of alignment research differentially over capabilities research such that we get the right outcome. So basically you're getting AI to do your homework. And there's been a lot of debate on this. Uh there's a very strong camp in in in the direction of like this just never will work because as soon as an AI system is strong enough to be useful, it's dangerous, right? I think you know quad code shows this is not the case for at least for software engineering but perhaps for people who think that aligning AI systems requires like serious research taste you know they they would probably say that this uh quad code is nowhere near there right where general AI systems are nowhere near that level of research taste ability now all of the things that you're mentioning that pay off only in 2063 scenarios presumably they only pay off in that many over that time period not necessarily because of like I don't know human challenge trials or something maybe that maybe that makes a difference if you're, you know, if you're interested in like, I don't know, making humans more intelligent with genetic engineering or some of the crazy things that are being tossed around. But if if you're mainly interested in like, oh, this thing is going to take decades of technical work, maybe you can compress those decades into a really short period of AI labor, right? If you can like get them to run faster, massively parallelize things, and just, you know, in general, just get them to do your homework, those 2063 AI alignment plans might be automatable over a shorter period of time. And so we we should definitely be pursuing those because the more we do to like raise the waterline of understanding on these different scenarios, the easier it will be to hand off to AI assistance or to to accelerate with AI AI input. I do think it's interesting you said BCI research because I I recall being at a conference once when someone was talking about okay so the way we're going to solve alignment is we're going to uh solve human uploading and we're going to like put someone into the computer and like get them to do 100 simulated researcher out 100 simulated researcher years or something and then you maybe like it sounds very sci-fi very pantheon but then LASI put his hand up and he said like I volunteer to be number two which makes sense right you don't want to be the first guy that might go wrong but yes people are seriously pursuing that and I think it is it is interesting. I I I I have talked to some BCI experts about a year ago and they said there's no way that we get BCI in time for AGI. Sorry, it's not BCI, sorry. No way you get human uploading in time for a AGI unless like you actually have AGI, right? The time period required would require massive amounts of cognitive labor and and and human trials and stuff like that. And I don't know, it is it does sound very sci-fi, so I don't think we should rely on something like that. Though I'm all for people pursuing moonshots on the side. That's what part of what mass is about, right? We have this uh massive portfolio with a few moonshots in. >> Okay. So, there's a lot of different directions I want to go from there. Um, and I'm trying to just make sure I keep a running tally, >> but you know, maybe an interesting first one would be how do you think we're doing on the AI safety front overall relative maybe relative to your expectations? I mean it you mentioned like less wrong and and there's this sort of I don't know all the lore of Matts but I do understand that like a lot of people who have participated in it over time come out of the Eleazar discourse and had a certain set of assumptions that were like we're not going to be able to teach this thing our values. that's going to be like, you know, extremely unwieldy from the beginning. And now we have Claude and it's like, man, you know, that's come a lot farther than I thought it would have at this point in time. And I'm kind of surprised in general by like how little I see people's P dooms moving. Seems like the people that had really high ones like remain really high. Those that were never worried remain not so worried. I kind of feel like I'm taking crazy pills at times where I'm like, I don't know. I see these like deceptive behaviors. They kind of freak me out. It's like amazing that that was anticipated as well as it was by the, you know, the safety theorists even in the absence of any actual systems to work with. But then at the same time, you know, it's not crazy to me to say that Claude seems in many ways like probably above average in terms of how ethical it is compared to the average person like that. You know, I don't know if that's contentious to say, but Claude's well, you know, it's it's awfully it's pretty remarkable in that respect. What do you make of where we are? Are you as confused as I am or do you have a sort of a more sort of opinionated sense of how well we're doing overall? >> Honestly, Nathan, I'm pretty confused. Uh [laughter] like I I do think that, you know, contrary to expectations, we are looking like we're language models understand our values, right? That's that's the first thing to update like they understand them in some in some key sense. It's not just regurgitating like sarcastic parrots. language models are really good at like understanding human ethical moors and extrapolating on them in in in in you know in some scenarios. They're also really good at sycopency. They're getting even better at deception, sophisticated deception. They do tend to deceive users in the right circumstances. Uh though it does seem like there's some debate about this and it seems like some of some of the deception it's far from what we might call consequentialist like like hardcore consequentialist deception in most situations but I think like alignment faking and some other papers have have shown that you know there are situ you can like create situations where AIs will deceive the user to achieve some like ulterior objective which was something that was like deliberately given to the AI as an objective. So that's that's one of the constraints of these little scenarios. I don't think there are any examples of AI coherently deceiving users like pursu pursuing this like coherent objective, right? Not just what we might call goodheart deception where they just kind of they like fall into deceptive tendencies because of the limitations of training data. Now I'm talking about this coherent deception. There are a few cases of this if any where this is like the sustained coherent deception that appears like to arise spontaneously through the training process. which is pretty good given the level of capabilities we have. Like it seems like people didn't think 5 or 10 years ago for sure that we'd have AIs that are as capable at as assisting frontier science that are safe to deploy and people are like we're never going to put on the internet. Who would do that? That's crazy. And now they're on the internet and notably the world hasn't ended yet. That's not to say it will stay that way. You know, certainly a thing you don't want to do with a super intelligence is let it out of the box. Um but yeah, it does seem like we're in a better scenario than many imagined. Now there of course like we could be in the cal before the storm, right? Uh it might well be that there's what they call a sharp left turn or just you know a radical change in the way AI uh internally kind of process information uh and and might acquire these kind of coherent longr run objectives. I I could point to like um Matt's mentor Alex Turner's conception of shard theory as an example for how this might happen, right? So like instead of AI systems being this this like you know containing uh uh like a single miso optimizer that is kind of coherently forming under training, right? If you remember the old Evan Hubing group paradigm, your your outer optimizer loop which is training your AI system causes it to develop an internal like optimizer architecture which then can have its own goals that differ quite a lot from the training objective. And then like whenever you uh and like presumably there's some counting arguments such as there are arbitrarily many ways to like have this mis optimizer form to produce the right outputs because this thing is clever and if its main goal is to produce paper clips or some other thing then it's going to like realize it's in a training process and it's going to give you the output you want no matter what its goal is. We still could be in store for that kind of thing but currently it seems like we don't like AI systems are really messy. They're cludgy, right? Like human brains. they have some a bunch of contextually activated heristics, right? It's it's it sees there's a a bracket there and it's like, oh, maybe I'll put another bracket there. You know, it's a very simple dumb circus. But then sometimes, right, it does stuff that is like in context learning seems a lot like uh uh like it's actually pattern matching to gradient descent when models are learning from the input data stream and and learning some like uh new complicated thing. seems a lot like they're kind of optimizing over the input tokens or rather optimizing to to produce some output. So we might be in for a world where AI systems do spontaneously gain these misa optimizers and these things you know these are a serious source of concern because they're you know very powerfully trying to optimize for some objective and and uh this is what you know this is the main concern I have I guess is that we have this kind of deception model this inner alignment failure perhaps where AI systems acquire goals spontaneously or maybe because they're being trained deliberately to be power seeeking and make money on the internet and then like they decide to hide and we don't have interpretability tools good enough to detect them. So I I guess like I haven't really changed my fear about this scenario eventuating, but I have become more confident that we can elicit useful work from AI systems before we see sign obvious signs of this. I'll say and I'm pretty confident that AI systems right now are not executing very powerful scheming against this because I think we would see some sort of like warning shots like I don't think it's going to be as like night and day, you You know, I think we'll see some kind of situations where AI systems are kind of trying to scheme in really dumb ways before they try and scheme in like very competent, difficult ways. Does that make sense? >> Yeah, I think that's a good summary. I mean, eval awareness definitely stands out to me as one thing that is making everything a lot weirder and just harder to feel confident in anything. I'm not sure really what to make of the deception track that you outlined there. I mean, in some ways it's like we're in I've often said it feels like we're sort of in a sweet spot where like they're getting smart enough that they can help with science and yet they're like not good enough at using a web browser to go out and you know get too far in terms of self- sustaining or you know causing whatever havoc. And on the deception side, I'm like, yeah, it seems like it's the gradual rise of it seems like an example of physics being kind to us. It doesn't seem like we're seeing the sharp left turn. It seems like we are seeing these like proto behaviors that are, you know, at least give us something to kind of study if nothing else. But I'm not quite sure how people get confident in the idea that like maybe they're just not that good at it yet, you know, like are how you know when you said like we would see warning shots. Are these not the warning shots? That's one thing I'm I'm still kind of confused on. People seem very quick to me in some cases to be like, well, you know, that was a a structured eval and it was sort of, you know, led into it and it really just wanted to like yes, it refused to be shut down or it took steps to avoid being shut down. But that was just cuz it wanted to accomplish its task, not because it had like a, you know, take over the world uh objective. And I'm kind of like, well, okay, still though, it did uh resist being shut down. Like, at what point should I start to consider that to be a warning shot? I'm not sure there's like an answer to that. I'm not sure there's really a question there or that there's a way for you to answer it, but it I guess basically it's just another layer of my own confusion. I mean, it seems like people are very often just led by like extremely different intuitions in response to the same fact pattern. And I'm not really sure what to make of that, but when people the the deception one in particular, not that you were doing this just now, but I've heard a lot of different, you know, ways that people sort of say, well, you know, we don't have to worry about that too much. And I'm like, I don't know. I'd really like to know that that's resolved at some point. You know, certainly if I was going to go, one of my common refrains is like if I was going to be part of a military that was going to go into battle with my AI systems, I would really want to know that, you know, deceiving the operator issues have been like well and fully ironed out. And I don't know, it seems that we're a little like casual on that. >> Even at anthropic, right? Like they, you know, have certainly done some of the the best work on this stuff, but they still also kind of like seem remarkably chill about it to me. I'm not It's strange. I mean, I don't think people should be going to battle with AIS for many reasons. Like, I think I think that's that's a pretty bad social moore to to to, you know, to set to allow that kind of thing. But that's that's another matter. I I guess like definitely AI systems. I I would not feel confident in current or future gener AI systems like not doing out of distribution failures let alone like like in in critical scenarios let alone like scary things like deception you know where when it really counts and I think that's a big deal and we should be tracking two things right like one AI capabilities so are AI situationally aware like do they have the necessary prerequisites to be able to to to to even understand that they are this AI It seems like they do, right? And even they even know kind of their training date. They know some details. They can detect their text from other AI's texts. So AI is becoming increasingly situationally aware, which is, you know, one of the necessary prerequisites for really dangerous deception. Do they have the capabilities to hack themselves out of the box, right, to steal money? Like we had this Matt's paper that came out and caused a ski stir recently. Um it was this uh uh collaboration with anthropic anthropics AI red team where they found that oh AI AI system let's just like put it in a simulated environment with a bunch of uh uh real smart contracts and it can find $4.6 million worth of exploits. Well that's a lot of money. That's enough to like set up your own server and you know run for quite a while. And that was that was like a a relatively short project. Um and it didn't seem it was it was pretty handsoff as well from the humans but not entirely. So, so it does seem like we're getting dangerous capabilities increasingly so hacking out of the box, getting money, getting influence, all that kind of stuff. And then I think we want to be tracking all of that very closely. And I don't think we're at red lines currently, but we are approaching them, I would say. And separately, I think we should be tracking, as you say, this model organisms work where we try to elicit dangerous behavior from AIS. Uh, you can think of this as like like with your child, right? you leave some cookies out and you're like, "Don't look at don't eat the cookies." And then you like turn away, but you're like secretly looking and if they ate the cookies, I caught you. You know, so that's the kind of thing we're we're doing. The thing is that AI systems can really detect when they're in training versus real environments. But if you recall the AI 2027 scenario and a lot of this um uh uh yeah like discussion around that people were talking about online learning like it's like we we train open router trains the last big AI agent and then from then on it's just like kind of constantly learning online through some sort of RL paradigm. If AI systems are perpetually online, then they're always in deployment and you've got to have monitors and control protocols, right? So that's why control research is so important for that, especially for the early days to catch some of these these slip ups, right? So you can do all the model organisms work you want in the lab and that that's like one layer of defense to see if we have these capabilities or these the penchants for deception emerge. And separately, you need to have all the kind of the control evals studying them as they're deployed, especially if they're going to be learning online and perhaps updating their behaviors and just be constantly checking for this stuff and be ready. Have like a fallback plan, a rapid response plan. What are you going to do if actually you see serious warning signs? Do you shut the models down? Your stock price is going to plummet. What do you do? Do you revert to an older system that's safer? Probably. So, I think yeah, we should definitely be tracking this stuff. And I wouldn't say that we are in the clear by a long shot. I would say that we are in a better world by my estimation than uh uh Bostonramm and Murray predicted you know uh 10 something years ago. But I don't know they would say they would say I'm very wrong about that. But I I don't know. I think I think that it's useful that we can get some work out of these things that looks like it is actually quite likely to accelerate AI safety work. >> Yeah. So that's brings up another I think huge question for AI safety research in general and probably the strongest maybe not in I don't know if you would say strongest in the sense of being most compelling to you but certainly the like most hawkish or fiercest criticism that AI safety research gets is that it always ends up being dual use and that it always ends up somehow accelerating the core capabilities track and you know some people would would say you know just stay away from the domain entirely and you know focus on social shame or whatever. I do believe we can do better than that. I think we probably have to do better than that. But I wonder how you think about that right. I mean the canonical like RLHF was sort of a safety technique that really turned out to be more of a utility driver than anything I would say. I mean I guess they're both right. it is dual use but certainly when it came to like accelerating the field you know making the things useful waking the world up like having all kinds of people pile in all the everything going exponential all at once you know you can kind of trace back to at least in part you know this transition from real raw next token predictors to actual instruction followers and we've got probably like a lot of those things going on today the one that I think stands out to me the most is one you've alluded to a couple times, which is like getting the AI to do the alignment work. You know, that sounds awfully close and uncomfortably close to recursive self-improvement, which is something that I am quite fearful of. You know, I do think like again, Claude seems pretty pretty ethical. You know, the GPTs aren't too bad either, but yikes, you know, are we really ready to like have them do our alignment homework? So how do you think about un teasing out as you kind of prioritize different kinds of research like where you want to invest, what kind of mentors you want to bring on, what kind of talent you want to cultivate through the program. How do you I mean that that seems like a a huge question that is a really hard one. How do you think about it? It's a very good question and I I'll preface by saying that all safety work is capabilities work fundamentally like people like to to distinguish these things in terms of like oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh oh c capabilities work is about the engine. It's about making the plane go faster and safety work is about the directionality. But as you've pointed out, ROHF, which was intended as safety work to help the directionality, steer it to where you want to go, also made people realize, oh wait, this thing is useful. I can actually hop in this plane now cuz it's going to land where I want, which made them want to make the engine go faster, so they could get there faster, right? And that whole feedback loop started. So I I actually don't know if you can avoid this. It's like the only way I could con conceive of doing safety research. There's no impact on capabilities until like I don't know the final critical moment when you deploy it. Uh is like being holed up in a lab somewhere with people that you utterly trust under crazy NDAs and only and like only you know having access to staggering resources whatever is required because presumably maths and theoretical methods aren't enough to improve safety. At least that that seems to be the lesson of the last 10 to 20 years. I don't know I could be wrong but that it seems like that the interplay between theory and empirical research is pretty vital for most types of disciplines like this. So you have to have staggering resources perfectly loyal team secret like all these NDAs no one's going to reveal your your research uh and then like you do you know you build the system in secret or something somehow and then okay then then you deploy it and then maybe you open source your alignment technology no one has it or or somehow you disable all the bad actors or something. It just seems like a very difficult prospect. I think that >> it sounds like safe super intelligence in a nutshell. >> That's basically the setup that they seems like that's the setup that they've got. >> Extreme secrecy, unlimited resources. They did have one notable defection, but um otherwise, you know, a team that has resisted uh [snorts] lucrative buyout offers. >> Nobody knows. >> I'm not trying to defend I'm not trying to defend research like this or or or even defend, you know, capabilities enhancing safety research per se. I'm just saying that like it's pretty hard to imagine a situation where you because I think you do have to build the AGI at the end of the day and I know I I I'm alienating a lot of people who might watch the show when I say that but I think that you you kind of have to from a pragmatic perspective because the market forces driving this are very strong. Now there are some options that we could take right we could build direct source comprehensive AI services so you never have to have like a centralized agent you have distributed kind of mechanisms right uh you build scientist AI that is or very narrow AI systems to serve a bunch of economic things the problem is I think they all get out competed by a agent agentic AI that like st like you know you stack like an AI company filled with agents and they all like go out in the stock market and make products and you know and so on and just just make more money and just beat your crappy narrow AI solutions. So, so the problem is like it's not just about making AI that is aligned. It's about making AI that is performance competitive enough that it dominates in the marketplace. The only alternative is to like have some sort of draconian like shut it all down kind of thing which I am just very skeptical of ever working. I don't see any example of such a thing happening. The closest example we have is like stopping human cloning, but that was not a lucrative bet like in the same way that AGI is I claim. So it's just Yeah. And also like like human cloning is kind of this it violates this deep social more I think in a way that few people today conceive of powerful AI systems to violate. I think they're wrong. I think building a second species is actually uh going to violate like some deep social more in the same way that human cloning would be. But I don't think people will see it that way. So that leaves us with with the fact that we actually have to build the AGI because but if we can build products that are safer, right, or perhaps are under some strict regulatory control that we have some like really like ideally like I know 10-year international slow phased entry to the new AGI world, right? where all these, you know, countries and and companies are kind of forced to be very careful uh and collaborative in the way that they align their models, then we're in a much better world. That's the world I hope for. Okay. Now, as to whether AI safety research is unnecessarily capabilities enhancing, some is perhaps ROF, I think I'm on the fence 50/50. Definitely at some point ROHF was like the idea was in the was in the water like uh it doesn't seem like it would have lasted much longer if Paul Cristiano and Dary at all hadn't done that. I think someone else would have done it. That's not to say that you shouldn't necessarily try and accelerate the frontier of capabilities. It seems bad on net, but certainly RHF opened up the door to a lot of very promising ways to build alignment MVPs which kind of is the Cristiano meta strategy too. I don't know. It's hard to say. I'd like to run the counterfactual simulation and see where the world would be without RoF one or like one year sooner or two years sooner. That would be interesting to see. Uh it definitely did kickstart I think the uh you know chatgbt revolution and you know productizing AI systems but it's hard to see like given how small the AI safety field was at the time. The AI safety field I think has grown from the the the increase of AI exposure, right? So you would have had some amount of additional AI safety research that happened had chat GPT moment not happened then it happened like one or two years later but I think it would have been kind of insignificant if I'm honest I don't think that the field was big enough now you could say like okay what if you also tried to pour resources into like secret AI safety projects at the same time delay rof delay chatgbt build up the AI safety field uh uh via networks the myri summer schools weren't doing a lot and matts came along uh just before the chat GBT moment, December 2021. And yeah, I think like the first Matt's cohorts were a little bit less like a little bit more directionless than the later cohorts. Definitely like I think safety research really kicked into gear after we had chatbt. Uh not to say that was the only cause, but there were like a lot of things happening around that time. And I think that like definitely larger, more capable models have enabled certain types of essential safety research you could not do with smaller models. We're talking like interpretability on models that actually have, you know, coherent concepts embedded in them. But I will say there's probably plenty of work to still be done on GPT2 small, but linear probes and whatnot at a high level can target some of our frontier models. You know, Quen, these this this these Chinese models are are particularly good for that. uh certain types of debate like we had the first interesting empirical debate paper only after models were good enough to debate and there's many many other such examples like all the control literature I think just could not have happened as well sorry if that's too much no it's great yeah I was going to ask also about the idea that it sounds like you sort of believe it at least up to a point but the you know going back to the sort of founding mythology of anthropic I think one of uh notions that was seen as like a legitimate reason even among pretty hawkish AI safety folks for starting a company like anthropic was well if you want to do this safety research you've got to have frontier models to do it on otherwise you're just inherently behind and then you know what good is that right what good is it to work on like last generation model so obviously we've got a you know quite a few generations between GPD2 and now and sure like we don't understand plenty of things about GPT2. But then I would also say there are a lot of emerging behaviors that are not observed in GPT2 that are definitely of interest including you know many of these deception and eval awareness things that are kind of most hair raising to me today. Where do you come down on that now? Like I wonder if if somebody's like geez should I go to a frontier company because that's where the best models are and that's where you know inherently that means the most consequential work would be done there or you know I could go work independently or at any of a number of other organizations and I might be limited to you know a smaller Quinn model or something but maybe that suffices you know maybe there is enough in those you know those kind of second tier models as we enter 2026 you know that you don't really need to be working with the latest latest again I think I am mostly probably just confused or unsure about this but do you have a take I mean yeah for plenty of interpretability research people aren't people aren't using the frontier models you you don't have access to them you know I mean sure people in the labs are but at mats there's tons of really excellent papers that keep getting produced uh and and from many other sources right altherai uh fari etc they're like doing you know world-class interpretability research on subfrontier models because today's like subfrontier model today's quen or or or deepseek or llama or whatever is um it's like yesterday's frontier model you know in terms of capabilities we're at that point where these models are all above the waterline you know for for doing really excellent research so from an interpretability perspective I don't think you need to be pushing the frontier that much if at all from the perspective of other types of research search agendas such as uh uh like weak to strong generalization and other types of AI control and skilled oversight things. You kind I think you kind of do need more data points. I'm not saying we've exhausted everything you can do with the current models. Far far from it. But I think like you are going to need more data points to build up uh uh you know consistent and to see some of these kind of worrying behaviors emerge where your weaker model can't actually supervise your strong model in all situations which by the way is predicated on this idea that verification is easier than generation P versus NP blah blah blah especially if you can see the other person's thoughts and they can't see yours. So I think we we are we it does make sense to be at the frontier from that perspective. But I will say that the I think the main reason that the companies are doing this uh is obviously to make money. And then you know uh uh in Carl area of that like from a safety perspective if you were trying to gen uh uh actually make a strong case for being at the frontier it would be like so our models are performance competitive and that like they're close enough to the frontier like fast follower kind of model that like you take a performance hit by using ours instead of the competitors but they're safer and currently no one wants to use anything that's worse than the frontier model. Why would you? That's the best model. But if like a model was like I don't know 10% more likely to you know tell you to jump off a bridge or something or or actually seriously 10% more likely to hack your bank account and steal all your money let alone I don't know make a you know you know escape and make a bioweapon uh you would I I would like to think people would use the the less good model and I like to think that regulators and insurers could adequately penalize the frontier companies into complying with those because then you have existence proofs Oh, my product I'm actually trying, right? I made an effort. I tried to make my product not do the heinous thing that the very best model developer is doing. You know, then everyone has no excuse and they have to do that, right? And governments can compel them to and so on. So, I think like, you know, making your model performance competitive enough that people that, you know, people want to pay the alignment tax, so to speak, seems seems like a viable strategy from that perspective. Now, of course, this is none of this is trying to justify the current frontier, like the race of frontier models, which seems very reckless, let's be clear, right? Uh I think at the current pace of of development that we're going to be in a lot of trouble, but this is one of those collective action problems, right? These companies have to coordinate to slow down and there's international things at stake here as well because now you have you do have a US model versus China model developer kind of race now that they're in the running. So it's very complicated and when you have these collective action problems I think the main way you solve them is through governance and sure the lab leads uh could be probably even more collaborative and definitely some of them are not advocating as strongly as they should be for slowing down and for for having this kind of uh uh collective you know kind of sharing in the alignment benefits and and you know not pushing the frontier dangerously but I do think this is ultimately a job for governments. Yeah, quick um this might be a bit of a digression, but quick followup on when you said companies are primarily doing this to make money. I actually would model them fairly differently than that. Take Open AI for example, Sam Alman has said, you know, we could burn $5 billion, $50 billion, $500 billion, I don't care. We're making AGI, it's going to be expensive. I think that's almost a direct quote. And then when you look at their mission, I'm always struck by how the even just the way that they've chosen to define AGI to me strikes me as kind of inherently ideological. Like you could set your goal in any way, shape or form you might want to. And theirs is like explicitly in this frame of out competing human workers. And I think that they are sincere in their expectation that it's going to be good for everyone and it's going to, you know, free people from judgery. And like I certainly hope they're right about that. I don't, you know, I'd love to live in a world where people don't have to do work they don't enjoy doing, but I don't know. I I kind of I think of them more as like obviously they're quite different too across the different companies, but I think of it less as like trying to get rich and more like trying to make a real mark on history is kind of the biggest summary that I would give for a lot of them. Does that resonate with you or or not really? >> Maybe. I I don't want to I can't really speculate on the psychologies of the leaders of these labs, let alone their share like not shareholders so much as uh I guess like you know venture capitalist investors and uh everyone else they've made promises to their clients and so on their their employees. I can't really speculate about that. I will say that like given that the value of AGI is estimated at is it like at least between 1 and 17 quadrillion dollars like I think that seems like that's a lot of money. It's like a pretty big mark on history. I'm not I'm not sure like if it even matters whether they're trying to make a big mark in history or make money, you know, from we can adopt Dennet's intentional stance, right, about the AI companies be like, okay, so uh what does it look like they're doing? Like if we were to conceptualize of them as like a coherent agent trying to do a thing, what is the thing that they would be trying to be doing? And to me, it seems a lot like they're trying to make a bunch of money. But making a mark in history could also be valid. though I would say I guess like in the world where I expect I don't want to use any specific AI lab as an example but I think like in the world where an AI lab is trying very specifically to make their mark in history and not trying to make a bunch of money um I'd expect like it might look it might look identical actually to this world right now >> yeah given the capital requirements I mean that is um >> seems like anthropic has sort of said as much right like in their early days they were less focused on commercialization or even, you know, sort of thought they might try to stay not commercial somehow or less commercial. And now it's just like, well, you can't really do that if you want to compete in this particular game because certainly as long as you believe scaling laws are continue to rule everything around us, then you kind of just have to show that you can bring in resources to attract resources and that is the path to making a mark on history. Again, Ilia maybe stands out as somebody doing something quite different there. >> He's a he's he's a billionaire. He's raising huge amounts of money for his his models. Like maybe maybe uh uh he is going to make more money this way expectation than he would have made staying at OpenAI. It's possible. He's got his own company, you know, now he still retains OpenAI stock, I'm sure. You know, he gets to >> Yeah. And I think that also, by the way, that's not something that I it's funny that he's come up a couple times in my mind just as sort of pattern matching on some of the things you've said, but the idea that somebody's going to go straight to super intelligence and then, you know, drop it on the rest of the world. And I think he's kind of softened on that a little bit. But like that general pattern is is, you know, I think if there's one thing that OpenAI probably did have, right, the iterative deployment idea, you know, and giving people a sense of like where we are and, you know, not keeping the whole thing under a a basket somewhere. I think that was one of the things that, you know, seems like it's aged pretty well in my >> I first heard my earliest uh uh perception of of person advocating for this was Paul Cristiano, you know, his takeoff speeds post pushing back against I I I suppose what he saw is like predominantly the MIR perspective at the time like oh we're going to you know we're going to build the the thing in secret or or or I I I don't I frankly I I don't want to comment I don't know what Mir's objectives were but I I know that they were trying very hard to like not leak any information about their alignment. uh uh research in some areas, other areas they published great papers and so on. But Paul Cristiano at the time uh was like pushing back against like this kind of like he he thought that fast takeoff whoop was would happen if you had a bunch of dry tinder lying around. So if we like uh had like tons and tons and tons of GPUs and then we stopped research for a year and then started again, well you'd expect like a a steeper growth, right? And we're seeing this like in terms of there's like very very fast followers. Uh this is not just like a phenomenon in AI, right? in economies, right? Um it's like Epoch recently did a study where they showed like the pace of like new AI companies approaching the frontier is just so much faster than the pace which the frontier moves because there's an abundance of chips, there's an abundance of data and me methods and and so on. And it's the same with like, you know, catchup economies and so on in the world. Uh so I think that Paul Cristiano was right in the sense that like if society is to cope and adapt to AI then having gradual release and diffusion of technologies is better from that perspective. There's another perspective which says like it's that very gradual release or something that ensures continual VC reinvestment to drive the engine to actually make the progress whereas in the other world actually you just wouldn't build AGI cuz like perhaps in that world like no one can build it without hundreds of several hundred billion dollars maybe you know trillion or something I I don't know I I can't say I certainly think that we're now in the world where it does seem better to have gradual release of models than to have it all kind of hit us at once. >> Well, I always value the opportunity to get perspective from someone like you who is such a connector, you know, and at such a um such a central node with so many mentors and and mentees and you know all the kind of flows of information and talent flows that you are are so close to. But we should probably narrow focus a little bit and talk about what you guys are actually doing at at Matts. Um, and I'll put a maybe a time stamp in the intro so people can also if they want to get right to the the core math stuff, they can zoom ahead and join us. Skipping over some of the higher level stuff. Why don't we do just kind of a quick rundown on like what are the different tracks I think you call them streams of work that are happening in the MATS program today. Maybe like a little waiting or sort of, you know, what you're most excited about. Then we can go into kind of your assessment of the AI safety labor market which I think is really interesting and unique and we'll take it from there. We recently changed up our track descriptions. Uh so we previously had like the standard oversight, control, eval, governance, interpretability, agency, which is sort of a catch-all term for cooperative AI and agent foundations uh and and AI sentience, digital minds research, and and of course security. But we've recently changed that up because we wanted to reflect more uh less the theory of change underpinning those kind of things and more like the type of process and type of individual that works on this, right? So so we now have the tracks on our website. empirical research. So this is AI control, interp skill oversight, eval red teaming, robustness. A lot of this very like hands-on coding heavy iteration focused research, right? We have policy and strategy which is different again, right? That's that's much more focused on uh uh like less on like archive publications potentially more on modeling, more on like adapting technical research into things that are actually actionable policy makers. Theory is another track. So this is a lot of mathematics. It's foundational research on the concepts of agency and how agents interact. Uh it does include some of that agent-based modeling for cooperative AI technical governance which is plenty of stuff like compliance protocols, eval standards, uh how to actually like enforce these kind of things like if you have an off switch, how would you even like make such a thing be be you know viable in a governance framework? uh and then compute infrastructure which is stuff like uh tracking where chips are going, right? Because if you're going to have international compliance uh uh with various types of treaties, you need to know where your chips are and what they're running as well or at least have some zero knowledge proofs that guarantee they're not doing terribly dangerous things. And of course, you know, physical security for if if you're going to prevent if you build super intelligence or AGI or something, presumably you don't want everyone to have access to it and arbitrarily modify it, give it weird goals because that would be bad. Some people say that's good. I say that's we don't let everyone have nukes. You know, why would we let everyone have super intelligence? Seems kind of ridiculous. So, you got to have physical security to prevent that from happening, to prevent diffusion. Um, so yeah, that those are the main tracks now. Uh, we're super excited. I think we have somewhere over like 50 60 research mentors lined up for our summer program, which applications are open right now, and it's going to be the largest program yet, 120 fellows across our Berkeley and London offices. Anything else I should say? >> Yeah, maybe you want to do like the waiting of those like how many I don't know if you would break it down by how many mentors are in each of those uh categories. >> Yeah, current program has something like 27% eval 26% interp 18% oversight control 12% agency 10% governance and about 9% security. I wish I had a figure I could show. I do have a figure but this might be hard to show in the podcast format but as you can see like it's a pretty even mix of things you know we have something like maybe roughly three times as many people doing evals as doing security so there is like some divergence there but we have a pretty broad portfolio and that's just because there's like tons of amazing researchers we really just like pick some of the top researchers in every category >> maybe it's a good time to just name some drop some names >> I could do yeah >> there's a lot and there's a lot of things that people will know. >> Yeah. I mean some of some of the oversight control researchers might be more known because this is one of the things that a lot of the companies big companies are pursuing. So we have people like Buck Schedger at Redwood Research and his whole team are part of that. Ethan Perez uh and and uh Sam Marks and many other people at Anthropic. We have Eric Jenner and David Linder and Victoria Kafa and many at DeepMind. So yeah, just tons of people doing oversight control research for interpretability. Obviously we have Neil Nanda, we have some of the Tamat Tamaus folks like Jesse Hoo and Dan Murph. We have of course the goodfire people like uh Lee Shy longtime mentors some of the people from simplex who are doing some of the interest very interesting stuff like Adam Shay and Paul Rikers and they're they're doing like yeah some of us I would say this some of the more interesting kind of maybe more moonshotty but very promising uh interp research bets that I have on the side as well uh for evals we have people from meter we have people from UKAC we have people from Apollo research Marius Hawan and plenty of others there. Yeah, I could go on. There's there's some amazing researchers there. We also have some harder to categorize research. Joshua Benjio's whole team at Law Zero. Plenty of Yoshua Benjio himself and plenty others are there. And we have some AI censience research as well, digital mind. So Patrick Butwin at LAI, Kylefish at Anthropic. Yeah, it's a very exciting program. >> Yeah, that's quite a who's who. and a couple um past podcast guests and a couple that I took note of as uh maybe need to put out an invitation. It seemed like the balance, if I was kind of categorizing those, right? It seemed like a majority would be in that first empirical category. Um, is that do you think it stays that way or you know your comment on kind of ultimately this is a job for governance tracks some of the honestly kind of the more like me line these days right like I think the mere line today would be like we don't really have time for that much research like we need to just go straight for the global treaty you're not obviously you know quite so confident in that direction but it sounds like you do believe ultimately that you know there is a major role for governments to play and you're starting to move more in this like governance policy direction. Do you see that as like a is that going to be like the biggest growth area reflecting that worldview or how do you expect the balance of these different areas to evolve over time? >> I actually can't necessarily say um well okay I can speculate but I'll say this we have had actually about the same proportion of governance researchers give or take a few percent for the last two years. It hasn't changed by a fraction. So we are quite on track for uh uh you know continuing the same trend potentially. Part of the reason is like we are based in some of the particularly in in Berkeley SF Bay area this is a big technical hub. There are other programs that have have more you know a deep governance focus like we have govi their classic fellowship we have IAPS we have of course randcast this large program run out of rand um and plenty more besides but yeah and of course horizon fellowship for us policy careers. So, so it's it and these have also kind of existed for longer than we were around. So, at a time when we were basically the biggest fish in town, which we are still in many ways in terms of funding and I think in terms of prestige as well for technical safety, we're the biggest uh and best program, but I would say that like for governance, there's always been like a bigger fish and so we've we've never felt that it's necessary to overweight governance beyond what our mentor selection committee indicates. In fact, that's the primary determination of what tracks get selected, right? Is our mentor selection committee, which is somewhere between 20 to 40 top uh researchers, strategists, etc., or leaders that we survey. when everybody applies as a mentor. We decided the mentor level based on the feedback from our mentor selection committee who gets in with some additional caveat that like we also uh uh have some diversity picks uh and and you know minimum requirements because we want to support a great breadth of research and we think that the mentor selection committee on the whole might be biased in some ways as well. So we we try and like really talk to the experts when it comes to picking the agendas. And it so happens that like governance uh uh researchers have historically been like relatively low rated by our committee which contains many governance researchers. I think I would I would go so far as to say that governance research is harder to do well in some critical sense like it's harder to see like what is the actionable thing to do in some ways. Now everyone who has their specific governance agenda I would say like doesn't feel this way for a good reason right they within their within their agenda they have like clear actionable things to work on but I think on the whole there's just like so many more uh uh possible technical directions to pursue that are high leverage in some ways as well I think a lot of the governance stuff is like oh we're trying to build this is this is not talking about advocacy now right this is talking about technical governance we're trying to build technical governance solutions such that if an administration so deems them worth you know uh uh deploying that we have the capacity to do that we actually have the solutions that can be deployed which is very important right but I would say that like don't rule technical research out because it is like especially if we have something like a regulatory market or even even like warning shots to cause the public to wake up and tell Congress to to regulate this stuff right We have to have technical solutions ready to deploy to make these systems safer. And the cheaper we make it, right, to make systems that extra degree safer, right, to lower the alignment tax that companies have to pay to train and deploy their systems to be safer, the more likely they are to do it when they come under pressure either internally, externally, or whatever. So, I think that lowering the alignment tax via technical research is super important still. Also, if this alignment MVP plan is going to work, we have to have a bunch of directions for things to uh uh to be iterated on by these AI assistants or humans calling teams of AI assistants as it's more likely to be. And you actually have this massive interplay between technical research and governance research where things like eval cases built on technical AI safety solutions are things that can actually be tangibly put forward in policy proposals, right? and can can convince policy makers of demos and evals and and model organism honeypot traps, right? Where AI systems deceive the users or whatever. This is what convinces policy makers to make policy and gives them a tangible target for their policy to to to work on, right? So there's a clear flywheel here. So I would say like do not rule out technical research. And there is a reason why Matts has so many more technical mentors and that's just because it seems like on the whole our mentor selection committee thinks that on you know I guess on average a technical portfolio is is worth pursuing. Yeah, that reminds me of what Jake Sullivan said in terms of advice for the AI safety community, which was basically you need to make this stuff as concrete as you possibly can so that people like me have something to really latch on to because as long as it remains sort of a theory or like a possibility or you know whatever, uh it's just really hard to get government to do much stuff you know on that basis. So he was like the the more grounded and concrete all these fears can be made, the more likely you're going to have success in the policy realm. You mentioned advocacy as well briefly there. Um would you [clears throat] ever consider an advocacy track and I guess it might be like advocacy research? You know that I feel like right now we do have groups doing advocacy obviously. I'm not sure how data informed their advocacy, you know, strategies generally are, but I'm always struck by when I do see survey results, it's like yikes. You know, the the public is not like super keen on AI in many ways. Do you think that would ever be something you guys would expand into? >> I mean, you assume we haven't. [laughter] >> Yes, I I haven't seen it, >> but we are a 501c3, so we have to kind of we have to keep our adver advocacy stuff to a minimum. And I think Matt's a lot of Matt's strength is as this kind of impartial player. Like we're trying to be somewhat of a research university tech accelerator kind of vibe. We don't necessarily we don't want to play favorites politically. Like that's not in anyone's interest, right? I think if people are doing that and they're trying to be the thing we are, they're doing a bad job. That said, right, I think like we do we do uh uh currently I believe David Krueger is going to be a mentor in the current program. Um and some of his research is to do that he's he's gonna be discussing is to be is to do with I guess like what sort of messaging and what sort of like standards are actionable right but of course this is I wouldn't say this is like true advocacy this is more mass is supporting independent research working with David Krueger uh you know who has his new or evitable not inevitable evitable which is focused on on some of these advocacy questions. So I think Matt has to be pretty careful you know in terms of you know obviously our 501c3 spending requirements for advocacy we haven't spent anything on advocacy for what it's worth and also like you know ensuring this political neutrality so that our fellows our mentors and all of our strategic partners just can feel assured that we are like you know we're solutionsoriented right we are pushing for a particular outcome right and I think that uh uh AI safety being a political football is just a bad idea and I I applaud advocacy orgs like in code and uh plenty of others like you know perhaps case etc for their efforts to try and you know to to that's not that's not mass as an organization. >> Yeah. Gotcha. Toe in the water at most for now. Let's talk about the profiles. I both watched a a talk of yours and and read a blog post from about 18 months ago where you kind of sketch out the different archetypes of AI researcher that you have seen and then also kind of map that onto the demands of organizations and I don't know how much it's changed in the last 18 months if at all but maybe give us the kind of baseline and then if there's any update I'd love to hear um how things are changing especially you know I have in mind of of course clawed code and it may accelerate certain people it may empower certain people to do things that they couldn't otherwise do but yeah tell us what you know first of all how you organize your thinking about the kinds of people that you're bringing into the program >> so I mean Matt's like I I've talked about a mentor selection committee well we are fundamentally uh I think this massive information processing interface right so we we we consult the very best people as much as we possibly can right and we try to like you know we build our own opinions But we don't rely on them, right? We we try and consult experts every every stage. So the the paper or blog post you're mentioning which is called talent needs of technical safety teams. Uh to construct that we surveyed like 31 different lab leads and hiring managers whoever we could get like the most senior person we could get related to safety at every AIT org we could find that was like hiring at that time. And we asked them what do you need? And we compiled all that that survey uh uh well those interview notes into like three archetypes, right? This was just technical. We've since done this for governance. Expect that to drop soon. So those three archetypes were connectors, iterators, and amplifiers. So we chose the term connector because these people are bridging gaps between theoretical arguments for AI safety and theoretical techniques to makes AI safe and the empirical techniques to actually make it happen. So they're sort of like spawning new empirical paradigms to work on. Okay. No one is hiring these people. It's pretty rare because if you're good at that, then everyone knows your name and you're already hired. Perhaps you're already leading an organization and everyone wants to be an ideas guy, but very few people want to to hire ideas guys. And these people typically, it's people like Buck Schedger, you know, AI control, Paul Cristiano, right? Just huge amount of uh uh uh resources he's produced and so on. Like there's there's you know, you know these people, right? uh and they typically like have AI safety organizations they founded and lead. Then there's iterators, right? Uh and this isn't just this is not just engineering, right? Iterators are active researchers who have strong research tastes who are pushing the frontier, but they typically aren't creating like novel paradigms based on like uh uh theoretical models of things. They're typically advancing empirical AI safety and you can even imagine iterate like like iterators and technical governance agendas as well. So this is the majority of people that that that are are working at Safety today and also the majority of hiring needs in the future. And then there's amplifiers who I think like the closest example is like um like TPM archetypes. I'll say this for iterators like prominent examples include like Ethan Perez, Neil Nanda, Dan Hendris, plenty actually I think Dan Hendrickx maybe crosses some uh crosses some boundaries there but yeah amplifiers uh uh to distinguish them they they like have more focus on on amplifying people and typically you'll find them on large research teams and they're scaling the number of people that can be effectively managed and can contribute to organizations. So a lot of maths research managers would fit this category or TPMs at the various labs. And interestingly, they're actually quite in demand as well, particularly for labs in like the 10 to 30 FTE range. They're the most in demand archetype because it's very hard to hire great people managers who also have the requisite research experience. You're trying to hit two bullse eyes. And there are ways of course like Google has this this sort of model where you have your research managers and your your people managers or project people managers and they're they're somewhat distinct and Matts does try to do this for our mentors and our RMs but yeah that that that I think the need for amplifiers is only going to grow because as you've said things like cloud code and other AI systems are going to erode away the technical minimum technical skills required to contribute and I think also AI agents are going to take more of those things you end up with a situation where your people skills your management ability, your networking, your amplifier skills in general are the more bottlenecking thing on AI safety research. So all those iterators out there, there are job opportunities. You are still the main thing everyone wants to hire. But if you don't try and build up your management capabilities, you don't work at managing AI systems, then you are going to be left in the lurch as the needs of the field shift to amplifiers. So to just try to echo that back to you, the connectors, another name for them might be conceptual visionaries. Like these are the people that define research agendas where they just previously didn't exist like denovo high concept work. They in turn need iterators which sound like essentially machine learning engineers is kind of the core skill set. I >> scientists scientists engineers. Yeah. And they're the ones that are running the experiments day-to-day, building the tooling, writing the code traditionally to like do the visualizations of the of the data and kind of taking this sort of initial conceptual hit that the connector came up with and like really kind of systematically mapping out that space. And then these organizations as they grow then they start to need amplifiers which I maybe would call like leaders you know people that can sort of build up an organization see that people aren't working well together that it can go past the sort of two pizza rule right in in scale is that changing now so you know when we hear things like 90% of code will be written by Claude and that seems like it's you know kind of closer to right than wrong. And certainly like I vibe coded three AI apps for family members for Christmas presents this year, which is something I, you know, would not have come close to being able to build previously even with, you know, just one or two generations of of model [snorts] and go. I do wonder how much the skill set is already changing. Like what are you seeing there? Like what's the what's the up to the minute in terms of how people are thinking about changing hiring needs? I mean up to the minute is that you have to be very proficient at using AI and I think that some of the companies have updated their like their coding interview uh uh processes right to allow for use of AI assistance because on the job like you you have to be using AI all the time right that's just critical to succeeding in this field to be being amplified by AI I would say that goes for every one of these archetypes we've identified um I do think as well that like checking whether AI I output is good or not in critical context is still going to be a very important thing and stitching together different types of AI output and building pipelines to more efficiently process that are also going to be very critical but we might be leaving the leak code era that's I will say this that like amplifiers while not currently the most in demand across all different uh hierarchies of of AI safety organization or team are I think probably in the next year or two going to be the most in demand but that's just that's based on my predictions about AI progress. As you say, it could be slower. You know, there could be jaggedness concerns, you know, that slow down this type of talent transition. But in general, it's never good for your it's never bad for your employee ability to spec out as a manager. Um, managers are very useful and leadership traits in general make you more useful, better employee. It's part of personal growth, I think, to take on some leadership roles. What does supply and demand look like these days in terms of like you know I guess maybe even at the highest level going back to the you know the sort of origin story of Matts my understanding was you said geez this AI safety thing seems like it's going to require a lot of people working on it in a lot of different roles and this is not something that universities teach right so like how do people what's the on-ramp to doing this if they're if somebody would benefit from one like you know where do they No. So, you've essentially created one and of course there's some other others out there too, but you've created one of the the largest and and most highly regarded ones. Are where are we in terms of like are there a lot more jobs out there than Matts can produce fellows? Are there >> Yeah. You know what? How do you think about that? >> I think we've gone back and forth a couple times where at one time it was like, oh, we're super talent constrained and then it was like, well, maybe not so much anymore. Now it's like there aren't actually a lot of roles for people to go into. So I feel like maybe I'm [snorts] wrong on this, but I feel like this has sort of seesawed back and forth and I don't know exactly where we are today. >> So I'll start by saying I didn't found Matts. I didn't co-ound Matts. I was uh in the pilot program as a participant. >> Okay. Yeah. Yeah. Yeah. Yeah. There was like five of us uh who ended up doing the first research program and it was it was a pilot like they didn't have open applications. It was just people nominated from what was the the first AI safety fundamentals course what's now blue dot impact right uh and we did that and I the credit goes to Victor Warlop and Oliver Zang who is COO and co-founder at center for AI safety and I joined the team right after that program and I would say and then Oliver left to co-ound case and uh and Christian uh joined on as well and then Victor left shortly there after and I would say that like I scaled Matts that's my contribution that Yeah. Um and and you know since Christian and I kind of you know refounded it and that we like formed a separate 501c3 a couple years after that because we got too big for our fiscal sponsor. So yeah. So I'll take credit for scaling mats and for being the driving force behind strategy since I guess mid2022 I I believe but okay in regards to talent needs. Yeah. So that's that's a good question. Yeah. Sorry actually tell me the exact wording of the question again. Well, yeah. What's the balance of supply and demand? Because this might not be right. I mean, you can correct me on this too, but I've had this sense at times where people have sort of said there's so much demand for this kind of talent. Like, where is it? You know, we're talent constrained. But then other times, I have heard from people that now people are rushing into the field and there's not actually so many roles available and so people are kind of frustrated. But I don't know where we are right now in that back and forth. >> Yeah. So, okay, I'll say this. AI safety is a field where there are always going to be jobs for the best people. Okay? If you're if you're if you're if you're a cracked coder, you can get a job in AI safety. Like the anthropic alignment science team is growing at 3x per year. Okay? They're trying to scale fast. FAR AI, a nonprofit, 2x per year. Okay? So, mats itself, we've been growing 2x per year over our entire history. So the these these teams are scaling fast. Okay. And many more are getting founded. Open Philill has sorry coefficient giving has huge amounts of of you know grant money to spend on this stuff. Right. There are like a dozen AI safety focused VC firms out there to fund your prof. Like there are incubators like Catalyze Impact, Selden Labs. I I believe Constellation has one now. There's tons of programs like Matts. I think the problem is that the the the level once you have built an organization especially if you're scaling very fast right that has hit hits a certain size the main constraint becomes like is this person good enough to warrant the extra management overhead can they take on some management responsibilities so you have this situation where like at open AI like where people are managing like 10 to 20 individuals maybe up to like I I believe one person anthropic alignment science had 18 reports so they're really flat so you have this real problem where you need to hire people who can quickly ascend the ranks and be research leads be managers and so on even PIs of new teams uh and that is like the limiting constraint okay and that's the reason why a lot of people do some moderate reskilling and then can't get hired because I think the these jobs like there aren't many many opportunities but it's what we find is when we talk to these hiring managers they say We find it extremely hard to hire. We have the money, right? We have the clear need, but people are not at our bar. And that's what that's what Mass is trying to do is to get people up to that bar. And you know that that that's there's some technical skills element to that, right? There's also some like so you know just just like actual research experience. So people who come into MATS with prior research experience do so much better on average than people who have less research experience, right? So I think a strong option for many people should just be stay in academia, get your bachelors, get your PhD. For other people, maybe they should go off and found a company. There's tons of money and directions for AI safety companies. I I think founders are strongly needed in this ecosystem and then you can create opportunities for more people to to get hired. But I'll say as well, another thing NAS is trying to provide is like credibility. uh I wouldn't say like like not formal accredititation but some sense like they have the reference from their mentor who's a senior researcher in the field you have the the the exhaustive matt selection process which is trying very hard to find people who are who are good and and then you have like your proof of actual research impact you produce a paper right that's that's your name perhaps you get it published in a conference or it's an archive and people are talking about it so that's what people need to get employed these days right you need to have an actual like great output some sort of deliverable that you can put shows your name maybe several Right? You have to be like technically good enough at coding or using AI systems whatever is required and you need references from people that are trusted otherwise it's just very hard to get ahead. The same story you see in every you know talent constrained job market is here. How does that translate to experience profile? This is obviously a big question in the broader technology world right like our junior programmers and endangered species. We see like very prominent examples like Neil Nanda and Chris Ola who are broke into the field at a super young age also kind of defined it in a way and are still quite young actually even today and that may lead people to think that like this is like a young person's game but what you're describing is more like >> sounds like more like postphd or sort of you know somebody who's kind of >> been you know grown up in an organization to an extent I'm thinking of um Rajiv from um the AI underwriting company who was at McKenzie for a number of years and now is you know co-ounded this organization but you know comes to it with like a ton of experience and sophistication in terms of management leadership all that kind of stakeholder uh management kind of stuff. What what do you see in terms of like is there a lot of opportunity for people say straight out of college or is that are they kind of barking at the wrong tree if they want to go directly from like undergrad into this space? >> So I mean the median Matt's fellow is 27. Okay, so there's like some like a log normal distribution, long tail. I think the oldest one last coat was like 55, 60. So there aren't people of all ages applying to maths. The youngest person is of course 18 because we can't take minors now. Okay, more statistics, right? 20% of maths fellows are undergrads. They have no bachelor's degree. Okay? Or perhaps perhaps some of them haven't even applied for bachelors, right? They're just cracked engineers. Uh about 15% have PhDs already in the bag. So at least as far as mass is concerned as like this accelerant reskilling retraining internship mentorship program whatever you're getting a broad distribution of people. Now I think that there is obviously huge demand for people with more experience. A second critical thing is that they have uh experience with the latest tools. And because these tools haven't existed very long, young people have a strong chance of, you know, being people who are particularly good at using these tools because they've just been like constantly on the cusp of things. They haven't been sitting in like a cubicle not using cloud code every day. So from to that extent, young people have a huge chance. But it is the case that like you do gain valuable knowledge from working on the job especially in a great team producing great papers that you can't replicate. Um you've pointed to some what I would call prodigies you know Chris Ola Neil Nanda there's plenty of plenty of people of that of that ilk who have come through maths though no one I think actually that's not true there there actually are some people who I I would put in a similar class like Marius Hoban etc. And like in that case, like our main job is just to like just get out get out of their way. And I think that if you if you're that kind of person, don't let anything hold you back. Apply to Matts, apply for grant funding, do whatever, come to the Bay, go to London, and just make it happen. You know, you you will find your path. If if this is like maybe not your path, and if especially if you're like a more a senior researcher or perhaps like like a person who's like, man, I I can't conceive of that. I just want to finish my undergrad degree and do a PhD. that's fine too. People of every single walk have passed through mass, got hired, done other programs, got hired, founded companies, etc. I think like the your the advice it's hard to tailor advice to like you know a myriad of different types of people, but I would just say focus on your technical skills, focus on understanding the frontier of technology. Uh and yeah, don't be don't be limited by the opportunities you see on job boards, right? You can create your own opportunities. You can cold email companies. You can like apply to grant funders with just some random grant proposal you put together because it it fascinates you deeply. You can call up hiring managers and stuff. One of the I mean when you describe the range it is a pretty broad distribution and that tells me that you trust your own ability to discern who's going to be good more than you trust other outside signals. So maybe tell us some like what you are actually using to assess people. Uh and this could be translated into practical advice in terms of like how does somebody make an application stand out but you know what what are you looking for that allows you to take a somebody in their 50s or somebody who's 18 and feel like you know you can read what really matters regardless of what their background is. >> Yeah. So I mean we we do some of the standard stuff that you would see at other tech companies, right? like we have CV review, we have some code signal tests, so brush up on your coding skills and so on. And they they do detect AI use. We are of course considering ways to allow for tests that you know include AI use, but these are obviously harder, right? They're harder to design. They're harder to check and so on. But yeah, that's that's part of our general application. Now, that's for that's for some streams. I'll say this mentor selection sorry scholar selection like we're trying very hard to provide something like a service to mentors. So if a mentor says to us I don't want to do CV review I don't want to do code signal. I just have this selection problem that I want fellows to to to applicants to work on and then like I want you guys to help me evaluate this. Build me like a team of contractors or some automated evaluation process to do first pass screening and then we'll go from there. That's our favorite kind of evaluation in some way because we know that is like as close as possible to the actual job, the actual research as we can get. Of course, this is like typically in in Yolanda's case, it's like go away and do a 10-hour uh uh you know, mechan kind of pseudo work test and then present your results to me. You can use AI, do whatever. Just find something interesting. And this is like this is great because then we get f uh great results. Some other streams, it's harder to do this. it's harder to like administrate and so we do rely on some proxies that are perhaps less specific than ideal but I think are no worse than anyone else in the industry is doing and of course like I think the way you stand out obviously it's going to depend on the specific mentor because Matt is very heterogeneous in that respect right this like the best thing to apply to Neil Nand stream is going to be vastly different than applying to Ethan Perez and that anthropic mega stream but in general you want to like really understand your basics about AI safety. So do a blue dog course, right? Because there may be some critical knowledge or a paper that if you haven't read, you don't understand, if you don't understand what deceptive alignment is, that might be really bad for Ethan Perez or Buck Schedger's kind of control research, but even applying getting to those streams. If you don't if you don't understand that for interp stream, probably doesn't matter as much unless of course you're dealing with deception in your interp thing. So make sure you understand your basics. Make sure that if you're applying to a stream that is empirical heavy that you can do code signal tests. You can code including without AI assistance at least for the time being. And it doesn't hurt to apply to other programs as well. Mass is far from the only program out there now. This is not like early days. Like there are so many great research programs out there. Pivotal, ERA, PIBS, laser labs, SPAR, Arena uh for technical. I think Astra is now running again. Yeah, there's there's tons of great programs out there and that can really booster your CV. If you have experience in the kind of research that you want to do at mass already, then so much the better. Consider it like a posttock opportunity or something or a post research opportunity. Build your own independent projects. Yeah. Sorry if that's like too much advice to be actionable. >> Yeah, I think it boils down to tangible product is is like king, right? I mean, and I I say that always in the AI engineering world as well, if any, you know, and I'm far from the world's leading expert on how to break into that space, but what I always tell people if they ask me is a working demo is kind of the coin of the realm, you know, like it's all people might be interested in what you have to say, but they really want to see that you can make something work. They want to see it online. It it could be a replet or it could even be a collab notebook or something, but like it's got to you've got to make something that can work. And it sounds like this is a pretty similar worldview. Like you've got to show that you can get in there, make something happen, find, as you put it with with Neil's track in particular, like >> find something interesting. If you can do that, you know, we we might have something to talk about. One thing that that jumps out is like maybe not as emphasized as I would have thought is being in command of current research. That's something I think at this point like really nobody can keep up with all the current research cuz you know that exponential has gotten away from all feeble human minds. I would say maybe with a a few uh hyperlexics that can still keep up. But I have found like keeping up with research is a pretty feels important to me. It feels like an important part of like how I stay conversant with people across a lot of different areas. But obviously what I'm doing in trying to be conversant with people across all a lot of different areas is not the same thing as research. How much emphasis do you think mentors in general put on like being you know on top of the literature so to speak? >> It varies. So there are some uh uh some of some of the mentors will ask for questions like I don't know what do you think about X concept others won't be as interested obviously as you say right these costly signals are the most important thing like have you done like good research can you do you have a deliverable like a product do you have a strong reference for an important person that's also key do you have like have you done your homework in terms of the blue dog course and other things right I think that math selection doesn't currently emphasize breadth of knowledge very much part mostly because mentors don't necessarily want that and I think that this is maybe a weakness in our process to some extent if we don't then help people build that breadth but we do we have seminar programs uh we have tons of opportunities for intermingling between different research streams which really rapidly builds a breadth of knowledge and we have like we used to have like discussion groups and and these these still occur occasionally with like workshops and so on so I would say Like I really do encourage everyone to do like a basic blue dot course or or equivalent like AI safety atlas in case of good courses as well. But I think that like this is not as required for selection and is more just like to prevent you from entering mats and then like starting to do a research project and realizing oh crap I have no idea where the gaps are. I don't understand how my work fits into anything. How do I get funding after maths? How do I get a job? Blah blah blah. Like how do I choose a good original research direction? So it's more for like your ability to actually deliver within the program, right? Tracking research and less to do with like your ability to get in at the moment, which is pretty important because maths is just a stepping stone, right? If you do maths and then you don't produce a great deliverable by the end of maths, sure it's a great thing on your resume, but it's not going to be enough in many cases, you know, because it's such a competitive environment. So yeah, I I think that it's really good for people to build a shallow but broad understanding of the literature. So I would recommend like don't be checking X constantly for new papers blah blah blah unless they're in your field. Maybe set up some Google Scholar alerts for interp if that's your thing. But like more every so often do a periodic like deep dive into like what are all the cool updates across different fields. You know what I mean? And you can do this by like every year looking at the new blue dog course or every month looking at some research roundups or highlights like Z is a newsletter or transformer. Uh and there are other people you can follow on X. That's my main recommendation. So um your admissions rate it's super low right I don't want to we want to encourage people to apply but it is a very selective program what is the kind of funnel look like in terms of >> apply I don't know if there's intermediate steps that you know wouldn't make sense to talk about selected and then I think the good news though is if you do get into the program your success rate on the other end of like getting into the field in a professional W2 employee status way is really high. Uh you want to run us through those numbers? >> Yeah. So last program I think we accepted around 5% of people maybe 4% who applied in our initial like intake form. Um there was a subsequent process that they had to complete which is applying to specific mentors and streams and I think we accepted somewhere around 6 and 12 to 7% of those people. So a bit higher that maybe is the figure I'd focus on somewhere around 7% let's say. Now that's better than people think right they hear they thought it was something like a I think the anthropic fellows program for example was like 2% because anthropic big name right the math is larger we have more diversity and more spots and so on and I think that like yeah I think uh people should also just treat the application process as a learning experience in general this is we try and make it like some streams are going to be like more painful than others uh but I think that like for the streams like Neil Nandez where you spend 10 hours working on a project then you have something really cool for your GitHub and that can only help >> and if you don't like doing that you're not going to like working with Neil anyway I would imagine >> yeah definitely definitely the case I I do think it is um it's unfortunate that there aren't like easy ways to do credit assignment cheaply right to find the best people with without them spending a bunch of time but I I don't know like I I know job interviews for for top tech companies like openthropic and GDM just the last like 3 to 6 months or something like you just like you've so many things to do before you get finally the the yes or no so we definitely aren't that involved right it's a much slower process and I think that's because like the commitment is less you know on our end right we're not we're not giving people W2s matz is an independent research program they get grants from a third party we provide the housing the office the mentorship community but we don't um sign people on for like any type of employment uh which I think is part of the appeal as well so I think so that's that's the main statistic there like 7% at the other end about 75% of our accepted fellows go on to our extension phase so our first 3-month program 7% get in 75% go on for another 6 months maybe even 12 months in some cases and that extension phase is where a lot a lot of great follow-up research happens and of the people who've done our program over history we've had 446 fellows in total uh not including people who've done training programs that we've helped facilitate of which there probably another like two 300 of those 446 80% have gone on to get permanent jobs in AI safety based on our latest statistics so that's great I think yeah like I think 98% are employed in some capacity now of those 80% not all are like W2s right some of them are independent researchers with grant funding from coefficient giving or LTF or something which I think is like a fine situation right and then in terms of like the actual field growth there's some statistics I can share So it seems like based on Steven Mikiss's uh less strong investigation that something like the the growth rate of the AI safety field is extra 25% per year which is kind of interesting right uh it does seem to be growing exponentially as far as we can tell now that is uh a lot less than the growth rate of Matt's applications which are going up somewhere between 1.4 4 to 1.7x per year depending upon how you slice it and mentor applications might be increasing around 1.5x per year. So there's a big disconnect and um according to blue dot impact I believe that their growth rate is something like 370% per year in terms of applications to their programs. So there's like yeah some large disconnect. So a lot of people are applying to blue dot that can't go on you know at that rate. That's just that's just way too fast. But probably because they've done amazing advertising and marketing stuff. Matts has only just started to do advertising and marketing. We had the first ever mentor open round of applications uh launched just recently. And yeah, we sponsored Europe's that was cool. And we're we sponsored your podcast and and several other [laughter] uh great great venues as well. And I think that this is only going to cause probably the application trend to to continue. I would say I would guess 1.5x per year something like that which that is a faster growth rate than the current deployed growth rate of the field as to why I could speculate it's probably just caused by like like a high bar a very high bar for a lot of these companies maybe a deficit of founders as well and there are plenty of organizations working to remedy that I know there was this AI assurance technology report from Juniper Ventures about a year ago where they predicted that the size of the market for AI assurance technologies is doubling each year. So there is a lot of opportunity to do stuff that might contribute to AI safety. >> What is the salary distribution look like for the people that are getting jobs? Like how much of an alignment tax uh if any do people pay in the salary department? >> I mean at the Frontier Labs no tax at all. They're getting paid the same rates um if you're waiting. Yeah, they're getting staggering amounts of money. I think like um a couple years ago the the going rate for like someone joining off the street was like 370k or something. I'm sure it's much higher now given all the especially if the the crazy meta stuff that happened. I mean like I would bet like mid mid level and higher people are making over a million on these these even on the safety teams these labs but I don't have any private data on that. But yeah, I mean don't like if you if you join as like a junior software engineer, don't be surprised if you get somewhere around 350K or something. At nonprofits, obviously it's lower, right? They can't compete with equity. They don't have any equity, but they also typically have a lot less funding. You know, coefficient giving pockets aren't as deep as the, you know, the collective might of US venture capitalism. And I think there's also something like there is something like a nonprofit tax. I wouldn't say there's a safety tax. I'll say there's a nonprofit tax, right? Because there are nonprofits that are doing like AI capabilities stuff like Allen Institute for AI people make a lot of money still right there's like this is this is artificial intelligence right and and coicient giving understands and other funders understand that you know you have to pay to play so you have organizations like meter that where they are I I believe offering quite a lot of money for their roles upwards of 300k for for most roles like yeah probably over a million for some I would I would dare today. So, there are nonprofits that are really really trying to compete for for talent. They can't offer anything like the frontier lab salaries, including equity, but they're trying their best. And I think this is kind of reasonable, but it also is it is a bit it is a bit of an an insane moment. Matt's salaries are not anywhere near that high. Maybe we're doing the wrong thing. I don't know. Yeah. And I think there are other AI safety nonprofits that have tried very different strategies like you have um I think the the going rate for FAR AIS research scientists are something like 170K. So like significantly lower. They might have actually improved that recently. So there is a wide spectrum here and it really depends on the compensation policy of the organization. But you will see you know like very wellunded nonprofits offering comparable salaries to at least some junior AI company roles. What about in terms of compute? I know you guys have a in addition to the stipend that folks get as fellows, there's also a compute grant and I believe it's $12,000 worth of compute. I'm interested in what form that takes. Like is it is it just a Brex card that you can go spend on compute wherever you want or do you guys have like established compute partners that you work with that you know serve your fellows well? How often is that enough? Are there times when people find that they need more compute to do what they want to do? And and then if they go work at a nonprofit, like how compute rich or poor are the nonprofits? Yeah. So, I mean, Matt's Matt's offers 12K I'll say budget. I won't say like we don't give people a card that says here's 12K. No. Uh they have to justify their compute spending. You know, they have to like have an actual project and and proposal that they are going to spend that on that necessitates that. Most people don't spend anywhere near that much. Okay, which is good because that's part we haven't you know we budgeted uh as if they could but we we really don't want to spend you just waste money on compute. Basically no one has compute limitations. Sometimes people rarely have needed more more than that. We've like considered their proposal and thought hard and reallocated funds as needed. And yeah I think like people aren't limited by compute at mass in general. I think that the way we do it is pretty good and that we have a bunch of um like you know kind of like we have our our for for our model API calls and all that like we have uh specific organization accounts that we sign people up you know and then we kind of give them a budget and so on and top them up as necessary. We do have our own mass cluster as well that we maintain online. So our comput team handles that typically people tend to use runpod and other types of like self-service things. It depends it depends on the kind of research like there's some there's some types of research that works well on our cluster and there's some types of research where even with the benefit our compute team can provide in setting setting up the cluster and m maintaining it. It like the experiments are so like customized and so they need to like tinker so many things that we just give them a budget and let them use online providers. It's just better that way. Uh we did we did have used other providers in the past but that's that's our current setup and we're looking at putting together like a cloud like kind of customized cloud code kind of suite as well. Hm. Meaning like building out a bunch of tools or MCP type. >> Yeah. >> I don't think not not MCP. If this Well, >> that's kind of distinction. >> Yeah. Yeah, you're right. Yeah, there could be actually a there's a lot of data, you know, in that databases. We could probably put together something really useful at some point, but we haven't we haven't thought about that. >> Yeah, just the archive of everything that's been tried would be um pretty fascinating to do some agentic search through. I mean our new research database is online at mattsprogram.org/ressearch. You can see everything there. We have a Google scholar but that's just all the papers that got published. You also see our less strong blog post under the Matts program tag. But um many more research artifacts have been produced that are visible there. >> You mentioned you said the word tinker. Is the um thinking machines API growing in prominence in terms of what people are finding attractive to use? >> Yeah, many people are wanting to use the thinking machines API. So, we've uh we've put together some I think we uh Yeah, I Yeah, many many I'll say many uh organizations like to donate API credits uh which is awesome because we can really use that. >> Yeah. Cool. But is there anything else that we should cover in terms of like >> January 18th? We know that is important as a uh date to keep in mind. What other um facts should we make sure that we touch on? >> NAS is growing. We're we're always hiring. If you want to if you want to work on our team and help grow the next generation of people, if you fancy yourself an amplifier of sorts, you have people skills and research skills, we'd love to hear from you. Go to our website, massprogram.org/careers. We're taking on mentors as well. There's an application form on our mentor section of our website. And participants, we're going to run three programs next year. This year, rather, not one, but uh sorry, not two, but three. That's going to be a summer program, a fall program, and then a winter program starting into next year. And I'm super excited. We also have plenty of other offerings in the work. We're considering a 1 to twoyear residency program for senior researchers as well. And yeah, more on that to come. Yeah. Cool. Um, are you taking any connectors? Is there if if I am a connector type or want to become one is Matt a way to find my way there or not really >> many have I would call Jesse Hoolland uh one such person I would call Paul Rikers so Tmus and Simplex I would I I'd say Marius Halpan as well to some extent with his deception eval work yeah like and I I don't probably dozens of people I'm just like sharing some of the more the names that come more easily to mind. Just many many people have come through mass this what we're super open to individuals who have this kind of archetype and note a connector right they have empirical skills they have theoretical skills so they could probably succeed in a bunch of different ways right but they're uniquely speced out to connect those two things now there are some mentors and projects that are much more suited to this kind of thing than others people like Richard no historically Evan Hubinger I think I think actually Evan Hubinger has been like probably the most dominant connector driving force at Matt's over over our time, but he's not a mentor in the next program. Unfortunately, he doesn't have time. But yeah, there's there's many different uh uh opportunities that master this kind of thing. I think even in some of the interp streams as well, it's very possible to enter an interpretability stream and bring it with it like some some like model of the kind of theory based interpretability mechanism or strategy that you want to pursue and then see that executed on. That's happened several times. One of the things that I took note of in the blog post from 18 months or so ago was you had made a comment that funders basically don't want to or they're much more inclined to support the growth of organizations that they sort of see as legible that have like research directions that they sort of feel are somewhat established or that they can wrap their heads around. and they're much more reluctant to fund like totally new conceptual directions. And that seems like a it exists in contrast with like the AE studio survey where they basically found that the field as a whole seems to think that like we don't have all the ideas that we need and you know that like more kind of far out ideas should be tried which of course led to their neglected approaches approach. What do you make of that? Is there stuff that we can do or or is it, you know, is there is it a different organization's job to figure out how to fill that gap? Because I do feel like I want some more. And I I love some of the the e studio stuff, including like self other overlap. I always come back to that as an example of something that's just like quite off the map of what most people are doing. I think of, you know, when I think of AI control and like what Buck and the Redwood research team are doing, I find that stuff fascinating. And one of the things that kind of impresses me most is that they are willing to work on something that in some ways is so depressing. You know, they're like, we're going to try to figure out how to work with AI even assuming they're out to get us. And I'm like, yikes. I don't know that I would be able to sustain the positive attitude, you know, enough to do that if I was working from that premise. I I do feel like there's a relative der of things that are more sort of inspiring here. I think maybe of one a studio but also like softmax. Obviously people have a lot of different opinions on like are these things ever going to work or not >> but I wonder what your take is on just kind of the overall mix. It seems like a lot of things are kind of more toward patch the holes, keep the AI down, tempt it, you know, see if it'll take the temptation and then patch it, you know, if it takes the temptation. And there's not nearly as much that is sort of a a more kind of colorful positive vision for the future. And I wish there was, but maybe that's just not happening because the ideas are just too hard to come by. Maybe it's not happening because the funders aren't bold enough. What's your take on how we can get if we should be trying to get more of that stuff? And and if you think we should, like how might we go about it? >> I have many takes here. So obviously I advocate a portfolio uh and Matts Matts has historically sponsored a bunch of project other overlap that project came out of a Matts alum. I'll just say Mark Carino might have messed up his name was like the originator of that project at a studios and I believe Cameron Berg is running their another maths 1.0 alum with me uh is running their uh some of their their more neuroscience inspired approaches as well. So Audos is great. I love what they're doing. I think that the survey they did of West Leong is just like probably not representative of the AI safety research field as on the whole but then again it might be even so I think we obviously need more ideas because more ideas are good right more bets are good more shots on goal are good now I would not advocate a person who is like a very strong iterator to drop that and be try and become think of some new paradigm that is I I think that would be strictly counterproductive on the margin because I think we do have very some very strong central research bets that we need more people pursuing because they will yield demonstrable results. But if everyone did that, this would be bad because you need to have your portfolio. Maybe these approaches fail. Maybe they need other pieces to work. Many AI safety research agendas are kind of contingent on other things going right or other people working on other stuff. It's like any kind of uh uh research field. You need to have everyone advancing the frontier. So I think safety has historically gone really kind of argaxi on different agendas which is bad. Portfolio approach is much better. Don't rule things out as possible directions. Just shift and and reallocate resources to them. To their credit, Coefficient Giving have done an amazing job uh uh particularly recently at supporting a bunch of different novel research bets and they've also funded PIBS or principles of intelligence, a program that is trying very hard to pursue sort of moonshotty interpretability and and like agency understanding projects. So they're great. Check them out. I think that more ideas would be good. I think that the kind of person who should be pursuing that typically is going to look something like um someone who is already a domain expert in some other area. You are occasionally going to have your buckledes, your Evan Hubingers, right, who come along with no PhD but spent years at MIRI, you know, incubating in that kind of that deep AI safety uh uh that rich AI safety experience and then come out with amazing stuff like uh risk from learned optimization and AI control and all that stuff. But short of you know having access to that type of community and that type of research experience I think most of the prominent connectors like your Alex Turners and and so on have come have spent a lot of time in research science PhDs also on course [laughter] uh and like you know incubating in that that environment as well. So I think maths is a great way for that kind of person to develop uh and and to spawn more research ideas. In fact, I've seen just shout out Alex Turner. He has come up with some amazing uh research ideas over his time at Matts and I think we've been very fortunate to support him. Uh things like gradient routing and also Alex Cloud, another mass mentor and and just just plenty of other things like activation engineering and steering. Uh he was he was one of the people involved in that. So I think that like senior experienced researchers are going to be probably like most things the main drivers of new ideas and grant funding that lets them pursue whatever their research taste dictates is great and programs like mats that let let them stalk you know their research agendas are also great. Um I also think bounty programs could work as well but I would a hazard against people putting all their eggs in the basket of we need to have a bunch of ideas because new ideas because the central ideas aren't working. I don't think that's true. I think the central ideas are still our actual best bets. >> Yeah. Okay. Makes sense. Do you want to shout out any other either just organizations that, you know, Matt's fellows have gone to or or even started that you think are kind of underappreciated? This could be sort of assignment editing for me for future episodes. Uh but also just, you know, things that you think people should be paying more attention to than they are. >> Yeah, I mean there's tons like you can see all the uh organizations listed on our website. Uh there's so many amazing people there. Like I guess I could shout out I mean it's it's hard to play favorites because Matt is like we've worked with so many people and we're trying to be very broad but yeah I guess like in terms of nonprofits specifically because maybe they don't get as much attention obviously Redwood and Meter and Randcast Polar Research FAR Goodfire Truthfully Law Zero MIRI um plenty others. I I love these organizations. We need more frankly nonprofit research organizations and if you think you could found one give it a shot though obviously like you need to have a ton of research experience under your belt and very credible references and so on. [sighs] >> Yeah. I don't know. It's it's really hard to play favorites. [laughter] >> Yeah. So many great spoiled for uh good options for organizations to shout out. So it's a testament to how many fellows have already gone on to do impressive work. So great job by you guys in uh driving this and growing it over the last few years and people should definitely apply if they want to be a fellow. January 18th is the deadline. It's time to uh get into it if you want to make sure your application stands out. Anything else we should touch on before we break? >> No, I just really appreciate this experience. Thank you so much for inviting me to to talk. >> My pleasure. Thank you for doing it and uh keep up the great work. Ryan Kidd, co-executive director at Matts. Thank you for being part of the cognitive revolution.

Related conversations

AXRP

28 Mar 2025

Jason Gross on Compact Proofs and Interpretability

This conversation examines technical alignment through Jason Gross on Compact Proofs and Interpretability, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med 0 · avg -1 · 139 segs

AXRP

1 Mar 2025

David Duvenaud on Sabotage Evaluations and the Post-AGI Future

This conversation examines technical alignment through David Duvenaud on Sabotage Evaluations and the Post-AGI Future, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -9 · avg -7 · 21 segs

AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -6 · avg -7 · 120 segs

AXRP

27 Jul 2023

Superalignment with Jan Leike

This conversation examines technical alignment through Superalignment with Jan Leike, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Same shelf or editorial thread

Spectrum + transcript · tap

Slice bands

Spectrum trail (transcript)

Med -10 · avg -7 · 112 segs

Counterbalance on this topic

Ranked with the mirror rule in the methodology: picks sit closer to the opposite side of your score on the same axis (lens alignment preferred). Each card plots you and the pick together.

Mirror pick 1

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -18.16This pick -10.64Δ +7.52
This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -0 · 108 segs

Mirror pick 2

AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -18.16This pick -10.64Δ +7.52
This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -5 · 133 segs

Mirror pick 3

AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Spectrum vs this page

This page -18.16This pick -10.64Δ +7.52
This pageThis pick

Near you on the spectrum — often same shelf or editorial thread, different conversation. Mixed · Technical lens.

Spectrum trail (transcript)

Med 0 · avg -4 · 72 segs