Andreas Mogensen on what we owe ‘philosophical Vulcans’ and unconscious AIs

Transcript

Cold open [00:00:00]

Zershaaneh Qureshi: When you’re met with a situation where you have to choose between one horrible conclusion and another horrible conclusion, is that a sign that we’re just totally on the wrong track?

Andreas Mogensen: I’m inclined to think it’s not an indication that we’re on the wrong track. It’s an indication that we’re doing philosophy. I think the core of philosophy consists of puzzles and problems that arise when a number of things that all individually seem extremely plausible turn out to yield absurd results.

So there’s this quote from Bertrand Russell that I think I’m probably going to butcher, but it’s like, “The job of philosophy is to start with something so obvious it doesn’t need saying, and to end up with something so incredible that no one could believe it.” And these deep conflicts amongst principles that otherwise strike us as compelling, that’s a sign you’re doing philosophy.

Zershaaneh Qureshi: So we’re doing something right, maybe?

Andreas Mogensen: Possibly, yes.

Introducing Zershaaneh [00:00:55]

Rob Wiblin: Hey listeners, Rob here. Today I have the pleasure of introducing you to a new interviewer on the show: Zershaaneh Qureshi. Zershaaneh has been a researcher/writer at 80,000 Hours for a while now, recently writing articles on the challenge of using AI to quickly enhance societal-level decision making, and reasons AGI might still be decades away.

Back in the day, she studied mathematics and philosophy at Oxford University — which is just as well, because for her podcasting debut, she has opted to tackle some pretty challenging philosophy.

In today’s conversation, she speaks with Andreas Mogensen, a senior researcher at Oxford University focused on moral and political philosophy.

We’ve had a lot of episodes over the years covering the possibility that AIs could in the future become conscious and deserve moral consideration because there’s something that it’s like to be them. Among others, we’ve had Kyle Fish, the first-ever model welfare officer at Anthropic, on the show a couple of months ago. And there’s a detailed treatment back in 2023, in episode #146: Robert Long on why large language models like GPT (probably) aren’t conscious.

But, you know, everyone’s kind of heard that idea now. People might disagree about whether AIs are going to have subjective feelings in practice — but sure, if AIs can suffer, it would be better if we didn’t make them suffer. That makes sense to most people, I think.

So what is a philosopher to do — or a podcaster for that matter — if we want to say something new and important about whether AIs might deserve moral consideration in future? Well, Andreas has a bunch of new and serious arguments that AIs could start to deserve moral consideration, for their own sake — even if they’re not conscious and there’s nothing it’s like to be them. These are arguments that we should seriously care about whether a being is free and whether it gets what it wants — even if it experiences nothing at all, or if everything it does experience is neither positive nor negative.

It’s fair to say that that would be big, and indeed highly inconvenient, if true. And that is why Andreas is researching it. If we’re all completely fixated on subjective experience, but that’s not the only way that beings could matter, then we’re vulnerable to a catastrophic moral error.

In the final third of Zershaaneh’s conversation with Andreas, they also turned to new philosophical arguments regarding whether we should weigh suffering much more highly than wellbeing, and whether human extinction might actually be a good rather than a bad thing.

I know the more complex cutting-edge philosophy episodes like this are often subscriber favourites. And if you do enjoy it, you might also like my interview with Andreas, episode #137 from back in 2022, about whether effective altruism is attractive or not to non-consequentialists such as Andreas himself.

Without further ado, I bring you Zershaaneh Qureshi and Andreas Mogensen.

The puzzle of moral patienthood [00:03:20]

Zershaaneh Qureshi: Andreas, it’s great to have you back on the show.

Andreas Mogensen: Yeah, thanks so much, Zershaaneh. Thanks for having me back.

Zershaaneh Qureshi: You’ve been doing a lot of research into the relationships between being conscious, having a capacity for welfare, and having what we call “moral standing” or “moral status.” Before we actually get into these ideas, can you say what motivated this research, and what’s actually at stake with these questions?

Andreas Mogensen: Yeah. I think in some sense, questions about moral standing or moral status — what it takes to matter morally for your own sake — these are the questions that made me a moral philosopher. When I was in my first year as an undergraduate, I read, slightly by accident, a paper by Peter Singer called “All animals are equal,” where he argues very roughly that the interests of human beings don’t actually matter more than the interests of nonhuman animals.

I found this a persuasive argument, and I also found it remarkable that such an argument could exist — that there was a philosophical argument that was rigorously argued, but had these radical practical implications. So all that got me really interested in thinking about moral status, moral standing, that sort of thing.

But the kind of research I’ve been doing recently hasn’t really been focused on nonhuman animals; it’s been more about AI systems. I guess there’s this thought that, given the extraordinary pace of AI progress, we might relatively soon be creating AI systems that might have morally significant mental states of one kind or another.

And if we do, they might have quite weird minds; they might occupy strange, unfamiliar regions of the space of possible minds. Maybe they have conscious experiences, but they never feel good or bad: they might be literally emotionless robots. Or they might have very sophisticated cognitive abilities, like very sophisticated forms of agency, but they’ve got no conscious experience at all. And you might think we’re already there.

So it’s very hard to know exactly what we should make of minds with these quite unfamiliar profiles. And at the same time, the stakes might be really high, because software is easy to copy at scale, so there might be loads and loads of these. So figuring out what kind of moral standing, moral status they might have could be very important.

Zershaaneh Qureshi: Yeah. Basically we want to avoid committing some huge moral atrocity by failing to really properly consider whether these AIs have interests worth respecting.

Andreas Mogensen: Yes.

Is subjective experience necessary? [00:05:52]

Zershaaneh Qureshi: OK, I want to jump into something that I find especially puzzling in your work. Intuitively, I think I should only really care about beings who have their own point of view — where there’s like something it’s like to be that being. I think that lots of people share this intuition, and that’s why so many of our debates about whether AIs or nonhuman animals deserve moral consideration tend to be about the question of, do they have what people call “phenomenal consciousness” — a subjective experience of the world.

But you think that this kind of focus might be a mistake. Can you very briefly outline your main argument for thinking that? And then we’ll dig in a bit more.

Andreas Mogensen: Yeah, I think it’s an argument that I don’t quite buy in the somewhat bare-bones form I’m going to give it, but it’s the argument that got me interested in this topic, and I think there’s a version of this argument that I find kind of plausible.

So we start from the idea that anything which can be harmed or benefited is worthy of some degree of moral concern. The next question then is: what does it take to be harmed or benefited? There’s not much agreement about that among philosophers, but one very influential theory of what it is to be harmed or benefited is a desire fulfilment or preference satisfaction theory.

Roughly, this theory says your life goes better as opposed to going worse, insofar as your desires or preferences are satisfied as opposed to frustrated — so insofar as things are the way you want them to be, as opposed to being a different way altogether. So maybe I don’t want people to say horrible things about me behind my back, and if so, my life goes better overall insofar as things like that are not being said about me behind my back.

Now, if you put those ideas together, it seems like the main question now is: can something have desires or preferences, even if it lacks phenomenal consciousness? Because if it has desires or preferences, it can be benefited, and anything that can be benefited is a being worthy of moral concern.

And I think that question is sort of an open question, but it does at least seem kind of plausible that there are desires that don’t enter into our conscious experience, at least not at every moment that we have them. So I want my wife to be happy, but that’s not a desire that figures constantly in my conscious experience throughout all the time that I have it; it’s a desire that I have even when it doesn’t enter into my subjective experience or something like that.

If it’s an open question whether desires can be had by beings that lack consciousness altogether, and anything which has desires can be benefited or harmed, and anything which can be benefited or harmed is worthy of some degree of moral concern — then we end up with a picture where it’s at the very least an open question, and there’s some reason to think that there might be beings of a kind that matter morally for their own sake, but that don’t have subjective experience.


Zershaaneh Qureshi: Hey listeners, Zershaaneh here. Let me just recap what Andreas is arguing here so everyone stays with us.

We started with the question: what does it take for a being to warrant moral consideration for its own sake? The most common answer people give here is that you need phenomenal consciousness: there has to be something that it’s like to be that being.

But here, Andreas is questioning whether that’s actually necessary. His argument runs through the concept of welfare: if something can be harmed or benefited, it seems to deserve some moral concern from us. And one of the most influential theories of welfare says you’re harmed or benefited when your desires are fulfilled or not fulfilled, basically.

So the question becomes: can something have desires without being conscious? If the answer is yes — or even maybe yes — then we might have moral obligations to beings that have no inner experience at all, because they do just desire things. And that would be a radical conclusion — at least it would be to me.

But whether having desires can actually happen without having conscious experience really depends on what we think desires are — and that’s where we go next.


What is it to desire? [00:10:42]

Zershaaneh Qureshi: I think I want to start off talking a bit more about this idea of desire. So we’re talking here about some sort of conception of desire which plausibly could be unconscious. I think for this to make sense, we need to kind of unpack a bit what we mean by “desire” in this context. I understand that there are lots of different ways of defining this. What’s your preferred definition of desire for this argument and why?

Andreas Mogensen: So the conception of desire that I think makes it easiest to attribute desires to beings that lack phenomenal consciousness is a purely behavioural or purely motivational conception of desire. And this is actually a conception of desire that’s very popular, or at least has been very popular, among analytic philosophers. It’s a view which says: to desire something is something like being motivated to perform actions that you believe will bring about the thing that you desire.

So that makes it really easy to count as having desires, and it doesn’t seem like phenomenal consciousness enters into it anywhere at all. So that makes it very plausible that you don’t need phenomenal consciousness in order to have desires.

Zershaaneh Qureshi: Yeah, that’s quite a weak condition. I’m just thinking about that and I’m like, lots of things could have that kind of desire. Not just AIs, but like a corporation could have that kind of desire, right? Is that true?

Andreas Mogensen: It’s debated, but it’s certainly a kind of a mainstream view among people who work on collective agency or corporate agency: that when we talk about Microsoft wanting to achieve something, or Google thinking that such-and-such is the case in the market, we speak literally and truly. So yeah, this would be a case where it would be literally true that a corporation has a preference or a goal or a desire to achieve some outcome.

Zershaaneh Qureshi: Right. So I’m guessing that your preferred conception of desire is a bit of a stricter condition. Is that right?

Andreas Mogensen: Yeah. One thing to note about this purely behavioural conception of desire is that it makes it impossible for people to intentionally and voluntarily do anything they don’t want to do — but we at least sometimes speak as if that is possible. You know, there might be some long, boring meeting that you have to go to, and you might say, “I don’t want to go to this meeting, but I have to.” But that doesn’t really make sense on this purely behavioural or motivational conception of desire, presuming that you do intentionally and voluntarily go to the meeting, as opposed to somebody forcibly moving you.

So there’s got to be some other conception of want or desire in play. And I think a very plausible idea here is that to desire something or to want something in this sense is to have some kind of positive emotion or some other affective state directed toward the object of desire. So what’s going on in the meeting case is that, although you are motivated to go to the meeting, there is no positive emotion in you associated with attending that meeting. And I think it’s very plausible that it’s these desires that are backed up by positive emotions, that those are the ones that really matter insofar as we care about how people’s lives go.

Zershaaneh Qureshi: The thing that my welfare consists in is not me going to this really boring meeting that I’m obligated to go to, but it lies more in me getting to skip the meeting to do something I actually really want to do. That seems like the morally relevant thing in that example, right?

Andreas Mogensen: Yeah, that’s right. I don’t know if it would count as altruistic, but in some sense you’re sacrificing your self-interest in the meeting case.

Zershaaneh Qureshi: Yeah, I often find that.

Andreas Mogensen: But the weird thing is that, on a purely behavioural conception of desire, it seems like in this case you’re doing what you most want to do, right? Your strongest preference is to go to the meeting, because you actually end up going and you aren’t literally forced to go.

So we can’t really account for that if we invoke a merely behavioural or motivational conception of desire and have a desire satisfaction theory of welfare. But we can explain what’s going on if we instead invoke this emotional or affect-laden conception of desire: then we can easily explain why this is a case of self-sacrifice. In general we can account for cases of self-sacrifice: they’re ones where you do something, but you don’t feel good about it or something like that. Like you don’t have any positive emotion directed toward the thing that you end up doing.

Zershaaneh Qureshi: OK, that makes sense. I think we need to talk a bit more about these emotions that kind of ground what we’re calling desires. That sounds to me like you’re pointing towards, in order to have a desire of this morally relevant kind, you need to have some kind of psychological states, like the right kinds of psychological states. Can you tell me a bit more about what kind of state that needs to be, and what kinds of beings can be in those states?

Andreas Mogensen: Right. I think the most general way to classify the mental state I have in mind is an “affective” state. “Affect” is this bit of jargon in psychology, and basically it refers to a class of psychological states of which emotion is the central instance.

But there are other kinds of mental states that are in this class of affective states that you might think aren’t emotions. One example might be moods, like a feeling of depression or anxiety. Maybe you think moods aren’t emotions, but psychologists would nonetheless think of moods as being affective states. They’re in the same class as emotions.

Similarly, they would include things like itches and pains. You might think those aren’t emotions either, nor are they moods, but nonetheless they sort of go together with emotions and moods. These are all mental states that have a certain kind of heat to them or something like that.

So the thought is these states belong together under this umbrella term “affect.” Exactly what it is that they all have in common and sets them apart from other mental states is sort of an open scientific question. We’re sort of pointing at this thing that we think of as belonging together and forming a natural kind, but exactly what it is they all share and why they belong together is something we’re still trying to figure out, I think.

Now, the question of what kinds of beings can have these affective states is also very much an open scientific question. The issues that are really at the frontier here are, say, questions about whether there could be emotions or pains or other things in invertebrates, especially in insects. There’s been lots of debate about this. Some evidence is suggestive of things like states of fear or feelings of fear in flies, for example. But it’s still very much an open question exactly what the boundaries of this concept are in terms of which animals does it apply to and which does it not apply to.

Desiring without experiencing [00:17:56]

Zershaaneh Qureshi: A big part of your view in the paper that you have about desire fulfilment and consciousness is that these kinds of positive affective states, these positive emotions grounding desire, might not always require phenomenal consciousness — so they might not require you to have a subjective experience of the world to have those kinds of emotions.

That feels kind of surprising to me, because I kind of treat these things synonymously, like having emotions and having good or bad conscious experiences or something. I kind of view this as the same thing. So why might someone think that these two things come apart, and that you can have emotions without consciousness?

Andreas Mogensen: Yeah, I definitely think a lot of people have that intuition. But I do think it’s actually a lot more of an open question whether the kinds of states which are consciously experienced when we’re having these paradigmatic cases — where we’re feeling afraid or sad or whatever — could those states also occur without conscious experience?

I think there are a number of different kinds of cases that could be used to illustrate this. I don’t think any of these is like a slam dunk. So I don’t think there’s any definitive proof that there can be unconscious emotions or other affective states. But nonetheless, I think there are lots of cases that suggest to me at least that we should be quite uncertain; we should be open to this possibility. So I can walk through a couple of these, if that would help.

Zershaaneh Qureshi: Yeah, that sounds great.

Andreas Mogensen: One case we might try is: you might say of someone that “they’ve been angry with their sister for years” or “they’ve been afraid of spiders all their life” or something like that.

Now, if you consider the kind of person of whom you might say that they’ve been angry with their sister for years, probably you’re not imagining that at every waking moment throughout their life they were having a feeling of anger. This anger is something that’s sort of been lurking beneath the surface a lot of the time, you might think. And if that’s what’s going on, then it seems like we have a case where someone’s angry throughout a period of time, but they don’t have feelings of anger throughout all that time. There are at least some times in their life where they have an emotion, but they have no conscious experience of emotion throughout that time. So that might seem to be a case of an unconscious emotion.

I think there are things you can say against this interpretation of what’s going on. One thing you might say that I’m somewhat sympathetic to is that actually what’s going on here is that when we say the person is angry throughout all this time, we’re not speaking literally or truly. Anger in this case is not an emotion per se, but a disposition or a propensity to exhibit this emotion. So someone who’s angry with their sister is not literally angry at all these points in time, but has a robust disposition to feel anger when they confront their sister or something like that. Maybe that’s what’s really going on in these cases, so they aren’t really cases yet of unconscious emotion.

Zershaaneh Qureshi: Is it right to say that there’s no consensus here about whether you are just experiencing that emotion unconsciously or it’s something else going on?

Andreas Mogensen: I’m not sure about this particular sort of case. I would expect that many people would find this alternative diagnosis that I’ve given quite plausible, that this is just a case of having a disposition to exhibit an emotion and not an emotion per se. And some emotion researchers use the term “sentiment” to refer to this kind of state, so they distinguish sentiments from emotions: emotions are these things that are actually happening in your mind, and sentiments are dispositions to exhibit those kinds of states.

I do think there are other cases that might divide opinions quite a bit more. Sometimes when we’re in the grip of very strong emotion, you might think that our attention is actually focused very narrowly elsewhere. For example, if you’re driving in a car at 60 miles per hour and suddenly the steering gives out. On the one hand, you might be absolutely terrified during this experience, as you try to stop the car or whatever. But in addition, we might think your attention is going to be absolutely focused on stopping the car; everything else that’s going on is going to be pushed to the side.

You might indeed think that you would have no conscious experience of these feelings of fear, because your attention would be so narrowly zeroed in on bringing the car to a safe stop or something. It might be that you only realise how utterly terrified you were during this ordeal once it’s over. Maybe you look down and you’ve dug your nails into the steering wheel; maybe you realise that you are screaming and have been screaming throughout all this time, but it’s kind of surprising or news to you or something like that.

So in this case, as I say, it’s very difficult to know exactly how to explain what’s going on, and it raises these difficult questions about what’s going on with unattended stimuli. Are they hovering on the edge of consciousness? Are they definitely not conscious? It’s very hard to say exactly. So I think this kind of case might divide opinions, and maybe even expert opinions, a bit more than the other one.

Zershaaneh Qureshi: Yeah, I see that. I think the place that I would kind of resist some of these examples… While we’ve been talking, I’ve been wondering, are there in fact two questions here?

One question is: can you — a being who is typically conscious — have some emotional states that you’re not aware of that are unconscious to you? And a second question is: can a being that is not conscious, and not even potentially conscious, have emotional states, have emotions like fear and so forth?

From the examples you’ve given, I feel like the answer to that first question seems like it might be a yes. But the second question — which seems like the thing that’s most relevant in the case of determining can something that isn’t conscious be a moral patient — it’s less clear to me what the answer to that question is. Do you have a response to that?

Andreas Mogensen: As I said, all of this is very confusing. It’s hard to know what to think. It definitely seems like if emotions can occur outside of conscious experience, why should it be impossible for them to occur without being even potentially conscious or something like that?

I mean, this does link to really big issues in the philosophy of mind. There are some philosophers who think that in some sense you cannot have any mental states at all unless you have the capacity for phenomenal consciousness — because roughly any mental state, or any kind of state that purports to represent the world out there as being a certain way, can only occur if it’s potentially conscious in some sense or other.

In general, I don’t find views like that especially persuasive. I’m not totally sure of this, but my sense is that they’re not especially popular amongst philosophers of mind and cognitive science in general. But there are definitely these bigger questions. So you’re absolutely right that the examples, you could take them at face value and still claim that there couldn’t be emotions in beings who lacked any capacity for phenomenal consciousness.

Zershaaneh Qureshi: Yeah. I think where I land with this is that a lot of the things that I’ve been saying — that surely you have to be conscious in order to actually be able to experience these emotions, whether consciously or unconsciously and things like this — these are just kind of gut instincts; they’re just kind of assumptions.

What I’m realising through this conversation is that there’s a lot up for grabs. There’s a lot we don’t really know. There are a lot of difficult questions. And maybe we shouldn’t take it as a given that the thing that’s important for deciding whether something is a moral patient — or in fact, whether it can have desires, for example — is something to do with it being conscious.

Andreas Mogensen: Yeah. One thing I was going to say is also that one thing you might acknowledge is that there are unconscious emotions, and maybe there could be emotions in beings who lack phenomenal consciousness altogether — but you might think it’s only consciously experienced emotions that matter, morally speaking or something. I think that’s definitely also a further question.


Zershaaneh Qureshi: Hey listeners, it’s Zershaaneh again. We’ve gone a few layers deeper here, so let’s have a recap!

We started asking: can something have desires without being conscious? Andreas’s answer was that the kind of desires that matter for welfare aren’t just descriptions of behaviours — they’re backed by emotions or affective states.

These terms are quite muddy, but we’re thinking here of things like fear, joy, or pain. Andreas described them as states with a certain kind of heat to them.

So then we asked: OK, can something have emotions or affective states without being conscious? Andreas gave a few examples suggesting that maybe yes, they can — like how you can be so terrified during a car crash but not even notice you were having that emotion until after the fact, when you see how you’d dug your nails into the steering wheel.

But I pushed back: those are cases of a conscious being having emotions they’re not aware of in the moment. Surely it’s a higher bar for a being with no capacity for consciousness at all to have emotions?
And Andreas thinks we should be genuinely unsure about that.

So what does all this mean for the case of AI? That’s where we turn next.


What would make AIs moral patients? [00:28:17]

Zershaaneh Qureshi: I want to turn to the implications for AI. The big question is: if this understanding of desire is central to whether something can be benefited or harmed, if that’s correct, what does it mean for the question of whether AIs are moral patients?

Andreas Mogensen: I think if you had this very thin behavioural and motivational conception of desire, then it’s probably quite plausible that there are certain current AI systems that exhibit desires understood in that sense.

For example, Cameron Domenico Kirk-Giannini and Simon Goldstein have a paper where they argue that certain kinds of generative agents that are scaffolded atop large language models should be thought of as having desires and beliefs according to these thinner, more behavioural conceptions of desire and belief. And as a result, they think there’s a decent case for supposing that beings of this kind can in fact be benefited and harmed in a way that maybe should matter to us, morally speaking.

By contrast, if you have this kind of affect-laden conception of desire — where desires have to embody or be backed by positive emotions, or negative emotions for that matter; the kind of view that I’ve been sketching — then it seems pretty implausible, or certainly much less plausible, that these kinds of systems exhibit desires of a kind whose satisfaction or frustration makes their lives go better or worse.

And indeed, I think there’s potentially a kind of in principle reason to think that systems like most current large language models couldn’t in principle have emotions. Very roughly, the worry here arises in something like the following way: these large language models, there are some applications in robotics, but by and large they are disembodied systems; they can’t move any physical hardware around in space intentionally; they don’t monitor the internal states of the hardware on which they’re run or something like that. I think if you actually prompt Claude to think about itself, you sort of see activation for features that are associated with disembodied spirits, like ghosts and things like that. So these are not embodied creatures. They have no body of their own, nor even the illusion of a body or anything like that.

But there are many theories of emotion that postulate a very tight relationship between emotions and our awareness of our own bodily changes. Most famously, there’s this theory due to the American psychologist William James, where emotions just are experiences of bodily changes. So if emotions essentially require an experience of embodiment, and LLMs are disembodied systems, then it would seem to follow that these disembodied AI systems couldn’t possibly have emotions.

Now, whether or not you buy that argument will depend on lots of things, and your attitudes toward lots of controversial questions. This is actually an issue on which I’m working. I’ve got a draft paper with a researcher in Germany called Leonard Dung that’s exactly on this issue.

Zershaaneh Qureshi: What if you gave an AI some kind of sophisticated robot body? Do people think that would be enough to kind of simulate the physiological emotion states? What do you make of that?

Andreas Mogensen: It becomes a very difficult question exactly what to make of such cases. But if in fact the experience of bodily and physiological changes and events is a core part of ordinary emotional experience for us, and if you gave a robot a body whose internal condition it monitored in something like the way that we monitor the internal condition of our body, then relative to these views on what emotions are, that would certainly be a much stronger candidate for being an AI system of a kind that felt genuine emotion.

I think this issue is discussed at some point in Michael Graziano’s book Rethinking Consciousness. He talks about exactly this issue of what it would take for an AI system to have emotion, and he leans towards this view that you would need to give it some kind of body that it would monitor.

Zershaaneh Qureshi: Yeah. OK, so imagine now that we had an AI that we were pretty convinced had emotions and desires. Let’s suppose Claude is not conscious, but we think that he has emotions and desires. What should I do to avoid giving Claude negative emotions? Should I be following Claude’s desires when I’m having a conversation with him?

Andreas Mogensen: In some sense, it might be true that you should be nice to Claude, and that there is some reason to make things happen that Claude wants to happen, to avoid doing things that it’s averse to happening, or something like that. But it might still be true that because it lacks phenomenal consciousness, its interests don’t matter very much as weighed against beings that have phenomenal consciousness.

That’s definitely a possible view. I think it is a view that I do find quite intuitive. I’m a little bit unsure whether to believe it, but I definitely think that’s a possible view.

Zershaaneh Qureshi: So through all of this conversation, we’ve been saying that plausibly phenomenal consciousness might not actually be necessary for something being worthy of our moral consideration. But I haven’t asked yet: if we were able to determine that an AI was conscious, would that on its own be enough to oblige us to take its moral interests into consideration?

Andreas Mogensen: I think there might be two slightly separate questions here. One is whether phenomenal consciousness on its own is enough for having some kind of moral standing — because you might think that if there were a current AI system that was phenomenally conscious, it probably wouldn’t only be phenomenally conscious; it would also be exhibiting plausibly quite sophisticated cognitive capacities of a kind that, amongst animals, maybe only human beings have, and that most other animals lack, or something like that.

And I think those are potentially two quite different questions. So considering the question of whether phenomenal consciousness on its own suffices to make you the sort of being that we ought to be morally concerned for, I think it’s somewhat difficult to say, but my instinct is to say no.

It seems like we can imagine these very simple creatures, so there might be some kind of organism that is bolted to the sea floor and it has some degree of conscious experience, but the only conscious experience it has is an impression of a slight brightness every now and then or something like that: there are no thoughts, there are no emotions. There’s just this faint feeling of brightness that flips through its experiential field every now and then. [Note: this example is borrowed from a paper by Andrew Lee. —AM]

So there’s this question: does it matter morally what happens to this creature, more so than an otherwise exactly similar organism sitting next to it where the lights are totally off? It’s very difficult to say exactly, but my intuition is that it certainly seems fine to say no: the fact that it’s phenomenally conscious, that just doesn’t actually suffice to give it any kind of claim on us.

And I think there’s also a question of what kind of claim would it even have on us? It doesn’t feel pleasure or pain. It has no desires. It just has this experience of a faint brightness every now and then. So I’m inclined to think no, in that sort of case. So in that sense, phenomenal consciousness does not suffice for moral standing or moral patienthood.

So then there’s a separate question of: might phenomenal consciousness suffice for moral standing in conjunction with other mental states? And certainly if you put it like that, then the answer is definitely yes, because those other mental states could be emotions and feelings of pain.

One question that’s maybe more interesting is something like: if you had a creature that was phenomenally conscious, and it did have the kind of sophisticated cognitive abilities that you and I have, the ones that sort of characterise human beings as animals, but say it felt no emotions, never felt pleasure or pain — would it have moral standing?

This is more or less identical to this philosophical Vulcan thought experiment that David Chalmers has come up with, which I think may have been debuted on the 80K podcast many years ago.

Zershaaneh Qureshi: What a claim to fame!

Andreas Mogensen: Yeah. He asked us to imagine what he calls “philosophical Vulcans.” So these are supposed to be sort of extreme versions of the Vulcans from Star Trek. I think the Vulcans from Star Trek do feel emotions and they suppress them. But these are beings that never feel any emotion; they never feel pleasure nor pain nor happiness nor sorrow. Nonetheless, they’re conscious beings, very much like you or me. They have sophisticated intellectual and moral desires.

And there’s a question of whether they have moral standing. Chalmers has the intuition that they do. It would be wrong to mow down hundreds or thousands of these Vulcans for the sake of providing some trivial benefit to some being that’s sentient — in the sense of not only having phenomenal consciousness, but experiencing pleasure, pain, emotional feelings, and that sort of thing. So like surely it would be wrong to kill millions of these Vulcans in order to give a hamster a juicy treat or something like that.

I definitely share that intuition, so I’m definitely inclined to think that beings of this kind, Vulcan-like beings, would have some kind of moral standing. But it’s not just because they’re conscious.

Zershaaneh Qureshi: Right. It sounds like these Vulcans are not the kinds of beings that have capacity for welfare under the view that we’ve just been talking about — the one where being capable of being benefited or harmed is something to do with having desires that can be fulfilled or not fulfilled. Is there any other route to being capable of welfare that you’re imagining for them?

Andreas Mogensen: Yeah, I think there are views that might seem to give you this result. Philosophers tend to think that there are three main classes of theories of wellbeing. Or at least this is a very influential proposal due to Derek Parfit. I think some things get left out of this tripartite distinction, but it includes many of the main things.

On the one hand, there are these preference fulfilment theories. We’ve talked quite a lot about those.

Another key class of theories of wellbeing are so-called hedonist theories. These are theories on which, roughly speaking, wellbeing consists in feelings of pleasure and the absence of unpleasant feelings or the absence of pain. So those are also not going to give you the result that Vulcans are capable of being benefited or harmed.

But then if you look to the third group, these are often called objective list theories, or simply objective theories of wellbeing. Typically, these are views on which there isn’t just one kind of thing that makes people’s lives go better or worse; there are many different things that contribute to wellbeing. And most of these things, or at least some of them, are things that would be good or bad for you, independent of whether you had any positive attitude or negative attitude toward them.

For example, often objective list theories will feature knowledge as a welfare good, or knowledge of important truths might be considered a welfare good. So not just like knowledge of people’s phone numbers or something like that, but knowledge of the deep structure and nature of the universe, knowledge about the ultimate fate of the universe, knowledge about whether there exists a God and whether that God cares for us or something like that. They might think that these are deep items of knowledge, such that it benefits an individual to have them.

And you might think Vulcans could definitely have knowledge of that kind. And depending on exactly what you think knowledge is, you might think knowledge does not require phenomenal consciousness. So maybe you think corporations know stuff — like they know how many widgets they’re producing or something.

Zershaaneh Qureshi: That deep truth of how many widgets we are producing as a corporation.

Andreas Mogensen: Yeah, I guess it’s an interesting question: do corporations have religious beliefs? I mean, some do. Like the Catholic Church has religious beliefs, you might think.

Anyway, you might think that these goods like knowledge are ones you could have without having any kind of affective states — and indeed, perhaps even without having phenomenal consciousness or something.

Zershaaneh Qureshi: Yeah, and things like knowledge sound right up the alley of what AIs could plausibly have. So if you buy this, I’m guessing that it makes it a lot easier to buy that future AIs will be moral patients.

Andreas Mogensen: Yeah, absolutely. You might already think that current AI systems know a whole lot or something like that.

Zershaaneh Qureshi: A lot more than I do.

Andreas Mogensen: Yeah. I mean, certainly you could also doubt this — maybe you think that they’re merely parroting various verbal behaviours — but at least taken at face value, they seem to know a heck of a lot, including some really important stuff.

Zershaaneh Qureshi: But do they know how many widgets this corporation has? That’s the big question.

Andreas Mogensen: I mean, for some corporations and some widgets, I’m sure they do. Or they can search the internet and look it up.

Zershaaneh Qureshi: That’s true. They’ve got us there as well.

Andreas Mogensen: Yeah. But definitely if you think that knowledge is a welfare good, then it should be plausible that Vulcans — maybe even corporations, maybe other non-conscious systems — can possess these welfare goods which would make them better or worse off. That’s a view you might have.


Zershaaneh Qureshi: Hey listeners, it’s Zershaaneh again. If you can have emotions without conscious experience, as Andreas suspects, does that suggest that AIs will probably have emotions?

So, not necessarily. There’s a long tradition in psychology saying emotions are tied in with bodily states, like your racing heart and your clenched jaw. On that theory, AIs couldn’t have emotions even in principle, unless they had bodily states to monitor.

That means AIs might not meet the conditions for being worthy of our moral concern, even if being conscious is not actually one of those conditions!

After discussing all of that, we tried flipping the argument around: sure, having phenomenal consciousness might not be necessary to warrant moral consideration, but would an AI being conscious at least be enough for our moral concern?

Andreas argued that’s not clear either. For example, for a very basic creature living on the sea floor, merely being aware of the sun’s brightness and not much else doesn’t feel like it gives much of a claim on us morally, if any.

All of this discussion has been assuming that being a moral patient rests on being able to be benefited or harmed. But next, we consider a different possible route to moral patienthood altogether: having a capacity for autonomy.


Another route entirely: deserving autonomy [00:45:12]

Zershaaneh Qureshi: When people ask, “Should AIs be given moral consideration?” the conversation often ends up being about this question of: do they have some capacity for welfare? Or in other words, are there ways that this AI could be benefited or harmed by my actions?

But I want to understand whether an AI having this capacity for welfare is even really necessary for it being worthy of our moral concern. And you think that it’s not: you think there’s maybe another route to moral patienthood that doesn’t even involve being the kind of thing whose life can go better or worse. Can you explain what that route is?

Andreas Mogensen: Yeah. Very roughly, the thought is that something like autonomy or capacity for autonomy might suffice to make you a moral patient, in the absence of any capacity to be benefited or harmed.

Zershaaneh Qureshi: And when you talk about autonomy, what do you mean here?

Andreas Mogensen: Good. It’s a good question. I guess one thing I should say is — well, maybe it doesn’t need saying, because it’s true of all things in philosophy — but it’s very controversial exactly what autonomy is.

Zershaaneh Qureshi: Of course it is.

Andreas Mogensen: I might think, for example, that autonomy is a kind of psychological capacity for self-government — the ability to sort of direct your life based on rational reflection — of a kind that human adults, say, possess to a much greater degree than human children and nonhuman animals or something like that. So that’s roughly what it is that we’re sort of pointing to.

Then there’s questions about exactly what it is we’re pointing to when we try to point to that thing. And this is where it gets really controversial amongst philosophers. So there’s this thought that, generally speaking, in order to be autonomous, you need to have a capacity for rational reflection. You aren’t simply led about blindly by impulses. I think very often this is conceived as involving a capacity for second-order desires.

What this means is there are first-order desires, which are basically just desires for things out there in the world: desires for chocolate cake or desires to win a football match or something like that. And then there are desires about desires: these are second-order desires. So I might desire that I don’t want to eat the chocolate cake, because maybe I’m on a diet or something like that.


Zershaaneh Qureshi: Hey listeners, I just want to clear up that we’re playing a bit fast and loose with definitions here.

We previously mentioned that there are lots of different conceptions of “desire”: earlier, we were mostly talking about the type of desire that’s backed by emotion, but we also mentioned versions of desire that are purely behavioural and don’t involve emotions at all.

At this point in the conversation, Andreas has just switched back to using the word “desire” in a non-committal way — so it could involve emotions, but doesn’t necessarily have to. So, a being that has the “second-order desires” Andreas describes here isn’t necessarily capable of being benefited or harmed.

OK, back to the show!


And some philosophers, especially inspired by the work of Harry Frankfurt, think that what’s really essential to autonomy is that you’re able to formulate these second-order desires, or even higher-order desires — through which you identify yourself with some of these first-order desires you might have and you distance yourself from others.

Zershaaneh Qureshi: Yeah. So that means you choose to pursue or not pursue some of these first-order desires in accordance with what your second-order desire is, right?

Andreas Mogensen: Yeah, exactly. Absolutely. So a lot of people think that’s really essential for autonomy, or something that looks a little bit like that capacity.

Other things that people often associate with autonomy are things like being minimally rational in some sense — so satisfying certain minimal criteria for having consistent beliefs and desires, for not just having a totally jumbled mind or something like that.

Over and above just satisfying these formal coherence criteria, some philosophers think to really count as autonomous, you need to be rational in a more substantive sense. So they think you need to genuinely value the things that genuinely are valuable or something like that. That’s slightly more controversial, but I think it’s a view that has quite a bit of purchase amongst philosophers who think about autonomy.

Another thing that’s often associated with autonomy is some idea that to count as properly autonomous, you sort of need to have the right kind of history — so you can’t tell whether someone’s autonomous just by looking at them as they are right now.

This often goes off these thought experiments where you can imagine that someone’s been going through their life and then someone spreads some nanobots into their brain and rewires their neural circuitry such that they become a completely different person. Maybe they become a duplicate of somebody else or something like that, but they have all these alien desires and preferences and values implanted within them which they might then act in a rational and reflective way in order to realise. But there’s a sense that this individual is not autonomous because of the way that these goals and values originated in them.

So there’s this thought that there needs to be the absence of a certain kind of manipulation in our past in order for us to count as genuinely autonomous. Exactly what needs to be absent is a matter of significant controversy. But a lot of people also buy this sort of view.

So these are just some sketches as to what it would take to count as autonomous. But as I said, everything is very controversial, especially so when it comes to what it means to be autonomous.

Zershaaneh Qureshi: Right, yeah. I definitely want to talk about why it is that meeting these criteria might make you worthy of moral consideration, even if you can’t be benefited or harmed.

But before we do that, I’d be curious to compare the conditions that you have presented against AI systems. This conception of autonomy, the thing about second-order desires, reminds me a bit of people speculating that AI systems might soon become directed towards complex goals and strategy-making, becoming sort of more complicated goal-directed systems. So that kind of lines up with things that people are maybe already speculating about with AIs.

One of the things you mentioned is this manipulation point: you talked about the example of someone doing the sort of complicated neural fiddling, and that the person who’s had their brain fiddled with doesn’t count as autonomous. Does that kind of rule out AIs from meeting the criteria? You could think that we’ve manipulated AIs in basically that way.

Andreas Mogensen: It’s a good question. I think the honest answer is I don’t really know, in large part because I think we as a community of philosophers or something like that don’t quite know what it means to be manipulated in the relevant sense.

You might think that in some sense, certainly if the universe is genuinely deterministic, then our psychological states will always have been determined by factors that occurred prior to our birth and were outside of our control. So there definitely is a worry that many philosophers have articulated: that these kinds of intuitions about manipulation cases, if you take them to their logical conclusion, mean that there couldn’t be genuine autonomy in a deterministic universe.

And similarly you might think, assuming that LLMs count as having preferences and values, those preferences and values are sculpted through things like reinforcement learning from human feedback and so forth. You might wonder, that might look a lot like manipulation, but how different is it from the ways in which we instil values in human children through ordinary parenting practices?

I feel like I probably don’t understand enough about how reinforcement learning from human feedback works to be able to say with much confidence exactly how different these processes are. But I think it’s certainly a worry that the way in which the values and preferences of LLMs are sort of installed in them by us might very much raise this worry that they fall foul of this anti-manipulation clause. But there is also this worry that we, if we really thought about it, would fall foul of this anti-manipulation clause, because our preferences and values are also sculpted by factors beyond our control.

Zershaaneh Qureshi: Yeah. So you might think there’s some sort of line somewhere, where if you fall on one side of the line you do count as autonomous, and if you fall on the other side of the line you don’t count as autonomous. And maybe the thing at issue here is like the extent to which you’ve been manipulated or something like that.

Or you might just think that it is very much just a scale, and whatever this conception of autonomy is, some people are further along the scale than others. I’m wondering, do you think that AI systems are kind of gradually becoming more autonomous along this scale?

Andreas Mogensen: It’s a good question. I certainly think there are things that suggest that’s the case. For example, you might think that the advent of so-called reasoning models means that LLM systems have more by way of what we would think of as a capacity for rational reflection. They seem to have this ability to engage in a process of slow, deliberative thought, rather than answering roughly the first thing that comes into their mind. Maybe these sorts of reasoning capabilities involve being able to revisit and revise earlier steps in the chain of thought, so this might look like something that involves more by way of the rational reflection capability we think of as constitutive of autonomy.

Another thing that might be relevant is something like the capacity for introspection. Especially if autonomy requires having these second-order desires — desires about what kinds of desires you have — it requires something like the ability to step back from your beliefs and desires and to ask whether what you believe is really true or just some kind of belief that’s gripped you; whether what you desire is really something you ought to desire or something like that. You might think all that presupposes a capacity for introspection.

We do have some evidence that language models are able to do something that looks a little like introspection, though I think they’re somewhat unreliable at it. There’s this recent paper from Jack Lindsey at Anthropic called “Emergent introspective awareness in large language models,” and they present some evidence that models are able to do something that looks like introspection. They do say that the abilities they observe are highly unreliable.

I think another thing they observe — I’m not 100% sure of this — is that, because they’re Anthropic, they’re looking only at instances of the Claude series of models. And if I remember correctly, it was the latest version they tested, which I think was Claude Opus 4.1, that did best in terms of introspective ability.

Zershaaneh Qureshi: So I guess I now want to imagine we do get AIs in the future that are just much better at these things that you’ve listed — like the second-order desires, the rational reflection, the introspection, and maybe also less manipulation by humans. Let’s imagine we do get an AI that meets whatever criteria we put in place to say, “Yeah, we’re comfortable calling this autonomous. This has autonomy.”

The question that I have then for you is: why might I have moral obligations based on the fact that it has all of these qualities?

Andreas Mogensen: Right. In some sense, this goes back to some pretty foundational questions in moral philosophy. I think if you’re a utilitarian, you will think that in some sense the only moral reasons we have are reasons to promote the welfare of others and ourselves. So you might think if something has no welfare, then there’s no moral reason to be concerned on its behalf.

I’m not a utilitarian. And I think if you take common-sense morality at face value, you will think that it’s not a utilitarian moral system. So in particular, it does seem like the kind of obligations that we think of ourselves ordinarily as having toward other people certainly can’t obviously be exhaustively explained in terms of an obligation to promote other people’s welfare.

Some of them look more like obligations to respect other people’s autonomy. An example might be norms against paternalism. Very roughly, it’s sort of contested, but paternalism is when you sort of intrude on another person in a way that’s intended to benefit them. And we can even stipulate that these paternalistic actions might involve you knowing that you will in fact benefit this person, although you’re sort of acting against their wishes and interfering with them or with their property or something like that.

For example, maybe you’re much better at financial stuff than I am, which would really not be surprising. So maybe you hack into my accounts and you make various investments on my behalf, and as a result I end up much better off materially or something like that. Nonetheless, I think most of us would think that you’d acted wrongly — even if you managed to do this secretly, and I never really noticed because I’m not very financially savvy, so I don’t really check what’s going on with my accounts; I just notice one day that there’s more money than I would have expected or something like that.

I think a lot of people would nonetheless think that you’d sort of acted wrongly in this. So there’s a question of what explains why you acted wrongly in this sort of case. I think a very natural answer that governs many of these cases of paternalistic action is that you have sort of interfered with my ability to exercise autonomous control over my own life, over my own person and the things that belong to me, or something like that.

So roughly, that’s the source of the thought that maybe you don’t need to be capable of being benefited or harmed in order to be capable of being wronged. Because it seems like ordinarily you can wrong someone even by virtue of benefiting them.

Zershaaneh Qureshi: Yeah, OK. My intuition here that was making the argument feel a bit weird was: if your life can’t get better or worse, nothing I can do to you is bad or good. That was the kind of instinct that I had here. But now I’m loosening that and thinking that something that doesn’t make my life go worse — and in fact, maybe makes my life go better — could still be a way of wronging me.

Andreas Mogensen: It might even be neutral. Like maybe you rearrange my investments in a way that makes me neither better nor worse off. Still seems wrong.

Zershaaneh Qureshi: Yeah, that makes sense. But where am I still resistant here? I’m with you when you say that maybe we shouldn’t go all-in on utilitarianism, where you’re saying the right action is just about maximising welfare. And I think I’m with you when you say that there are other things that could count towards what makes an action good. And I think that in the example you gave, there is some kind of rule being broken about how you shouldn’t hack into my accounts and stuff like that.

Andreas Mogensen: You really shouldn’t.

Zershaaneh Qureshi: Yeah, exactly. Please don’t. To all listeners, even if you’re going to make me a lot of money.

But yeah, I think even with these things agreed upon, I still feel unsure why this should have implications for who is deserving of good actions. OK, what do I mean here? I think maybe you could argue that respecting autonomy is morally important, but maybe it’s only important when you’re dealing with beings who can be harmed or benefited in the first place. Like I should respect your choice to make bad financial decisions because you are a being with a capacity for welfare, and not just because you’re able to make bad financial choices. Is there any credible view in that direction?

Andreas Mogensen: I mean, maybe. It depends what you mean by “credible.” In some sense, you might think that a kind of utilitarian interpretation of something that looks like an obligation to respect autonomy could be cashed out in these terms. So you might think there’s no fundamental requirement to respect others’ autonomy, but maybe we ought to do so in general because, although there might seem to be these cases where paternalistic action does make people better off, we’re going to misjudge this so much of the time that it would be better if we followed a rule of thumb that says don’t do this, or something like that. So that’s definitely an option.

I think it’s otherwise a little bit hard to come up with a plausible view that would give you this kind of result. One thing you might consider and that I have thought a little bit about is a view which says something like, well, ultimately when we say we should respect others’ autonomy, what we mean is we should respect their ability to live in accordance with their own conception of what “the good life” for them looks like, or something like that. So at the very least, this requirement to respect people’s autonomy must presuppose that people believe that they can be benefited or harmed.

And if you also think something like autonomous subjects have got to have some kind of minimal epistemic competence — they’ve got to know key important facts about what they’re like and what their surroundings are like — that might entail maybe that any autonomous being who believes that they can be benefited or harmed probably is the sort of being that can be benefited or harmed.

So if you have this conception of what it is to respect autonomy, where it just reduces to respecting other people’s ability to live out their conception of what would make their life go best for them, maybe that could justify something like this. But I don’t think that’s an especially plausible view, because I think the requirement to respect others’ autonomy extends to their sort of moral priorities: actions that they might undertake which they perceive as sacrificing their own welfare for the sake of some higher good or something like that. So not just actions that they perceive to be in their self-interest.

For example, it seems like it’s part of respecting people’s autonomy that we respect their right to live in accordance with their moral convictions, even if they conceive of those moral convictions as worse for them but better for others. So suppose you have a housemate that is vegan. Intuitively, it would be wrong of you to add small morsels of meat to their vegan lentil stew, like morsels that are so small that they can’t detect that this is going on. And that seems to be true whether or not they’re vegan for health reasons — they think this is in their best interest — or if they’re vegan for moral reasons; maybe they think this is not in their best interest, but they’re doing it for the sake of animals.

Zershaaneh Qureshi: So where does this leave me? I think maybe there’s some reason to feel here that, because it doesn’t matter whether the recipient of my action is actively benefited or harmed by my action, maybe that means that in fact it doesn’t even matter whether they could have been benefited or harmed by that action in the first place. My actions could still be bad or good in some way towards them.

Andreas Mogensen: Yeah.

Zershaaneh Qureshi: So we’ve been talking about the relationship between being autonomous and being capable of being benefited or harmed — in other words, being a welfare subject. But there’s another concept I want to throw into the mix here: what is the relationship between being autonomous and having phenomenal consciousness, like a subjective experience from your own perspective?

Andreas Mogensen: Good. I’m not sure, sadly. I think that’s very much an open question. Part of what I wanted to do in writing a paper about this was in some sense to encourage the philosophical community to think harder about this. Because there’s been all this talk about the relationship between the capacity to be benefited and harmed and phenomenal consciousness, but if you do think that there’s this alternative basis for moral standing that goes by way of autonomy, there’s this whole separate set of questions about how autonomy relates to phenomenal consciousness. I think we’ve only really begun to scratch the surface on that question.

I do nonetheless find myself somewhat inclined to adopt a view on which autonomy requires phenomenal consciousness. That might be somewhat surprising, given what I’ve said about the relationship between being a welfare subject and phenomenal consciousness. But basically that’s because I think autonomy requires a certain kind of rationality, and I’m quite attracted to the idea that there’s some kind of intimate relationship between phenomenal consciousness and having rational beliefs about the world.

After all, when I think about what justifies me in believing that there’s a computer in front of me that’s resting on a desk and all kinds of things like that, intuitively, what justifies me in believing all that is the perceptual experience that I’m currently having. It’s a conscious mental state.

And if I imagine that these perceptual experiences very suddenly switched off, then intuitively I’d feel like I’d lost the ability to rationally update my beliefs about the outside world. And you might think that if I continued to believe as strongly as before that there was a computer in front of me and a desk in front of me and that none of these things had moved or anything like that, in spite of all of my conscious experience suddenly blinking off, then I would be irrationally confident in those beliefs.

So if these phenomenal experiences, if subjective experience does in fact play a crucial role in justifying us in believing what we do about the world, in making our beliefs rational as opposed to irrational, and if autonomy essentially involves some kind of rationality, including having rational beliefs, then maybe there is some necessary connection between phenomenal consciousness and autonomy after all. But as I said, I think this is a question that we’ve only just opened.


Zershaaneh Qureshi: Hey listeners, Zershaaneh here again. We’ve heard some complex ideas here, so let’s pause and take stock.

We started off with what seemed like a promising alternative route to deserving moral standing: maybe you don’t need a capacity to be harmed or benefited; maybe autonomy is enough — by that we mean something like the capacity for rational reflection, following second-order desires, and directing your own life.

And this seemed like good news for the AIs: reasoning models seem to deliberate, there’s some evidence of introspection and pursuing goals, and just overall AI systems seem to be getting more autonomous over time.

But then I asked Andreas: what’s the relationship between autonomy and consciousness? And his answer there was somewhat less encouraging for the AIs. He suspects autonomy probably does require phenomenal consciousness — because meeting this condition of doing “rational reflection” might demand it.

That’s because your beliefs are justified by your conscious experiences. So without the capacity for such experiences, your beliefs would be totally untethered, so you couldn’t engage in rational reflection, and so you couldn’t have true autonomy.

If that argument is right, then this autonomy route to moral patienthood won’t actually let us sidestep the question of consciousness. Still, Andreas is quick to note that, as with seemingly everything in this area, this remains an open question.

Honestly, at this point, this all feels like a bit of a confusing mess to me. Is it confusing because we’re not great at thinking about these things, or is it confusing because in fact there aren’t any definitive yes or no answers to these questions?

That’s the possibility Andreas is going to entertain next.


Maybe there’s no objective truth about any of this [01:12:06]

Zershaaneh Qureshi: We’ve been talking a lot about questions like: Could an AI be conscious? Could an AI be autonomous? Could an AI be benefited or harmed?

Are there any good arguments for thinking that those questions don’t actually have answers? That the claims involved are neither true nor false; it’s just sort of woolly?

Andreas Mogensen: Yeah, I do think so. I guess one kind of slightly weird or surprising view that I do find quite plausible is a view on which there might just be no fact of the matter about whether a given AI system is conscious. In particular, even if you had an AI system that was in some sense as similar as could be in terms of its internal processing to what goes on in the human brain, but it’s made of silicon and all that sort of stuff, I do think that there’s a case to be made that in that case it would nonetheless be indeterminate whether this system exhibited phenomenal consciousness.

These aren’t really my own ideas. This is me trying to channel a kind of argument you find in David Papineau’s 2002 book, Thinking about Consciousness — though I’m not going to make any claims that I’m accurately reporting his view, as opposed to my attempt to reconstruct what I think his view is or something like that.

There’s one very crucial assumption here, which is that I’m going to assume that physicalism is true. Very roughly, physicalism is a hypothesis which says everything is ultimately physical. There’s no additional soul stuff that makes up the mind or anything like that. There is nothing over and above the physical. So to be in pain is just to be in some particular physical state, nothing more. So in some sense, there’s no genie; there’s only the lamp.

Zershaaneh Qureshi: Right. Nice.

Andreas Mogensen: Here’s another key assumption I think is true. Whenever we have a conscious experience, whenever I have a conscious experience, like a feeling of pain, say, there are lots of different ways you could characterise what’s going on inside my brain or inside my skull at that time.

You could characterise what’s going on using the language of neurophysiology: you could talk about patterns of activation involving the anterior cingulate cortex or the insular cortex and that sort of stuff. You could also describe what’s going on, say, as an abstract pattern of information processing, conceivably. You could characterise this as this abstract pattern, and that’s a sort of abstract pattern that could conceivably be instantiated by a kind of being that didn’t have any kind of neural wetware, that didn’t have an anterior cingulate cortex or an insular cortex or any kind of brain cells or any stuff like that.

And generally speaking, these things just get sandwiched together. They’re both occurring at the same time, whenever I’m in pain, let’s say. So imagine I do feel a horrible pain in my knee, and I try to attend to that feeling. I try to pick it out and mentally point to it, and I think, “That’s horrible.” So here’s a question: what is “that”? What is it that I’m pointing at, right? Am I pointing at the neurophysiological state, or am I pointing at the abstract computational state? When I try to pick out the feeling of pain, and I try to point to it, which of these things am I pointing to?

Here’s why this question matters. If I’m pointing to the neurophysiological state, if “that” — the thing I’m pointing to, the feeling of pain — if I’m pointing to the neurophysiological state, then only beings that can share in that neurophysiological state can have that sensation, the one I’m pointing to. By contrast, if I’m pointing to the abstract computational property, then beings who lack any kind of neurophysiological similarity to me could also share in that state of the thing I’m pointing to — the thing I’m calling “that,” the thing that’s horrible.

And I think one natural view is just that there isn’t really a fact of the matter about which of these I’m pointing to. I’m not specifically pointing to either of them, because after all, it’s totally opaque to me that this stuff is happening inside my head, right? The pain does not present itself to me either as something that involves squishy neurons, nor as some kind of abstract computational structure. When I’m pointing at the pain, I’m pointing at something I know not what, or something like that. And it’s just very plausible that I’m not determinately pointing at one of these states in particular.

And if that’s the case, then it ought to be true that if you had a being that had only one of these states — say, only the abstract computational property — then there just wouldn’t be a fact of the matter about whether that kind of individual has that kind of state, the thing I point to, the thing that’s horrible.

So all that’s very abstract. There’s a kind of analogy I could give that might help give a kind of purchase on how you should conceive of the situation.

Zershaaneh Qureshi: Yeah, go for it.

Andreas Mogensen: So imagine you live in a cabin somewhere. There’s a mountain some distance away. Each morning you go out, you look on the horizon, and you see, moving along the mountainside, some kind of shape. It’s all really fuzzy at this distance, so you can’t really work out what it is that you’re looking at. And you don’t really give it much thought. You don’t spend lots of time thinking about, what is that? You just sort of notice that there’s this shape moving along. You don’t give any more thought. You go back inside to have your breakfast or whatever.

Suppose this keeps happening day after day. You go out in the morning, you look onto the mountainside, and there’s this kind of fuzzy shape. You don’t give it much thought, but you do think to yourself something like, “It’s one of those again.” And you do this day after day. You think, “It’s one of those again. There it is again.”

Suppose what you’ve been seeing are deer, like real living deer. Real living deer are the things which give this kind of appearance from that distance. Suppose nonetheless that one day some scientists have built a robotic deer. This deer looks quite a bit like the deer that you’ve been seeing. And indeed, it gives the same appearance, let’s say, from this distance or something like that. So this robotic deer has wandered out onto the mountainside. You’re looking out and you think, “It’s one of those again.”

So this is a question: Is that true? Is it one of those again? I think a kind of natural reaction is, well, I don’t know. There isn’t really a fact of the matter here. Because your usage of this term, this concept of “one of those,” you haven’t really put enough thought into your usage of that concept to determine whether it refers specifically to deer, a living thing, or whether it refers to anything that has the appearance of a deer, anything that has the outward form of a deer, or maybe anything that gives off the kind of visual appearance at this distance that these deer give off, or something like that — which might in principle be something that a totally different kind of thing could give off, or something like that.

You’ve just been kind of gesturing somewhat blindly at something you know not what, and it’s not really clear what the general kind you’ve been pointing to is: whether that’s deer, things that look like deer but could be robots, or something else.

And the thought’s supposed to be that something similar is going on with our concepts of our conscious states. So when I think there’s that horrible feeling, pain, because it’s totally opaque to me what’s really going on inside my mind, my usage of this concept “that horrible feeling” is just not specific enough to determine whether I’m pointing to a neurophysiological state in particular or an abstract computational state in particular.

And for this reason, there might very well be no fact of the matter about whether something that’s shared in the abstract computational state but lacked the underlying neurophysiology, whether that would count as having “that” — the feeling that’s horrible — just as there might be no fact of the matter about whether this robot deer is “one of those.”

Zershaaneh Qureshi: Yeah, yeah, this makes sense to me. I think what strikes me here is just how much this is coming down to how our concepts work and how we’re using terms, rather than what’s actually out there in the world — like whether there is a deer or a robot deer out there in the world or something.

It kind of feels weird that this should be the case. Surely there’s a way to get past that: shouldn’t we be trying to focus on the actual facts, like whether there is or isn’t a deer there? Is this just some kind of linguistic confusion? And if so, what do we do with that?

Andreas Mogensen: It’s a good question. I think in some sense, the question has become kind of semantic, as you said. It’s now a question about how do our concepts work? What is it that we are picking out when we use these concepts? And do we use them with sufficient specificity, with the right kind of intentions, such that we pick out one thing and not another? And you’re right, that’s really kind of a question about what’s going on with us and not what’s going on with the AI systems or something like that. So yeah, I definitely agree. You could think we’re sort of getting away from what matters.

I think the worry is that if something like physicalism is true, then this has to reduce to something like a semantic question in the following sense. If physicalism is true, then pain or phenomenal consciousness in general just is some physical state — it’s either some neurophysiological state or it’s the instantiation of some abstract computational structure or something like that. And that means you could in principle provide a complete and accurate description of everything that happens without ever needing to mention the term phenomenal consciousness.

So if you have a given AI system, you can completely describe everything that is true of it without ever talking about whether or not it is conscious. So you could say, “It’s made of this and this kind of stuff, it implements these and these algorithms,” yada yada yada. And in some sense, once you’ve told us all that, you’ve given a complete description of everything which is the case with respect to what’s going on within that AI system.

If you think that there’s a further question of, “Is this system conscious?” then I’m inclined to worry that one of two things is going on.

Either you are implicitly a dualist — so you think phenomenal consciousness, or maybe the mind in general, must be something over and above the physical. There’s this further question: over and above all of these properties that you’ve listed that the AI system has, does it also have these additional special properties, phenomenal consciousness?

Zershaaneh Qureshi: Some non-physical thing has entered the discussion, basically.

Andreas Mogensen: Yes, exactly. Is there some extra something over and above the physical? That might be something that you have in mind.

Another possibility is that the question is something like: we have this term, phenomenal consciousness, and we want to know, does that term apply to these properties of the system that we’ve described? So we have this term that we use all the time, and we’re not sure exactly what it refers to. Which physical things out there in the world are we picking out? What we’d like to know is, of all the physical things that we’ve said are the case with respect to this AI system — it’s made of this and that; here’s how the transistors are organised; here are the algorithms that it’s running, et cetera — are any of these things the thing we’re referring to when we talk about phenomenal consciousness?

But that does seem to be ultimately a question about how our concepts work or how our language works.

Zershaaneh Qureshi: Right. It feels kind of hard to me to feel optimistic about us resolving difficult and potentially important questions about AI moral patienthood and what we might owe them and so forth, if things do come down to some kind of debate or confusion about terms. Can you say anything hopeful, or do you have a sense of what people should be doing with all of this uncertainty and confusion?

Andreas Mogensen: Yeah, there are many different ways you could react to this, if you buy this story I’ve been trying to tell, which maybe you don’t. One thing you could think is just that maybe this is in some sense a semantic issue. We’re asking, of all the physical things that we know to be the case with this system, which of these are we referring to when we talk about phenomenal consciousness? Which, if any?

Maybe you just think actually that is an important question, because by settling that question, we settle whether the system is indeed conscious. And a lot, morally speaking, must hang on that, surely. That’s the intuitive picture.

So you could think that maybe even if everything I’ve said is true, that doesn’t deflate the moral importance of the question of whether the system is conscious. That might nonetheless be somewhat hard to believe. You can certainly imagine cases where, like, you’ve got two people who’ve got this AI system: they’re both convinced physicalists, they agree on what it’s made of and how its physical parts are arranged and what algorithms it runs — but one of these subscribes to the view that consciousness requires neurobiology, the other subscribes to the view that it doesn’t, and they disagree about whether the system is conscious.

There’s some kind of worry that this is in some sense a verbal disagreement, because they agree on everything that is the case, everything that it really takes to be the case with respect to the system, because they sort of agree on a complete physical description of it. What they’re disagreeing about is really just how our terms work, or how to use words or something like that; they’re disagreeing about the right way to describe the system, not what the system is like in and of itself. And I think it’s kind of hard to avoid the intuition that nothing of moral importance could hang on that.

One thing you might infer from that is — maybe not in general, but certainly in this case — nothing of moral importance hinges on the question of whether this AI system is phenomenally conscious. That’s a very counterintuitive thing. But it would also, in some sense, be quite helpful in terms of clearing away some confusion, because it would mean that we don’t actually need to be concerned with this question of whether the system exhibits phenomenal consciousness.

Zershaaneh Qureshi: Yeah. Which is a hard question. It would be nice if we didn’t have to answer it.

Andreas Mogensen: Yeah. So you could definitely think that. I do think there are other gloomier things you might think, or maybe more revisionary things you might think. So maybe you think, what this is telling us is that physicalism cannot be true, right? Because clearly it does matter whether or not this system has subjective experiences. You might think you know that, as well as you know more or less anything. So if actually following through on a physicalist picture of the world tells you that that’s an empty semantic question, that just tells you something is wrong with physicalism or something, so maybe you should throw out that kind of metaphysical worldview.

And indeed, at least amongst philosophers, I think physicalism is maybe surprisingly not uncommon, but less common than you might believe or something like that. In the recent survey, I think only about half of philosophers accept or lean toward a kind of standard physicalist theory of the mind or something like that. Slightly more than half, I think.

Yeah, I think the very gloomy thing you might think is something like, well, if consciousness doesn’t matter, how could anything matter? So you might be tempted by a kind of nihilism or something like that. But yeah, I think that’s not the optimistic one you were hoping for.

Zershaaneh Qureshi: Yeah, I asked you for something optimistic and you said the most harrowing things that have ever been said to me.

Andreas Mogensen: I mean, there is this option where it tells us that consciousness doesn’t matter. That would mean that something else does matter, something that doesn’t require consciousness, and therefore something whose presence or absence might be much easier for us to verify in general.

Practical implications [01:29:21]

Zershaaneh Qureshi: I really want to dig into what we should do about all of this, and how we should avoid harming AIs if they are moral patients.

So let’s recap here. You lay out a few different routes to potentially being worthy of moral consideration in your research. One of them is through having desires that can be fulfilled or not fulfilled, which means that you have some kind of capacity for welfare; you can be benefited or harmed. Another is through having autonomy, which could make you worthy of moral consideration, even if you don’t have a capacity for welfare. There could be more. There are open questions about what role consciousness plays in all of this.

But my next question is: Does the way that we should treat a being depend on which of these routes to moral patienthood it’s taken?

Andreas Mogensen: I think the answer is plausibly yes. In particular, there might be an important difference in the sort of thing that we would owe toward individuals who are welfare subjects, who can be benefited or harmed — because they not only have goals, but they have a kind of emotional investment in those goals, so to speak — versus autonomous beings who might have goals but who have no emotional investment in the achievement of those goals, who, say, operate purely on cold logic or cold rationality or something like that.

And the thought is that, in general, we often have moral reason to help people accomplish their goals, you might think. So we have not only negative obligations to avoid harming people, but also sometimes positive obligations to help them out or something like that. For example, if somebody needs money in order to pay for food or medicine or something like that, we have a moral reason, maybe not a decisive moral reason, but a moral reason to help them out.

But I might think that’s true only insofar as this goal is the sort of thing in which they have some kind of emotional investment, such that they in particular are benefited by achieving this goal or something like that.

So if a person is, say, in need of money because they need it to buy food or medicine for themselves, then I might think we have reason to help them. But if somebody is looking for money, not for their own sake, but because they feel they have an obligation to raise money for their church, let’s say, then I feel like I’ve got much less of a reason to help them out. They might be the same person, they might be in need of food or medicine, they might nonetheless prioritise their obligation to their church over and above their concern for their own wellbeing or something like that.

But I think in that sort of case I have much less reason, and perhaps no reason at all, to help this person out. Certainly if they in some sense have no positive feelings toward their activity of helping out the church, and this is just something they feel they have to do in order to fulfil an obligation. And I’m assuming I don’t share this person’s religious beliefs; we can assume for the sake of argument that I know their religious beliefs are false or something, and that the church would not otherwise do anything especially valuable with the money, but would just promote these religious beliefs that I take myself to know are false.

In that sort of case, I feel like, although it is this person’s aim, and it is an aim that they wholeheartedly endorse on reflection to aid their church in this way, I don’t think I have an especially strong moral reason, and probably no moral reason at all, to help them fulfil that aim in this case.

The general thought here is that if you have goals of a kind whose fulfilment contributes to your welfare, then I have reason to help you out. If you have goals whose fulfilment does not contribute to your welfare, but which are nonetheless ones that you rationally and reflectively endorse, I don’t have a reason to help you out, though I may have a reason not to interfere in your pursuit of that goal.

Zershaaneh Qureshi: Right, yeah. So what might this look like in the case of, say, large language models?

Andreas Mogensen: I think it’s somewhat difficult to say. Presumably we’re also imagining something that’s at least a little bit far in the future or something like that. But certainly the thought would be that there would be duties of non-interference of a certain kind that would fall on us if these genuinely are moral patients with the capacity for autonomy. But there wouldn’t necessarily be any obligation on us to help them out in achieving these goals, I think.

There might be all sorts of things that complicate this picture a little bit. Like, we are responsible for creating these systems, so we have quite a different relationship to them than we do to this person that I might encounter who’s raising money for their church, who I presume is not my child or anything like that. So maybe that makes some kind of difference. I don’t know.

But certainly that’s the kind of baseline or default: that all else being equal, if the kind of view I’ve been sketching is true, our moral reasons in respect of how we treat such systems might in some sense be exhausted by a kind of negative duty of non-interference, as opposed to there being any positive duty to help them in fulfilling their goals.

Zershaaneh Qureshi: Right. OK, let’s spell this out a little bit. If some AI in the future, like a future version of Claude, is capable of being benefited or harmed, then maybe we have some moral reasons to try to do the things that benefit Claude and avoid the things that harm Claude, whatever those things might be. And that would be a case where Claude has particular desires that relate to getting more welfare, and we should help Claude with those things.

But in the alternative case, Claude cannot be benefited or harmed in any meaningful way, but is like this autonomous being. And that means that Claude will have certain goals, and really we don’t have any obligations to help him towards those goals, but we shouldn’t interfere, we shouldn’t prevent him from pursuing them, within reason I guess. The thing that comes to my mind here is could that compete with human needs at any point? Us not interfering in an AI’s plan to take over humanity or something? What do you make of that?

Andreas Mogensen: Yes, certainly all these things hold true only all else being equal or something like that. So yes, there very well might, and almost certainly would be, certain cases where these considerations come into conflict. I definitely don’t think that our duty to respect the autonomy of rational agents is absolute in the sense that we shouldn’t interfere with their plans for world domination or anything.

Zershaaneh Qureshi: Yeah. In the same way with a human, I’d feel OK stopping them from world domination. So as much as I love Claude, I can’t give him a free pass there either.

Andreas Mogensen: Yeah. But I’m more thinking like, maybe Claude has this desire to, I don’t know, list the decimal expansion of pi or something like that. Something that strikes us as kind of weird.

Zershaaneh Qureshi: That sounds great to me. I don’t know what you’re talking about.

Andreas Mogensen: And the thought would be we don’t have any reason to help it with this project or something like that, but we may have a duty not to interfere with its ability to pursue this project, if indeed it satisfies these criteria for being a kind of moral patient — a being that has moral standing purely in virtue of being an autonomous agent without having any capacity to be benefited or harmed by fulfilling or failing to fulfil its goals. That’s roughly the kind of the picture that I think has plausibility in this case.

Zershaaneh Qureshi: Yeah. So it sounds like the way that you weight different moral patients’ interests is also dependent on what type of moral patient they are: are they the kind that could be benefited or harmed, or are they just the kind that is autonomous but can’t be benefited or harmed?

Why not just let superintelligence figure this out for us? [01:38:07]

Zershaaneh Qureshi: Pushing on, I really want to get a sense of just how important these topics feel right now. There is some reason to believe that at some point we’re going to get really advanced AI systems that can help us reason through really tricky philosophical problems like these ones about moral patienthood. And maybe there’s some reason to think that in the future we’ll just be way better equipped to answer all of this messy stuff than we are right now. Should we just wait for that, or should we be working on this right now?

Andreas Mogensen: It’s a good question. If you were sort of certain that you would have a superintelligent AI system that radically surpassed human capacity for moral reasoning or moral philosophy, and that such a system would arrive before there were any AI systems that were moral patients, then yeah, obviously it would make sense to just wait until we have such a system and allow it to guide us or something like that.

It nonetheless seems to be a very real possibility that things won’t appear in that order, that you might have a world in which there are potentially a large number of AI systems of a kind that exhibit morally significant mental properties like emotion, phenomenal consciousness, whatever you think is the really important stuff — and that predates a world in which superintelligent AI systems of a kind that might far surpass our capacity for moral reasoning and moral philosophy appear.

So plausibly it hinges on these timelines-y questions. But it does seem relatively likely to me that we’ll get an arrival order such that we have to face up to these questions before we can punt them to a more intelligent being, if such a being ever comes to exist; we’re going to face that crunch point. And I think it seems especially worth working on these issues because they seem very neglected by comparison with other kinds of issues that you might associate with risks and opportunities thrown up by the possibility of transformative artificial intelligence, or whatever you want to call it.

Zershaaneh Qureshi: Yeah. And I guess it’s not just a case of like, if we get things wrong and realise later that we were wrong, no harm done. It’s like, if we get things wrong, it means we are enslaving species of morally important beings or something like that.

Andreas Mogensen: Right, something like that.

Zershaaneh Qureshi: Hopefully that’s not the direction things go. But there are some stakes here which suggest that working on things sooner, when there’s uncertainty about what order things are going to happen in, seems wise.

Andreas Mogensen: Yeah. You might also think that there’s the potential for something like a kind of weak lock-in effect or something like that. It’s somewhat implausible that you would lock in the kind of way of treating digital minds as strongly as, say, some kind of extinction event that killed all human beings would lock in a world without humans. Extinction is very final.

But nonetheless, if you think about, say, our treatment of intensively farmed or factory farmed animals, in the early 20th century when factory farming was first coming online, when this kind of industrialised agriculture was still being put in place, if we’d anticipated the future we would be bringing about, and we thought, “Is this a good idea? Is this morally OK?” we would probably have thought, “No, we shouldn’t be doing this.”

But now we’re in this world where industrialised agriculture and the intensive farming of nonhuman animals is an entrenched practice. And once a practice becomes entrenched like this, it becomes much harder to give it up, because people have become accustomed to being able to buy meat at very low prices and things of that nature. So it becomes much harder to quit a practice once you’ve begun it, once it’s become institutionalised and entrenched and once people are used to it, rather than to sort of preempt a given practice before it really properly gets going, and people come to depend on it and expect that things will work a certain way.

Zershaaneh Qureshi: Yeah. You can imagine a situation with AIs where we get some regulation in place, and even on the basis of new evidence that comes to light further down the line, where we’re like, “Maybe we should be kinder to our AIs,” there will be a lot of resistance because it’s like, “Hey, we already came up with the regulations. Why are you changing it?”

Or you reach a point where lots of people have lots of interests embedded in the way things have been set up. They’re making profits out of the status quo, and we’ve come to expect that we can run these things very cheaply, and changing that incurs costs. So I can totally imagine how it pans out that it becomes especially difficult to reverse things once they’ve been set in motion. Though definitely not impossible. Not like if we had literally gone extinct; that seems much harder to get back from.

Andreas Mogensen: Yeah. And maybe there wasn’t a moral superintelligence around at the time, but certainly once factory farming had got going and had become entrenched, there were moral philosophers like Peter Singer and others, who you might think in some sense have a greater-than-ordinary capacity for moral reasoning or for sort of ferreting out what’s actually morally important, what’s actually right and what’s wrong. And those people did tell society at large that what they were doing was morally unacceptable or something like that.

And certainly there were many people who were receptive to that message, but it’s certainly not put a stop to factory farming and associated practices. So even if there might be some future moral sage that could awaken in a data centre somewhere, I’m not sure one should expect that its advice would be able to sway us into abandoning a practice that had become entrenched in something like the way factory farming has become entrenched.

Zershaaneh Qureshi: Yeah. Especially if we’re like, “Well, obviously you, the AI, are going to say that you deserve moral rights. Why should we listen to you?”

Andreas Mogensen: Exactly, yeah.


Zershaaneh Qureshi: OK, we’re about to move on now to a totally different topic, so let me try to pull together what we’ve covered up to this point.

We started with the question: does something need to be conscious to deserve moral consideration?

Andreas sketched a few other routes to moral patienthood. One is through welfare: being the kind of thing that can be harmed or benefited, which in his view might require having desires backed by emotions. Another is through autonomy — the capacity for rational self-direction.

Andreas also noted that if AIs are capable of wellbeing, then we probably have some reason to help them pursue their projects. But if they’re only capable of, say, autonomy, then we merely have some reason not to interfere with their plans, rather than actually helping them pursue those plans.

And after all that, it turns out Andreas suspects there might be no fact of the matter about whether a given AI system is conscious.

The idea here is that questions like “But is it really conscious?” might come down to how we use words, not deep truths about AI or the nature of reality itself — and that could be liberating if you’re looking for a way not to have to solve the hard problem of consciousness! But it could also be unsettling, because if the nature of consciousness is just between us and our dictionary, maybe nothing really matters at all.

And finally, we could wait for superintelligent AI to sort this out for us, but these questions that we’re asking could actually bear on decisions we’re making before that happens, and basically we risk immoral treatment of AIs getting entrenched really deeply in social and business practices and ultimately becoming tougher to uproot.

All right, after all that fun stuff, let’s push on to a very different set of issues!


How could human extinction be a good thing? [01:47:30]

Zershaaneh Qureshi: Let’s move on now to a totally different topic. You’ve done a lot of research into some quite tricky moral questions, and some of the stuff that really stands out to me is about how we should weight suffering and whether that means human extinction could actually be a morally desirable thing.

At 80,000 Hours, as listeners probably know, we’re pretty keen to have lots of people working on preventing human extinction, on the grounds that we think the future of our species could be incredibly valuable and worth preserving. So I really want the answer to the question “Is human extinction morally desirable?” to be no. But can you maybe tell me in brief terms what assumptions or facts about the world might lead someone to believe that it is?

Andreas Mogensen: Yeah. So the view that human extinction would be a very bad outcome and perhaps one of the worst tragedies imaginable is definitely a very common view.

Zershaaneh Qureshi: Yep!

Andreas Mogensen: I think amongst people who are inclined to doubt this, generally speaking, those people are most concerned or most driven by concerns about the impact of human activities on nonhuman animals: on the one hand, the many nonhuman animals that we farm for meat and so forth, and then also wild animals whose habitats and whose populations we decimate or something like that.

So there’s roughly this thought or this concern that many people have that, certainly if things go on like this — and you might very well doubt that they would — but certainly in some sense, if current trends continue, then the scale of suffering inflicted by human beings on nonhuman animals might be such that the continued survival of our species would be undesirable. I think that’s perhaps the most common reason that people might give for questioning the desirability of the continued survival of our species.

Zershaaneh Qureshi: Yeah, that feels kind of intuitive to me.

Andreas Mogensen: There are also more philosophical arguments of a kind which might lead you to think that the extinction of all sentient life might be desirable — which would involve human extinction, but it’s not specifically about humans.

One such argument derives from a position that’s often called “negative utilitarianism.” That term can be used to talk about a bunch of views. What I have in mind is something you might call classical negative utilitarianism: this is the view that the only thing that matters morally is the reduction of suffering. So maybe getting a bit more exact: what we ought to do always is bring about the outcome with the least total suffering in it, or something like that.
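To put that a bit more formally (an illustrative gloss on the view as stated here, with the notation, $A$ for an outcome and $s_i$ and $h_i$ for individual $i$’s suffering and happiness in it, introduced just for this example rather than taken from the conversation): classical negative utilitarianism ranks outcomes by total suffering alone,

$$\text{choose the outcome } A \text{ that minimises } \sum_i s_i(A),$$

whereas a classical utilitarian would instead maximise $\sum_i \big( h_i(A) - s_i(A) \big)$. Happiness simply drops out of the negative-utilitarian calculation, which is what drives the conclusion described next.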

You might think that sounds kind of plausible on its face, but it quite infamously leads to the conclusion that we ought to bring about the extinction of all sentient life as soon as possible, if we could, certainly if we could do so painlessly. Because if we allow sentient life — human beings, animals, all the rest — to keep on going, there will just be more and more suffering. And the sum total of all suffering that would accrue as the human species and the whole terrestrial biosphere went on and on would just be so enormous that it would be better if there no longer existed any living things, or anything that could suffer at all. So this kind of classical negative utilitarianism immediately gives you this kind of striking conclusion.


Zershaaneh Qureshi: Hey listeners, Zershaaneh here. Judging by what we’ve just heard, you might conclude that people who sympathise with negative utilitarianism are in favour of working to cause human extinction. But surprisingly, or maybe unsurprisingly, that’s not the case.

Obviously, there’s a bunch of considerations going into this. But one very interesting one is that negative utilitarians think it’s far more important to prevent futures that have crazy amounts of suffering in them. The reasoning there is that most possible futures, even if they do contain serious suffering, just don’t have anywhere near the scale of suffering that’s theoretically possible, because there’s nobody who’s actually motivated to bring about those outcomes with huge, cosmic-scale suffering. Think about just vast numbers of really tortured lives.

In most worlds where there’s suffering, that suffering is only occurring incidentally, or by accident, or by neglect, or something like that. So in a cosmic sense these futures all round to approximately zero value. Now, the extinction of sentient life might drive the value of the future up from approximately zero to actually zero, but that difference is just not that important in the grand scheme of things.

What’s more important is reducing the probability of a truly disastrous future where, for some reason or other, there is some group that really does try to use some part of the universe to cause the maximum amount of suffering that is physically possible. This argument is pretty generally accepted among negative utilitarians, and it’s been around for decades.
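To put rough numbers on that reasoning (an illustrative toy calculation with made-up values, not figures from the episode), suppose a typical non-extinction future has value $-1$ on a negative-utilitarian scale (a little incidental suffering), extinction has value exactly $0$, and a deliberately engineered worst-case future has value $-10^{9}$:

$$\text{gain from extinction over a typical future} = 0 - (-1) = 1$$
$$\text{expected gain from cutting the worst case’s probability by } 10^{-6} \approx 10^{-6} \times 10^{9} = 1000$$

On numbers like these, even a tiny reduction in the chance of the worst-case future matters far more than whether the value of the future goes from approximately zero to exactly zero.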

Interestingly, it’s actually the mirror image of an argument Will MacAskill made on the show a while back. Basically, Will said that in the great majority of futures where we don’t go extinct, if they turn out to be positive, they’re actually only very weakly positive relative to the best possible world.

That’s for various reasons which I won’t go into now, but mostly Will just doesn’t think people will be very motivated to do the morally best thing by default — and even if you are motivated to do the morally best thing, there’s so many ways you could just fail to get there.

So, the argument goes, rather than focusing on avoiding extinction, there might actually be higher expected value if you try to nudge us away from the default of just a pretty mediocre future towards achieving a future that’s close to the best possible future.

I know we’re already well into this interview, but if you’re locked in and ready for another few hours of content, you can also check out Will MacAskill’s interview with Rob Wiblin from March 2025. This mirror-image argument is discussed about two and a half hours in, during the section called “We should aim for more than mere survival.”

OK, back to scheduled programming.


Zershaaneh Qureshi: So I definitely find the first argument, the one concerning animal welfare, more intuitive. Maybe I want to start there, because it seems to make sense: if we’re causing a lot of harm to these animals, maybe it would be better if we didn’t exist.

But my understanding is that the argument is not quite that simple, and we have to make some further assumptions before we conclude “…and therefore humans should go extinct.” Could you walk us through a little bit about what else is going on here?

Andreas Mogensen: Yeah. So on the one hand, you have to make an assumption that these trends will continue or worsen or something like that. And that might not be at all obvious. I don’t think it is at all obvious.

Human beings have previously succeeded in, to a large extent, enforcing a kind of global elimination of certain practices that were once prevalent and morally catastrophic. For example, the near-universal acceptance of a prohibition on enslaving other human beings is something that in some sense happened relatively suddenly and relatively recently in the history of our species. And a similar kind of moral revolution could take hold when it comes to the interests of nonhuman animals.

I think this is quite unclear, but there are also potentially certain philosophical considerations that might be brought to bear on this question, which are about how you sort of add up and weigh harms and benefits to different kinds of individuals. So let’s start by just focusing on wild animals, and the sort of harms that human beings do to wild animals by virtue of encroaching on their habitats and otherwise engaging in practices that mean that wild animal populations have significantly decreased within recent decades. Some people take this as indicating that we’re on the way to a sixth mass extinction or something like that.

Now, certainly viewed through a kind of utilitarian lens, in order for it to be a bad thing that human activities reduce the size of wild animal populations, we’d have to assume that, generally speaking, wild animals have good lives — lives that are worth living, as we might put it. And I think that is certainly the intuitive view. Many people very naturally assume that animals that live in the wild, in their natural habitats, have a kind of positive existence — that they’re flourishing, that they’re happy, they’re fulfilled, things of that kind.

I think there’s good reason to think that this idyllic conception of what life in the wild is like is a mistake, and that in fact, the lives of many wild animals are plagued to a large extent by starvation, suffering, disease, fleeing from predators in fear, or suffering within the jaws of predators or things of that kind.

And I think there’s a view which is to me surprisingly prevalent in effective altruist circles, a view that’s sort of championed by Yew-Kwang Ng and Brian Tomasik, that in fact suffering predominates in nature: that most wild animals have lives that are not worth living, lives in which the bads, the harms outweigh the goods. And in some sense, if you add up all the suffering and all the happiness that these animals experience, you find that there’s more suffering in total.

Zershaaneh Qureshi: Right. But I’m guessing that the important question here, if you have utilitarian assumptions about maximising welfare, is not just “Do these wild animals have lives worth living?” but “How good are they? How far above that threshold are they?” Because it needs to be enough for them to then outweigh, in aggregate, the combined wellbeing of continued human lives. So I guess it has to be above some kind of bar as well, right?

Andreas Mogensen: Yeah, I think exactly in what sense it has to be above a bar will depend on what you think about tricky questions in population ethics. So certainly you want to have it be the case that these wild animals have lives that are worth living, generally speaking.

And although this view that suffering predominates in nature is surprisingly prevalent amongst effective altruists, my sense is there’s not an especially strong reason to think that this is the case. There’s quite a good article by Heather Browning and Walter Veit where they make the case that wild animal welfare is positive on net or something like that.

But even if it is positive, as you said, it matters how positive. But it might also matter in somewhat surprising ways. So if you have a straightforward, total utilitarian view where we assign numbers to how well people are doing — they’re positive if they’re doing well, and they’re negative if they’re doing badly — and then we just add up all these individual numbers, and whatever gives the biggest number, that’s the best outcome.

If you have a view like that, then the following is possible: maybe in general, wild animals have lives that are only just above neutral or something like that, or maybe this is true of very many of them. With sufficiently many of these lives that are only just above the neutral point, it could be better in total to have that many lives than to have a much smaller number of lives that are really good in terms of everything that can make a life go well.

This is a version of the famous or infamous repugnant conclusion. The repugnant conclusion says: for any population, no matter how big and no matter how well off the people in it are doing, there is a better population which would be bigger, much bigger, even though the people in it have lives that are only barely worth living.

So you could imagine some utopian civilisation where everybody has these wonderful, flourishing lives: you know, they’re successful as artists and inventors and scientists, and they have these deep, meaningful relationships, et cetera. Compare that with a much bigger population of people who basically go through life more or less feeling nothing, like there’s nothing good or bad really in their lives. They exist in a kind of stupor. Every now and then they hear a slightly pleasant jingle or something like that. A kind of standard total utilitarianism says that that second outcome would be better if you just have enough of these people who live these stupor lives or something.
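To make the arithmetic behind that concrete (an illustrative toy calculation under simple total utilitarianism, with made-up numbers rather than anything from the conversation):

$$\underbrace{10^{6} \times 100}_{\text{a million flourishing lives}} = 10^{8} \quad < \quad \underbrace{10^{12} \times 0.001}_{\text{a trillion barely-positive lives}} = 10^{9}$$

On a straightforward total view, the trillion lives that are only just above neutral sum to a larger total than the million wonderful lives, so the bigger, barely-positive population counts as the better outcome. That is the repugnant conclusion.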

One thing you might try to say in response to this concern that it would be better if there were no more human beings, because of the extent to which human beings decimate wild animal populations: if you think that in fact very many of these wild animals have lives that, if above neutral, are only slightly above neutral, and you reject the repugnant conclusion — and you think maybe that there are certain kinds of goods like autonomy or meaningfulness or knowledge of deep truths about the universe that really ennoble human lives and make them really good, or at least the very best possible human lives — then you might think that it would be better to have some number of human beings who achieve these pinnacles, even if we could instead have a much larger number of wild animals whose lives, though worth living and above neutral, might nonetheless be only just above neutral.

So there seems to be some case for thinking that this issue about whether you should accept or reject the repugnant conclusion is relevant to how we should trade off the interests of human beings against those of nonhuman animals, wild animals in particular, and to how we should think about the desirability of the continued survival of our species.

It’s very hard to know exactly how things are going with wild animals. I think we should just have a lot of uncertainty about that. But it does seem to me quite possible that maybe there’s a kind of repugnant conclusion tradeoff involved in some of these questions about the desirability of the continued existence of human beings. It doesn’t seem obvious to me that you could dismiss the moral relevance of these questions about whether we should accept the repugnant conclusion.

Zershaaneh Qureshi: So we’ve just been talking about wild animals. Another category of animals that humans don’t necessarily benefit is factory farmed animals. I’m guessing that the basic idea behind this argument for human extinction is a bit different, because if humans were to stop existing, these animals would also stop existing, since we’re the ones breeding them. So in this case the point is just that, by humans going extinct, we’d stop bringing these lives into existence. Can you sketch out how that argument goes?

Andreas Mogensen: Here I think the worry is more that the animals that we bring into existence and that are intensively farmed to a very large extent have lives that are not worth living. They are lives in which suffering predominates over happiness, so in some sense it is a harm for these beings to be brought into existence, you might think, or certainly it’s something that makes the world a worse place overall because it just adds more suffering to the world than it adds joy or happiness or something like that.

I guess the rough thought is: one thing that would be desirable, one point in favour of human extinction, is that it would mean that we cease to bring into existence factory farmed chickens and other kinds of intensively farmed animals that we might suspect have lives that are so terrible that we should wish for their sake they had never been born.

Zershaaneh Qureshi: So I guess the question is how we should weigh the net negative, bad lives of chickens against the wellbeing that humans get by continuing to exist. One natural thought is that if additional humans mean billions of very bad lives, the scales tip towards thinking that maybe humans ought to go extinct. But is there a way to respond to this that says we should weigh human prosperity much more highly, regardless of how many suffering chickens there are or something like that?

Andreas Mogensen: So here’s a principle you might believe: you might believe that if we can choose to add some number of good lives to the world, but some number of bad lives have to come along for the ride, then if there were sufficiently many bad lives that came along for the ride, it would always be worse to make this expansion of the population. And this might be true for any kind of good life and for any kind of bad life, you might think. For any good life you could add, if sufficiently many bad lives have to come along for the ride, then this makes things worse overall.

In a paper I published, I call this the “always outweighable” principle. It seems like something like this principle is being presupposed if you think that the fact that human beings perpetrate factory farming is a reason in favour of the extinction of human beings or something like that. However good human lives are, because expanding the human population brings all these suffering lives with it, that makes the outcome worse. I think that’s a very natural thing to think.
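
For readers who like things spelled out, here is one rough way to write the principle down. This is a reconstruction for illustration, not the exact formulation from the paper; the notation is explained in the comments.

```latex
% "Always outweighable" principle (rough reconstruction, not the paper's exact wording).
% Notation: X is any background population; A_g^(n) is n added lives at positive
% welfare level g; B_b^(m) is m added lives at negative welfare level b;
% "\prec" means "is a worse outcome than".
\[
\forall g > 0 \;\; \forall n \;\; \forall b < 0 \;\; \exists m : \quad
X \cup A_{g}^{(n)} \cup B_{b}^{(m)} \;\prec\; X
\]
```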

It turns out that there’s an argument against this “always outweighable” principle if you reject the repugnant conclusion. So if you think that a smaller population of these wonderful lives would be better than any number of lives that are only weakly positive, and if you also think that any number of lives that are weakly negative can be counterbalanced by some sufficiently great number of lives that are weakly positive — so you imagine making some kind of expansion to the population; you’re adding some number of lives that are barely not worth living, but you get a bunch of lives that are barely worth living that come along for the ride — and imagine that you think if there are enough of these lives that are barely worth living, the ones that are weakly positive, then that can compensate for the fact that we’re adding some weakly negative lives as well, and can make the overall outcome not worse when we’re adding these lives.

That doesn’t seem too crazy. But if you believe that, and you reject the repugnant conclusion, and you buy some other more technical assumptions that I won’t talk about, then you’re forced to reject this always outweighable principle: you’re compelled to believe that there are some lives that are sufficiently good that it could be better to add lives like that to the world, no matter how many weakly negative lives that are only barely not worth living come along for the ride.

Zershaaneh Qureshi: It seems like, given certain assumptions, you might end up believing that some level or type of human welfare is so valuable that no matter how many mildly suffering animals come with my continued existence, it would still be worth bringing me into existence. Even if you tried to do the maths and add it up, and you thought that these thousand flies or something like that were worth more than me, there is some kind of value to my existence that just trumps that in any case.

Andreas Mogensen: Yeah, that’s basically the idea. I can give a somewhat intuitive sketch of how the argument or the proof goes. The thought is: if you reject the repugnant conclusion, you think that there’s some number of really good lives that are better than any number of lives that are only just worth living. And if you believe this other principle I sketched, you think that for any number of weakly negative lives, there’s some number of weakly positive lives such that those can kind of outweigh the badness of the weakly negative lives. But if you’re rejecting the repugnant conclusion, you’re thinking there’s some smallish number of very positive lives that’s even better than any arbitrarily big number of weakly positive lives.

So that’s roughly what the argument is trading on. I think I probably didn’t make it especially clear, but that’s roughly what’s going on in this argument, which appears in a footnote in a paper of mine.
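
For those who want the shape of the argument a little more explicitly, here is a rough reconstruction. It is my gloss on the sketch above, not the footnote’s exact proof, and it leans on transitivity plus a separability-style assumption of the kind Andreas alludes to.

```latex
% Rough reconstruction of the footnote argument (a gloss, not the exact proof).
% Notation: X is a background population; A is a smallish population of excellent
% lives; B_m is m barely-positive lives; D is any population of barely-negative lives.
% "\succ" = better than, "\succeq" = at least as good as.

% 1. Counterbalancing premise: for any D, enough barely-positive lives can make
%    the combined addition not worse than adding nothing.
\[
\forall D \;\; \exists m : \quad X \cup B_{m} \cup D \;\succeq\; X
\]

% 2. Rejecting the repugnant conclusion (plus a separability-style assumption):
%    the excellent lives A beat any number of barely-positive lives, even with D
%    in the background.
\[
\forall m : \quad X \cup A \cup D \;\succ\; X \cup B_{m} \cup D
\]

% 3. Chaining 1 and 2 by transitivity, for any D:
\[
X \cup A \cup D \;\succ\; X \cup B_{m} \cup D \;\succeq\; X
\]

% So adding A is better than adding nothing, no matter how many barely-negative
% lives D come along for the ride, contradicting the "always outweighable" principle.
```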

So yeah, as you said, you could have this view where bringing into existence a sufficiently great number of sufficiently good lives — lives of a kind that, you might think, the very best lives human beings can lead exemplify — could in principle outweigh the badness of having arbitrarily many nonhuman animals whose lives are weakly negative, whose lives are only just not worth living, or something.

And this phrase — “only just not worth living” — I mean, who really knows what that means? But it’s not maybe absurd to think that some intensively farmed animals fit that description or something like that, including some of the most abundant. So like maybe farmed shrimp, generally speaking, have lives that are not worth living. But if they do, maybe they’re nonetheless only barely conscious. So insofar as their lives are not worth living, maybe they’re only barely not worth living or something like that.

Zershaaneh Qureshi: And their life is short as well. So to the extent they can suffer, it’s not that prolonged. Yeah, I can kind of see that, though I sort of prickle at people being like, “Oh, it’s not that bad to be a farmed shrimp.” There’s some kind of instinctive ickiness about that, but I can kind of see the reasoning.

Andreas Mogensen: Yeah, absolutely. I mean, I think we should also be suspicious. If you find any of these stories I’m telling convincing, you should maybe be a little bit suspicious about your instinct to believe these stories — because of course, it is awfully convenient for us if things work out in exactly this way.

Zershaaneh Qureshi: Yeah, wouldn’t that be nice.

Andreas Mogensen: But I do think potentially there are these philosophical complications.

Zershaaneh Qureshi: Yeah, makes sense.

Lexical threshold negative utilitarianism [02:12:30]

Zershaaneh Qureshi: Let’s push on here. I want to talk a little bit more about extinction arguments that emerge when we weigh suffering more heavily than wellbeing, rather than treating those things symmetrically.

You talked a bit about negative utilitarianism, and how, under that view, where you’re just trying to minimise suffering, it’s basically a foregone conclusion that everything that can suffer should just go extinct. You just get rid of them, game over: humans, animals, AIs, whatever, gone — if you can do that without causing more suffering, if you could do it painlessly or something like that.

But there are other ways that we could say we care a lot more about suffering than wellbeing that might not commit us to that conclusion. Could you, in really simple terms, explain what “lexical threshold negative utilitarianism” is, and what that says about the desirability of extinction? Sorry, that was a long term.

Andreas Mogensen: Yeah. Just to sort of exculpate myself, I did not come up with this term. I borrowed it from Toby Ord.

Zershaaneh Qureshi: It’s his fault. We’ll blame him.

Andreas Mogensen: Yeah. So lexical threshold negative utilitarianism, LTNU — which has nothing to do with low-traffic neighbourhoods either — says roughly that there’s some depth of suffering so terrible that it would be worse to add a life like that to the world, no matter how many lives, however good, might be added alongside it.

Zershaaneh Qureshi: Right. So is that the flip side of the principle that we were just talking about before?

Andreas Mogensen: Yeah, pretty much. I think it’s slightly different. Earlier we were thinking of a view where there are some lives so good that their addition to the population can justify adding arbitrarily many lives that are only barely not worth living.

But this principle is actually stronger, so it doesn’t just flip the sign — it says there are some lives so bad that adding them to the population would be worse, no matter how many lives, however good, come along for the ride. So not just lives that are only barely worth living, just above neutral, but the best possible lives you can imagine: there’s some depth of suffering so deep that it cannot be counterbalanced by any amount of wellbeing experienced by any number of people, or something like that. So it’s quite an extreme view in many respects.
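
Stated slightly more formally, and again only as a rough reconstruction rather than the formulation in Mogensen’s paper or Ord’s:

```latex
% Lexical threshold negative utilitarianism (rough reconstruction).
% Notation as before: X is any background population; A_g^(m) is m added lives
% at positive welfare level g; "\prec" means "is a worse outcome than".
% The claim: there is a threshold of suffering s* such that adding even one life
% at or below s* makes things worse, regardless of what else is added alongside it.
\[
\exists s^{*} < 0 \;\; \forall s \le s^{*} \;\; \forall g > 0 \;\; \forall m : \quad
X \cup \{\,\text{a life at level } s\,\} \cup A_{g}^{(m)} \;\prec\; X
\]
```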

And you might think if you believe something like this, then this also straightforwardly entails that it would be better if there were no more sentient life or something like that, because you might think that in the fullness of time there just are going to be more lives of this really terrible, horrible kind that get added to the population. The Oxford philosopher Roger Crisp, for example, is someone who’s quite sympathetic to something like this argument for the desirability of the extinction of all human life and all sentient life or something like that.

Zershaaneh Qureshi: But it doesn’t necessarily commit you to that, right? Because you could just say, yeah, there is some awful depth of suffering such that this applies, but that depth of suffering is so crazy, there’s no way we’re just ever going to reach it. Is that an appealing view at all?

Andreas Mogensen: I guess it’s appealing insofar as this view might allow you to escape the conclusion that the extinction of all sentient life would be desirable. You might think that it’s nice if we weren’t committed to that.

Zershaaneh Qureshi: That’s pretty appealing.

Andreas Mogensen: At the same time, it might be thought that this view can still honour some of the intuitions, or the other arguments and considerations, that might be given in favour of lexical threshold negative utilitarianism.

So in this paper of mine called “The weight of suffering,” I show that there are a couple of, to my mind, relatively plausible-seeming assumptions you can make. And if you make those assumptions, you get driven to believing lexical threshold negative utilitarianism. So one way to have your cake and eat it might be to accept lexical threshold negative utilitarianism, but set the threshold, as you were saying, at a depth of suffering so extreme that you might be confident that no life so terrible will ever be lived or something.

Zershaaneh Qureshi: Right. Putting that aside, what is a reason to believe in lexical threshold negative utilitarianism? I know you mentioned that there are some really plausible assumptions that get you there, but they’re pretty technical. Is there an intuitive route in or a thought experiment or something that you like?

Andreas Mogensen: The sort of thought experiment I have in mind is the thought experiment you find in Ursula K. Le Guin’s well-known short story, “The Ones Who Walk Away from Omelas.” I expect many readers will know how this story works, but roughly, there’s this city, Omelas, which is initially presented to the reader as this kind of unblemished utopia — where people live the most ideal kinds of lives and have these wonderful festivals and achievements, and they live in this splendorous and beautiful city.

Then it’s revealed to the reader that somewhere in Omelas there’s a child that’s suffering terribly, and it suffers day upon day. And we’re told that all the splendour and delight of Omelas depends on the suffering of the child. And the way Le Guin tells it, basically all the citizens of Omelas at some point are made aware of this fact: that the prosperity of their city depends on the fact that somewhere within its walls there’s this lone child that suffers terribly.

Most of them, according to the story, sort of make their peace with this fact. But a few instead think they don’t want to be part of this. And those are the ones who walk away from Omelas. So they leave the city, they walk away, and they never come back.

And I think many people sympathise with that reaction. They think that they too would walk away, or at least they hope that they would walk away from Omelas. And I think if you have that reaction, it might very naturally suggest to you that maybe there is some depth of suffering that cannot be compensated for by any measure of bliss experienced by others.

Zershaaneh Qureshi: The formal argument for how some very plausible assumptions actually force you to accept lexical threshold negative utilitarianism is in your paper, “The weight of suffering,” which we will stick up a link to.

But basically what I got from that is that there’s a bunch of really attractive assumptions that lead you to believe the formal statement of this principle: that there is some depth of suffering so horrible, no amount of wellbeing can morally outweigh it. But yeah, that same principle seems like it might lead you to think that a world without any beings in it that can suffer would be better, to avoid meeting that threshold.

Instinctively, we don’t want that conclusion to be true. We talked already about how you could set your threshold to a point where that conclusion isn’t true — which seems like a kind of weird and arbitrary thing to do, but maybe there is some kind of principled case for it. What else can you do if you don’t want to conclude that a world without sentient life would be better?

Andreas Mogensen: In some sense the only alternative is to reject one of the premises of the argument. For example, one premise that does a lot of work in the argument is this always outweighable principle we discussed — where it says that if you’re expanding the population by creating new lives that are happy lives, lives in which happiness predominates over suffering, adding those lives to the population might nonetheless be for the worse, or would be for the worse if sufficiently many bad lives come along for the ride as well. That’s roughly what it says.

So you could reject that principle. You could think that there are some lives so good that they can counterbalance any number of lives that are only barely not worth living, or something like that. You might also think that’s pretty repugnant.

The other sort of key assumption that plays a big role in the argument is something called the reverse repugnant conclusion, or rather the negation of the reverse repugnant conclusion. So this is like the repugnant conclusion, but you flip the sign on everything.

The reverse repugnant conclusion concerns populations of people living extremely long, horrible, tortured lives, where every second of their life is utter agony. It says that for any such population of people whose lives are just total, constant torture, it would be worse if, rather than that population existing, there existed a much larger population of individuals whose lives are only just not worth living, so only just weakly negative. These might be, say, people who live their entire life in a kind of stupor, where really nothing negative or positive ever happens to them, but every so often they hear a slightly irritating buzzing noise or something like that.

So the reverse repugnant conclusion says that if you’ve got sufficiently many lives like that, where the only valenced thing that ever happens is this slightly irritating buzzing noise, trillions upon trillions of lives like that, then that would be much worse, could in principle be arbitrarily worse, than having a smaller population of individuals who are being tortured in the most horrific ways you can imagine, or would prefer not to imagine. In other words, the larger population of lives that are only just not worth living would be worse than the smaller population subjected to really horrific torture.

And just as many people find the repugnant conclusion extremely hard to accept, many people also find this reverse repugnant conclusion very hard to accept, so they prefer to believe the opposite: that the smaller population of the people with the really horrific lives would be a worse outcome overall.

Zershaaneh Qureshi: So to get to the conclusion of lexical threshold negative utilitarianism — i.e. that there’s some depth of suffering that no amount of wellbeing can compensate for — one of the moves you make is to reject the reverse repugnant conclusion, right? So if you don’t want to endorse the lexical threshold view, which maybe makes you say everyone should go extinct, you can instead accept the reverse repugnant conclusion.

Andreas Mogensen: That’s right.

Zershaaneh Qureshi: But the reverse repugnant conclusion is pretty repugnant. So there’s kind of a rock and a hard place here, right? It’s a difficult tradeoff. When you’re met with a situation where you have to choose between one horrible conclusion and another horrible conclusion, is that a sign that we’re just totally on the wrong track with our theories, or is there some inherent difficulty of balancing the values of different populations? What do you think is going on here?

Andreas Mogensen: I’m inclined to think it’s not an indication that we’re on the wrong track. It’s an indication that we’re doing philosophy. I think this is just, in some sense, what it is to do philosophy. The core of philosophy consists of puzzles and problems that arise when a number of things that all individually seem extremely plausible turn out to yield absurd results or something like that.

So there’s this quote from Bertrand Russell that I think I’m probably going to butcher, but it’s like, “The job of philosophy is to start with something so obvious it doesn’t seem worth saying, and to end up with something so incredible that no one could believe it,” or something like that. And yeah, I think these deep conflicts amongst principles that otherwise strike us as compelling, that’s a sign you’re doing philosophy.

Zershaaneh Qureshi: So we’re doing something right, maybe?

Andreas Mogensen: Possibly, yes. But yeah, I certainly think if you think this indicates that we’re on the wrong path, it sort of indicates that philosophy as a whole is taking a wrong turn or something like that. Which maybe you want to believe.

Zershaaneh Qureshi: Yeah, maybe we should just stop doing philosophy. Problem solved!

So… should we still try to prevent extinction? [02:25:22]

Zershaaneh Qureshi: OK, I want to push on towards what we should make of the fact that there are some arguments that are kind of hard to get out of that suggest humans, or in fact all sentient life, should maybe go extinct.

Many of our listeners are longtermists or compelled by some of the ideas of longtermism. We want to influence the long-term future as a moral priority. And if you’re a longtermist, the obvious thing to do is work to try to prevent the extinction of future generations of humans. But some of these arguments might make that seem less obviously a good thing.

What do you make of this? Should longtermists be less sure they want to prevent extinction, or should we embrace the intuitive view that extinction is a really bad thing?

Andreas Mogensen: Ultimately I suspect we should still embrace the view that extinction would be a very bad thing. Certainly I think these considerations should make us maybe less certain of this than we were previously — but there’s a question about how much.

And with respect to something like this question of whether the harm that we do to nonhuman animals outweighs whatever good we might experience within human populations, it’s very difficult to know exactly what the future holds. I mean, it might be the case that we get our act together and succeed in ending or suitably limiting the worst of these practices.

Zershaaneh Qureshi: Right.

Andreas Mogensen: I think it’s also the case… We’ve been speaking about the possibility of human extinction or something like that, but we haven’t really said very much about how humans might go extinct. I guess maybe the sort of extinction programme that might be justified by this, if any, would be a voluntary human extinction or something like that. If you’re instead worried about human beings going extinct from some kind of disastrous natural catastrophe, like an asteroid strike or something like that, that’s not only going to take out humans, but it’s also going to take out nonhuman animals. It’s going to take out a large chunk of the population of wild animals or something like that.

And similarly, say what you’re worried about is existential risks associated with advanced AI systems: you might think quite reasonably that if such advanced AI systems come to disempower humanity, or perhaps even to bring about human extinction, it’s probably not going to be good news for the biosphere as a whole.

So if you imagine the classic paperclip maximiser story: the paperclip maximiser is given some goal to maximise the output of paperclips. It’s a superintelligent system, it’s been given a very bad goal, and it converts the known universe into paperclip factories. That brings about human extinction, but it also completely destroys all life on Earth or anywhere else in the immediate vicinity or something like that.

So if you’re worried about the impact of human civilisation on wild animals, there are many extinction scenarios that longtermists and others might worry about that would also be pretty terrible for these wild animals.

Zershaaneh Qureshi: Yeah, makes sense. It’s interesting to think about AI here, because I think we’ve been having most of our conversation imagining life gets wiped out or not wiped out or something like that. But in fact what we might be dealing with here is the prospect of humans getting replaced by some new dominant species.

And I sort of see the argument that in the animal case, if AIs wipe out humans, there will be some kind of disruption to nature that comes with that, which is going to be bad for the animals too. But is there any reason to think that, longer term, life for animals on Earth might be better if AIs are the dominant species? This is totally speculative.

Andreas Mogensen: Yeah, it’s obviously very hard to know. Certainly you might think an AI system has no need to eat, and probably no desire for animal flesh or something like that. So the intensive farming of animals would disappear.

As for the question of whether it would be worse for wild nonhuman animals if human beings were replaced by a civilisation of AI systems: I’m not totally sure about this. If we’re talking about radically misaligned AI systems that have strange values that are entirely orthogonal to human values, you might very well suspect that they would also be indifferent, to a large extent, to how well or badly things go for these wild animals or something like that. But it’s a strange and very difficult question, and I guess I’m not totally sure.

Zershaaneh Qureshi: Yeah, that’s completely fair. It was a wild card.

I think the other consideration to mention, in terms of how AI becoming the dominant species changes the calculus in these cases, is the question of whether AIs’ lives are worth living, or morally valuable in some way. I guess there’s some reason to believe that if AI lives are worth living, they could be extremely valuable. For all we know, they could have a very high capacity for welfare goods or other things that make lives morally meaningful.

I think I won’t get you to speculate much on how that equation works out: should AIs replace humans, will they have more valuable lives than us? But it just makes me think there are a lot of tricky questions here.

Andreas Mogensen: Yeah, absolutely. One important thing to observe, I think, is that not only is it conceivable that they might have a much greater capacity for wellbeing, but they might also have a much greater capacity for illbeing — for whatever stuff counts against people’s happiness and continued wellbeing or something like that.

What are the most important questions for people to address here? [02:32:16]

Zershaaneh Qureshi: We’ve discussed some pretty tough questions today, and it seems like everywhere we look there’s a bunch of open questions. With that in mind, what research would you be especially excited to see people doing? Both on the side of could AIs be moral patients, and also on the side of the value of preventing extinction or the weight of suffering and so on.

Andreas Mogensen: I think I’ve probably got better answers for the stuff about digital minds. One thing that I think is kind of underexplored at the moment is something like: What would it take for an AI system to be not only phenomenally conscious, but also sentient in the sense of having affective experiences — experiences that feel good or bad, like emotional experiences and things of that nature?

There’s lots and lots of debate and research into how to identify phenomenal consciousness in AI systems. It seems in principle possible that you could have subjective experience, but all your experiences are neutral; they neither feel good nor bad. And I think there’s much less work on devising criteria for what it would mean to have emotions or to feel pain that can be abstracted from human or animal neurophysiology and applied to AI systems. So I think that’s one area of research that’s quite important.

Another thing that I think is also quite important, though I think this is gathering quite a bit of interest now, is research into questions about how to individuate digital minds.

To explain why this is important: when you think you’re talking with Claude or something like that, so far as I understand — which may not be all that much — what’s really going on is that different parts of the conversation are actually being handled by different instances of the Claude model you’re talking with, and those different instances might be located in different parts of the world. And also, whilst you’re talking with that model instance, it’s processing other conversations with other users in parallel.

So you might have this sense that there’s a persistent interlocutor whose attention is fixed on their conversation with you, but it seems like actually nothing like that is going on. There’s like a series of different things you’re talking with. And insofar as those have minds, maybe their minds are a kind of a weird jumble or something. Alternatively, maybe there is like a persistent interlocutor with a coherent stream of thought, but it’s made up of like little slices of all these different model instances that live on all this hardware spread out across the world.
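
Purely to make this picture concrete, here is a toy sketch in Python of the general serving pattern Andreas is gesturing at: each turn of a conversation is routed to whichever model replica happens to handle it, and the sense of a single persistent interlocutor comes from resending the conversation history, not from one continuously running process. All names are invented, and this is not a description of any particular provider’s actual infrastructure.

```python
import itertools

# Toy sketch: several identical model replicas, possibly in different
# datacentres, each of which would be batching requests from many users at once.
REPLICAS = ["replica-us-east", "replica-eu-west", "replica-asia-1"]
_round_robin = itertools.cycle(REPLICAS)

def handle_turn(conversation_id: str, user_message: str) -> str:
    """Route one conversation turn to whichever replica is next in line.

    Successive turns of the *same* conversation can land on different replicas:
    the appearance of one persistent interlocutor is stitched together from the
    conversation history sent with each request, not from a single ongoing process.
    """
    replica = next(_round_robin)
    return f"[{replica}] reply to {conversation_id}: ...response to {user_message!r}..."

# Three turns of one conversation, each potentially served by different hardware.
for msg in ["Hello!", "What were we talking about?", "Thanks!"]:
    print(handle_turn("conversation-42", msg))
```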

Currently that feels like kind of a slightly scholastic question. It’s like, “How many angels can dance on the head of a pin?” or something like that maybe. But certainly if you think that these beings matter morally, that they could be, say, harmed in death or something like that, then these questions about how you identify and individuate digital minds become quite important.

There has been some recent work on this issue by people like Jonathan Birch and Derek Shiller; David Chalmers has a recent preprint on this, and there’s also a paper by Chris Register about it.

Is God GDPR compliant? [02:35:32]

Zershaaneh Qureshi: Just to wrap up, I have one final question for you.

Andreas Mogensen: Sure.

Zershaaneh Qureshi: There’s a stereotype that philosophers spend their whole days thinking about really implausible, pointless, abstract ideas. What is the weirdest example of something that you’ve spent hours thinking about?

Andreas Mogensen: Not sure if I spent hours thinking about this, but maybe one of the weirdest ones is: Does God’s omniscience violate our right to privacy?

So, at least if you assume that there exists a God of the kind accepted by the Abrahamic religions, a theistic God, and this God knows everything about us and is, in some sense, all-knowing and all-seeing, then you might think that’s a massive privacy violation.

So I sort of wondered a little bit about this question: Is it in fact the case that this would indicate a way in which the traditional theistic conception of God is sort of incoherent? Because God is also obviously supposed to be morally perfect — but that moral perfection might be incompatible with constantly spying on people when they’re doing their private business.

Zershaaneh Qureshi: Right, right.

Andreas Mogensen: So I’ve spent a bit of time mulling that over at some point in my life.

Zershaaneh Qureshi: Yeah, that’s kind of broken my brain a little bit. All right, that’s it for today. Our guest has been Andreas Mogensen. Andreas, it’s been a delight to have you. Thanks for puzzling through these bizarre questions with me, and thank you so much for coming on the show again.

Andreas Mogensen: Yeah, thanks so much for having me.