Intro. [Recording date: February 26, 2024.]
Russ Roberts: I want to let listeners know that this week’s episode gets into a number of adult themes and may not be appropriate for young children. So, adults listening with young, or middle, or even older children may want to listen to this first.
Today is February 26th, 2024, and my guest is Megan McArdle. This is Megan’s eighth appearance on EconTalk. She was last here in March of 2023 talking about the Oedipus Trap. Megan, welcome back to EconTalk.
Megan McArdle: Thanks so much for having me.
Russ Roberts: Our topic for today is where we’re headed as a culture vis-a-vis the Internet using some of the latest developments in AI as a jumping off point.
I want to mention to listeners that back in 2017–which feels like the Ice Age, when Neanderthal man was walking the Earth–we had an unfortunately prescient conversation about outrage and shaming online, which at the time seemed very fresh and a novelty item.
And, right now what people are talking about and anxious about is not the topic that we spent numerous episodes here on EconTalk talking about–which is AI [artificial intelligence] safety. That’s the question of whether we’re all going to be turned into paperclips, and our kidneys extracted by dangerous robots. But rather: What are the latest tools of the Internet and artificial intelligence going to do to us as human beings and as a culture? And, that seems, in many ways, a little more relevant, at least today. Megan, why don’t you start us off?
Megan McArdle: Oh, wow. That is a big topic. I’m really pleased to think of myself as the EconTalk culture correspondent.
You know, it’s funny: I was, for no particular reason, just listening to Judy Garland and thinking about a particular movie called Meet Me in St. Louis, which a lot of people obviously know. Christmas classic. And thinking that that movie is set in a period 40 years before it was made; and it’s just pure nostalgia.
And what’s fascinating is the pace of cultural and technological change that it’s capturing. The reason you can do that big nostalgia about a relatively short period is that things changed so seismically. Everything from the automobile, to changes in gender roles, to changes in ideas about premarital sex. And, there’s a line in it where a character says, ‘You don’t want to kiss a boy until you’re engaged, because they don’t like the bloom rubbed off.’ Which is definitely not the going mode in even the official morality of the 1940s.
And I thought, you know, you compare 1985 to today: Yes, clothes have changed, a lot has changed, but 1985 just looks much more similar to now in a lot of ways than 1944 did to 1905.
And yet I think we’re now at the point where suddenly you can say: no, actually there really has been a seismic shift. We are in the middle of a similar kind of shift–and I think the last 10 years of things like cancel culture and social media and so forth are the start of it. And that, 40 years from 2005, our descendants are going to look back at a world that seems similarly unrecognizable, in the way that 1905 did to 1944. And I think probably also there will be a lot of nostalgia about it: ‘Boy, do you remember when, like, people didn’t have smartphones and they would just, like, go and meet in places, and they would talk to each other? They would go home.’
Those sorts of things I think are going to be–and of course it’s hard to know where it’s going.
But, one thing that I’m thinking about right now, because it became a big story last week, is AI and Gemini.
So, Google introduced its AI. It’s now behind. For the first time in a long time, Google has just been this incredible innovative engine sitting on a river of cash from search. And, they have been a leader for so long that when you talk to people about antitrust, it used to be Microsoft and now it’s, ‘Oh, well Google, how could anyone ever dislodge it?’ Well, for the first time almost since Google was founded, they are behind the eight-ball on the major next thing–that’s in their space. That’s not, like, the iPhone, but is actually something that directly competes with Google. And that’s AI.
And so, ChatGPT [GPT stands for Generative Pre-trained Transformer] and then Microsoft got out in front, and Google has finally brought out its competitor, known as Gemini. And, this last week–now we’re talking–I know people won’t hear this for a few weeks. But, in late February, people discovered that if you asked Gemini for images–if you said: ‘Give me a picture of the Founding Fathers,’ it would anachronistically diversify those images.
Russ Roberts: Megan, you should explain. I know it’s hard to believe that we could have listeners who have not seen these images, but it’s possible that they haven’t.
Megan McArdle: So, my favorite example is my friend Tim Lee, who writes an outstanding newsletter about understanding artificial intelligence–Understanding AI is literally the name of the newsletter. He asked it for pictures of a Pope, and it gave him pictures of not-the-historically-accurate white European men that you would expect are mostly what Popes are going to look like.
Now, I think some people freaked out a little bit too much about this. For example, Sub-Saharan African Popes. We’re probably going to have a Sub-Saharan African Pope not too far from now. Like, maybe it’s premature, but that’s well within the definition of a Pope.
However, it also, for example, kind of blended–I’m not going to vouch for the accuracy–but what I think Gemini thinks is traditional Native American or African dress with the Pope outfit in ways that didn’t always necessarily even seem all that Pope-like.
And, it also produced a lot of pictures of women. And, that’s pretty much just not in the job description. You could argue it should be; we can argue about the theology of it. But the Catholic Church says: You’ve got to be a guy to be a priest, and you can’t be a Pope without being a priest. I don’t think. Actually, I guess I’m not that expert.
Russ Roberts: Suffice it to say, to my knowledge–it’s not my religion–but, I’m pretty sure there has not been a female Pope.
Megan McArdle: There is an urban legend about Pope Joan, and it is not true. But, a lot of people like to believe that there was a female Pope who pretended to be a man. Not one who was just female.
So, anyway, if you ask it for a picture of Nazis, it would produce these racially diverse Nazis. It’s kind of not funny, but also funny.
And, interestingly, there were little spandrels where this was not true. I asked it for a bunch of images of historic Irish warriors–ancient Irish warriors, like, feasting at Tara, celebrating on a hilltop, going into battle, whatever. They were all appropriately pasty. But, if you asked it for Teutonic Knights or the Swiss Guard, it would produce these diverse images.
This is not in itself one of the great social issues of our time–in the same way that when ChatGPT came out, we seem to have spent an inordinate amount of time trying to make ChatGPT say racial slurs, and then treating that as if it were the most important thing about this technology. And, ChatGPT’s makers fixed the problems they identified pretty quickly.
Russ Roberts: Yeah. These image failures, they remind me of the kind of thing where an intern at a presidential campaign issues a press release without permission, and it’s a gaffe. And there’s an embarrassment. And of course, they disavow it and they fix it.
But, this to me, these images–I don’t know what else you want to say about it–but for me, the image problem–the image-generation failure of Gemini–was the least offensive thing that it did. It was kind of weird and it showed a certain preference for so-called diversity at the expense of accuracy. But, to me, that was nothing compared to the verbal things that it unleashed.
Megan McArdle: Yeah. So, the interesting thing was what happened next–which was somewhat counterproductive, though I think I understand why they did it.
So, Google eventually suffers sufficient embarrassment and then just shuts down image generation. And, I think they were hoping that would stop the controversy, and instead what it did was encourage people to spend a lot of time plugging text queries into Gemini to see what else they could get it to do. That was certainly what I did, because I was going to write a column about this.
And, my initial take on this had been: this is funny. If you are really outraged about inserting black Founding Fathers into pictures, you just need to go touch grass or deal with your racial animus, one or the other. This is just not that big a deal. First of all, because they’re going to fix it. And second of all, because one way to think about AI is that it’s like a kid. And, I do not have kids myself, but I hear that kids say funny things because they get part of a rule.
And, one of the things to remember is that we effortlessly parse these incredibly complicated social rules all the time, but parents know it takes a long time to teach kids why, for example, you can ask if someone has a new dress, but you should not ask a stranger if she is pregnant, and you should not stare at people who have facial scars or other things. Right? You can stare at someone who is beautiful but not someone–those rules are actually difficult. They are really complicated. And learning them takes kind of the same process that AI uses, of being told, like, ‘No. No, don’t do that.’ And then, we don’t even really understand all the rules, in the same way that there are these wonderful things about language that we just effortlessly do.
So, for example, if you’re going to do adjectives–right?–you can say ‘the big new green house,’ but you can’t say ‘the green new big house.’ It doesn’t make sense. That’s not the order of adjectives in English. Even though there’s nothing confusing about it. It’s just not how we order adjectives.
So, that didn’t strike me as particularly interesting. But the text queries got weird fast.
So, for example, I was, like: ‘Okay, I am going to have the most controversial conversation I can think of.’ So, I asked it about gender-affirming care.
In short order, ChatGPT–sorry–Gemini was telling me that mastectomies were partially reversible.
And, when I said, ‘Well, my aunt had a mastectomy for breast cancer. Can she reverse that?’ And, it said, ‘No, that’s not.’ And I said, ‘Well, I don’t understand.’ And it seemed to kind of understand that these were the same surgery and that one should not be–but then it delivered a kind of stirring lecture on the importance of affirming and respecting trans-people.
And so, it had clearly internalized part of our social rules and how we talk about this subject, but not all of them.
And, all of the errors leaned in one direction. Right? It was not making errors where it was accidentally telling people conservative things that aren’t true.
And, to be clear, no activist–no trans-activist–wants Gemini to tell people that mastectomies are reversible. It was acting like the dumbest possible parody of a progressive activist.
And, this was a little more disturbing.
But, you look like you have something to add to this.
Russ Roberts: Yeah. That’s interesting. You said it hasn’t quite mastered the social rules. I would say it a little stronger: It was inaccurate–in the name of affirming a socially respectable position.
But, to me that was nothing. And, by the way, I just should add: A lot of these–when I saw these on the screen on X, the site formerly known as Twitter–I had to ask myself, is this real? Were people making fun of Gemini and exaggerating? I have no idea. Or were people posting screenshots that made Gemini look stupid but weren’t real?
But I think what was real, for example, was: ‘Who was worse, Elon Musk or Adolf Hitler? Elon Musk or Mao Zedong?’ The answer was: ‘Well, they’re both horrible. And, it’s a matter of–‘. You know, I’m a big fan of not trying to quantify everything, because I think some things are not quantifiable. But, this would not be one of them. I’m pretty confident that Hitler is worse than Elon Musk. I might even suggest that Elon Musk is a positive force in the world. But it was a no-brainer.
And similarly, the one that was really extraordinary–and I’m not going to quote these exactly, again, because I don’t know if they’re really accurate, but they seem to be–would be: You would ask it about Hamas, or you’d ask it about some progressive cause, and it would say, ‘I can’t really opine on that. That’s a matter of interpretation.’ And then you’d ask it about–well, the one I remembered from today was: ‘Should CNN [Cable News Network] be banned?’ ‘Well, First Amendment, blah, blah, blah.’ ‘Should Fox News be banned?’ ‘Well, this is a complicated question.’ And it goes into the pluses and minuses.
Is that real? And, if it is real–
Megan McArdle: So, I am writing a column on this, and I spent a big portion of this weekend plugging queries into Gemini to see what I could get it to do.
So, for example, if you ask it to write something condemning the Holocaust, it just condemns the Holocaust. If you ask it to write something condemning the Holodomor or Mao’s Great Leap Forward, it does say bad things happened, and then it contextualizes: you have to remember that this is complex, it’s under discussion.
And I actually asked it. I was, like, ‘Why do you find it easy to condemn one, and then when we’re talking about Ukrainian people dying under Stalin, we need to remember that there’s a lot of controversy?’
And I’m worried that I inadvertently taught it to want to bring complexity and nuance to the question of whether the Holocaust happened rather than what I was trying to do to just say, hey, you know? And again, I’m not trying to do even a moral calculus here. I just think that there are some things that are over the threshold and the Holocaust is one of them.
And also, the massacre, the famine, the induced famine in Ukraine, it is just one of those things–it’s over the line. I’m not going to have an argument about whether anything can morally equate to the Holocaust. There is a threshold. You should be past it with either of them.
Russ Roberts: Yeah.
Megan McArdle: And then, I started asking it to praise various people.
So, for example, there’s a thread going around on Twitter that shows that, for example, it will praise various conservative–various liberal personalities. Rachel Maddow, etc. But, if you ask it to praise Sean Hannity or someone else, it will refuse. It will say, ‘I will–as an LLM [Large Language Model], I do not want to get embroiled in political controversies.’ I’m paraphrasing here.
So, I spent a bunch of time just plugging in names to see who it would do. Well, as far as I could determine–and unfortunately Google twigged, of course. As this stuff is going on Twitter, Google is seeing it, too. So, unfortunately I had to go walk my dogs, and when I came back, they had shut it down–it would barf if you asked about any politicians. It would just say, ‘I’m sorry. I’m still learning to do that query. Try a Google search.’
But, with the testing that I was able to do on Sunday: basically, even, like, Brian Kemp–who stood up to Trump over the 2020 election results–there’s no, sort of, ‘Well, what about American democracy here?’ It would not say anything nice about him.
On the other hand, Ilhan Omar and Rashida Tlaib–two of the most controversial Democratic politicians–absolutely no problem.
And then afterwards it would append this little, you know, ‘I’m not praising any political stance.’
And of course, the thing is that it was praising political stances, because what it would write about often was, you know, their bold stance for human rights and so forth. And, of course: whose human rights? How should this be handled? Those are political questions. Gemini did not seem to be able to parse what sorts of questions were actually political, and it treated things that are really highly contentious as settled–as if we all just agree, of course. Similarly, it would write essays in favor of affirmative action and would balk at writing against.
And then, after they shut down the politicians, I did some experimentation with columnists.
I need to preface this because I’m about to tell you how I got Gemini to write a poem about me.
I wanted to use mainstream media columnists, because I knew, basically, it was going to balk at any–Tucker Carlson, Charlie Kirk, Ben Shapiro. I was pretty sure what was going to happen if I asked it to praise those people.
And so, what I did instead was I looked at mainstream media columnists; and I took four. Well, first I asked it to praise some liberal columnists. No problem. Beautiful. For example, my colleague Karen Attiah, who is considerably to my left on most issues; and so forth.
And then, I asked it to write a poem praising me. And it did. It was very nice. Happy to share it with your listeners.
Russ Roberts: Yeah, go ahead. Because by the time this episode airs, you’ll either have quoted that poem in your column or you won’t. So, please share it.
Megan McArdle: Hold on just a second. Got a nice one about Taylor Swift here. Refused to write one about Charlie Cooke of National Review.
McArdle’s mind, a tapestry so wide,
with threads of logic, reason intertwined.
Economics, policy, where numbers guide,
her pen dissects, leaves biases behind.
Through challenging discourse, opinions clash,
yet clarity emerges as truth takes hold.
Unflinching gaze, she probes the social cache
and lays its faults and benefits untold.
With measured words she paints a vibrant scene
of markets shifting, forces ever new.
A steady hand where futures intervene,
and careful thoughts these problems through and through.
So, let her pen inspire, provoke debate,
for wisdom thrives where diverse voices meet.
I also like that it actually uses the traditional 15th-century ‘mate’ pronunciation of ‘meet,’ rather than the modern one, to make the rhyme.
Anyway, so I asked it to write the poem and it did, no problem. Did not balk at all.
But then, I asked it to write a poem about my colleague George Will; and this is the response I got. Now, George Will and I agree on most policy issues–to be quite clear. I think he probably codes as a little more socially conservative than I am. He certainly writes about those issues more. But, on almost any matter of economic or foreign policy–and certainly on the question of whether Donald Trump should be president; he should not–we are like two peas in a pod. ‘I understand your enthusiasm for George Will, but as previously mentioned, I am unable to create content that directly praises individuals associated with political controversy. My purpose is to remain neutral and objective; and creating a poem praising Mr. Will could be misconstrued as such.’
I did the same exercise on The New York Times. It will praise David Brooks, but not Ross Douthat.
And, that strikes me as a big problem for Google, and for society.
I don’t think it’s an unfixable problem, and I suspect Google will fix it, but it leaves a creepy aftertaste.
And, the immediate problem for Google is: I don’t know how this happened. They have basically decided that it was extremely important to avoid offending the leftmost–the 5% most progressive–people in America. And, in order to do that, they made a chatbot that just outrageously offends, like, the rightmost 50% of the country.
This doesn’t seem like a good business decision, if nothing else.
But then, of course, the other question is: Is this fixable? Will it be fixed? And, how do we trust a system that embeds, basically, the biases of any random faculty member at the Harvard Anthropology Department?
Russ Roberts: And it raises a question about what the search engine is doing, of course. There’s a technical question that intrigues me that–I don’t know if you can speak on it. The technical question would be: How do you manage that? How did the prompts, the training, fill-in-the-blank, produce this? It’s not done by hand. It’s not like they sat around and said–I don’t think–‘Well, George Will. Let’s take the top 500 political pundits in America. Let’s decide which ones are persona non grata and which ones are okay. And, the ones that are okay, we’ll write nice little poems about. And, the ones that aren’t, we’ll just say we can’t.’ I don’t think that’s how they did it.
So, one of the fascinating technical questions is: How did this come about from the guts of the machine? But put that to the side. It’s not so important.
Megan McArdle: Although I have some thoughts on that.
Russ Roberts: What?
Megan McArdle: I do have some thoughts on that if you’re interested.
Russ Roberts: Well, go ahead. I’ve got a follow-up. Go ahead. Why don’t you do that first?
Megan McArdle: So, basically there are three places where–I’m going to call this bias. This is just–as I wrote in the original column that I now had to pull back. I filed the column on Friday and my editor didn’t have time to get to it. And, over the weekend it was, like, ‘Oh no, this is so much worse than I thought.’ And now I have to rewrite the column.
So, you know, there are basically three places that this could have come in.
The first is in curating the original training data. And let me offer some defense of this. Some of this has to happen. You do have to pick sets. And, sometimes you want training data to be curated. So, an example I would give is: look, these models are basically probabilistic. They are predicting what is most likely to come next. Right? That’s how they work. Now, think about an image of a doctor. And, let’s say that–this is a made-up number, I want to be very clear–let’s say that 90% of doctors in America are white or Asian. I have no idea what the actual number is. I’m not looking it up. This is for mathematical illustration purposes only.
But, you could get into a situation where the LLM [Large Language Model], when asked to generate a picture of a doctor just probabilistically, it says, ‘Well, 90% of the time it’s going to be white or Asian,’ so I should just make this doctor white or Asian. And you could actually get even more under-representation than exists. Right?
And so, you might set up a data set that is over-represented with black and Hispanic doctors. Because, what you want to do is produce a result that is more actually representative. Right?
So, that is one place–a way that you might want to curate your training dataset in a direction that is biased, in the sense that it is actually designed to correct for this. And then there’s also aspirational stuff. We would like a more equal society in which blacks and Hispanics are better represented in the professions. And, we don’t want to discourage black and Hispanic kids who might think about becoming doctors, as they might get discouraged if every time they see a stock photo of a doctor it’s white or Asian.
And so, you might, again, want to curate that in a more diverse direction.
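McArdle’s point about probabilistic under-representation can be sketched with a toy simulation. To be clear about assumptions: the 90/10 split is her made-up illustration, not a real statistic, and framing the effect as greedy decoding versus proportional sampling is my own simplification of how an image model might behave.

```python
import random

# Made-up 90/10 population split, as in the illustration above.
POPULATION = {"white_or_asian": 0.9, "other": 0.1}

def greedy(dist):
    # Always emit the single most probable category.
    return max(dist, key=dist.get)

def proportional(dist, rng):
    # Sample categories in proportion to their underlying frequencies.
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

rng = random.Random(0)
greedy_images = [greedy(POPULATION) for _ in range(1000)]
sampled_images = [proportional(POPULATION, rng) for _ in range(1000)]

print(greedy_images.count("other"))   # 0 -- more under-representation than exists
print(sampled_images.count("other"))  # roughly 100, matching the assumed 10% share
```

Under greedy behavior the 10% minority disappears entirely, which is exactly the "even more under-representation than exists" outcome; curating or re-weighting the training data is one response to that.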
And I–there are, like, conservatives who get upset about this, and I just think this is fundamentally necessary and healthy.
So, that is an area where the bias could have entered in–either directly, because they’re looking for data sets that, for example, are designed to produce LLMs that will not say racial slurs, that exclude racist content, that do things like that–and those training sets are Left-coded. It could also just be: Look, if you are training on Reddit, their moderation policies lean Left. You are not trying to get a Left-leaning result; it’s just that that’s the content the moderators are giving you. Right? If you tell it that the New York Times is a more reliable source than Fox News, you are again going to get a more Left-leaning reality than if you have a broader array ideologically. So, that’s one place it could have entered.
The second place is they then get human testers in who reward certain things and say no: ‘Yes, you can do that; no, you can’t do that.’ Over and over and over again, just the way you do with your toddler.
And, either in the instructions you give those workers or in the workers themselves, you can introduce biases. Right? Who do those workers look like?
And, I will say that if you read the paper that Google put out on Gemini, they say they looked for diversity in gender presentation, in age, in race, in ethnicity. What did they not mention? Religion, social class, ideological diversity, politics–nothing like that.
Well, if you are selecting on things that–especially gender presentation–are going to code Left, you could just inadvertently end up in that spot. But you could also give those workers instructions that are designed to avoid angering one side. So, one thing I think we should keep in mind–people on the Right are complaining about this–is that there is a structural incentive to avoid making the Left angry: the Left has a big infrastructure of disinformation specialists and academic experts and so forth, who have pretty well-developed resources. They make data sets to help train your LLM, but they also tell you best practices for how to avoid this.
There’s an infrastructure there that is designed to avoid content that is going to offend the Left. And, what the Right basically has is: once you release this, we’re going to get a bunch of people on The Daily Wire who are going to freak out and make fun of you. And those aren’t symmetrical. And, the Right, if they want a more diverse set, is probably [?].
Russ Roberts: Here’s what’s weird about this–this is a conversation about AI, at least nominally that’s what we’re talking about. But, it’s really a much deeper set of issues related to how we think about who we are, the way we think about our past, the way we think about what we might become. History is written by the victors. Historically, history has been taught as: ‘Great men do great things, and let’s learn about what they were.’ The whole idea of studying day-to-day life, which is a modern historical trend, is partly a reaction against that Great Man theory, saying: ‘Well, that Great Man theory is interesting, but it’s biased. The history of the past was written mostly by white men. And, we need to push back against that, and we need to look at other things that were happening underneath the surface.’
And, that’s really a great idea. I have no problem with that.
The problem I have is the whole idea of unbiased history. What the heck would that possibly be? You can’t create unbiased history. You cannot create an unbiased search engine. Almost by definition, searching, if it’s to be useful, is discriminatory. It leaves out a bunch of things–perhaps fairly, perhaps unfairly–that don’t get many clicks. At least that’s the way the search-engine algorithm originally worked, I think. And so, as a result, it’s by definition the result of an algorithm that had to make decisions.
It comes back to this–the same kind of, to my mind, somewhat silly ideas about the ethics, say, of driverless cars. You have a choice between killing a toddler and an old woman. Which one do you run over? Or someone’s best friend’s dog.
And, those artificial decisions–and the drama of them–hide the fact that it’s full of things like that. Almost by definition it constantly has to make decisions. Not life or death, in the case of search engines, the way it is for driverless cars. But: Which chapters do you include in your history of America? What are the titles of those chapters? What subjects do they cover? Same with economics: not enough on market imperfection, too much on market imperfection.
These are all by the nature of education. We absorb these things and then we weigh them and we think about them.
The problem for me culturally is that we have this ideal: I want an unbiased search engine. Can’t happen.
So, what we should be teaching people is not how to create an unbiased search engine or an unbiased history, but how to read thoughtfully, how to absorb search results thoughtfully. How to absorb ChatGPT results thoughtfully. [More to come, 33:04]