Kartik Hosanagar: “AI Governance and Risk Management” | Talks at Google

August 17, 2019


[MUSIC PLAYING] KARTIK HOSANAGAR: Well,
thank you all for being here. And it’s a great pleasure
to come and speak here. As Chris mentioned,
I’m a professor at the Wharton School. I am an author of a
recently released book, which is called “A Human’s
Guide to Machine Intelligence.” I am contractually
obligated that, whenever I speak about anything
remotely related to AI or tech, I have to show you a screenshot
of our book cover jacket and subtly influence people
into picking up a copy and reading the book. So the book is on machine
decisions, and, in particular, the opportunities and challenges
around machine learning and a governance framework
for how individuals, how companies,
and how regulators might think about governing
automated decisions. And so today, I’m
going to speak mainly about that, the governance
framework I have in mind. And in particular,
I’m going to begin by talking a little bit
about the role of machine decisions in our lives. A lot of that is intuitive. I’ll quickly share that. And then I’ll get into
risks of machine decisions and governing that. Now, as part of
researching this book, I interviewed many people. And one of the really
interesting people that I interviewed was a
22-year-old in Shenzhen, China. Her name was Yuan Zhang. And she’s in the
biotech profession. And one of the things I
found really interesting about her day-to-day habits
was these daily conversations she had with a social media
celebrity called Xiaobing. Now, Xiaobing is a
social media celebrity. She’s a teenage girl who
has 40 million followers. And if news reports
are to be believed, over a quarter of
these followers have said “I love you” to
Xiaobing at some point. Now, what is really
interesting about Xiaobing is, despite being a social
media celebrity with 40 million followers,
she’s actually engaging in a daily conversation
with one of her followers. And of course, it’s
no surprise to learn that Xiaobing is
actually not a human, but Xiaobing is a
chatbot that was created by Microsoft in the
persona of a teenaged girl and that was immensely
successful in China. Now, motivated by the
success of Xiaobing, Microsoft asked whether they could launch a similar chatbot in the US, which I think
all of us or most of us have heard about. And that chatbot
was Microsoft Tay. And Tay was launched in 2016. And within a few minutes
of being launched, Tay turned into a racist,
sexist, fascist chatbot that said all kinds of offensive things, ranging from “feminists should burn in hell” to “Hitler was right,” and many other things. In fact, later that year,
“MIT Technology Review” ranked Microsoft Tay as
the worst tech of the year. And when I read about
Xiaobing and Microsoft Tay, I couldn’t help but
wonder, how is it that two chatbots using
very similar approaches by the exact same
company can have such vastly different outcomes? And that is what got me started
in terms of writing this book or starting exploring the ideas. And since I started
the process, we’ve seen many more examples
of automation or automated decisions gone wrong over
the last two, three years. For example, late last year
there was a story by Reuters about resume-screening
algorithms that Amazon came up with. And Amazon’s problem is quite
intuitive and straightforward, and one that, certainly,
Google can relate to as well. Amazon hired over
100,000 people in a year. They probably got
millions of resumes. And you can’t have
even a large team of recruiters that can go
through that many resumes and figure out who to
invite for interviews. So they tried to
automate the process to figure out who to invite
and who to shortlist. And their own internal
investigations later revealed that this
system had a gender bias. And according to the news report, they then stopped using it. And again, that’s
just one example. There are many others. There was a story by
ProPublica a couple years back about algorithms
used in courtrooms to guide judges and parole
officers in making sentencing, bail, and parole decisions. So these algorithms would look
at the profile of a defendant and then, based on
historical data, come up with various risk scores– for example, a recidivism
score that would predict the likelihood that
this person would reoffend. And that information
would be used by judges to make sentencing decisions,
parole officers to make parole decisions, and so on. And an analysis by ProPublica
showed that that algorithm had a race bias. In particular, it was twice as likely to falsely predict future criminality for black defendants as for white defendants.
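To make the kind of disparity ProPublica measured concrete, here is a minimal sketch of a false-positive-rate check across groups. The column names and the toy data are hypothetical, not the actual COMPAS schema; the point is only that the test itself is a few lines of code.

```python
# A minimal sketch of a group-level false-positive-rate comparison.
# Hypothetical column names and data, for illustration only.
import pandas as pd

def false_positive_rate(df, group_value):
    """Share of defendants in a group who did NOT reoffend
    but were still labeled high risk."""
    group = df[df["race"] == group_value]
    did_not_reoffend = group[group["reoffended"] == 0]
    if len(did_not_reoffend) == 0:
        return float("nan")
    return (did_not_reoffend["predicted_high_risk"] == 1).mean()

# Hypothetical scored data: one row per defendant.
scores = pd.DataFrame({
    "race":                ["black", "black", "black", "white", "white", "white"],
    "reoffended":          [0,        0,       1,       0,       0,       1],
    "predicted_high_risk": [1,        0,       1,       0,       0,       1],
})

for group in ["black", "white"]:
    print(group, false_positive_rate(scores, group))
```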
And the article provided many examples, one of which is right here on the slides: a male defendant who had committed two armed robberies received a very low risk score, while an African-American woman who had only a few juvenile misdemeanors, and nothing equivalent in her record, received a high risk score. And these are just
a few examples. There have been
many more examples. I’m at Google so I didn’t pick
up too many Google examples. You guys know more, but, say,
an autocomplete gone wrong or image tagging gone wrong. So there are those kinds
of examples as well. Every company that does
this machine learning at scale would have examples
of things gone wrong in one way or the other. And the question when we
see these kinds of issues is, what is an
approach that one can use in trying to minimize
the impact of machine decisions gone wrong, in
particular, when we use machines to make
decisions for high stakes kinds of environments
like, let’s say, credit approval, loan approval,
or recruiting, and so on? And so that’s what I
want to talk about. But I’ll begin by just touching
upon this topic of algorithms driving a lot of
decisions in our lives. This is not a setting
where I need to motivate. Look, the stakes
couldn’t be any higher. Algorithms are driving
a lot of our decisions. So what I’m going
to say initially is perhaps going to just
confirm what you already know. But perhaps, it
adds some numbers to what is fairly
intuitive for you in terms of how many
of our decisions are driven by algorithms. And having covered
that quickly, I’ll then jump into a governance
framework for this. And of course, more
than anything else, when I come here to Google,
I’m interested in getting your perspectives on this. So I’m going to keep
my presentation short and hopefully get your
viewpoints and comments and questions as well. So in terms of algorithms
driving decisions for us, I think we all might have
our own favorite examples. But certainly, one maybe that
all of us can relate to and one that’s been around for a while
is, let’s say you go shopping and you go to Amazon.com
you see a message. People who viewed this
eventually bought that product. Or people who bought this also
bought these other products. And the question is how much
of our decisions they drive. And most of us recognize they
do subtly influence our choices in various ways. One study suggests
that over a third of the decisions
at Amazon originate from algorithmic
recommendations. Now, these
recommendations don’t just influence the
number of choices we make, the number of
purchases we make; they also influence the kinds of choices, the kinds of products we consume as well. And in fact, in
my research, I’ve been very interested
in understanding how automated
recommendations change the mix of products we consume,
the mix of media we consume, and so on. And in one study, we partnered
with a top five online retailer in the US and we ran
an experiment with them where half a million
of their users got a shopping
experience where they had no algorithmic
recommendations, and that was a control group,
and another 100,000 users got a collaborative filter,
which is essentially a recommendation that is like,
people who bought this also bought these other products. And so we looked at how these
systems changed the kinds of products people consumed. And in particular, we were
interested in looking at how it changes sales diversity. How does it change the diversity
of products individuals consume, that people
consume in aggregate? And this is a depiction of that. And in particular,
what I’ve plotted here is what’s known as a
Lorenz curve, which is a graphical
representation of the market share of different products. Lorenz curves are often used
to plot income inequality and look at wealth inequality. And here, we’re looking
at sales inequality. And the x-axis essentially
has the products with the least popular
products on the left and the most popular
products on the right. And the y-axis in this graph
has the cumulative market share of the products. So in other words, what
this graph is saying is that if you look at the curve
in black, which is the Lorenz curve for our control group,
meaning our group that received no recommendations,
what it’s saying is that the bottom
40% of products contributed roughly
around 5% of sales. Similarly, if you look at the
bottom 80% of the products, they contributed roughly
about 37% of the sales. So it’s like the 80-20 rule we have heard of, except it’s 80-37. That’s where it is over here. Now, the black curve
is the Lorenz curve for users who are not
exposed to recommendations. And the dotted red curve
is the Lorenz curve for users who are exposed
to recommendations. And what you notice
is, for people who are exposed to recommendations,
the bottom 40% of products accounted for around
1% of purchases. And the bottom 80% of products
had a market share of 27%. In short, after launching
this recommender system, the market share
of niche products actually reduced, and the
market share of popular products actually increased. And this is surprising,
because the promise of algorithmic
recommendations is that they reduce
our search costs, help us discover
these niche items. And that wasn’t happening. And in fact, one way
to make sense of this is by looking at the absolute
sales of the products. I have plotted here
the market share, which is the relative
sales of products. If we look at the absolute sales
of products, what we’ve found is that the absolute sales
of all kinds of products, whether niche products or
blockbuster, all went up. However, the sales
of popular products went up way more than the
sales of niche products. So on a relative
basis, they just had a “rich becomes
richer” kind of effect. It’s a bit like
what has happened, let’s say, with income
over the last 30, 40 years. If you look at 40 years
back versus income today, income for all groups– the poorest groups, the richest
groups– all have gone up. But on a relative
basis, the rich have gotten way
richer than the poor. And so income inequality
has actually increased. And we find a
similar effect here. And this is because
the algorithm used here is one of the more
popular designs, which is a collaborative filter. People who bought
this also bought this. So for a product
to be recommended, it has to be bought by others. So it does have a popularity
bias built in there. In contrast, another
design is what’s known as a content-based
recommender, where you don’t look at popularity. You look at actual
attributes of the products, and you find other
products that are similar in terms of attributes. And that design does not have
this kind of popularity bias. And there are hybrid designs. For example, Spotify
uses a hybrid design, where they use a
collaborative filter, but they also use content
attributes to recommend music. And so that gives you a balance
between showing you music that others are consuming and
also helping you discover stuff that’s close to your preferences but that others aren’t necessarily consuming. But independent of
shows that they do change the mix of
products people buy, the mix of media they
consume, and so on. And of course, this is
an example in e-commerce. You could take this
to other settings. If you look at media, for
example, data scientists at Netflix came up with a
paper that measured this and said that 80%
of the viewing hours streamed on Netflix
originate at some point through algorithmic
recommendations. So it’s the algorithmic
recommendation that steers us into a show and then we’re
consuming a lot of that show, for example. Similarly, another
study shows that 70% of the time people
spend on YouTube is driven by algorithmic
recommendations. Again, I’m not going to go
too much into this particular metric here,
because maybe you’ll tell me more about this metric
later on when we chat offline. But bottom line– they’re
driving lots of our choices, whether it’s products we
consume, media we consume, even our social
networks, certainly friends we add on
Facebook or LinkedIn, but even in our personal
lives, physical world, social networks. If you look at dating, for
example, Match.com or Tinder, most of the matches
are driven by algorithmic
recommendations that say, this is who you should date, or,
this is who you should marry. And also, the algorithms are
influencing choice, sometimes without our even realizing it. One interesting example of that
comes from a research study that was conducted at Facebook. So in 2012, Facebook conducted
an experiment with its users to see if they could influence
people’s decisions to vote by altering the news feed. So in particular, the study
was designed as follows. A large set of users– the control group– got the
regular, plain, old Facebook experience, where
their news feed algorithm remained as before. But another set of
users, the treated users, their news feed algorithm was
modified in one simple way. It was more likely to recommend
hard news stories than softer or more entertaining
news content. So for example,
that system was less likely to recommend
funny cat videos and more likely to recommend,
say, hard news stories. In 2012, that might have
been the war in Iraq. And that might be
the kinds of stories it would have recommended. And then they observed the
self-reported voter turnout. So around elections, Facebook
has the I Voted button, which people can click to
indicate that they voted. And they observed that the self-reported voter turnout for the control group was about 64%. But for the treated group
whose news feed algorithm was modified merely to
recommend more news stories, that went up to 67%. Now, three percentage points
might not sound like much. But we’ve already
seen elections are influenced by much smaller
proportions than just 3%. And so clearly,
these systems drive important political decisions
that people might make. It influences not
just how people vote, but also how elected
representatives serve us. So in many places in the US,
the decision of which students go to which schools are
made by an algorithm. As I mentioned
earlier, in courtrooms, algorithms make recommendations
on likelihood a person will commit a crime,
which guides sentencing, bail, and parole decisions. They will be making
many more decisions, whether it’s with autonomous
vehicles or with medicine. You’ve got a push towards
precision medicine, where the idea is
that treatment is personalized to each individual
based on their DNA profile. And that genetic data is
just too massive for a human to process. And so again, it’s going to
be automated decision making. So the point is quite
simply that machines are making lots of decisions. The stakes are high. And so, if they fail, whether
it is race biases in courtrooms, or biases in loan approvals
or recruiting decisions, or many other such examples,
the stakes are pretty high. And so one should have a
governance structure in place to manage those kinds of risks. So let’s talk about that. And in particular,
the two questions that come up when
we observe this are, what is driving these
kinds of behaviors? And second, what is a company
supposed to do about it? And what is the government
supposed to do about it? What is a user, an individual,
supposed to do about it? So let’s start
with the first one. Why does this happen? Again, this is an audience
where you guys are savvy enough. You know why this happens. It’s in the data. And these biases or problems
get picked up from the data. But one analogy
I’ve found helpful when I explain it, especially
to non-technologists, is to compare or
contrast algorithm behavior with human behavior. So with human
behavior, psychologists attribute human behavior
to nature and nurture. So nature is quite
simply our genetic code that we inherit
from our parents. Nurturer is the environment. And if you look at algorithms,
they’re very similar. They have a nature,
which is actually the logic in the algorithm that
a programmer kind of codes in, or the rules that the algorithm
follows, the genetic code equivalent for the algorithm. And nurture is the data
from which they learn. And together, they drive
the algorithm behavior. And if you look at
nature and nurture, there’s a thing a
friend once said. Nature or nurture– either
way, the parent is to blame. I think, over here, nature
or nurture– either way, the data scientist is to blame. And I think, either way, you
could blame the same person but for different reasons. But I think what’s interesting
is, as we move from a world where all the rules were entered
by the programmer to a world where a lot of the actions
the algorithm takes is learned from data,
we’re moving to a world where predictability goes down
and new kinds of risks arise. And so when I’m talking to
business audiences, especially CEOs and boards, if
you think of banks that are making loan approval
decisions and credit approval decisions, there’s lot
of risks associated with these kinds of decisions– social risks from making
decisions without testing for biases in the data in
or the final decisions, whether it’s credit
scoring or recruiting, reputational risks to companies
when it comes out that the algorithms are biased
or vulnerable to some hacks by others– adversarial
attacks by other people– or even litigation risks,
and also regulatory risks. As we know, lot of regulators
are looking at this. And the risk is, if companies
do not take action proactively, then you have heavy
regulation coming that actually stifles innovation. A lot of people complain GDPR is
overly cumbersome for companies to comply with, and
the value created is not commensurate with
the cost of compliance. And at some level,
heavy regulation results from lots of
violations of privacy. And so if companies
are not careful, then, again, one could
see heavy regulations. So then the question,
of course, is, OK, how do companies
proactively manage this? And in the book,
I’ve proposed what I call an algorithmic
bill of rights, where I’ve kind of talked about, what
are some basic rights users should have? And what should
companies do in order to ensure you can
catch these problems or reduce the chances
that they happen? So I want to talk about
those main pillars of my bill of rights. And I’ll talk about three
main ones in increasing order of importance. So I want to start with
this idea of user control. And this is simply
the idea that there should be a feedback
mechanism from the user back to the algorithm. And a lot of times when we
design algorithms, especially when we go for autonomous
algorithms and decision making, we’re going for
a design approach where we’re saying the
user should ideally not be thinking about this, and
let’s automate everything, and the user can use it
in a more passive way. And I think that’s a
source of problems. And one should really think
about how you give users some level of control
in order to influence algorithms’ actions as well. And of course, when I say
you should give control, it’s an easy thing to say, but
it’s a very complicated thing in practice. As an example, in
2015, Facebook decided to give users more control
over their news feed. So Facebook released mixer-style
controls where people could actually just
slide a few things, like a DJ slides stuff– you know, slide a few controls– and say, give me
more posts like this, give me fewer posts from
these kinds of people, give me more posts
from those kinds of people, or more photos, fewer
of this content, and so on. And so they released a
bunch of these features. And they rolled
it out among a set of users who were going to
test out these features. Now, after the data came
back in and they analyzed that to see what it
did to user behavior, they found that all these
users who got access to these controls,
for the most part, they were all very happy
to have these controls, and their user
satisfaction was very high. But when they looked
at engagement data, these users were
liking, commenting on, clicking on fewer posts. They were also spending
less time on Facebook. And so interestingly,
what was happening was, people were feeling
that they have control and they were satisfied. But the ways in which they
were moving these dials was creating a news feed that
was less engaging to people. They were spending
less time on Facebook. And eventually,
Facebook kind of phased out some of these features
and made some of them as power advanced features
that you only get if you go through multiple
pages of settings and so on. And it’s not very heavily used. And so that was what
happened back then. And in fact, there is ample
evidence in many other settings that, if you give users control,
the performance actually goes down. In fact, another example
was from Match.com, where they actually
asked people, OK, who do you want to date? And people would say, OK,
I want to date people who have the following
interests and who have maybe this religion,
who want to do that, who want to do this, and so on. And they said, OK, we’ve
got your preferences. We’ll try and match
based on that. What Match.com found was
that their match rates were lower when they listened to
users than if they just ignored the users and went with
the data and just saw whose profiles are
people lurking on, and then, based on
that, suggest matches. And so eventually they
decided, forget it. There’s no point in asking
people what they want. And so when you
give users control, it seems to hurt
performance a little bit. But I think there’s some
good news here in terms of how you balance the two. There was a study done by some
of my colleagues at Wharton. And they were trying to understand,
what is the impact of user control on trust in algorithms? And so what they did
was, they ran a study where they divided the
participants into four groups. All groups were given data
on high school students. And they were asked to predict
how well these students would do in standardized tests. And so they had data,
like socioeconomic data, whether these people had
taken advanced placement classes and other tests, and
how many of their friends were going to college–
that kind of data. And they were going to look
at the data and predict, what is this person’s
score going to be? They also had the option
of consulting an algorithm that had access to
the historical data and that would give
its own estimate. They divided these
users into four groups. The first group had no ability
to adjust the algorithm. They had to just decide
whether they were going to use the algorithm or not. And if they said they were going
to use the algorithm, whatever the algorithm’s estimate,
that was what got baked in. So they had no control. It’s completely autonomous. And they have to make a decision
whether they adopt it or not. The second and third
group had minimal control over the algorithm–
very little control. So one of those groups could
take the algorithm’s estimate score and change it by
plus or minus 10 points. Another group could take
the algorithm’s estimate and had a few
instances where they could overrule the algorithm and
say, in this particular case, I’m going to put in
my number and ignore what the algorithm says. And the last group
had complete freedom to do whatever they want. And they were just
given the score and they could do whatever
they want with it. Now, interestingly,
what they found was that trust was the
lowest in the group that had no control over the algorithm. They had to just decide
whether they adopt it or not. And they have no control
over the algorithm. And so the likelihood
that these people want to adopt the algorithm is low. Their trust in the
algorithm is low. Trust went up dramatically
if you give people a minimal amount of control. For example, the score
comes from the algorithm. The dial can really move
at the margins a little bit here and there, or
in a few rare cases you were allowed to
overrule the algorithm. And trust went up dramatically
in both of those cases. Now, what was even
more interesting was that, in later
tests, they found that the level of control they
offered users didn’t matter. If they gave users
a lot of control, the trust was the same as
giving them little control. And so it was
really a matter of, give people some
control, not so much that they’ll hurt
the performance of the algorithm or the system. Because what they did
observe is that when they let people do whatever they
want, performance went down. But when they said,
here’s an algorithm, but I’ll give you a
little bit of a dial to tune this a little
bit, trust went up, and performance stayed
pretty high as well. And so it kind of shows this
interesting trade-off where I think there’s a sweet spot. In this graph, in
blue, I have trust. Trust just goes up dramatically
when you give people a little bit of control. But giving them a lot of
control doesn’t really do that much in terms of trust. But performance is continuously
and consistently deteriorating. And so this little
bit of control helps address user
trust kinds of issues and doesn’t hurt performance. And in fact, arguably,
there are many settings where giving users a
little bit of control can actually help
performance, especially when you have the
ability to overrule when you think the algorithm
got something wrong. And one interesting
example of that might be if you look at the fake
news problem that Facebook had. In 2016, a lot of people were
observing fake news stories in their news feed. But there was simply
no feedback mechanism from the user to the algorithm. So there was no way of
letting the algorithm know that a certain story is fake. And of course, after
the issues, Facebook launched a bunch of features
where, with two clicks now, we can say that a post is
offensive, a post is fake news, a post is violent
content, and so on. And that is actually a very
interesting and important addition and helps
the algorithm. So for example, in the New
Zealand terror attacks, a lot of the focus
was on the fact that Facebook’s
algorithm couldn’t detect that a live video was
posted of the terror attack. But people ignored the
fact that a lot of users were able to flag the
content as offensive. And so many people flagged the
content as offensive or violent that it could correct itself. And the video was
actually caught within 12 minutes
of the video ending, and it was removed right then. And arguably, that’s late. But at least there’s
a feedback mechanism where it gets caught
sooner than later. And so I think the nature of
this user control might differ. It might be something like
Facebook giving people the ability to give
feedback to the algorithm and merely inform
the algorithm, I think this is a wrong
decision or I think this is a problematic post. It could be like with
AdChoices, where a user can say, don’t show me ads like
this again in the future. Or it could be an autonomous
vehicle, something like an override function
where the user can say, well, I know it’s
autonomous, but there’s a way for me to take over. And so I think that
notion of control, and whether it has been baked into the design, is a pretty important one. And often, I think there’s
a lot of product design focus on making
things autonomous where there’s no user control. The second piece I want to
mention is transparency. Now, transparency
is one area that gets a lot of when people talk
about machine bias and problems with machine
decisions and so on. But transparency is also a
pretty interesting and nuanced topic. Because especially when
you look at regulators, when they talk
about transparency, they talk about what is known
in the academic literature as technical transparency, which
is the idea that the source code has to be revealed. And there’s many instances where
that has been proposed as a way to address problems. So as an example,
in May of 2010, there was a flash crash
in the US stock market. So what happened was that the
stock prices of many stocks started dropping
quite dramatically. And within 34 minutes, $1
trillion worth of market value was wiped out. And in fact, many
well-established companies like Accenture were
trading for pennies. And the regulators stepped in. They shut down the market,
prevented all transactions from happening. They reversed many
of the transactions. And eventually, the
market recovered. But then this really spooked
the regulators in terms of, how is it possible that a
trillion dollars got wiped out in just 30 minutes? And they started
looking into it. And the analysis looked
at many possible reasons. And one of the big
culprits in the report ended up being high
frequency trading algorithms that saw one big sale
happen, and that assumed, without any causal understanding, that something weird was happening. And one algorithm started selling. And one algorithm sells; another algorithm observes that sale and starts selling. And they went into a frenzy and sold a lot of this.
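As a toy illustration of the feedback loop being described, with made-up agents and thresholds rather than anything from the regulators’ report, a selling cascade of this kind can be simulated in a few lines:

```python
# Toy cascade: each agent sells once it observes enough recent selling by
# others, so one large sale can snowball. All numbers are invented.
import random

random.seed(0)
price = 100.0
recent_sales = 1          # the one big sale that starts it off
agents = [{"threshold": random.randint(1, 10)} for _ in range(50)]

for step in range(10):
    sellers = sum(1 for a in agents if recent_sales >= a["threshold"])
    # Each round of selling pushes the price down and becomes the signal
    # that triggers the next round of sellers.
    price *= (1 - 0.01 * sellers)
    recent_sales = sellers
    print(f"step {step}: {sellers} algorithms selling, price {price:.2f}")
```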
And so the regulators, the CFTC, which is the Commodity Futures Trading Commission, got worried about this. And they approved a
decision which allowed them and the Department of
Justice to have access, without even a subpoena,
to the source code of any of these trading algorithms. And obviously, the industry
was upset and alarmed by this. Because you’re creating
these proprietary algorithms and you’re saying
those guys can just walk in and look at the
source code anytime. And so the industry protested. And eventually, CFTC backed off
and changed that regulation. Another example was
more recently in 2017. A council member in New
York City, James Vacca, proposed a bill to regulate
automated decisions made by the government. And specifically,
what bothered him was that there were a lot
of decisions being made by the city, like which
policeman is assigned to which precinct, which student is
assigned to which school, which nobody could explain.
person was moved there. It was just like, the
software said this, and that’s what is happening. So then he proposed the
legislation which said, OK, any software that’s
used by the city that’s making automated
decisions, the source code should be made available. And again, all the vendors
started protesting and said, hey, this cannot be done. And eventually, they
backed off and they passed another
regulation, or a bill, which is really a non-bill,
because all the bill said in the end was, we’re going to
set up a task force to figure out what’s to be done. But the point is that
they look at transparency, everyone talks
about transparency, and then they back off. Because technical transparency
has obvious problems. Companies create
proprietary software, and then you have to
make it available. It violates your
intellectual property rights. It also makes the algorithms
vulnerable to hacking or gaming. Because for example,
if Google were forced to reveal the
search ranking algorithm, then all these black hat
SEO firms and websites will obviously game it
and try to rank higher. So clearly, technical
transparency is problematic. And the question then
is, what’s to be done? And here again,
it’s a result that is similar to the control
result that, really, we don’t have to go that extreme. And in fact, you don’t
even have as much value from technical transparency
today as you did 10 years back. Because 10 years back, if
you look at the source code, you can see how the
algorithm will behave. Today, if you look at the source
code of some really advanced algorithms, it might
be 1,500 lines of code that doesn’t tell you anything
about why this algorithm behaves the way it does. So there’s no point even
asking for the source code. There was a study
done that looked at the impact of
transparency on trust. So this was done
by a PhD student at Stanford, Rene Kizilcec. And he looked at how
transparency impacts trust in algorithms. And the specific setting
was grading of MOOC courses. And so these courses
are online courses that have thousands of students. And they need to grade the
homeworks and submissions of the students. And you can’t hire a TA to
grade 50,000 or 20,000 homework submissions. So what a lot of them
do is peer grading. Each homework is graded by
three or four of the peers. And then they
average those grades and then compute
the average grade. But of course, that
makes the average grade vulnerable to just
random chance. It depends on which three
or four people your homework got assigned to. So what they did was,
they created an algorithm that would readjust people’s
grades based on the grading tendencies of the graders. So they would look and say, this person looks like they’re strict in their grading, so we’ll adjust their scores one way; that person looks like they’re lenient, so we’ll adjust theirs the other way; and so on.
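A minimal sketch of that adjustment idea, using hypothetical peer grades rather than the study’s actual model: estimate each grader’s leniency as how far they tend to sit above or below the average score for the submissions they graded, then subtract that bias before averaging.

```python
# Grader-leniency adjustment on hypothetical peer-grading data.
import pandas as pd

# Each submission scored by several graders.
grades = pd.DataFrame({
    "submission": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "grader":     ["g1", "g2", "g3", "g1", "g2", "g4", "g3", "g4", "g1"],
    "score":      [80,   92,   85,   70,   88,   75,   90,   78,   84],
})

# A grader's bias = how much they score above/below each submission's mean.
submission_mean = grades.groupby("submission")["score"].transform("mean")
grades["offset"] = grades["score"] - submission_mean
grader_bias = grades.groupby("grader")["offset"].mean()

# Adjusted score: remove each grader's bias, then average per submission.
grades["adjusted"] = grades["score"] - grades["grader"].map(grader_bias)
final_grades = grades.groupby("submission")["adjusted"].mean()
print(final_grades)
```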
And so they created an algorithm, and then they wanted to see if students would be accepting of this algorithm. So here, again, they created
three groups of students. One group of students just got
their final grade and nothing else. And as you would expect,
the trust in that system was somewhat low,
because all you got was a final grade without
being told how it was adjusted, why it was adjusted, and so on. A second group got a
one-paragraph description that was just quite
simply, it was adjusted, and there’s a statistical model
that looks at these factors and that adjusts in the
following manner– just a very high level explanation. Trust went up a lot once they
provided these explanations. I haven’t shown the numbers
on the confidence intervals. But please assume that
everything I’m saying is statistically significant. Next, they looked
at a system that gave even more information. It not only gave the
one-paragraph description, it gave the raw scores that
every individual grader assigned. It gave the adjustment formula. And it gave the details
of how it was modified. And then they observed
the trust in that group. And interestingly,
trust actually went down to levels that
were comparable to having no information. In fact, it was slightly lower,
but that slightly lower was not statistically significant. So what was interesting
is that giving people a lot of information really
wasn’t helping with trust. It was just very high level. And that’s all you need
with users in order to win their trust. And so as far as
users are concerned, we aren’t talking
about very high levels of model transparency
or explanations regarding the decisions. But it’s much more high level
to get them those decisions. By the way, I forgot to
mention that everything I said about trust is for
people whose expectations were violated. So these are people who got
less than what they expected. People who got the
same or more, for them, it didn’t matter how much
transparency they got, which is what you would expect. So what this leads us to
believe is, OK, you don’t really need technical transparency. You just need some
carefully calibrated level of transparency. And what does that mean? I think, for me, it means
first informing the user that an algorithm made a
decision– so letting the user know that there is an
algorithm involved. In fact, when Google did
the Google Duplex demo, there was some
pushback saying, hey, the other person doesn’t
know that there’s a chatbot they’re talking to. And then Google said,
OK, well that’s fair, we’ll make sure
people are aware. But there’s a lot
of settings we don’t know where an
algorithm is making– let’s say– a lone decision
and things like that. So is an algorithm involved? What kinds of data does the
algorithm have access to? So for example, if you apply for
a job, just simple information beyond what you
listed in your job application or your
application form, we also looked at other stuff. What else did we look at? We looked at your
social media data– and so information
on what data are being used, what
variables are considered, how important are these
variables, and how much do they matter. And of course, that’s harder
with things like deep learning algorithms, or random
forests, or these algorithms where the programmer
can’t really say exactly what are the
weights for all the variables. But it is possible with
the recent research and focus on interpretable
machine learning. And there’s a lot of emphasis on
two ideas of interpretability. One is what’s called global
interpretability, which is the ability to say
at an aggregate level, at a population level,
this deep learning model tends to focus most on education
when it makes its loan approval decision. The second most important
variable, let’s say, is the income. The third most
important variable is, let’s say reference, and so on. But it kind of clarifies those
weights at an aggregate level. The other is local interpretability, which is the ability to say, for this particular decision, these were the five most important factors, and these were their relative weights.
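As a rough sketch of both ideas, assuming a hypothetical loan-approval model and feature names, global importance can come from something like scikit-learn’s permutation importance, and a simple perturbation can stand in for more sophisticated local methods such as SHAP or LIME:

```python
# Global vs. local interpretability on a hypothetical loan-approval model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
feature_names = ["education", "income", "reference", "zip_code"]  # hypothetical
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Global interpretability: which features matter most across the population.
global_imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in zip(feature_names, global_imp.importances_mean):
    print(f"global importance of {name}: {score:.3f}")

# Local interpretability: which features mattered most for one decision,
# here via a crude perturbation (replace one feature with its average).
applicant = X[0:1]
baseline = model.predict_proba(applicant)[0, 1]
for j, name in enumerate(feature_names):
    perturbed = applicant.copy()
    perturbed[0, j] = X[:, j].mean()          # neutralize this one feature
    delta = baseline - model.predict_proba(perturbed)[0, 1]
    print(f"local contribution of {name}: {delta:+.3f}")
```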
And I think that’s very achievable with today’s techniques. And that level of transparency is sufficient for users. Because, as I mentioned, very
high levels of transparency actually hurts more
than helps with users. So lastly, what I just said is
as far as users are concerned. But you could ask about
what happens when experts are evaluating the system. They want more than just
these three or four questions that I just raised, which
is, what variables are used and what are the weights. They want a little
more information. And that brings me
to the third pillar, which is the idea of audits. I think companies should have
an audit process in place for machine
decisions, especially in high stakes settings. So identify which models are
making high stakes decisions and then have an audit
process in place. And that audit process
will look at many things. So for example, with
regard to input, that is the data being
used to train the models, it will look at data quality
and rate the data quality. When it’s making
decisions, it would say this is based on
social media data, so the quality is likely lower. The emphasis on
that data will be lower than this other data,
which is rated higher. It would test the
data for biases. The model itself,
the audit would look at alternative models. It would do some basic
statistical tests, like, of course, model
validation for overfitting, but also test for causation
versus correlation. It would look for
things like stress testing the model
against simulated data. So create other data
and then evaluate how the model does
on data that looks really different from
the training data sets. Audit the model and the outputs. For example, can you have
explanations for the decisions at a global level,
at the local level? Can you look at where the outliers are in the inputs, and what kinds of inputs cause outliers in the outputs? All of that goes into auditing a model.
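Here is a minimal sketch of one slice of that audit process: stress-testing a model on simulated data drawn from a different distribution than the training data and comparing error rates across a simulated subgroup. The model, features, and numbers are all hypothetical.

```python
# Stress-testing a trained model on shifted, simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

# Training data drawn from one distribution...
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
y_train = (X_train[:, 0] - X_train[:, 2] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# ...stress data drawn from a shifted distribution the model never saw.
X_stress = rng.normal(loc=1.5, scale=2.0, size=(1000, 3))
y_stress = (X_stress[:, 0] - X_stress[:, 2] > 0).astype(int)

print("accuracy on the training data:",
      accuracy_score(y_train, model.predict(X_train)))
print("accuracy under distribution shift:",
      accuracy_score(y_stress, model.predict(X_stress)))

# Simple group-level check: compare error rates for two simulated subgroups.
group = rng.integers(0, 2, size=len(X_stress))   # stand-in sensitive attribute
for g in (0, 1):
    mask = group == g
    err = 1 - accuracy_score(y_stress[mask], model.predict(X_stress[mask]))
    print(f"error rate for simulated group {g}: {err:.3f}")
```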
And as part of this book research, but also since the book came out, through a lot of book talks, I’ve talked to various companies about auditing their ML models. Almost nobody does it. And I think, when
the stakes are high, it’s kind of really
surprising it’s not done. And in fact, I kind of find
it even more surprising that every company has
a QA process for all their regular engineering,
but almost no company has a QA process
for data science. And you look at the developer. You have an independent
test engineer who’s testing the system. But in most settings,
it’s the data scientist who develops the model, who
evaluates the model, stress tests the model, and then
works with engineers to deploy the model. So there isn’t an
independent QA process in most companies
for data science. So I kind feel like
every company should have three lines of defense. So the first is, of course,
what every company already does, which is the model
developer that tests the model and does all the things like
cross-validation and model explanations. But then you have an independent
QA person who is actually stress testing the model. And I bring this
up because I think testing ML models is a bit
like information security. You almost need a hacker’s
mentality to kind of see how to break this model,
all of the different ways in which to break it, whether
it is break it in simple ways, like bias and correlations,
or in security kind of ways, meaning
adversarial attacks and so on. So the second line
of defense, I feel, is a data science QA process,
where somebody independent actually evaluates the model. And the third is where
you have something like an audit for
not every model, but for the few models that
are making really high stakes decisions, where you
have an external auditor or somebody with
the role of auditor in the company who actually
audits the algorithm. So these are some things
that I wanted to share. Of course, I want
to get your view. So I’ll stop right here and
take any questions and comments. [APPLAUSE] AUDIENCE: Hi. Thank you so much for your talk. I don’t work with models. I’m not going to ask you
about that aspect of it. But with regard to
transparency and the first two things you’re talking about,
I’m curious about visibility. How does that play into it? Maybe you show
someone information, but they might have to,
like you said with Facebook, click through a bunch of
forms to get these controls. Maybe we show people
information about why we’re showing them certain content. But you have to click through
some things to get there. It’s not explicit right there
on the search results page, for instance. So how would you consider
that kind of a challenge? KARTIK HOSANAGAR:
Yeah, so I think, when we talk about
transparency, there’s the data, there’s the model,
then there’s inference, which is the things like
weights and what’s important, and then there’s the interface. And I think all four are equally
important to a conversation on transparency. And so one of the things
I didn’t get into today but mention in the
book is transparency along each dimension,
including the interface. And yeah, I think there’s
a lot of companies that will do stuff for
compliance reasons, but it’s all hidden somewhere. And of course, both
transparency and control have very limited impact
if it’s hidden behind. And so I think transparency
in the interface is extremely important. And I think, from
a UI standpoint, that’s a great question. I don’t know the
answer, which is, how do you build those things
to make it easy for people? Even when I said that–
when you give users control, a lot of research shows
performance goes down. So how do you design
these controls for users so they actually are
able to do useful things and they don’t hurt themselves? AUDIENCE: [INAUDIBLE],,
should that be something that’s regulated
externally, basically? Do you think [INAUDIBLE]? How much [INAUDIBLE] doing there
versus someone [INAUDIBLE]?? KARTIK HOSANAGAR: Yeah. I worry about user
interfaces being regulated. But certainly, you could
make a compelling case that the regulation can
be very high level, such as, the following sets of things
should be conveyed to the user. But lots of regulation,
especially when it comes to privacy, does talk
about those kinds of things. AUDIENCE: And there is similar
regulations for accessibility, for instance. KARTIK HOSANAGAR:
Exactly, exactly. Yeah, so I think
there’s a good case to be made for regulating
the interface itself. But I do worry that,
unless companies actually genuinely want to
do this, there’s always a way to comply
while still making things less accessible to the users. But I do agree. I think the interface is a
really important component of this. And I don’t know if anyone
has great ideas from my user experience design
to address that. I don’t know. Even in terms of privacy, I
haven’t seen very compelling ways to do that. AUDIENCE: Really
interesting talk. Thanks for coming. My main question is
on the framework. It sounds like the idea of
the first two elements– so transparency and control– don’t necessarily
impact the outcome. They impact your
perception of the outcome. So thinking back to
the recidivism example, if you just had
those two things, you would essentially just
have more trust in the outcome, but it would still lead
you to the bad outcome. So I guess, is your hope
that there will essentially be a grassroots change,
or that will somehow raise flags that will then
affect the actual model to change the outcome? I guess I’m curious how the two
will impact the model itself. KARTIK HOSANAGAR: Yeah, I
think that’s a great question. I think, especially
when I showed you the studies on trust,
it’s pretty easy to see how it affects trust. But it’s not obvious how it
helps you detect or prevent those problems. I’ll answer your
question in a moment. But I think, first, trust is
also an important piece here, simply because, when
you see these failures, you tend to see a backlash
against these systems. And the result of
that is, people don’t use the algorithms,
to their own harm. Because actually, the
algorithm, on average, is doing much better than
them and is probably, on average, less biased
than humans, for example. But coming back to,
do they actually help in preventing or
detecting, I do believe so. So I mentioned Facebook
and the terror attack. The detection happened
because people flagged it and because they had the control
and the ability to report back. So I do believe
that there are lots of examples where
users see the problem and they have no
way to report it. Or they don’t see the
problem because they don’t have transparency. And so for example, GDPR,
which is the privacy regulation in the EU, has one
clause in there, which is right to explanation
in model decisions. And if you automate
the explanation and you give the
explanation to a user and he or she sees that one
of the factors that resulted in their loan application
being rejected is their address
or their zip code, then that’s a way
for them to perhaps recognize that
something is wrong and then to appeal
and then to fix it. So I think transparency
coupled with control is also a way for
users to be part of the process of detecting. But I don’t think that
alone is sufficient, which is why I said it’s in
increasing order of importance, and I put audit last. And to me, the most
important piece here, from a detection and prevention
standpoint, is the audit. And the first two are important
primarily from trust, but also, secondarily, from
the perspective of having users be participants
in the process of detection and prevention. AUDIENCE: Question over here. Yeah, thank you for the talk. I have a question about testing
the algorithm adjustment. So the results where you had
higher trust with some amount of transparency but a dropping
trust with higher transparency, that reminded me of
the study, I think, that was done in a library
with a copy machine. And the researchers gave
excuses as to why they needed to budge to get in line. And if they had
any excuse at all, people would
happily let them in. But if they just tried to budge,
they wouldn’t let them in. So how much of this is
actually the real reasons, and how much is just, people are
happy to have some explanation? And then the dropping trust
with more information, is that either, A, they’re
horrified because they don’t understand what’s
going on, or, B, they can actually see problems? KARTIK HOSANAGAR: Yeah, yeah. And I think the
researchers of that study did connect it to
anthropomorphic reasons. And I think the simple
psychology of it is quite simply what
you articulated, which is that, if you
don’t get an explanation, but you’re happy
with the outcome, you don’t care for
the explanation. But if you’re not
happy with the outcome and it comes with
no explanation, you’re obviously upset by it. But at the same
time, if you don’t understand the explanation,
it doesn’t help as much. And so when they had all
those detailed explanations, it was almost as good as not
providing any explanation, because people
couldn’t understand it. But if you actually go to their
middle level of transparency and you read the explanation,
it is almost a non-explanation, in the sense that it’s
an English explanation. It’s not a mathematical
explanation, meaning it’s saying there’s
a statistical model that looks at the grading
tendencies of TAs and adjusts it upwards or
downwards based on how lenient or how strict they are. It doesn’t say exactly
what is the weight or how exactly is it adjusted,
but that’s enough for people. Because intuitively, it
makes sense to people. But the moment you say,
here’s the formula, then it comes down to, my
formula would be different. It wouldn’t be this formula. And so I think
that kind of thing is playing out there, which
is why I think for the users, again, high level explanations
are kind of sufficient. But you could take
these ideas, as you did, to settings which have
nothing to do with algorithms. And I think it’s just more
psychology than anything else. AUDIENCE: Actions
of software systems are often blamed
on the algorithm, but it’s pretty clear
that algorithms do what we build-slash-train them to do. How should we address this
sort of bureaucratic avoidance of responsibility? KARTIK HOSANAGAR: Yeah, I
think this responsibility issue is a pretty interesting one. And just adding to what the
question kind of brought up, just last week, I was just
listening to a forum– [INAUDIBLE] And there was a guest
who said, the reason there are these biases
in the algorithms is that these tech
companies hire programmers and they don’t have
diversity in the programmers. And these programmers are
programming these biases in without realizing it. And I found that
statement to be shocking. And this notion that people
are programming the bias in or the programmer is
actually doing something here is kind of weird. But it comes from the data. And so sometimes there are
people blaming the algorithm and saying there’s algo bias. And sometimes there’s people
blaming the programmer and saying it’s as though
it’s being programmed in. And I think that’s
a huge problem. And even when it
comes to regulators, they don’t understand
these issues. And when I said,
for example, they revealed the source code
and those kinds of things, it clearly shows
that they’re not understanding what’s driving
it, where it’s coming from, and so on. So clearly, education
is a big part of this. And then there’s all this fear
mongering in the media as well, these stories that make
these problems seem to be far worse than they are. I think almost every
study will show that these algorithms on average
are less biased than humans and there are fewer problems
than human decision making. Every study that’s focused
on human decision making shows all the ways in which
human decision making is prone to all kinds of biases. I think the only sense in
which we should focus and worry about these biases
for me are two things. The first is that machine
decisions scale in a way human decisions don’t. So a bad or biased judge
can affect 500,000 lives, but an algorithm used in
courtrooms all over the country can affect millions of lives. So that scale is the
reason for us to worry. And the other thing is that
it’s easier to fix machine bias than to fix human bias. So we may as well tackle it,
because it’s easier to fix. But I think there’s
education issues. And this notion of
what’s causing it and then all the fear
mongering, all that is, I think, a huge issue, which I fear
could cause overregulation. And I worry about that. And that’s why, in my book
tour, I go to a company and I keep saying,
look, all companies should do something proactively
before regulators come at it. Because you don’t have to wait
for something to go wrong. You do something proactively
and regulators stay out of it. But the moment
something goes wrong, they come in and slap
crazy regulations without understanding it. And that’s a huge problem. AUDIENCE: Well, we want
to thank you so much again for being here. We really appreciate it. We’re out of time now. But how about another
round of applause? KARTIK HOSANAGAR: All right. OK. [APPLAUSE]
