Checklists and Principles and Values, Oh My! Practices for Co-Designing Ethical Technologies with Michael Madaio


[Photo: Michael Madaio]

What are the limitations of using checklists for fairness? What are the alternatives? How do we effectively design ethical AI systems around our collective values?

To answer these questions we welcome Dr. Michael Madaio to the show. Michael is a postdoctoral researcher at Microsoft Research working with the FATE (Fairness, Accountability, Transparency, and Ethics in AI) research group. He works at the intersection of human-computer interaction and AI/ML, where he uses human-centered methods to understand how we might co-design more equitable data-driven technologies with stakeholders. Michael received his PhD in Human-Computer Interaction from Carnegie Mellon University, where he was a PIER fellow funded by the Institute for Education Sciences and a Siebel Scholar. Michael, along with other collaborators at Microsoft FATE, authored the paper: “Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI”, which is one of the major focuses of this interview.

Follow Michael Madaio on Twitter @mmadaio

If you enjoy this episode please make sure to subscribe, submit a rating and review, and connect with us on twitter at @radicalaipod.



Transcript


Welcome to Radical A.I., a podcast about radical ideas, radical people and radical stories at the intersection of ethics and artificial intelligence.

We are your hosts, Dylan and Jess. In this episode, we interview Michael Madaio, a postdoc with Microsoft Research working with FATE, which stands for the Fairness, Accountability, Transparency, and Ethics in AI research group. Michael works at the intersection of human-computer interaction, AI/machine learning, and public interest technology, where he uses human-centered methods to understand how we might equitably co-design data-driven technologies in the public interest with impacted stakeholders. Michael, with other collaborators, authored the paper "Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI," which is one of the major focuses of this interview, specifically related to that paper and beyond.

In this interview, we cover some of the following topics, including: What does it mean to co-design technologies? What is a checklist when it comes to the field of AI ethics, and why is it so important? What's the problem with AI ethics principles released from large tech companies? Is there a problem? And how do we fix it? What are the limitations of using checklists for fairness, and what are the alternatives?

This interview is actually particularly special to Dylan and I because it has been in process for months now, and we actually started planning this interview before we even launched the podcast back in April. Jenn Wortman Vaughan, who we recently had on the show, tweeted in January about how this paper was accepted and won one of the best paper awards. So after we read the paper, we just had to reach out to all of the different co-authors on the paper, including Hanna Wallach and Luke Stark and, of course, Michael Madaio, who this interview is with. It took quite a long time for all of us to finally get together just due to COVID craziness, but we are so excited that we finally got the scheduling figured out and were able to sit down and chat with Michael about what the heck checklists really are. So without further ado, we are so excited to share this interview with Michael Madaio with all of you.

We're on the line today with Michael Madaio. Michael, how are you doing today? Great, great. Thanks for having me. Absolutely. It's great to have you here. I was wondering if we could just get started with a little bit of who you are and your story and your journey to the work that you're doing right now.

Absolutely. So I'm a postdoctoral researcher at Microsoft Research in our research group on fairness, accountability, transparency, and ethics in AI. My background, by training, is that I have a PhD in human-computer interaction.

But originally I was a teacher, a public school teacher, for a number of years, and I kind of made my way obliquely into research, thinking about questions of access to technology for learning and teaching, then thinking a little bit about how we design those in human-centered ways involving students and teachers and parents, and then sort of widening the scope out, thinking a bit more broadly around questions of equity and ethics and fairness, which led me here to Microsoft Research.

And I saw before that you were a high school teacher in your previous career. So did that have anything to do with your interest in influencing people's access to data literacy and then also people's access to technology in general?

Yeah, yeah, absolutely. So as a public school teacher at the high school level, and this was around 2012, I was realizing that children were coming to school with more and more technology and had access to it at home, and so I wanted to think about how we could actually use these tools to allow students to continue learning on their own and pursue learning in different ways, and, as a teacher, how to use them to improve my teaching as well. That led me into computer science and HCI research. But one of the questions for me, as I went back to grad school into a master's program and then to a PhD, was that I kept running into questions like, well, how do we design these education technology systems in a way that gives learners some agency and some control over their learning and over the kinds of technologies that are used in their learning process? And are there some technologies that may be harmful, or may not be beneficial, or that students might not want, even if they improve their learning outcomes on a test, if the way that they go about doing that is undesirable or inequitable, or maybe allows some students to perform better than others? What are these sort of larger unintended consequences?

So, you know, we ask people their stories every time on the show, every episode, and the one thing that we've learned is that there's no pattern, right? Everyone comes on and says, I have a really unique story about how I got here. And that's the case for you as well, coming in from a second career. I myself also came in from a second career that someone might think is totally disconnected from this ethics or responsible tech space, but actually there are a lot of skills that are transferable. And so I'm wondering, before we really do a deep dive into your research, if you could talk a little bit, for those folks out there who maybe are teachers right now in the public school system and are looking to get into the responsible tech or ethics field, about either what that journey was like, or what barriers, but also what gifts, you think that first career has brought into this work for you?

Absolutely. Absolutely.

I think for me, it was a lot of reading and following some academics and looking at who was doing the work that was inspiring to me, and then thinking about what I would be excited about doing in the short term. Initially I actually went into grad school thinking, OK, I'll come out and work at a tech company or as a learning engineer at a school system, and then realizing that this process of taking courses on technology design and education technology was opening up all of these other lines of questions. Here I was, a teacher coming in having read Paulo Freire and this idea of learners being drivers of the learning process, right, active participants, that they're not just these empty receptacles or banks to be filled with knowledge, and then sitting in on some of the courses where the rhetoric around learning was very much this idea that there is content the learners need to know and this technology will sort of disseminate it to the learners. So there was this tension, and I think also this very techno-deterministic view of learning, that if we have a problem, we can solve it with the technology that we have at hand, right, the sort of hammer and nail idea: if you have a hammer, everything looks like a nail; if you have an algorithm, everything looks like a problem that can be solved with an algorithm. And quite frankly, that's often not the case. And you were asking about what was helpful: I think awareness of the larger social system and the political and historical legacies, issues, and factors in education specifically was so, so helpful and gave me such a helpful perspective in conversations around tech and technology design.

So one of the reasons why we wanted to talk to you specifically is because of this amazing paper that you and your colleagues in the FATE group published, entitled "Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI," which is a lot of really great words, but it's also a lot of words.

And part of this conversation is just breaking that down. But before we dive into the paper itself, I was wondering if you could talk about checklists, because checklists are this word that Jess and I have heard from a lot of folks, about whether they're good or bad or whether there are too many of them. And sometimes I feel like those of us who are in the echo chamber forget to even frame what the heck a checklist is in the first place and why it exists, why it's the framework, et cetera. Could you just give us a checklist 101 in terms of ethics and fairness? What do we mean when we say checklist?

Yeah, absolutely. So I want to answer your question in kind of a roundabout way and maybe give some perspective on what motivated some of this research. About five years ago now, in between a master's program and a PhD program, I was doing a data science for social good fellowship out of Georgia Tech in the city of Atlanta, where we were working with the Atlanta Fire Rescue Department to help them use some of their data to improve access to fire inspections and other types of services.

And as part of that, and after that fellowship, when I was doing my PhD at CMU, I and others I was working with were thinking about, well, how do these types of civic data reduce or contribute to inequities in access to these kinds of services? Which communities are prioritized in getting access to fire safety inspection services, and which ones have historically been overlooked for various reasons? How does that show up in the data? And this is something that the partners we worked with at the fire rescue departments in both Atlanta and Pittsburgh raised as issues: certain communities, wealthier communities, may have more robust data available, and thus the models might be more accurate because there's more data available, and so the services might be better prioritized or the models might be better able to pick up on various risk factors for building fire risk and things like that. So those were a couple of different projects, a couple of different questions. But the reason why I bring it up is because, as we were going through this process, and as the data analysts from the city governments and the fire departments were working with our HCI and machine learning and data science teams, we kept running into these questions of, like, well, how do we think about how historical inequities become perpetuated through predictive models, through the kinds of data that we have access to, right?

So these are the kinds of questions that have been percolating over the last five years. And we kept running into this: well, there are these principles, and certainly now in the last two or three years there are many, many principles and value statements for what makes ethical or equitable or fair AI. But as others have pointed out, there's this disconnect, that these principles are often at such a high level of abstraction that in the day-to-day practice of a data scientist or a machine learning engineer, they're often difficult to operationalize, to put into practice. A friend and colleague of mine, Ken Holstein, who did an internship at Microsoft Research the summer before I was there, in 2018, he and Hanna Wallach and Jenn Wortman Vaughan and others at Microsoft Research did some work with industry practitioners where they realized that many teams often only realized they had issues with their products being inequitable or unfair after the products were deployed in the world.

And that's even with having principle statements and value statements for what fair systems might mean, because in the day-to-day, the nitty-gritty of choosing which data sets to use and how to process those data sets, these are micro design decisions that data scientists and engineers are making at each step of the process that might introduce or compound or simply overlook issues of equity.

And so to kind of circle back to that example with the fire rescue department: the Atlanta Fire Department we were working with had access to a commercial real estate data set which had information, for instance, about sprinklers in various properties, and that contributed to making a predictive model of building fire risk more accurate. And this seems great, except that that data set was only available for a subset of properties. And if you had to guess, you might guess which communities have access to this kind of data. Often, historically, it's wealthier communities, and there are, again, historical inequities in which marginalized or impoverished communities have lacked access to public services and thus have less data available to train models on.

So this is just one example of a place in the process where decisions might either overlook historical equity issues or not.
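To make that data coverage point concrete, here is a minimal, hypothetical sketch (not from the episode or the paper) of checking how often a feature like sprinkler data is missing per community, and how a fire-risk model's accuracy can differ across communities as a result. All column names (community, has_sprinklers, building_age, had_fire) are assumptions for illustration.

```python
# Illustrative only: a feature recorded mostly for well-documented communities
# can leave a model more accurate for those communities than for others.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def report_coverage_and_accuracy(df: pd.DataFrame) -> None:
    # 1. How often is the commercial real-estate feature missing, per community?
    print(df.groupby("community")["has_sprinklers"].apply(lambda s: s.isna().mean()))

    # 2. Train a simple fire-risk model and compare accuracy per community.
    features = df[["building_age", "has_sprinklers"]].fillna(-1)
    X_train, X_test, y_train, y_test, _, grp_test = train_test_split(
        features, df["had_fire"], df["community"], test_size=0.3, random_state=0
    )
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    preds = pd.Series(model.predict(X_test), index=X_test.index)
    for group in grp_test.unique():
        idx = grp_test[grp_test == group].index
        print(group, accuracy_score(y_test.loc[idx], preds.loc[idx]))
```

The point of a sketch like this is simply that aggregate accuracy can hide a gap between communities with rich records and communities without them.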

So that's maybe a long-winded way of circling around to your question of why checklists. You have principle statements, but during the process it's difficult to identify the decision points and the choices that need to be made.

So we looked to other fields where these types of complex decisions get made in high-stakes settings, like medicine, aviation, and structural engineering, where they use checklists for a variety of purposes. These are also used in software engineering, but for perhaps different purposes.

So let me zoom out a little bit. There are different types of checklists. Some serve as a memory aid to remind people to do certain things, like is the landing gear down, are your flaps raised, these sorts of things in an aviation context, or in surgical safety.

The World Health Organization has put out checklists, for instance, for an anesthesiologist and a surgeon prior to inducing anesthesia: anticipating potential risks or harms to that particular patient, confirming, is this the right patient and the right procedure for this patient, these types of things.

But those assume a set of known risks and known steps in the process.

And the checklist in those cases serves that memory role I mentioned, reminding the practitioners to complete some set of tasks. But actually, I think for fairness and ethics in AI, it's not quite as simple or straightforward. There is a risk of treating fairness and ethics and equity as this fully mapped and known space that we can reduce to a set of binary yes/no questions meant to remind people of things they should probably already know, when actually it's significantly more complex than that. Each sociocultural context might have unique factors. Different groups of stakeholders might actually fundamentally disagree on what the intended outcome is. Everybody likes to throw around "AI for good," but what does good mean, good for whom, and who's defining that? So as we were discussing and doing literature reviews on the ways that checklists are used in other fields, we recognized that there are some fields that are highly complex and highly situated and locally determined. For instance, checklists in structural engineering are often used to structure communication between different teams or different members of the team involved.

And that might be because soil conditions are different than expected when the blueprint was drawn up, or maybe the foundation isn't quite sitting at the level they expected when they drew up the plans.

So a checklist in that case might connect the engineers who are developing the foundation with the people doing a survey of the soil composition, and things like that.

So the idea was that for AI design specifically, checklists might serve a similar role: to coordinate communication, to make sure that certain conversations take place between team members, across teams, and between teams and the stakeholders who have an interest in the particular problem or are impacted by the systems.

What I'm wondering, then, is if we're talking about the domain specificity of different checklists, and then we talk about AI as one of those domains, I see so many different subdomains of AI. So how do you make a checklist for, quote, AI fairness that would work for, like, medical procedures, loan approval or denial, recommendation, fire department resource allocation? Is it one checklist? Is it 10 checklists? How does that work?

Yeah, absolutely. So let me walk you through a little bit of what we did, starting with that as the motivation for the research. This was work we did last summer, in 2019, and we worked with about 48 practitioners from a variety of different organizations, tech companies, et cetera, in a variety of roles. Some of these participants were data scientists and engineers, but others were designers or researchers or even product managers, program managers, consultants. We also talked with some people who do annotation work, the sort of content editing or labeling work that goes into some of the data. In part, we wanted to investigate: what are their existing processes for thinking about issues of fairness in their products and their systems, what role do they envision checklists might play, but also what concerns would they have about checklists in their work? And then we conducted co-design sessions, and of course co-design involves Post-it notes. So we broke an AI lifecycle down onto a wall and had posts where people could add suggestions for what should go onto the checklist and at which stage or phase of the lifecycle, and then comment on other people's suggestions and indicate support or suggest items for deletion.

And so what we ended up with at the end of the summer of research, and what we ended up publishing (we did publish the actual final checklist that resulted from these design sessions as a supplement to the paper), is something that could be adapted or customized by different teams. And I think, Jess, you're right on that: we as a group of researchers would not presume to develop a checklist that would be appropriate for every single domain, for every single use case or application area. We had participants from computer vision, from natural language and speech systems, from search and information retrieval systems, and from other sorts of classification models, different types of models and use cases, in part to surface which types of prompts or questions in a checklist might not work at all for their domain. But at the end of the day, this will need to be customized by different teams.

Different application areas will need to develop some type of guided process, or at least the version that we put together would need to be customized by different teams. And this is equally true for the medical domains and for aviation as well. There's not just one medical checklist; there are some for surgical safety, some for procedures with anesthesia, some for diabetes and hypertension, for various procedures. And the same for aviation: for different steps in the process there are routine checklists, there are emergency checklists, and even different airlines have their own. That's something we saw when we were doing the literature review: when airlines or different companies merge, one critical challenge is how they merge their aviation safety checklists, and which items are appropriate for different models of aircraft.

Right, or for the unique cultural and social dynamics of that particular airline, of how pilots and copilots work together at that particular airline versus another.

So these social and cultural factors were certainly things on our mind, and certainly things we were trying to uncover through the research we were doing, too.

You have me sold on checklists, but there are a lot of people who are not necessarily sold on checklists, and I imagine that part of the background and one of the motivations for this paper is that the very idea of a checklist still has energy around it, right? So I'm wondering, what is the research that this paper was in conversation with, and for people who take issue with even the concept of a checklist for ethics specifically, what is that side of the argument? Yeah, absolutely.

In part, this was responding to several organizations, some companies, some government agencies, putting out checklists of their own for data science ethics, for AI ethics, and recognizing that, as well-intentioned as they were, they often treated ethics and fairness as this binary, yes/no sort of thinking, but also treated it as an individual problem and an individual's responsibility. That is, an individual data scientist might look up this particular company's or government agency's checklist and go through it on their own and address these issues, framed as an individual matter. But there's other research: Colin Gray is one researcher who has done quite a bit of work on user experience practitioners and how they think about and grapple with ethical tensions and ethical responsibilities in organizational contexts. And actually, this is one thing that we found in our research also: many of the people we spoke with recognized that there were risks to fairness, that there were potential biases and societal issues that were amplified or reproduced in their systems. I mean, this is part of the zeitgeist now.

But many of them articulated that right now, in their organizations, the burden is on the individual to speak up, to say, hey, I think there's an issue here, I think our system might not perform as well for people of color or for women, et cetera. And with that individual responsibility to speak up, there were often social costs and consequences. Either it was the same member of the team who felt like they were always the one to bring up ethical questions or questions of fairness, often marginalized members of the teams who felt that they were tokenized by their organizations, and our participants felt that those social dynamics often carried career costs if they were always the person who wanted to slow down: we can't ship this product yet because we have to evaluate its disparate impact, its different performance for different groups. Well, that's going to cost their company money if they're not shipping that product, or if they have to collect new data and target data collection from different subpopulations. So our participants felt that having some kind of organizational framework, some organizational infrastructure, some kind of guided process, and that is what we were trying to target with the checklist, would give members of teams a framework to latch on to and say, well, wait: we're in this meeting here, our design spec meeting, have we asked these questions? Have we considered how our system might perform worse for different groups of people, or how it might have disparate harm or benefit for different groups? Have we met with stakeholders from those populations, from those communities? Have we solicited their input, or have we asked them whether they even want the system to exist at all? Having some set of questions, some set of checks, structures those conversations at these different moments, what in the medical checklist research has been called pause points.

So before inducing anesthesia, you pause and you confirm these things; before your incision, you confirm these things. And so, through this research, we identified with practitioners what some of these points might be, like when they're first putting their system design specs together and their plans for data collection, and before the launch or before they ship the product, and then what sort of ongoing review there might be. All of that to say that, for us and for the practitioners, having some framework would help catch issues that might not otherwise be caught in the process. But there is, I think, this concern, for our participants, for us, for others in the field, that a checklist, either by its name or the way it's designed, might reinforce this binary thinking, might reinforce a sort of procedural thinking: OK, we know what the risks to the ethics of AI are, let's just go down the list and check them off, and if we have all the boxes checked, then we're good, when that's almost certainly not the case.
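As an illustration only, and not the checklist published with the paper, here is a small Python sketch of the pause-point idea: checklist prompts grouped by lifecycle stage, where an item counts as addressed only when the team has documented a conversation about it, rather than ticked a box. The stage names and prompt text are placeholders.

```python
# Hypothetical sketch of checklist items as discussion prompts at lifecycle
# "pause points," not as binary boxes to tick.
from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    prompt: str
    discussion_notes: str = ""                      # record of the team conversation
    stakeholders_consulted: list = field(default_factory=list)

    def is_addressed(self) -> bool:
        # "Addressed" means the conversation happened and was documented.
        return bool(self.discussion_notes.strip())

# Placeholder stages and prompts, loosely echoing themes from the interview.
PAUSE_POINTS = {
    "design_spec": [
        ChecklistItem("Which groups could be harmed or underserved by this system?"),
        ChecklistItem("Have impacted stakeholders been asked whether they want this system at all?"),
    ],
    "pre_launch": [
        ChecklistItem("Have we evaluated performance disaggregated by relevant groups?"),
    ],
}

def ready_to_proceed(stage: str) -> bool:
    """Report which prompts still need a documented team discussion at this stage."""
    items = PAUSE_POINTS.get(stage, [])
    unaddressed = [item.prompt for item in items if not item.is_addressed()]
    for prompt in unaddressed:
        print(f"[{stage}] needs discussion: {prompt}")
    return not unaddressed
```

A team might call something like ready_to_proceed("pre_launch") in a release review; the design choice is that the check prompts and records a conversation rather than automating a verdict.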

Yeah, thank you for specifying that, because my next question for you was going to be, well, does this mean people just get to check off a box and say, all right, all these boxes were checked, so check on fairness? But it's clear that that's not exactly the case. And what I'm wondering, actually, is it seems like it would be really ideal to incorporate this within the machine learning development lifecycle, as you were saying before, like, you know, before doctors put their patients under anesthesia.

They have this set of checklists that they need to go through. And if the same were true with machine learning development, that would be really amazing: now we have this set of things that can be checked, that are really great to include in the machine learning lifecycle to help with these concerns for fairness. So how do we get those enacted in the development teams that are creating these models and who are making and building this AI? How do we go, basically, from theory and this wonderful paper that has been written to practice, and actually enact all of the work that was done and make it happen?

Yeah, I think that's a great question. I think that is a crucial question, and I think it's an open one. So I would say, in part, there's the recognition that these are sociotechnical issues, right? Even though the models themselves and the data used are technical aspects of these systems, those systems exist in social contexts and social settings where, even if with your training data and your test data you haven't identified any performance differences, well, once humans out in the world are using this, the use of the system might lead to inequities or might lead to differential harm or benefits for different groups. You can also imagine a language system where, in isolation, if it's generating language, the developers or designers might not identify issues. But once it's out in the world and people are interacting with it, maybe it's a chatbot or a language-generating model and people are contributing to it, then it might respond in biased ways. You look at things like GPT-3, for instance, this language generation system that people are trying out and probing, and they're seeing some of these biases present in the underlying language model. So I think that's one example. There are others, of chatbots that became sexist and racist, or produced sexist, racist language, after people interacted with them. So I think there are many, many things that cannot possibly be anticipated prior to deployment. And the goal for us, and this is a longer-term goal, is how do we support teams in anticipating as many of these types of harms as possible prior to deployment? There's research and methods from user experience, from HCI, from design, things like speculative design and critical design, that are intended to engender this kind of future-focused thinking about what could go wrong here. And even some academic conferences are requiring authors to be pessimistic and not just wear rose-tinted glasses.

So in the ACM, the Association for Computing Machinery, there is a group that called for essentially this sort of removing of the rose-tinted glasses, encouraging researchers, but also product designers, to be a little bit pessimistic. There are others, like Casey Fiesler out of CU Boulder and some others, who are working on this kind of speculative ethics. She's referred to these as Black Mirror writers' workshops: how do technologists anticipate all the ways their systems could go wrong?

And so I think it's going to be some combination of the two, right: technology designers, in combination with stakeholders who are impacted by these systems, working together in a co-design and participatory design process while systems are being developed, to anticipate the ways things could go wrong and flag potential harms, and then ensuring that there is ongoing monitoring, ongoing review of systems once they're deployed. It's not just left up to, like, the telemetry of scraping data on model performance and the sort of business metrics like revenue and speed and maybe even accuracy; you're also conducting more human-centered or more qualitative evaluations of these systems out in the field.
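As a rough sketch of the kind of ongoing, disaggregated review described here, assuming a hypothetical log of predictions and observed outcomes with a group column, the following compares error rates across groups rather than tracking only one aggregate number. The log schema and the flag threshold are illustrative assumptions, not anything from the episode.

```python
# Hypothetical post-deployment review: disaggregate logged outcomes by group
# instead of watching only an aggregate metric.
import pandas as pd

def disaggregated_review(log: pd.DataFrame, group_col: str = "group") -> pd.DataFrame:
    """log needs columns: prediction, outcome, and a group column."""
    log = log.assign(error=(log["prediction"] != log["outcome"]).astype(float))
    summary = log.groupby(group_col).agg(
        n=("error", "size"),
        error_rate=("error", "mean"),
        positive_rate=("prediction", "mean"),   # assumes binary 0/1 predictions
    )
    # Flag groups whose error rate sits well above the overall rate; the 1.25
    # multiplier is arbitrary and only for illustration.
    summary["flag"] = summary["error_rate"] > 1.25 * log["error"].mean()
    return summary
```

Numbers like these would be a prompt for the qualitative, human-centered review Michael describes, not a substitute for it.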

Yeah, this definitely hits close to home for me, because, as you already know, Casey is one of my co-advisers for my PhD program, and I love her Black Mirror Writers Room exercise, so we'll link that in the show notes for this episode.

And so, based off of that, actually, I'm wondering if you see it realistically, pragmatically, but also maybe optimistically: do you see companies like Amazon, Facebook, Google, you know, the top five or 10 tech companies, and their machine learning teams sitting down in a room and speculating with each other about the harms of their machine learning technologies and wanting to incorporate a checklist like this? Do you optimistically see that happening?

I would love to. I would love to see tech companies introducing slowness or caution as a value into their design process, so that they stop, pause, and solicit input, get perspectives that they might not have on their teams, prior to shipping, prior to launching. Maybe the cynical side of me recognizes that large companies in a capitalist society are driven by profit incentives, and so maybe there's a way to shift the model, shift the conversation.

I think there may be a couple of ways. One might be top-down, either regulation or standards. There are standards in other fields, there are these processes and policies in other fields, certainly the FDA, right. And you could imagine, and others have called for this, I think the AI Now Institute has been one of them, similar types of regulatory bodies for algorithms, for machine learning and AI systems. And of course there are questions like, would that live in its own separate agency? Would that be distributed across the different domains that these models and systems work within? How do we, and I suppose "we" more broadly, how do people and communities impacted by these systems effect change and prompt that kind of regulation and those standards? So that, I think, is where the other lever, the other side, comes in: bottom-up pressure, whether that is bottom-up pressure from society at large to critique companies when things go wrong, and you can see instances of this with data journalists, ProPublica is one great example, and certainly with authors, people writing about these issues, right, Safiya Noble and Ruha Benjamin and many, many others, that I think have popularized awareness of these issues. And you're starting to see the public responding to that, and criticism of large tech companies.

I think it remains to be seen.

The extent to which that criticism translates into actual meaningful change, and not just ethics washing, where companies might put out a principles statement, or they might publicly announce that they have an ethics board or something, but it's not clear how that actually leads to meaningful change in their design process and in the products themselves. And that bottom-up pressure might come from the public, or it may come from people within the organizations. You see movements like Tech Won't Build It and No Tech for ICE, and some of these tech worker coalitions and tech worker organizing, either going on strike to protest their companies' decisions or targeting universities. At my university, Carnegie Mellon, there's a student group called No Tech for ICE that works to increase awareness of which tech companies are contributing to ICE. So I think that's one other source of bottom-up pressure.

That's one of the things that we absolutely love about this paper and about your work on checklists: it gets to the heart of this urgency around ethics. And one of the things I've been struck by in this conversation has been the checklists that you're comparing to and drawing from in your research. You're going straight to the medical world and straight to aviation, things that are life-and-death checklists, essentially, literally, right, and then comparing them to ethics checklists. So I'm wondering, for you, what is the urgency here in terms of ethics? We've talked around it a little bit, but to put a finer point on it, why are those the checklists that you went to? What is important here when we start designing for ethics, and really, what's at stake?

Yeah, I think the stakes are incredibly high, as these systems are used to drive decisions in everything from who gets access to medical care, to which communities receive which types of public services and at what priority, whether that is, for instance, in the case of fire inspections and community risk reduction. There are other examples, like housing rights and tenants' rights organizations advocating against computer vision and facial recognition systems being set up. In Brooklyn, for instance, there was one public housing project that was using computer vision, facial recognition, to determine who was granted access and who was not.

And certainly with all of the issues of facial recognition being less accurate for people of color and for women in particular, this affects even who actually gets to go inside their own house, and the surveillance of people, particularly marginalized communities and historically over-surveilled, over-policed communities. AI systems can exacerbate those issues, and they cover them with this sort of veneer of objectivity. Even if the decisions made prior to the algorithm may have already been biased themselves, now there's an algorithm involved, it's data-driven, it's evidence-driven, this is the rhetoric people use, right, and then, oh, well, it's a decision made by a machine, it must be trustworthy. And that's certainly not the case. So I would argue that the domains they're being used in are high-stakes, and the opacity, I think, is a critical issue: decisions are made that shape people's lives, that those people don't have access to and often are not able to contest or seek recourse against.

So I think it's fairly clear at this point that these technologies are incredibly powerful, and I want to take us back, actually, to something you said at the beginning of the interview: that your original research in this space was to design HCI systems that empower the users. And that was even something that was brought up a bit in this checklist paper, too, shifting that power from the model and the engineer back towards the end user and those who are most negatively impacted, especially vulnerable communities.

So, first of all, how do you define the word radical, as we talk about power and you're on The Radical AI Podcast? And do you think that this work that you're doing with checklists is radical?

So, yeah, I love that question. I think for me, radical means starting with people first, valuing people over profit. And that could take a lot of different forms. It could take the form of questioning whose voices are involved in designing technology and AI systems specifically, how historically marginalized groups of people might contribute to shaping those systems, and what form that takes. So I've been thinking a lot lately about this question of power and organizational power, right, because in a research context, in HCI research, we love to promote human-centered design and participatory design, participatory research. But in practice, in large tech companies, in organizations, people are often involved only at the end of the process. Maybe they're consulted: a technology is built, an algorithm is built, and it's given to some group of users or stakeholders, and they're consulted for their opinion of it, rather than being brought in early and often and given the power to refuse. What would it mean for a group of students to say, no, I don't want an online learning platform that tracks my movements, for instance, to determine where my attention is, whether I'm paying attention to the reading or the teacher, these sorts of things? And this is a type of system that exists; these are out there, these are being commercialized. So I think for me, going back to my strengths as an educator and as a teacher, the reality of it is that the dynamics of political power and social power are such that historically disempowered groups are still not given a meaningful voice in the design of technologies that will impact them.

And I think it's both a question of methods, of how do we do this, and that, I think, is where there is significant work in HCI, in the research community. But I also really do think it's organizational too, and that is really one undercurrent underneath the checklist work. The larger research program around this work is asking, well, what does it mean for organizations to shift their thinking? And that, I'd like to think, is the radical piece of some of this work: introducing this friction into the design process, opening up opportunities for stakeholders' voices, for community voices, in the design process, making it not just acceptable but actually valued for people on the design teams to raise a flag and raise awareness of potential risks or threats to fairness and equity, and to bring in stakeholders. And, again, coming back to making it acceptable, making it a part of the process, to have an exit ramp and say, look, we thought this was a good idea to build this thing, but after considering these potential harms, after talking with the people who might be impacted by it, not just the user, if it's educational technology not just the teacher, but students, parents, others involved in their lives, we say, all right, well, we thought it was a good idea, we've invested all this time, but we shouldn't build this because it would actually cause more harm than it would help.

As we move towards wrapping up this interview, first, we were wondering if you could tell listeners where they can find out more about your research and about you if they wanted to connect. And then second, I was curious, if there was one thing that you wish you had known when you started this journey, what that might be.

So I think, when I started the journey into, quote, AI for social good and data science for social good, I would have wanted to have a more holistic view of the theories and methods of what that actually means, maybe to take a more skeptical view. Going back to the idea of the rose-tinted glasses, right: as a former teacher going into computer science and HCI and data science and machine learning, it was, oh wow, these technologies are so cool, they could do so much good, let's build them. Now, looking back, I wish I had been a little bit more skeptical of the premises, and had reached out a little bit more beyond the fields of data science and machine learning into some of these more interdisciplinary fields like philosophy, like science and technology studies, and others like sociology, and into some of the domain areas impacted by these fields. So I think that's certainly one piece. Yeah.

That was maybe a little bit of a downer to end on, but I do think it's part of the journey that I've been on the last five, six, seven years in research: moving through and within different communities. And I have been so excited and inspired by all the amazing work that's happening in the research community and within some tech companies, I think especially here at Microsoft. I acknowledge that I am biased because I am working here, but I did come here for the postdoc for a reason, because I think this lab is doing some really exciting stuff, and I am really thrilled and grateful to be a part of it. So I would encourage you to check out the Microsoft Research labs more generally, and then specifically the FATE research group. There's lots of exciting, cool work going on around the sociotechnical understanding of how fairness is thought about and incorporated into design.

And for you specifically, is there a website or a Twitter account or something you would like to plug for folks who want to follow up with you personally?

Sure, yeah. You can find my website at michaelmadaio.com.

And I can send that over to you for the show. Great, we'll be sure to include all of that. And from skepticism to optimism, thank you so much, Michael, for sharing it all. It's really been a pleasure to have you on the show.

Thanks so much.

We again want to thank Michael Madaio for sharing with us his expertise on the subject matter of checklists and so much more. But as we begin this debrief, I was wondering if we could talk some more about checklists, because I find checklists fascinating. What do you think about checklists?

Yeah, let's talk about checklists.

Checklists are so fascinating. And it's funny, because the word doesn't really sound super sexy, like, it's a checklist. But it's really interesting in the field of AI ethics, especially in this paper that Michael was telling us about, because the way that they created this checklist was different than so many other checklists that I've made. And also, for those who are listening, maybe keep a count of how many times we say the word checklist in this debrief, because I assume it's going to be a lot. But I think the thing that I'm immediately grappling with right now, at the end of this interview, is something that Michael was actually saying towards the end there, and that's how important it is to get feedback from the people who use and are impacted by AI systems before making decisions about how to design them. And that was what was so interesting about this paper: they had these participatory design workshops and these co-design sessions with people who are actually impacted and harmed by the technology before creating these checklists and these guidelines for the technologists. And it makes me wonder how different technology, and especially AI, would be if every single time we made an algorithm, we talked to the people that the algorithm was going to impact before even thinking about designing and creating it.

And I think part of why it gets lost, whether it's the user-centric experience that we're trying to design for, or the artificial intelligence systems, or even the entire companies that these checklists are trying to express values for, right, to inform the development of these systems, I think where it gets lost is actually in the fact that it's trying to express values.

And at this, like, high level of abstraction, it's like when you're designing a checklist for going grocery shopping, right? It's pretty specific: you know you're going to get bananas, you're going to get eggs, you're going to get milk. But if you add bananas, eggs, milk, fairness, right, as the fourth thing that you're going to pick up from the grocery store, it's a lot harder to actually get that fourth thing. And so then it's a question of, what do you do? In order for it to be, like, a one-page checklist, functionally you need it to be abstract enough that it can cover some ground; it can't be that specific. And at the same time, you need some level of specificity so it has some teeth to it, or at least some ability to communicate what you need and actually allow engineers or developers or whoever to use those values in the real world. So it's not just ethics washing, it's not just marketing for ethics, but it's actually, OK, here are our values and here is how we're going to apply these values to the work that we're doing. But that's the hardest part.

Oh my gosh, I love that grocery store checklist example. I wish that we had started off the episode with that, actually, because it makes checklists make so much more sense in my head.

Of course, it's obvious we use checklists in our everyday lives. It may be a little bit more difficult to understand when we apply it to, you know, AI engineering and design, but it's definitely the same idea. And this is something I was thinking about a lot while Michael was speaking: especially in the ethics community, I feel like we have this tendency to think really abstractly and vaguely about the ethical issues that come up with AI technologies, which isn't necessarily a bad thing, because it allows more people to join the community and join the conversation.

But if we're talking about issues of racism and sexism and oppression, and we continue to keep these problems at this vague, high level of abstraction, as Michael said earlier in the interview, then what does that give to engineers? All they're going to do is get angry at the ethics community and say, well, OK, we get it, you're angry, you're pissed at us for what we're coding, but what do we do? All we know is to sit down in front of this computer and write lines of code and to, you know, take the data and make the model and tune the model and optimize and test it. So where do your levels of abstraction and critique on the ethical conundrums of AI fit into the pipeline that we currently work with in our day-to-day lives and in our jobs? And that's what I love about checklists, too: it takes those values, like you were saying, it takes that fairness checkbox, and instead of just saying make it fairer, make it less sexist, make it less racist, it says, no, follow these steps, do these guidelines that we laid out, and start to think intentionally about the systems that you're creating.

Yeah, I do think that one thing that came out in this interview with Michael, too, though, is that not all checklists are created equal, right? There are some checklists that are designed to do certain things and other checklists designed to do other things. And as much as we want all of our industry checklists to be things that are really helpful, and not just smoke and mirrors but an actual embodiment of values, they're not all like that, right? And I think that we need to hold these companies to high standards, not just have these checklists, because it's easy to talk about fairness, and it's a whole other thing to actually design around fairness and implement it. And I think that's where, you know, public policy comes into play, and where responsibility and accountability come into play. And really, for all of us who want to get in the heads of these companies and their checklists, I think we can all, you know, engage in a pretty easy thought experiment of just naming what our values are. Right? It can be really tough to just name what our core values are, as humans, as people in the world, and then to actually think about, OK, well, if I were to list those out and talk about how I operationalize those values, that's a hard exercise, that's a hard thing to do. Which doesn't mean we shouldn't hold those companies accountable to do it, but it does mean that it's difficult, and there are a lot of moving parts here.

Yeah, and I'll even take your challenge one step further, Dylan. I'm going to call out right now any engineers and data scientists who are listening to this episode: I encourage and invite all of you to challenge yourselves the next time that you are dealing with a data set or creating a model or tuning a model or testing or whatever it is that you're doing at your job. Don't think of those rows of data as numbers to optimize for, but think about them as people, and recognize that what you're optimizing for might not necessarily just be a number on a screen; it might mean someone's livelihood. Consider how what you do with your code might impact someone in their real life.

For more information on today's show, please visit the episode page at radicalai.org.

If you enjoyed this episode, we invite you to subscribe, rate, and review the show on iTunes or your favorite podcatcher, and join our conversation on Twitter at @radicalaipod. Tune in to our episodes weekly on Wednesdays, and as always, stay radical.
