Confusion in the popular media about terms such as algorithm, and about what constitutes AI technology, causes critical misunderstandings among the public and policymakers. More importantly, the role of data is often ignored in ethical and operational considerations. Even if AI systems are perfectly built, low-quality and biased data cause unintentional and even intentional hazards.
Language Models and Data
GPT-3, a generative pre-trained transformer created by OpenAI, is currently in the news; for example, James Vincent writes about it in a July 30, 2020, article in The Verge. Language models, with GPT-3 as the most advanced current product, greatly amplify the ethics issues for the products built on them. Inputs to the system carry all the liabilities discussed for Machine Learning and Artificial Neural Network products. The dangers of bias and mistakes are raised in some writings but are likely not a focus among the wide range of enthusiastic product developers using GPT-3.
Language models suggest an output sequence of words given an input sequence. Thus, samples of an author’s text from social media can be used to produce new text in that author’s style, which can potentially be used to influence public opinion. Cases have also been found in which poor-quality inputs led language models to promulgate incorrect grammar and misused terms. An article by David Pereira includes examples and comments on the use of GPT-3, and the article “GPT-3: an AI Game-Changer or an Environmental Disaster?” by John Naughton gives further examples of and commentary on results from GPT-3.
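The idea that a language model suggests output words given an input sequence, and that it repeats whatever patterns dominate its inputs, can be illustrated with a toy sketch. The code below is not how GPT-3 works: it is a simple bigram counter, far simpler than a transformer, and the function names and the three-sentence corpus are hypothetical, invented only for illustration. It shows that when a grammatical error is common in the training text, the model suggests that error.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            followers[current_word][next_word] += 1
    return followers


def suggest_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def continue_text(model, prompt, max_words=5):
    """Extend the prompt one suggested word at a time."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        next_word = suggest_next(model, words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)


# Hypothetical training text: the error "could of" outnumbers "could have".
corpus = [
    "we could of gone home",
    "they could of called earlier",
    "she could have stayed",
]
model = train_bigram_model(corpus)

print(suggest_next(model, "could"))                     # prints "of"
print(continue_text(model, "they could", max_words=1))  # prints "they could of"
```

The model has no notion of correct grammar; it only reflects the frequencies in its inputs, which is why the quality of the training data matters as much as the quality of the system.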
A possible meta-solution that would help policymakers keep up with technological advances is discussed by Alex Woodie in “AI Ethics and Data Governance: A Virtuous Cycle.” He quotes James Cotton, international director of the Data Management Centre of Excellence at Information Builders’ Amsterdam office: “as powerful as the AI technology is, it can’t be implemented in an ethical manner if the underlying data is poorly managed and badly governed. It’s critical to understand the relationship between data governance and AI ethics. One is foundational for the other. You can’t preach being ethical or using data in an ethical way if you don’t know what you have, where it came from, how it’s being used, or what it’s being used for.”