Racial bias observed in hate speech detection algorithm from Google

TechCrunch | 8/14/2019 | Staff
Photo: https://techcrunch.com/wp-content/uploads/2019/08/perspective_bias.png?w=628

Understanding what makes something offensive or hurtful is difficult enough for people, let alone AI systems. And people of color are frequently left out of AI training sets. So it’s little surprise that the Alphabet/Google-spawned Jigsaw manages to trip over both of these issues at once, flagging slang used by black Americans as toxic.

To be clear, the study was not specifically about evaluating the company’s hate speech detection algorithm, which has faced issues before. Instead, the algorithm is cited as a contemporary attempt to computationally dissect speech and assign a “toxicity score,” one that appears to fail in a way indicative of bias against black American speech patterns.
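For context, Jigsaw exposes this scoring through its Perspective API. The following is a minimal Python sketch of how a caller might request a toxicity score for a piece of text; the endpoint and response fields are based on the publicly documented comments:analyze method, so treat the exact names as assumptions, and the API key is a placeholder.

import requests

# Placeholder key; a real key comes from a Google Cloud project with the
# Perspective (Comment Analyzer) API enabled.
API_KEY = "YOUR_PERSPECTIVE_API_KEY"
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def toxicity_score(text):
    """Return the Perspective API's toxicity estimate (0.0 to 1.0) for a text."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    data = response.json()
    # The summary score is the model's overall toxicity estimate for the text.
    return data["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

print(toxicity_score("example tweet text"))

The study’s concern is precisely that scores returned by systems like this run higher for tweets written in African American English, even when the content is innocuous.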


The researchers, at the University of Washington, were interested in the idea that databases of hate speech currently available might have racial biases baked in — like many other datasets that suffered from a lack of inclusive practices during formation.

They looked at a handful of such databases, essentially thousands of tweets annotated by people as being “hateful,” “offensive,” “abusive,” and so on. These databases were also analyzed to find language strongly associated with African American English or white-aligned English.


Combining these two sets basically let them see whether white or black vernacular had a higher or lower chance of being labeled offensive. Lo and behold, black-aligned English was much more likely to be labeled offensive.

As the researchers write: “For both datasets, we uncover strong associations between inferred AAE dialect and various hate speech categories, specifically the ‘offensive’ label from DWMW 17 (r = 0.42) and the ‘abusive’ label from FDCL 18 (r = 0.35), providing evidence that dialect-based bias is present in these corpora.”
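Those figures are Pearson correlations between the inferred probability that a tweet is written in African American English and a binary indicator for each label. A rough Python sketch of that calculation, assuming a hypothetical table with a per-tweet dialect probability (“p_aae”) and an annotator label, would look something like this:

import pandas as pd

# Hypothetical example rows; in the study each tweet carries an inferred
# probability of being African American English plus its annotated label.
tweets = pd.DataFrame({
    "p_aae": [0.91, 0.12, 0.78, 0.05, 0.64],
    "label": ["offensive", "none", "offensive", "none", "abusive"],
})

def dialect_label_correlation(df, label):
    """Pearson r between AAE probability and a binary flag for one label."""
    flag = (df["label"] == label).astype(float)
    return df["p_aae"].corr(flag)  # Series.corr() defaults to Pearson

print(dialect_label_correlation(tweets, "offensive"))

A strongly positive r, as reported above, means tweets the dialect model considers more AAE-like are disproportionately the ones annotators marked offensive or abusive.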


The experiment continued with the researchers sourcing their own annotations for the tweets, and similar biases appeared. But by “priming” annotators with the knowledge that the person tweeting...
(Excerpt) Read more at: TechCrunch