For now, English remains the lingua franca of indexed science, dominating most peer-reviewed journals and international conferences. This puts non-native speakers at a significant disadvantage. While large language models can help authors overcome some language barriers, a new study by two researchers at the Stanford Graduate School of Education suggests that bias against non-native speakers persists even when these tools are used.
Peer reviewers, increasingly attentive to the use of LLMs in scientific writing, may infer that LLM use is related to author country of origin. These inferences may consciously or unconsciously bias peer reviewers’ scientific assessments, according to new research by Stanford PhD candidate Haley Lepp and postdoctoral scholar Daniel Scott Smith.
The Stanford Institute for Human-Centered AI provided a seed grant for their research, which was accepted for publication at this summer’s Association for Computing Machinery Conference on Fairness, Accountability, and Transparency (ACM FAccT). Their paper, “‘You Cannot Sound Like GPT’: Signs of language discrimination and resistance in computer science publishing,” is also available now on the preprint server arXiv.
The work highlights how linguistic biases can persist even with the adoption of AI tools and other technology.
“So if you have a subconscious bias against people from China, for example, that bias will emerge in other ways, even as language is adjusted [with LLMs]. That’s the surprising takeaway from this study,” Lepp said.
The researchers looked at nearly 80,000 peer reviews at a large computer science conference and found evidence of bias against authors from countries where English is less widely spoken. After ChatGPT became available, there was only a muted change in the expression of that bias. Through interviews with 14 conference participants from around the world, Lepp and Smith found that reviewers may use common LLM phrases in papers to infer author language backgrounds, affecting their judgments on research quality. Overall, the study shows how ChatGPT might reinforce stereotypes that equate good English with good scientific work.
Lepp, a former natural language processing engineer and digital education practitioner, holds an MS in computational linguistics from the University of Washington and a BS in science, technology, and international affairs from Georgetown University. Her research now focuses on the influence of natural language processing on educational practice. Smith holds a PhD from Stanford and will be starting as an assistant professor of sociology at Duke in the fall.
We noticed that discussions about scientists using LLMs for English focused on authors rather than readers. Framing LLMs as an intervention for scientists puts the onus of change on authors whose first language isn’t English, rather than addressing the root causes of peer reviewers’ biases.
In education, there’s rich literature about language ideology and the role of the listener or reader in linguistic bias. We cite a paper by Flores and Rosa, who describe how racialized language varieties in American schools are seen as deficits to overcome. Even when students change their writing or speaking, they continue to experience bias. The source of the bias is deeper than the language itself, and so we wondered if that theory would hold up among international scientists.
The expression of bias appeared not so much around the rules of “Academic English” as around what people associated with the kind of scientist who would break such rules. Interviewees described how language quality could serve as a proxy for science quality.
After ChatGPT came out, peer reviewers noticed that grammatical idiosyncrasies in writing generally started to disappear. Instead, reviewers described coming to read certain words or phrases common in LLM output, like “delve,” as signs that an author was from a non-English-speaking country. These guesses also often came with stereotyped descriptions of scientists from different countries.
It depends on your theory of democracy. Daniel Greene and others have critiqued the “access doctrine” that suggests bringing people access to technology improves democracy. I’m not sure I buy that access is inherently democratizing. If anything, the idea of AI as democratizing can justify the notion that existing social inequalities can be solved by “fixing” marginalized people, rather than looking at how the people at the top, or even social institutions, might be contributing to inequalities. Our findings offer a kind of alternative view. Even when people have tools to act more like a dominant social group, new mechanisms of stratification may emerge.
We must interrogate the way that people use language, not just to communicate content but as a sign of other things: of race, class, who to trust, or whose knowledge can be trusted. In science, English-only publishing has a long history with connections to colonialism and racist academic institutions. To repair that, we’ll need more than a tool for helping people produce English text.
One of the things we emphasize in the paper is that the biases we identify are, in many ways, tools for efficiency. The current speed of computer science publishing may contribute to people cutting corners, evaluating science based on writing style and perceived author background rather than on the science described.
This story was originally published by the Stanford Institute for Human-Centered Artificial Intelligence.