Polygence student presents computational linguistics paper on the media's political bias at the WiNLP Workshop
We are so excited to announce that a Polygence project titled “Analyzing the Framing of 2020 Presidential Candidates in the News” was recently presented virtually at the Widening Natural Language Processing (WiNLP) Workshop. The workshop, held in conjunction with the annual meeting of the Association for Computational Linguistics, featured NLP researchers from underrepresented backgrounds at all levels: undergraduate and graduate students, postdocs, professors, and professional researchers. It’s amazing that our Polygence student, who is still in high school, was able to present her research among professionals with many years of experience in academia and industry! Not only was her paper accepted, but she was also invited to give a slideshow presentation of her findings, a sign that her work was of high scientific merit and of interest to a broad audience.
Dora, one of Polygence’s Educational Advisors and outstanding mentors, guided her high school mentee through the research, data analysis, writing, and presentation process. Intrigued by the increasing political division in the U.S., Dora’s student used computational and linguistic methods to investigate how seven Democratic presidential candidates were framed across 22 news sources with different political leanings. NLP has previously been used to study social media coverage of candidates, the language used in the debates, and how the media portrays current presidential debates. However, Dora and her student found little existing work on the news media’s framing of presidential candidates, so the Polygence project explored this novel area of research.
As a PhD candidate in linguistics at Stanford University, Dora had the expertise required to help her student thrive. “The idea for the project came primarily from my student, who said she was interested in learning more about NLP and was particularly interested in studying gender bias in the media. I wanted to make sure from the beginning that she drives the development of the project topic, to focus on something that she is truly interested in. It was very easy to develop the project idea with her; she had a lot of great suggestions that aligned both with my expertise and with current research trends.”
We spoke to Dora to learn more about the work that went into this fascinating project. “Once we settled on a project topic, a significant part of our work was devoted to data collection and cleaning, which involved obtaining news URLs for news articles about the presidential candidates, scraping those articles and extracting and doing post-processing on the texts. I felt this was quite important for us to work on, since it's a frequently underestimated effort that one can learn a lot from...We also did some debugging together, which is unfortunately what takes up most of the time of a coder, but makes things all the more exciting once problems are solved!”
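To give a sense of what the cleaning step Dora describes can look like, here is a minimal Python sketch that turns a scraped HTML page into plain text. It is an illustration only, not the project's actual code: the class `ArticleTextExtractor`, the function `clean_article`, and the sample page are hypothetical, and the sketch uses only Python's standard library rather than whatever scraping tools the project relied on.

```python
import re
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_article(html):
    """Return the page's visible text with whitespace collapsed (hypothetical helper)."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return re.sub(r"\s+", " ", text).strip()


# Made-up sample page standing in for a scraped news article.
sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Candidate  speaks</h1><script>track()</script>"
    "<p>She outlined a new\n plan.</p></body></html>"
)
print(clean_article(sample))
```

Real news pages also contain navigation menus, ads, and comment sections, so a real pipeline would need extra rules for each source. That is part of why this stage is so easy to underestimate.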
Dora and her student ran two different kinds of analyses on their data: a lexicon-based approach, which measures the valence, arousal, and dominance of the words describing each candidate, and word embeddings, which allowed them to identify words that are most and least associated with each candidate in contrast to other candidates.
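The two kinds of analysis can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method from the paper: the lexicon `TOY_VAD` and its scores are invented for the example, the functions `mean_vad` and `contrastive_association` are hypothetical, and scoring a word by its cosine similarity to one candidate minus its average similarity to the others is just one plausible way to measure association "in contrast to other candidates."

```python
import math

# Invented toy lexicon: word -> (valence, arousal, dominance), each from 0 to 1.
# These numbers are for illustration only and come from no real lexicon.
TOY_VAD = {
    "experienced": (0.80, 0.40, 0.75),
    "old": (0.30, 0.20, 0.30),
    "energetic": (0.85, 0.90, 0.70),
}


def mean_vad(words, lexicon):
    """Average the VAD scores of the words found in the lexicon."""
    scored = [lexicon[w] for w in words if w in lexicon]
    if not scored:
        return None  # none of the words are covered by the lexicon
    n = len(scored)
    return tuple(sum(dim) / n for dim in zip(*scored))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_association(word_vec, target_vec, other_vecs):
    """Similarity to the target candidate minus mean similarity to the others."""
    others = sum(cosine(word_vec, o) for o in other_vecs) / len(other_vecs)
    return cosine(word_vec, target_vec) - others


# Words describing one hypothetical candidate; "unknown" is not in the lexicon.
print(mean_vad(["old", "experienced", "unknown"], TOY_VAD))

# Toy 2-D "embeddings": the word points the same way as the target candidate.
print(contrastive_association((1.0, 0.0), (1.0, 0.0), [(0.0, 1.0)]))
```

A high contrastive score means the word sits closer to one candidate than to the rest, which is how a description such as a candidate's age can stand out for one person and not for others.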
“Our results suggest a greater framing difference between candidates than between sources. For example, Joe Biden's old age was something that consistently appeared across a lot of sources, but many other candidates were rarely described with their age.” This is rather unexpected: although we might think that a news source’s political leanings would greatly affect how it frames these candidates, the individual candidate seemed to matter more than the political leanings of the source, according to these metrics.
Throughout this project, Dora’s student gained extensive hands-on experience with computer science, linguistics, and the ways these two fields can be brought together in research. She sharpened her Python coding skills, learned to navigate important code libraries for web extraction and NLP, and learned to apply linguistic models for analyzing texts. Additionally, Dora saw her student develop a sophisticated capacity for determining what can and cannot be concluded from a given set of scientific data. “We talked about the main take-aways, and the difficulty of drawing conclusions in the presence of confounding factors...We spent extensive time with the data, so we had a lot of interesting things to talk about in the paper, even if the story was not that simple.”
Mentoring with Polygence allowed Dora to conduct research that was of great interest to both her and her student, all while refining her teaching skills. “I learned a lot about how I can share knowledge about the subject I am passionate about in a way that was relatable and accessible to someone who is just starting to learn about this subject.”
Dora hopes students with interests in linguistics and natural language processing will consider Polygence a powerful resource for getting an early start in these cutting-edge fields of study. “Most high schools don't offer classes in either of these subjects, even though I think that in many ways, they are quite accessible to high school students. In particular, everyone uses language, and most people have thought about their language before in one way or another; linguistics is really just about taking these thoughts further and studying them. A Polygence project can be a great opportunity for students who are interested in learning more about language and computational methods because it allows them to study language in a very hands-on way.”