An Example of Transcript - Descriptive Text Analysis

Posted on Jul 14, 2017

The Women's March was a worldwide protest on January 21, 2017, to advocate legislation and policies regarding human rights and other issues, including women's rights, immigration reform, healthcare reform, reproductive rights, the natural environment, LGBTQ rights, racial equality, freedom of religion, and workers' rights.

To demonstrate a use case for descriptive analysis on text data, we extracted the speech transcripts of several Women's March speakers: Angela Davis, Madonna, Gloria Steinem, Ashley Judd, and America Ferrera. We cleaned the data set by removing stop words, normalizing keywords, and tokenizing the text. The free-flowing text tokens were then aggregated, linked together, and run through our theme, sentiment, and tone detection to unearth insights about the different speakers. For theme and sentiment detection, we used IBM Watson, which takes text as input and outputs the tone and sentiment of that text. The derived data was then combined with domain knowledge: the contexts of the speeches, their themes, and the keywords used were checked against other datasets such as Twitter and news postings. In short, the analysis flow is as follows:

1. Dataset Curation
2. Dataset Cleaning
3. Text Tokenization
4. Token Normalisation and Aggregation
5. Theme, Sentiment and Tone Detection
6. Contextual Insights using Business Analysis
7. Domain Knowledge Insights
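
The cleaning, tokenization, normalisation, and aggregation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analysis: the stop-word list, the normalisation map, and the function names are assumptions made for the example, and the sample text is limited to quotes that appear in this post.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "of", "to", "a", "an", "and", "is", "it",
              "will", "be", "we", "i", "not", "next"}

# Hypothetical normalisation map that folds keyword variants into one form.
NORMALISE = {"days": "day", "struggles": "struggle"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (digits are skipped)."""
    return re.findall(r"[a-z']+", text.lower())


def clean(tokens):
    """Drop stop words, then map each remaining token to its normalised form."""
    return [NORMALISE.get(t, t) for t in tokens if t not in STOP_WORDS]


def aggregate(speeches):
    """Return one Counter of cleaned tokens per speaker."""
    return {speaker: Counter(clean(tokenize(text)))
            for speaker, text in speeches.items()}


# Sample input built from quotes in this post, not the full transcripts.
speeches = {
    "Angela Davis": "The next 1,459 days of the Trump administration "
                    "will be 1,459 days of resistance.",
    "Madonna": "We choose love. We choose love. We choose love.",
}

counts = aggregate(speeches)
print(counts["Madonna"]["love"])      # 3
print(counts["Angela Davis"]["day"])  # 2
```

The per-speaker counters are what a "most used word" observation would be read from, for example via `counts[speaker].most_common(1)`.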

The overview:

- “Trump” was mentioned by name in the speeches of Angela Davis, Gloria Steinem, and Ashley Judd, whereas Madonna and America Ferrera used the keyword “president” to refer to Donald Trump.

- The term “People” was widely used across the speeches. Phrases include: "It is not - I the president, It is - We, the people" (repeated thrice in Ferrera’s speech), "Freedom struggles of black people", "Resistance to attacks on disabled people", and "Danger faced by marginalised people".

Context of “I” vs “We”

The term "I" was mentioned in the following statements:

- “I'm angry. Yes, I am outraged” - Madonna
- “I am deeply honored to march with you today” - America Ferrera
- “I am a nasty woman. I'm as nasty as a man who looks like he bathes in Cheetos dust” - Ashley Judd

"We" was mentioned in these statements:

- “We dedicate ourselves to collective resistance” - Angela Davis
- “We choose love. We choose love. We choose love” - Madonna
- “We are here and around the world for a deep democracy that says we will not be quiet, we will not be controlled, we will work for the world in which all countries are connected” - Gloria Steinem
- “We are America” - America Ferrera

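An “I” vs “We” comparison like this can be approximated by counting first-person pronouns per speaker. The sketch below is a minimal illustration under stated assumptions: the contraction sets and the `pronoun_counts` helper are hypothetical, and the input is only the handful of quotes listed in this post rather than the full transcripts.

```python
import re
from collections import Counter

# First-person forms to track; contractions are folded onto their pronoun.
# These sets are an assumption for the example, not an exhaustive list.
SINGULAR = {"i", "i'm", "i've", "i'll", "i'd"}
PLURAL = {"we", "we're", "we've", "we'll", "we'd"}


def pronoun_counts(text):
    """Count singular ("I") and plural ("We") first-person pronouns in text."""
    # Map typographic apostrophes to plain ones so "I’m" and "I'm" match.
    normalised = text.lower().replace("\u2019", "'")
    counts = Counter()
    for token in re.findall(r"[a-z']+", normalised):
        if token in SINGULAR:
            counts["I"] += 1
        elif token in PLURAL:
            counts["We"] += 1
    return counts


# Quotes taken from this post; the full speeches would give different totals.
quotes = {
    "Madonna": "I'm angry. Yes, I am outraged. "
               "We choose love. We choose love. We choose love.",
    "America Ferrera": "I am deeply honored to march with you today. "
                       "We are America.",
    "Angela Davis": "We dedicate ourselves to collective resistance.",
}

for speaker, text in quotes.items():
    c = pronoun_counts(text)
    print(f"{speaker}: I={c['I']}, We={c['We']}")
```

Counting raw tokens ignores who the pronoun refers to, so the numbers are a starting point for reading the context, not a replacement for it.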
Top Common Themes:

The main themes observed in the speeches are Racism, Immigration, Religion, and Women's Rights. The general sentiment and tone associated with these themes are negative (specifically sadness and anger about the current situation of Muslims, immigrants, Black people, and women).

Top Observed Tones and Personality Trait:

Across the overall analysis, Sadness, Disgust, and Fear were the top three observed emotional tones. Openness was observed as the major personality trait.

Here is the analysis speaker by speaker:

1. Angela Davis

Top Quote: “The next 1,459 days of the Trump administration will be 1,459 days of resistance.”

The top themes expressed in her speech are Racism & Slavery, Human Rights, Violence, Religion & Immigration, and Liberalism.

“Resistance” was the most used word by Angela in her speech. Her top personality trait observed from the speech was “Conscientiousness” (Acting in a thoughtful way).

Context - Angela Davis appealed for "Collective Resistance": resistance to the attacks on Muslims, immigrants, disabled people, and others. She declared that the next 1,459 days of the Trump administration would be 1,459 days of resistance.

Overall Sentiment - Negative (Disgust and Sadness as the emotional tone)

Top Positive Words Used: 'supremacy', 'united', 'freedom', 'rising', 'celebrate', 'thank'

Top Negative Words Used: 'murder', 'worse', 'dying', 'demands', 'attacks', 'resistance', 'struggles'
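
To illustrate how positive and negative word lists relate to a score between -1 and +1, here is a simple lexicon-based polarity sketch. It is not how IBM Watson computes sentiment and it is not the method used for the results in this post; it is a stand-in technique for illustration. The two word sets are copied from the lists above, while the `polarity` and `label` helpers and the zero thresholds are assumptions.

```python
import re

# Word lists copied from this post; treating them as a sentiment lexicon
# is an assumption made purely for this illustration.
POSITIVE = {"supremacy", "united", "freedom", "rising", "celebrate", "thank"}
NEGATIVE = {"murder", "worse", "dying", "demands", "attacks",
            "resistance", "struggles"}


def polarity(text):
    """Return (positive - negative) / (positive + negative), in [-1, +1].

    Text with no lexicon words scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def label(score):
    """Map a polarity score to a coarse label (hypothetical thresholds)."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


# Phrases taken from this post.
for phrase in ("Resistance to the attacks on Muslims and on immigrants",
               "Freedom struggles of black people"):
    score = polarity(phrase)
    print(f"{score:+.2f} {label(score)}: {phrase}")
```

A word-count score like this cannot see context, which is one reason a word such as 'resistance' can count as negative even when the speaker uses it approvingly.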

2. Madonna

Top Quote: “Welcome to the revolution of love. To the rebellion. To our refusal as women to accept this new age of tyranny.”

The top themes expressed in her speech are Love, White House, Unity, and Tyranny. “Love” was Madonna's most used word in a speech that ran 5:10 minutes and 358 words.

The personality trait observed from her speech was also “Conscientiousness”, meaning the tendency to act in an organised or thoughtful way.

Context - Madonna appealed to people not to fall into despair and to choose love.

Overall Sentiment - Positive (-0.05 in the range of -1 to +1)

Top Positive Words Used: 'hallmark', 'good', 'right'

Top Negative Words Used: 'false', 'danger', 'refusal', 'fuck', 'shake', 'awful'

Most observed Emotional Tone: Anger, likely due to her mostly negative mentions of the “White House”

Madonna’s top tweets about the rally:

- Yesterday's Rally was an amazing and beautiful experience. I came and performed Express Yourself and that's exactly - Express Yourself...............So you can Respect Yourself.
- On Stage at the Women's March In D.C. - With My Girl Amy at the Women's March in D.C. We Go Hard or We Go Home.

To view the analysis of the other speakers, please share your email address and we will be happy to send it to you. Feel free to share your views in the comments section. :)