The Use of Artificial Intelligence in Detecting Hate Speech

Published: Mar 18, 2021

One of the major problems facing companies that work with user-generated online content is moderating offensive speech and hate speech. The current approach is to manually maintain a list of every word and phrase that may be considered offensive or hateful, and to use that list to filter out questionable content. This approach is not sustainable: the list grows large very quickly, and keeping track of which variations of each word and phrase have already been added becomes too labor-intensive to be worth the effort. The goal of this project was to create an Artificial Intelligence model capable of detecting hate speech and offensive words and phrases in text without the need for manually maintained lists of all possible words and phrases that qualify as offensive or hate speech.

The dataset used for training and testing the Artificial Intelligence model was sourced from GitHub. It was originally created by Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber for their paper “Automated Hate Speech Detection and the Problem of Offensive Language”, published in the Proceedings of the 11th International AAAI Conference on Web and Social Media.

The original dataset consists of 24,783 unique tweets scraped from Twitter and is stored as a CSV file. The file contains six columns for each tweet: the full text of the tweet; the number of people who manually classified the tweet as offensive, hate speech, or normal; the number of users who rated the tweet as offensive; the number who rated it as hate speech; the number who rated it as normal; and the majority class for the tweet based on those ratings. The dataset only includes tweets that were rated by at least three users, which makes it more likely that the classifications are legitimate and reduces noise in the model. Because the data came from Twitter, the length of each text is limited: the average tweet is 85 characters long, with a maximum length of 754 characters and a minimum length of 5 characters. The tweets also contain special and foreign characters, retweet information, URLs, and user mentions; none of this is useful for the purposes of this project, so it was handled in the data preprocessing step. All of the tweets are in English, or in colloquial social media English. The raw dataset is also class-imbalanced: seventy-seven percent of the records were classified as offensive speech, six percent as hate speech, and the remaining seventeen percent as neither hate nor offensive speech. When the data was loaded into Python, most of the columns were dropped as unnecessary for this project; the only two columns retained were the tweet text and the class to which each tweet belonged.
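
The loading step described above can be sketched with Python's standard csv module. This is only an illustration, not the author's code: the column names and the class coding (0 for hate speech, 1 for offensive, 2 for neither) are assumptions about the public CSV file, and the sample rows are hypothetical placeholders rather than real tweets.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in rows; column names and class codes are assumptions.
SAMPLE_CSV = """count,hate_speech,offensive_language,neither,class,tweet
3,0,0,3,2,"RT @user1: what a lovely morning"
3,0,3,0,1,"@user2 this is a placeholder offensive tweet"
6,4,2,0,0,"placeholder hateful tweet http://example.com"
3,0,2,1,1,"another placeholder offensive tweet"
"""

CLASS_NAMES = {0: "hate speech", 1: "offensive", 2: "neither"}  # assumed coding


def load_tweets(handle):
    """Return (tweet, class_id) pairs, dropping every other column."""
    reader = csv.DictReader(handle)
    return [(row["tweet"], int(row["class"])) for row in reader]


def class_shares(records):
    """Return the fraction of records that fall in each class."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values())
    return {CLASS_NAMES[label]: n / total for label, n in counts.items()}


records = load_tweets(io.StringIO(SAMPLE_CSV))
shares = class_shares(records)
```

Running class_shares on the full file would be one way to check the imbalance quoted above before any augmentation is applied.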

Once the dataset was loaded, it required a considerable amount of preprocessing before it could be used to train and test an Artificial Intelligence model; much of the data cleansing was done with regular expressions (regex) in Python. The first step was to remove any user mentions from the tweets, using a compiled regex pattern that matched an “at” symbol followed by characters and then whitespace. The next step was to remove retweet information by stripping out any substring consisting of “RT” followed by characters and whitespace. Any URLs within the tweets were then stripped, and foreign characters were removed to reduce the complexity of the vocabulary for the model. All digits were also removed, as numbers add nothing of value to the goal of this model. The following step was to remove stop-words. In Natural Language Processing, stop-words are words that typically add little to the overall meaning of a text; in this project, the NLTK library was used to remove them so that the model would not be negatively impacted by noise in the data. The last steps were to remove excess whitespace and to drop any tweets that had become empty strings after cleaning. After preprocessing, the dataset was left with 24,768 cleaned tweets for model training and testing; these tweets were on average 54 characters long, with the longest being 343 characters and the shortest being three characters.
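
A minimal sketch of this cleaning pipeline is shown below, using only Python's built-in re module. The exact patterns are guesses at the steps described, not the author's expressions, and the small STOP_WORDS set is a hypothetical stand-in for the NLTK stop-word list used in the project.

```python
import re

# Illustrative patterns; the essay does not give the exact expressions.
MENTION = re.compile(r"@\w+:?")         # user mentions such as "@user1:"
RETWEET = re.compile(r"\bRT\b:?")       # retweet marker
URL = re.compile(r"https?://\S+")       # http and https links
NON_ASCII = re.compile(r"[^\x00-\x7F]+")  # foreign characters, emoji
DIGITS = re.compile(r"\d+")

# Hypothetical stand-in for the NLTK English stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "at", "in", "is", "of",
              "or", "that", "the", "this", "to", "with"}


def clean_tweet(text):
    """Apply each removal step in order, then drop stop-words."""
    for pattern in (MENTION, RETWEET, URL, NON_ASCII, DIGITS):
        text = pattern.sub(" ", text)
    kept = [word for word in text.split() if word.lower() not in STOP_WORDS]
    # Joining on single spaces also collapses any excess whitespace.
    return " ".join(kept)


def clean_corpus(tweets):
    """Clean every tweet and discard those that end up empty."""
    cleaned = (clean_tweet(tweet) for tweet in tweets)
    return [tweet for tweet in cleaned if tweet]
```

A tweet made up only of a mention, digits, and a link comes out of clean_tweet as an empty string, which is why clean_corpus filters afterwards.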

To improve the value and generalizability of the model, the dataset was augmented with additional hate speech and offensive words and phrases identified by the Cludo Data Science team. These examples were compiled from twenty-five different languages over the course of 2018. Cludo is a website search company that provides in-depth analytics and search solutions to customers for their websites.

To incorporate these new examples into the original dataset and keep them in line with the rest of the data, a function was developed to generate new synthetic tweets with the hate speech and offensive words and phrases inserted into them. A list of the most common words in the cleaned original dataset was created, and for each synthetic tweet a number of words was sampled at random from that list, with the word count drawn from the distribution of tweet lengths in the original dataset. Words and phrases were then sampled at random from the Cludo-identified hate and offensive speech and inserted at random positions in the synthetic tweets.
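
One way such a generator could look is sketched below. Several details are assumptions rather than the author's choices: tweet length is measured in words, filler words are drawn uniformly from the most common words, and the function name, the top_n cutoff, and the seed are hypothetical. The flagged terms passed in would be the Cludo list; neutral placeholders work for experimentation.

```python
import random
from collections import Counter


def build_generator(cleaned_tweets, flagged_terms, top_n=1000, seed=0):
    """Return a function that produces (synthetic_tweet, inserted_term)."""
    rng = random.Random(seed)
    word_counts = Counter(w for tweet in cleaned_tweets for w in tweet.split())
    common_words = [word for word, _ in word_counts.most_common(top_n)]
    # Empirical length distribution, measured in words (an assumption).
    lengths = [len(tweet.split()) for tweet in cleaned_tweets]

    def generate():
        n_words = rng.choice(lengths)              # sample a realistic length
        words = rng.choices(common_words, k=n_words)
        term = rng.choice(flagged_terms)           # pick one flagged term
        position = rng.randint(0, len(words))      # any slot, including the end
        words.insert(position, term)
        return " ".join(words), term

    return generate
```

Sampling the length from the observed lengths, rather than from a fitted distribution, keeps the synthetic tweets in the same size range as the real ones without any extra modelling.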

To further augment the dataset, and to keep the model from learning erroneous relationships, the dataset was expanded by copying the tweets and shuffling the order of the words in each copy. One risk for a model like this is that it memorizes the specific order of words as a shortcut for classifying text, instead of learning which words and phrases are actually offensive or hateful. To avoid this, the cleaned dataset was copied multiple times, and the word order of every tweet in each copy was shuffled at random; this deliberately added some noise to the dataset and forced the model to learn from the data rather than memorize it. Any duplicate records in the new dataset were then dropped to reduce the risk of overfitting. At the end of this step, the dataset consisted of 131,941 observations with roughly the same class distribution as the original, giving the model a significantly larger and more varied dataset from which to learn and generalize to new examples.
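
The copy, shuffle, and de-duplicate procedure can be sketched as follows. The number of copies is not stated in the essay, so n_copies here is a hypothetical parameter, as are the function name and the seed.

```python
import random


def augment_by_shuffling(records, n_copies=4, seed=0):
    """Add word-shuffled copies of each (text, label) pair, without duplicates."""
    rng = random.Random(seed)
    seen = set()
    augmented = []

    def add(text, label):
        # Keep only the first occurrence of each (text, label) pair.
        if (text, label) not in seen:
            seen.add((text, label))
            augmented.append((text, label))

    for text, label in records:
        add(text, label)                 # originals come first

    for _ in range(n_copies):
        for text, label in records:
            words = text.split()
            rng.shuffle(words)           # random word order, same label
            add(" ".join(words), label)

    return augmented
```

Because duplicates are dropped, short tweets contribute few extra rows: a one-word tweet has only one ordering, and a three-word tweet has at most six.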

With the dataset cleaned, preprocessed, and augmented, the next stage was to prepare the data for training and evaluating the model. The first step was to split the dataset into X and Y variables: the X variable was the cleaned and preprocessed tweets, and the Y variable was the class indicator column. The Tokenizer from Keras’ text preprocessing library was then applied to the X variable; this Tokenizer used TF-IDF to convert the tweet texts into arrays of numbers representing the words in each tweet. The next step was to split the X and Y variables into training and testing datasets using the train_test_split function from sklearn’s model_selection module. Because the model predicts multiple classes, the dataset was split randomly with stratified sampling so that each of the three classes appeared in roughly the same proportion in both the training and testing datasets; seventy percent of the data went to training and thirty percent to testing. The final step was to pad the resulting sequences to a standard length; to reduce the complexity of model training, the sequences were padded or truncated to 500 units in length. At this point, the Y variable was also converted from an integer class value to a one-hot encoding using Keras’ to_categorical function.
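
To make these preparation steps concrete, the sketch below re-implements the three ideas in plain Python: a stratified 70/30 split, padding or truncating to a fixed length, and one-hot encoding. It is not the Keras or sklearn code used in the project, and the function names are hypothetical. Padding and truncating at the front of the sequence is an assumption about the author's settings.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with every class split in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def pad_or_truncate(sequence, max_len=500, pad_value=0):
    """Left-pad with pad_value, or keep only the last max_len items."""
    if len(sequence) >= max_len:
        return sequence[-max_len:]
    return [pad_value] * (max_len - len(sequence)) + sequence


def one_hot(label, n_classes=3):
    """Turn an integer class id into a 0/1 vector with a single 1."""
    vector = [0] * n_classes
    vector[label] = 1
    return vector
```

Splitting within each class separately is what keeps the rare hate speech class from being under-represented in the test set by chance.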

Several neural network variants were tested on a small subset of the training data to determine the best combination of architecture and hyperparameters for this problem; ultimately, a Bidirectional LSTM was chosen as the core of the model. The training data had a shape of 92,359 observations by 500 features, while the testing data had 39,583 observations by 500 features. The first layer after the input layer was an Embedding layer with an output dimension of 256; an Embedding layer lets the model learn more about the data by mapping contextually related words to vector representations that encode their relationships. After the Embedding layer came the Bidirectional LSTM layer, which had 256 nodes, used a 30% dropout rate to reduce the chance of overfitting, and returned sequences; the Bidirectional variant was chosen because Bidirectional LSTMs are better at learning context from natural language data than a standard LSTM. The Bidirectional LSTM layer fed into a Flatten layer so that the following layers could use Keras’ Dense layers. After the Flatten layer, a Dense layer with 256 hidden nodes and the ReLU activation function was added; ReLU was chosen for its speed in larger networks as well as its ability to learn non-linear relationships. This Dense layer fed into a Dropout layer with a 30% dropout rate to further reduce the likelihood of overfitting. A second Dense and Dropout pair followed with the same settings: a Dense layer with ReLU activation and 256 hidden nodes, and a Dropout layer with a 30% dropout rate.
The goal was to build an Artificial Intelligence model that was robust and powerful enough to learn successfully from the dataset without overfitting, while keeping training time at an acceptable level. The final layer of the model was a Dense output layer with 3 nodes (one for each class) and a Softmax activation; Softmax was chosen for the output layer because it maps the outputs for the classes so that they sum to 1, giving what is in essence a probability for each class. The model was compiled with a loss function of categorical_crossentropy (since this is a multiclass classification problem), the Adam optimizer, and categorical_accuracy as the metric. Adam was chosen because it is relatively efficient computationally while still being robust for this type of data, and categorical_accuracy was chosen as the training metric to check that the model is actually classifying the inputs properly and not just bucketing all inputs under one class and calling it “accurate”. Other important parameters were a batch size of 64 observations and 100 epochs; Artificial Intelligence models typically learn better with smaller batch sizes, but this has to be balanced against training time and computational efficiency, so the batch size was set to 64. The 100 epochs were chosen to give the model enough training time to learn everything possible about the data, and Early Stopping was implemented to make sure the model did not train for longer than necessary.
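
The behaviour of the Softmax output and the categorical cross-entropy loss can be illustrated with a few lines of plain Python. This is a hand-written sketch of the standard formulas, not the Keras implementation, and the example scores are hypothetical.

```python
import math


def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, probabilities))


def predicted_class(probabilities):
    """Index of the most probable class."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical raw scores for the three classes.
probs = softmax([2.0, 0.5, -1.0])
loss_if_correct = categorical_crossentropy([1, 0, 0], probs)
loss_if_wrong = categorical_crossentropy([0, 0, 1], probs)
```

The loss is small when the true class receives most of the probability mass and grows as that probability shrinks, which is the signal the optimizer follows during training.
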
For Early Stopping, the model was directed to monitor the validation loss with the aim of minimizing it, and was given a “patience” parameter of 10 epochs; this means training would stop automatically if the validation loss did not improve significantly over any run of 10 consecutive epochs. Overall, the model had 72,549,379 trainable parameters.
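
The reported parameter total can be reproduced with some arithmetic on the layer sizes. The sketch below uses the usual parameter-count formulas for Embedding, LSTM, and Dense layers, and assumes 256 LSTM units per direction. The vocabulary size of 23,031 is not stated in the essay; it is an assumption, back-solved as the value that makes the total match.

```python
def embedding_params(vocab_size, dim):
    """One vector of length dim per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


SEQ_LEN = 500        # padded sequence length
EMBED_DIM = 256
LSTM_UNITS = 256     # per direction (assumption)
DENSE_UNITS = 256
N_CLASSES = 3
VOCAB_SIZE = 23_031  # assumption: inferred from the total, not given in the essay

layers = {
    "embedding": embedding_params(VOCAB_SIZE, EMBED_DIM),
    "bidirectional_lstm": 2 * lstm_params(EMBED_DIM, LSTM_UNITS),
    # Flatten turns 500 steps x 512 features into 256,000 inputs.
    "dense_1": dense_params(SEQ_LEN * 2 * LSTM_UNITS, DENSE_UNITS),
    "dense_2": dense_params(DENSE_UNITS, DENSE_UNITS),
    "output": dense_params(DENSE_UNITS, N_CLASSES),
}
total = sum(layers.values())
```

Under these assumptions, the first Dense layer after the Flatten step accounts for roughly 90 percent of all parameters, because returning full sequences from the LSTM gives it 256,000 inputs.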

The model was trained locally on an MSI laptop using its built-in NVIDIA GeForce GTX 1060 GPU. Training was initially started on the CPU alone, but at around 19 hours per epoch this was unsustainable; switching to GPU training brought the time down to about an hour per epoch. With early stopping, the model trained for 48 epochs, which took roughly two and a half days. The system was also set up to log the performance of the model at each epoch to a CSV file, for easy reference to the model accuracy and loss later on. At the end of the first epoch, the categorical accuracy was roughly 82%, with a validation categorical accuracy of roughly 87%. The improvement in categorical accuracy started to level off around epoch 13, and training ended with a categorical accuracy of 97.94%, a loss of 0.05, a validation categorical accuracy of 98.23%, and a validation loss of 0.047. To test the performance of the model, new short texts were written containing normal text, offensive text containing swear words, and hate speech containing racial slurs; the model correctly identified each of these unseen texts as offensive, hateful, or neither. One interesting observation concerns an unfortunately common racial slur (the N-word): the model classifies it as “hate speech” when it ends with a hard R, but as “offensive speech” when the colloquial or slang version of the word is used instead.
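
The patience-based stopping rule that ended training can be sketched in plain Python. This is a simplified stand-in for the early stopping callback, not the library code; the function name and the min_delta parameter are illustrative, and any loss values fed to it here are hypothetical rather than taken from the author's training log.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience never ran out, so every scheduled epoch was used.
    return len(val_losses)
```

Under this rule and a patience of 10, a run that stops after 48 epochs would have seen its last improvement in validation loss at epoch 38, assuming the stop was triggered by patience rather than by some other condition.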

Overall, the model appears to perform quite well on the task it was trained for. In the future, it would be interesting to convert the problem into a binary classification problem, with the model determining only whether or not the text contains offensive language in general. By reducing the complexity of the task from three classes to two, the model should be able to learn even more and perform even better than this version. Setting aside the nature of the text in the dataset, building this model was a very useful and interesting learning experience for me; my work is typically focused on Natural Language Processing but is almost exclusively unsupervised, so going through a supervised Natural Language Processing project from start to finish with well-labeled data was worthwhile. This model will also likely be incorporated into my work at Cludo as a way to help moderate the analytics we present to our customers, and to improve and clean the training datasets used for future models.

Bibliography

Davidson, Thomas, Dana Warmsley, Michael Macy, and Ingmar Weber. “Automated Hate Speech Detection and the Problem of Offensive Language”. Proceedings of the 11th International AAAI Conference on Web and Social Media (ICWSM '17). 2017. Pp. 512-515.
Ding, Zixiang, Rui Xia, Jianfei Yu, Xiang Li, and Jian Yang. “Densely Connected Bidirectional LSTM with Applications to Sentence Classification”. arXiv:1802.00889v1 [cs.CL]. 3 Feb 2018.
Mandelbaum, Amit, and Adi Shalev. “Word Embeddings and Their Use In Sentence Classification Tasks”. arXiv:1610.08229v1 [cs.LG]. 26 Oct 2016.
Davidson, Thomas. “Automated Hate Speech Detection and the Problem of Offensive Language”. GitHub. https://github.com/t-davidson/hate-speech-and-offensive-language.

Cite this Essay

The Use Of Artificial Intelligence In Detecting Hate Speech. (2021, March 18). GradesFixer. Retrieved July 5, 2025, from https://gradesfixer.com/free-essay-examples/the-use-of-artificial-intelligence-in-detecting-hate-speech/
