Using Natural Language Processing to prevent suicide

The Basics

I'm not going to go too in-depth on Machine Learning (ML) here, but if you want to learn more, check out my series #AIwithAlisha, where I cover all the topics in Machine Learning.

Natural Language Processing

The way I like to look at it, NLP is the field of AI that gives machines the ability to read, understand, and derive meaning from human language.

Sentiment Analysis to prevent Suicide

Sentiment analysis is simply the classification of emotions based on text. While it's more commonly used to classify things like product reviews for companies, I used it to identify depressed and suicidal users on social media, with the goal of preventing self-harm or suicide.
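Before we dive in, a quick note on setup: the snippets below assume a pandas DataFrame called df with a 'tweet' column (the text) and an 'intention' column (1 for possibly at-risk, 0 for normal). Here is a minimal sketch of that setup; the file name suicide_tweets.csv is just a placeholder for whatever labeled dataset you use.

import pandas as pd

# Hypothetical setup: load a labeled dataset of tweets.
# 'suicide_tweets.csv' is a placeholder file name; the real dataset
# just needs a text column ('tweet') and a binary label ('intention')
df = pd.read_csv('suicide_tweets.csv')
print(df[['tweet', 'intention']].head())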

Step 1

The first step is to take our dataset and convert each tweet into numerical features based on word frequency (a Bag of Words approach). Here I use TF-IDF, which weights each word's count by how rare it is across all tweets, so common filler words don't dominate.

# TF-IDF vectorization
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# Turn each tweet into a vector of TF-IDF weights over the
# 10,000 most frequent unigrams and bigrams
X = df['tweet']
tfidf = TfidfVectorizer(max_features=10000, ngram_range=(1, 2))
X = tfidf.fit_transform(X)
y = df['intention']

# Hold out part of the data and fit a linear SVM; this is the
# clf used in the prediction examples below
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LinearSVC()
clf.fit(X_train, y_train)
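If you want a quick sanity check on what the vectorizer produced, you can inspect the matrix shape and a few of the learned features. This is just an illustrative snippet, not part of the original pipeline:

# The result is a sparse matrix: one row per tweet, one column per
# unigram/bigram kept by the vectorizer (up to 10,000)
print(X.shape)

# Peek at a few of the learned vocabulary terms
print(tfidf.get_feature_names_out()[:10])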

Step 2

For topic modelling, we conduct an LDA analysis (Latent Dirichlet Allocation, a model that groups documents into topics based on the words they contain). As you can see in the code, we load the LDA model from scikit-learn (a Machine Learning library).

import warnings
warnings.simplefilter("ignore", DeprecationWarning)

# Load the LDA model from sk-learn
from sklearn.decomposition import LatentDirichletAllocation as LDA
from sklearn.feature_extraction.text import CountVectorizer

# LDA works on raw word counts, so build a Bag of Words matrix
count_vectorizer = CountVectorizer(stop_words='english')
count_data = count_vectorizer.fit_transform(df['tweet'])

# Helper function: print the top words for each learned topic
def print_topics(model, count_vectorizer, n_top_words):
    words = count_vectorizer.get_feature_names_out()
    for topic_idx, topic in enumerate(model.components_):
        print("\nTopic #%d:" % topic_idx)
        print(" ".join([words[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))

# Tweak the two parameters below
number_topics = 5
number_words = 10

# Create and fit the LDA model
lda = LDA(n_components=number_topics, n_jobs=-1)
lda.fit(count_data)

# Print the topics found by the LDA model
print("Topics found via LDA:")
print_topics(lda, count_vectorizer, number_words)
Topic 1: Possibly Suicidal
Words: "kill", "die", "death", "worthless", "murder", "self-murder", "depressed", "lonely"

Topic 2: Normal
Words: "american", "alright", "covid-19", "good", "lover"
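To see where a new tweet lands across these topics, you can run it through the same count vectorizer and call lda.transform, which returns the topic mixture for that document. This is a minimal sketch using the count_vectorizer and lda objects fitted above:

# Score a new tweet against the learned topics
new_tweet = ["I feel worthless and alone"]
new_counts = count_vectorizer.transform(new_tweet)

# transform returns one row per document with the proportion of each
# topic in it; the highest value is the dominant topic
topic_mix = lda.transform(new_counts)
print(topic_mix)
print("Dominant topic:", topic_mix.argmax())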

Let’s test the model!

At-risk tweet

X = 'I just want my life to end already'
vec = tfidf.transform([X])
clf.predict(vec)
# Output: array([1]), flagged as possibly at-risk

Normal tweet

X = 'congratulations, you have done it'
vec = tfidf.transform([X])
clf.predict(vec)
# Output: array([0]), classified as normal

Training and testing our model

After a lot of tweaking and debugging, I got the model to an 88% accuracy rate, which means it classified tweets in the held-out test set correctly 88% of the time!
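For reference, here is a minimal sketch of how that evaluation looks, using the classification_report import from Step 1 and the X_test and y_test held out in the earlier split (the exact numbers will depend on your dataset and random seed):

from sklearn.metrics import accuracy_score

# Evaluate the trained classifier on the held-out test set
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Per-class precision and recall matter here: for at-risk detection,
# recall on class 1 (missing fewer at-risk users) is especially important
print(classification_report(y_test, y_pred))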

The Future

Now that we can identify the problem, the next step is to think about what the solution would be. If a suicidal tweet is identified, would 911 be called? We also need to think about important questions like: does letting people know that their tweets will be analyzed stop them from posting at all? And is it a bad thing to post a tweet about wanting to end your life? Overall, there are many ethical concerns, so if you want to chat about them, shoot me a message!

  • Connect with me on Linkedin to stay updated on my AI journey, and shoot me a message (I love meeting new people).
  • Subscribe to my newsletter, for monthly updates on what I’m working on!
