Machine Learning in a Forum App

Shreyas Rana
4 min read · Dec 25, 2019

High school students face a lot of stress and pressure these days, whether it comes from homework, social platforms, or grades. When the stress is homework-related, it can directly impact performance and grades. I decided I wanted to fix this issue.

The problem at hand: asking questions

Almost every day at the lunch table, the discussion with my friends usually turns to homework and how someone missed an assignment. The most common reason students miss homework is an unwillingness to post questions in open forums where someone could share the information with them. Their biggest fears include peer pressure and judgment, the possibility that teachers are watching the forums, and the chance that someone will reply with profane messages. The problem at hand was small, but it had a huge impact on students and their performance.

Ideas:

I thought of many ideas. One of the main ones was creating a Facebook page, but believe it or not, some of my friends do not have a Facebook account, and there was a bigger concern around anonymity and lack of privacy: Facebook has many eavesdroppers. A web portal was another option. The one idea that stuck with me was creating an anonymous forum. I have been a big fan of Quora and Reddit, so I thought, why not create an app and maybe make it official at school? Most people have a phone, so the app would work for all of them.

How the app works:

I’ve created an app using Flutter and Dart in which people can ask questions anonymously, and anyone can answer, also anonymously. The app is compatible with both Android and iOS. When a user answers a question, their number of points, or reputation, goes up, and they must have a certain amount of reputation to ask a question. The point total can be viewed in the profile tab, along with the questions the user has asked. Since the app was designed with students in mind, there is also a focus tab, which is a stopwatch: for every thirty minutes spent on the stopwatch, the user earns more reputation.
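
To make those mechanics concrete, here is a minimal sketch of the reputation rules in TypeScript. The app itself is written in Dart, and the point values and threshold below are made-up placeholders, not the app's real numbers.

```typescript
// Hypothetical reputation rules, sketched in TypeScript for readability.
// The real app is written in Dart/Flutter; all numbers here are assumptions.

const POINTS_PER_ANSWER = 5;      // assumed reward for answering a question
const POINTS_PER_FOCUS_BLOCK = 2; // assumed reward per 30 minutes of focus time
const ASK_THRESHOLD = 10;         // assumed minimum reputation needed to ask

interface UserProfile {
  reputation: number;
  questionsAsked: string[]; // IDs of the user's own questions, shown in the profile tab
}

function rewardAnswer(user: UserProfile): void {
  user.reputation += POINTS_PER_ANSWER;
}

function rewardFocusSession(user: UserProfile, minutes: number): void {
  // Every full 30-minute block on the stopwatch earns more reputation.
  const blocks = Math.floor(minutes / 30);
  user.reputation += blocks * POINTS_PER_FOCUS_BLOCK;
}

function canAskQuestion(user: UserProfile): boolean {
  return user.reputation >= ASK_THRESHOLD;
}
```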

Aside from all of the above, one of the biggest features of this app is not visible to users but is very powerful. Every time someone posts a question or an answer, the text is run through a Firebase Function that determines the appropriateness of the post (i.e., whether it uses profane language) using a machine learning algorithm. This greatly reduces the amount of tedious human moderation required, which is a problem with any moderated Facebook page.

The algorithm used is Naive Bayes, which computes the probability that a sentence belongs to a particular class. The algorithm is trained with text files from two classes: one class contains positive text files, and the other contains negative text files.
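
In standard notation, the classifier picks the class (positive or negative) whose prior times per-word likelihoods is largest, treating the words of the sentence as independent given the class:

```latex
% Multinomial Naive Bayes decision rule for a sentence w_1, ..., w_n
\hat{c} = \arg\max_{c \,\in\, \{\text{pos},\,\text{neg}\}} P(c) \prod_{i=1}^{n} P(w_i \mid c)
        = \arg\max_{c} \Big( \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big)
```

The second form, with logarithms, is equivalent because the log is monotonic, and it is the form used in practice to avoid underflow, as described next.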

For the training part, the algorithm keeps track of the number of occurrences of each word in its respective class. Then, in the test phase, the algorithm sums up the log probabilities of each word appearing in the positive class and in the negative class. I used log probabilities because the raw probabilities can get very small, so small that doubles cannot represent them (underflow). Whichever class ends up with the higher overall sum is the predicted class. This algorithm achieves an accuracy of 81.056%.
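
Here is a minimal sketch of that training and classification loop in TypeScript. The real implementation runs inside a Firebase Function and may differ in detail; the add-one smoothing and the omission of class priors below are my assumptions, not the post's.

```typescript
// Minimal word-count Naive Bayes with log probabilities.
// Add-one (Laplace) smoothing is assumed so unseen words don't produce log(0).
// Class priors are omitted for brevity.

type Counts = Map<string, number>;

class NaiveBayes {
  private counts: Record<"pos" | "neg", Counts> = { pos: new Map(), neg: new Map() };
  private totals: Record<"pos" | "neg", number> = { pos: 0, neg: 0 };
  private vocab = new Set<string>();

  // Count word occurrences in the given class.
  train(text: string, label: "pos" | "neg"): void {
    for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      this.counts[label].set(word, (this.counts[label].get(word) ?? 0) + 1);
      this.totals[label] += 1;
      this.vocab.add(word);
    }
  }

  // log P(word | class) with add-one smoothing.
  private logProb(word: string, label: "pos" | "neg"): number {
    const count = this.counts[label].get(word) ?? 0;
    return Math.log((count + 1) / (this.totals[label] + this.vocab.size));
  }

  // Sum log probabilities per class and compare; returns 1 (positive) or 0 (negative).
  classify(text: string): 0 | 1 {
    let posScore = 0;
    let negScore = 0;
    for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      posScore += this.logProb(word, "pos");
      negScore += this.logProb(word, "neg");
    }
    return posScore >= negScore ? 1 : 0;
  }
}
```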

The backend of the app is as follows. The app uses a Firebase database to store data such as each post, its answers, and the time it was posted. Beyond the database, the app uses Firebase Functions to determine the sentiment of a post. The text is passed as a parameter to the function, which runs the Naive Bayes algorithm on it and returns 0 (negative) or 1 (positive) as the sentiment value. The app then uses the return value to decide whether or not to flag the post.
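
A rough sketch of what such a function could look like is below. This is my illustration of the described flow, not the app's actual code: it assumes a callable function using the firebase-functions v1-style API, and a hypothetical "./naiveBayes" module exporting an already-trained instance of the classifier sketched above.

```typescript
// Sketch of a callable Firebase Function that returns 0 (negative) or 1 (positive).
// "./naiveBayes" is a hypothetical module path holding the trained NaiveBayes model.
import * as functions from "firebase-functions";
import { model } from "./naiveBayes";

export const classifySentiment = functions.https.onCall((data) => {
  const text: string = data?.text ?? "";
  const sentiment = model.classify(text); // 0 = negative, 1 = positive
  return { sentiment };
});
```

On the Flutter side, the app would call this function with the post text and flag the post whenever the returned value is 0.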

Of course, this is still a work in progress, and I’m going to keep working on the app. The dataset being used right now is meant for positive and negative sentiment analysis. To fit my needs as a profanity classifier rather than a sentiment classifier, I will look for a Kaggle dataset to train on; if Kaggle has no such dataset, I will create my own through labeled example generation with word substitution. Once I have that dataset, I can start experimenting with bi-grams for storing log probabilities. Using bi-grams is almost exactly the same as using single words, except that instead of storing probabilities for individual words, the model stores probabilities for pairs of consecutive words. Also, I haven’t built the app-side “flagging” behavior yet, though I will most likely finish it after the Naive Bayes backend is finished.
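
As a small illustration of the bi-gram idea, only the tokenization step changes; counting pairs of consecutive words instead of single words looks something like this (again, just a sketch):

```typescript
// Turn a sentence into bi-grams: pairs of consecutive words.
// "this homework is hard" -> ["this homework", "homework is", "is hard"]
function toBigrams(text: string): string[] {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  const bigrams: string[] = [];
  for (let i = 0; i < words.length - 1; i++) {
    bigrams.push(`${words[i]} ${words[i + 1]}`);
  }
  return bigrams;
}
```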

To conclude, this was a very fun project and one of my favorites. The prototype I’ve created is an aesthetic, usable app with a lot of potential. I learned how to use Firebase Functions and databases. Most importantly, I learned the Naive Bayes algorithm, its flexibility, and some of its useful applications.

Shreyas Rana

High school junior in California who loves building intelligent mobile apps, doing robotics, drawing and playing tennis!