About this sample
Words: 755 | Pages: 2 | 4 min read
Published: Apr 2, 2020
The Internet is a huge resource of knowledge and information that can answer almost any question. Very often, though, you cannot find the answer you are looking for. Your question may call for obscure facts, analytical thinking, or particular expertise that only people, not computers, can provide. Without those, your question may go unanswered.
Fortunately, there are websites whose sole purpose is to help you find experts in various fields. On them you can find people's thoughts and opinions about a particular subject. Such websites are called Community Question Answering websites, or CQA for short.
In recent years, Community Question Answering (CQA) websites have gained much popularity as a way of providing and searching for information. They give users a direct and rapid way to find the information they want, along with the thoughts and opinions of people who are experts in their fields.
StackOverflow is one such Community Question Answering website. It is a privately held website and the flagship site of the Stack Exchange network. On it, people can ask questions and get answers on a wide range of topics in computer science and computer programming.
Very often, though, people do not get an answer to their question. The reason is not that nobody knows the answer, but that the question was not asked or structured well enough for other users to understand it and answer correctly. Most of these unpopular questions are asked by new users who have just joined StackOverflow, and most of the time these new users are students who need help with their studies. Asking an unstructured question would not be a problem in itself, because the user could simply ask again in a different way, but StackOverflow has a system that bans users who build a bad reputation by frequently posting poor questions. This is a problem for new users, who do not yet know how to write a good post, and almost all of the banned users are new users.
Prediction means giving an approximate result of an event before the event actually occurs. In technical terms, “prediction means to determine result purely on the description of another related data or another related set of data”. Predicting the popularity of a StackOverflow post means predicting whether the post will get likes or dislikes and whether people will answer the question. On this basis, the problem can be turned into a classification problem with two categories: the post will be popular or it will not. On StackOverflow, almost all of the good posts that get the most likes have a similar structure. For example, a good post includes some code, has a good title, and so on. We will define features of this kind and apply classification algorithms to them.
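As a rough illustration of what such structural features might look like, here is a minimal Python sketch. The function name `extract_features`, the feature names, and the rule used to detect code (backtick spans or four-space-indented lines) are assumptions made for this example; the essay does not specify them.

```python
import re


def extract_features(title: str, body: str) -> dict:
    """Turn a post into a small dictionary of hand-picked features.

    The feature set here is hypothetical and chosen only for illustration.
    """
    # Assumption: backtick spans or four-space-indented lines indicate code.
    has_code = bool(re.search(r"`[^`]+`|^ {4}\S", body, flags=re.MULTILINE))
    return {
        "has_code": has_code,
        "title_words": len(title.split()),
        "title_is_question": title.strip().endswith("?"),
        "body_words": len(body.split()),
    }


features = extract_features(
    "How do I reverse a list in Python?",
    "I tried `my_list.reverse()` but it returns None.",
)
print(features)
```

A dictionary like this could then be converted into a numeric vector and passed to any binary classifier that labels a post as popular or not popular.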
Though a post's popularity depends on its content, many other factors determine how successful a post becomes. These include features such as Title, Domain, Author, Thumbnail, and Self-Text. Good classification performance has been achieved using statistical classification algorithms with varying numbers and kinds of features. The more features that affect a post's popularity are taken into consideration, the more accurate the prediction will be. Despite StackOverflow's encouragement, a lot of questions on the site are not answered.
As StackOverflow has grown in popularity, the number of questions and the number of new users have increased, and with them the number of answered questions. Even so, according to statistics from 2012, close to 45 percent of questions remained unanswered. A decision layer text classification model works very well, but it does not outperform statistical models. Other classification algorithms used in previous work include Decision Trees, Random Forest, Neural Networks, and Nearest Neighbor. Though Neural Networks show slightly better prediction results, the computational power and cost needed to implement them are much higher than for the other algorithms.
A number of feature extraction methods exist, such as TF-IDF, doc2vec, CountVectorizer, and text ranking. Among them, TF-IDF and CountVectorizer perform very well in text classification and prediction. Features play a major role in classification algorithms. TF-IDF is a very good way to weigh how frequent a word is in one document against how common it is across all documents. In TF-IDF, the filtered text content is segmented into words. Stop words are removed. Word frequencies are counted and the TF-IDF values are computed according to the corpus. Candidate words are identified by the TF-IDF values and word similarities are computed. Then keywords are extracted from the candidate words according to the TF-IDF values.
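The TF-IDF steps described above (segmentation, stop-word removal, frequency counting, scoring against the corpus, keyword extraction) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the stop-word list is a tiny illustrative one, the three example posts are invented, and the score uses the plain unsmoothed form tf × ln(N / df), whereas library implementations often apply smoothing or normalization. The word-similarity step is left out.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "how", "do", "i"}


def tokenize(text):
    """Segment text into lowercase words and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tf_idf(documents):
    """Return one {word: score} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_keywords(score, k=3):
    """Pick the k highest-scoring words of one document as keywords."""
    return sorted(score, key=score.get, reverse=True)[:k]


# Hypothetical post titles used only to exercise the functions.
posts = [
    "How do I reverse a list in Python",
    "How do I sort a list in Python",
    "Segmentation fault in C pointer arithmetic",
]
scores = tf_idf(posts)
print(top_keywords(scores[0], k=1))
```

In the first post, "reverse" scores higher than "list" or "python" because it appears in only one of the three posts, which is the behavior that makes TF-IDF useful for picking out distinguishing keywords.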