On the Internet, "fake news", i.e., intentionally false information, is a common phenomenon that frequently disturbs society. The problem has been actively studied using supervised learning for automatic fake news detection. Although accuracy continues to improve, detection is still limited to identifying fake information through channels on social platforms. This study aims to improve the reliability of fake news detection on social networking platforms by examining news from unknown domains. In particular, information on social networks in Vietnam is difficult to detect and prevent because everyone has equal rights to use the Internet for different purposes. Users have access to several social media platforms, and any user can post or spread news through them. These platforms make no attempt to verify users or the content they post. As a result, some users exploit them to spread fake news directed against an individual, a society, an organization, or a political party. In this paper, we propose the analysis and design of a model for fake news recognition using deep learning (called AAFNDL). The method proceeds as follows: 1) we analyze existing techniques such as Bidirectional Encoder Representations from Transformers (BERT); 2) we build the model for evaluation; and 3) we apply modern techniques to the model, such as deep learning and classification techniques, to classify fake information. Experiments show that our method improves on other methods by up to 8.72%.
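The abstract names BERT as the backbone for fake-news classification. The following is a minimal sketch of a BERT-style sequence classifier, assuming the Hugging Face `transformers` library; the tiny configuration sizes and dummy token ids are illustrative placeholders, not the paper's AAFNDL architecture or training setup.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny, randomly initialized BERT classifier (illustrative sizes, not the
# paper's configuration); num_labels=2 models a real-vs-fake decision.
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=2,
)
model = BertForSequenceClassification(config)

# Dummy token ids standing in for a tokenized news post
# (101/102 mimic BERT's [CLS]/[SEP] positions).
input_ids = torch.tensor([[101, 5, 6, 7, 102]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
# logits has shape (1, 2): one score per class (real / fake)
```

In practice one would load pretrained weights (e.g. `from_pretrained(...)`) and fine-tune on labeled news, rather than using a randomly initialized model as here.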
This paper focuses on capturing the meaning of text with Natural Language Understanding (NLU) features in order to detect duplicates in an unsupervised manner. The NLU features are compared with lexical approaches to determine the more suitable classification technique. A transfer-learning approach is used to train feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two datasets: Bosch bug reports and Wikipedia article reports. This study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR). The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results show that the Term Frequency–Inverse Document Frequency (TF-IDF) feature performs well on both datasets with a reasonable vocabulary size, and that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence to improve classification.
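The lexical baseline the abstract compares against can be sketched as TF-IDF vectors scored with cosine similarity for duplicate-report detection. This is a minimal illustration with hypothetical report texts, not the paper's datasets or NLU features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical bug-report texts; the first two describe the same issue.
reports = [
    "App crashes when saving a file",
    "Application crash on file save",
    "Login page shows wrong error message",
]

vec = TfidfVectorizer()          # purely lexical features
X = vec.fit_transform(reports)   # sparse (3, vocab) TF-IDF matrix
sim = cosine_similarity(X)       # pairwise similarity matrix

# The two crash reports share the token "file", so they score higher
# against each other than against the unrelated login report.
```

Note the limitation this example exposes: "crashes" vs. "crash" are distinct tokens to a lexical model, which is exactly the gap semantic (NLU) features are meant to close.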
The analysis revealed that social networks (VKontakte, Facebook), thematic communities in microblogging networks (Twitter), resources for travelers (TripAdvisor), and transport portals (Autostrada) are sources of up-to-date operational information about the traffic situation, the quality of transport services, and passenger satisfaction with that quality. However, existing transport monitoring systems do not contain software tools capable of collecting and analyzing traffic information available on the Internet. This paper considers the task of building a system that automatically retrieves and classifies road traffic information from transport Internet portals, and of testing the developed system on the transport networks of Crimea and the city of Sevastopol. To solve this problem, open-source libraries for thematic data collection and analysis were surveyed, and an algorithm for extracting and analyzing texts was developed. A crawler was implemented with the Scrapy package in Python 3, and user feedback on the state of the transport system of Crimea and the city of Sevastopol was collected from the portal http://autostrada.info/ru. For text lemmatization and vector representation, the tf, idf, and tf-idf weighting schemes and their Scikit-Learn implementations, CountVectorizer and TfidfVectorizer, were considered; for word processing, bag-of-words and n-gram representations were examined. For the classifier model, the naive Bayes algorithm (MultinomialNB) and a linear classifier optimized with stochastic gradient descent (SGDClassifier) were used. A corpus of 225,000 labeled texts from Twitter served as the training sample. The classifier was trained using a cross-validation strategy with the ShuffleSplit method, and the classification results were tested and compared.
According to the validation results, the linear model with the n-gram range (1, 3) and the TfidfVectorizer performed best. During the trial of the developed system, reviews concerning the quality of the transport networks of the Republic of Crimea and the city of Sevastopol were collected and analyzed. Conclusions are drawn and prospects for further functional development of the developed tools are outlined.
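The winning configuration (TfidfVectorizer with an n-gram range of (1, 3), the SGDClassifier linear model, and ShuffleSplit cross-validation) can be sketched with Scikit-Learn as follows. The toy texts and labels below are hypothetical stand-ins for the 225,000 labeled tweets; the class names and hyperparameters beyond those named in the abstract are assumptions.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Toy stand-in corpus (label 1 = reports a traffic problem, 0 = does not).
texts = [
    "heavy traffic jam on the bridge",
    "road is clear this morning",
    "accident blocking two lanes",
    "smooth ride, no delays",
    "congestion near the city center",
    "highway open and fast",
] * 5
labels = [1, 0, 1, 0, 1, 0] * 5

# The validated configuration: tf-idf over unigrams to trigrams,
# feeding a linear classifier trained with stochastic gradient descent.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),
    SGDClassifier(random_state=0),
)

# ShuffleSplit cross-validation, as in the training procedure described.
cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
scores = cross_val_score(model, texts, labels, cv=cv)
# scores holds one accuracy value per split
```

Swapping `SGDClassifier` for `MultinomialNB` (on the `CountVectorizer` output) reproduces the other candidate compared in the paper's model selection.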