
Modeling and Analysis of Information Systems


Detecting Mentions of Green Practices in Social Media Based on Text Classification

https://doi.org/10.18255/1818-1015-2022-4-316-332

Abstract

The paper addresses the problem of detecting mentions of green practices in social media texts. The authors compiled a corpus of texts from environmental communities of the VKontakte social network, with expert annotation of mentions of nine types of green practices. A semi-automatic approach is proposed for collecting additional texts in order to reduce the imbalance among the types of green practices represented in the corpus. The approach consists of the following stages: identifying the most frequent words that characterize mentions of the practices; automatically collecting texts that contain these frequent words; and expert review and filtering of the collected texts. Four machine learning models for detecting practice mentions were compared on two versions of the corpus: the original and the augmented one. The best averaged F-score (81.32%) was achieved by the Conversational RuBERT model fine-tuned on the augmented corpus. This model was selected as the basis for a prototype application for detecting mentions of green practices, implemented as a Telegram chatbot.
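The first two stages of the collection approach can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: tokenization is a plain regular expression without lemmatization, the stop-word list and the label names `waste_sorting` and `sharing` are hypothetical, "characterizing" words are approximated by how many texts labeled with a practice contain them, and the third stage (expert review and filtering) is manual and therefore not modeled.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list (assumption); a real pipeline would use a full Russian list.
STOP_WORDS = {"и", "в", "на", "не", "с", "по", "для", "это"}
TOKEN_RE = re.compile(r"\w+")  # \w matches Cyrillic letters in Python 3 str patterns


def tokenize(text):
    """Lowercased word tokens without stop words; no lemmatization (a simplification)."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def frequent_words(corpus, practice, top_k=10):
    """Stage 1 (sketch): words occurring in the most texts labeled with `practice`."""
    counts = Counter()
    for text, labels in corpus:
        if practice in labels:
            counts.update(set(tokenize(text)))  # count each word once per text
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def collect_candidates(pool, keywords):
    """Stage 2 (sketch): keep unlabeled texts that contain at least one keyword."""
    keys = set(keywords)
    return [text for text in pool if keys & set(tokenize(text))]


# Toy labeled corpus: (text, set of practice labels); both labels are hypothetical.
corpus = [
    ("Сдали пластик на переработку", {"waste_sorting"}),
    ("Раздельный сбор: пластик и стекло", {"waste_sorting"}),
    ("Обмен одеждой в субботу", {"sharing"}),
]
# Toy pool of unlabeled posts to search.
pool = ["Где принимают пластик?", "Концерт в парке"]

keywords = frequent_words(corpus, "waste_sorting", top_k=1)
candidates = collect_candidates(pool, keywords)
print(keywords)    # ['пластик']
print(candidates)  # ['Где принимают пластик?']
```

Counting each word once per text (document frequency) keeps a single long post from dominating the keyword list. The candidates returned here would still need the expert review and filtering stage before being added to the corpus.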

About the Authors

Anna Valeryevna Glazkova
Tyumen State University
Russia

Olga Vladimirovna Zakharova
Tyumen State University
Russia

Anton Viktorovich Zakharov
Tyumen State University
Russia

Natalya Nikolaevna Moskvina
Tyumen State University
Russia

Timur Ruslanovich Enikeev
Novosibirsk State University
Russia

Arseniy Nikolaevich Hodyrev
Tyumen State University
Russia

Vsevolod Konstantinovich Borovinskiy
Tyumen State University
Russia

Irina Nikolaevna Pupysheva
Tyumen State University
Russia


For citation:


Glazkova A.V., Zakharova O.V., Zakharov A.V., Moskvina N.N., Enikeev T.R., Hodyrev A.N., Borovinskiy V.K., Pupysheva I.N. Detecting Mentions of Green Practices in Social Media Based on Text Classification. Modeling and Analysis of Information Systems. 2022;29(4):316-332. (In Russ.) https://doi.org/10.18255/1818-1015-2022-4-316-332



Content is available under the Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)