Named Entity Recognition Tasks: Technologies and Tools
https://doi.org/10.18255/1818-1015-2023-1-64-85
Abstract
The task of named entity recognition (NER) is to identify and classify words and phrases that denote named objects, such as people, organizations, geographical names, dates, events, and domain-specific terms. In search of the best solution, researchers run a wide range of experiments with different technologies and input data. Comparing the results of these experiments reveals significant variation in NER quality and raises the problem of determining the conditions and limits of applicability of the technologies used, as well as of finding new approaches. An important step in answering these questions is the systematization and analysis of current research and the publication of corresponding surveys. In the field of named entity recognition, authors of survey papers primarily consider the mathematical methods of identification and classification and pay no attention to the specifics of the task itself. In this survey, the field of named entity recognition is examined from the perspective of individual task categories. The authors identify five categories: the classical NER task, NER subtasks, NER in social media, NER in domain-specific fields, and NER in natural language processing (NLP) tasks. For each category, the survey discusses the quality of solutions, the features of the methods, and the problems and limitations. For clarity, information on current research in each category is presented in a table that lists, for each study, a reference to the work, the language and name of the text corpus used, the base method for solving the task, and the quality of the solution expressed as the standard statistical F-measure, which is the harmonic mean of precision and recall. The survey leads to several conclusions. Deep learning methods lead as the base technologies. The main problems are the shortage of gold-standard datasets, high computational requirements, and the lack of error analysis.
A promising direction for NER research is the development of methods based on unsupervised learning or on rules. The rapidly developing language models in existing NLP tools could serve as a basis for text preprocessing for such methods. The article concludes with a description and the results of experiments with NER tools for Russian-language texts.
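The F-measure named in the abstract as the quality metric can be illustrated with a minimal sketch. It assumes the common entity-level, exact-match convention, in which a predicted entity counts as correct only if both its boundaries and its label match the gold annotation. The tuple representation of an entity, the function name, and the sample data are hypothetical and are not taken from the article.

```python
from typing import Set, Tuple

# Assumed representation of an entity: (start_token, end_token, label).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return entity-level precision, recall, and F-measure (exact match)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotation and system output for one document.
gold = {(0, 1, "PER"), (5, 5, "ORG"), (8, 9, "LOC"), (12, 12, "DATE")}
predicted = {(0, 1, "PER"), (5, 5, "LOC"), (8, 9, "LOC")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# prints: precision=0.667 recall=0.500 F1=0.571
```

Here two of the three predicted entities are correct and two of the four gold entities are found, so precision is 2/3, recall is 1/2, and the harmonic mean is 4/7. The entity at token 5 has the right boundaries but the wrong label, so under exact matching it counts as both a false positive and a false negative.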
About the authors
Nadezhda Stanislavovna Lagutina
Russia
Andrey Mikhailovich Vasilyev
Russia
Daniil Dmitrievich Zafievsky
Russia
For citation:
Lagutina N.S., Vasilyev A.M., Zafievsky D.D. Name Entity Recognition Tasks: Technologies and Tools. Modeling and Analysis of Information Systems. 2023;30(1):64-85. (In Russ.) https://doi.org/10.18255/1818-1015-2023-1-64-85