Education
- PhD Student, Computational Linguistics, GPA 10/10, The National Research University Higher School of Economics, Moscow, Russia.
- Master, Computational Linguistics, GPA 9/10 (2017 - 2019), The National Research University Higher School of Economics, Moscow, Russia.
- Bachelor, Computer Science, GPA 4.3/5.0 (2013 - 2017), Novosibirsk State University, Novosibirsk, Russia.
Research Experience
Senior Research Engineer (August 2024 - present), Anecdote AI, Montréal, Canada.
- created LLM pipelines and LLMOps infrastructure for automatic extraction of customer experience (CX) and product insights;
- fine-tuned LLMs; worked with agentic workflows.
Senior Research Engineer (August 2020 - February 2023), Logiciel Behavox Inc, Montréal, Canada.
- created a core text classification engine for “Behavox Quantum”, a market-leading compliance solution;
- designed and implemented the experiment tracking and model deployment workflows (ETL, CI/CD/CT, QAA).
Senior Research Engineer (September 2018 - July 2020), Huawei Technologies, St. Petersburg, Russia.
- created a Natural Language Understanding component for Huawei voice assistant “Celia”;
- conducted R&D on zero-shot learning and data augmentation; presented the results at industry conferences.
Research Engineer (October 2017 - August 2018), Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, Moscow, Russia.
- created a multilingual Plagiarism Detection system for Russia’s largest academic electronic library “RuCont”;
- conducted R&D on multilingual NLP; published the results in journals indexed in Springer, WoS and the ACL Anthology.
Research Engineer (February 2017 - August 2017), Expasoft Ltd., Novosibirsk, Russia.
- created Text Classification and Named Entity Recognition modules for enterprise dialogue assistant “chatme.ai”;
- conducted R&D on the paraphrase detection task; published the results in Springer-indexed journals.
Publications
- Bakarov, A. (2022). Distributional Word Vectors as Semantic Maps Framework. Computación y Sistemas, 26(3), 1343-1364.
- Bakarov, A. (2021). Did You Just Assume My Vector? Detecting Gender Stereotypes in Word Embeddings. Recent Trends in Analysis of Images, Social Networks and Texts, 1357, 3.
- Parinov, S., Bakarov, A., Vodolazcky, D. (2020). Layout logical labelling and finding the semantic relationships between citing and cited paper content. International Journal of Metadata, Semantics and Ontologies, 14(1), 54-62.
- Artemova, E., Bakarov, A., Artemov, A., Burnaev, E., Sharaev, M. (2020). Data-driven models and computational tools for neurolinguistics: a language technology perspective. Journal of Cognitive Science, 21(1), 15-52.
- Bakarov, A. (2018, December). Vector Space Models for Automatic Misogyny Identification. EVALITA Evaluation of NLP and Speech Tools for Italian, 12, 211.
- Yadrintsev, V., Bakarov, A., Suvorov, R., Sochenkov, I. (2018, September). Fast and Accurate Patent Classification in Search Engines. In Big Data Conference (Vol. 1117, No. 1, p. 012004). IOP Publishing.
- Nikishina, I., Bakarov, A., Kutuzov, A. (2018, July). RusNLP: Semantic search engine for Russian NLP conference papers. In International Conference on Analysis of Images, Social Networks and Texts (pp. 111-120). Springer, Cham.
- Bakarov, A., Suvorov, R., Sochenkov, I. (2018, June). The Limitations of Cross-language Word Embeddings Evaluation. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics (pp. 94-100).
- Bakarov, A., Yadrintsev, V., Sochenkov, I. (2018, June). Anomaly Detection for Short Texts: Identifying Whether Your Chatbot Should Switch From Goal-Oriented Conversation to Chit-Chatting. In Digital Transformations & Global Society (pp. 289-298). Springer, Cham.
- Bakarov, A., Kutuzov, A., Nikishina, I. (2018, May). Russian Computational Linguistics: Topical Structure in 2007-2017 Conference Papers. Computational Linguistics and Intellectual Technologies (Dialogue 2018).
- Bakarov, A. (2018, May). The Effect of Unobserved Word-Context Co-occurrences on a Vector-Mixture Approach for Compositional Distributional Semantics. Computational Linguistics in Bulgaria 2018.
- Bakarov, A. (2018, May). Can Eye Movement Data Be Used As Ground Truth For Word Embeddings Evaluation? In Linguistic and Neuro-Cognitive Resources (LiNCR), LREC 2018 Workshop (pp. 27-32).
- Bakarov, A. (2018, January). A Survey of Word Embeddings Evaluation Methods. arXiv preprint arXiv:1801.09536.
- Bakarov, A., Gureenkova, O. (2017, July). Automated Detection of Non-Relevant Posts on the Russian Imageboard 2ch: Importance of the Choice of Word Representations. In International Conference on Analysis of Images, Social Networks and Texts (pp. 16-21). Springer, Cham.
Open-source contributions
- RusNLP, a semantic search engine for Russian NLP conference papers.
- Vecto, a Python framework for working with real-valued linguistic representations.
- CIRTEC, a project dedicated to the scientometric analysis of the role of citations in academic publications.
Relevant skills
- Computer Science: My education and work experience as a software engineer gave me skills in computer science and software engineering such as algorithms, object-oriented design, design patterns, architecture design and product understanding.
- Linguistics: My second degree in linguistics gave me a deep understanding of the mechanisms of human language, e.g. generative grammar, formal semantics and linguistic typology.
- Stack: I use C/C++ for performance-critical code, and Python (along with PyTorch and NumPy) for high-level scripting, quick prototyping, experiments and data analysis.
- Other: I have a decent understanding of UNIX system administration and am familiar with version control systems (git, svn) and databases (SQL). I am interested in cognitive science and know the key works, methods and recent advances in decision making and language processing.
- Natural Languages: Russian (native), English (fluent).
Teaching
- Teaching Assistant (January 2019 - July 2019), Machine Learning, The National Research University Higher School of Economics.
- Teaching Assistant (January 2018 - July 2018), Data Science, The National Research University Higher School of Economics.
- Lecturer (January 2019 - June 2019), Distributional Semantics, Novosibirsk State University, Novosibirsk, Russia.
Activities
- Programme Committee: AACL, ACL, EACL, EMNLP, LREC, NAACL; AIST (Analysis of Images, Social Networks and Texts Conference), AINL (Artificial Intelligence and Natural Language Conference).
- Reviewing: IEEE Access (Q1), JASIST (Q1), Language Resources and Evaluation Journal (Q3), Journal of Intelligent Systems (Q4), International Journal of Metadata, Semantics and Ontologies (Q4), etc.
- Membership: Association for Computational Linguistics and the Special Interest Group on Slavic Natural Language Processing (SIGSLAV).
- Mentoring: Mentor for the Apertium Project at Google Code-in 2018.
- Organizing: Organizer of an open NLP seminar in Moscow and St. Petersburg; participated in the organisation of Sberbank Data Science Journey 2018.
- Speaking: Speaker at DataFest 2019 and Geek Picnic 2019; invited speaker at AINL 2018.
- Competitions: 1st place out of 16 teams at the EVALITA-2018 Shared Task on Automated Misogyny Detection.
- Summer schools: participated in ESSLLI 2018 (Sofia, Bulgaria) and RuSSIR 2017/2018.