[AISWorld] CFP: [WWW 2022] FinSIM-4 Shared Task : Learning Semantic Similarities for the Financial Domain, extended version to ESG Insights

Tue Dec 28 13:13:00 EST 2021

Greetings,

We would like to invite you to submit to FinSIM-4, the 4th shared task on
Learning Semantic Similarities for the Financial Domain, extended to ESG
insights, held in conjunction with The Web Conference 2022 (online),
25-26 April 2022, as part of the FinWeb-2022 workshop.

====================

Shared Task URL:
https://sites.google.com/nlg.csie.ntu.edu.tw/finweb2022/shared-task-finsim-4
Workshop URL: https://sites.google.com/nlg.csie.ntu.edu.tw/finweb2022/home
Registration Form: https://forms.gle/aScP11s5vPSK1ghm6

*Introduction*
The FinSim 2022 shared task aims to spark interest from communities in NLP,
ML/AI, Knowledge Engineering and Financial document processing. Going
beyond the mere representation of words is a key step towards industrial
applications that make use of Natural Language Processing (NLP). This is
typically addressed using either 1) unsupervised corpus-derived
representations such as word embeddings, which are typically opaque to
human understanding but very useful in NLP applications, 2) supervised
approaches to learning semantic representations, which typically require a
large volume of labeled data but have high coverage for the target domain,
or 3) manually labeled resources such as corpora, lexica, taxonomies and
ontologies, which typically have low coverage and contain inconsistencies
but provide a deeper understanding of the target domain.

These approaches span a spectrum, and a number of studies have attempted
to combine them, particularly in tasks that aim to expand the coverage of
manual resources using automatic methods.

   - The SemEval community has organized several evaluation campaigns to
   stimulate the development of methods that extract semantic/lexical
   relations between concepts/words (Bordea et al. 2015, Bordea et al. 2016,
   Jurgens et al. 2016, Camacho-Collados et al. 2018).
   - A large number of datasets and challenges specifically look at how to
   automatically populate knowledge bases such as DBpedia or Wikidata (e.g.
   KBP challenges, https://tac.nist.gov/2020/KBP/SM-KBP/).
   - There are also a number of studies on the supervised and unsupervised
   approaches to the extraction of semantic relations between concepts and
   terms (Alfarone et al. 2015, Fauconnier et al. 2015, Shwartz et al. 2016,
   Sarkar et al. 2018, Martel et al. 2021).

Considering the ESG (Environmental, Social and Governance) related issues
in the financial domain, from the end of 2022, companies providing
investment products that make sustainability or environmental claims will
be required to disclose how their portfolios align with the EU taxonomy (
https://ec.europa.eu/info/business-economy-euro/banking-and-finance/sustainable-finance/eu-taxonomy-sustainable-activities_en)
for sustainable activities, according to the European Commission. The
objective is to build an ESG taxonomy, or representations of ESG-related
concepts, and to use it to analyze how an economic activity complies with
the taxonomy and, consequently, how well an investment product is aligned
with it.

*Task Description*
The new edition, FinSim-4, proposes two sub-tasks:

*Sub-task 1.* We have created an in-house sustainable finance taxonomy
called the Fortia ESG taxonomy. It is based on the taxonomies of different
financial data providers as well as several sustainability and annual
reports in which we looked for ESG-related criteria. Given a subset of the
Fortia ESG taxonomy (the training set), participants will be asked to
enrich this training set to cover the remaining terms of the original
Fortia ESG taxonomy. For this purpose, participants will be given a set of
annual reports and sustainability reports of financial companies, from
which they can develop a model that induces terms semantically related to
the concepts defined in the training set. For example, given a set of terms
related to the concept Waste management (e.g. Hazardous Waste, Waste
Reduction Initiatives), participants need to find the missing ones by
predicting the corresponding concept for each unlabeled term.
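
As a rough illustration of the prediction step in Sub-task 1, the sketch
below assigns an unlabeled term to the concept whose labeled terms it most
resembles, using cosine similarity over character trigrams. It is only a
minimal lexical baseline under stated assumptions, not the organizers'
method: the concept "Board structure", its terms, and all function names
are invented for the example, and a competitive system would more likely
rely on embeddings learned from the provided reports.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count the character n-grams of a lowercased, space-padded term."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(train):
    """Merge the n-gram counts of each concept's labeled terms."""
    profiles = {}
    for concept, terms in train.items():
        profile = Counter()
        for term in terms:
            profile.update(char_ngrams(term))
        profiles[concept] = profile
    return profiles


def predict_concept(term, profiles):
    """Return the concept whose profile is most similar to the term."""
    grams = char_ngrams(term)
    return max(profiles, key=lambda concept: cosine(grams, profiles[concept]))


# Toy training set. "Waste management" and its two terms come from the
# task description; "Board structure" and its terms are hypothetical.
TRAIN = {
    "Waste management": ["Hazardous Waste", "Waste Reduction Initiatives"],
    "Board structure": ["Board Independence", "Board Diversity"],
}
PROFILES = build_profiles(TRAIN)

print(predict_concept("Waste Recycling", PROFILES))
```

Character n-grams are used here only because they need no external
resources; swapping `char_ngrams` for a contextual embedding would keep the
rest of the sketch unchanged.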

*Sub-task 2.* Participants will be asked to design a system that
automatically classifies sentences as sustainable or unsustainable, making
use of the enriched taxonomy if helpful. For this purpose, participants
will be given a list of carefully selected labeled sentences from the
sustainability reports and other documents. In this shared task, we
consider a sentence sustainable if it mentions Environmental, Social or
Governance related factors as defined in our ESG taxonomy.
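
One simple way to use the taxonomy in Sub-task 2 is sketched below: a
sentence is labeled sustainable when it contains any taxonomy term as a
whole-word phrase. This is an assumed baseline for illustration only. The
function names and example sentences are invented, and the labeled
sentences provided by the organizers would normally be used to train a
proper classifier instead.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def contains_phrase(tokens, phrase_tokens):
    """Check whether phrase_tokens occurs as a contiguous run in tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def classify_sentence(sentence, taxonomy_terms):
    """Label a sentence by exact phrase match against taxonomy terms."""
    tokens = tokenize(sentence)
    for term in taxonomy_terms:
        if contains_phrase(tokens, tokenize(term)):
            return "sustainable"
    return "unsustainable"


# The two terms are the examples from the task description; the sentences
# are invented for this illustration.
TERMS = ["Hazardous Waste", "Waste Reduction Initiatives"]

print(classify_sentence("The company reduced hazardous waste at all plants.", TERMS))
print(classify_sentence("Revenue grew in the third quarter.", TERMS))
```

Exact matching misses paraphrases, which is one reason the task
description points towards contextual embeddings.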

Performance will be measured according to the accuracy with which labels
are assigned, and according to recall (based on the total number of
predictions).
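
For orientation, the two measures can be sketched as follows. The
announcement does not spell out how recall is computed, so the sketch
assumes the textbook per-label definition; the function names are
hypothetical, and the scoring scripts released by the organizers remain
the reference.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def recall(gold, predicted, label):
    """Fraction of gold items with the given label that were predicted so.

    Assumption: standard per-label recall, which may differ from the
    official scoring script.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    relevant = [p for g, p in zip(gold, predicted) if g == label]
    if not relevant:
        return 0.0
    return sum(p == label for p in relevant) / len(relevant)


# Invented toy labels, for illustration only.
GOLD = ["sustainable", "sustainable", "unsustainable", "unsustainable"]
PRED = ["sustainable", "unsustainable", "unsustainable", "unsustainable"]

print(accuracy(GOLD, PRED))
print(recall(GOLD, PRED, "sustainable"))
```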

This year, we propose a subset of our in-house ESG taxonomy and a dataset
composed of financial and non-financial reports. We are interested in
systems that make use of contextual word embeddings such as BERT (Devlin
et al. 2018), as well as systems that make use of resources related to ESG
(Environmental, Social and Governance) and/or to sustainability, including
the EU taxonomy.

*References*

   - Daniele Alfarone and Jesse Davis (2015). Unsupervised Learning of an
   IS-A Taxonomy from a Limited Domain-Specific Corpus. In Proceedings of the
   Twenty-Fourth International Joint Conference on Artificial Intelligence
   (IJCAI 2015).
   - Georgeta Bordea, Paul Buitelaar, Stefano Faralli and Roberto Navigli
   (2015). “SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)”.
   In Proceedings of SemEval 2015, co-located with NAACL HLT 2015, Denver,
   CO, USA.
   - Georgeta Bordea, Els Lefever, and Paul Buitelaar (2016). “Semeval-2016
   task 13: Taxonomy extraction evaluation (TExEval-2)”. In Proceedings of the
   10th International Workshop on Semantic Evaluation, San Diego, CA, USA.
   - Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio
   Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, and
   Horacio Saggion (2018). “SemEval-2018 Task 9: Hypernym Discovery”. In
   Proceedings of the 12th International Workshop on Semantic Evaluation
   (SemEval-2018), New Orleans, LA, United States. Association for
   Computational Linguistics.
   - Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova
   (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language
   Understanding”. https://arxiv.org/abs/1810.04805v2.
   - Jean-Philippe Fauconnier, Mouna Kamel and Bernard Rothenburger (2015).
   A Supervised Machine Learning Approach for Taxonomic Relation Recognition
   through Non-linear Enumerative Structures. In 30th ACM Symposium on
   Applied Computing (SAC 2015), 13-17 April 2015, Salamanca, Spain.
   - David Jurgens and Mohammad Taher Pilehvar (2016). “SemEval-2016 Task
   14: Semantic Taxonomy Enrichment”. In Proceedings of SemEval-2016,
   NAACL-HLT.
   - Félix Martel, Amal Zouaq (2021). Taxonomy extraction using knowledge
   graph embeddings and hierarchical clustering. In SAC '21: Proceedings of
   the 36th Annual ACM Symposium on Applied Computing, March 2021, pages
   836–844.
   - Rajdeep Sarkar, John P. McCrae, Paul Buitelaar (2018). “A supervised
   approach to taxonomy extraction using word embeddings”. In Proceedings of
   the Eleventh International Conference on Language Resources and Evaluation
   (LREC 2018).
   - Vered Shwartz, Yoav Goldberg, Ido Dagan (2016). Improving Hypernymy
   Detection with an Integrated Path-based and Distributional Method. In
   Proceedings of the 54th Annual Meeting of the Association for Computational
   Linguistics (Volume 1: Long Papers).

*Registration*
To register your interest in participating in the FinSim shared task,
please use the following Google form: https://forms.gle/aScP11s5vPSK1ghm6.

*Prize*
A USD 500 prize will be awarded to the best-performing team.

*Important Dates*
Paper submission: https://easychair.org/conferences/?conf=finweb2022

   - December 22, 2021: First announcement of the shared task and beginning
   of registration
   - January 14, 2022: Release of training set and scoring scripts
   - February 16, 2022: Release of test set
   - February 22, 2022: System output submission deadline
   - February 25, 2022: Release of results
   - February 25, 2022: Shared task title and abstract due
   - March 01, 2022: Shared task paper submissions due
   - March 03, 2022: Registration deadline
   - March 10, 2022: Camera-ready version of shared task paper due
   - April 25-26, 2022: FinWeb workshop @WWW Conference 2022

*Contact*
For any questions about the shared task, please contact us at
fin.sim.task at gmail.com.