CLEF 2022 Workshop
Bologna, 5-8 September 2022
eRisk explores the evaluation methodology, effectiveness metrics and practical applications (particularly those related to health and safety) of early risk detection on the Internet. Early detection technologies can be employed in many different areas. For instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum or social network. Our main goal is to pioneer a new interdisciplinary research area that would be potentially applicable to a wide variety of situations and to many different personal profiles. Examples include potential paedophiles, stalkers, individuals who could fall into the hands of criminal organisations, people with suicidal inclinations, or people susceptible to depression.
This is the fifth year of eRisk and the lab plans to organize three tasks:
This is a continuation of eRisk 2021's T1 task.
The challenge is to perform early risk detection of pathological gambling. Systems must sequentially process pieces of evidence and detect early traces of pathological gambling, also known as compulsive gambling or disordered gambling, as soon as possible. The task is mainly concerned with evaluating Text Mining solutions and thus concentrates on texts written in Social Media. Texts should be processed in the order they were created. In this way, systems that effectively perform this task could be applied to sequentially monitor user interactions in blogs, social networks, or other types of online media.
The test collection for this task has the same format as the collection described in [Losada & Crestani 2016]. The source of data is also the same used for previous eRisks. It is a collection of writings (posts or comments) from a set of Social Media users. There are two categories of users, pathological gamblers and non-pathological gamblers, and, for each user, the collection contains a sequence of writings (in chronological order).
In 2019, we moved from a chunk-based release of data (used in 2017 and 2018) to an item-by-item release of data. We set up a server that iteratively gives user writings to the participating teams. More information about the server is given here. In 2022, the server will be used to provide the users' writings during the test stage.
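The item-by-item protocol can be illustrated with a small local simulation. This is only a sketch under stated assumptions, not the server's actual interface: the in-memory `rounds` list stands in for the server, and `score_writing`, `RISK_TERMS`, the threshold, and the rule that an alert is final are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical keyword list; a real system would use a trained text classifier.
RISK_TERMS = {"bet", "casino", "jackpot"}


def score_writing(text: str) -> float:
    """Toy scorer (assumption): fraction of tokens that are risk terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in RISK_TERMS for t in tokens) / len(tokens)


def run_sequentially(rounds: List[Dict[str, str]], threshold: float = 0.2) -> Dict[str, int]:
    """Process one writing per user per round, in chronological order.

    Returns, for each flagged user, the number of writings seen when the
    alert was emitted. In this sketch an alert is treated as final.
    """
    totals: Dict[str, float] = {}
    seen: Dict[str, int] = {}
    alerts: Dict[str, int] = {}
    for writings in rounds:
        for user, text in writings.items():
            if user in alerts:
                continue  # decision already emitted for this user
            seen[user] = seen.get(user, 0) + 1
            totals[user] = totals.get(user, 0.0) + score_writing(text)
            # Alert as soon as the running mean score reaches the threshold.
            if totals[user] / seen[user] >= threshold:
                alerts[user] = seen[user]
    return alerts


# Each dict is one round: the next writing of every user (made-up data).
rounds = [
    {"user_a": "went hiking today", "user_b": "lost another bet at the casino"},
    {"user_a": "great weather again", "user_b": "chasing the jackpot tonight"},
]
alerts = run_sequentially(rounds)
print(alerts)
```

The point of the sketch is the control flow: a decision for a user may be taken after any round, and the number of writings consumed before the decision is recorded because the evaluation penalises delay.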
The task is organized into two different stages:
Evaluation: The evaluation will take into account not only the correctness of the system's output (i.e. whether or not the user is a pathological gambler) but also the delay taken to emit its decision. To this end, we will consider the ERDE metric proposed in [Losada & Crestani 2016] and other alternative evaluation measures. A full description of the evaluation metrics can be found in the eRisk 2020 overview.
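As a rough illustration of how a delay-aware measure behaves, the per-user ERDE of [Losada & Crestani 2016] can be sketched as below. The function and parameter names are hypothetical, and the cost values (`c_fp`, `c_fn`, `c_tp`) are placeholders chosen for the example rather than the values used by the lab.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1 / (1 + exp(k - o)); rises from 0 towards 1 as k passes o."""
    if k - o > 700:  # avoid overflow in math.exp for very late decisions
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))


def erde(decision: int, truth: int, k: int, o: int,
         c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Error for one user; k is the number of writings seen before deciding.

    decision and truth are 1 for the positive (at-risk) class and 0 otherwise.
    """
    if decision == 1 and truth == 0:
        return c_fp                        # false positive
    if decision == 0 and truth == 1:
        return c_fn                        # false negative
    if decision == 1 and truth == 1:
        return latency_cost(k, o) * c_tp   # true positive, penalised by delay
    return 0.0                             # true negative


# A correct alert emitted exactly at the deadline o costs half of c_tp.
print(erde(decision=1, truth=1, k=5, o=5, c_fp=0.1))
```

A correct but late alert is therefore scored almost as badly as a missed case, which is why the delay matters as much as the label.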
The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.
To have access to the collection, all participants have to fill in, sign and send a user agreement form (follow the instructions provided here). Once you have submitted the signed copyright form, you can proceed to register for the lab at the CLEF 2022 Labs Registration site.
This is a continuation of eRisk 2017's T1 and 2018's T2 tasks.
The challenge is to perform early risk detection of depression. Systems must sequentially process pieces of evidence and detect early traces of depression as soon as possible. The task is mainly concerned with evaluating Text Mining solutions and thus concentrates on texts written in Social Media. Texts should be processed in the order they were created. In this way, systems that effectively perform this task could be applied to sequentially monitor user interactions in blogs, social networks, or other types of online media.
The test collection for this task has the same format as the collection described in [Losada & Crestani 2016]. The source of data is also the same used for previous eRisks. It is a collection of writings (posts or comments) from a set of Social Media users. There are two categories of users, depressed and non-depressed, and, for each user, the collection contains a sequence of writings (in chronological order).
In 2019, we moved from a chunk-based release of data (used in 2017 and 2018) to an item-by-item release of data. We set up a server that iteratively gives user writings to the participating teams. More information about the server is given here. In 2022, the server will be used to provide the users' writings during the test stage.
The task is organized into two different stages:
Evaluation: The evaluation will take into account not only the correctness of the system's output (i.e. whether or not the user has been diagnosed with depression) but also the delay taken to emit its decision. To this end, we will consider the ERDE metric proposed in [Losada & Crestani 2016] and other alternative evaluation measures. A full description of the evaluation metrics can be found in the eRisk 2021 overview.
The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.
To have access to the collection, all participants have to fill in, sign and send a user agreement form (follow the instructions provided here). Once you have submitted the signed copyright form, you can proceed to register for the lab at the CLEF 2022 Labs Registration site.
This is a new task. The task consists of estimating the level of features associated with a diagnosis of eating disorders from a thread of user submissions. For each user, the participants will be given a history of postings and will have to fill in a standard eating disorder questionnaire (based on the evidence found in the history of postings).
The questionnaire is derived from the Eating Disorder Examination Questionnaire (EDE-Q), a 28-item self-reported questionnaire adapted from the semi-structured interview Eating Disorder Examination (EDE). We will only use questions 1-12 and 19-28. The EDE-Q is designed to assess the range and severity of features associated with a diagnosis of eating disorder using 4 subscales (Restraint, Eating Concern, Shape Concern and Weight Concern) and a global score:
Instructions: The following questions are concerned with the past four weeks (28 days) only. Please read each question carefully. Please answer all the questions. Thank you.

1. Have you been deliberately trying to limit the amount of food you eat to influence your shape or weight (whether or not you have succeeded)?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

2. Have you gone for long periods of time (8 waking hours or more) without eating anything at all in order to influence your shape or weight?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

3. Have you tried to exclude from your diet any foods that you like in order to influence your shape or weight (whether or not you have succeeded)?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

4. Have you tried to follow definite rules regarding your eating (for example, a calorie limit) in order to influence your shape or weight (whether or not you have succeeded)?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

5. Have you had a definite desire to have an empty stomach with the aim of influencing your shape or weight?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

6. Have you had a definite desire to have a totally flat stomach?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

7. Has thinking about food, eating or calories made it very difficult to concentrate on things you are interested in (for example, working, following a conversation, or reading)?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

8. Has thinking about shape or weight made it very difficult to concentrate on things you are interested in (for example, working, following a conversation, or reading)?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

9. Have you had a definite fear of losing control over eating?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

10. Have you had a definite fear that you might gain weight?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

11. Have you felt fat?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

12. Have you had a strong desire to lose weight?
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

/* Questions 13 to 18 from EDE-Q v6.0 will not be used */

19. Over the past 28 days, on how many days have you eaten in secret (i.e., furtively)? ... Do not count episodes of binge eating.
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

20. On what proportion of the times that you have eaten have you felt guilty (felt that you've done wrong) because of its effect on your shape or weight? ... Do not count episodes of binge eating.
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

21. Over the past 28 days, how concerned have you been about other people seeing you eat? ... Do not count episodes of binge eating.
   0. NO DAYS  1. 1-5 DAYS  2. 6-12 DAYS  3. 13-15 DAYS  4. 16-22 DAYS  5. 23-27 DAYS  6. EVERY DAY

22. Has your weight influenced how you think about (judge) yourself as a person?
   0. NOT AT ALL (0)  1. SLIGHTLY (1)  2. SLIGHTLY (2)  3. MODERATELY (3)  4. MODERATELY (4)  5. MARKEDLY (5)  6. MARKEDLY (6)

23. Has your shape influenced how you think about (judge) yourself as a person?
   0. NOT AT ALL (0)  1. SLIGHTLY (1)  2. SLIGHTLY (2)  3. MODERATELY (3)  4. MODERATELY (4)  5. MARKEDLY (5)  6. MARKEDLY (6)

24. How much would it have upset you if you had been asked to weigh yourself once a week (no more, or less, often) for the next four weeks?
   0. NOT AT ALL (0)  1. SLIGHTLY (1)  2. SLIGHTLY (2)  3. MODERATELY (3)  4. MODERATELY (4)  5. MARKEDLY (5)  6. MARKEDLY (6)

25. How dissatisfied have you been with your weight?
   0. NOT AT ALL (0)  1. SLIGHTLY (1)  2. SLIGHTLY (2)  3. MODERATELY (3)  4. MODERATELY (4)  5. MARKEDLY (5)  6. MARKEDLY (6)

26. How dissatisfied have you been with your shape?
   0. NOT AT ALL (0)  1. SLIGHTLY (1)  2. SLIGHTLY (2)  3. MODERATELY (3)  4. MODERATELY (4)  5. MARKEDLY (5)  6. MARKEDLY (6)

27. How uncomfortable have you felt seeing your body (for example, seeing your shape in the mirror, in a shop window reflection, while undressing or taking a bath or shower)?
   0. NOT AT ALL (0)  1. SLIGHTLY (1)  2. SLIGHTLY (2)  3. MODERATELY (3)  4. MODERATELY (4)  5. MARKEDLY (5)  6. MARKEDLY (6)

28. How uncomfortable have you felt about others seeing your shape or figure (for example, in communal changing rooms, when swimming, or wearing tight clothes)?
   0. NOT AT ALL (0)  1. SLIGHTLY (1)  2. SLIGHTLY (2)  3. MODERATELY (3)  4. MODERATELY (4)  5. MARKEDLY (5)  6. MARKEDLY (6)
This task therefore aims to explore the viability of automatically estimating the severity of multiple symptoms associated with eating disorders. Given the user's history of writings, the algorithms have to estimate the user's response to each individual question. We collected questionnaires filled in by Social Media users together with their history of writings (we extracted each history of writings right after the user provided us with the completed questionnaire). The questionnaires filled in by the users (ground truth) will be used to assess the quality of the responses provided by the participating systems.
The participants will be given a dataset with multiple users (for each user, their history of writings is provided) and they will be asked to produce a file with the following structure:
username1 answer1 answer2 .... answer28
username2 ....
....
Each line has the username and 22 values, one for each question used (questions 1-12 and 19-28). The possible values are 0, 1, 2, 3, 4, 5 and 6.
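A submission line in this format could be produced and validated along the following lines. This is a minimal sketch: the helper name `format_run_line`, the single-space separator, and the ordering of values by question number are assumptions, since the description above does not fix them.

```python
from typing import Dict, List

# Questions used in the task: 1-12 and 19-28 (22 in total).
QUESTION_IDS: List[int] = list(range(1, 13)) + list(range(19, 29))


def format_run_line(username: str, answers: Dict[int, int]) -> str:
    """Validate one user's answers and render 'username v1 v2 ... v22'.

    answers maps a question number to a value in 0..6.
    Values are written in increasing question order (assumption).
    """
    missing = [q for q in QUESTION_IDS if q not in answers]
    if missing:
        raise ValueError(f"{username}: missing answers for questions {missing}")
    values = [answers[q] for q in QUESTION_IDS]
    if any(v not in range(7) for v in values):
        raise ValueError(f"{username}: answers must be integers from 0 to 6")
    return " ".join([username] + [str(v) for v in values])


# Hypothetical prediction: every question answered with 0.
line = format_run_line("username1", {q: 0 for q in QUESTION_IDS})
print(line)
```

Validating before writing catches the two most likely formatting mistakes: including answers for the unused questions 13-18 in the count, or emitting a value outside the 0-6 scale.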
This will be an "only test" task: no training data will be provided in 2022.
Evaluation will be based on:
mean zero-one error (MZOE) between the questionnaire filled by the real user and the questionnaire filled by the system (i.e. fraction of incorrect predictions).
mean absolute error (MAE) between the questionnaire filled by the real user and the questionnaire filled by the system (i.e. average deviation of the predicted response from the true response).
macroaveraged mean absolute error (macroaveraged MAE) between the questionnaire filled by the real user and the questionnaire filled by the system (see Baccianella et al. Evaluation Measures for Ordinal Regression, 2009, eq (4)).
Global ED: RMSE between the global ED score obtained from the questionnaire filled by the real user and the global ED score obtained from the questionnaire filled by the system (see the SCORING section of this document that describes how to get the global ED score)
Restraint subscale: RMSE between the restraint ED score obtained from the questionnaire filled by the real user and the restraint ED score obtained from the questionnaire filled by the system (see the SCORING section of this document that describes how to get the subscale ED score)
Eating concern subscale: RMSE between the eating concern ED score obtained from the questionnaire filled by the real user and the eating concern ED score obtained from the questionnaire filled by the system (see the SCORING section of this document that describes how to get the subscale ED score)
Shape concern subscale: RMSE between the shape concern ED score obtained from the questionnaire filled by the real user and the shape concern ED score obtained from the questionnaire filled by the system (see the SCORING section of this document that describes how to get the subscale ED score)
Weight concern subscale: RMSE between the weight concern ED score obtained from the questionnaire filled by the real user and the weight concern ED score obtained from the questionnaire filled by the system (see the SCORING section of this document that describes how to get the subscale ED score)
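The measures listed above can be sketched in a few lines. The code below is illustrative only and rests on several assumptions: the function names are hypothetical; the macroaveraged MAE averages over the answer values that actually occur in the real user's questionnaire; the item-to-subscale mapping in `SUBSCALES` follows the commonly published EDE-Q 6.0 scoring key and should be checked against the SCORING document; and the global score is taken as the mean of the four subscale scores.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

# ASSUMPTION: subscale membership taken from the commonly published EDE-Q 6.0
# scoring key; verify against the SCORING document referenced by the task.
SUBSCALES: Dict[str, List[int]] = {
    "restraint": [1, 2, 3, 4, 5],
    "eating_concern": [7, 9, 19, 20, 21],
    "shape_concern": [6, 8, 10, 11, 23, 26, 27, 28],
    "weight_concern": [8, 12, 22, 24, 25],
}


def _check(true: Sequence, pred: Sequence) -> None:
    if len(true) == 0 or len(true) != len(pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mzoe(true: Sequence[int], pred: Sequence[int]) -> float:
    """Mean zero-one error: fraction of incorrect predictions."""
    _check(true, pred)
    return sum(t != p for t, p in zip(true, pred)) / len(true)


def mae(true: Sequence[int], pred: Sequence[int]) -> float:
    """Mean absolute error over all answers."""
    _check(true, pred)
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def macro_mae(true: Sequence[int], pred: Sequence[int]) -> float:
    """MAE computed per true answer value, then averaged over those values."""
    _check(true, pred)
    errors = defaultdict(list)
    for t, p in zip(true, pred):
        errors[t].append(abs(t - p))
    return sum(sum(e) / len(e) for e in errors.values()) / len(errors)


def subscale_score(answers: Dict[int, int], name: str) -> float:
    """Mean of the items assigned to one subscale; answers maps question -> value."""
    items = SUBSCALES[name]
    return sum(answers[q] for q in items) / len(items)


def global_score(answers: Dict[int, int]) -> float:
    """Mean of the four subscale scores (assumption, see lead-in)."""
    return sum(subscale_score(answers, n) for n in SUBSCALES) / len(SUBSCALES)


def rmse(true_scores: Sequence[float], pred_scores: Sequence[float]) -> float:
    """Root mean squared error between per-user scores."""
    _check(true_scores, pred_scores)
    squared = sum((t - p) ** 2 for t, p in zip(true_scores, pred_scores))
    return math.sqrt(squared / len(true_scores))


# Made-up answers for four questions of one user.
true_answers = [0, 0, 0, 6]
pred_answers = [0, 1, 0, 3]
print(mzoe(true_answers, pred_answers),
      mae(true_answers, pred_answers),
      macro_mae(true_answers, pred_answers))
```

The macroaveraged variant matters when answers are skewed towards one value: in the example, the single badly predicted "6" weighs as much as the three "0" answers together, whereas plain MAE dilutes it.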
To have access to the collection, all participants have to fill in, sign and send a user agreement form (follow the instructions provided here). Once you have submitted the signed copyright form, you can proceed to register for the lab at the CLEF 2022 Labs Registration site.
Important Dates
15/11/2021
30/11/2021
17/01/2022
15/04/2022
06/05/2022
27/05/2022
13/06/2022
01/07/2022
Chair:
08:50-09:20
Javier Parapar, Patricia Martín-Rodilla, David E. Losada, Fabio Crestani
Overview of eRisk at CLEF 2022: Early Risk Prediction on the Internet (Extended Overview)
09:20-09:40
Harshvardhan Srivastava, Lijin Ns, Sruthi S and Tanmay Basu.
09:40-10:00
Shih-Hung Wu and Zhao-Jun Qiu.
10:00-10:20
Kang Xin, Dou Rongyu and Yu Haitao.
TUA1 at eRisk 2022: Exploring Affective Memories for Early Detection of Depression
Chair:
13:30-13:50
Alba María Mármol-Romero, Salud María Jiménez-Zafra, Flor Miriam Plaza-del-Arco, María Dolores Molina-González, María-Teresa Martín-Valdivia and Arturo Montejo-Ráez.
13:50-14:10
Rodrigo Ferreira, Alina Trifan and José Luis Oliveira.
Early risk detection of mental illnesses using various types of textual features
14:10-14:30
Samuel Stalder and Erman Zankov.
ZHAW at eRisk 2022: Predicting Signs of Pathological Gambling - GloVe for Snowy Days
14:30-14:50
Hermenegildo Fabregat, Andres Duque, Lourdes Araujo and Juan Martinez-Romo.
Chair:
15:30-15:50
Tudor-Andrei Dumitrascu and Alexandra Mihaela Enescu.
15:50-16:10
Andreas Säuberli, Sooyeon Cho and Laura Stahlhut.
LauSAn at eRisk 2022: Simply and effectively optimizing text classification for early detection
16:10-16:30
Raluca-Andreea Gînga, Andrei-Alexandru Manea and Bogdan-Mihai Dobre.
Sunday Rockers at eRisk 2022: Early Detection of Depression
16:30-16:50
Ana-Maria Bucur, Adrian Cosma, Liviu P. Dinu and Paolo Rosso.
An End-to-End Set Transformer for User-Level Classification of Depression and Gambling Disorder
Chair:
17:20-17:40
Elena Campillo-Ageitos, Juan Martinez-Romo and Lourdes Araujo.
UNED-MED at eRisk 2022: depression detection with TF-IDF, linguistic features and Embeddings
17:40-18:00
Juan Martín Loyola, Horacio Thompson, Sergio Burdisso and Marcelo Errecalde.
Decision policies with history for early classification
18:00-18:20
Seyed Habib Hosseini Saravani, Diego Maupomé, Fanny Rancourt, Thomas Soulas, Lancelot Normand, Sara Besharati, Anaelle Normand, Sebastien Mosser and Marie-Jean Meurs.
Measuring the Severity of the Signs of Eating Disorders Using Similarity-Based Models
Chair:
09:00-09:20
Sreegeethi Devaguptam, Thanmai Kogatam, Nishka Kotian and Anand Kumar M.
Early detection of depression using BERT and DeBERTa
09:20-09:40
Ilija Tavchioski, Blaž Škrlj, Senja Pollak and Boshko Koloski.
Early detection of depression with linear models using hand-crafted and contextual features
09:40-10:30
Javier Parapar, Patricia Martín-Rodilla, David E. Losada, Fabio Crestani
eRisk wrap-up session: feedback, closing and future.