CLEF eRisk 2025:

Early risk prediction on the Internet

eRisk explores the evaluation methodology, effectiveness metrics and practical applications of early risk detection on the Internet. Early detection technologies can be employed in different areas, particularly those related to health and safety. For instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum or social network. Our main goal is to pioneer a new interdisciplinary research area that would be potentially applicable to a wide variety of situations and to many different personal profiles. Examples include potential paedophiles, stalkers, individuals who could fall into the hands of criminal organisations, people with suicidal inclinations, or people susceptible to depression.

Participate

This is the ninth year of eRisk and the lab plans to organize three tasks:

Task 1: Search for Symptoms of Depression

This is a continuation of eRisk 2024's Task 1.

The task consists of ranking sentences from a collection of user writings according to their relevance to a depression symptom. The participants will have to provide rankings for the 21 symptoms of depression from the BDI Questionnaire. A sentence is deemed relevant if it provides information about the user's condition regarding a particular symptom. That is, it may be relevant even when it indicates that the user is okay with the symptom.

We will release a TREC-formatted sentence-tagged dataset (based on eRisk past data) together with the BDI questionnaire. Participants are free to decide on the best strategy to derive queries from the descriptions of the BDI symptoms in the questionnaire.

After receiving the runs from the participating teams, we will create the relevance judgments with the help of human assessors using pooling. The resulting qrels will be used to evaluate the systems with classical ranking metrics (e.g., MAP, nDCG). This new corpus with annotated sentences will be a valuable resource with multiple applications beyond eRisk.
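As a rough illustration of the ranking metrics mentioned above, the following sketch computes average precision, MAP and nDCG with binary relevance. The function names and the in-memory layout of runs and qrels are assumptions made for this example; it is not the organisers' evaluation script, and the official evaluation may use different cut-offs or graded relevance.

```python
import math
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for position, sentence_id in enumerate(ranking, start=1):
        if sentence_id in relevant:
            hits += 1
            total += hits / position  # precision at this relevant position
    return total / len(relevant)


def ndcg_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains (an assumption; graded gains are also common)."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, sentence_id in enumerate(ranking[:k], start=1)
        if sentence_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / ideal if ideal > 0 else 0.0


def mean_average_precision(run: Dict[int, List[str]], qrels: Dict[int, Set[str]]) -> float:
    """MAP over all symptoms present in the qrels; missing rankings score 0."""
    topics = sorted(qrels)
    if not topics:
        return 0.0
    return sum(average_precision(run.get(t, []), qrels[t]) for t in topics) / len(topics)
```

Here `run` maps a symptom number to its ranked sentence ids and `qrels` maps a symptom number to the set of sentence ids judged relevant.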

The task is organized into two different stages:

Submission stage. After the release of the datasets, participants will have time to produce their TREC-formatted runs and upload them to our FTP server. Each participant may upload up to 5 files, one per system.
The required submission TREC format is as follows:

            symptom_number   Q0   sentence-id   position_in_ranking   score   system_name

A run file should look like the following example:

            1   Q0   sentence-id-121   0001   10    myGroupNameMyMethodName
            1   Q0   sentence-id-234   0002   9.5   myGroupNameMyMethodName
            1   Q0   sentence-id-345   0003   9     myGroupNameMyMethodName
            ...
            21  Q0   sentence-id-456   0998   1.25  myGroupNameMyMethodName
            21  Q0   sentence-id-242   0999   1     myGroupNameMyMethodName
            21  Q0   sentence-id-347   1000   0.9   myGroupNameMyMethodName

Participants should submit up to 1000 results sorted by estimated relevance for each of the 21 symptoms of the BDI-II questionnaire. Each line contains: symptom_number, Q0, sentence-id, position_in_ranking, score, system_name.
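The run format described above can be produced with a few lines of code. The sketch below is only an illustration: the function name `format_run`, the input layout (symptom number mapped to scored sentence ids), the single-space separator and the tie-breaking rule are assumptions, not requirements stated by the organisers.

```python
from typing import Dict, List, Tuple


def format_run(
    scores: Dict[int, List[Tuple[str, float]]],
    system_name: str,
    max_results: int = 1000,
) -> List[str]:
    """Build run lines: symptom_number Q0 sentence-id position score system_name."""
    lines = []
    for symptom in sorted(scores):
        # Highest score first; ties broken by sentence id so output is deterministic.
        ranked = sorted(scores[symptom], key=lambda pair: (-pair[1], pair[0]))
        for position, (sentence_id, score) in enumerate(ranked[:max_results], start=1):
            # Position is zero-padded to four digits, as in the example above.
            lines.append(
                f"{symptom} Q0 {sentence_id} {position:04d} {score:g} {system_name}"
            )
    return lines


def write_run(path: str, lines: List[str]) -> None:
    """Write one run line per row to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

For instance, `format_run({1: [("sentence-id-234", 9.5), ("sentence-id-121", 10.0)]}, "myGroupNameMyMethodName")` yields two lines, the first being `1 Q0 sentence-id-121 0001 10 myGroupNameMyMethodName`.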

Evaluation stage. Once the submission stage is closed, the submitted runs will be used for obtaining the relevance judgments using classical pooling strategies with human assessors. With those judgments, systems will be evaluated.
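To make the pooling step concrete, here is a minimal sketch of depth-based pooling: the top results of every submitted run are merged per symptom, and only the pooled sentences are sent to assessors. The pool depth and the data layout are assumptions for illustration; the depth actually used by the organisers is not stated here.

```python
from typing import Dict, List, Set


def build_pool(runs: List[Dict[int, List[str]]], depth: int) -> Dict[int, Set[str]]:
    """Union of the top-`depth` sentence ids from every run, per symptom."""
    pool: Dict[int, Set[str]] = {}
    for run in runs:
        for symptom, ranking in run.items():
            # Only the first `depth` entries of each ranking enter the pool.
            pool.setdefault(symptom, set()).update(ranking[:depth])
    return pool
```

Sentences outside the pool remain unjudged; a common convention, assumed here rather than confirmed by the source, is to treat them as non-relevant when computing metrics.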

The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.

To have access to the collection, all participants have to fill in, sign, and send a user agreement form (follow the instructions provided here). Once you have submitted the signed user agreement form, you can proceed to register for the lab at CLEF 2025 Labs Registration site.


🆕 Task 2: Contextualized Early Detection of Depression

New task introduced in eRisk 2025.

This new task focuses on detecting early signs of depression by analyzing full conversational contexts. Unlike previous tasks that focused on isolated user posts, this challenge considers the broader dynamics of interactions by incorporating writings from all individuals involved in the conversation. Participants must process user interactions sequentially, analyze natural dialogues, and detect signs of depression within these rich contexts. Texts will be processed chronologically to simulate real-world conditions, making the task applicable to monitoring user interactions in blogs, social networks, or other types of online media.

The test collection for this task follows the format described in Losada & Crestani, 2016 and is derived from the same data sources as previous eRisk tasks. The dataset includes:

1. Writing history of the target user: posts written by the social media user being assessed for depression.
2. Full conversational contexts:
- The discussion title and comment.
- Comments from all users participating in the conversation, in chronological order.
- Messages exchanged between the target user and others in the discussion.

There are two categories of users: individuals suffering from depression and control users. For each user, the collection contains a sequence of writings from that specific user along with the writings of the other users who participated in the conversation (in chronological order). This approach allows systems to monitor ongoing interactions and make timely decisions based on the evolution of the conversation.

The task is organized into two different stages:

Training Stage. Participants will be provided with a dataset containing isolated writings from a set of training users. For each user, the dataset will indicate whether they have explicitly mentioned a depression diagnosis.
- Participants will only have access to the user’s writings, not the full conversation context.
- Training data will include users from prior early depression detection tasks, allowing teams to train their systems effectively.
Test Stage. During the test phase, participants will connect to our server, which provides user writings iteratively, including full conversational contexts (e.g., the discussion title and other users’ comments).

Participants have to:
- Analyze the data in real time and send their predictions after processing each writing.
- Use the provided full conversational contexts for each user interaction, simulating real-world scenarios.
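The sequential protocol above can be sketched as a simple loop that consumes writings in chronological order and emits a decision after each one. Everything in this example is hypothetical: the `Round` structure, the `toy_score` heuristic and the evidence threshold are placeholders, and the real server interface is not described in this text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Round:
    """One writing by the target user plus its conversational context (hypothetical layout)."""
    user_id: str
    target_text: str
    context: List[str] = field(default_factory=list)


def toy_score(rnd: Round) -> float:
    """Placeholder scorer: counts a few cue words in the target user's text."""
    cues = ("hopeless", "worthless")
    text = rnd.target_text.lower()
    return float(sum(text.count(cue) for cue in cues))


def run_early_detection(
    rounds: Iterable[Round],
    score_fn: Callable[[Round], float],
    threshold: float,
) -> Dict[str, Dict[str, object]]:
    """Accumulate evidence per user and raise an alert once it reaches the threshold."""
    state: Dict[str, Dict[str, object]] = {}
    for rnd in rounds:
        user = state.setdefault(
            rnd.user_id, {"evidence": 0.0, "seen": 0, "decision": 0, "delay": None}
        )
        user["seen"] += 1
        if user["decision"] == 1:
            continue  # in this sketch an alert, once raised, is final
        user["evidence"] += score_fn(rnd)
        if user["evidence"] >= threshold:
            user["decision"] = 1
            user["delay"] = user["seen"]  # number of writings seen before alerting
    return state
```

A real system would replace `toy_score` with a trained model that also reads `context`, and would send each round's decisions back to the server instead of keeping them in memory.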

Evaluation: The evaluation will consider not only the correctness of the system's output (i.e., whether or not the user is depressed) but also the delay taken to emit its decision. To meet this aim, we will consider the ERDE metric proposed in Losada & Crestani, 2016 and other alternative evaluation measures. A full description of the evaluation metrics can be found in 2021's eRisk overview.
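For orientation, the sketch below implements ERDE as defined in Losada & Crestani, 2016: false positives and false negatives incur fixed costs, true negatives cost nothing, and true positives are penalised by a sigmoid latency factor that grows with the number of writings k seen before the decision. The default cost values are placeholders chosen for this example, not the settings used in the official evaluation.

```python
import math
from typing import List, Tuple


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1 / (1 + exp(k - o)); close to 0 for early, 1 for late decisions."""
    if k - o > 700:  # avoid overflow in math.exp for very late decisions
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))


def erde(
    decision: bool,
    truth: bool,
    k: int,
    o: int,
    c_fp: float,
    c_fn: float = 1.0,  # placeholder cost
    c_tp: float = 1.0,  # placeholder cost
) -> float:
    """ERDE_o for one user: decision emitted after seeing k writings."""
    if decision and not truth:
        return c_fp
    if not decision and truth:
        return c_fn
    if decision and truth:
        return latency_cost(k, o) * c_tp
    return 0.0  # true negative


def mean_erde(results: List[Tuple[bool, bool, int]], o: int, c_fp: float) -> float:
    """Average ERDE_o over (decision, truth, k) triples, one per user."""
    if not results:
        return 0.0
    return sum(erde(d, t, k, o, c_fp) for d, t, k in results) / len(results)
```

Lower values are better: a correct alert raised exactly at k = o costs half of `c_tp`, while the same alert raised much later costs almost as much as a miss.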

The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.

To have access to the collection, all participants must fill in, sign, and send a user agreement form (follow the instructions provided here). Once you have submitted the signed user agreement form, you can proceed to register for the lab at CLEF 2025 Labs Registration site.


Pilot Task: Conversational Depression Detection via LLMs

This pilot task introduces a unique challenge: detecting depression through conversational agents. Participants will interact with a large language model (LLM) persona that has been fine-tuned using user writings, simulating real-world conversational exchanges. The challenge lies in determining whether the LLM persona exhibits signs of depression and in explaining the main symptoms that informed the decision. This task pushes participants to develop more interactive, dynamic models that can engage with users and assess their mental state through dialogue.

Training Stage. Since this is a new pilot task in eRisk, there will be no training data. Participants need to obtain their own data or develop unsupervised methods for the task.
Test Stage. During the test stage, participants will connect to a server where they can interact with the LLMs and must identify depressive symptoms within a conversational window.


Ongoing schedule

18 NOV

Registration for lab opens
18/11/2024

01 DEC

Release of the training data (T1,T2) and test dataset for T1
01/12/2024

05 FEB

T2 and T3: Beginning of test stage (servers are open)
05/02/2025

01 APR

T1 deadline for submitting participants' results to FTP
01/04/2025

12 APR

T2 and T3: End of test stage (server closes).
12/04/2025

10 MAY

Release of evaluation results to all participants
10/05/2025

30 MAY

Submission of Participant Papers [CEUR-WS]
30/05/2025

27 JUN

Notification of acceptance
27/06/2025

07 JUL

Camera ready. Participant Papers [CEUR-WS]
07/07/2025

Programme

Thursday Sep 11th

eRisk Session Task 2 (16:30-18:00 Local Time)

Task 2 Presentation

16:30-16:40

Javier Parapar, Anxo Perez, Xi Wang, Fabio Crestani

Overview of eRisk at CLEF 2025: Early Risk Prediction on the Internet (Task 2)

16:40-16:48

Andreu Casamayor, Vicent Ahuir, Antonio Molina, Lluís-Felip Hurtado

ELiRF-UPV at eRisk 2025: New approaches to the Detection and Early Detection of Symptoms and Signs of Depression

16:48-16:56

Elif Kara, Rosa Esther Martín Peña, Lisa Raithel

FU-TU-DFKI@eRisk 2025: A Linguistically Informed but Overdiagnosing Approach to Early Depression Detection

16:56-17:04

Xabier Larrayoz, Arantza Casillas, Alicia Pérez

(Lotu-Ixa) Leveraging Conversational Context and Semantic Relabeling for Early Depression Detection

17:04-17:12

Tu-Phuong Mai, Minh-Ha H. Le, Duc-Luong Tran, Duy-Cat Can, Hoang-Quynh Le

UET@eRisk2025: Severity Estimation for Depression Symptoms Searching and Early Risk Detection

17:12-17:20

Luis Mendoza, Joan Suarez, Edwin Puertas, Juan Martinez, Jairo Serrano

COTECMAR-UTB at eRisk 2025: Semantic-Centroid Symptom Ranking and Early Depression Detection using Adaptive Decision Rule

17:20-17:28

Anthony Miyaguchi, David Guecha, Yuwen Chiu, Sidharth Gaur

DS@GT at eRisk 2025: From Prompts to Predictions, Benchmarking Early Depression Detection with Conversational Agent-Based Assessments and Temporal Attention Models

17:28-17:36

Alba María Mármol-Romero, Manuel García-Vega, Miguel Ángel García-Cumbreras, Arturo Montejo-Ráez

SINAI at eRisk@CLEF 2025: Transformer-Based and Conversational Strategies for Depression Detection

17:36-17:44

Muhammad Saad, Asad Ullah Chaudhry, Meesum Abbas, Faisal Alvi, Abdul Samad

Contextualized Early Detection of Depression - Hybrid and Time-Aware Approaches: HU at eRisk Task 2 2025

17:44-17:52

Poojan Vachharajani

(Pjs Team) Transformer Ensembles and LLM-Powered Approaches for Depression Symptom Analysis and Contextualized Early Risk Detection

17:52-18:00

Yuzhe Zi, Bichen Wang, Yanyan Zhao, Bing Qin

HIT-SCIR@eRisk2025: Exploring the Potential of a Learnable Screening Model and Risk Post Buffer-Based Framework for Contextualized Early Prediction of Depression on Social Media

Friday Sep 12th

eRisk Session Task 1 (09:30-11:00 Local Time)

Task 1 Presentation

09:30-09:40

Javier Parapar, Anxo Perez, Xi Wang, Fabio Crestani

Overview of eRisk at CLEF 2025: Early Risk Prediction on the Internet (Task 1)

09:40-09:48

Diogo A.P. Nunes, Eugénio Ribeiro

INESC-ID @ eRisk 2025: Exploring Fine-Tuned, Similarity-Based, and Prompt-Based Approaches to Depression Symptom Identification

09:48-09:56

Subinay Adhikary, Junsume Das, Dwaipayan Roy

THINKIR at eRisk 2025: Early Detection and Risk Assessment of Depression using Transformer Models

09:56-10:04

Aisha Benloucif, Yashasvini Nannapuraju, Sripriya Bellam, Yuyan Hu, Zhe Zhao, V.G.Vinod Vydiswaran

LHS712Team-1 at eRisk@ CLEF 2025: Searching for Depression Symptoms Using Various Natural Language Processing Algorithms

10:04-10:12

Javier Campos-Molina, Paloma Martinez

HULAT-UC3M at Task1@eRisk 2025: Detecting Depression Using Machine Learning Approaches

10:12-10:20

Andreu Casamayor, Vicent Ahuir, Antonio Molina, Lluís-Felip Hurtado

ELiRF-UPV at eRisk 2025: New approaches to the Detection and Early Detection of Symptoms and Signs of Depression

10:20-10:28

Tu-Phuong Mai, Minh-Ha H. Le, Duc-Luong Tran, Duy-Cat Can, Hoang-Quynh Le

UET@eRisk2025: Severity Estimation for Depression Symptoms Searching and Early Risk Detection

10:28-10:36

Luis Mendoza, Joan Suarez, Edwin Puertas, Juan Martinez, Jairo Serrano

COTECMAR-UTB at eRisk 2025: Semantic-Centroid Symptom Ranking and Early Depression Detection using Adaptive Decision Rule

10:36-10:44

Nguyen Minh Son, Dang Van Thin

SonUIT eRisk2025: Enhanced Depression Detection on Social Media via Filtering and Re-Ranking

10:44-10:52

Poojan Vachharajani

(Pjs Team) Transformer Ensembles and LLM-Powered Approaches for Depression Symptom Analysis and Contextualized Early Risk Detection

10:52-11:00

Buffer Time / Discussion

Friday Sep 12th

eRisk Session Task 1 + Task 3 (11:30-13:00 Local Time)

Task 3 Presentation

11:30-11:40

Javier Parapar, Anxo Perez, Xi Wang, Fabio Crestani

Overview of eRisk at CLEF 2025: Early Risk Prediction on the Internet (Task 3)

11:40-11:48

Noam Munz, Eliya Naomi Aharon, Avi Segal, Kobi Gal

(BGU Data-Science Lab) Semantic Retrieval of BDI Symptoms in User Writings

11:48-11:56

Poojan Vachharajani

(Pjs Team) Transformer Ensembles and LLM-Powered Approaches for Depression Symptom Analysis and Contextualized Early Risk Detection

11:56-12:04

Ane Varela, Maite Oronoz, Arantza Casillas, Alicia Pérez

(ixa_ave) Detection of Depression with Symptom Similarity: Data Reduction and LLM Personas

12:04-12:12

Anthony Miyaguchi, David Guecha, Yuwen Chiu, Sidharth Gaur

DS@GT at eRisk 2025: From Prompts to Predictions, Benchmarking Early Depression Detection with Conversational Agent-Based Assessments and Temporal Attention Models

12:12-12:20

Alba María Mármol-Romero, Manuel García-Vega, Miguel Ángel García-Cumbreras, Arturo Montejo-Ráez

SINAI at eRisk@CLEF 2025: Transformer-Based and Conversational Strategies for Depression Detection

12:20-12:28

Ane Varela, Maite Oronoz, Arantza Casillas, Alicia Pérez

(ixa_ave) Detection of Depression with Symptom Similarity: Data Reduction and LLM Personas

12:28-13:00

Wrap-Up

More information

+34 881 016 027

anxo.pvila@udc.es

@earlyrisk

CLEF 2025 Conference & CLEF initiative:

Funded by

Big-eRisk: Predicción temprana de riesgos personales en conjuntos de datos masivos. Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU PLEC2021-007662
Projects PID2022-137061OBC21 (Ministerio de Ciencia e Innovación supported by the European Regional Development Fund)


Organizers

Javier Parapar

Anxo Pérez

Xi Wang

Fabio Crestani
