This document provides all the details needed to access the eRisk 2025 research collection.

Any scientific publication derived from the use of this collection should explicitly cite the following publications:

1. Crestani, F., Losada, D., & Parapar, J. (2022). Early Detection of Mental Health Disorders by Social Media Monitoring. Studies in Computational Intelligence, 1018, 4.

2. Parapar, J., Perez, A., Wang, X., & Crestani, F. (2025). eRisk 2025: Contextual and Conversational Approaches for Depression Challenges. In European Conference on Information Retrieval (pp. 416–424).

The eRisk 2025 collections are available for research purposes under proper user agreements.

Datasets

User agreement

This collection can only be used for research purposes. If you are interested in accessing this data, please fill in the following user agreement and send it to anxo.pvila@udc.es.

Task 1

The collection contains sentences written by redditors. It is formatted in the TREC style:

<DOC> 
  <DOCNO> SENTENCE_ID </DOCNO> 
    <PRE> previous sentence text  </PRE> 
    <TEXT>  sentence text  </TEXT> 
    <POST> next sentence text  </POST> 
</DOC>
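
As an illustration of how such a file could be read, here is a minimal Python sketch. It is not an official loader. It assumes that the file is a plain concatenation of <DOC> blocks like the one above and that the sentence text never contains a literal closing tag. The names SentenceDoc and parse_trec, and the identifier s_1 in the sample, are hypothetical. A regular expression is used instead of an XML parser because a concatenation of <DOC> elements has no single root element.

```python
import re
from typing import Iterator, NamedTuple


class SentenceDoc(NamedTuple):
    """One <DOC> entry: the sentence plus its previous/next context."""
    docno: str
    pre: str
    text: str
    post: str


# Non-greedy match of everything between <DOC> and </DOC>, across newlines.
_DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)


def _field(block: str, tag: str) -> str:
    """Return the stripped content of <tag>...</tag>, or "" if the tag is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_trec(raw: str) -> Iterator[SentenceDoc]:
    """Yield one SentenceDoc per <DOC> block found in the raw text."""
    for match in _DOC_RE.finditer(raw):
        block = match.group(1)
        yield SentenceDoc(
            docno=_field(block, "DOCNO"),
            pre=_field(block, "PRE"),
            text=_field(block, "TEXT"),
            post=_field(block, "POST"),
        )


# Hypothetical sample mirroring the layout shown above.
sample = """<DOC>
  <DOCNO> s_1 </DOCNO>
    <PRE> previous sentence text  </PRE>
    <TEXT>  sentence text  </TEXT>
    <POST> next sentence text  </POST>
</DOC>"""

docs = list(parse_trec(sample))
```

Surrounding whitespace is stripped from every field, so the padding inside the tags does not leak into the sentence identifiers or texts.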

Task 2

In the dataset, there are two types of instances: submissions and comments. Submissions represent the primary posts created by users. They are the main content entries, often containing a title, a body, and additional metadata such as the author and date. Comments are the responses or replies made by users to a submission or to other comments, forming a hierarchical structure. Each comment includes information about the author, content, and its parent (which could be another comment or a submission).

Submission Fields:

submissionId: A unique string identifier for the submission.
author: The identifier (nickname) of the user who created the submission.
date: The timestamp indicating when the submission was created, in ISO 8601 format.
body: The main content of the submission (the text body).
title: The title of the submission, summarizing its content.
number: The round number of the submission. A value of 0 indicates the first writing of the subject.
targetSubject: The identifier (nickname) of the primary subject related to the submission.
comments: A list of comments associated with the submission, where each comment includes its own fields.

Comment Fields: