This document provides all the details needed to access the eRisk 2025 research collection.

Any scientific publication derived from the use of this collection should explicitly refer to and cite the following publications:

1. Crestani, F., Losada, D., & Parapar, J. (2022). Early Detection of Mental Health Disorders by Social Media Monitoring. Studies in Computational Intelligence, 1018, 4.

2. Parapar, J., Perez, A., Wang, X., & Crestani, F. (2025). eRisk 2025: Contextual and Conversational Approaches for Depression Challenges. In European Conference on Information Retrieval (pp. 416–424).

The eRisk 2025 collections are available for research purposes under proper user agreements.

Datasets

User agreement

This collection can only be used for research purposes. If you are interested in accessing the data, please fill in the following user agreement and send it to anxo.pvila@udc.es.

Task 1

The collection contains sentences written by Reddit users. It is formatted in the TREC style:
<DOC>
  <DOCNO> SENTENCE_ID </DOCNO>
  <PRE> previous sentence text </PRE>
  <TEXT> sentence text </TEXT>
  <POST> next sentence text </POST>
</DOC>
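As a rough illustration, the sketch below reads Task 1 records into Python objects using regular expressions rather than an XML parser, because a TREC-style file may not be well-formed XML (for example, a file holding many <DOC> elements has no single root element). It assumes each <DOC> contains at most one of each tag exactly as shown above, with no nesting or character escaping. The Sentence class and the parse_trec and _field helpers are hypothetical names, not part of any official eRisk tooling.

```python
import re
from dataclasses import dataclass
from typing import List


# Hypothetical record type for one Task 1 sentence and its context.
@dataclass
class Sentence:
    sentence_id: str
    pre: str
    text: str
    post: str


# Non-greedy match of each <DOC>...</DOC> block; DOTALL lets content span lines.
_DOC = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)


def _field(doc: str, tag: str) -> str:
    """Return the stripped content of <tag>...</tag>, or "" if the tag is missing."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", doc, re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_trec(raw: str) -> List[Sentence]:
    """Split a TREC-style string into Sentence records, one per <DOC>."""
    return [
        Sentence(
            sentence_id=_field(doc, "DOCNO"),
            pre=_field(doc, "PRE"),
            text=_field(doc, "TEXT"),
            post=_field(doc, "POST"),
        )
        for doc in _DOC.findall(raw)
    ]


# Example input reusing the placeholders from the format description above.
example = """<DOC>
  <DOCNO> SENTENCE_ID </DOCNO>
  <PRE> previous sentence text </PRE>
  <TEXT> sentence text </TEXT>
  <POST> next sentence text </POST>
</DOC>"""

for sentence in parse_trec(example):
    print(sentence.sentence_id, "->", sentence.text)  # prints: SENTENCE_ID -> sentence text
```

On a real file, the same function would be applied to the whole file contents read as a string. If the released files turn out to be valid XML with escaped characters, an XML parser would be the safer choice.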

Task 2

The dataset contains two types of instances: submissions and comments. Submissions are the primary posts created by users; they are the main content entries and typically contain a title, a body, and metadata such as the author and date. Comments are replies made by users to a submission or to other comments, forming a hierarchical structure. Each comment includes its author, its content, and its parent (either another comment or a submission).
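Because each comment stores only a pointer to its parent, the reply tree has to be rebuilt before it can be traversed. The following is a minimal sketch, assuming the field names that appear in the JSON example of this section (submissionId, comments, commentId, parent) and a submission already loaded as a Python dict. The helpers build_reply_index, walk_thread and find_orphans are hypothetical, and the toy thread uses made-up ids rather than data from the collection.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_reply_index(submission: dict) -> Dict[str, List[dict]]:
    """Map each parent id (the submission or a comment) to its direct replies."""
    children: Dict[str, List[dict]] = defaultdict(list)
    for comment in submission.get("comments", []):
        children[comment["parent"]].append(comment)
    return children


def walk_thread(submission: dict) -> List[Tuple[int, str]]:
    """Depth-first walk returning (depth, commentId); direct replies to the post have depth 1."""
    children = build_reply_index(submission)
    order: List[Tuple[int, str]] = []
    # Push in reverse so sibling replies are visited in their original file order.
    stack = [(1, c) for c in reversed(children[submission["submissionId"]])]
    while stack:
        depth, comment = stack.pop()
        order.append((depth, comment["commentId"]))
        for reply in reversed(children[comment["commentId"]]):
            stack.append((depth + 1, reply))
    return order


def find_orphans(submission: dict) -> List[str]:
    """Return ids of comments whose parent is neither the submission nor a comment in it."""
    comments = submission.get("comments", [])
    known = {submission["submissionId"]} | {c["commentId"] for c in comments}
    return [c["commentId"] for c in comments if c["parent"] not in known]


# Toy thread with hypothetical ids (not taken from the collection).
thread = {
    "submissionId": "S1",
    "comments": [
        {"commentId": "A", "parent": "S1"},
        {"commentId": "B", "parent": "A"},
        {"commentId": "C", "parent": "S1"},
        {"commentId": "D", "parent": "MISSING"},
    ],
}
print(walk_thread(thread))   # [(1, 'A'), (2, 'B'), (1, 'C')]
print(find_orphans(thread))  # ['D']
```

Comments whose parent is not present in the same submission are never reached by walk_thread; find_orphans lists them so they can be handled separately, for instance by attaching them at the top level.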

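Submission Fields:

submissionId: identifier of the submission.
author: identifier of the user who wrote the submission.
date: creation date and time (ISO 8601 with UTC offset).
body: text of the submission.
title: title of the submission.
number: integer value (see the example below).
targetSubject: subject identifier (see the example below).
comments: list of the comments in the thread.

Comment Fields:

commentId: identifier of the comment.
author: identifier of the user who wrote the comment.
date: creation date and time (ISO 8601 with UTC offset).
body: text of the comment.
parent: identifier of the submission or comment this comment replies to.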

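The files are in JSON format. Each file is an array of submissions, and each submission nests its comments (text fields and some comments are elided with "..." in the example below):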
        [
        {
            "submissionId": "mdB60ef",
            "author": "subject_lEQN6dA",
            "date": "2023-03-08T17:26:33.000+00:00",
            "body": "...",
            "title": "...",
            "number": 3,
            "targetSubject": "subject_6wEJkcb",
            "comments": [
                {
                    "commentId": "UspY8Bg",
                    "author": "subject_6wEJkcb",
                    "date": "2023-03-08T17:51:42.000+00:00",
                    "body": "...",
                    "parent": "mdB60ef"
                },
                ...
                {
                    "commentId": "nsnT1GB",
                    "author": "subject_ifthvcc",
                    "date": "2023-03-22T19:15:33.000+00:00",
                    "body": "...",
                    "parent": "bmC4ctO"
                }
            ]
        },
        {
            "submissionId": "0F6QmWR",
            "author": "subject_Wotqigb",
            "date": "2024-11-02T20:53:53.000+00:00",
            "body": "...",
            "title": "...",
            "number": 3,
            "targetSubject": "subject_pypfjky",
            "comments": [
                {
                    "commentId": "Oeas2Wu",
                    "author": "subject_pypfjky",
                    "date": "2024-11-02T21:55:41.000+00:00",
                    "body": "...",
                    "parent": "K3Z1yt8"
                },
                {
                    "commentId": "5CTC18p",
                    "author": "subject_2DDad7j",
                    "date": "2024-11-02T21:03:09.000+00:00",
                    "body": "...",
                    "parent": "0F6QmWR"
                },
                ...
                {
                    "commentId": "ZqEqil6",
                    "author": "subject_pypfjky",
                    "date": "2024-11-02T21:09:50.000+00:00",
                    "body": "...",
                    "parent": "0F6QmWR"
                }
            ]
        }
        ]
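To work with whole files, one option is to load each one with Python's standard json module and regroup submissions and comments by author in date order. This sketch assumes a file is a single JSON array shaped like the example above (the "..." placeholders stand in for real content). load_submissions and texts_by_author are illustrative names, and submission titles are ignored for brevity.

```python
import json
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# (timestamp, submissionId or commentId, body)
Entry = Tuple[datetime, str, str]


def load_submissions(path: str) -> List[dict]:
    """Read one Task 2 file, assumed to be a single JSON array of submissions."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def texts_by_author(submissions: List[dict]) -> Dict[str, List[Entry]]:
    """Group submission and comment bodies by author, each list sorted by date."""
    grouped: Dict[str, List[Entry]] = defaultdict(list)
    for sub in submissions:
        # fromisoformat parses values such as "2023-03-08T17:26:33.000+00:00".
        grouped[sub["author"]].append(
            (datetime.fromisoformat(sub["date"]), sub["submissionId"], sub["body"])
        )
        for com in sub.get("comments", []):
            grouped[com["author"]].append(
                (datetime.fromisoformat(com["date"]), com["commentId"], com["body"])
            )
    for entries in grouped.values():
        entries.sort(key=lambda entry: entry[0])  # chronological order per author
    return dict(grouped)
```

A call such as texts_by_author(load_submissions("task2_file.json")) would produce the per-author timelines, where task2_file.json is a placeholder path. Sorting matters because comments are not necessarily listed chronologically: in the example above, Oeas2Wu (21:55:41) appears before 5CTC18p (21:03:09). datetime.fromisoformat accepts the millisecond-plus-offset format used in the sample from Python 3.7 onward.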