This document provides all the details needed to access the eRisk 2024 research collection.

Any scientific publication derived from the use of this collection should explicitly refer to the following publication:

Crestani, F., Losada, D. E., & Parapar, J. (2022). Early Detection of Mental Health Disorders by Social Media Monitoring. Springer, Cham.

The eRisk 2024 collections are available for research purposes under proper user agreements.

Data

Task 1

The collection contains sentences written by redditors. It is formatted in the TREC style:
<DOC> 
  <DOCNO> SENTENCE_ID </DOCNO> 
    <PRE> previous sentence text  </PRE> 
    <TEXT>  sentence text  </TEXT> 
    <POST> next sentence text  </POST> 
</DOC>
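
As an illustration of how records in this layout might be read, here is a minimal Python sketch. It assumes that a collection file is a plain concatenation of <DOC> records (so it has no single root element and may not be well-formed XML) and therefore uses regular expressions rather than an XML parser. The function name parse_trec and the sample string are hypothetical and are not part of the collection.

```python
import re

# Assumption: records are concatenated <DOC>...</DOC> blocks with the four
# fields shown above. Names below are illustrative, not part of eRisk.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
FIELD_RE = {
    name: re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    for name in ("DOCNO", "PRE", "TEXT", "POST")
}


def parse_trec(raw):
    """Yield one dict per <DOC> record found in the string `raw`."""
    for doc in DOC_RE.finditer(raw):
        body = doc.group(1)
        record = {}
        for name, pattern in FIELD_RE.items():
            match = pattern.search(body)
            # A missing field is stored as an empty string.
            record[name] = match.group(1).strip() if match else ""
        yield record


# Placeholder record copied from the layout above (not real data).
sample = """<DOC>
  <DOCNO> SENTENCE_ID </DOCNO>
    <PRE> previous sentence text  </PRE>
    <TEXT>  sentence text  </TEXT>
    <POST> next sentence text  </POST>
</DOC>"""

records = list(parse_trec(sample))
```

Each record comes back as a dictionary keyed by tag name, with surrounding whitespace stripped. If the real files escape special characters, an extra unescaping step would be needed; that detail is not specified here.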

Tasks 2 and 3

The collection contains textual interactions (posts or comments) from multiple users (Task 2: individuals suffering from anorexia vs. control users; Task 3: users with eating disorders of different severities). For each subject, a (usually long) history of writings (posts or comments from a social networking site) is available. Each history is stored as an XML file (one per subject) with the following structure:

<INDIVIDUAL>
<ID> ... </ID>
<WRITING>
<TITLE> ...   </TITLE>
<DATE> ... </DATE>
<INFO> ... </INFO>
<TEXT> ...  </TEXT>
</WRITING>
<WRITING>
<TITLE> ... </TITLE>
<DATE> ... </DATE>
<INFO> ... </INFO>
<TEXT> ... </TEXT>
</WRITING>
....
</INDIVIDUAL>

ID: anonymised ID of the subject

TITLE: title of the post if available (if it is a comment then TITLE is empty)

DATE: date of the post or comment

INFO: additional info about the writing (source of the post/comment)

TEXT: body of the post or comment
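
The structure above could be loaded with Python's standard xml.etree.ElementTree module, as in the following sketch. It assumes each subject file is well-formed XML with a single INDIVIDUAL root. The function name parse_subject and all sample values (subject ID, date, and source strings) are made-up placeholders, not taken from the collection.

```python
import xml.etree.ElementTree as ET


def _text_of(parent, tag):
    """Return the stripped text of the first child `tag`, or '' if absent or empty."""
    node = parent.find(tag)
    if node is None or node.text is None:
        return ""
    return node.text.strip()


def parse_subject(xml_string):
    """Parse one subject's history into a dict with its ID and list of writings."""
    root = ET.fromstring(xml_string)
    writings = [
        {tag: _text_of(writing, tag) for tag in ("TITLE", "DATE", "INFO", "TEXT")}
        for writing in root.findall("WRITING")
    ]
    return {"ID": _text_of(root, "ID"), "WRITINGS": writings}


# Placeholder content only; real files hold the subject's actual writings.
sample = """<INDIVIDUAL>
<ID> subject0001 </ID>
<WRITING>
<TITLE> a post title </TITLE>
<DATE> placeholder date </DATE>
<INFO> placeholder source </INFO>
<TEXT> body of the first post </TEXT>
</WRITING>
<WRITING>
<TITLE> </TITLE>
<DATE> placeholder date </DATE>
<INFO> placeholder source </INFO>
<TEXT> body of a comment </TEXT>
</WRITING>
</INDIVIDUAL>"""

subject = parse_subject(sample)
```

To read from disk, ET.parse(path).getroot() can replace ET.fromstring. Comments show up as writings whose TITLE is an empty string, which matches the field description above.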

User agreement

This collection can only be used for research purposes. If you are interested in accessing this data, please fill in the following user agreement and send it to david.losada@usc.es.