CLEF 2019 Workshop
Lugano, 09-12 September 2019
eRisk explores the evaluation methodology, effectiveness metrics and practical applications of early risk detection on the Internet. Early detection technologies can be employed in different areas, particularly those related to health and safety. For instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum or social network. Our main goal is to pioneer a new interdisciplinary research area that would be potentially applicable to a wide variety of situations and to many different personal profiles. Examples include potential paedophiles, stalkers, individuals who could fall into the hands of criminal organisations, people with suicidal inclinations, or people susceptible to depression.
This is the third year of eRisk and the lab plans to organize three tasks:
This is a continuation of eRisk 2018's T2 task.
The challenge consists of performing early risk detection of anorexia: systems sequentially process pieces of evidence and must detect early traces of anorexia as soon as possible. The task is mainly concerned with evaluating Text Mining solutions and, thus, it concentrates on texts written in Social Media. Texts should be processed in the order they were created. In this way, systems that effectively perform this task could be applied to sequentially monitor user interactions in blogs, social networks, or other types of online media.
The test collection for this task has the same format as the collection described in [Losada & Crestani 2016]. The source of data is also the same as that used for eRisk 2017 and 2018. It is a collection of writings (posts or comments) from a set of Social Media users. There are two categories of users, anorexia and non-anorexia, and, for each user, the collection contains a sequence of writings (in chronological order).
In 2019, we move from a chunk-based release of data (used in 2017 and 2018) to an item-by-item release of data. We set up a server that iteratively gives user writings to the participating teams. More information about the server is given here.
The task is organized into two different stages:
Evaluation: The evaluation will take into account not only the correctness of the system's output (i.e. whether or not the user shows signs of anorexia) but also the delay taken to emit its decision. To this end, we will consider the ERDE metric proposed in [Losada & Crestani 2016] and other alternative evaluation measures.
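As a rough illustration of how a delay-aware measure such as ERDE combines correctness and latency, the sketch below follows the general form given in [Losada & Crestani 2016]: false positives and false negatives incur fixed costs, true negatives cost nothing, and true positives are penalised by a sigmoid latency factor that depends on the number of writings k seen before the decision and a cut-off parameter o. The function names and the default cost values here are illustrative assumptions, not the lab's official evaluation code.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """Latency factor lc_o(k) = 1 - 1 / (1 + e^(k - o)).

    Written in a numerically stable form so that very long user
    histories (large k) do not overflow math.exp.
    """
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def erde(decision: bool, truth: bool, k: int, o: int,
         c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Error for one user, given the decision and the delay k.

    c_fp, c_fn and c_tp are cost parameters; the defaults used here
    are assumptions chosen only for illustration.
    """
    if decision and not truth:
        return c_fp                      # false positive
    if not decision and truth:
        return c_fn                      # false negative
    if decision and truth:
        return latency_cost(k, o) * c_tp  # late true positives cost more
    return 0.0                           # true negative
```

Under these assumptions, a correct alert emitted after exactly o writings costs half of c_tp, and the cost approaches c_tp as the delay grows, so a very late correct alert is penalised almost as heavily as a missed case. The overall score would be the mean of this value over all users.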
The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.
To have access to the collection, all participants have to fill in, sign and send a user agreement form (follow the instructions provided here). Once you have submitted the signed form, you can proceed to register for the lab at the CLEF 2019 Labs Registration site.
Important Dates
This is a new task in 2019. Essentially, it has the same format as T1 (but T2 has no training data!).
The challenge consists of performing early risk detection of signs of self-harm: systems sequentially process pieces of evidence and must detect early traces of self-harm as soon as possible. The task is mainly concerned with evaluating Text Mining solutions and, thus, it concentrates on texts written in Social Media. Texts should be processed in the order they were created. In this way, systems that effectively perform this task could be applied to sequentially monitor user interactions in blogs, social networks, or other types of online media.
The test collection for this task has the same format as the collection described in [Losada & Crestani 2016]. The source of data is also the same as that used for eRisk 2017 and 2018. It is a collection of writings (posts or comments) from a set of Social Media users. There are two categories of users, self-harm and non-self-harm, and, for each user, the collection contains a sequence of writings (in chronological order).
In 2019, we move from a chunk-based release of data (used in 2017 and 2018) to an item-by-item release of data. We set up a server that iteratively gives user writings to the participating teams. More information about the server is given here.
T2 has only a test stage (no training stage) and, therefore, we encourage participants to design their own unsupervised (e.g. search-based) strategies to detect possible cases of self-harm. The test stage will consist of a period of time during which the participants have to connect to our server, iteratively get user writings and send responses.
Evaluation: The evaluation will take into account not only the correctness of the system's output (i.e. whether or not the user shows signs of self-harm) but also the delay taken to emit its decision. To this end, we will consider the ERDE metric proposed in [Losada & Crestani 2016] and other alternative evaluation measures.
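To make the idea of an unsupervised, sequential strategy concrete, here is a minimal sketch of one possible approach: a lexicon lookup that accumulates evidence writing by writing and raises an alert as soon as a threshold is reached. Everything in it is a hypothetical illustration: the function name, the lexicon argument and the threshold are assumptions, and the exchange with the server is not modelled (the writings are simply passed in as an iterable in chronological order).

```python
from typing import Iterable, Optional, Set


def first_alert_position(writings: Iterable[str],
                         lexicon: Set[str],
                         threshold: int = 2) -> Optional[int]:
    """Process writings in order and return the 1-based position of the
    writing at which the accumulated number of lexicon matches first
    reaches the threshold, or None if no alert is ever raised.

    The lexicon and threshold are hypothetical parameters; no training
    data is used, in line with a test-only setting.
    """
    hits = 0
    for position, text in enumerate(writings, start=1):
        lowered = text.lower()
        # Count each lexicon term at most once per writing.
        hits += sum(1 for term in lexicon if term in lowered)
        if hits >= threshold:
            return position  # alert emitted after this many writings
    return None
```

Because the delay of the decision matters, the returned position is the quantity a latency-aware measure would penalise: a lower threshold alerts earlier but risks more false positives.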
The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.
To have access to the collection, all participants have to fill in, sign and send a user agreement form (follow the instructions provided here). Once you have submitted the signed form, you can proceed to register for the lab at the CLEF 2019 Labs Registration site.
This is a new task in 2019. The task consists of estimating the level of depression from a thread of user submissions. For each user, the participants will be given a history of postings and will have to fill in a standard depression questionnaire (based on the evidence found in the history of postings).
The questionnaires are defined from Beck's Depression Inventory (BDI), which assesses the presence of feelings like sadness, pessimism, loss of energy, etc. The questionnaire has the following 21 questions:
Instructions: This questionnaire consists of 21 groups of statements. Please read each group of statements carefully, and then pick out the one statement in each group that best describes the way you feel. If several statements in the group seem to apply equally well, choose the highest number for that group.

1. Sadness
   0. I do not feel sad.
   1. I feel sad much of the time.
   2. I am sad all the time.
   3. I am so sad or unhappy that I can't stand it.
2. Pessimism
   0. I am not discouraged about my future.
   1. I feel more discouraged about my future than I used to be.
   2. I do not expect things to work out for me.
   3. I feel my future is hopeless and will only get worse.
3. Past Failure
   0. I do not feel like a failure.
   1. I have failed more than I should have.
   2. As I look back, I see a lot of failures.
   3. I feel I am a total failure as a person.
4. Loss of Pleasure
   0. I get as much pleasure as I ever did from the things I enjoy.
   1. I don't enjoy things as much as I used to.
   2. I get very little pleasure from the things I used to enjoy.
   3. I can't get any pleasure from the things I used to enjoy.
5. Guilty Feelings
   0. I don't feel particularly guilty.
   1. I feel guilty over many things I have done or should have done.
   2. I feel quite guilty most of the time.
   3. I feel guilty all of the time.
6. Punishment Feelings
   0. I don't feel I am being punished.
   1. I feel I may be punished.
   2. I expect to be punished.
   3. I feel I am being punished.
7. Self-Dislike
   0. I feel the same about myself as ever.
   1. I have lost confidence in myself.
   2. I am disappointed in myself.
   3. I dislike myself.
8. Self-Criticalness
   0. I don't criticize or blame myself more than usual.
   1. I am more critical of myself than I used to be.
   2. I criticize myself for all of my faults.
   3. I blame myself for everything bad that happens.
9. Suicidal Thoughts or Wishes
   0. I don't have any thoughts of killing myself.
   1. I have thoughts of killing myself, but I would not carry them out.
   2. I would like to kill myself.
   3. I would kill myself if I had the chance.
10. Crying
   0. I don't cry anymore than I used to.
   1. I cry more than I used to.
   2. I cry over every little thing.
   3. I feel like crying, but I can't.
11. Agitation
   0. I am no more restless or wound up than usual.
   1. I feel more restless or wound up than usual.
   2. I am so restless or agitated that it's hard to stay still.
   3. I am so restless or agitated that I have to keep moving or doing something.
12. Loss of Interest
   0. I have not lost interest in other people or activities.
   1. I am less interested in other people or things than before.
   2. I have lost most of my interest in other people or things.
   3. It's hard to get interested in anything.
13. Indecisiveness
   0. I make decisions about as well as ever.
   1. I find it more difficult to make decisions than usual.
   2. I have much greater difficulty in making decisions than I used to.
   3. I have trouble making any decisions.
14. Worthlessness
   0. I do not feel I am worthless.
   1. I don't consider myself as worthwhile and useful as I used to.
   2. I feel more worthless as compared to other people.
   3. I feel utterly worthless.
15. Loss of Energy
   0. I have as much energy as ever.
   1. I have less energy than I used to have.
   2. I don't have enough energy to do very much.
   3. I don't have enough energy to do anything.
16. Changes in Sleeping Pattern
   0. I have not experienced any change in my sleeping pattern.
   1a. I sleep somewhat more than usual.
   1b. I sleep somewhat less than usual.
   2a. I sleep a lot more than usual.
   2b. I sleep a lot less than usual.
   3a. I sleep most of the day.
   3b. I wake up 1-2 hours early and can't get back to sleep.
17. Irritability
   0. I am no more irritable than usual.
   1. I am more irritable than usual.
   2. I am much more irritable than usual.
   3. I am irritable all the time.
18. Changes in Appetite
   0. I have not experienced any change in my appetite.
   1a. My appetite is somewhat less than usual.
   1b. My appetite is somewhat greater than usual.
   2a. My appetite is much less than before.
   2b. My appetite is much greater than usual.
   3a. I have no appetite at all.
   3b. I crave food all the time.
19. Concentration Difficulty
   0. I can concentrate as well as ever.
   1. I can't concentrate as well as usual.
   2. It's hard to keep my mind on anything for very long.
   3. I find I can't concentrate on anything.
20. Tiredness or Fatigue
   0. I am no more tired or fatigued than usual.
   1. I get more tired or fatigued more easily than usual.
   2. I am too tired or fatigued to do a lot of the things I used to do.
   3. I am too tired or fatigued to do most of the things I used to do.
21. Loss of Interest in Sex
   0. I have not noticed any recent change in my interest in sex.
   1. I am less interested in sex than I used to be.
   2. I am much less interested in sex now.
   3. I have lost interest in sex completely.
This task therefore aims at exploring the viability of automatically estimating the severity of multiple symptoms associated with depression. Given the user's history of writings, the algorithms have to estimate the user's response to each individual question. We collected questionnaires filled in by Social Media users together with their history of writings (we extracted each history of writings right after the user provided us with the completed questionnaire). The questionnaires filled in by the users (ground truth) will be used to assess the quality of the responses provided by the participating systems.
The participants will be given a dataset with multiple users (for each user, their history of writings is provided) and they will be asked to produce a file with the following structure:
username1 answer1 answer2 .... answer21
username2 ....
....
Each line has the username and 21 values. These values correspond to the responses to the questions above (the possible values are 0, 1a, 1b, 2a, 2b, 3a, 3b for questions 16 and 18, and 0, 1, 2, 3 for the rest of the questions).
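The file structure above can be produced and checked with a few lines of code. The following is a minimal sketch, assuming that fields are separated by a single space (the description shows spaces but does not state the separator explicitly); the function and constant names are illustrative only.

```python
from typing import Sequence

# Allowed values as described above; questions 16 and 18 use a/b variants.
ALLOWED_STANDARD = {"0", "1", "2", "3"}
ALLOWED_AB = {"0", "1a", "1b", "2a", "2b", "3a", "3b"}
AB_QUESTIONS = {16, 18}


def format_line(username: str, answers: Sequence[str]) -> str:
    """Validate the 21 answers for one user and return one output line."""
    if len(answers) != 21:
        raise ValueError(f"expected 21 answers, got {len(answers)}")
    for number, answer in enumerate(answers, start=1):
        allowed = ALLOWED_AB if number in AB_QUESTIONS else ALLOWED_STANDARD
        if answer not in allowed:
            raise ValueError(f"question {number}: invalid answer {answer!r}")
    # Assumption: fields are separated by a single space.
    return " ".join([username, *answers])
```

Validating before writing catches the most likely formatting mistake, namely using an a/b value outside questions 16 and 18 or a plain 1, 2 or 3 inside them.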
Evaluation will be based on:
the overlap between the questionnaire filled in by the real user and the questionnaire filled in by the system (number of correct responses).
the absolute difference between the levels of depression obtained from both questionnaires (level of depression obtained from the real questionnaire vs level of depression obtained from the estimated questionnaire). The level of depression is simply obtained by summing the numeric values of the responses to the individual questions. This gives an integer value in the range 0-63.
the depression level obtained from this questionnaire is regularly used to categorize users as: minimal depression (0-9), mild depression (10-18), moderate depression (19-29), and severe depression (30-63). A third method of evaluation will consist of assessing the systems in terms of how many users are correctly categorized (automatic questionnaire vs real questionnaire).
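The three evaluation criteria above can be sketched as follows. This is an illustrative reading of the description, not the official scoring script: in particular, it assumes that the numeric value of an a/b answer such as 2b is its leading digit, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def numeric_value(answer: str) -> int:
    """Numeric part of an answer; assumes '2a' and '2b' both count as 2."""
    return int(answer[0])


def depression_level(answers: Sequence[str]) -> int:
    """Sum of the numeric values of the 21 answers (range 0-63)."""
    return sum(numeric_value(a) for a in answers)


def category(level: int) -> str:
    """Map a depression level to the categories listed above."""
    if level <= 9:
        return "minimal"
    if level <= 18:
        return "mild"
    if level <= 29:
        return "moderate"
    return "severe"


def compare(real: Sequence[str],
            estimated: Sequence[str]) -> Tuple[int, int, bool]:
    """Return (number of matching answers, absolute difference between
    depression levels, whether both fall in the same category)."""
    hits = sum(1 for r, e in zip(real, estimated) if r == e)
    real_level = depression_level(real)
    estimated_level = depression_level(estimated)
    difference = abs(real_level - estimated_level)
    same_category = category(real_level) == category(estimated_level)
    return hits, difference, same_category
```

For example, a real questionnaire answering 1 to every question (level 21, moderate) compared with an estimate that differs only on the last question, answering 3 there (level 23, moderate), gives 20 correct responses, a level difference of 2, and a matching category.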
08/11/2018
30/11/2018
14/01/2019
03/03/2019
10/04/2019
24/04/2019
24/05/2019
15/06/2019
29/06/2019