Master - Lab - Automatic Extraction and Completeness Detection of Statistical Information
Supervisor: Anna-Marie (ortloff@cs.uni-bonn.de)
To conduct meta-analyses (for an example, see [1]) in Usable Security and Privacy (USECAP), statistical information such as participant numbers, effect sizes, and results of statistical hypothesis tests needs to be extracted from publications and assessed for completeness manually, which is time-consuming. Tools exist for checking the consistency of statistical test results [2], but they require the information to be reported correctly in APA style. In USECAP, the reporting of statistics is not always complete and does not always follow APA style. In prior student work, a wide range of regex patterns was developed to identify statistical concepts in scientific text. In this lab, your focus will be on automatically identifying information belonging to a limited number of hypothesis test types and assessing the completeness of this information.
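To illustrate the kind of consistency check referred to in [2]: such a check roughly amounts to recomputing the p-value from the reported test statistic and comparing it with the reported p-value. The following is a minimal sketch for a two-sided z-test, using only the Python standard library. The regex pattern, the function names, and the rounding-based comparison are illustrative assumptions, not the implementation of the tool described in [2].

```python
import math
import re

# Hypothetical pattern for APA-style z-test reports such as "z = 2.10, p = .036".
Z_REPORT = re.compile(r"z\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*=\s*(0?\.\d+)")


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_report(text: str):
    """Return a consistency verdict for the first z-test found, or None."""
    match = Z_REPORT.search(text)
    if match is None:
        return None
    z = float(match.group(1))
    p_text = match.group(2)
    reported_p = float(p_text)
    # Assumption: compare at the precision the authors reported.
    decimals = len(p_text.split(".")[1])
    recomputed_p = two_sided_p_from_z(z)
    consistent = abs(round(recomputed_p, decimals) - reported_p) < 1e-9
    return {
        "z": z,
        "reported_p": reported_p,
        "recomputed_p": recomputed_p,
        "consistent": consistent,
    }
```

A check like this only works when the statistic and p-value appear together in a recognizable format, which is why incomplete or non-APA reporting is a problem for automated tools.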
Your task
- Building on the prior student work and a paper on extracting methodological information from scientific papers [3], identify statistical information and group it by instance of a statistical hypothesis test
- For each identified hypothesis test, determine whether the reported information is complete enough for meta-analysis
It is possible to do this lab as a group.
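The two tasks above can be sketched as follows for a single test type. This is a minimal illustration, assuming APA-like t-test reports; the pattern is hypothetical and is not taken from the prior student work, and the set of fields required for meta-analysis (`REQUIRED`) is an assumption that would need to be defined as part of the lab.

```python
import re

# Hypothetical pattern: "t(38) = 2.45", optionally followed by
# ", p = .019" and ", d = 0.78". Real papers vary far more than this.
T_TEST = re.compile(
    r"\bt\s*\(\s*(?P<df>\d+(?:\.\d+)?)\s*\)\s*=\s*(?P<stat>-?\d*\.?\d+)"
    r"(?:\s*,\s*p\s*(?P<p_rel>[<=>])\s*(?P<p>0?\.\d+))?"
    r"(?:\s*,\s*d\s*=\s*(?P<d>-?\d*\.?\d+))?"
)

# Assumption: fields a t-test report needs to be usable in a meta-analysis.
REQUIRED = ("df", "stat", "p", "d")


def extract_t_tests(text: str):
    """Group extracted values by test instance and flag missing fields."""
    results = []
    for match in T_TEST.finditer(text):
        # Keep only the groups that actually matched for this instance.
        fields = {k: v for k, v in match.groupdict().items() if v is not None}
        missing = [name for name in REQUIRED if name not in fields]
        results.append(
            {
                "test": "t-test",
                "fields": fields,
                "missing": missing,
                "complete": not missing,
            }
        )
    return results


sample = (
    "Group A was faster, t(38) = 2.45, p = .019, d = 0.78. "
    "Errors did not differ, t(38) = 0.52."
)
for result in extract_t_tests(sample):
    print(result["complete"], result["missing"])
```

Here each regex match is treated as one instance of a hypothesis test, so values are grouped per test rather than collected as a flat list. In practice, the values belonging to one test may be spread over several sentences or a table, which a single pattern like this cannot handle.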
Literature to start with
- [1] Example of meta-analysis: Ma et al. (2019) Virtual Humans in Health-Related Interventions: A Meta-Analysis, accessible at https://www.researchgate.net/profile/Debaleena-Chattopadhyay/publication/331164999_Virtual_Humans_in_Health-Related_Interventions_A_Meta-Analysis/links/5c6a0b264585156b570300fc/Virtual-Humans-in-Health-Related-Interventions-A-Meta-Analysis.pdf
- [2] Nuijten et al. (2016) The prevalence of statistical reporting errors in psychology (1985 - 2013), accessible at https://mbnuijten.files.wordpress.com/2013/08/nuijtenetal_2016_reportingerrorspsychology.pdf
- [3] Example for automatic extraction of information from papers: Niksirat et al. (2023) Changes in Research Ethics, Openness, and Transparency in Empirical Studies between CHI 2017 and CHI 2022, accessible at https://dl.acm.org/doi/pdf/10.1145/3544548.3580848, and supplemental material at https://github.com/petlab-unil/replica
Prerequisites
- You should have basic knowledge of inferential statistical testing at the level of the Bachelor course “Usable Security and Privacy”. If necessary, we can provide English slides and German videos for self-study.