23 July 2020
Validating digital assessments for real-world testing
The power of remote assessments lies in the flexibility of when and where they can be administered. However, with this increased flexibility comes a responsibility to collect data at a frequency that is meaningful but not burdensome for the patient. This article covers how to navigate this tricky process and strike that balance.
Conducting research remotely during and after COVID-19
While living through a pandemic has taught us not to take face-to-face interactions for granted, it has also taught us the importance of being equipped with the right tools to carry on as normal (or close to it) when face-to-face interactions are not possible.
Psychological research is, in principle, a field that should be hindered by social distancing measures, since they prevent researchers from conducting face-to-face assessments. Fortunately, the field was already moving towards remote assessment, using online platforms and personal devices to gather data from individuals as they go about their normal lives (1–3). Many projects have therefore been able to progress, either as intended or with amendments to the study design, by relying on digital tools in field settings.
It is quite plausible that remote research will become increasingly popular, and perhaps even the new norm (4). Firstly, society is now more attuned to the possibility of future pandemics and their widespread effects; researchers and participants may not be as willing to take part in close-contact experiments as they once were. Secondly, researchers who traditionally did not rely on remote assessments may now discover the benefits of these tools (especially as the technologies improve), which could change the way they operate in the future. It is therefore vital that resources are invested in the development of remote assessments.
Determining the suitability of an assessment for remote research
Remote assessments have flexibility in when (e.g., 3 times per day) and where (e.g., at the pub) they can be administered, avoiding recall and experimenter biases, and potentially increasing the external validity of the research. However, having less control over extraneous factors makes it difficult to demonstrate robust internal validity of an assessment. This limitation can be counterbalanced by increasing the number of observations per participant to increase the signal-to-noise ratio (5). However, striking the right balance between the validity of the data and the feasibility of the methods is key. It is unrealistic to expect participants to spend significant amounts of their time completing assessments on a regular basis. Therefore, field assessments will need to be either brief or infrequent. This should minimize non-compliance and attrition, yet still provide enough data to decrease variability among measurements and increase confidence that measurements are correctly detecting the presence or absence of the symptom of interest (6).
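The signal-to-noise point can be made concrete with a small simulation. The sketch below (a hypothetical illustration, not taken from any cited study; the "true" symptom level and noise figures are invented for demonstration) shows how averaging more momentary measurements per participant shrinks the spread of the resulting estimate:

```python
import random
import statistics

random.seed(42)

TRUE_SCORE = 10.0   # hypothetical participant's stable symptom level
NOISE_SD = 4.0      # assumed momentary fluctuation / measurement noise

def mean_estimate(n_observations):
    """Average n noisy momentary measurements into one estimate."""
    samples = [random.gauss(TRUE_SCORE, NOISE_SD) for _ in range(n_observations)]
    return statistics.mean(samples)

# Spread of the estimate across 1,000 simulated participants,
# for a sparse vs. a dense sampling schedule.
for n in (3, 30):
    estimates = [mean_estimate(n) for _ in range(1000)]
    print(f"{n:>2} observations: SD of estimate = {statistics.stdev(estimates):.2f}")
```

With 30 observations rather than 3, the standard deviation of the estimate falls by roughly a factor of sqrt(10), which is the statistical rationale for accepting brief but frequent field measurements.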
Assessments that are lengthy and cannot be abbreviated (because sacrificing items or trials would disrupt the assessment's psychometric properties) would be difficult to administer at high frequencies without overly inconveniencing users. If they can be administered infrequently, and do not have to be completed in real time, they can instead be administered online (e.g., once a week, or at the end of each day). This approach is useful when the priority is to obtain a comprehensive evaluation and, as a result, a degree of bias from poor recall is deemed an acceptable trade-off (e.g., when qualitative data are needed, as in a clinical diagnostic interview).
When measurements are only meaningful if they are captured after a relevant event occurs, the frequency and regularity of assessments should depend on the frequency and regularity with which relevant events occur. Otherwise, data will be collected unnecessarily, which has ethical implications, wastes resources, and could impact data quality (e.g., by decreasing compliance) (7).
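One way to tie assessment frequency to event frequency is an event-contingent prompting rule with a cool-down, so that clustered events do not trigger redundant assessments. The sketch below is a minimal illustration of that idea under invented parameters (the two-hour cool-down and the event times are hypothetical, not drawn from the cited work):

```python
from datetime import datetime, timedelta

# Hypothetical event-contingent design: prompt an assessment only after
# a relevant event (e.g., a logged craving episode), with a cool-down so
# participants are not prompted repeatedly for clustered events.
COOLDOWN = timedelta(hours=2)

def prompts_for(events, cooldown=COOLDOWN):
    """Return the event times that should trigger an assessment prompt."""
    prompts = []
    last = None
    for t in sorted(events):
        if last is None or t - last >= cooldown:
            prompts.append(t)
            last = t
    return prompts

day = datetime(2020, 7, 23)
events = [day + timedelta(hours=h) for h in (9, 9.5, 12, 18, 19)]
print(len(prompts_for(events)))  # prints 3: clustered events collapse into fewer prompts
```

The cool-down is the lever for the trade-off described above: lengthening it reduces participant burden at the cost of missing some events, while shortening it does the reverse.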
There are also assessments that are brief and appropriate to administer at high frequencies but may not be suitable for remote testing due to other considerations. For example, if an assessment traditionally requires trained personnel to deliver and supervise it, creative solutions are needed to replace the human element.
Validating low-frequency assessments for remote research
To validate a novel assessment for remote data collection, the novel assessment is administered remotely, and the resulting outcomes are compared with those generated by an established assessment. The established assessment is administered either in a testing facility or remotely (depending on whether it has already been validated for remote data collection). While remote data collection has disadvantages, these may be offset by its advantages, and research suggests that data collected online and offline are comparable (1,8–12). On one hand, an experimenter is not present to intervene if participants are incapacitated, disengaged, or require clarification (13), and there may be sampling bias, in that participants are more likely to have computer and internet access and greater technology proficiency (1,12,14). On the other hand, the absence of an experimenter, and being studied in natural settings, may mitigate social facilitation/impairment and evaluation apprehension, allowing ecologically valid data to be captured (15). Participation also becomes accessible to individuals who might otherwise find it difficult to take part in face-to-face studies due to physical barriers (e.g., limiting health conditions, living in remote areas, lack of suitable transport) (1,14).
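In practice, the comparison between the novel remote assessment and the established one often comes down to the convergent validity of paired scores. The sketch below (all scores are invented for illustration; it is not a prescribed analysis from the cited studies) computes a Pearson correlation between the two administrations:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired scores: each participant completes the novel
# assessment remotely and the established assessment in a facility.
remote_scores = [12, 15, 9, 20, 17, 11, 14, 18]
facility_scores = [13, 16, 10, 19, 18, 12, 13, 17]

r = pearson_r(remote_scores, facility_scores)
print(f"convergent validity r = {r:.2f}")  # a high r supports remote validity
```

A correlation alone does not establish agreement (two assessments can correlate highly while differing by a constant offset), so in a real validation it would typically be supplemented with checks on mean differences or limits of agreement.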
Validating high-frequency assessments for remote research
Brief assessments that are designed to monitor dynamic changes in real time, and that are not limited by strict testing requirements (e.g., they do not require specialist equipment), are appropriate for high-frequency remote testing. A high-frequency field assessment can be validated against another high-frequency field assessment measuring the same construct, if one is available. If one is not yet available, or the comparison needs to be made to a more comprehensive assessment (e.g., a full diagnostic interview), the high-frequency field assessment can instead be validated against a low-frequency assessment administered in a more controlled environment (i.e., in a testing facility or online).
The data measured by a high-frequency field assessment and a low-frequency assessment will not be collected at the same time or in the same space. Therefore, the comparison will need to be made between the accumulation of high-frequency measurements taken over time in the field and a single measurement taken in a more controlled environment. Comparing the outcomes produced by these two complementary methods controls for confounding influences, such as the retrospective recall bias to which low-frequency assessments are subject (2) and the lack of control to which high-frequency assessments are subject.
By contrast, the data measured by two high-frequency assessments can be collected at the same time and in the same space. The resulting high-resolution data allow for in-depth exploration of inter-individual and intra-individual variability. This method can be used to improve the signal-to-noise ratio and, as a result, achieve more representative baselines and develop novel digital phenotypes that improve precision and accuracy in diagnosis and outcomes (16).
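The accumulation step described above amounts to aggregating each participant's stream of brief field measurements before comparing it with their single controlled measurement. The sketch below is a hypothetical illustration of that design (the cohort size, sampling schedule, and noise levels are invented assumptions, not figures from the cited literature):

```python
import random
import statistics

random.seed(1)

# Hypothetical setup: each participant has a true symptom level, a week of
# brief field assessments (3/day), and one comprehensive lab assessment.
participants = {pid: random.uniform(5, 25) for pid in range(6)}

def field_week(true_level, per_day=3, days=7, noise_sd=4.0):
    """Simulate a week of brief, noisy field measurements."""
    return [random.gauss(true_level, noise_sd) for _ in range(per_day * days)]

for pid, true_level in participants.items():
    field_mean = statistics.mean(field_week(true_level))     # accumulated field data
    lab_score = random.gauss(true_level, 1.0)                # single controlled measurement
    print(f"participant {pid}: field mean {field_mean:5.1f} vs lab {lab_score:5.1f}")
```

With 21 noisy field measurements per participant, the field mean tracks the underlying level closely enough to be compared meaningfully against the one-off controlled score, which is the logic behind validating a high-frequency tool against a low-frequency one.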
The far-reaching impact of the 2020 COVID-19 pandemic emphasizes the need for flexible approaches to conducting research, such as utilizing remote assessments. A remote assessment needs to be designed so that it can be administered repeatedly in natural settings, collecting a sufficient amount of data without overburdening its users. Because there are key temporal and spatial differences in how remote and traditional research tools are used, validity and feasibility need to be evaluated with each tool's unique spatial and temporal profile in mind. The fine granularity of the data and the increased external validity afforded by remote assessments compensate for limitations such as a lack of control over the study environment.
1. Woods AT, Velasco C, Levitan CA, Wan X, Spence C. Conducting perception research over the internet: A tutorial review. Vol. 2015, PeerJ. PeerJ Inc.; 2015. p. e1058.
2. Shiffman S, Stone AA, Hufford MR. Ecological Momentary Assessment. Annu Rev Clin Psychol. 2008;4(1):1–32.
3. Rajagopalan A, Shah P, Zhang MW, Ho RC. Digital platforms in the assessment and monitoring of patients with bipolar disorder. Brain Sci. 2017;7(11).
4. Drew DA, Nguyen LH, Steves CJ, Menni C, Freydin M, Varsavsky T, et al. Rapid implementation of mobile technology for real-time epidemiology of COVID-19. Science [Internet]. 2020 May 5 [cited 2020 May 12];eabc0473. Available from: https://www.sciencemag.org/lookup/doi/10.1126/science.abc0473
5. Smith PL, Little DR. Small is beautiful: In defense of the small-N design. Psychon Bull Rev [Internet]. 2018 Dec 1 [cited 2020 Jul 6];25(6):2083–101. Available from: https://doi.org/10.3758/s13423-018-1451-8
6. Maxwell SE, Kelley K, Rausch JR. Sample size planning for statistical power and accuracy in parameter estimation. Annu Rev Psychol. 2008;59:537–63. [cited 2020 Apr 3]. Available from: http://psych.annualreviews.org
7. Ebner-Priemer UW, Sawitzki G. Ambulatory assessment of affective instability in borderline personality disorder: The effect of the sampling frequency. 2007.
8. Assmann KE, Bailet M, Lecoffre AC, Galan P, Hercberg S, Amieva H, et al. Comparison between a self-administered and supervised version of a web-based cognitive test battery: Results from the nutri net-santé cohort study. J Med Internet Res [Internet]. 2016 Apr 1 [cited 2020 Apr 2];18(4):e68. Available from: http://www.ncbi.nlm.nih.gov/pubmed/27049114
9. Cromer JA, Harel BT, Yu K, Valadka JS, Brunwin JW, Crawford CD, et al. Comparison of Cognitive Performance on the Cogstate Brief Battery When Taken In-Clinic, In-Group, and Unsupervised. Clin Neuropsychol [Internet]. 2015 May 19 [cited 2020 Apr 2];29(4):542–58. Available from: http://www.ncbi.nlm.nih.gov/pubmed/26165425
10. Feenstra HEM, Murre JMJ, Vermeulen IE, Kieffer JM, Schagen SB. Reliability and validity of a self-administered tool for online neuropsychological testing: The Amsterdam Cognition Scan. J Clin Exp Neuropsychol [Internet]. 2018 Mar 16 [cited 2020 Apr 2];40(3):253–73. Available from: http://www.ncbi.nlm.nih.gov/pubmed/28671504
11. Silverstein SM, Berten S, Olson P, Paul R, Williams LM, Cooper N, et al. Development and validation of a World-Wide-Web-based neurocognitive assessment battery: WebNeuro. Behav Res Methods. 2007;39(4):940–9.
12. Lumsden J, Skinner A, Woods AT, Lawrence NS, Munafò M. The effects of gamelike features and test location on cognitive test performance and participant enjoyment. PeerJ. 2016 Jul 6;2016(7):e2184.
13. Bauer RM, Iverson GL, Cernich AN, Binder LM, Ruff RM, Naugle RI. Computerized Neuropsychological Assessment Devices: Joint Position Paper of the American Academy of Clinical Neuropsychology and the National Academy of Neuropsychology †. Arch Clin Neuropsychol [Internet]. 2012 [cited 2020 Apr 1];27:362–373. Available from: https://academic.oup.com/acn/article-abstract/27/3/362/4858
14. Skitka LJ, Sargis EG. The Internet as Psychological Laboratory. Annu Rev Psychol. 2006 Jan;57(1):529–55.
15. Yantz CJ, McCaffrey RJ. Social Facilitation Effect of Examiner Attention or Inattention to Computer-Administered Neuropsychological Tests: First Sign that the Examiner May Affect Results. Clin Neuropsychol [Internet]. 2007 Jun 29 [cited 2020 Apr 1];21(4):663–71. Available from: http://www.tandfonline.com/doi/abs/10.1080/13854040600788158
16. Cohen AS, Schwartz E, Le T, Cowan T, Cox C, Tucker R, et al. Validating digital phenotyping technologies for clinical use: the critical importance of “resolution.” Vol. 19, World Psychiatry. Blackwell Publishing Ltd; 2020. p. 114–5.
Dr Jennifer Ferrar is an experimental psychologist at the University of Bristol, specialising in the development of measures to explore mechanisms underlying health behaviours and develop relevant interventions, including collaborative projects with Cambridge Cognition.