15 December 2020
Bringing home cognitive assessment: Comparing web-based and in-person CANTAB
In times of a pandemic, being able to deliver cognitive tests remotely becomes even more relevant and can prevent the interruption of clinical studies. But are cognitive assessments comparable in the clinic and the home? New research, published in the Journal of Medical Internet Research, demonstrates that performance on the Cambridge Neuropsychological Test Automated Battery (CANTAB™) is broadly comparable when delivered unsupervised online or in-person in the laboratory.
Cognitive function is typically assessed through one-to-one administration of a neuropsychological test in a clinic or lab setting by a trained psychometrician. In-person assessments come with significant costs, including the employment and training of staff, as well as time and travel costs for personnel and participants. In-clinic test administration may also limit participation to people who are willing and able to travel, leaving some communities under-represented in clinical research (e.g. individuals who are geographically isolated, non-drivers, physically disabled people, or those with agoraphobia or social phobia). In the current climate of a global pandemic, further barriers are added to the list, such as participants’ willingness to attend a clinic, or more general restrictions on clinic access for both participants and site staff.
The demand for remote testing is supported by developments in digital health technology, such as the availability of cognitive tests on a smartphone or smartwatch, or via the web on people’s own laptop or desktop computers. Web-based automated assessments are inexpensive and quick to conduct. They impose fewer restrictions on location, fit easily into daily schedules, and can reduce costs [2,3–5]. However, evidence was needed to show to what extent results from these two test environments are comparable. This study was designed to gain further insight into the extent of comparability between web-based cognitive testing and lab-based assessment on the Cambridge Neuropsychological Test Automated Battery (CANTAB).
To investigate this comparability, fifty-one healthy adults completed two testing sessions on average one week apart. Participants were randomly allocated to one of two groups, either assessed in-person in the laboratory first (n=33) or with unsupervised web-based assessment on their personal computing systems first (n=18). Emotion recognition, episodic memory, working memory, spatial planning and sustained attention were assessed on both occasions with CANTAB ERT, PRM, PAL, SWM, OTS and RVP.
For the in-person assessment, a trained psychometrician was present to provide technical support or additional instructions where required, and to log observations (e.g. distractions, problems) during task performance. The unsupervised web-based assessments were completed at home via the CANTAB Connect web-based testing feature. Web-based testing was enabled only on desktop or laptop computers, and not on touch-screen devices. Responses were logged via mouse or trackpad clicks. For both assessments, test administration and, where required, training were automated with voiceover guidance for each task.
Overall, this study provides supporting evidence for the comparability of a range of performance outcome measures examined using a web-based unsupervised administration of the CANTAB in a healthy adult sample. Certain performance outcome measures showed better comparability than others and should therefore be preferred for use where comparability with typical in-person assessment is required. Strict criteria for comparability were set in advance, including satisfactory reliability, equivalence, and agreement across testing modalities. Acceptable comparability, in line with the pre-defined criteria and broken down per outcome measure, is shown in Table 1. Criteria were defined as follows:
(a) Show high levels of reliability in relation to in-person assessments. Reliability criteria were met where intraclass correlation coefficients (ICCs) were ≥0.60. This cutoff is in line with previous research and interpretive recommendations for ICCs.
(b) Show equivalence with in-person tests. Equivalence criteria were met where there was no significant difference between performance levels across testing modalities in mixed effects models, and data supported the null hypothesis in Bayesian paired t-tests.
(c) Meet established thresholds for agreement. Agreement criteria were met where ≥95% of data points lay within the 95% limits of agreement on Bland-Altman plots, and there was no evidence of bias or proportional bias.
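To make criteria (a) and (c) concrete, the sketch below computes a single-measure, absolute-agreement intraclass correlation, ICC(2,1), and Bland-Altman bias with 95% limits of agreement for paired scores. This is a minimal pure-Python illustration of the standard formulas, not the study's analysis code; the paired `lab`/`web` scores are hypothetical, and the mixed effects models and Bayesian t-tests used for criterion (b) are not reproduced here.

```python
import statistics

def icc_2_1(session_a, session_b):
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n, k = len(session_a), 2
    pairs = list(zip(session_a, session_b))
    grand = statistics.mean(session_a + session_b)
    row_means = [statistics.mean(p) for p in pairs]
    col_means = [statistics.mean(session_a), statistics.mean(session_b)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def bland_altman(session_a, session_b):
    """Bias, 95% limits of agreement, and fraction of points inside them."""
    diffs = [a - b for a, b in zip(session_a, session_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    inside = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return bias, (lower, upper), inside

# Hypothetical paired scores (NOT the study's data): lab vs. web sessions.
lab = [10, 14, 12, 9, 16, 11, 13, 15]
web = [11, 13, 12, 10, 15, 12, 14, 14]
print(f"ICC(2,1) = {icc_2_1(lab, web):.2f}")  # >= 0.60 would meet criterion (a)
bias, loa, inside = bland_altman(lab, web)
print(f"bias = {bias:.2f}, 95% LoA = ({loa[0]:.2f}, {loa[1]:.2f}), "
      f"{inside:.0%} of points inside")       # >= 95% would meet criterion (c)
```

For real analyses, established implementations (e.g. the ICC routines in R's psych package or Python's pingouin) would normally be used instead of hand-rolled formulas.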
Table 1. (Tick = criteria met, Cross = criteria not met, – = analyses not completed). Abbreviations: PAL – Paired Associates Learning, OTS – One Touch Stockings of Cambridge, PRM – Pattern Recognition Memory, SWM – Spatial Working Memory, ERT – Emotion Recognition Task, PRM-D – Pattern Recognition Memory Delayed, RVP – Rapid Visual Information Processing
Two out of nine performance outcome measures met all pre-defined criteria for comparability between modalities. PAL Total Errors Adjusted and RVP A’ showed no difference between testing modalities, good reliability between test modalities, and good agreement. Additionally, for SWM Between Errors, agreement analyses could not be completed, but intraclass correlations were above threshold and there was no evidence of a performance difference between modalities. These measures are therefore determined to have good overall validity in relation to typical in-person assessment and are well suited to studies delivering tests via both modalities.
Correlations were moderate and significant across all other performance measures (p<.01), but did not reach pre-defined thresholds for cross-setting reliability, and some did not meet thresholds for agreement. This suggests that although these tests continue to tap into the same cognitive processes when completed during unsupervised web-based assessments, the resultant data show key differences. These outcome measures may therefore be less well suited to mixed study designs.
Reaction time indices were not found to be comparable between in-person and unsupervised web-based assessments, and greater care is required in the interpretation of latency results in relation to typical in-person assessments.
Whilst the above findings highlight the similarities and differences between supervised in-clinic assessments and unsupervised web-based assessments, several steps may bring test results closer together across settings. Using the same computing hardware across settings can improve response consistency, particularly for timed tasks. Coaching participants to optimise their at-home test environment and to minimise distractions can improve the fidelity of the test data in relation to what would be obtained in the laboratory. Previous studies have also used remote guided testing to provide examiner support and allow for behavioural observations of participants in an at-home test environment. These different test manipulations now need to be examined further to identify the right balance between participant and examiner burden and the consistency of the resultant data across settings. This balance will also depend on the needs of the population under study and the aims and resources of the individual study.
Interested in reading the full paper? See https://www.jmir.org/2020/8/e16792/
1. Morrison GE, Simone CM, Ng NF, Hardy JL. Reliability and validity of the NeuroCognitive Performance Test, A web-based neuropsychological assessment. Front Psychol 2015;6(NOV):1–15. PMID:26579035
2. Haworth C, Harlaar N, Kovas Y, Davis O, Bonamy O, Hayiou-Thomas ME, Frances J, Busfield P, McMillan A, Dale PS, Plomin R. Internet cognitive testing of large samples needed in genetic research. Twin Res Hum Genet [Internet] 2007;10(4):554–563. PMID:18179835
3. Hansen TI, Lehn H, Evensmoen HR, Håberg AK. Initial assessment of reliability of a self-administered web-based neuropsychological test battery. Comput Human Behav [Internet] Elsevier Ltd; 2016;63:91–97. [doi: 10.1016/j.chb.2016.05.025]
4. Barenboym DA, Wurm LH, Cano A. A comparison of stimulus ratings made online and in person: Gender and method effects. Behav Res Methods 2010;42(1):273–285. PMID:20160306
5. Gosling SD, Vazire S, Srivastava S, John OP. Should We Trust Web-Based Studies? A Comparative Analysis of Six Preconceptions About Internet Questionnaires. Am Psychol 2004;59(2):93–104. PMID:14992636
6. Salthouse TA, Nesselroade JR, Berish DE. Short-term variability in cognitive performance and the calibration of longitudinal change. J Gerontol B Psychol Sci Soc Sci 2006;61(3):144–151.
7. Feenstra HEM, Murre JMJ, Vermeulen IE, Kieffer JM, Schagen SB. Reliability and validity of a self-administered tool for online neuropsychological testing: The Amsterdam Cognition Scan. J Clin Exp Neuropsychol [Internet] Routledge; 2017;00(00):1–21. PMID:28671504
8. Zou GY. Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. 2012;(March). [doi: 10.1002/sim.5466]
9. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–160.
Dr Caroline Skirrow and Rosa Backx