Evaluation of the psychometric properties of patient-reported outcome measures of health-related quality of life across the European cancer continuum: a systematic review protocol using COSMIN methodology

Over the past decades, there has been increasing recognition that assessing the health-related quality of life (HRQoL) of patients with cancer is pivotal to delivering optimal patient-centred healthcare. However, with the growing number of patient-reported outcome measures (PROMs) available, it has become increasingly challenging to identify the most appropriate PROM to capture HRQoL. Therefore, the aim of this systematic review is to (1) identify all available PROMs assessing HRQoL across the European cancer continuum and (2) critically appraise, compare and summarise the psychometric properties of the identified PROMs.

Bibliographic databases MEDLINE and PubMed Central (through PubMed) and EMBASE (through Scopus) will be comprehensively searched from database inception until March 2024. Studies reporting on the measurement properties of PROMs assessing HRQoL throughout the European cancer continuum will be included. The evaluation of the psychometric properties, data extraction and data synthesis will be conducted according to the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) methodology. Two reviewers will independently assess the methodological quality using the COSMIN risk of bias checklist and the COSMIN criteria for good measurement properties. Subsequently, findings will be qualitatively summarised. The Grading of Recommendations Assessment, Development and Evaluations (GRADE) guidelines will be used to grade and summarise the quality of the evidence.

Ethical clearance for this research is not required, as the systematic review will only use information from previously published research. The results of this review will be submitted for publication in a peer-reviewed journal and will be used to provide a set of evidence-based recommendations for a European project (EUonQOL), which aims at developing a new PROM (EUonQOL toolkit) to assess HRQoL across the European cancer continuum. Moreover, findings will be disseminated to a clinical audience and policymakers through conferences, supporting researchers and clinicians in choosing the best measure to evaluate HRQoL in patients with cancer and survivors in Europe.

PROSPERO registration number: CRD42023418616.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Health-related quality of life (HRQoL) can be defined as ‘how well a person functions in their life and his or her perceived well-being in physical, mental and social domains of health’.1 Functioning refers here to a patient’s ability to carry out some pre-defined activities and well-being to their subjective feelings.1 More specifically, the framework developed by Wilson and Cleary, which is currently the most applied theoretical model of HRQoL,2 conceives HRQoL as a multidimensional construct encompassing five components: symptom status, functional status, biological and psychological variables, general health perceptions and overall quality of life.

Over the past decades, there has been increasing recognition that assessing the HRQoL of patients with cancer is pivotal to delivering optimal patient-centred healthcare.3 4 HRQoL is now perceived as a meaningful endpoint throughout the cancer continuum5 6 and can serve as a valuable source of information to guide healthcare policies (eg, Europe’s Beating Cancer plan7). However, HRQoL is often inaccurately assessed by healthcare providers and poorly captured by medical procedures or tests, highlighting the need for patient involvement in reporting their outcomes.3 4 8 9

Patient-reported outcomes (PROs) are defined by the Food and Drug Administration as ‘a measurement based on a report that comes directly from the patient about the status of a patient’s health condition, without amendment or interpretation of the patient’s response by a clinician or anyone else’.10 Patient-reported outcome measures (PROMs) refer to the tools used to measure PROs and are now systematically used for the assessment of HRQoL in cancer care. To assess the HRQoL of patients with cancer, a wide array of PROMs is currently available, ranging from generic (eg, 36-Item Short-Form Survey Instrument [SF-36], 5-level EQ-5D [EQ-5D-5L]) to cancer-specific (eg, EORTC Core Quality of Life questionnaire [QLQ-C30], Functional Assessment of Cancer Therapy - General [FACT-G]) and tumour-specific tools (eg, EORTC Breast Cancer Module [QLQ-BR23], Functional Assessment of Cancer Therapy - Breast [FACT-B]). However, this diversity has made it increasingly challenging to select the most appropriate PROM. This choice should be made with regard to the target population, the target construct and, importantly, the PROM measurement properties.11

Over the past years, many systematic reviews comparing PROMs for the assessment of HRQoL in patients with cancer have been published. Most of them focused on PROMs measuring HRQoL in a specific type of cancer (eg, breast cancer, prostate cancer)12–23 or cancer population (eg, cancer survivors, advanced cancer, palliative patients).14 24–26 Several of these reviews focused on PROMs evaluating one specific HRQoL-related construct (eg, depression, fatigue, pain),12 13 27–29 and the majority did not report the psychometric properties of the PROMs under investigation per subscale.13–17 19–22 24 25 27 28 30 For the reviews reporting on the psychometric properties of PROMs, the methods used to assess both the quality of the studies and the results differed significantly.31

Currently, the highest methodological standards for the conduct of systematic reviews on the psychometric properties of PROMs are provided by the COnsensus-based Standards for the selection of health Measurement INstruments initiative (COSMIN32). However, among the reviews published to date, only half relied on the COSMIN methodology, and most of them did not apply it fully. For instance, in several reviews, the rating of the overall results per PROM was unclear or not performed,12 16 20 27 33 and the risk of bias assessment or the grading of the evidence was not conducted.12 13 24 27 30 33 As such, a comprehensive overview of the psychometric properties of PROMs used for the assessment of HRQoL across the cancer continuum is still lacking. Therefore, this study aims to systematically review the measurement properties of PROMs assessing the multidimensional construct of HRQoL in European patients with cancer and survivors to make objective recommendations on the most suitable PROM to use in these populations.

The protocol of this systematic review is based on the PRISMA-P (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols) guidelines34 and has been prospectively registered in the International Prospective Register of Systematic Reviews database (PROSPERO 2023, CRD42023418616). In case of protocol amendments, modifications will be reported in the publication reporting the results of the systematic review as supplementary material.

The systematic review will be conducted according to the COSMIN guidelines for systematic reviews32 and will use the COSMIN taxonomy of measurement properties (table 1). All steps of the screening process will be performed using Rayyan.35

Table 1

COSMIN definitions of measurement properties

A systematic search will be performed in the bibliographic databases MEDLINE and PubMed Central (through PubMed) and EMBASE (through Scopus) without a publication date restriction up to February 2023 (updated up to March 2024). The search strategy will be based on the Population, Intervention, Comparator, Outcome and Methods (PICOM) acronym,36 in which the population will be represented by patients with cancer and survivors, the outcome by health-related quality of life and the methods by psychometric properties. No comparator or intervention will be used. Both Medical Subject Headings (MeSH) terms and text words will be used.

Original research articles published in English (including erratum and correction articles) will be considered for inclusion. Reference lists of included articles will be searched by hand to ensure that all relevant studies are considered. Additionally, the exclusion filter of Terwee et al37 will be used. The grey literature will not be considered.

The respective search strategies that will be used for PubMed and Scopus are provided in online supplemental appendix 1.

The selection process will be twofold. First, it will be determined whether the PROMs captured by the search should be included or excluded. Second, all titles and abstracts will be screened for eligibility in a blinded, standardised manner. If a study seems relevant or in case of doubt, the full-text article will be retrieved and screened. Both the abstract and full-text screening will be done independently by a minimum of two reviewers. For both steps, a pilot screening will be performed on a random subsample of studies and the screening methodology will be clarified within the review team if deemed necessary. Discrepancies will be resolved by discussion and/or consultation of a third reviewer. Inter-rater reliability will be assessed and reported.
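The protocol does not name the statistic that will be used to report inter-rater reliability. As an illustration only, the sketch below assumes Cohen's kappa for two reviewers making include/exclude decisions; the function name and the example decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: kappa is used here only as an example statistic;
    the protocol does not specify which index will be reported.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1:
        # Both raters used one identical label throughout; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for four abstracts.
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this made-up example the reviewers agree on three of four abstracts, which corresponds to a kappa of 0.5 once chance agreement is removed.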

PROM selection

To be included, PROMs will need to meet the following criteria:

Study selection

Studies will be included when the following criteria are met:

Detailed information on the selection process will be reported in a PRISMA flowchart (PRISMA 2020 flow diagram39).

During the data extraction, it will be determined which measurement properties will be evaluated for every included study. Extracted data will be entered into a customised xls file using Microsoft Excel. Data extraction will be performed independently by two reviewers, and discrepancies will be resolved by discussion and/or consultation of a third reviewer. Data extraction will be piloted on a random subsample of studies, and the extraction methodology will be clarified within the review team if deemed necessary. When available, data will be extracted as follows:

  1. PROM measurement properties: development and content validity, structural validity/unidimensionality, internal consistency, cross-cultural validity and measurement invariance, reliability, measurement error and construct validity. Detailed information on the data that will be extracted for these measurement properties is provided in online supplemental appendix 4.

Following data extraction, all PROMs and related studies will be included in the next phase of the review process for quality assessment.

A scoring manual based on the procedures described hereafter will be built and piloted on a random subsample of studies to enhance the inter-rater homogeneity of PROM quality assessment. The assessment will be performed independently by two reviewers. Discrepancies will be resolved by consensus; if disagreement persists, a third reviewer will be consulted. As per COSMIN guidelines,32 quality assessment will be conducted sequentially for each PROM in the following order: development/content validity, internal structure (ie, structural validity, internal consistency and cross-cultural validity/measurement invariance), reliability, measurement error and construct validity (ie, criterion validity and hypotheses testing). The COSMIN group defines content validity as the most important measurement property and recommends assessing it first and excluding PROMs with high-quality evidence of inadequate content validity.32 40 However, studies reporting poor content validity of a PROM are unlikely to be published, so this requirement is unlikely to be met and would not allow PROMs to be differentiated based on the quality of their content validity. Therefore, it was decided that the remaining psychometric properties will not be assessed if a PROM demonstrates inadequate content validity at any level of evidence or if no evidence of content validity can be found, as PROMs should be relevant, comprehensive and comprehensible with respect to HRQoL and the European cancer population. Studies assessing structural validity based on a multitrait-multimethod approach41 will be considered to inform construct validity, as this method is not appropriate for the assessment of structural validity.32

For all psychometric properties, the assessment will be performed at a subscale level (when applicable). Quality assessment will be performed for each study and measurement property as follows:

Risk of bias assessment

The methodological quality of each study will be evaluated using the COSMIN Risk of Bias Checklist,42 which provides a set of standards for design requirements and preferred statistical analyses per measurement property. These standards provide a framework to assess whether the results, based on the methodological quality of a given study, are trustworthy. Each standard will be rated on a four-point rating scale as ‘very good’, ‘adequate’, ‘doubtful’ or ‘inadequate’. Each assessment of a measurement property is considered to be a separate study. For development/content validity, the quality of each standard will first be determined by retaining the highest rating across the identified studies before taking the lowest rating of each standard to determine the overall quality of the PROM development and content validity. For all other measurement properties, the overall rating of the quality of each study will be determined separately by taking the lowest rating of each standard. Several adjustments were made to the ratings of the COSMIN Risk of Bias Checklist, which are all listed in online supplemental appendix 5.
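The two aggregation rules described above can be sketched as follows. This is a minimal illustration, not the COSMIN checklist itself: the `Rating` enum, the function names and the standard labels in the example are hypothetical, and the adjustments listed in online supplemental appendix 5 are not modelled.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Four-point rating scale, ordered from worst to best."""
    INADEQUATE = 1
    DOUBTFUL = 2
    ADEQUATE = 3
    VERY_GOOD = 4


def study_quality(standard_ratings):
    """Overall quality of one study: the lowest rating across its standards."""
    if not standard_ratings:
        raise ValueError("at least one rated standard is required")
    return min(standard_ratings.values())


def content_validity_quality(studies):
    """Development/content validity: keep the highest rating per standard
    across studies, then take the lowest of those retained ratings."""
    if not studies:
        raise ValueError("at least one study is required")
    standards = set().union(*(study.keys() for study in studies))
    best_per_standard = {
        standard: max(study[standard] for study in studies if standard in study)
        for standard in standards
    }
    return min(best_per_standard.values())


# Hypothetical standards and ratings, for illustration only.
single_study = {"design": Rating.VERY_GOOD, "analysis": Rating.DOUBTFUL}
cv_studies = [
    {"relevance": Rating.DOUBTFUL, "comprehensibility": Rating.VERY_GOOD},
    {"relevance": Rating.ADEQUATE, "comprehensibility": Rating.INADEQUATE},
]
overall_single = study_quality(single_study)           # Rating.DOUBTFUL
overall_cv = content_validity_quality(cv_studies)      # Rating.ADEQUATE
```

Taking the highest rating per standard first means that a weak content validity study does not penalise a PROM when another study addressed the same standard well; the final minimum still exposes any standard that no study covered adequately.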

Criteria for good measurement properties

These criteria are recommendations from COSMIN for which PROMs are assessed as appropriate to be used in research or clinical practice.32

Development and content validity

The overall content validity scoring will comprise four steps.40 First, the results of both the PROM development and content validity studies will be rated by two reviewers independently (online supplemental appendix 6). Each criterion will be scored as ‘sufficient’ (+), ‘insufficient’ (−) or ‘indeterminate’ (?). Reviewers will rate the content of the PROM of interest as ‘sufficient’ (+) or ‘insufficient’ (−), using the same criteria. When there is no content validity study available, content validity criteria will be rated ‘insufficient’ (−). The scoring ‘indeterminate’ (?) will only be used when there is evidence that some aspects of content validity were assessed, but the authors did not provide enough information to score the criterion appropriately. Second, an overall ‘sufficient’ (+), ‘insufficient’ (−), ‘indeterminate’ (?) or ‘inconsistent’ (±) rating will be calculated for relevance, comprehensiveness and comprehensibility per study40 (online supplemental appendix 7). Third, an overall rating per PROM will be calculated for relevance, comprehensiveness and comprehensibility by jointly considering the results of the PROM development and content validity studies, and the reviewers’ ratings. The evidence from the content validity studies will be weighted more heavily than the evidence from the development study and the reviewers’ ratings. Online supplemental appendix 8 provides a detailed overview of this overall rating process. Last, an overall ‘sufficient’ (+), ‘insufficient’ (−) or ‘inconsistent’ (±) content validity rating will be calculated by aggregating the overall relevance, comprehensiveness and comprehensibility ratings. Online supplemental appendix 9 provides a detailed overview of the overall content validity rating process.

Other psychometric properties

Criteria for good measurement properties will be applied for each individual study, resulting in a ‘sufficient’ (+), ‘insufficient’ (−), or ‘indeterminate’ (?) rating. The evidence across studies will be summarised qualitatively, and it will be decided whether the results per psychometric property are consistent. Consistency is defined as at least 75% of individual studies being rated similarly for a given PROM and measurement property. If the threshold of 75% is not reached for any of the rating options and studies with exclusively ‘+’ or ‘−’ ratings are available in combination with ‘?’ ratings, studies with a ‘?’ will be ignored and not included when summarising the results. In all other cases, the overall rating will be scored as ‘inconsistent’ (±). If the results are inconsistent, possible explanations will be explored and the results will be summarised per subgroup when applicable. If no explanation for the inconsistency can be found, the overall rating will remain ‘inconsistent’ (±). A detailed overview of the criteria for good measurement properties, incorporating the inconsistency rating, can be found in table 2. For construct validity, a priori hypotheses were formulated to evaluate the results (table 3).
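The consistency rule for pooling study-level ratings can be expressed as a short function. This is a sketch of one reading of the rule: the function name is hypothetical, an ASCII "-" stands in for the ‘−’ symbol, and the subgroup exploration of inconsistent results is not modelled.

```python
from collections import Counter


def overall_rating(ratings, threshold=0.75):
    """Pool per-study ratings ("+", "-", "?") for one PROM and one property.

    Returns "+", "-", "?" or "±" (inconsistent).
    """
    if not ratings:
        raise ValueError("at least one study rating is required")
    counts = Counter(ratings)
    # Consistent: at least 75% of studies share the same rating.
    for symbol, count in counts.items():
        if count / len(ratings) >= threshold:
            return symbol
    # Threshold not reached: if only one determinate rating occurs
    # alongside "?" ratings, the "?" studies are ignored.
    determinate = {rating for rating in ratings if rating != "?"}
    if len(determinate) == 1:
        return determinate.pop()
    # All other cases are inconsistent.
    return "±"


consistent = overall_rating(["+", "+", "+", "-"])      # "+": exactly 75% agree
ignoring_unknown = overall_rating(["+", "+", "?", "?"])  # "+": "?" ignored
mixed = overall_rating(["+", "-", "?"])                # "±"
```

Because 75% is a strict majority threshold, at most one rating can reach it, so the order in which the ratings are checked does not affect the result.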

Table 2

COSMIN criteria for good measurement properties

Table 3

A priori hypotheses for construct validity

Quality of evidence

The quality of the evidence will be graded per measurement property using a modified Grading of Recommendations Assessment, Development and Evaluation approach (GRADE32 43) resulting in four quality levels: ‘high’, ‘moderate’, ‘low’ or ‘very low’. Starting from high-quality level, the quality of evidence will be downgraded if applicable according to the following factors: risk of bias (methodological quality of the studies), inconsistency (of results across studies), imprecision (total sample size of the studies) and indirectness (evidence comes from a different target population). For some factors, the original COSMIN modified GRADE approach does not provide clear guidance on the criteria to be used for the risk assessment; therefore, the GRADE approach was further adapted. The adapted GRADE approach that will be used is reported in tables 4 and 5 for development/content validity and the remaining psychometric properties respectively. The quality of evidence for internal consistency will start at the level of structural validity.32
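The downgrading logic can be illustrated with a small sketch. The number of levels to subtract for each factor is defined in tables 4 and 5, which are not reproduced here, so the downgrade values in the example are placeholders; the function and variable names are likewise hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]
FACTORS = {"risk_of_bias", "inconsistency", "imprecision", "indirectness"}


def grade_evidence(downgrades, start="high"):
    """Downgrade the quality of evidence from a starting level.

    downgrades maps a factor name to the number of levels to subtract.
    start defaults to "high"; for internal consistency it would be set
    to the level obtained for structural validity.
    """
    unknown = set(downgrades) - FACTORS
    if unknown:
        raise ValueError(f"unknown downgrading factors: {sorted(unknown)}")
    if any(steps < 0 for steps in downgrades.values()):
        raise ValueError("downgrade steps must be non-negative")
    total_steps = sum(downgrades.values())
    # The quality level cannot fall below "very low".
    index = max(LEVELS.index(start) - total_steps, 0)
    return LEVELS[index]


# Placeholder downgrades, for illustration only.
structural = grade_evidence({"risk_of_bias": 1})                      # "moderate"
internal_consistency = grade_evidence({"imprecision": 1}, start=structural)  # "low"
```

Passing the structural validity result as `start` reflects the rule that the evidence for internal consistency cannot be rated higher than the evidence for the structure it relies on.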

Table 4

Adapted COSMIN GRADE approach for development/content validity

Table 5

Adapted COSMIN GRADE approach for other psychometric properties

The reporting of the results will follow the PRISMA 2020 statement and a PRISMA checklist will be provided.44 Considering the expected high heterogeneity of the results, no quantitative pooling of the studies’ results per PROM will be performed and no meta-analysis is planned. In line with the COSMIN guidelines,32 summary tables describing the PROMs’ characteristics, including feasibility and interpretability, and study populations will be produced. The reporting of the results will include the individual ratings on PROM development and content validity, PROM measurement properties and quality of evidence per study. The findings will then be qualitatively summarised as follows.

For content validity, an overall rating per PROM will be calculated for relevance, comprehensiveness and comprehensibility by jointly considering the results of the PROM development and content validity studies, and the reviewer’s ratings. The overall content validity will be rated as ‘sufficient’ (+), ‘insufficient’ (−) or ‘inconsistent’ (±), by aggregating the overall relevance, comprehensiveness and comprehensibility rating.

For the remaining psychometric properties, the evidence across studies will be summarised and it will be decided whether the results per psychometric property are consistent. Consistency will be defined as at least 75% of studies being rated similarly for a given PROM and measurement property. If the threshold of 75% is not reached for any of the rating options and studies with exclusively ‘+’ or ‘−’ ratings are available in combination with ‘?’ ratings, studies with a ‘?’ will be ignored and excluded from the summary. In all other cases, the overall rating will be scored as ‘inconsistent’ (±). For construct validity, a priori hypotheses will be formulated to evaluate the results.

PROMs with sufficient content validity (ie, rated ‘±’ or higher) and at least low-quality evidence (ie, GRADE)43 for sufficient structural validity and internal consistency will be recommended.32 On the other hand, PROMs will not be recommended when there is high-quality evidence for any insufficient measurement property. As with the quality assessment, the formulation of recommendations will be made at a subscale level.
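The recommendation rule can be sketched as a decision function. Two points are assumptions rather than part of the protocol: the "not recommended" check is applied first when both conditions could hold, and PROMs meeting neither condition are labelled "undetermined". Names are hypothetical and an ASCII "-" stands in for the ‘−’ symbol.

```python
QUALITY_ORDER = {"very low": 0, "low": 1, "moderate": 2, "high": 3}


def recommend(content_validity, properties):
    """Classify one PROM (or subscale).

    content_validity: overall content validity rating ("+", "±" or "-").
    properties: maps a property name to a (rating, quality) pair,
                eg {"structural validity": ("+", "moderate")}.
    """
    # High-quality evidence for any insufficient property rules a PROM out.
    # Assumption: this check takes precedence over the positive rule.
    for rating, quality in properties.values():
        if rating == "-" and quality == "high":
            return "not recommended"

    def sufficient(name):
        rating, quality = properties.get(name, ("?", "very low"))
        return rating == "+" and QUALITY_ORDER[quality] >= QUALITY_ORDER["low"]

    if (
        content_validity in ("+", "±")
        and sufficient("structural validity")
        and sufficient("internal consistency")
    ):
        return "recommended"
    # Assumption: the protocol does not name this residual category.
    return "undetermined"


example = recommend(
    "+",
    {
        "structural validity": ("+", "moderate"),
        "internal consistency": ("+", "low"),
    },
)
```

Since recommendations are formulated at a subscale level, the function would be called once per subscale with that subscale's ratings.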

Currently, it is expected that researchers actively involve patients, healthcare professionals and the public in their research. Within systematic reviews, active patient and public involvement has been proposed as a way to enhance the actual and perceived usefulness of the summarised evidence, hence addressing barriers to the uptake of evidence in practice.45 Patient involvement will be ensured at key stages of the systematic review and in the peer review of the resulting academic papers. The results of the review will be discussed with a representative panel of stakeholders, including patients and healthcare professionals, to ensure the co-design approach throughout the entire EUonQoL project. It is essential that the PROMs selected to serve as a basis for the development of the EUonQOL toolkit are supported by evidence of content validity, that is, the items constituting these PROMs should be relevant, comprehensive and comprehensible with respect to HRQoL and the European cancer population.

Ethical clearance for this research is not required, as the systematic review will only use information from previously published research. The results will be disseminated to clinicians, researchers and health policymakers by presenting at relevant conferences and by publication in a peer-reviewed journal. Besides that, the findings will be used to identify the most appropriate PROMs for the assessment of HRQoL throughout the European cancer continuum, to serve as a basis for the development of the EUonQOL toolkit and to provide evidence-based recommendations to the EUonQOL consortium.

