Dr. Lipscomb is Professor, Rollins School of Public Health, Emory University, Atlanta, GA.
Dr. Gotay is Professor, Cancer Research Center of Hawai'i, University of Hawai'i, Honolulu, HI.
Dr. Snyder is Assistant Professor of Medicine, Division of General Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD.
This article is available online at http://CAonline.AmCancerSoc.org
To earn free CME credit for successfully completing the online quiz based on this article, go to http://CME.AmCancerSoc.org.
Disclosure: Dr. Lipscomb's work on this article was partially supported by a grant from the Georgia Cancer Coalition's Distinguished Cancer Clinicians and Scientists program.
INTRODUCTION
The principal means for treating cancer—surgery, chemotherapy, radiation, and hormonal therapy—are frequently very effective in stopping tumor progression, reducing cancer-attributable pain and discomfort, extending life, and in many instances, curing the disease. However, all such therapies come with the risk of substantial side effects. Some are short-term and time-limited, others are long-term and persistent, and still others arise only years after the initial cancer treatment. Traditional biomedical outcome measures, particularly survival and disease-free survival, remain indisputably of central importance in cancer decision making. But there has been growing recognition that patient-reported outcome (PRO) measures—including, in particular, measures of health-related quality of life (HRQOL)—can convey important additional information for assessing the overall burden of cancer and the effectiveness of interventions.
Informal affirmation that PROs "matter" in cancer decision making is registered whenever a provider asks patients how they have been feeling, whether they have been fatigued or in pain, whether they have been able to carry on with the usual activities of life, whether they have required caregiver support, whether they have been well enough to stick with their prescribed therapy, and so on.
Concrete indicators that support the importance of PROs in the cancer sphere can be found in a variety of recent US research and policy-related developments. See Table 1 for a summary of some leading public, private, and mixed public-private sector initiatives that revolve around the application of PROs in cancer. The efforts noted in Table 1 have virtually all come into being since 2000, suggesting a growing interest in bringing "the patient's perspective" to cancer decision making. Taken together, such initiatives are leading to a clearer understanding of the current strengths and limitations of using PRO measures in cancer.
In the next section, we examine in some detail how PROs may be defined and measured for cancer outcomes assessment. In a given patient-provider encounter, the patient's response to a simple inquiry such as "How do you feel today?" will likely provide valuable qualitative information for treatment evaluation and planning. However, high-quality quantitative PRO measures are needed for research studies that evaluate the impact of specific interventions on PROs, for population surveillance of progress against the cancer burden, and for generating systematic, interpretable PRO data to inform patient-provider decision making. Thus, that section also looks briefly at how HRQOL-oriented PRO measurement instruments are developed, applied in practice, and evaluated in terms of their performance in the field.
The succeeding 4 sections of the paper examine the actual and potential role of PROs in, respectively, the evaluation and approval of cancer therapies, the assessment of cancer care in the community, patient-provider decision making in clinical oncology practice, and population surveillance of cancer patients and survivors. The final section identifies future challenges and opportunities in PRO measure development and application in light of advances in the state of the science in cancer outcome measurement and the evolving needs of decision makers.
DEFINING AND MEASURING PROS IN CANCER
What is HRQOL?
Over the past 40 years, the published literature dealing with "quality of life" in "cancer or neoplasms" has grown substantially. A Medline search crossing these 2 terms yields over 12,500 English-language citations over the period of 1966 to 2006, with about 92% occurring from 1990 onward (1,382 over 1990 to 1994; 2,866 over 1995 to 1999; 5,236 over 2000 to 2004; and 2,063 over 2005 to 2006). In line with this growth in the literature, there has been increasing discussion about how HRQOL in cancer is defined and how to measure it. In 1993, Aaronson et al16 noted the "growing interest in broadening the evaluation criteria employed in cancer clinical trials beyond traditional biologic markers of therapeutic outcome—tumor response, time to progression, and disease-free and overall survival—to include an assessment of the impact of the disease and its treatment on the physical, psychological, and social functioning of the patient." In a comprehensive assessment of conceptual models of HRQOL, Ferrans17 identified a range of HRQOL definitions, including this one by Cella18: "the extent to which one's usual or expected physical, emotional, and social well-being is affected by a medical condition and/or its treatment."
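The period counts above account for the roughly 92% figure. The quick Python check below treats the stated total ("over 12,500") as exactly 12,500. That is an assumption: a larger true total would lower the share slightly.

```python
# Medline citation counts for "quality of life" x "cancer or neoplasms",
# by period, as reported in the text.
counts_since_1990 = {
    "1990-1994": 1382,
    "1995-1999": 2866,
    "2000-2004": 5236,
    "2005-2006": 2063,
}

# The text gives the 1966-2006 total only as "over 12,500"; 12,500 is an
# assumed lower bound here, so the computed share is an upper bound.
TOTAL_1966_2006 = 12500

since_1990 = sum(counts_since_1990.values())
share = since_1990 / TOTAL_1966_2006

print(f"Citations 1990-2006: {since_1990:,}")
print(f"Share of assumed total: {share:.1%}")
```

The 4 period counts sum to 11,547, which is about 92% of 12,500.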
After evaluating hundreds of published applications of quality-of-life measures in cancer, the National Cancer Institute (NCI)'s Cancer Outcomes Measurement Working Group (COMWG) concluded that the distinguishing features of an HRQOL measure are that it is patient-reported and that it involves the patient's subjective assessment or evaluation of important aspects of his or her well-being.19 An implication is that all HRQOL measures are PRO measures, but there are PRO measures that have little or no evaluative component and, thus, would not qualify as HRQOL measures. For example, a simple patient report on the presence or absence of a symptom such as nausea may require some subjective interpretation on the respondent's part, but it conveys little or no information about the impact of that symptom on functioning or other aspects of well-being. Specifically, the COMWG defined HRQOL measurement to include patient assessments of symptom impact, functional status, and/or global well-being.
Symptom measures that would qualify as HRQOL measures thus report not only the existence or frequency, but also the severity, bother, or other impacts of symptoms, whether disease-related or treatment-induced. Well-known examples include the Rotterdam Symptom Checklist,20 which encompasses multiple aspects of symptom effects (psychological distress, physical distress, and disease-specific symptoms), and the Brief Pain Inventory,21 which focuses expressly on one prominent symptom area. By contrast, the widely used Common Terminology Criteria for Adverse Events,22 which are based on patient interview and/or laboratory data but are recorded and reported by physicians, are distinct from PROs and would not qualify as HRQOL measures.
Functional status measures that are intended to capture only one dimension (that is, one definable aspect or domain) of HRQOL are termed unidimensional; an example is the Beck Depression Inventory.23 In point of fact, most functional status measures of HRQOL are multidimensional, designed to reflect multiple domains of impact. The specific domains of focus vary by instrument, but often include physical, psychological, and social components of outcome. Prominent examples include the European Organization for Research and Treatment of Cancer (EORTC) QLQ-C30,16 the Functional Assessment of Cancer Therapy General (FACT G),24 the Health Utilities Index (HUI),25 and the EQ-5D Health Questionnaire.26
Despite the fact that all of these questionnaires purport to measure HRQOL, there are sharp distinctions in the conceptualization, construction, and intended application among different multidimensional instruments. One basic difference is between (1) measures based on psychometric science, in which an individual indicates his or her HRQOL response along a subjective scale of well-being (eg, the EORTC QLQ-C30, the FACT G, and indeed, the majority of HRQOL functional status measures applied in cancer to date); and (2) measures based on the science of economic evaluation in health care, in which respondents supply a relative value (or utility) rating for a given HRQOL state in comparison with designated anchor states (often the "best" and "worst" states the individual can imagine) (eg, the HUI and EQ-5D; see Feeny27). The latter type of measure is generally termed "preference-based," while the psychometrically oriented measures are sometimes referred to as "nonpreference-based." These distinctions give rise to practical differences in how HRQOL summary scores are derived and used.
For the psychometric-based measures, an individual's HRQOL score is based on the specific survey items endorsed, which are transformed to scale scores indicating the relative degree of functioning or well-being the individual reports along each posited HRQOL dimension. That is, one is attempting to pinpoint the individual's "location" along the measurement scale corresponding to each HRQOL dimension. The score is nonpreference-based in the sense that no explicit attempt is made to attach a utility value to that scale location. Summary scores for nonpreference-based multidimensional HRQOL measures may be reported as a profile of (unidimensional) scale scores (eg, the EORTC QLQ-C30)16 or, in addition, as an overall summary score (eg, the FACT G),24 as will be illustrated below.
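Each psychometric instrument has its own published scoring rules, and those rules are not reproduced here. As a hedged illustration of how endorsed items become a "location" on a dimension, the Python sketch below applies a linear 0 to 100 transformation of the kind described for EORTC QLQ-C30 scales: it averages the items, rescales the average by the response range, and reverses functioning scales so that higher means better. The function name, item groupings, and responses are hypothetical. Anyone scoring real data should follow the instrument's official scoring manual.

```python
from statistics import mean


def linear_scale_score(responses, n_levels, functional):
    """Map one patient's item responses on one HRQOL dimension to 0-100.

    responses  -- item responses coded 1..n_levels (1 = "not at all")
    n_levels   -- number of response options per item (eg, 4)
    functional -- True for a functioning scale (higher score = better
                  functioning); False for a symptom scale (higher score =
                  more severe symptoms)
    """
    if not responses:
        raise ValueError("at least one item response is required")
    if any(r < 1 or r > n_levels for r in responses):
        raise ValueError("response outside the 1..n_levels range")
    raw = mean(responses)              # raw score: average of the items
    item_range = n_levels - 1          # distance from lowest to highest option
    position = (raw - 1) / item_range  # 0..1 location along the scale
    if functional:
        # Reporting more difficulty should lower a functioning score.
        position = 1 - position
    return 100 * position


# Hypothetical patient: three physical-functioning items and two fatigue
# items, each answered on a 4-point scale.
physical = linear_scale_score([2, 1, 3], n_levels=4, functional=True)
fatigue = linear_scale_score([3, 4], n_levels=4, functional=False)
print(f"Physical functioning: {physical:.1f}")  # about 66.7
print(f"Fatigue: {fatigue:.1f}")                # about 83.3
```

Reporting the two numbers side by side gives a profile. Combining them into one total, as the FACT G does for its own subscales, requires an instrument-specific rule, which is not shown here.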
For the economic-based measures, an individual's HRQOL score basically reflects 2 categories of information: specific levels of functioning along each of the posited dimensions, as indicated by which survey items are endorsed; and the relative value or utility weight assigned to each of these levels of functioning. These utility weights may be assigned by the individual directly or, as is more commonly the case, imputed to the individual based on community health state preference surveys. By combining both categories of information, an overall preference-based HRQOL score is assigned to the individual (as will be illustrated below).

Global well-being measures of HRQOL capture the individual's overall assessment of well-being or happiness in a summary score or indicator, typically based on a single (global) question. A frequently used psychometric global HRQOL measure asks the individual to indicate whether his or her health is Excellent, Very Good, Good, Fair, or Poor (the "E-VG-G-F-P" scale). A common preference-based measure is obtained by asking the individual simply to rate his or her overall health or well-being numerically on a scale that includes an explicit comparative standard (eg, a 0 to 100 "visual analog scale" where 100 represents the best possible health level and 0 the worst). Note that global approaches do not deny that HRQOL may be multidimensional. Rather, such measures require the individual to engage in a holistic evaluation that effectively aggregates across whatever dimensions are (implicitly) important to him or her.
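For the economic-based measures, the combination of functioning levels with utility weights can be sketched as an additive scoring rule. The Python example below is hypothetical. Its 5 dimensions follow the EQ-5D descriptive system, but the decrements and the constant are illustrative numbers chosen for this sketch and are not a published EQ-5D value set. Actual value sets, such as the US weights of Shaw et al, may use different coefficients and additional terms.

```python
# Illustrative utility decrements for a 3-level, 5-dimension descriptive
# system modeled on the EQ-5D. These numbers are NOT a published value set.
DECREMENTS = {
    "mobility":           {1: 0.00, 2: 0.07, 3: 0.30},
    "self_care":          {1: 0.00, 2: 0.10, 3: 0.20},
    "usual_activities":   {1: 0.00, 2: 0.04, 3: 0.09},
    "pain_discomfort":    {1: 0.00, 2: 0.12, 3: 0.39},
    "anxiety_depression": {1: 0.00, 2: 0.07, 3: 0.24},
}
CONSTANT = 0.08  # illustrative decrement for any state other than full health


def utility(state):
    """Combine a described health state with preference weights.

    state -- dict mapping every dimension to the level (1-3) endorsed.
    Uses the convention assumed here: 1.0 = full health, 0 = dead, so
    values below 0 denote states valued as worse than dead.
    """
    if set(state) != set(DECREMENTS):
        raise ValueError("state must specify every dimension exactly once")
    if all(level == 1 for level in state.values()):
        return 1.0
    loss = CONSTANT + sum(DECREMENTS[dim][level] for dim, level in state.items())
    return 1.0 - loss


# Hypothetical patient on adjuvant chemotherapy: some problems with usual
# activities and moderate pain/discomfort, no problems elsewhere.
patient = {
    "mobility": 1,
    "self_care": 1,
    "usual_activities": 2,
    "pain_discomfort": 2,
    "anxiety_depression": 1,
}
print(f"Utility: {utility(patient):.2f}")  # 1 - (0.08 + 0.04 + 0.12) = 0.76
```

The descriptive part (which level was endorsed on each dimension) comes from the patient. The weights usually come from a separate community preference survey, which is why the same responses can yield different scores under different value sets.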
Additionally, HRQOL measures in cancer may be classified as generic, general cancer, or cancer site-specific or problem-specific. A generic HRQOL measure can be applied to a range of diseases and conditions that may, but need not, be cancer related. Examples include the Medical Outcomes Study Short Form-36 (SF-36)28,29 and a number of other psychometrically based measures (see Erickson30), as well as the EQ-5D, the HUI, and indeed almost all preference-based measures (see Feeny27). A general cancer measure is intended for application across the full range of cancer-related events, regardless of the patient's tumor type; among the many examples reviewed by the COMWG are the FACT G and the EORTC QLQ-C30. A cancer site-specific or cancer problem-specific HRQOL measure is tailored to a particular tumor type (eg, the EORTC QLQ-BR23 for breast cancer31), problem area (eg, the FACT N for febrile neutropenia associated with adjuvant chemotherapy32), or treatment modality (eg, the FACT BRM for treatment with biologic response modifiers such as interferon33).
That PRO measures generally, and HRQOL measures in particular, can play multiple important roles in cancer intervention assessment and decision making was emphasized in a recent Journal of the National Cancer Institute Monograph34 and is a recurring theme in this paper. In particular, multidimensional (nonpreference-based) HRQOL measures have been employed extensively to evaluate cancer interventions in clinical trials and in community settings. Preference-based HRQOL measures have been less frequently embraced in these contexts, but are often used in cost-effectiveness analyses of cancer interventions. As discussed later, applications of PRO measures in clinical oncology practice and in population surveillance of the cancer burden are at early stages of development, and we are just beginning to understand what types of PRO formulations might be most appropriate and feasible in these settings.
Finally, to provide a concrete feel for how HRQOL scores are derived, we illustrate the application of a multidimensional, psychometrically based, cancer site-specific instrument—namely, the FACT Colorectal (C)—in Figure 1 and a multidimensional preference-based instrument—namely, the EQ-5D—in Figure 2. In each case, we compute the HRQOL score for a hypothetical patient undergoing chemotherapy following surgery for Stage III colorectal cancer.
The process for developing a preference-based HRQOL instrument includes essentially the steps above for ensuring appropriate dimension-specific item content, but it generally also includes the challenging task of deriving utility weights associated with each item on each dimension scale. For discussions of how this has been accomplished for the major preference-based HRQOL measurement systems, see Feeny et al25 regarding the HUI, Kaplan et al38 regarding the Quality of Well-Being (QWB) index, and Shaw et al36 regarding the new, US-based utility weights for the EQ-5D.
Evaluating the Performance of PRO Instruments: The Medical Outcomes Trust (MOT) Framework
How might one evaluate the technical quality and appropriateness of an instrument for measuring PROs? To guide its assessment of PRO measures, NCI's COMWG adopted the framework of attributes and criteria developed by the Scientific Advisory Committee of the nonprofit MOT.39,19 The MOT attributes for judging the psychometric performance of health status and quality-of-life measures generally are shown in Table 2 (as adapted from the MOT39 and Lipscomb et al40). The important point here is that standardized criteria to evaluate PRO questionnaires have been developed, are widely accepted, and can be used to assess the merit of the measurement tools used in cancer.
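Reliability is one of the attributes addressed by evaluation frameworks such as the MOT's. For multi-item scales, internal consistency is often summarized with Cronbach's alpha. The Python sketch below computes alpha from a complete respondents-by-items matrix. It assumes there are no missing responses, and the 5 patients' answers are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix.

    scores -- one row per respondent, each a list of numeric item responses
              (same items for everyone, no missing data)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer every item")
    # Sum of the individual item variances (columns of the matrix).
    item_var_sum = sum(pvariance(col) for col in zip(*scores))
    # Variance of each respondent's total scale score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses of five patients to a three-item scale (1-4 coding).
responses = [
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 2],
    [4, 3, 4],
    [4, 4, 3],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")  # about 0.92
```

Alpha approaches 1 when items rise and fall together across respondents. Commonly cited rules of thumb set a higher bar for scales used to make decisions about individual patients than for scales used in group comparisons.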
PROS IN THE EVALUATION AND APPROVAL OF CANCER INTERVENTIONS
Perhaps the most comprehensive such evaluation to date was carried out by NCI's COMWG. Created in 2001, the COMWG comprised 35 experts in outcomes assessment drawn from academia, government, industry, and the cancer patient and survivorship communities. For comprehensive discussions concerning the functioning of the working group, key findings, and lessons learned, see Lipscomb et al,5 Lipscomb et al,40 Snyder et al,41 and Gotay et al.42 Among the main points are the following5:
The circumstances under which HRQOL measures bring significant information value to outcomes assessment over and above that provided by traditional biomedical endpoints need to be identified, particularly when the study's primary endpoint is survival or disease-free survival. For purposes of COMWG deliberations, HRQOL measures were defined as providing added value when these measures were instrumental in interpreting a study's findings and would be expected to influence clinical recommendations.19 The COMWG authors' judgments were primarily based on their study-by-study assessments of HRQOL findings compared with those for biomedical outcomes. The specific observations are as follows:
Finally, in a separate study that served to complement the COMWG's work, NCI's Community Clinical Oncology Program (CCOP) undertook a comprehensive review of all NCI-supported symptom-management trials initiated since 1987.52 In sum, just over half of these trials assessed "global quality of life" using a total of 22 distinct instruments (though the FACT G and the Uniscale were adopted in over half the trials). The conceptual framework for most applications consisted of a posited simple, one-way relationship between symptom relief and global quality of life; the possibly complex interplay between the different quality-of-life dimensions and types of symptoms was rarely recognized. Across these symptom-management trials, there was "no consistent relationship" found between global quality-of-life measures and either the symptoms being targeted or the interventions studied.52
Regardless of the focus of the cancer trial—treatment, symptom management, screening, or prevention—several consistent themes emerge. To maximize the information value of PRO assessment in cancer trials, one needs a clear rationale for including PROs; an explicit conceptual model to guide the measurement task; appropriate instrumentation; and a sound plan for data collection, analysis, and reporting.
These are precisely the matters at issue when decision makers have to judge the merits of a claim that a particular drug provides "clinical benefit" based on improvement in PROs, as discussed below.
REFLECTING THE PATIENT'S PERSPECTIVE IN CANCER PRODUCT APPROVAL
FDA Guidance on PRO Measurement
More than a decade in the making and much anticipated by the pharmaceutical industry, clinical trialists, and outcomes measurement researchers, the FDA's draft "Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims"4 was released for comment in February 2006. According to the FDA,4 the Guidance is intended to increase the efficiency of the FDA's communication with industry about PRO endpoints in trials, streamline its review of the adequacy of PRO endpoints used to support product-labeling claims, and "provide optimal information about the patient's perspective of treatment benefit at the time of product approval." By way of definition, the FDA states the following:
A PRO measurement is any aspect of a patient's health status that comes directly from the patient (ie, without the interpretation of the patient's responses by a physician or anyone else). In clinical trials, a PRO instrument can be used to measure the impact of an intervention on one or more aspects of patients' health status, ranging from the purely symptomatic (response to a headache), to more complex concepts (eg, ability to carry out activities of daily living), to extremely complex concepts such as quality of life, which is widely understood to be a multidimensional concept with physical, psychological, and social components.4
Although the FDA's final guidance on PROs has yet to be issued, it may be reasonable to expect it to include a number of themes enunciated in the draft document. With greater specificity than in the past, the FDA discusses the rationales for using PRO measures in medical product development: (1) some treatment effects (eg, pain intensity and pain relief) are known only to the patient; (2) patients provide a unique perspective on treatment effectiveness (since "improvements in clinical measures of a condition may not necessarily correspond to improvements in how the patient feels or functions"); and (3) formal assessment by patients may be more reliable than informal interviews with providers or other sources of information about the patient's condition.
On balance, this proposed guidance suggests that PRO measures used in studies to support drug-labeling claims need to meet the same general standards of scientific rigor and clinical usefulness expected for more traditional clinician-reported outcome measures such as treatment toxicity scores. A forthcoming paper by FDA-associated authors identifies 5 "sources of bias" that, taken together, explain "why HRQOL-based efficacy claims have not to date been accepted by the FDA for inclusion in anticancer product labels."53 The sources of bias include lack of randomization (especially as occurring in single-arm trials); lack of blinding (since masking treatment assignments from patients or providers can prove difficult); missing data; multiplicity of endpoints (arising when one fails to adjust statistically for the resulting increased likelihood of obtaining a "significant" finding by chance); and intrinsic meaning (whether the HRQOL findings consider all relevant information, are internally consistent, and exhibit clinical relevance to the population studied) (see Note 2).
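The multiplicity concern can be made concrete. Suppose k independent endpoints are each tested at α = 0.05 and no true effects exist. The chance of at least one "significant" result is then 1 − 0.95^k, which is about 26% for 6 endpoints. The Python sketch below computes that figure. It also applies Holm's step-down procedure, which is one common familywise correction and is named here only as an example; it is not presented as the approach the FDA authors prescribe. The p-values are hypothetical.

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one false-positive finding among k independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: return a reject/retain flag per p-value,
    controlling the familywise error rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test is retained, all larger p-values are too
    return reject


# Hypothetical p-values for six HRQOL subscales compared between trial arms.
p = [0.004, 0.030, 0.041, 0.20, 0.012, 0.65]
print(f"FWER with 6 unadjusted tests: {familywise_error(6):.2f}")
print(holm_reject(p))  # [True, False, False, False, False, False]
```

Four of the six unadjusted p-values fall below 0.05, but only one survives the adjustment. HRQOL subscales are usually correlated, so the independence formula overstates the inflation somewhat. Holm's procedure remains valid under arbitrary dependence.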
Along this line, it may be pertinent to note what FDA oncology officials reported in a 2003 article in the Journal of Clinical Oncology.3 Of 57 regular approvals for cancer drugs over 1990 to 2002, tumor response was the approval basis for 26, supported by relief from tumor-specific symptoms in 9 of the 26. Symptom relief "provided critical support" for approval in 13 of the 57 cases. And although many of the 53 marketing applications for approval based on nonsurvival endpoints "used surrogate endpoints for a better life, no approvals were based on instruments measuring health-related quality of life."
But note that whatever the final form of the FDA's PRO guidance, it applies expressly to the industry-submitted claims for product approval in the United States. Regulatory authorities elsewhere in the world have, in fact, taken a somewhat different perspective in recent years, particularly regarding the value of HRQOL measurement. A recent review of drug approvals in the European Union nations over 1995 to 2003 found that 34% of the dossiers submitted by product sponsors for evaluation reported HRQOL and other PRO measure findings, "with cancer-related treatments most frequently including PRO data."55 Within the United States, the extent to which such an FDA guidance will influence the use of PROs in nonindustry trials is unclear. In particular, no more than 10% of NCI-supported cancer treatment trials are conducted to support FDA submissions (E. L. Trimble, personal communication, March 2006). Rather, most NCI Phase III treatment trials compare new interventions with standard therapeutic approaches to inform treatment decisions in the cancer community. A 2003 examination of all NCI-sponsored cancer treatment trials found that 31% (59 of 189) of Phase III trials and 4% (34 of 810) of Phase I, II, and I/II trials had 1 or more HRQOL endpoints.56
NCI's Patient-Reported Outcomes Assessment in Cancer Trials (PROACT) Conference
Although NCI has never issued a formal guidance document on the use of PROs, NCI scientists have published extensively on the appropriate inclusion of quality-of-life measures in trial protocols (see Gotay et al57,58). Regarding decisions on the use of HRQOL in trials in recent years, NCI staff have essentially worked with investigators on a trial-by-trial basis. The guiding principle has been to encourage inclusion of HRQOL assessment in a trial when there is "an HRQOL hypothesis that will add to the existing body of knowledge, will generate hypotheses for future studies, or stimulate a change in clinical practice."56
The NCI's Clinical Trials Working Group, created to guide the NCI's efforts to restructure its clinical trial enterprise, recommended in its 2005 final report that a "funding mechanism and prioritization process [be established] to ensure that the most important...quality of life studies" are carried out appropriately alongside clinical trials.59
With these developments as a backdrop, the NCI conducted an international conference in September 2006, focusing on "Patient-Reported Outcomes Assessment in Cancer Trials: Evaluating and Enhancing the Payoff to Decision Making."6 Cosponsored by the American Cancer Society (ACS), the conference examined the circumstances under which the use of PROs, including HRQOL, in cancer trials could yield the greatest payoff to decision making; best practices for the application of PROs in a range of trials (Phase I/II, Phase III, and symptom management); and high-priority topics for future research. Conference findings, slated for publication in the Journal of Clinical Oncology in late 2007, are intended to inform the early deliberations of NCI's new Symptom Management and Health-related Quality of Life Steering Committee (S. B. Clauser, personal communication, March 2007) (see Note 3).
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
In parallel, at least one more initiative, IMMPACT, could well have an impact on PRO measurement standards and practice. Organized in 2002 as a voluntary association of participants from academia, government agencies (including the FDA and the NIH), patient advocacy organizations, and the pharmaceutical industry, IMMPACT has as its mission the development of consensus recommendations for improving the design, execution, and interpretation of clinical trials of treatments for pain.13,60,61
PROS IN THE ASSESSMENT OF CANCER CARE IN THE COMMUNITY
Prostate Cancer Outcomes Study (PCOS)
The first population-based evaluation of HRQOL for prostate cancer patients on a multiregional scale, PCOS was initiated by the NCI in 1994 in collaboration with 6 geographically dispersed cancer registries in the NCI's Surveillance, Epidemiology, and End Results (SEER) program.63 For a final sample of over 3,500 men, PCOS investigators combined cancer registry and detailed medical records data to obtain information on specific diagnostic procedures, prostate-specific antigen (PSA) values, clinical stage and grade of tumor, details of treatment (prostatectomy, hormonal therapy, evidence of "watchful waiting"), and acute complications of treatment. Of particular import, study participants were surveyed by mail at 6, 12, 24, and 60 months after the initial diagnosis to elicit their own reports on such issues as urinary/bladder control, bowel habits, sexual function, satisfaction with care, and overall impact of their condition(s) on health status. Several complementary HRQOL measures were employed, including the SF-36 instrument, the E-VG-G-F-P summary rating scale, and some survey items developed expressly for PCOS.64 Through 2006, PCOS had generated about 25 publications, with findings such as these:
Cancer Care Outcomes Research and Surveillance Consortium (CanCORS)
With PCOS showing the way, the NCI in 2001 launched CanCORS, a comprehensive study of cancer care treatment and outcomes in large, population-representative cohorts of individuals newly diagnosed with either lung or colorectal cancer.11 The consortium, composed of 6 research groups and a statistical coordinating center funded by the NCI and 1 group funded by the US Department of Veterans Affairs, has 2 principal aims: first, to determine how the characteristics and beliefs of cancer patients and their providers, as well as the characteristics of health care organizations, affect treatment decisions, clinical outcomes, and PROs; and second, to examine the association between specific treatments and patient outcomes.
To accomplish this, CanCORS investigators have identified roughly 5,000 lung cancer and 5,000 colorectal cancer patients, drawn from 5 geographically defined regions of the United States (which together include 5 integrated health care delivery systems), 15 VA medical centers, and a range of community-based practice sites. For each included patient, a rich longitudinal picture of cancer diagnosis, treatment, and outcomes is being constructed by linking registry data with information from medical records, administrative files, and surveys of the patient and his or her cancer care providers (and, for a sample of patients, their informal caregivers).
Through telephone interview surveys, CanCORS investigators have collected PRO data using a variety of measures, including symptom/problem checklists, SF-12 items, lung cancer-specific items and colorectal cancer-specific items from the EORTC suite of HRQOL instruments, the EQ-5D preference-based measure, the E-VG-G-F-P global health status scale, a visual analog scale (0 to 100), and questions on the patient's perception of and satisfaction with cancer treatment planning and decision making.68
The scientific papers flowing from CanCORS over the next several years will likely provide unprecedented insights into the value, as well as possible limitations, of PRO measures for evaluating the effectiveness of cancer care in real-world community settings.
National Initiative for Cancer Care Quality (NICCQ)
Organized by the American Society of Clinical Oncology in 2000 with support from several professional societies, patient advocates, and private foundations, the NICCQ was an in-depth study of cancer care quality in a (final) sample of nearly 1,800 patients diagnosed with either breast or colorectal cancer across 5 geographically dispersed regions of the United States.12,69 Of particular interest for present purposes is that 9 of the 36 quality measures for breast cancer and 6 of the 25 quality measures for colorectal cancer focused expressly on "respect for patient preferences and inclusion in decision making"—measures necessarily requiring the patient's own report. For these measures, overall adherence was 86% for breast cancer and 89% for colorectal cancer.69
PROS IN CLINICAL ONCOLOGY PRACTICE
Regarding the final point, a comprehensive analysis by Gotay et al76 examined 66 published investigations from 1986 to 2005 of the relationship between the cancer patient's reported HRQOL and length of survival. In 60 of these 66 studies, HRQOL was significantly related to survival time after controlling for biological variables in multivariable analyses. Indeed, HRQOL demonstrated a stronger relationship to survival prognosis than standard clinical predictor variables. A compelling recent example was provided by Efficace et al,77 who found that baseline social functioning scale scores of the EORTC QLQ-C30 were significant predictors of survival time for metastatic colorectal cancer patients.
Interestingly, much of the literature on implementing or evaluating PROs in oncology practice comes from Europe and Canada, not the United States, where the use of PRO measures to inform patient-level decision making remains "rare."70 Given the potential benefits, it is reasonable to ask why there has not been more active investigation, if not adoption, of PRO measurement in oncology care generally. Donaldson identifies patient, clinician, and health system factors that may be at work.71 For patients, potential hurdles include response burden, confidentiality concerns, and the possibility that PRO assessment may touch on sensitive topics. For clinicians, there may be doubts about whether PRO instruments are truly useful vehicles to inform cancer care decision making and about the time and practice resources required for PRO assessment (especially in the general absence of third-party payment for such activities). For health care delivery systems, potential barriers include the anticipated resource costs of collecting and managing PRO data, as well as concerns that such data could eventually be used by payers, purchasers, and others to monitor the quality of care.
To enhance the use of PRO measures in oncology practice, Donaldson urges the adoption of new information infrastructures and technologies, combined with redesign of cancer care delivery itself. Such changes could lower data collection costs, ensure confidentiality, and facilitate the day-to-day use of PRO information in patient-provider decision making.
System changes notwithstanding, provider reimbursement for PRO data collection and use will likely remain an important issue. In that regard, we note a striking, if time-limited, example of how third-party payers could reimburse providers for collecting HRQOL data in day-to-day oncology practice: the "Demonstration of Improved Quality of Care for Patients Undergoing Chemotherapy," which the US Centers for Medicare & Medicaid Services (CMS) put into effect for calendar year 2005.10 Under the terms of the Demonstration, medical oncologists delivering chemotherapy to Medicare enrollees could bill $130 per patient encounter if they collected and submitted data on the patient's current status regarding nausea/vomiting, pain, and fatigue, as measured by items taken from the Rotterdam Symptom Checklist, a prominent HRQOL instrument. A preliminary analysis of the data by the CMS78 indicated that only a minority of patients had significant symptoms (defined as checking the Rotterdam boxes "quite a bit" or "very much"): 2% for nausea/vomiting, 8% for pain, and 26% for fatigue. To be sure, this 2005 CMS quality-of-life demonstration was not without controversy (see Note 4). Nonetheless, it provides both a precedent and some practical information about how the "pay for performance" mechanism might be employed to incentivize PRO assessment in routine cancer care.
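The CMS tabulation rests on a simple dichotomizing rule: a response counts as a significant symptom if it falls in either of the two highest categories. The Python sketch below applies that rule to a few made-up encounter records. It assumes a four-level response scale in which "not at all" and "a little" fill out the lower categories, and the record layout is hypothetical rather than the format CMS actually collected; the output illustrates the rule only and says nothing about the Demonstration's real figures.

```python
# Response labels: "quite a bit" and "very much" are the categories named in
# the text; "not at all" and "a little" are ASSUMED here to complete a
# 4-level Rotterdam-style response scale.
RESPONSE_LEVELS = ("not at all", "a little", "quite a bit", "very much")
SIGNIFICANT = {"quite a bit", "very much"}


def significant_symptom_rate(responses):
    """Share of responses that count as a significant symptom."""
    if not responses:
        raise ValueError("no responses supplied")
    unknown = set(responses) - set(RESPONSE_LEVELS)
    if unknown:
        raise ValueError(f"unrecognized responses: {sorted(unknown)}")
    hits = sum(1 for r in responses if r in SIGNIFICANT)
    return hits / len(responses)


def summarize(encounters):
    """Map each symptom to its significant-symptom rate across encounters.

    Each encounter is a dict such as {"fatigue": "a little", ...}; this
    record layout is hypothetical, not the CMS submission format.
    """
    symptoms = sorted({s for e in encounters for s in e})
    return {
        s: significant_symptom_rate([e[s] for e in encounters if s in e])
        for s in symptoms
    }


# Hypothetical encounters for illustration only
encounters = [
    {"nausea/vomiting": "not at all", "pain": "a little", "fatigue": "quite a bit"},
    {"nausea/vomiting": "a little", "pain": "not at all", "fatigue": "very much"},
    {"nausea/vomiting": "not at all", "pain": "quite a bit", "fatigue": "a little"},
    {"nausea/vomiting": "not at all", "pain": "not at all", "fatigue": "a little"},
]
print(summarize(encounters))
```

Keeping the cut point in one constant makes it easy to see how sensitive the reported prevalence is to where the line between "significant" and "not significant" is drawn.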
To accelerate the transition toward routine PRO assessment in cancer care, it is important that patients, providers, and health systems become more familiar with the PRO assessment process and with ongoing efforts to make it feasible and attractive. In that regard, 2 recent international conferences sought to provide comprehensive assessments of the state of the science in PRO assessment in oncology practice. In June 2007, the International Society for Quality of Life Research (ISOQOL) sponsored a special conference on "Patient-Reported Outcomes in Clinical Practice."81 The conference, held in Budapest, addressed a wide spectrum of issues: PRO use in clinical practice from the perspectives of various stakeholders (eg, patients and providers); theoretical underpinnings for using PROs in clinical practice; applications of PROs in clinical practice; and topics pertaining to data collection, analysis, and interpretation. In October 2003, the Mayo Clinic sponsored the conference "Quality of Life III: Translating the Science of QOL into Clinical Practice."9 The conference, held in Scottsdale, Arizona, investigated a range of issues bearing on the benefits and costs of incorporating HRQOL measures into clinical practice.
| PROS IN POPULATION SURVEILLANCE OF CANCER PATIENTS AND SURVIVORS |
|---|
The question thus arises: at the population level, how might we go about measuring and monitoring progress against the suffering due to cancer and, at the same time, progress toward improving the quality of life of all those touched by cancer?
These issues of PRO data availability take on added import with the ACS's recent revision of its "2015 Nationwide Objectives," which are designed to facilitate fulfillment of its 2015 Challenge Goal in the area of quality of life.83 This Challenge Goal calls for "Measurable improvement in the quality of life (physical, psychological, social, and spiritual) from the time of diagnosis and for the balance of life of all cancer survivors by the Year 2015." In support of the Goal, ACS's National Board of Directors has adopted 4 new Nationwide Objectives intended to foster, over time, both improvement in the quality of life of individuals affected by cancer and the capacity to measure changes on a population-wide basis. Specifically, ACS's Quality of Life Objectives call for (1) improving access to care (by ensuring health care coverage for all by 2015); (2) reducing the impact of out-of-pocket costs on receipt of care by individuals diagnosed with cancer (so that by 2015, no more than 2% of cancer patients report cost-related access-to-care difficulties); (3) improving pain control by greatly strengthening pain management policies within the states by 2015; and (4) having in place national surveillance systems by 2015 to support population-based measurement of quality of life for individuals affected by cancer. That these objectives do not also include specific national targets for quality-of-life improvement simply reflects the fact, underscored in objective 4, that the data for such population-based monitoring are presently unavailable in the United States.
The question is how to accelerate the evolution to cancer data systems that incorporate not only traditional clinical and epidemiological parameters, but also PROs. Building on pertinent commentary and recommendations from several recent analyses,84–87 we see at least 3 strategies, and they are not mutually exclusive:
The technology for carrying out such an ambitious effort is fully available now, even as the nation progresses apace toward an interoperable electronic health care information system. Research initiatives like PCOS, CanCORS, and the NICCQ demonstrate that we already know "how to do it." The question is whether public agencies and private cancer organizations will devote the resources and then work in concert to create an enduring cancer data infrastructure.
| ENHANCING THE SCIENCE OF PRO MEASUREMENT AND APPLICATION |
|---|
Improving the State of the Science in PRO Measurement
While PRO measures can undoubtedly benefit from "continuous quality improvement" in multiple ways, there are several specific issues that merit and are receiving close attention.
Strengthening the Conceptual Foundations of PRO Measurement
As underscored in a number of COMWG analyses42 and in NCI's review of HRQOL in symptom-management studies,52 much work remains in strengthening the conceptual underpinnings of PRO measurement models. Ferrans17 has proposed additional work in a number of areas, including closer attention to the cause-effect relationships among variables; distinguishing more clearly between "objectively" measured biomedical outcomes and "subjectively" measured PROs; and identifying not only the PRO domains that are conceptually important to measurement, but also their interrelationships. Hays et al88 have advocated "structural equation modeling" as an analytic platform for exploring conceptual model-measurement model relationships, as well as rigorous investigation of other MOT attribute issues, such as construct validity.
Enhancing the Interpretability of PRO Data
Two matters have drawn much attention in recent years: defining what constitutes a MID in a PRO measure score and understanding whether the patient's adaptation to illness over time influences not only his or her PRO measure scores, but the very meaning of those scores to the patient ("response shift").
Perhaps the most discussed, if not also the most challenging, interpretive issue is how to define the MID. The issue arose at multiple points in the COMWG's deliberations. Working on this issue in parallel with, but independently of, the COMWG was a Clinical Significance Consensus Meeting Group (ClinSig), organized by Mayo Clinic statistician Sloan and comprising 30 experts from academia, industry, and government.8 ClinSig produced a 6-paper monograph89–94 examining such topics as individual versus group differences in the meaning of clinical significance, the assessment of HRQOL changes over time, and strategies for communicating HRQOL findings to patients, providers, and other decision makers. The monograph highlighted the strengths and limitations of the 2 principal approaches for demonstrating clinical significance in a PRO measure. The "distribution-based" approach expresses the PRO-measured treatment effect in terms of some underlying statistical distribution of possible effect results; for example, it has been argued that an effect may be regarded as clinically significant if it is greater in absolute value than one half the standard deviation of effect sizes. The "anchor-based" approach compares the PRO-measured treatment effect with concurrent changes in some independent standard (or anchor) that is itself regarded as clearly interpretable and meaningful (eg, ability to perform usual activities).
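To make the contrast between the two approaches concrete, here is a minimal Python sketch of each. The distribution-based function takes one half of a sample standard deviation, following the rule quoted above; which distribution supplies that standard deviation is left to the caller and is an assumption of the sketch. The anchor-based function takes the mean PRO change among patients whose anchor rating falls in a category the analyst designates as minimal change. The category labels and all data are hypothetical, and neither function reproduces ClinSig's specific recommendations.

```python
import statistics


def distribution_based_mid(reference_values, fraction=0.5):
    """Distribution-based threshold: a fraction (default one half) of the
    sample standard deviation of whatever reference values are supplied."""
    return fraction * statistics.stdev(reference_values)


def anchor_based_mid(changes, anchor_ratings, minimal_category):
    """Anchor-based threshold: mean PRO change among patients whose anchor
    rating equals the category designated as 'minimally changed'."""
    selected = [c for c, a in zip(changes, anchor_ratings) if a == minimal_category]
    if not selected:
        raise ValueError("no patients in the minimal-change anchor category")
    return statistics.mean(selected)


def exceeds_threshold(effect, threshold):
    """True when the effect is greater in absolute value than the threshold."""
    return abs(effect) > threshold


# Hypothetical data for illustration only
reference = [40, 50, 60, 70, 80]
changes = [2, 8, 6, 15, -1]
anchors = ["same", "a little better", "a little better", "much better", "same"]

dist_mid = distribution_based_mid(reference)  # 0.5 * SD of reference values
anchor_mid = anchor_based_mid(changes, anchors, "a little better")  # mean of 8 and 6
print(round(dist_mid, 2), anchor_mid)
```

With these toy numbers the two thresholds land close together, which mirrors the convergence discussed in the next paragraph, but that agreement is a property of the invented data, not evidence for it.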
Writing on the topic of clinical significance for the COMWG, Osoba (who was also a member of ClinSig) concluded that, on preliminary evidence, a meaningful change in an HRQOL score is about 7% of the full breadth of the measurement scale, perhaps bracketed by 5% and 10%.95 This implies, for example, that a score change from 50 to 57 on a 100-point scale constitutes a perceptible and meaningful HRQOL improvement. Such a difference (7%, with a range of 5% to 10%) is consistent with what Cohen96 and others have termed a "small" to "medium" effect size, which is often regarded as approximating a MID in effect sizes. Hence, there is preliminary evidence that the MID for an HRQOL measure is roughly the same under current articulations of the distribution-based and anchor-based approaches to defining it.
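The 7% rule of thumb is simple arithmetic on the scale's range. The sketch below consists of hypothetical helpers, not anything taken from the COMWG analyses. It expresses a score change as a percentage of the full range and places it relative to the 5% to 10% band. Percentages are kept as whole numbers so that, for instance, a 7-point change on a 0 to 100 scale comes out as exactly 7.0 rather than a rounding artifact.

```python
def change_as_percent_of_range(before, after, scale_min=0, scale_max=100):
    """Score change expressed as a percentage of the scale's full range."""
    width = scale_max - scale_min
    if width <= 0:
        raise ValueError("scale_max must exceed scale_min")
    return 100 * (after - before) / width


def mid_points(scale_min, scale_max, percents=(5, 7, 10)):
    """Translate the 5%, 7%, and 10% benchmarks into raw scale points."""
    width = scale_max - scale_min
    return tuple(width * pct / 100 for pct in percents)


def classify_change(percent, low=5, high=10):
    """Place an absolute percent change below, within, or above the band."""
    size = abs(percent)
    if size < low:
        return "below band"
    if size <= high:
        return "within band"
    return "above band"


# Example from the text: a change from 50 to 57 on a 0-100 scale
pct = change_as_percent_of_range(50, 57)
print(pct, classify_change(pct), mid_points(0, 100))
```

Expressing the change relative to the range, rather than in raw points, is what lets the same benchmark be applied to instruments scored on different scales.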
On the other hand, neither the FDA's draft guidance to industry nor recent publications by its staff scientists offer direct evidence that the agency openly embraces or rejects any of these approaches to determining what constitutes a clinically meaningful difference in an HRQOL measure (or in a PRO measure generally).4,53
The second intriguing interpretation matter is response shift, the hypothesized phenomenon that the very meaning of an individual's evaluation of a subjective construct like HRQOL changes over time (see Schwartz and Sprangers97,98). Such response shift within an individual is said to reflect changes in his or her (1) internal standards of measurement; (2) valuation of the individual HRQOL domains; or (3) definitions or perceptions of what the domains mean. The following stylized scenario is consistent with the occurrence of a response shift. Suppose that a cancer-free individual rates his own health as Very Good on the global Excellent/Very Good/Good/Fair/Poor (E-VG-G-F-P) scale and rates the health of someone described as having localized prostate cancer as Fair. Suppose this same individual is subsequently diagnosed with localized prostate cancer and now, living his life with this cancer, rates his overall health as Good. The pre/post change in his rating of localized prostate cancer from Fair to Good could reflect internal adaptations arising from any of 1, 2, or 3 above.
Ferrans17 noted that a better understanding of response shift could lead to conceptual models of HRQOL that provide a better portrayal of dynamic relationships among "objective" biomedical outcomes and (the comparatively more malleable measures of) HRQOL.
Applying Modern Psychometric Approaches to Enhance PRO Assessment
As noted earlier,5,39 modern measurement approaches such as IRT modeling may offer significant opportunities to enhance the rigor and efficiency of PRO data collection and analysis compared with what is possible using conventional psychometric approaches (based on "Classical Test Theory" [CTT]).
IRT modeling not only allows PRO item responses to inform the scale score assigned to an individual, as with CTT, but also uses (large samples of) individual item responses to estimate a PRO "scale score" for each item, that is, the item's location on the same measurement scale. Such information allows the investigator to pose items to a respondent strategically so as to rapidly "home in" on his or her most likely position and, thus, statistically optimal score on the PRO measurement scale.
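The idea of an item having its own position on the scale can be sketched with a two-parameter logistic (2PL) model, one common IRT model. The text does not say which model any particular instrument uses, so the choice of 2PL here is an assumption. Each item gets a discrimination a and a location b, and the item's Fisher information peaks at the trait level equal to b. That property is what lets an item be chosen to "home in" on a respondent: at a current estimate theta, the most informative item tends to be one located near theta. The item names and parameter values below are hypothetical; in practice they would be calibrated from large samples, a step not shown.

```python
import math

# Hypothetical physical-functioning items: name -> (discrimination a, location b)
ITEMS = {
    "walk_one_block": (1.8, -1.5),
    "climb_one_flight": (1.5, -0.5),
    "carry_groceries": (1.6, 0.3),
    "run_one_mile": (2.0, 1.5),
}


def p_endorse(theta, a, b):
    """2PL item response function: probability of endorsing the item at theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information a^2 * P * (1 - P); it is largest when theta == b."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta, items=ITEMS):
    """Sum of item information; 1 / sqrt(information) approximates the
    standard error of measurement at theta."""
    return sum(item_information(theta, a, b) for a, b in items.values())


def most_informative_item(theta, items=ITEMS):
    """Item that is most informative at the current estimate theta."""
    return max(items, key=lambda name: item_information(theta, *items[name]))


for theta in (-1.5, 0.0, 1.5):
    se = 1.0 / math.sqrt(scale_information(theta))
    print(theta, most_informative_item(theta), round(se, 2))
```

Because information varies with theta, the standard error printed for each trait level differs too; that theta-specific precision is one way IRT describes reliability across the breadth of a scale.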
Consequently, IRT modeling has been shown to generate PRO measurement scales with stronger reliability across the full breadth of the scale and possibly also greater validity, in terms of being able to distinguish among true individual-level differences in outcome.99–101 IRT (unlike CTT) also enables one to "cross-walk" (and, thus, compare directly) the scores between 2 instruments purporting to measure the same health-related construct. In addition, IRT facilitates a statistically rigorous investigation of what has been called "differential item functioning," that is, whether a given PRO instrument performs the same or differently when applied to populations that differ by cultural background, geography, or other considerations.99
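The text does not say which statistical test is used to detect differential item functioning. One long-established option, swapped in here for illustration, is the Mantel-Haenszel procedure, a non-IRT method often used alongside IRT-based DIF analyses. Respondents are stratified by total score, and within each stratum the odds of endorsing the item are compared between a reference group and a focal group. A common odds ratio near 1 suggests the item functions alike in both groups. The sketch also converts the odds ratio to the ETS delta metric (MH D-DIF = -2.35 ln OR). The group labels, item name, and data are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_or(records, item):
    """Mantel-Haenszel common odds ratio for one binary item.

    records: iterable of (group, total_score, responses), where group is
    "ref" or "focal" and responses maps item name -> 0 or 1. total_score is
    the matching criterion used to form strata.
    """
    # Per stratum: [ref endorse, ref not, focal endorse, focal not]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, responses in records:
        offset = 0 if group == "ref" else 2
        strata[score][offset + (0 if responses[item] == 1 else 1)] += 1

    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined for these data")
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """ETS delta-scale statistic: 0 means no DIF; negative values mean the
    reference group is more likely to endorse the item at the same total score."""
    return -2.35 * math.log(odds_ratio)


def make_records(group, score, n_yes, n_no, item="q1"):
    """Build hypothetical records for one group within one score stratum."""
    return [(group, score, {item: 1})] * n_yes + [(group, score, {item: 0})] * n_no


# No DIF: identical endorsement rates in both groups within each stratum
same = (make_records("ref", 1, 1, 1) + make_records("focal", 1, 1, 1)
        + make_records("ref", 2, 2, 1) + make_records("focal", 2, 2, 1))
# DIF: at the same total score, the focal group endorses the item less often
dif = make_records("ref", 2, 3, 1) + make_records("focal", 2, 1, 3)

print(mantel_haenszel_or(same, "q1"), mantel_haenszel_or(dif, "q1"))
```

Matching on total score is the key design choice: it separates a genuine group difference in the trait from an item that behaves differently for respondents who stand at the same trait level.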
Perhaps most significantly, IRT modeling provides the theoretical underpinnings for the construction of "item banks" to facilitate PRO measurement via computer-adaptive testing (CAT). Under this approach, IRT modeling is used to choose the PRO survey items, whether taken from existing fixed-item instruments or newly created, that make up the bank. To assess where a given individual lies on a given PRO scale (for example, to assign the individual a score on a physical functioning scale), the CAT procedure selects survey items for the individual strategically, with the answers to previous questions driving the selection of the next question. The upshot is that the individual's scale score can be estimated with acceptable precision using significantly fewer questions than with traditional fixed-item instruments. (Precisely the same methodology is used today in computer-assisted administration of the Scholastic Aptitude Tests and other standardized exams.102)
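The adaptive loop described above can be sketched in a few dozen lines. This is a minimal illustration under several assumptions that do not come from the text: a 2PL model, a standard normal prior, expected a posteriori (EAP) scoring on a grid, selection of the remaining item with maximum Fisher information at the current estimate, and a stopping rule based on a target posterior standard deviation or a maximum number of items. The item bank and the simulated respondent are hypothetical. Operational CAT systems typically add refinements such as item-exposure control, which are not shown.

```python
import math

# Hypothetical item bank: item id -> (discrimination a, location b), 2PL model
BANK = {
    "pf1": (1.7, -2.0), "pf2": (1.9, -1.2), "pf3": (2.1, -0.6),
    "pf4": (2.2, 0.0), "pf5": (2.1, 0.6), "pf6": (1.9, 1.2), "pf7": (1.7, 2.0),
}
GRID = [i / 20.0 for i in range(-80, 81)]  # theta values from -4.0 to 4.0


def p_endorse(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(answers, bank):
    """EAP estimate and posterior SD under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for item, y in answers.items():
            p = p_endorse(t, *bank[item])
            w *= p if y == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(respond, bank=BANK, se_target=0.45, max_items=5):
    """Ask items adaptively until the posterior SD falls below se_target or
    max_items have been asked. respond(item_id) must return 1 or 0."""
    answers = {}
    theta, se = eap(answers, bank)
    while len(answers) < min(max_items, len(bank)) and se >= se_target:
        remaining = [i for i in bank if i not in answers]
        next_item = max(remaining, key=lambda i: information(theta, *bank[i]))
        answers[next_item] = respond(next_item)  # this answer drives the next pick
        theta, se = eap(answers, bank)
    return theta, se, list(answers)


def simulated_respondent(true_theta, bank=BANK):
    """Hypothetical deterministic respondent: endorses an item whenever the
    model probability exceeds 0.5, i.e., whenever b < true_theta."""
    return lambda item: 1 if bank[item][1] < true_theta else 0


theta_hat, se, asked = run_cat(simulated_respondent(1.5))
print(round(theta_hat, 2), round(se, 2), asked)
```

The `asked` list shows the sequence of items actually administered, which is typically a subset of the bank; that shortened sequence is the source of the efficiency gain over fixed-item instruments.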
In reality, virtually all PRO measure applications in the published literature to date are based on fixed-item instruments. So the promise of IRT modeling, demonstrated in a number of recent experimental studies,99–101 awaits the broad "market test." And that test is about to come. In 2004, as part of its new Roadmap Initiative, the NIH launched a 5-year, $25 million project to develop the Patient Reported Outcomes Measurement Information System (PROMIS).14 PROMIS is building web-based, public-domain item banks to support CAT for selected health symptoms and HRQOL domains affected by a variety of chronic diseases, including cancer. A network of 6 research and data collection sites and a statistical coordinating center are pursuing the following aims: electronic administration of individually tailored PRO questionnaires via a number of secure platforms (eg, computers, Internet, telephone); collection of PRO data in research studies, including clinical trials; and eventually, the capability of providing instant-turnaround PRO reports to patients, providers, and researchers.
Targeted Therapies and Longer Life Expectancies: PRO Measurement in the 21st Century
While today's cancer therapies are frequently effective at arresting the progression of disease, they often have significant toxic effects because they generally affect healthy tissue and organs in addition to cancerous cells; surgery, of course, carries its own particular risks. Targeted therapies are at the center of a paradigm shift in cancer treatment (see Gotay103). They act directly to interfere with the cancer cell growth process, thus greatly reducing damage to adjacent healthy cells. On the other hand, some targeted therapies may require ongoing administration, so that patients must maintain contact with their oncologist over time; indeed, they may never be able to "complete" treatment in a decisive way. Rather, an individual may live for years with his or her cancer under control, but at risk for other chronic disease problems.
Do targeted therapies, then, require a distinctly new approach to PRO assessment? If so, in what ways? For example, focusing solely on cancer symptoms or single HRQOL domains, in the spirit of published observations by FDA senior oncology staff,3 could prove problematic for new types of cancer therapy whose health status impacts are simply not known early on. Instead, it may be necessary to ascertain the patient's own perspective on the short-term effects of these new treatments through qualitative interviews.
The possible long-term and late effects of cancer treatments are receiving heightened attention now, initially for pediatric cancers104 and more recently for adult cancers.105 Adverse downstream consequences of therapy may include excess fatigue, sexual dysfunction, other physical effects (such as arm edema), and cognitive problems. Consequently, it is important that PRO assessment extend across the survivorship period, perhaps through greater emphasis on postmarketing surveillance of therapies.103
The challenges outlined here point to the ongoing importance of monitoring and enhancing the content validity of PRO instruments (see Table 2). For example, it is reasonable to ask whether certain HRQOL instruments developed and validated in younger cancer patients are the most appropriate measures for older cancer patients. Likewise, as Zebrack and Cella have pointed out,49 HRQOL instruments frequently used among older adult cancer survivors may not adequately address the concerns of young adult and adolescent survivors.
Meeting the Needs of Decision Makers
Matching the PRO Measure to the Task at Hand
For each potential PRO application, an enduring challenge is selecting the most appropriate PRO measure(s) from the large menu of available options. From the COMWG analyses, we conclude the following42: