Dr. Tiwari is Mathematical Statistician and Program Director, Statistical Research and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD.
Dr. Ghosh is Assistant Professor, Department of Statistics, George Washington University, Washington, DC.
Dr. Jemal is Program Director, Cancer Occurrence, Department of Epidemiology and Surveillance Research, American Cancer Society, Atlanta, GA.
Mr. Hachey is Statistical Programmer, Information Management Services Inc., Silver Spring, MD.
Dr. Ward is Director, Surveillance Research, Department of Epidemiology and Surveillance Research, American Cancer Society, Atlanta, GA.
Dr. Thun is Vice President, Department of Epidemiology and Surveillance Research, American Cancer Society, Atlanta, GA.
Dr. Feuer is Chief, Statistical Research and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD.
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
In this issue of CA and in Cancer Facts & Figures, the American Cancer Society (ACS) publishes estimates of the number of cancer cases and deaths for the current year, based on projections from observed data that ended three years in the past. For example, the published estimates of cancer deaths in Cancer Facts & Figures 2003 were based on projections from mortality data from 1979 to 2000.2 The estimated number of deaths projected to occur in individual states and in the nation is based on underlying cause-of-death data reported to the National Center for Health Statistics (NCHS). Since 1995, the ACS has used a method of estimating the number of deaths in the current year that applies statistical projections and subjective judgment to adjust for recent changes in death rates.
In this article, we briefly describe the ACS method and present an alternative method for projecting the number of cancer deaths in the current year. This new method is called the state-space method (SSM). To compare the predictive accuracy of the SSM and the ACS method, we used both methods to estimate deaths for the years 1997 to 1999, based on data that would have been available at the time. We compared these estimates to the actual number of deaths that occurred from specific cancers and from all cancers combined at the state and national levels. Although neither method was consistently more accurate for every cancer site, in general the SSM performed better. The time-varying coefficients give added flexibility to this model, allowing it to adjust to rapidly evolving trends. Based on the results of this collaboration with the National Cancer Institute, the ACS has elected to use the SSM model in predicting the number of cancer deaths in Cancer Facts & Figures 2004 and in Cancer Statistics, 2004.3,4
| DATA DESCRIPTION |
|---|
The previous ACS method has been used since 1995.5 This approach considers cancer mortality data from 1979, the first year in which deaths were coded according to the ninth revision of the International Classification of Diseases (ICD-9), through the most recent year for which mortality data are available, ie, three years before the current calendar year. Our analysis uses mortality data from 1969 to 1999, coded under ICD-8 through ICD-10.6–8 These codes were made comparable using the Surveillance, Epidemiology, and End Results (SEER) program recode.9
| MATERIALS AND METHODS |
|---|
Quadratic Time Series Model (the Previous ACS Method)
The statistical method previously used by the ACS to estimate the number of deaths from cancer during 1995 to 2003 is a multistep procedure.5 First, a quadratic time trend is fitted to the mortality counts, based on the mortality data from 1979 to the most recently available year. This models the long-term variation in trend as a function of time and the square of time. Second, the discrepancies between the fitted and observed values, called residuals, are calculated. An autoregressive process is then fitted to the residuals obtained from the first step. This process models the residuals at any point of time as a function of those at previous time points and accounts for the short-term behavior of the mortality trend. In the third and final step, the combined model is projected three years into the future. The model fitting and prediction are implemented using PROC FORECAST in SAS software.11 This gives three-year-ahead predictions and 95% prediction intervals for the mortality counts. The default setting (which is the one used by the ACS) of PROC FORECAST requires at least seven years of data to successfully fit the model and then predict into the future.
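As a rough illustration of the three steps above, the following Python sketch fits a quadratic trend, fits an autoregressive term to the residuals, and projects three years ahead. It is not PROC FORECAST: the function name, the restriction to a first-order autoregression, and the omission of prediction intervals are simplifying assumptions made here.

```python
import numpy as np

def quadratic_ar1_forecast(years, counts, horizon=3):
    """Sketch of a quadratic trend plus AR(1) residual projection.

    Assumption: a first-order autoregression stands in for whatever
    autoregressive order the production software would select.
    """
    years = np.asarray(years, dtype=float)
    counts = np.asarray(counts, dtype=float)
    t = years - years[0]                      # time index starting at 0

    # Step 1: quadratic time trend (time and time squared).
    coefs = np.polyfit(t, counts, deg=2)
    residuals = counts - np.polyval(coefs, t)

    # Step 2: AR(1) coefficient from regressing e_t on e_{t-1}.
    denom = float(np.dot(residuals[:-1], residuals[:-1]))
    phi = float(np.dot(residuals[1:], residuals[:-1])) / denom if denom > 1e-12 else 0.0

    # Step 3: project trend and decayed last residual `horizon` years ahead.
    trend = float(np.polyval(coefs, t[-1] + horizon))
    return trend + (phi ** horizon) * float(residuals[-1])
```

Called with data for 1979 to 1996 and the default `horizon=3`, the function returns a single point projection for 1999.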
Because the previous statistical approach does not account for recent changes in cancer death rates, it has been combined with subjective judgments about which of five alternative predictions seems most plausible. The five candidates are the three-year-ahead point prediction, the upper and lower 95% prediction limits, and the two midpoints between the point prediction and each of these limits. In the most recent projections, the value selected is then rounded to either the nearest 10 or the nearest 100.
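The five candidates can be sketched as follows, reading the midpoints as lying halfway between the point prediction and each prediction limit (an interpretation assumed here). The function names and the example values are hypothetical, and the subjective choice among the candidates is deliberately not modeled.

```python
def candidate_predictions(point, lower, upper):
    """Return the five candidate values, ordered from lowest to highest."""
    return [
        lower,                 # lower 95% prediction limit
        (lower + point) / 2,   # midpoint between lower limit and point prediction
        point,                 # three-year-ahead point prediction
        (point + upper) / 2,   # midpoint between point prediction and upper limit
        upper,                 # upper 95% prediction limit
    ]

def round_to(value, base):
    """Round a selected candidate to the nearest multiple of `base` (10 or 100)."""
    return int(round(value / base) * base)

# Hypothetical numbers, for illustration only.
candidates = candidate_predictions(point=100.0, lower=80.0, upper=120.0)
```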
The method described above is used to obtain both the state- and national-level predictions. For the state-level predictions, an additional restriction is imposed that the predictions for the 50 states and the District of Columbia must add up to the national-level prediction. Where there is a discrepancy between the two, the difference is proportionally allocated to the states and these revised numbers become the state-level predictions. It should be noted that the projection methods are applied at the national level for a large number of cancer sites.1 For site groupings (digestive or respiratory systems, for example, as in Cancer Facts & Figures 2003, page 4) and all sites combined, the predictions from individual sites are summed.1 At the state level, fewer individual cancer sites (see Cancer Facts & Figures 2003, page 5) and all the cancers combined are modeled directly.1 We follow the same procedures for our new proposed method.
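The state-level constraint amounts to rescaling every state prediction by the same factor, which is equivalent to allocating the discrepancy in proportion to each state's share. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def allocate_to_national(state_predictions, national_prediction):
    """Rescale state predictions so they sum to the national prediction.

    Adding each state's proportional share of the discrepancy is the
    same as multiplying every state value by national / state total.
    """
    total = sum(state_predictions.values())
    factor = national_prediction / total
    return {state: value * factor for state, value in state_predictions.items()}

# Hypothetical three-state example: raw predictions sum to 1,000,
# but the national prediction is 1,100.
revised = allocate_to_national({"A": 500.0, "B": 300.0, "C": 200.0}, 1100.0)
```

Because every state is multiplied by the same factor, the relative ranking of the states is unchanged.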
State-Space Method
As an alternative to the previous method, we have used the SSM to predict mortality counts. The motivation for developing this method was to improve sensitivity to short-term trends and to eliminate subjectivity. The SSM consists of two main parts: a measurement equation and a transition equation.12 In the measurement equation, the mortality counts follow a linear model with time-varying regression coefficients (also known as the state vector of parameters, because all the information about the current state of the process is assumed to be present in this vector). These time-varying regression coefficients also follow a linear model. The two equations combined give a quadratic trend over short time segments, in contrast to the ACS model, which assumes a quadratic trend over the entire time period.
The transition equation models the dependence of the current state on its immediate past, using another linear model. Both the measurement and transition equations are assumed to incorporate random errors. When the transition error variances are zero, the SSM reduces to fitting a quadratic time trend with independent error terms, mimicking the ACS model.12 In this sense, the SSM represents a generalization of the previous ACS model. The error variances are estimated from the data, using the method of moments described in Ghosh, et al.10 The SSM is more flexible than a standard regression model because it has time-varying coefficients, which allow the model to adjust to sudden changes in the observed data. In contrast, the standard (fixed coefficient) regression model tries to fit one curve through the entire data set, which may not provide a good fit if multiple rapid changes occur. Details of how the optimal estimate of the current state vector is computed are provided elsewhere.10
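To make the two equations concrete, the sketch below runs a standard Kalman filter on a local quadratic trend model. The state layout (level, slope, curvature), the matrices `G` and `F`, the diagonal transition covariance, and the vague prior are all assumptions chosen for illustration; they are not the exact specification of Ghosh, et al., and the variances are supplied by the caller rather than estimated by the method of moments.

```python
import numpy as np

# Transition matrix for state = (level, slope, curvature); an assumed layout.
G = np.array([[1.0, 1.0, 0.5],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
F = np.array([1.0, 0.0, 0.0])   # measurement equation observes the level only

def ssm_forecast(y, meas_var, trans_var, horizon=3, prior_var=1e6):
    """Filter the series y and return the `horizon`-step-ahead prediction."""
    theta = np.array([float(y[0]), 0.0, 0.0])   # initial state guess
    P = prior_var * np.eye(3)                   # vague prior covariance
    Q = trans_var * np.eye(3)                   # transition error covariance
    I = np.eye(3)
    for t, obs in enumerate(y):
        if t > 0:
            # Transition equation: carry the state forward one year.
            theta = G @ theta
            P = G @ P @ G.T + Q
        # Measurement equation: update the state with the new observation.
        S = F @ P @ F + meas_var
        K = P @ F / S
        theta = theta + K * (obs - F @ theta)
        A = I - np.outer(K, F)
        P = A @ P @ A.T + meas_var * np.outer(K, K)   # Joseph form, numerically stable
    return float(F @ np.linalg.matrix_power(G, horizon) @ theta)
```

With `trans_var=0` the coefficients cannot move, so the filter behaves like a single quadratic fitted to the whole series; a positive `trans_var` lets the coefficients drift and the fit follow recent data more closely.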
The sensitivity of the SSM to sudden changes in the data can be disadvantageous when real or random variations in the observed series give rise to a zigzag curve. As a compromise between the accuracy of fit and projections, we have modified the SSM by introducing two tuning parameters. In the adjusted model, the error variances are rescaled by two constants, one each for the measurement error and transition error. These tuning parameters are estimated by minimizing the sum of squares of the differences between the observed number of deaths and their three-year-ahead predictions.
To illustrate, suppose that mortality data from 1969 to 2000 are to be used to predict cancer deaths in 2003. The new method estimates the tuning parameters as follows. Data through 1975 are used to obtain the three-year-ahead prediction for 1978, data through 1976 are used to obtain the prediction for 1979, and so on, until data through 1997 yield the prediction for 2000. For each year, the prediction error is computed as the predicted minus the observed value. The squares of these prediction errors for the years 1978 to 2000 are then summed, providing a measure of discrepancy between the observed and projected values. The tuning parameters are adjusted to produce the smallest sum of squared prediction errors. This final model is then fitted to the 1969 to 2000 mortality counts and projected three years ahead to obtain the prediction for 2003.
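The rolling three-year-ahead procedure can be sketched independently of the forecasting model. In the example below a discounted straight-line fit with a single tuning parameter `lam` stands in for the SSM, which has two tuning parameters; the stand-in forecaster, the function names, and the grid of candidate values are all assumptions made for illustration.

```python
import numpy as np

def discounted_line_forecast(y, lam, horizon=3):
    """Stand-in forecaster: straight line with geometrically discounted weights."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    age = t[-1] - t
    w = np.sqrt(lam ** age)          # polyfit weights multiply residuals before squaring
    slope, intercept = np.polyfit(t, y, 1, w=w)
    return intercept + slope * (t[-1] + horizon)

def rolling_sse(y, lam, horizon=3, min_train=7):
    """Sum of squared three-year-ahead prediction errors over all origins."""
    errors = []
    for end in range(min_train, len(y) - horizon + 1):
        predicted = discounted_line_forecast(y[:end], lam, horizon)
        observed = y[end - 1 + horizon]
        errors.append(predicted - observed)   # predicted minus observed
    return float(np.sum(np.square(errors))), len(errors)

def tune(y, grid, horizon=3, min_train=7):
    """Pick the tuning value with the smallest rolling sum of squares."""
    return min(grid, key=lambda lam: rolling_sse(y, lam, horizon, min_train)[0])
```

With 32 annual observations (1969 to 2000) and `min_train=7`, the first origin ends in 1975 and targets 1978, and the last ends in 1997 and targets 2000, giving 23 prediction errors.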
The expressions for estimates of the state vector and the tuning parameters are obtained using an iterative numeric procedure. We have used "R" software to code the procedure just described.13,10
In our comparison, the estimates from the previous ACS method, denoted CFF (for Cancer Facts & Figures), are based on data from 1979 onward and include the subjective choice among the five possible candidates. Estimates obtained directly from PROC FORECAST, denoted PF, are based on data from 1969 onward, as are the estimates made using the SSM model.
| RESULTS |
|---|
Figures 2A to D illustrate the comparison of the SSM and PF models for prostate cancer. Before 1990, the SSM fits are more erratic than are those from PF, because of sensitivity to random error. However, after 1990, the projections from the SSM model are much closer to the observed values than are those from the PF method.
Table 1 compares the projected mortality counts for 1999, based on three methods (PF, CFF, and SSM), with the observed counts for eight cancer sites each in men and women. All the methods use observed data through 1996 and project to 1999 using three-year-ahead projections. Among the 16 sex-specific and site-specific projections, the SSM-generated projections were closest to the observed counts for nine sites, the CFF-generated projections for five sites, and the PF-generated projections for one site (PF and SSM were tied for one site).
Figures 3 and 4 compare the predictions from the PF, CFF, and SSM models for the years 1995 to 1999 for selected cancer sites, using data up to three years before the prediction year. To make these figures comparable, although on different scales, the vertical axes are all drawn to span approximately ±25% of the average observed values. On this relative scale, the SSM follows the observed trend more closely than do the other methods for male and female lung cancer, female breast cancer, male colorectal cancer, and prostate cancer. The best prediction overall is for colorectal cancer, followed by lung, breast, and prostate. Note that for female colorectal cancer the PF and CFF methods consistently underestimate, whereas the SSM estimates bounce over and under the observed values. The extra variability of the SSM in female colorectal cancer reflects variation in past observations.
Table 2 shows a similar comparison for all cancer sites combined. The SSM gives better predictions than both the PF and CFF methods for the years 1997 to 1999.
Using the squared deviation as the measure of error between the observed and predicted death counts, we compared the accuracy of the PF, CFF, and SSM methods. These quantities are non-negative and become larger with increased discrepancy, giving a proportionally greater penalty to larger discrepancies. We believe that this measure of deviation is appropriate because a large error seems more serious than several smaller ones. We calculated squared deviations of PF, CFF, and SSM from the corresponding observed values for each of the three-year predictions for the years 1997, 1998, and 1999 for the comprehensive set of cancer sites reported in Cancer Facts & Figures. The three-year predictions use data from 1969 until 1994, 1995, and 1996, respectively (with the exception of CFF, which uses data only from 1979). Then, for a particular site, the deviations so obtained are averaged for the three years. Table 3 shows the results for selected cancer site and sex combinations. In general, the average squared deviations for SSM are smaller than those for the PF and CFF methods.
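The accuracy measure described above is the squared difference between predicted and observed counts, averaged over the three prediction years. A minimal sketch, using a hypothetical function name and made-up numbers rather than values from the tables:

```python
def average_squared_deviation(predicted, observed):
    """Mean of (predicted - observed)^2 over the prediction years."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sum(squared) / len(squared)

# Made-up errors: one miss of 3 is penalized more heavily
# than three misses of 1, although the total absolute error is equal.
one_large = average_squared_deviation([13, 10, 10], [10, 10, 10])
several_small = average_squared_deviation([11, 11, 11], [10, 10, 10])
```

Here `one_large` exceeds `several_small`, which is the property that motivates squaring the deviations.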
Table 4 shows a summary of how the methods perform for a three-year period. The entries in the table are the squared deviations averaged over the comprehensive set of cancer sites reported in Cancer Facts & Figures.
In addition, to determine whether one method is better depending on the rarity of cancer deaths, we grouped all the sites into four categories according to the number of observed deaths in 1999. We averaged the squared deviations over all the cancers in a certain group for the years 1997 to 1999. Table 5 shows the results. In general, SSM performs better regardless of the rarity of the cancer.
Finally, Table 6 shows summary statistics for the comparison of the PF, CFF, and SSM methods when applied to the state-level data. The predicted values were adjusted so that the sum of the state predictions matches the corresponding national prediction. For each of the cancer sites listed, we averaged the squared deviations over all 50 states and the District of Columbia for the years 1997 to 1999.
Table 6 shows that the SSM performs better than the PF method at the state level (and the PF method in turn performs better than the CFF method), although the improvement is slight in most cases.
| DISCUSSION |
|---|
Despite the decrease in mortality rates that has occurred for several major cancer sites since the 1990s, a corresponding decline in the number of cancer deaths has not occurred.14 This is because the growth and aging of the population offset the decrease in age-specific death rates. However, the number of deaths from cancer has increased more slowly than the size of the population because of the decline in age-specific rates. For example, the number of men in the United States increased 41% during the period of our analyses, from 97,884,292 in 1969 to 138,053,563 in 2000. During this interval, the number of deaths from colon cancer in men increased 29%, from 22,044 in 1969 to 28,484 in 2000. However, the age-adjusted mortality rate (based on the 2000 US standard) actually declined from 33.2 per 100,000 in 1969 to 25.2 per 100,000 in 2000.
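The percentages in this example follow directly from the quoted counts, as the short calculation below shows. The function name is hypothetical; the inputs are the figures given above.

```python
def percent_change(start, end):
    """Percentage change from `start` to `end`."""
    return 100.0 * (end - start) / start

male_population = percent_change(97_884_292, 138_053_563)   # men, 1969 to 2000
colon_deaths = percent_change(22_044, 28_484)               # male colon cancer deaths
adjusted_rate = percent_change(33.2, 25.2)                  # age-adjusted rate per 100,000

print(f"population {male_population:.0f}%, deaths {colon_deaths:.0f}%, "
      f"age-adjusted rate {adjusted_rate:.0f}%")
```

The population and death counts rise by about 41% and 29%, respectively, whereas the age-adjusted rate falls by roughly 24%.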
The PF method is slower to adapt to the change in death counts that occurred in the 1990s because its regression coefficients are fixed. The subjective selection among the five candidate values generated by the PF, as used in the CFF, did not adjust for recent trends as well as the SSM did. However, in some cases, such as female colorectal cancer, the sensitivity of the SSM to recent changes may cause results to fluctuate from year to year more than is desired. This demonstrates the difficulty of deriving a method that is optimal in all situations.
Yet another method for estimating the number of cancer deaths is based on fitting a joinpoint (also known as changepoint) model to the available mortality data and then extrapolating it to future years. The Joinpoint regression software developed by the National Cancer Institute is based on the permutation test approach and is available at the Web site http://srab.cancer.gov/software/joinpoint.html.15,16 This method is used to characterize cancer trends in the United States. It fits a series of joined linear segments, usually on a log scale, to a data series. When fitted on a log scale, the slope of each segment can be characterized as an annual percentage change in the rates. The number and locations of the joinpoints are determined using a series of statistical tests, called permutation tests. The joinpoint method is useful to identify changes in the trend throughout the data series, although changes near the end of the series are usually of most interest. Because this method is sensitive to changes in trend near the end of the series, it is a natural candidate for short-term extrapolation. Although the joinpoint method did not do as well overall as the SSM, it did reasonably well, especially for moderately rare cancer sites (ie, those with 2,000 to 10,000 deaths), and for some cancer sites it performed better than any other method. In another report we compare in detail the joinpoint method with the others presented here.10
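The link between a log-scale slope and the annual percentage change can be illustrated for a single segment. The sketch below is not the Joinpoint software and does not select joinpoints or run permutation tests; it only fits one straight line to the logarithm of the rates and converts its slope b to 100 × (exp(b) − 1). The function name and example rates are hypothetical.

```python
import numpy as np

def annual_percent_change(years, rates):
    """Annual percentage change from one log-linear segment."""
    years = np.asarray(years, dtype=float)
    x = years - years.mean()                  # center years for numerical stability
    slope, _intercept = np.polyfit(x, np.log(rates), 1)
    return 100.0 * (np.exp(slope) - 1.0)

# Hypothetical rates falling by exactly 2% per year.
years = list(range(1990, 2001))
rates = [30.0 * 0.98 ** k for k in range(len(years))]
apc = annual_percent_change(years, rates)
```

For rates that fall by a constant 2% each year, the fitted slope is log(0.98) and the function returns an annual percentage change of about −2.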
We continue to examine statistical methods that can reliably predict cancer incidence and death rates. Based on the results of this validation study, however, the ACS has elected to use the SSM to project mortality counts for Cancer Facts & Figures 2004.3 Another possible improvement for future estimates may be to use preliminary estimates of mortality, which are available from the NCHS approximately one year before the final estimates. This would allow two-year rather than three-year projections. These preliminary estimates (which have been available since 1995) have been shown to be generally within a few percentage points of the final estimates for most cancer sites at the national level.17 Studies to validate the use of these preliminary estimates are ongoing.
To predict incidence counts for the nation, incidence must first be spatially projected from SEER to the nation, and then projected forward in time to the current calendar year. An improved method for spatial projections of incidence has been developed and work is in progress to incorporate the SSM model to add a temporal component to this spatial model.18
| Acknowledgments |
|---|
| Footnotes |
|---|
| REFERENCES |
|---|