Wavelet density estimation for weighted data
https://doi.org/10.1016/J.JSPI.2013.09.015
Abstract
We consider the estimation of a density function on the basis of a random sample from a weighted distribution. We propose linear and nonlinear wavelet density estimators and provide asymptotic formulae for their mean integrated squared error. In particular, we derive an analogue of the asymptotic formula for the mean integrated squared error of kernel density estimators for weighted data, admitting an expansion with distinct squared bias and variance components. For nonlinear wavelet density estimators, unlike the analogous situation for kernel or linear wavelet density estimators, this asymptotic formula is relatively unaffected by assumptions of continuity, and it is available for densities that are smooth only in a piecewise sense. We illustrate the behavior of the proposed linear and nonlinear wavelet density estimators in finite-sample situations, both in simulations and on a real-life dataset. Comparisons with a kernel density estimator are also given.
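To make the setting concrete, here is a minimal sketch of a linear wavelet density estimate computed from a weighted sample. It is an illustration under stated assumptions, not the estimator defined in the paper: it assumes the usual weighted-data relation g(y) = w(y) f(y) / mu with a known weight function w, uses only the Haar scaling function at a single resolution level, and all function names (`haar_linear_density`, `draw_length_biased_uniform`) are hypothetical. With the Haar basis the estimate reduces to a histogram in which each observation contributes 1/w(Y_i) instead of 1.

```python
import math
import random


def haar_linear_density(sample, weight, level, lo, hi):
    """Linear Haar wavelet density estimate from a weighted sample.

    Assumes the observations Y_i come from g(y) = w(y) f(y) / mu and
    that the target density f is supported on [lo, hi).
    Returns 2**level piecewise-constant heights of the estimate of f.
    """
    n_bins = 2 ** level
    width = (hi - lo) / n_bins
    # Each observation is down-weighted by 1 / w(Y_i) to undo the bias.
    inv_w = [1.0 / weight(y) for y in sample]
    # Dividing by the total of the inverse weights plays the role of the
    # plug-in estimate of mu = E[w(X)] (namely n / sum_i 1/w(Y_i)).
    total = sum(inv_w)
    heights = [0.0] * n_bins
    for y, iw in zip(sample, inv_w):
        k = math.floor((y - lo) / width)
        if 0 <= k < n_bins:
            heights[k] += iw
    return [h / (total * width) for h in heights]


def draw_length_biased_uniform(n, rng):
    """Length-biased draws (w(y) = y) when f is Uniform(1, 2).

    Here g(y) = 2y / 3 on [1, 2], with CDF (y**2 - 1) / 3, so the
    inverse-CDF transform is sqrt(1 + 3u).
    """
    return [math.sqrt(1.0 + 3.0 * rng.random()) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    ys = draw_length_biased_uniform(20000, rng)
    corrected = haar_linear_density(ys, lambda y: y, level=2, lo=1.0, hi=2.0)
    naive = haar_linear_density(ys, lambda y: 1.0, level=2, lo=1.0, hi=2.0)
    print("corrected:", [round(h, 3) for h in corrected])
    print("naive:    ", [round(h, 3) for h in naive])
```

In this toy example the corrected heights should sit close to the flat target density, while the naive (unweighted) heights increase with y because larger values are over-sampled. A nonlinear estimator would add thresholded wavelet detail coefficients on top of this coarse approximation; that step is not sketched here.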