WWY2009's personal space https://www.epiman.cn/?32201

What to do when data are non-normal

Posted 2011-1-19 16:28 | Category: Statistical theory

Often it is possible to transform non-normal data into approximately normal data

Non-normality is a way of life, since no characteristic (height, weight, etc.) will have exactly a normal distribution. One strategy for making non-normal data resemble normal data is to apply a transformation. There is no dearth of transformations in statistics; the issue is which one to select for the situation at hand. Unfortunately, the choice of the "best" transformation is generally not obvious.

This was recognized in 1964 by G.E.P. Box and D.R. Cox, who wrote a paper suggesting a useful family of power transformations. These transformations are defined only for positive data values. This should not pose any problem, because a constant can always be added to every observation if the data set contains one or more non-positive values.

The Box-Cox power transformations are given by

The Box-Cox Transformation
$$
x(\lambda) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln(x), & \lambda = 0
\end{cases}
$$
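
As a minimal sketch of the definition above, the transformation can be written as a small Python function. The name `box_cox` is an illustrative choice, not something taken from the source, and the check for positive input reflects the restriction to positive data values.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform of a single positive value x (hypothetical helper)."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)              # the lambda = 0 case of the definition
    return (x ** lam - 1.0) / lam       # the lambda != 0 case

print(box_cox(4.0, 1.0))   # 3.0  (lambda = 1 only shifts the data by -1)
print(box_cox(4.0, 0.5))   # 2.0  (a shifted and rescaled square root)
print(box_cox(1.0, 0.0))   # 0.0  (natural logarithm)
```

The logarithm is the limit of the power form as lambda approaches 0, so the two branches join smoothly.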

Given the vector of data observations x = (x1, x2, ..., xn), one way to select the power lambda is to choose the value that maximizes the logarithm of the likelihood function

The logarithm of the likelihood function
$$
f(x, \lambda) = -\frac{n}{2} \ln\left[ \frac{1}{n} \sum_{i=1}^{n} \bigl( x_i(\lambda) - \bar{x}(\lambda) \bigr)^2 \right] + (\lambda - 1) \sum_{i=1}^{n} \ln(x_i)
$$

where

$$
\bar{x}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} x_i(\lambda)
$$

is the arithmetic mean of the transformed data.
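
The log-likelihood above translates almost term by term into code. The following is a rough sketch using only the Python standard library; the function name `box_cox_llf` and the three sample values are made up for illustration and do not come from the source.

```python
import math

def box_cox_llf(data, lam):
    """Log-likelihood f(x, lambda) for positive data (hypothetical helper)."""
    n = len(data)
    # Transform each observation; the natural log is used when lambda is 0.
    if lam == 0:
        t = [math.log(x) for x in data]
    else:
        t = [(x ** lam - 1.0) / lam for x in data]
    t_bar = sum(t) / n                               # mean of the transformed data
    var = sum((v - t_bar) ** 2 for v in t) / n       # divisor is n, not n - 1
    log_sum = sum(math.log(x) for x in data)         # sum of ln(x_i), untransformed
    return -0.5 * n * math.log(var) + (lam - 1.0) * log_sum

sample = [1.0, 2.0, 3.0]   # made-up values, for illustration only
print(box_cox_llf(sample, 1.0))
print(box_cox_llf(sample, 0.0))
```

For lambda = 1 the second term vanishes, so the value reduces to -(n/2) times the log of the (divisor-n) variance of the raw data.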

Confidence bound for lambda

In addition, a confidence bound (based on the likelihood ratio statistic) can be constructed for lambda as follows: an approximate 100(1 - alpha)% confidence set for lambda consists of those lambda values that satisfy

$$
f(x, \lambda) \geq f(x, \hat{\lambda}) - 0.5\, \chi^2_{\alpha, 1}
$$

where $\hat{\lambda}$ (lambda hat) denotes the maximum likelihood estimator for lambda and $\chi^2_{\alpha, 1}$ is the upper-alpha critical value, that is, the 100(1 - alpha)th percentile, of the chi-square distribution with 1 degree of freedom.

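
To make the rule concrete, here is a small sketch that applies the cutoff to a handful of (lambda, LLF) pairs copied from this article's table of log-likelihood values. Two assumptions are worth flagging: the largest tabulated LLF stands in for the value at lambda hat, and the chi-square critical value with 1 degree of freedom is obtained as the square of a standard normal critical value, which avoids any dependency outside the standard library. The name `confidence_set` is hypothetical.

```python
from statistics import NormalDist

def confidence_set(llf_by_lambda, alpha=0.05):
    """Lambdas whose LLF lies within 0.5 * chi2(alpha, 1) of the largest LLF."""
    # With 1 degree of freedom, the chi-square upper-alpha critical value equals
    # the square of the standard normal upper-(alpha / 2) critical value.
    chi2_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    # Assumption: the grid maximum approximates f(x, lambda hat).
    cutoff = max(llf_by_lambda.values()) - 0.5 * chi2_crit
    return sorted(lam for lam, llf in llf_by_lambda.items() if llf >= cutoff)

# A few (lambda, LLF) pairs taken from the article's table.
table = {-0.1: 103.3457, 0.0: 104.8276, 0.1: 105.8406, 0.2: 106.3947,
         0.3: 106.5069, 0.4: 106.1994, 0.5: 105.4985, 0.6: 104.4330}
print(confidence_set(table))   # [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
```

At alpha = 0.05 the cutoff is about 106.5069 - 1.9207 = 104.5862, so on this 0.1 grid the approximate 95% set runs from 0.0 to 0.5. A finer grid would locate the endpoints more precisely.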
Example of the Box-Cox scheme

To illustrate the procedure, we used the data from Johnson and Wichern's textbook (Prentice Hall 1988), Example 4.14. The observations are microwave radiation measurements.

Sample data
.15 .09 .18 .10 .05 .12 .08
.05 .08 .10 .07 .02 .01 .10
.10 .10 .02 .10 .01 .40 .10
.05 .03 .05 .15 .10 .15 .09
.08 .18 .10 .20 .11 .30 .02
.20 .20 .30 .30 .40 .30 .05

Table of log-likelihood values for various values of lambda

The values of the log-likelihood function obtained by varying lambda from -2.0 to 2.0 are given below.

lambda       LLF    lambda       LLF    lambda       LLF

  -2.0    7.1146      -0.6   89.0587       0.7  103.0322
  -1.9   14.1877      -0.5   92.7855       0.8  101.3254
  -1.8   21.1356      -0.4   96.0974       0.9   99.3403
  -1.7   27.9468      -0.3   98.9722       1.0   97.1030
  -1.6   34.6082      -0.2  101.3923       1.1   94.6372
  -1.5   41.1054      -0.1  103.3457       1.2   91.9643
  -1.4   47.4229       0.0  104.8276       1.3   89.1034
  -1.3   53.5432       0.1  105.8406       1.4   86.0714
  -1.2   59.4474       0.2  106.3947       1.5   82.8832
  -1.1   65.1147       0.3  106.5069       1.6   79.5521
  -0.9   75.6471       0.4  106.1994       1.7   76.0896
  -0.8   80.4625       0.5  105.4985       1.8   72.5061
  -0.7   84.9421       0.6  104.4330       1.9   68.8106

This table shows that, among the tabulated values, lambda = 0.3 maximizes the log-likelihood function (LLF = 106.5069). Carrying the search to a second decimal place gives lambda = 0.28.
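
The grid search behind the table can be sketched in a few lines of Python. This is an illustrative reconstruction under the formulas given earlier, not the original authors' program; the names `llf`, `coarse`, and `fine` are hypothetical, and the data are the 42 sample values listed above.

```python
import math

# Microwave radiation measurements from the sample data above (n = 42).
data = [
    .15, .09, .18, .10, .05, .12, .08,
    .05, .08, .10, .07, .02, .01, .10,
    .10, .10, .02, .10, .01, .40, .10,
    .05, .03, .05, .15, .10, .15, .09,
    .08, .18, .10, .20, .11, .30, .02,
    .20, .20, .30, .30, .40, .30, .05,
]

def llf(lam, x=data):
    """Box-Cox log-likelihood for the given lambda (divisor-n variance)."""
    n = len(x)
    t = [math.log(v) if lam == 0 else (v ** lam - 1.0) / lam for v in x]
    t_bar = sum(t) / n
    var = sum((v - t_bar) ** 2 for v in t) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in x)

# Coarse grid in steps of 0.1, then a finer grid around the coarse maximum.
coarse = [round(-2.0 + 0.1 * k, 1) for k in range(41)]   # -2.0, -1.9, ..., 2.0
best_coarse = max(coarse, key=llf)
fine = [round(0.20 + 0.01 * k, 2) for k in range(21)]    # 0.20, 0.21, ..., 0.40
best_fine = max(fine, key=llf)

print(best_coarse, round(llf(best_coarse), 4))
print(best_fine)
```

As a hand check, lambda = 1.0 gives -(42/2) * ln(0.0098139), which is about 97.103 and agrees with the tabulated 97.1030. A numerical optimizer could replace the two-stage grid, but the grid mirrors how the table is laid out.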

The Box-Cox transform is also discussed in Chapter 1 of the source handbook under the Box-Cox Linearity Plot and the Box-Cox Normality Plot. The Box-Cox normality plot discussion provides a graphical method for choosing lambda to transform a data set to normality: the chosen lambda is the one that maximizes the correlation coefficient of the normal probability plot of the transformed data. For the Box-Cox linearity plot, the criterion is instead the value of lambda that maximizes the correlation between the transformed x-values and the y-values.

http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc52.htm