
[Share] winsor2: handling outliers

epiman posted on 2016-2-11 18:56:36

winsor
winsor2    winsor2 can winsorize a varlist, operate with the by prefix, and offers a replace option.
trimplot
trimmean

Winsorizing is not equivalent to simply excluding data, which is a simpler procedure called trimming or truncation. In a trimmed estimator, the extreme values are discarded; in a Winsorized estimator, the extreme values are instead replaced by certain percentiles, specified by the option cuts(# #).

  . sysuse nlsw88, clear
  . sum wage, detail

Winsorizing

    By default, winsor2 winsorizes at the 1st and 99th percentiles; here cuts(1 99) makes that explicit and replace overwrites wage:

        . winsor2 wage, replace cuts(1 99)

    which can also be done by hand:

        . replace wage=1.930993 if wage<1.930993
        . replace wage=38.70926 if wage>38.70926 & !missing(wage)

    Note that values smaller than the 1st percentile are replaced by the 1st percentile, and values greater than the 99th percentile are replaced by the 99th percentile. The !missing(wage) condition is there because Stata treats missing values as larger than any number, so without it missing wages would be overwritten as well.

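The hard-coded cutoffs above are the 1% and 99% rows of the summarize output. A minimal sketch of the same by-hand winsorizing that reads the percentiles from the stored results r(p1) and r(p99) instead of typing them in. The variable name wage_hand and the scalar names p_lo and p_hi are made up for this illustration, and the result is written to a new variable so the original wage is kept.

```stata
* Sketch only; assumes winsor2 is installed (ssc install winsor2)
sysuse nlsw88, clear

* summarize, detail leaves the percentiles in r(p1) and r(p99)
quietly summarize wage, detail
scalar p_lo = r(p1)
scalar p_hi = r(p99)

* wage_hand is a hypothetical name; wage itself is left untouched
generate double wage_hand = wage
replace wage_hand = p_lo if wage < p_lo
replace wage_hand = p_hi if wage > p_hi & !missing(wage)

* winsor2 without replace generates wage_w, so the two can be compared
winsor2 wage, cuts(1 99)
summarize wage wage_w wage_hand
```

If the by-hand version does the same thing as winsor2, wage_w and wage_hand should show the same minimum and maximum, while wage keeps its original range.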
Trimming

    Things change when the trim option is specified:

        . winsor2 wage, replace cuts(1 99) trim

    which can also be done by hand:

        . replace wage=. if wage<1.930993
        . replace wage=. if wage>38.70926

    In this case, values smaller than the 1st percentile or greater than the 99th percentile are discarded (set to missing). This is trimming.

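Trimming by hand can be sketched the same way, starting again from a fresh copy of the data. The names wage_trim, p_lo and p_hi are hypothetical, and the final count is only there to show how many observations the trim sets to missing.

```stata
* Sketch only: trim wage at the 1st and 99th percentiles by hand
sysuse nlsw88, clear

quietly summarize wage, detail
scalar p_lo = r(p1)
scalar p_hi = r(p99)

* wage_trim is a hypothetical name; values outside [p1, p99] become missing
generate double wage_trim = wage
replace wage_trim = . if wage < p_lo | wage > p_hi

* number of observations lost to trimming
count if missing(wage_trim) & !missing(wage)
```

Because trimmed values are stored as missing, later commands that use wage_trim will leave those observations out, whereas a winsorized variable keeps every observation.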
Overview: winsor2 winsorizes, or trims if the trim option is specified, the variables in varlist at the percentiles given by the option cuts(# #). By default, new variables are generated with the suffix "_w" (winsorized) or "_tr" (trimmed), which can be changed with the suffix() option. The replace option instead overwrites the variables with their winsorized or trimmed versions.

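A short sketch of the naming behaviour described in the overview, using only the options mentioned in this post. The suffix _w5 is an arbitrary choice for the illustration.

```stata
* Sketch only; assumes winsor2 is installed (ssc install winsor2)
sysuse nlsw88, clear

* default: new variable wage_w, winsorized at the 1st and 99th percentiles
winsor2 wage

* trim instead of winsorize: new variable wage_tr
winsor2 wage, trim

* custom cutoffs with a custom suffix: new variable wage_w5
winsor2 wage, cuts(5 95) suffix(_w5)

summarize wage wage_w wage_tr wage_w5
```

Generating new variables rather than using replace keeps the original wage available, which makes it easy to compare the different treatments side by side.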
Improvements over the winsor command:
(1) it can process several variables in one call;
(2) it can trim as well as winsorize;
(3) it adds a by() option, so winsorizing or trimming can be done within groups;
(4) it adds a replace option, so the original variables can be overwritten directly instead of generating new ones.

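Points (1) and (3) can be sketched as follows. The grouping variable industry and the suffix _wby are choices made for this illustration, and how winsor2 treats observations whose group value is missing is not covered in the post, so the grouped result is worth checking. The last two lines are an assumed by-hand counterpart built with egen's pctile() function, not something taken from the post.

```stata
* Sketch only; assumes winsor2 is installed (ssc install winsor2)
sysuse nlsw88, clear

* several variables in one call: generates wage_w and hours_w
winsor2 wage hours, cuts(1 99)

* within-group winsorizing: cutoffs are computed separately for each industry
* and the results are stored in wage_wby and hours_wby
winsor2 wage hours, cuts(1 99) by(industry) suffix(_wby)

* by-hand group cutoffs for comparison (hypothetical variable names)
egen double wage_p1  = pctile(wage), p(1)  by(industry)
egen double wage_p99 = pctile(wage), p(99) by(industry)
```

Within each industry, the minimum and maximum of wage_wby can then be compared against wage_p1 and wage_p99.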
Download:
ssc install winsor2, replace



异香菲 posted on 2019-10-7 10:10:02
Wow, this is great!