|
Cody's Data Cleaning Techniques Using SAS, Second Edition
By Ron Cody

Publisher: SAS Press
Number of Pages: 272
Publication Date: 2008-05-13
ISBN-10 / ASIN: 1599946599
ISBN-13 / EAN: 9781599946597
Binding: Perfect Paperback

Attached is the PDF of Cody's Data Cleaning Techniques Using SAS, 2nd Edition, published by SAS in May 2008; 272 pages, list price US$39.95.
Review
"Clean data is critical to accurate analysis. By implementing programs and macros in Cody's Data Cleaning Techniques Using SAS, Second Edition, you can achieve the goal of a clean SAS data set. Easy-to-follow examples identify invalid, missing, or out-of-range data. Also included are chapters working with dates and matching Primary Key (Identifier) variables across multiple files. This Second Edition incorporates new features in SAS9.
This book is a valuable tool for all SAS users to prepare data for analysis." --Karol Katz, MS, Programmer/Analyst, Yale University School of Medicine
"Many veteran coders become comfortable - sometimes too comfortable - with the coding techniques that they learned early on in their careers. They believe that there is no need to adopt enhanced features since their old skills continue to provide an adequate return. Dr. Ron Cody is NOT one of those people; his published works on SAS embrace the changes that have occurred in the SAS language over the years. Some of his books, most notably SAS Functions by Example and Learning SAS by Example: A Programmer's Guide, are benchmarks by which other books should be measured. He's now taken one of his earlier works, Cody's Data Cleaning Techniques Using SAS Software and updated it to take advantage of what SAS has introduced in the 9 years since the original version was published.
Folks who purchased his original volume should be prepared to put their first copy away and begin to use the newer work at their earliest opportunity." --Andrew T. Kuligowski, SouthEast SAS Users Group
Product Description
Thoroughly updated for SAS 9, this second edition addresses tasks that nearly every SAS programmer needs to do - that is, make sure that data errors are located and corrected. Written in Ron Cody's signature informal, tutorial style, this book develops and demonstrates data cleaning programs and macros that you can use as written or modify for your own special data cleaning needs. Each topic is developed through specific examples, and every program and macro is explained in detail.
You'll learn how to
* find and correct errors in character and numeric values
* develop programming techniques related to dates and missing values
* use SQL approaches to data cleaning
* develop techniques for correcting your data errors
* use integrity constraints and audit trails to prevent errors from being added to a clean data set
Novice and experienced SAS users will discover ways to detect and correct data errors while learning how to apply DATA step programming techniques and SAS procedures.

The book's chapters are as follows:
1 Checking Values of Character Variables
2 Checking Values of Numeric Variables
3 Checking for Missing Values
4 Working with Dates
5 Looking for Duplicates and "n" Observations per Subject
6 Working with Multiple Files
7 Double Entry and Verification (PROC COMPARE)
8 Some PROC SQL Solutions to Data Cleaning
9 Correcting Errors
10 Creating Integrity Constraints and Audit Trails
11 DataFlux and dfPower Studio
Table of Contents
List of Programs ix
Preface xv
Acknowledgments xvii
Checking Values of Character Variables
Introduction 1
Using PROC FREQ to List Values 1
Description of the Raw Data File PATIENTS.TXT 2
Using a DATA Step to Check for Invalid Values 7
Describing the VERIFY, TRIM, MISSING, and NOTDIGIT Functions 9
Using PROC PRINT with a WHERE Statement to List Invalid Values 13
Using Formats to Check for Invalid Values 15
Using Informats to Remove Invalid Values 18
Checking Values of Numeric Variables
Introduction 23
Using PROC MEANS, PROC TABULATE, and PROC UNIVARIATE to Look for Outliers 24
Using an ODS SELECT Statement to List Extreme Values 34
Using PROC UNIVARIATE Options to List More Extreme Observations 35
Using PROC UNIVARIATE to Look for Highest and Lowest Values by Percentage 37
Using PROC RANK to Look for Highest and Lowest Values by Percentage 43
Presenting a Program to List the Highest and Lowest Ten Values 47
Presenting a Macro to List the Highest and Lowest "n" Values 50
Using PROC PRINT with a WHERE Statement to List Invalid Data Values 52
Using a DATA Step to Check for Out-of-Range Values 54
Identifying Invalid Values versus Missing Values 55
Listing Invalid (Character) Values in the Error Report 57
Creating a Macro for Range Checking 60
Checking Ranges for Several Variables 62
Using Formats to Check for Invalid Values 66
Using Informats to Filter Invalid Values 68
Checking a Range Using an Algorithm Based on Standard Deviation 71
Detecting Outliers Based on a Trimmed Mean and Standard Deviation 73
Presenting a Macro Based on Trimmed Statistics 76
Using the TRIM Option of PROC UNIVARIATE and ODS to Compute Trimmed Statistics 80
Checking a Range Based on the Interquartile Range 86
Checking for Missing Values
Introduction 91
Inspecting the SAS Log 91
Using PROC MEANS and PROC FREQ to Count Missing Values 93
Using DATA Step Approaches to Identify and Count Missing Values 96
Searching for a Specific Numeric Value 100
Creating a Macro to Search for Specific Numeric Values 102
Working with Dates
Introduction 105
Checking Ranges for Dates (Using a DATA Step) 106
Checking Ranges for Dates (Using PROC PRINT) 107
Checking for Invalid Dates 108
Working with Dates in Nonstandard Form 111
Creating a SAS Date When the Day of the Month Is Missing 113
Suspending Error Checking for Known Invalid Dates 114
Checking a Range Using an Algorithm Based on the Standard Deviation 169
Checking for Missing Values 170
Range Checking for Dates 172
Checking for Duplicates 173
Identifying Subjects with "n" Observations Each 174
Checking for an ID in Each of Two Files 174
More Complicated Multi-File Rules 176
Correcting Errors
Introduction 181
Hardcoding Corrections 181
Describing Named Input 182
Reviewing the UPDATE Statement 184
Creating Integrity Constraints and Audit Trails
Introducing SAS Integrity Constraints 187
Demonstrating General Integrity Constraints 188
Deleting an Integrity Constraint Using PROC DATASETS 193
Creating an Audit Trail Data Set 193
Demonstrating an Integrity Constraint Involving More than One Variable 200
Demonstrating a Referential Constraint 202
Attempting to Delete a Primary Key When a Foreign Key Still Exists 205
Attempting to Add a Name to the Child Data Set 207
Demonstrating the Cascade Feature of a Referential Constraint 208
Demonstrating the SET NULL Feature of a Referential Constraint 210
Demonstrating How to Delete a Referential Constraint 211 |
|