From: "Muralidharan, Bharath" Bharath.Muralidharan@stjude.org
Subject: [NMusers] Coding for missing data values
Date: Mon, June 28, 2004 11:32 am

Dear NONMEM Users,

Let me introduce myself as Kumar. I am a graduate student in the Department of Biomedical Engineering at the UT Health Science Center, Memphis.

Is there a way to code for missing values of a possible covariate? For example, I have technetium clearance for only a few patients and do not have it for most of the rest of the data set. How do I code for the fact that this data item is missing in some individuals? I assume that NONMEM reads a missing entry as a zero value rather than as a missing value.

Kumar
Graduate Student
Pharmaceutical Sciences Department
St. Jude Children's Research Hospital
332 North Lauderdale Street
Memphis, TN 38105
Danny Thomas Research Center
_______________________________________________________
From: "Bachman, William (MYD)" bachmanw@iconus.com
Subject: RE: [NMusers] Coding for missing data values
Date: Mon, June 28, 2004 12:20 pm

There are a number of ways you can do this:

1. Simply code separate parameters for those with and without the covariate:

IF(TECL.EQ.0) THEN
  CL=THETA(1)                      ; TECL is set to zero in the data file for those with a missing value
ELSE
  CL=THETA(2)+(TECL-5.4)*THETA(3)  ; where 5.4 might be the mean TECL
ENDIF

2. Impute the missing covariate. Again, there are a number of ways this can be done. The simplest is to use the population mean for the missing subjects; alternatively, devise a more complex imputation scheme, possibly based on the relationship between the covariate and other available covariates.

The second way is probably more prone to inducing bias in the model; the first way is possibly less explanatory of variance.

William J. Bachman, Ph.D.
Manager, Pharmacometrics Research and Development
GloboMax®
The Strategic Pharmaceutical Development Division of ICON plc
7250 Parkway Drive, Suite 430
Hanover, MD 21076
410-782-2212
bachmanw@iconus.com
_______________________________________________________
From: Nick Holford n.holford@auckland.ac.nz
Subject: RE: [NMusers] Coding for missing data values
Date: Mon, June 28, 2004 7:40 pm

Bill,

In your first method you propose estimating THETA(1) for CL when TECL is missing and THETA(2) for CL when TECL is equal to the mean TECL. If TECL is missing, wouldn't the simplest thing be to assume that TECL is equal to the mean TECL (e.g. 5.4), in which case THETA(2) is the prediction for CL when TECL is missing? This requires estimating only one THETA instead of two.

If I understand the second method you are proposing correctly, then it shouldn't be any worse than method 1 and in general will be better. If observed TECL is used as a DV with DVID.EQ.2 and observed CONC has DVID.EQ.1, then I would suggest the following:

$THETA 10 ; POPCL
$THETA 5.4 ; POPTCL
$THETA 0.1 ; SLOPE
$OMEGA 0.25 ; PPV for CL
$OMEGA 0.01 ; PPV for POPTCL
$SIGMA 1 ; eps(1)
$SIGMA 0.01 FIX ; eps(2). Use a plausible value for the measurement error of TECL, e.g. SD=0.1
$PK
ITCL=THETA(2)*EXP(ETA(2)) ; individual prediction for TECL
GRPCL=THETA(1)*EXP((ITCL-5.4)*THETA(3)) ; group prediction for CL
CL=GRPCL*EXP(ETA(1)) ; individual CL prediction
...
$ERROR
IF (DVID.EQ.1) THEN
  Y=F+EPS(1) ; observed conc
ENDIF
IF (DVID.EQ.2) THEN
  Y=POPTCL+EPS(2) ; observed TECL
ENDIF

If population parameter variability for TECL [OMEGA(2,2)] is fixed to 0, then this becomes essentially the same as your method 1, i.e. it uses the mean observed TECL to centre the TECL covariate. If OMEGA(2,2) is estimated, then the value of ITCL will vary from subject to subject. Depending on how small EPS(2) is made, the value will be close to the observed value when TECL is not missing.
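A minimal editorial sketch in Python (not NONMEM code) of why the individual TECL estimate stays close to the observation when the EPS(2) variance is small. It assumes, as a simplification of the model above, that TECL is normally distributed on the additive scale (the NONMEM code uses a log-normal ETA(2)), and it ignores the information that the concentrations carry about ITCL through CL. Under those assumptions the empirical Bayes estimate is a precision-weighted average of the observed TECL and POPTCL. The function and argument names are hypothetical.

```python
from typing import Optional


def posterior_tecl(pop_tecl: float, omega_var: float, sigma_var: float,
                   observed: Optional[float]) -> float:
    """Conditional (empirical Bayes) estimate of one subject's TECL
    under a simplified normal-normal model.

    pop_tecl  -- population mean TECL (the role played by THETA(2))
    omega_var -- between-subject variance of TECL (additive-scale stand-in for OMEGA(2,2))
    sigma_var -- measurement-error variance of TECL (stand-in for the EPS(2) variance)
    observed  -- measured TECL, or None when the value is missing
    """
    if observed is None:
        # No TECL record: only the population distribution informs the estimate.
        # (In the full joint model the concentrations would also inform it via CL.)
        return pop_tecl
    # Weight each source of information by its precision (1 / variance).
    w_obs = 1.0 / sigma_var
    w_pop = 1.0 / omega_var
    return (w_obs * observed + w_pop * pop_tecl) / (w_obs + w_pop)
```

For example, with POPTCL = 5.4, an illustrative TECL variance of 1, an error variance of 0.01 (SD = 0.1) and an observed TECL of 7.0, the estimate is 705.4/101, about 6.98. Shrinking the TECL variance towards zero (the analogue of fixing OMEGA(2,2) to 0) pulls every subject's estimate back to 5.4, and a missing value simply returns the population mean in this simplified sketch.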
If it is missing, then a plausible value will be imputed that reflects the uncertainty in CL for that individual given the particular covariate model using TECL. If I remember correctly, this method for imputing missing covariates with NONMEM was first proposed by Karlsson M, Jonsson E, Wiltse C, Wade J. Assumption testing in population pharmacokinetic models: illustrated with an analysis of moxonidine data from congestive heart failure patients. J Pharmacokinet Biopharm 1998;26(2):207-46.

Note that the empirical covariate model for TECL uses EXP() to avoid predicting negative values of GRPCL. If THETA(3) is 'small', then this model is approximately the same as a linear function of TECL.

Nick
--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/
_______________________________________________________
From: "Anthe Zandvliet" Apaza@SLZ.NL
Subject: RE: [NMusers] Coding for missing data values
Date: Tue, June 29, 2004

Nick,

Thank you for your suggestion on how to account for missing covariates. I hope that I haven't misunderstood the code, but I suppose that the $ERROR block should contain Y=ITCL+EPS(2) rather than Y=POPTCL+EPS(2). Could you please let me know if I'm wrong? I will definitely try the code you provided.

Thanks again!
Anthe
_______________________________________________________
From: Nick Holford n.holford@auckland.ac.nz
Subject: RE: [NMusers] Coding for missing data values
Date: Tue, June 29, 2004 2:57 pm

Anthe,

Sorry about the mistake. You are correct. Of course, the prediction for DVID.EQ.2 should be ITCL, not POPTCL.

Note that you can make the code for the prediction of ITCL fancier if you want, e.g. if you have WT then you could make an allometric prediction. POPTCL would then be the population TECL for a 70 kg subject:
POPTCL=THETA(2)
GRPTCL=POPTCL*(WT/70)**0.75
ITCL=GRPTCL*EXP(ETA(2)) ; individual prediction for TECL

Nick
_______________________________________________________
From: "Bachman, William (MYD)" bachmanw@iconus.com
Subject: RE: [NMusers] Coding for missing data values
Date: Mon, June 28, 2004 9:33 pm

Nick,

You've misunderstood the code. In the second instance, the code is CENTERED on the mean but INDIVIDUALIZED by TECL from the data file. There is also a difference between assuming the mean and just allowing the CL for missing TECL to be estimated at whatever value the data dictate.

Bill
_______________________________________________________
From: Nick Holford n.holford@auckland.ac.nz
Subject: RE: [NMusers] Coding for missing data values
Date: Mon, June 28, 2004 10:05 pm

Bill,

I don't think I have misunderstood your code. What I don't understand is why you chose this code. Is it because you do not want to assume that the group clearance in a subject with TECL missing is the same as the group clearance in a subject with TECL equal to the mean TECL?

Would you please try to clarify your remarks about assuming the mean etc. by referring explicitly to the comments I made earlier? Are your remarks related to Method 1 or Method 2?

Thanks,
Nick
_______________________________________________________
From: "Bachman, William (MYD)"
Subject: RE: [NMusers] Coding for missing data values
Date: Tue, June 29, 2004 9:02 am

"Why did I choose this code?"
Actually, the bottom line is that I gave two methods that are commonly employed: assuming the mean for the missing value (or using another imputation algorithm), or letting the data decide whether this is a valid assumption. (Remember that the original question was: how do you code missing covariates?) Choose whichever you want. In actual practice, I've most often assumed the mean for missing covariates, and frankly it usually makes no significant difference to the model which method is used.

However, I've decided to play devil's advocate in the finest Holfordesque tradition. There is no reason to assume that the missing values are random and representative of the population (unless you have additional prior information; in its absence, that assumption is not rigorous from a statistical viewpoint). For example, they may have come from a pediatric population (less blood drawn, fewer tests, high likelihood of different parameters), from a site with less rigorous procedures (the test was simply skipped; there may also be less attention to sampling times, resulting in more variability), from a sicker subset, or from any number of other scenarios. Assuming the mean for them introduces systematic bias under these scenarios. Allowing the parameter to be estimated could prove or disprove the validity of the assumption.

The other reason for coding it the way I did was the interpretation of the thetas. In retrospect, this is how I would code it today:

IF(TECL.EQ.0) THEN
  TVCL=THETA(1) + THETA(2) + (COVn-x.x)*THETA(n) + ...
ELSE
  TVCL=THETA(1)+(TECL-5.4)*THETA(3) + (COVn-x.x)*THETA(n) + ...
ENDIF
CL=TVCL*EXP(ETA(1))

Then theta(1) is "basically" the population typical value, theta(2) describes the difference in CL between those with and without measured TECL, and theta(3) and theta(n) represent the influence of TECL and COVn, respectively. At the conclusion of the modeling exercise, test all thetas for significance. If any are not significant, remove them from the model.
(If theta(2) is not significantly different from zero, you have shown that the population without measured TECL can be adequately represented by the mean TECL; get rid of it.) Let the data drive you to the simplest model rather than assuming it a priori. If a simpler model is warranted, the data will tell you so, and the prudent modeler will listen to the data. Also, give 10 analysts a set of data and you will get 10 differently coded models.
_______________________________________________________
From: Nick Holford
Subject: RE: [NMusers] Coding for missing data values
Date: Tue, June 29, 2004 9:42 pm

Bill,

Thanks for explaining your approach. I agree with your overall strategy (not very Holfordesque!) if you do not want to use the joint modelling method to describe the missing covariate.

Returning to being Holfordesque, I would quibble with the choice of an additive model for all covariate effects. Unless one is careful, these kinds of models can lead to predictions of negative values, which are usually unphysiological. I prefer to use multiplicative covariate models for empirical covariate effects, e.g.

POPCL=THETA(1)
KMISS=THETA(2)
KTECL=THETA(3)
KCOVN=THETA(4)
IF(TECL.EQ.0) THEN
  GRPCL=POPCL*EXP(KMISS)*EXP((COVn-x.x)*KCOVN)*EXP(...) ...
ELSE
  GRPCL=POPCL*EXP((TECL-5.4)*KTECL)*EXP((COVn-x.x)*KCOVN)*EXP(...) ...
ENDIF
CL=GRPCL*EXP(ETA(1))

In this particular case, TECL is probably being used to predict renal function, in which case an additive model would be mechanistically more appropriate. I would then prefer to write:

PPCLNR=THETA(1) ; constrain this to be non-negative in $THETA
KMISS=THETA(2)
POPCLR=THETA(3)
KCOVN=THETA(4)
TCLSTD=5.4 ; or whatever value is appropriate for a standard renal function
IF(TECL.EQ.0) THEN
  RF=EXP(KMISS) ; KMISS.NE.0 means Renal Function is non-standard when TECL is missing
ELSE
  RF=TECL/TCLSTD ; RF.EQ.1 means this is standard Renal Function
ENDIF
GRPCL=(PPCLNR + RF*POPCLR)*EXP((COVn-x.x)*KCOVN)*EXP(...) ...
CL=GRPCL*EXP(ETA(1))

Nick
_______________________________________________________
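As an editorial illustration (Python, not NONMEM code), the three typical-value clearance models discussed in this thread can be compared numerically. This is a sketch under stated assumptions: the COVn terms are dropped, a missing TECL is passed as None in place of the TECL.EQ.0 flag, 5.4 is the centring value, and the function names and parameter values are hypothetical.

```python
import math
from typing import Optional

TECL_REF = 5.4  # centring value used in the thread (assumed mean TECL)


def cl_additive(theta1: float, theta2: float, theta3: float,
                tecl: Optional[float]) -> float:
    """Additive model in the style of Bill's revised code (COVn terms omitted)."""
    if tecl is None:  # None stands in for TECL.EQ.0 (missing) in the data file
        return theta1 + theta2
    return theta1 + (tecl - TECL_REF) * theta3


def cl_multiplicative(popcl: float, kmiss: float, ktecl: float,
                      tecl: Optional[float]) -> float:
    """Exponential model in the style of Nick's first code; positive whenever popcl > 0."""
    if tecl is None:
        return popcl * math.exp(kmiss)
    return popcl * math.exp((tecl - TECL_REF) * ktecl)


def cl_renal(ppclnr: float, popclr: float, kmiss: float,
             tecl: Optional[float], tclstd: float = TECL_REF) -> float:
    """Non-renal plus renal-function-scaled clearance, as in Nick's second code."""
    # RF = 1 for standard renal function; exp(kmiss) is used when TECL is missing.
    rf = math.exp(kmiss) if tecl is None else tecl / tclstd
    return ppclnr + rf * popclr


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the behaviour.
    print(cl_additive(10.0, 0.0, 2.0, 0.1))        # negative: unphysiological
    print(cl_multiplicative(10.0, 0.0, 0.2, 0.1))  # stays positive
    print(cl_renal(2.0, 8.0, 0.0, 2.7))            # RF = 0.5
```

With theta1 = 10, theta3 = 2 and a TECL of 0.1, the additive form predicts a negative clearance, while the exponential form stays positive for any TECL and, for a small KTECL, is close to linear in TECL. Setting theta2 (or KMISS) to zero makes a subject with missing TECL look exactly like one at the centring value, which is the situation Bill describes testing for.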