"See also: 99aug222002 "Imputation of missing sex covariate" From:"Venkatesh Atul Bhattaram"Subject:[NMusers] Missing Gender (Categorical values) Date:Tue, 30 Jul 2002 11:19:59 -0400 Hello All Could somebody share their views on how to analyse data where there are missing "gender" data. I am analysing a data where in one study in nearly 70% of the data the information on "Gender" is missing. In earlier discussion on missing covariates in 2001 Dr Leonid discussed a way to analyse this data. Are there any new views on these type of data? Could somebody suggest me some references in this direction? Thanks in advance for your time. Venkatesh Atul Bhattaram Post-doctoral Fellow University of Florida Gainesville-32610 ------------------ From:Nick Holford Subject:Re: [NMusers] Missing Gender (Categorical values) Date:Wed, 31 Jul 2002 07:34:37 +1200 Atul, You probably are missing data on sex (not gender -- Kim JS, Nafziger AN. Is it sex or is it gender? Clin Pharmacol Ther 2000;68(1):1-3). If you are missing sex then I suggest you simulate it. You know from the existing data the probability of being female (PRFEM) so simply simulate the missing sex values e.g. if you use NONMEM: $SIM (20000625 NEW) (12345678 UNIFORM) SUBPROBLEMS=1 ONLYSIMULATION IF (ICALL.EQ.4.AND.SEX.LT.0) THEN ; assume missing SEX is coded < 0 CALL RANDOM(2,R) IF (R.GT.PRFEM) THEN SEX=1 ;male ELSE SEX=0 ;female ENDIF ENDIF An alternative, more elegant approach, is to treat SEX as another DV. This is a bit trickier as it requires a LIKELIHOOD model that allows you to estimate continuous and categorical data at the same time. The missing SEX values are then predicted from the parameter describing the probability of being female just like you can predict DV values at times when you have no observations. 
Nick Nick Holford, Divn Pharmacology & Clinical Pharmacology University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556 http://www.health.auckland.ac.nz/pharmacology/staff/nholford/ ------------------ From:Nick Holford Subject: [NMusers] Not enough sex! Date:Wed, 31 Jul 2002 07:57:24 +1200 [assuming the subject line got past your spam filter...] I forgot to say that just simulating sex once is not enough. Ideally you should simulate sex about 6 times, i.e. simulate the missing SEX covariate in 6 different data sets, run your model with each of these data sets, then average the parameter estimates you get across all 6 runs. This is called multiple imputation. Why 6 times? This is Rubin's Rule (Don Rubin is Prof Statistics at Harvard and invented the multiple imputation method). In this particular instance the rule of 6 is even older -- the Romans would say SEX was enough too. The joint function model (which I mention below) gets around the need to have 6 data sets so is probably more time efficient in the long run. Nick Holford, Divn Pharmacology & Clinical Pharmacology University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556 http://www.health.auckland.ac.nz/pharmacology/staff/nholford/ ------------------ From:Alan Xiao Subject:Re: [NMusers] Missing Gender (Categorical values) Date: Tue, 30 Jul 2002 16:28:04 -0400 Nick and Atul, We might want to take into account the effect of SEX on PK/PD parameters if the effect is for sure before we talk about the random imputation. Does anyone have any literature reference on this topic? Thanks, Alan. 
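The multiple-imputation procedure Nick describes can be sketched outside NONMEM. The Python below is a minimal illustration, not anything posted to the list: it assumes missing SEX is coded as a negative number (as in the NONMEM fragment above), takes the probability of being female from the observed values, and pools per-data-set results with Rubin's combining rules (average of the estimates; total variance = average within-run variance plus (1 + 1/m) times the between-run variance). The function names `impute_sex` and `pool` and the numbers in the usage lines are hypothetical. Holding the female probability fixed at its observed value ignores the uncertainty in that probability itself, which a fuller treatment would also draw.

```python
import random
import statistics


def impute_sex(sex, rng):
    """Return a copy of `sex` with missing values (coded < 0) drawn at random.

    Coding follows the NONMEM fragment in the thread: 0 = female, 1 = male.
    """
    observed = [s for s in sex if s >= 0]
    prfem = observed.count(0) / len(observed)  # P(female) from the known values
    return [s if s >= 0 else (1 if rng.random() > prfem else 0) for s in sex]


def pool(estimates, variances):
    """Combine per-data-set estimates and their variances with Rubin's rules."""
    m = len(estimates)
    mean = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average squared standard error
    between = statistics.variance(estimates)    # sample variance across runs
    total = within + (1 + 1 / m) * between
    return mean, total


# Six imputed copies of a small, made-up covariate column.
rng = random.Random(12345678)
sex = [0, 1, 1, -1, 0, -1, -1, 1, -1, -1]
datasets = [impute_sex(sex, rng) for _ in range(6)]

# After fitting the model to each data set, pool the sex-effect estimates.
# The values below are placeholders, not results from any real analysis.
estimate, variance = pool([0.21, 0.18, 0.25, 0.19, 0.23, 0.20],
                          [0.004, 0.005, 0.004, 0.005, 0.004, 0.005])
```

The between-run term is what makes the pooled standard error larger than any single run's, reflecting the information lost to missingness.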
------------------ From:"diane r mould" Subject:RE: [NMusers] Missing Gender (Categorical values) Date: Tue, 30 Jul 2002 16:49:05 -0400 Dear All I would think that you should be able to get some good information for imputation from other covariate data such as weight, creatinine clearance, age etc. So even if a sex effect on the PK or PD of a drug is not well established, one should be able to create a covariate based model that is not unreasonable for use with multiple imputation. Then you could use the covariate information (including the sex data) as your DV and use the Likelihood option as Nick suggested. Diane ------------------ From:Leonid Gibiansky Subject:Re: [NMusers] Not enough sex! Date:Tue, 30 Jul 2002 16:53:10 -0400 I would try the following approaches: 1. Create a three-level covariate: gender= M, F, missing. Then patients with gender="missing" should have intermediate values of the parameters comparing with M and F. At least, this should give a feeling on whether this covariate is important and what is the difference between the parameters for M and F. 2. It is likely that one can predict gender based on weight and height (something else ?). Simple tree model in S+ or any other similar software can do it (find the model based on 30% of the available data and predict for the other 70%). One can then fit the model with these "predicted" gender and compare OF and fit with the model obtained in (1). 3. Alternative may be to try mixture model. If for 30% of patients with known gender, the probability of being in one of two groups will correlate with the gender then one may conclude that groups are defined by the gender. If on the other hand, the mixture model will not reveal importance of gender (again, comparing model (3) with (1) and (2) ), then one can safely ignore the issue and omit the gender. In fact, weight and height may compensate for absence of gender. On the other hand, gender is one of the most easily measured covariates. 
It should be possible to recover it if any information about the study is available. Leonid ------------------ From:Nick Holford Subject:Re: [NMusers] Missing Gender (Categorical values) Date:Wed, 31 Jul 2002 08:58:57 +1200 Alan, I thought the idea was to simulate SEX in order to help discover the effect of SEX on other model parameters. Can you be clearer about what you mean about taking this into account BEFORE doing the imputation? The original Rubin paper on imputation is Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview and some applications. Stat Med 1991;10(4):585-98. Joe Schafer (http://www.stat.psu.edu/~jls/) gave an excellent talk at PAGE in 2001 on the topic and has lots of stuff on his web pages. I am not aware of any applications of multiple imputation using NONMEM but Lewis Sheiner has remarked that there is little point because you can always do joint modelling which is a better overall approach. You can look at Mould DR, Holford NHG, Schellens JHM, Beijnen JH, Hutson PR, Rosing H, et al. Population Pharmacokinetic and Adverse Event Analysis of Topotecan in Patients with Solid Tumors. Clinical Pharmacology & Therapeutics 2002;71(5):334-348 for a recent application of the joint modelling approach for missing covariates. The initial suggestion for using this with NONMEM was Karlsson M, Jonsson E, Wiltse C, Wade J. Assumption testing in population pharmacokinetic models: illustrated with an analysis of moxonidine data from congestive heart failure patients. J Pharmacokinet Biopharm 1998;26(2):207-46. -- Nick Holford, Divn Pharmacology & Clinical Pharmacology University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556 http://www.health.auckland.ac.nz/pharmacology/staff/nholford/ ------------------ From:"Stephen Duffull" Subject: RE: [NMusers] Missing Gender (Categorical values) Date:Wed, 31 Jul 2002 08:55:00 +1000 Hi Just my 2c worth. 
I have tried joint function modelling on one data set - where about 50% of the patients had missing covariates. The covariates were continuous - rather than categorical. There weren't any other covariates to try and get an idea about the missing one in question (which is again different from your example)... so it was a matter of estimating the missing covariate and parameter values simultaneously from the PK data (there was a lot of that). Anyway it seemed to work ok - but we had difficulties due to the large proportion of missing covariates. With 70% of your sex data missing this could be problematic also. In addition, we did not find a satisfactory way of assessing the statistical significance of covariate relationships. When using joint function modelling the objective function is greatly inflated with estimating the covariates, which means that a simple LRT is not straightforward to perform. Regards Steve ***************************************** Stephen Duffull School of Pharmacy University of Queensland Brisbane 4072 Australia Tel +61 7 3365 8808 Fax +61 7 3365 1688 http://www.uq.edu.au/pharmacy/duffull.htm ------------------ From:"Bachman, William" Subject:RE: [NMusers] Missing Gender (Categorical values) - my $0.02 Date: Wed, 31 Jul 2002 08:25:03 -0400 imho, if I had other covariate data such as weight, creatinine clearance, age etc. (and 70% of the gender data was missing), I would forget about gender entirely and not go about making up data! :) Bill ------------------ From:"Lewis B. Sheiner" Subject: Re: [NMusers] Missing Gender (Categorical values) - my $0.02 Date:Wed, 31 Jul 2002 08:58:31 -0700 And just to chime in ... If you *must* know the effect of sex for some reason, then multiple imputation is a way of evaluating what Diane (& Nick) call the likelihood option; actually a marginal likelihood -- p(Y|data) = Integral[p(Y|S,data) p(S|data) dS], where Y is your usual response, and S is sex. 
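For a binary covariate such as sex, the integral in the marginal likelihood above reduces to a two-term sum weighted by the probability of each sex. The Python below is a minimal sketch of that idea, not NONMEM code from the thread: it assumes a normally distributed response whose mean shifts by an additive sex effect, and the names `prfem`, `mu_f`, `delta`, and `sigma` are illustrative assumptions. When sex is observed, the subject contributes the joint likelihood of the response and the observed sex (sex treated as another DV); when sex is missing, the two possibilities are summed.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def likelihood(y, sex, prfem, mu_f, delta, sigma):
    """Likelihood contribution of one subject.

    sex: 0 = female, 1 = male, negative = missing.
    prfem: probability of being female (an estimated parameter).
    mu_f: mean response in females; delta: hypothetical male-minus-female shift.
    """
    p_female = normal_pdf(y, mu_f, sigma)
    p_male = normal_pdf(y, mu_f + delta, sigma)
    if sex == 0:
        return prfem * p_female                # joint p(Y, S = female)
    if sex == 1:
        return (1 - prfem) * p_male            # joint p(Y, S = male)
    return prfem * p_female + (1 - prfem) * p_male  # missing: sum over both sexes


# A subject with missing sex contributes the sum of the two observed-sex cases.
missing_case = likelihood(1.2, -1, prfem=0.4, mu_f=1.0, delta=0.5, sigma=0.3)
```

Maximizing the product of these contributions over all subjects estimates `prfem` and the sex effect together, which is the same likelihood that multiple imputation approximates by simulation.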
It is important to realize that both methods are attempting to find the MLE of the SAME underlying likelihood (i.e. model). They are simply using different methods to do so. The std error of the sex covariate (when correctly computed using either method) will of course be larger than if there had been no missing data (you can't get something for nothing). On the other hand, if there is no reason to need to know the sex coefficient per se (e.g. you're just on a hunt for explanatory variables & don't care which ones you find), then you can just leave sex out if you have little information on it, UNLESS the missingness of sex is non-ignorable (that is, the sex covariate is missing selectively in individuals whose responses are systematically different than the rest). In that case (which should be revealed by Leonid's analysis using a separate 'missing' class for those with missing sex data), if the other covariates do correlate with sex, then sex should be taken into account in the likelihood to avoid bias. This can be done using either of the computational approaches above. LBS. Lewis B Sheiner, MD (lewis@c255.ucsf.edu) Professor: Lab. Med., Biopharmaceut. Sci. Box 0626, UCSF, SF, CA, 94143-0626 415-476-1965 (v), 415-476-2796 (fax) ------------------ From: Alan Xiao Subject: Re: [NMusers] Missing Gender (Categorical values) Date:Wed, 31 Jul 2002 12:20:21 -0400 Nick, You got me. I replaced "think about" with "take into account" but forgot to change "before" to "when". Anyway, my English undoubtedly needs to be improved and a careful check should be done before the email is sent out. About the imputation, I am just curious about the effect of the imputation on the identifiability of the covariate effect if the effect of the imputed covariate is significant (for example, known from other data). About this topic, you can ask hundreds of similar questions. 
Here is an example, if the effect of SEX on CL is surely significant, you might get different imputation results and/or parameter estimates when you include SEX into your model, as compared to when you exclude SEX from your model. Or say, the results might be different between imputations using a structural model and using a full model. (I'm not really sure but this should be testable by simulation). If this is true, then it's reasonable to think (in an opposite way) that imputation will influence the identification of the covariates or the significance of the covariates to the model, depending on what kind of imputation method is used. Thank you for the information you listed. Alan. ------------------ From: "Serge Guzy" Subject: RE: [NMusers] Missing Gender (Categorical values) Date:Wed, 31 Jul 2002 09:41:25 -0700 I think that there is no difference conceptually between filling missing data and estimating PK parameters from sparse data. The same strategy could be used and one of them is the Monte Carlo Implementation of the EM Algorithm. Serge Guzy,PH.D Head of Pharmacometrics Xoma ------------------ From:Alan Xiao Subject:Re: [NMusers] Missing Gender (Categorical values) Date:Wed, 31 Jul 2002 14:08:22 -0400 In methodology, I think you are right. However, in implementation, I'm not sure. As you know, to estimate PK parameters from sparse data, an appropriate model structure is a prerequisite although this model structure can be tested/selected using statistical tools. Similarly, in filling missing data, an appropriate model structure (or algorithm or whatever you name) is also a prerequisite. Likelihood method is just a statistic tool to force the model to fit the data (or converge) in a certain way and it is not the model itself. 
When the above two processes or tasks (filling the missing data and estimating PK parameters) are not correlated, that would be relatively simple - both model structures (or algorithms) can be adjusted independently at the same time based on the selected statistic tools. If they are correlated, how to make sure both model structures are correct or correctly adjusted based on a selected statistic tool is somehow a question I would like to ask. I'm just wondering if there are any reports on this. I noticed that in Mould et al's paper on topotecan (J Pharmacokinet Biopharm 1998;26(2):207-46), WEIGHT was used as a "built-in" covariate in the model. SEX was not identified as a covariate. I'm not sure whether SEX is really not significant or just because SEX is highly correlated with WEIGHT (equation 4) and the WEIGHT imputation covers the effect of SEX. Of course, I do not mean SEX must be in the model, either. I'm just curious if any testing was performed on this. Alan. ------------------ From:Nick Holford Subject:Re: [NMusers] Missing Gender (Categorical values) Date: Thu, 01 Aug 2002 07:52:19 +1200 Alan, Alan Xiao wrote: The idea of multiple imputation is to test the effect of SEX under the range of plausible possibilities for the missing SEX covariate. You cannot get something for nothing so I expect the power to detect the effect of SEX is lower than if you had the full data set. On the other hand if you conclude via imputation that SEX has an effect then you are more likely to be correct (assuming the effect of SEX is the truth) than by assuming an intermediate SEX which is guaranteed to be the wrong value for the missing covariate. Your response to Serge Guzy said"I noticed that in Mould et al's paper on topotecan (J Pharmacokinet Biopharm 1998;26(2):207-46), ". Note there are 2 papers cited here. Diane Mould wrote about topotecan in 2002, Mats Karlsson wrote about assumption testing in 1998. I leave Diane and/or Mats to respond to your question... 
-- Nick Holford, Divn Pharmacology & Clinical Pharmacology University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556 http://www.health.auckland.ac.nz/pharmacology/staff/nholford/ ------------------ From: "diane r mould" Subject:RE: [NMusers] Missing Gender (Categorical values) Date: Wed, 31 Jul 2002 19:24:54 -0400 Dear Alan I would have to agree with Lewis' summary - that if you have some reason to believe that sex is an important covariate or if there is reason to believe that the missingness of sex is informative (non ignorable) then you have reason to undertake some form of multiple imputation to attempt to account for that covariate. However, if you are just in the process of identification of covariates, then you would be better off taking less intensive measures than imputation. Therefore, I would ask for more input from you on that issue before launching into some discussion on how to do that and whether the results are reasonable. Is this covariate part of your hunt for covariates? Is there some reason to think that the missingness is not ignorable? The work that was published in CPT did not have to estimate sex based on imputation, but we did have to estimate performance status, which is also a discrete covariate and therefore has some of the same issues associated with imputation. Unlike sex, which is correlated with other covariates such as weight or creatinine clearance, we did not see correlations for performance status that would help predict it (other than the response) although we were also not missing as much data as you seem to be. Therefore, I expect that your imputation model for sex would probably be more reliable than ours was for performance status. So if you do need to use some form of imputation then I think it would be do-able. 
Please let me know your thoughts Best Regards Diane ------------------ From: Alan Xiao Subject:Re: [NMusers] Missing Gender (Categorical values) Date: Sun, 04 Aug 2002 20:58:07 -0400 Dear Diane, Sorry for the late reply to your email because I was out for vacation 20 minutes after I sent the last email. About the imputation of missing data, I'm not against Lewis' summary at all. By contrast, I agree with his summary. However, as expressed in the last email, what I'm concerned is about the potential effect of the imputation model (or algorithm) on the evaluation of the significance of the imputed covariate and corresponding correlated covariates (used in the imputation model, or joint model in your paper) to the parameters in a PK model and/or the justification of the imputation model and the PK model - this is not about the Likelihood method itself. Here, by justification of the PK model, I mean the type of function in the PK model for the covariate effect (when covariate is other covariates than SEX) rather than the whole PK model. To make this easier to understand, let's take your paper as an example: 1). How about if you replace Equation 4 in your paper with other simpler functions, such as WEIGHT as a function of BSA and HEIGHT or as a function of AGE, CLCR and SEX. As you know, BSA is usually calculated from HEIGHT and WEIGHT while CLCR is calculated from AGE, WEIGHT and SEX. The functions for them are very certain and no modeling/simulation is needed at all if your BSA data was indeed calculated from HEIGHT and WEIGHT or CLCR was indeed calculated from AGE, WEIGHT and SEX, and HEIGHT and/or SEX data was not missing. From your Table I, AGE was not missing at all. BSA, CLCR and SEX were also available (1 missing in CLCR and 3 in SEX, as compared to AGE). Or, BSA and SEX were also partly imputed in TABLE 1? In another word, patients with missing WEIGHT actually had all other covariate values missing, including AGE, BSA, SEX and CLCR? 
- I don't think the information about this is clear in the paper. Or your BSA and CLCR were directly measured so that you did not have a certain function for them to simply connect WEIGHT with BSA and HEIGHT or CLCR, AGE and SEX ? If so, can you tell us how they were measured? (The note under table I says that CLCR was calculated from Cockcroft and Gault formula, which is a function of AGE, WEIGHT and SEX - why couldn't you just simply revert the calculation to get WEIGHT from CLCR, AGE and SEX?) Actually, whether they were measured or calculated does not influence our discussion. The question is, how are you sure your joint model is the best imputation model? Did you try other imputation models? If you include a function, for example, WEIGHT**THETA(), to the volume of distribution, or another one such as THETA()*SEX to clearance in the PK model (assuming they are significant, thus your PK model and imputation model are correlated) and simultaneously fit the PK model and imputation model to the data using likelihood method to control the minimization, would you get the same results? or close enough? 2). We talked about SEX previously just because SEX was the missing covariate in the email sent by Atul. If the missing covariate is a continuous covariate, e.g. WEIGHT in your paper, it becomes a little bit more complicated, because the potential function could be a power function in additive or multiplication on some parameters of the PK model. I'm afraid this function will also influence the imputation results. Or, just for testing, how about replacing (WT/70)**0.75 in Equation 3 with (WT/70)? Would the results be the same? (I am trying to figure out the conditions for a model and generalize it). 3). Back to covariate effects. If the missing covariate is not significant to any parameters of a PK/PD model, then whatever value you impute does not matter - the imputation is not really important. However, if the missing covariate (e.g. 
WEIGHT in your paper) is significant to the PK/PD model, then the type of function in the PK/PD model to express the covariate effect will be correlated with the imputation model, as discussed in (1) and (2) above. Furthermore, if a imputation model-predicting covariate (or "joint model-predicting covariates" such as SEX in your paper, right below equation 4) is significant or marginally significant, its significance could be neutralized (I'm not sure it's the right word) or largely weakened by the inclusion of the imputed covariate (WEIGHT here) into the model (i.e., both the imputation model-predicting covariate, e.g. SEX, and the imputed covariate, e.g. WEIGHT, are significant and correlated). If you have tested that the imputation model-predicting covariate (SEX here) is not significant in the PK/PD model which does not include the imputed covariate (WEIGHT here), then we might be able to ignore the influence of the imputed covariate on the identification of the imputation model-predicting covariate (SEX here). When you say that you "did not have to estimate sex based on imputation", can you explain a little bit more in detail? Did you mean that you have tested or you knew from other data that SEX is not significant (whether WEIGHT is significant or not)? or that SEX is not significant based on the PK model after imputation? 4). How strong is this potential influence of the type of the imputation model on the type of the function of the imputed covariate on parameters in a PK model? and how strong is the potential influence of the imputed covariate on the identification of other significant covariates on parameters in a PK model? I have no idea. This is why I asked for the information if anyone has done this before. But I think this should be case dependent. 5). Do I have a more reliable imputation model for SEX? No. I don't think I can develop one without any detailed information about the dataset. Actually, the specific model itself is not the most important. 
It is the methodology used to develop the model and the interpretation of the model that is the most important. After all, science is just science, it can be questioned and can be defended. 6). Another minor thing. From my experience, in a combined dataset (from many studies), when the missing ratio is high, the missing pattern is usually not random (refer to the combined dataset) - if the covariate is missing for all subjects in one or more studies, or even if the missing is random in one or more studies. In this case, the simple random imputation may not be appropriate at all - This could be easily overlooked if it is not yourself who have merged all sub datasets together. I assume this is not the case in your paper (20% missing) and in Atul's data (70% missing). I have to admit that I don't have had this paper yet: Nick Holford wrote: Karlsson M, Jonsson E, Wiltse C, Wade J. Assumption testing in population pharmacokinetic models: illustrated with an analysis of moxonidine data from congestive heart failure patients. J Pharmacokinet Biopharm 1998;26(2):207-46. If all or some of above questions/concerns have already been addressed in this paper, please just simply skip and flag them. Thanks. Best regards, Alan. ------------------ From:"diane r mould" Subject:RE: [NMusers] Missing Gender (Categorical values) Date:Mon, 5 Aug 2002 21:10:12 -0400 Dear Alan Height was not available in the data, if it had been then we could have just back-calculated weight from the Dubois and Dubois formula without imputing it. We had BSA for all of the patients, although we did not have the original values of weight that had been used to calculate BSA. We would rather have done that than pay the price in the long run times and other model qualification aspects involved with using joint functions. We did test age, sex, creatinine clearance as covariates for the joint model for the same reasons that you cite. This joint model was built in the same fashion that any model would be built. 
You should have seen that BSA, sex and creatinine clearance were included in the model - age and the other covariates did not improve the fit. We used the observed covariates in the pk part of the model if they were available. BSA was available for all patients. creatinine clearance and sex were estimated for the few individuals that were missing it. There were 2 patients who were missing weight and were also missing sex and ECOG performance status. All of the patients who were missing weight did have creatinine clearance however. The patient who was missing creatinine clearance had the other covariates available. Patients missing ECOG performance status had all other covariates (with the exception of the two who were missing weight and sex). We did not back calculate from creatinine clearance for several reasons - the first was that we did not have sex for 2 of the patients and we did not have the serum creatinine data for any of them. Without serum creatinine, its hard to back estimate weight even if you happen to know the sex, creatinine clearance, and age of a patient. The second reason was that we were also missing ECOG performance status from a fairly large percentage of patients. This latter covariate historically had been shown to influence the safety of the drug. Therefore we had to handle at least 2 missing covariates that were potentially meaningful. Many of the studies used in that work had been conducted a long time ago, and data bases change. Its hard to answer your question given the age and number of data bases that these data were extracted from. Some of the data was faxed to me on paper because the data bases were not easily available. We felt that this was the 'best imputation model' for the same reasons that a modeler would decide that his final pop pk model was 'best'. 
It was used because the model described the data, the covariates were physiologically reasonable and because changing that model (ie adding other factors) did not improve it further. The manuscript was fairly clear about the fact that we tested a lot of covariates both for the joint model and for the pk model. The second aspect of your question seems to deal with the selection of the covariates for the PK part of the model. Weight was not added just because we had imputed it. This covariate was added because it explained a lot of the inter-individual variability and its inclusion reduced the objective function. We tested a rather long list of covariates, including sex, age, weight, creatinine clearance, BSA, etc. Other covariates and other functions did not do the job as well. If it helps, I did fit a much reduced data set (where all the imputed data had been removed) and came to the same conclusions that were drawn from the larger data set. In addition, I have completed a second analysis using a new data set, with no missing data and the functions are nearly the same - with the same results on IIV. You may be confusing regenerating missing data with standard model building practices. The covariate model that was ultimately used for weight was an allometric model - which is why the exponential terms for clearance were fixed. I did try other models for weight (and I also tried BSA), but they were not as good as this allometric model. Changing the PK model does have some impact on the imputation model but its not that profound. We noticed that the estimated weights did change slightly from the base to the final model but the observed weights do help keep the individual estimates of weight in line. Unless the model is grossly misspecified, I dont think that you are going to do a lot to change the individual estimates if the imputation model is not perfect. 
Actually - I think I misspoke - we did impute sex using the joint function for the two subjects who were missing it. Sex was not statistically significant as a covariate in the pk model. You seem to be saying that one could dismiss a covariate because of imputation. I don't think so. Perhaps you are missing an important point - the imputed value of a covariate that is used in the pk is the INDIVIDUAL predicted value, not a typical value. Even a base PK model (with no covariates) should provide good individual predicted values of a concentration. Furthermore, in imputation, one would use the observed values of a covariate when they are available. Sex, in our case, was missing only for 2 patients - a very small percentage. If sex was not significant (and it was not) then it's not dismissed because we used imputed weight as a covariate in the final model. Covariate effects are checked individually too. Good model building practices should help prevent the sort of thing that you describe from happening. I am not sure that I can answer that. I would imagine that it would be case dependent but a simulation study would need to be done to test that, or perhaps some other person could answer this. True enough. You seem to be referring to informative missingness, such as missing creatinine clearance information because all of the patients with low CLCR values dropped out due to high drug levels leading to adverse events or something of that sort. Is that right? This is not the case with topotecan - it was missing completely at random as far as we could tell. However, it may be a problem or an issue with Atul's data - but that would make it even more reasonable to impute, in order to avoid bias as Lewis suggested earlier. Best Regards Diane
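Diane's earlier point in this reply, that weight could have been back-calculated rather than imputed had height or serum creatinine been available, can be made concrete. The Python below is a minimal sketch using the standard published forms of the two formulas she names: Du Bois and Du Bois, BSA (m^2) = 0.007184 x weight(kg)^0.425 x height(cm)^0.725, and Cockcroft and Gault, CLcr (mL/min) = (140 - age) x weight / (72 x serum creatinine in mg/dL), multiplied by 0.85 for females. The function names are hypothetical, and whether the topotecan data used exactly these variants is an assumption.

```python
def weight_from_bsa(bsa_m2, height_cm):
    """Invert Du Bois & Du Bois: BSA = 0.007184 * W**0.425 * H**0.725."""
    return (bsa_m2 / (0.007184 * height_cm ** 0.725)) ** (1 / 0.425)


def weight_from_clcr(clcr_ml_min, age_yr, scr_mg_dl, female):
    """Invert Cockcroft-Gault: CLcr = (140 - age) * W / (72 * Scr), * 0.85 if female."""
    sex_factor = 0.85 if female else 1.0
    return clcr_ml_min * 72 * scr_mg_dl / ((140 - age_yr) * sex_factor)


# Round trip for a 70 kg, 175 cm, 50-year-old woman with Scr = 1.0 mg/dL.
bsa = 0.007184 * 70 ** 0.425 * 175 ** 0.725
clcr = (140 - 50) * 70 / (72 * 1.0) * 0.85
w_from_bsa = weight_from_bsa(bsa, 175)
w_from_clcr = weight_from_clcr(clcr, 50, 1.0, female=True)
```

Both inversions need an input that was missing from the topotecan data (height in the first, serum creatinine and for two patients sex in the second), which is why the joint model was used instead.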