June 5, 2000

Notes on the SLIDA Splus GUI functions for life data analysis

Copyright 2000 W. Q. Meeker

These notes provide a brief introduction to some of the capabilities of SLIDA, including the SLIDA command-line operation. Detailed instructions for installation and operation within the SLIDA GUI are contained in the SLIDA documentation (SlidaGui.pdf). After SLIDA is installed in the usual way, Slida will appear on the menu bar. The GUI menu structure was designed so that it could be used without reference to documentation. The use of the menu structure should be, for the most part, straightforward and intuitive.

SLIDA will work with any version of Splus after 4.0. If, however, you intend to use the SLIDA GUI, it is strongly recommended that you have Splus 4.5, updated with the July 1998 service pack (or any subsequent release of Splus).

The most important statistical tools for the most widely used models are available for use through the SLIDA GUI. A number of other more sophisticated models and analyses are available only through the command line. Some of these more sophisticated models will eventually be available in the GUI.

The rest of these notes describe Splus command-line functions for the analysis of censored life data and data from other nonstandard models. For those who will use the GUI, this information is not really necessary, but it may be of interest because it provides a glimpse of what is happening under the GUI and indicates possibilities for user extension and modification of SLIDA.

One big advantage of working in Splus is that changes to the functions are easy to implement. Users who are familiar with the Splus language can rename my functions and customize them to meet special needs.

The work on SLIDA has been motivated by research problems, consulting problems, the need for software for my Statistics 533 course, and the need to do the examples in Meeker and Escobar (1998), Statistical Methods for Reliability Data, published by John Wiley and Sons, Inc.
I have focused my development on providing high-quality graphical output of the results, although there are a number of functions that will provide tabular output when requested.

This collection of SLIDA/S-PLUS functions can, roughly, be divided into two groups:

1. Functions that can be used to analyze censored data with standard models using nonparametric methods, standard life distributions, and accelerated life test relationships (Chapters 1-8, 16-21 of Meeker and Escobar (1998)). These functions can be used by simply giving commands to Splus (or by sending Splus a file of commands to run in batch mode). One does not have to know much about Splus or anything about "programming" in Splus to use these functions.

2. Functions that allow likelihood analysis and maximum likelihood fitting of nonstandard models, but that require the user to program a likelihood in the Splus language. This second collection of functions also allows the user to compute likelihood profiles (one and two dimensional). This approach was used to fit a number of the special models and distributions in Chapter 11 of Meeker and Escobar. This provides a number of examples that could be easily extended to yet other distributions and models.

These notes describe only the first set of functions. Examples of all of these functions are given in the "echapter?.q" files in the echapters folder.

An Splus object with a name like xxx.ld is a "life.data" object containing the information about data set xxx. Typically there will be a different .ld object for each data set to be analyzed. The xxx.ld life.data objects contain information like times, censor codes, case weights (for ties, interval-count data, or multiple censoring at a point), units of time, information about explanatory variables (e.g., for regression or acceleration models), data set title, etc.

SLIDA uses some of the Splus object-oriented programming features.
This makes it very easy to do different analyses and reduces the number of function names that must be used.

Detailed documentation for these functions has not yet been prepared. Instead, users can rely on the large number of examples in the echapters folder. Here is a brief description of some of the most important functions:

***************
> frame.to.ld(...)

input:
   ascii data file or data frame name
   column(s) of responses (2 columns needed if there are intervals)
   column containing censor codes (default no censoring)
   column containing case weights (or multiplicities) (default all cases have weight 1)
   data title (for plots and tables)
   units for response (e.g., minutes, hours, days, or cycles)
   columns containing explanatory variables (default is none)
   names for the x variables (default is x1, x2, ...)

The numerical censor codes are:
   0  dummy observations (ignored in analysis)
   1  exact failure time
   2  right censored observation
   3  left censored observation (or interval assumed to start at 0 or -infinity, depending on the support of the specified distribution)
   4  interval censored observation
   5  small interval around reported exact failure time (useful when the density approximation to the likelihood is inadequate)

It is possible (and actually suggested) to replace these codes with meaningful words or symbols (like "Failed" and "Censored"), and this has been done in the data sets distributed with SLIDA. The lists of allowed synonyms for the censor codes (the numerical codes are still allowed) can be seen by using the commands:

   failure.censor.names:    GetSlidaDefault("SLIDA.FailName")
   right.censor.names:      GetSlidaDefault("SLIDA.RcName")
   left.censor.names:       GetSlidaDefault("SLIDA.LcName")
   interval.censor.names:   GetSlidaDefault("SLIDA.IcName")
   sinterval.censor.names:  GetSlidaDefault("SLIDA.DefaultSintervalCensorNames")

There is a corresponding
> frame.to.rd(...)
for recurrence data (Chapter 16) and
> frame.to.rmd(...)
for repeated measures (degradation data) (Chapters 13 and 21).
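
As a rough illustration of the kind of input that frame.to.ld expects, the following sketch builds a small data frame with one response column, a column of numerical censor codes, and a column of case weights. The column names and all of the values are hypothetical and were made up for this illustration. The code uses only base S functions (it should also run in R). The frame.to.ld call itself is not shown because its argument names are not listed in these notes.

```r
# Hypothetical data: none of these values come from a SLIDA data set.
example.frame <- data.frame(
    hours  = c(150, 320, 410, 500),   # response (time) column
    status = c(1, 1, 1, 2),           # censor codes: 1 = exact failure, 2 = right censored
    count  = c(1, 2, 1, 6))           # case weights (multiplicities)

# Total number of units represented, counting the case weights.
n.units <- sum(example.frame$count)

# Number of units that failed and number that were right censored.
n.failed   <- sum(example.frame$count[example.frame$status == 1])
n.censored <- sum(example.frame$count[example.frame$status == 2])
```

The case weights let a tie (two failures at 320 hours) and multiple censoring at a point (six units removed at 500 hours) each be recorded in a single row.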
See the detailed examples in the echapter files and the corresponding data in the SLIDA_textdata folder.

***************
> summary(lzbearing.ld)
> print(lzbearing.ld)

The first command provides a summary of the indicated data set. The second command prints the data set.

> summary(lzbearing.ld)
   Data set name: Ball Bearing Cycles to Failure
   Number of rows in data matrix= 23
   Response units: Millions of Cycles
   Response minimum: 17.88
   Response maximum: 173.4
   Number of cases in data set= 23
   Number of exact failures in data set= 23
   Number of right censored observations in data set= 0
   Number of left censored observations in data set= 0
   Number of interval censored observations in data set= 0
   Number of small-interval observations in data set= 0
   No explanatory variables

***************
> plot(lzbearing.ld)

By default, this plots the empirical cdf on linear-by-linear axes with simultaneous confidence bands. To get a log scale on the x (time) axis, use
> plot(lzbearing.ld, x.axis="log")

To see a table of the output, use
> print(plot(lzbearing.ld))

To get instead a set of pointwise confidence intervals with a log time axis, use
> plot(lzbearing.ld, x.axis="log", band.type="Point-wise")

***************
To obtain a probability plot of the requested distribution, use
> plot(lzbearing.ld, distribution="Weibull")

Simultaneous confidence bands (using the method described in Vijay Nair's 1984 Technometrics paper) are provided by default, but pointwise nonparametric confidence intervals can be requested instead:
> plot(lzbearing.ld, distribution="Weibull", band.type="p")

The distribution can be specified using any of the following names.
   sev
   weibull       Weibull
   normal        Normal
   lognormal     Lognormal
   logistic      Logistic
   loglogistic   Loglogistic
   exponential   Exponential

In the commands, you can control the axes by using something like
> plot(xx, x.range=c(my.min,my.max), y.range=c(my.min,my.max))

If you want SLIDA to choose any of these axis limits, put NA in its place, as in
> plot(xx, x.range=c(NA,my.max), y.range=c(my.min,NA))

***************
> mleprobplot(lzbearing.ld, distribution="Lognormal")

Makes a probability plot of the requested distribution and superimposes an ML fit with a set of pointwise parametric confidence intervals on failure probabilities. To get tabular output, use
> lzbearing.mlest.out <- mleprobplot(lzbearing.ld, distribution="Lognormal")
> print(lzbearing.mlest.out)
> quantiles(lzbearing.mlest.out)
> failure.probabilities(lzbearing.mlest.out)

***************
> compare.mlprobplot(lzbearing.ld, main.distribution="Lognormal", compare.distribution="Weibull")

This is similar to mleprobplot(), but also superimposes the ML fit of the "compare.distribution".

***************
> censored.data.plot(nf.ld)

Plots the response versus all explanatory variables.

***************
> groupi.mleprobplot(mylarpoly.ld, distribution="Weibull")

For a set of accelerated life test data with subexperiments at a small number of stress levels, produces a multiple probability plot with ML fits done individually for each subexperiment and drawn on the plot (slopes may not be equal because the spread parameters are not constrained to be the same).

***************
> groupm.mleprobplot(mylarpoly.ld, distribution="Weibull", relationship="log")

For a set of accelerated life test data with subexperiments at a small number of stress levels, produces a multiple probability plot with the ML fit of a model that ties together the subexperiments drawn on the plot (slopes will be equal because the spread parameters are constrained to be the same).

***************
The regression capabilities in Slida are undergoing a slow evolution that will eventually provide a more general class of models that are easy to specify. Right now, the capabilities are somewhat limited and/or require the user to do some up-front work in Splus. Capabilities that are in good shape include simple regression and multiple regression with some specific relationships (transformations on the explanatory variables such as log, Box-Cox, and Arrhenius), and class variables (which are automatically mapped into dummy variables). For anything more complicated (e.g., squared terms, interaction terms, etc.), the user must have the needed terms as part of the inputted x matrix. This will be generalized in the future to take advantage of the powerful Splus modeling language, at least for models that are linear in the parameters. All of this is also available in the GUI. See the examples in echapter19.q. The estimations are and will remain robust to ill-conditioned x matrices that arise in some applications.

SLIDA can also fit models with nonconstant sigma. One warning here is that a separate algorithm is being used, and this algorithm is not as robust to problems with ill-conditioned x matrices. The user has to make sure that the inputted x matrix is well conditioned. For example, the quadratic model for location for the Nelson superalloy fatigue data in Chapter 17 can be analyzed using the parameterization suggested in Nelson (1984), in which the log stress variable is centered before it is squared. In Meeker and Escobar (1998) we did not center the x variable, as we feel that today users of statistical methods should not be forced to do such things. The coefficients in our presentation there were worked out in a different way that has not been programmed in general. With some more programming effort, we could make our nonconstant-sigma algorithm robust too, but we have not gotten to this.
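
The centering mentioned above can be sketched in a few lines of base S (the code should also run in R). This is only an illustration under stated assumptions: the pseudo-stress values are hypothetical and are not the Nelson superalloy data, and the object names are made up. The sketch centers the log stress before squaring it, assembles a three-column x matrix (pseudo-stress, centered log stress, and its square), and compares the correlation between the linear and squared terms with and without centering.

```r
# Hypothetical pseudo-stress values, for illustration only
# (these are NOT the Nelson superalloy data).
pseudo.stress <- c(80, 90, 100, 120, 145)

# Center the log stress before squaring it.
log.stress <- log(pseudo.stress)
centered.x <- log.stress - mean(log.stress)

# Assemble the x matrix: pseudo-stress, centered-x, (centered-x)^2.
x.matrix <- cbind(pseudo.stress, centered.x, centered.x^2)
dimnames(x.matrix) <- list(NULL,
    c("Pseudo-stress", "centered-x", "(centered-x)^2"))

# Correlation between the linear and squared terms,
# without centering and with centering.
cor.uncentered <- cor(log.stress, log.stress^2)
cor.centered   <- cor(centered.x, centered.x^2)
```

With values like these, the uncentered linear and squared terms are almost perfectly correlated, which is the kind of ill-conditioning the warning refers to; centering reduces that correlation substantially.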

To fit models in which there is a log-linear model for sigma, one must generalize the explan.vars input. Instead of a vector, it must be a list of two vectors, one for mu and one for sigma. The following command fits the quadratic model for location and a log-linear model for sigma to the Nelson superalloy data. The Slida data object nf.ld contains in its X matrix the columns

   Pseudo-stress   centered-x   (centered-x)^2

where

   centered-x = log(Pseudo-stress) - mean(log(Pseudo-stress))

Because of the centering, the parameterization is different, but the model is the same. This model is fit in SLIDA with the following command:
> gmlest(nf.ld, dist="Weibull", explan.vars=list(mu.relat=c(2,3), sigma.relat=c(2)))

***************
I would welcome feedback and suggestions for improvement of these functions. I intend to continue development. Please feel free to call or send email if you have questions.

The most up-to-date version of Slida can always be found at
http://www.public.iastate.edu/~stat533/slida.html

Please send email to wqmeeker@iastate.edu if you would like to be notified when new versions have been posted.

This document and other SLIDA materials may be freely copied for educational purposes.

Reference:
Meeker, W. Q. and Escobar, L. A. (1998), Statistical Methods for Reliability Data, New York: John Wiley and Sons. (800)-526-5368 ISBN 0471143286

---------------------------------------
There is a continuing, sophisticated process for checking computations done with SLIDA. It is, of course, possible that bugs exist in the software. I will try to investigate and fix any problems that are reported to me. Because it is free, however, SLIDA comes with NO GUARANTEE OR WARRANTY, IMPLIED OR OTHERWISE.
---------------------------------------

William Q. Meeker
Department of Statistics
Iowa State University
Ames, IA 50010
wqmeeker@iastate.edu