June 5, 2000


    Notes on the SLIDA Splus GUI functions for life data analysis

                 Copyright 2000 W. Q. Meeker


These notes provide a brief introduction to some of the capabilities
of SLIDA, including the SLIDA command-line operation. Detailed
instructions for installation and operation within the SLIDA GUI are
contained in the SLIDA documentation (SlidaGui.pdf).

After SLIDA is installed in the usual way, Slida will appear on the
menu bar. The GUI menu structure was designed so that it could be
used without reference to documentation.  The use of the menu
structure should be, for the most part, straightforward and
intuitive.  

SLIDA will work with any version of Splus after 4.0.  If, however,
you intend to use the SLIDA GUI, it is strongly recommended that you
have Splus 4.5, updated with the July 1998 service pack (or any
subsequent release of Splus).

The most important statistical tools for the most widely used models
are available for use through the SLIDA GUI. A number of other more
sophisticated models and analyses are available only through the
command line. Some of these more sophisticated models will
eventually be available in the GUI.

The rest of these notes describe Splus command-line functions for
the analysis of censored life data and data from other nonstandard
models. For those who will use the GUI, this information is not
really necessary, but it may be of interest because it provides a
glimpse of what is happening under the GUI and indicates possibilities
for user extension and modification of SLIDA. One big advantage of
working in Splus is that changes to the functions are easy to
implement. Users who are familiar with the Splus language can rename
my functions and customize them to meet special needs.

The work on SLIDA has been motivated by research problems,
consulting problems, the need for software for my Statistics 533
course, and the need to do the examples in Meeker and Escobar (1998),
Statistical Methods for Reliability Data, published by John Wiley and
Sons, Inc. I have focused my development on providing high-quality
graphical output of the results, although there are a number of
functions that will provide tabular output when requested.

This collection of SLIDA/S-PLUS functions can be divided roughly
into two groups:

1. Functions that can be used to analyze censored data with standard
models using nonparametric methods, standard life distributions, and
accelerated life test relationships (Chapters 1-8, 16-21 of Meeker
and Escobar (1998)). These functions can be used by simply giving
commands to Splus (or by sending Splus a file of commands to run in
batch mode). One does not have to know much about Splus or anything
about "programming" in Splus to use these functions.


2. Functions that allow likelihood analysis and maximum likelihood
fitting of nonstandard models, but that require the user to program a
likelihood in the Splus language. This second collection of functions
also allows the user to compute likelihood profiles (one and two
dimensional). This approach was used to fit a number of the
special models and distributions in Chapter 11 of Meeker and Escobar
(1998). These fits provide a number of examples that could easily be
extended to other distributions and models.

These notes describe only the first set of functions. Examples of
all of these functions are given in the "echapter?.q" files in the
echapters folder.

An Splus object with a name like xxx.ld is a "life.data" object
containing the information about data set xxx. Typically there will be a
different .ld object for each data set to be analyzed. The xxx.ld life.data
objects contain information like times, censor codes, case weights
(for ties, interval-count data or multiple censoring at a point),
units of time, information about explanatory variables (e.g., for
regression or acceleration models), data set title, etc.

SLIDA uses some of the Splus object-oriented programming features.
This makes it very easy to do different analyses and reduces the
number of function names that must be used.

Detailed documentation for these functions has not yet been
prepared. Instead users can rely on the large number of examples
in the echapters folder. Here is a brief description of some of the
most important functions:

***************

> frame.to.ld(...)


input: ascii data file or data frame name
       column(s) of responses (2 columns needed if there are intervals)
       column containing censor codes (default no censoring)
       column containing case weights (or multiplicities) (default is
                that all cases have weight 1)
       data title (for plots and tables)
       units for response (e.g.,  minutes, hours, days, or cycles)
       columns containing explanatory variables (default is none)
       names for the x variables (default is x1, x2, ...)

The numerical censor codes are:

0   dummy observations (ignored in analysis)
1   exact failure time
2   right censored observation
3   left censored observation (or interval assumed to start at 0
    or -infinity, depending on the support of the specified distribution)
4   interval censored observation
5   small interval around reported exact failure time 
    (useful when the density approximation to the likelihood is inadequate).
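
As an illustration of what the censor codes mean statistically, the
following Python sketch computes the log-likelihood contribution of
each kind of observation under a Weibull distribution. It is not SLIDA
code. The function names, the case layout (code, lower, upper, weight),
and the half-width delta used for code 5 are all assumptions made for
this example.

```python
import math

def weibull_cdf(t, eta, beta):
    """Weibull cdf F(t) = 1 - exp(-(t/eta)^beta); taken as 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / eta) ** beta))

def weibull_pdf(t, eta, beta):
    """Weibull density f(t) for t > 0."""
    z = (t / eta) ** beta
    return (beta / t) * z * math.exp(-z)

def log_lik_contribution(code, lower, upper, eta, beta, delta=0.5):
    """Log-likelihood contribution of one case.

    lower is the (first) response column; upper is used only for code 4.
    delta is a hypothetical half-width for code 5, not a SLIDA default.
    """
    if code == 0:   # dummy observation, ignored
        return 0.0
    if code == 1:   # exact failure: density approximation
        return math.log(weibull_pdf(lower, eta, beta))
    if code == 2:   # right censored: failure occurs after lower
        return math.log(1.0 - weibull_cdf(lower, eta, beta))
    if code == 3:   # left censored: failure occurred before lower
        return math.log(weibull_cdf(lower, eta, beta))
    if code == 4:   # interval censored: failure between lower and upper
        return math.log(weibull_cdf(upper, eta, beta)
                        - weibull_cdf(lower, eta, beta))
    if code == 5:   # small interval around the reported failure time
        return math.log(weibull_cdf(lower + delta, eta, beta)
                        - weibull_cdf(lower - delta, eta, beta))
    raise ValueError("unknown censor code: %r" % (code,))

def total_log_likelihood(cases, eta, beta):
    """Sum of weight * contribution over (code, lower, upper, weight) cases."""
    return sum(w * log_lik_contribution(c, lo, hi, eta, beta)
               for (c, lo, hi, w) in cases)
```

The case weight multiplies the contribution, which is why a weight
column can represent ties or several units censored at the same time.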

It is possible (and actually recommended) to replace these codes
with meaningful words or symbols (like "Failed" and "Censored"), and
this has been done in the data sets distributed with SLIDA. The
numerical codes are still allowed. The lists of allowed synonyms for
the censor codes can be seen by using the commands:

failure.censor.names: GetSlidaDefault("SLIDA.FailName")
right.censor.names: GetSlidaDefault("SLIDA.RcName")
left.censor.names: GetSlidaDefault("SLIDA.LcName")
interval.censor.names: GetSlidaDefault("SLIDA.IcName")
sinterval.censor.names: GetSlidaDefault("SLIDA.DefaultSintervalCensorNames") 

There are corresponding functions

> frame.to.rd(...)

for recurrence data (Chapter 16), and

> frame.to.rmd(...)

for repeated measures (degradation) data (Chapters 13 and 21).


See the detailed examples in the echapter files and the corresponding
data in the SLIDA_textdata folder.

***************

> summary(lzbearing.ld)
> print(lzbearing.ld)

The first command provides a summary of the indicated data set.
The second command prints the data set.


> summary(lzbearing.ld)

 Data set name:  Ball Bearing Cycles to Failure
 Number of rows in data matrix= 23
 Response units:  Millions of Cycles
 Response minimum:  17.88
 Response maximum:  173.4
 Number of cases in data set= 23
 Number of exact failures in data set= 23
 Number of right censored observations in data set= 0
 Number of left censored observations in data set= 0
 Number of interval censored observations in data set= 0
 Number of small-interval observations in data set= 0
 No explanatory variables


*************** 

> plot(lzbearing.ld) 

Plots the empirical cdf on linear axes with simultaneous confidence
bands by default. To get a log scale on the x (time) axis, use

> plot(lzbearing.ld, x.axis="log")

To see a table of the output use

> print(plot(lzbearing.ld))

To get instead a set of point-wise confidence intervals with a log time axis use:

> plot(lzbearing.ld, x.axis="log", band.type="Point-wise")


*************** 

To obtain a probability plot of the requested distribution, use

> plot(lzbearing.ld, distribution="Weibull")


Simultaneous confidence bands (using the method described in Vijay
Nair's 1984 Technometrics paper) are provided by default, but
pointwise nonparametric confidence intervals can be requested
instead.

> plot(lzbearing.ld, distribution="Weibull", band.type="p")

The distribution can be specified using any of the following names:

sev
weibull
Weibull
normal		
Normal		
lognormal		
Lognormal	
logistic		
Logistic		
loglogistic		
Loglogistic			
exponential		
Exponential
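
For readers who want to see what a probability plot does, the
following Python sketch gives the standard axis transformations that
make the cdf of each distribution plot as a straight line. It is an
illustration only, not SLIDA code, and the function name
plot_coordinates is an assumption made for this example.

```python
import math
from statistics import NormalDist

# Distributions that are plotted against log(time)
LOG_TIME = {"weibull", "lognormal", "loglogistic"}

def plot_coordinates(t, p, distribution):
    """Return (x, y) plotting coordinates for time t and probability p.

    The name is lowercased, so "Weibull" and "weibull" are equivalent.
    """
    name = distribution.lower()
    x = math.log(t) if name in LOG_TIME else t
    if name in ("weibull", "sev"):
        y = math.log(-math.log(1.0 - p))   # smallest extreme value quantile
    elif name in ("normal", "lognormal"):
        y = NormalDist().inv_cdf(p)        # standard normal quantile
    elif name in ("logistic", "loglogistic"):
        y = math.log(p / (1.0 - p))        # standard logistic quantile
    elif name == "exponential":
        y = -math.log(1.0 - p)             # standard exponential quantile
    else:
        raise ValueError("unknown distribution: " + distribution)
    return x, y
```

For example, points taken from a Weibull cdf with scale eta and shape
beta satisfy y = beta * (x - log(eta)), so a Weibull sample should
plot close to a straight line with slope beta on Weibull paper.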



In the plot commands, you can control the axis ranges by using something like

plot(xx, x.range=c(my.min,my.max), y.range=c(my.min,my.max))

If you want SLIDA to choose any of these limits, use NA for that limit, as in:

plot(xx, x.range=c(NA,my.max), y.range=c(my.min,NA))




*************** 
> mleprobplot(lzbearing.ld, distribution="Lognormal")
*************** 

Makes a probability plot of the requested distribution
and superimposes an ML fit with a set of pointwise parametric
confidence intervals on failure probabilities.

To get tabular output, use

> lzbearing.mlest.out <- mleprobplot(lzbearing.ld, distribution="Lognormal")

print(lzbearing.mlest.out)
quantiles(lzbearing.mlest.out)
failure.probabilities(lzbearing.mlest.out)
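
For complete data (all exact failures, as in the ball bearing data
set), the lognormal ML estimates have a closed form, which makes a
convenient check on any fitting routine. The Python sketch below is
an illustration only, not SLIDA code. The function names and the
three data values are hypothetical. With censored data there is no
closed form and the likelihood must be maximized numerically.

```python
import math
from statistics import NormalDist

def lognormal_mle(times):
    """ML estimates (mu, sigma) of a lognormal for uncensored times."""
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu = sum(logs) / n
    # The ML estimate of sigma divides by n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

def lognormal_quantile(p, mu, sigma):
    """Time by which a proportion p of units is estimated to fail."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

def lognormal_failure_probability(t, mu, sigma):
    """Estimated probability of failure before time t."""
    return NormalDist().cdf((math.log(t) - mu) / sigma)

# Hypothetical data, used only to exercise the functions
mu_hat, sigma_hat = lognormal_mle([10.0, 20.0, 40.0])
median_hat = lognormal_quantile(0.5, mu_hat, sigma_hat)
```

On a lognormal probability plot the fitted line passes through
(log(t_p), z_p), where z_p is the standard normal p quantile, so
exp(mu) is the estimated median and sigma sets the slope.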

***************
> compare.mlprobplot(lzbearing.ld,
	main.distribution="Lognormal", compare.distribution="Weibull")
***************
This is similar to mleprobplot(), but also superimposes the ML fit of the
"compare.distribution".


***************
> censored.data.plot(nf.ld)
***************
Plots the response versus each of the explanatory variables.

***************

> groupi.mleprobplot(mylarpoly.ld,distribution="Weibull")
 
For a set of accelerated life test data with subexperiments at a
small number of stress levels, produces a multiple probability plot
with ML fits done individually for each subexperiment and
superimposed on the plot (slopes may not be equal because the spread
parameters are not constrained to be the same).


***************

> groupm.mleprobplot(mylarpoly.ld, distribution="Weibull",
        relationship="log") 

For a set of accelerated life test data with subexperiments at a
small number of stress levels, produces a multiple probability plot
with the ML fit of a single model that ties together the
subexperiments superimposed on the plot (slopes will be equal because
the spread parameters are constrained to be the same).
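
To see why a common spread parameter gives equal slopes, consider a
Weibull model in which mu = b0 + b1*log(x) and sigma is the same at
every stress level x. The Python sketch below is an illustration
only, not SLIDA code. The function name and the coefficient values
are hypothetical.

```python
import math

def weibull_log_quantile(p, x, b0, b1, sigma):
    """Log of the p quantile at stress x under the log relationship.

    mu = b0 + b1*log(x) is the log of the Weibull scale parameter and
    sigma = 1/beta is the common spread parameter.
    """
    mu = b0 + b1 * math.log(x)
    return mu + sigma * math.log(-math.log(1.0 - p))

# Hypothetical coefficients, used only to exercise the function
b0, b1, sigma = 20.0, -3.0, 0.8
shifts = [weibull_log_quantile(p, 100.0, b0, b1, sigma)
          - weibull_log_quantile(p, 50.0, b0, b1, sigma)
          for p in (0.1, 0.5, 0.9)]
```

The shift in log time between two stress levels is b1 times the
difference in log stress, whatever the value of p, so the fitted
lines on the probability plot are parallel.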


The regression capabilities in Slida are undergoing a slow evolution
that will eventually provide a more general class of models that
are easy to specify. Right now, the capabilities are somewhat
limited and/or require the user to do some up-front work in Splus.

Capabilities that are in good shape include simple regression and
multiple regression with some specific relationships
(transformations on the explanatory variables such as log, Box-Cox,
and Arrhenius), and class variables (which are automatically mapped
into dummy variables).

For anything more complicated (e.g., squared terms, interaction
terms, etc.), the user must supply the needed terms as columns of the
input x matrix. This will be generalized in the future to take
advantage of the powerful Splus modeling language, at least for
models that are linear in the parameters. All of this is also
available in the GUI. See the examples in echapter19.q. The
estimation algorithms are, and will remain, robust to the
ill-conditioned x matrices that arise in some applications.

SLIDA can also fit models with nonconstant sigma. One warning here
is that a separate algorithm is being used, and this algorithm is not
as robust to problems with ill-conditioned x matrices. The user has
to make sure that the input x matrix is well conditioned. For
example, the quadratic model for location for the Nelson superalloy
fatigue data in Chapter 17 can be analyzed using the
parameterization suggested in Nelson (1984), in which the log stress
variable is centered before it is squared. In Meeker and Escobar
(1998) we did not center the x variable, as we feel that today users
of statistical methods should not be forced to do such things. The
coefficients in our presentation there were worked out in a
different way that has not been programmed in general. With some
more programming effort, we could make our nonconstant-sigma
algorithm robust too, but we have not gotten to this.

To fit models in which there is a log-linear model for sigma, one
must generalize the input explan.vars. Instead of a vector, it must
be a list of two vectors, one for mu and one for sigma. The
following command fits the quadratic model for location and a
log-linear model for sigma to the Nelson superalloy data. The Slida
data object nf.ld contains in its X matrix the columns:

Pseudo-stress   centered-x   (centered-x)^2

where centered-x = log(Pseudo-stress) - mean(log(Pseudo-stress))

Because of the centering, the parameterization is different, but the
model is the same. This model is fit in SLIDA with the following command:

gmlest(nf.ld,dist="Weibull",
       explan.vars=list(mu.relat=c(2,3), sigma.relat=c(2)))
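
The claim that centering changes the parameterization but not the
model can be checked with a little algebra. The Python sketch below
is an illustration only, not SLIDA code. The function names, the
stress values, and the coefficients are hypothetical. It builds the
three X-matrix columns described above and converts coefficients of
the centered quadratic into those of the uncentered quadratic.

```python
import math

def centered_columns(stress):
    """Return (xbar, rows) with rows of (stress, centered-x, centered-x^2)."""
    logs = [math.log(s) for s in stress]
    xbar = sum(logs) / len(logs)
    rows = [(s, lx - xbar, (lx - xbar) ** 2) for s, lx in zip(stress, logs)]
    return xbar, rows

def uncentered_coefficients(b0, b1, b2, xbar):
    """Expand b0 + b1*(x - xbar) + b2*(x - xbar)^2 in powers of x."""
    return (b0 - b1 * xbar + b2 * xbar ** 2, b1 - 2.0 * b2 * xbar, b2)

def mu_centered(x, b0, b1, b2, xbar):
    c = x - xbar
    return b0 + b1 * c + b2 * c * c

def mu_uncentered(x, a0, a1, a2):
    return a0 + a1 * x + a2 * x * x

# Hypothetical stress levels and coefficients, for illustration only
xbar, rows = centered_columns([80.0, 100.0, 120.0, 150.0])
b = (4.0, -5.0, 2.5)
a = uncentered_coefficients(b[0], b[1], b[2], xbar)
```

Both parameterizations give the same mu at every log stress x. The
centered columns are much less correlated with each other, which is
what helps an algorithm that is sensitive to ill-conditioning.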



I would welcome feedback and suggestions for improvement of these
functions.  I intend to continue development. Please feel free to call
or send email if you have questions.

The most up-to-date version of Slida can always be found at

http://www.public.iastate.edu/~stat533/slida.html

Please send email to wqmeeker@iastate.edu if you would like to be
notified when new versions have been posted. 

This document and other SLIDA materials may be freely copied for
educational purposes.

Reference:

Meeker, W. Q. and Escobar, L. A. (1998),
Statistical Methods for Reliability Data,
New York: John Wiley and Sons. (800)-526-5368

ISBN 0471143286   

---------------------------------------


There is a continuing, sophisticated process for checking
computations done with SLIDA. It is, of course, possible that bugs
exist in the software. I will try to investigate and fix any
problems that are reported to me. Because it is free, however, SLIDA
comes with NO GUARANTEE OR WARRANTY, IMPLIED OR OTHERWISE.

---------------------------------------

William Q. Meeker
Department of Statistics
Iowa State University
Ames, IA  50010
wqmeeker@iastate.edu