In statistics, one often considers a family of probability distributions for a random variable X (where X is often a vector whose components are scalar-valued random variables, frequently independent), parameterized by a scalar- or vector-valued parameter, which we shall call θ. A quantity T(X) that depends on the (observable) random variable X but not on the (unobservable) parameter θ is called a statistic.
Sir Ronald Fisher tried to make precise the intuitive idea that a statistic may capture all of the information in X that is relevant to the estimation of θ. A statistic that does so is called a sufficient statistic. The precise definition is this: a statistic T(X) is sufficient for θ precisely if the conditional probability distribution of the data X given the statistic T(X) does not depend on θ.
For example:
- If X1, ..., Xn are independent Bernoulli-distributed random variables with expected value p, then the sum X1 + ... + Xn is a sufficient statistic for p (a numerical check of this appears in the sketch after this list).
- If X1, ..., Xn are independent and uniformly distributed on the interval [0, θ], then max(X1, ..., Xn) is sufficient for θ.
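As a quick numerical check of the first example, here is a minimal Python sketch (the sample size n = 5, the conditioning value t = 2, and the two values of p are arbitrary illustrative choices). If the sum is sufficient for p, the empirical conditional distribution of the data given the sum should come out the same for every p, namely uniform over the arrangements with that sum.

```python
import random

def conditional_dist_given_sum(p, n=5, t=2, trials=200_000, seed=0):
    """Empirical conditional distribution of (X1, ..., Xn) given sum == t,
    for i.i.d. Bernoulli(p) draws. If the sum is sufficient for p, this
    distribution should not depend on p."""
    rng = random.Random(seed)
    counts = {}
    kept = 0
    for _ in range(trials):
        x = tuple(1 if rng.random() < p else 0 for _ in range(n))
        if sum(x) == t:
            counts[x] = counts.get(x, 0) + 1
            kept += 1
    return {x: c / kept for x, c in sorted(counts.items())}

# Two very different values of p give (up to sampling noise) the same
# conditional distribution: uniform over the C(5, 2) = 10 arrangements,
# each with probability 0.1.
for p in (0.2, 0.8):
    dist = conditional_dist_given_sum(p)
    print(f"p = {p}: conditional probabilities range over "
          f"[{min(dist.values()):.3f}, {max(dist.values()):.3f}]")
```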
Since the conditional distribution of X given T(X) does not depend on θ, neither does the conditional expected value of g(X) given T(X), where g is any (sufficiently well-behaved) function. Consequently that conditional expected value is itself a statistic, and so is available for use in estimation. If g(X) is any kind of estimator of θ, then typically the conditional expectation of g(X) given T(X) is a better estimator of θ; one way of making that statement precise is the Rao–Blackwell theorem. Sometimes one can very easily construct a very crude estimator g(X), and then evaluate that conditional expected value to get an estimator that is in various senses optimal.
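To illustrate the Rao–Blackwell idea in the Bernoulli setting above, here is a minimal Python sketch (the values p = 0.3 and n = 10 are arbitrary illustrative choices). The crude estimator g(X) = X1 is unbiased for p, and by exchangeability its conditional expectation given the sum T is T/n, the sample mean, which is also unbiased but has far smaller variance.

```python
import random

def rao_blackwell_demo(p=0.3, n=10, trials=100_000, seed=1):
    """Compare a crude unbiased estimator of p (the first observation X1)
    with its Rao-Blackwellized version E[X1 | T] = T/n (the sample mean),
    where T = X1 + ... + Xn is the sufficient statistic."""
    rng = random.Random(seed)
    crude, improved = [], []
    for _ in range(trials):
        x = [1 if rng.random() < p else 0 for _ in range(n)]
        crude.append(x[0])           # g(X) = X1: unbiased but very noisy
        improved.append(sum(x) / n)  # E[g(X) | T(X)] = T/n: the sample mean

    def mean_var(vals):
        m = sum(vals) / len(vals)
        v = sum((u - m) ** 2 for u in vals) / len(vals)
        return m, v

    for name, vals in (("crude X1", crude),
                       ("Rao-Blackwellized T/n", improved)):
        m, v = mean_var(vals)
        print(f"{name}: mean ~ {m:.4f}, variance ~ {v:.4f}")

rao_blackwell_demo()
# Both estimators are unbiased (mean near p = 0.3), but conditioning on the
# sufficient statistic shrinks the variance from p(1-p) to p(1-p)/n.
```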