Definitions
of Concepts and Terms that we use will appear here in alphabetical order
This
glossary is in its early stages – we will be filling in the gaps very soon.
|
@Risk |
An Excel Add-In, produced by
Palisade Corporation, to facilitate Monte Carlo simulations in a
spreadsheet. With @Risk the whole
simulation process can be managed. |
|
Accept
or Reject the Null Hypothesis |
The outcome or conclusion of a test
of a hypothesis, when we decide whether the sample data tends to confirm
(accept) or provide evidence against (reject) the hypothesis. |
|
Additive
Model |
|
|
Additivity |
The additivity property implies the
parts of a whole may simply be added together to give the value for the
whole. |
|
Adjusted
R-squared |
A measure of how well a multiple
regression model fits the data.
The proportion of the total variance of the dependent variable values
explained by the independent variables in the model, adjusted for the number of independent variables. |
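As a rough illustration of the definition above, the sketch below computes R-squared from observed and fitted values and then applies the usual adjustment for the number of independent variables. The data, the function names and the choice of k = 2 variables are made up for the example.

```python
def r_squared(y, fitted):
    """Proportion of the total variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_resid / ss_total


def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical data: 6 observations and fitted values from a 2-variable model.
y = [10.0, 12.0, 15.0, 19.0, 22.0, 24.0]
fitted = [10.5, 12.5, 14.0, 18.5, 22.5, 24.0]

r2 = r_squared(y, fitted)
adj = adjusted_r_squared(r2, n=len(y), k=2)
print(round(r2, 4), round(adj, 4))
```

The adjusted value is never larger than the unadjusted one, which is why it is preferred when comparing models with different numbers of variables.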
|
Aggregate
Planning Model |
A model combining the effects of the
availability of a workforce on production levels, inventory and plant
capacity. The model is used over a
period of time so variables to link inventory variables over time are also
needed. |
|
Alternative
Hypothesis |
The hypothesis that is taken to hold if the null
hypothesis is not true. |
|
Analysis
ToolPak (in Excel) |
A set of programs that come with
Excel that can be used to do many statistical calculations. Found under the Tools menu. It can be rather limited; using a statistical
package (like SPSS or S-plus) or an add-in (like StatPro) is usually easier
and better. |
|
Analytical |
vs Numerical approaches |
|
ANOVA
Table |
A display of the calculations and
testing of an Analysis of Variance test. |
|
Array |
A range, in Excel
terms. A block of cells in the
spreadsheet. |
|
Array Function (in Excel) |
A function which operates
on an array, filling in a whole range or array at once, according to the
formula in the function. |
|
Assignable
Cause |
Where a process is ‘out-of-control’
and the reason for that out-of-control condition can be traced back to a cause
that is within the operator’s control. |
|
Attribute |
A characteristic of an
observation, that it either has or doesn’t have (e.g. is Blue or not, is Old
or not). |
|
Attribute
Sampling |
Sampling in which we are interested
in recording a variable which is equal to 1 if the unit in the sample has a
particular attribute, and is equal to 0 if the unit doesn’t. |
|
Australia's Population by year by sex |
The age distribution of Australians is
changing. The proportion of older
Australians in the population is increasing. A graphical representation of the ages of male and female
Australians used to form a pyramid.
What would you call the shape now?
How do we think it will change in the future? See a PowerPoint show with one second per
year from 1971 to date using data from the Australian Bureau of Statistics in
Australia's Age
pyramids. |
|
ARMA(p, q) Autoregressive-moving average process |
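An autoregressive-moving average
process is a time series where the latest observation depends both on
previous observations in the series and on averages of previous random
disturbances. An ARMA(p, q) process for a
time series $X_t$ can
be defined by
$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$,
where $Z_t$ is a sequence of uncorrelated random disturbances (white noise).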
|
|
Autocorrelation |
Correlation of the values of a time series
with previous (lagged) values of the series. |
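A minimal sketch of the lag-k sample autocorrelation follows, assuming the common estimator that divides the lagged cross-products of deviations by the total sum of squared deviations. The function name and the short series are illustrative only.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted back by `lag`."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    lagged = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return lagged / total


# Hypothetical, steadily rising series: neighbouring values are alike,
# so the lag-1 autocorrelation is strongly positive.
series = [1, 2, 3, 4, 5, 6, 7, 8]
r1 = autocorrelation(series, 1)
print(r1)
```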
|
Automatic forecasts |
Parzen's ARAR models performed extremely well in the
various Makridakis forecasting competitions; see, for instance, Makridakis, S.,
Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton,
J., Parzen, E., and Winkler, R. (1984), The Forecasting Accuracy of
Major Time Series Methods, John Wiley, New York. They are best suited to strongly seasonal
data. For some examples see the Word
documents: Forecasts of monthly sales
of red wine
by Australian winemakers and Quarterly electricity
demand. |
|
Average
Run Length (ARL) |
The ARL is the average number of
‘in-control’ signals that are generated between two ‘out-of-control’ signals,
e.g., with 3-sigma Control Limits this is 1/0.0027 ≈ 370. It is useful for designing control charts for
particular shifts in the process mean. |
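The figure of about 370 can be checked with a short calculation, assuming independent, normally distributed points and symmetric control limits. The function names below are chosen for the example.

```python
import math


def false_alarm_probability(sigma_limit=3.0):
    """Probability that an in-control, normally distributed point falls
    outside symmetric limits at +/- sigma_limit standard deviations."""
    return math.erfc(sigma_limit / math.sqrt(2))


def average_run_length(p_signal):
    """Average number of points plotted per signal when each point
    signals independently with probability p_signal."""
    return 1.0 / p_signal


p = false_alarm_probability(3.0)
arl = average_run_length(p)
print(round(p, 4), round(arl))
```

Passing the probability of a signal after a given shift in the mean, instead of the in-control probability, gives the ARL for detecting that shift.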
|
Backward
Variable Selection |
In multiple regression, the
approach of starting with all variables in the model and one by one dropping
those not significant. |
|
Bayes'
Rule |
Where a number of uncertain
outcomes are linked probabilistically, i.e., are not independent, Bayes’ Rule
provides a way of calculating the probability of an outcome if we already
know the result of other outcomes. (Section 6.7) |
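A small sketch of the calculation, assuming a set of mutually exclusive states with known prior probabilities and a known probability of the observed evidence under each state. The inspection numbers are invented for illustration.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(state | evidence) for each state.

    priors[s]      = P(state s)
    likelihoods[s] = P(evidence | state s)
    """
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    p_evidence = sum(joint.values())
    return {s: joint[s] / p_evidence for s in joint}


# Hypothetical numbers: 2% of components are defective; a test flags 95% of
# defective components and 10% of good ones. A component has been flagged.
priors = {"defective": 0.02, "good": 0.98}
likelihoods = {"defective": 0.95, "good": 0.10}

posterior = bayes_posterior(priors, likelihoods)
print(round(posterior["defective"], 4))
```

Even after a positive test the component is still more likely to be good than defective, because defectives are rare to begin with.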
|
Between
Sample Means Variation |
When we have a number of mean
(average) values of a variable, one for each category of another variable,
this is the overall variance of those means. |
|
Bimodal
Data |
Data where the frequency plot has
two peaks, that is, two separate values in the data are more frequent than
any others. |
|
Binary
Variable |
A variable that can only take on
two values, 0 and 1 usually. |
|
Blending
Model |
The aim of these models is to produce
an output blend given an input range of raw materials that satisfies demand
and blend specifications. |
|
Box-Jenkins |
|
|
Boxplot |
A graph of a set of data based on
the median and quartiles. It shows the
distribution of the data set. |
|
Capital
Budgeting Model |
These are applied in situations
where a number of investment options are available subject to the constraint
of the amount of available capital and other considerations. |
|
Case |
An item in a set of data, on which we
have one or more values of variables (often represented as a row in a data
spreadsheet). Also called an
observation. |
|
Cash
Balance Model |
A model that may be used for
tracking ‘cash balance’ or ‘cash flow’ over time. |
|
Categorical
Data |
Data in which the observations are
just which category each case falls into.
The counts, or frequencies, of cases in each category are
analysed. E.g. colour of eyes – blue,
green, brown, etc are the categories. |
|
Categorization
Analysis |
A way of trying to predict which
category a case will fall into, based on the values of other variables. |
|
Causal
Methods |
|
|
Causes
of Problems |
When an ‘out-of-control’ signal
occurs, the reasons for this signal are investigated. These are then usually ascribed to being either
assignable, i.e., within operator control, or common, i.e., caused by the process itself and
hence uncontrollable. |
|
Central
Limit Theorem |
A fundamental theorem in Statistics
which states that, under very general conditions, the averages of samples of
data follow a distribution that approaches the Normal distribution as the
sample size increases. |
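The theorem can be illustrated by simulation. The sketch below averages repeated samples from a Uniform(0, 1) distribution, which is far from Normal, and checks that roughly 68% of the averages fall within one standard deviation of their centre, as a Normal distribution would predict. The sample size, number of samples and seed are arbitrary choices.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, seed=0):
    """Return n_samples averages, each of sample_size draws from draw(rng)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Averages of 30 draws from Uniform(0, 1), which has mean 0.5 and variance 1/12.
means = sample_means(lambda rng: rng.random(), sample_size=30, n_samples=2000)
centre = statistics.fmean(means)
spread = statistics.stdev(means)

# Share of averages lying within one standard deviation of the centre.
within_one_sd = sum(abs(m - centre) <= spread for m in means) / len(means)
print(round(centre, 3), round(spread, 3), round(within_one_sd, 3))
```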
|
Central
Location |
The centre of a set of data, or a
distribution. Mean and median are
commonly used measures of the centre. |
|
Certainty
Equivalent |
This is the certain dollar amount
that is equivalent to a risky venture.
Used to construct or evaluate Utility Functions. |
|
Chart
Wizard (in Excel) |
A tool in Excel that can help to
make drawing graphs (charts) of data easier. |
|
Chi-Squared |
A particular distribution used in
goodness of fit tests. |
|
Chi-Squared
Goodness-of-Fit Test |
A test of how well (or not) a set
of observed frequencies matches a set of frequencies expected from some
hypothesis or theory. |
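The test statistic is the sum over categories of (observed - expected)² / expected. A minimal sketch, using invented counts for 60 rolls of a die that is hypothesised to be fair:

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of each face in 60 rolls; a fair die expects 10 of each.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

stat = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1
print(stat, degrees_of_freedom)
```

The statistic is then compared with the Chi-Squared distribution with the stated degrees of freedom; a large value is evidence against the hypothesis.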
|
Cluster
Analysis |
A way of analysing data on a number
of variables to determine how the cases group (cluster) together. |
|
Consumer
Price Index (CPI) |
An overall, average measure of how prices
have changed from time to time. Based
on the spending habits of an average consumer. |
|
Contingency
Table |
A table of frequencies broken down
by two categorical variables. It
shows the frequencies of each category of one variable, as spread over the
categories of the other variable. Can
be extended to three or more variables. |
|
Continuous
Variable |
A variable which has measured
values for each case (that is, not categorical or discrete data). The possible values for each case are
infinite in number, that is, they lie on a continuum.
E.g. how much time you have spent at university. |
|
Correlation |
A measure of the extent of linear
relationship between two variables. |
|
Covariance |
A measure of the extent to which two
variables vary together, rather than independently. |
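The two measures are closely related: correlation is covariance rescaled by the standard deviations of the two variables, so that it lies between -1 and 1. A short sketch using the sample (n - 1) form, with made-up data:

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = covariance(x, y)
corr_xy = correlation(x, y)
print(cov_xy, round(corr_xy, 4))
```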
|
Cross-sectional
Data |
Data collected all at a particular
point in time. |
|
Cluster
Sampling |
A sampling method whereby the
sampling unit is a cluster or a collection of smaller units. The smaller units are the ones to be
sampled. Clusters are constructed so
that they individually mirror the total population. |
|
Coefficient
of Determination (R-squared) |
A measure of how well a multiple
regression model fits the data. The
proportion of the total variation of the dependent variable values explained
by the independent variables in the model. |
|
Coincident
Indicator |
|
|
Combining
Forecasts |
|
|
Common
Causes |
Common causes of problems are due to
systems, environmental, or other factors that operate on the system itself,
outside the control of those working within the system. |
|
Conditional
Probability |
Consider the situation of, say, two
events which are not independent.
The conditional probability of one event is the probability of that
event after the outcome of the other event is known. |
|
Confidence
Interval Estimation |
Using sample data to estimate a
range which has a certain (specified) percentage probability (confidence) of
having the true, unknown parameter value within the range. |
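As an illustration, the sketch below builds an interval for a population mean. It assumes a sample large enough for the Normal approximation to be reasonable; for small samples a t-based multiplier would normally replace the z value. The data are generated artificially for the example.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample interval: mean +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(sample)
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width


# Hypothetical sample of 49 measurements scattered evenly around 50.
sample = [50 + (i % 7) - 3 for i in range(49)]

lower, upper = mean_confidence_interval(sample, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```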
|
Confidence
Level |
The probability that the Confidence
Interval has the true, unknown, value within it. |
|
Constant
Elasticity Relationship |
If an independent variable X changes
by a percentage amount then the dependent variable will change by the
elasticity value times the percentage change in X. No matter what the value
of X started from, the elasticity value is unchanging. |
|
Constant
Error Variance (Homoscedasticity) |
The variance does not change as any
of the relevant variables change. |
|
Constraints |
These are the limitations on
available resources. |
|
Contingency
Plan |
An alternative plan to the main
plan in case of failure of the main plan. |
|
Control
Charts for Attributes |
A Control Chart for monitoring the
proportion of defectives in a process.
The underlying distribution is Binomial, although the Normal
approximation is often used to calculate the 3-sigma limits. |
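A sketch of the 3-sigma limits for a chart of the proportion defective, using the Normal approximation to the Binomial and assuming equal sample sizes. The inspection counts are invented, and the limits are clipped to the range 0 to 1 since a proportion cannot fall outside it.

```python
import math


def p_chart_limits(defectives, sample_size):
    """Return (LCL, centre line, UCL) for the proportion defective.

    defectives  = number of defectives found in each sample
    sample_size = number of items inspected in every sample
    """
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return lcl, p_bar, ucl


# Hypothetical data: 8 samples of 100 items each.
defectives = [4, 6, 5, 3, 7, 5, 4, 6]
lcl, centre_line, ucl = p_chart_limits(defectives, sample_size=100)
print(lcl, centre_line, round(ucl, 4))
```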
|
Convenience
Sample |
A sample chosen, not at random, but
for the ease and convenience with which it can be selected. |
|
Correlations |
|
|
Correlogram |
|
|
Cp |
A measure of ‘potential’
capability, i.e., if the process remains centred on the target value. Cp=1 means that the process is
‘potentially’ capable. |
|
Cpk |
A measure of the ‘actual’
capability, i.e., using the actual mean.
Cpk=1 means that the process is ‘capable’. |
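The difference between Cp and Cpk can be seen with the usual formulas, Cp = (USL - LSL) / 6σ and Cpk = min(USL - mean, mean - LSL) / 3σ, where USL and LSL are the upper and lower specification limits. The specification limits and process values below are hypothetical.

```python
def cp(usl, lsl, sigma):
    """Potential capability: specification width over six standard deviations."""
    return (usl - lsl) / (6 * sigma)


def cpk(usl, lsl, mean, sigma):
    """Actual capability: distance from the mean to the nearer limit,
    over three standard deviations."""
    return min(usl - mean, mean - lsl) / (3 * sigma)


# Hypothetical process: target 10.0, specification 9.4 to 10.6,
# standard deviation 0.2, but the mean has drifted to 10.2.
cp_value = cp(usl=10.6, lsl=9.4, sigma=0.2)
cpk_value = cpk(usl=10.6, lsl=9.4, mean=10.2, sigma=0.2)
print(round(cp_value, 3), round(cpk_value, 3))
```

Here the process is potentially capable (Cp of about 1) but, because it is off centre, not actually capable (Cpk below 1).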
|
Crosstabs |
A contingency table (or pivot table
in Excel). |
|
Data |
An unanalysed collection of basic
information, on some number of cases and variables. |
|
Data
Mining |
Using a variety of techniques to
try and find patterns, trends and relationships between variables in a set of
data. Typically computerised. |
|
Data
Warehousing |
Combines information from a number
of sources for the purpose of discovering interrelationships or patterns in
the data. |
|
See DEA |
|
|
Decision analysis |
The study of decisions. |
|
Decision
Making under Uncertainty |
Decision making where the outcomes
are not known before making the decision. |
|
Decision
Outcomes |
The alternative outcomes that may
result from a decision. |
|
Decision
trees |
A diagrammatic method of analysing
a decision problem as a ‘tree’. The
elements of this tree are decision, probability and end nodes. Outcome values, costs and probabilities
are entered into the tree and used to calculate the value of alternative
decisions. |
|
Decision Support System (DSS) |
A system that provides a decision
maker with a variety of tools and data sources to facilitate the decision
making process. |
|
Design of Experiments |
|
|
Defect |
A non-conforming product, i.e., it
does not meet specifications. |
|
Defective
Component (p2_2.xls,q2) |
Those items or things in a
collection which have a particular defect (thing wrong with them). |
|
Degree
of Belief Probability |
These are subjective probabilities
based on personal assessment of the likelihood of outcomes. They are often used in situations where probabilities
cannot be calculated from past experience or logical deductions. |
|
Degrees
of Freedom |
A parameter of a distribution that
provides an idea of how spread out the distribution is. Based on the sample size in t-tests, based
on the number of cells in a chi-squared test. |
|
Deming's
14 Points |
W. Edwards Deming devised 14 rules
to be adopted by management for an organisation to be a truly TQM organisation. |
|
Deming's
Funnel Experiment and tampering |
One of Deming’s key insights was
the effect of reacting to or making decisions on the basis of ‘noise’. This experiment shows that variability
becomes worse when decisions are made reactively to random fluctuations. |
|
Dependent
variable |
A variable the values of which are
considered to depend on the values of other variables. |
|
Deseasonalise
Data |
|
|
Discrete
Variable |
A variable which can only take on one
of a finite set of values for each case.
E.g. how many years of university you have completed. |
|
Distribution |
The values a variable can take on,
together with the frequency or probability of each value. Can be expressed as a table or formula. |
|
Divisibility |
The divisibility property means
that the level of activities can be measured on a continuous scale. |
|
Dummy
Variable |
A variable which takes the value 1
if an observation has a certain attribute, and 0 if it does not. |
|
Durbin-Watson
statistic |
A statistic used to test if errors
from a regression model are autocorrelated. |
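The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A minimal sketch with two invented residual series:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    successive = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    total = sum(e ** 2 for e in residuals)
    return successive / total


# Hypothetical residuals that drift slowly (positive autocorrelation) ...
drifting = [-2, -1, -1, 0, 1, 1, 2]
# ... and residuals that flip sign every period (negative autocorrelation).
alternating = [1, -1, 1, -1, 1, -1]

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(round(dw_drifting, 3), round(dw_alternating, 3))
```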
|
Dynamic
Financial Model |
This is a generalisation of the
usual Cash Flow Model in that additional borrowings may be made over the
period of time. |
|
Econometrics |
Econometrics is
formed from two Greek words. Economic
theory is the study
of how and why variables in the economy are related. Statistics involves the measurement of variables
and their relationships using limited data, or information, and the drawing of
conclusions from them. Starting from
the relationships postulated by economists (economic theory) we
express them in mathematical terms (mathematical economics). We obtain data (economic
statistics) and use specific methods (econometric methods) in order
to obtain numerical estimates of economic relationships (called models). A good
description of the scope and division of econometrics is given in A.
Koutsoyiannis, Theory of Econometrics, Macmillan (pp. 3-10) and details of methodology are given
in the same text (pp. 11-30). |
|
Empirical
CDF |
The cumulative frequency
distribution derived from the actual observations and their frequencies in a
set of data. For each value of the
variable, it shows the number of observations less than or equal to the value. |
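A short sketch of the idea, returning the proportion (rather than the raw count) of observations at or below a value; multiplying by the number of observations gives the count. The data are invented.

```python
from bisect import bisect_right


def empirical_cdf(data):
    """Return a function giving the proportion of observations <= x."""
    ordered = sorted(data)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many of the sorted observations are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical ages of 8 cases.
ages = [23, 35, 35, 41, 52, 52, 52, 60]
cdf = empirical_cdf(ages)
print(cdf(35), cdf(52), cdf(22))
```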
|
Error
Term |
That part of the value of a
dependent variable not explained by the independent variables. Hence, the difference between the observed
value of the dependent variable and the value expected from the model. |
|
Expected
Monetary Value (EMV) |
The mean of the probability
distribution of possible monetary outcomes.
For discrete outcomes, this is calculated as the weighted average of
the possible monetary values, with the weights being the probabilities of the
values. (section 6.2.2) |
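For discrete outcomes the calculation is a probability-weighted sum. The venture and its payoffs below are hypothetical.

```python
def expected_monetary_value(outcomes):
    """outcomes is a list of (probability, monetary value) pairs."""
    return sum(p * value for p, value in outcomes)


# Hypothetical venture: 30% chance of gaining 100,000, 50% chance of
# gaining 20,000 and 20% chance of losing 50,000.
venture = [(0.3, 100_000), (0.5, 20_000), (0.2, -50_000)]
emv = expected_monetary_value(venture)
print(round(emv))
```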
|
Expected
Utility Maximizers |
These are decision makers who
maximise their expected utilities, i.e., taking into account risk seeking or
risk averse behaviour. |
|
Expected
Value of Perfect Information (EVPI) |
Given the uncertain nature of the outcomes
of some decisions, this is the additional EMV that is created if the outcome
is known before the decision is made. (section 6.6.2) |
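One common way to compute this is to compare the EMV of the best decision made in advance with the expected payoff when the best decision can be chosen separately for each outcome. The payoff table, decision names and probabilities below are assumptions made for the example.

```python
def emv(payoffs, probabilities):
    """Expected monetary value of one decision across the possible outcomes."""
    return sum(p * v for p, v in zip(probabilities, payoffs))


def evpi(payoff_table, probabilities):
    """Expected payoff with perfect information minus the best EMV without it."""
    best_without = max(emv(row, probabilities) for row in payoff_table.values())
    best_with = sum(
        p * max(row[j] for row in payoff_table.values())
        for j, p in enumerate(probabilities)
    )
    return best_with - best_without


# Hypothetical payoffs (in thousands) under a strong or a weak market.
probabilities = [0.4, 0.6]
payoff_table = {
    "launch": [200, -50],
    "do not launch": [0, 0],
}

value_of_information = evpi(payoff_table, probabilities)
print(round(value_of_information, 2))
```

Without information the best choice is to launch (EMV 50); knowing the market in advance would raise the expected payoff to 80, so perfect information is worth at most 30.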
|
Expected
Value of Sample Information (EVSI) |
Additional information, such as
extra tests or research, may have an impact on the EMV of a decision. The change in the EMV is the Expected
Value of Sample Information. Bayes’ Theorem is an important component in this
calculation. |
|
Experimental
Design |
Ways of setting the values of the
explanatory, treatment and blocking variables in an experiment. |
|
Explanatory
variable |
Independent variables. Those variables in a regression model on
which the dependent variable is held to depend (i.e. which help to explain
the value of the dependent variable). |
|
Exponential
Smoothing |
|
|
Exponential
Trend |
|
|
Exponential
Utility |
The Exponential function is one
form of the Utility function. It is
parametrised by a single parameter, the risk
tolerance, and is usually used to describe Risk Averse behaviour.
(section 6.8.3) |
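A sketch of how such a function behaves, assuming the widely used form U(x) = 1 - exp(-x / R), where R is the risk tolerance. This form is an assumption for illustration and is not taken from the section cited above. The gamble is invented.

```python
import math


def exponential_utility(x, risk_tolerance):
    """Assumed form: U(x) = 1 - exp(-x / R), with R the risk tolerance."""
    return 1 - math.exp(-x / risk_tolerance)


def certainty_equivalent(outcomes, risk_tolerance):
    """Sure amount whose utility equals the expected utility of the gamble.

    outcomes is a list of (probability, monetary value) pairs.
    """
    expected_utility = sum(
        p * exponential_utility(value, risk_tolerance) for p, value in outcomes
    )
    return -risk_tolerance * math.log(1 - expected_utility)


# Hypothetical gamble: equal chances of winning 1000 or nothing (EMV = 500).
gamble = [(0.5, 1000), (0.5, 0)]
ce = certainty_equivalent(gamble, risk_tolerance=1000)
print(round(ce, 1))
```

The certainty equivalent is below the EMV of 500, which is what Risk Averse behaviour means; a larger risk tolerance moves it closer to the EMV.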
|
Extrapolation
Methods |
|
|
Extrapolation |
Extending a pattern in a time
series (such as a trend) or regression model beyond the range of the data (or
time period of the observations). |
|
F
distribution |
The theoretical distribution of the
ratio of two variances. Used in
Analysis of Variance tests. |
|
Feasible
Region |
The feasible region is the area
where all of the constraints are satisfied. |
|
Financial
Planning Model |
A model used for planning capital
budgeting and cash flow over time. |
|
Finite
Population Correction (fpc) |
The calculation of variances is
based on either an infinite population or a sampling with replacement for
finite populations. In a finite
population, where the sampling is done without replacement, a finite
population correction needs to be applied to the variance calculation. |
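For the standard error of a sample mean, the usual correction multiplies by sqrt((N - n) / (N - 1)), where N is the population size and n the sample size. A sketch under that assumption, with invented figures:

```python
import math


def standard_error_of_mean(sample_sd, n, population_size=None):
    """Standard error of the mean, with the finite population correction
    applied when a population size is supplied."""
    se = sample_sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# Hypothetical survey: 100 units sampled without replacement from 500.
uncorrected = standard_error_of_mean(sample_sd=12, n=100)
corrected = standard_error_of_mean(sample_sd=12, n=100, population_size=500)
print(round(uncorrected, 4), round(corrected, 4))
```

The correction matters little when the sample is a small fraction of the population, and shrinks the standard error noticeably when it is not.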
|
Fitted
Value |
The expected value of the dependent
variable, calculated by putting values for the independent variables into the
regression model estimated. |
|
Fixed
Cost Model |
The feature of a fixed cost model
is that an additional one-off cost is incurred if a particular option is
chosen, e.g., using a particular production plant, or a machine setup cost. |
|
Folding
Back on the Tree |
The process of calculating the
optimal decision on a Decision Tree.
It works from the right to the left of the tree. |
|
Forecast
Error |
|
|
Forecasting |
|
|
Forecast method selection |
A survey of forecasting methods (see for instance, Nigel
Meade, Evidence for the Selection of Forecasting Methods, J. Forecast., 19,
515-535) concludes that
· the characteristics of the data series are an important factor in determining the
relative performance of methods and
· statistically sophisticated or complex methods do not necessarily produce more accurate
forecasts than simple ones.
Meade shows in this paper that
summary statistics can be used to select a good forecasting method (or set of
methods) although not necessarily the best. |
|
Forensic Statistical Analysis |
|
|
Formulating
the Model |
The process of abstraction of a
problem from real life into a mathematical form. |
|
Forward
Variable Selection |
In multiple regression, the
approach of starting with only one independent variable in the model and one
by one adding in others, keeping those significant. |
|
Fractionally integrated ARMA models ARFIMA(p,d,q) |
Brodsky, Julia and Hurvich, Clifford M., ‘Multi-step Forecasting for
Long-memory Processes’, J. Forecasting, 18, 59-75 (1999) with
the ARMA model with adaptive parameters proposed by Tiao, G.C. and Tsay,
R.S., ‘Some advances in non-linear and adaptive modelling in time series’ J.
Forecasting, 13, (1994), 109-131. |
|
F-Ratio |
A ratio of two variances, used to
test whether they are equal. Also
used in Analysis of Variance to test whether a set of means are all equal. |
|
Frequency
Table |
A table of values of a variable and
the number of cases (frequency) of each value. |
|
Fuzzy
Logic |
A logical system in which things
are not just True or False, but can have degrees of truth (a bit like
probability of having a characteristic). |
|
Genetic
Algorithm |
A method of optimisation that uses
a genetic code to formulate the problem, a ‘fitness’ criterion to judge the
quality of a solution, and an evolutionary heuristic to select the current
‘fittest’ set from a new ‘generation’ of solutions. Each generation is
created from an older one by random mutation of individuals or mixing the
genetic codes of pairs. |
|
Global
Maximum (Minimum) |
In some optimisation problems a
number of local maxima may be present (like small hills in a landscape). However, the aim of the optimisation
process is to find the largest of these, the Global Maximum. In LP Models, there is only one hill and
therefore one maximum. |
|
Glossary |
|
|
Grand
Mean |
The overall mean of all the
observations, in an experiment or analysis of variance data set, over all the
levels of the design variables. |
|
Graphical Excellence |
Principles of graphical excellence are clearly
explained in Edward R Tufte's books, the first of which is The
Visual Display of Quantitative Information, Graphics Press, Cheshire,
Connecticut, published in 1983.
Envisioning Information (1990) and name? followed and they are
fascinating as well as informative. A
PowerPoint lesson on graphical excellence is available from Graphical_Excellence.ppt. |
|
Graphical
Solution Method |
For two-dimensional LP problems, it
is possible to solve for the optimum graphically. |
|
H1 |
The alternative hypothesis. |
|
Ha |
The alternative hypothesis. |
|
Ho |
The null hypothesis. |
|
Histogram |
A graph of a frequency table,
showing each value of a variable and a bar whose height represents the
frequency of that value in the data set. |
|
Holt's
Method |
|
|
Hypothesis
Testing |
Analysing a sample of data to test
whether or not it tends to confirm or deny a particular hypothesised value
for a variable parameter. |
|
In
Statistical Control |
A process that has all of its data
within its control limits. |
|
Independent
Samples |
Two (or more) samples selected
independently of each other, that is, with no association between the
selection of one sample and the selection of the other sample. |
|
Independent
Samples Test |
Testing whether the parameter (e.g.
mean) value from one sample is the same or not as the parameter value from
another, independent, sample. |
|
Independent
variable |
Explanatory variables. Those variables in a regression model on
which the dependent variable is held to depend (i.e. which help to explain the
value of the dependent variable). |
|
Indifference
Value |
The indifference value is the
certain (“for sure”) value that a decision maker thinks is the same as a risky
venture. |
|
Infeasibility |
The infeasibility property describes
whether or not a solution satisfies all of the problem constraints. |
|
Influence
Diagram |
A method for describing the
elements of a decision. It displays
decisions, uncertain outcomes, intermediate calculations and payoffs. |
|
Influential
Point |
An observation, in a regression
model, which has a particularly strong impact on the parameter estimates,
that is, which the results are especially sensitive to. |
|
Inspection |
The management process of ‘weeding’
out all of the defective products. |
|
Integer
Programming Models |
Integer Programming (IP) Models
contain one or more variables which can only have integer values. |
|
Interaction
term |
A term added to an Analysis of
Variance analysis, or to a regression model, to account for the effect of one
variable being determined by the value of another variable. |
|
Interquartile
Range (IQR) |
The difference between the upper
quartile and the lower quartile. It
thus represents a range which has half the observations in it. |
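A brief sketch using the standard library. Quartiles can be computed by several slightly different conventions; this one relies on the default ("exclusive") method of `statistics.quantiles`, so other software may give slightly different values. The data are invented.

```python
from statistics import quantiles


def interquartile_range(data):
    """Upper quartile minus lower quartile."""
    lower_quartile, _median, upper_quartile = quantiles(data, n=4)
    return upper_quartile - lower_quartile


# Hypothetical data set of 11 observations.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 18]
iqr = interquartile_range(data)
print(iqr)
```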
|
Inventory Control |
|
|
ITSM |
Interactive Time Series Modelling, a computer
package for univariate and multivariate time series modelling and forecasting,
is distributed with Brockwell, P.J. and Davis, R.A. (1996), Introduction
to Time Series and Forecasting, Springer-Verlag New York Inc. |