Glossary                                                                                                          

Definitions of Concepts and Terms that we use will appear here in alphabetical order


 

This glossary is in its early stages – we will be filling in the gaps very soon.

A.   A

@Risk

An Excel Add-In, produced by Palisade Corporation, to facilitate Monte Carlo simulations in a spreadsheet.  With @Risk the whole simulation process can be managed.

Accept or Reject the Null Hypothesis

The outcome or conclusion of a test of a hypothesis, when we decide whether the sample data tend to confirm (accept) or provide evidence against (reject) the hypothesis.

Additive Model

 

Additivity

The additivity property implies the parts of a whole may simply be added together to give the value for the whole.

Adjusted R-squared

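A measure of how well a multiple regression model fits the data.  It is the proportion of the total variance of the dependent variable values explained by the independent variables in the model, adjusted downwards for the number of independent variables so that adding an irrelevant variable does not automatically improve the measure.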
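
As a rough illustration of the adjustment, the sketch below uses the standard formula adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of cases and k the number of independent variables.  The function name and the sample figures are made up for illustration.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjust R-squared for the number of independent variables in the model."""
    if n_cases - n_predictors - 1 <= 0:
        raise ValueError("need more cases than independent variables plus one")
    return 1.0 - (1.0 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Hypothetical fit: R-squared of 0.80 from 30 cases and 4 independent variables.
example = adjusted_r_squared(0.80, n_cases=30, n_predictors=4)
```

With these figures the adjusted value is slightly below the unadjusted 0.80, which is the intended penalty for the extra variables.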

Aggregate Planning Model

A model combining the effects of the availability of a workforce on production levels, inventory and plant capacity.  The model is used over a period of time, so variables that link inventory levels from one period to the next are also needed.

Alternative Hypothesis

The hypothesis that is taken to be true if the null hypothesis is not true.

Analysis ToolPak (in Excel)

A set of programs that come with Excel and can be used to do many statistical calculations.  Found under the Tools menu.  It can be rather limited; using a statistical package (like SPSS or S-plus) or an add-in (like StatPro) is usually easier and better.

Analytical vs Numerical approaches

ANOVA Table

A display of the calculations and testing of an Analysis of Variance test.

Array

A range, in Excel terms.  A block of cells in the spreadsheet.

Array Function (in Excel)

A function which operates on an array, filling in a whole range or array at once, according to the formula in the function.

Assignable Cause

Where a process is ‘out-of-control’ and the reason for that out-of-control condition can be traced back to a cause that is within the operator’s control.

Attribute

A characteristic of an observation that it either has or doesn’t have (e.g. is Blue or not, is Old or not).

Attribute Sampling

Sampling in which we are interested in recording a variable which is equal to 1 if the unit in the sample has a particular attribute, and is equal to 0 if the unit doesn’t.

Australia's Population by year by sex

The age distribution of Australians is changing.  The proportion of older Australians in the population is increasing.  A graphical representation of the ages of male and female Australians used to form a pyramid.  What would you call the shape now?  How do we think it will change in the future?  See a PowerPoint show with one second per year from 1971 to date using data from the Australian Bureau of Statistics in Australia's Age pyramids.

ARMA(p, q)

Autoregressive-moving average process

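An autoregressive-moving average process is a time series where the latest observation depends both on previous observations in the series and on averages of previous random disturbances.  An ARMA(p, q) process for a time series $X_t$ can be defined by

$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$$

where $Z_t$ is a random disturbance and $\phi_1, \dots, \phi_p$ and $\theta_1, \dots, \theta_q$ are constants.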
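
A minimal sketch of simulating the simplest mixed case, ARMA(1, 1), assuming the common form X_t = phi * X_{t-1} + Z_t + theta * Z_{t-1} with standard Normal disturbances.  The function name, the zero starting values and the parameter values are assumptions made for illustration.

```python
import random


def simulate_arma11(phi, theta, n, seed=0):
    """Simulate X_t = phi * X_{t-1} + Z_t + theta * Z_{t-1} with Normal(0, 1) Z_t."""
    rng = random.Random(seed)
    x_prev, z_prev = 0.0, 0.0  # assumed starting values for the recursion
    series = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # new random disturbance Z_t
        x = phi * x_prev + z + theta * z_prev
        series.append(x)
        x_prev, z_prev = x, z
    return series


series = simulate_arma11(phi=0.6, theta=0.3, n=200)
```

Setting both parameters to zero reduces the series to the disturbances themselves, which is a quick way to check the recursion.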

Autocorrelation

Correlation of the values of a time series with previous (lagged) values of the same series.
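
The usual sample estimate at a given lag can be sketched as follows.  The function name is hypothetical, and the divisor used is the total sum of squares about the mean, which is one common convention among several.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Products of deviations that are `lag` steps apart.
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator
```

A steadily rising series gives a positive value at lag 1, while a series that alternates in sign gives a negative one.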

Automatic forecasts

Parzen's ARAR models performed extremely well in the various Makridakis forecasting competitions; see, for instance, Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., and Winkler, R. (1984), The Forecasting Accuracy of Major Time Series Methods, John Wiley, New York.  They are best suited to strongly seasonal data.  For some examples see the Word documents:  Forecasts of monthly sales of red wine by Australian winemakers and Quarterly electricity demand.

Average Run Length (ARL)

The ARL is the average number of ‘in-control’ signals that are generated between two ‘out-of-control’ signals, e.g., with 3-sigma Control Limits this is 1/0.0027 ≈ 370.  It is useful for designing control charts for particular shifts in the process mean.
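
The 3-sigma figure can be reproduced with a short calculation, assuming the plotted points are independent and Normally distributed while the process is in control.  The function names are hypothetical.

```python
import math


def false_alarm_probability(sigma_limit):
    """Probability that an in-control Normal point falls outside +/- sigma_limit."""
    # For a standard Normal Z, P(|Z| > k) equals erfc(k / sqrt(2)).
    return math.erfc(sigma_limit / math.sqrt(2.0))


def average_run_length(sigma_limit):
    """In-control ARL: the mean number of points plotted per false alarm."""
    return 1.0 / false_alarm_probability(sigma_limit)


arl_3_sigma = average_run_length(3.0)  # roughly 370
```

Narrower limits, such as 2-sigma, give a much shorter in-control ARL and therefore more frequent false alarms.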

 

B.   B

Backward Variable Selection

In multiple regression, the approach of starting with all variables in the model and one by one dropping those not significant.

Bayes' Rule

Where a number of uncertain outcomes are linked probabilistically, i.e., are not independent, Bayes’ Rule provides a way of calculating the probability of an outcome if we already know the result of other outcomes. (Section 6.7)
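
For two outcomes A and B the rule reads P(A | B) = P(B | A) P(A) / P(B).  The sketch below applies it to a made-up inspection example; the function name and all of the figures are hypothetical.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) from P(A), P(B | A) and P(B | not A)."""
    # Total probability of observing B, whether or not A holds.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical figures: 2% of items are defective; a test flags 95% of
# defective items and 10% of good ones.
posterior = bayes_posterior(prior=0.02, likelihood=0.95, false_positive_rate=0.10)
```

Here the probability that a flagged item really is defective is only about 0.16, because good items greatly outnumber defective ones.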

Between Sample Means Variation

When we have a number of mean (average) values of a variable, one for each category of another variable, this is the overall variance of those means.

Bimodal Data

Data where the frequency plot has two peaks, that is, two separate values in the data are more frequent than any others.

Binary Variable

A variable that can take on only two values, usually 0 and 1.

Blending Model

The aim of these models is to produce an output blend given an input range of raw materials that satisfies demand and blend specifications.

Box-Jenkins

 

Boxplot

A graph of a set of data based on the median and quartiles, used to show the distribution of the data set.

 

C.   C

Capital Budgeting Model

These are applied in situations where a number of investment options are available subject to the constraint of the amount of available capital and other considerations.

Case

An item in a set of data, on which we have one or more values of variables (often represented as a row in a data spreadsheet).  Also called an observation.

Cash Balance Model

A model that may be used for tracking ‘cash balance’ or ‘cash flow’ over time.

Categorical Data

Data in which the observations are just which category each case falls into.  The counts, or frequencies, of cases in each category are analysed.  E.g. colour of eyes – blue, green, brown, etc. are the categories.

Categorization Analysis

A way of trying to predict which category a case will fall into, based on the values of other variables.

Causal Methods

 

Causes of Problems

When an ‘out-of-control’ signal occurs, the reasons for this signal are investigated.  These are then usually ascribed to being either assignable, i.e., within operator control, or common, i.e., caused by the process and hence uncontrollable by the operator.

Central Limit Theorem

A fundamental theorem in Statistics stating that, under very general conditions, the process of averaging data produces numbers (the averages) whose distribution comes closer and closer to the Normal distribution as the sample size increases.
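
A small simulation can make the idea concrete.  The sketch below averages samples from a Uniform(0, 1) distribution, which is far from Normal; the choice of distribution, the sample size, the number of replications and the seed are all arbitrary assumptions.

```python
import random
import statistics


def sample_means(sample_size, replications, seed=1):
    """Means of repeated samples drawn from a Uniform(0, 1) distribution."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.random() for _ in range(sample_size)])
        for _ in range(replications)
    ]


means = sample_means(sample_size=30, replications=2000)
centre = statistics.fmean(means)  # close to 0.5, the population mean
spread = statistics.stdev(means)  # close to sqrt((1/12) / 30), about 0.053
```

A histogram of `means` would look roughly bell-shaped even though the individual observations are uniformly spread.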

Central Location

The centre of a set of data, or a distribution.  Mean and median are commonly used measures of the centre.

Certainty Equivalent

This is the certain dollar amount that is equivalent to a risky venture.  Used to construct or evaluate Utility Functions.

Chart Wizard (in Excel)

A tool in Excel that can help to make drawing graphs (charts) of data easier.

Chi-Squared

A particular distribution used in goodness of fit tests.

Chi-Squared Goodness-of-Fit Test

A test of how well (or not) a set of observed frequencies matches a set of frequencies expected from some hypothesis or theory.
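
The test statistic is the sum over categories of (observed - expected) squared, divided by expected.  The sketch below computes it for a made-up die-rolling example; the data and names are hypothetical.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical die rolled 60 times; a fair die expects 10 of each face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
statistic = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # no parameters estimated from the data
```

The statistic is then compared with the Chi-Squared distribution on the stated degrees of freedom; a large value is evidence against the hypothesised frequencies.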

Cluster Analysis

A way of analysing data on a number of variables to determine how the cases group (cluster) together.

Consumer Price Index (CPI)

An overall, average measure of how prices have changed from time to time.  Based on the spending habits of an average consumer.

Contingency Table

A table of frequencies broken down by two categorical variables.  It shows the frequencies of each category of one variable, as spread over the categories of the other variable.  Can be extended to three or more variables.

Continuous Variable

A variable which has measured values for each case (that is, not categorical or discrete data).  The number of possible values for each case is infinite, that is, each value is one of a continuum.  E.g. how much time you have spent at university.

Correlation

A measure of the extent of linear relationship between two variables.

Covariance

A measure of the extent to which two variables vary together, rather than independently.
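
Covariance and correlation are closely linked: the correlation is the covariance rescaled by the two standard deviations so that it lies between -1 and 1.  A minimal sketch, using the sample (n - 1) divisor and hypothetical function names:

```python
import math


def covariance(xs, ys):
    """Sample covariance (divisor n - 1) of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))
```

For example, `correlation([1, 2, 3, 4], [2, 4, 6, 8])` is 1 (up to rounding) because the second list is an exact linear function of the first.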

Cross-sectional Data

Data collected all at a particular point in time.

Cluster Sampling

A sampling method whereby the sampling unit is a cluster or a collection of smaller units.  The smaller units are the ones to be sampled.  Clusters are constructed so that they individually mirror the total population.

Coefficient of Determination (R-squared)

A measure of how well a multiple regression model fits to the data.  The proportion of the total variation of the dependent variable values explained by the independent variables in the model.

Coincident Indicator

 

Combining Forecasts

 

Common Causes

Common causes of problems are due to systems, environmental, or other factors that operate on the system itself, outside the control of those working within the system.

Conditional Probability

Consider the situation of, say, two events which are not independent.  The conditional probability of one event is the probability of that event once the outcome of the other event is known.

Confidence Interval Estimation

Using sample data to estimate a range which has a certain (specified) percentage probability (confidence) of having the true, unknown parameter value within the range.
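
As an illustration for the mean, the sketch below builds an approximate interval as the sample mean plus or minus a critical value times the standard error.  It assumes a Normal critical value (1.96 for 95% confidence), which is only reasonable for larger samples; for small samples a t-distribution value would normally replace it.  The function name and data are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, z_value=1.96):
    """Approximate confidence interval for a mean using a Normal critical value."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z_value * standard_error, mean + z_value * standard_error


# Hypothetical measurements.
sample = [10, 12, 9, 11, 13, 10, 12, 11]
lower, upper = mean_confidence_interval(sample)
```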

Confidence Level

The probability that the Confidence Interval has the true, unknown, value within it.

Constant Elasticity Relationship

If an independent variable X changes by a percentage amount then the dependent variable will change by the elasticity value times the percentage change in X. No matter what the value of X started from, the elasticity value is unchanging.
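
A relationship of the form Y = a * X^b has constant elasticity b, and the rule in the definition is then a good approximation for small percentage changes.  The sketch below, which assumes that power form, computes the exact percentage change in Y so the two can be compared; the function name is hypothetical.

```python
def percent_change_in_y(elasticity, percent_change_in_x):
    """Exact percentage change in Y = a * X**elasticity (the constant a cancels)."""
    return ((1 + percent_change_in_x / 100) ** elasticity - 1) * 100


# Hypothetical price elasticity of demand of -2 and a 1% price rise.
demand_change = percent_change_in_y(elasticity=-2, percent_change_in_x=1)
```

The result is close to -2%, the elasticity times the percentage change, and it is the same whatever value X started from.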

Constant Error Variance (Homoscedasticity)

The variance does not change as any of the relevant variables change.

Constraints

These are the limitations on available resources.

Contingency Plan

An alternative plan to the main plan in case of failure of the main plan.

Control Charts for Attributes

A Control Chart for monitoring the proportion of defectives in a process.  The underlying distribution is Binomial, although the Normal approximation is often used to calculate the 3-sigma limits.

Convenience Sample

A sample chosen, not at random, but for the ease and convenience with which it can be selected.

Correlations

 

Correlogram

 

Cp

A measure of ‘potential’ capability, i.e., if the process remains centred on the target value.  Cp=1 means that the process is ‘potentially’ capable.

Cpk

A measure of the ‘actual’ capability, i.e., using the actual mean.  Cpk=1 means that the process is ‘capable’.
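
The two indices are commonly computed as Cp = (USL - LSL) / (6 sigma) and Cpk = min(USL - mean, mean - LSL) / (3 sigma), where LSL and USL are the lower and upper specification limits.  A minimal sketch with hypothetical names and figures:

```python
def cp(lower_spec, upper_spec, sigma):
    """Potential capability: specification width over six standard deviations."""
    return (upper_spec - lower_spec) / (6.0 * sigma)


def cpk(lower_spec, upper_spec, mean, sigma):
    """Actual capability: distance from the mean to the nearer limit over three sigma."""
    return min(upper_spec - mean, mean - lower_spec) / (3.0 * sigma)


# Hypothetical process: limits 94 to 106, standard deviation 2, mean drifted to 101.
potential = cp(94, 106, sigma=2)
actual = cpk(94, 106, mean=101, sigma=2)
```

In this example Cp is 1 but Cpk is below 1, because the process mean is off-centre.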

Crosstabs

A contingency table (or pivot table in Excel).

 

D.   D

Data

An unanalysed collection of basic information, on some number of cases and variables.

Data Mining

Using a variety of techniques to try and find patterns, trends and relationships between variables in a set of data.  Typically computerised.

Data Warehousing

Combines information from a number of sources for the purpose of discovering interrelationships or patterns in the data.

DEA Data envelopment analysis

See DEA

Decision analysis

The study of decisions.

Decision Making under Uncertainty

Decision making where the outcomes are not known before making the decision.

Decision Outcomes

The alternative outcomes that may result from a decision.

Decision trees

A diagrammatic method of analysing a decision problem as a ‘tree’.  The elements of this tree are decision, probability and end nodes.  Outcome values, costs and probabilities are entered into the tree and used to calculate the value of alternative decisions.

Decision Support System (DSS)

A system that provides a decision maker with a variety of tools and data sources to facilitate the decision making process.

Design of Experiments

 

Defect

A non-conforming product, i.e., it does not meet specifications.

Defective Component (p2_2.xls,q2)

Those items or things in a collection which have a particular defect (thing wrong with them).

Degree of Belief Probability

These are subjective probabilities based on personal assessment of the likelihood of outcomes.  They are often used in situations where probabilities cannot be calculated from past experience or logical deductions.

Degrees of Freedom

A parameter of a distribution that provides an idea of how spread out the distribution is.  Based on the sample size in t-tests, based on the number of cells in a chi-squared test.

Deming's 14 Points

W Edwards Deming devised 14 rules to be adopted by management for an organisation to be a truly TQM (Total Quality Management) organisation.

Deming's Funnel Experiment and tampering

One of Deming’s key insights was the effect of reacting to or making decisions on the basis of ‘noise’.  This experiment shows that variability becomes worse when decisions are made reactively to random fluctuations.

Dependent variable

A variable the values of which are considered to depend on the values of other variables.

Deseasonalise Data

 

Discrete Variable

A variable which can only take on one of a finite set of values for each case.  E.g. how many years of university you have completed.

Distribution

The values a variable can take on, together with the frequency or probability of each value.  Can be expressed as a table or formula.

Divisibility

The divisibility property means that the level of activities can be measured on a continuous scale.

Dummy Variable

A variable which takes the value 1 if an observation has a certain attribute, and 0 if it does not.

Durbin-Watson statistic

A statistic used to test if errors from a regression model are autocorrelated.
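
The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals.  It lies between 0 and 4, with values near 2 suggesting no first-order autocorrelation.  A minimal sketch with a hypothetical function name:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator
```

Residuals that alternate in sign push the statistic towards 4 (negative autocorrelation), while long runs of similar residuals push it towards 0 (positive autocorrelation).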

Dynamic Financial Model

This is a generalisation of the usual Cash Flow Model in that additional borrowings may be made over the period of time.

 

E.   E

Econometrics

Econometrics is formed from two Greek words, οικονομία (economy) and μέτρον (measure).  It is a combination of economic theory, mathematical economics and statistics, yet it is a distinct branch of study itself.

Economic theory is the study of how and why variables in the economy are related.

Statistics involves the measurement of variables and the relationships using limited data, or information, and drawing conclusions from them.

Starting from the relationships postulated by economists (economic theory) we express them in mathematical terms (mathematical economics).  We obtain data (economic statistics) and use specific methods (econometric methods) in order to obtain numerical estimates of economic relationships (called models).

A good description of the scope and division of econometrics is given in A. Koutsoyiannis, Theory of Econometrics, Macmillan (pp.  3-10) and details of methodology are given in the same text (pp.  11-30).

 

Empirical CDF

The cumulative frequency distribution derived from the actual observations and their frequencies in a set of data.  For each value of the variable, it shows the number of observations less than or equal to the value.
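
A minimal sketch of building an empirical CDF from raw data.  It returns the proportion of observations less than or equal to a value; multiplying by the number of observations gives the cumulative count described above.  The function name and data are hypothetical.

```python
import bisect


def empirical_cdf(data):
    """Return a function giving the proportion of observations <= a value."""
    ordered = sorted(data)

    def cdf(value):
        # bisect_right counts how many sorted observations are <= value.
        return bisect.bisect_right(ordered, value) / len(ordered)

    return cdf


cdf = empirical_cdf([3, 1, 4, 1, 5])
```

For this data `cdf(1)` is 0.4, since two of the five observations are less than or equal to 1.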

Error Term

That part of the value of a dependent variable not explained by the independent variables.  Hence, the difference between the observed value of the dependent variable and the value expected from the model.

Expected Monetary Value (EMV)

The mean of the probability distribution of possible monetary outcomes.  For discrete outcomes, this is calculated as the weighted average of the possible monetary values, with the weights being the probabilities of the values. (section 6.2.2)
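
For discrete outcomes the calculation is a probability-weighted sum.  The sketch below uses a made-up venture; the function name and figures are hypothetical.

```python
def expected_monetary_value(outcomes):
    """Probability-weighted average of (probability, monetary value) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * value for p, value in outcomes)


# Hypothetical venture: three possible monetary outcomes and their probabilities.
venture = [(0.3, 50_000), (0.5, 10_000), (0.2, -20_000)]
emv = expected_monetary_value(venture)
```

Here the EMV is 16,000: 15,000 plus 5,000 minus 4,000.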

Expected Utility Maximizers

These are decision makers who maximise their expected utilities, i.e., taking into account risk seeking or risk averse behaviour.

Expected Value of Perfect Information (EVPI)

Given the uncertain nature of the outcomes of some decisions, this is the additional EMV that is created if the outcome is known before the decision is made. (section 6.6.2)
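
One common way to compute EVPI from a payoff table is to subtract the best EMV available without information from the EMV obtained when the best decision can be chosen for each outcome.  The sketch below assumes payoffs are to be maximised; the table layout, names and figures are hypothetical.

```python
def evpi(payoffs, probabilities):
    """EVPI for a payoff table: payoffs[decision][state], probabilities[state]."""
    # Best EMV when the decision must be made before the state is known.
    best_emv = max(
        sum(p * v for p, v in zip(probabilities, row)) for row in payoffs.values()
    )
    # EMV when the best decision can be picked separately for each state.
    emv_with_perfect_info = sum(
        p * max(row[state] for row in payoffs.values())
        for state, p in enumerate(probabilities)
    )
    return emv_with_perfect_info - best_emv


# Hypothetical problem: two decisions, two states (strong or weak market).
payoffs = {"launch": [100, -40], "do nothing": [0, 0]}
probabilities = [0.6, 0.4]
value_of_information = evpi(payoffs, probabilities)
```

Without information the best EMV is 44 (launch); with perfect information it is 60, so the EVPI is 16.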

Expected Value of Sample Information (EVSI)

Additional information, such as extra tests or research, may have an impact on the EMV of a decision.  The change in the EMV is the Expected Value of Sample Information. Bayes’ Theorem is an important component in this calculation.

Experimental Design

Ways of setting the values of the explanatory, treatment and blocking variables in an experiment.

Explanatory variable

Independent variables.  Those variables in a regression model on which the dependent variable is held to depend (i.e. which help to explain the value of the dependent variable).

Exponential Smoothing

 

Exponential Trend

 

Exponential Utility

The Exponential function is one form of the Utility function.  It is parametrised by a single parameter, the risk tolerance, and is usually used to describe Risk Averse behaviour. (section 6.8.3)
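
A sketch of one common parametrisation, U(x) = 1 - exp(-x / R), where R is the risk tolerance, together with the certainty equivalent it implies for a gamble.  The exact form used in the referenced section is an assumption here, and the names and figures are hypothetical.

```python
import math


def exponential_utility(value, risk_tolerance):
    """Assumed form U(x) = 1 - exp(-x / R) with risk tolerance R."""
    return 1.0 - math.exp(-value / risk_tolerance)


def certainty_equivalent(outcomes, risk_tolerance):
    """Certain amount with the same utility as a list of (probability, value) pairs."""
    expected_utility = sum(
        p * exponential_utility(v, risk_tolerance) for p, v in outcomes
    )
    # Invert U: x = -R * ln(1 - U).
    return -risk_tolerance * math.log(1.0 - expected_utility)


# Hypothetical gamble: equal chances of 0 and 1000, risk tolerance 1000.
gamble = [(0.5, 0), (0.5, 1000)]
ce = certainty_equivalent(gamble, risk_tolerance=1000)
```

The certainty equivalent comes out at about 380, below the gamble's EMV of 500, which is what Risk Averse behaviour means.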

Extrapolation Methods

 

Extrapolation

Extending a pattern in a time series (such as a trend) or regression model beyond the range of the data (or time period of the observations).

 

F.    F

F distribution

The theoretical distribution of the ratio of two variances.  Used in Analysis of Variance tests.

Feasible Region

The feasible region is the area where all of the constraints are satisfied.

Financial Planning Model

A model used for planning capital budgeting and cash flow over time.

Finite Population Correction (fpc)

The calculation of variances is based on either an infinite population or a sampling with replacement for finite populations.  In a finite population, where the sampling is done without replacement, a finite population correction needs to be applied to the variance calculation.
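
For the standard error of a sample mean, the correction is usually applied as a multiplier sqrt((N - n) / (N - 1)), where N is the population size and n the sample size.  A minimal sketch with hypothetical names:

```python
import math


def finite_population_correction(population_size, sample_size):
    """Multiplier for a standard error when sampling without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


def corrected_standard_error(sample_sd, population_size, sample_size):
    """Standard error of the mean with the finite population correction applied."""
    uncorrected = sample_sd / math.sqrt(sample_size)
    return uncorrected * finite_population_correction(population_size, sample_size)
```

The multiplier is close to 1 when the sample is a small fraction of the population and falls to 0 when the whole population is sampled.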

Fitted Value

The expected value of the dependent variable, calculated by putting values for the independent variables into the regression model estimated.

Fixed Cost Model

The feature of a fixed cost model is that an additional one-off cost is incurred if a particular option is chosen, e.g., using a particular production plant, or a machine setup cost.

Folding Back on the Tree

The process of calculating the optimal decision on a Decision Tree.  It works from the right to the left of the tree.
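
The procedure can be sketched recursively: end nodes return their value, probability nodes return the expected value of their branches, and decision nodes take the best branch.  The dictionary layout below is a made-up representation, and the sketch assumes values are to be maximised.

```python
def fold_back(node):
    """Return the value of a decision-tree node by working back from the end nodes."""
    kind = node["type"]
    if kind == "end":
        return node["value"]
    if kind == "probability":
        # Expected value over the branches, given as (probability, child) pairs.
        return sum(p * fold_back(child) for p, child in node["branches"])
    if kind == "decision":
        # Choose the alternative with the highest folded-back value.
        return max(fold_back(child) for child in node["branches"].values())
    raise ValueError(f"unknown node type: {kind}")


# Hypothetical tree: launch a product (uncertain payoff) or do nothing.
tree = {
    "type": "decision",
    "branches": {
        "launch": {
            "type": "probability",
            "branches": [
                (0.6, {"type": "end", "value": 100}),
                (0.4, {"type": "end", "value": -40}),
            ],
        },
        "do nothing": {"type": "end", "value": 0},
    },
}
best_value = fold_back(tree)
```

Folding back gives 44 for launching against 0 for doing nothing, so launching is the preferred decision at the root.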

Forecast Error

 

Forecasting

 

Forecast method selection

A survey of forecasting methods (see for instance, Nigel Meade, Evidence for the Selection of Forecasting Methods, J. Forecast., 19, 515-535) concludes that

·        the characteristics of the data series are an important factor in determining the relative performance of methods and

·        statistically sophisticated or complex methods do not necessarily produce more accurate forecasts than simple ones.

Meade shows in this paper that summary statistics can be used to select a good forecasting method (or set of methods) although not necessarily the best.

Forensic Statistical Analysis

 

Formulating the Model

The process of abstraction of a problem from real life into a mathematical form.

Forward Variable Selection

In multiple regression, the approach of starting with only one independent variable in the model and one by one adding in others, keeping those significant.

Fractionally integrated ARMA models

ARFIMA(p,d,q)

Brodsky, Julia and Hurvich, Clifford M., ‘Multi-step Forecasting for Long-memory Processes’, J. Forecasting, 18, 59-75 (1999) with the ARMA model with adaptive parameters proposed by Tiao, G.C. and Tsay, R.S., ‘Some advances in non-linear and adaptive modelling in time series’ J. Forecasting, 13, (1994), 109-131.

F-Ratio

A ratio of two variances, used to test whether they are equal.  Also used in Analysis of Variance to test whether a set of means are all equal.

Frequency Table

A table of values of a variable and the number of cases (frequency) of each value.

Fuzzy Logic

A logical system in which things are not just True or False, but can have degrees of truth (a bit like probability of having a characteristic).

 

G.  G

Genetic Algorithm

A method of optimisation that uses a genetic code to formulate the problem, a ‘fitness’ criterion to judge the quality of a solution, and an evolutionary heuristic to select the current ‘fittest’ set from a new ‘generation’ of solutions.  Each generation is created from an older one by random mutation of individuals or by mixing the genetic codes of pairs.

Global Maximum (Minimum)

In some optimisation problems a number of local maxima may be present (like small hills in a landscape).  However, the aim of the optimisation process is to find the largest of these, the Global Maximum.  In LP Models, there is only one hill and therefore one maximum.

Glossary

 

Grand Mean

The overall mean of all the observations, in an experiment or analysis of variance data set, over all the levels of the design variables.

Graphical Excellence

Principles of graphical excellence are clearly explained in Edward R Tufte's books, the first of which is The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut, published in 1983.  Envisioning Information (1990) and name? followed and they are fascinating as well as informative.  A PowerPoint lesson on graphical excellence is available from Graphical_Excellence.ppt.

Graphical Solution Method

For two dimensional LP problems, it is possible to solve for the optimum graphically.

 

H.   H

H1

The alternative hypothesis.

Ha

The alternative hypothesis.

Ho

The null hypothesis.

Histogram

A graph of a frequency table, showing each value of a variable and a bar whose height represents the frequency of that value in the data set.

Holt's Method

 

Hypothesis Testing

Analysing a sample of data to test whether or not it tends to confirm or deny a particular hypothesised value for a variable parameter.

 

I.      I

In Statistical Control

A process that has all of its data within its control limits.

Independent Samples

Two (or more) samples selected independently of each other, that is, with no association between the selection of one sample and the selection of the other sample.

Independent Samples Test

Testing whether the parameter (e.g. mean) value from one sample is the same or not as the parameter value from another, independent, sample.

Independent variable

Explanatory variables.  Those variables in a regression model on which the dependent variable is held to depend (i.e. which help to explain the value of the dependent variable).

Indifference Value

The indifference value is the certain (“for sure”) value that a decision maker thinks is the same as a risky venture.

Infeasibility

The infeasibility property describes whether or not a solution satisfies all of the problem constraints.

Influence Diagram

A method for describing the elements of a decision.  It displays decisions, uncertain outcomes, intermediate calculations and payoffs.

Influential Point

An observation, in a regression model, which has a particularly strong impact on the parameter estimates, that is, which the results are especially sensitive to.

Inspection

The management process of ‘weeding’ out all of the defective products.

Integer Programming Models

Integer Programming (IP) Models contain one or more variables which can only take integer values.

Interaction term

A term added to an Analysis of Variance analysis, or to a regression model, to account for the effect of one variable depending on the value of another variable.

Interquartile Range (IQR)

The difference between the upper quartile and the lower quartile.  It thus represents a range which has half the observations in it.
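
A minimal sketch using the quartile routine in Python's standard library.  Several conventions exist for locating quartiles, so other packages (including Excel) may give slightly different values for the same data; the function name and data are hypothetical.

```python
import statistics


def interquartile_range(data):
    """Upper quartile minus lower quartile, using the library's default method."""
    lower_quartile, _median, upper_quartile = statistics.quantiles(data, n=4)
    return upper_quartile - lower_quartile


iqr = interquartile_range([1, 2, 3, 4, 5, 6, 7, 8])
```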

Inventory Control

 

ITSM

Interactive Time Series Modelling, a computer package for univariate and multivariate time series modelling and forecasting, is distributed with Brockwell, P.J. and Davis, R.A. (1996), Introduction to Time Series and Forecasting, Springer-Verlag New York Inc.