Glossary

Definitions of the concepts and terms we use appear here in alphabetical order.


This glossary is in its early stages – we will be filling in the gaps very soon.

A

@Risk

An Excel Add-In, produced by Palisade Corporation, to facilitate Monte Carlo simulations in a spreadsheet.  With @Risk the whole simulation process can be managed.

Accept or Reject the Null Hypothesis

The outcome or conclusion of a test of a hypothesis, when we decide whether the sample data tends to confirm (accept) or provide evidence against (reject) the hypothesis.

Additive Model

 

Additivity

The additivity property implies the parts of a whole may simply be added together to give the value for the whole.

Adjusted R-squared

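A measure of how well a multiple regression model fits the data, based on R-squared (the proportion of the total variation of the dependent variable explained by the independent variables) but adjusted for the number of independent variables in the model.  Unlike R-squared, it can decrease when a variable that adds little explanatory power is included, which makes it more useful for comparing models with different numbers of variables.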
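A minimal sketch of the difference between R-squared and adjusted R-squared, assuming we already have the observed and fitted values of the dependent variable. It uses the standard adjustment 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables; the function name and arguments are illustrative only.

```python
def r_squared(observed, fitted, k):
    """Return (R-squared, adjusted R-squared) for a regression with k independent variables."""
    n = len(observed)
    mean_y = sum(observed) / n
    # Total variation of the dependent variable about its mean
    sst = sum((y - mean_y) ** 2 for y in observed)
    # Variation left unexplained by the model (sum of squared residuals)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    r2 = 1 - sse / sst
    # Penalise R-squared for the number of independent variables used
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2

r2, adj = r_squared([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], k=1)
print(r2, adj)  # adjusted value is slightly below R-squared
```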

Aggregate Planning Model

A model combining the effects of workforce availability on production levels, inventory and plant capacity.  The model covers a number of time periods, so variables linking inventory from one period to the next are also needed.

Alternative Hypothesis

The hypothesis that is accepted as true if the null hypothesis is rejected.

Analysis ToolPak (in Excel)

A set of programs that come with Excel and can be used for many statistical calculations.  Found under the Tools menu.  It can be rather limited; using a statistical package (like SPSS or S-PLUS) or an add-in (like StatPro) is usually easier and better.

Analytical

vs Numerical approaches

ANOVA Table

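A table summarising the calculations and test results of an Analysis of Variance: for each source of variation it shows the sum of squares, degrees of freedom and mean square, together with the F-ratio and its p-value.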

Array

A range, in Excel terms.  A block of cells in the spreadsheet.

Array Function (in Excel)

A function which operates on an array, filling in a whole range or array at once, according to the formula in the function.

Assignable Cause

Where a process is ‘out-of-control’ and the reason for that out-of-control condition can be traced back to a cause that is within the operator’s control.

Attribute

A characteristic of an observation that it either has or doesn’t have (e.g. is blue or not, is old or not).

Attribute Sampling

Sampling in which we are interested in recording a variable which is equal to 1 if the unit in the sample has a particular attribute, and is equal to 0 if the unit doesn’t.

Australia's Population by year by sex

The age distribution of Australians is changing: the proportion of older Australians in the population is increasing.  A graph of the ages of male and female Australians used to form a pyramid.  What would you call the shape now?  How do we think it will change in the future?  See the PowerPoint show Australia's Age pyramids, which steps through the years from 1971 to date at one second per year, using data from the Australian Bureau of Statistics.

ARMA(p, q)

Autoregressive-moving average process

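An autoregressive-moving average process is a time series in which the latest observation depends both on previous observations in the series and on a weighted sum (moving average) of the current and previous random disturbances.  An ARMA(p, q) process for a time series X_t can be defined by

$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q},$$

where Z_t is a random disturbance and the φ and θ values are constant coefficients.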
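The ARMA(p, q) recursion, written as X_t = φ1 X_{t−1} + ... + φp X_{t−p} + Z_t + θ1 Z_{t−1} + ... + θq Z_{t−q}, can be sketched in Python as below. This is a minimal illustration that turns a given sequence of disturbances into the series; it assumes, as a simplification that is not part of the definition, that values before the start of the series are zero. The lists `phi` and `theta` hold the coefficients; an empty `phi` gives a pure MA(q) process.

```python
import random

def arma_series(z, phi, theta):
    """Build X_t from disturbances z using
    X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} + Z_t + theta_1 Z_{t-1} + ... + theta_q Z_{t-q}.
    Values before the start of the series are taken to be zero."""
    x = []
    for t in range(len(z)):
        value = z[t]
        # Autoregressive part: weighted previous observations
        for i, coef in enumerate(phi, start=1):
            if t - i >= 0:
                value += coef * x[t - i]
        # Moving average part: weighted previous disturbances
        for j, coef in enumerate(theta, start=1):
            if t - j >= 0:
                value += coef * z[t - j]
        x.append(value)
    return x

# Usage: an ARMA(1, 1) path driven by Normal random disturbances
random.seed(1)
z = [random.gauss(0, 1) for _ in range(200)]
x = arma_series(z, phi=[0.6], theta=[0.3])
```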

Autocorrelation

Correlation of the values of a time series with previous (lagged) values of the same series.
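As a sketch, the usual sample autocorrelation at lag k divides the sum of products of deviations k periods apart by the total sum of squared deviations about the series mean. The function below is illustrative; plotting its value against k for k = 1, 2, 3, ... gives a correlogram.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of series x at the given lag (lag >= 1)."""
    n = len(x)
    mean = sum(x) / n
    # Denominator: total sum of squared deviations about the mean
    denom = sum((v - mean) ** 2 for v in x)
    # Numerator: products of deviations that are 'lag' periods apart
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

print(autocorrelation([1, 2, 3, 4, 5], 1))  # a trending series: positive
print(autocorrelation([1, -1, 1, -1], 1))   # an alternating series: negative
```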

Automatic forecasts

Parzen's ARAR models performed extremely well in the various Makridakis forecasting competitions, for instance in Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., and Winkler, R. (1984), The Forecasting Accuracy of Major Time Series Methods, John Wiley, New York.  They are best suited to strongly seasonal data.  For some examples see the Word documents:  Forecasts of monthly sales of red wine by Australian winemakers and Quarterly electricity demand.

Average Run Length (ARL)

The ARL is the average number of points plotted on a control chart before an ‘out-of-control’ signal occurs.  For a process that is in control, with 3-sigma control limits, each point has a probability of about 0.0027 of falling outside the limits, so the ARL is 1/0.0027 ≈ 370.  The ARL is useful for designing control charts to detect particular shifts in the process mean.
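A minimal sketch of the ARL calculation, assuming independent, Normally distributed plotted points and a shift in the mean measured in standard deviations of the plotted statistic. The ARL is 1 divided by the probability that a single point falls outside either limit; with no shift this reproduces the figure of about 370.

```python
from statistics import NormalDist

def average_run_length(shift, k=3.0):
    """ARL of a chart with k-sigma limits when the mean has shifted by
    `shift` standard deviations of the plotted statistic."""
    z = NormalDist()  # standard Normal distribution
    # Probability that a single point falls outside either control limit
    p_signal = z.cdf(-k - shift) + (1 - z.cdf(k - shift))
    return 1 / p_signal

print(average_run_length(0.0))  # in control: about 370
print(average_run_length(1.0))  # 1-sigma shift: about 44 points to detect
```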

 

B

Backward Variable Selection

In multiple regression, the approach of starting with all variables in the model and one by one dropping those not significant.

Bayes' Rule

Where a number of uncertain outcomes are linked probabilistically, i.e., are not independent, Bayes’ Rule provides a way of calculating the probability of an outcome if we already know the result of other outcomes. (Section 6.7)
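A sketch of Bayes’ Rule for a set of mutually exclusive outcomes: multiply each prior probability by the probability of the observed result under that outcome, then rescale so the results sum to 1. The inspection figures in the example are hypothetical.

```python
def bayes_posterior(priors, likelihoods):
    """Posterior probabilities of mutually exclusive outcomes.

    priors[i]      -- P(outcome i) before the new information
    likelihoods[i] -- P(observed result | outcome i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(observed result), by the law of total probability
    return [j / total for j in joint]

# Hypothetical: 1% of items are defective; a test flags 90% of defective
# items and 5% of good ones. What is P(defective | flagged)?
post = bayes_posterior([0.01, 0.99], [0.90, 0.05])
print(round(post[0], 3))  # 0.154
```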

Between Sample Means Variation

When we have a number of mean (average) values of a variable, one for each category of another variable, this is the overall variance of those means.

Bimodal Data

Data where the frequency plot has two peaks, that is, two separate values in the data are more frequent than any others.

Binary Variable

A variable that can only take on two values, 0 and 1 usually.

Blending Model

The aim of these models is to produce an output blend given an input range of raw materials that satisfies demand and blend specifications.

Box-Jenkins

 

Boxplot

A graph of a set of data based on the median and quartiles, used to show the distribution of the data set.

 

C

Capital Budgeting Model

These are applied in situations where a number of investment options are available subject to the constraint of the amount of available capital and other considerations.
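For a handful of projects, a capital budgeting model can be solved by checking every invest/don’t-invest combination, as in the sketch below; the project figures are hypothetical. Real models with many projects are formulated as integer programs and solved with an optimiser such as Excel’s Solver instead.

```python
from itertools import product

def best_portfolio(npv, cost, budget):
    """Choose the subset of investments with the largest total NPV whose total
    cost fits within the budget, by checking every 0/1 combination."""
    best_value, best_choice = 0, (0,) * len(npv)
    for choice in product((0, 1), repeat=len(npv)):  # 1 = invest, 0 = don't
        total_cost = sum(c * x for c, x in zip(cost, choice))
        if total_cost <= budget:
            value = sum(v * x for v, x in zip(npv, choice))
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice

# Hypothetical: four projects and a capital budget of 14
print(best_portfolio(npv=[16, 22, 12, 8], cost=[5, 7, 4, 3], budget=14))  # (42, (0, 1, 1, 1))
```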

Case

An item in a set of data, on which we have one or more values of variables (often represented as a row in a data spreadsheet).  Also called an observation.

Cash Balance Model

A model that may be used for tracking ‘cash balance’ or ‘cash flow’ over time.

Categorical Data

Data in which the observations are just which category each case falls into.  The counts, or frequencies, of cases in each category are analysed.  E.g. colour of eyes – blue, green, brown, etc. are the categories.

Categorization Analysis

A way of trying to predict which category a case will fall into, based on the values of other variables.

Causal Methods

 

Causes of Problems

When an ‘out-of-control’ signal occurs, the reasons for this signal are investigated.  They are then usually ascribed either to assignable causes, i.e., within the operator’s control, or to common causes, i.e., caused by the process itself and hence outside the operator’s control.

Central Limit Theorem

A fundamental theorem in Statistics which states that, under very general conditions, averages of samples of data (sample means) are approximately Normally distributed, and that the approximation improves as the sample size increases.
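A small simulation illustrates the theorem. The sketch below draws many samples of size 30 from a deliberately non-Normal (uniform) population and collects the sample means; their mean and standard deviation should be close to the theoretical 0.5 and √(1/12)/√30 ≈ 0.0527, and about 95% of them should lie within 1.96 standard deviations of 0.5. The sample size, number of repetitions and seed are arbitrary choices.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n drawn from a Uniform(0, 1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

means = sample_means(n=30, reps=2000)
# Theory: mean 0.5, standard deviation sqrt(1/12) / sqrt(30), about 0.0527
print(statistics.fmean(means), statistics.stdev(means))
```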

Central Location

The centre of a set of data, or a distribution.  Mean and median are commonly used measures of the centre.

Certainty Equivalent

This is the certain dollar amount that is equivalent to a risky venture.  Used to construct or evaluate Utility Functions.

Chart Wizard (in Excel)

A tool in Excel that can help to make drawing graphs (charts) of data easier.

Chi-Squared

A particular distribution used in goodness-of-fit tests and in tests of independence in contingency tables.

Chi-Squared Goodness-of-Fit Test

A test of how well (or not) a set of observed frequencies match to a set of frequencies expected from some hypothesis or theory.
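The test statistic is the sum, over categories, of (observed − expected)² / expected. The sketch below computes it for hypothetical die-rolling data; the result is compared with a chi-squared distribution on (number of categories − 1) degrees of freedom, here 5, whose 5% critical value is about 11.07.

```python
def chi_squared_statistic(observed, expected):
    """Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 60 rolls of a die thought to be fair (expected 10 per face)
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6
stat = chi_squared_statistic(observed, expected)
print(stat)  # 13.4, above 11.07, so fairness would be rejected at the 5% level
```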

Cluster Analysis

A way of analysing data on a number of variables to determine how the cases group (cluster) together.

Consumer Price Index (CPI)

An overall, average measure of how prices have changed from time to time.  Based on the spending habits of an average consumer.

Contingency Table

A table of frequencies broken down by two categorical variables.  It shows the frequencies of each category of one variable, as spread over the categories of the other variable.  Can be extended to three or more variables.

Continuous Variable

A variable which has measured values for each case (that is, not categorical or discrete data).  The possible values for each case are infinite in number, forming a continuum.  E.g. how much time you have spent at university.

Correlation

A measure of the extent of linear relationship between two variables.

Covariance

A measure of the extent to which two variables vary together, rather than independently.
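A sketch of sample covariance and the (Pearson) correlation derived from it. Correlation is the covariance scaled by both standard deviations, so it always lies between −1 and +1; the divisor n − 1 used here is the usual sample convention.

```python
import math

def covariance(x, y):
    """Sample covariance of x and y (divisor n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

print(correlation([1, 2, 3], [2, 4, 6]))  # perfect positive linear relationship
print(correlation([1, 2, 3], [3, 2, 1]))  # perfect negative linear relationship
```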

Cross-sectional Data

Data collected all at a particular point in time.

Cluster Sampling

A sampling method whereby the sampling unit is a cluster, or collection of smaller units; the smaller units within the selected clusters are the ones measured.  Clusters are constructed so that each one individually mirrors the total population.

Coefficient of Determination (R-squared)

A measure of how well a multiple regression model fits to the data.  The proportion of the total variation of the dependent variable values explained by the independent variables in the model.

Coincident Indicator

 

Combining Forecasts

 

Common Causes

Common causes of problems are due to systems, environmental, or other factors that operate on the system itself, outside the control of those working within the system.

Conditional Probability

Consider two events which are not independent.  The conditional probability of one event is the probability of that event once the outcome of the other event is known, e.g. P(A | B) = P(A and B) / P(B).

Confidence Interval Estimation

Using sample data to estimate a range which has a certain (specified) percentage probability (confidence) of having the true, unknown parameter value within the range.
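A minimal sketch of a confidence interval for a population mean, assuming a sample large enough for the Normal approximation (for small samples a t multiplier would replace the Normal one). The interval is the sample mean plus or minus z standard errors, where z is about 1.96 for 95% confidence.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(data, level=0.95):
    """Large-sample (Normal approximation) confidence interval for a population mean."""
    n = len(data)
    centre = fmean(data)
    std_error = stdev(data) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return centre - z * std_error, centre + z * std_error

print(mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9]))
```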

Confidence Level

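The probability that the confidence interval contains the true, unknown value of the parameter (strictly, the proportion of such intervals, over repeated samples, that would contain it).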

Constant Elasticity Relationship

If an independent variable X changes by a small percentage amount, then the dependent variable changes by approximately the elasticity value times the percentage change in X.  No matter what value X starts from, the elasticity value is unchanging.  Such a relationship has the form Y = aX^b, where b is the elasticity.
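A constant elasticity relationship can be written Y = aX^b, with the exponent b as the elasticity. The sketch below uses a hypothetical demand curve with elasticity −2 to show that a 1% rise in X changes Y by the same percentage (about −2%) whatever the starting value of X.

```python
def constant_elasticity(x, a, b):
    """Y = a * X**b; the exponent b is the elasticity of Y with respect to X."""
    return a * x ** b

# Hypothetical demand curve with price elasticity -2
for price in (5.0, 20.0, 80.0):
    before = constant_elasticity(price, a=1000.0, b=-2.0)
    after = constant_elasticity(price * 1.01, a=1000.0, b=-2.0)
    print(price, round(100 * (after / before - 1), 3))  # about -1.97% each time
```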

Constant Error Variance (Homoscedasticity)

The variance of the error term does not change as any of the relevant variables change.

Constraints

These are the limitations on available resources.

Contingency Plan

An alternative plan to the main plan in case of failure of the main plan.

Control Charts for Attributes

A Control Chart for monitoring the proportion of defectives in a process.  The underlying distribution is Binomial, although the Normal approximation is often used to calculate the 3-sigma limits.
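A sketch of the 3-sigma limits for such a chart (a p-chart), using the Normal approximation to the Binomial: the centre line is the average proportion defective p̄, and the limits are p̄ ± 3√(p̄(1 − p̄)/n) for samples of size n. A negative lower limit is replaced by 0, since a proportion cannot be negative.

```python
import math

def p_chart_limits(p_bar, n):
    """(LCL, centre line, UCL) for a p-chart with samples of size n."""
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    lcl = max(0.0, p_bar - 3 * sigma)  # floor at zero
    ucl = p_bar + 3 * sigma
    return lcl, p_bar, ucl

print(p_chart_limits(0.1, 100))  # about (0.01, 0.1, 0.19)
```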

Convenience Sample

A sample chosen, not at random, but for the ease and convenience with which it can be selected.

Correlations

 

Correlogram

 

Cp

A measure of ‘potential’ capability, i.e., the capability if the process remains centred on the target value: Cp = (USL − LSL)/(6σ), where σ is the process standard deviation.  Cp = 1 means that the process is ‘potentially’ capable.

Cpk

A measure of the ‘actual’ capability, i.e., using the actual process mean μ: Cpk = min(USL − μ, μ − LSL)/(3σ).  Cpk = 1 means that the process is ‘capable’.
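A sketch of both capability indices, assuming the standard formulas Cp = (USL − LSL)/(6σ) and Cpk = min(USL − μ, μ − LSL)/(3σ). The example shows that moving the mean off-centre leaves Cp unchanged but lowers Cpk; the specification limits are hypothetical.

```python
def process_capability(lsl, usl, mean, sigma):
    """Return (Cp, Cpk) for a process with the given specification limits."""
    cp = (usl - lsl) / (6 * sigma)                   # spec width / process spread
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)  # allows for an off-centre mean
    return cp, cpk

print(process_capability(lsl=7, usl=13, mean=10, sigma=1))  # (1.0, 1.0): centred
print(process_capability(lsl=7, usl=13, mean=11, sigma=1))  # Cp 1.0, Cpk about 0.67
```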

Crosstabs

A contingency table (or pivot table in Excel).

 

D

Data

An unanalysed collection of basic information, on some number of cases and variables.

Data Mining

Using a variety of techniques to try to find patterns, trends and relationships between variables in a set of data.  Typically computerised.

Data Warehousing

Combining information from a number of sources for the purpose of discovering interrelationships or patterns in the data.

DEA Data envelopment analysis

See DEA

Decision analysis

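The systematic study of decisions, particularly decisions made under uncertainty, using tools such as decision trees, influence diagrams and utility functions.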

Decision Making under Uncertainty

Decision making where the outcomes are not known before making the decision.

Decision Outcomes

The alternative outcomes that may result from a decision.

Decision trees

A diagrammatic method of analysing a decision problem as a ‘tree’.  The elements of this tree are decision, probability and end nodes.  Outcome values, costs and probabilities are entered into the tree and used to calculate the value of alternative decisions.

Decision Support System (DSS)

A system that provides a decision maker with a variety of tools and data sources to facilitate the decision making process.

Design of Experiments

 

Defect

A non-conforming product, i.e., it does not meet specifications.

Defective Component (p2_2.xls,q2)

Those items or things in a collection which have a particular defect (thing wrong with them).

Degree of Belief Probability

These are subjective probabilities based on personal assessment of the likelihood of outcomes.  They are often used in situations where probabilities cannot be calculated from past experience or logical deduction.

Degrees of Freedom

A parameter of a distribution that provides an idea of how spread out the distribution is.  In t-tests it is based on the sample size; in chi-squared tests it is based on the number of cells.

Deming's 14 Points

W. Edwards Deming devised 14 rules to be adopted by management for an organisation to truly practise Total Quality Management (TQM).

Deming's Funnel Experiment and tampering

One of Deming’s key insights was the effect of reacting to or making decisions on the basis of ‘noise’.  This experiment shows that variability becomes worse when decisions are made reactively to random fluctuations.
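The effect of tampering can be sketched in one dimension. With the target at 0, Rule 1 never moves the funnel; Rule 2 moves the funnel, after each drop, by minus the last deviation from target. In the simulation below (a simplified sketch with Normal noise and an arbitrary seed), Rule 2 roughly doubles the variance of where the marbles land.

```python
import random
import statistics

def funnel_experiment(n=20_000, adjust=False, seed=1):
    """Simulate Deming's funnel in one dimension with the target at 0.

    Rule 1 (adjust=False): never move the funnel.
    Rule 2 (adjust=True):  after each drop, move the funnel by minus the
                           last deviation from target (tampering)."""
    rng = random.Random(seed)
    funnel, results = 0.0, []
    for _ in range(n):
        landed = funnel + rng.gauss(0, 1)  # random noise around the funnel position
        results.append(landed)
        if adjust:
            funnel -= landed  # react to the last deviation from target
    return results

var_rule1 = statistics.pvariance(funnel_experiment(adjust=False))
var_rule2 = statistics.pvariance(funnel_experiment(adjust=True))
print(var_rule2 / var_rule1)  # close to 2
```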

Dependent variable

A variable the values of which are considered to depend on the values of other variables.

Deseasonalise Data

 

Discrete Variable

A variable which can only take on one of a set of separate, countable values for each case.  E.g. how many years of university you have completed.

Distribution

The values a variable can take on, together with the frequency or probability of each value.  Can be expressed as a table or formula.

Divisibility

The divisibility property means that the level of activities can be measured on a continuous scale.

Dummy Variable

A variable which takes the value 1 if an observation has a certain attribute, and 0 if it does not.

Durbin-Watson statistic

A statistic used to test if errors from a regression model are autocorrelated.
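The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals; it lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation and values well above 2 suggest negative autocorrelation. A minimal sketch:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, 1, -1]))  # alternating signs: well above 2
print(durbin_watson([1, 1, 1, 1]))    # identical residuals: 0
```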

Dynamic Financial Model

This is a generalisation of the usual Cash Flow Model in that additional borrowings may be made over the period of time.

 

E

Econometrics

Econometrics is formed from two Greek words, oikonomia (economy) and metron (measure).  It is a combination of economic theory, mathematical economics and statistics, yet it is a distinct branch of study in its own right.

Economic theory is the study of how and why variables in the economy are related.

Statistics involves measuring variables and the relationships between them using limited data, or information, and drawing conclusions from them.

Starting from the relationships postulated by economists (economic theory) we express them in mathematical terms (mathematical economics).  We obtain data (economic statistics) and use specific methods (econometric methods) in order to obtain numerical estimates of economic relationships (called models).

A good description of the scope and division of econometrics is given in A. Koutsoyiannis, Theory of Econometrics, Macmillan (pp.  3-10) and details of methodology are given in the same text (pp.  11-30).

 

Empirical CDF

The cumulative frequency distribution derived from the actual observations and their frequencies in a set of data.  For each value of the variable, it shows the number (or, more usually, the proportion) of observations less than or equal to that value.
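A sketch of an empirical CDF in its proportion form: sort the data once, then for any x count how many observations are less than or equal to x with a binary search. Multiplying by the number of observations gives the count form.

```python
from bisect import bisect_right

def empirical_cdf(data):
    """Return a function F with F(x) = proportion of observations <= x."""
    ordered = sorted(data)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

F = empirical_cdf([3, 1, 2, 2])
print(F(0), F(2), F(3))  # 0.0 0.75 1.0
```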

Error Term

That part of the value of a dependent variable not explained by the independent variables.  Hence, the difference between the observed value of the dependent variable and the value expected from the model.

Expected Monetary Value (EMV)

The mean of the probability distribution of possible monetary outcomes.  For discrete outcomes, this is calculated as the weighted average of the possible monetary values, with the weights being the probabilities of the values. (section 6.2.2)

Expected Utility Maximizers

These are decision makers who maximise their expected utilities, i.e., taking into account risk seeking or risk averse behaviour.

Expected Value of Perfect Information (EVPI)

Given the uncertain nature of the outcomes of some decisions, this is the additional EMV that is created if the outcome is known before the decision is made. (section 6.6.2)
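A sketch of the EVPI calculation for a simple payoff table. The EMV of each decision is its probability-weighted average payoff; with perfect information we would pick the best decision for each state, so EVPI is the expected payoff of doing that minus the best EMV. The plant-size figures are hypothetical.

```python
def evpi(payoffs, probs):
    """Expected value of perfect information.

    payoffs[d][s] -- monetary payoff of decision d if state s occurs
    probs[s]      -- probability of state s
    """
    # EMV of each decision: probability-weighted average of its payoffs
    emvs = [sum(p * v for p, v in zip(probs, row)) for row in payoffs]
    best_emv = max(emvs)
    # With perfect information, pick the best decision for each state
    ev_with_info = sum(p * max(row[s] for row in payoffs) for s, p in enumerate(probs))
    return ev_with_info - best_emv

# Hypothetical: build a large or small plant; demand is high (p = 0.4) or low (p = 0.6)
payoffs = [[200, -20],  # large plant
           [90, 60]]    # small plant
print(evpi(payoffs, [0.4, 0.6]))  # 116 - 72 = 44
```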

Expected Value of Sample Information (EVSI)

Additional information, such as extra tests or research, may have an impact on the EMV of a decision.  The change in the EMV is the Expected Value of Sample Information. Bayes’ Theorem is an important component in this calculation.

Experimental Design

Ways of setting the values of the explanatory, treatment and blocking variables in an experiment.

Explanatory variable

Independent variables.  Those variables in a regression model on which the dependent variable is held to depend (i.e. which help to explain the value of the dependent variable).

Exponential Smoothing

 

Exponential Trend

 

Exponential Utility

The Exponential function is one form of the Utility function.  It is parametrised by a single parameter, the risk tolerance, and is usually used to describe Risk Averse behaviour. (section 6.8.3)
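One commonly used form, assumed here, is U(x) = 1 − e^(−x/R), where R is the risk tolerance. The sketch computes the expected utility of a gamble and inverts U to get its certainty equivalent; for a risk-averse decision maker this is below the gamble’s EMV. The gamble and the value of R are hypothetical.

```python
import math

def exp_utility_ce(outcomes, probs, risk_tolerance):
    """Certainty equivalent of a gamble for U(x) = 1 - exp(-x / R)."""
    r = risk_tolerance
    expected_utility = sum(p * (1 - math.exp(-x / r)) for x, p in zip(outcomes, probs))
    # Invert U: the sure amount whose utility equals the expected utility
    return -r * math.log(1 - expected_utility)

# 50-50 gamble: win 100 or lose 50 (EMV = 25), with risk tolerance R = 100
print(exp_utility_ce([100, -50], [0.5, 0.5], 100))  # slightly below 0
```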

Extrapolation Methods

 

Extrapolation

Extending a pattern in a time series (such as a trend) or regression model beyond the range of the data (or time period of the observations).

 

F

F distribution

The theoretical distribution of the ratio of two variances.  Used in Analysis of Variance tests.

Feasible Region

The feasible region is the area where all of the constraints are satisfied.

Financial Planning Model

A model used for planning capital budgeting and cash flow over time.

Finite Population Correction (fpc)

Standard variance formulas assume either an infinite population or sampling with replacement.  When sampling is done without replacement from a finite population, a finite population correction needs to be applied to the variance calculation.
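A sketch of the correction applied to the standard error of a sample mean, using the factor (N − n)/(N − 1) on the variance, where N is the population size and n the sample size (some texts use 1 − n/N, which is almost the same for large N). When the whole population is sampled, the standard error is zero.

```python
import math

def standard_error_of_mean(sigma, n, population_size=None):
    """Standard error of a sample mean; if a population size N is given
    (sampling without replacement), apply the finite population correction."""
    se = sigma / math.sqrt(n)
    if population_size is not None:
        N = population_size
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error_of_mean(10, 100))        # infinite population: 1.0
print(standard_error_of_mean(10, 100, 1000))  # with the correction: about 0.949
```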

Fitted Value

The expected value of the dependent variable, calculated by putting values for the independent variables into the regression model estimated.

Fixed Cost Model

The feature of a fixed cost model is that an additional one-off cost is incurred if a particular option is chosen, e.g., using a particular production plant, or a machine setup cost.

Folding Back on the Tree

The process of calculating the optimal decision on a Decision Tree.  It works from the right to the left of the tree.
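Folding back can be sketched as a recursive calculation: an end node is worth its value, a probability (chance) node is worth the EMV of its branches, and a decision node is worth its best branch. The nested-tuple representation of the tree below is an illustrative choice, and the launch figures are hypothetical.

```python
def fold_back(node):
    """Value of a decision tree node, found by folding back from right to left.

    A node is one of:
      ("end", value)
      ("chance", [(probability, child), ...])  -> EMV of the children
      ("decision", {label: child, ...})        -> best child
    """
    kind, content = node
    if kind == "end":
        return content
    if kind == "chance":
        return sum(p * fold_back(child) for p, child in content)
    if kind == "decision":
        return max(fold_back(child) for child in content.values())
    raise ValueError(f"unknown node type: {kind}")

# Hypothetical: launch a product (60% success: +100, else -40) or don't launch (0)
tree = ("decision", {
    "launch": ("chance", [(0.6, ("end", 100)), (0.4, ("end", -40))]),
    "don't launch": ("end", 0),
})
print(fold_back(tree))  # EMV of launching, 0.6*100 + 0.4*(-40) = 44
```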

Forecast Error

 

Forecasting

 

Forecast method selection

A survey of forecasting methods (see for instance, Nigel Meade, Evidence for the Selection of Forecasting Methods, J. Forecast., 19, 515-535) concludes that

·        the characteristics of the data series are an important factor in determining the relative performance of methods and

·        statistically sophisticated or complex methods do not necessarily produce more accurate forecasts than simple ones.

Meade shows in this paper that summary statistics can be used to select a good forecasting method (or set of methods) although not necessarily the best.

Forensic Statistical Analysis

 

Formulating the Model

The process of abstraction of a problem from real life into a mathematical form.

Forward Variable Selection

In multiple regression, the approach of starting with only one independent variable in the model and one by one adding in others, keeping those significant.

Fractionally integrated ARMA models

ARFIMA(p,d,q)

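An ARFIMA(p, d, q) model extends the ARMA(p, q) model by allowing the series to be differenced a fractional number of times d, which lets it describe long-memory processes.  See Brodsky, Julia and Hurvich, Clifford M., ‘Multi-step Forecasting for Long-memory Processes’, J. Forecasting, 18, 59-75 (1999), and compare with the ARMA model with adaptive parameters proposed by Tiao, G.C. and Tsay, R.S., ‘Some advances in non-linear and adaptive modelling in time series’, J. Forecasting, 13, 109-131 (1994).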

F-Ratio

A ratio of two variances, used to test whether they are equal.  Also used in Analysis of Variance to test whether a set of means are all equal.

Frequency Table

A table of values of a variable and the number of cases (frequency) of each value.

Fuzzy Logic

A logical system in which things are not just True or False, but can have degrees of truth (a bit like probability of having a characteristic).

 

G

Genetic Algorithm

A method of optimisation that encodes candidate solutions as a ‘genetic code’, uses a ‘fitness’ criterion to judge the quality of each solution, and uses an evolutionary heuristic to select the ‘fittest’ set from each new ‘generation’ of solutions.  New generations are created from older ones by random mutation of individuals or by mixing the genetic codes of pairs.
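A toy genetic algorithm shows the pieces working together. In this sketch the genetic code is a string of bits, fitness is the number of 1s (the standard ‘OneMax’ toy problem), parents are chosen by tournament selection, pairs are mixed by one-point crossover, and bits mutate with a small probability. All of these, and the parameter values, are illustrative choices rather than the only way to build a GA.

```python
import random

def one_max_ga(n_bits=20, pop_size=30, generations=100, mutation_rate=0.05, seed=2):
    """Evolve bit strings towards all 1s; return the best fitness in each generation."""
    rng = random.Random(seed)
    fitness = sum  # fitness criterion: number of 1s in the string
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        new_pop = [best[:]]  # elitism: the fittest individual survives unchanged
        while len(new_pop) < pop_size:
            # Tournament selection: each parent is the fitter of two random individuals
            parent_a = max(rng.sample(pop, 2), key=fitness)
            parent_b = max(rng.sample(pop, 2), key=fitness)
            # Crossover: mix the genetic codes of the pair at a random cut point
            cut = rng.randrange(1, n_bits)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_pop.append(child)
        pop = new_pop
    return history

history = one_max_ga()
print(history[0], history[-1])  # best fitness in the first and last generations
```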

Global Maximum (Minimum)

In some optimisation problems a number of local maxima may be present (like small hills in a landscape).  However, the aim of the optimisation process is to find the largest of these, the Global Maximum.  In LP models this problem does not arise: any local maximum is also the global maximum.

Glossary

 

Grand Mean

The overall mean of all the observations, in an experiment or analysis of variance data set, over all the levels of the design variables.

Graphical Excellence

Principles of graphical excellence are clearly explained in Edward R. Tufte's books, the first of which is The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut, published in 1983.  Envisioning Information (1990) and name? followed, and they are fascinating as well as informative.  A PowerPoint lesson on graphical excellence is available from Graphical_Excellence.ppt.

Graphical Solution Method

For two-dimensional LP problems (those with two decision variables), the optimum can be found graphically.
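The graphical method relies on the fact that, when an LP has an optimum, one occurs at a corner of the feasible region. The sketch below does by calculation what the graph does by eye: it intersects every pair of constraint boundary lines, keeps the feasible corners and picks the best. It assumes a maximisation with all constraints in ≤ form and a bounded feasible region; the example problem is hypothetical.

```python
from itertools import combinations

def solve_2d_lp(c, constraints):
    """Maximise c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r),
    plus x >= 0 and y >= 0, by checking every corner point."""
    cons = list(constraints) + [(-1, 0, 0), (0, -1, 0)]  # x >= 0, y >= 0
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines never meet
        # Intersection of the two boundary lines (Cramer's rule)
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + 1e-9 for a, b, r in cons):  # feasible corner
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

# Maximise 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18
print(solve_2d_lp((3, 5), [(1, 0, 4), (0, 2, 12), (3, 2, 18)]))  # (36.0, 2.0, 6.0)
```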

 

H

H1

The alternative hypothesis.

Ha

The alternative hypothesis.

Ho

The null hypothesis.

Histogram

A graph of a frequency table, showing each value of a variable and a bar whose height represents the frequency of that value in the data set.

Holt's Method

 

Hypothesis Testing

Analysing a sample of data to test whether or not it tends to confirm or deny a particular hypothesised value for a population parameter.
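A minimal sketch of one such test: a two-sided, large-sample z-test of whether a population mean equals a hypothesised value. It assumes the sample is large enough for the Normal approximation; the figures are hypothetical. The null hypothesis is rejected at the 5% level if the p-value is below 0.05.

```python
import math
from statistics import NormalDist

def z_test_mean(sample_mean, hypothesised_mean, sd, n):
    """Two-sided large-sample z-test of H0: mean = hypothesised_mean.
    Returns (z statistic, p-value)."""
    z = (sample_mean - hypothesised_mean) / (sd / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = z_test_mean(sample_mean=52, hypothesised_mean=50, sd=10, n=100)
print(z, round(p, 4))  # z = 2.0, p about 0.0455: reject H0 at the 5% level
```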

 

I

In Statistical Control

A process that has all of its data within its control limits.

Independent Samples

Two (or more) samples selected independently of each other, that is, with no association between the selection of one sample and the selection of the other sample.

Independent Samples Test

Testing whether the parameter value (e.g. the mean) from one sample is the same or not as the parameter value from another, independent, sample.

Independent variable

Explanatory variables.  Those variables in a regression model on which the dependent variable is held to depend (i.e. which help to explain the value of the dependent variable).

Indifference Value

The indifference value is the certain (“for sure”) value that a decision maker thinks is the same as a risky venture.

Infeasibility

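The infeasibility property describes whether or not a solution satisfies all of the problem constraints.  A model is infeasible when no solution satisfies all of the constraints at once, i.e., the feasible region is empty.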

Influence Diagram

A method for describing the elements of a decision.  It displays decisions, uncertain outcomes, intermediate calculations and payoffs.

Influential Point

An observation, in a regression model, which has a particularly strong impact on the parameter estimates, that is, which the results are especially sensitive to.

Inspection

The management process of ‘weeding’ out all of the defective products.

Integer Programming Models

Integer Programming (IP) Models contain one or more variables which can only take integer values.

Interaction term

A term added to an Analysis of Variance, or to a regression model, to account for the effect of one variable depending on the value of another variable.  In regression it is usually the product of the two variables.

Interquartile Range (IQR)

The difference between the upper quartile and the lower quartile.  It thus represents a range which contains the middle half of the observations.
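A sketch of the five-number summary drawn by a boxplot, and the IQR from it, using Python’s `statistics.quantiles`. Several conventions exist for computing quartiles; this uses the ‘inclusive’ one (the one Excel’s QUARTILE function uses), so other software may give slightly different values for small data sets.

```python
import statistics

def five_number_summary(data):
    """Minimum, lower quartile, median, upper quartile, maximum: the values a boxplot draws."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
iqr = summary[3] - summary[1]
print(summary, iqr)  # (1, 3.0, 5.0, 7.0, 9) 4.0
```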

Inventory Control

 

ITSM

Interactive Time Series Modelling, a computer package for univariate and multivariate time series modelling and forecasting is distributed with Brockwell, P.J. and Davis, R.A. (1996), Introduction to Time Series and Forecasting, Springer-Verlag New York Inc.

 

J

Joint Probability

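The probability that two or more events occur together; more generally, the probability distribution of two or more random variables considered together, e.g., the probability distribution of wind and sunshine on any day. (section 4.7)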

Judgemental Methods

 

Judgemental Sample

A sampling method based on the judgement of the selector, i.e., it is not random.

 

K

Key Performance Indicators (KPIs)

 

Kurtosis

A measure of how flat, or peaked, a distribution is (more precisely, of how heavy its tails are compared with a Normal distribution).

 

L

Lag

 

Leading Indicator

 

Learning Curve Model

 

Least Squares Estimation

 

Least Squares Line

 

Level

 

Likelihood

Often used in the same way as probability, but also has a more technical statistical interpretation.

Likert Scale

A scale of (typically 5 or 7) attitudes, in order from one extreme to the other, from which a survey respondent is asked to choose one.  E.g. Do you approve of the current Prime Minister?  Choose one of: highly approve / approve / neither approve nor disapprove / disapprove / highly disapprove.

Lilliefors Test

A test of the hypothesis that a set of data is from a Normal distribution.

Line of Best Fit

 

Linear Dependence

 

Linear Programming (LP)

Linear Programming is a modelling process that aims to optimise a specific quantity, such as profit.  The features of an LP are that:

·      It has an objective function that is to be optimised,

·      It has a number of resource or other constraints,

·      All relationships are described by linear equations.

A Microsoft Word lesson on linear programming formulation is available in LP_Intro.doc.   

Linear Relationship, positive & negative

A relationship between two variables (say Y and X) such that one is a linear function of the other (so that Y = a + bX).  The scatterplot graph of Y against X will then give points on a straight line.  If the line slopes up, so that as X increases Y increases also, it is termed a positive relationship.  If the line slopes down, so that as X increases Y decreases, it is termed a negative relationship.  The relationship may be only approximate (correlation provides a measure of the extent of the linear relationship).
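The coefficients a and b of an approximate linear relationship Y = a + bX are usually estimated by least squares. A minimal sketch: the slope b is the sum of products of deviations divided by the sum of squared X deviations (its sign shows whether the relationship is positive or negative), and the line passes through the point of means.

```python
def least_squares_line(x, y):
    """Intercept a and slope b of the least squares line Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx    # slope: positive or negative relationship
    a = my - b * mx  # the line passes through the point of means
    return a, b

print(least_squares_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```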

Linkage Analysis

Used to find things that tend to go together.  E.g. people who buy cigarettes also tend to buy matches, so buying cigarettes and buying matches are linked.

Local Maxima

Maximum values that are not the overall (global) maximum, like the lower hills in a mountainous landscape.

Logistics Model

A model that links supply and demand at different locations with shipping costs to minimise the overall cost.

Lower Control Limit (LCL)

The lower of the two control limits on a control chart; for a 3-sigma chart it lies three standard deviations below the centre line.

Lower Specification Limit (LSL)

The lower of the two specification limits, i.e., the smallest acceptable value of the product characteristic.

 

M

Managerial Economics Model

These are models where economic considerations also play a role, e.g., using price/demand functions.  These models are often non-linear.

Market Share Model

A model that incorporates competition in a market and calculates the relative shares of the competitors

Mathematical Programming Models

The generalisation of LP models to models with functions that may not be linear.

Maximum Probable Absolute Error

The amount such that there is a 95% probability that the sampling error will not exceed it in absolute value.  Using the Normal approximation, it is about 1.96 standard errors.

MA

MA(q)

Moving average

A moving average process is a time series in which the latest observation is a weighted sum of the current and previous q random disturbances.  An MA(q), a moving average of order q, for a time series X_t can be defined by

$$X_t = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q},$$

where the Z_t terms are random disturbances and the θ values are constant coefficients.

Market Research

 

Matrix Plots

With multivariate data, an examination of the plot of each variable in the set against every other variable in the set can be revealing.  One example is given in Psychology.

Maximum

The largest value of a variable seen in a set of data.

Mean

A measure of central tendency – the average value of a variable in a set of data.  Calculated by adding all the values observed (counting each value as often as it is observed) and dividing the total by the number of observations.  (See also median and mode)  A PowerPoint lesson on basic statistical summary measures is available in Summary_measures.ppt.

Mean Absolute Error (MAE)

 

Mean Absolute Percentage Error (MAPE)

 

Measurement Error

The error contribution resulting from the measurement instrument, e.g., poorly framed questions.

Measure of Association

A measure of how two (or more) variables are related, as distinct from independent.  E.g. correlation.

Measure of Dispersion (q15)

A measure of how spread out a set of data (or a distribution) is.  E.g. variance, inter-quartile range.

Median

A measure of central tendency – the middle observation – half the observations in the data set exceed the median and half fall below the median.  (See also mean and mode)  A PowerPoint lesson on basic statistical summary measures is available in Summary_measures.ppt.

Minimum

The smallest value of a variable seen in a set of data.

Minimum Cost Network Flow Model

This is a general set of models where goods flow to demand nodes from supply nodes.  Capacity restrictions may apply on some or all of the arcs.  The objective is to minimise the overall cost of supply.

Mode

A measure of central tendency – the most frequently occurring value.  (See also median and mean)  A PowerPoint lesson on basic statistical summary measures is available in Summary_measures.ppt.
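The three measures of central tendency can be compared on a small, hypothetical data set with Python’s standard `statistics` module. With an even number of observations the median is the average of the two middle values.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]
print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.0: average of the two middle values, 3 and 5
print(statistics.mode(data))    # 3: the most frequent value
```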

Model

 

Modelling the Price of a Stock

A model of changes in a stock price based on Black-Scholes theory, in which the price follows a lognormal random walk (geometric Brownian motion).
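A sketch of simulating one price path under geometric Brownian motion, the price model assumed in Black-Scholes theory: over each small time step dt the log of the price changes by (μ − σ²/2)dt plus a Normal shock with standard deviation σ√dt. The drift μ, volatility σ, starting price and 252 trading days are hypothetical inputs.

```python
import math
import random

def simulate_stock_path(s0, mu, sigma, years, steps, seed=0):
    """Simulate a stock price path under geometric Brownian motion."""
    rng = random.Random(seed)
    dt = years / steps
    path = [s0]
    for _ in range(steps):
        shock = rng.gauss(0.0, 1.0)
        # Log price change over one step: drift term plus random term
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * shock
        path.append(path[-1] * math.exp(growth))
    return path

path = simulate_stock_path(s0=50.0, mu=0.10, sigma=0.30, years=1.0, steps=252)
print(path[-1])  # simulated price after one year
```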

Moving Averages

 

MSE

 

MSR

 

Multicollinearity

 

Multiple Regression

 

Multiplicative Model

 

Multiplicative Relationship

 

Multi-stage Decision Problem

A decision problem which has more than one decision to be made. (section 6.6)

Multistage Sampling

A sampling process which is done in stages.  Sometimes what happens in one stage will determine what is done in a subsequent stage.

 

N

Naïve Forecasting Model

 

Negatively Skewed data

Data for which the frequency distribution is not symmetrical, but extends further (to more values) below the centre (mean/median/mode) than above it.
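Skewness can be measured by the moment coefficient m3/m2^1.5, where m2 and m3 are the average squared and cubed deviations from the mean; negative values indicate a longer tail below the centre. A minimal sketch (software such as Excel’s SKEW applies a sample-size adjustment, so its values differ slightly for small samples):

```python
def skewness(data):
    """Moment coefficient of skewness, m3 / m2**1.5, using population moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

print(skewness([1, 9, 10, 10, 10]))  # negative: one low value stretches the lower tail
```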

Neural Networks

A form of modelling complex relationships between variables, based on an analogy with the working of the brain.  Between the input (independent) variables and the output (dependent) variables are a range of intermediate variables.

Nightingale, Florence

Florence Nightingale (1820 – 1910) was an accomplished statistician and she invented several graphical displays to support her theories that poor medical practices, poor nutrition and lack of nursing were the principal causes of deaths in the Army.  For example, see her Coxcomb.

Nominal Data

Data falling into categories, and there is no meaningful order to the categories.  E.g. colour of eyes.  Compare with Ordinal data.

Nonlinear Relationship

A relationship between variables which, when graphed, is a curve and does not approximate a straight line.

Nonlinear Transformation

 

Nonnegativity

The nonnegativity property specifies that variables cannot be negative in an LP model.

Nonnormal Distribution

A distribution which is not the Normal distribution.

Nonresponse Bias

Nonresponse is sometimes not random but linked to a well-defined characteristic, e.g., people not at home during the day, and this may bias the characteristic being measured.

Nonsampling Error

The error contribution resulting from sources other than the sampling process.

Nontruthful Responses

Responses which are not truthful, particularly to questions that some people may find threatening.

Normal distribution

 

Numerical Data

Observations on variables made up of numbers.

Nuisance Parameter

A parameter which is not the one we are interested in, but which we need to know the value of to test a hypothesis about the parameter we are interested in, or to carry out some other analysis.

Null Hypothesis

The hypothesis that we are testing in a hypothesis test.  It specifies a particular value for the parameter of interest.

 

O.  O

Objective Function

The mathematical description of the quantity that is to be optimised.

Observable Trend

A clear, fairly obvious trend, usually seen in a time series graph of data.

Observation

An item in a set of data, on which we have one or more values of variables (often represented as a row in a data spreadsheet).  Also called a case.

Odds ratio

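The odds of an event are its probability divided by the probability that it will not happen; the odds ratio is the ratio of the odds of the event in one group to the odds of the event in another group.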
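
The odds of an event with probability p are p / (1 - p); an odds ratio compares those odds between two groups.  A minimal Python sketch, using hypothetical counts from a 2 x 2 table (event / no event in groups A and B); the function names are illustrative only:

```python
def odds(events, non_events):
    """Odds of an event: number with the event divided by number without it (same as p / (1 - p))."""
    return events / non_events

def odds_ratio(events_a, total_a, events_b, total_b):
    """Odds of the event in group A divided by the odds of the event in group B."""
    return odds(events_a, total_a - events_a) / odds(events_b, total_b - events_b)

# Hypothetical data: 20 of 100 units in group A and 10 of 100 in group B had the event.
print(odds(20, 80))                  # odds in group A
print(odds_ratio(20, 100, 10, 100))  # about 2.25: the odds are 2.25 times higher in group A
```

An odds ratio of 1 would mean the odds are the same in both groups.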

One-sided Confidence Interval

A confidence interval with an upper limit, but no lower limit; or, vice versa, a confidence interval with a lower limit, but no upper limit.

One-tailed

Refers to either the upper or the lower tail of a distribution, used in finding one-sided confidence intervals or carrying out one-sided hypothesis tests.  A one-tailed hypothesis test is one where the alternative hypothesis specifies a single direction only, that is, the null hypothesis is rejected only if the test statistic is too big, but not if it is too small (or vice versa).

One-Way Analysis of Variance (ANOVA)

A test, carried out simultaneously, of whether three or more means are all equal, or have some differences.
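
A minimal sketch of the F statistic behind a one-way ANOVA, using hypothetical data for three groups.  It compares the variation between the group means with the variation within the groups; in practice a statistical package (for example `scipy.stats.f_oneway` in Python) also gives the p-value from the F distribution with k - 1 and n - k degrees of freedom:

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples, one list per group."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    # Between-group sum of squares: spread of the group means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each observation about its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)            # mean square between groups
    ms_within = ss_within / (n - k)              # mean square within groups
    return ms_between / ms_within

samples = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]    # hypothetical data for three groups
print(one_way_anova_f(samples))
```

A large F suggests the means are not all equal, because the group means differ by more than the within-group variation would lead us to expect.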

Operations Research

Operations Research is a philosophy used to develop models of systems or processes in a manner that will facilitate improving the performance of the system or process.  It can be applied in the management and improvement of all types of private and public sector enterprises.

Operations Research provides the basis for a Decision Support Service for all levels of management. To read more, including some historical details, see  OR_Intro.doc.

Optimisation

vs Heuristics

 

 

Optimal Solution

The solution to the LP problem that satisfies all of the constraints and optimises the objective function.

Optimal Strategy

The strategy that maximises or minimises a particular objective, such as EMV or Expected Utility.

Optimisation Modelling

The modelling process to develop optimal solutions to problems.

Ordinal Data

Data falling into categories, where there is a meaningful order to the categories.  E.g. Do you approve of the current Prime Minister?  Choose one of: highly approve / approve / neither approve nor disapprove / disapprove / highly disapprove.  Compare with nominal data.

Out of Statistical Control

A process that has some of its data outside the 3-sigma control limits, or that fails one or more of the other statistical criteria for remaining in control.

Outlier (mild, extreme)

A value of a variable which lies noticeably apart from the rest.  A value more than three standard deviations from the mean, or one outside the outer fence of a boxplot, can be called an extreme outlier; a value more than two standard deviations from the mean, or outside the inner fence of a boxplot, can be called a mild outlier.
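
A minimal sketch of the boxplot rule, assuming the common convention that the inner fences lie 1.5 interquartile ranges (IQR) beyond the quartiles and the outer fences 3 IQR beyond them.  Python's `statistics.quantiles` uses its "exclusive" method by default, so the quartiles (and hence the fences) may differ slightly from other software:

```python
from statistics import quantiles

def classify_outliers(data):
    """Label each value 'extreme', 'mild' or 'ok' using boxplot fences."""
    q1, _, q3 = quantiles(data, n=4)              # lower quartile, median, upper quartile
    iqr = q3 - q1                                 # interquartile range
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)      # inner fences
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)      # outer fences
    labels = []
    for x in data:
        if x < outer[0] or x > outer[1]:
            labels.append((x, "extreme"))
        elif x < inner[0] or x > inner[1]:
            labels.append((x, "mild"))
        else:
            labels.append((x, "ok"))
    return labels

readings = [10, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 25, 60]   # hypothetical data
labels = classify_outliers(readings)
```

With these made-up readings the quartiles are 12 and 15.5, so 25 falls between the inner and outer fences (a mild outlier) and 60 lies beyond the outer fence (an extreme outlier).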

 

P.    P

Paired samples

Two samples in which the units in one sample are each closely linked to a similar unit in the other sample.

Pareto Distribution

 

Partial F-Test

 

Payoff Table

A decision may have a number of different alternatives, and each alternative a number of different outcomes.  The values resulting from the combinations of alternatives and outcomes are often represented in a payoff table.
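
A minimal sketch of a payoff table and the EMV criterion, with hypothetical alternatives, outcomes, probabilities and payoffs (none of these come from the glossary):

```python
# Hypothetical payoff table: each row is a decision alternative, each column an outcome.
probabilities = {"low demand": 0.3, "medium demand": 0.5, "high demand": 0.2}
payoffs = {
    "small plant": {"low demand": 40, "medium demand": 50, "high demand": 60},
    "large plant": {"low demand": -20, "medium demand": 60, "high demand": 150},
}

def emv(alternative):
    """Expected monetary value: the probability-weighted average of an alternative's payoffs."""
    return sum(probabilities[outcome] * value for outcome, value in payoffs[alternative].items())

for alternative in payoffs:
    print(alternative, emv(alternative))
best = max(payoffs, key=emv)   # the alternative with the highest EMV
```

With these made-up numbers the large plant has the higher EMV (54 against 49), even though its payoffs range from -20 to 150; whether that risk is acceptable is a question of the decision maker's risk attitude.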

p-Chart

A control chart for attributes, i.e., for the proportion of defective (nonconforming) items in each sample.
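
A minimal sketch of p-chart limits, assuming a constant sample size n and the usual 3-sigma limits p-bar +/- 3 * sqrt(p-bar * (1 - p-bar) / n), where p-bar is the overall proportion defective.  The counts are hypothetical:

```python
from math import sqrt

def p_chart_limits(defectives, sample_size):
    """Lower limit, centre line and upper limit for a p-chart with a constant sample size."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)   # overall proportion defective
    sigma_p = sqrt(p_bar * (1 - p_bar) / sample_size)           # std. dev. of a sample proportion
    lcl = max(0.0, p_bar - 3 * sigma_p)                         # a proportion cannot be negative
    ucl = p_bar + 3 * sigma_p
    return lcl, p_bar, ucl

# Hypothetical counts of defective items in six samples of 100 items each.
limits = p_chart_limits([4, 6, 5, 3, 7, 5], 100)
```

Here p-bar is 0.05 and the upper limit is about 0.115; the lower limit would be negative, so it is set to zero.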

Percentile

That value of a variable which has a given percentage of the observations below it.  For example, the 5th percentile has 5% of values below it, the 75th percentile has 75% of values below it. 

Pivot Tables

Contingency tables produced in Excel.  The pivot table tool in Excel has a flexibility and power that make this a very useful tool.

Point Estimate

An estimate of a population characteristic based on a sample.

Pooled Standard Deviation

An overall standard deviation calculated by combining two or more standard deviations, usually from sub-groups or samples.
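
A minimal sketch, assuming the common approach (used, for example, in the equal-variance two-sample t-test) of weighting each sample variance by its degrees of freedom, n - 1, before taking the square root.  The samples are hypothetical:

```python
from math import sqrt
from statistics import variance

def pooled_sd(*samples):
    """Pool sample standard deviations, weighting each sample variance by its degrees of freedom (n - 1)."""
    weighted = sum((len(s) - 1) * variance(s) for s in samples)
    dof = sum(len(s) - 1 for s in samples)
    return sqrt(weighted / dof)

print(pooled_sd([2, 4, 6], [1, 3, 5, 7]))   # hypothetical samples with variances 4 and 20/3
```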

Population

The full set from which a sample is chosen, and hence the set to which inferences drawn from the sample apply.

Portfolio Optimisation Model

Given a set of investments, a portfolio optimisation model selects the combination of investments that has minimum variance while achieving an acceptable expected return.  These models are often quadratic optimisation problems.

Positively Skewed Data

Data for which the frequency distribution is not symmetrical, but extends further (to more values) above the centre (mean/median/mode) than below it.

Poster presentations

Poster presentations are often used at conferences to enable researchers to communicate ideas flexibly to people attending.  An effective poster should have certain attributes.  A poster about producing posters is available in Poster.rtf.

Posterior Probability

The probability of an event resulting from a Bayes’ calculation, that is, the probability updated to take account of new information.
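
A minimal sketch of a Bayes' calculation for a hypothetical inspection problem: the prior probabilities are multiplied by the conditional probabilities of the observed information, and the results are rescaled so that they sum to 1:

```python
def posterior(priors, likelihoods):
    """Bayes' rule: posterior_i = prior_i * P(data | i) / P(data)."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]   # P(i and data)
    evidence = sum(joint)                                       # P(data), by the law of total probability
    return [j / evidence for j in joint]

# Hypothetical example: 5% of items are defective; an inspection flags 90% of defective
# items and 10% of good items.  Given a flag, how likely is it that the item is defective?
post = posterior([0.05, 0.95], [0.90, 0.10])
print(post[0])   # about 0.32
```

Even with a fairly accurate inspection, a flagged item is more likely to be good than defective here, because defective items are rare to begin with.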

PrecisionTree Add-in

An Excel Add-In, supplied with Albright, S. C., Winston, W. L. & Zappe, C. (1999), Data Analysis and Decision Making with Microsoft Excel, Duxbury Press, Brooks/Cole, Pacific Grove, CA, that may be used to evaluate decisions using decision trees or influence diagrams.

Prediction Interval

 

Principle of Parsimony

 

Prior Probability

The probability of an event before new information is taken into account; the starting point of a Bayes’ calculation.

Probability Sample

A sample which is selected according to a random mechanism, such as a table of random numbers.

Process Capability

The state of a process which determines how well it is able to meet specifications when operating in its natural state.

Process Capability Analysis

An analytical process to determine how well a process can meet its specifications when operating normally.

Process Capability Indices

Indices which show numerically how well a process can meet its specifications when operating normally.  If the Process Capability Indices have a value of 1.0 or better, then the process is called ‘capable’.  Typical indices are Cpk and Cp.
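
A minimal sketch, assuming the common definitions Cp = (USL - LSL) / (6 sigma) and Cpk = min(USL - mu, mu - LSL) / (3 sigma), where mu and sigma are the process mean and standard deviation.  The process figures are hypothetical:

```python
def cp(usl, lsl, sigma):
    """Potential capability: specification width relative to the 6-sigma process spread."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    """Actual capability: distance from the mean to the nearer specification limit, in units of 3 sigma."""
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Hypothetical process: specifications 9.0 to 11.0, mean 10.2, standard deviation 0.25.
print(cp(11.0, 9.0, 0.25), cpk(11.0, 9.0, 10.2, 0.25))
```

Cp (about 1.33) ignores where the process is centred, while Cpk (about 1.07) is lower because the mean sits closer to the upper specification limit.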

Process

Business Process Re-engineering

 

Product Mix Model

A factory, say, may be able to make a number of different products.  A Product Mix model has as its output the quantity of each product to be made so as to optimise the objective function, subject to the given constraints.

Proportional Sample Sizes

Sample sizes for the strata that are proportional to the sizes of the strata in the population.

Proportionality

The proportionality property means that if the level of an activity is multiplied by a constant factor, then the contribution of this activity to the objective or to any of the constraints in which this activity is involved is multiplied by the same factor.

p-value

The probability that, if the null hypothesis were true, a value of the test statistic at least as extreme as that observed would have occurred.  A small p-value is taken as evidence against the null hypothesis, and so as an indication that it should be rejected.
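
A minimal sketch for the case where the test statistic is a z value, assumed to follow the standard Normal distribution when the null hypothesis is true (for a t-value the t distribution would be used instead).  The function names are illustrative:

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z >= z) for a standard Normal random variable Z."""
    return 0.5 * erfc(z / sqrt(2))

def z_test_p_value(z, two_sided=True):
    """p-value of an observed z statistic, assuming it is standard Normal under the null hypothesis."""
    if two_sided:
        return 2 * upper_tail(abs(z))   # values at least as extreme in either direction
    return upper_tail(z)                # one-sided test in the upper direction

print(z_test_p_value(1.96))   # close to 0.05
```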

 

Q.  Q

Quadratic Loss Function

A quadratic function, first introduced by Taguchi, who formulated it as measuring societal loss when a product is off-target.  The quadratic function achieves its minimum when the product quality characteristic has its distribution centred on its target specification.
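
In the usual form of the loss function, with y the quality characteristic, T its target and k a cost constant (notation assumed here, not from the original), the expected loss for a process with mean mu and standard deviation sigma splits into a variance term and an off-target term.  For a given sigma it is therefore smallest when mu = T:

```latex
L(y) = k\,(y - T)^2
\qquad\Longrightarrow\qquad
E\left[L(Y)\right] = k\,E\left[(Y - T)^2\right] = k\left[\sigma^2 + (\mu - T)^2\right]
```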

Quality Assurance

 

Quality Control (QC)

The section of an organisation, or the process, whose brief is to monitor the quality characteristics of products.

Quality Function Deployment (QFD)

QFD is a planning tool whose purpose is to design quality into a product or service by starting from customer needs.  It then translates these through a number of iterations into product and process specifications.

Quantile-Quantile (Q-Q) plot

An informal, graphical test of whether an observed distribution is Normal or not.

Quartiles

The 25th and 75th percentiles of a variable.  That is, the upper quartile has 75% of values below it, the lower quartile has 25% of values below it.

Queueing Theory

 

 

R.   R

Random Numbers

Numbers, supposedly without any structure, but which are representative of the full range of possible values.  They are the basic inputs to simulation studies.

Random Samples

A sample selected by a random selection process.

Random Selection

Selection of a sample from a population that is done randomly.  Technically, selection of a sample such that each possible sample is equally likely to be chosen, and each unit of the population equally likely to be in the sample.

Randomized Experiment

An experiment in which subjects or objects are randomly assigned to the groups that are to be tested for differences.

Randomized Responses

A response whose answer is randomized to counteract the effect of nontruthful responses.  In this way whilst the individual response may be unreliable, the overall estimate is unbiased.
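
The glossary does not specify a design, so this sketch assumes one common scheme, the forced-response design: each respondent privately flips a fair coin, answers the sensitive question truthfully on heads, and simply answers "yes" on tails.  Then P(yes) = 0.5 * pi + 0.5, where pi is the true proportion with the characteristic, so pi can be estimated as 2 * P(yes) - 1.  The simulated population and its true rate are hypothetical:

```python
import random

def simulate_forced_response(true_rate, n, rng):
    """Simulate n respondents under a coin-flip forced-response design; return the number of 'yes' answers."""
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_rate
        heads = rng.random() < 0.5
        # Heads: answer truthfully.  Tails: answer "yes" regardless of the truth.
        if (heads and has_trait) or not heads:
            yes += 1
    return yes

def estimate_rate(yes, n):
    """Invert P(yes) = 0.5 * pi + 0.5 to estimate pi, the proportion with the characteristic."""
    return 2 * yes / n - 1

rng = random.Random(1)
n = 10_000
yes = simulate_forced_response(0.2, n, rng)
estimate = estimate_rate(yes, n)
print(estimate)   # close to 0.2, although no single "yes" reveals an individual's status
```

The price of this protection is extra variability: the estimate is noisier than one from the same number of truthful direct answers.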

Rational Subsample

Rational subsamples are designed so that only common cause variation exists within a sample.  Assignable cause variability, if it exists, occurs as variation between samples.  For example, samples are taken and analysed separately for different machines (not pooled) and separately for different operators.

Range

The difference between the maximum and minimum values of a variable.

Ratio-to-Moving-Average

 

Red Bead Experiment

Deming’s ‘paddle stick’ experiment, used to demonstrate the variability inherent in sampling proportions of red beads (defects) from a box containing both red and white beads, and also how the imperfect manufacture of the ‘paddle stick’ leads to results which deviate from the theoretical Binomial distribution.

Regression Analysis

The term regression comes from one of the first applications of the technique, carried out by Francis Galton (Family Likeness in Stature, Proceedings of the Royal Society of London, 1886, pp.  42-72.) in a series of papers studying the relationships between the heights of children and their parents.  He found that the child of a tall parent (or parents) tended to be tall, but not quite as tall as the parent(s), and that the child of short parent(s) tended to be short, but not quite as short as the parent(s).  There was a tendency for children’s heights to regress towards the population average height.

It is unfortunate that the name is not descriptive of the technique itself but it is over one hundred years too late to complain of the term regression.

Relationship between variables

Variables can be related, or associated, in various ways.  The idea is that knowing the value of one variable gives you information about the likely value of the other variable.

Rejection Region

Those values of a hypothesis test statistic that, if seen, would lead to the rejection of the null hypothesis.

Research Hypothesis

The hypothesis which a research project is set up to test.

Residual Value

 

Response Variable

 

Rework

The process of correcting the defects of products.

Risk Attitude

The term that describes whether a decision maker is ‘risk seeking’, ‘risk averse’, or ‘risk neutral’ in respect of EMV.

Risk Averse

The Risk Attitude where a decision maker trades off some of the EMV for a less risky venture.

Risk Profile

The probability distribution of the outcomes of a decision.

Risk Seeker

The Risk Attitude where a decision maker trades off some of the EMV for a riskier venture.

Risk Tolerance

The parameter that specifies the Exponential Utility Function.  It is approximately equal to the dollar amount, R, such that the decision maker is indifferent between:

·        Obtaining no payoff at all,

·        Obtaining a payoff of $R or the loss of $R/2, depending on the flip of a fair coin.
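
A minimal sketch, assuming the usual form of the exponential utility function, U(x) = 1 - exp(-x / R), with a hypothetical risk tolerance of $10,000.  It computes the certainty equivalent of the coin-flip gamble above, which comes out at roughly -0.8% of R rather than exactly zero; this is why the definition says "approximately":

```python
from math import exp, log

def exp_utility(x, r):
    """Exponential utility with risk tolerance r: U(x) = 1 - exp(-x / r)."""
    return 1 - exp(-x / r)

def certainty_equivalent(outcomes, probs, r):
    """Sure amount whose utility equals the expected utility of the gamble."""
    eu = sum(p * exp_utility(x, r) for x, p in zip(outcomes, probs))
    return -r * log(1 - eu)   # invert U to convert expected utility back to dollars

R = 10_000   # hypothetical risk tolerance, in dollars
# The gamble from the definition: win R or lose R/2, each with probability 0.5.
ce = certainty_equivalent([R, -R / 2], [0.5, 0.5], R)
print(ce)    # about -83, close to the zero payoff of not gambling
```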

RiskView

RiskView is a part of the Decision Tools Suite that shows the graph of any input probability distribution.

Robust to Violations of Normality

Many tests and analyses involve an inherent assumption that the data is normal in distribution.  An analysis is robust if it is not too sensitive to departures from this assumption, that is, if it is still reasonably accurate even if the data is not exactly Normal.

Rolling Planning Horizon Model

This is an Aggregate Planning Model where the time horizon is fixed at a certain number of periods ahead.

Root Mean Square Error (RMSE)

 

R-squared

 

Runs Test

 

 

S.    S

6-Sigma

Six standard deviations (plus or minus) from the mean.  Also an approach to statistically controlling a process so that its specification limits lie six standard deviations either side of the mean, i.e., so that only about 0.002 ppm of output lies outside the specification limits (Motorola).
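
A minimal sketch, assuming a Normally distributed process, of the fraction of output (in parts per million) beyond limits set k standard deviations either side of the target.  The optional shift is an assumption added for comparison: the 3.4 ppm figure often quoted for Six Sigma corresponds to a process mean that has drifted 1.5 standard deviations from target:

```python
from math import erfc, sqrt

def ppm_outside(k, shift=0.0):
    """Parts per million of a Normal process falling beyond limits at +/- k standard deviations
    from the target, when the process mean sits `shift` standard deviations above the target."""
    upper = 0.5 * erfc((k - shift) / sqrt(2))   # P(X > target + k sigma)
    lower = 0.5 * erfc((k + shift) / sqrt(2))   # P(X < target - k sigma)
    return (upper + lower) * 1e6

print(ppm_outside(6))         # about 0.002 ppm for a perfectly centred process
print(ppm_outside(6, 1.5))    # about 3.4 ppm with a 1.5-sigma shift in the mean
```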

Sample

A subset (part) of a population, chosen from the population, usually so as to be in some way representative of the population.

Sampling

The process of obtaining sampling units.

Sampling Distribution of the Sample Mean

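The distribution of the sample mean over repeated samples of the same size, e.g., Normal, or the t-distribution for the standardised mean when the standard deviation is estimated from the sample.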

Sampling Distributions

The distributions of statistics calculated from samples, e.g., the Normal, t, chi-squared and F distributions.

Sampling Error

The error contribution resulting from the sampling process.  As the sample size becomes larger the sampling error becomes smaller.

Sampling Frame

The list of all units from a population from which a sample is to be drawn.

Sampling Interval

In a systematic sample, the gap between selected units, e.g., every 10th unit for a one-tenth sample.

Sampling Unit

The basic unit in a population that can be selected.

Seasonal Adjustment

A means of taking out the effect of regular seasonal impacts on a time series of data, to enable the trends to be seen more clearly.  There are a number of techniques for doing this.

Seasonal Pattern

A regularly repeating pattern in a time series, that repeats every year.

Sensitivity Analysis

A standard part of many analyses, used to see the impact that the numerical assumptions have on the outcome.  It is often carried out by asking ‘what-if’ questions.

Sensitivity Graph

A graph that shows how the solution changes with changes in particular numerical assumptions.

Set Covering Model

In a set-covering model, each member of a given set must be ‘covered’ by an acceptable member of another set.  The objective is to minimise the number of members of the second set to cover all of the first set, e.g., fire stations covering city areas, or the location of hubs for airlines.

Seven-Step OR methodology

One version is …

Another description is given in OR_Intro.doc.

Shadow Price

The change in the optimal value of the objective function that results from relaxing a constraint by one unit.

Shewhart Chart

The process control charts first formulated by Walter A. Shewhart.  The term generally refers to the X-bar and R charts.

Sigma-hat

The unbiased sample estimator of the population standard deviation.

Significance level of the Test

The (chosen) probability of a type I error.

Simple Exponential Smoothing

 

Simple Random Sample

A random sampling process which takes no account of any population characteristics, but gives each population unit an equal chance of being selected.
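
A minimal sketch using Python's standard `random` module, with a hypothetical sampling frame of 100 numbered units; the seed is fixed only so that the example is repeatable:

```python
import random

population = list(range(1, 101))         # hypothetical sampling frame of 100 unit IDs
rng = random.Random(42)                  # seeded so the example is repeatable
sample = rng.sample(population, k=10)    # 10 distinct units; every subset of 10 is equally likely
print(sorted(sample))
```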

Simple Regression

 

Simplex Method

The mathematical technique for solving LP problems.

Simulating Correlated Values

Where there are multiple input variables to be simulated, it is often unrealistic to assume that these variables are independent of each other.  In this case correlated values need to be simulated.  @Risk is able to generate these values.
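
This sketch shows the textbook construction for a pair of standard Normal inputs with correlation rho: mix two independent draws as y = rho * z1 + sqrt(1 - rho^2) * z2.  It is an illustration of the idea only and makes no claim about how @Risk generates correlated values internally:

```python
import random
from math import sqrt

def correlated_normals(rho, n, rng):
    """Generate n pairs of standard Normal values whose correlation is rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        # x reuses z1; y mixes z1 with independent noise so that corr(x, y) = rho.
        pairs.append((z1, rho * z1 + sqrt(1 - rho ** 2) * z2))
    return pairs

def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

pairs = correlated_normals(0.8, 20_000, random.Random(0))
r = sample_correlation(pairs)
print(r)   # close to 0.8
```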

Simulation Model

A model that has one or more input variables that are subject to a probability distribution.

Simulation

Deterministic/Stochastic  Discrete Event  Continuous Replicative

Single-stage decision problem

A decision problem with only one decision to be made.

Skewed data

Data that has a frequency distribution that is not symmetrical, or evenly balanced, about the centre.  See positively and negatively skewed.

Skewed to the Left

Data for which the frequency distribution is not symmetrical, but extends further (to more values) below the centre (mean/median/mode) than above it.  I.e. negatively skewed.

Skewed to the right

Data for which the frequency distribution is not symmetrical, but extends further (to more values) above the centre (mean/median/mode) than below it.  I.e. positively skewed data.

Smoothing Constant

 

Smoothing Method

 

Solver Add-In

The add-in, developed by Frontline Systems, that is available with Excel for solving LP and other problems.

SolverTable Add-In

The add-in used in conjunction with Solver to solve for a range of alternative assumptions.  Used for performing Sensitivity Analysis.

Sources of Estimation Error

Identified reasons contributing to error in a statistical estimate, e.g., sampling error and nonsampling error.

Span

 

Special causes

Another term for assignable causes.

Spider Graph

A graph that shows how the base solution changes, in percentage terms, with changes to particular numerical assumptions.

Spurious Correlation

 

SSE (sum of squared errors)

 

SSR (sum of squares due to regression)

 

SST (total sum of squares)

 

Stacked Boxplot (side-by-side Boxplots)

A set of box plots of a variable, with one boxplot for each category value of a second variable.  The boxplots are stacked, or drawn side by side, to enable easy comparisons.

Standard deviation

A measure of variation in data, the square root of the variance.  A PowerPoint lesson on basic statistical summary measures is available in Summary_measures.ppt.

Standard Error of Estimate

 

Standard Error of the Mean

The standard deviation of the sample mean.

Static Workforce Scheduling

A model that allocates the number of employees required on different days, say, subject to demand and work constraints.

Statistical Inference

The process of drawing conclusions about population characteristics on the basis of statistical samples.

Statistical Model

 

Statistical Process Control (SPC)

SPC is the method of monitoring the output quality of a process by using statistical means, in particular, control charts.

Statistically Significant at the alpha level

The p-value is less than alpha, so the hypothesis test at the level alpha leads to the conclusion that the null hypothesis is rejected.

Statistics

Statistical Analysis

 

Statistical Decision Theory

Decision Analysis

Statistical Process Control

 

StatPro

This Add-In to Excel enables one to carry out basic statistical analyses and produce simple but effective graphical summaries of data.  In particular, the box plot feature is excellent.  StatPro is distributed with a good book: Albright, S. C., Winston, W. L. & Zappe, C. (1999), Data Analysis and Decision Making with Microsoft Excel, Duxbury Press, Brooks/Cole, Pacific Grove, CA.  The book also comes with other Add-Ins, including @Risk and decision tree software.

Steady state

 

Stochastic Optimisation

 

Straightline Relationship

 

Stratified Sampling

A sampling process which first divides the population into strata on the basis of a particular characteristic and then takes a random sample from each stratum.
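
A minimal sketch of stratified sampling with proportional sample sizes: each stratum receives a share of the total sample equal to its share of the population, and a simple random sample is then drawn within each stratum.  The strata, their sizes and the function name are hypothetical:

```python
import random

def stratified_sample(strata, total_n, rng):
    """Stratified random sample with proportional allocation of total_n across the strata."""
    pop_size = sum(len(units) for units in strata.values())
    chosen = {}
    for name, units in strata.items():
        # Proportional allocation; rounding can make the sizes sum to slightly more or less than total_n.
        n_h = round(total_n * len(units) / pop_size)
        chosen[name] = rng.sample(units, n_h)   # simple random sample within the stratum
    return chosen

# Hypothetical sampling frame of 1000 units split into three regions.
strata = {"north": list(range(500)), "south": list(range(500, 800)), "east": list(range(800, 1000))}
chosen = stratified_sample(strata, 100, random.Random(5))
print({name: len(units) for name, units in chosen.items()})
```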

Stratify (with Pivot Tables)

Divide data into strata on the basis of a particular characteristic.

Subpopulation Strata

A sub-division of a population which shares a common characteristic.

Subsample

A sample taken from a sample.

Supply Chain

 

Survey

The process of going out and asking a set of questions (by mailed questionnaire, interview, telephone, etc.) or inspecting, to collect data from a sample.

Symmetrical Distribution

A frequency distribution that is evenly balanced on either side of the centre (mean/median/mode).

Systematic Relationship

A relationship between variables that persists and continues.

Systematic Sampling

A sampling process which samples from a population based on a systematic rule, e.g., selecting every kth unit in the sampling frame after a random start.
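
A minimal sketch of one common systematic rule: choose a random start among the first k units, then take every kth unit, where k is the sampling interval.  The frame and the function name are hypothetical:

```python
import random

def systematic_sample(frame, k, rng):
    """Take every k-th unit from the sampling frame, starting at a random position among the first k."""
    start = rng.randrange(k)        # random start between 0 and k - 1
    return frame[start::k]

frame = list(range(1, 1001))        # hypothetical frame of 1000 numbered units
chosen = systematic_sample(frame, 10, random.Random(3))   # a one-tenth sample
print(chosen[:5])
```

If the frame has a repeating pattern whose period matches the interval, a systematic sample can be badly unrepresentative, which is worth checking before using this method.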

System

IT/General

 

T.   T

Tail of a Distribution

The ends of a frequency distribution where the frequencies are small.

Target (objective) Cell

When setting up an LP model for Solver, this is the cell that contains the objective function and is to be optimised.

t-distribution

The (theoretical) distribution of the standardised sample mean (that is, the difference between the sample mean and the true mean, divided by the standard deviation of the mean), when the standard deviation is estimated from the sample data.

Test for Normality

A test of the hypothesis that the data observed comes from a Normally distributed variable.

Test Statistic

A value calculated from the sample used to test a hypothesis.

The Law of Total Probability

The Law of Total Probability gives a way of dividing up the probability of an outcome by basing it on a conditioning event.
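
Written out for conditioning events B_1, B_2, ... that are mutually exclusive and together cover all possibilities (notation assumed here), with the two-event case of B and its complement alongside:

```latex
P(A) = \sum_{i} P(A \mid B_i)\,P(B_i)
\qquad\text{e.g.}\qquad
P(A) = P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})
```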

The Value Model

The Value Model provides a means of transforming decisions and outcomes into monetary values.

The Value of Information

The value of information is the increase in the expected value of a decision that the information brings, compared with deciding without it.
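
One common way to put an upper bound on the value of information is the expected value of perfect information (EVPI): the expected payoff if the outcome were known before deciding, minus the best EMV available without that knowledge.  A minimal sketch with a hypothetical payoff table:

```python
# Hypothetical payoff table and outcome probabilities.
probabilities = {"low": 0.3, "medium": 0.5, "high": 0.2}
payoffs = {
    "small plant": {"low": 40, "medium": 50, "high": 60},
    "large plant": {"low": -20, "medium": 60, "high": 150},
}

def best_emv():
    """Best expected monetary value when deciding without further information."""
    return max(sum(probabilities[o] * row[o] for o in probabilities) for row in payoffs.values())

def ev_with_perfect_information():
    """Expected payoff if the outcome were revealed first: choose the best alternative for each outcome."""
    return sum(p * max(row[o] for row in payoffs.values()) for o, p in probabilities.items())

evpi = ev_with_perfect_information() - best_emv()
print(evpi)   # about 18 with these made-up numbers
```

No real (imperfect) information source should be worth more than the EVPI, which makes it a useful ceiling on what to pay for a survey or test.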

Time Series Analysis

Analysis of time series data, often with the aim of forecasting future values of the series.  The analysis involves finding patterns in the time series, including trend and seasonality as examples of such patterns.

Time Series Data

A set of values of a variable at different times, usually regular times (e.g. yearly, monthly, daily, or hourly).  Each time is a case for the variable.

Time Series Plot

A graph of time series data, usually with time as the horizontal axis variable.

t-multiple

Same as t-value

TopRank

TopRank is a part of the Decision Tools Suite.  It is used to cycle through the input variables to determine the impact of these variables on the output variables.

Tornado Graph

A graph that presents the impact that the possible range of each of a number of parameters has on the outcome.  The impacts are usually sorted from greatest to least and presented from the top to the bottom of the graph, giving a funnel shape; hence the name.

Total Quality Management (TQM)

A management philosophy based on the teaching of W. Edwards Deming, first developed in Japan.  The guiding principles for this approach are found in Deming’s 14 points.

Trade-Off

 

Transient state

 

Transportation Model

A specific model that links demand and supply locations.  The objective is to minimise costs.  It often has a particular structure that allows for the application of a fast solution algorithm.

Transshipment Model

This is very similar to the Transportation model except that a demand point may also ‘transship’ to another demand point in order to minimise costs.

Treatment Group

A sample group, in an experiment, that has the same value of the factor of interest (treatment).

Triangular Distribution

A probability distribution that looks like a triangle and is thus determined by three points: the minimum, the most likely value and the maximum.
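
A minimal sketch of sampling a triangular distribution with Python's `random.triangular`, using hypothetical minimum, most likely and maximum values, and checking the simulated mean against the theoretical mean (minimum + most likely + maximum) / 3:

```python
import random

rng = random.Random(7)
low, mode, high = 10.0, 15.0, 30.0   # hypothetical minimum, most likely and maximum values
# Note the argument order of random.triangular: low, high, then mode.
draws = [rng.triangular(low, high, mode) for _ in range(50_000)]
theoretical_mean = (low + mode + high) / 3   # mean of a triangular distribution
print(sum(draws) / len(draws), theoretical_mean)
```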

t-value

The test statistic used to test a null hypothesis about a mean. The standardised sample mean (that is, the difference between the sample mean and the true mean, divided by the standard deviation of the mean), when the standard deviation is estimated from the sample data.
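
A minimal sketch of the t-value for a hypothesised mean mu0, using hypothetical data.  The denominator is the estimated standard error of the mean, the sample standard deviation divided by the square root of n; the result would then be compared with the t-distribution on n - 1 degrees of freedom:

```python
from math import sqrt
from statistics import mean, stdev

def t_value(sample, mu0):
    """t statistic for H0: population mean = mu0, with the standard deviation estimated from the sample."""
    std_error = stdev(sample) / sqrt(len(sample))   # estimated standard error of the mean
    return (mean(sample) - mu0) / std_error

print(t_value([5.1, 4.9, 5.3, 5.2, 5.0], 5.0))   # hypothetical data; about 1.41
```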

Two Way Sensitivity Analysis

Much of sensitivity analysis is conducted ‘one variable at a time’.  This analysis varies two variables at a time, and its results are presented graphically in the same way.

Two-Sided Confidence Interval

A confidence interval with both an upper and a lower limit.  The most usual confidence interval form.

Two-tailed

Refers to both the upper and lower tails of a distribution, used in finding two-sided confidence intervals or carrying out two-sided hypothesis tests.  A two-tailed hypothesis test is one where the alternative hypothesis is simply that the null hypothesis is false, so the null hypothesis is rejected if the test statistic is either too big or too small.

Type I Error

The error of rejecting the null hypothesis when it is true.

Type II Error

The error of accepting the null hypothesis when it is false.

 

U.   U

Unbiased Estimate

A sample estimate of a population characteristic whose expected value (its average over all possible samples) equals the true value of that characteristic.
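
A small exact check of unbiasedness, using a hypothetical population of three equally likely values: averaging the sample variance over every possible sample of size 2 (drawn with replacement) recovers the population variance when the divisor is n - 1, but not when it is n.  Exact fractions avoid rounding error:

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2]                      # hypothetical population, values equally likely
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)   # population variance

def sample_variance(sample, divisor_offset):
    """Sum of squared deviations from the sample mean, divided by (n - divisor_offset)."""
    m = Fraction(sum(sample), len(sample))
    ss = sum((Fraction(x) - m) ** 2 for x in sample)
    return ss / (len(sample) - divisor_offset)

n = 2
samples = list(product(population, repeat=n))   # every possible sample of size 2, with replacement
avg_unbiased = sum(sample_variance(s, 1) for s in samples) / len(samples)   # divisor n - 1
avg_biased = sum(sample_variance(s, 0) for s in samples) / len(samples)     # divisor n
print(sigma2, avg_unbiased, avg_biased)   # 2/3 2/3 1/3
```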

Unboundedness

The property of a model whose objective function can be improved without limit, so that it has no finite optimal solution.

Uncertain Outcome  (and its probability)

An outcome that cannot be predicted beforehand but with which a probability can be associated.

Uncertainty

The state of not knowing the outcome of an event.

Unequal Variance (heteroscedasticity)

 

Uniform Distribution

A probability distribution in which all of its X-values have the same probability of occurring; for a continuous variable, the distribution is completely flat between two specified points.

Unrepresentative Sample

A sample which is not representative of the total population, i.e., does not have all of the characteristics of the population from which it was drawn.

Upper Control Limit (UCL)

The upper of the two control limits on a control chart.

Upper Specification Limit (USL)

The upper of the two specification limits, i.e., the largest acceptable value of the quality characteristic.

Utility Function

A mathematical function that relates monetary value to utility on a scale of 0 to 1 (0 = extremely undesirable, 1 = extremely desirable), reflecting the decision maker’s risk attitude.

 

V.   V

Validation of the Fit

 

Valuing a European Call Option

A specific option pricing model for a European call option, which gives its holder the right to buy an asset on a specific date for a specific price.
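
A minimal Monte Carlo sketch, assuming the standard textbook setting of a lognormal (geometric Brownian motion) asset price and risk-neutral valuation: simulate the price on the exercise date, average the call payoff, and discount at the risk-free rate.  All parameter values are hypothetical:

```python
import random
from math import exp, sqrt

def mc_european_call(s0, strike, rate, vol, maturity, n_paths, rng):
    """Monte Carlo value of a European call, assuming a lognormal asset price under risk-neutral valuation."""
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        # Simulated asset price at the exercise date.
        s_t = s0 * exp((rate - 0.5 * vol ** 2) * maturity + vol * sqrt(maturity) * z)
        total += max(s_t - strike, 0.0)             # payoff: exercise only if the price exceeds the strike
    return exp(-rate * maturity) * total / n_paths  # discounted average payoff

# Hypothetical inputs: price 100, strike 100, 5% interest, 20% volatility, one year to expiry.
price = mc_european_call(100, 100, 0.05, 0.2, 1.0, 200_000, random.Random(11))
print(price)   # near the Black-Scholes value of about 10.45 for these inputs
```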

Valuing a More Exotic Call Option

An option pricing model for a more complex option where the payoff may be varied.

Variable

Something for which we have values for the cases in a set of data.

Variance

A measure of the spread of data – essentially an average of the squared deviations of observations about the mean.  A PowerPoint lesson on basic statistical summary measures is available in Summary_measures.ppt.

Variance Reduction

 

 

W.   W

Weighted Least Squares

 

Well-Scaled Model

This is a model where the coefficients are approximately of the same order of magnitude.  A badly scaled model may be difficult to solve because of rounding errors.

Western Electric Rules

A set of rules applied to control charts to monitor for changes in a process.  They include monitoring runs of observations above and below the target and runs up and down, as well as clustering in particular zones of the chart.

Winters' Method

 

Within Sample Variation

The variation (variance) of values of a variable in a particular sample.  In Analysis of Variance, designates the variance between units in the same sample sub-group.

X.    X

X-bar, R Chart

This control chart typically consists of two sub-charts.  The first, the X-bar chart, monitors the process level against its target, and the second, the R chart, monitors variability.  X-bar is the average of the measurements in each sample and R is the range of the measurements in that sample.
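
A minimal sketch of X-bar and R chart limits using the standard tabulated constants for subgroups of size 5 (A2 = 0.577, D3 = 0, D4 = 2.114).  Only three hypothetical subgroups are used to keep it short; in practice the limits are based on many more (often 20 to 25):

```python
from statistics import mean

# Published control chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """(LCL, centre line, UCL) for the X-bar chart and for the R chart, subgroups of size 5."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("these constants apply to subgroups of size 5 only")
    xbars = [mean(g) for g in subgroups]             # sample averages (X-bar)
    ranges = [max(g) - min(g) for g in subgroups]    # sample ranges (R)
    xbarbar, rbar = mean(xbars), mean(ranges)        # centre lines
    xbar_chart = (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar)
    r_chart = (D3 * rbar, rbar, D4 * rbar)
    return xbar_chart, r_chart

subgroups = [[10, 12, 11, 9, 13], [11, 10, 12, 12, 10], [12, 11, 13, 10, 14]]   # hypothetical data
xbar_chart, r_chart = xbar_r_limits(subgroups)
print(xbar_chart, r_chart)
```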

"X-Y" Chart

A two dimensional graph of data on two variables (X and Y) which shows the relationship between the variables.  Also called a Scatterplot.

 

Y.   Y

Y

 

 

Z.    Z

z-multiple

Same as z-score

Zone A Rule

The Zone A rule specifies how many observations beyond two standard deviations from the process target constitute a ‘signal’: 2 out of 3 consecutive observations on the same side of the target.

Zone B Rule

The Zone B rule specifies how many observations beyond one standard deviation from the process target constitute a ‘signal’: 4 out of 5 consecutive observations on the same side of the target.
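
A minimal sketch of checking the Zone A and Zone B rules on a sequence of observations, assuming the process target (centre line) and standard deviation are known.  The function name and data are hypothetical:

```python
def zone_signals(values, centre, sigma):
    """Return (index, rule) pairs where the Zone A rule (2 of 3 beyond 2 sigma) or the
    Zone B rule (4 of 5 beyond 1 sigma) signals, counting points on the same side of the centre."""
    z = [(v - centre) / sigma for v in values]      # distances from the target in sigma units
    rules = ((3, 2, 2, "A"), (5, 1, 4, "B"))        # (window, sigma limit, points needed, rule name)
    signals = []
    for i in range(len(z)):
        for window, limit, needed, rule in rules:
            if i + 1 < window:
                continue                            # not enough observations yet
            recent = z[i + 1 - window : i + 1]
            above = sum(1 for s in recent if s > limit)
            below = sum(1 for s in recent if s < -limit)
            if above >= needed or below >= needed:
                signals.append((i, rule))
    return signals

values = [10, 12.5, 10.2, 12.3, 10, 9.8, 11.2, 11.5, 10.9, 11.1, 11.3]   # hypothetical data
print(zone_signals(values, 10, 1))   # [(3, 'A'), (10, 'B')]
```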

z-score

The standardised sample mean, that is, the difference between the sample mean and the true mean, divided by the standard deviation of the mean, when the standard deviation is a known value.
