From the Syllabus
3.4 Quantitative Skills
With respect to quantitative skills, learners should understand the purposes of, and differences between, the following and be able to use them in appropriate contexts:
- Mean, median, mode, range, interquartile range, and standard deviation
- Lines of best fit and correlation on graphical representations
- Measurement, measurement errors, and sampling.
Note: for each mathematical function, the command to carry it out in Excel is listed.
Quantitative Data – data that is numerical (numbers); the opposite is qualitative data (non-numerical).
Measures of Central Tendency
Mean, median, and mode are all measures of central tendency. Each gives a single figure that summarises all the data collected, which makes it easy to compare different data sets. For example, if you collected a number of air quality readings in two different cities, you could work out the mean, median, or mode for each city to give a single figure that represents all the data from that city.
When we use the term ‘average’ we are usually referring to the ‘mean’, but in fact all three (mean, median, and mode) can be referred to as averages.
Mean – the most common measure, calculated by adding up the individual values and then dividing by the total number of values.
Excel Command: =AVERAGE
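As a quick check of the definition, the mean can be worked out in a couple of lines of Python. The readings below are invented for illustration (assumed air-quality values for two hypothetical cities), not real data.

```python
# Invented air-quality readings for two hypothetical cities (illustration only).
city_a = [12, 15, 11, 14, 18]
city_b = [22, 19, 25, 21, 23]

def mean(values):
    """Add up the individual values, then divide by the number of values."""
    return sum(values) / len(values)

print(mean(city_a))  # 14.0
print(mean(city_b))  # 22.0
```

Each city is now summarised by one figure, which is what makes the comparison easy.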
Median – the middle value in the data set. To calculate the median, arrange all the data in rank order (from smallest to largest) and identify the middle value. If there is an even number of data items, the median is halfway between the two middle values (add the two middle values together and divide by two).
Excel Command: =MEDIAN
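The rank-then-pick-the-middle method translates directly into code. This is a sketch for illustration; Python's own `statistics.median` gives the same results.

```python
def median(values):
    """Rank the values, then take the middle one (or halfway between the two middle ones)."""
    ranked = sorted(values)  # rank order, smallest to largest
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ranked[mid]
    # Even number of values: add the two middle values and divide by two.
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median([7, 3, 9, 1, 5]))      # 5
print(median([7, 3, 9, 1, 5, 11]))  # 6.0
```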
Mode – the most commonly occurring value in a data set.
Excel Command: =MODE
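A small sketch of finding the mode, including the two awkward cases mentioned below (no modal value, and more than one). Returning an empty list when every value occurs only once follows the convention used in these notes; Python's `statistics.multimode` behaves differently and returns every value in that case, and Excel's =MODE reports only a single value.

```python
from collections import Counter

def modes(values):
    """Return every most common value in sorted order; [] means there is no modal value."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Each value occurs only once, so there is no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([2, 4, 4, 5, 7]))         # [4]
print(modes([2, 2, 5, 7, 7]))         # [2, 7]  (two modes)
print(modes([1, 2, 3]))               # []      (no mode)
print(modes(["red", "blue", "red"]))  # ['red'] (categorical data)
```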
As geographers, it is important to know when each measure should be used. All three can be used on very large data sets, particularly if you use Excel to calculate them rather than working them out by hand. Depending on the distribution of the data, a different measure of central tendency may be more appropriate:
Mean:
- This uses every value in the data set and gives a simple overview.
- Works best when there is a narrow range of values, because one or two very high or very low values can skew it so that it is no longer representative of the data set as a whole.
Median:
- This is the middle value, so it is easy to identify once the data is in rank order.
- It is not skewed by extremely high or low values in the data set.
Mode:
- Can be used with categorical (non-numerical) data, e.g. car colour, whereas the mean needs numerical data and the median needs data that can be put in rank order.
- Not affected by extreme values.
- However, a data set can have no modal value (if each value occurs only once) or more than one modal value, which can be confusing.
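The difference in how the mean and median react to extreme values can be shown with one invented data set (hypothetical pebble lengths in cm), before and after adding a single very large value.

```python
from statistics import mean, median

# Invented pebble lengths (cm); the second list adds one unusually large value.
typical = [4, 5, 5, 6, 6, 7]
with_outlier = typical + [40]

print(mean(typical), median(typical))  # 5.5 5.5
print(round(mean(with_outlier), 2))    # 10.43: dragged upwards by one value
print(median(with_outlier))            # 6: barely moves
```

One extreme value almost doubles the mean while the median shifts by only 0.5, which is why the median is often preferred for skewed data.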
Measures of Dispersion
Measures of dispersion are useful alongside measures of central tendency because dispersion describes how spread out a data set is. For example, the data sets 4, 4, 5, 6, 6 and 1, 3, 5, 7, 9 both have a mean and median of 5 but have different dispersions.
Range – the difference between the highest and lowest values in a data set, calculated by subtracting the lowest value from the highest. It is easy to calculate, but it depends entirely on the two extreme values and gives no information about the rest of the data.
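Excel Command: =MAX(A1:A20)-MIN(A1:A20) (Excel has no =RANGE function; A1:A20 stands for the cells containing your data).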
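The range is simply the largest value minus the smallest, mirroring =MAX(...)-MIN(...) in Excel. Applied to the two example data sets from the start of this section:

```python
def data_range(values):
    """Highest value minus lowest value (named to avoid shadowing Python's built-in range)."""
    return max(values) - min(values)

set_a = [4, 4, 5, 6, 6]
set_b = [1, 3, 5, 7, 9]
print(data_range(set_a))  # 2
print(data_range(set_b))  # 8
```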
Interquartile Range – the range covered by the middle 50% of values in a data set, i.e. 25% of the data on either side of the median. It measures the spread of data around the median and ignores the extreme values.
To calculate the interquartile range, first put the data in rank order and find the median. Then find the median of each half of the data: these are the lower quartile and the upper quartile. Finally, subtract the lower quartile from the upper quartile.
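Excel Command: Excel has no single interquartile range function, so calculate the lower (1st) quartile and the upper (3rd) quartile separately with =QUARTILE and subtract one from the other, e.g. =QUARTILE(A1:A20,3)-QUARTILE(A1:A20,1) for data in cells A1:A20. Because Excel interpolates between values when finding quartiles, its answer may differ slightly from the hand method.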
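The hand method can be sketched as below. Two details are assumptions rather than part of the text: that the middle value is left out of both halves when there is an odd number of values, and that Python's `statistics.quantiles` with `method="inclusive"` uses the same interpolation as Excel's =QUARTILE. The comparison shows how the two approaches can disagree on a small data set.

```python
from statistics import median, quantiles

def iqr_by_hand(values):
    """Interquartile range by the median-of-halves method.

    Assumption: with an odd number of values, the middle value (the median)
    is left out of both halves. Needs at least two values.
    """
    ranked = sorted(values)
    n = len(ranked)
    lower_half = ranked[: n // 2]
    upper_half = ranked[(n + 1) // 2 :]
    lower_quartile = median(lower_half)
    upper_quartile = median(upper_half)
    return upper_quartile - lower_quartile

data = [1, 3, 5, 7, 9]
print(iqr_by_hand(data))  # 6.0  (quartiles 2.0 and 8.0)

# Interpolated quartiles; method="inclusive" is assumed to match Excel's =QUARTILE.
q1, _, q3 = quantiles(data, n=4, method="inclusive")
print(q3 - q1)  # 4.0  (quartiles 3.0 and 7.0)
```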
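Standard Deviation – the standard deviation measures the spread of values around the mean. Unlike the range and interquartile range, it incorporates all the values in the data set. The formula for standard deviation is:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
where $x$ is each individual value, $\bar{x}$ is the mean, and $n$ is the total number of values.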
To calculate the standard deviation with this formula, subtract the mean from each value and square the result (the squared deviation), add up all the squared deviations, divide the total by the number of values, and then take the square root.
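Excel Command: =STDEV.P (this divides by the number of values, as described above). Note that =STDEV and =STDEV.S divide by one less than the number of values (n − 1), giving the sample standard deviation, which is slightly larger.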
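The calculation described above, written as a short Python function and applied to the two example data sets from the start of this section. The standard library's `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n − 1) are shown for comparison with Excel's =STDEV.P and =STDEV.

```python
from math import sqrt
from statistics import pstdev, stdev

def standard_deviation(values):
    """Square root of (sum of squared deviations from the mean / number of values)."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / n)

set_a = [4, 4, 5, 6, 6]
set_b = [1, 3, 5, 7, 9]
print(round(standard_deviation(set_a), 3))  # 0.894
print(round(standard_deviation(set_b), 3))  # 2.828
print(round(pstdev(set_b), 3))              # 2.828, divides by n like =STDEV.P
print(round(stdev(set_b), 3))               # 3.162, divides by n - 1 like =STDEV
```

Both sets have a mean of 5, but the much larger standard deviation of the second set reflects its wider spread.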
The smaller the interquartile range or standard deviation, the less dispersed the data is and the more tightly it clusters around the middle value. As a geographer, you then need to use your geographical knowledge to explain the reasons for the spread of the data.
Lines of Best Fit
Lines of best fit can be used on a scatter graph to show the trend in the data and the correlation (whether the general trend is positive or negative). First, a scatter graph needs to be drawn – normally the independent variable is plotted on the x-axis and the dependent variable on the y-axis.
Software such as Excel uses mathematical techniques to draw the line of best fit; however, when drawing a line of best fit by hand, consider the following.
The line of best fit should usually be straight; however, depending on the data being graphed, a curved line of best fit may be more appropriate (for example, when showing exponential growth).
The line of best fit should follow the general trend of the data, have roughly equal numbers of points on either side, and ideally pass through the point given by the mean x value and the mean y value.
The line of best fit indicates the relationship, or ‘correlation’, between the two variables.
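For a straight line, the usual mathematical technique is least squares, which is what Excel's linear trendline uses. The sketch below uses invented river data (independent variable on x, dependent variable on y). It also shows why a fitted straight line passes through the mean x and mean y point.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = intercept + gradient * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    gradient = sxy / sxx
    # Choosing the intercept this way forces the line through (mean x, mean y).
    intercept = y_bar - gradient * x_bar
    return gradient, intercept

# Invented data: distance downstream (km) on x, channel width (m) on y.
distance_km = [1, 2, 3, 4, 5]
width_m = [2, 4, 5, 4, 5]
gradient, intercept = least_squares_line(distance_km, width_m)
print(round(gradient, 2), round(intercept, 2))  # 0.6 2.2
```

A positive gradient corresponds to a positive trend and a negative gradient to a negative one.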
Correlation on Graphs
When a scatter graph is drawn, it shows the relationship between two sets of data. The type of relationship is described as a correlation (positive, negative, or no correlation), as shown in the image below.
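The syllabus describes correlation visually. One common way to put a number on it is Pearson's correlation coefficient r, which is not part of this extract, so treat it as an optional extra. The 0.3 cut-off used to label "no clear correlation" is a hypothetical choice, and the three y data sets are invented to give a positive, a negative, and no correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: +1 is perfect positive, -1 perfect negative, 0 no linear correlation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, threshold=0.3):
    """Label the direction of a correlation; the 0.3 cut-off is hypothetical."""
    if r >= threshold:
        return "positive correlation"
    if r <= -threshold:
        return "negative correlation"
    return "no clear correlation"

xs = [1, 2, 3, 4, 5]
for ys in ([2, 4, 5, 4, 5], [5, 4, 5, 4, 2], [3, 5, 4, 5, 3]):
    r = pearson_r(xs, ys)
    print(round(r, 2), describe(r))
# 0.77 positive correlation
# -0.77 negative correlation
# 0.0 no clear correlation
```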
Sampling
When carrying out a geographical investigation, you are not normally able to collect data on the entire study group (population), so data is collected on a subset instead. For example, if you are carrying out interviews in a shopping centre, you cannot collect data from every person who visits that shopping centre. Sampling refers to the way in which this subset is chosen.
A good sample has the following qualities:
- It is unbiased.
- It is precise.
- It is large enough to give conclusive, statistically significant results.
- It can be collected in the time available with the resources available.
The main types of sampling are:
Random – where items are selected using a random number table or through a random number generator.
Systematic – where items are selected at regular intervals, e.g. every x metres or every 5 minutes.
Stratified – where the population is divided into subgroups (strata) and items are sampled from each one; for example, 5 people from each age group or 3 points in each postcode area.
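The three sampling types above can be sketched in Python with the standard `random` module. The transect points and age-group lists are hypothetical. Picking items at random within each stratum is an assumption, since the text does not say how items within a subgroup are chosen.

```python
import random

rng = random.Random(42)  # seeded only so the example gives the same sample each run

# Hypothetical sampling frame: 100 numbered points along a 100 m transect (1 per metre).
points = list(range(1, 101))

# Random: every point has an equal chance of selection, with no repeats.
random_sample = rng.sample(points, k=10)

# Systematic: every 10th point, starting from the first.
systematic_sample = points[::10]

# Stratified: a fixed number from each subgroup (stratum).
# Assumption: within each stratum the items are picked at random.
age_groups = {
    "16-29": ["A1", "A2", "A3", "A4", "A5", "A6", "A7"],
    "30-59": ["B1", "B2", "B3", "B4", "B5", "B6"],
    "60+": ["C1", "C2", "C3", "C4", "C5", "C6"],
}
stratified_sample = {group: rng.sample(people, k=5) for group, people in age_groups.items()}

print(sorted(random_sample))
print(systematic_sample)  # [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]
print(stratified_sample)
```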
Measurement
When measuring any geographical data during fieldwork, it is important to measure accurately and to record the unit. For example, are you measuring in mm or cm? This is particularly important if several people are collecting the data. It is also important to use the most appropriate tool; for example, when measuring pebbles, callipers are more appropriate than a metre stick.
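Recording the unit alongside every reading makes it possible to convert everything to one unit before analysis. The sketch below assumes readings are stored as (value, unit) pairs; the conversion table and the pebble measurements are hypothetical.

```python
# Assumed conversion factors from each unit to millimetres.
TO_MM = {"mm": 1, "cm": 10, "m": 1000}

def to_mm(value, unit):
    """Convert one reading to millimetres; an unrecognised unit raises KeyError instead of being guessed."""
    return value * TO_MM[unit]

# Invented pebble measurements from three group members, each recorded with its unit.
readings = [(42, "mm"), (3.5, "cm"), (0.25, "m")]
print([to_mm(value, unit) for value, unit in readings])  # [42, 35.0, 250.0]
```

Failing loudly on an unknown unit is deliberate: a silent guess would reintroduce the very confusion over units this step is meant to prevent.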
Potential sources of error when carrying out fieldwork data collection:
- Using the wrong equipment.
- Confusion over units.
- Transcription error – making a mistake when typing up field notes (perhaps because of poor handwriting).
- Estimating rather than measuring.
- Not reading the instrument correctly (for example, not being at the right level when using a clinometer).
- Differences of opinion when collecting qualitative data – for example, what does ‘very little’ litter look like?