Data analysis of the PL
Posted: 02 Oct 2020, 13:18
Hi,
I am studying physics in uni, and much of the work is about gathering, planning, and analyzing data.
With the high expectations from the team in our first season in the PL in 16 years, I decided to download some data from the internet and analyze it, to get some quantitative info about how well we are progressing each week and how many points are usually enough to secure staying in the league, a Europa League spot, a Champions League spot, etc.
The data I used was taken from https://www.football-data.co.uk/englandm.php
Each file contains the match list of a specific season, starting from the 93/94 season up to last season (a csv file for the current season is kept updated as well!).
The stats presented in each file are different, with later seasons featuring more stats. Most if not all contain FT and HT results; other stats documented are ref name, attendance, date, corners, free kicks, shots, shots on target, etc.
I analyze the data using Python code, and I save every piece of data as a csv (opens in Excel), so if anyone is interested in seeing the code and/or getting their hands on the files, just reply in this thread or PM me. I'd love to share it.
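For anyone who wants to play along, here is a minimal sketch of how a final table can be built from one of those match lists. It is not my actual code, just an illustration. It assumes the columns are named HomeTeam, AwayTeam, FTHG and FTAG (check the header of the file you download), uses a made-up three-team sample instead of a real season, and breaks ties by goal difference and then goals scored. Points deductions are not handled.

```python
import csv
import io
from collections import defaultdict

# Toy match list in the assumed shape of a season file (made-up teams and scores).
SAMPLE = """HomeTeam,AwayTeam,FTHG,FTAG
A,B,2,1
B,C,0,0
C,A,1,1
"""

def league_table(rows):
    """Return [(team, points), ...] ordered from 1st place down."""
    pts, gd, gf = defaultdict(int), defaultdict(int), defaultdict(int)
    for r in rows:
        home, away = r["HomeTeam"], r["AwayTeam"]
        hg, ag = int(r["FTHG"]), int(r["FTAG"])
        gf[home] += hg
        gf[away] += ag
        gd[home] += hg - ag
        gd[away] += ag - hg
        if hg > ag:            # home win: 3 pts to home, 0 to away
            pts[home] += 3
            pts[away] += 0
        elif hg < ag:          # away win
            pts[away] += 3
            pts[home] += 0
        else:                  # draw: 1 pt each
            pts[home] += 1
            pts[away] += 1
    # Sort by points, then goal difference, then goals scored.
    order = sorted(pts, key=lambda t: (pts[t], gd[t], gf[t]), reverse=True)
    return [(t, pts[t]) for t in order]

table = league_table(csv.DictReader(io.StringIO(SAMPLE)))
print(table)
```

With a real file you would pass `csv.DictReader(open(path, newline=""))` instead of the sample string; the extra columns are simply ignored.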
I've started with a very basic analysis here, but I hope to keep working on it and find some more interesting stats.
If anyone has an idea for something to check, tell me about it and I'll add it to my to-do list.
One last thing: statistics are not always right; they are rather a means to describe a standard, and they do not claim to predict the future. If there is something that can be said about this particular season, it is that it isn't standard at all! With no crowds at the stadiums, a denser schedule, a prolonged transfer window, and possible corona infections of individual players, staff members and their relatives, it will be interesting to see how this season differs in numbers.
So without further ado, I shall present my results:
* In order to finish in 17th place, which secures another season in the first tier, in an ordinary season a club would have to secure 38 points on average, with a standard deviation of 2.47 points, where the maximum upper deviation is 6 points (i.e. 44 pts) and the maximum lower deviation is 4 points (i.e. 34 pts). The median for this position is also 38 points.
- A mean of 8.96 +- 2.32 pts should be gained against teams that will be relegated (out of 18 possible)
- A mean of 6.84 +- 3.20 pts should be gained against teams that will finish in lower midtable (places 14-17) (out of 18 possible, since we finish 17th)
- A mean of 13.2 +- 3.24 pts should be gained against teams that will finish in upper midtable (places 8-13) (out of 36 possible)
- A mean of 5.28 +- 2.32 pts should be gained against teams that will qualify for the Europa League (places 5-7) (out of 18 possible)
- A mean of 3.04 +- 2.20 pts should be gained against teams that will qualify for the Champions League (places 2-4) (out of 18 possible)
-** A mean of 0.8 +- 1.26 pts should be gained against the team that will win the league (1st place) (out of 6 possible)
* In order to finish in a top half position (10th or higher) we will need to secure an average of at least 49 pts, with a standard deviation of 2.66 pts, where both the max upper & lower deviations are 5 pts (i.e. 54 & 44 respectively). The median for 10th place is 50 pts.
- A mean of at least 11.24 +- 2.87 pts should be gained against teams that will be relegated (out of 18 possible)
- A mean of 13.4 +- 3.12 pts should be gained against teams that will finish in lower midtable (places 14-17) (out of 24 possible)
- A mean of 13.76 +- 3.33 pts should be gained against teams that will finish in upper midtable (places 8-13) (out of 30 possible, since we finish 10th)
- A mean of 6.08 +- 2.15 pts should be gained against teams that will qualify for the Europa League (places 5-7) (out of 18 possible)
- A mean of 3.96 +- 2.57 pts should be gained against teams that will qualify for the Champions League (places 2-4) (out of 18 possible)
-** A mean of 0.8 +- 0.94 pts should be gained against the team that will win the league (1st place) (out of 6 possible)
* In order to sneak into a European qualification spot (assuming 1 cup will be won by a top 5 club, hence 6th place will lead to Europe) we will need to secure an average of at least 61 pts, with a higher standard deviation of 3.75 pts, where both the max upper & lower deviations are 8 pts (i.e. 69 & 53 respectively). The median for 6th is 61 pts.
- A mean of 12.24 +- 3.13 pts should be gained against teams that will be relegated (out of 18 possible)
- A mean of 14.4 +- 3.26 pts should be gained against teams that will finish in lower midtable (places 14-17) (out of 24 possible)
- A mean of 20.76 +- 2.87 pts should be gained against teams that will finish in upper midtable (places 8-13) (out of 36 possible)
- A mean of 5.68 +- 2.54 pts should be gained against teams that will qualify for the Europa League (places 5-7) (out of 12 possible, since we finish 6th)
- A mean of 6.68 +- 2.59 pts should be gained against teams that will qualify for the Champions League (places 2-4) (out of 18 possible)
-** A mean of 1.44 +- 1.36 pts should be gained against the team that will win the league (1st place) (out of 6 possible)
* For the cheeky ones out there: to win the league we will need a high average of 87 pts, with a high standard deviation of 6.5 pts, where the max upper deviation is 13 pts (Man City with 100 pts) and the max lower deviation is 12 pts (scum with 75 pts). The median is the same as the average: 87 pts.
- A mean of 16.28 +- 1.40 pts should be gained against teams that will be relegated (out of 18 possible)
- A mean of 19.32 +- 3.32 pts should be gained against teams that will finish in lower midtable (places 14-17) (out of 24 possible)
- A mean of 27.84 +- 3.55 pts should be gained against teams that will finish in upper midtable (places 8-13) (out of 36 possible)
- A mean of 12.76 +- 3.22 pts should be gained against teams that will qualify for the Europa League (places 5-7) (out of 18 possible)
- A mean of 11.2 +- 2.97 pts should be gained against teams that will qualify for the Champions League (places 2-4) (out of 18 possible)
-** No need to take any points as we are the champions!
** The last stat (pts against the champions) seems to give nonsensical values, with a standard deviation that is greater than the value itself. This is because, as I mentioned before, statistics don't do well with little data. When it comes down to 1 or 2 games per season there are too many variables to take into account (injuries, momentum, whether the champions have already secured the title, etc.), and so the statistics usually don't tell us much. I am considering merging the champions into the CL-qualified group to make it places 1-4. I'll be happy to hear your opinion.
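The split by opponent group can be sketched roughly like this. Again, this is an illustration rather than my real code: the group names, the `(home, away, home_goals, away_goals)` tuple layout and the `final_position` lookup are assumptions for the example, and the places are the ones used in the lists above for a 20-club season. The `max_points` helper reproduces the "out of X possible" numbers: 2 games x 3 pts against every opponent in the group, not counting ourselves.

```python
from collections import defaultdict

# Opponent groups by final league position (20-club season), as in the lists above.
GROUPS = {
    "champions": range(1, 2),
    "champions league": range(2, 5),
    "europa league": range(5, 8),
    "upper midtable": range(8, 14),
    "lower midtable": range(14, 18),
    "relegated": range(18, 21),
}

def group_of(position):
    """Name of the group a final league position belongs to."""
    for name, places in GROUPS.items():
        if position in places:
            return name
    raise ValueError(f"position {position} is outside 1-20")

def max_points(own_position):
    """Points available against each group: 6 per opponent, excluding ourselves."""
    return {
        name: 6 * sum(1 for p in places if p != own_position)
        for name, places in GROUPS.items()
    }

def points_by_group(team, matches, final_position):
    """Points `team` took against each opponent group.

    matches: iterable of (home, away, home_goals, away_goals)
    final_position: dict mapping team name -> final league place
    """
    out = defaultdict(int)
    for home, away, hg, ag in matches:
        if team not in (home, away):
            continue
        opponent = away if team == home else home
        scored, conceded = (hg, ag) if team == home else (ag, hg)
        pts = 3 if scored > conceded else 1 if scored == conceded else 0
        out[group_of(final_position[opponent])] += pts
    return dict(out)

print(max_points(17)["lower midtable"])  # 3 other teams in places 14-17
print(max_points(10)["upper midtable"])  # 5 other teams in places 8-13
```

Running `points_by_group` for the team in a given place in every season and then taking the mean and standard deviation per group gives numbers of the kind listed above.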
Enjoy and stay tuned.
Edit:
I realized some might not be familiar with the exact meaning of the statistical terms, so I'll explain briefly:
Mean value - the same as the average throughout my analysis. The mean doesn't have to be a value within the population or even a possible result.
Standard deviation - a measure of how spread out the values are around the mean. For roughly normally distributed data, about 68% of the values fall within one standard deviation of the mean (34.1% on each side). In practice it means that if we try to guess, say, how many points 17th place is going to have this (or any other) season, the best guess would be the mean (38) +- the standard deviation (2.47) with a confidence level of 68%, meaning that in about 68% of seasons the value would be within 38 +- 2.47 pts.
Median - the middle number in a sorted set of values. Say we have the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}: the median would be (5+6)/2 = 11/2 = 5.5. The median also doesn't have to be a value in the population or even a possible value (it can be a decimal, e.g. 37.5).
Maximum upper/lower deviation - the largest deviation from the mean to either side: upper -> the highest value above the mean, lower -> the lowest value below the mean.
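To make the four terms concrete, here is a small sketch using Python's built-in `statistics` module. The `summarize` function name is just for illustration, and it uses the population standard deviation (`pstdev`); whether the sample version (`stdev`) fits better is a judgment call. It is run on the toy set {1, ..., 10} from the median example, not on real league data.

```python
import statistics

def summarize(values):
    """Mean, standard deviation, median and max deviations of a list of numbers."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption, see above
    return {
        "mean": mean,
        "sd": sd,
        "median": statistics.median(values),
        "max_upper_dev": max(values) - mean,
        "max_lower_dev": mean - min(values),
        # Fraction of values that actually fall within mean +- 1 SD.
        "share_within_1sd": sum(abs(v - mean) <= sd for v in values) / len(values),
    }

stats = summarize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(stats["median"])            # 5.5, as in the example above
print(stats["share_within_1sd"])
```

Note that the share within one standard deviation is only about 68% when the data is roughly bell-shaped; for a small or lopsided sample it can be quite different, which is worth checking before quoting the 68% figure.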
2nd edit:
For a lot of the stats I present I dropped the 93/94 & 94/95 seasons as they featured 22 clubs, so things like the absolute number of points are bound to deviate greatly from a 20-club season.
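One way to do that filtering automatically, rather than hard-coding the two seasons, is to count the distinct clubs in each match list and keep only the 20-club seasons (a 22-club season has 42 games per club instead of 38, so point totals are not comparable). This is only a sketch: the `seasons` dict and the tuple layout are assumptions for the example.

```python
def clubs_in_season(matches):
    """Number of distinct clubs appearing in a season's match list."""
    teams = set()
    for home, away, *_ in matches:
        teams.update((home, away))
    return len(teams)

def keep_20_club_seasons(seasons):
    """Keep only seasons with exactly 20 clubs.

    seasons: dict mapping a season label to a list of
             (home, away, home_goals, away_goals) tuples.
    """
    return {
        label: matches
        for label, matches in seasons.items()
        if clubs_in_season(matches) == 20
    }
```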