
Pokemon Study: Statistical Investigation for Team Development


I am a competitive Pokemon player, and I build my teams around a good balance of stats. For instance, I need a “special sweeper” with high Special Attack and Speed to take out enemies quickly. However, special sweepers do not have high defenses, so I also need “tanks” or “walls” with high Defense or Special Defense to make up for the sweeper’s shortcomings (Merrick). This balance in stats is integral for my team to be adaptable to any opposing team that I may battle. Therefore, I thought it useful to develop a mathematical method of comparing Pokemon against one another in terms of base stats so that I can determine quantitatively which Pokemon can serve as sweepers, walls, tanks, etc. How a Pokemon compares to others stats-wise determines its niche and optimal role in battle, which is very useful for team-building. The aim of this investigation is to determine how to statistically balance Pokemon stats to develop an adaptable team, as well as how the Pokemon company could adjust the pool of available competitive Pokemon to create the most balanced distribution (an idealized normal distribution) of base stats. In order to standardize and compare the relative powers of different Pokemon for the purpose of this mathematical investigation, only Pokemon Base Stats will be considered.

Pokemon is a video game that allows the player to capture fictitious creatures in storage devices called “Pokeballs.” People who own these Pokemon are able to train, trade, and battle them with other players. How these Pokemon fare in battle depends primarily upon their numerical “stats,” which consist of Health (HP), Attack, Defense, Special Attack, Special Defense, and Speed. A Pokemon can have a maximum of four moves, which may consist of physical moves (contact moves, such as Tackle, whose damage potential depends on the user’s Attack stat), special moves (non-contact moves, such as Solar Beam, whose damage potential depends on the user’s Special Attack stat), and/or other moves (which may inflict status conditions such as paralysis, increase/decrease the user’s or the target’s stats, or perform various other functions). Pokemon also have Defense stats (to guard themselves from physical moves) and Special Defense stats (to guard themselves from special moves). The Pokemon with the highest Speed stat in a battle generally gets to make a move first in a given turn. When a Pokemon increases in Level (which ranges from Level 1 to Level 100), its stats increase. However, each Pokemon species has a fixed set of underlying values, known as Base Stats, from which its stats at any level are calculated (Merrick).

Derivation for the Equation of a Normal Distribution (Wilson)

For data that is normally distributed, the chance that a certain value occurs decreases as that value deviates farther from the mean of the data. As a first attempt, this can be expressed quantitatively by making the rate at which frequencies (f) change with respect to values (x) proportional to the difference between the value and the mean of the data set, that is, $df/dx = -k(x - \mu)$. However, integrating this gives a downward-opening parabola, which would eventually result in a distribution with negative frequencies. A realistic distribution would have the frequencies asymptotically approach 0 as the values deviate farther from the mean. Hence, the rate of change of the frequencies is also made proportional to the frequencies themselves, so that as the frequencies approach 0, the distribution’s derivative also approaches 0. The normal distribution can therefore be expressed by the following differential equation:

$$\frac{df}{dx} = -k\,(x - \mu)\,f(x),$$

where $\mu$ is the mean of a data set and $k$ is a constant.

To find the equation for the curve, the variables need to be separated and both sides need to be integrated.

$$\frac{df}{f} = -k\,(x - \mu)\,dx$$

$$\int \frac{df}{f} = -k \int (x - \mu)\,dx$$

$$\ln f = -\frac{k\,(x - \mu)^2}{2} + \ln C,$$

where $C$ is a constant. Taking e to the power of both sides of the equation, the following function is obtained:

$$f(x) = C\,e^{-k (x - \mu)^2 / 2}$$
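As a quick sanity check on this step, the solution can be tested numerically against the differential equation. The sketch below is only an illustration: the values of C, k, and μ are arbitrary choices rather than values from the investigation, and the derivative is estimated with a central difference.

```python
import math

def f(x, C=1.0, k=0.5, mu=2.0):
    """Candidate solution f(x) = C * exp(-k (x - mu)^2 / 2); parameter values are arbitrary."""
    return C * math.exp(-k * (x - mu) ** 2 / 2)

def ode_residual(x, k=0.5, mu=2.0, h=1e-6):
    """Central-difference estimate of df/dx minus the right-hand side -k (x - mu) f(x)."""
    derivative = (f(x + h, k=k, mu=mu) - f(x - h, k=k, mu=mu)) / (2 * h)
    return derivative - (-k * (x - mu) * f(x, k=k, mu=mu))

# Largest mismatch over x = -5.0, -4.9, ..., 9.0; it should be very close to zero.
max_error = max(abs(ode_residual(x / 10)) for x in range(-50, 91))
print(max_error)
```

If the separated-variables solution is correct, the printed mismatch is only numerical noise.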

Now, the C and the k must be evaluated so that the equation is only stated in terms of the x-value, mean, and standard deviation of the distribution. First, the fact that the total area under any probability density function is 1 can be applied to this equation:

$$1 = C \int_{-\infty}^{\infty} e^{-k (x - \mu)^2 / 2}\,dx$$

To integrate this equation, a u-substitution must be used.

Let $u = \sqrt{k/2}\,(x - \mu)$. So, $u^2 = \frac{k}{2}(x - \mu)^2$ and $du = \sqrt{k/2}\,dx$, which gives

$$dx = \sqrt{\frac{2}{k}}\,du$$

Using this for the integral above, the following is obtained:

$$1 = C \sqrt{\frac{2}{k}} \int_{-\infty}^{\infty} e^{-u^2}\,du$$

In order to evaluate this integral, the equation must be squared and transformed into polar coordinates. Since the variable of integration is only a placeholder, u can be renamed x:

$$1 = \frac{2C^2}{k} \left( \int_{-\infty}^{\infty} e^{-x^2}\,dx \right)^2$$

The squared integral can be written as a double integral. Just as a single integral gives the area enclosed by a function, a double integral gives the volume under a function (Dawkins). When both integrals are written out, they have the same limits of integration, so the “base” under the function extends equally along its width and its length. Hence, x and y can be used interchangeably for the two copies of the integral, as they each describe one of the equal dimensions of the base of the volume, written out as:

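$$1 = \frac{2C^2}{k} \int_{-\infty}^{\infty} e^{-x^2}\,dx \int_{-\infty}^{\infty} e^{-y^2}\,dy$$

$$1 = \frac{2C^2}{k} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\,dx\,dy$$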

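Now, the integral can be converted into polar coordinates, because the region of integration (the entire plane) can be treated as a disk of infinite radius, which is much easier to describe in polar coordinates than in Cartesian coordinates. A u-substitution is then required to integrate the function. Based on research, it was found that:

$$x^2 + y^2 = r^2, \quad \text{and} \quad dx\,dy = r\,dr\,d\theta$$

$$1 = \frac{2C^2}{k} \int_0^{2\pi} \int_0^{\infty} r\,e^{-r^2}\,dr\,d\theta$$

$$u = -r^2$$

$$du = -2r\,dr$$

$$r\,dr = -\tfrac{1}{2}\,du$$

Under this substitution $e^{-r^2} = e^{u}$, and the limits $r = 0$ and $r \to \infty$ become $u = 0$ and $u \to -\infty$:

$$1 = -\frac{C^2}{k} \int_0^{2\pi} \int_0^{-\infty} e^{u}\,du\,d\theta$$

The inner integral evaluates to $e^{-\infty} - e^{0} = -1$, so

$$1 = -\frac{C^2}{k} \int_0^{2\pi} (-1)\,d\theta$$

$$1 = \frac{C^2}{k} \int_0^{2\pi} d\theta$$

$$1 = \frac{C^2}{k} \cdot 2\pi$$

$$C = \sqrt{\frac{k}{2\pi}}$$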
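The result for C can be checked numerically without polar coordinates. The sketch below is an illustration under stated assumptions: it uses a simple trapezoidal rule over a wide finite interval in place of the infinite limits, and the values of k and μ are arbitrary. It checks that the integral of e^(-u²) is √π and that C = √(k/2π) makes the total area equal to 1.

```python
import math

def integrate(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

# The integral of exp(-u^2) over the real line should equal sqrt(pi);
# the tails beyond |u| = 10 are negligibly small.
gauss_integral = integrate(lambda u: math.exp(-u * u), -10.0, 10.0)

# Arbitrary illustrative parameters (not taken from the investigation).
k, mu = 0.25, 3.0
C = math.sqrt(k / (2 * math.pi))
area = integrate(lambda x: C * math.exp(-k * (x - mu) ** 2 / 2), mu - 40.0, mu + 40.0)

print(gauss_integral, math.sqrt(math.pi))
print(area)
```

Both printed comparisons should agree to many decimal places, which supports the value of C derived above.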

Altogether, the function for a normal distribution becomes,

$$f(x) = \sqrt{\frac{k}{2\pi}}\,e^{-k (x - \mu)^2 / 2}$$

To find the value of k, the variance of the distribution needs to be plugged into the equation. The variance of a distribution is defined as the expected value of the square of how far each x value deviates from the mean. Hence, the following is the equation for the variance:

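$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \sqrt{\frac{k}{2\pi}}\,e^{-k (x - \mu)^2 / 2}\,dx$$

A u-substitution will be used.

$$u = x - \mu$$

$$du = dx$$

$$\sigma^2 = \sqrt{\frac{k}{2\pi}} \int_{-\infty}^{\infty} u^2\,e^{-k u^2 / 2}\,du$$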

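Integration by parts is then needed to evaluate the integral. So as not to be confused with the “u” from the u-substitution, “w” will be used instead.

$$w = u \;(= x - \mu)$$

$$dw = du$$

$$dv = u\,e^{-k u^2 / 2}\,du$$

$$v = -\frac{1}{k}\,e^{-k u^2 / 2}$$

$$\sigma^2 = \sqrt{\frac{k}{2\pi}} \int_{-\infty}^{\infty} u^2\,e^{-k u^2 / 2}\,du = \sqrt{\frac{k}{2\pi}} \left( \Big[ wv \Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} v\,dw \right)$$

$$= \sqrt{\frac{k}{2\pi}} \left[ -\frac{u}{k}\,e^{-k u^2 / 2} \right]_{-\infty}^{\infty} + \frac{1}{k} \int_{-\infty}^{\infty} \sqrt{\frac{k}{2\pi}}\,e^{-k u^2 / 2}\,du$$

The first term needs to be evaluated at the limits of integration, negative infinity and infinity. Because the exponential factor decays much faster than u grows, the term is 0 at both limits. The integral in the second term is the area under the probability density curve, which is 1 by definition. Hence,

$$\sigma^2 = \frac{1}{k}, \quad \text{so} \quad k = \frac{1}{\sigma^2}$$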
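The relationship between the variance and k can also be checked numerically. This is a minimal sketch, assuming a trapezoidal rule over a finite range of 12 standard deviations on each side of the mean; the tested values of k are arbitrary.

```python
import math

def integrate(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

def variance_of_curve(k, mu=0.0):
    """Numerically integrate (x - mu)^2 * f(x), with f(x) = sqrt(k / 2 pi) * exp(-k (x - mu)^2 / 2)."""
    C = math.sqrt(k / (2 * math.pi))
    width = 12.0 / math.sqrt(k)  # 12 standard deviations, if sigma^2 = 1/k holds
    return integrate(
        lambda x: (x - mu) ** 2 * C * math.exp(-k * (x - mu) ** 2 / 2),
        mu - width,
        mu + width,
    )

for k in (0.5, 1.0, 4.0):
    print(k, variance_of_curve(k), 1 / k)
```

For each k, the numerically integrated variance should match 1/k closely.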

Plugging the value of k back into the equation of the normal distribution function, it becomes,

$$f(x) = \sqrt{\frac{1}{2\pi\sigma^2}}\,e^{-(x - \mu)^2 / (2\sigma^2)}$$

$$= \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

This equation is the idealized probability density function used to plot a normal distribution based on its mean and standard deviation. Each x-value will represent a base stat value of the Pokemon, and f(x) will represent the probability density at that base stat, which indicates how likely base stats near that value are to occur.
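The derived formula translates directly into a small function. The sketch below is illustrative: `normal_pdf` is a name chosen here, the mean of 70 and standard deviation of 25 are made-up values, and Python's standard-library `statistics.NormalDist` is used only as an independent cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2 pi)) * exp(-0.5 * ((x - mu) / sigma)^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Illustrative parameters only (not fitted values).
mu, sigma = 70.0, 25.0
reference = NormalDist(mu, sigma)

for x in (20, 45, 70, 95, 120):
    print(x, normal_pdf(x, mu, sigma), reference.pdf(x))
```

The two columns should agree, and the curve peaks at x = μ with height 1/(σ√2π).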

Figure 1: An Idealized Normal Distribution Based on the Derived Equation

Analysis of Base Stats Distributions

In order for base stats among Pokemon to be truly “balanced” in competitive battling, the distributions of Pokemon possessing each base stat need to all be as close to normal distributions as possible. Then, these distributions are to be split into sections, each containing 20% of the data, in order to have equal amounts of Pokemon within each of the five tiers within the metagame.

To bring each base stat distribution as close to a normal distribution as possible, the approach is to use the residuals between the actual distribution and the distribution predicted by the normal distribution equation once its statistical parameters are plugged in. First, each set of base stat data will be turned into a histogram. The histograms will have a fixed bin size to represent the frequency of data falling into each range of base stats (for the purpose of this analysis, the bin size is set at 10 for all types of base stats). The middle base stat value of each bin will be the x-value, and the frequency of each range of base stats will be the f(x) value. Then, a mean and standard deviation must be chosen for the equation.
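The binning step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the list of stats is made up, and the bins are assumed to be [0, 10), [10, 20), and so on, with the midpoint taken as the lower edge plus 5.

```python
from collections import Counter

BIN_WIDTH = 10

def bin_midpoint_counts(stats, bin_width=BIN_WIDTH):
    """Count values per bin [0, 10), [10, 20), ... and key each count by the bin midpoint."""
    counts = Counter((s // bin_width) * bin_width + bin_width / 2 for s in stats)
    return dict(sorted(counts.items()))

# Made-up Special Attack values, for illustration only.
sample_stats = [35, 40, 45, 50, 55, 60, 65, 65, 70, 80, 95, 100, 120]
histogram = bin_midpoint_counts(sample_stats)

for midpoint, count in histogram.items():
    print(midpoint, count)
```

Each key plays the role of a “middle base stat value” and each count is the frequency plotted against it.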

These parameters can be determined using regression analysis. The normal distribution function is fitted to the data by finding the values of the mean and standard deviation that minimize the sum of squared residuals (the squared vertical distances from each data point to the fitted distribution). This can be done in Microsoft Excel.

Then, a chart of residuals will be made for each base stat range. A residual is defined as the difference between the actual y-value and the expected value obtained when the x-value is plugged into the equation. Hence, the residual measures how many Pokemon should be removed from or added to a certain base stat range in order to match the normal distribution. A positive residual indicates that that number of Pokemon should be removed from the base stat range, while a negative residual indicates that that number of Pokemon should be added to it. Analyses will be shown for the distribution of Special Attack.
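The residual calculation itself is simple arithmetic, as the sketch below shows. The actual and expected counts here are hypothetical numbers chosen for illustration, not data from the investigation.

```python
def residuals(actual, expected):
    """Residual = actual count minus expected count, for each base stat range."""
    return [a - e for a, e in zip(actual, expected)]

# Hypothetical counts for five base stat ranges.
actual_counts = [4, 9, 15, 12, 6]
expected_counts = [5.0, 10.0, 13.0, 10.0, 5.0]

res = residuals(actual_counts, expected_counts)
sum_of_squares = sum(r * r for r in res)

for r in res:
    action = "remove" if r > 0 else "add" if r < 0 else "keep"
    print(r, action)
print(sum_of_squares)
```

In this made-up example the first two ranges would need one Pokemon added each, while the last three would need Pokemon removed.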

*Discrete data points (where each continuous base stat range is represented by its middle base stat value) will be used to simplify the statistical analysis.

Figure 3: Distribution of Special Attack Stats (Middle Base Stat Values)

*All data for the number of Pokemon belonging to each base stat range was taken from an online Pokemon information portal.

In Microsoft Excel, the equation for the normal distribution was input into the column “Expected Number of Pokemon with Each Base Stat” with two parameters, mean and standard deviation, and with the x-values taken from the “Middle Base Stat Value” column. Next, two other cells were named “mean” and “standard deviation” respectively. A separate column was named “Residuals” and defined as the actual number of Pokemon with each base stat minus the expected number of Pokemon with each base stat, and a separate cell was named “Sum of Squared Residuals” and defined as “=SUM(Residuals^2)”. Afterwards, the Solver function in Microsoft Excel was used. The objective was set to minimize the “Sum of Squared Residuals” cell by changing the variable cells “mean” and “standard deviation”. The mean was calculated to be 65.6 and the standard deviation was calculated to be 26.4. These were used as the mean and standard deviation of the best-fit normal distribution equation for the data.
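The same least-squares idea can be sketched outside Excel. The code below is an illustration under several assumptions: a plain pattern search stands in for Excel's Solver (it is not the algorithm Solver uses), the expected count in each bin is assumed to be the total number of Pokemon times the bin width times the density at the bin midpoint, and the “actual” counts are synthetic values generated from a known curve so that the recovered parameters can be checked.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def sum_squared_residuals(counts, midpoints, total, bin_width, mu, sigma):
    """Sum of (actual - expected)^2, with expected = total * bin_width * pdf(midpoint)."""
    return sum(
        (count - total * bin_width * normal_pdf(mid, mu, sigma)) ** 2
        for count, mid in zip(counts, midpoints)
    )

def fit_normal(counts, midpoints, total, bin_width, mu, sigma, step=8.0, tol=1e-7):
    """Pattern search: nudge mu or sigma by +/- step, halving the step when nothing improves."""
    best = sum_squared_residuals(counts, midpoints, total, bin_width, mu, sigma)
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0), (-step, 0), (0, step), (0, -step)):
            new_mu, new_sigma = mu + d_mu, sigma + d_sigma
            if new_sigma <= 0:
                continue  # a standard deviation must stay positive
            ssr = sum_squared_residuals(counts, midpoints, total, bin_width, new_mu, new_sigma)
            if ssr < best:
                mu, sigma, best, improved = new_mu, new_sigma, ssr, True
        if not improved:
            step /= 2
    return mu, sigma, best

# Synthetic data: bin midpoints 5, 15, ..., 155 and counts generated from a known curve.
TRUE_MU, TRUE_SIGMA, TOTAL, BIN_WIDTH = 60.0, 20.0, 500, 10
midpoints = [5 + 10 * i for i in range(16)]
counts = [TOTAL * BIN_WIDTH * normal_pdf(m, TRUE_MU, TRUE_SIGMA) for m in midpoints]

fit_mu, fit_sigma, fit_ssr = fit_normal(counts, midpoints, TOTAL, BIN_WIDTH, mu=50.0, sigma=30.0)
print(fit_mu, fit_sigma, fit_ssr)
```

Starting from a deliberately wrong guess, the search should settle close to the mean and standard deviation used to generate the synthetic counts. With real histogram data the minimum sum of squared residuals would not be zero.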

These parameters were then plugged back into the equation of a normal distribution.

$$f(x) = \frac{1}{26.4\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - 65.6}{26.4}\right)^2}$$

Now, the middle base stat values will be plugged back into this equation to obtain the expected number of Pokemon possessing certain Special Attack Base Stats based on a perfect, idealized normal distribution.

Figure 4: Expected Distribution of Middle Special Attack Base Stats (Based on Idealized Normal Distribution)

These are the expected values for all given base stats. Then, the expected values were subtracted from the actual number of Pokemon having each base stat. By definition, these differences are the residuals, which determine how many Pokemon of certain base stats should be added to (negative residuals) or removed from (positive residuals) the metagame for a balanced distribution of Special Attack. This data is shown below.

Figure 5: Residuals of Actual versus Expected Numbers of Pokemon in Each Base Stat

Figure 6: Best-Fit Distribution of Special Attack Stats Based on Regressed Formula

Now, the probability density distribution (where the total area under the curve is 1) will be divided into 5 equal sections, each with an area of 0.2, to determine the base stat cutoffs between the competitive tiers. The goal is to find the base stat values for which the area under the curve to the left of each value is 0.2, 0.4, 0.6, and 0.8 respectively. For the first cutoff, this can be expressed as,

$$0.2 = F(A) = \int_{-\infty}^{A} \frac{1}{26.4\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - 65.6}{26.4}\right)^2}\,dx$$

where “A” is a base stat value. A u-substitution can be used to standardize this integral. Let $u = (x - 65.6)/26.4$, so that $dx = 26.4\,du$ and

$$0.2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(A - 65.6)/26.4} e^{-u^2/2}\,du$$

This integral has no antiderivative in terms of elementary functions, so the upper limit has to be found numerically, for example with a standard normal table or an inverse normal function. The area equals 0.2 when $(A - 65.6)/26.4 \approx -0.84162$. Solving for A gives A = 43.381, which is the first cutoff from the lowest tier of stats to the second lowest tier of stats. The same equation was solved with 0.4, 0.6, and 0.8 in place of 0.2, and the cutoffs were found to be 58.9, 72.3, and 87.8 respectively. These are the border cutoffs for a Pokemon’s Special Attack stat that can be used to assess a Pokemon’s capabilities as a “Special Sweeper” on a scale of 1-5. A “5” rating would signify that the Pokemon’s Special Attack stat is over 87.8, deeming it most capable as a “Special Sweeper.” If a Pokemon’s Special Attack stat is between 43.4 and 58.9, for instance, it would have a Special Attack rating of “2,” signifying that it is a weaker special attacker. This process can be repeated on the other 5 Base Stats, and the player can determine a Pokemon’s niche in the competitive metagame based on which stats it has the highest ratings in. A Pokemon with a “5” rating in Defense and Special Defense but a rating of “2” or “1” in each of the other Base Stats, for example, would be known as a “Wall,” which can be used to block an opponent’s attacks and stall for time (perhaps to set up weather, entry hazards, or other field preparations) because it cannot easily be damaged. Overall, this analysis is an effective tool for comparing Pokemon quantitatively in terms of strength, in order to assign roles for them on my competitive team more accurately.
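The tier cutoffs can be reproduced with an inverse normal function. The sketch below uses Python's standard-library `statistics.NormalDist` with the fitted mean of 65.6 and standard deviation of 26.4; the helper name `tier_rating` and the choice to place a stat that falls exactly on a cutoff in the higher tier are assumptions made here for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist

# Best-fit Special Attack distribution from the investigation.
special_attack = NormalDist(mu=65.6, sigma=26.4)

# Base stat values below which 20%, 40%, 60%, and 80% of the area lies.
cutoffs = [special_attack.inv_cdf(p) for p in (0.2, 0.4, 0.6, 0.8)]
print([round(c, 1) for c in cutoffs])  # [43.4, 58.9, 72.3, 87.8]

def tier_rating(stat, cutoffs=cutoffs):
    """Rating from 1 (lowest fifth) to 5 (highest fifth) for a given base stat."""
    return bisect_right(cutoffs, stat) + 1

for stat in (30, 50, 65, 80, 100):
    print(stat, tier_rating(stat))
```

Repeating this with the fitted parameters of the other five base stats would give a six-number rating profile for any Pokemon.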

Conclusion and Evaluation

This investigation provides a useful team-building tool for players and a good guideline for the Pokemon company to follow in order to balance competitive play. First, a best-fit normal distribution that minimizes the squared residuals between it and the actual distribution must be fitted to the distribution of each base stat. Then, the residuals provide guidelines as to how the Pokemon company would need to modify the Pokemon available in each base stat range in order to create balanced competitive play. That way, players would not all rely on the same advantageous Pokemon, which would make gameplay rather repetitive and uninteresting. Afterwards, the area under the best-fit normal distribution is separated into 5 equal sections, which determine a Pokemon’s strengths. Suppose a Pokemon’s base stats are in the top tier for Attack but in a lower tier for Defense. This Pokemon would make a great physical sweeper to take advantage of its high Attack stat. Suppose another Pokemon is in the top tier of the Special Defense distribution but in low tiers of the Attack and Special Attack distributions. This Pokemon would make a good specially defensive wall or cleric/healer, which can withstand many attacks. A balanced 6-Pokemon team requires coverage of the top tier of every base stat distribution.

However, this method is not a complete approach to team-building or to balancing competitive play. There are several other factors that determine a Pokemon’s role on a team, such as type (electric, water, grass, ground, etc.), special abilities (for instance, Levitate, which grants immunity to ground-type moves), available movesets, and others. Furthermore, certain Pokemon can be underused or overused within the Pokemon metagame, so the predictability of Pokemon teams is another factor to consider in team-building. Nevertheless, this method provides a very good guideline. Additionally, it may not be wise for the Pokemon company to remove Pokemon that contribute to an unbalanced distribution in one base stat, as these Pokemon may be well balanced in other stats. Ideally, when the Pokemon company designs a new generation of Pokemon, it should assess how the new Pokemon change the distribution of each base stat, that is, the number of Pokemon that have each base stat value.


This essay has been submitted by a student.
