Sum of Squares

What Is the Sum of Squares?

Sum of squares is a statistical technique utilized in regression analysis to decide the dispersion of data points. In a regression analysis, the goal is to decide how well a data series can be fitted to a function that could assist with making sense of how the data series was produced. Sum of squares is utilized as a mathematical method for finding the function that best fits (shifts least) from the data.

The Formula for Sum of Squares Is

$\begin &\text X \text n \text\ &\text=\sum_^\left(X_i-\overline\right)$
Sum of squares is otherwise called variation.

What Does the Sum of Squares Tell You?

The sum of squares is a measure of deviation from the mean. In statistics, the mean is the average of a set of numbers and is the most regularly utilized measure of central tendency. The arithmetic mean is essentially calculated by summing up the values in the data set and partitioning by the number of values.

Suppose the closing prices of Microsoft (MSFT) in the last five days were 74.01, 74.77, 73.94, 73.61, and 73.40 in US dollars. The sum of the total prices is $369.73 and the mean or average price of the course book would in this manner be $369.73 / 5 = $73.95.

In any case, knowing the mean of a measurement set isn't sufficient all the time. In some cases, it is useful to know how much variation there is in a set of measurements. How far separated the individual values are from the mean might give some knowledge into how fit the perceptions or values are to the regression model that is made.

For instance, if an analyst wanted to know whether the share price of MSFT moves in tandem with the price of Apple (AAPL), they can drill down the set of perceptions for the course of the two stocks for a certain period, say 1, 2, or 10 years and make a linear model with every one of the perceptions or measurements recorded. On the off chance that the relationship between the two factors (i.e., the price of AAPL and price of MSFT) is certainly not a straight line, then there are variations in the data set that should be examined.

In statistics vernacular, if the line in the linear model made doesn't go through every one of the measurements of value, then a portion of the variability that has been seen in the share prices is unexplained. The sum of squares is utilized to work out whether a linear relationship exists between two factors, and any unexplained variability is alluded to as the residual sum of squares.

The sum of squares is the sum of the square of variation, where variation is defined as the spread between every individual value and the mean. To decide the sum of squares, the distance between every data point and the line of best fit is squared and afterward summed up. The line of best fit will limit this value.

The most effective method to Calculate the Sum of Squares

Presently you can see the reason why the measurement is called the sum of squared deviations, or the sum of squares for short. Utilizing our MSFT model over, the sum of squares can be calculated as:

SS = (74.01 - 73.95)² + (74.77 - 73.95)² + (73.94 - 73.95)² + (73.61 - 73.95)² + (73.40 - 73.95)²
SS = (0.06) ² + (0.82)² + (- 0.01)² + (- 0.34)² + (- 0.55)²
SS = 1.0942

Adding the sum of the deviations alone without squaring will bring about a number equivalent to or close to zero since the negative deviations will impeccably offset the positive deviations. To get a more practical number, the sum of deviations must be squared. The sum of squares will constantly be a positive number on the grounds that the square of any number, whether positive or negative, is consistently positive.

Illustration of How to Use the Sum of Squares

In light of the consequences of the MSFT calculation, a high sum of squares demonstrates that the vast majority of the values are farther away from the mean, and subsequently, there is large variability in the data. A low sum of squares alludes to low variability in the set of perceptions.

In the model above, 1.0942 shows that the variability in the stock price of MSFT in the last five days is exceptionally low and investors hoping to invest in stocks portrayed by price stability and low volatility may opt for MSFT.

Limitations of Using the Sum of Squares

Pursuing an investment choice on what stock to purchase requires a lot a bigger number of perceptions than the ones listed here. An analyst might need to work with long stretches of data to be aware with a higher certainty how high or low the variability of an asset is. As additional data points are added to the set, the sum of squares increases as the values will be more spread out.

The most widely utilized measurements of variation are the standard deviation and variance. Be that as it may, to work out both of the two metrics, the sum of squares must initially be calculated. The variance is the average of the sum of squares (i.e., the sum of squares partitioned by the number of perceptions). The standard deviation is the square root of the variance.

There are two methods of regression analysis that utilization the sum of squares: the linear least squares method and the non-linear least squares method. The least squares method alludes to the way that the regression function limits the sum of the squares of the variance from the genuine data points. Along these lines, it is feasible to draw a function which statistically gives the best fit to the data. Note that a regression function can either be linear (a straight line) or non-linear (a bending line).

Highlights

The sum of squares measures the deviation of data points from the mean value.
A higher sum-of-squares result shows a large degree of variability inside the data set, while a lower result demonstrates that the data doesn't change extensively from the mean value.