Analysis

Now that we know how to display different statistical results using different UI elements, understanding how those numbers are calculated gives a full picture of all the data transformations involved, from data gathering at the start to data display at the end.

Daily numbers

As mentioned before, the dashboard contains summaries of the data slices represented by the map (total cases, active cases, recovered and total tests). It also shows the difference between these numbers and the same calculations for the previous day. This makes it possible to see not only the latest result but also the trend of the observations.

Reminder: the final dataset is a hash table of hash tables: daily observations keyed by date and then by suburb (see merge strategy).
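As a reminder of that shape, here is a minimal sketch of the dataset built from JavaScript Map objects. The dates, postcodes, numbers and the date key format are made up for illustration; only the Cases and Active fields and the postcode key appear in the snippets of this chapter.

```javascript
// Hypothetical sample data: the outer Map is keyed by date,
// each inner Map is keyed by suburb postcode.
const sampleCases = new Map([
  ['2020-07-01', new Map([
    ['2000', { Cases: 10, Active: 4 }],
    ['2010', { Cases: 5, Active: 1 }],
  ])],
  ['2020-07-02', new Map([
    ['2000', { Cases: 12, Active: 5 }],
    ['2010', { Cases: 5 }], // a field may be missing for a suburb
  ])],
]);

// Observation dates in insertion order (oldest first in this sample).
const sampleDates = [...sampleCases.keys()];

// Look up one suburb on one day.
const observation = sampleCases.get('2020-07-02').get('2000');
console.log(observation.Cases); // 12
```

Because a Map keeps its insertion order, the last element of the dates list is the latest day of observations, which is what the calculations below rely on.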

Here is how we calculate daily numbers for the active cases:

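import { useContext, useMemo } from 'react';

const { cases, dates } = useContext(DataContext);

const { totalActive, progressActive } = useMemo(() => {
  // Sum of active cases across all suburbs for the last day of observations
  const lastDateData = [
    ...cases.get(dates[dates.length - 1]).values()
  ]
    .reduce((acc, value) => acc + (value.Active ? value.Active : 0), 0);

  // The same sum for the second last day
  const secondLastDateData = [
    ...cases.get(dates[dates.length - 2]).values()
  ]
    .reduce((acc, value) => acc + (value.Active ? value.Active : 0), 0);

  // Relative change in percent; positive when the number grew.
  // Report 0 instead of dividing by zero when the previous day had no active cases.
  const progress = lastDateData >= secondLastDateData ? 1 : -1;
  const diff = Math.abs(lastDateData - secondLastDateData);
  const percent = secondLastDateData === 0
    ? 0
    : (progress * diff * 100) / secondLastDateData;

  return {
    totalActive: lastDateData, progressActive: percent
  };
}, [cases, dates]);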

We first calculate the summary for the last day of observations (the reduce that produces lastDateData). Next comes the same summary for the second last day: secondLastDateData. The final value (progressActive) is the result of applying the relative change formula, expressed as a percentage, where x1 is the last day and x2 is the second last day of observations:

                            Actual change      x1 - x2
Relative change (x1, x2) = --------------- = -----------
                                  x2              x2

The same idea applies to the rest of the dataset slices represented by the dashboard.
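To illustrate how the calculation generalizes, here is a minimal sketch of a helper that takes the field name as a parameter. The names sumField, relativeChangePercent and dailySummary are hypothetical, as is the sample dataset, and returning 0 when the previous day is 0 is a choice made for this sketch rather than something the dashboard is known to do.

```javascript
// Sum one numeric field across all suburbs for a given date.
// Missing dates or missing fields count as 0.
const sumField = (cases, date, field) => {
  if (!cases.has(date)) {
    return 0;
  }
  return [...cases.get(date).values()]
    .reduce((acc, value) => acc + (value[field] ? value[field] : 0), 0);
};

// Relative change of x1 with respect to x2, as a percentage.
// Returns 0 when x2 is 0 to avoid dividing by zero (assumption of this sketch).
const relativeChangePercent = (x1, x2) => (x2 === 0 ? 0 : ((x1 - x2) * 100) / x2);

// Latest total and day-over-day trend for any field.
const dailySummary = (cases, dates, field) => {
  const last = sumField(cases, dates[dates.length - 1], field);
  const secondLast = sumField(cases, dates[dates.length - 2], field);
  return { total: last, progress: relativeChangePercent(last, secondLast) };
};

// Hypothetical two-day dataset: 8 active cases on day1, 10 on day2.
const demoCases = new Map([
  ['day1', new Map([['2000', { Active: 4 }], ['2010', { Active: 4 }]])],
  ['day2', new Map([['2000', { Active: 7 }], ['2010', { Active: 3 }]])],
]);
console.log(dailySummary(demoCases, ['day1', 'day2'], 'Active'));
// { total: 10, progress: 25 }
```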

Cumulative totals

For the cumulative totals we need two running summaries: one for the total number of cases up to a particular day, and another for the absolute change between any two consecutive dates, except for the first day of observations.

Here is how we calculate total cases per day:

const calcTotal = (cases, date) => {
  if (!cases.has(date)) {
    return 0;
  }

  return [...cases.get(date)
    .values()]
    .reduce((acc, curr) => acc + curr.Cases, 0);
};

The difference between the number of cases for a particular day and for the previous day is clamped at zero, so a negative change is reported as 0:

Absolute daily change (x1, x2) = max(0, (x1 - x2))
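As a small worked example of this formula, the sketch below turns a series of cumulative totals into daily changes. The helper names and the numbers are hypothetical.

```javascript
// Absolute daily change: max(0, x1 - x2),
// where x1 is the total for the day and x2 is the total for the previous day.
const absoluteDailyChange = (x1, x2) => Math.max(0, x1 - x2);

// Turn a series of cumulative totals into daily changes.
// The first day has no previous day, so its total is used as is.
const toDailyChanges = (cumulative) => cumulative.map(
  (total, index) => (index === 0 ? total : absoluteDailyChange(total, cumulative[index - 1]))
);

// Hypothetical cumulative totals; the drop from 15 to 14 is clamped to 0.
console.log(toDailyChanges([10, 15, 14, 20])); // [10, 5, 0, 6]
```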

Here is how we calculate totals per suburb as well as the difference per day:

/*
  cases - original dataset (Map of maps)
  date - selected date
  prevDate - previous date
  suburbs - list of suburbs to calculate statistics for
  result - calculation result - a Map of suburb key and calculated result value
*/
const calcSuburbsTotal = (cases, date, prevDate, suburbs, result) => {
  suburbs.forEach((suburb) => {
    const entry = result.has(suburb.postCode) ? result.get(suburb.postCode) : [];
    const value = cases.has(date) && cases.get(date).has(suburb.postCode)
      ? cases.get(date).get(suburb.postCode).Cases : 0;

    // Without a previous observation for this suburb, the whole value counts as the change
    const diff = !prevDate || !cases.has(prevDate) || !cases.get(prevDate).has(suburb.postCode)
      ? value
      : value - cases.get(prevDate).get(suburb.postCode).Cases;
    entry.push({ value, diff: Math.max(0, diff) });
    result.set(suburb.postCode, entry);
  });
};

And here we calculate the final dataset for the UI page:

import { useContext, useMemo } from 'react';

const { cases, dates } = useContext(DataContext);

const data = useMemo(() => {
  return dates.reduce((acc, curr) => {
    const {
      prev, daily, cumulative, prevDate
    } = acc;
    const total = calcTotal(cases, curr);

    cumulative.push(total);

    // The first day has nothing to compare with, so its total is the daily value
    if (!prev) {
      daily.push(total);
    } else {
      const diff = total - prev;
      daily.push(Math.max(0, diff));
    }

    acc.prev = total;

    // Dates without observations do not contribute to the per-suburb results
    if (!cases.has(curr)) {
      return acc;
    }

    calcSuburbsTotal(cases, curr, prevDate, selectedSuburbs, acc.suburbs);
    acc.prevDate = curr;
    return acc;
  }, {
    prev: null, daily: [], cumulative: [], prevDate: null, suburbs: new Map()
  });
}, [cases, dates, selectedSuburbs]);

Distribution

Our data analysis wouldn't be complete without checking how the values in different slices of the data are distributed (see Probability distribution). As discussed in a previous chapter, a histogram is used for the approximate visualization. For the calculations we use the compute-histogram package. In its basic form, it just needs an array of values and returns the result as an array of pairs, where the first item in a pair is the bin index and the second item is the number of observations in that bin:

/**
* Calculates the values required to draw a histogram based on the input array and the number of bins.
* Tails can be removed by limiting the calculation to a specific percentile.
* The number of bins can be automatically calculated using a heuristic.
*
* @param arr
* @param numBins If numBins === 0, then max of the Sturges and Freedman–Diaconis' choice methods is used
* See: https://en.wikipedia.org/wiki/Histogram
* @param trimTailPercentage removes the right and left tails from the distribution
* @returns Array Two dimensional array. First dimension is the index of the bin, and the second index
* is the count. This allows for direct import into ChartJS without having to change the data shape
*/
function calculateHistogram(arr) {

Usage is relatively straightforward. We just need to make sure we don't pass missing values, by filtering them out from the dataset:

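const values = computeHistogram(
  population
    // Skip rows without a POA_NAME16 value or with a non-numeric Tot_p_p
    .filter((item) => !!item.POA_NAME16 && !isNaN(Number(item.Tot_p_p)))
    // Convert to numbers in case the totals were loaded as strings
    .map((item) => Number(item.Tot_p_p))
);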
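To make the result shape more concrete, here is a minimal sketch of equal-width binning that returns the same kind of [bin index, count] pairs. It is an illustration only and not how compute-histogram is implemented: the function name simpleHistogram is hypothetical, the number of bins must be passed explicitly, the input is assumed to be a non-empty array of numbers, and there is no tail trimming.

```javascript
// Illustration only: equal-width binning, not the compute-histogram implementation.
// Returns an array of [binIndex, count] pairs.
const simpleHistogram = (values, numBins) => {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Fall back to a width of 1 when all values are equal, to avoid dividing by zero.
  const width = (max - min) / numBins || 1;
  const counts = new Array(numBins).fill(0);

  values.forEach((value) => {
    // The maximum value would land one past the last bin, so clamp it into the last bin.
    const index = Math.min(numBins - 1, Math.floor((value - min) / width));
    counts[index] += 1;
  });

  return counts.map((count, index) => [index, count]);
};

// Hypothetical values split into 3 bins of width 3: [1, 4), [4, 7), [7, 10].
console.log(simpleHistogram([1, 2, 2, 3, 8, 9, 10], 3));
// [[0, 4], [1, 0], [2, 3]]
```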

Correlation

With correlation we measure the strength of the linear relationship between two variables by calculating their correlation coefficient (see Correlation). The simple-statistics package does the heavy lifting for the numerical part of the calculations (the coefficients themselves). It just needs two arrays of data of the same length and returns the calculated coefficient.

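import { useMemo } from 'react';
import { sampleCorrelation } from 'simple-statistics';

const value = useMemo(() => {
  // toFixed returns a string rounded to 5 decimal places, ready for display
  return sampleCorrelation(
    dataSlice.map((item) => item.x),
    dataSlice.map((item) => item.y)
  )
    .toFixed(5);
}, [dataSlice]);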
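For readers curious about what the coefficient represents, here is a minimal sketch that computes the sample (Pearson) correlation directly from its definition. It is an illustration only, not the simple-statistics source; the function names mean and pearson and the sample data are hypothetical. Note that the result is NaN when one of the arrays is constant, because its variance is zero.

```javascript
// Illustration only: sample (Pearson) correlation computed from its definition.
const mean = (values) => values.reduce((acc, v) => acc + v, 0) / values.length;

const pearson = (xs, ys) => {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('Both arrays must have the same length (at least 2)');
  }
  const meanX = mean(xs);
  const meanY = mean(ys);

  let covariance = 0; // sum of (x - meanX) * (y - meanY)
  let varianceX = 0; // sum of (x - meanX)^2
  let varianceY = 0; // sum of (y - meanY)^2
  xs.forEach((x, i) => {
    const dx = x - meanX;
    const dy = ys[i] - meanY;
    covariance += dx * dy;
    varianceX += dx * dx;
    varianceY += dy * dy;
  });

  // The (n - 1) normalisation factors cancel out, so the raw sums are enough.
  return covariance / Math.sqrt(varianceX * varianceY);
};

console.log(pearson([1, 2, 3, 4], [2, 4, 6, 8]).toFixed(5)); // "1.00000"
console.log(pearson([1, 2, 3, 4], [8, 6, 4, 2]).toFixed(5)); // "-1.00000"
```

A value close to 1 or -1 indicates a strong linear relationship, while a value close to 0 indicates a weak one.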

Regression

The simple-statistics package provides regression calculations as well (see Linear regression). As a convenience, it can also build a regression line function for us.

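import { useMemo } from 'react';
import { linearRegression, linearRegressionLine } from 'simple-statistics';

const lineData = useMemo(() => {
  // linearRegression expects an array of [x, y] pairs
  const regressionData = data.map((item) => [item.x, item.y]);
  // linearRegressionLine turns the fitted coefficients into a function of x
  const line = linearRegressionLine(
    linearRegression(regressionData)
  );

  return data.map((item) => ({
    x: item.x, y: Number(line(item.x).toFixed(3))
  }));
}, [data]);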
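To show what happens behind those two calls, here is a minimal sketch of an ordinary least squares fit of the line y = m * x + b. It is an illustration only and not the simple-statistics source; the names fitLine and toLine and the sample points are hypothetical, and at least two points with different x values are assumed.

```javascript
// Illustration only: ordinary least squares fit of y = m * x + b
// for an array of [x, y] pairs.
const fitLine = (points) => {
  const n = points.length;
  const meanX = points.reduce((acc, [x]) => acc + x, 0) / n;
  const meanY = points.reduce((acc, [, y]) => acc + y, 0) / n;

  let numerator = 0; // sum of (x - meanX) * (y - meanY)
  let denominator = 0; // sum of (x - meanX)^2
  points.forEach(([x, y]) => {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) ** 2;
  });

  const m = numerator / denominator; // slope
  const b = meanY - m * meanX; // intercept
  return { m, b };
};

// Turn the fitted coefficients into a function of x.
const toLine = ({ m, b }) => (x) => m * x + b;

// Hypothetical points that lie exactly on y = 2x + 1.
const fit = fitLine([[0, 1], [1, 3], [2, 5]]);
console.log(fit); // { m: 2, b: 1 }
console.log(toLine(fit)(3)); // 7
```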

References

Relative change

Probability distribution

compute-histogram

Correlation

Sample correlation

Linear regression