One playoff game down, three to go. So far, so good for my Vegas Golden Knights! Not only did we win the first game, but it was also a shutout! No score for the Kings!

(And no doughnuts for me – I’ve been focused on cutting out carbs and sugar this year… sad trombone.)

Which got me thinking about comparing my team with the opposition at the player level. I’m still focused on distribution analysis, as I was in my previous post, but now I want to evaluate the numerical variables for specific combinations of teams. (You can start at the beginning of the series here.)

In a more practical business scenario, the categories being compared could instead be product groups, regions, or whatever categorical variable exists in your data. I’m going to focus on comparing two categories (teams), but the examples in this post work with as many categories as you choose to include.

In this example, I compare the statistics for the players of the Golden Knights and the Kings, and I chose a few different visualizations to highlight the comparison.

Category Selection

I’m using a slicer to pick the two teams for my analysis. By default, the slicer allows only a single selection. I can change this after adding the slicer visual to the page by opening the Format pane and turning off the Single Select toggle.

Take 1

The custom histogram does not let me visualize multiple categories, so I’m going to create an R script that uses the ggplot2 library to display a selected numeric variable by team. I have added a filter to focus on this past season. Next, I add the R visual to my report and add Team Name (from teams), fullName (from players), and goals (from statistics).

I update the script like this:

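# Install ggplot2 only if it is missing (a repos URL is needed in a non-interactive session)
if (!("ggplot2" %in% rownames(installed.packages()))) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

library(ggplot2)

# Power BI passes the selected fields to the script as a data frame named dataset
df <- dataset
# Replace spaces in column names, e.g. "Team Name" becomes "Team_Name"
names(df) <- gsub(" ", "_", names(dataset))
# Find the numeric field (goals here) and use its name for the x axis
numeric_col <- sapply(df, is.numeric)
xLabel <- names(df)[which(numeric_col)][1]

# One semi-transparent histogram per team, drawn on top of each other
ggplot(df, aes(x = .data[[xLabel]], fill = Team_Name)) +
  geom_histogram(alpha = 0.5,
                 position = "identity",
                 binwidth = 1) +
  labs(x = xLabel)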
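If you want to try the script outside Power BI first, one option is to build a stand-in for the dataset data frame yourself. Here is a minimal sketch of that idea, assuming the fields arrive with their display names (which is why the script swaps spaces for underscores). The player names and goal counts are made up for testing, not real statistics, and the sketch only exercises the data-preparation lines, so it runs without ggplot2.

```r
# Hypothetical stand-in for the data frame Power BI hands to an R visual.
# Players and goal counts are invented for testing only.
dataset <- data.frame(
  "Team Name" = c("Vegas Golden Knights", "Vegas Golden Knights", "Vegas Golden Knights",
                  "Los Angeles Kings", "Los Angeles Kings", "Los Angeles Kings"),
  fullName = c("Player A", "Player B", "Player C", "Player D", "Player E", "Player F"),
  goals = c(5, 12, 3, 4, 6, 1),
  check.names = FALSE,          # keep the space in "Team Name", as Power BI does
  stringsAsFactors = FALSE
)

# Power BI's preamble removes duplicated rows before the script runs
dataset <- unique(dataset)

# Same preparation steps as the R visual script
df <- dataset
names(df) <- gsub(" ", "_", names(dataset))
numeric_col <- sapply(df, is.numeric)
xLabel <- names(df)[which(numeric_col)][1]

print(names(df))  # should list Team_Name, fullName and goals
print(xLabel)     # should be goals
```

Once ggplot2 is installed locally, running the plotting lines of the script after this block should draw the same kind of overlapping histograms as the R visual.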

And I’ll start with goals by player.

Well, yuck. I’m not crazy about this histogram. It’s really hard to see the comparisons when there’s overlap between the two teams.

Take 2

I can change it up by using a density plot instead of a histogram, which takes only a small change to the last few lines of my R script:

# One density curve per team, replacing the histogram layer
ggplot(df, aes(x = .data[[xLabel]], color = Team_Name)) +
  geom_density(alpha = 0.5) +
  labs(x = xLabel)

Now I get this result:

Conclusion

Both visuals give me similar information, but it’s easier to see the shape of the data in the density plot. Specifically, both teams have a lot of players with about 5 goals for the season, some with fewer, and a smaller number of players with more goals. I can also see that the Golden Knights have more players with a high number of goals for the season than the Kings. (Go Knights!)
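To double-check what I’m reading off the density plot, a rough base-R sketch can locate each team’s density peak and compute the share of higher-scoring players. The goal counts below are hypothetical placeholders, not the real season numbers, and the 10-goal cutoff is an arbitrary choice for illustration. It uses density(), whose default "nrd0" bandwidth is also the default for ggplot2’s geom_density(), so the peaks should roughly line up with the curves in the chart.

```r
# Hypothetical goal counts per player (invented values, not real season data)
goals <- c(2, 4, 5, 5, 5, 6, 8, 15, 20, 25,
           1, 3, 4, 5, 5, 5, 6, 7, 10, 12)
team <- rep(c("Golden Knights", "Kings"), each = 10)

# x position of the highest point of each team's kernel density estimate
peak_goals <- sapply(split(goals, team), function(g) {
  d <- density(g)               # Gaussian kernel, bw = "nrd0" by default
  d$x[which.max(d$y)]
})

# Share of each team's players with at least 10 goals (arbitrary cutoff)
share_10_plus <- tapply(goals >= 10, team, mean)

print(round(peak_goals, 1))
print(share_10_plus)
```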

I’ve got more to say about other ways to visualize data distributions, but… there’s another playoff game tonight and I gotta go! Until next time…