<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Buckets & Bytes]]></title><description><![CDATA[In-depth visualization and scraping tutorials using sports data and R, with an emphasis on men's college basketball]]></description><link>https://www.bucketsandbytes.com</link><image><url>https://substackcdn.com/image/fetch/$s_!uND6!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b6be90d-3cf4-4da8-9d85-0043702df0dc_1280x1280.png</url><title>Buckets &amp; Bytes</title><link>https://www.bucketsandbytes.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 04 Apr 2026 03:03:33 GMT</lastBuildDate><atom:link href="https://www.bucketsandbytes.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Andrew Weatherman]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aweatherman@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aweatherman@substack.com]]></itunes:email><itunes:name><![CDATA[Andrew Weatherman]]></itunes:name></itunes:owner><itunes:author><![CDATA[Andrew Weatherman]]></itunes:author><googleplay:owner><![CDATA[aweatherman@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aweatherman@substack.com]]></googleplay:email><googleplay:author><![CDATA[Andrew Weatherman]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Premier League Table Bump Charts]]></title><description><![CDATA[Using bump charts to visualize the battle between Manchester City and Arsenal in the 2023-24 EPL campai]]></description><link>https://www.bucketsandbytes.com/p/premier-league-table-bump-charts</link><guid 
isPermaLink="false">https://www.bucketsandbytes.com/p/premier-league-table-bump-charts</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Wed, 21 Aug 2024 22:04:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!liwP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This plot attempts to visualize the &#8220;standings flow&#8221; in the English Premier League during the 2023-24 campaign, with a focus on champions Manchester City and runners-up Arsenal. </p><p><strong>It is not yet accompanied by a tutorial.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!liwP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!liwP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png 424w, https://substackcdn.com/image/fetch/$s_!liwP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png 848w, https://substackcdn.com/image/fetch/$s_!liwP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png 1272w, 
https://substackcdn.com/image/fetch/$s_!liwP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!liwP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png" width="1456" height="1136" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1136,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!liwP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png 424w, https://substackcdn.com/image/fetch/$s_!liwP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png 848w, https://substackcdn.com/image/fetch/$s_!liwP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png 1272w, 
https://substackcdn.com/image/fetch/$s_!liwP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb182129-277d-4340-bae1-fe3c5f3aaaf2_1848x1442.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h1>Source Code</h1><p><a href="https://github.com/andreweatherman/buckets-and-bytes/blob/main/visualizations/graphs/epl_bump_chart/script.R">The full source code can be found here.</a></p>]]></content:encoded></item><item><title><![CDATA[NBA Points by Blue Bloods]]></title><description><![CDATA[Stacked bar charts with 
ggplot]]></description><link>https://www.bucketsandbytes.com/p/nba-points-by-blue-bloods</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/nba-points-by-blue-bloods</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Wed, 21 Aug 2024 22:04:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6Wo2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In early February 2024, Todd Whitehead <a href="https://twitter.com/CrumpledJumper/status/1753900914824659035?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1753900914824659035%7Ctwgr%5E3326a08de22be0fd6952f9fb4e6d66a8f66295c7%7Ctwcon%5Es1_&amp;ref_url=https%3A%2F%2Fviz.aweatherman.com%2Fviz%2Fmost-nba-points%2Fmost-nba-points.html">tweeted a visualization</a> that illustrated points scored in the NBA by former Duke and North Carolina players.</p><p>This code works to recreate that visualization using ggplot2 and data from Sports Reference for Duke, North Carolina, Kentucky, Kansas, and UCLA players (the primary &#8220;Blue Bloods&#8221; of college basketball).</p><p><strong>This is not yet accompanied by a tutorial.</strong></p><h1>Source Code</h1><p><a href="https://github.com/andreweatherman/buckets-and-bytes/blob/main/visualizations/graphs/nba_points_stacked_bar/script.R">Full code can be found here.</a></p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Wo2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!6Wo2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png 424w, https://substackcdn.com/image/fetch/$s_!6Wo2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png 848w, https://substackcdn.com/image/fetch/$s_!6Wo2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!6Wo2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Wo2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png" width="1200" height="1800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1800,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!6Wo2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png 424w, https://substackcdn.com/image/fetch/$s_!6Wo2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png 848w, https://substackcdn.com/image/fetch/$s_!6Wo2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!6Wo2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d98cc83-aeab-4139-82e1-7679530ba6b2_1200x1800.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Lots of columns? Use this gt trick!]]></title><description><![CDATA[Creating 538-style captions with intuitive multi-column tables]]></description><link>https://www.bucketsandbytes.com/p/lots-of-columns-use-this-gt-trick</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/lots-of-columns-use-this-gt-trick</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Wed, 21 Aug 2024 22:03:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FADn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>The What</strong></h1><p>Dealing with multiple tables in <code>gt</code> is a pain, whether it be with gt_two_column_layout or some messy htmltools hacking. A clever way to address this is to transform your data into a <em>wide</em> format before you pass it to <code>gt</code>. 
This tutorial covers that wide-format approach <em>and</em> teaches you how to add what I&#8217;m calling a &#8220;538-style&#8221; caption (two captions separated by a black line at the bottom of the table).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FADn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FADn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png 424w, https://substackcdn.com/image/fetch/$s_!FADn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png 848w, https://substackcdn.com/image/fetch/$s_!FADn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png 1272w, https://substackcdn.com/image/fetch/$s_!FADn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FADn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png" width="1312" height="368" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:368,&quot;width&quot;:1312,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FADn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png 424w, https://substackcdn.com/image/fetch/$s_!FADn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png 848w, https://substackcdn.com/image/fetch/$s_!FADn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png 1272w, https://substackcdn.com/image/fetch/$s_!FADn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0939400f-8b5c-4be1-a95e-c9432479d0c0_1312x368.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Using these ideas, we will be creating the table below &#8211; average team performance vs.&nbsp;top 100 opponents over the past five seasons in men&#8217;s college basketball.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sh7X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sh7X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png 424w, 
https://substackcdn.com/image/fetch/$s_!sh7X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png 848w, https://substackcdn.com/image/fetch/$s_!sh7X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png 1272w, https://substackcdn.com/image/fetch/$s_!sh7X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sh7X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png" width="1456" height="1502" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1502,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sh7X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png 424w, 
https://substackcdn.com/image/fetch/$s_!sh7X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png 848w, https://substackcdn.com/image/fetch/$s_!sh7X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png 1272w, https://substackcdn.com/image/fetch/$s_!sh7X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6052eb4-aa40-42ab-81a1-dcad66ceb2fd_4860x5015.png 1456w" sizes="100vw"></picture></div></a></figure></div><div><hr></div><h1><strong>The How</strong></h1><p>For this table, we will need:</p><pre><code>library(tidyverse)
library(cbbdata)
library(cbbplotR)
library(gt)
library(gtExtras)
library(glue)
library(magick)</code></pre><h2><strong>The Data</strong></h2><h3><strong>Grabbing the data</strong></h3><p>For this visualization, we will be pulling data from Barttorvik using the cbbdata package.</p><pre><code>data &lt;- cbd_torvik_game_stats() %&gt;% 
  filter(year &gt;= 2020) %&gt;% 
  left_join(cbd_torvik_ratings_archive() %&gt;% select(opp = team, date, barthag, rank)) %&gt;%
  filter(rank &lt;= 100) %&gt;% 
  summarize(mean_score = mean(game_score), 
            games = n(),
            .by = team) %&gt;% 
  filter(games &gt;= 25) %&gt;% 
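  # with_ties = FALSE caps the result at 100 rows even when scores tie, which the 5 x 20 grouping relies on
  slice_max(mean_score, n = 100, with_ties = FALSE) %&gt;% 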
  arrange(desc(mean_score)) %&gt;% 
  left_join(cbd_teams() %&gt;% select(team = common_team, logo)) </code></pre><p>Let&#8217;s break this down. We&#8217;re:</p><ol><li><p>Pulling game-level data for every contest over the past five seasons</p></li><li><p>Appending the <em>opponent&#8217;s</em> Barttorvik rank entering the game</p></li><li><p>Filtering to include only games vs.&nbsp;opponents inside the top 100</p></li><li><p>Averaging each team&#8217;s game score (you can think of this as game-level performance) and counting the number of contests played</p></li><li><p>Only considering teams that have played at least 25 such games</p></li><li><p>Keeping the 100 best average scores, arranged from highest to lowest</p></li><li><p>Joining on team logos</p></li></ol><h3><strong>Creating an HTML column</strong></h3><p>Next, we&#8217;re going to create two important columns: <code>group</code> and <code>formatted_score</code>.</p><p><em><strong>group</strong></em> is an indicator column on which we will split our data. For our table, we want to show the top 100 performing teams &#8211; and we&#8217;re going to do so with five 20-team columns. A simple way to indicate which team will be in which column is to create a new variable where teams 1-20 will be 1, teams 21-40 will be 2, and so on (any differentiation works). We can do this with rep(1:5, each = 20). The rep function replicates the supplied values, in this case the sequence 1, 2, 3, 4, 5, and the each parameter repeats each element 20 times before moving on to the next. In general, you can use rep(1:number_of_columns, each = total_rows_in_each_column).</p><p><em><strong>formatted_score</strong></em> is an HTML column that gt will render. Per usual, I&#8217;m not going to dive too deep into inline CSS or HTML. The basic functionality of this column is to show the team&#8217;s ranking, logo, average game score, and number of games played in a single cell. Our inline CSS forces the average game score to be bold and larger than the games text. 
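</p><p>Before the code, here is a small sketch of the two non-HTML steps, the grouping vector and the one-decimal formatting, tried on toy values. The numbers are made up, and sprintf is simply one base R way to keep a trailing zero; treat it as an illustration only.</p><pre><code># grouping vector for 2 table columns of 3 teams each
rep(1:2, each = 3)
# gives 1 1 1 2 2 2

# one-decimal formatting that keeps trailing zeros (made-up scores)
sprintf("%.1f", c(91.26, 88, 100))
# gives "91.3", "88.0", "100.0"</code></pre><p>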
The mean_score change simply rounds the average performance and formats it to show trailing zeros.</p><pre><code>data &lt;- data %&gt;%
  mutate(rank = row_number(),
         group = rep(1:5, each = 20),
         # round to one decimal and always keep the trailing zero (e.g., 90 becomes "90.0")
         mean_score = sprintf("%.1f", mean_score),
         formatted_score = glue(
           "&lt;div style='display: flex; align-items: center;'&gt;
      {rank}. &amp;nbsp; 
      &lt;img src='{logo}' alt='Logo' style='height:25px; margin-right: 8px;'&gt; 
      &lt;span style='font-weight: bold; font-size: 1.2em; color: black; vertical-align: middle;'&gt;{mean_score}&lt;/span&gt; 
      &lt;span style='font-size: 0.8em; color: gray; vertical-align: middle;'&gt;&amp;nbsp; ({games})&lt;/span&gt;
    &lt;/div&gt;"
         ))</code></pre><h3><strong>Splitting our data</strong></h3><p>Now that we have our <em>group</em> and <em>formatted_score</em> columns, we can make our data wider by splitting on our group number and binding the resulting tibbles together side by side. The group_split function returns a list of tibbles, one per group (i.e., per group number). The map_dfc function iterates over each tibble in that list, selects the column containing &#8220;formatted_score,&#8221; and column-binds the results into a single tibble &#8211; leaving us with a 20-row, 5-column data frame as intended.</p><pre><code>data &lt;- data %&gt;%
  group_split(group) %&gt;%
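  # Aside, a toy sketch of the same split-and-bind pattern (hypothetical 4-row tibble):
  #   tibble(g = c(1, 1, 2, 2), x = c("a", "b", "c", "d")) %&gt;%
  #     group_split(g) %&gt;%
  #     map_dfc(~ select(.x, x))
  # This should return 2 rows and 2 columns. Every piece contributes a column with
  # the same name, so the bound columns get de-duplicated names (x...1, x...2),
  # which is presumably why the table code selects with contains("formatted_score").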
  map_dfc(~ select(.x, contains("formatted_score")))</code></pre><p>Onto the table!</p><div><hr></div><h1>The Table</h1><h3><strong>The base</strong></h3><p>The base of our table uses the gt_theme_pl function from my cbbplotR package. The fmt_markdown function renders our HTML columns, cols_label blanks out all column labels, and cols_align center-aligns all columns. If you run this as-is and slap on a table header and caption, you&#8217;re honestly off to a pretty good start, as the formatted_score columns and the gt_theme_pl function do much of the heavy lifting.</p><pre><code>data %&gt;%
  gt(id = "table") %&gt;% 
  gt_theme_pl() %&gt;% 
  fmt_markdown(contains("formatted_score")) %&gt;% 
  cols_label(everything() ~ "") %&gt;% 
  cols_align(columns = everything(), "center") </code></pre><h3><strong>Adding dividers and adjusting fonts</strong></h3><p>To spice things up a bit, we will use gt_add_divider from gtExtras to draw vertical dividers between columns and the gt_set_font function from cbbplotR to choose a new table font. We can make our table a bit more compact by dropping the data_row.padding option to 1. By default, the font color in gt_theme_pl is a dark purple, and we can adjust this by setting the font color inside rows to black with tab_style and the cells_body location helper.</p><pre><code>... %&gt;%
  gt_add_divider(columns = -last_col(), color = "black", weight = px(1.5), include_labels = FALSE) %&gt;%  
  gt_set_font("Barlow") %&gt;% 
  tab_options(data_row.padding = 1) %&gt;% 
  tab_style(locations = cells_body(), cell_text(color = "black")) </code></pre><h3><strong>Annotations</strong></h3><p>You&#8217;re going to have to bear with me here. The way that I create the 538-style captions is by adding the top-level caption line, the one above the bottom border in the final table, as a footnote and then including some CSS to make it work. Perhaps there is a way to do this by just using tab_source_note, and I bet there is, but I&#8217;m assuming that would be a bit messier than my solution.</p><p>Importantly, if you do want this caption design, you&#8217;re going to have to sacrifice the ability to use table footnotes. You can easily write your footnote information in this top-level caption, but you won&#8217;t see the footnote marks.</p><pre><code>... %&gt;%
  tab_footnote(locations = cells_column_labels(), footnote = md("Opponent rank is determined by Barttorvik T-Rank on game date. Game score is game-level Barthag performance (the&lt;br&gt;probability that you beat an average team on a neutral floor). Total games played is next to mean score.")) %&gt;% 
  tab_header(
    title = "Performance against top 100 teams over the past five seasons",
    subtitle = "Average T-Rank game score vs. then-top 100 opponents from 2020-2024; min. total 25 games"
  ) %&gt;% 
  tab_source_note("Data by Barttorvik + cbbdata || Viz. + Analysis by @andreweatherman")</code></pre><h3><strong>Necessary CSS</strong></h3><p>As promised, this table will rely on some CSS. I&#8217;m not really going to touch on anything but the footnote and source note additions. Every selector starts with #table, which matches the id we set in gt(id = "table").</p><p><strong>#table .gt_footnote</strong> adds a 1px solid black line below the table footnote, which we are treating as our top-level caption, and sets the font size to 12px. </p><p><strong>#table .gt_footnote_marks</strong> hides the footnote marks in the column headers, as they are irrelevant in this design. </p><p><strong>#table .gt_sourcenote</strong> aligns our bottom-level caption, defined with tab_source_note, to the right of our table, mimicking the popular 538 table design.</p><pre><code>... %&gt;%
  opt_css(
    "
    #table .gt_column_spanner {
      border-bottom-style: none !important;
      display: none !important;
    }
    #table .gt_subtitle {
      line-height: 1.2;
      padding-top: 0px;
      padding-bottom: 0px;
    }
    #table .gt_footnote {
      border-bottom-style: solid;
      border-bottom-width: 1px;
      border-bottom-color: #000;
      font-size: 12px;
    }
    #table .gt_footnote_marks {
      display: none !important;
    }
    #table .gt_sourcenote {
      text-align: right;
    }
    #table .gt_row {
      border-top-color: black;
    }
    "
  )</code></pre><div><hr></div><h1>Source Code</h1><p><a href="https://github.com/andreweatherman/buckets-and-bytes/blob/main/visualizations/tables/gt_two_columns/script.R">A link to the full source code can be found here.</a></p>]]></content:encoded></item><item><title><![CDATA[Best Road Performance]]></title><description><![CDATA[Adjusted road performances in college basketball]]></description><link>https://www.bucketsandbytes.com/p/best-road-performance</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/best-road-performance</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Wed, 21 Aug 2024 22:03:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dVNM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A gt table that calculates the 10 best T-Rank efficiency ratings in true D-1 vs.&nbsp;D-1 road performances &#8211; also includes a composite season-long predictive average across all games and quadrant records in true road games. 
</p><p><strong>This visualization is not yet accompanied by a tutorial.</strong></p><h1>Source Code</h1><p><a href="https://github.com/andreweatherman/buckets-and-bytes/blob/main/visualizations/tables/road_performance/script.R">Full source code can be found here.</a></p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dVNM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dVNM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png 424w, https://substackcdn.com/image/fetch/$s_!dVNM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png 848w, https://substackcdn.com/image/fetch/$s_!dVNM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png 1272w, https://substackcdn.com/image/fetch/$s_!dVNM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dVNM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png" width="1456" height="1381" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1381,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dVNM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png 424w, https://substackcdn.com/image/fetch/$s_!dVNM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png 848w, https://substackcdn.com/image/fetch/$s_!dVNM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png 1272w, https://substackcdn.com/image/fetch/$s_!dVNM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3830ab9-9615-49e1-ab9b-962f4c2ff9dd_2690x2551.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Team Schedule Matrices]]></title><description><![CDATA[Creating schedule matrices using gt]]></description><link>https://www.bucketsandbytes.com/p/team-schedule-matrices</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/team-schedule-matrices</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Wed, 21 Aug 2024 22:02:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-jEm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In late April, <a href="https://twitter.com/cobrastats/status/1784266700147318817">@cobrastats</a> posted a great graphic on the 2024 Big Ten football schedule. 
He <a href="https://github.com/cobrastats/2024-CFB-Football-Calendar">open-sourced his code on GitHub</a>, and with his permission, I created a pull request to show my attempt at creating the same graphic. You can find that code, with brief explanation, on the repository linked above, but I&#8217;m going to use this space to expand a bit more.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-jEm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-jEm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png 424w, https://substackcdn.com/image/fetch/$s_!-jEm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png 848w, https://substackcdn.com/image/fetch/$s_!-jEm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png 1272w, https://substackcdn.com/image/fetch/$s_!-jEm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-jEm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png" width="1456" height="1879" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1879,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-jEm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png 424w, https://substackcdn.com/image/fetch/$s_!-jEm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png 848w, https://substackcdn.com/image/fetch/$s_!-jEm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png 1272w, https://substackcdn.com/image/fetch/$s_!-jEm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9afe9d-5675-4a58-ab54-57b3a3a5482c_2878x3714.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><p>For this table, we will need:</p><pre><code><code>library(cfbfastR)
library(gt)
library(gtExtras)
library(glue)
library(tidyverse) 
library(cbbdata)
library(rlang)
library(nflreadr)</code></code></pre><h2><strong>The Data</strong></h2><p>Most of this is pretty straightforward. With <em>schedule</em>, we pull the 2024 college football schedule, filter out postseason games (e.g., the NY6), select the relevant variables, and then convert the date column to a Date.</p><p>With <em>weeks</em>, we pull the week numbers associated with each range of dates in the 2024 season &#8211; since not all games in a given week are played on the same day.</p><p>Finally, we use the overlap join functionality in the dplyr::*_join family (available in dplyr 1.1.0 and later) to match each game date to the week whose <em>start_date</em> and <em>end_date</em> range, as defined in <em>weeks</em>, contains it.</p><blockquote><p>Note: We don&#8217;t actually <em>need</em> to grab weeks (or join our data) because we <em>could</em> just infer the week numbers from game dates (ascending order) &#8211; but I did so for clarity and practice.</p></blockquote><pre><code>schedule &lt;- espn_cfb_schedule(year = 2024, limit = 1000) %&gt;% 
  filter(type != "postseason") %&gt;% 
  select(home_team = home_team_location, away_team = away_team_location, date = game_date) %&gt;% 
  mutate(date = as.Date(date, format = "%Y-%m-%dT%H:%MZ"))
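# A minimal sketch (not part of the original script): as.Date() keeps only the
# calendar date of the UTC timestamp. The string below is a made-up example.
example_stamp &lt;- "2024-09-01T02:30Z"
as.Date(example_stamp, format = "%Y-%m-%dT%H:%MZ")
# Note that a late-night U.S. kickoff can roll over to the next UTC date.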

weeks &lt;- espn_cfb_calendar(year = 2024) %&gt;% 
  select(week, start_date, end_date) %&gt;% 
  mutate(across(-week, ~as.Date(.x, format = "%Y-%m-%dT%H:%MZ")))
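# Sketch of the overlap join used below, on toy data (all values are invented
# for illustration; assumes dplyr &gt;= 1.1.0 and tibble, both loaded via tidyverse).
toy_games &lt;- tibble(date = as.Date(c("2024-08-31", "2024-09-07")))
toy_weeks &lt;- tibble(
  week = 1:2,
  start_date = as.Date(c("2024-08-24", "2024-09-03")),
  end_date = as.Date(c("2024-09-02", "2024-09-09"))
)
# each toy game picks up the week whose date range contains it
left_join(toy_games, toy_weeks, join_by(between(date, start_date, end_date)))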

schedule &lt;- left_join(
  schedule,
  weeks,
  join_by(between(date, start_date, end_date))
)</code></pre><p>Now, let&#8217;s create a vector of Big Ten teams and filter down.</p><pre><code>b1g &lt;- cfbd_team_info() %&gt;% filter(conference == "Big Ten") %&gt;% pull("school")

schedule &lt;- schedule %&gt;% filter(home_team %in% b1g | away_team %in% b1g)</code></pre><h3><strong>Pivoting data with help from </strong><code>nflreadr</code></h3><p>Right now, our data looks like this. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DvmM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DvmM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png 424w, https://substackcdn.com/image/fetch/$s_!DvmM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png 848w, https://substackcdn.com/image/fetch/$s_!DvmM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png 1272w, https://substackcdn.com/image/fetch/$s_!DvmM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DvmM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png" width="1456" height="424" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:424,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110533,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DvmM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png 424w, https://substackcdn.com/image/fetch/$s_!DvmM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png 848w, https://substackcdn.com/image/fetch/$s_!DvmM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png 1272w, https://substackcdn.com/image/fetch/$s_!DvmM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e84e0f5-fb5d-4972-af37-49294e693757_1626x474.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For plotting, however, we want 18 rows (the number of Big Ten teams) and 15 columns (one for the Big Ten team and 14 for the weeks). There are a few ways to approach this problem, and the easiest is to simply &#8220;pivot&#8221; our data wider.</p><p>With pivoting, though, we&#8217;re going to run into a small problem: Our Big Ten team might be home or away. The two columns with team information are organized by location and not conference. We can fix this by first pivoting our data to a long format, using pivot_longer, and then filtering the resulting value column for Big Ten teams.</p><p>For this case, pivot_longer works, but other times, you might have more statistics and want a more streamlined solution. I want to show off a nice utility function from {nflreadr} that will pivot data and convert your pipeline to something more standardized: nflreadr::clean_homeaway. 
It converts home_ and away_ prefixed columns to team_ and opponent_ columns and adds a location column, doubling the rows along the way (one row per team, not per game).</p><blockquote><p>If you relabel home_team and away_team to home and away, then <em>schedule %&gt;% pivot_longer(home:away)</em> would get you a similar long table of locations and teams, but without the opponent column that we rely on later. I wanted to take this opportunity to introduce this function.</p></blockquote><pre><code>schedule %&gt;% 
  select(home_team, away_team, week) %&gt;% 
  clean_homeaway()</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cXyO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cXyO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png 424w, https://substackcdn.com/image/fetch/$s_!cXyO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png 848w, https://substackcdn.com/image/fetch/$s_!cXyO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png 1272w, https://substackcdn.com/image/fetch/$s_!cXyO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cXyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png" width="1456" height="759" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93651e79-3a17-4786-be51-c54f6c047695_1622x846.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:759,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:206104,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cXyO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png 424w, https://substackcdn.com/image/fetch/$s_!cXyO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png 848w, https://substackcdn.com/image/fetch/$s_!cXyO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png 1272w, https://substackcdn.com/image/fetch/$s_!cXyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93651e79-3a17-4786-be51-c54f6c047695_1622x846.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>After that, we can filter down to Big Ten teams.</p><pre><code><code>plot_data &lt;- schedule %&gt;% 
  select(home_team, away_team, week) %&gt;% 
  nflreadr::clean_homeaway() %&gt;% 
  filter(team %in% b1g)</code></code></pre><h3><strong>Adding logos</strong></h3><p>Before we can pivot our data, we need to add team logos and find a way to preserve our location data during the pivot without creating new columns. A nifty idea is to convert our opponent column to an HTML string with an &lt;img&gt; tag that includes the link to the team logo and uses the alt attribute to encode location data. In a static table, the alt text serves no visible purpose, so we can use glue to toss in the game location.</p><p>First, let&#8217;s grab team logos. Importantly, our team names come from ESPN, so let&#8217;s select the ESPN names to ensure an effortless match. We are going to use pull and set_names to create a named vector with team names and logo links.</p><pre><code><code>logos &lt;- cbd_teams() %&gt;% select(team = espn_location, logo)
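# Sketch (toy values, not real logo links): a named vector works as a lookup
# table, which is what the next line builds from the logos data frame.
toy_logos &lt;- c(Duke = "duke.png", UNC = "unc.png")
toy_logos[c("UNC", "Duke")]  # values come back in the order requested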
logos &lt;- logos %&gt;% pull(logo) %&gt;% set_names(logos$team)</code></code></pre><blockquote><p>Remember that you index a named vector by name, <em>object_name[name]</em>, so to grab logo links for our opponent column, we can do: <strong>logos[opponent]</strong>.</p></blockquote><p>Our location data is stored inside of the alt attribute shown below.</p><pre><code><code>plot_data &lt;- plot_data %&gt;% 
  mutate(opponent = glue("&lt;img src='{logos[opponent]}' alt={location} style='height:25px; vertical-align:middle;'&gt;"))</code></code></pre><p>Now that we have converted our opponent column, we can safely pivot our data wider by selecting team as the identifier column, grabbing our column names from the week variable, and setting our values as the opponent column.</p><p>Finally, we will arrange our data in alphabetical order by team and then convert our team column to the proper logo link using the named vector.</p><pre><code><code>plot_data &lt;- plot_data %&gt;% 
  pivot_wider(id_cols = team, names_from = week, values_from = opponent) %&gt;% 
  arrange(team) %&gt;% 
  mutate(team = logos[team])</code></code></pre><div><hr></div><h1><strong>Plotting</strong></h1><h3><strong>Conditional Highlighting</strong></h3><p>In our table, we are going to highlight on two conditions: a) game location (home games are blue; away games are white) and b) bye weeks (gray).</p><p>The problem is that conditional highlighting in gt is a bit awkward here because <em>tab_style + cell_fill</em> does not work as one might expect: row and column vectors aren&#8217;t treated as (row, column) pairs. If you pass, e.g.&nbsp;rows = c(1, 2) and columns = c(5, 6) to cells_body inside tab_style, you&#8217;ll fill four cells (every combination of row and column), not two.</p><p>Turns out, you can just build the CSS string for highlighting cells outside of the table and apply it directly with opt_css. The basic idea is this: We take a matrix of row-column indices, a table ID, and a color &#8211; and then inject those into a basic CSS string that targets cells and colors their background.</p><pre><code><code>generate_css &lt;- function(indices, css_id, color) {
  # build one CSS rule per (row, column) pair in the index matrix
  map2_chr(
    .x = indices[, 1],
    .y = indices[, 2],
    .f = ~glue("#{css_id} tbody tr:nth-child({.x}) td:nth-child({.y}) {{ background-color: {color}; }}")
  )
}</code></code></pre><p>We use which and str_detect to find the positions where &#8220;home&#8221; appears inside the alt attribute (and is.na to find bye weeks), convert those positions to row and column indices with arrayInd, and then apply our generate_css function.</p><pre><code><code>home_css &lt;- arrayInd(which(str_detect(as.matrix(plot_data), 'alt=home')), .dim = dim(plot_data)) %&gt;% 
  generate_css('table', '#cce7f5')
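# Sketch on a toy matrix (invented values): which() returns linear positions,
# and arrayInd() turns them into row/column pairs for generate_css().
toy &lt;- matrix(c("a", NA, "alt=home", "b"), nrow = 2)
toy_idx &lt;- arrayInd(which(is.na(toy)), .dim = dim(toy))  # row 2, column 1
generate_css(toy_idx, "table", "#d9d9d9")  # one rule for tr:nth-child(2) td:nth-child(1)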

bye_css &lt;- arrayInd(which(is.na(as.matrix(plot_data))), .dim = dim(plot_data)) %&gt;% 
  generate_css('table', '#d9d9d9')</code></code></pre><p>I&#8217;m going to add one more piece of CSS, and to make things cleaner in the final plot code, I&#8217;m going to define that rule here. (This just decreases the spacing in my caption.)</p><pre><code><code>additional_css &lt;- "
  
  #table .gt_sourcenote {
    line-height: 1.3;
  }

"</code></code></pre><h3><strong>Header + Legend</strong></h3><p>I like how the original table combines the legend with the title, and I&#8217;m going to do the same. I&#8217;m going to make two changes: a) I&#8217;m not using the Big Ten logo and b) I&#8217;m going to stack the title and legend (and center it).</p><p>This is all done with custom HTML, which will further inherit certain styles from the table theme (font family, size, weight, etc.). We can tweak a bit of that with in-line CSS.</p><pre><code><code>html_content &lt;- '
&lt;div style="text-align: center;"&gt;
  &lt;h1 style="margin: 0; font-size: 20px;"&gt;Big Ten Football Schedule | 2024&lt;/h1&gt;
  &lt;div style="display: flex; justify-content: center; align-items: center; margin-top: 5px;"&gt;
    &lt;div style="border: 1.5px solid black; padding: 2px 10px; text-align: center; background-color: #cce7f5; font-size: 10px; margin-right: 5px;"&gt;Home&lt;/div&gt;
    &lt;div style="border: 1.5px solid black; padding: 2px 10px; text-align: center; font-size: 10px; margin-right: 5px;"&gt;Away&lt;/div&gt;
    &lt;div style="border: 1.5px solid black; padding: 2px 10px; text-align: center; background-color: #d9d9d9; font-size: 10px;"&gt;Bye&lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
'</code></code></pre><h3><strong>Building the table</strong></h3><p>The body of our table is pretty straightforward. fmt_image renders the logo link in the team column, while fmt_markdown renders the HTML strings in the week columns (1:14). sub_missing is a handy utility function that replaces NA cells with text of your choosing (here, an empty string). We can use gt_add_divider to create more pronounced divisions between rows and weeks (I prefer this look over the original).</p><blockquote><p>Since we built an HTML string for our opponent columns, to preserve the game location data, we need to use fmt_markdown, not fmt_image.</p></blockquote><pre><code>plot_data %&gt;% 
  gt(id = 'table') %&gt;% 
  gt_theme_538() %&gt;% 
  fmt_image(team, height = 25) %&gt;%
  fmt_markdown(-team) %&gt;% 
  # use sub_missing to replace na with empty text string
  sub_missing(-team, missing_text = '') %&gt;% 
  cols_align(columns = everything(), 'center') %&gt;% 
  cols_label(team = '') %&gt;% 
  # bold col. headers
  tab_style(locations = cells_column_labels(), style = cell_text(weight = 'bold')) %&gt;% 
  # add dividers
  gt_add_divider(columns = -team, sides = 'all', include_labels = FALSE, color = 'black', weight = px(1.5)) %&gt;% 
  tab_header(html(html_content)) %&gt;% 
  tab_source_note(md("Data by cfbfastR&lt;br&gt;Viz. by @andreweatherman (h/t to @cobrastats)")) %&gt;% 
  tab_options(data_row.padding = 1) %&gt;% 
  # apply above css
  opt_css(c(home_css, bye_css, additional_css)) %&gt;% 
  gtsave_extra("schedule.png", zoom = 5)</code></pre><div><hr></div><h1>Source Code</h1><p><a href="https://github.com/andreweatherman/buckets-and-bytes/blob/main/visualizations/tables/schedule_matrix/script.R">The full source code can be found here.</a></p>]]></content:encoded></item><item><title><![CDATA[A different approach to survey responses]]></title><description><![CDATA[Using colored boxes in gt to visualize survey responses]]></description><link>https://www.bucketsandbytes.com/p/a-different-approach-to-survey-responses</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/a-different-approach-to-survey-responses</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Wed, 21 Aug 2024 22:02:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zO1n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>On April 19th, 2024, the New York Times published an article with a visualization that detailed which &#8220;outlets&#8221; jurors in the &#8220;Trump Hush-Money&#8221; trial turn to for their news. 
I thought it would be a fun exercise to recreate it using the <a href="https://gt.rstudio.com/">{gt} package</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zO1n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zO1n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png 424w, https://substackcdn.com/image/fetch/$s_!zO1n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png 848w, https://substackcdn.com/image/fetch/$s_!zO1n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png 1272w, https://substackcdn.com/image/fetch/$s_!zO1n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zO1n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png" width="1456" height="1300" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1300,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zO1n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png 424w, https://substackcdn.com/image/fetch/$s_!zO1n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png 848w, https://substackcdn.com/image/fetch/$s_!zO1n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png 1272w, https://substackcdn.com/image/fetch/$s_!zO1n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4341691-e888-4a89-a1c7-c7280d2988f7_9272x8278.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p>For this table, we will need:</p><pre><code><code>library(tidyverse)
library(janitor)
library(data.table)
library(gt)
library(gtExtras)
library(glue)</code></code></pre><h2><strong>The Data</strong></h2><h3><strong>Grab the data</strong></h3><p>Typically, I try my best to find and scrape the original data source. But unfortunately, perhaps for security reasons, I could not locate the <a href="https://www.nytimes.com/interactive/2024/04/16/nyregion/Trump-Jury-Questions.html">juror responses to the screening questionnaire</a>. So instead, I hard-coded the data as a .CSV.</p><p>Grab the data with this:</p><pre><code>data &lt;- PUT HERE AFTER REPO IS UP AND LIVE</code></pre><h3><strong>Manipulate the data</strong></h3><p>As an exercise, I left us with a few required manipulations:</p><h4><strong>1) Transpose the data</strong></h4><p>Our data file is in a wide format relative to publication, but our table requires the jurors to be the column names. There are a number of ways to essentially &#8220;swap&#8221; rows and columns, and we are going to use the transpose function from data.table and specify keep.names to retain our column headers. After transposing, our headers are actually located in the first row of the table, so we will use row_to_names from janitor to &#8220;shift&#8221; everything up one row. Finally, let&#8217;s convert this back to a tibble and rename the first column.</p><pre><code>data &lt;- data %&gt;% 
  transpose(keep.names = "news") %&gt;% 
  row_to_names(1) %&gt;% 
  as_tibble() %&gt;% 
  rename("source" = "juror")</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ju9X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ju9X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png 424w, https://substackcdn.com/image/fetch/$s_!Ju9X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png 848w, https://substackcdn.com/image/fetch/$s_!Ju9X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png 1272w, https://substackcdn.com/image/fetch/$s_!Ju9X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ju9X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png" width="1456" height="418" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:418,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71979,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ju9X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png 424w, https://substackcdn.com/image/fetch/$s_!Ju9X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png 848w, https://substackcdn.com/image/fetch/$s_!Ju9X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png 1272w, https://substackcdn.com/image/fetch/$s_!Ju9X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c53ea1-b72e-4daf-8502-411718053db5_1632x468.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>2) Create the boxes</strong></h4><p>This is the &#8220;trickiest&#8221; part of the visualization. There are probably a multitude of ways to go about this, but to plot the boxes in each cell, I am using an inline block, with equal height and width.</p><p>First, you will notice that the table labels each juror number in the first row, <em>but</em> it resets the &#8220;counter&#8221; after the 12th juror (13-18 are &#8220;alternates&#8221;). To handle this, we need to create a &#8220;display number&#8221; that follows the same logic. 
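</p><p>As a quick, standalone illustration of that reset (a sketch only; <code>juror_ids</code> is a hypothetical stand-in for the column names and is not part of the table code):</p><pre><code># jurors are numbered 1 to 18 in the data; 13 to 18 are the alternates
juror_ids &lt;- 1:18

# subtract 12 from the alternates so their labels restart at 1
display_number &lt;- ifelse(juror_ids &gt; 12, juror_ids - 12, juror_ids)

display_number</code></pre><p>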
Since our column headers are the juror numbers, we can simply refer to them using the <code>cur_column</code> function.</p><p>Next, our values are <code>NA</code> if the juror <em>does not</em> utilize that news &#8220;source,&#8221; so we can color those cells a light grey and use a yellow when the cell is <em>not</em> <code>NA</code> (the juror does use it).</p><p>Finally, the &#8220;trickiest&#8221; part is to create an HTML string that builds the box. If you don&#8217;t know HTML or CSS, that&#8217;s okay; you can still follow along, as the code is pretty intuitive.</p><ul><li><p>The &#8220;inline block&#8221; creates a small square with the height and width specified in the string.</p></li><li><p>It is filled with the background-color referenced above.</p></li><li><p>The shorthand margin property first sets the top and bottom margins and then the left and right margins. Having smaller left and right margins will make the boxes appear closer together.</p></li><li><p>The text (and boxes) are then centered, set at a 12px font size, and bolded (which the original table does not do).</p></li></ul><pre><code>data &lt;- data %&gt;% 
   mutate(across(-source, ~{
     
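    # label only the first row; alternates (columns 13-18) restart at 1
    display_number &lt;- ifelse(row_number() == 1, 
                             ifelse(as.numeric(cur_column()) &gt; 12, as.numeric(cur_column()) - 12, cur_column()),
                             NA)

    # light grey when the juror does not use the source (NA), yellow otherwise
    color &lt;- ifelse(is.na(.x), "#EEEEEE", "#FCCF10")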

    glue("&lt;span style='display:inline-block; width:20px; height:20px; line-height:20px; background-color: {color}; vertical-align:middle; margin:4px 1px; font-size: 12px; font-weight: bold; text-align:center;'&gt;{ifelse(!is.na(display_number), display_number, '')}&lt;/span&gt;")
  }))</code></pre><h4><strong>3) Separator</strong></h4><p>In the original table, there is a small gap to separate the jurors from the alternates. We can mimic this effect by creating a dummy column <em>after</em> our above mutations and placing it <em>after</em> the 13th column (which holds the 12th and final juror).</p><pre><code>data &lt;- data %&gt;% mutate(blank = '', .after = 13)</code></pre><div><hr></div><h2><strong>The Table</strong></h2><p>Most of our table can be created with stock gt functions, but we will need to add minimal CSS to top it off.</p><h4><strong>1) The Base Table</strong></h4><p>The &#8220;base&#8221; of our table will be created using fmt_markdown to render our HTML strings and gt_theme_nytimes to closely mirror the look of the original table. Importantly, we add an arbitrary &#8220;id&#8221; for later use with opt_css.</p><pre><code><code>data %&gt;% 
  gt(id = "table") %&gt;% 
  gt_theme_nytimes() %&gt;% 
  fmt_markdown(-c(source, blank))</code></code></pre><h4><strong>2) Handling the Separator Column</strong></h4><p>To create the separation effect, we will need to relabel our column and adjust its width.</p><pre><code><code>... %&gt;% 
  cols_label(blank = "") %&gt;% 
  cols_width(blank ~ px(15))</code></code></pre><h4><strong>3) Column Labels</strong></h4><p>The original table does not have &#8220;traditional&#8221; column headers; instead, they appear to be column <em>spanners</em>, which we can create with <code>tab_spanner</code>. To <em>really</em> drive home this effect, we&#8217;re going to need some CSS at the end. But for now, let&#8217;s add the column spanners, align them to the left, and make them a light grey.</p><pre><code><code>... %&gt;% 
  tab_spanner(columns = 1, label = "Source") %&gt;% 
  tab_spanner(columns = 2:13, label = "Jurors") %&gt;% 
  tab_spanner(columns = 15:20, label = "Alternates") %&gt;% 
  tab_style(locations = cells_column_spanners(),
            style = cell_text(align = "left", size = px(16), color = "#7E7E7E"))</code></code></pre><h4><strong>4) Table Annotations + Options</strong></h4><p>Let&#8217;s add our title and caption. We will also tweak our caption font size, force the line below the caption to white (not sure why this theme doesn&#8217;t do it by default), and compress our rows.</p><pre><code><code>... %&gt;% 
  tab_header(title = "Where the jurors in the Trump hush-money trial say they get their news") %&gt;% 
  tab_source_note(md("Data and original table by New York Times&lt;br&gt;Recreation in R by @andreweatherman")) %&gt;% 
    tab_options(data_row.padding = 1,
                source_notes.border.bottom.style = "solid",
                source_notes.border.bottom.color = "white",
                source_notes.font.size = 12)</code></code></pre><h4><strong>5) Additional CSS</strong></h4><p>To make our column spanners look like headers &#8211; creating the spanner-header effect in the original table &#8211; we need some light CSS. When using opt_css, it is important to reference the same table id that you created in gt(id = ...).</p><p>The first rule hides the column headers and sets their position to &#8220;relative,&#8221; allowing our spanners to &#8220;drop&#8221; in their place.</p><p>The second rule &#8220;drops&#8221; our spanners with position: absolute, makes them visible, and adds some minor padding. Our spanners are already aligned left, as done in step three, but remember that our boxes have slight left-right margins, so by default, our spanners look misaligned. We can &#8220;push&#8221; them to the right with padding-left: 3px.</p><pre><code><code>... %&gt;% 
   opt_css(
    css = "
      #table .gt_col_headings {
        visibility: hidden;
        position: relative;
      }
      #table .gt_column_spanner {
        position: absolute;
        visibility: visible;
        padding-left: 3px;
      }
    "
  )</code></code></pre><div><hr></div><h1>Source Code</h1><p><a href="https://github.com/andreweatherman/buckets-and-bytes/blob/main/visualizations/tables/nyt_survey/script.R">Full code can be found here.</a></p>]]></content:encoded></item><item><title><![CDATA[Jittered Logo Plot]]></title><description><![CDATA[Creating a beeswarm-like plot with team logos]]></description><link>https://www.bucketsandbytes.com/p/jittered-logo-plot</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/jittered-logo-plot</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Wed, 21 Aug 2024 22:01:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UW5J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This tutorial walks through how to create jittered logo plots using {ggplot2} and {cbbplotR} for college basketball. 
We will be plotting performance against seed expectation &#8211; the cumulative number of wins above or below seed expectation &#8211; from 2000-2024 for every team with at least five tournament appearances.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UW5J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UW5J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png 424w, https://substackcdn.com/image/fetch/$s_!UW5J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png 848w, https://substackcdn.com/image/fetch/$s_!UW5J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png 1272w, https://substackcdn.com/image/fetch/$s_!UW5J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UW5J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png" width="1456" height="1577" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1577,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UW5J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png 424w, https://substackcdn.com/image/fetch/$s_!UW5J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png 848w, https://substackcdn.com/image/fetch/$s_!UW5J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png 1272w, https://substackcdn.com/image/fetch/$s_!UW5J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03a81fcf-b910-43ae-8a35-5ac6a56d858c_3600x3900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h1><strong>The How</strong></h1><p>For this plot, we will need:</p><pre><code>library(tidyverse)
library(cbbdata)
library(cbbplotR)
library(vipor)</code></pre><h2><strong>The Data</strong></h2><p>For this visualization, we will be pulling data from Barttorvik using the {cbbdata} package.</p><pre><code><code>data &lt;- cbd_torvik_ncaa_results(2000, 2024) %&gt;% 
  filter(r64 &gt;= 5) %&gt;% 
  select(team, pase) %&gt;% 
  mutate(pase_rk = dense_rank(-pase))</code></code></pre><p>All we&#8217;re doing here is pulling tournament performance data, filtering for five or more appearances (r64), and calculating PASE rank &#8211; which will be used to highlight top teams.</p><h3><strong>Calculate the jitter</strong></h3><p>We want to &#8220;jitter&#8221; our plot, which is normally made easy by the <code>ggbeeswarm</code> package. &#8220;Jittering&#8221; broadly refers to offsetting points to minimize overlap. Unfortunately, {cbbplotR} does not <em>yet</em> support jittering points, so we need to do it ourselves.</p><p>Behind the scenes, {ggbeeswarm} uses the {vipor} package and its <em>offsetSingleGroup</em> function to calculate new x-values for plotting. With this knowledge, we can create a small wrapper around <em>offsetSingleGroup</em> to achieve similar results.</p><pre><code><code>calculate_quasirandom_jitter &lt;- function(y, x, width = 0.2) {
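  # offsetSingleGroup() returns one horizontal offset per y value
  jittered_offset &lt;- offsetSingleGroup(y, method = "quasirandom")
  # scale the offsets by `width`, then shift them so they are centered on x
  jittered_offset &lt;- jittered_offset * width
  x + jittered_offset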
}</code></code></pre><p>Next, we&#8217;ll apply this function to our data.</p><pre><code>data &lt;- data %&gt;% 
  mutate(x = calculate_quasirandom_jitter(pase, 1))</code></pre><div><hr></div><h1><strong>Plotting</strong></h1><p>Now, time to plot! Let&#8217;s briefly go over some things:</p><h4>geom_mean_lines</h4><p>This is a utility function to add mean (or median) lines to any plot. Notice that you must refer to your values as either y0 or x0, not y or x.</p><h4>scale_X_identity</h4><p>Inside geom_cbb_teams, you might notice that we are conditionally defining widths (logo size) and alpha (logo transparency) values. The scale_X_identity family of functions is used when <a href="https://ggplot2-book.org/scales-other#sec-scale-identity">&#8220;your data is already scaled such that the data and aesthetic spaces are the same.&#8221;</a> That is, whenever you are passing direct values for a scale inside of any aes, you must use the appropriate _identity function for ggplot to recognize those values as literal representations.</p><h4>plot.margin</h4><p>This is how you add padding to your plot. Sometimes padding makes your graph look a bit cleaner.</p><h4><strong>Using </strong><code>ggpreview</code><strong> with logo plots</strong></h4><p>If you are plotting numerous team logos, you might notice that RStudio can be slow to return the plot itself &#8211; which can possibly lead to your R session aborting. To fix this, {cbbplotR} borrows a function from the {ggpath} package called <em>ggpreview</em> &#8211; which saves a temporary image of your plot and returns it in the Viewer pane. It is recommended to then expand that window in your browser.</p><p>To use ggpreview, you need to store your plot as a variable and then pass it to the ggpreview function. The function also takes arguments for plot dimensions.</p><p>For example, if we were to draw a plot showing every team&#8217;s adjusted efficiencies, that would require rendering 362 logos, which would definitely cause us some problems. But with ggpreview, we can store our plot as a variable and view a temporary image of it! 
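</p><p>A minimal sketch of that workflow follows. The scatter plot is a hypothetical stand-in for a logo-heavy plot, and the <code>width</code>, <code>height</code>, and <code>dpi</code> arguments are assumed to match those of ggpath&#8217;s ggpreview:</p><pre><code>library(ggplot2)
library(cbbplotR)

# hypothetical stand-in; any ggplot object stored as a variable works the same way
p &lt;- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

# save a temporary image at the requested size and open it in the Viewer pane
ggpreview(p, width = 6, height = 6.5, dpi = 300)</code></pre><p>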
This entire process takes fewer than 10 seconds.</p><h3><strong>The plot</strong></h3><pre><code><code>plot &lt;- data %&gt;% 
  ggplot(aes(x, pase)) +
  geom_mean_lines(aes(y0 = pase), color = "grey70") +
  geom_cbb_teams(aes(team = team,
                     width = ifelse(pase_rk &lt;= 20, 0.07, 0.055),
                     alpha = ifelse(pase_rk &lt;= 20, 1, 0.15))) +
  scale_alpha_identity() +
  scale_y_continuous(breaks = seq(-10, 20, 5), labels = c("- 10", as.character(seq(-5, 15, 5)), "+ 20"),
                     limits = c(-10, 20)) +
  theme_minimal() +
  theme(plot.title.position = "plot",
        plot.title = element_text(family = "RadioCanadaBig-Bold", hjust = 0.5, size = 14),
        plot.subtitle = element_text(family = "RadioCanadaBig-Regular", hjust = 0.5,
                                     vjust = 2.7, size = 10),
        plot.caption.position = "plot",
        plot.caption = ggtext::element_markdown(family = "RadioCanadaBig-Regular",
                                                lineheight = 1.2, size = 8),
        axis.text = element_text(family = "RadioCanadaBig-Regular"),
        axis.title = element_text(family = "RadioCanadaBig-SemiBold"),
        axis.title.y = element_text(vjust = 2),
        axis.text.x = element_blank(),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20, unit = "pt"),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        plot.background = element_rect(fill = "#F6F7F2")) +
  labs(title = "The programs who routinely outperform March expectations",
       subtitle = "Sorted by PASE (performance against seed expectation) from 2000-2024.\nMin. five tournament appearances.",
       caption = "Data by cbbdata&lt;br&gt;Viz by @andreweatherman + cbbplotR",
       y = "Aggregate wins +/- seed expectation",
       x = NULL)</code></code></pre><h3><strong>Saving the plot</strong></h3><p>When you&#8217;re using custom fonts, as we are, sometimes ggsave won&#8217;t properly render them. To sidestep this, you need to specify a device, shown below.</p><pre><code>ggsave(plot = plot, "pase_graph.png", h = 6.5, w = 6, dpi = 600, device = grDevices::png)</code></pre><div><hr></div><h1>Source Code</h1><p><a href="https://github.com/andreweatherman/buckets-and-bytes/blob/main/visualizations/graphs/jittered_logos/script.R">The full source can be found here.</a></p>]]></content:encoded></item><item><title><![CDATA[New Coaches vs. KenPom]]></title><description><![CDATA[Custom stacking and subheading functions in gt]]></description><link>https://www.bucketsandbytes.com/p/new-coaches-vs-kenpom</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/new-coaches-vs-kenpom</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Wed, 21 Aug 2024 22:01:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3gFo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We will be making a table that plots preseason vs.&nbsp;final KenPom rating improvements for new head coaches. This code uses custom-written gt functions.</p><blockquote><p>To build this table, you will need an <em>active</em> KenPom subscription and a cbbdata<a href="https://cbbdata.aweatherman.com/#registering-for-an-api-key"> account</a>. 
Follow <a href="https://cbbdata.aweatherman.com/#kenpom">these steps</a> to link your KenPom account to cbbdata.</p></blockquote><p>If you do not have an active KenPom subscription, the cleaned data is provided in the source code linked at the bottom of the article.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3gFo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3gFo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png 424w, https://substackcdn.com/image/fetch/$s_!3gFo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png 848w, https://substackcdn.com/image/fetch/$s_!3gFo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png 1272w, https://substackcdn.com/image/fetch/$s_!3gFo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3gFo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png" width="1456" height="1578" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1578,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3gFo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png 424w, https://substackcdn.com/image/fetch/$s_!3gFo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png 848w, https://substackcdn.com/image/fetch/$s_!3gFo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png 1272w, https://substackcdn.com/image/fetch/$s_!3gFo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F194f8cdd-b677-4967-89c8-36d00b715fff_2286x2477.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p>For this table, we will need:</p><pre><code><code>library(tidyverse)
library(rvest)
library(cbbdata)
library(gt)
library(gtExtras)
library(glue)
library(janitor)</code></code></pre><h2><strong>The Data</strong></h2><h3><strong>Grab The Data</strong></h3><h4><strong>Coaching Changes</strong></h4><p>The first thing that we will need is a list of coaching changes by season. There are a few different places from which to grab this, but the most straightforward way is the <em>Coaching Changes</em> page at <a href="https://www.barttorvik.com/">barttorvik</a>.</p><p>The data is presented in a static HTML table by year, so we will write a function using rvest to scrape data between 2012 and 2024.</p><blockquote><p>For some reason, the Barttorvik site <em>blocks</em> requests originating from Windows devices. To get around this, we will use withr and set a custom user-agent.</p></blockquote><pre><code><code>get_coaching_changes &lt;- function(year) {
  
  suppressWarnings({
    withr::local_options(HTTPUserAgent='Not Windows')
    read_html(glue("https://barttorvik.com/coaching_moves.php?year={year}")) %&gt;% 
      html_table() %&gt;% 
      pluck(1) %&gt;% 
      clean_names() %&gt;% 
      mutate(year = year) %&gt;% 
      select(team, year, new_coach)
  })
  
}</code></code></pre><p>Now that we have our scraping function, let&#8217;s loop over it with map_dfr.</p><pre><code>all_changes &lt;- map_dfr(2012:2024, \(year) get_coaching_changes(year))</code></pre><h4><strong>KenPom Ratings</strong></h4><p>Next, we need preseason and year-end KenPom ratings, which we can pull with the cbd_kenpom_ratings_archive function from cbbdata. For each team and season, the earliest archived rating serves as the preseason value and the latest as the year-end value.</p><pre><code>archive &lt;- cbd_kenpom_ratings_archive() %&gt;% 
  filter(year &gt;= 2008) %&gt;% 
  summarize(
    start_em = adj_em[which.min(date)],
    end_em = adj_em[which.max(date)],
    final_rank = adj_em_rk[which.max(date)],
    .by = c(team, year)
  ) %&gt;% 
  mutate(diff = end_em - start_em)</code></pre><h4><strong>Season Record</strong></h4><p>For some added flair, let&#8217;s include team records too.</p><pre><code><code>team_records &lt;- cbd_torvik_game_box() %&gt;%
  summarize(
    record = glue("{sum(result == 'W')}-{sum(result == 'L')}"),
    .by = c(team, year)
  )</code></code></pre><h4><strong>Combine</strong></h4><p>Finally, let&#8217;s combine our data; the rating difference (<em>diff</em>) was already calculated when we built the ratings archive. Each dplyr join function works on only two data frames at a time, but we can place everything inside a list and fold the join across it with <em>reduce</em> from purrr.</p><pre><code><code>data &lt;- list(all_changes, archive, team_records) %&gt;% 
  reduce(left_join, by = c("team", "year"))</code></code></pre><p>We&#8217;re only going to plot the 10 &#8220;best&#8221; rating jumps.</p><pre><code>data &lt;- data %&gt;% slice_max(diff, n = 10)</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QmCM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QmCM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png 424w, https://substackcdn.com/image/fetch/$s_!QmCM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png 848w, https://substackcdn.com/image/fetch/$s_!QmCM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png 1272w, https://substackcdn.com/image/fetch/$s_!QmCM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QmCM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png" width="1456" height="651" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:651,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:183826,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QmCM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png 424w, https://substackcdn.com/image/fetch/$s_!QmCM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png 848w, https://substackcdn.com/image/fetch/$s_!QmCM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png 1272w, https://substackcdn.com/image/fetch/$s_!QmCM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e1d6a52-611c-4c09-8b4d-2fdd9fce2875_1624x726.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>Postseason Outcome</strong></h4><p>The final thing that we are going to include is a column on whether or not a team made the postseason (NCAA, NIT, CBI, etc.). The <em>easiest</em> way to do this is to scrape Sports Reference &#8211; which is why we&#8217;re adding this <em>after</em> we have combined our data and grabbed the 10 largest jumps.</p><p>Postseason information can be found on a team&#8217;s schedule page for a given season. We can use cbd_teams to grab the needed team slugs.</p><pre><code>sr_ids &lt;- cbd_teams() %&gt;% select(team = common_team, sr_link)
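# Illustration only, not part of the original script: a sketch of the URL
# pattern that grab_schedules() builds below. The slug "duke" and the year
# 2024 are example values chosen for this aside.
glue::glue(
  "https://www.sports-reference.com/cbb/schools/{slug}/men/{year}-schedule.html",
  slug = "duke", year = 2024
)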
grab_schedules &lt;- function(team, year) {
  
  Sys.sleep(3) # pause between requests to avoid Sports Reference's rate limiting
  
  slug &lt;- filter(sr_ids, team == !!team)$sr_link
  url &lt;- glue("https://www.sports-reference.com/cbb/schools/{slug}/men/{year}-schedule.html")
  
  read_html(url) %&gt;% 
    html_nodes("#schedule") %&gt;% 
    html_table() %&gt;% 
    pluck(1) %&gt;% 
    clean_names() %&gt;% 
    slice_tail(n = 1) %&gt;% 
    select("type") %&gt;% 
    mutate(team = team, year = year)
  
}</code></pre><p>Use purrr to iterate over all teams and then join the results.</p><pre><code><code>postseason &lt;- map2_dfr(data$team, data$year, \(team, year) grab_schedules(team, year))

data &lt;- left_join(data, postseason, by = c('team', 'year')) %&gt;% 
  mutate(type = ifelse(type == "CTOURN", "---", type))</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5k72!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5k72!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png 424w, https://substackcdn.com/image/fetch/$s_!5k72!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png 848w, https://substackcdn.com/image/fetch/$s_!5k72!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png 1272w, https://substackcdn.com/image/fetch/$s_!5k72!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5k72!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png" width="1456" height="648" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3270652-e868-4e57-a227-885ef0454001_1622x722.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:194613,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5k72!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png 424w, https://substackcdn.com/image/fetch/$s_!5k72!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png 848w, https://substackcdn.com/image/fetch/$s_!5k72!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png 1272w, https://substackcdn.com/image/fetch/$s_!5k72!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3270652-e868-4e57-a227-885ef0454001_1622x722.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2><strong>The Table</strong></h2><h3><strong>Stack Function</strong></h3><p>To make things cleaner, here is a function that will plot team logos and stack some additional text to the right using HTML.</p><pre><code><code>gt_cbb_stack &lt;- function(data, upper_text1, upper_text2, lower_text1, lower_text2, lower_text3, logo) {

  data %&gt;%
    mutate(stack = glue(
        "&lt;div style='display: flex; align-items: center;'&gt;
           &lt;img src='{eval(expr({{logo}}))}' style='height: auto; width: 20px; padding-right: 5px;'&gt;
           &lt;div&gt;
             &lt;div style='line-height:14px;'&gt;&lt;span style='font-weight:bold;color:black;font-size:14px'&gt;{eval(expr({{upper_text1}}))}, {eval(expr({{upper_text2}}))}&lt;/span&gt;&lt;/div&gt;
             &lt;div style='line-height:10px;'&gt;&lt;span style='font-weight:normal;color:grey;font-size:10px'&gt;{eval(expr({{lower_text1}}))} --  #{eval(expr({{lower_text2}}))}, {eval(expr({{lower_text3}}))}&lt;/span&gt;&lt;/div&gt;
           &lt;/div&gt;
         &lt;/div&gt;"
      )
    )
}</code></code></pre><p>To use this, we need to add columns with each team&#8217;s ESPN nickname and logo link. Then, let&#8217;s apply it.</p><pre><code><code>data &lt;- data %&gt;% left_join(cbd_teams() %&gt;% select(team = common_team, espn_nickname, logo), by = 'team')

data &lt;- data %&gt;% gt_cbb_stack(new_coach, year, espn_nickname, final_rank, record, logo)</code></code></pre><h3><strong>Column Header + Subheader Function</strong></h3><p>In late January, Todd Whitehead (Synergy) <a href="https://x.com/CrumpledJumper/status/1751698732591251583">posted a table</a> with cool column headers + subheaders. I really liked this design, which pairs very well with stacked cells, so I created a function to mimic this effect in gt. We&#8217;ll use it in our table too.</p><p>This function does a few things, but most notably, it creates an HTML string for the &#8220;stacked&#8221; effect, parses it using htmltools, and then sets it as the header using cols_label.</p><pre><code><code>gt_column_subheaders &lt;- function(gt_table, ...) {

  subheaders &lt;- list(...)
  all_col_names &lt;- colnames(gt_table[['_data']])

  for (col_name in all_col_names) {

    subtitle_info &lt;- subheaders[[col_name]] %||% list(subtitle = "&amp;nbsp;", heading = col_name)
    subtitle &lt;- subtitle_info$subtitle
    new_header_title &lt;- subtitle_info$heading

    label_html &lt;- htmltools::HTML(glue(
      "&lt;div style='line-height: 1.05; margin-bottom: -2px;'&gt;
        &lt;span style='font-size: 14px; font-weight: bold; color: black;'&gt;{new_header_title}&lt;/span&gt;
        &lt;br&gt;
        &lt;span style='font-size: 10px; font-weight: normal; color: #808080;'&gt;{subtitle}&lt;/span&gt;
      &lt;/div&gt;"
    ))

    gt_table &lt;- gt_table %&gt;% 
      cols_label(!!sym(col_name) := label_html)
  }
  
  gt_table
}</code></code></pre><h4><strong>1) The Base Table</strong></h4><p>Honestly, the code below outputs a pretty nice table, but there is definitely some room for improvement.</p><pre><code><code>data %&gt;% 
  select(stack, type, start_em, end_em, diff) %&gt;% 
  gt(id = 'table') %&gt;% 
  gt_theme_nytimes() %&gt;% 
  fmt_markdown(stack) %&gt;% 
  cols_move_to_start(stack) %&gt;% 
  cols_align(columns = stack, 'left') %&gt;% 
  cols_align(columns = -stack, 'center')</code></code></pre><h4><strong>2) Applying Custom Column Function</strong></h4><p>Let&#8217;s apply our custom gt_column_subheaders function. To relabel a column, pass an argument named after that column whose value is a list with <em>heading</em> and <em>subtitle</em> elements.</p><pre><code><code>... %&gt;%
  gt_column_subheaders(stack = list(heading = "Coach and Year",
                                    subtitle = "Team, Final Rank, and Record"),
                       type = list(heading = 'Post SZN',
                                    subtitle = "Tournament"),
                       start_em = list(heading = 'Pre',
                                    subtitle = "Rating"),
                       end_em = list(heading = 'End',
                                    subtitle = "Rating"),
                       diff = list(heading = 'Jump',
                                    subtitle = "End - Start"))</code></code></pre><h4><strong>3) Table Borders</strong></h4><p>To give our table some more clarity and definition, we will add some borders around our cells.</p><pre><code><code>... %&gt;%
  tab_style(locations = cells_body(columns = c(type, ends_with("em"))), style = cell_borders()) %&gt;% 
  tab_style(locations = cells_body(columns = -ends_with("em")), style = cell_borders(sides = "bottom")) %&gt;% 
  tab_style(locations = cells_body(rows = 1), style = cell_borders(sides = "top", weight = px(2))) %&gt;% 
  tab_style(locations = cells_body(columns = diff), style = cell_text(weight = 'bold')) </code></code></pre><h4><strong>4) Table Annotations + Options</strong></h4><p>Let&#8217;s add our title and caption. We will also tweak our caption font size, force the line below the caption to white (not sure why this theme doesn&#8217;t do it by default), and compress our rows.</p><pre><code><code>... %&gt;%
  tab_options(data_row.padding = 3.5,
              source_notes.border.bottom.style = "solid",
              source_notes.border.bottom.color = "white",
              source_notes.font.size = 10) %&gt;% 
  tab_header(title = "New coaches beating KenPom expectations",
             subtitle = md("The largest pre-season vs. year-end KenPom rating improvements&lt;br&gt;by new head coaches since 2012")) %&gt;% 
  tab_source_note(md("Data by cbbdata + Sports Reference&lt;br&gt;Viz. + Analysis by @andreweatherman"))</code></code></pre><h4><strong>5) Additional CSS</strong></h4><p>Finally, let&#8217;s throw in some minor CSS changes. When using opt_css, it is important to reference the same table id that you created in gt(id = &#8230;).</p><p>The first two rules adjust the padding around the title and subtitle &#8211; &#8220;squishing&#8221; them together. The third rule targets the bottom border of the table&#8217;s last row. It creates the same effect as the tab_style call that targeted the first row (black border at 2px weight).</p><pre><code><code>... %&gt;%
  opt_css(
    "
    #table .gt_heading {
      padding-top: 6px;
      padding-bottom: 0px;
    }
    #table .gt_subtitle {
      padding-top: 2px;
      padding-bottom: 6px;
    }
    #table tbody tr:last-child {
      border-bottom: 2px solid #000000;
    }
    "
  )</code></code></pre><div><hr></div><h1>Source Code</h1><p><a href="https://github.com/andreweatherman/buckets-and-bytes/blob/main/visualizations/tables/coaching_improvements/script.R">Full code can be found here.</a></p>]]></content:encoded></item><item><title><![CDATA[Faceted Bar Charts in R!]]></title><description><![CDATA[Creating faceted bar charts to show remaining Q1 opportunities in R]]></description><link>https://www.bucketsandbytes.com/p/important-update-faceted-bar-charts</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/important-update-faceted-bar-charts</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Sat, 20 Jan 2024 00:13:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ymq8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We will be exploring the number of &#8220;Quad 1&#8221; games remaining for every high-major team. 
If you are new to the NET and/or unsure about what defines a &#8220;Quad 1&#8221; game, <a href="https://www.ncaa.com/news/basketball-men/article/2022-12-05/college-basketballs-net-rankings-explained">please refer to the article</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ymq8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ymq8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic 424w, https://substackcdn.com/image/fetch/$s_!Ymq8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic 848w, https://substackcdn.com/image/fetch/$s_!Ymq8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ymq8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ymq8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic" width="1456" height="1729" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55163340-f534-484b-b571-d5fac964bd59.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1729,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:359063,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ymq8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic 424w, https://substackcdn.com/image/fetch/$s_!Ymq8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic 848w, https://substackcdn.com/image/fetch/$s_!Ymq8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ymq8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55163340-f534-484b-b571-d5fac964bd59.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>Getting the data</h2><p>First up, let&#8217;s load the necessary libraries.</p><pre><code>library(tidyverse)
library(cbbdata)
library(cbbplotR)
library(glue)
library(hrbrthemes)</code></pre><p>Per usual, this tutorial will make use of my {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} package. If you have not done so already, you will need to install it and register for an API key (entirely free). <a href="https://cbbdata.aweatherman.com/#installation">Please read how to do so here</a>.</p><h4>Season schedule</h4><p>Our first step is pulling the 2024 season schedule and filtering to include games played on January 19th <em>or later</em> (today to the end of the regular season). Because <em>Sys.Date()</em> returns the date on which the code is run, the cutoff will shift if you run this on a later day. We also throw out all games played against non-Division 1 competition by keeping only rows where <em>type</em> does not equal &#8220;nond1&#8221;.</p><pre><code>schedule &lt;- cbd_torvik_season_schedule(year = 2024) %&gt;% 
  filter(date &gt;= Sys.Date() &amp; type != 'nond1')</code></pre><p>After that, we need to reshape our data so that we get a &#8220;schedule&#8221; for each team. Only two factors affect quadrant boundaries, opponent NET and game location, so our team schedule only needs three columns: team, opponent, and game location. There are myriad ways to do this, but one of the shortest is by using <strong>mutate</strong> and setting the <em>.keep</em> argument to &#8220;none.&#8221; </p><p>Functionally, this is fairly similar to the superseded <strong>transmute</strong> function, and if you are interested, you can <a href="https://github.com/tidyverse/dplyr/issues/6861">read more about the slight discrepancies here</a>. By setting <em>.keep</em> to &#8220;none,&#8221; our returned data only retains the columns that are specified in our <strong>mutate</strong> call (i.e., <em>team</em>, <em>opp</em>, and <em>location</em>). Again, we don&#8217;t care about the other stuff, so this is perfect.</p><pre><code>plot_data &lt;- schedule %&gt;% 
  mutate(
    team = home, opp = away, location = if_else(neutral, 'N', 'H'),
    .keep = 'none'
  )</code></pre><p>If you are following along in your own session &#8212; which I do encourage if you aren&#8217;t fully comfortable in R &#8212; you will notice that the number of rows in <em>plot_data</em> matches the number found in the <em>schedule </em>frame&#8230;but this isn&#8217;t good! <em>cbd_torvik_season_schedule</em> returns one row per game, but if we want to calculate the number of Q1 games remaining for every high-major team, we naturally need <em>two</em> rows per game (one for each team).</p><p>Perhaps the easiest and most intuitive way to solve this is by simply doing the reverse and binding the resulting rows &#8212; Occam's razor and all. </p><p>You can think of the code below as a short nested function. The first operation performed will be the <em>inner-most</em> chain: We take our <em>schedule</em> data and do the same <strong>mutate </strong>as above while swapping the home and away logic (changes are bolded). Then, that new data is passed to <strong>bind_rows</strong> and is combined with the existing <em>plot_data</em> object.</p><pre><code>plot_data &lt;- plot_data %&gt;% 
  bind_rows(
    schedule %&gt;% 
      mutate(
        team = <strong>away</strong>, opp = <strong>home</strong>,
        location = if_else(neutral, 'N', '<strong>A</strong>'),
        .keep = 'none'
      )
  )</code></pre><h4>Adding conference, NET, and quadrant boundaries</h4><p>Our <em>plot_data</em> object now includes two rows per game, with the proper team/opp and game location assignment, so we can move on to adding quadrant boundaries.</p><p>Shipped with {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} <a href="https://cbbdata.aweatherman.com/news/index.html#cbbdata-020">v0.2</a> is a function called <strong>cbd_add_net_quad</strong>, which takes data with a similar structure (it must have columns representing opponent and game location) and adds columns for opponent NET and quadrant definition.</p><p>If you are unsure about which {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} version you have, you can run the code below, which checks your installed version and updates the package if it is older than v0.2. This might require an R session restart (<strong>if so, don&#8217;t forget to reload your libraries</strong>).</p><pre><code>if (packageVersion('cbbdata') &lt; '0.2.0') {
   pak::pak('andreweatherman/cbbdata')
}</code></pre><p>We also need to add conference information and team NET for sorting purposes, so let&#8217;s do that here as well. We can grab conference information from the ratings endpoint and NET rankings with the current resume function.</p><pre><code>plot_data &lt;- plot_data %&gt;% 
  cbd_add_net_quad() %&gt;% 
  left_join(cbd_torvik_ratings(year = 2024) %&gt;% select(team, conf),
            by = 'team') %&gt;% 
  left_join(cbd_torvik_current_resume() %&gt;% 
              select(team, team_net = net), by = 'team')</code></pre><h4>Getting our data ready for plotting</h4><p>Before we start plotting, we need to sort our data, calculate per-conference medians, and clean up some names. </p><p>I hope that nothing here is too confusing. Because we want to plot with more &#8220;readable&#8221; conference names, let&#8217;s create a named vector that we&#8217;re going to use for filtering and relabeling.</p><p>Then we count the number of remaining Quad 1 games with <em>sum(quad == &#8220;Quadrant 1&#8221;)</em> and keep a column showing team NET (since it is constant within each team, we can simply take the first observation of team_net with <strong>first</strong>).</p><p>For plotting purposes, we want our data to be ordered in a specific way, which we handle with <strong>arrange</strong>. We order by the number of Q1 games remaining (ascending), and if a tie exists, we then sort by NET ranking (descending, so the better-ranked team comes later). </p><p>Our last step is a basic <strong>mutate</strong> that achieves a few purposes: calculates median conference NET, relabels each conference using the named vector <em>and</em> places that median value inside the label (so that it will show in our facet), and finally creates a team label that we will place inside each bar. If the team is the last row in its conference (most Q1 games), we include a slightly more verbose label for easier interpretation. (We further restrict the verbose label to only appear if that team has eight or more Q1 games, else it will run over the bar.)</p><h4>What is fct_inorder?</h4><p>When dealing with categorical variables, such as teams, you might notice that {<strong><a href="https://ggplot2.tidyverse.org">ggplot2</a></strong>} reorders your data to be shown alphabetically. Sometimes that is fine, but in other cases, you might want a particular plotting order.</p><p>To address this, you&#8217;ll often want to convert those categorical variables into factors. 
If you already have a defined data order, like we do below, the easiest way is to simply use the <strong>fct_inorder</strong> function, which creates factors in the order in which they first appear.</p><pre><code>conf_relabel &lt;- c('ACC' = 'ACC', 'BE' = 'Big East', 'B10' = 'Big Ten', 
                  'B12' = 'Big 12', 'SEC' = 'SEC', 'P12' = 'Pac-12')

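# Aside (illustration only, safe to delete): indexing a named vector by name
# returns the matching labels, which is what conf_relabel[conf] does inside
# the mutate() call below.
unname(conf_relabel[c('BE', 'B10')])  # returns "Big East" "Big Ten"

# Similarly, fct_inorder() sets factor levels by first appearance rather than
# alphabetically (the team names here are just toy input).
levels(fct_inorder(c('Duke', 'Auburn', 'Baylor')))  # returns "Duke" "Auburn" "Baylor"
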
plot_data &lt;- plot_data %&gt;% 
  filter(conf %in% names(conf_relabel)) %&gt;% 
  summarize(Q1 = sum(quad == 'Quadrant 1'),
            median_net_left = median(net),
            team_net = first(team_net),
            .by = c(team, conf)) %&gt;% 
  arrange(Q1, desc(team_net)) %&gt;% 
  mutate(avg_conf_net = mean(team_net), 
         median_conf_net = median(team_net),
         conf = conf_relabel[conf],
         conf = glue("{conf} (Med. NET {round(median(median_net_left), 0)})"),
         team = fct_inorder(team),
         label = if_else(row_number() == n() &amp; Q1 &gt;= 8, 
                         glue('{Q1} Q1s left (NET {team_net})'), 
                         glue('{Q1} ({team_net})')),
         .by = conf
       )</code></pre><div><hr></div><h2>Plotting</h2><p>Now that we have our data analyzed and reshaped, let&#8217;s throw it over to {<strong><a href="https://ggplot2.tidyverse.org">ggplot2</a></strong>} for plotting.</p><p>Our code itself is fairly straightforward. We are only using two <strong>geom_X</strong> functions (<em>col</em> and <em>text</em>) and some colors that I thought looked cool. We are borrowing a general theme from {<strong>hrbrthemes</strong>} and doing a few extra things to it.</p><h4>Why are we converting Q1 to a factor in <em>fill</em>?</h4><p>Our <em>Q1</em> variable is numeric; it&#8217;s simply the count of remaining Q1 games for each team. When you set <em>fill</em> or <em>color</em> to a numeric variable in {<strong><a href="https://ggplot2.tidyverse.org">ggplot2</a></strong>}, the scale becomes continuous by default. This is perfect if your variable is a floating-point number (a <em>double</em> in R, e.g. 5.3 or 19.7), but when working with integers, it usually makes more sense to have a discrete color scale (i.e. a color per distinct integer).</p><p>The easiest way to accomplish this without adjusting your data is to simply convert those values to a <em>factor</em> inside your <em>aes</em> call. A <em>factor</em> treats its values as categorical (variables that have a fixed and known set of possible values).</p><h4>Facets? What in the world is fct_reorder? free_y?</h4><p>The key part of this plot is building &#8220;small multiples,&#8221; or in R-speak, <em>facets</em>. Well, I guess the <em>technical</em> R-speak is, &#8220;a matrix of panels defined by row and column faceting variables.&#8221; Basically, that&#8217;s a complicated way of saying, &#8220;Hey, let&#8217;s just break this plot into smaller ones based on some group.&#8221;</p><p>If you run the code prior to <strong>facet_wrap</strong>, you&#8217;ll see that we just get one long bar plot with a bunch of team names on the side. This isn&#8217;t very helpful. 
What <em>facets</em> allow us to do is to break our plot into smaller ones based on conferences. Now try running the same code but include the <strong>facet_wrap</strong> line. You&#8217;ll see the exact same plot but in a 2x3 matrix &#8212; with each entry representing a different conference. Neat!</p><p><strong>fct_reorder</strong> allows us to <em>arrange</em> our grid based on some variable. In this case, we want to arrange our small multiples in order of lowest <em>median_conf_net</em> to highest.</p><p>Try removing <em>scales = &#8220;free_y&#8221; </em>and see what happens. You should notice that our bars, well, aren&#8217;t quite aligned correctly. That&#8217;s because when you build facets, scales are fixed by default &#8212; meaning that each facet has the same x- and y-axis scale. This should be evaluated on a case-by-case basis, and with this plot, we definitely do not want our y-axis to be fixed. Instead, we want to plot only the teams observed in that facet &#8212; and we can do this by setting <em>scales = &#8220;free_y&#8221;</em>.</p><pre><code>plot &lt;- plot_data %&gt;% 
  ggplot(aes(Q1, team)) +
  geom_col(color = 'white', aes(fill = factor(Q1))) +
  geom_text(aes(label = label, x = Q1 - 0.25,
                color = ifelse(Q1 &gt;= 4, 'grey20', 'white')),
            family = 'Roboto Condensed', fontface = 'bold',
            hjust = 1, size = 3.5) +
  scale_fill_manual(values = c('#2082E4', '#1C8FE7', '#169CE8',
                               '#0FB4EC', '#0AC1ED', '#06CDEF',
                               '#00DAF0', '#06DCD5', '#14DDAC')) +
  scale_color_identity() +
  facet_wrap(~ fct_reorder(conf, median_conf_net), scales = 'free_y') +
  theme_modern_rc() +
  theme(strip.text = element_text(color = 'white', face = 'bold'),
        axis.text.y = element_cbb_teams(logo_type = 'dark'),
        axis.title.x = element_text(vjust = -1.5),
        plot.title.position = 'plot',
        plot.caption.position = 'plot',
        plot.subtitle = element_text(vjust = 2.7),
        plot.caption = element_text(hjust = 0),
        legend.position = 'none',
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
   labs(title = 'Number of Q1 games remaining for high-majors',
       subtitle = 'NET rankings are current morning of Jan. 19, 2024. Conferences are sorted by median team NET.',
       y = NULL,
       x = 'Q1 games remaining',
       caption = 'Data by cbbdata + cbbplotR\nViz. + Analysis by @andreweatherman')</code></pre><h4>Viewing the plot</h4><p>One practical note: if you are building plots with more than ~20 logos, use <strong>ggpreview</strong>. RStudio can be very slow to render plots with many logos, and it might even terminate your session. </p><p>To address this, simply store your plot as a variable and pass that to the <strong>ggpreview</strong> function in {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>}. The function saves a temporary image of your plot (you can also set a width and height) and displays it in the <em>Viewer</em> pane. I then recommend opening the plot in a new browser window (example shown below).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M7_r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M7_r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic 424w, https://substackcdn.com/image/fetch/$s_!M7_r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic 848w, https://substackcdn.com/image/fetch/$s_!M7_r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!M7_r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M7_r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic" width="892" height="208" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:208,&quot;width&quot;:892,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12717,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M7_r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic 424w, https://substackcdn.com/image/fetch/$s_!M7_r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic 848w, https://substackcdn.com/image/fetch/$s_!M7_r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic 1272w, https://substackcdn.com/image/fetch/$s_!M7_r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83ee7d9b-7dd7-4f13-aa39-233435595e2d.heic 
1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><pre><code>ggpreview(plot, w = 8, h = 9.5)</code></pre><h4>Saving the plot</h4><p>To save the plot, we&#8217;ll use <strong>ggsave</strong>.</p><pre><code>ggsave('q1_remaining_0119.png', plot = plot, width = 8, height = 9.5, dpi = 350)</code></pre><div><hr></div><h2>Full Code</h2><h4>Loading libraries</h4><pre><code>library(tidyverse)
library(cbbdata)
library(cbbplotR)
library(glue)
library(hrbrthemes)</code></pre><h4>Getting the data</h4><pre><code>conf_relabel &lt;- c('ACC' = 'ACC', 'BE' = 'Big East', 'B10' = 'Big Ten', 
                  'B12' = 'Big 12', 'SEC' = 'SEC', 'P12' = 'Pac-12')

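# Sys.Date() keeps only games that have not been played yet. To reproduce the
# Jan. 19, 2024 plot after that date, swap in a fixed date such as
# as.Date('2024-01-19') (this assumes `date` is a Date column).
schedule &lt;- cbd_torvik_season_schedule(year = 2024) %&gt;% 
  filter(date &gt;= Sys.Date() &amp; type != 'nond1')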


plot_data &lt;- schedule %&gt;%
  mutate(
    team = home, opp = away, location = if_else(neutral, "N", "H"),
    .keep = "none"
  ) %&gt;%
  bind_rows(
    schedule %&gt;%
      mutate(
        team = away, opp = home,
        location = if_else(neutral, "N", "A"),
        .keep = "none"
      )
  ) %&gt;%
  cbd_add_net_quad() %&gt;%
  left_join(cbd_torvik_ratings(year = 2024) %&gt;% select(team, conf),
    by = "team"
  ) %&gt;%
  left_join(cbd_torvik_current_resume() %&gt;%
    select(team, team_net = net), by = "team") %&gt;%
  filter(conf %in% names(conf_relabel)) %&gt;%
  summarize(
    Q1 = sum(quad == "Quadrant 1"),
    median_net_left = median(net),
    team_net = first(team_net),
    .by = c(team, conf)
  ) %&gt;%
  arrange(Q1, desc(team_net)) %&gt;%
  mutate(
    avg_conf_net = mean(team_net),
    median_conf_net = median(team_net),
    conf = conf_relabel[conf],
    conf = glue("{conf} (Med. NET {round(median(median_net_left), 0)})"),
    team = fct_inorder(team),
    label = if_else(row_number() == n() &amp; Q1 &gt;= 8,
      glue("{Q1} Q1s left (NET {team_net})"),
      glue("{Q1} ({team_net})")
    ),
    .by = conf
  )</code></pre><h4>Plotting</h4><pre><code>plot &lt;- plot_data %&gt;% 
  ggplot(aes(Q1, team)) +
  geom_col(color = 'white', aes(fill = factor(Q1))) +
  geom_text(aes(label = label, x = Q1 - 0.25,
                color = ifelse(Q1 &gt;= 4, 'grey20', 'white')),
            family = 'Roboto Condensed', fontface = 'bold',
            hjust = 1, size = 3.5) +
  scale_fill_manual(values = c('#2082E4', '#1C8FE7', '#169CE8',
                               '#0FB4EC', '#0AC1ED', '#06CDEF',
                               '#00DAF0', '#06DCD5', '#14DDAC')) +
  scale_color_identity() +
  facet_wrap(~ fct_reorder(conf, median_conf_net), scales = 'free_y') +
  theme_modern_rc() +
  theme(strip.text = element_text(color = 'white', face = 'bold'),
        axis.text.y = element_cbb_teams(logo_type = 'dark'),
        axis.title.x = element_text(vjust = -1.5),
        plot.title.position = 'plot',
        plot.caption.position = 'plot',
        plot.subtitle = element_text(vjust = 2.7),
        plot.caption = element_text(hjust = 0),
        legend.position = 'none',
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
   labs(title = 'Number of Q1 games remaining for high-majors',
       subtitle = 'NET rankings are current morning of Jan. 19, 2024. Conferences are sorted by median team NET.',
       y = NULL,
       x = 'Q1 games remaining',
       caption = 'Data by cbbdata + cbbplotR\nViz. + Analysis by @andreweatherman')</code></pre><h4>Viewing and saving</h4><pre><code># view
ggpreview(plot, w = 8, h = 9.5)

# save
ggsave('q1_remaining_0119.png', plot = plot, width = 8, height = 9.5, dpi = 350)</code></pre><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Double Special: Is there finally parity in college hoops?]]></title><description><![CDATA[Exploring final unbeatens and variation between top teams using R]]></description><link>https://www.bucketsandbytes.com/p/double-special-is-there-finally-parity</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/double-special-is-there-finally-parity</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Fri, 12 Jan 2024 23:20:04 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7b7cb54b-6e0b-45a1-839b-f7d60f0af485_5340x4200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>What a <em>crazy</em> week in college hoops: Nos. 1, 2, 3, and 5 all lost to <strong>unranked</strong> teams, and #2 Houston, the country&#8217;s last unbeaten, fell to an unranked Iowa State at Hilton Coliseum in Ames.</p><p>Houston&#8217;s loss prompted a thought: We&#8217;re just two months into the season, and no perfect teams remain. How does that compare to past years? 
Are we seeing more parity at the top of the sport? It feels like we don&#8217;t have a truly &#8220;elite&#8221; team.</p><p>Today&#8217;s <em>Buckets &amp; Bytes</em> is a <strong>two-part</strong> special &#8212; double the graphs for double the fun!</p><p>First, we&#8217;ll be taking a look at the former: When has the last unbeaten fallen in each season back to 2008? We&#8217;ll be using {<strong><a href="https://dplyr.tidyverse.org">dplyr</a></strong>} to isolate the last unbeaten and calculate how many days they lasted. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9G65!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9G65!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic 424w, https://substackcdn.com/image/fetch/$s_!9G65!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic 848w, https://substackcdn.com/image/fetch/$s_!9G65!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic 1272w, https://substackcdn.com/image/fetch/$s_!9G65!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9G65!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic" width="1456" height="1798" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1798,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262179,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9G65!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic 424w, https://substackcdn.com/image/fetch/$s_!9G65!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic 848w, https://substackcdn.com/image/fetch/$s_!9G65!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic 1272w, https://substackcdn.com/image/fetch/$s_!9G65!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f6f81c8-8fba-49ca-93f0-95f1fc29b447.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>Next, we&#8217;ll use historical T-Rank data (2015-Present) from {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} to investigate per-day rating variation between top teams as a quick litmus test for hoops parity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nKWF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!nKWF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic 424w, https://substackcdn.com/image/fetch/$s_!nKWF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic 848w, https://substackcdn.com/image/fetch/$s_!nKWF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic 1272w, https://substackcdn.com/image/fetch/$s_!nKWF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nKWF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic" width="1456" height="1145" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/026cb03b-f231-4e89-8d88-e0038f9c5850.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1145,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:214769,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!nKWF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic 424w, https://substackcdn.com/image/fetch/$s_!nKWF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic 848w, https://substackcdn.com/image/fetch/$s_!nKWF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic 1272w, https://substackcdn.com/image/fetch/$s_!nKWF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026cb03b-f231-4e89-8d88-e0038f9c5850.heic 1456w" sizes="100vw"></picture></div></a></figure></div>
<div><hr></div><p>Before we start, let&#8217;s load all necessary libraries.</p><blockquote><p><strong>As a reminder: This tutorial will use the {<a href="https://cbbdata.aweatherman.com">cbbdata</a>} package, and<a href="https://cbbdata.aweatherman.com/#registering-for-an-api-key"> you must register for a free API key.</a></strong></p></blockquote><pre><code>pkgs &lt;- c('cbbdata', 'cbbplotR', 'tidyverse', 'ggtext', 'hrbrthemes', 'glue')
invisible(lapply(pkgs, library, character.only = TRUE))</code></pre><h1>Part 1: Unbeaten Bar Chart</h1><h2>Getting the data</h2><p>To create the unbeaten bar chart, we need to calculate when the last perfect record fell relative to the start of the season. To do this, we first use {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} to pull game results. </p><h4>Game results</h4><p>Remember, we want to calculate how many days <em>after</em> the start of the season the last unbeaten fell, so we sort by date and use <strong>first</strong> to select the earliest observed date in each year (i.e. the start of the season). </p><p>R supports arithmetic on dates, and we can take advantage of this by subtracting the start date (the first day of each season) from the game date to quickly calculate the number of days between the two.</p><pre><code>game_results &lt;- cbd_torvik_game_box() %&gt;% 
  filter(team %in% cbd_teams()$common_team) %&gt;% 
  arrange(date) %&gt;% 
  mutate(start_date = first(date), 
         days_diff = date - start_date, 
         .by = year)</code></pre><h4>When did the last unbeaten fall?</h4><p>Now that we have our game results, we need to calculate when each team lost their first game. We can do this by combining a few <strong>summarize</strong> logical statements. First, we calculate cumulative team losses, then we decide whether that team is still unbeaten (i.e. <em>cum_loss</em> is 0), and finally we pull the date associated with the <em>first</em> observation of <em>is_unbeaten</em> being &#8220;no.&#8221;</p><p>If you&#8217;re relatively new to R, pay careful attention to how we are counting observations inside a column where a query is true. A lot of beginners, and I did this too, will create a binary variable for game results, 0 or 1, and sum those to calculate wins. But you don&#8217;t need to do that! Instead, you can simply do <strong>sum(result == &#8220;W&#8221;) </strong>to count the number of rows where a team won a game. Simple things like this will help in writing clean, concise code.</p><pre><code>game_results &lt;- game_results %&gt;% 
  summarize(
    cum_loss = cumsum(result == 'L'),
    is_unbeaten = ifelse(cum_loss == 0, 'yes', 'no'),
    first_loss = first(date[which(is_unbeaten == 'no')]),
    days_diff = first(days_diff[which(is_unbeaten == 'no')]),
    .by = c(team, year)
  ) %&gt;% 
  slice_max(days_diff, n = 1, by = year) %&gt;% 
  distinct(year, team, days_diff)</code></pre><h4>Stacking logos with custom vjust + fill colors</h4><p>There are a few seasons where multiple teams ended as the last one standing. To address this, we are going to create a <em>vjust</em> column so that our logos can &#8220;stack&#8221; on top of one another (if multiple teams are present). </p><p>We can do this with a <strong>case_when</strong> statement and some simple logic.  The dimensions of the USC logo are a bit different, for whatever reason, so we&#8217;re handling that specific year by separating our logic statements. Feel free to play around with these numbers, if you&#8217;d like, but these are the baselines that I found to work best with this plot.</p><p>We are grouping by year, evident through <em>.by = year</em>, so <em>n()</em> and <em>row_number()</em> refer to the total count and position in the group, respectively. This is a convenient way to avoid needing to create more variables to accomplish the same task.</p><pre><code>game_results &lt;- game_results %&gt;% 
  mutate(vjust = case_when(n() == 1 ~ 0.465,
                           !team %in% c('Baylor', 'USC') &amp;
                               n() &gt; 1 ~ 0.51 - (row_number() * 0.045),
                           team == 'Baylor' ~ 0.465,
                           team == 'USC' ~ 0.408),
         .by = year)</code></pre><h4>Fill color</h4><p>Finally, we want to highlight years in which the last unbeaten fell within 65 days of the season tip. We are going to use another <strong>mutate</strong> call to assign one of two colors based on the <em>days_diff</em> value. I like this magenta color, but feel free to switch either one.</p><pre><code>game_results &lt;- game_results %&gt;% 
  mutate(fill = ifelse(days_diff &lt;= 65, '#B08CCF', 'grey70'))</code></pre><h2>Plotting the data</h2><p>That&#8217;s all of the data that we need! {<strong><a href="https://dplyr.tidyverse.org">dplyr</a></strong>} provides some powerful tools for making data analysis intuitive and quick. Let&#8217;s throw it over to {<strong><a href="https://ggplot2.tidyverse.org">ggplot2</a></strong>} and {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>}.</p><h4>Coloring and filling with strings</h4><p>Sometimes you might have columns with either color names (e.g., &#8220;red&#8221;) or hex strings, as we do above, but when you try to color or fill based on those columns, you&#8217;ll notice that the colors default to red and blue, because the strings are treated as categories rather than literal colors! The <strong>scale_X_identity</strong> family of functions allows you to map pre-scaled values directly (in this case, color names or hex strings).</p><p>To handle this, we must use <strong>scale_fill_identity</strong> and <strong>scale_color_identity</strong> while referencing our <em>fill</em> column inside the <strong>aes</strong> call. To avoid interfering with the <em>fill</em> and <em>color</em> of team logos, it&#8217;s vital that your <em>fill</em> and <em>color</em> <strong>aes</strong> only appear in <strong>geom_col</strong>. By default, every <em>geom</em> will inherit any <strong>aes</strong> placed inside the initial <strong>ggplot()</strong> function.</p><p>If you are unsure about what this means, please reference the code below, or, better yet, experiment with placing the <em>fill</em> and <em>color</em> <strong>aes</strong> in different places to see what happens (you won&#8217;t break anything).</p><h4>Expanding axes</h4><p>When you make a standard plot, you might notice that the points do not &#8220;hug&#8221; the y-axis, meaning that there is some separation between the first plotted observation and the y-axis. 
Normally, this space is fine and lets the plot &#8220;breathe,&#8221; but sometimes it looks a bit funky. I think the latter applies in our case, and to remove that separation, we need to use the <strong>expand</strong> argument inside <strong>scale_x_continuous</strong>. You can see how this works below, and again, I encourage you to play around with the values to see how the graph responds.</p><h4>Panel grid redundancy</h4><p>A point of clarification: I don&#8217;t believe that a panel grid is appropriate for our visualization. To remove the grid lines, you would typically place <strong>panel.grid = element_blank()</strong> inside a <strong>theme</strong> call. However, there is a peculiarity with the {<strong><a href="https://cinc.rud.is/web/packages/hrbrthemes/">hrbrthemes</a></strong>} package, which we are using here, where doing so <em>does not</em> remove them (most likely because the theme sets the major and minor grid elements explicitly, and those take precedence over the parent <em>panel.grid</em>). Instead, you need to individually set the major and minor grid lines to be blank. If you are following along and notice this apparent redundancy in the code, just understand that the package requires us to blank those elements separately (<em>shrugs</em>).</p><pre><code>game_results %&gt;% 
  ggplot(aes(year, days_diff, team = team)) +
  geom_col(aes(fill = fill, color = fill), position = "identity") +
  geom_cbb_teams(aes(vjust = vjust), width = 0.055) +
  scale_fill_identity() +
  scale_color_identity() +
  scale_y_continuous(limits = c(0, 145)) +
  scale_x_continuous(expand = c(0,0)) +
  geom_text(
    aes(label = glue("{days_diff} ({year})")), 
    angle = 90, color = 'white', nudge_y = -10,
    family = 'Roboto Condensed', fontface = 'bold',
    size = 4.5
  ) +
  theme_ipsum_rc() +
  theme(plot.title.position = 'plot',
        plot.subtitle = element_text(vjust = 2.8, size = 16),
        plot.title = element_text(size = 24),
        plot.caption = element_text(size = 12),
        axis.text.x = element_blank(),
        axis.title.y = element_text(size = 12),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank()) +
  labs(title = 'Unbeatens are losing earlier and earlier',
       subtitle = 'Number of days needed for the final unbeaten to fall (2008-2024)',
       caption = 'Data by cbbdata + cbbplotR\nViz. + Analysis by @andreweatherman',
       x = NULL, y = 'Days since season start')</code></pre><div><hr></div><h1>Part 2: Rating Variation Line Plot</h1><h2>Getting the data</h2><p>For the variation line chart, we only need one piece of data: T-Rank archive ratings. If you have <a href="https://cbbdata.aweatherman.com/index.html#kenpom">authorized your {</a><strong><a href="https://cbbdata.aweatherman.com/index.html#kenpom">cbbdata} </a></strong><a href="https://cbbdata.aweatherman.com/index.html#kenpom">account with KenPom</a>, feel free to swap this data with KenPom ratings (accessed with <strong>cbd_kenpom_ratings_archive</strong>) for a more complete look (three extra seasons available). For the purpose of this blog, however, we are going to use the freely available T-Rank archive ratings.</p><h4>Archive ratings</h4><p>This graph doesn&#8217;t require much data; the focus is more on plotting techniques. In fact, we can grab all necessary data with just a few lines.</p><p>To avoid making two separate data frames, we can create a <em>year_group</em> variable that we will group by to calculate standard deviations. We can also add a column for an alpha value, which will help us distinguish our groups when plotting.</p><p>Performing operations on dates is useful, as the previous plot illustrated, but subtracting one date from another returns a <em>difftime</em>, so you <em>do</em> need to coerce the result to numeric before you can filter on it with <strong>between</strong>.</p><p>You might notice that we add a day to each date: T-Rank archive ratings are <em>day-end</em>, which means that, e.g., the final ratings on November 6, 2023 are <em>also</em> the ratings on the morning of November 7, 2023. This step isn&#8217;t strictly needed for our plot, of course, but it&#8217;s a good habit to get into with T-Rank data, especially if you later join on game data, so that everything is on the same time scale.</p><pre><code>plot_data &lt;- cbd_torvik_ratings_archive() %&gt;% 
  mutate(date = date + 1,
         start = first(date),
         days_diff = date - start,
         year_group = ifelse(year == 2024, 'current', 'past'),
         .by = year) %&gt;% 
  filter(rank &lt;= 50 &amp; between(as.numeric(days_diff), 0, 144)) %&gt;% 
  summarize(top_sd = sd(barthag),
            .by = c(days_diff, year_group)) %&gt;% 
  mutate(alpha_group = ifelse(year_group == 'current', 1, 0.35))</code></pre><h4>Annotation data</h4><p>Our plot is going to have some brief annotations, and instead of using three separate <em>geoms </em>for labeling, we can create a quick data frame that can be passed through to take advantage of the principles of <strong>aes</strong> in {<strong><a href="https://ggplot2.tidyverse.org">ggplot2</a></strong>}.</p><pre><code>label_data &lt;- data.frame(
  label = c('Avg. Selection Sunday', '2015-2023', '2024'),
  x = c(124, 0, 0),
  y = c(0.0415, mean(filter(plot_data, year_group == 'past')$top_sd),
        mean(filter(plot_data, year_group == 'current')$top_sd)),
  angle = c(270, 0, 0)
)</code></pre><h2>Plotting the data</h2><p>This plot is a <em>tad</em> more complex than our first one, but there aren&#8217;t too many more moving parts. Again, if you&#8217;re confused about what particular functions do in the code, I recommend stepping through each line to experiment with what changes when you add or remove certain things. The best way to learn is by doing.</p><h4>What&#8217;s that tilde doing (data = ~ &#8230;)?</h4><p>In our <strong>geom_mean_lines</strong> calls, you might notice something odd-looking &#8594; <em>data = ~ filter(.x, &#8230;)</em>. As briefly mentioned earlier, each <em>geom</em> &#8220;inherits&#8221; the data and aesthetics set in <strong>ggplot()</strong>, but we want to plot a separate average line for each year group. We could create a new data frame that includes the average values, but that feels unnecessary: After all, our data <em>is there</em>; we just need an intuitive way to access it.</p><p>Enter: the tilde (~). The tilde turns what follows into a small anonymous function, and {ggplot2} calls that function with the inherited data, i.e., <em>plot_data</em>. Here, that means the data is passed to a <strong>filter</strong> call that subsets by <em>year_group</em>. The <em>.x</em> is a &#8220;placeholder&#8221; for that inherited data; if you&#8217;ve used <strong>across</strong> or {<strong><a href="https://purrr.tidyverse.org">purrr</a></strong>}, you&#8217;ll quickly recognize it.</p><pre><code>plot_data %&gt;% 
  ggplot(aes(days_diff, top_sd)) +
  geom_mean_lines(data = ~ filter(.x, year_group == 'past'), 
                  aes(y0 = top_sd), alpha = .3, color = 'white') +
  geom_mean_lines(data = ~ filter(.x, year_group == 'current'), 
                  aes(y0 = top_sd), color = 'white') +
  geom_vline(xintercept = 124, color = 'white', linetype = 'dashed') +
  geom_line(linewidth = 1, aes(group = year_group, 
                               alpha = alpha_group)) +
  geom_richtext(data = label_data, 
                aes(x, y, label = label, angle = angle), 
                family = 'Roboto Condensed', fontface = 'bold',
                text.color = 'white', hjust = 0, label.color = NA, 
                fill = '#1e1e1e', size = 3.6) +
  scale_alpha_identity() +
  scale_y_continuous(labels = scales::label_percent()) +
  scale_x_continuous(labels = c('Pre.', '50', '100', '150')) +
  hrbrthemes::theme_modern_rc() +
  theme(legend.position = 'none',
        plot.title.position = 'plot',
        plot.subtitle = element_text(vjust = 2.7),
        plot.caption.position = 'plot',
        plot.caption = element_text(hjust = 0),
        axis.title.x = element_text(vjust = -2),
        axis.title.y = element_text(vjust = 2)) +
  labs(title = "There's finally more parity in men's college basketball",
       subtitle = 'Standard deviation between top 50 T-Rank teams through each day of the season (2015-2024). Values are represented\nas projected win percentage vs. average team on neutral floor. Average year-long SD is shown.',
       x = 'Days into the season',
       y = 'Top 50 Barthag SD',
       caption = 'Data by cbbdata\nViz. + Analysis by @andreweatherman') </code></pre><div><hr></div><h1>Full Code</h1><h2>Plot 1: Bar Chart</h2><h4>Getting the data</h4><pre><code>game_results &lt;- cbd_torvik_game_box() %&gt;% 
  # ensure we only account for D-1 teams
  filter(team %in% cbd_teams()$common_team) %&gt;% 
  arrange(date) %&gt;% 
  mutate(start_date = first(date), 
         days_diff = date - start_date, 
         .by = year) %&gt;% 
  summarize(
    cum_loss = cumsum(result == 'L'),
    is_unbeaten = ifelse(cum_loss == 0, 'yes', 'no'),
    first_loss = first(date[which(is_unbeaten == 'no')]),
    days_diff = first(days_diff[which(is_unbeaten == 'no')]),
    .by = c(team, year)
  ) %&gt;% 
  slice_max(days_diff, n = 1, by = year) %&gt;% 
  distinct(year, team, days_diff) %&gt;% 
  mutate(vjust = case_when(n() == 1 ~ 0.465,
                           !team %in% c('Baylor', 'USC') &amp;
                             n() &gt; 1 ~ 0.51 - (row_number() * 0.045),
                           team == 'Baylor' ~ 0.465,
                           team == 'USC' ~ 0.408),
         fill = ifelse(days_diff &lt;= 65, '#B08CCF', 'grey70'),
         .by = year)</code></pre><h4>Plotting</h4><pre><code>game_results %&gt;% 
  ggplot(aes(year, days_diff, team = team)) +
  geom_col(aes(fill = fill, color = fill), position = "identity") +
  geom_cbb_teams(aes(vjust = vjust), width = 0.055) +
  scale_fill_identity() +
  scale_color_identity() +
  scale_y_continuous(limits = c(0, 145)) +
  scale_x_continuous(expand = c(0,0)) +
  geom_text(
    aes(label = glue("{days_diff} ({year})")), 
    angle = 90, color = 'white', nudge_y = -10,
    family = 'Roboto Condensed', fontface = 'bold',
    size = 4.5
  ) +
  theme_ipsum_rc() +
  theme(plot.title.position = 'plot',
        plot.subtitle = element_text(vjust = 2.8, size = 16),
        plot.title = element_text(size = 24),
        plot.caption = element_text(size = 12),
        axis.text.x = element_blank(),
        axis.title.y = element_text(size = 12),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank()) +
  labs(title = 'Unbeatens are losing earlier and earlier',
       subtitle = 'Number of days needed for the final unbeaten to fall (2008-2024)',
       caption = 'Data by cbbdata + cbbplotR\nViz. + Analysis by @andreweatherman',
       x = NULL, y = 'Days since season start')</code></pre><h4>Saving</h4><pre><code>ggsave(plot = last_plot(), 'last_unbeaten.png', dpi = 600, bg = 'white', w = 7.54, h = 9.31)</code></pre><h2>Plot 2: Line Chart</h2><h4>Getting the data</h4><pre><code>plot_data &lt;- cbd_torvik_ratings_archive() %&gt;% 
  mutate(date = date + 1,
         start = first(date),
         days_diff = date - start,
         year_group = ifelse(year == 2024, 'current', 'past'),
         .by = year) %&gt;% 
  filter(rank &lt;= 50 &amp; between(as.numeric(days_diff), 0, 144)) %&gt;% 
  summarize(top_sd = sd(barthag),
            .by = c(days_diff, year_group)) %&gt;% 
  mutate(alpha_group = ifelse(year_group == 'current', 1, 0.3))</code></pre><pre><code>label_data &lt;- data.frame(
  label = c('Avg. Selection Sunday', '2015-2023', '2024'),
  x = c(124, 0, 0),
  y = c(0.0415, mean(filter(plot_data, year_group == 'past')$top_sd),
        mean(filter(plot_data, year_group == 'current')$top_sd)),
  angle = c(270, 0, 0)
)</code></pre><h4>Plotting</h4><pre><code>plot_data %&gt;% 
  ggplot(aes(days_diff, top_sd)) +
  geom_mean_lines(data = ~ filter(.x, year_group == 'past'), 
                  aes(y0 = top_sd), alpha = .3, color = 'white') +
  geom_mean_lines(data = ~ filter(.x, year_group == 'current'), 
                  aes(y0 = top_sd), color = 'white') +
  geom_vline(xintercept = 124, color = 'white', linetype = 'dashed') +
  geom_line(linewidth = 1, aes(group = year_group, 
                               alpha = alpha_group)) +
  geom_richtext(data = label_data, 
                aes(x, y, label = label, angle = angle), 
                family = 'Roboto Condensed', fontface = 'bold',
                text.color = 'white', hjust = 0, label.color = NA, 
                fill = '#1e1e1e', size = 3.6) +
  scale_alpha_identity() +
  scale_y_continuous(labels = scales::label_percent()) +
  scale_x_continuous(labels = c('Pre.', '50', '100', '150')) +
  hrbrthemes::theme_modern_rc() +
  theme(legend.position = 'none',
        plot.title.position = 'plot',
        plot.subtitle = element_text(vjust = 2.7),
        plot.caption.position = 'plot',
        plot.caption = element_text(hjust = 0),
        axis.title.x = element_text(vjust = -2),
        axis.title.y = element_text(vjust = 2)) +
  labs(title = "There's finally more parity in men's college basketball",
       subtitle = 'Standard deviation between top 50 T-Rank teams through each day of the season (2015-2024). Values are represented\nas projected win percentage vs. average team on neutral floor. Average year-long SD is shown.',
       x = 'Days into the season',
       y = 'Top 50 Barthag SD',
       caption = 'Data by cbbdata\nViz. + Analysis by @andreweatherman')</code></pre><h4>Saving</h4><pre><code>ggsave(plot = last_plot(), w = 8.9, h = 7, 'parity_trank.png', dpi = 600)</code></pre>]]></content:encoded></item><item><title><![CDATA[Game performance tables with gt and cbbplotR!]]></title><description><![CDATA[Create a 'Four Factor' game performance table using gt and new features in cbbplotR!]]></description><link>https://www.bucketsandbytes.com/p/game-performance-tables-with-gt-and</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/game-performance-tables-with-gt-and</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Fri, 05 Jan 2024 23:13:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ncpz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the third installment of <em>Buckets &amp; Bytes</em> &#8212; a 
(hopefully) weekly series where we use R to create engaging visualizations using college basketball data. I want to preface: I hope that this code can provide a general framework for creating similar visualizations, so if college basketball isn&#8217;t your thing, you can easily adapt it to different data!</p><p>Today, we will be using my <strong>new package</strong>, {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>}, which provides {<strong><a href="https://ggplot2.tidyverse.org">ggplot2</a></strong>} and {<strong><a href="https://gt.rstudio.com">gt</a></strong>} extensions for visualizing college basketball team and conference logos + player headshots.</p><p>Specifically, we will be creating this neat game performance table, highlighting team-wide four factors on a game-by-game basis &#8212; and underscoring the powerful combination of {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} and {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>} in the process!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ncpz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ncpz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic 424w, https://substackcdn.com/image/fetch/$s_!ncpz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic 848w, 
https://substackcdn.com/image/fetch/$s_!ncpz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic 1272w, https://substackcdn.com/image/fetch/$s_!ncpz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ncpz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic" width="1456" height="1033" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f2d5629-d105-4745-b0c2-124aafff12ee.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1033,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:284113,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ncpz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic 424w, https://substackcdn.com/image/fetch/$s_!ncpz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic 848w, 
https://substackcdn.com/image/fetch/$s_!ncpz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic 1272w, https://substackcdn.com/image/fetch/$s_!ncpz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f2d5629-d105-4745-b0c2-124aafff12ee.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Getting the data</h2><p>This visualization uses data from {<strong><a 
href="https://cbbdata.aweatherman.com">cbbdata</a></strong>}. If you have not yet installed the package and created an account, <a href="https://cbbdata.aweatherman.com/#installation">follow the steps outlined here</a>. As a caveat: This visualization will utilize KenPom data. To access this data, you need to authorize your KenPom account on CBBData, <a href="https://cbbdata.aweatherman.com/reference/cbd_kenpom_authorization.html">which can be done using this function</a>.</p><p><strong>If you do not have an active KenPom subscription, don&#8217;t worry!</strong> The data that we need <em>is</em> public and front-facing, and I will include the necessary script to pull the data so you can produce the same table.</p><p>The process needed to grab our data involves using a few <em>joins</em>, so to simplify the process, we will walk through each &#8220;segment&#8221; individually. The full source code, included at the bottom of this post, will be more concise.</p><p>This code relies on v0.2 of both {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} and {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>}, which were released on January 5th. Please run this chunk to load the required libraries <em>and</em> update both packages.</p><pre><code>pak::pak(c('andreweatherman/cbbdata', 'andreweatherman/cbbplotR'))
pkgs &lt;- c('cbbdata', 'cbbplotR', 'gt', 'gtExtras', 'tidyverse', 'glue')
invisible(lapply(pkgs, library, character.only = TRUE))</code></pre><div><hr></div><h3>Data: Game Stats and NET</h3><p>The most important piece of our table, evidently, is game data. We are showing the &#8220;Four Factors,&#8221; which is a group of statistics <a href="http://www.rawbw.com/~deano/articles/20040601_roboscout.htm">defined by Dean Oliver</a> as integral to winning basketball games. It&#8217;s a decades-old concept that has stood the test of time. Using {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>}, we can quickly get these data on a per-game level (caveat: D-1 vs. D-1 games <em>only</em>).</p><p>We also want to include the score of the game, and we can use the {<strong><a href="https://glue.tidyverse.org">glue</a></strong>} package to build the score by concatenating two columns (points scored and points allowed). We are also going to adjust our game date to be more readable.</p><p>Finally, we want to include NET rankings and quadrant boundaries. As hinted earlier, the new update to {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} includes a function, <strong>cbd_add_net_quad</strong>, to do this for us. If you are not familiar with the NET or quadrants, <a href="https://www.ncaa.com/news/basketball-men/article/2022-12-05/college-basketballs-net-rankings-explained">you can learn more here</a>.</p><p>Since we are plotting Duke&#8217;s performance, we only need to request the Blue Devils&#8217; game results. If you wish to plot another team (or year), remember to switch out all instances of <em>Duke</em> for your own team (or year). The process is the same.</p><pre><code>factors_net &lt;- cbd_torvik_game_stats(<strong>year = 2024, team = 'Duke'</strong>) %&gt;% 
    arrange(date) %&gt;%
    cbd_add_net_quad() %&gt;%
    mutate(score = glue('{pts_scored}-{pts_allowed}'),
           date = format(date, '%b. %e'))  %&gt;% 
    select(date, result, opp, score, off_ppp, off_efg,
           off_to, off_or, off_ftr, def_ppp, def_efg, def_to,
           def_or, def_ftr, game_score, net, quad)</code></pre><h3>Data: Relative Performance</h3><p>Barttorvik includes a neat stat called <em>Game Score</em>, which is denoted as game_score in our data above. Game score can be thought of as a composite look at how well your team played in a given game. In a nutshell, game score is a per-game Barthag rating, which &#8212; while typically viewed across the aggregate where it down-weights mismatches &#8212; is an estimate of a team&#8217;s winning chances vs. the average team on a neutral floor. Game score is on a [0, 100] scale, and higher is better (scores closer to 100 indicate stronger performances).</p><p>With this information, we can create a data column that says, &#8220;In this game, my team played similarly to how the T-Rank #X team would be expected to play against the same opponent.&#8221; The column also gives a quick view of team consistency, which can be judged by looking at both the game_score column and our new one. Obviously, <em>game score</em> is pretty volatile at a per-game level, but it&#8217;s still nice to look at.</p><p>To do this, we need a function that, for each game score, finds the team whose current Barthag is closest and returns that team&#8217;s national ranking. Because Barthag is on a [0, 1] scale, the function divides game score by 100 before comparing.</p><pre><code>find_closest_rank &lt;- function(scores) {
  map_int(scores, function(score) {
    differences &lt;- abs(ratings$barthag - score / 100)
    closest_index &lt;- which.min(differences)
    ratings$barthag_rk[closest_index]
  })
}
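
# Aside: a toy illustration of the nearest-match step used above. The values
# below are made up for demonstration and are not real T-Rank ratings.
toy_barthag &lt;- c(0.95, 0.80, 0.60)                     # hypothetical ratings
toy_closest &lt;- which.min(abs(toy_barthag - 83 / 100))  # game score of 83
toy_closest  # 2, because 0.80 is the rating nearest to 0.83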

ratings &lt;- cbd_torvik_ratings(year = 2024) # get current rankings</code></pre><p>Now, we can apply it to our data.</p><pre><code>factors_net &lt;- factors_net %&gt;% 
  mutate(closest_rank = find_closest_rank(game_score))</code></pre><h3>Data: KenPom Rankings</h3><p>Finally, we want to include KenPom rankings in our table. If you have an active KenPom subscription, <a href="https://cbbdata.aweatherman.com/reference/cbd_kenpom_authorization.html">I recommend authorizing it using {</a><strong><a href="https://cbbdata.aweatherman.com/reference/cbd_kenpom_authorization.html">cbbdata</a></strong><a href="https://cbbdata.aweatherman.com/reference/cbd_kenpom_authorization.html">} to make the process much more streamlined</a>. But if you do not have one, current rankings are public information, and I will include the necessary code below.</p><h4>KenPom Rankings through {<a href="https://cbbdata.aweatherman.com">cbbdata</a>}</h4><p>If you have authorized your account, here is how you would grab current rankings and join them over.</p><pre><code>current_kp &lt;- cbd_kenpom_ratings(year = 2024) %&gt;% 
  select(opp = team, rank = rk)
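
# Aside: a toy check of what left_join does with unmatched rows. 'Team A' and
# 'Team B' are hypothetical names, and the rank is a made-up value.
toy_games &lt;- data.frame(opp = c('Team A', 'Team B'))
toy_ranks &lt;- data.frame(opp = 'Team A', rank = 12)
toy_joined &lt;- dplyr::left_join(toy_games, toy_ranks, by = 'opp')
toy_joined$rank  # 12 for 'Team A', NA for the unmatched 'Team B'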

factors_net &lt;- left_join(factors_net, current_kp, by = 'opp')</code></pre><h4>KenPom Rankings with {<a href="https://rvest.tidyverse.org">rvest</a>}</h4><p>If you do not have a KenPom account, you can still grab current rankings. Since KenPom is a <em>static</em> site and current rankings are public-facing (no paywall), we can use a combination of {<strong><a href="https://rvest.tidyverse.org">rvest</a></strong>}, {<strong><a href="https://sfirke.github.io/janitor/">janitor</a></strong>}, and {<strong><a href="https://www.tidyverse.org">tidyverse</a></strong>} to retrieve and clean them.</p><p>Since not all KenPom team names natively match over to conventions found in {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>}, we need to create a matching dictionary. Then, we use {<strong><a href="https://rvest.tidyverse.org">rvest</a></strong>} to parse the KenPom index page (current data) and pull the static HTML table. <strong>html_table</strong> will natively return a <em>list</em>, even if only one table is found, so we use the <strong>pluck</strong> function from {<strong><a href="https://purrr.tidyverse.org">purrr</a></strong>} to &#8220;lift&#8221; the first element from the nested list. We then use a few functions from {<strong><a href="https://sfirke.github.io/janitor/">janitor</a></strong>} to clean the data and later employ our matching dictionary created at the start of the chunk.</p><pre><code>teams &lt;- cbd_teams() %&gt;% select(team = common_team, kp = kp_team)

team_matching &lt;- teams %&gt;% pull('team') %&gt;% 
  rlang::set_names(teams$kp) # name each common team by its KenPom name
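
# Aside: a toy illustration of the named-vector lookup that team_matching
# relies on. The names and values here are hypothetical placeholders.
toy_lookup &lt;- c('Site Name A' = 'Common A', 'Site Name B' = 'Common B')
toy_result &lt;- unname(toy_lookup[c('Site Name B', 'Unlisted Team')])
toy_result  # 'Common B', then NA for the name missing from the lookup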

current_kp &lt;- rvest::read_html('https://kenpom.com') %&gt;% 
  rvest::html_table() %&gt;% 
  pluck(1) %&gt;% 
  janitor::row_to_names(1) %&gt;% 
  janitor::clean_names() %&gt;% 
  select(opp = team, rank = rk) %&gt;% 
  mutate(opp = team_matching[opp]) %&gt;% 
  filter(!is.na(opp))

factors_net &lt;- left_join(factors_net, current_kp, by = 'opp')</code></pre><p>This isn&#8217;t terribly complicated code, but you can see the evident advantage to using {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} &#8212; which also includes daily KenPom rating archives back to 2011-12 for subscribers.</p><div><hr></div><h2>Visualizing in <a href="https://gt.rstudio.com">{gt}</a></h2><h3>{<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>} utility functions</h3><p>Our table will use a few functions from {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>} to aid in creating our table. Here is a quick overview of each.</p><h4>Adding opponent logos</h4><p>Our table includes logos of each Duke opponent, which is a nice way of quickly identifying any game of interest. This process involves mashing together some HTML, but {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>} includes a function called <strong>gt_cbb_teams</strong> that will do this for us!</p><p>Our table will have some dark fill colors for wins and losses, so let&#8217;s use <em>dark mode</em> logos for our opponents. By default, <strong>gt_cbb_teams</strong> pulls normal logos, but you can set <em>logo_color = &#8220;dark&#8221; </em> to get dark ones.</p><p>We also want to include the KenPom rank of each opponent, but we need to add it <em>after</em> we have called <strong>gt_cbb_teams. </strong>Then, let&#8217;s create a new frame called <em>table_data</em>.</p><pre><code><code>table_data &lt;- factors_net %&gt;% 
  # we want to add HTML in the opp col. and rewrite it -&gt; so: opp, opp
  gt_cbb_teams(opp, opp, logo_color = 'dark') %&gt;% 
  mutate(opp = glue('{opp} (#{rank})'))</code></code></pre><h4>Creating the title</h4><p>If you notice, our table header is pretty cool. It includes Duke&#8217;s logo, which is a nice way of quickly identifying the subject of our data. Making a header like this includes tinkering with some HTML &#8212; but luckily, {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>} ships with a function that will build the header for us! More specifically, it will allow us to include a team logo, conference logo, player headshot, or a custom image by passing through an external link. As you&#8217;ll see later in the code, we will eventually need to wrap this object in <em>HTML</em>.</p><p>To include a logo for Duke, we set the <em>value</em> to Duke and the <em>type</em> to &#8220;team&#8221;. We can set a table title + subtitle and adjust the fonts, weights, and line-heights of both as well (which we won&#8217;t do). </p><pre><code>gt_title &lt;- gt_cbb_logo_title(
  title = 'Game-by-game efficiency performance for Duke in 2023',
  subtitle = 'D1 vs. D1 only. Data, rankings, and quadrants are current
  as of Jan. 5.',
  value = 'Duke',
  type = 'team'
)</code></pre><h4>Coloring win/loss rows</h4><p> {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>} ships with another utility function, <strong>gt_color_results</strong>, that will take a column of game results &#8212; either W/L characters or 1/0 binaries &#8212; and fill each row relative to the game result. It&#8217;s a tidy way of replicating two <strong>tab_style </strong>calls in a single line. You can also adjust the win/loss_color (fill) and the wins/loss_text_color. By default, the font color is white, which we will keep.</p><h4>Setting the table font</h4><p>The final {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>} utility function is <strong>gt_set_font</strong>, which is a quick and dirty way of changing the font in all customizable parts of your table. You&#8217;ll notice that we will still use <strong>tab_style</strong> to adjust the weights of our column labels, however, and it should be noted that <strong>gt_set_font</strong> does not yet offer customization aside from changing the font family. You can think of it as a nice way to test different fonts in your table.</p><div><hr></div><h3>Building the table</h3><p>Now that we have briefly explored each {<strong><a href="https://cbbplotr.aweatherman.com/articles/getting_started.html">cbbplotR</a></strong>} function, created our table header, and finalized our data, let&#8217;s throw it over to {<strong><a href="https://gt.rstudio.com">gt</a></strong>}!</p><p>Okay, there&#8217;s a lot going on here. If you&#8217;re relatively new to {<strong><a href="https://gt.rstudio.com">gt</a></strong>}, I <em>really</em> recommend that you step through each line. 
It might seem overwhelming, but many of the functions are intuitively named, and running the code line-by-line should help you understand what&#8217;s happening.</p><h4>Optional CSS</h4><p>I&#8217;m not going to walk through every function, but I did want to briefly mention the <strong>opt_css</strong> call at the end. <strong>opt_css</strong> is a way of adding custom CSS to your tables, which <em>really</em> extends what you can do with them. In fact, {<strong><a href="https://gt.rstudio.com">gt</a></strong>} <em>is</em> just converting everything to HTML, which is why we can get some neat customization with our table header.</p><p>In our table, specifically, we use this CSS to decrease the spacing between footnotes and between the lines of our caption (the source note). To make this work, we need to set a table ID, which we do in the second line with gt(<strong>id = 'duke'</strong>), and then reference that ID as a selector (<strong>#duke</strong>) in each CSS rule.</p><pre><code>table_data %&gt;% 
  gt(id = 'duke') %&gt;% 
  gt_theme_538() %&gt;% 
  fmt_markdown(opp) %&gt;% 
  cols_move(date, opp) %&gt;% 
  cols_move_to_end(quad) %&gt;% 
  cols_hide(c(result, rank, net)) %&gt;% 
  cols_align(columns = everything(), 'center') %&gt;% 
  cols_align(columns = opp, 'left') %&gt;% 
  cols_label(opp = 'opponent (KenPom Rk.)', off_ppp = 'PPP', 
             def_ppp = 'PPP',off_efg = 'eFG%', off_to = 'TOV%', 
             off_or = 'Reb%',off_ftr = 'FTA/FGA', def_efg = 'eFG%',
             def_to = 'TOV%', def_or = 'Reb%', def_ftr = 'FTA/FGA',
             game_score = 'Eff. Score', quad = 'NET Quad',
             closest_rank = 'Like #') %&gt;% 
  gt_color_results() %&gt;% 
  tab_style(locations = cells_column_labels(),
            style = cell_text(font = 'Oswald', weight = 'bold')) %&gt;% 
  tab_style(locations = cells_title(), style =
              cell_text(font = 'Oswald')) %&gt;% 
  tab_options(table.font.names = 'Oswald', data_row.padding = 2) %&gt;% 
  gt_add_divider(score, include_labels = FALSE, color = 'black') %&gt;% 
  gt_add_divider(off_ftr, include_labels = FALSE, color = 'black') %&gt;% 
  gt_add_divider(def_ftr, include_labels = FALSE, color = 'black') %&gt;% 
  tab_spanner(off_ppp:off_ftr, label = 'Offensive Performance',
              id = 'offense') %&gt;% 
  tab_spanner(def_ppp:def_ftr, label = 'Defensive Performance',
              id = 'defense') %&gt;% 
  tab_footnote(cells_column_spanners(spanner = c('offense', 'defense')),
               footnote = "Points per possession + 'Four Factors'
               (effective FG%, turnover rate, off/def rebound rate,
               and FTA per 100 FGA)") %&gt;% 
  tab_footnote(cells_column_labels(columns = game_score),
               footnote = 'This value is used to calculate the
               following column and can be viewed as a [0-100]
               composite game performance score') %&gt;%
  tab_footnote(cells_column_labels(columns = closest_rank),
               footnote = 'This game performance is roughly
               equivalent to how #X would be expected to play in the
               same game (Barttorvik)') %&gt;% 
  tab_header(title = html(gt_title)) %&gt;% 
  tab_source_note(md('Data by cbbdata + cbbplotR
                     &lt;br&gt;Viz. + Analysis by @andreweatherman')) %&gt;% 
  opt_css(
    '
      #duke .gt_sourcenote{
        line-height: 1.2;
        padding-top: 9px !important;
      }
      #duke .gt_footnote {
        padding-top: 7px !important;
        padding-bottom: 7px !important;
        line-height: 0.2;
      }
      '
  )</code></pre><div><hr></div><h2>Full Code</h2><h4>Loading libraries</h4><pre><code>pak::pak(c('andreweatherman/cbbdata', 'andreweatherman/cbbplotR'))
pkgs &lt;- c('cbbdata', 'cbbplotR', 'gt', 'gtExtras', 'tidyverse', 'glue')
invisible(lapply(pkgs, library, character.only = TRUE))</code></pre><h4>Finding closest rank function</h4><pre><code>find_closest_rank &lt;- function(scores) {
  map_int(scores, function(score) {
    differences &lt;- abs(ratings$barthag - score / 100)
    closest_index &lt;- which.min(differences)
    ratings$barthag_rk[closest_index]
  })
}
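# Worked example with made-up numbers (a sketch, not real Barttorvik data):
# a game_score of 78 is rescaled to 0.78, the nearest barthag below is 0.80,
# so the matching rank is 2. `toy` is a hypothetical table used only here.
toy = data.frame(barthag = c(0.95, 0.80, 0.60), barthag_rk = 1:3)
toy$barthag_rk[which.min(abs(toy$barthag - 78 / 100))] # returns 2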
ratings &lt;- cbd_torvik_ratings(year = 2024)</code></pre><h4>KenPom data<br></h4><p>If you <em><strong>have </strong></em>authorized your KenPom account on {<strong><a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>}.</p><pre><code>current_kp &lt;- cbd_kenpom_ratings(year = 2024) %&gt;% 
  select(opp = team, rank = rk)</code></pre><p>If you <em><strong>have not</strong></em> authorized your KenPom account.</p><pre><code>teams &lt;- cbd_teams() %&gt;% select(team = common_team, kp = kp_team)

team_matching &lt;- teams %&gt;% pull('team') %&gt;%
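  rlang::set_names(teams$kp) # names = KenPom team names, values = common names

# read_html/html_table come from {rvest}; row_to_names/clean_names from {janitor}
current_kp &lt;- rvest::read_html('https://kenpom.com') %&gt;%
  rvest::html_table() %&gt;%
  pluck(1) %&gt;%
  janitor::row_to_names(1) %&gt;%
  janitor::clean_names() %&gt;%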
  select(opp = team, rank = rk) %&gt;% 
  mutate(opp = team_matching[opp]) %&gt;%
  filter(!is.na(opp))</code></pre><h4>Table data</h4><pre><code>table_data &lt;- cbd_torvik_game_stats(year = 2024, team = 'Duke') %&gt;% 
  arrange(date) %&gt;%
  cbd_add_net_quad() %&gt;% 
  mutate(score = glue('{pts_scored}-{pts_allowed}'),
         closest_rank = find_closest_rank(game_score),
         date = format(date, '%b. %e'),
         location = NULL) %&gt;% 
  select(date, result, opp, score, off_ppp, off_efg,
         off_to, off_or, off_ftr, def_ppp, def_efg, def_to,
         def_or, def_ftr, game_score, closest_rank, net, quad) %&gt;% 
  left_join(current_kp, by = 'opp') %&gt;% 
  gt_cbb_teams(opp, opp, logo_color = 'dark') %&gt;%
  mutate(opp = glue('{opp} (#{rank})'))</code></pre><h4>Table header</h4><pre><code>gt_title &lt;- gt_cbb_logo_title(
  title = 'Game-by-game efficiency performance for Duke in 2023',
  subtitle = 'D1 vs. D1 only. Data, rankings, and quadrants are current
  as of Jan. 5.',
  value = 'Duke',
  type = 'team'
)</code></pre><h4>Creating the table</h4><pre><code>table &lt;- table_data %&gt;% 
  gt(id = 'duke') %&gt;% 
  gt_theme_538() %&gt;% 
  fmt_markdown(opp) %&gt;% 
  cols_move(date, opp) %&gt;% 
  cols_move_to_end(quad) %&gt;% 
  cols_hide(c(result, rank, net)) %&gt;% 
  cols_align(columns = everything(), 'center') %&gt;% 
  cols_align(columns = opp, 'left') %&gt;% 
  cols_label(opp = 'opponent (KenPom Rk.)', off_ppp = 'PPP', 
             def_ppp = 'PPP',off_efg = 'eFG%', off_to = 'TOV%', 
             off_or = 'Reb%',off_ftr = 'FTA/FGA', def_efg = 'eFG%',
             def_to = 'TOV%', def_or = 'Reb%', def_ftr = 'FTA/FGA',
             game_score = 'Eff. Score', quad = 'NET Quad',
             closest_rank = 'Like #') %&gt;% 
  gt_color_results() %&gt;% 
  tab_style(locations = cells_column_labels(),
            style = cell_text(font = 'Oswald', weight = 'bold')) %&gt;% 
  tab_style(locations = cells_title(), style =
              cell_text(font = 'Oswald')) %&gt;% 
  tab_options(table.font.names = 'Oswald', data_row.padding = 2) %&gt;% 
  gt_add_divider(score, include_labels = FALSE, color = 'black') %&gt;% 
  gt_add_divider(off_ftr, include_labels = FALSE, color = 'black') %&gt;% 
  gt_add_divider(def_ftr, include_labels = FALSE, color = 'black') %&gt;% 
  tab_spanner(off_ppp:off_ftr, label = 'Offensive Performance',
              id = 'offense') %&gt;% 
  tab_spanner(def_ppp:def_ftr, label = 'Defensive Performance',
              id = 'defense') %&gt;% 
  tab_footnote(cells_column_spanners(spanner = c('offense', 'defense')),
               footnote = "Points per possession + 'Four Factors'
               (effective FG%, turnover rate, off/def rebound rate,
               and FTA per 100 FGA)") %&gt;% 
  tab_footnote(cells_column_labels(columns = game_score),
               footnote = 'This value is used to calculate the
               following column and can be viewed as a [0-100]
               composite game performance score') %&gt;%
  tab_footnote(cells_column_labels(columns = closest_rank),
               footnote = 'This game performance is roughly
               equivalent to how #X would be expected to play in the
               same game (Barttorvik)') %&gt;% 
  tab_header(title = html(gt_title)) %&gt;% 
  tab_source_note(md('Data by cbbdata + cbbplotR
                     &lt;br&gt;Viz. + Analysis by @andreweatherman')) %&gt;% 
  opt_css(
    '
      #duke .gt_sourcenote{
        line-height: 1.2;
        padding-top: 9px !important;
      }
      #duke .gt_footnote {
        padding-top: 7px !important;
        padding-bottom: 7px !important;
        line-height: 0.2;
      }
      '
  )</code></pre><h4>Saving the table</h4><p>Sometimes, <strong>gtsave_extra </strong>can be a bit finicky. If your table is not saving, try restarting your R Session (you&#8217;ll then need to re-load the libraries).</p><pre><code>gtsave_extra(table, 'team_performance.png')</code></pre><p>If you found this walkthrough and code useful, please consider subscribing below and sharing the post!</p>]]></content:encoded></item><item><title><![CDATA[2048-Style Basketball Grids in ggplot]]></title><description><![CDATA[Learn how to create a 2048-style grid in R and plot images in-line with plot titles]]></description><link>https://www.bucketsandbytes.com/p/2048-style-basketball-grids-in-ggplot</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/2048-style-basketball-grids-in-ggplot</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Sat, 30 Dec 2023 23:18:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FQ7P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Uh, is 
this thing on?</em> Anyways, welcome back to Buckets &amp; Bytes! Can I say that I&#8217;m &#8220;reviving&#8221; this blog if I&#8217;ve only posted once? I&#8217;ve been chronicling some of my recent visualizations on GitHub, and I thought that I might as well start this blog back up &#8212; but expect shorter posts. There&#8217;s just no way that I can sustain 3,000-word tutorials, but I hope that this will still provide some semblance of value!</p><p>Today, we will be recreating a <a href="https://x.com/CrumpledJumper/status/1740251518840996135?s=20">recent Todd Whitehead post</a>. This is a neat grid that gives a creative spin on visualizing career distributions. I am really digging the 2048-vibe and the &#8216;Scorigami&#8217; potential.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FQ7P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FQ7P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic 424w, https://substackcdn.com/image/fetch/$s_!FQ7P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic 848w, https://substackcdn.com/image/fetch/$s_!FQ7P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!FQ7P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FQ7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic" width="1456" height="1838" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c44a83d0-8904-438e-83f4-ccc66662bd36.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1838,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:145167,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FQ7P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic 424w, https://substackcdn.com/image/fetch/$s_!FQ7P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic 848w, https://substackcdn.com/image/fetch/$s_!FQ7P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!FQ7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44a83d0-8904-438e-83f4-ccc66662bd36.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Let&#8217;s get started!</p><div><hr></div><h2>Getting the data</h2><p>This visualization is going to use the <strong>{<a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>}<strong> </strong>R package. 
<a href="https://x.com/andreweatherman/status/1737261557657096220?s=20">If you missed it, I recently released cbbdata after eight months of development.</a> To use it, you need to install the package, register for an API key, and set up your credentials in R. Everything is entirely free, and the process takes less than a minute. <a href="https://cbbdata.aweatherman.com/index.html#installation">You can learn more on the package website.</a></p><p>For this visualization, we are going to plot point and rebound combinations for Purdue&#8217;s Zach Edey, the reigning national player of the year and the odds-on favorite to repeat. You can choose a different player, and/or you can choose to plot different statistical combinations. </p><div><hr></div><p>Let&#8217;s start by loading the required libraries.</p><pre><code>library(tidyverse)
library(cbbdata)
library(ggtext)
library(nflplotR)</code></pre><p>We need to define &#8220;bins&#8221; for point and rebound totals. Then, we will create new columns that record which bin each game&#8217;s total falls into. I am using increments of five for points and increments of three for rebounds. Feel free to play around with the data and find which bins work best for you. </p><pre><code>data &lt;- cbd_torvik_player_game(player = 'Zach Edey') %&gt;%
  mutate(reb = oreb + dreb) %&gt;% # fixes one `reb` NA
  select(player, team, pts, reb) %&gt;% 
  mutate(pts_bucket = case_when(
    pts &lt; 10 ~ '&lt;10',
    pts &gt;= 10 &amp; pts &lt; 15 ~ '10-14',
    pts &gt;= 15 &amp; pts &lt; 20 ~ '15-19',
    pts &gt;= 20 &amp; pts &lt; 25 ~ '20-24',
    pts &gt;= 25 ~ '25+'
  ),
  reb_bucket = case_when(
    reb &lt; 5 ~ '&lt;5',
    reb &gt;= 5 &amp; reb &lt; 8 ~ '5-7',
    reb &gt;= 8 &amp; reb &lt; 11 ~ '8-10',
    reb &gt;= 11 &amp; reb &lt; 14 ~ '11-13',
    reb &gt;= 14 ~ '14+'
  ))</code></pre><p>We want to make sure that our plot includes point + rebound combinations that might be zero (not achieved yet). To do this, we are going to create a tibble with our bin definitions and use <strong>expand.grid</strong> to create a new object that includes all possible bin combinations.</p><pre><code>bins &lt;- tibble(
  pts_bucket = c('&lt;10', '10-14', '15-19', '20-24', '25+'),
  reb_bucket = c('&lt;5', '5-7', '8-10', '11-13', '14+')
)
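# Optional sanity check (a sketch, not part of the original code): expand.grid()
# pairs every point bin with every rebound bin, so 5 x 5 bins should give 25 rows.
# `check_grid` is a hypothetical name used only for this check.
check_grid = expand.grid(bins, stringsAsFactors = FALSE)
stopifnot(nrow(check_grid) == 25)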

bins_grid &lt;- bins %&gt;% expand.grid()</code></pre><p>Next, we need to summarize our data to count the number of games where Edey achieved each bin pair. In other words, if Edey put up 15 points and 10 rebounds in one game, that would count for an observation with a <em>15-19</em> point bin and an <em>8-10</em> rebound bin. We are also going to join on our bins data after we count pairs so that our graph will include possible combinations that have not yet been achieved.</p><p>We are going to add levels to our bucket columns so that our plot axes are in the correct order (small &#8594; large). If you don&#8217;t do this, your bins (X-Y axes) will likely be misordered when you plot them.</p><pre><code>plot_data &lt;- data %&gt;%
  count(pts_bucket, reb_bucket) %&gt;%
  full_join(bins_grid, by = c('pts_bucket', 'reb_bucket')) %&gt;%
  mutate(pts_bucket = fct(pts_bucket,
                          levels = c('&lt;10', '10-14', '15-19', 
                                     '20-24', '25+')),
         reb_bucket = fct(reb_bucket,
                          levels = c('&lt;5', '5-7', '8-10', 
                                     '11-13', '14+')),
         n = replace_na(n, 0))</code></pre><p>Finally, we want to plot a picture of Edey, so we will grab his headshot from ESPN. You can get a headshot of any player by going to their ESPN college career page, right-clicking on their headshot, and selecting <em>Copy Image Address</em>.</p><pre><code>headshot = 'https://a.espncdn.com/combiner/i?img=/i/headshots/mens-college-basketball/players/full/4600663.png&amp;w=350&amp;h=254'</code></pre><h2>Plot the data</h2><p>That&#8217;s all of the data that we need! <strong>{<a href="https://cbbdata.aweatherman.com">cbbdata</a></strong>} makes accessing clean and tidy college basketball data in R a breeze. Now for the fun part: Let&#8217;s use <strong>{<a href="https://ggplot2.tidyverse.org">ggplot2</a></strong>} to create a 2048-style player grid for Zach Edey. </p><p>This plot is going to make use of the <strong>{<a href="https://nflplotr.nflverse.com">nflplotR</a></strong>} package to render our player headshot. If you are doing anything with team logos, I highly recommend using it. While the <strong>geom_from_path</strong> function is pulled from the {<strong><a href="https://mrcaseb.github.io/ggpath/">ggpath</a></strong>} package (same developer), <strong>{<a href="https://nflplotr.nflverse.com">nflplotR</a></strong>} does include a very handy <strong>ggpreview</strong> function that is a <a href="https://nflplotr.nflverse.com/articles/nflplotR.html#how-about-speed-in-the-rstudio-preview-pane">life-saver when plotting multiple team logos</a>. We will not be using that function in this code, but out of habit, I usually import <strong>{<a href="https://nflplotr.nflverse.com">nflplotR</a></strong>}.</p><h4>Custom fonts</h4><p>We will be using the Oswald font from Google Fonts. If you do not have that installed on your machine, you can <a href="https://fonts.google.com/specimen/Oswald?query=Oswald">download it for free</a>. 
If you are on a Windows machine, <a href="https://alexkgold.space/posts/2019-11-09-custom-fonts-in-ggplot2/">you might have to do a few more things</a> to get your font to work with <strong>{<a href="https://ggplot2.tidyverse.org">ggplot2</a></strong>}. In my experience, custom fonts on macOS might only require an R session restart if you just downloaded Oswald.</p><h4>Clipping and drawing outside of the panel</h4><p>We want to plot the headshot &#8220;outside&#8221; of the coordinate system, i.e. above our plot and in-line with the title, so we need to specify <strong>coord_fixed(clip = 'off')</strong>, which allows drawing outside of the panel (the X-Y area).</p><p>Specifically, we are going to set our x and y values inside <strong>geom_from_path</strong> to <em>Inf</em> so that our headshot is anchored to the top-right corner of the panel. We then use <strong>hjust</strong> and <strong>vjust</strong> to tweak the position of the headshot. If you are plotting a different player, be sure to adjust these values so that your headshot is properly positioned.</p><h4>Theme</h4><p>We adjusted a number of things in the <strong>theme</strong> function to get our grid to look just right. Most of what we did was adjust margins to create distance between elements in our plot. If you aren&#8217;t exactly sure what each parameter does, I encourage you to change the values to see what happens!</p><div><hr></div><pre><code>plot_data %&gt;% 
  ggplot(aes(x = pts_bucket, y = reb_bucket)) +
  geom_tile(aes(fill = n), color = 'white', linewidth = 3) +
  geom_from_path(aes(path = headshot), x = Inf, y = Inf,  width = 0.25,
                 hjust = 1.05, vjust = 0.41) +
  geom_richtext(aes(label = n, color = '#2C2F2B'), size = 7,
                label.color = NA, fill = NA,
                family = 'Oswald-Medium') +
  scale_color_identity() +
  scale_fill_gradient(low = '#F2F2F2', high = '#F7565A') +
  coord_fixed(clip = 'off') +
  theme_minimal() +
  theme(legend.position = 'none',
        plot.title = element_text(family = 'Oswald-Medium',
                                  size = 20, vjust = 0),
        plot.subtitle = element_text(family = 'Oswald-Regular',
                                     color = 'grey40',
                                     size = 10, vjust = 0.5),
        plot.caption = element_markdown(family = 'Oswald-Regular',
                                        lineheight = 1.3,
                                        margin = margin(t = 20),
                                        color = 'grey40', hjust = 0,
                                        size = 8),
        plot.title.position = 'plot',
        plot.caption.position = 'plot',
        axis.text.x = element_text(family = 'Oswald-Regular',
                                   size = 10, color = 'grey40',
                                   margin = margin(t = -4, b = -4)),
        axis.text.y = element_text(family = 'Oswald-Regular',
                                   size = 10, color = 'grey40',
                                   margin = margin(l = -4, r = -4)),
        axis.title.x = element_text(family = 'Oswald-Medium',
                                    vjust = -3, size = 12),
        axis.title.y = element_text(family = 'Oswald-Medium',
                                    vjust = 3, size = 12),
        plot.margin = margin(30, 30, 30, 30),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  labs(x = 'POINTS',
       y = 'REBOUNDS',
       title = 'Box Scorigami: Zach Edey',
       subtitle = 'No. of career games with different combinations of points + rebounds',
       caption = 'Data by cbbdata through Dec. 27, 2023&lt;br&gt;Viz.
       + Analysis by @andreweatherman&lt;br&gt;Inspiration from @CrumpledJumper')</code></pre><h4>Saving</h4><p>One more thing to note: When I was making this plot, <strong>ggsave</strong> was giving me <em>lots</em> of trouble, and I had to resort to using <strong>png</strong> to save it. If you experience similar issues, run the above code &#8212; ensuring that your graph is showing in your <em>Plots</em> pane &#8212; and use this to save it (replacing the file name as needed).</p><pre><code>png('plot.png', res = 600, width = 7, height = 7,
    bg = 'white', units = 'in')
print(last_plot()) # last_plot() returns the most recently displayed ggplot
dev.off()</code></pre><div><hr></div><h2>Full code</h2><p>And that&#8217;s it! The 2048-tile graph doesn&#8217;t take much to make, but it&#8217;s an effective, intuitive, and clean way of displaying statistic distributions for players. This code can be used to reproduce graphs for college basketball or as a general framework with which to build similar plots for other sports. My example uses a width and height of 7 inches and a DPI of 600 when saving.</p><pre><code>library(tidyverse)
library(cbbdata)
library(ggtext)
library(nflplotR)


data &lt;- cbd_torvik_player_game(player = 'Zach Edey') %&gt;%
  mutate(reb = oreb + dreb) %&gt;% # fixes one `reb` NA
  select(player, team, pts, reb) %&gt;% 
  mutate(pts_bucket = case_when(
    pts &lt; 10 ~ '&lt;10',
    pts &gt;= 10 &amp; pts &lt; 15 ~ '10-14',
    pts &gt;= 15 &amp; pts &lt; 20 ~ '15-19',
    pts &gt;= 20 &amp; pts &lt; 25 ~ '20-24',
    pts &gt;= 25 ~ '25+'
  ),
  reb_bucket = case_when(
    reb &lt; 5 ~ '&lt;5',
    reb &gt;= 5 &amp; reb &lt; 8 ~ '5-7',
    reb &gt;= 8 &amp; reb &lt; 11 ~ '8-10',
    reb &gt;= 11 &amp; reb &lt; 14 ~ '11-13',
    reb &gt;= 14 ~ '14+'
  ))

###

bins &lt;- tibble(
  pts_bucket = c('&lt;10', '10-14', '15-19', '20-24', '25+'),
  reb_bucket = c('&lt;5', '5-7', '8-10', '11-13', '14+')
)

bins_grid &lt;- bins %&gt;% expand.grid()

###

headshot = 'https://a.espncdn.com/combiner/i?img=/i/headshots/mens-college-basketball/players/full/4600663.png&amp;w=350&amp;h=254'

plot_data &lt;- data %&gt;%
  count(pts_bucket, reb_bucket) %&gt;%
  full_join(bins_grid, by = c('pts_bucket', 'reb_bucket')) %&gt;%
  mutate(pts_bucket = fct(pts_bucket,
                          levels = c('&lt;10', '10-14', '15-19', 
                                     '20-24', '25+')),
         reb_bucket = fct(reb_bucket,
                          levels = c('&lt;5', '5-7', '8-10', 
                                     '11-13', '14+')),
         n = replace_na(n, 0))

###

plot_data %&gt;% 
  ggplot(aes(x = pts_bucket, y = reb_bucket)) +
  geom_tile(aes(fill = n), color = 'white', linewidth = 3) +
  geom_from_path(aes(path = headshot), x = Inf, y = Inf,  width = 0.25,
                 hjust = 1.05, vjust = 0.41) +
  geom_richtext(aes(label = n, color = '#2C2F2B'), size = 7,
                label.color = NA, fill = NA,
                family = 'Oswald-Medium') +
  scale_color_identity() +
  scale_fill_gradient(low = '#F2F2F2', high = '#F7565A') +
  coord_fixed(clip = 'off') +
  theme_minimal() +
  theme(legend.position = 'none',
        plot.title = element_text(family = 'Oswald-Medium',
                                  size = 20, vjust = 0),
        plot.subtitle = element_text(family = 'Oswald-Regular',
                                     color = 'grey40',
                                     size = 10, vjust = 0.5),
        plot.caption = element_markdown(family = 'Oswald-Regular',
                                        lineheight = 1.3,
                                        margin = margin(t = 20),
                                        color = 'grey40', hjust = 0,
                                        size = 8),
        plot.title.position = 'plot',
        plot.caption.position = 'plot',
        axis.text.x = element_text(family = 'Oswald-Regular',
                                   size = 10, color = 'grey40',
                                   margin = margin(t = -4, b = -4)),
        axis.text.y = element_text(family = 'Oswald-Regular',
                                   size = 10, color = 'grey40',
                                   margin = margin(l = -4, r = -4)),
        axis.title.x = element_text(family = 'Oswald-Medium',
                                    vjust = -3, size = 12),
        axis.title.y = element_text(family = 'Oswald-Medium',
                                    vjust = 3, size = 12),
        plot.margin = margin(30, 30, 30, 30),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  labs(x = 'POINTS',
       y = 'REBOUNDS',
       title = 'Box Scorigami: Zach Edey',
       subtitle = 'No. of career games with different combinations of points + rebounds',
       caption = 'Data by cbbdata through Dec. 27, 2023&lt;br&gt;Viz.
       + Analysis by @andreweatherman&lt;br&gt;Inspiration from @CrumpledJumper')</code></pre>]]></content:encoded></item><item><title><![CDATA[R Tutorial: Performance Against AP T-25 Expectation]]></title><description><![CDATA[Learning how to estimate performance against AP T-25 expectation in college basketball using R and open-source data]]></description><link>https://www.bucketsandbytes.com/p/most-deserving</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/most-deserving</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Sat, 21 Oct 2023 15:00:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!DO0J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the first installment of <em>Buckets &amp; Bytes</em>! 
Today, we&#8217;ll be learning how to use open-source data to answer the question:</p><div class="pullquote"><p><strong>How many games would the average AP Top 25 team be expected to win against the schedule of each ranked team?</strong></p></div><h4>What does this mean?</h4><p>We will use Barttorvik data (T-Rank; a leading college basketball metric) to compute <em>venue-adjusted </em>win totals for each AP Preseason Top 25 team and then compare to how the <em>average</em> ranked team would be expected to perform against the exact same schedule. We will walk through the process of how to visualize our results in the table below, all using R and open-source data!</p><p><strong>This code will work throughout the season, so keep it handy if you want to create similar graphics as the season progresses!</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DO0J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DO0J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DO0J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DO0J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!DO0J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DO0J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg" width="977" height="1682" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1682,&quot;width&quot;:977,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!DO0J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DO0J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DO0J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!DO0J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c0cf3c-12e3-4e83-889c-aeaf03500961_977x1682.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">The final product of today&#8217;s blog</figcaption></figure></div><h3>How do I use <em>Buckets &amp; Bytes</em>?</h3><p><em>Buckets &amp; Bytes</em> aims to provide a comprehensive code walkthrough, from start to finish, using engaging data. For some, the code discussion might not be necessary. 
For others, I hope you find some value in this tutorial. Our walkthrough is spread across three sections: <strong>extraction</strong>, <strong>analysis</strong>, and <strong>visualization</strong>. </p><blockquote><p><em>If you are just interested in the full source code, it can be found <a href="https://gist.github.com/andreweatherman/a79d12e3e29b4b60145f8773a6f9c5ea">here</a> and at the bottom of the article.</em></p></blockquote><h3>What do I need to know?</h3><p>Each post in this series will assume different levels of R and analytics knowledge. Today&#8217;s code will use rvest, dplyr, purrr, and gt, so you should have a foundational understanding of R and the tidyverse. I will provide detailed explanations at times but might gloss over true beginner concepts. </p><h2>Part 1: Data Extraction</h2><p>You can run this code to install and load all necessary packages.</p><pre><code>needed_packages &lt;- c('tidyverse', 'rvest', 'gt', 'gtExtras', 'withr', 'janitor')
installed_packages &lt;- needed_packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(needed_packages[!installed_packages])
}

invisible(lapply(needed_packages, library, character.only = TRUE))</code></pre><h3>Grabbing the season schedule</h3><p>The first thing we must do is grab the season schedule. Luckily, Barttorvik has this easily accessible in .CSV form &#8212; so we can simply pass that url to the <strong>read_csv</strong> function. Importantly, the loaded CSV has <em>no</em> pre-defined column headers, so to ensure that R does not infer the first row to be the column headers, causing us to lose one game, we need to specify `col_names = FALSE` and manually set our own headers. Once we do that, let&#8217;s simplify some things and only choose the needed columns (neutral, home, and away).</p><blockquote><p><strong>Important: </strong>If you are on Windows, you will need to run this line <em>before</em> you try to scrape the website to avoid 403 errors. For some reason, his site blocks the User-Agent of Windows machines.</p><p><strong>withr::local_options(HTTPUserAgent='Buckets &amp; Bytes')</strong></p></blockquote><pre><code>withr::local_options(HTTPUserAgent='Buckets &amp; Bytes')
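# Aside (an illustrative sketch, not part of the original post): a made-up two-game CSV,
# `toy_csv`, shows why `col_names = FALSE` matters for a header-less file.
# Assumes readr 2.0 or later, where I() marks a string as literal data.
toy_csv &lt;- I("0,Team A,Team B\n1,Team C,Team D")
nrow(read_csv(toy_csv, show_col_types = FALSE))                    # first game is consumed as headers
nrow(read_csv(toy_csv, col_names = FALSE, show_col_types = FALSE)) # every game is kept as a row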
schedule &lt;- read_csv("https://barttorvik.com/2024_master_sked.csv", col_names = FALSE) %&gt;%
  setNames(c('game_id', 'game_date', 'game_type', 'neutral', 'away', 'home')) %&gt;%
  select(neutral, home, away)</code></pre><p>Our `schedule` data should look like this. `neutral` is essentially a boolean column, with 1 indicating a neutral-site game.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mgel!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mgel!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png 424w, https://substackcdn.com/image/fetch/$s_!Mgel!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png 848w, https://substackcdn.com/image/fetch/$s_!Mgel!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png 1272w, https://substackcdn.com/image/fetch/$s_!Mgel!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mgel!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png" width="790" height="310" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:310,&quot;width&quot;:790,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63669,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mgel!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png 424w, https://substackcdn.com/image/fetch/$s_!Mgel!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png 848w, https://substackcdn.com/image/fetch/$s_!Mgel!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png 1272w, https://substackcdn.com/image/fetch/$s_!Mgel!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad70efc7-0214-48d1-9e9b-05a28823508c_790x310.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">head(schedule)</figcaption></figure></div><h3>Scraping preseason team ratings</h3><p>Next up: web scraping! Now that we have our season schedule, we need a reference point for team strength. For this visualization, we are going to stick with Barttorvik and use his <em>T-Rank</em> ratings. Lucky for us, this metric is easily accessible on the home page!</p><p><a href="https://barttorvik.com">Navigating to it</a>, we can see that the data is nicely visualized inside an HTML table. If you are new to web scraping in R, a <em>general</em> rule of thumb is that if data is contained inside a static HTML table, the <strong>html_table()</strong> function in `rvest` should be able to retrieve it. Let&#8217;s check:</p><blockquote><p>I have run into a few issues in the past scraping his website on Windows. If you run into connection errors, you can pull in the data using this code. You will then be able to run <strong>across()</strong> on it below. 
There&#8217;s nothing we can do about connection errors.</p><p><strong>read_csv('https://gist.github.com/andreweatherman/f7c5e850a88f22b0577599fbc9d26da9/raw/928588a1afbddf1c49266cf3b56892be6682793f/bytes_ratings_connection_error.csv')</strong></p></blockquote><pre><code># withr::local_options(HTTPUserAgent='Buckets &amp; Bytes')
read_html('https://barttorvik.com/trankpre.php') %&gt;%
   html_table()</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QVC9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QVC9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png 424w, https://substackcdn.com/image/fetch/$s_!QVC9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png 848w, https://substackcdn.com/image/fetch/$s_!QVC9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png 1272w, https://substackcdn.com/image/fetch/$s_!QVC9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QVC9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png" width="1454" height="386" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:386,&quot;width&quot;:1454,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:293263,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QVC9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png 424w, https://substackcdn.com/image/fetch/$s_!QVC9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png 848w, https://substackcdn.com/image/fetch/$s_!QVC9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png 1272w, https://substackcdn.com/image/fetch/$s_!QVC9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d72df9-b14c-4b30-91e7-c3f9f2f5fd6b_1454x386.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">result from code above</figcaption></figure></div><p>When we run that code, a tibble with all data is properly returned &#8212; awesome! What you might notice, however, is that our column headers are a bit inconvenient &#8212; e.g. `Proj. Rec`, `Ret Mins`, etc. We won&#8217;t be touching any of these columns, but out of habit, I always run the <strong>clean_names()</strong> function from the `janitor` package to quickly tidy headers. Again, that line isn&#8217;t <em>necessary</em> for our analysis &#8212; it just renames column headers with a more friendly syntax &#8212; but my code will assume you run it.</p><p>If you look closely at our data, you might notice that all columns are character types. Since we will be running quantitative analysis, we must convert a few columns to the <em>numeric</em> type. Specifically, we want `adj_oe` and `adj_de` to be numbers. 
<strong>(These represent the estimated points scored and allowed, respectively, per 100 possessions, adjusted for team and opponent strength.)</strong></p><p>We can use the `mutate` function from the `dplyr` package (part of the `tidyverse`) to essentially &#8220;modify&#8221; both columns with <strong>as.numeric()</strong>. Using <strong>mutate()</strong>, there are two ways to approach this problem. The first is to convert each column in its own statement, one variable at a time; the second, and more concise, option is to use <strong>across()</strong> to apply identical transformations to a range of columns at once. Looking back on my early days of self-teaching R, I wish that I had grasped the concept of <strong>across()</strong> sooner: The syntax might look strange at first, but trust me, it will save you loads of typing!</p><p><strong>across()</strong> takes two main arguments: the range of columns and the function(s) to apply. Diving deep into <strong>across()</strong> is out of scope for this blog entry, but let&#8217;s see how it works with our data. (`.x` is a placeholder for each selected column in turn, so <strong>as.numeric()</strong> runs once on `adj_oe` and once on `adj_de`.)</p><pre><code>ratings &lt;- read_html('https://barttorvik.com/trankpre.php') %&gt;%
   html_table() %&gt;%
   pluck(1) %&gt;%
   janitor::clean_names() %&gt;%
   <strong>mutate(across(c(adj_oe, adj_de), ~ as.numeric(.x))) </strong>%&gt;%
   select(team, adj_oe, adj_de)
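
# Aside (an illustrative sketch, not part of the original post): on a made-up tibble,
# `toy_ratings`, the one-column-at-a-time approach and across() should give the same result.
toy_ratings &lt;- tibble(team = c('Team A', 'Team B'), adj_oe = c('110.5', '98.2'), adj_de = c('95.1', '101.3'))
one_at_a_time &lt;- toy_ratings %&gt;% mutate(adj_oe = as.numeric(adj_oe), adj_de = as.numeric(adj_de))
with_across &lt;- toy_ratings %&gt;% mutate(across(c(adj_oe, adj_de), ~ as.numeric(.x)))
identical(one_at_a_time, with_across) # compares the two converted tibbles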

# if you run into connection errors, replace the code from read_html to clean_names with the ratings data in the link above</code></pre><p><strong>A brief aside about the use of `pluck` above: html_table()</strong> returns a <em>list</em>. To access our tibble and perform any sort of  analysis or visualization, we need to &#8220;retrieve&#8221; our data from the list (well, <em>not really</em> but only Sickos would reference an index position throughout this code). <strong>pluck()</strong> is a simple way to index within a list and &#8220;pluck&#8221; out the desired element. In this case, our data is just at index position 1.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jcOl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jcOl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png 424w, https://substackcdn.com/image/fetch/$s_!jcOl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png 848w, https://substackcdn.com/image/fetch/$s_!jcOl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png 1272w, https://substackcdn.com/image/fetch/$s_!jcOl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jcOl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png" width="446" height="304" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:304,&quot;width&quot;:446,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:49149,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jcOl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png 424w, https://substackcdn.com/image/fetch/$s_!jcOl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png 848w, https://substackcdn.com/image/fetch/$s_!jcOl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png 1272w, https://substackcdn.com/image/fetch/$s_!jcOl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa35c390e-5fe6-4ed1-80e3-e4ae476d8a0a_446x304.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">head(ratings)</figcaption></figure></div><h3>A tibble for the preseason top 25 teams</h3><p>Necessary for this analysis, of course, is data for the Associated Press&#8217; &#8216;Preseason Top 25&#8217; poll. Now, you <em>could</em> scrape this information from the poll website itself, but to be honest, it might be quicker to hard-code the data yourself. (Importantly, we will need team names to match those in our `ratings` data later on, so I decided to manually write the data.)</p><pre><code>ap_top_25 &lt;- tibble(
  rank = 1:25,
  team = c('Kansas', 'Duke', 'Purdue', 'Michigan St.', 'Marquette', 'Connecticut', 'Houston', 'Creighton', 'Tennessee',
           'Florida Atlantic', 'Gonzaga', 'Arizona', 'Miami FL', 'Arkansas', 'Texas A&amp;M', 'Kentucky', 'San Diego St.', 'Texas',
           'North Carolina', 'Baylor', 'USC', 'Villanova', "Saint Mary's", 'Alabama', 'Illinois')
)</code></pre><h3>Join on T-Rank ratings</h3><p>Now that we have the preseason ranked teams, we can use our `ratings` table to &#8220;join&#8221; or &#8220;merge&#8221; with the `ap_top_25` data. Essentially, we want the `adj_oe` and `adj_de` of each ranked team. There are a few ways to approach this, and we are going to use the `left_join` function.</p><p><strong>left_join()</strong> adds the columns from &#8220;y,&#8221; the second data frame, to &#8220;x,&#8221; the first data frame listed, based on matching keys. Importantly, <strong>left_join()</strong> ensures that all observations in &#8220;x&#8221; are kept, regardless of whether a match was found in &#8220;y.&#8221; For our use, it helps identify whether we misspelt a team in the manual tibble above.</p><p>At first, joins can be difficult to understand. Below is how we are structuring our code. We're combining the `ap_top_25` dataset with the `ratings` data, matching by team names. This means each team in the `ap_top_25` frame will now be supplemented with <em>their</em> `adj_oe` and `adj_de` values from the `ratings` table. 
</p><pre><code>ap_top_25 &lt;- left_join(ap_top_25, ratings, by = 'team')</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hyb0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hyb0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png 424w, https://substackcdn.com/image/fetch/$s_!Hyb0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png 848w, https://substackcdn.com/image/fetch/$s_!Hyb0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png 1272w, https://substackcdn.com/image/fetch/$s_!Hyb0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hyb0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png" width="564" height="310" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:310,&quot;width&quot;:564,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:52307,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hyb0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png 424w, https://substackcdn.com/image/fetch/$s_!Hyb0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png 848w, https://substackcdn.com/image/fetch/$s_!Hyb0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png 1272w, https://substackcdn.com/image/fetch/$s_!Hyb0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa515c4c1-78f9-4650-8d42-24b2ebaa4a43_564x310.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">head(ap_top_25)</figcaption></figure></div><h2>What have we done in Part 1?</h2><p>Part 1 is all about <strong>data extraction</strong> with a splash of cleaning. Let&#8217;s break down what we&#8217;ve done.</p><ol><li><p>We used <strong>read_csv()</strong> to access a .CSV file of the 2023-24 season schedule on the Barttorvik website. We named the columns ourselves and selected the ones we need.</p></li><li><p>We used <strong>read_html()</strong> from the `rvest` package to scrape the ratings table on the Barttorvik website. We introduced the <strong>clean_names()</strong> function from `janitor` and touched on how to use <strong>across()</strong> to apply the same function to a range of columns.</p></li><li><p>We manually encoded data for the preseason AP Top 25.</p></li><li><p>We used <strong>left_join()</strong> to <em>join</em> the ratings data with the AP Top 25 data.</p></li></ol><h2>Part 2: Data Analysis</h2><p>That&#8217;s all the data we need! 
Now for the next part: Analysis! Remember that we are trying to calculate the difference in expected wins between <em>each</em> AP Top 25 team and the <em>average </em>AP Top 25 team. </p><h3>Function Building</h3><p>For our analysis, we are going to write a <em>general function</em> that adds all required information and then simply loop over it for each team. Writing functions is a good way to clean up your code and make your work reproducible. </p><p><strong>As a note:</strong> When &#8220;team&#8221; is referenced in the explanations below, keep in mind that we are building a <em>function</em>. In other words, `team` is an argument that will be filled by whichever school we insert into the function. It could be Duke or Kansas or Purdue, etc.</p><h4>Game location + opponent</h4><pre><code>parse_wp &lt;- function(team) {
  
  schedule %&gt;%
    filter(home == team | away == team) %&gt;%

    mutate(
      # add game location
      game_location = case_when(
      home == team &amp; neutral == 0 ~ 'home',
      away == team &amp; neutral == 0 ~ 'away',
      .default = 'neutral'
    ),

    team = team,

    opponent = if_else(team == home, away, home),

    avg_ap_adj_oe = mean(ap_top_25$adj_oe),
    avg_ap_adj_de = mean(ap_top_25$adj_de))

   <strong># FUNCTION LOGIC WILL CONTINUE BELOW...</strong>
  
}</code></pre><p>Importantly, we want to adjust for venue, but our schedule data frame doesn&#8217;t have any explicit &#8220;game location&#8221; column, so we must write that logic ourselves. For this, we can use a `case_when` statement inside `mutate` &#8212; which you can think of as chaining together multiple if/else statements. The logic is simple: If the game is <em>not </em>at a neutral site (`neutral == 0`) and the team is listed under the `home` column, it is a home game. This is reversed for road trips. If neither condition is matched (i.e. `neutral == 1`), then <strong>case_when()</strong> will return the `.default` value, which we have coded as &#8216;neutral&#8217;. </p><p>We will also create a column to indicate the opponent for each game using <strong>if_else()</strong>: if `team` is the home side, the opponent is the away side, and vice versa. Lastly, let&#8217;s go ahead and add on the average AP Top 25 team&#8217;s offensive and defensive efficiency. We will use this in a later section.</p><h4>Team and opponent ratings</h4><p>Now let&#8217;s add the ratings that we previously scraped for both the team and their opponents. There are multiple ways to &#8220;merge&#8221; data in R, but to keep things consistent, we are again going to fall back on <strong>left_join()</strong>. Adding on team ratings is fairly straightforward. </p><p>For opponent ratings, however, we need to rename columns to specify that they indicate <em>opponent</em> ratings. We can quickly do this using <strong>select()</strong> with the form `new_column_name = old_column_name` (e.g. `<em>opp</em>_adj_oe = adj_oe`). If you&#8217;ve used SQL, you might notice that this runs parallel to <em>SELECT old_name AS new_name</em>.</p><h5>Equality Joins</h5><p>Before we can run this join, however, we need to reference the &#8220;keys&#8221; (or columns) to match on. If you look at the column names of `ratings` with `names(ratings)`, you see that we have `team`, `adj_oe`, and `adj_de`. 
If we do not specify the key to merge on, `left_join` will pick the only matching one &#8212; `team`. But these are <em>opponent</em> ratings, not the `team` rating, so we need to merge with the `opponent` column that we created in the last step.</p><p>A join that matches rows where the key values are equal is called an &#8220;equality&#8221; join. Here the keys simply have different names in each table, and `dplyr` makes this very simple with the `join_by` function: just supply the two column names and separate them with `==`. <a href="https://r4ds.hadley.nz/joins.html#sec-mutating-joins">If you want to learn more about joins, you can read more here!</a></p><p><strong>You might notice that we could have just renamed the `team` column to `opponent` in the `select` step; in practice, this is an easier implementation, but I wanted to demonstrate a use of `join_by`.</strong></p><pre><code>parse_wp &lt;- function(team) {
  
   <strong># ... PREVIOUS CODE ...
</strong>
    left_join(ratings %&gt;% select(team, opp_adj_oe = adj_oe, opp_adj_de = adj_de), <strong>join_by('opponent' == 'team')</strong>) %&gt;%
    
    left_join(ap_top_25 %&gt;% select(-rank), by = 'team')

   <strong># ... FUNCTION LOGIC WILL CONTINUE BELOW...</strong>
  
}</code></pre><h4>Venue-adjust the ratings</h4><p>Now that we have ratings and game location, let&#8217;s adjust efficiencies for each game. Barttorvik uses a <strong>1.3% adjustment constant</strong>. For home games, offensive efficiency is multiplied by 1.013 and defensive efficiency is multiplied by 0.987. For away games, offensive efficiency is multiplied by 0.987 and defensive efficiency is multiplied by 1.013. For neutral site games, no adjustment is made. (Remember that defensive efficiency is points <em>allowed</em> per 100 possessions, so a lower value is better.)</p><h5>Helper Function</h5><p>To make our code cleaner, let&#8217;s write a &#8220;helper function&#8221; that we can apply to our data. There are ways to make this function shorter, but in an effort to stay consistent with the code above, we will stick with <strong>case_when()</strong>.</p><pre><code>adjust_efficiency &lt;- function(df) {
  adjusted &lt;- df %&gt;%
    mutate(
      <strong># off. ratings for team and AP average (NOT opponent)</strong>
      across(ends_with("oe") &amp; !starts_with("opp"), 
             ~ case_when(
               game_location == "home" ~ . * 1.013,
               game_location == "away" ~ . * 0.987,
               .default = . # no change for neutral
             )),
      <strong># def. ratings for team and AP average (NOT opponent)</strong>
      across(ends_with("de") &amp; !starts_with("opp"), 
             ~ case_when(
               game_location == "home" ~ . * 0.987,
               game_location == "away" ~ . * 1.013,
               .default = .
             )),
      <strong># off. ratings for opponent (game location is switched!)</strong>
      across(opp_adj_oe, 
             ~ case_when(
               game_location == "home" ~ . * 0.987, 
               game_location == "away" ~ . * 1.013,
               .default = .
             )),
      <strong># def. ratings for opponent (game location is switched!)</strong>
      across(opp_adj_de, 
             ~ case_when(
               game_location == "home" ~ . * 1.013,
               game_location == "away" ~ . * 0.987,
               .default = .
             ))
    )
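
  # Worked example with made-up ratings (illustration only, not real data),
  # for a HOME game from the team's point of view:
  #   team adj_oe 110 becomes 110 * 1.013 = 111.43
  #   team adj_de  95 becomes  95 * 0.987 = 93.765
  #   opp_adj_oe  105 becomes 105 * 0.987 = 103.635 (the opponent is on the road)
  #   opp_adj_de  100 becomes 100 * 1.013 = 101.3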
  
  return(adjusted)
}</code></pre><p>Let&#8217;s briefly explore the logic: Again, we are using <strong>across()</strong>, but we are introducing the <strong>ends_with()</strong> and <strong>starts_with()</strong> functions. They <em>select</em> columns based on some suffix or prefix. Because our columns follow a naming standard, we can use those functions here to avoid typing out all columns. For a quick refresh on our column names at this point in the (<em>larger</em>) function: </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7zLp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7zLp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png 424w, https://substackcdn.com/image/fetch/$s_!7zLp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png 848w, https://substackcdn.com/image/fetch/$s_!7zLp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png 1272w, https://substackcdn.com/image/fetch/$s_!7zLp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7zLp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png" width="1092" height="126" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:126,&quot;width&quot;:1092,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38485,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7zLp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png 424w, https://substackcdn.com/image/fetch/$s_!7zLp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png 848w, https://substackcdn.com/image/fetch/$s_!7zLp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png 1272w, https://substackcdn.com/image/fetch/$s_!7zLp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35b5d2ab-8be7-41b0-9899-452985571648_1092x126.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>We are iterating over every column that <em>ends with</em> &#8220;oe,&#8221; which are our offensive 
efficiency columns, and those which <em>end with</em> &#8220;de,&#8221; our defensive efficiency columns. For example, `ends_with(&#8216;oe&#8217;)` on its own is the same thing as typing `across(c(&#8216;avg_ap_adj_<strong>oe</strong>&#8217;, &#8216;adj<strong>_oe</strong>&#8217;, &#8216;opp_adj_<strong>oe</strong>&#8217;))` but with less typing!</p><p>Importantly, remember that our `game_location` column is <strong>in reference to the team</strong>, so a home game is an <em>away </em>one for the opponent and should be treated as such. Because of this, we need to <em>not</em> select the opponent columns in the first two steps, so we use `!starts_with(&#8216;opp&#8217;)`, which removes all matching columns that <em>start with</em> &#8216;opp&#8217;.</p><p>To adjust opponent columns, we simply iterate over the opponent ratings columns using the same logic but swap the two multipliers!</p><p>To apply the helper function, we just add it to our piping chain.</p><pre><code>parse_wp &lt;- function(team) {
  
   <strong># ... PREVIOUS CODE ...</strong>

    adjust_efficiency()

   <strong># ... FUNCTION LOGIC WILL CONTINUE BELOW...</strong>
  
}</code></pre><h4>Calculate per-game winning percentage</h4><h5>Pythag Metric</h5><p>Now that we have our location-adjusted efficiencies, we are one step closer towards calculating win percentages. Barttorvik uses the Pythagorean expectation to estimate the win probability for a given game &#8212; <a href="https://en.wikipedia.org/wiki/Pythagorean_expectation">a formula first created by Bill James for baseball.</a> For our use, we need to calculate each team&#8217;s `pythag` with their location-adjusted efficiencies. The formula is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{\\text{Adj. OE}^{11.5}}{\\text{Adj. OE}^{11.5} + \\text{Adj. DE}^{11.5}}\n&quot;,&quot;id&quot;:&quot;QYZEMYYZIE&quot;}" data-component-name="LatexBlockToDOM"></div><p>We can just throw this into a <strong>mutate()</strong> step!</p><pre><code>parse_wp &lt;- function(team) {
  
   <strong># ... PREVIOUS CODE ...</strong>
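
    # Worked example with round, made-up ratings (illustration only, not real data):
    #   adj_oe = 115, adj_de = 95
    #   115^11.5 / (115^11.5 + 95^11.5) = 1 / (1 + (95 / 115)^11.5), roughly 0.90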

    mutate(team_pythag = (adj_oe^11.5) / (adj_oe^11.5 + adj_de^11.5),
      avg_ap_pythag = (avg_ap_adj_oe^11.5) / (avg_ap_adj_oe^11.5 + avg_ap_adj_de^11.5),
      opp_pythag = (opp_adj_oe^11.5) / (opp_adj_oe^11.5 + opp_adj_de^11.5))

   <strong># ... FUNCTION LOGIC WILL CONTINUE BELOW...</strong>
  
}</code></pre><h5>Win probability</h5><p>Now that we have &#8216;pythag&#8217; values, we can calculate win probability for every game. The formula that Barttorvik uses is below and directly uses the values that we just appended.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{team_{\\text{py}} - team_{\\text{py}} \\cdot opp_{\\text{py}}}{team_{\\text{py}} + opp_{\\text{py}} - 2 \\cdot team_{\\text{py}} \\cdot opp_{\\text{py}}}\n&quot;,&quot;id&quot;:&quot;YRZPTYGJNO&quot;}" data-component-name="LatexBlockToDOM"></div><p>Again, we can just throw this in <strong>mutate()</strong>. Importantly, we will only calculate win probabilities for each team and the average AP team; <em>we do not care about opponent win probabilities</em>. At this point, we have 17 columns in our data frame, but we only need three, so we will select the pertinent ones.</p><pre><code>parse_wp &lt;- function(team) {
  
   <strong># ... PREVIOUS CODE ...</strong>
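
    # Worked example with made-up pythag values (illustration only, not real data):
    #   team_pythag = 0.90, opp_pythag = 0.60
    #   (0.90 - 0.90 * 0.60) / (0.90 + 0.60 - 2 * 0.90 * 0.60) = 0.36 / 0.42, roughly 0.857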

    mutate(
      team_wp = (team_pythag - team_pythag * opp_pythag) / (team_pythag + opp_pythag - 2 * team_pythag * opp_pythag),
      ap_wp = (avg_ap_pythag - avg_ap_pythag * opp_pythag) / (avg_ap_pythag + opp_pythag - 2 * avg_ap_pythag * opp_pythag)
    ) %&gt;%
    select(team, team_wp, ap_wp)

   <strong># ... FUNCTION LOGIC WILL CONTINUE BELOW...</strong>
  
}</code></pre><h4>Our full function</h4><p>Great &#8212; now our function is finished. Let&#8217;s put it all together:</p><pre><code>parse_wp &lt;- function(team) {
  
  win_percentages &lt;- schedule %&gt;%
    filter(home == team | away == team) %&gt;%
    mutate(
      game_location = case_when(
      home == team &amp; neutral == 0 ~ 'home',
      away == team &amp; neutral == 0 ~ 'away',
      .default = 'neutral'
    ),
    team = team,
    opponent = if_else(team == home, away, home),
    avg_ap_adj_oe = mean(ap_top_25$adj_oe),
    avg_ap_adj_de = mean(ap_top_25$adj_de)) %&gt;%
    left_join(ratings %&gt;% select(team, opp_adj_oe = adj_oe, opp_adj_de = adj_de), join_by('opponent' == 'team')) %&gt;%
    left_join(ap_top_25 %&gt;% select(-rank), by = 'team') %&gt;%
    adjust_efficiency() %&gt;%
    mutate(
      team_pythag = (adj_oe^11.5) / (adj_oe^11.5 + adj_de^11.5),
      avg_ap_pythag = (avg_ap_adj_oe^11.5) / (avg_ap_adj_oe^11.5 + avg_ap_adj_de^11.5),
      opp_pythag = (opp_adj_oe^11.5) / (opp_adj_oe^11.5 + opp_adj_de^11.5),
      team_wp = (team_pythag - team_pythag * opp_pythag) / (team_pythag + opp_pythag - 2 * team_pythag * opp_pythag),
      ap_wp = (avg_ap_pythag - avg_ap_pythag * opp_pythag) / (avg_ap_pythag + opp_pythag - 2 * avg_ap_pythag * opp_pythag)
    ) %&gt;%
    select(team, team_wp, ap_wp)
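
  # Optional sanity check, not part of the original post: both columns are
  # probabilities, so every non-missing value should fall between 0 and 1.
  # Missing values (non-D1 opponents) are ignored here and filtered out later.
  stopifnot(
    all(dplyr::between(win_percentages$team_wp, 0, 1), na.rm = TRUE),
    all(dplyr::between(win_percentages$ap_wp, 0, 1), na.rm = TRUE)
  )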
  
  return(win_percentages)
  
}</code></pre><p>If you&#8217;ve followed along to this point and have questions about what specific things do in the function, <a href="https://twitter.com/andreweatherman">please reach out to me on Twitter</a>! This is the first post in the series, so I am still figuring out what might be &#8220;too much&#8221; &#8212; or not enough! &#8212; explanation.</p><h3>Loop over the function</h3><p>Now that we have our function, we can &#8220;loop over it&#8221; to get data for each ranked team. For full transparency, I hate for-loops. I think they are unintuitive, clunky, and need careful writing to run efficiently. (They aren&#8217;t necessarily bad to use, though!) Lucky for us, we can lean on the `map` family of functions in `purrr`. These mapping functions essentially apply a function to each element in a vector or list. If you want to learn more about `purrr` &#8212; and I <em>highly</em> recommend that you do &#8212; <a href="https://www.youtube.com/watch?v=EGAs7zuRutY">check out this great introduction by package maintainer, and R GOAT, Hadley Wickham</a>.</p><p>For our use, we are going to use <strong>map_dfr()</strong>. This function will iterate over the input vector (in this case, the teams in our `ap_top_25` data frame) and <em>bind the rows</em> of the result. In other words, it will output a single tibble with all data! (As of `purrr` 1.0.0, `map_dfr()` is superseded in favor of `map()` followed by `list_rbind()`; it still works, but you may prefer the newer form in fresh code.)</p><pre><code>game_preds &lt;- <strong>map_dfr</strong>(ap_top_25$team, \(team) parse_wp(team))</code></pre><p>Let&#8217;s walk through our code. <strong>map_dfr()</strong> takes a single input (map<strong>2</strong>_dfr takes two, and <strong>p</strong>map_dfr takes any number, passed as a list). Our first argument is our <em>input</em>, which we will feed into our second argument (our function). 
Remember that we want to iterate over all preseason AP top 25 teams, and those are listed in the `team` column of the `ap_top_25` data frame, so we will pull those out using `ap_top_25$team`.</p><p>Let&#8217;s explore that &#8220;weird&#8221; syntax in the second argument. `\(team)` is R&#8217;s shorthand for an anonymous function (available since R 4.1) and is equivalent to writing `function(team)`. It is essentially saying, &#8220;<strong>We want to reference each element of our input vector as `team`.</strong>&#8221; We could have put basically any other name here: &#8220;school,&#8221; &#8220;program,&#8221; &#8220;university,&#8221; etc. All this does is name the current element so that we can refer to it in the function body. After that, we simply call our `parse_wp` function, pass it the current element (named `team`), and run it once per team! </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FRIe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FRIe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png 424w, https://substackcdn.com/image/fetch/$s_!FRIe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png 848w, https://substackcdn.com/image/fetch/$s_!FRIe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png 1272w, 
https://substackcdn.com/image/fetch/$s_!FRIe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FRIe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png" width="384" height="308" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:308,&quot;width&quot;:384,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50513,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FRIe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png 424w, https://substackcdn.com/image/fetch/$s_!FRIe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png 848w, https://substackcdn.com/image/fetch/$s_!FRIe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png 1272w, 
https://substackcdn.com/image/fetch/$s_!FRIe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07982320-80de-42a6-8e4b-3bfe0d898c98_384x308.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">head(game_preds)</figcaption></figure></div><h3>Summarize the results</h3><p>Now that we have our game predictions, we can quickly add up the expected wins using <strong>summarize()</strong>. Expected wins can simply be calculated as the summation of win percentage. 
First, though, we need to remove all games with no win percentage (these are games versus non-D1 teams).</p><pre><code>game_preds_summarized &lt;- game_preds %&gt;%
  filter(!is.na(team_wp)) %&gt;%
  summarize(
    team_wins = sum(team_wp),
    ap_wins = sum(ap_wp),
    diff = team_wins - ap_wins,
    <strong>.by = team</strong>
  )</code></pre><p>The `.by` argument (available in `dplyr` 1.1.0 and later) is another way to &#8220;group&#8221; our summarization without writing a <strong>group_by()</strong> step. We want to group by each team to return the expected team wins and AP wins <em>by team schedule</em>. This will apply the set of calculations to each team instead of every row at once.</p><h5>Add team logos and rank</h5><p>To make our table easier to read, let&#8217;s add team logos to it. You can grab a .CSV of team logos from my GitHub. Again, we are using an <em>equality join</em> between `team` and `common_team`. We are also adding a column to indicate AP preseason rank. We know this will simply be the row number of each team: we looped over `ap_top_25$team` in rank order, and `.by` keeps groups in the order they first appear. `.before` is telling `mutate` <em>where</em> we want the rank column to appear.</p><p>Lastly, we will create a column that includes the HTML we need to correctly format the team logos and names column. The idea here is to place the team names to the right of their respective logo in a <em>single</em> column, with the text aligned in the middle relative to the logo. If you aren&#8217;t familiar with HTML, that&#8217;s completely okay. This step isn&#8217;t necessary for understanding how to make the visualization.</p><pre><code>teams &lt;- read_csv('https://gist.github.com/andreweatherman/cd2a258b7a75dc75cd86940e29f28af9/raw/12a2cd5f919a681ab4711c4b5082385bacd78c2e/teams.csv') %&gt;%
    select(common_team, logo)
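
# Optional check, not part of the original post: list any ranked teams that
# would fail to find a logo in the join below. An empty result means every
# team name has a match in `common_team`.
# Assumes `game_preds_summarized` from the previous step is still in memory.
anti_join(game_preds_summarized, teams, join_by(team == common_team))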

game_preds_summarized &lt;- game_preds_summarized %&gt;%
    left_join(teams, join_by('team' == 'common_team')) %&gt;%
    mutate(ap_rank = row_number(), .before = team) %&gt;%
    mutate(team_logo = glue::glue("&lt;img src='{logo}' style='height: 25px; width: auto; vertical-align: middle;'&gt; {team}"))</code></pre><h2>What have we done in Part 2?</h2><p>Part 2 is all about <strong>analyzing</strong> our data. Let&#8217;s break down what we&#8217;ve done.</p><ol><li><p>Used <strong>case_when() </strong>to identify game location relative to `team`.</p></li><li><p>Introduced <em>equality joins </em>to merge on rating data.</p><ol><li><p><strong>join_by(&#8220;first_column&#8221; == &#8220;second_column&#8221;)</strong></p></li></ol></li><li><p>Defined a helper function using <strong>case_when() </strong>and <em>selection helpers </em>&#8212; <strong>starts_with() </strong>and <strong>ends_with()</strong> &#8212; to compute venue-adjusted ratings.</p></li><li><p>Applied pre-defined Pythag and win probability formulas.</p></li><li><p>Used <strong>map_dfr()</strong> to <em>loop over</em> our <strong>parse_wp()</strong> function and return results for all teams in a single tibble.</p></li><li><p>Summarized the resulting table <em>per-team </em>with the `.by` argument in <strong>summarize()</strong>.</p></li><li><p>Brought in team logos for plotting using an <em>equality join</em>.</p></li></ol><h2>Part 3: Data Visualizing</h2><p>Now for the fun part: Visualizing our data! For this visualization, we will be using the `gt` and `gtExtras` packages to create a table. For this section, I will break up the code in chunks, explaining each step, and include the final table code at the bottom.</p><h5>Step 1: Basic table construction</h5><pre><code>game_preds_summarized %&gt;%
   gt(id='ap') %&gt;%
   gt_theme_excel() %&gt;%
   fmt_markdown(team_logo) %&gt;%
   cols_hide(c(team, logo)) %&gt;%
   cols_move(team_logo, after = ap_rank) %&gt;%
   cols_align(columns = c(team_wins, ap_wins, diff, ap_rank), align = 'center') %&gt;%
   cols_align(columns = team_logo, align = 'left') %&gt;%
   cols_label(
    ap_rank = 'Rank',
    team_logo = 'Team',
    team_wins = 'TM Wins',
    ap_wins = 'AP Wins',
    diff = 'Diff.')</code></pre><p>A `gt` table can be initialized by passing data to <strong>gt()</strong>. Because we will be adding custom CSS at the end, we also need to pass an <em>arbitrary</em> table id to <strong>gt()</strong>. We will add on the appearance of our table later, but we can start with a base theme now, which will be <strong>gt_theme_excel()</strong> from the `gtExtras` package.</p><p><strong>fmt_markdown()</strong> renders our HTML column from earlier. <strong>cols_hide()</strong>, <strong>cols_move()</strong>, and <strong>cols_align()</strong> simply hide, move, and align our columns. <strong>cols_label()</strong> renames our column headers in the form of `old_name` = &#8220;New Name&#8221;.</p><h5>Step 2: Format numbers</h5><pre><code>game_preds_summarized %&gt;%
  <strong># ... PREVIOUS CODE ...

  </strong>fmt_number(columns = c(team_wins, ap_wins, diff), decimals = 2) %&gt;%
  gt_hulk_col_numeric(diff)</code></pre><p><strong>fmt_number()</strong> will do a few things, but we are employing it to round our columns to two decimal places for display. <strong>gt_hulk_col_numeric()</strong> uses a diverging purple-green color palette &#8212; one that is colorblind safe! &#8212; to fill the `diff` cell background based on its value relative to the range in the column.</p><p>Right now, our table looks like this: Pretty solid but we can still tweak a few things!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!esHr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!esHr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png 424w, https://substackcdn.com/image/fetch/$s_!esHr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png 848w, https://substackcdn.com/image/fetch/$s_!esHr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png 1272w, https://substackcdn.com/image/fetch/$s_!esHr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!esHr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png" width="854" height="386" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:386,&quot;width&quot;:854,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:206647,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!esHr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png 424w, https://substackcdn.com/image/fetch/$s_!esHr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png 848w, https://substackcdn.com/image/fetch/$s_!esHr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png 1272w, https://substackcdn.com/image/fetch/$s_!esHr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2485f6dc-6e7a-4d3f-84ec-9efe9f35916d_854x386.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h5>Step 3: Title, subtitle, and caption</h5><pre><code>game_preds_summarized %&gt;%
   <strong># ... PREVIOUS CODE ...

</strong>   tab_header(title = md('How Teams Stack Up to Avg. AP Top 25 Expectation'), subtitle = md('Venue-adjusted team wins vs. expected wins by the average top 25 team')) %&gt;%
  tab_source_note(source_note = md('Data by Barttorvik&lt;br&gt;Analysis + Viz. by @andreweatherman'))</code></pre><p><strong>tab_header()</strong> and <strong>tab_source_note()</strong> add a title, subtitle, and caption (source note). Wrapping our text in <strong>md()</strong> lets us use Markdown-formatted text, which is useful for styling text or inserting deliberate line breaks (like the &lt;br&gt; in our source note).</p><h5>Step 4: Table styles and options</h5><pre><code>game_preds_summarized %&gt;%
   <strong># ... PREVIOUS CODE ...</strong>
  
  <strong># bold diff col. text</strong>
  tab_style(
    style = cell_text(weight = 'bold'),
    locations = cells_body(columns = diff)
  ) %&gt;%
  <strong># control padding on header</strong>
  tab_options(
    heading.padding = 3,
    table.font.size = 15
  ) %&gt;%
  <strong># title font</strong>
  tab_style(
    locations = cells_title('title'),
    style = cell_text(
      font = google_font('Fira Sans'),
      weight = 600,
      size = px(19)
    )
  ) %&gt;%
  <strong># subtitle font</strong>
  tab_style(
    locations = cells_title('subtitle'),
    style = cell_text(
      font = google_font('Fira Sans'),
      weight = 400,
      size = px(14)
    )
  ) %&gt;%
  <strong># source note font</strong>
  tab_style(
    locations = cells_source_notes(),
    style = cell_text(
      font = google_font('Fira Sans'),
      weight = 400,
      size = px(12)
    )
  ) %&gt;%
  <strong># column labels</strong>
  tab_style(
    locations = cells_column_labels(columns = everything()),
    style = cell_text(
      font = google_font('Fira Sans'),
      weight = 600,
      size = px(14),
      transform = 'uppercase',
      align = 'center'
    )
  ) %&gt;%
  <strong># table font</strong>
  opt_table_font(
    font = google_font('Fira Sans'),
    weight = 450
  )</code></pre><p>This might seem like a lot of code, but most of it just repeats the same styling pattern for different parts of the table!</p><p>The first <strong>tab_style()</strong> call <em>bolds</em> the text in the <em>diff</em> column. We can target a particular column&#8217;s values by referencing it in <strong>cells_body()</strong>. Next, <strong>tab_options()</strong> adjusts the heading padding and the table font size.</p><p>The next three <strong>tab_style()</strong> calls specify the font family, weight, and size of the text in our title, subtitle, and source note. The <strong>google_font()</strong> function allows us to easily use a font from Google Fonts without having to download it. Feel free to play around with different fonts; I am using <em>Fira Sans</em> for this visualization.</p><p>We next modify the column labels&#8217; font, weight, size, and alignment and force them to be uppercase. We indicate that we want to change <em>all</em> column labels by using <strong>everything()</strong> inside <strong>cells_column_labels()</strong>. Finally, we set the table font to <em>Fira Sans</em> with <strong>opt_table_font()</strong>.</p><h5>Step 5: Custom CSS</h5><pre><code>game_preds_summarized %&gt;%
   <strong># ... PREVIOUS CODE ...

</strong>   opt_css(
     css = "
     #ap .gt_heading {
      padding-bottom: 0px;
      padding-top: 6px;
     }
     #ap .gt_subtitle {
      padding-top: 2px;
      padding-bottom: 6px;
     }
    "
  ) </code></pre><p>To put the finishing touches on our table, we add a few lines of CSS. All this does is adjust the padding around the heading and subtitle, which was bugging me. Importantly, this only works because we set a table id at the start of our code (<strong>gt(id='ap')</strong>), which we now reference here (<em>#ap &#8230;</em>).</p><h3>Full table code and saving the table</h3><p>All together, here is our final table code! We can place <strong>gtsave_extra()</strong> at the end of our pipeline to save the table as an image.</p><pre><code>game_preds_summarized %&gt;%
   gt(id='ap') %&gt;%
   gt_theme_excel() %&gt;%
   fmt_markdown(team_logo) %&gt;%
   cols_hide(c(team, logo)) %&gt;%
   cols_move(team_logo, after = ap_rank) %&gt;%
   cols_align(columns = c(team_wins, ap_wins, diff, ap_rank), align = 'center') %&gt;%
   cols_align(columns = team_logo, align = 'left') %&gt;%
   cols_label(
     ap_rank = 'Rank',
     team_logo = 'Team',
     team_wins = 'TM Wins',
     ap_wins = 'AP Wins',
     diff = 'Diff.') %&gt;%
   fmt_number(columns = c(team_wins, ap_wins, diff), decimals = 2) %&gt;%
   gt_hulk_col_numeric(diff) %&gt;%
   tab_header(title = md('How Teams Stack Up to Avg. AP Top 25 Expectation'), subtitle = md('Venue-adjusted team wins vs. expected wins by the average top 25 team')) %&gt;%
   tab_source_note(source_note = md('Data by Barttorvik&lt;br&gt;Analysis + Viz. by @andreweatherman')) %&gt;%
   tab_style(style = cell_text(weight = 'bold'), locations = cells_body(columns = diff)) %&gt;%
   tab_options(heading.padding = 3, table.font.size = 15) %&gt;%
   tab_style(
     locations = cells_title('title'),
     style = cell_text(
       font = google_font('Fira Sans'),
       weight = 600,
       size = px(19)
     )
   ) %&gt;%
   tab_style(
     locations = cells_title('subtitle'),
     style = cell_text(
       font = google_font('Fira Sans'),
       weight = 400,
       size = px(14)
     )
   ) %&gt;%
   tab_style(
     locations = cells_source_notes(),
     style = cell_text(
       font = google_font('Fira Sans'),
       weight = 400,
       size = px(12)
     )
   ) %&gt;%
   tab_style(
     locations = cells_column_labels(columns = everything()),
     style = cell_text(
       font = google_font('Fira Sans'),
       weight = 600,
       size = px(14),
       transform = 'uppercase',
       align = 'center'
     )
   ) %&gt;%
   opt_table_font(
     font = google_font('Fira Sans'),
     weight = 450
   ) %&gt;%
   opt_css(
     css = "
      #ap .gt_heading {
       padding-bottom: 0px;
       padding-top: 6px;
      }
      #ap .gt_subtitle {
       padding-top: 2px;
       padding-bottom: 6px;
      }
     "
   ) %&gt;%
   gtsave_extra('ap_25_expected.png')</code></pre><h2>What have we done in Part 3?</h2><p>Part 3 is all about <strong>visualizing</strong> our data. Let&#8217;s break down what we&#8217;ve done.</p><ol><li><p>Initialized our table by piping the summarized data into <strong>gt()</strong>.</p></li><li><p>Applied a basic table theme using <strong>gt_theme_excel()</strong>.</p></li><li><p>Rendered our HTML content using <strong>fmt_markdown()</strong>.</p></li><li><p>Adjusted our columns using <strong>cols_hide()</strong>, <strong>cols_move()</strong>, and <strong>cols_align()</strong> and renamed column headers with <strong>cols_label()</strong>.</p></li><li><p>Rounded numeric columns to two decimal places using <strong>fmt_number()</strong> and applied a colorblind-safe conditional fill with <strong>gt_hulk_col_numeric()</strong>.</p></li><li><p>Added a table title, subtitle, and caption with <strong>tab_header()</strong> and <strong>tab_source_note()</strong>.</p></li><li><p>Modified the font, weight, and size of our column labels, table text, and titles with <strong>tab_style()</strong>, <strong>tab_options()</strong>, and <strong>opt_table_font()</strong>.</p></li><li><p>Applied custom CSS using <strong>opt_css()</strong> to adjust padding.</p></li><li><p>Saved our table using <strong>gtsave_extra()</strong>.</p></li></ol><h2>Full Source Code</h2><p><strong><a href="https://gist.github.com/andreweatherman/a79d12e3e29b4b60145f8773a6f9c5ea">The full source code is hosted here on GitHub.</a></strong></p><p>If you found value in today&#8217;s post, I kindly ask that you please subscribe to <em>Buckets &amp; Bytes</em> and consider sharing the blog with others. Subscribing is entirely free! It just ensures that you never miss a post. Sharing is never expected but always appreciated. 
It helps me understand what content is most valuable.</p>]]></content:encoded></item><item><title><![CDATA[Welcome to Buckets & Bytes]]></title><description><![CDATA[Hands-on R tutorials with college basketball data]]></description><link>https://www.bucketsandbytes.com/p/welcome-to-spreadsheets-and-basketball</link><guid isPermaLink="false">https://www.bucketsandbytes.com/p/welcome-to-spreadsheets-and-basketball</guid><dc:creator><![CDATA[Andrew Weatherman]]></dc:creator><pubDate>Thu, 19 Oct 2023 21:27:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbabbd553-69f4-4934-80d7-e3865f4f07da_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to &#8220;<strong>Buckets &amp; Bytes</strong>&#8221;! If you&#8217;re into college basketball and curious about how data visualization can elevate your understanding of the game, you&#8217;re right where you need to be. My name is Andrew Weatherman, and I am an R enthusiast and college basketball fanatic. 
</p><p>My journey with R started in the thick of COVID, with self-learning at the core, aided immensely by hands-on tutorials that made sense of complex data. It was a blend of passion, persistence, and a touch of trial and error.</p><p>Here, we&#8217;re taking a leaf out of Owen Phillips&#8217; book, inspired by his incredible blog, <em>The F5</em>. We'll be exploring the world of college basketball through the lens of R programming, backed by a wealth of open-source data.</p><p>Each post here is two-fold: offering you insightful visualizations and giving you access to the source code. It&#8217;s about demystifying the process, unraveling the &#8216;why&#8217; and &#8216;how&#8217; behind each piece of code, aiming to arm you with the skills to forge your own path in analytics.</p><h3>Who is this for?</h3><p>The explanations in <em>Buckets &amp; Bytes</em> will assume some foundational understanding of R. You need not be an expert by any means, but it would be helpful to have a grasp of core principles. <em>Buckets &amp; Bytes </em>will walk through every step &#8212; start to finish &#8212; but for brevity, some basic knowledge will be assumed. If you are at least a few weeks into your R journey, <em>Buckets &amp; Bytes</em> will be the perfect learning companion. If you are more advanced, you might still find the visualization code and accompanying walk-through helpful! </p><h3>But I don&#8217;t like basketball</h3><p>That&#8217;s okay! The data we will be using is <em>mostly</em> basketball, yes, but the concepts and skills are applicable across all fields. Feel free to bring your own data and make spin-offs of the visualizations!</p><h3>I have questions about the code</h3><p>Please don&#8217;t be scared to reach out! <a href="https://twitter.com/andreweatherman">You can find me on Twitter</a> (I will never call it X), and my DMs are always open. 
I would be more than happy to answer any questions and provide deeper explanations where needed.</p><h3>When do you post?</h3><p>My hope is to post once per week, but I&#8217;m also a full-time student with outside obligations who is applying to graduate schools, so in reality, it will be whenever I have the time.</p><h3>How can I support you?</h3><p>Subscribe and share the blog! If you have feedback, you can leave a comment on the article itself, <a href="https://twitter.com/andreweatherman">tweet me</a>, or shoot me a direct message on Twitter.</p><p><em>Buckets &amp; Bytes</em> is completely free! I&#8217;m doing this because I genuinely learned so much from Owen Phillips&#8217; <em>The F5</em> and want to offer an alternative now that <em>The F5</em> is over. </p><p><a href="https://ko-fi.com/andrewweatherman">If you feel so inclined, however, you can leave a small &#8220;tip&#8221; on my Ko-fi page.</a> Anything is very much appreciated.</p><p><strong>So, let&#8217;s get to it. Your deep dive into the intricate dance between college basketball and data analytics starts now.</strong></p>]]></content:encoded></item></channel></rss>