<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="https://media.rss.com/style.xsl"?>
<rss xmlns:podcast="https://podcastindex.org/namespace/1.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:psc="http://podlove.org/simple-chapters" xmlns:atom="http://www.w3.org/2005/Atom" xml:lang="en" version="2.0">
  <channel>
    <title><![CDATA[When Clean Data Is Actually Dirty]]></title>
    <link>https://rss.com/podcasts/when-clean-data-is-actually-dirty</link>
    <atom:link href="https://media.rss.com/when-clean-data-is-actually-dirty/feed.xml" rel="self" type="application/rss+xml"/>
    <atom:link rel="hub" href="https://pubsubhubbub.appspot.com/"/>
    <description><![CDATA[<p>We often treat data cleaning as a neutral step.</p><p>Delete missing rows. Fill gaps with the mean. Move on.</p><p>But cleaning is not neutral. It is a modeling decision.</p><p>In this episode, we unpack the statistical consequences of deletion and simple imputation, and why what looks “clean” can fundamentally alter your estimand, distort variance, and bias inference.</p><p>We walk through:</p><ul><li>The formal role of the missingness indicator</li><li>The difference between MCAR, MAR, and MNAR</li><li>Why complete-case analysis is rarely as safe as it seems</li><li>How mean imputation collapses variance and attenuates regression slopes</li><li>When multiple imputation and inverse probability weighting are appropriate</li><li>Why sensitivity analysis becomes essential under MNAR</li></ul><p>If you cannot defend MCAR, deletion and mean imputation are high-risk defaults.</p><p>Cleaning is not preprocessing.</p><p>Cleaning is inference.</p><p>This episode is for data scientists, statisticians, epidemiologists, and analysts who want to bring rigor back to real-world data.</p>]]></description>
    <generator>RSS.com 2026.129.132539</generator>
    <lastBuildDate>Mon, 16 Feb 2026 09:25:09 GMT</lastBuildDate>
    <language>en</language>
    <copyright><![CDATA[StatHarbor Analytics]]></copyright>
    <itunes:image href="https://media.rss.com/when-clean-data-is-actually-dirty/20260216_090226_c2fb26f92db01e47587468631375193d.png"/>
    <podcast:guid>5671ba26-1064-59f2-89f1-b9b1c8f85e65</podcast:guid>
    <image>
      <url>https://media.rss.com/when-clean-data-is-actually-dirty/20260216_090226_c2fb26f92db01e47587468631375193d.png</url>
      <title>When Clean Data Is Actually Dirty</title>
      <link>https://rss.com/podcasts/when-clean-data-is-actually-dirty</link>
    </image>
    <podcast:locked>yes</podcast:locked>
    <podcast:license>StatHarbor Analytics</podcast:license>
    <itunes:author>StatHarbor Analytics</itunes:author>
    <itunes:owner>
      <itunes:name>StatHarbor Analytics</itunes:name>
    </itunes:owner>
    <itunes:explicit>false</itunes:explicit>
    <itunes:type>episodic</itunes:type>
    <itunes:category text="Education">
      <itunes:category text="Courses"/>
    </itunes:category>
    <itunes:category text="Science">
      <itunes:category text="Mathematics"/>
    </itunes:category>
    <podcast:medium>podcast</podcast:medium>
    <item>
      <title><![CDATA[When Clean Data Is Actually Dirty]]></title>
      <itunes:title><![CDATA[When Clean Data Is Actually Dirty]]></itunes:title>
      <description><![CDATA[<p>“Cleaning” data is often treated as a harmless preprocessing step.</p><p>Delete missing rows.</p><p>Fill gaps with the mean.</p><p>Move forward.</p><p>But cleaning is not neutral.</p><p>It is a modeling decision that can change:</p><ul><li>The estimand</li><li>The sampling mechanism</li><li>The bias–variance trade-off</li></ul><p>In this episode, we examine the statistical dangers of deletion and simple imputation, and why naïve cleaning can quietly corrupt inference.</p>]]></description>
      <link>https://rss.com/podcasts/when-clean-data-is-actually-dirty/2552302</link>
      <enclosure url="https://content.rss.com/episodes/373276/2552302/when-clean-data-is-actually-dirty/2026_02_16_09_25_00_06e80de7-b4c1-4739-89df-c362f6973268.mp3" length="5799752" type="audio/mpeg"/>
      <guid isPermaLink="false">e7cb88c0-4d91-4c2b-b9c2-793602c42651</guid>
      <itunes:duration>362</itunes:duration>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:season>1</itunes:season>
      <podcast:season>1</podcast:season>
      <itunes:episode>1</itunes:episode>
      <podcast:episode>1</podcast:episode>
      <itunes:explicit>false</itunes:explicit>
      <pubDate>Mon, 16 Feb 2026 09:25:09 GMT</pubDate>
      <itunes:image href="https://media.rss.com/when-clean-data-is-actually-dirty/ep_cover_20260216_090259_0bf4796ec4c10b70a51af3b72e818538.png"/>
      <podcast:location rel="creator" geo="geo:61.0666922,-107.991707" osm="R1428125" country="ca">Canada</podcast:location>
    </item>
  </channel>
</rss>