07 – Distribution Diagnostics
📘 RIDE User Manual – Panel 7: Statistical Data Exploration
📊 Purpose of the Panel
The Distribution Diagnostics Panel allows users to:
- Understand data shape via skewness and kurtosis
- Test for normality with Q-Q plots and statistical tests
- Detect outliers using the IQR method
This is crucial for identifying data anomalies, choosing suitable transformations, and selecting proper machine learning models.
Recommended Reading
- Blog: The Q-Q Plot: What It Means and How to Interpret It
- Blog: Understanding QQ Plots
- Blog: Kurtosis
- Blog: Measures of Kurtosis and Skewness
- Blog: Right Skewed vs. Left Skewed Distribution
- Blog: The Complete Guide to Skewness and Kurtosis
- Blog: What Is an Outlier?
- Blog: What are outliers in the data?
- Blog: What Are Outliers in Data Sciences?
🧭 User Workflow
1. Upload Dataset
   Choose from:
   - Initial DataFrame
   - After Missing Value Imputation
   - After Feature Encoding
   - After Feature Scaling
2. Navigate the Tabs:
   - 📊 Distribution & Q-Q Plots: Visualize histograms + Q-Q plots, run normality tests.
   - 🏔️ Kurtosis: Check how “peaked” or “flat” distributions are.
   - ↗️ Skewness: Explore asymmetry in distributions.
   - 🔍 Outliers: Detect extreme values using IQR logic.
3. Interactive Visuals:
   - Select numeric features.
   - Compare histograms and Q-Q plots.
   - Get metric summaries and AI insights.
💻 Features Breakdown
| Feature | Description |
|---|---|
| Data Source Selection | Choose which version of the dataset to analyze. |
| Distribution Analysis | Histogram + KDE + Q-Q plot. |
| Statistical Tests | Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling. |
| Skewness Panel | Visual and numeric skewness for all columns. |
| Kurtosis Panel | Measures tailedness; interprets platykurtic vs. leptokurtic. |
| Outlier Detection | Uses the IQR method; reports lower/upper bounds, the flagged values, and their percentage. |
| AI Explanation | GPT-generated summaries for kurtosis and skewness. |
🔍 What Each Technique Tells You
🟩 Skewness
| Skewness | Interpretation | When It Matters |
|---|---|---|
| ≈ 0 | Symmetric distribution (approximately normal shape) | Good baseline for modeling |
| > 0 | Right-skewed (long right tail) | Log transformation may help |
| < 0 | Left-skewed (long left tail) | Squaring or a Box-Cox/Yeo-Johnson power transform may help |
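The panel computes these values internally; as a rough sketch of the same measure, skewness can be checked with SciPy as below (the column name, sample values, and thresholds are illustrative, not RIDE's actual code):

```python
import pandas as pd
from scipy import stats

# Hypothetical numeric column; replace with a feature from your dataset.
df = pd.DataFrame({"income": [22, 25, 27, 30, 31, 33, 35, 40, 95, 120]})

skew = stats.skew(df["income"])  # Fisher-Pearson coefficient of skewness

# |skew| < 0.5 is a common rule of thumb for "roughly symmetric".
if abs(skew) < 0.5:
    label = "approximately symmetric"
elif skew > 0:
    label = "right-skewed (long right tail)"
else:
    label = "left-skewed (long left tail)"

print(f"skewness = {skew:.2f} -> {label}")
```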
🟪 Kurtosis
| Excess Kurtosis | Interpretation | When It Matters |
|---|---|---|
| ≈ 0 | Normal-tailed (mesokurtic) | Safe for most parametric tests |
| > 0 | Heavy-tailed (leptokurtic) | Higher chance of outliers |
| < 0 | Light-tailed (platykurtic) | More uniform, less prone to extremes |
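A small SciPy sketch of the same idea on synthetic data (note that `scipy.stats.kurtosis` returns excess kurtosis by default, so the ≈ 0 / > 0 / < 0 reading above applies directly):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_like = rng.normal(size=5_000)
heavy_tailed = rng.standard_t(df=5, size=5_000)  # Student's t(5): heavier tails than normal

# fisher=True (the default) returns *excess* kurtosis, so ~0 means normal-tailed.
for name, sample in [("normal", normal_like), ("t(5)", heavy_tailed)]:
    k = stats.kurtosis(sample, fisher=True)
    print(f"{name}: excess kurtosis = {k:.2f}")
```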
🟥 Q-Q Plot
- Compares the quantiles of the sample distribution to those of a theoretical normal distribution.
- Points close to the reference line: approximately normal.
- Consistent bow-shaped curvature: skewed data.
- S-shaped pattern: tails that are heavier or lighter than normal.
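For readers who want to reproduce the plots outside the panel, a minimal matplotlib/SciPy sketch on synthetic right-skewed data might look like this (not RIDE's internal plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=0.6, size=500)  # right-skewed sample

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram of the sample
ax1.hist(sample, bins=30)
ax1.set_title("Histogram")

# Q-Q plot against a normal distribution; points bowing away from the
# reference line indicate skew, an S-shape indicates unusual tail weight.
stats.probplot(sample, dist="norm", plot=ax2)
ax2.set_title("Q-Q plot vs. normal")

plt.tight_layout()
plt.show()
```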
🟨 Statistical Tests for Normality
| Test | Description | When It Is Used |
|---|---|---|
| Shapiro-Wilk | Generally the most powerful normality test for n < 5000 | Small to medium datasets |
| Kolmogorov-Smirnov | Compares the empirical CDF to a normal CDF with μ and σ estimated from the data (strictly, this calls for the Lilliefors correction) | Larger datasets |
| Anderson-Darling | Tail-sensitive test that compares its statistic against tabulated critical values rather than a single p-value | Works well across sample sizes |
- Shapiro-Wilk Test: Wikipedia
- Shapiro-Wilk Test: An Introduction to the Shapiro-Wilk Test for Normality
- Kolmogorov-Smirnov: Wikipedia
- Kolmogorov-Smirnov: Interpreting results: Kolmogorov-Smirnov test
- Anderson-Darling Test: Wikipedia
- Anderson-Darling Test: A Complete Guide to the Anderson-Darling Normality Test
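The manual does not show the panel's exact calls; assuming a plain SciPy backend, the three tests can be run on a single numeric sample roughly as follows (synthetic data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=10, scale=2, size=300)  # sample under test

# Shapiro-Wilk: a small p-value suggests rejecting normality.
w_stat, w_p = stats.shapiro(x)

# Kolmogorov-Smirnov against a normal with mu/sigma estimated from the data
# (strictly, estimating the parameters calls for the Lilliefors correction).
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# Anderson-Darling: compare the statistic to critical values instead of a p-value.
ad = stats.anderson(x, dist="norm")

print(f"Shapiro-Wilk:       W = {w_stat:.3f}, p = {w_p:.3f}")
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"Anderson-Darling:   A2 = {ad.statistic:.3f}, "
      f"5% critical value = {ad.critical_values[2]:.3f}")
```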
🧮 Outlier Detection (IQR Method)
- IQR = Q3 − Q1
- Outliers are values falling:
  - Below: Q1 − k × IQR
  - Above: Q3 + k × IQR
- The default multiplier is k = 1.5; users can tune k to make detection more or less sensitive (see the sketch below).
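A minimal pandas sketch of this rule, with an illustrative helper name and sample values (not the panel's actual implementation):

```python
import pandas as pd

def iqr_outlier_report(series: pd.Series, k: float = 1.5) -> dict:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] and summarise them."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = series[(series < lower) | (series > upper)]
    return {
        "lower_bound": lower,
        "upper_bound": upper,
        "outlier_values": outliers.tolist(),
        "outlier_pct": 100 * len(outliers) / len(series),
    }

# Hypothetical usage on a single numeric column
s = pd.Series([5, 7, 8, 9, 10, 11, 12, 13, 14, 60])
print(iqr_outlier_report(s))          # default sensitivity, k = 1.5
print(iqr_outlier_report(s, k=3.0))   # larger k flags only more extreme points
```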