02 – Data Cleanup & Conversion
📘 RIDE User Manual – Panel 2: Data Cleanup and Conversion
🔄 Purpose of the Panel
This panel enables users to clean their dataset by:
- Removing duplicate rows
- Converting feature columns to appropriate data types
These preprocessing steps are crucial for maintaining data quality and preparing datasets for deeper analysis or modeling.
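As a rough illustration of duplicate handling, the sketch below counts and removes repeated rows using only the Python standard library. The function names `count_duplicates` and `remove_duplicates` are hypothetical and not taken from RIDE. The sketch also assumes that a "duplicate" is any row identical to an earlier one, which may differ from how the panel counts them.

```python
from typing import Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, ...]


def count_duplicates(rows: Iterable[Row]) -> int:
    """Count rows that repeat an earlier row (the first copy is not counted)."""
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return duplicates


def remove_duplicates(rows: Iterable[Row]) -> List[Row]:
    """Keep the first occurrence of every row, preserving the original order."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique_rows.append(row)
    return unique_rows


rows = [("a", 1), ("b", 2), ("a", 1), ("a", 1)]
print(count_duplicates(rows))
print(remove_duplicates(rows))
```

Keeping the first occurrence and preserving order means the cleaned dataset stays aligned with what the user originally uploaded.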
🧭 User Workflow
- Upload Dataset: Users first upload a dataset (handled globally in the app).
- View Panel Intro: A GIF and explanation help the user understand what this panel does.
- Use Two Tabs:
  - Handle Duplicates: Detects duplicate rows automatically and removes them on request.
  - Convert Data Types:
    - The user selects a column and a target data type (INT, FLOAT, BOOLEAN, STRING, DATETIME).
    - The backend parses and casts the column, handling tricky inputs such as date strings and boolean words.
    - A before/after comparison of data types is shown for validation.
💻 Features Breakdown
Feature | Description |
---|---|
Duplicate Row Detection | Identifies and displays the count of duplicated rows in the dataset. |
Remove Duplicates Button | Allows users to remove duplicate rows in one click. |
Data Type Conversion | Provides dropdowns for selecting a column and a target data type. |
Smart Parsing Logic | Automatically handles tricky conversions like datetime or boolean strings. |
Type Comparison View | Displays side-by-side comparison of original and updated data types. |
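To make the conversion and comparison features more concrete, here is a minimal sketch of casting one column and reporting the types before and after. It models a dataset as a plain dictionary of column lists. The names `CASTERS`, `column_types`, and `convert_column` are hypothetical, and only the simplest target types are covered. It is not RIDE's actual implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

Table = Dict[str, List[Any]]

# Hypothetical mapping from dropdown labels to simple Python casters.
CASTERS: Dict[str, Callable[[Any], Any]] = {
    "INT": int,
    "FLOAT": float,
    "STRING": str,
}


def column_types(table: Table) -> Dict[str, str]:
    """Report each column's type as the type name of its first non-null value."""
    types = {}
    for name, values in table.items():
        first = next((v for v in values if v is not None), None)
        types[name] = "null" if first is None else type(first).__name__
    return types


def convert_column(
    table: Table, column: str, target: str
) -> Tuple[Table, Dict[str, str], Dict[str, str]]:
    """Cast one column and return the new table plus before/after type maps."""
    cast = CASTERS[target]
    before = column_types(table)
    converted = dict(table)  # shallow copy; the input table is left unchanged
    converted[column] = [None if v is None else cast(v) for v in table[column]]
    after = column_types(converted)
    return converted, before, after


table = {"age": ["31", "42", None], "name": ["Ana", "Ben", "Cy"]}
converted, before, after = convert_column(table, "age", "INT")
for name in table:
    print(f"{name}: {before[name]} -> {after[name]}")
```

Returning both type maps alongside the converted table is what allows a side-by-side view to be rendered without recomputing anything.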
🧠 How Conversions Are Handled Internally
- INT/FLOAT:
  - Strips non-numeric characters using regex.
  - Applies this string cleanup only when the column is not already numeric.
- BOOLEAN:
  - Converts “true/yes/1/y” strings to `True`, others to `False`.
- DATETIME:
  - Tries multiple common date formats.
  - Uses Polars' `str.to_datetime()` with fallback parsing.
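The rules above can be sketched per value in plain Python. This is an illustration under stated assumptions, not the panel's code. The helper names, the regex, the list of date formats, and the case-insensitive matching are all hypothetical choices. The standard library's `datetime.strptime` stands in for the Polars parsing mentioned above.

```python
import re
from datetime import datetime
from typing import Any, Optional

TRUE_STRINGS = {"true", "yes", "1", "y"}
# Assumed examples of "common" formats; the real list is not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")


def to_number(value: Any, as_int: bool = False) -> Optional[float]:
    """Clean a value with a regex and cast it; already-numeric values skip cleanup."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        number = float(value)
    else:
        # Drop everything except digits, the decimal point, and the minus sign.
        cleaned = re.sub(r"[^0-9.\-]", "", str(value))
        try:
            number = float(cleaned)
        except ValueError:
            return None  # nothing numeric was left to parse
    return int(number) if as_int else number


def to_bool(value: Any) -> bool:
    """Map true/yes/1/y (case-insensitive here, by assumption) to True, else False."""
    return str(value).strip().lower() in TRUE_STRINGS


def to_datetime(value: Any) -> Optional[datetime]:
    """Try each known format in turn, then fall back to ISO 8601 parsing."""
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


print(to_number("$1,234.50"))
print(to_bool("Yes"))
print(to_datetime("15/03/2024"))
```

Returning `None` for unparseable values, rather than raising, keeps one bad cell from aborting the conversion of an entire column.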