Quick Start Guide
Get started with RIDE CLI in minutes!
Basic Usage
1. Launch RIDE
Run the ride command in your terminal:
ride
Or load a dataset directly:
ride your-data.csv
This opens the interactive interface:
===============================================================================
[ASCII-art "RIDE" banner]
===============================================================================
RIDE: Rapid Insights Data Engine
RIDE is a free, open-source toolkit that lets you perform data analysis
without writing a single line of code and with minimal intervention.
===============================================================================
Main Menu:
1. Load Dataset
2. Inspect Data
3. Change Data Type
4. Explore Data
5. Visualize Data
6. Impute Missing Values
7. Feature Encoding
8. Feature Scaling and Transformation
9. Export Data
10. AutoML (Train & Evaluate Models)
'$' Export Data (saves current state)
'exit': Exit RIDE CLI
Enter your choice (1-10, $, exit):
2. Load Your Data
Select option 1 to load your dataset:
Enter your choice (1-10, $, exit): 1
Options:
1. Load your own data
2. Load default data (Pre-loaded)
0. Back to main menu
Enter your choice (0-2): 1
Enter the path to your dataset file (CSV, Excel, Parquet): data/iris.csv
Success! Dataset loaded with 150 rows and 5 columns.
Preview of the first 5 rows:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
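Under the hood, the load step is a thin wrapper over the pandas readers for CSV, Excel, and Parquet. A minimal sketch of the same load-and-preview behavior (the inline CSV stands in for data/iris.csv so the snippet is self-contained; it is an illustration, not RIDE's actual code):

```python
# Sketch of RIDE's load step using pandas directly.
import io

import pandas as pd

csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
    "4.7,3.2,1.3,0.2,setosa\n"
)

df = pd.read_csv(io.StringIO(csv_text))  # the same call accepts a file path
print(f"Dataset loaded with {len(df)} rows and {len(df.columns)} columns.")
print(df.head())
```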
Default Datasets Available
If you choose option 2, you can select from these pre-loaded datasets:
- Ames Housing: House prices with 82 features
- Camera Dataset: Digital camera specifications
- Fish: Fish species measurements
- Penguins: Palmer Penguins dataset
- Titanic: Passenger survival data
3. Inspect Your Data
Use option 2 to explore your dataset structure:
Enter your choice (1-10, $, exit): 2
Inspection Options:
1. View features and data types
2. View dataset shape
3. Check missing values
4. View data sample
5. View summary statistics
6. Back to main menu
Enter your choice (1-6): 1
Features available in the dataset:
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], dtype='object')
Data types of features:
sepal_length float64
sepal_width float64
petal_length float64
petal_width float64
species object
dtype: object
4. Change Data Types
Convert column data types with option 3:
Enter your choice (1-10, $, exit): 3
Columns available for data type conversion:
--------------------------------------------------
# Column Name Current Type Sample Values
--------------------------------------------------
1 sepal_length float64 5.1, 4.9, 4.7
2 sepal_width float64 3.5, 3.0, 3.2
3 species object setosa, setosa, setosa
--------------------------------------------------
Select data type:
1. String (object)
2. Integer (int8)
3. Integer (int64)
4. Float (float64)
5. DateTime
6. Boolean
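Each menu entry corresponds to a pandas dtype conversion. A rough equivalent of two of the options, using invented column names for illustration:

```python
# Hypothetical illustration of option 3 using pandas astype / to_datetime.
import pandas as pd

df = pd.DataFrame({
    "rank": [1.0, 2.0, 3.0],
    "observed": ["2024-01-01", "2024-01-02", "2024-01-03"],
})

df["rank"] = df["rank"].astype("int8")           # menu option 2: Integer (int8)
df["observed"] = pd.to_datetime(df["observed"])  # menu option 5: DateTime
print(df.dtypes)
```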
5. Explore Your Data
Use option 4 for statistical analysis:
Enter your choice (1-10, $, exit): 4
Exploration Options:
1. Feature correlation analysis
2. Check for normal distribution
3. Detect outliers
4. View skewness
5. View kurtosis
6. Check for imbalanced target variable
7. Back to main menu
Enter your choice (1-7): 1
Top Correlations (excluding self-correlations):
------------------------------------------------------------
Feature Pair Correlation Strength
------------------------------------------------------------
petal_length-petal_width 0.963 Strong
sepal_length-petal_length 0.872 Strong
sepal_length-petal_width 0.818 Strong
------------------------------------------------------------
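The correlation report corresponds to pandas DataFrame.corr with the diagonal and duplicate pairs removed. A self-contained sketch on a few iris-like rows (sample values are illustrative, so the numbers differ from the table above):

```python
# Top feature-pair correlations, excluding self-correlations.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width":  [0.2, 0.2, 0.2, 0.2, 0.3],
})

corr = df.corr()
# Keep only the upper triangle so each pair appears once, without the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().sort_values(key=abs, ascending=False)
print(pairs)
```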
6. Visualize Data
Create terminal-based visualizations with option 5:
Enter your choice (1-10, $, exit): 5
Visualization Options:
1. Plot histogram
2. Plot scatter plot
3. Back to main menu
Enter your choice (1-3): 1
Available numerical columns for histogram:
--------------------------------------------------
1. sepal_length
2. sepal_width
3. petal_length
4. petal_width
--------------------------------------------------
Enter the column number to plot histogram: 1
Plotting histogram for: sepal_length
[Terminal-based histogram appears]
Statistics for sepal_length:
Mean: 5.84
Median: 5.80
Std Dev: 0.83
Min: 4.30
Max: 7.90
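The statistics printed under each plot are standard pandas aggregations (the histogram itself is drawn in the terminal via plotext). Reproducing them on a handful of illustrative values:

```python
# Summary statistics shown alongside a RIDE histogram, via pandas.
import pandas as pd

sepal_length = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 7.9, 4.3])
print(f"Mean:    {sepal_length.mean():.2f}")
print(f"Median:  {sepal_length.median():.2f}")
print(f"Std Dev: {sepal_length.std():.2f}")
print(f"Min:     {sepal_length.min():.2f}")
print(f"Max:     {sepal_length.max():.2f}")
```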
7. Handle Missing Values
Impute missing data with option 6:
Enter your choice (1-10, $, exit): 6
Choice Available to Impute Missing Data:
1. [Press 1] Drop Missing Data
2. [Press 2] Impute Missing Data with Specific Value
3. [Press 3] Impute Missing Data with Mean
4. [Press 4] Impute Missing Data with Median
5. [Press 5] Impute Missing Data based on Distribution
6. [Press 6] Impute Missing Data with Fill Forward Strategy
7. [Press 7] Impute Missing Data with Backward Fill Strategy
8. [Press 8] Impute Missing Data with Nearest Neighbours
Enter your choice: 3
Enter path to save Imputed data: data/imputed_data.csv
Missing Data Imputed and saved successfully
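Mean imputation (option 3) amounts to filling each gap with the column's average over the non-missing rows. A sketch in plain pandas, not RIDE's actual code:

```python
# Mean imputation of a column with a missing value.
import numpy as np
import pandas as pd

df = pd.DataFrame({"sepal_length": [5.1, np.nan, 4.7, 4.6]})
mean_value = df["sepal_length"].mean()  # computed over non-missing rows only
df["sepal_length"] = df["sepal_length"].fillna(mean_value)
print(df)
```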
8. Feature Encoding
Transform categorical variables with option 7:
Enter your choice (1-10, $, exit): 7
Categorical columns available:
--------------------------------------------------------
# Column Name Data Type Unique Values Sample Values
--------------------------------------------------------
1 species object 3 setosa, versicolor...
--------------------------------------------------------
Encoding Methods:
1. Label Encoding - Maps each unique value to a number
2. One Hot Encoding - Creates binary columns for each category
3. Ordinal Encoding - Maps values to ordered integers
4. Exit and return to main menu
Select encoding method (1-4): 1
Mapping for column 'species':
setosa → 0
versicolor → 1
virginica → 2
9. Feature Scaling
Scale numerical features with option 8:
Enter your choice (1-10, $, exit): 8
Available Options:
=== SCALING OPTIONS ===
1. Min-Max Scaler [Scales features to a range of [0,1]]
2. Standard Scaler (Z-score) [Scales to mean=0, std=1]
3. Robust Scaler [Recommended if outliers are present]
4. Max Abs Scaler [Scales by dividing by the maximum absolute value]
=== TRANSFORMATION OPTIONS ===
5. Quantile Transformer [Maps to uniform or normal distribution]
6. Log Transformer [Natural logarithm transformation]
7. Reciprocal Transformation [1/x transformation]
8. Square Root Transformation [√x transformation]
9. Exit and return to main menu
Enter your choice (1-9): 2
Enter path to save normalized/transformed data: data/scaled_data.csv
Features scaled and saved successfully
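Option 2's z-score scaling subtracts the column mean and divides by the standard deviation. A sketch in plain pandas, using the population standard deviation (ddof=0) as scikit-learn's StandardScaler does:

```python
# Standard (z-score) scaling: result has mean 0 and std 1.
import pandas as pd

df = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0]})
scaled = (df - df.mean()) / df.std(ddof=0)
print(scaled.round(3))
```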
10. Run AutoML
Automatically find the best model with option 10:
Enter your choice (1-10, $, exit): 10
Available columns:
1. sepal_length
2. sepal_width
3. petal_length
4. petal_width
5. species
Enter the number of the target column: 5
Select task type:
1. Classification (target variable has discrete classes)
2. Regression (target variable has continuous values)
Enter your choice (1-2): 1
🤖 AutoML Model Selection 🤖
Task Type: Classification
Target Column: species
🔍 Preprocessing: Missing Value Analysis
Initial Missing Values:
(None found)
🔀 Data Split:
Training set: 120 samples
Testing set: 30 samples
🔍 Evaluating Random Forest Classifier...
✓ Completed - Balanced Accuracy: 1.0000
[More models evaluated...]
📊 Classification Model Comparison:
Balanced Accuracy F1 Score Accuracy
Model
Random Forest 1.000 1.000 1.000
Extra Trees 1.000 1.000 1.000
LightGBM 1.000 1.000 1.000
XGBoost 0.967 0.967 0.967
Logistic Regression 0.967 0.967 0.967
Enter the full path to save the AutoML results: results/automl_results.csv
💾 Results saved to results/automl_results.csv
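One step of the comparison above, sketched with scikit-learn: a stratified 120/30 split of the iris data and a Random Forest scored by balanced accuracy. This approximates, but is not, RIDE's internal AutoML code:

```python
# Train and score a single candidate model, AutoML-style.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)  # 120 training / 30 testing samples, mirroring the split above

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
score = balanced_accuracy_score(y_test, model.predict(X_test))
print(f"Balanced Accuracy: {score:.4f}")
```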
11. Export Data
Save your processed data with option 9 or '$':
Enter your choice (1-10, $, exit): 9
Enter the path to save the file: processed_data
Export Format Options:
1. CSV (.csv)
2. Excel (.xlsx)
3. Parquet (.parquet)
Choose export format (1-3): 1
Data exported successfully to processed_data.csv
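The three export formats map onto the corresponding pandas writers; Excel and Parquet need the optional openpyxl and pyarrow dependencies, so only CSV is exercised in this sketch:

```python
# Exporting a DataFrame in RIDE's three supported formats.
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
df.to_csv("processed_data.csv", index=False)       # option 1: CSV
# df.to_excel("processed_data.xlsx", index=False)  # option 2: Excel
# df.to_parquet("processed_data.parquet")          # option 3: Parquet
print("Data exported successfully to processed_data.csv")
```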
Tips and Tricks
- Use the '$' shortcut to quickly export your current dataset
- Press 'exit' at any time to quit RIDE CLI
- All visualizations are terminal-based using plotext
- You can load files directly when starting RIDE:
ride your-data.csv
- Default datasets are available if you want to explore without your own data
- All operations preserve your original data; modifications are saved to new files
Common Workflows
Data Analysis Workflow
- Load dataset
- Inspect data types and missing values
- Explore statistical properties
- Visualize distributions
- Handle missing values
- Encode categorical features
- Scale numerical features
- Export processed data
Machine Learning Workflow
- Load dataset
- Preprocess data (missing values, encoding, scaling)
- Run AutoML to find best model
- Export results for further analysis
Need Help?
- Check the User Guide for detailed instructions
- Report issues on GitHub
- Explore the [API Reference] for advanced usage