Data Requirements

Understand what data is required for each model type and how to prepare your datasets for optimal results.

Understanding the Data Needed for Each Prediction Model

This guide explains what data is required for each model type in Predict Oracle and how to prepare your datasets for optimal results.

Universal Data Requirements

Regardless of the model type, all datasets should:
- Be in CSV format with proper headers
- Have consistent data types within each column
- Contain at least 100 rows of data (1,000+ recommended for best results)
- Have no more than 30% missing values in any critical column
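
The row-count and missing-value rules above can be checked locally before upload. The following is a minimal sketch using only Python's standard library. The function name check_universal and the constants are hypothetical and not part of Predict Oracle, and the sketch assumes that a blank cell counts as a missing value.

```python
import csv
import io

MIN_ROWS = 100            # minimum row count stated in the requirements
MAX_MISSING_RATIO = 0.30  # maximum share of missing values per critical column


def check_universal(csv_text, critical_columns):
    """Return a list of problems; an empty list means these checks passed."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    problems = []

    # A usable header row is required before anything else can be checked.
    if not reader.fieldnames or any(not name.strip() for name in reader.fieldnames):
        return ["missing or blank header"]

    if len(rows) < MIN_ROWS:
        problems.append(f"only {len(rows)} rows; at least {MIN_ROWS} required")

    for column in critical_columns:
        if column not in reader.fieldnames:
            problems.append(f"column '{column}' not found")
            continue
        # Assumption: an empty or whitespace-only cell is a missing value.
        missing = sum(1 for row in rows if not (row[column] or "").strip())
        if rows and missing / len(rows) > MAX_MISSING_RATIO:
            ratio = missing / len(rows)
            problems.append(f"column '{column}' has {ratio:.0%} missing values")
    return problems
```

Calling check_universal(open("data.csv").read(), ["age", "income"]) on your own file would list any violations. The sketch does not verify that data types are consistent within a column.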

Model-Specific Requirements

Outcome Prediction

Required Data:
- Target column: The outcome you want to predict (e.g., converted: yes/no)
- Feature columns: Attributes that might influence the outcome
- Unique identifier: A column with unique values for each row

Example Dataset Structure:

    customer_id, age, income, previous_purchase, email_opens, converted
    1001, 34, 75000, yes, 12, yes
    1002, 47, 63000, no, 2, no
    1003, 29, 82000, yes, 8, yes

Minimum Requirements:
- At least 3 feature columns
- At least 500 rows (1,000+ recommended)
- Clear target variable with at least two possible values
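
As an illustration, the Outcome Prediction minimums could be verified with a short script like the one below. This is a sketch under stated assumptions: check_outcome_dataset is a hypothetical helper, not a Predict Oracle function, and every column other than the identifier and the target is treated as a feature.

```python
import csv
import io


def check_outcome_dataset(csv_text, id_column, target_column,
                          min_rows=500, min_features=3):
    """Return a list of problems for an Outcome Prediction dataset."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    columns = reader.fieldnames or []

    problems = [f"required column '{name}' not found"
                for name in (id_column, target_column) if name not in columns]
    if problems:
        return problems

    # Assumption: all remaining columns are feature columns.
    features = [c for c in columns if c not in (id_column, target_column)]
    if len(features) < min_features:
        problems.append(f"only {len(features)} feature columns; need {min_features}")

    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows; need {min_rows}")

    # The identifier must be unique for each row.
    ids = [row[id_column] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("identifier values are not unique")

    # The target needs at least two distinct non-empty values.
    targets = {row[target_column] for row in rows if row[target_column]}
    if len(targets) < 2:
        problems.append("target column has fewer than two distinct values")
    return problems
```

For example, check_outcome_dataset(text, "customer_id", "converted") applied to the three-row sample above reports only the row-count problem, since the sample is far below 500 rows.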

Smart Segmentation

Required Data:
- Feature columns: Attributes used to identify patterns and create segments
- Unique identifier: A column with unique values for each row
- No target column needed (unsupervised learning)

Example Dataset Structure:

    customer_id, purchase_frequency, average_order_value, product_categories, browse_time
    1001, 12, 87.50, electronics/home, 45
    1002, 3, 220.75, luxury/accessories, 15
    1003, 8, 42.30, groceries/household, 30

Minimum Requirements:
- At least 4 feature columns
- At least 300 rows (1,000+ recommended)
- Mix of numerical and categorical data preferred
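
To see whether a segmentation dataset has the preferred mix of numerical and categorical features, each column can be classified by whether all of its values parse as numbers. The helper below is a hypothetical sketch, not how Predict Oracle itself infers types.

```python
import csv
import io


def _is_number(text):
    """True if the text parses as a float (for example '12' or '87.50')."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def column_kinds(csv_text, id_column):
    """Map each feature column to 'numerical' or 'categorical'."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    kinds = {}
    for column in reader.fieldnames or []:
        if column == id_column:
            continue  # the identifier is not a feature
        values = [row[column] for row in rows if row[column]]
        # Assumption: a column is numerical only if every non-empty value is a number.
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        kinds[column] = "numerical" if is_numeric else "categorical"
    return kinds
```

The number of keys in the returned dictionary is the feature-column count, which can be compared against the minimum of 4.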

Forecasting

Required Data:
- Date/time column: Regular time intervals (hourly, daily, weekly, monthly)
- Target column: The value you want to forecast
- Optional contextual columns: Additional variables that may influence trends

Example Dataset Structure:

    date, sales, promotion_active, holiday
    2023-01-01, 12500, no, yes
    2023-01-02, 8700, no, no
    2023-01-03, 9200, yes, no

Minimum Requirements:
- At least 30 time points (more for seasonal patterns)
- Consistent time intervals
- No more than 10% missing time points
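
One way to estimate the share of missing time points is to infer the interval from the most common gap between consecutive dates and then count the dates that should exist but do not. The sketch below assumes YYYY-MM-DD dates and a fixed interval measured in whole days, so it suits daily or weekly data. Hourly and monthly series would need a different approach. The function name is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def missing_time_points(iso_dates):
    """Return (step_in_days, missing_dates, missing_ratio) for YYYY-MM-DD strings."""
    days = sorted({date.fromisoformat(d.strip()) for d in iso_dates})
    if len(days) < 2:
        raise ValueError("need at least two distinct dates")

    # Assumption: the most frequent gap between neighbours is the intended interval.
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    step = Counter(gaps).most_common(1)[0][0]

    # Walk from the first to the last date in steps of the inferred interval.
    expected = []
    current = days[0]
    while current <= days[-1]:
        expected.append(current)
        current += timedelta(days=step)

    present = set(days)
    missing = [d for d in expected if d not in present]
    return step, missing, len(missing) / len(expected)
```

A returned ratio above 0.10 would indicate that the series exceeds the 10% limit, and len(iso_dates) can be compared against the 30-point minimum separately.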

Data Preparation Checklist

Before uploading your data:
1. Verify all required columns are present
2. Check for and handle missing values
3. Ensure date formats are consistent (YYYY-MM-DD recommended)
4. Remove duplicates and irrelevant columns
5. Verify data types match their content (numbers, text, dates)
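
Steps 3 and 4 of the checklist lend themselves to a small script. The following sketch converts dates to YYYY-MM-DD and drops exact duplicate rows. The list of accepted input formats is an assumption for illustration and should be adjusted to match your data, especially because a value such as 01/03/2023 is ambiguous between month-first and day-first conventions.

```python
from datetime import datetime

# Assumed input formats, tried in order; edit to match your own data.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def to_iso_date(text):
    """Convert a date string in one of KNOWN_FORMATS to YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def drop_duplicate_rows(rows):
    """Remove exact duplicate rows while keeping the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Raising an error on unrecognised dates, rather than guessing, keeps inconsistent values visible so they can be fixed by hand.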

Next steps: Check out our "Data Cleaning Best Practices" guide to learn how to properly prepare your data for analysis.
