Data Requirements
Understanding the Data Needed for Each Prediction Model
This guide explains what data is required for each model type in Predict Oracle and how to prepare your datasets for optimal results.
Universal Data Requirements
Regardless of the model type, all datasets should:
- Be in CSV format with proper headers
- Have consistent data types within each column
- Contain at least 100 rows of data (1,000+ recommended for best results)
- Have no more than 30% missing values in any critical column
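The row-count and missing-value rules above can be checked before upload. The following is a minimal sketch using only the Python standard library. It is not part of Predict Oracle: the function name check_universal and the choice to treat empty cells as missing are assumptions made for illustration.

```python
import csv
import io

MIN_ROWS = 100            # minimum row count stated in this guide
MAX_MISSING_RATIO = 0.30  # maximum share of missing values per critical column


def check_universal(csv_text, critical_columns):
    """Return a list of problems found; an empty list means the checks passed."""
    # skipinitialspace tolerates the "a, b, c" spacing used in this guide's samples.
    rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))
    problems = []
    if len(rows) < MIN_ROWS:
        problems.append(f"only {len(rows)} rows, need at least {MIN_ROWS}")
    for column in critical_columns:
        # Assumption: an empty or absent cell counts as a missing value.
        missing = sum(1 for row in rows if not (row.get(column) or "").strip())
        if rows and missing / len(rows) > MAX_MISSING_RATIO:
            problems.append(f"{column}: {missing / len(rows):.0%} missing")
    return problems
```

Pass the file contents as a string together with the columns you consider critical, for example check_universal(open("data.csv").read(), ["age", "income"]). Which columns count as critical depends on your dataset.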
Model-Specific Requirements
Outcome Prediction
Required Data:
- Target column: The outcome you want to predict (e.g., converted: yes/no)
- Feature columns: Attributes that might influence the outcome
- Unique identifier: A column with unique values for each row
Example Dataset Structure:
customer_id, age, income, previous_purchase, email_opens, converted
1001, 34, 75000, yes, 12, yes
1002, 47, 63000, no, 2, no
1003, 29, 82000, yes, 8, yes
Minimum Requirements:
- At least 3 feature columns
- At least 500 rows (1,000+ recommended)
- Clear target variable with at least two possible values
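As an illustration, the Outcome Prediction requirements can be expressed as a small validation function. This is a sketch under assumptions, not Predict Oracle's own validation: check_outcome_dataset is a hypothetical name, and every column other than the identifier and the target is assumed to be a feature.

```python
import csv
import io


def check_outcome_dataset(csv_text, id_column, target_column):
    """Return a list of problems for an Outcome Prediction dataset."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    columns = reader.fieldnames or []

    absent = [c for c in (id_column, target_column) if c not in columns]
    if absent:
        return [f"missing required column: {c}" for c in absent]

    problems = []
    # Assumption: all remaining columns are feature columns.
    features = [c for c in columns if c not in (id_column, target_column)]
    if len(features) < 3:
        problems.append(f"{len(features)} feature columns, need at least 3")
    if len(rows) < 500:
        problems.append(f"{len(rows)} rows, need at least 500")

    ids = [row[id_column] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append(f"{id_column} contains duplicate values")

    if len({row[target_column] for row in rows}) < 2:
        problems.append(f"{target_column} has fewer than two distinct values")
    return problems
```

Run against the three-row sample above with id_column="customer_id" and target_column="converted", the only problem reported is the row count, since the sample has four feature columns, unique identifiers, and both yes and no outcomes.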
Smart Segmentation
Required Data:
- Feature columns: Attributes used to identify patterns and create segments
- Unique identifier: A column with unique values for each row
- No target column needed (unsupervised learning)
Example Dataset Structure:
customer_id, purchase_frequency, average_order_value, product_categories, browse_time
1001, 12, 87.50, electronics/home, 45
1002, 3, 220.75, luxury/accessories, 15
1003, 8, 42.30, groceries/household, 30
Minimum Requirements:
- At least 4 feature columns
- At least 300 rows (1,000+ recommended)
- Mix of numerical and categorical data preferred
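To see whether a dataset has the preferred mix of numerical and categorical features, each column can be classified by whether all of its non-empty values parse as numbers. The sketch below is one simple heuristic, not the rule Predict Oracle applies. The names column_kinds and _is_number are hypothetical.

```python
import csv
import io


def _is_number(text):
    """True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def column_kinds(csv_text, id_column):
    """Map each feature column to 'numerical' or 'categorical'."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    kinds = {}
    for column in reader.fieldnames or []:
        if column == id_column:
            continue  # the identifier is not a feature
        # Ignore empty cells so missing values do not change the classification.
        values = [row[column] for row in rows if row[column]]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        kinds[column] = "numerical" if is_numeric else "categorical"
    return kinds
```

For the sample above, purchase_frequency, average_order_value, and browse_time are classified as numerical and product_categories as categorical, which gives the four feature columns the minimum requires.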
Forecasting
Required Data:
- Date/time column: Regular time intervals (hourly, daily, weekly, monthly)
- Target column: The value you want to forecast
- Optional contextual columns: Additional variables that may influence trends
Example Dataset Structure:
date, sales, promotion_active, holiday
2023-01-01, 12500, no, yes
2023-01-02, 8700, no, no
2023-01-03, 9200, yes, no
Minimum Requirements:
- At least 30 time points (more for seasonal patterns)
- Consistent time intervals
- No more than 10% missing time points
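Gaps in a time series can be found by walking from the first to the last date at a fixed step and noting which expected points are absent. The sketch below assumes dates in YYYY-MM-DD form that all fall on a regular grid starting at the earliest date. The function names are hypothetical. It handles fixed-length steps such as hours, days, or weeks; monthly data would need calendar-aware stepping, which is not shown.

```python
from datetime import date, timedelta


def missing_time_points(dates, step=timedelta(days=1)):
    """Return the expected dates that are absent between the first and last date."""
    present = {date.fromisoformat(d) for d in dates}
    current, last = min(present), max(present)
    missing = []
    while current <= last:
        if current not in present:
            missing.append(current.isoformat())
        current += step
    return missing


def missing_ratio(dates, step=timedelta(days=1)):
    """Share of expected time points that are missing (compare against 0.10)."""
    missing = missing_time_points(dates, step)
    # Expected points = distinct dates present + dates found missing.
    expected = len({date.fromisoformat(d) for d in dates}) + len(missing)
    return len(missing) / expected
```

A series passes the 10% rule in this guide when missing_ratio returns a value of 0.10 or less. For weekly data, pass step=timedelta(weeks=1).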
Data Preparation Checklist
Before uploading your data:
1. Verify all required columns are present
2. Check for and handle missing values
3. Ensure date formats are consistent (YYYY-MM-DD recommended)
4. Remove duplicates and irrelevant columns
5. Verify data types match their content (numbers, text, dates)
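Two of the checklist steps, consistent date formats and duplicate removal, lend themselves to a quick script. The following is a minimal sketch using the Python standard library. The names is_iso_date and prepare are hypothetical, and "duplicate" is assumed to mean a row identical to an earlier one in every column.

```python
import csv
import io
import re
from datetime import date


def is_iso_date(value):
    """True only for real calendar dates written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def prepare(csv_text, date_column=None):
    """Drop exact duplicate rows and collect dates that are not YYYY-MM-DD."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    date_index = header.index(date_column) if date_column else None
    seen, unique_rows, bad_dates = set(), [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        key = tuple(row)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        unique_rows.append(row)
        if date_index is not None and not is_iso_date(row[date_index]):
            bad_dates.append(row[date_index])
    return header, unique_rows, bad_dates
```

The function returns the header, the de-duplicated rows, and any date values that need fixing, so you can correct them before writing the cleaned file.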
Next steps: Check out our "Data Cleaning Best Practices" guide to learn how to properly prepare your data for analysis.