CSV Format Requirements
Detailed Guide on Formatting Your CSV Files for Predict Oracle
This guide provides comprehensive instructions for preparing your CSV files correctly for each model type, ensuring smooth data import and optimal model performance.
General CSV Formatting Guidelines
File Format Basics
- Save files as .csv (comma-separated values)
- Use UTF-8 encoding to support special characters
- Keep file size under 100MB (for larger datasets, contact support)
- Include a header row with column names
- Use comma (,) as the delimiter
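The basics above can be checked locally before you upload. The following is a minimal sketch using only the Python standard library; `check_csv_basics` is a hypothetical helper (not part of Predict Oracle), and it assumes "100MB" means 100 * 1024 * 1024 bytes.

```python
import csv
import io

# Assumption: the 100MB limit is interpreted as 100 * 1024 * 1024 bytes.
MAX_BYTES = 100 * 1024 * 1024


def check_csv_basics(raw: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a CSV file."""
    problems = []
    if len(raw) >= MAX_BYTES:
        problems.append("file is not under 100MB")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    # Read only the first row, which should be the header.
    header = next(csv.reader(io.StringIO(text, newline="")), None)
    if not header:
        problems.append("header row is missing")
        return problems
    # A single header field containing ; or tab or | suggests another delimiter.
    if len(header) == 1 and any(sep in header[0] for sep in (";", "\t", "|")):
        problems.append("delimiter does not appear to be a comma")
    if any(name.strip() == "" for name in header):
        problems.append("header row has an empty column name")
    return problems
```

To check a file on disk, you could pass `open(path, "rb").read()` to the function. An empty list means none of these basic problems were detected.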
Column Naming Conventions
- Use clear, descriptive names
- Avoid spaces (use underscores instead: customer_id, not customer id)
- Keep names concise (under 30 characters)
- Avoid special characters (%, $, #, @, etc.)
- Don't start column names with numbers
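These naming conventions can be expressed as a small check. The sketch below is illustrative only: `column_name_problems` is a hypothetical helper, and it assumes that any character other than a letter, digit, or underscore counts as a "special character".

```python
import re


def column_name_problems(name: str) -> list[str]:
    """Return the naming conventions that a column name breaks."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if len(name) >= 30:
        problems.append("is 30 characters or longer")
    if name[:1].isdigit():
        problems.append("starts with a number")
    # Assumption: anything other than letters, digits, underscores
    # (and spaces, reported separately above) is a special character.
    if re.search(r"[^\w ]", name):
        problems.append("contains special characters")
    return problems
```

For example, `column_name_problems("customer id")` reports the space, while `column_name_problems("customer_id")` returns an empty list.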
Data Type Consistency
Each column should contain only one data type:
- Numeric: Integers or decimals (e.g., 42, 3.14)
- Categorical: Text values (e.g., "yes", "no", "high", "medium", "low")
- Dates: Consistent format (YYYY-MM-DD recommended)
- Boolean: Use consistent values (true/false, yes/no, 1/0)
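One way to spot columns that mix data types is to classify every non-empty cell and report columns with more than one kind of value. This is a rough sketch with hypothetical helper names; the classification rules (plain numbers, YYYY-MM-DD dates, everything else as text) are simplifying assumptions, not the rules Predict Oracle applies.

```python
import csv
import io
import re


def classify(value: str):
    """Classify a cell as 'numeric', 'date', 'text', or None if empty."""
    v = value.strip()
    if v == "":
        return None  # empty cells are treated as missing, not as a type
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", v):
        return "numeric"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", v):
        return "date"
    return "text"


def mixed_type_columns(csv_text: str) -> dict[str, set[str]]:
    """Map each column that contains more than one kind of value to those kinds."""
    seen: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        for col, val in row.items():
            if col is None or val is None:
                continue  # skip rows with too many or too few fields
            kind = classify(val)
            if kind:
                seen.setdefault(col, set()).add(kind)
    return {col: kinds for col, kinds in seen.items() if len(kinds) > 1}
```

A column holding both `34` and `N/A` would be reported as mixing numeric and text values.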
Model-Specific CSV Requirements
Outcome Prediction CSV Format
Required Structure:
id,feature1,feature2,feature3,...,target
1001,value1,value2,value3,...,outcome1
1002,value1,value2,value3,...,outcome2
Example:
customer_id,age,income,email_opens,website_visits,converted
1001,34,75000,12,8,yes
1002,47,63000,2,3,no
1003,29,82000,8,15,yes
Special Considerations:
- Target column should contain discrete values (yes/no, high/medium/low)
- Ensure balanced representation of outcomes (not 99% one outcome)
- Missing values are acceptable but should be limited
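To see how balanced your outcomes are, you can count the share of each value in the target column. The sketch below uses hypothetical helpers; the 0.99 threshold simply mirrors the "99% one outcome" example above and is not a documented cutoff.

```python
import csv
import io
from collections import Counter


def target_shares(csv_text: str, target: str) -> dict[str, float]:
    """Return the fraction of non-empty rows that fall under each target value."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    counts = Counter(row[target] for row in reader if row.get(target))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def is_heavily_imbalanced(shares: dict[str, float], threshold: float = 0.99) -> bool:
    """True when one outcome makes up at least `threshold` of the rows."""
    # Assumption: 0.99 is taken from the example in the guide, not a product rule.
    return bool(shares) and max(shares.values()) >= threshold


sample = """customer_id,age,income,email_opens,website_visits,converted
1001,34,75000,12,8,yes
1002,47,63000,2,3,no
1003,29,82000,8,15,yes
"""
shares = target_shares(sample, "converted")
```

For the three example rows, `shares` gives "yes" two thirds of the rows and "no" one third.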
Smart Segmentation CSV Format
Required Structure:
id,attribute1,attribute2,attribute3,...
1001,value1,value2,value3,...
1002,value1,value2,value3,...
Example:
customer_id,recency,frequency,monetary,categories_purchased,region
1001,14,8,943.28,"electronics,home",northeast
1002,125,2,1245.65,"automotive",midwest
1003,7,12,532.94,"clothing,accessories,beauty",west
Special Considerations:
- No target column required (unsupervised learning)
- Multi-value categories can be separated by commas within quotes
- Standardize categorical values (e.g., use "M"/"F" throughout rather than mixing "Male"/"Female" with "M"/"F")
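The quoting rule for multi-value categories matters because a standard CSV parser treats a quoted field as a single value even when it contains commas. The sketch below shows this with Python's `csv` module on the example rows; `read_segmentation_rows` is a hypothetical helper, and splitting the field into a list is only an illustration of how such a column could be read, not a description of how Predict Oracle processes it.

```python
import csv
import io

sample = """customer_id,recency,frequency,monetary,categories_purchased,region
1001,14,8,943.28,"electronics,home",northeast
1002,125,2,1245.65,"automotive",midwest
1003,7,12,532.94,"clothing,accessories,beauty",west
"""


def read_segmentation_rows(csv_text: str, multi_value_column: str) -> list[dict]:
    """Parse rows and split one quoted multi-value column into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        # The quoted field arrives as one string, e.g. "electronics,home".
        values = row[multi_value_column].split(",")
        row[multi_value_column] = [v.strip() for v in values if v.strip()]
        rows.append(row)
    return rows


rows = read_segmentation_rows(sample, "categories_purchased")
```

Without the quotes, the first row would have seven fields instead of six, and `region` would no longer line up with its header.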
Forecasting CSV Format
Required Structure:
date,target,contextual1,contextual2,...
2023-01-01,value,context1,context2,...
2023-01-02,value,context1,context2,...
Example:
date,sales,promotion,holiday,weekend
2023-01-01,12500,no,yes,yes
2023-01-02,8700,no,no,no
2023-01-03,9200,yes,no,no
Special Considerations:
- Date column must be in chronological order
- Time intervals must be consistent (daily, weekly, monthly)
- Target column must be numeric
- Contextual columns are optional but can improve accuracy
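The ordering and interval rules can be checked by comparing the gaps between consecutive dates. This is a minimal sketch with a hypothetical helper name; it assumes YYYY-MM-DD dates and fixed-length intervals, so it suits daily or weekly data. Monthly data would need a calendar-aware comparison, because months differ in length.

```python
import csv
import io
from datetime import date


def date_problems(csv_text: str, date_column: str = "date") -> list[str]:
    """Report ordering and spacing problems in a forecasting CSV."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    dates = [date.fromisoformat(row[date_column]) for row in reader]
    # Number of days between each pair of consecutive rows.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    problems = []
    if any(gap <= 0 for gap in gaps):
        problems.append("dates are not in chronological order")
    if len(set(gaps)) > 1:
        problems.append("time intervals are not consistent")
    return problems


sample = """date,sales,promotion,holiday,weekend
2023-01-01,12500,no,yes,yes
2023-01-02,8700,no,no,no
2023-01-03,9200,yes,no,no
"""
```

Calling `date_problems(sample)` on the example rows returns an empty list, since the dates are consecutive days in order.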
Common CSV Formatting Errors to Avoid
- Inconsistent date formats - Stick to one format throughout (YYYY-MM-DD)
- Thousand separators - Don't use commas in numbers (use 10000, not 10,000)
- Currency symbols - Omit $ or other currency symbols
- Text in numeric columns - Don't put "N/A" or any other text in numeric columns
- Inconsistent missing value notation - Use empty cells for missing values, not "NULL", "N/A", etc.
- Hidden columns or formatting - Avoid exporting from Excel with hidden columns
- Inconsistent quoting - Enclose every text value that contains a comma in double quotes
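Several of these errors can be cleaned up cell by cell before export. The sketch below is one possible approach, not an official tool: `clean_cell` is a hypothetical helper, and both the list of missing-value markers and the set of currency symbols are assumptions you should adjust to your own data.

```python
import re

# Assumption: these markers stand for missing values in your data.
MISSING_MARKERS = {"null", "n/a", "none", "nan"}

# An optional currency symbol, then a number with or without thousand separators.
NUMBER_PATTERN = re.compile(
    r"[$€£]?\s*(-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?)"
)


def clean_cell(value: str) -> str:
    """Normalize one cell: blank out missing markers, tidy number-like values."""
    v = value.strip()
    if v.lower() in MISSING_MARKERS:
        return ""  # use an empty cell for missing values
    match = NUMBER_PATTERN.fullmatch(v)
    if match:
        # Drop the currency symbol and remove thousand separators.
        return match.group(1).replace(",", "")
    return v  # leave ordinary text untouched
```

For example, `clean_cell("$1,245.65")` yields `1245.65` and `clean_cell("N/A")` yields an empty string, while a text value such as `electronics,home` is left unchanged.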
Testing Your CSV Format
Before uploading your full dataset, we recommend:
1. Testing with a small sample (10-20 rows)
2. Using our CSV Validator tool in the Data Preparation section
3. Reviewing the data preview after upload to verify correct import
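A small test sample for step 1 can be cut from the full file with a few lines of Python. This is a sketch with a hypothetical helper name; it keeps the header row plus the first `n_rows` data rows and re-quotes any field that contains a comma.

```python
import csv
import io
from itertools import islice


def sample_csv(csv_text: str, n_rows: int = 20) -> str:
    """Return the header row plus the first n_rows data rows as CSV text."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    # n_rows + 1 because the first row read is the header.
    writer.writerows(islice(reader, n_rows + 1))
    return out.getvalue()
```

Taking the first rows is the simplest choice; if your file is sorted (for example by outcome), a random sample may be more representative.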
For complex datasets or custom importing needs, contact our support team for assistance.