CSV Format Requirements

Detailed guide on formatting your CSV files correctly for each model type in Predict Oracle.

Beginner | 15 min read | Data Preparation

Detailed Guide on Formatting Your CSV Files for Predict Oracle

This guide explains how to format your CSV files for each model type (Outcome Prediction, Smart Segmentation, and Forecasting) so that they import without errors and the model can use every column.

General CSV Formatting Guidelines

File Format Basics

  • Save files as .csv (comma-separated values)
  • Use UTF-8 encoding to support special characters
  • Keep file size under 100MB (for larger datasets, contact support)
  • Include a header row with column names
  • Use comma (,) as the delimiter
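
The basics above can be checked locally before uploading. The following is a minimal sketch, not part of Predict Oracle: the function name check_file_basics and its messages are hypothetical, and the delimiter test is only a rough heuristic that looks for semicolons or tabs in a single-column header.

```python
import csv
import io

MAX_BYTES = 100 * 1024 * 1024  # the 100MB limit from the list above


def check_file_basics(raw: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a CSV file."""
    problems = []
    if len(raw) > MAX_BYTES:
        problems.append("file is larger than 100MB")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]

    # Only the first row is needed to inspect the header.
    header = next(csv.reader(io.StringIO(text)), None)
    if not header or any(name.strip() == "" for name in header):
        problems.append("missing or incomplete header row")
    elif len(header) == 1 and (";" in header[0] or "\t" in header[0]):
        # Heuristic (assumption): one column containing ';' or a tab
        # usually means the file was saved with another delimiter.
        problems.append("delimiter does not appear to be a comma")
    return problems
```

To check a file on disk, read it in binary mode and pass the bytes, for example check_file_basics(open("customers.csv", "rb").read()), where customers.csv is a placeholder file name.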

Column Naming Conventions

  • Use clear, descriptive names
  • Avoid spaces (use underscores instead: customer_id not customer id)
  • Keep names concise (under 30 characters)
  • Avoid special characters (%, $, #, @, etc.)
  • Don't start column names with numbers
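
These conventions are easy to test in code. The sketch below is illustrative only: check_column_name is a hypothetical helper, and it assumes that "special characters" means anything other than ASCII letters, digits, and underscores (so hyphens are flagged too), which the guide does not state explicitly.

```python
import re


def check_column_name(name: str) -> list[str]:
    """Return the naming conventions that a column name violates."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if name[:1].isdigit():
        problems.append("starts with a number")
    if len(name) >= 30:  # the guide asks for names under 30 characters
        problems.append("30 characters or longer")
    # Assumption: only ASCII letters, digits, and underscores are allowed.
    # Spaces are excluded here because they are reported separately above.
    if re.search(r"[^A-Za-z0-9_ ]", name):
        problems.append("contains special characters")
    return problems


for column in ["customer_id", "customer id", "2nd_visit", "revenue_$"]:
    print(column, check_column_name(column))
```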

Data Type Consistency

Each column should contain only one data type:

  • Numeric: Integers or decimals (e.g., 42, 3.14)
  • Categorical: Text values (e.g., "yes", "no", "high", "medium", "low")
  • Dates: Consistent format (YYYY-MM-DD recommended)
  • Boolean: Use consistent values (true/false, yes/no, 1/0)
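
One way to spot a column that mixes types is to classify each cell and see how many kinds turn up. This is a rough sketch under simplifying assumptions: the helper names are hypothetical, only YYYY-MM-DD dates are recognized, and categorical and boolean text are both reported as "text". It does not reflect how Predict Oracle infers types.

```python
import re
from datetime import date

# Plain integers or decimals, with an optional sign (no thousand separators).
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def classify(value: str) -> str:
    """Classify one cell as 'missing', 'numeric', 'date', or 'text'."""
    v = value.strip()
    if v == "":
        return "missing"
    if NUMERIC.fullmatch(v):
        return "numeric"
    try:
        date.fromisoformat(v)  # accepts YYYY-MM-DD
    except ValueError:
        return "text"
    return "date"


def column_kinds(values: list[str]) -> set[str]:
    """Return the kinds found in a column, ignoring empty cells."""
    return {classify(v) for v in values} - {"missing"}


# A consistent column yields exactly one kind; more than one means mixed data.
print(column_kinds(["42", "3.14", ""]))
print(column_kinds(["42", "N/A"]))
```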

Model-Specific CSV Requirements

Outcome Prediction CSV Format

Required Structure:

    id,feature1,feature2,feature3,...,target
    1001,value1,value2,value3,...,outcome1
    1002,value1,value2,value3,...,outcome2

Example:

    customer_id,age,income,email_opens,website_visits,converted
    1001,34,75000,12,8,yes
    1002,47,63000,2,3,no
    1003,29,82000,8,15,yes

Special Considerations:

  • Target column should contain discrete values (yes/no, high/medium/low)
  • Ensure balanced representation of outcomes (not 99% one outcome)
  • Missing values are acceptable but should be limited
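
To see how balanced the outcomes are, you can count the share of each value in the target column. The sketch below uses the example rows from this section; target_shares is a hypothetical helper, and deciding what share counts as too unbalanced is left to you, since the guide gives only the 99% example.

```python
import csv
import io
from collections import Counter

SAMPLE = """customer_id,age,income,email_opens,website_visits,converted
1001,34,75000,12,8,yes
1002,47,63000,2,3,no
1003,29,82000,8,15,yes
"""


def target_shares(csv_text: str, target: str) -> dict[str, float]:
    """Return the fraction of rows for each value in the target column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows with an empty or absent target value are skipped.
    counts = Counter(row[target] for row in reader if row.get(target))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = target_shares(SAMPLE, "converted")
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```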

Smart Segmentation CSV Format

Required Structure:

    id,attribute1,attribute2,attribute3,...
    1001,value1,value2,value3,...
    1002,value1,value2,value3,...

Example:

    customer_id,recency,frequency,monetary,categories_purchased,region
    1001,14,8,943.28,"electronics,home",northeast
    1002,125,2,1245.65,"automotive",midwest
    1003,7,12,532.94,"clothing,accessories,beauty",west

Special Considerations:

  • No target column required (unsupervised learning)
  • Multi-value categories can be separated by commas within quotes
  • Standardize categorical values: pick one notation per category and use it throughout (e.g., always "M"/"F", not a mix of "Male", "Female", "M", and "F")
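
The quoting rule for multi-value categories matters because a standard CSV parser treats the quoted text as a single field. The following sketch shows this with Python's csv module on two of the example rows; read_segmentation_rows is a hypothetical helper, and splitting the field into a list is shown only to illustrate the idea, not how Predict Oracle handles it internally.

```python
import csv
import io

SAMPLE = """customer_id,recency,frequency,monetary,categories_purchased,region
1001,14,8,943.28,"electronics,home",northeast
1002,125,2,1245.65,"automotive",midwest
"""


def read_segmentation_rows(csv_text: str, multi_value_column: str) -> list[dict]:
    """Parse rows and split one quoted multi-value column into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The quotes keep "electronics,home" together as one field;
        # it is split on commas only after the row has been parsed.
        parts = row[multi_value_column].split(",")
        row[multi_value_column] = [p.strip() for p in parts if p.strip()]
        rows.append(row)
    return rows


for row in read_segmentation_rows(SAMPLE, "categories_purchased"):
    print(row["customer_id"], row["categories_purchased"], row["region"])
```

Without the quotes, the first row would have seven fields instead of six and the region value would shift out of its column.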

Forecasting CSV Format

Required Structure:

    date,target,contextual1,contextual2,...
    2023-01-01,value,context1,context2,...
    2023-01-02,value,context1,context2,...

Example:

    date,sales,promotion,holiday,weekend
    2023-01-01,12500,no,yes,yes
    2023-01-02,8700,no,no,no
    2023-01-03,9200,yes,no,no

Special Considerations:

  • Date column must be in chronological order
  • Time intervals must be consistent (daily, weekly, monthly)
  • Target column must be numeric
  • Contextual columns are optional but can improve accuracy
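
The ordering and interval rules can be verified by comparing each date with the next one. This is a minimal sketch with a hypothetical check_dates helper. It assumes YYYY-MM-DD dates and fixed-length intervals such as daily or weekly; monthly data has gaps of 28 to 31 days and would need a calendar-aware comparison instead.

```python
from datetime import date


def check_dates(date_strings: list[str]) -> list[str]:
    """Check that YYYY-MM-DD dates are ascending and evenly spaced."""
    dates = [date.fromisoformat(s) for s in date_strings]
    # Number of days between each pair of consecutive rows.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    problems = []
    if any(gap <= 0 for gap in gaps):
        problems.append("dates are not in strictly increasing order")
    if len(set(gaps)) > 1:
        problems.append("intervals between rows are not consistent")
    return problems


print(check_dates(["2023-01-01", "2023-01-02", "2023-01-03"]))
print(check_dates(["2023-01-01", "2023-01-03", "2023-01-04"]))
```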

Common CSV Formatting Errors to Avoid

  1. Inconsistent date formats - Stick to one format throughout (YYYY-MM-DD)
  2. Thousand separators - Don't use commas in numbers (use 10000, not 10,000)
  3. Currency symbols - Omit $ or other currency symbols
  4. Text in numeric columns - Don't mix "N/A" or text in numeric columns
  5. Inconsistent missing value notation - Use empty cells for missing values, not "NULL", "N/A", etc.
  6. Hidden columns or formatting - Avoid exporting from Excel with hidden columns
  7. Inconsistent quoting - Enclose every text value that contains a comma in double quotes
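
Several of these errors (thousand separators, currency symbols, and non-empty missing-value markers) can be cleaned up cell by cell before upload. The sketch below is one possible approach: clean_numeric_cell, the list of missing-value markers, and the set of currency symbols are all assumptions chosen for illustration, so adjust them to your data.

```python
import re

# Assumption: markers commonly used for missing values in numeric columns.
MISSING_MARKERS = {"null", "n/a", "na", "none"}

# Digits grouped in threes by commas, e.g. 10,000 or 1,245.65
THOUSANDS = re.compile(r"\d{1,3}(,\d{3})+(\.\d+)?")


def clean_numeric_cell(value: str) -> str:
    """Normalize one cell from a numeric column to a plain number or empty."""
    v = value.strip()
    if v.lower() in MISSING_MARKERS:
        return ""  # empty cell is the notation the guide asks for
    v = v.lstrip("$€£")  # drop a leading currency symbol
    if THOUSANDS.fullmatch(v):
        v = v.replace(",", "")  # remove thousand separators
    return v


for cell in ["$10,000", "N/A", "3.14", "1,245.65"]:
    print(repr(cell), "->", repr(clean_numeric_cell(cell)))
```

Values that still contain text after cleaning are returned unchanged, so they remain visible for manual review rather than being silently dropped.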

Testing Your CSV Format

Before uploading your full dataset, we recommend:

  1. Testing with a small sample (10-20 rows)
  2. Using our CSV Validator tool in the Data Preparation section
  3. Reviewing the data preview after upload to verify correct import
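
A small test sample can be cut from the full file with a few lines of code. This sketch uses a hypothetical sample_csv helper that keeps the header plus the first rows; writing the sample with Python's csv module also re-quotes any field that contains a comma.

```python
import csv
import io
from itertools import islice


def sample_csv(csv_text: str, n_rows: int = 20) -> str:
    """Return the header row plus the first n_rows data rows as CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # csv.writer quotes fields that contain commas by default.
    writer = csv.writer(out, lineterminator="\n")
    for row in islice(reader, n_rows + 1):  # +1 keeps the header row
        writer.writerow(row)
    return out.getvalue()


full = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000))
sample = sample_csv(full, n_rows=10)
print(len(sample.splitlines()))  # header + 10 data rows
```

Taking the first rows is the simplest option; if your file is sorted by outcome or date, a random sample may be more representative.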

For complex datasets or custom importing needs, contact our support team for assistance.
