Configuration Files¶
ExpOps projects use YAML configuration files to define project settings, pipeline structure, and execution parameters.
Configuration Files Overview¶
ExpOps projects use two main configuration files:
- `configs/project_config.yaml` (required) - Main project configuration
- `configs/compute_config.yaml` (optional) - Cluster execution settings
Project Configuration (project_config.yaml)¶
The main configuration file contains these top-level sections:
```yaml
metadata:         # Project name, description, version
scripts:          # Script keys to file paths; first key is default for processes
environment:      # Named environments (requirements/env files + type)
reproducibility:  # Random seed configuration
data:             # Data sources and data path for hashing
experiment:       # Model framework, paths, parameters, pipeline, cache
execution:        # Optional: e.g. execution.workspace.base_dir (temp dir; default tmp)
reporting:        # Chart entrypoints, probe paths, reporting environment
```
Key Sections¶
- `metadata`: Project identification (name, description, version)
- `scripts`: Maps script keys to file paths (e.g. `main: "project/models/model.py"`). The first key is used as the default when a process does not specify `script`. See Defaults that reduce config below.
- `environment`: Named environment map (requirements file + type) used by processes and reporting
- `reproducibility`: Random seed configuration
- `data`: Data source paths and optional data path for hash-based cache invalidation
- `experiment.parameters.pipeline`: Pipeline DAG structure and process definitions (including optional fields and defaults). See Pipeline Execution for details.
- `experiment.parameters.cache`: Cache backend and KV backend configuration. See Caching & Reproducibility and Backends for details.
- `reporting`: Chart entrypoints and chart definitions. See Reporting Features for details.
- `execution.workspace`: Optional. Per-process temporary workspace base path. Set `base_dir` to control where process run directories are created (e.g. `base_dir: "/tmp"`). Default is `tmp` (i.e. `/tmp` on Unix).
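For instance, to keep per-process run directories inside the project rather than under `/tmp`, the workspace base can be overridden with a fragment like the following (an illustrative sketch; the example path is hypothetical):

```yaml
# Place process run directories under a project-local scratch directory
# instead of the default tmp (/tmp on Unix).
execution:
  workspace:
    base_dir: "scratch/runs"
```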
Defaults that reduce config¶
You can omit some process fields and rely on defaults to keep config minimal:
- Scripts section: The top-level `scripts` map still defines script keys to file paths. The first key is treated as the default script for processes that do not explicitly specify a script key in `code`.
- Code field: Each process may define a single `code` field instead of separate `script` and `code_function` fields:
  - `code: "script_key.function_name"` → use the script registered under `script_key` and call `function_name` from that module.
  - `code: "function_name"` → use the first script key in `scripts` and call `function_name`.
- Omitted code:
  - For ordinary processes, if `code` is omitted the system assumes a function with the same name as the process, loaded from the default script.
  - For data/seed split helper nodes (processes that only define `data_parallelism` or `seed_parallelism` and no `code`), the system treats them as function-less split nodes.
Example minimal process that uses both defaults:
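As an illustrative sketch (the process name is hypothetical, and the exact nesting under `pipeline` is an assumption, not the authoritative schema), a process that relies on both defaults could be as short as:

```yaml
# Illustrative only: "train_model" defines neither `script` nor `code`,
# so ExpOps loads the function `train_model` from the first key in the
# top-level `scripts` map.
experiment:
  parameters:
    pipeline:
      train_model: {}
```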
See Pipeline Execution for full process attributes and process definitions.
Data Hashing¶
Provide a CSV path so cache invalidation responds to data changes:
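A fragment along these lines registers the CSV (the `path` subkey and file name here are assumptions for illustration; check a template project for the exact schema):

```yaml
# Hypothetical data section: the referenced CSV's contents are hashed,
# and cache entries are invalidated when the hash changes.
data:
  path: "project/data/train.csv"
```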
When the CSV contents change, process and step caches are invalidated using the data hash.
Cluster Configuration (compute_config.yaml)¶
Optional configuration for distributed execution:
```yaml
provider: slurm
num_workers: 4
options:
  worker_cores: 2
  worker_memory: 4GB
  queue: normal
  walltime: "02:00:00"
```
See Cluster Configuration for detailed setup instructions.
Quick Reference¶
For detailed information on each configuration section:
- Pipeline Definition: Pipeline Execution
- Process & Step Code: Model Code
- Caching: Caching & Reproducibility
- Backends: Backends
- Reporting/Charts: Reporting Features
- Cluster Execution: Cluster Configuration and Distributed Computing
Example Configurations¶
See template projects for complete examples:
- sklearn-basic: Basic local execution setup
- premier-league: Comprehensive setup with cluster configuration and dynamic charts