Concepts

Pipeline orchestration

Chain modules into a reusable DAG. Each block consumes the previous block's output, no glue code required.

A pipeline is a directed acyclic graph of modules. Each node is one module call; each edge declares which output feeds which input. Pipelines save a multi-step recipe once and re-run it.

Pipelines also back veille: a recurring scrap is internally a scheduled pipeline.

Anatomy

   ┌──────────┐
   │  scrap   │   queries=["bakery"], zones=["Paris"]
   └────┬─────┘
        │ produces: poi_list
        ▼
   ┌──────────┐      ┌──────────┐
   │  emails  │      │ ads_intel│
   └────┬─────┘      └────┬─────┘
        │                  │
        ▼                  ▼
   ┌────────────────────────────┐
   │          filter            │   rules: emails_present=true, ads_score≥30
   └────────────┬───────────────┘
                ▼
            ┌────────┐
            │  sort  │   sort_by=ads_score, desc, top_n=200
            └────────┘

Each node has:

type — module slug (see module registry)
params — module config, identical to a standalone job
inputs — references to upstream node(s)
id — local identifier within the pipeline

Chaining rules

An edge is valid only if the producer's produces matches the consumer's needs (shapes like poi_list, enriched_list, csv_rows). The editor enforces this at design time, and the server re-validates on submit.

The full set of chainable blocks — their category, input/output buckets, needs/produces columns, and per-block config_schema — is published as a single machine-readable contract at GET /api/pipelines/schema. That endpoint is the single source of truth: the editor palette, import, AI generation, and the planned MCP create_pipeline tool all read it. Every active enrichment module is chainable (scrap, import, reviews, emails, verify, socials, dead_check, techstack, ads_intelligence, brand_assets, legal_ids, legal_data, legal_mentions, phones_extra, pricing, pagespeed, phone_info) plus the filter/sort transforms.

Build, export, import, or generate with AI

A pipeline graph is portable JSON. Four ways to obtain one:

Build it visually in the editor (/pipelines/new).
Export the current graph to a JSON envelope ({schema_version, name, definition, meta}) — the Export button downloads it.
Import an envelope (paste or .json file) — it is validated via POST /api/pipelines/validate and loaded back into the editor for review before launch (nothing runs on import).
Generate with AI — describe the pipeline in plain language; the editor sends the server schema plus your description to Claude (using your own key via BYOK), parses the returned JSON, validates it, and lays it out on the canvas.

Limits

Limit	Value
Max nodes	20
Max inputs/node	1 (multi-input merges not yet open)
Max depth	20
Re-runs allowed	Unlimited

Execution

Pipelines auto-start at creation — POST /api/pipelines queues the root node, the rest follows as predecessors reach done.

Each node runs as a normal job (same lifecycle, observability, retries). The coordinator advances on done, stops on the first failed. A failed pipeline can be resumed from the failing node. To re-run, create a new pipeline (the graph is JSON — copy and re-post).

Endpoints

GET    /api/pipelines/schema           # canonical node schema (public, source of truth)
POST   /api/pipelines/validate         # normalize + validate, no side effects
POST   /api/pipelines                  # create (also auto-starts)
GET    /api/pipelines                  # list user pipelines
GET    /api/pipelines/{id}             # detail + graph

A pipeline is owned by one user.

Filter preview

POST /api/pipelines/{id}/nodes/{node_id}/filter-preview

Runs a filter node against a sample of the predecessor's output without executing the full pipeline.