FR
Copied
Concepts

Pipeline orchestration

Chain modules into a reusable DAG. Each block consumes the previous block's output, no glue code required.

A pipeline is a directed acyclic graph of modules. Each node is one module call; each edge declares which output feeds which input. Pipelines save a multi-step recipe once and re-run it.

Pipelines also back veille: a recurring scrap is internally a scheduled pipeline.

Anatomy

   ┌──────────┐
   │  scrap   │   queries=["bakery"], zones=["Paris"]
   └────┬─────┘
        │ produces: poi_list
        ▼
   ┌──────────┐      ┌──────────┐
   │  emails  │      │ ads_intel│
   └────┬─────┘      └────┬─────┘
        │                  │
        ▼                  ▼
   ┌────────────────────────────┐
   │          filter            │   rules: emails_present=true, ads_score≥30
   └────────────┬───────────────┘
                ▼
            ┌────────┐
            │  sort  │   sort_by=ads_score, desc, top_n=200
            └────────┘

Each node has:

Chaining rules

An edge is valid only if the producer's produces matches the consumer's needs (shapes like poi_list, enriched_list, csv_rows). The editor enforces this at design time, and the server re-validates on submit.

The full set of chainable blocks — their category, input/output buckets, needs/produces columns, and per-block config_schema — is published as a single machine-readable contract at GET /api/pipelines/schema. That endpoint is the single source of truth: the editor palette, import, AI generation, and the planned MCP create_pipeline tool all read it. Every active enrichment module is chainable (scrap, import, reviews, emails, verify, socials, dead_check, techstack, ads_intelligence, brand_assets, legal_ids, legal_data, legal_mentions, phones_extra, pricing, pagespeed, phone_info) plus the filter/sort transforms.

Build, export, import, or generate with AI

A pipeline graph is portable JSON. Four ways to obtain one:

Limits

Limit Value
Max nodes 20
Max inputs/node 1 (multi-input merges not yet open)
Max depth 20
Re-runs allowed Unlimited

Execution

Pipelines auto-start at creationPOST /api/pipelines queues the root node, the rest follows as predecessors reach done.

Each node runs as a normal job (same lifecycle, observability, retries). The coordinator advances on done, stops on the first failed. A failed pipeline can be resumed from the failing node. To re-run, create a new pipeline (the graph is JSON — copy and re-post).

Endpoints

GET    /api/pipelines/schema           # canonical node schema (public, source of truth)
POST   /api/pipelines/validate         # normalize + validate, no side effects
POST   /api/pipelines                  # create (also auto-starts)
GET    /api/pipelines                  # list user pipelines
GET    /api/pipelines/{id}             # detail + graph

A pipeline is owned by one user.

Filter preview

POST /api/pipelines/{id}/nodes/{node_id}/filter-preview

Runs a filter node against a sample of the predecessor's output without executing the full pipeline.

What's next