Sort

Purpose

Reorder rows produced by a previous pipeline step by a chosen column, ascending or descending, and optionally truncate the result to the top N rows. Sort is pipeline-internal (see /docs/concepts/pipeline-orchestration): it consumes the output of the predecessor node and emits the same columns in a new order. A common use is ordering a freshly scraped list by completeness descending and keeping the top 200 before handing the result to an email-finder or cold-outreach step. Sort never adds, removes, or rewrites columns — truncation is the only content change it can apply.

Inputs

Configuration is attached to the pipeline node. There is no standalone request body.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| sort_by | string (enum) | yes | completeness | Column to order by. |
| direction | "asc" or "desc" | yes | desc | Sort direction. |
| top_n | integer or null | no | null | Keep only the first top_n rows after sorting. Min 1. |

Accepted values for sort_by:

| Value | Meaning |
| --- | --- |
| completeness | Aggregate fill rate of a row's enrichment fields. |
| note | Star rating of the establishment (Google Maps). |
| nb_avis | Number of reviews on the establishment. |
| email_quality | Quality score of the extracted email (personal > role). |

top_n must be a positive integer or null. A value of null keeps every row.
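
The constraints above can be checked before a pipeline definition is submitted. The following is a minimal client-side sketch, not the service's own validator: the function name validate_sort_config and the wording of the messages are hypothetical. It treats sort_by and direction as required, as the table states, and does not apply the listed defaults.

```python
from typing import Any

# Values taken from the sort_by and direction descriptions above.
SORT_BY_VALUES = ("completeness", "note", "nb_avis", "email_quality")
DIRECTIONS = ("asc", "desc")


def validate_sort_config(config: dict[str, Any]) -> list[str]:
    """Return the problems found in a sort node config (empty list if valid).

    Hypothetical helper: the real service may validate differently.
    """
    problems = []

    sort_by = config.get("sort_by")
    if sort_by not in SORT_BY_VALUES:
        problems.append(f"sort_by: unknown value {sort_by!r}")

    direction = config.get("direction")
    if direction not in DIRECTIONS:
        problems.append(f"direction: expected 'asc' or 'desc', got {direction!r}")

    # top_n is optional; None (JSON null) keeps every row.
    top_n = config.get("top_n")
    if top_n is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(top_n, bool) or not isinstance(top_n, int) or top_n < 1:
            problems.append(f"top_n: expected a positive integer or null, got {top_n!r}")

    return problems


print(validate_sort_config({"sort_by": "completeness", "direction": "desc", "top_n": 200}))
# prints []
```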

Outputs

Same schema as the predecessor node — sort is declared passthrough in the pipeline graph. The downstream node sees the same columns it would have seen without the sort step, only in a different order and possibly with fewer rows.

| Property | Behaviour |
| --- | --- |
| Columns | Identical to input, byte-for-byte. |
| Row order | Determined by sort_by and direction. |
| Row count | min(input_count, top_n) if top_n is set, otherwise equal to the input. |
| Pipeline type | Same as the upstream source (resolved transitively across sort/filter). |
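
The behaviour in the table above can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the runner's implementation: rows are modelled as dictionaries, every row is assumed to carry a numeric value in the sort column, and the function name sort_rows is hypothetical. How the real module handles missing values or ties is not specified here.

```python
from typing import Optional


def sort_rows(rows: list[dict], sort_by: str, direction: str = "desc",
              top_n: Optional[int] = None) -> list[dict]:
    """Reorder rows by one column and optionally keep the first top_n.

    Rows are returned untouched, so the columns are exactly those of the input.
    Assumes every row has a value in `sort_by` that float() can parse.
    """
    ordered = sorted(rows, key=lambda row: float(row[sort_by]),
                     reverse=(direction == "desc"))
    # Slicing past the end is safe, which gives min(input_count, top_n) rows.
    return ordered if top_n is None else ordered[:top_n]


rows = [
    {"name": "A", "completeness": "0.4"},
    {"name": "B", "completeness": "0.9"},
    {"name": "C", "completeness": "0.7"},
]
top = sort_rows(rows, "completeness", "desc", top_n=2)
print([row["name"] for row in top])
# prints ['B', 'C']
```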

Lifecycle

Standard job lifecycle — see /docs/concepts/jobs-lifecycle. A sort step is created automatically by the pipeline runner when the predecessor reaches done, and is treated as a structural pipeline step rather than a billable action.

Pipeline

| Property | Value |
| --- | --- |
| Category | process |
| Pipelinable | yes |
| Needs | none (accepts any input) |
| Produces | none (passthrough output) |
| Pipeline input type | any_pois |
| Pipeline output type | passthrough |

Because the output type is passthrough, the effective downstream type is inherited from the closest non-passthrough predecessor. A sort placed after scrap exposes the same downstream contract as scrap would on its own.
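
The inheritance rule can be pictured as a walk up the chain of predecessors. The following is a minimal sketch, not the runner's actual code: the Node class, the function effective_output_type, and the type name "scrap_pois" are hypothetical placeholders, and the sketch assumes each node has at most one predecessor.

```python
from dataclasses import dataclass
from typing import Optional

PASSTHROUGH = "passthrough"


@dataclass
class Node:
    """Hypothetical stand-in for a pipeline node with a single predecessor."""
    name: str
    output_type: str
    predecessor: Optional["Node"] = None


def effective_output_type(node: Node) -> str:
    """Walk upstream until a node with a non-passthrough output type is found."""
    current: Optional[Node] = node
    while current is not None:
        if current.output_type != PASSTHROUGH:
            return current.output_type
        current = current.predecessor
    raise ValueError(f"no non-passthrough node upstream of {node.name!r}")


# "scrap_pois" is an invented type name used only for illustration.
scrap = Node("scrap", "scrap_pois")
filter_node = Node("filter", PASSTHROUGH, predecessor=scrap)
sort_node = Node("sort", PASSTHROUGH, predecessor=filter_node)

print(effective_output_type(sort_node))
# prints scrap_pois
```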

Endpoints

Sort has no public REST endpoint — it is pipeline-internal (see /docs/concepts/pipeline-orchestration) and created exclusively by the pipeline runner as the predecessor node finishes, through the internal helper create_pipeline_internal_job(job_type="sort", …).

To use sort, define it as a node inside a pipeline created via the pipelines API:

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/pipelines | Create a pipeline containing a sort node. |
| GET | /api/pipelines/{id} | Inspect node configuration and node statuses. |

A node entry for sort looks like:

{
  "type": "sort",
  "config": {
    "sort_by": "completeness",
    "direction": "desc",
    "top_n": 200
  }
}

The runner reads config at execution time and emits the sorted CSV into the node's job directory.

Limits

Global limits — see /docs/concepts/limits. Sort is not billed (quota cost 0), runs in the standard parallel job pool, and its maximum input size is bounded by the predecessor's output, not by sort itself.

Errors

| Condition | Result |
| --- | --- |
| sort_by references an unknown column | Node transitions to failed; downstream nodes stay pending. |
| direction is not asc or desc | Node transitions to failed with a validation error. |
| top_n is 0 or negative | Rejected at pipeline creation; the API returns 400. |
| Empty input | Node completes as done with an empty output CSV. |
| Predecessor did not reach done | Sort stays pending; it is never scheduled until the parent finishes. |
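
The run-time rows of the table above can be read as a small decision function. This is an illustrative sketch only: the name sort_node_status is hypothetical, "unknown column" is interpreted as a sort_by value absent from the input columns, and the order in which the real runner evaluates these conditions is an assumption. The top_n case is left out because it is rejected at pipeline creation rather than at run time.

```python
def sort_node_status(predecessor_status: str, input_columns: list[str],
                     config: dict) -> str:
    """Map the run-time conditions of the error table to a node status.

    Hypothetical helper; the check order is an assumption of this sketch.
    """
    # The node is never scheduled until the parent finishes.
    if predecessor_status != "done":
        return "pending"
    # An invalid direction fails the node with a validation error.
    if config.get("direction") not in ("asc", "desc"):
        return "failed"
    # Assumed meaning of "unknown column": sort_by is missing from the input.
    if config.get("sort_by") not in input_columns:
        return "failed"
    # An empty input is not an error: the node completes with an empty CSV.
    return "done"


config = {"sort_by": "completeness", "direction": "desc", "top_n": 200}
print(sort_node_status("running", ["name", "completeness"], config))
# prints pending
```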
