
Sort

Purpose

Reorder the rows produced by the previous pipeline step on a chosen column, in ascending or descending order, and optionally truncate the result to the first N rows. Sort is pipeline-internal (see /docs/concepts/pipeline-orchestration): it consumes the output of its predecessor node and emits the same columns in a new order. A common use is ordering a freshly scraped list by completeness, descending, and keeping the top 200 rows before handing the result to an email-finder or cold-outreach step. Sort never adds, removes, or rewrites columns; dropping rows through truncation is the only change it can make to the content.
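A minimal in-memory sketch of the reorder-then-truncate behaviour, assuming rows arrive as a list of Python dicts keyed by column name. The function name `sort_rows` and the choice to place rows with a missing value last are illustrative assumptions, not documented behaviour.

```python
from typing import Any, Optional

Row = dict[str, Any]


def sort_rows(rows: list[Row], sort_by: str, direction: str = "desc",
              top_n: Optional[int] = None) -> list[Row]:
    """Reorder rows by one column, then optionally keep the first top_n.

    The row dicts themselves are returned untouched: no column is added,
    removed, or rewritten. Only order and (with top_n) count change.
    """
    # Assumption: rows with no value for sort_by go last in both directions.
    present = [r for r in rows if r.get(sort_by) is not None]
    missing = [r for r in rows if r.get(sort_by) is None]
    # sorted() is stable, so ties keep their input order even with reverse=True.
    ordered = sorted(present, key=lambda r: r[sort_by],
                     reverse=(direction == "desc")) + missing
    return ordered if top_n is None else ordered[:top_n]
```

For the common use described above, `sort_rows(scraped, "completeness", "desc", 200)` would keep the 200 most complete rows, with every row's columns left as they were.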

Inputs

Configuration is attached to the pipeline node. There is no standalone request body.

Field	Type	Required	Default	Description
`sort_by`	string (enum)	no	`completeness`	Column to order by.
`direction`	`"asc"` \| `"desc"`	no	`desc`	Sort direction.
`top_n`	integer \| `null`	no	`null`	Keep only the first `top_n` rows after sorting. Min `1`.

Accepted values for `sort_by`:

Value	Meaning
`completeness`	Aggregate fill rate of a row's enrichment fields.
`note`	Star rating of the establishment (Google Maps).
`nb_avis`	Number of reviews on the establishment.
`email_quality`	Quality score of the extracted email (personal > role).
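To make the `completeness` criterion concrete, here is one way an aggregate fill rate could be computed. The list of enrichment fields below is hypothetical (the exact set is not listed on this page), and treating `None` and the empty string as unfilled is an assumption.

```python
from typing import Any

# Hypothetical enrichment fields; the real set may differ.
ENRICHMENT_FIELDS = ("email", "phone", "website", "note", "nb_avis")


def completeness(row: dict[str, Any],
                 fields: tuple[str, ...] = ENRICHMENT_FIELDS) -> float:
    """Return the share of enrichment fields holding a value (0.0 to 1.0)."""
    # Assumption: None and "" count as unfilled; 0 counts as a real value.
    filled = sum(1 for f in fields if row.get(f) not in (None, ""))
    return filled / len(fields)
```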

`top_n` must be a positive integer or `null`. A value of `null` keeps every row.
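The input rules can be collected into a small validation step. This sketch assumes that a key missing from `config` falls back to the default shown in the table and that an invalid value raises `ValueError`; the name `normalize_sort_config` is hypothetical, and the code illustrates the constraints rather than reproducing the runner's own validation.

```python
from typing import Any

ALLOWED_SORT_BY = {"completeness", "note", "nb_avis", "email_quality"}
DEFAULTS: dict[str, Any] = {"sort_by": "completeness", "direction": "desc", "top_n": None}


def normalize_sort_config(config: dict[str, Any]) -> dict[str, Any]:
    """Fill documented defaults, then raise ValueError on any invalid field."""
    merged = {**DEFAULTS, **config}
    if merged["sort_by"] not in ALLOWED_SORT_BY:
        raise ValueError(f"unknown sort_by: {merged['sort_by']!r}")
    if merged["direction"] not in ("asc", "desc"):
        raise ValueError(f"direction must be 'asc' or 'desc', got {merged['direction']!r}")
    top_n = merged["top_n"]
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if top_n is not None and (not isinstance(top_n, int)
                              or isinstance(top_n, bool) or top_n < 1):
        raise ValueError(f"top_n must be a positive integer or null, got {top_n!r}")
    return merged
```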

Outputs

Same schema as the predecessor node — sort is declared passthrough in the pipeline graph. The downstream node sees the same columns it would have seen without the sort step, only in a different order and possibly with fewer rows.

Property	Behaviour
Columns	Identical to input, byte-for-byte.
Row order	Determined by `sort_by` and `direction`.
Row count	`min(input_count, top_n)` if `top_n` is set, otherwise equal to the input.
Pipeline type	Same as the upstream source (resolved transitively across sort/filter).
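The passthrough guarantees in this table can be illustrated with the standard `csv` module: the header row is written back with the same column names in the same order, and only the data rows are reordered and truncated. This is a sketch, assuming a numeric `sort_by` column with a value in every row; the function name `sort_csv` is hypothetical, and cell quoting on output follows `csv` module defaults.

```python
import csv
import io
from typing import Optional


def sort_csv(text: str, sort_by: str, direction: str = "desc",
             top_n: Optional[int] = None) -> str:
    """Sort CSV data rows by a numeric column; keep the header as-is."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None:
        return ""  # no header at all: empty input gives empty output
    header = list(reader.fieldnames)
    rows = list(reader)
    # Assumption: every row holds a parseable number in sort_by.
    rows.sort(key=lambda r: float(r[sort_by]), reverse=(direction == "desc"))
    if top_n is not None:
        rows = rows[:top_n]  # row count becomes min(input_count, top_n)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```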

Lifecycle

Standard job lifecycle: see /docs/concepts/jobs-lifecycle. A sort step is created automatically by the pipeline runner when the predecessor reaches `done`, and is treated as a structural pipeline step rather than a billable action.

Pipeline

Property	Value
Category	`process`
Pipelinable	yes
Needs	none (accepts any input)
Produces	none (passthrough output)
Pipeline input type	`any_pois`
Pipeline output type	`passthrough`

Because the output type is `passthrough`, the effective downstream type is inherited from the closest non-passthrough predecessor. A sort placed after `scrap` exposes the same downstream contract as `scrap` would on its own.
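This transitive resolution amounts to walking upstream until a node with a concrete output type is found. A sketch, assuming a hypothetical graph representation in which each node id maps to its declared output type and its single predecessor; the function name `effective_output_type` and the type name `pois` used in the usage note are also hypothetical.

```python
from typing import Optional

# Hypothetical graph shape: node id -> (declared output type, predecessor id or None).
Graph = dict[str, tuple[str, Optional[str]]]


def effective_output_type(graph: Graph, node_id: str) -> Optional[str]:
    """Walk upstream past passthrough nodes to the closest concrete output type."""
    seen: set[str] = set()
    current: Optional[str] = node_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in pipeline graph")
        seen.add(current)
        output_type, parent = graph[current]
        if output_type != "passthrough":
            return output_type
        current = parent
    return None  # no concrete type anywhere upstream
```

For a chain `scrap -> filter -> sort`, calling `effective_output_type(graph, "sort")` skips both passthrough nodes and returns the type declared by `scrap`.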

Endpoints

Sort has no public REST endpoint. It is pipeline-internal (see /docs/concepts/pipeline-orchestration) and is created exclusively by the pipeline runner when the predecessor node finishes, through the internal helper `create_pipeline_internal_job(job_type="sort", …)`.

To use sort, define it as a node inside a pipeline created via the pipelines API:

Method	Path	Purpose
POST	`/api/pipelines`	Create a pipeline containing a `sort` node.
GET	`/api/pipelines/{id}`	Inspect node configuration and node statuses.

A node entry for sort looks like:

```json
{
  "type": "sort",
  "config": {
    "sort_by": "completeness",
    "direction": "desc",
    "top_n": 200
  }
}
```

The runner reads `config` at execution time and writes the sorted CSV to the node's job directory.
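A client can assemble such a node and prepare the `POST /api/pipelines` call with the Python standard library. This is a sketch only: the base URL is a placeholder, the request body shape (a top-level `nodes` list) is an assumption not documented on this page, the helper names are hypothetical, and authentication is omitted. The request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.invalid"  # placeholder host, not a real endpoint


def sort_node(sort_by: str = "completeness", direction: str = "desc",
              top_n: Optional[int] = None) -> dict:
    """Build a sort node entry in the documented node shape."""
    return {"type": "sort",
            "config": {"sort_by": sort_by, "direction": direction, "top_n": top_n}}


def create_pipeline_request(nodes: list[dict]) -> urllib.request.Request:
    """Prepare (but do not send) a pipeline creation request."""
    # Assumption: the pipeline body wraps its nodes in a "nodes" list.
    body = json.dumps({"nodes": nodes}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/pipelines",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. A real pipeline would also contain the upstream node whose output the sort node consumes.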

Limits

Global limits — see /docs/concepts/limits. Sort is not billed (quota cost 0), runs in the standard parallel job pool, and its maximum input size is bounded by the predecessor's output, not by sort itself.

Errors

Condition	Result
`sort_by` references an unknown column	Node transitions to `failed`; downstream nodes stay `pending`.
`direction` is not `asc` or `desc`	Node transitions to `failed` with a validation error.
`top_n` is `0` or negative	Rejected at pipeline creation; the API returns `400`.
Empty input	Node completes as `done` with an empty output CSV.
Predecessor did not reach `done`	Sort stays `pending`; it is not scheduled until the parent reaches `done`.
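The outcomes in this table can be summarised as two small decision functions. This is an illustrative sketch, not the runner's code: the function names are hypothetical, missing config keys are assumed to take their documented defaults, and the `400` check models the creation-time rejection separately from the runtime node status.

```python
from typing import Any, Optional

ALLOWED_SORT_BY = {"completeness", "note", "nb_avis", "email_quality"}


def creation_error(config: dict[str, Any]) -> Optional[int]:
    """Return 400 if pipeline creation would reject this config, else None."""
    top_n = config.get("top_n")
    if top_n is not None and top_n < 1:
        return 400
    return None


def sort_node_status(predecessor_status: str, config: dict[str, Any]) -> str:
    """Resulting status of a sort node, given its predecessor's status."""
    if predecessor_status != "done":
        return "pending"  # not scheduled until the parent is done
    if config.get("sort_by", "completeness") not in ALLOWED_SORT_BY:
        return "failed"  # unknown column; downstream nodes stay pending
    if config.get("direction", "desc") not in ("asc", "desc"):
        return "failed"  # validation error on direction
    return "done"  # includes empty input, which yields an empty CSV
```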

What's next

Filter — drop rows that don't match a rule before, or after, sorting.
Import — bring an external CSV into a pipeline so it can be sorted.