Sort
Purpose
Reorder rows produced by a previous pipeline step by a chosen column, ascending or descending, and optionally truncate the result to the top N rows. Sort is pipeline-internal (see /docs/concepts/pipeline-orchestration): it consumes the output of the predecessor node and emits the same columns in a new order. A common use is ordering a freshly scraped list by completeness descending and keeping the top 200 before handing the result to an email-finder or cold-outreach step. Sort never adds, removes, or rewrites columns; dropping the rows beyond top_n is the only content change it can make.
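A minimal in-memory sketch of the operation follows. It assumes rows arrive as a list of Python dicts keyed by column name and that the sort_by column holds comparable values such as numbers. The function name sort_rows and the tie rule are illustrative assumptions, not the runner's documented behaviour. Python's sort is stable, so tied rows keep their input order here.

```python
from typing import Any, Optional

Row = dict[str, Any]


def sort_rows(rows: list[Row], sort_by: str, direction: str = "desc",
              top_n: Optional[int] = None) -> list[Row]:
    """Hypothetical sketch: reorder rows by one column, then optionally truncate.

    Columns are never touched; only the order and (with top_n) the count change.
    """
    # sorted() is stable even with reverse=True, so equal keys keep input order.
    ordered = sorted(rows, key=lambda row: row[sort_by], reverse=(direction == "desc"))
    # top_n=None keeps every row; otherwise keep the first top_n after sorting.
    return ordered if top_n is None else ordered[:top_n]


if __name__ == "__main__":
    scraped = [
        {"name": "A", "completeness": 0.4},
        {"name": "B", "completeness": 0.9},
        {"name": "C", "completeness": 0.7},
    ]
    print([r["name"] for r in sort_rows(scraped, "completeness", "desc", top_n=2)])
    # prints ['B', 'C']
```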
Inputs
Configuration is attached to the pipeline node. There is no standalone request body.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| sort_by | string (enum) | yes | completeness | Column to order by. |
| direction | "asc" \| "desc" | yes | desc | Sort direction. |
| top_n | integer \| null | no | null | Keep only the first top_n rows after sorting. Minimum 1. |
Accepted values for sort_by:
| Value | Meaning |
|---|---|
| completeness | Aggregate fill rate of a row's enrichment fields. |
| note | Star rating of the establishment (Google Maps). |
| nb_avis | Number of reviews of the establishment. |
| email_quality | Quality score of the extracted email (personal addresses rank above role addresses). |
top_n must be a positive integer or null. A value of null keeps every row.
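These constraints can be checked before a node is submitted. The sketch below assumes the node config is a plain dict. validate_sort_config is a hypothetical helper, and its sort_by values are copied from the table above. It treats sort_by and direction as required, as the table marks them, rather than filling in the listed defaults. Booleans are rejected for top_n because Python's bool is a subclass of int.

```python
from typing import Any

# Accepted sort_by values, as listed in the table above (tuples avoid hashing odd inputs).
SORT_BY_VALUES = ("completeness", "note", "nb_avis", "email_quality")
DIRECTIONS = ("asc", "desc")


def validate_sort_config(config: dict[str, Any]) -> list[str]:
    """Hypothetical client-side check; returns a list of problems (empty if valid)."""
    problems: list[str] = []
    if config.get("sort_by") not in SORT_BY_VALUES:
        problems.append(f"sort_by must be one of {', '.join(SORT_BY_VALUES)}")
    if config.get("direction") not in DIRECTIONS:
        problems.append('direction must be "asc" or "desc"')
    top_n = config.get("top_n")  # a missing key behaves like null
    # bool is a subclass of int in Python, so exclude it explicitly.
    if top_n is not None and (isinstance(top_n, bool) or not isinstance(top_n, int) or top_n < 1):
        problems.append("top_n must be a positive integer or null")
    return problems
```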
Outputs
Same schema as the predecessor node — sort is declared passthrough in the pipeline graph. The downstream node sees the same columns it would have seen without the sort step, only in a different order and possibly with fewer rows.
| Property | Behaviour |
|---|---|
| Columns | Identical to input, byte-for-byte. |
| Row order | Determined by sort_by and direction. |
| Row count | min(input_count, top_n) if top_n is set, otherwise equal to the input. |
| Pipeline type | Same as the upstream source (resolved transitively across sort/filter). |
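The column, order, and row-count guarantees above can be expressed as a check that a test harness might run against a finished node. This sketch assumes both sides are available as a header list plus a list of row tuples. check_passthrough_output is a hypothetical name. It confirms that every output row came from the input and that the output is ordered, but it does not verify that the rows dropped by top_n were the lowest-ranked ones.

```python
from collections import Counter
from typing import Optional, Sequence


def _require(condition: bool, message: str) -> None:
    # Explicit raise instead of assert, so the check survives `python -O`.
    if not condition:
        raise ValueError(message)


def check_passthrough_output(
    in_header: Sequence[str], in_rows: Sequence[tuple],
    out_header: Sequence[str], out_rows: Sequence[tuple],
    sort_by: str, direction: str, top_n: Optional[int],
) -> None:
    """Hypothetical contract check for a finished sort node; raises ValueError on violation."""
    # Columns: identical to the input, same names in the same order.
    _require(list(out_header) == list(in_header), "columns changed")
    # Row count: min(input_count, top_n) when top_n is set, otherwise unchanged.
    expected = len(in_rows) if top_n is None else min(len(in_rows), top_n)
    _require(len(out_rows) == expected, f"expected {expected} rows, got {len(out_rows)}")
    # Content: truncation only drops rows, so every output row must exist in the input.
    _require(not (Counter(out_rows) - Counter(in_rows)), "output contains rows not in the input")
    # Order: the sort_by column must be monotonic in the requested direction.
    col = list(in_header).index(sort_by)
    keys = [row[col] for row in out_rows]
    pairs = list(zip(keys, keys[1:]))
    if direction == "desc":
        in_order = all(a >= b for a, b in pairs)
    else:
        in_order = all(a <= b for a, b in pairs)
    _require(in_order, f"rows are not in {direction} order of {sort_by}")
```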
Lifecycle
Standard job lifecycle — see /docs/concepts/jobs-lifecycle. A sort step is created automatically by the pipeline runner when the predecessor reaches done, and is treated as a structural pipeline step rather than a billable action.
Pipeline
| Property | Value |
|---|---|
| Category | process |
| Pipelinable | yes |
| Needs | none (accepts any input) |
| Produces | none (passthrough output) |
| Pipeline input type | any_pois |
| Pipeline output type | passthrough |
Because the output type is passthrough, the effective downstream type is inherited from the closest non-passthrough predecessor. For example, a sort placed after scrap exposes the same downstream contract that scrap would expose on its own.
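Resolving the effective type amounts to walking back through predecessors until a node with a concrete output type is found. This is a sketch that assumes each node records a single predecessor and its declared output type. The Node class is hypothetical, and so is the "pois" type used for scrap, because this page does not state scrap's output type. Filter is modelled as passthrough, following the Outputs table ("resolved transitively across sort/filter").

```python
from dataclasses import dataclass
from typing import Optional

PASSTHROUGH = "passthrough"


@dataclass
class Node:
    name: str
    output_type: str                 # declared pipeline output type
    parent: Optional["Node"] = None  # single predecessor, or None for a source


def effective_output_type(node: Node) -> str:
    """Follow predecessors until a node declares a concrete (non-passthrough) type."""
    current: Optional[Node] = node
    while current is not None and current.output_type == PASSTHROUGH:
        current = current.parent
    if current is None:
        raise ValueError(f"{node.name}: no non-passthrough predecessor found")
    return current.output_type


if __name__ == "__main__":
    scrap = Node("scrap", "pois")  # hypothetical concrete type for scrap
    filt = Node("filter", PASSTHROUGH, parent=scrap)
    sort_node = Node("sort", PASSTHROUGH, parent=filt)
    print(effective_output_type(sort_node))  # prints pois
```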
Endpoints
Sort has no public REST endpoint. It is pipeline-internal (see /docs/concepts/pipeline-orchestration) and is created only by the pipeline runner when the predecessor node reaches done, through the internal helper create_pipeline_internal_job(job_type="sort", …).
To use sort, define it as a node inside a pipeline created via the pipelines API:
| Method | Path | Purpose |
|---|---|---|
| POST | /api/pipelines | Create a pipeline containing a sort node. |
| GET | /api/pipelines/{id} | Inspect node configuration and node statuses. |
A node entry for sort looks like:
```json
{
  "type": "sort",
  "config": {
    "sort_by": "completeness",
    "direction": "desc",
    "top_n": 200
  }
}
```
The runner reads config at execution time and emits the sorted CSV into the node's job directory.
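The following sketch shows what that execution step might look like. It assumes the predecessor's output is CSV text with a header row and that the sort_by column holds numeric text. The function name sort_csv and the rule for blank cells (they sort last in either direction) are assumptions; this page does not say how the runner treats missing values. Parsing to float matters because string comparison would put "10" before "9".

```python
import csv
import io
from typing import Optional


def sort_csv(text: str, sort_by: str, direction: str = "desc",
             top_n: Optional[int] = None) -> str:
    """Hypothetical sketch: sort CSV text by a numeric column, keeping the same header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return text  # empty input: emit an empty output, as the Errors table describes
    col = header.index(sort_by)  # raises ValueError for an unknown column
    rows = list(reader)
    # Parse as numbers: compared as strings, "10" would sort before "9".
    present = [r for r in rows if r[col].strip()]
    missing = [r for r in rows if not r[col].strip()]
    present.sort(key=lambda r: float(r[col]), reverse=(direction == "desc"))
    ordered = present + missing  # assumption: blank cells always sort last
    if top_n is not None:
        ordered = ordered[:top_n]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)  # same columns, same order
    writer.writerows(ordered)
    return out.getvalue()
```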
Limits
Global limits — see /docs/concepts/limits. Sort is not billed (quota cost 0), runs in the standard parallel job pool, and its maximum input size is bounded by the predecessor's output, not by sort itself.
Errors
| Condition | Result |
|---|---|
| sort_by references an unknown column | Node transitions to failed; downstream nodes stay pending. |
| direction is not asc or desc | Node transitions to failed with a validation error. |
| top_n is 0 or negative | Rejected at pipeline creation; the API returns 400. |
| Empty input | Node completes as done with an empty output CSV. |
| Predecessor did not reach done | Sort stays pending and is not scheduled until the parent reaches done. |