Modules

Import

Purpose

The import module brings external data into outsend as a pipeline source. It is pipeline-internal — see /docs/concepts/pipeline-orchestration — and produces a normalized POI list that downstream enrichment, verification, or processing nodes can consume. Unlike scrap, import consumes no extraction quota (EF cost is zero).

Inputs

The node config exposes a single discriminator, source, with three mutually exclusive modes.

Field	Type	Required	Description
`source`	`"paste"` | `"url"` | `"from_job"`	yes	Selects which of the three input channels below applies. Defaults to `paste` when omitted.
`text`	string	when `source = paste`	Raw CSV content. Read only in `paste` mode.
`url`	string	when `source = url`	Public spreadsheet URL. Read only in `url` mode.
`from_job_id`	string	when `source = from_job`	UUID of an existing scrap job owned by the caller. Read only in `from_job` mode.

`paste` — inline CSV

The text payload is parsed as CSV by the shared resolution layer (app/column_map.py). The delimiter is auto-detected (comma, semicolon, or tab); UTF-8 is expected, with UTF-8 BOM and Latin-1 / cp1252 accepted as fallbacks. Headers are not mandatory: column names are matched flexibly against accepted aliases (a Website, url, e-mail or raison sociale header maps to the right canonical column), and a header-less sheet is auto-detected — its columns are then inferred from their content. Either way the import emits a notice (info banner on the job page, ⓘ on the dashboard) reporting what was auto-mapped, inferred, or ignored, so the mapping is never silent.
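The decoding and delimiter rules above can be sketched as follows. This is a minimal illustration, not the contents of app/column_map.py: the function names, the order in which encodings are tried, and the "most frequent candidate in the first line" heuristic are all assumptions made for the example.

```python
import csv
import io

# Encodings tried in order. The order is an assumption based on the prose above.
FALLBACK_ENCODINGS = ("utf-8-sig", "cp1252")
CANDIDATE_DELIMITERS = (",", ";", "\t")


def decode_payload(raw: bytes) -> str:
    """Decode as UTF-8 (stripping a BOM if present), then fall back to cp1252."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this last resort never raises.
    return raw.decode("latin-1")


def detect_delimiter(text: str) -> str:
    """Pick the candidate that occurs most often in the first non-empty line."""
    first_line = next((line for line in text.splitlines() if line.strip()), "")
    counts = {d: first_line.count(d) for d in CANDIDATE_DELIMITERS}
    best = max(counts, key=counts.get)
    # Single-column input contains no delimiter at all; default to comma.
    return best if counts[best] > 0 else ","


def parse_rows(raw: bytes) -> list[list[str]]:
    """Decode the payload, detect its delimiter, and return non-empty rows."""
    text = decode_payload(raw)
    reader = csv.reader(io.StringIO(text), delimiter=detect_delimiter(text))
    return [row for row in reader if any(cell.strip() for cell in row)]
```

Counting delimiters on the first line only is a deliberate simplification; a production parser would likely sample several lines and account for quoted fields.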

`url` — public spreadsheet

The url payload points to a publicly readable spreadsheet (typical shape: https://docs.google.com/spreadsheets/d/.../edit#gid=0). The sheet must be shared as "anyone with the link can view" — outsend does not authenticate to third-party providers. The fetched content is parsed with the same CSV rules as paste.
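A public Google Sheets "edit" link does not return CSV by itself; one common way to obtain CSV from such a link is Google's `/export?format=csv` form. The sketch below shows that rewrite. Whether outsend performs this exact transformation is an assumption, and `sheets_csv_export_url` is a hypothetical helper name.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Matches the spreadsheet ID segment of a Google Sheets path.
SHEET_PATH_RE = re.compile(r"^/spreadsheets/d/([A-Za-z0-9_-]+)")


def sheets_csv_export_url(url: str) -> Optional[str]:
    """Return the CSV export URL for a Google Sheets link, or None if not one."""
    parsed = urlparse(url)
    if parsed.netloc != "docs.google.com":
        return None
    match = SHEET_PATH_RE.match(parsed.path)
    if match is None:
        return None
    # The tab identifier may sit in the fragment (#gid=0) or the query (?gid=0).
    gid = "0"
    for part in (parsed.fragment, parsed.query):
        values = parse_qs(part).get("gid")
        if values:
            gid = values[0]
            break
    sheet_id = match.group(1)
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={gid}"
```

The export URL only returns CSV when the sheet is shared publicly; otherwise the response is a sign-in page.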

`from_job` — recent scrap reuse

The from_job_id payload references a previous scrap job. The reference is validated server-side at job creation:

Constraint	Rule
Existence	The job ID must resolve to an existing job.
Ownership	The caller must own the source job.
Job type	Must be `scrap`. Other job types cannot be re-imported through this channel.
Availability	The source CSV must still be downloadable (`is_download_available`).
Recency	The source job must be less than 7 days old.

When valid, the resulting import inherits all columns produced by the source scrap.
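The five constraints in the table can be read as an ordered series of checks. The sketch below is illustrative only: `SourceJob` and its field names are hypothetical, the check order is an assumption, and measuring age from `created_at` is an assumption (the source does not say which timestamp is used).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=7)


@dataclass
class SourceJob:
    """Hypothetical view of a stored job; field names are assumptions."""
    id: str
    owner_id: str
    job_type: str
    is_download_available: bool
    created_at: datetime


def first_failed_rule(
    job: Optional[SourceJob], caller_id: str, now: datetime
) -> Optional[str]:
    """Return the name of the first violated constraint, or None when valid."""
    if job is None:
        return "existence"
    if job.owner_id != caller_id:
        return "ownership"
    if job.job_type != "scrap":
        return "job_type"
    if not job.is_download_available:
        return "availability"
    # "Less than 7 days old": a job that is exactly 7 days old is rejected.
    if now - job.created_at >= MAX_AGE:
        return "recency"
    return None
```

For example, calling `first_failed_rule(job, caller_id, datetime.now(timezone.utc))` at pipeline creation yields the constraint to report, if any.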

Outputs

import produces a normalized POI list, declared in the pipeline registry as output: "pois", the same shape scrap emits. Downstream nodes that accept pois_any (reviews, emails, socials, dead_check, techstack, ads_intelligence, brand_assets) chain directly. Nodes that require pois_email (verify) chain only if the imported CSV already carries an email column.

The column set is dynamic: it mirrors whatever the source provides. The registry declares needs: [] and produces: [] for this reason — the module is permissive on input and propagates the input schema as output.

Lifecycle

Standard job lifecycle — see /docs/concepts/jobs-lifecycle. The job is linked to its pipeline via pipeline_id and pipeline_node_id and runs as soon as the pipeline transitions to running.

Pipeline

import is a root node. It accepts no upstream edges. Any node whose input is pois_any, any_pois, or pois_email (when the CSV carries emails) can be wired downstream.

Direction	Compatible types
Upstream	none — `import` is a `ROOT_TYPE` alongside `scrap`
Downstream	`reviews`, `emails`, `verify` (with email column), `socials`, `dead_check`, `techstack`, `ads_intelligence`, `brand_assets`, `filter`, `sort`

Registry: needs: [], produces: [].
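The wiring rules in this section reduce to a small predicate. The following is a sketch under stated assumptions: `edge_allowed` is a hypothetical helper, and the canonical email column name `email` is assumed, since the source only says "an email column".

```python
# Root node types accept no upstream edges (from the table above).
ROOT_TYPES = {"import", "scrap"}
# Input types that accept any POI list regardless of its columns.
UNCONDITIONAL_INPUTS = {"pois_any", "any_pois"}
# Assumption: the canonical name of the email column after alias resolution.
EMAIL_COLUMN = "email"


def edge_allowed(target_type: str, target_input: str, columns: set[str]) -> bool:
    """Decide whether an import node may feed a node of the given type."""
    if target_type in ROOT_TYPES:
        # A root can never be placed downstream of another node.
        return False
    if target_input in UNCONDITIONAL_INPUTS:
        return True
    if target_input == "pois_email":
        # verify-style nodes chain only when the CSV carries emails.
        return EMAIL_COLUMN in columns
    return False
```

Because the column set is dynamic, the `pois_email` case can only be decided once the imported columns are known, which is why the predicate takes them as an argument.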

Endpoints

The import module is not exposed as a standalone job endpoint — it is pipeline-internal (see /docs/concepts/pipeline-orchestration) and created only as a pipeline root. Two adjacent endpoints are useful when assembling an import:

Method	Path	Purpose
`POST`	`/api/jobs/parse-list`	Validates CSV input before submission. Accepts either `{"text": "..."}` JSON or a `multipart/form-data` upload with a `file` field. Returns `{count, with_lien_google_maps, with_site_web, sample, items, delimiter}`.
`GET`	`/api/jobs/{job_id}/items`	Returns the CSV rows of a finished `scrap` job in a structure suitable for `from_job` reuse.
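A pre-submission check against `parse-list` might look like the sketch below, using only the Python standard library. `BASE_URL` is a placeholder, authentication is omitted because the source does not describe it, and `build_parse_list_request` and `summarize` are hypothetical helper names.

```python
import json
import urllib.request

# Placeholder host; replace with the actual outsend base URL.
BASE_URL = "https://outsend.example"


def build_parse_list_request(csv_text: str) -> urllib.request.Request:
    """Build the JSON variant of the parse-list call (the {"text": "..."} body)."""
    body = json.dumps({"text": csv_text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/jobs/parse-list",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response_body: bytes) -> str:
    """Condense a parse-list response using the documented count and delimiter keys."""
    payload = json.loads(response_body)
    return f"{payload['count']} rows, delimiter {payload['delimiter']!r}"
```

To send the request, pass it to `urllib.request.urlopen` and feed the response bytes to `summarize`; an HTTP 400 surfaces as `urllib.error.HTTPError`, whose body carries one of the messages listed under Errors.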

The pipeline node payload itself follows this shape:

```json
{
  "type": "import",
  "config": {
    "source": "paste",
    "text": "nom,site_web\n...",
    "url": "",
    "from_job_id": ""
  }
}
```

Exactly one of text, url, from_job_id is read, determined by source. The unused fields are persisted as empty strings.
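A small builder makes the "exactly one field is read" rule explicit. This is a sketch, not part of outsend: `build_import_node` is a hypothetical helper, and rejecting an empty value client-side is an assumption (the documented server-side check covers `from_job` without `from_job_id`).

```python
# Maps each source mode to the single config field it reads.
SOURCE_FIELDS = {"paste": "text", "url": "url", "from_job": "from_job_id"}


def build_import_node(source: str, value: str) -> dict:
    """Return an import node payload with the unused fields set to ""."""
    if source not in SOURCE_FIELDS:
        raise ValueError(f"invalid import source: {source!r}")
    if not value:
        # Assumption: fail early rather than submit an empty payload.
        raise ValueError(f"source {source!r} requires a non-empty value")
    config = {"source": source, "text": "", "url": "", "from_job_id": ""}
    config[SOURCE_FIELDS[source]] = value
    return {"type": "import", "config": config}
```

For instance, `build_import_node("paste", "nom,site_web\n...")` reproduces the payload shown above.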

Limits

Global limits — see /docs/concepts/limits. Module-specific:

Limit	Value
`from_job` recency	7 days. The source job is rejected past that window.
`from_job` source type	`scrap` only.
Supported formats	CSV with comma, semicolon, or tab delimiter. Encodings: UTF-8 (preferred), UTF-8 with BOM, Latin-1 / cp1252 (fallback). Headers optional — a header-less sheet is auto-detected and its columns inferred from content.

Errors

Condition	Surface	Message shape
`source` not in `{paste, url, from_job}`	Pipeline creation	`Source d'import invalide : <value> (attendu: paste | url | from_job)`
`from_job` without `from_job_id`	Pipeline creation	`Source 'from_job' : aucun job sélectionné`
`from_job_id` unknown	Pipeline creation	`Job source introuvable : <id>`
`from_job` source not owned by caller	Pipeline creation	`Job source non autorisé pour cet utilisateur`
`from_job` source not a scrap	Pipeline creation	`Seuls les scraps Gmaps peuvent être importés via 'from_job'`
`from_job` source CSV unavailable	Pipeline creation	`Le CSV du job source n'est pas (ou plus) disponible`
`from_job` source older than 7 days	Pipeline creation	`Le job source a plus de 7 jours — relancez un scrap ou collez le CSV.`
Empty paste payload	`parse-list`	HTTP 400, `Aucun texte fourni`
CSV parse failure	`parse-list`	HTTP 400, `CSV invalide: <detail>`
Zero parsed rows	`parse-list`	HTTP 400, `Aucune ligne lue dans le CSV`
Multipart upload missing file	`parse-list`	HTTP 400, `Aucun fichier fourni`
URL unreachable or non-CSV response	Pipeline execution	The import job transitions to `failed`; the message names the unreachable source.
Private spreadsheet (login page returned instead of CSV)	Pipeline execution	The import fails loudly with an explanation instead of silently succeeding — the content was HTML (a sign-in page), not CSV. Share the sheet as "anyone with the link can view".
Empty, header-only, or otherwise unusable input	Pipeline execution / `parse-list`	The import fails with an explanation (no usable rows) rather than reporting a misleading success.
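The private-spreadsheet case hinges on telling an HTML sign-in page apart from CSV. One simple way to do that is to inspect the start of the fetched content, as sketched below. The markers checked here are an assumption for illustration; the actual detection logic is not described in this page, and `looks_like_html` is a hypothetical name.

```python
# Prefixes that indicate an HTML document rather than CSV (assumed heuristic).
HTML_MARKERS = ("<!doctype html", "<html")


def looks_like_html(content: str) -> bool:
    """Return True when the content starts like an HTML page."""
    # Skip a leading BOM and whitespace before inspecting the first characters.
    head = content.lstrip("\ufeff \t\r\n")[:64].lower()
    return head.startswith(HTML_MARKERS)
```

An import step could call this right after fetching the URL and fail with an explicit message instead of parsing the page as a one-column CSV.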

What's next

Module	Use it to
filter	Narrow the imported list by column predicates before paying for downstream enrichment.
sort	Order the imported list — useful when combined with row limits in later steps.