Import
Purpose
The import module brings external data into outsend as a pipeline source. It is pipeline-internal — see /docs/concepts/pipeline-orchestration — and produces a normalized POI list that downstream enrichment, verification, or processing nodes can consume. Unlike scrap, import consumes no extraction quota (EF cost is zero).
Inputs
The node config exposes a single discriminator, source, with three mutually exclusive modes.
| Field | Type | Required | Description |
|---|---|---|---|
| source | "paste" \| "url" \| "from_job" | no (defaults to paste) | Selects which of the three input channels below applies. Defaults to paste when omitted. |
| text | string | when source = paste | Raw CSV content. Read only in paste mode. |
| url | string | when source = url | Public spreadsheet URL. Read only in url mode. |
| from_job_id | string | when source = from_job | UUID of an existing scrap job owned by the caller. Read only in from_job mode. |
paste — inline CSV
The text payload is parsed as CSV by the shared resolution layer (app/column_map.py). The delimiter is auto-detected (comma, semicolon, or tab). UTF-8 is expected; UTF-8 with BOM and Latin-1 / cp1252 are accepted as fallbacks. Headers are optional. When a header row is present, column names are matched flexibly against accepted aliases (for example, a Website, url, e-mail, or raison sociale header maps to the right canonical column). When it is absent, the sheet is detected as header-less and its columns are inferred from their content. In both cases the import emits a notice (an info banner on the job page, ⓘ on the dashboard) reporting what was auto-mapped, inferred, or ignored, so the mapping is never silent.
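The decoding and header rules above can be sketched with the Python standard library. This is a minimal illustration, not the code in app/column_map.py: the alias table (ALIASES) and the helper names decode_csv_bytes and parse_csv are hypothetical, and content-based inference for header-less sheets is reduced to positional placeholder names.

```python
import csv
import io

# Hypothetical alias table for illustration only; the real mapping in
# app/column_map.py is not reproduced here. Keys are lowercased headers.
ALIASES = {
    "nom": "nom",
    "raison sociale": "nom",
    "site_web": "site_web",
    "website": "site_web",
    "url": "site_web",
    "email": "email",
    "e-mail": "email",
}


def decode_csv_bytes(raw: bytes) -> str:
    """Decode in the documented order: UTF-8 (BOM optional), cp1252, Latin-1."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this last fallback cannot fail.
    return raw.decode("latin-1")


def parse_csv(text: str) -> tuple[list[str], list[list[str]], str]:
    """Return (column names, data rows, delimiter) for a comma/semicolon/tab CSV."""
    try:
        # Only the three documented delimiters are considered.
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=",;\t").delimiter
    except csv.Error:
        delimiter = ","  # e.g. a single-column sheet contains no delimiter at all
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows:
        return [], [], delimiter
    first = [cell.strip().lower() for cell in rows[0]]
    if any(name in ALIASES for name in first):
        # Header row found: map known aliases, keep unknown names unchanged.
        return [ALIASES.get(name, name) for name in first], rows[1:], delimiter
    # No recognizable header: treat every row as data. Real content-based
    # inference is out of scope here, so columns get placeholder names.
    return [f"col_{i}" for i in range(len(rows[0]))], rows, delimiter
```

Trying utf-8-sig first covers both plain UTF-8 and UTF-8 with BOM, because Python's utf-8-sig codec strips a leading BOM only when one is present. Latin-1 comes last since it accepts every byte sequence and therefore can never signal a wrong guess.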
url — public spreadsheet
The url payload points to a publicly readable spreadsheet (typical shape: https://docs.google.com/spreadsheets/d/.../edit#gid=0). The sheet must be shared as "anyone with the link can view" — outsend does not authenticate to third-party providers. The fetched content is parsed with the same CSV rules as paste.
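This page does not say how outsend fetches the sheet. One common approach for Google Sheets, shown here purely as an assumption, is to rewrite the .../edit#gid=N link into the widely used export form /export?format=csv&gid=N, which only returns CSV when the sheet is publicly viewable. The function name to_csv_export_url is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches the spreadsheet ID in paths like /spreadsheets/d/<ID>/edit
_SHEET_PATH = re.compile(r"^/spreadsheets/d/([A-Za-z0-9_-]+)")


def to_csv_export_url(sheet_url: str) -> str:
    """Rewrite a Google Sheets link to its CSV export form; leave other URLs as-is."""
    parsed = urlparse(sheet_url)
    if parsed.netloc != "docs.google.com":
        return sheet_url  # assumed to already serve CSV
    match = _SHEET_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"not a spreadsheet URL: {sheet_url}")
    # The tab id usually sits in the fragment (#gid=0), sometimes in the query.
    gid = (parse_qs(parsed.fragment).get("gid")
           or parse_qs(parsed.query).get("gid")
           or ["0"])[0]
    return (f"https://docs.google.com/spreadsheets/d/{match.group(1)}"
            f"/export?format=csv&gid={gid}")
```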
from_job — recent scrap reuse
The from_job_id payload references a previous scrap job. The reference is validated server-side when the pipeline is created:
| Constraint | Rule |
|---|---|
| Existence | The job ID must resolve to an existing job. |
| Ownership | The caller must own the source job. |
| Job type | Must be scrap. Other job types cannot be re-imported through this channel. |
| Availability | The source CSV must still be downloadable (is_download_available). |
| Recency | The source job must be less than 7 days old. |
When valid, the resulting import inherits all columns produced by the source scrap.
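The creation-time checks can be sketched as a single validation function. The Job record, the ImportConfigError exception, and the jobs lookup dictionary are hypothetical stand-ins for outsend's storage layer. In this sketch the checks run in the order of the constraints table, and the message strings are the ones listed under Errors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_SOURCE_AGE = timedelta(days=7)


@dataclass
class Job:
    """Hypothetical minimal view of a stored job."""
    id: str
    owner_id: str
    job_type: str
    is_download_available: bool
    created_at: datetime  # timezone-aware


class ImportConfigError(ValueError):
    """Hypothetical error raised at pipeline creation."""


def validate_import_config(config: dict, user_id: str, jobs: dict[str, Job],
                           now: Optional[datetime] = None) -> None:
    now = now or datetime.now(timezone.utc)
    source = config.get("source") or "paste"  # omitted source defaults to paste
    if source not in ("paste", "url", "from_job"):
        raise ImportConfigError(
            f"Source d'import invalide : {source} (attendu: paste | url | from_job)")
    if source != "from_job":
        return  # this sketch only validates from_job references at creation
    job_id = config.get("from_job_id") or ""
    if not job_id:
        raise ImportConfigError("Source 'from_job' : aucun job sélectionné")
    job = jobs.get(job_id)
    if job is None:  # existence
        raise ImportConfigError(f"Job source introuvable : {job_id}")
    if job.owner_id != user_id:  # ownership
        raise ImportConfigError("Job source non autorisé pour cet utilisateur")
    if job.job_type != "scrap":  # job type
        raise ImportConfigError("Seuls les scraps Gmaps peuvent être importés via 'from_job'")
    if not job.is_download_available:  # availability
        raise ImportConfigError("Le CSV du job source n'est pas (ou plus) disponible")
    if now - job.created_at >= MAX_SOURCE_AGE:  # recency: must be under 7 days old
        raise ImportConfigError(
            "Le job source a plus de 7 jours — relancez un scrap ou collez le CSV.")
```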
Outputs
import produces a normalized POI list, declared in the pipeline registry as output: "pois", the same shape scrap emits. Downstream nodes that accept pois_any (reviews, emails, socials, dead_check, techstack, ads_intelligence, brand_assets) chain directly. Nodes that require pois_email (verify) chain only if the imported CSV already carries an email column.
The column set is dynamic: it mirrors whatever the source provides. The registry declares needs: [] and produces: [] for this reason — the module is permissive on input and propagates the input schema as output.
Lifecycle
Standard job lifecycle — see /docs/concepts/jobs-lifecycle. The job is linked to its pipeline via pipeline_id and pipeline_node_id and runs as soon as the pipeline transitions to running.
Pipeline
import is a root node. It accepts no upstream edges. Any node whose input is pois_any, any_pois, or pois_email (when the CSV carries emails) can be wired downstream.
| Direction | Compatible types |
|---|---|
| Upstream | none — import is a ROOT_TYPE alongside scrap |
| Downstream | reviews, emails, verify (with email column), socials, dead_check, techstack, ads_intelligence, brand_assets, filter, sort |
Registry: needs: [], produces: [].
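The wiring rules can be expressed as a small predicate. The node groupings are taken from the tables on this page; the function name can_chain_after_import and the assumption that the canonical email column is literally named email are hypothetical. filter and sort appear as downstream nodes without a stated input type, so the sketch keeps them in their own set.

```python
# Groupings taken from this page; filter and sort are listed as downstream
# nodes without a declared input type, so they are kept in their own set.
POIS_ANY_NODES = {"reviews", "emails", "socials", "dead_check", "techstack",
                  "ads_intelligence", "brand_assets"}
UNTYPED_DOWNSTREAM = {"filter", "sort"}
POIS_EMAIL_NODES = {"verify"}
ROOT_TYPES = {"import", "scrap"}


def can_chain_after_import(node_type: str, columns: set[str]) -> bool:
    """Hypothetical check: may node_type be wired directly downstream of import?"""
    if node_type in ROOT_TYPES:
        return False  # root nodes accept no upstream edge
    if node_type in POIS_ANY_NODES or node_type in UNTYPED_DOWNSTREAM:
        return True
    if node_type in POIS_EMAIL_NODES:
        # Assumes the canonical email column is literally named "email".
        return "email" in columns
    return False
```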
Endpoints
The import module is not exposed as a standalone job endpoint — it is pipeline-internal (see /docs/concepts/pipeline-orchestration) and created only as a pipeline root. Two adjacent endpoints are useful when assembling an import:
| Method | Path | Purpose |
|---|---|---|
| POST | /api/jobs/parse-list | Validates CSV input before submission. Accepts either {"text": "..."} JSON or a multipart/form-data upload with a file field. Returns {count, with_lien_google_maps, with_site_web, sample, items, delimiter}. |
| GET | /api/jobs/{job_id}/items | Returns the CSV rows of a finished scrap job in a structure suitable for from_job reuse. |
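A parse-list dry run with a JSON body can be built with the Python standard library. The base URL is a placeholder and authentication is omitted because this page does not describe it; only the method, path, and {"text": ...} body shape come from the table above. The helper names are hypothetical.

```python
import json
import urllib.request


def build_parse_list_request(base_url: str, csv_text: str) -> urllib.request.Request:
    """Build a POST /api/jobs/parse-list request with a JSON {"text": ...} body."""
    body = json.dumps({"text": csv_text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/jobs/parse-list",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON summary (count, delimiter, sample, ...)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

On success, send returns the parsed summary; the HTTP 400 cases listed under Errors surface as urllib.error.HTTPError, whose body carries the message.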
The pipeline node payload itself follows this shape:
```json
{
  "type": "import",
  "config": {
    "source": "paste",
    "text": "nom,site_web\n...",
    "url": "",
    "from_job_id": ""
  }
}
```
Exactly one of text, url, from_job_id is read, determined by source. The unused fields are persisted as empty strings.
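The one-field-per-source rule can be captured in a small helper that always emits all three fields, mirroring the payload shape above. The helper name import_node is hypothetical; the field names and the empty-string convention come from this page.

```python
# Which config field each source reads, per the payload description above.
SOURCE_FIELD = {"paste": "text", "url": "url", "from_job": "from_job_id"}


def import_node(source: str, value: str) -> dict:
    """Build an import node payload; unused fields are set to empty strings."""
    if source not in SOURCE_FIELD:
        raise ValueError(f"unknown import source: {source}")
    config = {"source": source, "text": "", "url": "", "from_job_id": ""}
    config[SOURCE_FIELD[source]] = value
    return {"type": "import", "config": config}
```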
Limits
Global limits — see /docs/concepts/limits. Module-specific:
| Limit | Value |
|---|---|
| from_job recency | 7 days. The source job is rejected past that window. |
| from_job source type | scrap only. |
| Supported formats | CSV with comma, semicolon, or tab delimiter. Encodings: UTF-8 (preferred), UTF-8 with BOM, Latin-1 / cp1252 (fallback). Headers optional — a header-less sheet is auto-detected and its columns inferred from content. |
Errors
| Condition | Surface | Message shape |
|---|---|---|
| source not in {paste, url, from_job} | Pipeline creation | Source d'import invalide : <value> (attendu: paste \| url \| from_job) |
| from_job without from_job_id | Pipeline creation | Source 'from_job' : aucun job sélectionné |
| from_job_id unknown | Pipeline creation | Job source introuvable : <id> |
| from_job source not owned by caller | Pipeline creation | Job source non autorisé pour cet utilisateur |
| from_job source not a scrap | Pipeline creation | Seuls les scraps Gmaps peuvent être importés via 'from_job' |
| from_job source CSV unavailable | Pipeline creation | Le CSV du job source n'est pas (ou plus) disponible |
| from_job source older than 7 days | Pipeline creation | Le job source a plus de 7 jours — relancez un scrap ou collez le CSV. |
| Empty paste payload | parse-list | HTTP 400, Aucun texte fourni |
| CSV parse failure | parse-list | HTTP 400, CSV invalide: <detail> |
| Zero parsed rows | parse-list | HTTP 400, Aucune ligne lue dans le CSV |
| Multipart upload missing file | parse-list | HTTP 400, Aucun fichier fourni |
| URL unreachable or non-CSV response | Pipeline execution | The import job transitions to failed; the message names the unreachable source. |
| Private spreadsheet (login page returned instead of CSV) | Pipeline execution | The import fails loudly with an explanation instead of silently succeeding — the content was HTML (a sign-in page), not CSV. Share the sheet as "anyone with the link can view". |
| Empty, header-only, or nothing exploitable | Pipeline execution / parse-list | The import fails with an explanation (no usable rows) rather than reporting a misleading success. |
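The two execution-time failures that replace a silent success (an HTML sign-in page instead of CSV, and content with no usable rows) can be guarded as follows. This is a hypothetical sketch: ImportFailed and check_fetched_content are illustrative names, the messages are placeholder wording, and it assumes an already-decoded body with a header row. Under that assumption a one-line header-less file is also rejected, which the real importer's header-less detection may handle differently.

```python
import csv
import io


class ImportFailed(Exception):
    """Raised instead of reporting a misleading success."""


def check_fetched_content(body: str, delimiter: str = ",") -> list[list[str]]:
    head = body.lstrip()[:20].lower()
    # A private Google Sheet answers with an HTML sign-in page, not CSV.
    if head.startswith(("<!doctype html", "<html")):
        raise ImportFailed(
            "Received HTML (likely a sign-in page) instead of CSV; "
            "share the sheet as 'anyone with the link can view'."
        )
    reader = csv.reader(io.StringIO(body), delimiter=delimiter)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    # Assuming a header row: fewer than two non-blank rows means no data.
    if len(rows) < 2:
        raise ImportFailed("No usable rows: the content is empty or header-only.")
    return rows
```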
What's next
| Module | Use it to |
|---|---|
| filter | Narrow the imported list by column predicates before paying for downstream enrichment. |
| sort | Order the imported list — useful when combined with row limits in later steps. |