Import

Purpose

The import module brings external data into outsend as a pipeline source. It is pipeline-internal — see /docs/concepts/pipeline-orchestration — and produces a normalized POI list that downstream enrichment, verification, or processing nodes can consume. Unlike scrap, import consumes no extraction quota (EF cost is zero).

Inputs

The node config exposes a single discriminator, source, with three mutually exclusive modes.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| source | "paste" \| "url" \| "from_job" | yes | Selects which of the three input channels below applies. Defaults to paste when omitted. |
| text | string | when source = paste | Raw CSV content. Read only in paste mode. |
| url | string | when source = url | Public spreadsheet URL. Read only in url mode. |
| from_job_id | string | when source = from_job | UUID of an existing scrap job owned by the caller. Read only in from_job mode. |

paste — inline CSV

The text payload is parsed as CSV by the shared resolution layer (app/column_map.py). The delimiter is auto-detected (comma, semicolon, or tab); UTF-8 is expected, with UTF-8 BOM and Latin-1 / cp1252 accepted as fallbacks. Headers are not mandatory: column names are matched flexibly against accepted aliases (a Website, url, e-mail or raison sociale header maps to the right canonical column), and a header-less sheet is auto-detected — its columns are then inferred from their content. Either way the import emits a notice (info banner on the job page, ⓘ on the dashboard) reporting what was auto-mapped, inferred, or ignored, so the mapping is never silent.
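The decoding and delimiter rules above can be sketched as follows. This is a minimal illustration using only the standard library, not the actual code of app/column_map.py: the function names, the first-line counting heuristic, and the order of the encoding attempts are all assumptions.

```python
import csv
import io

# Delimiters named in the documentation: comma, semicolon, tab.
CANDIDATE_DELIMITERS = ",;\t"


def decode_payload(raw: bytes) -> str:
    """Decode as UTF-8 (BOM stripped if present), then fall back to cp1252, then Latin-1."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this last attempt cannot fail.
    return raw.decode("latin-1")


def detect_delimiter(text: str) -> str:
    """Assumed heuristic: pick the candidate that occurs most often on the first line."""
    first_line = text.splitlines()[0] if text else ""
    counts = {d: first_line.count(d) for d in CANDIDATE_DELIMITERS}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else ","


def parse_rows(raw: bytes) -> list[list[str]]:
    """Return the non-empty rows of the payload as lists of cell strings."""
    text = decode_payload(raw)
    reader = csv.reader(io.StringIO(text), delimiter=detect_delimiter(text))
    return [row for row in reader if any(cell.strip() for cell in row)]
```

A real resolver would also need the alias matching and header-less inference described above, which this sketch leaves out.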

url — public spreadsheet

The url payload points to a publicly readable spreadsheet (typical shape: https://docs.google.com/spreadsheets/d/.../edit#gid=0). The sheet must be shared as "anyone with the link can view" — outsend does not authenticate to third-party providers. The fetched content is parsed with the same CSV rules as paste.
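For readers who want to check a sheet by hand before wiring it into a pipeline, one common way to get CSV out of a Google Sheets share link is to rewrite it to the sheet's CSV export address. The helper below is a hypothetical sketch: the documentation does not say whether outsend performs this rewrite internally, and the function name is an assumption.

```python
import re
from urllib.parse import parse_qs, urlparse


def sheet_csv_export_url(share_url: str) -> str:
    """Rewrite a Google Sheets share link into its CSV export link (illustrative helper)."""
    parsed = urlparse(share_url)
    match = re.search(r"/spreadsheets/d/([^/]+)", parsed.path)
    if parsed.netloc != "docs.google.com" or match is None:
        raise ValueError(f"not a Google Sheets URL: {share_url}")
    # The tab id can appear in the fragment (#gid=0) or the query (?gid=0).
    gid_values = (
        parse_qs(parsed.fragment).get("gid")
        or parse_qs(parsed.query).get("gid")
        or ["0"]
    )
    return (
        f"https://docs.google.com/spreadsheets/d/{match.group(1)}"
        f"/export?format=csv&gid={gid_values[0]}"
    )
```

Opening the resulting link in a private browser window is a quick way to confirm the sheet is readable without signing in.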

from_job — recent scrap reuse

The from_job_id payload references a previous scrap job. The reference is validated server-side at job creation:

| Constraint | Rule |
| --- | --- |
| Existence | The job ID must resolve to an existing job. |
| Ownership | The caller must own the source job. |
| Job type | Must be scrap. Other job types cannot be re-imported through this channel. |
| Availability | The source CSV must still be downloadable (is_download_available). |
| Recency | The source job must be less than 7 days old. |

When valid, the resulting import inherits all columns produced by the source scrap.
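The constraint checks can be sketched as a single function that reports the first rule a candidate source job violates. The SourceJob record, its field names other than is_download_available, and the order of the checks are assumptions made for illustration; the server-side implementation may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_SOURCE_AGE = timedelta(days=7)


@dataclass
class SourceJob:
    """Hypothetical view of a job record; only is_download_available is named in the docs."""
    owner_id: str
    job_type: str
    is_download_available: bool
    created_at: datetime


def first_violation(job: Optional[SourceJob], caller_id: str, now: datetime) -> Optional[str]:
    """Return the name of the first violated constraint, or None if the job can be reused."""
    if job is None:
        return "Existence"
    if job.owner_id != caller_id:
        return "Ownership"
    if job.job_type != "scrap":
        return "Job type"
    if not job.is_download_available:
        return "Availability"
    # Assumption: a job that is exactly 7 days old is already too old.
    if now - job.created_at >= MAX_SOURCE_AGE:
        return "Recency"
    return None


# Usage: a three-day-old scrap owned by the caller passes every check.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
recent = SourceJob("user-1", "scrap", True, now - timedelta(days=3))
print(first_violation(recent, "user-1", now))
```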

Outputs

import produces a normalized POI list, declared in the pipeline registry as output: "pois" — the same shape scrap emits. Downstream nodes that accept pois_any (reviews, emails, socials, dead-check, techstack, ads-intelligence, brand-assets) chain directly. Nodes that require pois_email (verify) chain only if the imported CSV already carries an email column.

The column set is dynamic: it mirrors whatever the source provides. The registry declares needs: [] and produces: [] for this reason — the module is permissive on input and propagates the input schema as output.
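Because the output schema follows the input, whether a downstream node can be chained depends on the columns actually present. A minimal sketch of that rule follows. It assumes the canonical email column is named email, which the documentation does not state, and the helper names are hypothetical.

```python
POIS_ANY = "pois_any"
POIS_EMAIL = "pois_email"


def provided_input_types(columns: list[str], email_column: str = "email") -> set[str]:
    """Input types an import output can satisfy, given its column names."""
    provided = {POIS_ANY}  # any POI list satisfies pois_any
    if email_column in columns:
        provided.add(POIS_EMAIL)  # pois_email additionally needs an email column
    return provided


def can_chain(required_input: str, columns: list[str]) -> bool:
    """True if a node requiring `required_input` can consume these columns."""
    return required_input in provided_input_types(columns)
```

With columns nom and site_web, a node requiring pois_any chains but verify (pois_email) does not; adding an email column makes both chainable.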

Lifecycle

Standard job lifecycle — see /docs/concepts/jobs-lifecycle. The job is linked to its pipeline via pipeline_id and pipeline_node_id and runs as soon as the pipeline transitions to running.

Pipeline

import is a root node. It accepts no upstream edges. Any node whose input is pois_any, any_pois, or pois_email (when the CSV carries emails) can be wired downstream.

| Direction | Compatible types |
| --- | --- |
| Upstream | none (import is a ROOT_TYPE alongside scrap) |
| Downstream | reviews, emails, verify (with email column), socials, dead_check, techstack, ads_intelligence, brand_assets, filter, sort |

Registry: needs: [], produces: [].

Endpoints

The import module is not exposed as a standalone job endpoint — it is pipeline-internal (see /docs/concepts/pipeline-orchestration) and created only as a pipeline root. Two adjacent endpoints are useful when assembling an import:

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/jobs/parse-list | Validates CSV input before submission. Accepts either {"text": "..."} JSON or a multipart/form-data upload with a file field. Returns {count, with_lien_google_maps, with_site_web, sample, items, delimiter}. |
| GET | /api/jobs/{job_id}/items | Returns the CSV rows of a finished scrap job in a structure suitable for from_job reuse. |
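As an illustration of the JSON variant of parse-list, the sketch below builds the request with the standard library without sending it. The base URL is a placeholder, the function name is hypothetical, and authentication headers are omitted because this page does not describe them.

```python
import json
from urllib.request import Request


def build_parse_list_request(base_url: str, csv_text: str) -> Request:
    """Build (but do not send) a POST /api/jobs/parse-list request with a JSON body."""
    body = json.dumps({"text": csv_text}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/api/jobs/parse-list",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: https://outsend.example is a placeholder host, not a real endpoint.
request = build_parse_list_request("https://outsend.example", "nom,site_web\nAcme,https://acme.test")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; an HTTP 400 response corresponds to the parse-list rows of the error table.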

The pipeline node payload itself follows this shape:

{
  "type": "import",
  "config": {
    "source": "paste",
    "text": "nom,site_web\n...",
    "url": "",
    "from_job_id": ""
  }
}

Exactly one of text, url, from_job_id is read, determined by source. The unused fields are persisted as empty strings.
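A small helper can enforce that shape when assembling nodes programmatically. This is a sketch under the rules stated above; the function name and the error wording are hypothetical.

```python
# Which config field each source mode reads, per the inputs table.
FIELD_FOR_SOURCE = {"paste": "text", "url": "url", "from_job": "from_job_id"}


def build_import_node(source: str, value: str) -> dict:
    """Return an import node payload with only the field matching `source` filled in."""
    if source not in FIELD_FOR_SOURCE:
        raise ValueError(f"unsupported source: {source!r}")
    # Unused fields are kept as empty strings, matching the persisted shape.
    config = {"source": source, "text": "", "url": "", "from_job_id": ""}
    config[FIELD_FOR_SOURCE[source]] = value
    return {"type": "import", "config": config}
```

For example, build_import_node("paste", "nom,site_web\n...") fills text and leaves url and from_job_id empty.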

Limits

Global limits — see /docs/concepts/limits. Module-specific:

| Limit | Value |
| --- | --- |
| from_job recency | 7 days. The source job is rejected past that window. |
| from_job source type | scrap only. |
| Supported formats | CSV with comma, semicolon, or tab delimiter. Encodings: UTF-8 (preferred), UTF-8 with BOM, Latin-1 / cp1252 (fallback). Headers are optional: a header-less sheet is auto-detected and its columns are inferred from content. |

Errors

| Condition | Surface | Message shape |
| --- | --- | --- |
| source not in {paste, url, from_job} | Pipeline creation | Source d'import invalide : <value> (attendu: paste \| url \| from_job) |
| from_job without from_job_id | Pipeline creation | Source 'from_job' : aucun job sélectionné |
| from_job_id unknown | Pipeline creation | Job source introuvable : <id> |
| from_job source not owned by caller | Pipeline creation | Job source non autorisé pour cet utilisateur |
| from_job source not a scrap | Pipeline creation | Seuls les scraps Gmaps peuvent être importés via 'from_job' |
| from_job source CSV unavailable | Pipeline creation | Le CSV du job source n'est pas (ou plus) disponible |
| from_job source older than 7 days | Pipeline creation | Le job source a plus de 7 jours — relancez un scrap ou collez le CSV. |
| Empty paste payload | parse-list | HTTP 400, Aucun texte fourni |
| CSV parse failure | parse-list | HTTP 400, CSV invalide: <detail> |
| Zero parsed rows | parse-list | HTTP 400, Aucune ligne lue dans le CSV |
| Multipart upload missing file | parse-list | HTTP 400, Aucun fichier fourni |
| URL unreachable or non-CSV response | Pipeline execution | The import job transitions to failed; the message names the unreachable source. |
| Private spreadsheet (login page returned instead of CSV) | Pipeline execution | The import fails with an explanation instead of silently succeeding: the content was HTML (a sign-in page), not CSV. Share the sheet as "anyone with the link can view". |
| Empty, header-only, or nothing exploitable | Pipeline execution / parse-list | The import fails with an explanation (no usable rows) rather than reporting a misleading success. |
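The private-spreadsheet case comes down to noticing that a fetched body is an HTML page rather than CSV. One simple way to detect that is sketched below. It is a heuristic written for illustration, not the check outsend actually runs, and the function name is hypothetical.

```python
def looks_like_html(body: str) -> bool:
    """Heuristic: does the start of the body look like an HTML document rather than CSV?"""
    head = body.lstrip()[:512].lower()
    return (
        head.startswith("<!doctype html")
        or head.startswith("<html")
        or "<head" in head
    )
```

A caller could fail the import with an explicit message when this returns True, rather than parsing the sign-in page as a one-column CSV.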

What's next

| Module | Use it to |
| --- | --- |
| filter | Narrow the imported list by column predicates before paying for downstream enrichment. |
| sort | Order the imported list, which is useful when combined with row limits in later steps. |