# outsend — Full documentation bundle

Every page of the outsend public documentation, concatenated for ingestion by LLMs. Each page is delimited by ``.

---
title: outsend documentation
slug:
section:
summary: Technical reference for outsend — modules, pipelines, monitoring, API. Built for developers and AI assistants.
---

This documentation describes the **public contracts** of every outsend module — what each one accepts as input, what it returns, how it behaves over time, and how modules chain into pipelines.

The goal is twofold:

1. **Help integrators and power users** understand what each module does and how to drive it from the UI or the API.
2. **Be readable by AI assistants** — every page is plain markdown, downloadable in bulk, exposed through the `llms.txt` standard.

## How to read it

- **Concepts** — start here if you're new. Covers what a *job*, a *pipeline*, and a *veille* are, plus the lifecycle and the events they emit.
- **Modules** — one page per module (19 active + 4 on-demand). Each page is structured: Purpose → Inputs → Outputs → Lifecycle → Limits → Errors.
- **API reference** — every REST endpoint, grouped by domain.
- **Integration** — bring-your-own-key (BYOK), MCP server (planned), `llms.txt`.

## Copy everything in one click

The **Copy** button in the top-right corner of every page lets you grab:

- The current page (raw markdown)
- The current section (e.g. all module pages)
- **The entire documentation** — a single concatenated markdown bundle, ready to paste into Claude, ChatGPT, Cursor, or any AI assistant.

There is also a stable LLM index at [`/docs/llms.txt`](/docs/llms.txt) and the full bundle at [`/docs/llms-full.txt`](/docs/llms-full.txt) — both follow the [llms.txt](https://llmstxt.org) standard, so most AI tools detect them automatically.

## Scope

This documentation describes **what outsend exposes**, not how it is built internally.
Implementation details — scraping stack, proxy infrastructure, DOM selectors, timing heuristics, exact success rates — are intentionally omitted. They are not stable contracts and would not help you integrate.

If something you need is missing, write to [support@outsend.xyz](mailto:support@outsend.xyz).

## Quick links

- [What is outsend](/docs/what-is-outsend)
- [Quickstart](/docs/quickstart)
- [Jobs & lifecycle](/docs/concepts/jobs-lifecycle)
- [Module registry](/docs/concepts/module-registry)
- [API overview](/docs/api/overview)

---
title: Authentication
slug: api/auth
section: API
summary: Session cookie issuance, credential management, email verification, and GDPR self-service endpoints under /api/auth.
---

# Authentication

The Authentication API issues and revokes session cookies, manages credentials, verifies email ownership, and exposes the GDPR self-service endpoints. All routes are mounted under `/api/auth` and respond with JSON unless noted.

## Session cookie

Successful `signup`, `login`, and `password/change` calls set an `outsend_session` cookie:

| Attribute | Value |
|-----------|-------|
| Name | `outsend_session` |
| TTL | 7 days (`SESSION_DURATION_DAYS = 7`) |
| `HttpOnly` | true |
| `Secure` | true (production) |
| `SameSite` | `Lax` |
| `Path` | `/` |

The cookie is a signed token bound to a row in `sessions`. Revoking a session (logout, password change, account delete) deletes the row server-side, so the session stays invalid even if the cookie is replayed.

## Rate limits and errors

Each endpoint applies per-IP and per-identity windows (see [Limits](/docs/concepts/limits)). Exhaustion returns `429 Too Many Requests` with a French message containing the retry-after delay in seconds.

All errors follow FastAPI's `{ "detail": "" }` shape. Generic codes: `400` (invalid payload, expired token, wrong current password, captcha failure), `401` (bad credentials or missing session on protected routes), `429` (rate limit). Endpoint-specific `detail` messages are listed inline below.
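
As an illustration of the error contract above, here is a minimal client-side sketch that turns an error response into a structured value. The `ApiError` class and `parse_error` helper are hypothetical names, not part of outsend. The exact wording of the `429` message is not documented here, so the sketch assumes the first integer in `detail` is the delay in seconds.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    status: int
    detail: str
    retry_after_seconds: Optional[int] = None


def parse_error(status: int, body: str) -> ApiError:
    """Parse a `{"detail": ...}` error body into an ApiError."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    detail = payload.get("detail", "") if isinstance(payload, dict) else ""
    if not isinstance(detail, str):
        # Some endpoints may return a structured detail; keep it as text.
        detail = json.dumps(detail, ensure_ascii=False)
    retry_after = None
    if status == 429:
        # Assumption: the first integer in the message is the delay in seconds.
        match = re.search(r"\d+", detail)
        if match:
            retry_after = int(match.group())
    return ApiError(status, detail, retry_after)
```

A caller would typically branch on `status` first and only display `detail` to the user, since the message text is localized and may change.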

---

## POST /api/auth/signup

Creates a user, sends the welcome + verification email, and opens a session. No auth. Rate limit: 3 / hour / IP.

### Request body

| Field | Type | Notes |
|-------|------|-------|
| `email` | string (email) | Required. |
| `password` | string | 8 to 128 chars, must contain a letter AND a digit/symbol. |
| `invitation_code` | string | 1 to 64 chars. Alpha is invite-only. |
| `accept_responsibility` | boolean | Must be `true`. |
| `hcaptcha_token` | string or null | Required when `HCAPTCHA_SECRET` is configured. |

```json
{
  "email": "ada@example.com",
  "password": "lovelace-1843",
  "invitation_code": "ALPHA-7K2",
  "accept_responsibility": true,
  "hcaptcha_token": "10000000-aaaa-bbbb-cccc-000000000001"
}
```

### Response — `200 OK`

```json
{
  "ok": true,
  "user": {
    "id": 42,
    "email": "ada@example.com",
    "is_admin": false,
    "is_active": true,
    "email_verified": false,
    "created_at": "2026-05-27T09:14:00Z"
  }
}
```

Sets the `outsend_session` cookie.

### Specific errors

| Status | Detail |
|--------|--------|
| `400` | `Captcha invalide. Réessaie.` |
| `400` | `Code invitation invalide` |
| `400` | `Email existe déjà` |

---

## POST /api/auth/login

Validates credentials and opens a session. No auth. Rate limit: 5 / 15 min / IP and 5 / 15 min / email.

### Request body

| Field | Type | Notes |
|-------|------|-------|
| `email` | string (email) | Required. |
| `password` | string | 1 to 128 chars. |

```json
{
  "email": "ada@example.com",
  "password": "lovelace-1843"
}
```

### Response — `200 OK`

```json
{
  "ok": true,
  "user": {
    "id": 42,
    "email": "ada@example.com",
    "is_admin": false,
    "is_active": true,
    "email_verified": true,
    "created_at": "2026-05-27T09:14:00Z"
  }
}
```

Sets the `outsend_session` cookie.

### Specific errors

| Status | Detail |
|--------|--------|
| `401` | `Email ou mot de passe incorrect` |
| `401` | `Compte désactivé` |

---

## POST /api/auth/logout

Revokes the current session and clears the cookie. Auth optional. Empty body.

Response: `200 OK` `{ "ok": true }`.

---

## GET /api/auth/me

Returns the currently authenticated user.

```json
{
  "id": 42,
  "email": "ada@example.com",
  "is_admin": false,
  "is_active": true,
  "email_verified": true,
  "created_at": "2026-05-27T09:14:00Z"
}
```

---

## POST /api/auth/password/reset-request

Sends a reset link to the email if (and only if) it matches an active user. The response is identical in every case to prevent account enumeration. No auth. Rate limit: 3 / hour / IP and 3 / hour / email (silent when exhausted).

### Request body

```json
{ "email": "ada@example.com" }
```

### Response — `200 OK`

```json
{ "ok": true }
```

---

## POST /api/auth/password/reset-confirm

Consumes a single-use reset token and sets the new password. Revokes every existing session for the user. No auth (the token is the credential).

### Request body

| Field | Type | Notes |
|-------|------|-------|
| `token` | string | 10 to 256 chars, delivered by email. |
| `new_password` | string | 8 to 128 chars, letter + digit/symbol. |

```json
{
  "token": "eyJ...",
  "new_password": "babbage-1822"
}
```

### Response — `200 OK`

```json
{ "ok": true }
```

### Specific errors

| Status | Detail |
|--------|--------|
| `400` | `Lien invalide ou expiré` |
| `422` | Password complexity rejected by validator. |

---

## POST /api/auth/password/change

Rotates the password for a logged-in user. Requires the current password, revokes other sessions, issues a fresh cookie. Rate limit: 5 / hour / user.

### Request body

| Field | Type | Notes |
|-------|------|-------|
| `current_password` | string | 1 to 200 chars. |
| `new_password` | string | 8 to 200 chars, must differ from current. |

```json
{
  "current_password": "lovelace-1843",
  "new_password": "babbage-1822"
}
```

### Response — `200 OK`

```json
{ "ok": true }
```

Sets a refreshed `outsend_session` cookie.
### Specific errors

| Status | Detail |
|--------|--------|
| `400` | `Mot de passe actuel incorrect` |
| `400` | `Le nouveau mot de passe doit être différent de l'actuel` |

---

## POST /api/auth/email/verify

Consumes a single-use verification token and flips `email_verified` to `true`. No auth.

### Request body

```json
{ "token": "eyJ..." }
```

### Response — `200 OK`

```json
{ "ok": true }
```

Specific error: `400 Lien de vérification invalide ou expiré`.

---

## POST /api/auth/email/resend-verify

Re-sends the verification email to the authenticated user. Idempotent when the address is already verified. Empty body. Rate limit: 3 / hour / user.

### Response — `200 OK`

```json
{ "ok": true }
```

or, if already verified:

```json
{ "ok": true, "already_verified": true }
```

---

## DELETE /api/auth/me

Permanently deletes the account and every owned record (jobs, pipelines, surveillances, sessions, tokens). Job files on disk are purged after the cascading DB delete. Feedback threads are anonymised rather than removed.

### Request body

| Field | Type | Notes |
|-------|------|-------|
| `confirm_email` | string | Must equal the user's email (case-insensitive). |

```json
{ "confirm_email": "ada@example.com" }
```

### Response — `204 No Content`

Empty body. Clears the `outsend_session` cookie.

Specific error: `400 Confirmation email incorrecte`.

---

## GET /api/auth/me/export

GDPR portability endpoint. Streams a ZIP archive containing every record owned by the user.

### Response — `200 OK`

`Content-Type: application/zip`
`Content-Disposition: attachment; filename="outsend-export--.zip"`

Archive layout:

| Entry | Contents |
|-------|----------|
| `account.json` | Account metadata, no secrets. |
| `jobs.json` | All jobs with metadata. |
| `jobs//*` | CSV/JSON outputs for every `done` job. |
| `pipelines.json` | Pipeline definitions. |
| `veille.json` | `recurring_scraps` + run history. |
| `manifest.txt` | Human-readable summary. |

---
title: Feedback API
slug: api/feedback
section: API
summary: In-app chat with the platform admin and entry point for on-demand module activation requests.
---

# Feedback API

The Feedback API powers the in-app chat between an authenticated user and the platform admin. It also doubles as the entry point for on-demand module activation requests: clicking "Request" on a stub module (email, SMS, WhatsApp, phone carrier) opens a feedback thread with a dedicated `topic`, which surfaces in the admin dashboard's "On demand" inbox.

A thread is a stable conversation pinned to a `topic`. Every reply is a `feedback_message` row scoped to that thread. Read state is tracked per role (user, admin) so each side sees only its own unread badge.

All endpoints require an authenticated caller. Generic errors: `401` (not authenticated), `404` (thread does not exist). Endpoint-specific causes are listed inline.

## Topic conventions

The `topic` field on a thread is a free-form string capped at 64 chars, but the product follows a small set of conventions:

| Topic value               | Meaning                                         |
| ------------------------- | ----------------------------------------------- |
| `general`                 | Default. Catch-all chat.                        |
| `feedback`                | Generic product feedback.                       |
| `bug`                     | Bug report.                                     |
| `feature`                 | Feature request.                                |
| `on_demand_email`         | Activation request for the email-campaign stub. |
| `on_demand_sms`           | Activation request for the SMS-campaign stub.   |
| `on_demand_whatsapp`      | Activation request for the WhatsApp stub.       |
| `on_demand_phone_carrier` | Activation request for the phone-carrier stub.  |

Any `topic` matching `on_demand_*` is picked up by the admin endpoint `GET /api/admin/feedback/on-demand`, which groups threads by topic and exposes open counts. The on-demand stubs are listed in the module registry under `on_demand`; a client can read the registry and build `topic = "on_demand_" + slug`.
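
The topic convention above can be sketched on the client side as follows. The helper names are hypothetical, and the sketch assumes registry slugs map directly onto the suffixes shown in the table (for example `whatsapp`); the 64-character cap comes from the topic description.

```python
MAX_TOPIC_LENGTH = 64  # cap stated for the `topic` field
ON_DEMAND_PREFIX = "on_demand_"


def on_demand_topic(slug: str) -> str:
    """Build the feedback topic for an on-demand module slug."""
    if not slug:
        raise ValueError("slug must not be empty")
    topic = ON_DEMAND_PREFIX + slug
    if len(topic) > MAX_TOPIC_LENGTH:
        raise ValueError("topic exceeds 64 characters")
    return topic


def is_on_demand(topic: str) -> bool:
    """True when a topic would match the `on_demand_*` pattern."""
    return topic.startswith(ON_DEMAND_PREFIX) and len(topic) > len(ON_DEMAND_PREFIX)
```

For example, `on_demand_topic("whatsapp")` yields the `on_demand_whatsapp` value listed in the table, while `is_on_demand("general")` is false.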

The shorter `type` field (`bug`, `feature`, `other`) is independent of topic and only carries the coarse intent for sorting.

---

## POST /api/feedback/threads

Create a new thread together with its first message. Rate limit: 20 threads per user per hour.

| Field     | Type   | Notes                                              |
| --------- | ------ | -------------------------------------------------- |
| `type`    | string | `bug`, `feature`, or `other`. Defaults to `other`. |
| `message` | string | 3 to 5000 chars. The first message body.           |
| `topic`   | string | Optional. Defaults to `general`. Max 64 chars.     |

### Request

```json
POST /api/feedback/threads
{
  "type": "feature",
  "topic": "on_demand_whatsapp",
  "message": "Sending WhatsApp follow-ups to scraped leads would be useful."
}
```

### Response — 201 Created

```json
{
  "id": 142,
  "user_id": 7,
  "user_email": "user@example.com",
  "type": "feature",
  "status": "open",
  "created_at": "2026-05-27 10:11:12",
  "last_read_user": "2026-05-27 10:11:12",
  "last_read_admin": null,
  "messages": [
    {
      "id": 991,
      "author_role": "user",
      "author_user_id": 7,
      "message": "Sending WhatsApp follow-ups to scraped leads would be useful.",
      "created_at": "2026-05-27 10:11:12"
    }
  ],
  "preview": "Sending WhatsApp follow-ups to scraped leads would be useful.",
  "last_message_at": "2026-05-27 10:11:12",
  "unread_for_me": 0
}
```

Specific causes: `400` `type` not in `{bug, feature, other}`; `422` `message` shorter than 3 or longer than 5000; `429` more than 20 threads in the last hour.

---

## POST /api/feedback/threads/{thread_id}/messages

Append a reply to an existing thread. The caller must own the thread, and the thread must not be `closed`. Posting a message also marks the thread as read for the user side.

### Request

```json
POST /api/feedback/threads/142/messages
{
  "message": "Adding more context: opt-out tracking would also be required."
}
```

### Response — 201 Created

Returns the full serialized thread, identical in shape to the `POST /threads` response, with the appended message included.

Specific causes: `400` thread is `closed`; `403` caller does not own the thread; `422` message empty or longer than 5000.

---

## GET /api/feedback/threads

List the caller's threads, newest first. Capped at 100 rows. Each entry embeds the full message list so the client can render previews and unread counts without a second round trip.

### Response — 200 OK

```json
[
  {
    "id": 142,
    "user_id": 7,
    "user_email": "user@example.com",
    "type": "feature",
    "status": "open",
    "created_at": "2026-05-27 10:11:12",
    "last_read_user": "2026-05-27 10:11:12",
    "last_read_admin": null,
    "messages": [ /* ... */ ],
    "preview": "Sending WhatsApp follow-ups...",
    "last_message_at": "2026-05-27 10:11:12",
    "unread_for_me": 0
  }
]
```

The `unread_for_me` counter reflects admin replies not yet seen, computed from `last_read_user`. The companion endpoint `GET /api/feedback/unread` returns the same number aggregated across every thread, ready to bind to a header badge.

---
title: Jobs API
slug: api/jobs
section: API
summary: Unified surface for every workload Outsend runs — source acquisition, enrichment, verification, reporting.
---

# Jobs API

The Jobs API is the unified surface for every workload Outsend runs on a tenant's behalf: source acquisition (`scrap`) and the enrichment, verification and reporting modules that operate on the resulting items. A job is the only billable unit.

See also:

- [Jobs lifecycle](/docs/concepts/jobs-lifecycle) — pending → running → done | failed | cancelled | expired
- [States and events](/docs/concepts/states-and-events) — SSE event payload reference
- [Limits](/docs/concepts/limits) — EF quota, per-job caps, retention

All endpoints require an authenticated session cookie. Endpoints that create or mutate jobs additionally require an active user; `POST /api/jobs` and `POST /api/jobs/{id}/resume` also require a verified email address. Admin-only routes (`/api/admin/*`, `/api/jobs/queue`) are not documented here.
## Conventions

| Item | Value |
|---|---|
| Base URL | `https://outsend.xyz` |
| Auth | Session cookie (`outsend_session`) |
| Content-Type | `application/json` for POST bodies |
| Job identifier | Opaque string (`job.id`), stable for the lifetime of the job |
| Timestamps | ISO 8601 UTC |

### The `JobPublic` object

Every endpoint that returns a job returns the same shape:

```json
{
  "id": "j_01HXYZ...",
  "job_type": "scrap",
  "queries": ["dentiste"],
  "zones": ["Paris", "75015"],
  "include_reviews": false,
  "status": "running",
  "grid_points_count": 412,
  "processed_points": 87,
  "results_count": 64,
  "error_count": 0,
  "ef_cost": 0.041,
  "created_at": "2026-05-27T09:12:03Z",
  "started_at": "2026-05-27T09:12:05Z",
  "completed_at": null,
  "expires_at": "2026-06-26T09:12:03Z",
  "error_message": null,
  "output_filename": null,
  "download_available": false,
  "source_job_id": null,
  "pipeline_id": null,
  "email_mode": null,
  "breakdown": { "by_query": {"dentiste": 64}, "by_zone": {"Paris": 64} },
  "dead_queries": [],
  "flagged_tiles_count": 0,
  "total_attempts_count": 87,
  "query_stats": { "dentiste": { "tiles": 87, "with_results": 71 } }
}
```

`status` is one of `pending | running | done | failed | cancelled | expired`.

### Errors

All endpoints return `{"detail": "..."}` (or `{"detail": {"message": ..., "errors": [...]}}` for validation errors). Generic codes: `401` not authenticated, `403` not authorised (other tenant or unverified email), `404` not found, `422` Pydantic validation. Endpoint-specific causes are listed inline.

---

## Create a job (generic)

```
POST /api/jobs
```

Creates a `scrap` job — the canonical source acquisition workload that runs queries across a geographic grid. For every other workload, use the typed shortcut described below; passing a `type` field to `POST /api/jobs` is **not** supported.

**Request body**

```json
{
  "queries": ["dentiste", "orthodontiste"],
  "zones": ["Paris", "75015", "Lyon 2e"],
  "include_reviews": false,
  "extra_columns": ["gps", "departement", "region"]
}
```

| Field | Type | Notes |
|---|---|---|
| `queries` | `string[]` (1..20) | Each item ≤ 200 chars, trimmed, deduplicated |
| `zones` | `string[]` (1..50) | City names, postal codes, or arrondissements; resolved server-side |
| `include_reviews` | `boolean` | If `true`, fetches the latest reviews per POI (raises EF cost) |
| `extra_columns` | `string[]` | Optional output columns, off by default. Allowed: `gps` (adds exact `lat`/`lon`), `departement`, `region`. Unknown values are ignored. See [the scrap module](/docs/modules/scrap). |

**Response** — `200 OK`, a `JobPublic` in `pending` status.

Specific causes: `400` zone parsing failed / EF quota exceeded / empty grid; `403` email not verified.

---

## Create a job (typed shortcut)

Every enrichment, verification and report module has a dedicated endpoint that accepts the items it operates on. Each shortcut returns a `JobPublic` whose `job_type` is fixed to the module slug.
```
POST /api/jobs/{type}
```

| `type` | Purpose | Module doc |
|---|---|---|
| `reviews` | Pull the latest reviews for each POI | [reviews](/docs/modules/reviews) |
| `emails` | Discover contact emails from each POI's website | [emails](/docs/modules/emails) |
| `verify-emails` | Anti-bounce verification (no VPN) | [verify-emails](/docs/modules/verify-emails) |
| `socials` | Detect linked social network profiles | [socials](/docs/modules/socials) |
| `phones-extra` | Find additional phone numbers beyond the Maps listing | [phones-extra](/docs/modules/phones-extra) |
| `legal-ids` | Extract SIRET / SIREN from the website | [legal-ids](/docs/modules/legal-ids) |
| `legal-mentions` | Parse the legal-notice page (capital, RCS, …) | [legal-mentions](/docs/modules/legal-mentions) |
| `legal-data` | Enrich via SIRENE / INPI (`api.gouv.fr`) | [legal-data](/docs/modules/legal-data) |
| `pricing` | Extract SaaS / B2B pricing | [pricing](/docs/modules/pricing) |
| `techstack` | Detect CMS, frameworks, analytics, payment, CRM | [techstack](/docs/modules/techstack) |
| `pagespeed` | Score via Google PSI API v5 | [pagespeed](/docs/modules/pagespeed) |
| `ads-intelligence` | Marketing/ads profiling (pixels, CMP, retargeting) | [ads-intelligence](/docs/modules/ads-intelligence) |
| `brand-assets` | Logo, favicon, palette, optional screenshot | [brand-assets](/docs/modules/brand-assets) |
| `dead-check` | Flag dead sites (DNS, parking, default-server, SSL) | [dead-check](/docs/modules/dead-check) |
| `delivery-check` | Gmail Inbox / Promotions / Spam placement test | [delivery-check](/docs/modules/delivery-check) |

**Request body (shape shared by every item-driven module)**

```json
{
  "items": [
    { "nom": "Cabinet Dupont", "site_web": "https://dupont-dentiste.fr", "ville": "Paris" }
  ],
  "source_job_id": "j_01HXYZ..."
}
```

| Field | Type | Notes |
|---|---|---|
| `items` | `dict[]` (1..10 000) | Module-specific keys; usually a subset of a previous job's CSV |
| `source_job_id` | `string?` | Chains the new job to a previous job, used for traceability and billing display |

**Module-specific overrides**

- `POST /api/jobs/emails` — accepts `mode: "normal" | "deep"` (default `normal`).
- `POST /api/jobs/brand-assets` — accepts `capture_screenshot: boolean` (default `false`, ~5× slower per item when on).
- `POST /api/jobs/delivery-check` — does **not** take `items`. Body:

  ```json
  { "domain": "example.com", "subject_filter": "outsend" }
  ```

**Response** — `200 OK`, a `JobPublic` in `pending` status.

Additional cause: `422` if `items` is empty, too large, or missing keys required by the module.

---

## List jobs

```
GET /api/jobs?limit={n}&offset={n}
```

Returns the authenticated user's jobs, most recent first.

| Param | Type | Default | Range |
|---|---|---|---|
| `limit` | `int` | `100` | clamped to `[1, 500]` |
| `offset` | `int` | `0` | `≥ 0` |

**Response** — `200 OK`, `JobPublic[]`.

---

## Get a job

```
GET /api/jobs/{id}
```

**Response** — `200 OK`, a single `JobPublic`. Includes live counters (`processed_points`, `results_count`, `query_stats`, `breakdown`) that the dashboard polls between SSE events.

---

## Stream live progress (SSE)

```
GET /api/jobs/{id}/stream?since={log_id}
```

Server-Sent Events stream that emits status transitions, log lines and counter updates as the worker progresses. Reconnects honour the `Last-Event-ID` header automatically; the `since` query param is a fallback for clients that don't speak SSE natively.

Event taxonomy (`status`, `log`, `progress`, `done`, `error`) and payload shapes are documented in [States and events](/docs/concepts/states-and-events).
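
For clients without a native SSE implementation, the stream can be consumed with a small line parser. The sketch below follows the generic `text/event-stream` framing (fields `id`, `event`, `data`, blank line as separator); `SseEvent` and `parse_sse` are hypothetical names, and the payloads are left as raw strings because their shapes are documented elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SseEvent:
    event: str
    data: str
    id: Optional[str]


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Yield one SseEvent per blank-line-terminated block of fields."""
    event, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, but only if it carries data.
            if data:
                yield SseEvent(event, "\n".join(data), last_id)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value  # persists across events until replaced
```

On reconnect, a client of this kind would send the last seen `id` back, either in the `Last-Event-ID` header or, presumably, as the `since` query parameter.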

**Headers returned**

```
Content-Type: text/event-stream
Cache-Control: no-cache
X-Accel-Buffering: no
```

---

## List a job's items

```
GET /api/jobs/{id}/items?offset={n}&limit={n}
```

Returns the rows of the job's output CSV as JSON, for chaining into an enrichment job. Only available for jobs whose `status == "done"` and whose `job_type` produces a reusable CSV (i.e. not `delivery_check` and not `viewport_test`).

**Response** — `200 OK`

```json
{
  "count": 412,
  "items": [
    { "nom": "Cabinet Dupont", "site_web": "https://...", "telephone": "+33 1 ...", "...": "..." }
  ]
}
```

Specific causes: `400` job not done or job_type has no reusable output; `410` CSV expired or deleted.

---

## Download a job's result

```
GET /api/jobs/{id}/download?format=csv|json|xlsx
```

Downloads the job's output. CSV is the canonical artefact written by the worker (UTF-8 BOM, `;` separator); JSON and XLSX are derived on the fly. All exports are run through a spreadsheet-formula-injection sanitiser.

| `format` | Media type | Filename |
|---|---|---|
| `csv` (default) | `text/csv; charset=utf-8` | `{job.output_filename}` |
| `json` | `application/json; charset=utf-8` | `{base}.json` |
| `xlsx` | `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` | `{base}.xlsx` |

Specific causes: `400` job still pending/running or unsupported `format`; `410` output expired, missing, or job failed before writing a row.

---

## Cancel a job

```
POST /api/jobs/{id}/cancel
```

Requests cancellation of a `pending` or `running` job. Returns `400` if the job is already terminal. If the job belongs to a pipeline, downstream stages are short-circuited.

**Response** — `200 OK`, `{"ok": true}`.

---

## Resume a job

```
POST /api/jobs/{id}/resume
```

Creates a **new** job that picks up a `cancelled` or `failed` `scrap` where it left off. The new job inherits the source's queries, zones and partial CSV; the worker skips coordinates already processed. EF is debited only for the remaining grid points.

**Response** — `200 OK`, a new `JobPublic` (the resume job) in `pending` status. Its `source_job_id` references the original.

Specific causes: `400` source job not resumable (wrong type, not interrupted, or already fully processed); `403` email not verified.

---

## Delete a job

```
DELETE /api/jobs/{id}
```

Permanently removes the job and its output CSV. Refuses to delete a job that is still running — cancel it first.

**Response** — `204 No Content`.

Specific cause: `400` job still running.

---

## Estimate EF cost

```
POST /api/estimate
```

Computes the EF cost of a hypothetical `scrap` job without creating one. Drives the live cost meter in the launch form. Estimation is free and unmetered.

**Request body** — same shape as `POST /api/jobs`, but `queries` and `zones` may be empty (returns `valid: false`).

**Response** — `200 OK`, a `JobEstimateResponse`:

```json
{
  "valid": true,
  "grid_points": 412,
  "total_requests": 824,
  "queries_count": 2,
  "ef_cost": 0.041,
  "estimated_duration_seconds": 1380,
  "errors": [],
  "warnings": []
}
```

| Field | Meaning |
|---|---|
| `valid` | `true` iff `errors` is empty |
| `grid_points` | Distinct GPS tiles across the union of zones |
| `total_requests` | `grid_points × len(queries)` — what the worker will actually call |
| `queries_count` | Echoes `len(queries)` for UI display |
| `ef_cost` | France-equivalent units; see [Limits](/docs/concepts/limits) |
| `estimated_duration_seconds` | Best-effort wall-clock estimate |
| `errors` | Hard blockers (over-quota, unparseable zones, empty grid) |
| `warnings` | Soft signals (not currently used) |

---

## Notes on omitted endpoints

The following routes exist but are intentionally not part of the public surface:

- `GET /api/jobs/queue` — anonymised global queue for the public dashboard widget. Tenant-agnostic, scoped separately.
- `/api/admin/*` — operator-only.
- `GET /api/jobs/{id}/breakdown`, `GET /api/jobs/{id}/map`, `GET /api/jobs/{id}/output-columns`, `GET /api/jobs/{id}/delivery-result`, `POST /api/jobs/parse-list`, `GET /api/brand-lookup`, `GET /api/brand-assets/{owner}/{filename}`, `GET /api/delivery-check/seeds` — UI-internal helpers that may change without notice.

---
title: API overview
slug: api/overview
section: API
summary: Conventions shared by every Outsend API endpoint — base URL, auth, content types, versioning, errors.
---

# API overview

The Outsend API exposes the same surface that powers the web application. The dashboard and the API share one backend, one authentication scheme, and one set of objects.

## Base URL

```
https://outsend.xyz
```

Endpoints under `/api/` return JSON or stream events. The base URL is stable for the alpha.

## Authentication

Sessions use a cookie named `outsend_session`. Obtain one by posting credentials:

```
POST /api/auth/login
Content-Type: application/json

{ "email": "name@example.com", "password": "..." }
```

The response sets `outsend_session` as `HttpOnly`, `Secure`, `SameSite=Lax`. Subsequent requests must include it. Sessions remain valid until logout (`POST /api/auth/logout`) or expiry. Requests without a valid cookie receive `401` on protected routes.

API tokens scoped per workspace are on the roadmap; cookie sessions are currently the only supported mechanism.

## Content types

| Surface | Content type | Notes |
|---|---|---|
| Read and write endpoints | `application/json` | UTF-8, snake_case fields |
| Event streams | `text/event-stream` | Server-Sent Events |
| Downloads | `application/octet-stream` and friends | Endpoints whose path ends in `/download` |
| Tabular exports | `text/csv`, `application/json`, `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` | Selected via `?format=csv\|json\|xlsx` |

Endpoints that accept a `format` query parameter fall back to a documented default when it is omitted; `GET /api/jobs/{id}/download`, for example, defaults to `csv`.

## Versioning

The API is in alpha.
There is no `/v1/` prefix and no version header — the surface evolves in place. Breaking changes are announced in advance through the changelog and, when relevant, through in-app banners. Additive changes ship without notice. A versioned prefix will be introduced before general availability.

## Rate limits

Sensitive endpoints (authentication, contact, job creation) are protected by per-route quotas. Exceeded limits return `429` with a `Retry-After` header. See [/docs/concepts/limits](/docs/concepts/limits).

## Errors

Failures return a JSON body and a conventional HTTP status:

```json
{
  "detail": "Human-readable message",
  "errors": [
    { "field": "email", "message": "Invalid format" }
  ]
}
```

The `errors` array is present only when the failure is tied to specific input fields.

| Status | Meaning |
|---|---|
| 400 | Malformed request, business rule violation |
| 401 | No session, or session expired |
| 403 | Authenticated but not allowed; also returned for deactivated accounts |
| 404 | Resource does not exist, or is not visible to the caller |
| 422 | Request was well-formed but failed validation |
| 429 | Rate limit reached; retry after the header value |
| 5xx | Server-side fault; retries with backoff are safe |

Treat any 5xx as transient and apply exponential backoff.
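
One way to apply the retry guidance above is a small helper that decides how long to wait before the next attempt. This is a sketch under assumptions: `retry_delay` and its `base` and `cap` defaults are illustrative choices, not values published by outsend, and only the delta-seconds form of `Retry-After` is handled.

```python
import random
from typing import Optional


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry is pointless.

    `attempt` is 0 for the first retry, 1 for the second, and so on.
    """
    if status == 429:
        # Prefer the server's hint when it is a plain number of seconds.
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after.strip())
    elif not 500 <= status <= 599:
        return None  # 4xx other than 429: retrying will not help
    ceiling = min(cap, base * (2 ** attempt))
    # With an rng, use "full jitter" to avoid synchronized retries.
    return rng.uniform(0, ceiling) if rng else ceiling
```

Passing a `random.Random` instance spreads retries from concurrent clients; omitting it gives deterministic delays that are easier to reason about in tests.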
## Endpoint groups

| Group | Path | Purpose |
|---|---|---|
| Authentication | [/docs/api/auth](/docs/api/auth) | Login, logout, signup, password reset, email verification |
| Jobs | [/docs/api/jobs](/docs/api/jobs) | Create, list, inspect, control, and export jobs |
| Pipelines | [/docs/api/pipelines](/docs/api/pipelines) | Compose multi-step workflows and run them |
| Veille | [/docs/api/veille](/docs/api/veille) | Continuous monitoring of queries and sources |
| Feedback | [/docs/api/feedback](/docs/api/feedback) | Submit in-product feedback and bug reports |
| Registry | [/docs/api/registry](/docs/api/registry) | Discover available job types and their parameters |

## SSE protocol

Long-running operations expose progress through Server-Sent Events. Event names, payload shape, and the state machine are documented at [/docs/concepts/states-and-events](/docs/concepts/states-and-events).

---
title: Pipelines API
slug: api/pipelines
section: API
summary: Compose and run DAGs of scraping, enrichment, and transformation steps under /api/pipelines.
---

# Pipelines API

A pipeline is a directed acyclic graph (DAG) of nodes that chains scraping, enrichment, and transformation steps. Submitting a pipeline starts the root jobs synchronously; downstream jobs are spawned as each predecessor reaches `done`.

All endpoints live under `/api/pipelines` and require an authenticated session. Mutating routes additionally require an active (non-suspended) account. Generic errors: `401` no session, `403` not owner / account suspended, `404` pipeline or node id unknown — endpoint-specific causes are listed inline.

See also: [Pipeline orchestration](/docs/concepts/pipeline-orchestration) and [Filter module](/docs/modules/filter).

## Graph shape

The `definition` document describes the DAG. Edges are explicit and reference node ids; they are not inferred from any per-node `inputs` field.
```json
{
  "nodes": [
    {"id": "n1", "type": "scrap", "config": {"queries": ["dentist"], "zones": ["Paris"]}, "x": 100, "y": 100},
    {"id": "n2", "type": "emails", "config": {"mode": "normal"}, "x": 320, "y": 100},
    {"id": "n3", "type": "verify", "config": {}, "x": 540, "y": 100}
  ],
  "edges": [
    {"id": "e1", "from": "n1", "to": "n2"},
    {"id": "e2", "from": "n2", "to": "n3"}
  ]
}
```

### Node types

| Type               | Role                              | Accepts (input) | Produces (output) |
|--------------------|-----------------------------------|-----------------|-------------------|
| `scrap`            | Google Maps scrape (root)         | none            | `pois`            |
| `import`           | CSV/Sheets import (root)          | none            | `pois`            |
| `reviews`          | Fetch reviews for POIs            | `pois_any`      | `reviews`         |
| `emails`           | Discover emails from websites     | `pois_any`      | `pois_email`      |
| `verify`           | SMTP verify emails                | `pois_email`    | `verified`        |
| `socials`          | Discover social profiles          | `pois_any`      | `pois`            |
| `dead_check`       | Detect inactive POIs              | `pois_any`      | `pois`            |
| `techstack`        | Detect website tech stack         | `pois_any`      | `pois`            |
| `ads_intelligence` | Detect ad campaigns               | `pois_any`      | `pois`            |
| `brand_assets`     | Extract logo and brand assets     | `pois_any`      | `pois`            |
| `legal_ids`        | Find SIRET/SIREN from website     | `pois_any`      | `pois`            |
| `legal_data`       | Company data via api.gouv.fr      | `pois_any`      | `pois`            |
| `legal_mentions`   | Parse legal mentions page         | `pois_any`      | `pois`            |
| `phones_extra`     | Find extra phone numbers          | `pois_any`      | `pois`            |
| `pricing`          | Extract pricing/tariffs           | `pois_any`      | `pois`            |
| `pagespeed`        | Google PageSpeed scoring          | `pois_any`      | `pois`            |
| `phone_info`       | Phone line type / carrier (cache) | `pois_any`      | `pois`            |
| `filter`           | Apply rule-based row filter       | `any_pois`      | passthrough       |
| `sort`             | Reorder rows by a column          | `any_pois`      | passthrough       |

`filter` and `sort` preserve the upstream type; type compatibility is resolved by walking back to the nearest non-passthrough ancestor.

### Column guarantee

No module ever drops a column.
Every enrichment and check node outputs **all the columns it received**, in their original order, plus its own columns appended at the end. This holds across an entire chain: a `scrap → legal_ids → legal_data → emails` pipeline ends with the Google Maps columns (`lien_google_maps`, `note`, `nb_avis`, `lat`, `lon`, …), the identifiers, the legal profile, and the emails side by side. Custom columns from an `import` are passed through untouched as well. The only thing a pipeline can remove is rows, through an explicit transformation it asks for: a `filter` rule or a `sort` `top_n` cut applies to rows, never columns.

> The full machine-readable contract for every node (`category`, `input`/`output`, the `needs`/`produces` columns, and each block's `config_schema`) is served live at [`GET /api/pipelines/schema`](#get-apipipelinesschema). That endpoint is the single source of truth; this table is a summary.

### Validation rules

The server rejects a definition with HTTP 400 if any of the following hold:

| Rule                                            | Error message                                     |
|-------------------------------------------------|---------------------------------------------------|
| Empty `nodes` list                              | `Pipeline vide`                                   |
| More than 20 nodes                              | `Trop de nodes (max 20)`                          |
| Duplicate node id                               | `IDs de nodes en doublon`                         |
| Unknown `type`                                  | `Type de node inconnu : ...`                      |
| Edge endpoint references missing node           | `Edge référence un node inexistant`               |
| Self-loop (`from == to`)                        | `Edge vers soi-même interdit`                     |
| Root type connected as a successor              | `Le node '...' ne peut pas avoir de prédécesseur` |
| Incompatible output to input                    | `Connexion X → Y incompatible`                    |
| Node with more than one predecessor (MVP limit) | `Le node ... a plusieurs prédécesseurs`           |
| Missing required `config` field                 | `Node '...' : champ requis manquant « ... »`      |
| Wrong config field type / bad enum value        | `Node '...', champ « ... » : ...`                 |

Roots must be one of `scrap` or `import`. Fan-out (one node feeding several successors) is allowed; fan-in is not.
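The graph-shape rules listed in the validation table lend themselves to a quick local pre-check before submitting a definition. The following is a minimal sketch, not the server's implementation: `precheck_definition` and the labels it returns are hypothetical names introduced here, and it covers only the structural rules (no type compatibility, no `config` validation).

```python
from collections import Counter

ROOT_TYPES = {"scrap", "import"}  # root types documented on this page
MAX_NODES = 20                    # documented node limit


def precheck_definition(definition: dict) -> list[str]:
    """Return local rule violations as hypothetical labels (not server messages)."""
    nodes = definition.get("nodes", [])
    edges = definition.get("edges", [])
    if not nodes:
        return ["empty_nodes"]

    problems: list[str] = []
    if len(nodes) > MAX_NODES:
        problems.append("too_many_nodes")

    ids = [node["id"] for node in nodes]
    if len(set(ids)) != len(ids):
        problems.append("duplicate_node_id")
    type_by_id = {node["id"]: node["type"] for node in nodes}

    predecessors = Counter()
    for edge in edges:
        src, dst = edge["from"], edge["to"]
        if src not in type_by_id or dst not in type_by_id:
            problems.append("edge_references_missing_node")
            continue
        if src == dst:
            problems.append("self_loop")
            continue
        predecessors[dst] += 1
        if type_by_id[dst] in ROOT_TYPES:
            problems.append("root_has_predecessor")

    # Fan-in: a node fed by more than one predecessor (MVP limit).
    if any(count > 1 for count in predecessors.values()):
        problems.append("fan_in")
    return problems
```

A definition that passes this check can still be rejected by the server, so treat it as an early filter and rely on `POST /api/pipelines/validate` for the real verdict.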
### Portable envelope Pipelines export and import as a single self-describing JSON envelope. The same shape is produced by the editor's **Export** button and accepted by **Import** and AI generation: ```json { "schema_version": 1, "name": "Dentists Paris", "definition": { "nodes": [...], "edges": [...] }, "meta": { "exported_from": "outsend.xyz", "kind": "pipeline" } } ``` `POST /api/pipelines` and `POST /api/pipelines/validate` accept either the full envelope or a bare `definition`. Before validation the server **normalizes** the definition: it generates missing edge ids, auto-lays out nodes that have no `x`/`y`, applies `config_schema` defaults, coerces newline-separated strings into `string[]` fields, and strips config fields not in the schema. This means a minimal hand-written or AI-generated `{nodes, edges}` (no coordinates, no edge ids) is accepted as-is. --- ## GET /api/pipelines/schema Return the canonical, machine-readable pipeline schema — the single source of truth shared by the editor, import, AI generation, and the planned [MCP](/docs/integration/mcp) `create_pipeline` tool. **Public** (no auth): it is format documentation, identical for every caller. 
**Response 200** ```json { "schema_version": 1, "compat": { "pois_any": ["pois", "pois_email"], "pois_email": ["pois_email"], "any_pois": ["pois", "pois_email", "verified"] }, "root_types": ["import", "scrap"], "nodes": { "scrap": { "category": "source", "is_root": true, "input": null, "output": "pois", "needs": [], "produces": ["nom", "site_web", "telephone", "..."], "config_schema": { "queries": {"type": "string[]", "required": true, "label": "..."}, "zones": {"type": "string[]", "required": true, "label": "..."} } }, "emails": { "category": "enrich", "input": "pois_any", "output": "pois_email", "needs": ["site_web"], "produces": ["email", "email_personal"], "config_schema": {"mode": {"type": "enum", "enum": ["normal", "deep"], "default": "normal"}} } } } ``` `config_schema` field types: `string`, `string[]`, `int`, `float`, `bool`, `enum` (with `enum` list), `object`. `required` and `default` are optional per field. --- ## POST /api/pipelines/validate Normalize and validate a definition **without creating or running anything**. Used by Import (review before launch) and AI generation (check the JSON Claude produced). Requires a session. **Request body** ```json { "definition": { "nodes": [...], "edges": [...] }, "schema_version": 1 } ``` **Response 200 — valid** ```json { "ok": true, "definition": { "nodes": [...with ids, x/y, defaults...], "edges": [...] }, "summary": { "n_nodes": 3, "n_edges": 2, "types": ["scrap", "emails", "verify"] } } ``` **Response 200 — invalid** (note: still HTTP 200, with `ok: false`) ```json { "ok": false, "error": "Connexion scrap → verify incompatible" } ``` The returned `definition` is the normalized form, ready to load into the editor or submit verbatim to `POST /api/pipelines`. --- ## Generate a pipeline with any AI You don't need to write the JSON by hand. Two ways: **1. 
Built-in (inside outsend).** The editor's **🤖 Build with AI** button sends the schema above plus your plain-language description to Claude using your own key ([BYOK](/docs/integration/byok)), then validates and lays out the result. **2. Bring-your-own assistant (copy/paste anywhere).** Open any AI assistant — claude.ai, Claude Desktop, Cursor, ChatGPT — and: 1. Paste the contract: either this page, or the whole docs bundle at [`/docs/llms-full.txt`](/docs/llms-full.txt), or just the JSON of [`GET /api/pipelines/schema`](#get-apipipelinesschema). 2. Add your request, e.g. *"Compose an outsend pipeline that finds dentists in Berlin, gets their emails, verifies deliverability, and keeps the top-rated. Return only the JSON envelope."* 3. The assistant returns a `{schema_version, name, definition}` envelope. Paste it into the editor's **Import** dialog (it is validated before anything runs), or `POST` it to `/api/pipelines`. Because the editor, import, and this API all accept the same envelope and the server normalizes it (missing edge ids, coordinates, and config defaults are filled in), a hand-assembled `{nodes, edges}` works without any layout fields. --- ## POST /api/pipelines Create a pipeline and launch its root jobs. **Request body** ```json { "name": "Dentists Paris", "definition": { "nodes": [...], "edges": [...] } } ``` `name` is optional (≤ 120 chars, defaults to `"Pipeline"`). The `definition` is normalized (see [Portable envelope](#portable-envelope)) before validation, so a minimal `{nodes, edges}` works. **Response 201** ```json { "id": "f1a2…-uuid", "status": "running", "initial_jobs": ["job_abc", "job_def"] } ``` Specific cause: `400` definition fails any validation rule, or root job creation fails. On root failure the pipeline is persisted with `status = failed`. --- ## GET /api/pipelines List the caller's pipelines (most recent first, capped at 50). 
**Response 200** ```json [ { "id": "f1a2…", "name": "Dentists Paris", "status": "running", "created_at": "2026-05-27 10:14:02", "completed_at": null, "nodes_count": 3, "done_count": 1, "results_count": 187, "progress_pct": 42 } ] ``` `status` is one of `pending | running | done | failed | cancelled`. `nodes_count` is derived from the stored definition. `done_count` is the number of stages already finished, `results_count` the rows aggregated across all stages so far, and `progress_pct` (0–100) a duration-weighted completion estimate — transform stages (`filter`, `sort`) count far less than scraping/enrichment stages, and the in-flight stage contributes its real sub-progress. The same `progress_pct` is also returned by `GET /api/pipelines/{id}`. --- ## GET /api/pipelines/{id} Return a single pipeline with its definition and the jobs spawned so far. **Response 200** ```json { "id": "f1a2…", "user_id": 42, "name": "Dentists Paris", "definition": { "nodes": [...], "edges": [...] }, "status": "running", "created_at": "2026-05-27 10:14:02", "completed_at": null, "progress_pct": 42, "jobs": [ { "id": "job_abc", "job_type": "scrap", "status": "done", "pipeline_node_id": "n1", "results_count": 187, "error_message": null, "created_at": "2026-05-27 10:14:02", "completed_at": "2026-05-27 10:18:55" } ], "output_job": { "id": "job_xyz", "job_type": "verify_emails", "results_count": 142, "status": "done", "download_available": true } } ``` `output_job` is the pipeline's **final dataset** — the output of the most-downstream stage that has actually produced rows (a pipeline filters/reduces, it does not sum; `output_job.results_count` therefore matches the headline count, not the sum of all stages). Download it via [`GET /api/jobs/{id}/download`](jobs.md#get-apijobsiddownload) using `output_job.id`, in `csv` / `xlsx` / `json`. 
It is `null` while the pipeline has produced nothing downloadable yet, and `download_available` reflects whether a CSV (final **or** partial — so it works for running/stopped pipelines too) is still on disk and unexpired. --- ## PATCH /api/pipelines/{id} Not implemented. The current API does not expose graph mutation after creation; clone the pipeline by re-issuing `POST /api/pipelines` with an updated definition. Returns `405 Method Not Allowed`. --- ## DELETE /api/pipelines/{id} Not implemented. Pipelines are immutable once created; deletion will be added once retention policy is defined. Returns `405 Method Not Allowed`. --- ## POST /api/pipelines/{id}/run Not implemented. Pipelines start automatically when created via `POST /api/pipelines`; there is no separate run endpoint. To re-execute an existing graph, post it again as a new pipeline. --- ## GET /api/pipelines/{id}/nodes/{node_id}/input-columns Inspect the schema of the CSV that will feed a given node. Useful for building filter UIs. **Behaviour.** The endpoint locates the node's most recent predecessor job. If the predecessor is not yet `done`, the response carries an empty `columns` list and a `reason` code. Otherwise the predecessor's output CSV is read (up to 5000 rows) and each column is profiled for type, fill rate, and sample values. **Response 200 — predecessor done** ```json { "columns": [ { "name": "telephone", "type": "phone", "fill_rate": 0.92, "sample_values": ["+33 1 23 45 67 89", "0612345678"], "distinct_count": null }, { "name": "categorie", "type": "category", "fill_rate": 1.0, "sample_values": ["dentiste", "orthodontiste"], "distinct_count": 4, "distinct_values": ["dentiste", "endodontiste", "orthodontiste", "stomatologue"] } ], "row_count": 187, "predecessor_job_id": "job_abc" } ``` `type` is one of `phone | email | url | number | category | text`. A column is tagged `category` only if it has between 1 and 200 distinct non-empty values; otherwise it falls back to `text`. 
A typed verdict requires ≥ 80% of non-empty values to match the corresponding pattern. **Response 200 — no usable input** ```json { "columns": [], "reason": "no_predecessor" } ``` | `reason` | Meaning | |-------------------|----------------------------------------------------------| | `no_predecessor` | The node is a root, or has no incoming edge yet. | | `no_data_yet` | Predecessor job exists but is not in status `done`. | | `no_csv_found` | Predecessor finished but no output CSV is on disk. | | `csv_read_error` | The CSV file could not be parsed. | --- ## POST /api/pipelines/{id}/nodes/{node_id}/filter-preview Apply a set of filter rules in memory against the upstream CSV and return the match count plus a small sample. No job is created; no state is mutated. The target node must be of type `filter`. The body uses the same `rules` shape that `filter` nodes persist in their `config.rules`; previews are computed by the same function the worker uses at execution time, so the count is authoritative for the data inspected. **Request body** ```json { "rules": { "logic": "AND", "conditions": [ {"column": "fill_rate", "op": ">=", "value": 0.5}, {"column": "categorie", "op": "in", "value": ["dentiste", "orthodontiste"]} ] } } ``` The exact rule grammar is defined by the filter module (see [Filter module](/docs/modules/filter)). **Response 200** ```json { "total": 187, "matched": 73, "samples": [ {"nom": "Cabinet Dupont", "telephone": "0123456789", "categorie": "dentiste"} ], "predecessor_job_id": "job_abc", "fieldnames": ["nom", "telephone", "categorie", "site_web"], "capped": false } ``` `samples` contains up to 5 matched rows with empty fields stripped. `capped` is `true` when the upstream CSV exceeded the 5000-row preview limit — in that case `total` reflects only the inspected window, but the `matched/total` ratio remains representative. When the predecessor is not ready, the response is the same `{total, matched, samples, reason}` skeleton with all counts at `0`. 
Possible `reason` codes mirror the input-columns endpoint: `no_predecessor`, `no_data_yet`, `no_csv_found`. Specific causes: `400` target node is not of type `filter`, or rule application raised; `500` CSV could not be read. --- ## Lifecycle summary 1. `POST /api/pipelines` validates the graph, persists the pipeline as `running`, and spawns one job per root node. 2. As each job reaches `done`, the worker reads its CSV, transforms rows for the successor's input type, and creates the next job. Empty outputs short-circuit the branch. 3. When every spawned job has reached a terminal status (`done`, `failed`, `cancelled`, `expired`), the pipeline is finalized as `done` if all succeeded, otherwise `failed`. --- title: Module registry API slug: api/registry section: API summary: Single source of truth listing every module the platform exposes — active scrapers, on-demand stubs, meta features, coming-soon items. --- # Module registry API The Module registry is the single source of truth listing every module the platform exposes: active scrapers, on-demand stubs, meta features, and coming-soon items still gathering interest votes. The frontend reads the registry instead of hardcoding module slugs, so adding a module only takes two files (`frontend/static/job_types.js` and `app/job_registry.py`). See also: [/docs/concepts/module-registry](/docs/concepts/module-registry). ## Registry entry shape Each module is described by a small object that the frontend renders in the dashboard tiles, the search palette, and the pricing pages. | Field | Type | Purpose | | -------------- | -------------- | ----------------------------------------------------------------------- | | `slug` | string | Stable identifier. Used as job_type, route param, and registry key. | | `category` | string | Group bucket (`sources`, `enrich`, `signals`, `outreach`, `tools`). | | `label` | object | `{ "fr": "...", "en": "..." }`. Bilingual display name. 
|
| `needs`        | string[]       | Upstream artifacts the module consumes (e.g. `["leads"]`).              |
| `produces`     | string[]       | Downstream artifacts it emits (e.g. `["emails"]`).                      |
| `pipelinable`  | boolean        | Whether the module can be chained inside a Pipeline.                    |
| `is_on_demand` | boolean        | Stub module: clicking activate opens a feedback thread, not a job.      |
| `coming_soon`  | boolean        | Listed for interest voting only. No backend execution.                  |
| `alpha_unavailable` | boolean   | Built and listed as active, but frozen during alpha. Its create endpoint returns `503`. |
| `api_endpoint` | string \| null | Path the dashboard calls to start a run, or `null` for stubs.           |

The flags `is_on_demand`, `coming_soon`, and `alpha_unavailable` are mutually exclusive; a module with none of them set is plain active. Plain active modules have a non-null `api_endpoint`; stubs and coming-soon modules have `api_endpoint = null`. An `alpha_unavailable` module is presented as active and keeps a non-null `api_endpoint`, but that endpoint returns `503` while the alpha freeze is in effect.

---

## GET /api/modules-registry

Public endpoint. Returns the server-side mirror of the JS registry. The response is a flat object with one array per bucket plus a `feature_pages` mapping that points each active module to its published `/features/` sales page (or `null` if not written yet).
### Response — 200 OK ```json { "active": [ "ads_intelligence", "brand_assets", "dead_check", "delivery_check", "emails", "filter", "import", "legal_data", "legal_ids", "legal_mentions", "pagespeed", "phones_extra", "pricing", "reviews", "scrap", "socials", "sort", "techstack", "verify_emails", "viewport_test" ], "multi_proxy": [ "dead_check", "emails", "legal_ids", "legal_mentions", "phones_extra", "pricing", "reviews", "scrap", "socials", "techstack" ], "parallel": [ "ads_intelligence", "brand_assets", "delivery_check", "filter", "import", "legal_data", "pagespeed", "sort", "verify_emails", "viewport_test" ], "on_demand": [ "email_campaign", "phone_carrier", "sms_campaign", "whatsapp_campaign" ], "meta": ["pipeline", "veille"], "coming_soon": [ "ai_personalization", "ai_team_members", "bing_places", "campaign", "chrome_extension", "crm", "directories", "email_warmup", "funding", "hiring", "integrations", "job_changes", "linkedin", "mobile_phones", "multichannel", "natural_filter", "pagesjaunes", "press_monitoring", "public_api", "review_patterns", "seo_data", "tech_adoption", "tracking", "whatsapp", "yelp_tripadvisor" ], "alpha_unavailable": ["finance"], "feature_pages": { "scrap": "scraper-google-maps-gratuit-export-csv", "emails": "email-finder-pro-rgpd-france", "ads_intelligence": null } } ``` The `multi_proxy` set lists scrapers that share the global VPN pool — only one can run at a time platform-wide. `parallel` modules use direct HTTP and may run concurrently. Clients that schedule jobs should check both sets to surface "Will queue" warnings. --- ## GET /api/features Returns the caller's interest state plus a global counter per coming-soon feature. The counts include every allowed feature id, even those with zero votes, so the frontend can render `Needed (N)` labels without a fallback branch. 
The list of acceptable feature ids equals `coming_soon` from the registry, plus a tiny legacy set (`company`, `monitoring`, `pagespeed`) kept around to preserve historic votes. ### Response — 200 OK ```json { "voted": ["linkedin", "funding"], "counts": { "linkedin": 27, "funding": 14, "hiring": 6, "ai_personalization": 3, "directories": 0, "press_monitoring": 0 } } ``` Specific cause: `401` caller is not authenticated. --- ## POST /api/features/{feature_id}/interest Records an interest vote for `feature_id`. The operation is idempotent — a second call by the same user is a no-op. Use `DELETE` on the same path to retract the vote. `feature_id` is validated against the allow-list from the registry (coming-soon ids plus legacy ids). Unknown ids return 404 so the endpoint cannot be used as a write-anywhere KV store. ### Request ```json POST /api/features/linkedin/interest ``` No body. The user is identified by session. ### Response — 204 No Content Empty body. Re-fetch `GET /api/features` for the updated counter. Specific causes: `401` not authenticated; `403` authenticated but not active (pending invite); `404` `feature_id` does not match the registry allow-list. --- ## Related - [Module registry concept](/docs/concepts/module-registry) - [Feedback API](/docs/api/feedback) — used by on-demand stubs to surface activation requests in the admin dashboard. --- title: Veille API slug: api/veille section: API summary: Recurring monitoring of scrapes and pipelines, with diff buckets and reputation signals. --- # Veille API The Veille API manages recurring monitoring jobs. A *veille* (watch) replays a source scrape — or an entire pipeline — on a fixed cadence, then computes a diff against the previous run to surface what changed. See [Veille monitoring concepts](/docs/concepts/veille-monitoring) for lifecycle, scheduling, and diff model. All endpoints are mounted under `/api/veille` and require an authenticated, active session. Responses are JSON. 
Resource ownership is enforced on every request: cross-user access returns `404`. Generic errors are `401` (no session) and `404` (not found / not owned); endpoint-specific causes are listed inline. ## Resource model A `Veille` object exposes the following fields: | Field | Type | Description | | -------------------- | --------------- | ------------------------------------------------------------ | | `id` | integer | Stable identifier. | | `name` | string | Human-readable label (2–200 chars). | | `source_job_id` | string \| null | Source scrape replayed on each tick (mutually exclusive with `source_pipeline_id`). | | `source_pipeline_id` | string \| null | Source pipeline replayed on each tick. | | `frequency_days` | integer | Cadence in days, between `1` and `365`. | | `status` | string | One of `active`, `paused`, `deleted`. | | `next_run_at` | string (ISO8601)| Next scheduled execution. | | `last_run_at` | string \| null | Timestamp of the most recent completed run. | | `last_run_job_id` | string \| null | Job id of the most recent run. | | `run_count` | integer | Total successful runs. | | `created_at` | string (ISO8601)| Creation timestamp. | ## Endpoints ### List veilles `GET /api/veille` Returns the caller's active and paused veilles. Soft-deleted entries are excluded. **Response** `200 OK` — `{ "items": [Veille, ...] }`. ### Create a veille `POST /api/veille` Creates a recurring monitor from a completed scrape or pipeline owned by the caller. Exactly one of `source_job_id` or `source_pipeline_id` is required. **Request body** | Field | Type | Required | Notes | | -------------------- | ------- | -------- | -------------------------------------- | | `name` | string | yes | 2–200 characters. | | `source_job_id` | string | one of | 8–64 characters. | | `source_pipeline_id` | string | one of | 8–64 characters. | | `frequency_days` | integer | yes | `1` ≤ value ≤ `365`. 
| ```json { "name": "Plombiers Lyon 3", "source_job_id": "job_8f2c91a4", "frequency_days": 7 } ``` **Response** `200 OK` — the newly created veille. Specific cause: `400` validation failure (missing/both source fields, source not owned, source not completed, invalid frequency). ### Retrieve a veille `GET /api/veille/{id}` Returns a single veille owned by the caller. Soft-deleted entries return `404`. ### Update a veille `PATCH /api/veille/{id}` Patches mutable fields. Omitted fields are left untouched. **Request body** | Field | Type | Notes | | ---------------- | ------- | ------------------------------------------------------ | | `name` | string | 2–200 characters. | | `frequency_days` | integer | `1` ≤ value ≤ `365`. Reschedules `next_run_at`. | | `status` | string | `active`, `paused`, or `deleted`. | ```json { "status": "paused", "frequency_days": 14 } ``` **Response** `200 OK` — the updated veille. Specific cause: `400` invalid field value. ### Delete a veille `DELETE /api/veille/{id}` Soft-deletes the veille. The record is preserved for audit but excluded from all list endpoints and no longer scheduled. **Response** `200 OK` — `{ "ok": true }`. ## Runs A *run* is a single execution of the veille plus the diff statistics computed against the previous run. The first run is a *baseline* (`is_baseline: true`) and has no diff counters. ### List runs `GET /api/veille/{id}/runs` Returns the run history ordered by `computed_at` descending. **Response** `200 OK` ```json { "items": [{ "id": 17, "job_id": "job_b71e0d22", "prev_job_id": "job_aa44e0f1", "is_baseline": false, "total_count": 312, "prev_total_count": 305, "new_count": 9, "removed_count": 2, "modified_count": 24, "unchanged_count": 279, "computed_at": "2026-05-27T08:11:04Z", "job_status": "done", "job_completed_at": "2026-05-27T08:10:48Z" }] } ``` ### Retrieve a run `GET /api/veille/{id}/runs/{run_id}` Returns the run, including `samples` — capped previews of the rows in each diff bucket. 
**Response** `200 OK` ```json { "id": 17, "job_id": "job_b71e0d22", "is_baseline": false, "new_count": 9, "removed_count": 2, "modified_count": 24, "unchanged_count": 279, "total_count": 312, "computed_at": "2026-05-27T08:11:04Z", "samples": { "new": [{ "key": "...", "nom": "..." }], "removed": [{ "key": "...", "nom": "..." }], "modified": [{ "key": "...", "nom": "...", "before": { "note": "4.3", "nb_avis": 42 }, "after": { "note": "3.8", "nb_avis": 51 }, "changed_fields": ["note", "nb_avis"] }] } } ``` ## Signal categories Every non-baseline run classifies each row in the dataset into exactly one bucket: | Category | Meaning | | ---------- | ----------------------------------------------------------------------------- | | `new` | Row present in the current run, absent from the previous run. | | `removed` | Row present in the previous run, absent from the current run (closed/dropped).| | `modified` | Row present in both runs with at least one tracked field changed. | | `unchanged`| Row present in both runs, identical on tracked fields. | Bucket counts are surfaced as `new_count`, `removed_count`, `modified_count`, and `unchanged_count`. The matching `samples.{new,removed,modified}` arrays hold capped previews suitable for UI display. > The `removed` field is the closed/dropped bucket: a record no longer listed at the source. ## Reputation signals Reputation signals are a derived view of a run's `modified` bucket. They isolate rows whose public reputation moved in a way that is timing-sensitive for outreach — typically Google Maps listings whose rating dropped or whose review volume surged between two runs. ### Ranking logic (high level) A modified row becomes a signal when at least one of the following holds: - **Rating drop** — the average rating decreased by at least `0.2` points. - **Review surge** — the review count grew by at least `3` since the previous run. Each signal carries a `score` that ranks urgency. 
Larger rating drops dominate; review surges contribute a smaller, additive boost above a low-volume noise floor. Signals are returned sorted by `score` descending. The exact weighting is an implementation detail and may evolve; do not depend on absolute score values, only on relative order.

### List signals

`GET /api/veille/{id}/runs/{run_id}/signals`

**Response** `200 OK`

```json
{
  "items": [{
    "nom": "Garage du Centre",
    "adresse": "12 rue Voltaire, 69003 Lyon",
    "telephone": "+33 4 78 00 00 00",
    "site_web": "https://...",
    "email": "contact@...",
    "lien_google_maps": "https://maps.google.com/...",
    "note_avant": 4.3,
    "note_apres": 3.8,
    "delta_note": -0.5,
    "avis_avant": 42,
    "avis_apres": 51,
    "delta_avis": 9,
    "score": 12.0
  }],
  "total": 1
}
```

### Export signals

`GET /api/veille/{id}/runs/{run_id}/signals.{fmt}`

Streams the same ranked signal list as a downloadable file.

| Format | Media type                                                          | Extension |
| ------ | ------------------------------------------------------------------- | --------- |
| `csv`  | `text/csv; charset=utf-8`                                           | `.csv`    |
| `json` | `application/json`                                                  | `.json`   |
| `xlsx` | `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` | `.xlsx`   |

The response sets `Content-Disposition: attachment` with a filename of the form `signaux-reputation-veille-{id}-run-{run_id}.{fmt}`.

Specific cause: `400` unsupported `fmt` (must be `csv`, `json`, `xlsx`).

---
title: AI spending caps
slug: concepts/ai-spending-caps
section: Concepts
summary: Hard per-user spend limits ($/request, $/day, $/month) on AI features, with up-front cost estimates and email alerts.
---

AI features in outsend run on **your own provider key** (BYOK): the provider bills you directly, at cost. So that a miscalibrated prompt never runs up your bill, outsend enforces **hard spending caps** on every AI request. The caps are applied server-side, so they can't be bypassed.
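As a rough mental model of a hard, pre-call cap, the sketch below refuses a request when its worst-case cost could exceed the per-request, daily, or monthly limit. It is an illustration under assumptions, not outsend's implementation: `Caps`, `check_request`, and the returned labels are hypothetical names, the default values are the ones documented on this page, and treating a request that lands exactly on a cap as allowed is a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Caps:
    """Spending caps in USD; values mirror the documented defaults."""
    per_request: float = 10.0
    per_day: float = 10.0
    per_month: float = 100.0


def check_request(worst_case: float, spent_today: float, spent_month: float,
                  caps: Caps = Caps()) -> Optional[str]:
    """Return the first cap the request could exceed, or None if it may run.

    worst_case is the up-front estimate; spent_today and spent_month are
    the totals already recorded for the current UTC day and month.
    """
    if worst_case > caps.per_request:
        return "per_request"
    if spent_today + worst_case > caps.per_day:
        return "per_day"
    if spent_month + worst_case > caps.per_month:
        return "per_month"
    return None
```

The check runs against the worst-case estimate rather than the final cost, which is why a refused request spends nothing: the provider is never called.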
## The three caps | Cap | Default | Configurable up to | |--------------|---------|--------------------| | Per request | $10 | $100 | | Per day | $10 | $100 | | Per month | $100 | $1,000 | Set them in **Settings → AI spending caps**. Days and months are counted in UTC. ## How it works 1. **Estimate before** — before an AI action runs, outsend shows the worst-case cost (your input tokens + the maximum output tokens) and how much budget you have left today and this month. 2. **Block before overspending** — if a request could push you over a cap, it is refused *before* the provider is ever called. Nothing is spent. 3. **Track the real cost** — after each call, the actual cost (the provider's reported token usage × the model's price) is added to your daily and monthly totals. 4. **Email alerts** — you get an email at 80% of a daily/monthly cap, and again when a cap is reached (AI is paused until it resets). ## Models without a known price Prices come from a public catalog of model prices (~2,700 models). If a model isn't in it (some custom or exotic endpoints), outsend can't compute its cost: the request is **allowed and tracked, but not capped**, and the UI flags the price as unknown. Mainstream models from every supported provider are priced. ## Good to know - Caps are a **safety net on outsend's side** — the real bill is always your provider's, and the estimate is indicative. - Resets are calendar-based: the daily total resets at 00:00 UTC, the monthly total on the 1st. - Raising a cap takes effect immediately; AI resumes as soon as you're back under it. --- title: Jobs & lifecycle slug: concepts/jobs-lifecycle section: Concepts summary: A job is one unit of work. This page describes its states, transitions, events, and retry semantics. --- A **job** is one unit of work. Every module runs as a job. Jobs are isolated, observable, resumable. 
## State machine

```
┌─────────┐   queue picks     ┌─────────┐   success     ┌──────┐
│ pending │ ────────────────► │ running │ ────────────► │ done │
└─────────┘                   └─────────┘               └──────┘
     │                             │
     │ user cancels                │ fatal error
     ▼                             ▼
┌───────────┐                 ┌────────┐
│ cancelled │                 │ failed │
└───────────┘                 └────────┘

done / failed / cancelled ──── (after 7 days) ────► expired
```

| State       | Meaning                                                     |
|-------------|-------------------------------------------------------------|
| `pending`   | Created, sitting in the FIFO queue                          |
| `running`   | Picked by a worker, executing                               |
| `done`      | Completed successfully, results downloadable                |
| `failed`    | Errored out (see `error_message`)                           |
| `cancelled` | Cancelled via the UI or API                                 |
| `expired`   | More than 7 days since terminal state; result files purged  |

Transitions and queue assignment are atomic; a job is never picked twice.

## Creation

```
POST /api/jobs          { "queries": [...], "zones": [...] }   # creates a scrap job
POST /api/jobs/{type}   { ...module-specific params }          # typed shortcut
```

See [Jobs API](/docs/api/jobs).

## Observability

```
GET /api/jobs/{id}          # status, counters, metadata
GET /api/jobs/{id}/stream   # SSE: status / log / done
```

The stream closes when the job terminates. Safety timeout: 6 hours. Event payloads: see [States & SSE events](/docs/concepts/states-and-events).

## Results

```
GET /api/jobs/{id}/download?format=csv|json|xlsx
GET /api/jobs/{id}/items?offset=0&limit=200
```

Results live **7 days** after terminal state, then are purged. The job record remains.

## Errors & retries

A `failed` job exposes `error_message` and `error_count` (items that errored inside the job; a job can be `done` with `error_count > 0`).

```
POST /api/jobs/{id}/resume
```

Creates a new attempt resuming from the last successful item.
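The states and retention rule above can be turned into a small client-side decision helper. This is a sketch under assumptions rather than part of the API: `next_action`, `results_expire_at`, and the action labels are hypothetical names, and the `YYYY-MM-DD HH:MM:SS` timestamp format is assumed from the job examples shown elsewhere in these docs.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # documented result retention after terminal state


def next_action(job: dict) -> str:
    """Map a job record to a hypothetical client action label."""
    status = job["status"]
    if status in ("pending", "running"):
        return "wait"
    if status == "failed":
        return "resume"  # i.e. POST /api/jobs/{id}/resume
    if status == "done":
        # A job can be done while some items errored inside it.
        if job.get("error_count", 0) > 0:
            return "download_and_review_errors"
        return "download"
    return "stop"  # cancelled or expired


def results_expire_at(completed_at: str) -> datetime:
    """Estimate when result files are purged (assumed timestamp format)."""
    finished = datetime.strptime(completed_at, "%Y-%m-%d %H:%M:%S")
    return finished + RETENTION
```

For example, a `done` job with `error_count` of 3 maps to `download_and_review_errors`: the results are usable, but some items did not make it through.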
## Cancellation ``` POST /api/jobs/{id}/cancel # keeps partial results DELETE /api/jobs/{id} # cancels and removes record ``` ## Concurrency - Up to **5 simultaneous jobs per user** (queued beyond) - Two lanes: **serial** (extraction) and **parallel** (6 slots: verification, pipeline utilities, `delivery_check`) - Jobs are independent — re-runs do not wait on the original ## What's next - [States & SSE events](/docs/concepts/states-and-events) - [Pipeline orchestration](/docs/concepts/pipeline-orchestration) - [Limits & quotas](/docs/concepts/limits) --- title: Limits & quotas slug: concepts/limits section: Concepts summary: Every numeric limit enforced by the platform, in one table. --- Reference for capacity planning. Platform-wide unless noted per-user. ## Jobs | Limit | Value | Scope | |------------------------------------|-----------------|----------------| | Concurrent jobs per user | 5 | per user | | Parallel-lane worker slots | 6 | platform-wide (verify_emails, delivery_check, import, filter, sort) | | Result file retention | 7 days | per job | | SSE stream max duration | 6 hours | per stream | | Max EF per job | 1.0 | per job | The parallel lane is a pool separate from the serial lane used by extraction modules. ## Veille | Limit | Value | |--------------------|------------| | Frequency min | 1 day | | Frequency max | 365 days | ## Pipelines | Limit | Value | |--------------------|------------| | Max nodes | 20 | | Max inputs/node | 1 (MVP) | ## AI spending (BYOK) Hard per-user caps on AI features — billed to your own provider key — with email alerts. Configurable in Settings up to 10× the default. | Cap | Default | Max | Scope | |--------------|---------|---------|---------------------| | Per request | $10 | $100 | per user | | Per day | $10 | $100 | per user, UTC day | | Per month | $100 | $1,000 | per user, UTC month | Requests that would exceed a cap are blocked before the provider is ever called; you get an email at 80% and when a cap is reached. 
See [AI spending caps](/docs/concepts/ai-spending-caps). ## Auth rate limits Per-endpoint windows. Exceeding returns `429 Too Many Requests`. | Endpoint | Limit | Window | |--------------------------------|-------------|--------------------------| | Signup | 3 attempts | per hour, per IP | | Login | 5 attempts | per 15 min, per IP+email | | Password reset request | 3 attempts | per hour, per IP+email | | Password change (logged-in) | 5 attempts | per hour, per user | | Resend email verification | 3 attempts | per hour, per user | | Feedback thread creation | 20 attempts | per hour, per user | | Session lifetime | 7 days | sliding window | No global API throttle beyond these. ## Module-specific - **[`scrap`](/docs/modules/scrap)** — max 1.0 EF per job - **[`emails`](/docs/modules/emails)** — `normal` and `deep` modes with different EF profiles - All multi-proxy modules — `items` array bounded at 1–10000 per request ## What's next - [Jobs & lifecycle](/docs/concepts/jobs-lifecycle) - [API overview](/docs/api/overview) --- title: Module registry slug: concepts/module-registry section: Concepts summary: A single source of truth describes every module — its inputs, outputs, category, and where it appears in the UI. --- Every module outsend exposes is declared in a **single registry**. It powers the dashboard module grid, the new-job picker, the pipeline editor, and the landing page listing. Guarantees: a module visible in the dashboard has an endpoint (and vice versa); a machine-readable snapshot is published; categories are hints, while `slug`, `needs` and `produces` are stable. ## The endpoint ``` GET /api/modules-registry ``` Returns the full registry as JSON. 
Each entry: ```json { "slug": "scrap", "category": "extraction", "label": { "fr": "Scrap Google Maps", "en": "Scrape Google Maps" }, "needs": null, "produces": "poi_list", "pipelinable": true, "is_on_demand": false, "coming_soon": false, "api_endpoint": "/api/jobs/scrap" } ``` | Field | Meaning | |-----------------|-------------------------------------------------------------------------| | `slug` | Stable identifier, used in URLs and API paths | | `category` | `extraction` \| `enrichment` \| `intelligence` \| `verification` \| `pipeline` \| `meta` | | `label` | User-facing display names per language | | `needs` | Input shape (`poi_list`, `csv_rows`, …) — `null` if produced from scratch | | `produces` | Output shape | | `pipelinable` | Usable as a node in a pipeline | | `is_on_demand` | If true, no backend yet — triggers a conversation with the team | | `coming_soon` | If true, listed for visibility only; interest can be voted | | `alpha_unavailable` | If true, the module is built and listed as active everywhere, but frozen during alpha — its create endpoint returns `503` | | `api_endpoint` | Shortcut to start a job of this type | ## Flexible input matching `needs` and `produces` describe *canonical* column names (`nom`, `telephone`, `site_web`, `email`, `lien_google_maps`, …). You never have to format your data to match them exactly: inputs are resolved against a shared table of accepted aliases, so columns named `Website`, `url`, `e-mail`, `name` or `raison sociale` map to the right canonical field. Header-less files are auto-detected and columns are inferred from their content. Every job is transparent about it. Each run reports a non-blocking **`notice`** (shown as an info banner on the job page and as a discreet ⓘ on the dashboard) describing what was auto-mapped, guessed, or ignored — for example rows skipped because they had no website. A job only fails when a required column is genuinely absent (e.g. 
an enrichment that needs `site_web` finds it on zero rows), and that error explicitly **names the accepted aliases** so you know what header to provide. ## Categories | Category | What it does | Examples | |----------------|-----------------------------------------------------------|-------------------------------------------------------------| | `extraction` | Produces data from public sources | [`scrap`](/docs/modules/scrap) | | `enrichment` | Augments existing rows with new fields | [`emails`](/docs/modules/emails), [`socials`](/docs/modules/socials), [`legal_ids`](/docs/modules/legal_ids) | | `intelligence` | Computes signals on existing rows | [`pricing`](/docs/modules/pricing), [`techstack`](/docs/modules/techstack), [`ads_intelligence`](/docs/modules/ads_intelligence) | | `verification` | Validates or scores existing rows | [`verify_emails`](/docs/modules/verify_emails), [`delivery_check`](/docs/modules/delivery_check) | | `pipeline` | Orchestration utilities | [`import`](/docs/modules/import), [`filter`](/docs/modules/filter), [`sort`](/docs/modules/sort) | | `meta` | Not a job — describes pipelines or veilles | (no API endpoint) | ## Lifecycle of a module 1. **Coming soon** — landing page only, no backend, interest votable 2. **On-demand** — listed in the dashboard, CTA opens a conversation, executed manually 3. **Active** — fully backed by an endpoint 4. **Available (alpha-frozen)** — built and presented as an active module across every surface, but not launchable during alpha: the UI shows a maintenance banner with a disabled launch button, and the create endpoint returns `503`. Unlike *coming soon*, it is not a placeholder and carries no interest vote — it is a finished module held back only by alpha capacity. 5. **Deprecated** — still callable but flagged Phase changes appear in the registry via `coming_soon`, `is_on_demand`, `alpha_unavailable`, and `deprecated_at`. 
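
A consumer of `GET /api/modules-registry` could derive a module's phase from these flags. The following is a minimal sketch, assuming the flags are plain booleans and that `deprecated_at` is either absent or a non-empty value; the function names and the precedence order between flags are assumptions, not documented behaviour.

```python
def module_phase(entry: dict) -> str:
    """Derive a lifecycle phase from a registry entry.

    The precedence between flags is an assumption made for this sketch.
    """
    if entry.get("deprecated_at"):
        return "deprecated"
    if entry.get("coming_soon"):
        return "coming_soon"
    if entry.get("is_on_demand"):
        return "on_demand"
    if entry.get("alpha_unavailable"):
        return "alpha_frozen"
    return "active"


def is_launchable(entry: dict) -> bool:
    """Active and deprecated modules are callable; the other phases are not."""
    return module_phase(entry) in {"active", "deprecated"}


# Entry shaped like the registry example earlier on this page.
scrap = {
    "slug": "scrap",
    "category": "extraction",
    "needs": None,
    "produces": "poi_list",
    "pipelinable": True,
    "is_on_demand": False,
    "coming_soon": False,
    "api_endpoint": "/api/jobs/scrap",
}
```

With this helper, a UI or an AI client can skip modules whose create endpoint would answer `503` (alpha-frozen) before attempting a call.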
## Adding a module (contributors)

Adding a module takes two files in the codebase: a JS registry entry (UI surfaces) and a Python registry entry (API + worker dispatcher). The runtime then plugs the module in everywhere automatically.

## What's next

- [Jobs & lifecycle](/docs/concepts/jobs-lifecycle)
- [Pipeline orchestration](/docs/concepts/pipeline-orchestration)

---
title: Pipeline orchestration
slug: concepts/pipeline-orchestration
section: Concepts
summary: Chain modules into a reusable DAG. Each block consumes the previous block's output, no glue code required.
---

A **pipeline** is a directed acyclic graph of modules. Each node is one module call; each edge declares which output feeds which input. A pipeline lets you save a multi-step recipe once and run it again later.

Pipelines also back [veille](/docs/concepts/veille-monitoring): a recurring scrap is internally a scheduled pipeline.

## Anatomy

```
┌──────────┐
│  scrap   │  queries=["bakery"], zones=["Paris"]
└────┬─────┘
     │ produces: poi_list
     ▼
┌──────────┐    ┌──────────┐
│  emails  │    │ ads_intel│
└────┬─────┘    └────┬─────┘
     │               │
     ▼               ▼
┌────────────────────────────┐
│           filter           │  rules: emails_present=true, ads_score≥30
└────────────┬───────────────┘
             ▼
         ┌────────┐
         │  sort  │  sort_by=ads_score, desc, top_n=200
         └────────┘
```

Each node has:

- **type** — module slug (see [module registry](/docs/concepts/module-registry))
- **params** — module config, identical to a standalone job
- **inputs** — references to upstream node(s)
- **id** — local identifier within the pipeline

## Chaining rules

An edge is valid only if the producer's `produces` matches the consumer's `needs` (shapes like `poi_list`, `enriched_list`, `csv_rows`). The editor enforces this at design time, and the server re-validates on submit.

The full set of chainable blocks — their `category`, `input`/`output` buckets, `needs`/`produces` columns, and per-block `config_schema` — is published as a single machine-readable contract at [`GET /api/pipelines/schema`](/docs/api/pipelines). That endpoint is the **single source of truth**: the editor palette, import, AI generation, and the planned MCP `create_pipeline` tool all read it.

Every active module is chainable (scrap, import, reviews, emails, verify, socials, dead_check, techstack, ads_intelligence, brand_assets, legal_ids, legal_data, legal_mentions, phones_extra, pricing, pagespeed, phone_info), plus the `filter`/`sort` transforms.

## Build, export, import, or generate with AI

A pipeline graph is portable JSON. Four ways to obtain one:

- **Build** it visually in the editor (`/pipelines/new`).
- **Export** the current graph to a JSON envelope (`{schema_version, name, definition, meta}`) — the Export button downloads it.
- **Import** an envelope (paste or `.json` file) — it is validated via `POST /api/pipelines/validate` and loaded back into the editor for review before launch (nothing runs on import).
- **Generate with AI** — describe the pipeline in plain language; the editor sends the server schema plus your description to Claude (using your own key via [BYOK](/docs/integration/byok)), parses the returned JSON, validates it, and lays it out on the canvas.

## Limits

| Limit              | Value                                |
|--------------------|--------------------------------------|
| Max nodes          | 20                                   |
| Max inputs/node    | 1 (multi-input merges not yet open)  |
| Max depth          | 20                                   |
| Re-runs allowed    | Unlimited                            |

## Execution

Pipelines **auto-start at creation**: `POST /api/pipelines` queues the root node, and the rest follows as predecessors reach `done`. Each node runs as a normal job (same lifecycle, observability, retries).

The coordinator advances on `done` and stops on the first `failed`. A failed pipeline can be resumed from the failing node. To re-run, create a new pipeline (the graph is JSON — copy and re-post).
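
The chaining rule and the limits on this page can be checked client-side before a graph is posted. The sketch below is only an approximation under stated assumptions: the node dictionaries follow the four fields listed under Anatomy, `TOY_REGISTRY` is a made-up stand-in for the real contract served by `GET /api/pipelines/schema` (the shapes given for `emails` and `sort` are illustrative), and `validate_pipeline` is a hypothetical helper, not the server's validator. It does not check acyclicity or depth.

```python
MAX_NODES = 20
MAX_INPUTS_PER_NODE = 1

# Toy registry for illustration; real shapes come from GET /api/pipelines/schema.
TOY_REGISTRY = {
    "scrap": {"needs": None, "produces": "poi_list"},
    "emails": {"needs": "poi_list", "produces": "enriched_list"},
    "sort": {"needs": "enriched_list", "produces": "enriched_list"},
}


def validate_pipeline(nodes: list, registry: dict) -> list:
    """Return a list of human-readable errors; an empty list means no issue found."""
    errors = []
    if len(nodes) > MAX_NODES:
        errors.append(f"too many nodes: {len(nodes)} > {MAX_NODES}")
    by_id = {node["id"]: node for node in nodes}
    for node in nodes:
        spec = registry.get(node["type"])
        if spec is None:
            errors.append(f"{node['id']}: unknown module {node['type']!r}")
            continue
        inputs = node.get("inputs", [])
        if len(inputs) > MAX_INPUTS_PER_NODE:
            errors.append(f"{node['id']}: more than {MAX_INPUTS_PER_NODE} input")
        for upstream_id in inputs:
            upstream = by_id.get(upstream_id)
            if upstream is None:
                errors.append(f"{node['id']}: unknown upstream {upstream_id!r}")
                continue
            upstream_spec = registry.get(upstream["type"])
            if upstream_spec is None:
                continue  # already reported when the upstream node was visited
            # Core rule: the producer's `produces` must match the consumer's `needs`.
            if upstream_spec["produces"] != spec["needs"]:
                errors.append(
                    f"{node['id']}: needs {spec['needs']!r} but "
                    f"{upstream_id} produces {upstream_spec['produces']!r}"
                )
    return errors
```

Since the server re-validates on submit, a check like this is a convenience for faster feedback, not a replacement for `POST /api/pipelines/validate`.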
## Endpoints ``` GET /api/pipelines/schema # canonical node schema (public, source of truth) POST /api/pipelines/validate # normalize + validate, no side effects POST /api/pipelines # create (also auto-starts) GET /api/pipelines # list user pipelines GET /api/pipelines/{id} # detail + graph ``` A pipeline is owned by one user. ### Filter preview ``` POST /api/pipelines/{id}/nodes/{node_id}/filter-preview ``` Runs a `filter` node against a sample of the predecessor's output without executing the full pipeline. ## What's next - [Veille (monitoring)](/docs/concepts/veille-monitoring) - [`filter`](/docs/modules/filter), [`sort`](/docs/modules/sort), [`import`](/docs/modules/import) --- title: Scrape modes (Fast / Advanced / Ultra) slug: concepts/scrape-modes section: Concepts summary: The three Google Maps scrape modes control adaptive subdivision depth — the trade-off between speed, cost (EF) and contact completeness. --- The Google Maps scrape offers **three modes** that tune a single knob: **adaptive subdivision depth**. They trade off speed, cost and completeness. | Mode | For | In one line | |------|-----|-------------| | **Fast** *(default)* | Most cases | Fast, cheaper, already captures the bulk of contacts. | | **Advanced** | When you want to enrich | Balanced: more contacts in dense areas, moderate cost. | | **Ultra** | Maximum coverage | Subdivides as deep as possible: near-exhaustive recall, slower and costlier. | ## Why three modes: the 120-result cap Google Maps **caps any search at ~120 results** ("you've reached the end of the list"). To go further, outsend splits a saturated tile into 4 more-zoomed sub-tiles and re-scans each (dedup by Google Maps link). This is **adaptive subdivision**. But subdividing only pays off if the sub-tile brings **new** contacts: in a low-density area Google widens its radius beyond the tile and often returns the same 120 places → subdividing means 4× the work for 0 new leads. 
So each mode sets a **threshold**: a saturated tile is only subdivided if it brought at least *N* new unique contacts. | Mode | Threshold (new uniques required to subdivide) | Effect | |------|----------------------------------------------|--------| | **Fast** | 15 | Only subdivides genuinely rich areas → few tiles. | | **Advanced** | 7 | Subdivides more readily → more coverage. | | **Ultra** | 1 | Subdivides whenever anything new remains → maximum coverage. | Subdivision depth is bounded (zoom 13 → 17, i.e. 4 levels: a tile is then ~300 m across, ≈ one city block), so even Ultra stays finite. ## Modes only diverge in dense areas Key point: **a mode only changes anything where tiles saturate** (≥ 120 results). - **Dense area** (city center, a common query like "plumber" or "restaurant"): tiles saturate, subdivision kicks in → Fast / Advanced / Ultra yield **markedly different** contact volumes. - **Sparse area** (rural, niche query): nothing saturates, no subdivision → **all three modes return the exact same result**. Picking Ultra there buys nothing (same result, same cost). That's why the mode is a **per-scrape** choice, not a global setting: it depends on how dense what you're searching for is. ## Cost (EF) and duration **EF** (France-equivalent) is the cost unit of a scrape. The baseline is simple: > **1 EF = scraping the whole of France, once, in Fast mode.** So a city or a département costs a small fraction of an EF. Because deeper modes fire **many more** Google Maps requests (they re-subdivide saturated tiles), they cost proportionally more: | Mode | Relative cost | Relative duration | |------|:---:|:---:| | Fast | **×1** (base) | ×1 | | Advanced | **≈ ×2** | ≈ ×2 | | Ultra | **≈ ×6** | ≈ ×6 | These factors are **measured averages** (ratio of tiles processed vs Fast, 2026-06-05 campaign). 
Real cost depends on the **actual density** of the zone: - **Sparse area**: nothing saturates → no subdivision → all three modes cost **the same** (the factor barely applies). - **Dense area**: the gap widens (Ultra can reach ×14 in a very dense city center). The pre-scrape estimate applies these factors (the EF shown rises when you switch to Advanced/Ultra). During the scrape, the **ETA accounts** for upcoming subdivisions, and **elapsed time** is shown live. ## Measurements > **Methodology.** 3 queries of differing density — "plumber" (clusters), "pharmacy" (numerous and spread out), "cobbler" (niche) — all categories that **display a phone** (consumer categories like restaurant/hairdresser show ~0 phones → wrongly filtered by the anti-bot, untestable). 3 zones (dense / medium / rural), all 3 modes each, **every scrape run to full completion** (no timeout). We measure: unique contacts, tiles processed (≈ cost/requests), real duration. Percentages are vs Fast. **Full matrix (campaign 2026-06-05, "plumber", all to completion)** | Zone | Density | Mode | Contacts | Tiles | Time | vs Fast | Contacts/tile | |------|---------|------|---------:|------:|-----:|--------:|--------------:| | Lyon 6 km | dense | Fast | 606 | 53 | 50 min | — | 11.4 | | Lyon 6 km | dense | Advanced | 627 | 89 | 84 min | +3.5 % | 7.0 | | Lyon 6 km | dense | Ultra | 647 | 193 | 180 min | +6.8 % | 3.4 | | Tours 10 km | medium | Fast | 311 | 14 | 8 min | — | 22.2 | | Tours 10 km | medium | Advanced | 351 | 42 | 20 min | +13 % | 8.4 | | Tours 10 km | medium | Ultra | 377 | 150 | 72 min | +21 % | 2.5 | | Aurillac 12 km | rural | Fast | 213 | 19 | 7 min | — | 11.2 | | Aurillac 12 km | rural | Advanced | 211 | 23 | 9 min | −1 % | 9.2 | | Aurillac 12 km | rural | Ultra | 215 | 83 | 40 min | +1 % | 2.6 | - **Rural → all three modes are identical** (213 / 211 / 215). Ultra takes 40 min (vs 7 min for Fast) for **+2 contacts**. Going deeper is pointless when nothing saturates. 
- **Medium → Ultra +21 %** vs Fast, but at **9× the time** (72 min vs 8 min); Advanced +13 % at 2.5×. - **Dense → Ultra +6.8 %** vs Fast, at **3.6× the time** (3 h vs 50 min). - **Efficiency**: Fast is **3–9× more cost-effective per tile** (i.e. per EF/time) than Ultra across all zones. **Two more queries ("pharmacy" = dense and numerous; "cobbler" = niche), Ultra gain vs Fast** | Query | Lyon (dense) | Tours (medium) | Aurillac (rural) | |-------|:---:|:---:|:---:| | plumber (clusters) | +6.8 % | +21 % | +1 % | | **pharmacy (numerous, spread out)** | **+50 %** | **+44 %** | noise* | | cobbler (niche) | +16 % | +3 % | +12 % | Pharmacy detail: Lyon Fast 411 / Ultra 617 (36→157 min); Tours Fast 253 / Ultra 364 (8→110 min). Cobbler Lyon Fast 173 / Ultra 200. *Rural pharmacy = noise: tiles don't saturate consistently (the 120 boundary), so mode order there is random. > **Takeaway.** Ultra's gain has **no single value: from +1 % to +50 % depending on the category**. **Numerous, spread-out** categories (pharmacies, regular shops) benefit hugely from Ultra (+44 to +50 % — Fast misses half because of the 120 cap). Categories that **cluster** (plumber) or are **rare** (cobbler) gain only +1 to +16 %. In all cases Ultra costs **3–14× the time** of Fast, and in rural/low density all three modes converge. ## Recommendation - **Default: Fast.** Best speed/cost ratio for a first pass and for categories that cluster (trades, specialized services). - **Ultra when the target is dense AND numerous** (pharmacies, shops, agencies…) and you want exhaustiveness: the gain is real, up to **+50 %** more contacts. Accept 3–14× the time. - **Advanced** = middle ground. - **Niche or sparse area → Fast**, period: modes converge, Ultra just wastes time. See also: [Jobs & lifecycle](concepts/jobs-lifecycle), [Limits & quotas](concepts/limits). 
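
The subdivision rule and the cost factors described on this page can be summarised in a few lines. This is a sketch under assumptions, not the production logic: the cap is treated as exactly 120 although the page says "~120", the constant and function names are hypothetical, and the cost factors are the measured averages quoted above, which vary with real density.

```python
SATURATION_CAP = 120          # approximate Google Maps result cap ("~120")
MAX_ZOOM = 17                 # subdivision is bounded at zoom 17 (starting from 13)

# New unique contacts a saturated tile must bring before it is subdivided.
NEW_UNIQUES_THRESHOLD = {"fast": 15, "advanced": 7, "ultra": 1}

# Average cost factors relative to Fast (measured averages, not guarantees).
COST_FACTOR = {"fast": 1.0, "advanced": 2.0, "ultra": 6.0}


def should_subdivide(mode: str, result_count: int, new_uniques: int, zoom: int) -> bool:
    """Decide whether a tile is split into 4 more-zoomed sub-tiles."""
    if zoom >= MAX_ZOOM:
        return False                      # depth bound reached
    if result_count < SATURATION_CAP:
        return False                      # tile did not saturate: nothing hidden
    return new_uniques >= NEW_UNIQUES_THRESHOLD[mode]


def estimated_ef(fast_ef: float, mode: str) -> float:
    """Scale a Fast-mode EF estimate by the average factor for the chosen mode."""
    return fast_ef * COST_FACTOR[mode]
```

The second check is why all three modes converge in sparse areas: when no tile saturates, `should_subdivide` is false regardless of the mode.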

---
title: States & SSE events
slug: concepts/states-and-events
section: Concepts
summary: Exact payloads for every job state and every event emitted on the SSE stream.
---

The contract for integrating against the job stream — bots, dashboards, alerting, AI assistants.

## States — full enum

| Value       | Terminal | Result files available | Re-runnable |
|-------------|----------|------------------------|-------------|
| `pending`   | no       | no                     | n/a         |
| `running`   | no       | no                     | n/a         |
| `done`      | yes      | yes (7 days)           | yes         |
| `failed`    | yes      | partial                | yes         |
| `cancelled` | yes      | partial                | yes         |
| `expired`   | yes      | no (purged)            | no          |

A `pending` or `running` job cannot be deleted, only **cancelled**.

## SSE stream

```
GET /api/jobs/{id}/stream
Accept: text/event-stream
```

Standard SSE; each event:

```
event:
data:
```

### `status` event

Every **2 seconds** while non-terminal, plus once at terminal state.

```json
{
  "id": "j_abc123",
  "status": "running",
  "processed_points": 412,
  "grid_points_count": 1280,
  "results_count": 387,
  "error_count": 2,
  "download_available": false,
  "query_stats": {
    "bakery": { "found_pct": 92 },
    "dentist": { "found_pct": 78 }
  }
}
```

| Field                | Type    | Description                                    |
|----------------------|---------|------------------------------------------------|
| `id`                 | string  | Job id                                         |
| `status`             | enum    | See table above                                |
| `processed_points`   | int     | Items finished                                 |
| `grid_points_count`  | int     | Items planned                                  |
| `results_count`      | int     | Result rows so far                             |
| `error_count`        | int     | Items that failed (job can still reach `done`) |
| `download_available` | bool    | `true` once the result file is ready           |
| `query_stats`        | object  | Per-query stats; depends on module             |

### `log` event

Emitted as new log lines accumulate (bundled, polled internally every 0.5 s).

```json
{
  "message": "Picked up 12 POIs in Lyon centre",
  "level": "info",
  "timestamp": "2026-05-27T14:21:08Z"
}
```

`level` ∈ `debug` · `info` · `warn` · `error`.
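
A client consuming this stream mostly needs to split frames and decode the JSON payloads. Here is a minimal sketch of that step, assuming the standard SSE framing (`event:` and `data:` fields, a blank line ending each event) and JSON in every `data` field; `parse_sse` and `progress` are hypothetical helper names, and the transport (how lines are read from the HTTP response) is deliberately left out.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))


def progress(status_payload: dict) -> float:
    """Fraction of planned items finished, from a `status` payload."""
    planned = status_payload.get("grid_points_count") or 0
    if planned <= 0:
        return 0.0
    return min(1.0, status_payload.get("processed_points", 0) / planned)


sample = [
    "event: status",
    'data: {"id": "j_abc123", "status": "running", '
    '"processed_points": 412, "grid_points_count": 1280}',
    "",
    "event: log",
    'data: {"message": "Picked up 12 POIs in Lyon centre", "level": "info"}',
    "",
]
events = list(parse_sse(sample))
```

In a real integration the `lines` argument would come from a streaming HTTP response; which HTTP library to use is left to the integrator.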
### `done` event

Emitted once, then the stream closes. Same event for `failed` and `cancelled` — check `status`.

```json
{
  "id": "j_abc123",
  "status": "done",
  "results_count": 1820,
  "duration_seconds": 1342
}
```

### `error` event

Stream-level errors (auth, not-found). Different from a job ending in `failed` (that one comes via `done` with `status: "failed"`).

```json
{ "code": "forbidden", "message": "Not your job" }
```

## Polling intervals (no SSE)

| Endpoint         | Min poll interval |
|------------------|-------------------|
| `/api/jobs/{id}` | 2 seconds         |
| `/api/jobs`      | 5 seconds         |

Internal state refreshes every 2 s; faster polling brings no benefit.

## Timeouts

| Thing                                   | Value      |
|-----------------------------------------|------------|
| SSE stream max duration                 | 6 hours    |
| Job overall timeout                     | 6 hours    |
| Idle worker reconnect window            | 30 seconds |
| Result file retention after `done`      | 7 days     |

## What's next

- [Jobs & lifecycle](/docs/concepts/jobs-lifecycle)
- [Limits & quotas](/docs/concepts/limits)

---
title: Veille (monitoring)
slug: concepts/veille-monitoring
section: Concepts
summary: A recurring scrap that diffs each run against the previous one and surfaces reputation signals.
---

A **veille** (French for "watch") is a scheduled re-run of an existing job or pipeline. Each run is diffed against the previous one, and differences are exposed as **signals**.

A veille is created from an existing **scrap job** (the source). Its query + zones + parameters become the template, cloned at each scheduled run.

```
source job (one-off scrap)
      │  registered as veille, frequency = 7 days
      ▼
   run 1  ──► poi_list_v1
      │  7 days later
      ▼
   run 2  ──► poi_list_v2
      │  diff(v1, v2)
      ▼
   change report:
     - new POIs (opened)
     - closed POIs (no longer found)
     - modified POIs (ratings dropped, contact changed, ...)
```

## Frequency

Days, **1**–**365**. Hourly is intentionally disallowed: prospect data doesn't move that fast, and source rate limits would not survive it.
Typical: 7 (weekly), 30 (monthly), 90 (quarterly). ## Signals Three categories extracted from each diff: - **`new`** — in the new run, absent before (newly opened competitors, partners, acquisition targets) - **`closed`** — absent from the new run, present before (outreach cleanup; early shutdown signal) - **`modified`** — present in both, changed: - **Rating delta** — Google rating drop = strong "client in trouble" signal - **Review count delta** — surging or stalling activity - **Contact delta** — phone or website changed (often a relaunch) Modified rows are scored; the signals endpoint returns them ranked. ## Endpoints ``` GET /api/veille # list user's veilles POST /api/veille # create PATCH /api/veille/{id} # update name, frequency, status DELETE /api/veille/{id} # soft-delete GET /api/veille/{id}/runs # historical runs GET /api/veille/{id}/runs/{run_id} # one run + diff GET /api/veille/{id}/runs/{run_id}/signals # filtered, scored signals ``` Signals endpoint supports CSV / JSON / XLSX via `?format=…`. ## States | State | Meaning | |----------|---------------------------------------------------------| | `active` | Will run on schedule | | `paused` | Schedule suspended; existing runs remain available | | `deleted`| Soft-deleted; data retained | A veille run is a normal job — same workers, same quotas. Counts against the running-jobs ceiling only at run time. ## What's next - [Jobs & lifecycle](/docs/concepts/jobs-lifecycle) - [Pipeline orchestration](/docs/concepts/pipeline-orchestration) - [`scrap`](/docs/modules/scrap) --- title: BYOK — Bring your own AI key slug: integration/byok section: Integration summary: Connect a personal API key from any major AI provider (Anthropic, OpenAI, Gemini, Mistral, Groq, DeepSeek, xAI, or any OpenAI-compatible endpoint) and pick a model. The user's key, the user's quota. 
--- > **Status: partially live.** Connecting a key, picking a provider, and selecting a model are available now in **Settings → Connect an AI**, and power the AI features shipping today (e.g. Google review summaries, pipeline generation from a description). The broader in-app assistant described below is still on the roadmap. The BYOK ("bring your own key") integration lets the user paste an AI provider API key into the outsend settings and use an AI assistant directly inside the app — to configure searches, draft filter rules, summarise results, or build pipelines through natural language. ## Why BYOK and not a hosted model - The user's spend stays on the user's account, billed by the AI provider directly. - No outsend-side mediation: the assistant sees only what the user grants it. - Provider choice stays with the user: Anthropic, OpenAI, or any compatible endpoint. ## Supported providers The provider and model are chosen in **Settings → Connect an AI**. Models are **detected live** from the provider's own API — there is no fixed model list to maintain, and new models appear automatically as the provider releases them. | Provider | Key format | Notes | |----------|------------|-------| | Anthropic (Claude) | `sk-ant-…` | Native Messages API | | OpenAI | `sk-…` | Incl. reasoning models (o-series, GPT-5) | | Google (Gemini) | `AIza…` | OpenAI-compatible endpoint | | Mistral | — | | | Groq | `gsk_…` | | | DeepSeek | `sk-…` | | | xAI (Grok) | `xai-…` | | | Any OpenAI-compatible endpoint | — | Paste a custom base URL (Together, Perplexity, OpenRouter, local Ollama / vLLM, …) | The key is stored encrypted at rest (Fernet, server secret), scoped to the user's account, and never sent outside the outsend backend except to the chosen provider. A rough cost estimate is shown before AI actions — it is indicative only (best-effort token counting against known public prices) and may differ from the provider's actual billing. 
AI spending is also protected by **hard caps** — per request, per day and per month — that you set in **Settings**: a request that would exceed a cap is blocked *before* the provider is called, with email alerts at 80% and when a cap is reached. See [AI spending caps](/docs/concepts/ai-spending-caps). ## What the assistant can do The assistant uses the same outsend API surface documented in [API overview](/docs/api/overview). It can: - Read the user's jobs, pipelines, and veilles - Start new jobs (with explicit user confirmation for spend) - Compose pipelines by chaining modules from the [registry](/docs/concepts/module-registry) - Compute filter rules from natural-language descriptions and preview the result It cannot: - Access other users' data - Modify billing, account settings, or invitation codes - Run anything outside the user's normal permission scope ## Why not just use Claude.ai with outsend as MCP? Both options will exist: - **BYOK** — for users who want the assistant **inside outsend.xyz**, with the UI rendering search forms and tables natively while the model orchestrates. - **[MCP](/docs/integration/mcp)** — for users who want to drive outsend from their own Claude.ai or Claude Desktop, with their existing subscription. The two patterns are complementary, not competing. ## What's next - [MCP integration](/docs/integration/mcp) — drive outsend from your own AI client - [llms.txt](/docs/integration/llms-txt) — point any AI assistant at the docs --- title: llms.txt — AI-friendly documentation slug: integration/llms-txt section: Integration summary: A single URL exposes the entire outsend documentation to any AI assistant — no auth, no scraping, no parsing. --- The outsend documentation is published in the [llms.txt](https://llmstxt.org) format. Any AI assistant — Claude, ChatGPT, Cursor, Perplexity, or a local model — can ingest the full reference in one fetch. 
## The two endpoints | URL | Purpose | |---------------------------------------------------------------------|---------------------------------------------------------------------------| | [`/docs/llms.txt`](/docs/llms.txt) | Flat index — one line per page, with title + URL + one-line summary | | [`/docs/llms-full.txt`](/docs/llms-full.txt) | Full bundle — every page concatenated, delimited by `` | Both endpoints return `text/plain` with no auth, no rate limit, no JS rendering required. ## Use it from an AI assistant Most AI clients now detect `llms.txt` automatically when a domain is mentioned. For the ones that don't, paste the URL directly: ``` https://outsend.xyz/docs/llms-full.txt ``` The bundle is ~150 KB and fits comfortably in any modern context window. ## Per-section bundles For narrower scopes, the per-section endpoints are also available: | URL | Contains | |----------------------------------------------|-----------------------------------| | `/docs/_bundle/concepts.txt` | Only the Concepts pages | | `/docs/_bundle/modules.txt` | Only the Modules pages | | `/docs/_bundle/api.txt` | Only the API reference | | `/docs/_bundle/integration.txt` | Only the Integration pages | ## The Copy button Every page in this documentation has a **Copy** button in the top-right corner. It exposes the same bundles, but as a one-click clipboard action: - Copy this page (raw markdown) - Copy this section - Copy entire docs The "Copy entire docs" action is the recommended path when handing the docs to an AI assistant interactively. ## Why this matters AI assistants are increasingly used as the integration layer between SaaS products. A documentation that an assistant can ingest cleanly — without scraping, login flows, or HTML parsing — is integratable; one that cannot, is not. outsend's docs are designed to be readable by humans, but their **first audience** is the LLM that will draft the integration code, write the prompt template, or diagnose the misconfigured pipeline. 
## What's next

- [API overview](/docs/api/overview) — the surface the assistant will call
- [MCP](/docs/integration/mcp) — the protocol the assistant should prefer

---
title: MCP — Model Context Protocol
slug: integration/mcp
section: Integration
summary: Drive outsend from your own Claude.ai, Claude Desktop, or any MCP-compatible client. Your subscription, your tokens.
---

> **Status: planned.** The MCP server is on the roadmap; this page describes the intended endpoint shape so AI clients can plan against it. The release will be announced in the changelog.

The MCP integration exposes outsend as a **remote MCP server** that any MCP-compatible client can connect to: Claude.ai (custom connectors), Claude Desktop, Claude Code, Cursor, or any future client that speaks the protocol. The user signs in once with their outsend account, and from then on the AI client can run searches, build pipelines, and read results — using the user's own LLM subscription (no outsend-side LLM cost).

## How it will work

1. The user opens settings in their MCP client (e.g. Claude.ai → Settings → Connectors → Add custom connector).
2. They paste `https://outsend.xyz/mcp` and authenticate.
3. The MCP server returns the list of available tools (see below).
4. The model can call those tools on the user's behalf; each call hits the outsend API as that user.
## Planned tools | Tool | What it does | |---------------------------|------------------------------------------------------------------| | `list_jobs` | List the user's recent jobs | | `get_job` | Fetch a job's status, counters, and a sample of its results | | `create_scrap_job` | Start a Google Maps extraction | | `create_enrich_job` | Start an enrichment on an existing job (emails, socials, …) | | `list_pipelines` | List the user's pipelines | | `create_pipeline` | Compose a pipeline from a description | | `run_pipeline` | Execute a saved pipeline | | `list_veilles` | List recurring veilles | | `create_veille` | Register an existing job as a recurring veille | | `get_signals` | Fetch the latest reputation signals from a veille run | Each tool's argument schema mirrors the corresponding [API endpoint](/docs/api/overview). In particular, `create_pipeline` takes the same portable envelope as the REST API (`{schema_version, name, definition}`), and the set of valid blocks plus their per-block `config_schema` is the contract already published at [`GET /api/pipelines/schema`](/docs/api/pipelines) — the MCP server reuses it rather than defining its own. ## Scope and limits The MCP server inherits the user's normal permissions: - It cannot access other users' data. - It respects the same rate limits as the REST API. - It cannot modify billing, account settings, or invitation codes. ## BYOK vs MCP | Pattern | Where the chat lives | Who pays the LLM tokens | |---------|-------------------------------------------------|----------------------------------| | [BYOK](/docs/integration/byok) | Inside outsend.xyz | The user, via a pasted API key | | MCP | Inside the user's existing AI client | The user, via their subscription | The two patterns coexist. Pick BYOK if the assistant should live in the outsend UI; pick MCP if it should live wherever the user already works. 
## What's next

- [BYOK](/docs/integration/byok) — assistant inside outsend.xyz
- [llms.txt](/docs/integration/llms-txt) — let any AI assistant ingest the docs

---
title: Ads profile
slug: modules/ads_intelligence
section: Modules
---

# Ads profile

The `ads_intelligence` module profiles the marketing stack of each POI's website and condenses the findings into a single 0–100 marketing maturity score. It splits a list of prospects into two actionable segments: businesses that already invest in paid acquisition, and businesses still on a cold first-touch.

Detections match the homepage against community-maintained filter lists (uBlock Origin, EasyList, EasyPrivacy) plus a curated outsend signature table, covering advertising pixels, retargeting networks, CMPs, marketing CRMs and chat widgets.

## Inputs

Only items with a non-empty `site_web` are processed.

| Field | Type | Required | Notes |
|-----------------|--------|----------|----------------------------------------|
| `site_web` | string | yes | Absolute URL of the POI's website |
| `nom` | string | no | Carried through for reporting |
| `place_id` | string | no | Used to join back to the source list |
| `source_job_id` | string | no | ID of an upstream `scrap` job to chain |

Batch size: 1 to 10 000 items per job.

## Outputs

One row per processed POI. Paid-media pixels and retargeting weigh the most in the score; chat widgets the least.

| Column | Type | Description |
|-------------------|----------|-----------------------------------------------------------------------------|
| `ads_score` | integer | Marketing maturity score, 0–100 |
| `pixels_detected` | string[] | Advertising pixels found on the page (e.g. `meta`, `google_ads`, `tiktok`) |
| `crm_detected` | string | Marketing CRM identified, if any (e.g. `hubspot`, `klaviyo`, `brevo`) |
| `chat_widget` | string | Chat solution identified, if any (e.g. `intercom`, `crisp`, `drift`) |
| `marketing_tools` | string[] | Other marketing technologies (CMP, CDP, affiliation, retargeting networks) |

Granular fields also stored: `ads_active`, `ads_networks`, `pixel_meta`, `pixel_google_ads`, `cmp_vendor`, `retargeting`, `crm_marketing`, `chat_widgets`.

## Lifecycle

Standard outsend job lifecycle; see [/docs/concepts/jobs-lifecycle](/docs/concepts/jobs-lifecycle). Progress is reported per item in the `sites` unit.

## Pipeline

| Direction | Keys |
|------------|------|
| `needs` | `site_web` |
| `produces` | `ads_active`, `ads_score`, `ads_networks`, `pixel_meta`, `pixel_google_ads`, `cmp_vendor`, `retargeting`, `crm_marketing`, `chat_widgets` |

Any upstream job that emits `site_web` (typically `scrap`) can feed `ads_intelligence`. The job picker defaults to the most recent `scrap` job of the current account.

## Endpoints

### Create job

`POST /api/jobs/ads-intelligence`

```json
{
  "items": [
    { "site_web": "https://example.com", "nom": "Example", "place_id": "..." }
  ],
  "source_job_id": "optional-upstream-job-uuid"
}
```

Response: a `JobPublic` document describing the newly created job (`id`, `status`, `job_type`, `output_filename`, `ef_cost`, timestamps).

| Status | When |
|--------|---------------------------------------------------------------------|
| `400` | No item has a `site_web`, or per-job EF quota exceeded |
| `401` | Missing or invalid session |
| `403` | Account inactive |
| `422` | Payload does not match the schema (e.g. `items` empty or > 10 000) |

Job state, progress and results are read through the shared job endpoints (`GET /api/jobs/{id}`, `GET /api/jobs/{id}/results`, SSE stream).

## Limits

See [/docs/concepts/limits](/docs/concepts/limits). Per-item EF cost: ~1 / 3 / 3700 EF. Wall time per item: 0.6 – 6 s.
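
The create-job call from the Endpoints section above can be sketched with the Python standard library. This is a minimal sketch, not an official client: the base URL, the helper names and the cookie value are assumptions for illustration, while the path, the body shape, the 1 to 10 000 bound and the `outsend_session` cookie name come from this documentation. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://outsend.xyz"  # assumption: replace with your outsend host
MAX_ITEMS = 10_000  # documented upper bound per job


def build_ads_payload(items, source_job_id=None):
    """Keep only items with a non-empty site_web and check the batch size."""
    kept = [it for it in items if (it.get("site_web") or "").strip()]
    if not 1 <= len(kept) <= MAX_ITEMS:
        raise ValueError(f"expected 1 to {MAX_ITEMS} items with site_web, got {len(kept)}")
    payload = {"items": kept}
    if source_job_id is not None:
        payload["source_job_id"] = source_job_id
    return payload


def build_ads_request(payload, session_cookie):
    """Prepare (but do not send) POST /api/jobs/ads-intelligence."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/jobs/ads-intelligence",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Cookie": f"outsend_session={session_cookie}",
        },
        method="POST",
    )


payload = build_ads_payload(
    [
        {"site_web": "https://example.com", "nom": "Example"},
        {"site_web": "", "nom": "No website"},  # dropped client-side
    ]
)
request = build_ads_request(payload, session_cookie="hypothetical-cookie-value")
# To actually send it: urllib.request.urlopen(request)
```

Filtering client-side mirrors what the server does with items lacking `site_web`, so an empty batch is caught before it comes back as a `400` or `422`.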
## Errors

| Error | Cause |
|----------------------------------------|-------------------------------------------------------------|
| `Aucun établissement avec site web` | All items were missing `site_web` after normalisation |
| `Quota dépassé` | Estimated EF cost exceeds the per-job ceiling |
| Item-level fetch failure | Recorded on the row; the job continues with the next item |
| Empty homepage / non-HTML response | Row emitted with `ads_score = 0` and empty detections |

## What's next

Pair `ads_intelligence` with the following modules to extend the prospect profile:

- [`techstack`](/docs/modules/techstack) — full CMS, framework and hosting fingerprint.
- [`pricing`](/docs/modules/pricing) — surface visible pricing and commercial terms.
- [`pagespeed`](/docs/modules/pagespeed) — Core Web Vitals and performance budget.

---
title: Brand assets
slug: modules/brand_assets
section: Modules
---

# Brand assets

Extracts the visual identity of each prospect from its own website: main logo, logo variants, favicon, dominant brand color, harmonic palette derived from the logo. Optional homepage screenshot. All images are re-hosted in the caller's private storage, so a link never breaks when the prospect rotates its CDN.

The module is read-only against the prospect's site — no form submission, no login, no authentication crossing.

## Inputs

One row per POI. Only `site_web` is required; the rest is passed through.

| Field | Type | Required | Notes |
|------------|--------|----------|-------------------------------------------|
| `site_web` | string | yes | HTTP(S) URL of the prospect's website. |
| `nom` | string | no | Display name, surfaced in the UI. |

A batch accepts 1 to 10 000 rows. Rows without `site_web` are dropped before the job is enqueued.

Job-level options:

| Option | Type | Default | Notes |
|----------------------|--------|---------|------------------------------------------------|
| `source_job_id` | string | null | Parent job in the pipeline chain. |
| `capture_screenshot` | bool | false | Adds a homepage screenshot. ~5x slower per row. |

## Outputs

Per-row output. Local URLs point to assets re-hosted under `/api/brand-assets//.` and served only to the owner (or an admin).

| Column | Type | Notes |
|------------------------------|---------|-----------------------------------------------------------------------|
| `logo_url` | string | Source URL of the main logo as found on the prospect's site. |
| `logo_local_url` | string | Re-hosted copy of the main logo, stable URL. |
| `logo_source` | string | Where the logo was picked up (e.g. `og:image`, JSON-LD, apple-touch). |
| `logo_variants_local_urls` | list | Re-hosted alternate marks: apple-touch, mask-icon, monochrome, etc. |
| `favicon_url` | string | Source URL of the highest-quality favicon detected. |
| `favicon_local_url` | string | Re-hosted copy of the favicon. |
| `brand_color` | string | Dominant brand color as a hex string. |
| `brand_color_source` | string | Origin of the color (theme-color meta, logo sampling, etc.). |
| `brand_palette` | list | Five harmonic hex colors derived from the logo. |
| `screenshot_local_url` | string | Homepage screenshot. Populated only when `capture_screenshot=true`. |

Binaries are stored as `data/brand_assets//.`. Allowed extensions: `svg`, `png`, `jpg`, `jpeg`, `webp`, `gif`, `ico`, `avif`. Each asset is hashed for de-duplication across rows of the same owner.

## Lifecycle

Standard job states — see [Jobs lifecycle](/docs/concepts/jobs-lifecycle). Per-row HTTP errors never fail the job: a failed row carries `fetch_error` and a null `logo_local_url`.
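
Because a failed row stays in the results instead of failing the job, a consumer usually separates the two kinds of rows before using the assets. A minimal sketch, assuming the results are already loaded as a list of dicts keyed by the documented column names (the sample rows and the helper name are hypothetical):

```python
def split_brand_rows(rows):
    """Separate rows that were fetched from rows that carry a fetch_error."""
    fetched, failed = [], []
    for row in rows:
        # A non-empty fetch_error marks a row whose site could not be fetched.
        if row.get("fetch_error"):
            failed.append(row)
        else:
            fetched.append(row)
    return fetched, failed


rows = [
    {"nom": "Stripe", "logo_local_url": "/hypothetical/logo.png", "fetch_error": None},
    {"nom": "Unreachable Co", "logo_local_url": None, "fetch_error": "timeout"},
]
fetched, failed = split_brand_rows(rows)

# Rows that can be retried later, for example in a follow-up batch.
retry_names = [row["nom"] for row in failed]
```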
## Pipeline

| Needs | Produces |
|-------------|----------|
| `site_web` | `logo_url`, `logo_local_url`, `logo_variants_local_urls`, `favicon_url`, `favicon_local_url`, `brand_color`, `brand_palette`, `screenshot_local_url` |

`pipelinable: true`, slots after any step that yields `site_web` — most commonly a `scrap` parent. `supports_veille: false`: brand identity is a one-shot extraction, not a recurring signal.

## Endpoints

### Create a batch job

```
POST /api/jobs/brand-assets
```

Body:

```json
{
  "items": [
    {"nom": "Stripe", "site_web": "https://stripe.com"}
  ],
  "source_job_id": null,
  "capture_screenshot": false
}
```

Returns a `JobPublic` envelope. Authentication: any active user.

### Live single-domain lookup

```
GET /api/brand-lookup?domain=&refresh=
```

Single-shot, no batch job created. The first call for a given domain fetches live (~2–3s) and stores the result in a per-user cache. Subsequent calls within seven days return the cached profile instantly. `refresh=true` forces a re-fetch.

Response shape:

```json
{
  "domain": "stripe.com",
  "cached": false,
  "cached_at": null,
  "profile": {
    "status": "ok",
    "logo_url": "...",
    "logo_local_url": "...",
    "logo_source": "og:image",
    "logo_variants_local_urls": ["..."],
    "favicon_url": "...",
    "favicon_local_url": "...",
    "brand_color": "#635BFF",
    "brand_color_source": "theme-color",
    "brand_palette": ["#635BFF", "..."],
    "http_status": 200,
    "final_url": "https://stripe.com/",
    "fetch_error": null
  }
}
```

### Serve a re-hosted asset

```
GET /api/brand-assets//.
```

Per-owner isolation: only the owner (or an admin) can read assets from the namespace. Filenames are validated against a strict regex; path traversal attempts are rejected with `400`.

For global quotas and caps, see [Limits](/docs/concepts/limits). Brand-lookup cache TTL: 7 days, per user, per domain. Screenshot mode multiplies per-row cost by ~5; opt-in only.

## Errors

| Code | Meaning |
|------|--------------------------------------------------------------------------|
| 400 | `Aucun établissement avec site web` — no row carried a usable `site_web`. |
| 400 | Invalid domain on `/api/brand-lookup`. |
| 400 | Invalid asset filename on the serve endpoint. |
| 403 | Access to an asset owned by another user. |
| 502 | Live lookup failed upstream (`Lookup failed: : `). |

## What's next

- [techstack](/docs/modules/techstack) — detect the CMS, analytics, and frameworks behind the same `site_web`.
- [ads_intelligence](/docs/modules/ads_intelligence) — surface the prospect's paid acquisition footprint to complement the visual identity.

---
title: Closed check
slug: modules/dead_check
section: Modules
summary: Check whether each point of interest is still operating, has shut down, or is uncertain — based on real abandonment signals on its website.
---

## Purpose

The `dead_check` module inspects the website attached to each point of interest (POI) and decides whether the underlying business looks alive, dead, or uncertain. It correlates several abandonment signals on the same domain: expiring registration, redirection to an unrelated property (resale or rebranding), parking pages disguised as real sites, and stale or invalid TLS certificates. Directory listings (Doctolib, Pages Jaunes, Yelp, etc.) are recognised as such rather than blindly flagged as a personal site.

## Inputs

A list of POIs, each carrying at least a website. Items without a `site_web` are filtered out at submit time.

| Field | Type | Required | Description |
|---|---|---|---|
| `items` | array of POI objects | yes | 1 to 10 000 entries. Entries without a `site_web` are dropped before execution. |
| `source_job_id` | string | no | Identifier of the upstream list-producing job (typically a `scrap`). Used for lineage in the UI and the pipeline. |

No other tuning parameters: the module runs in a single mode.
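
A list larger than the 10 000-entry bound has to be split into several jobs. The sketch below is one way to prepare the request bodies, assuming POIs are plain dicts; the helper name and the sample data are hypothetical, and splitting a list across jobs is a client-side choice rather than a documented feature.

```python
MAX_ITEMS_PER_JOB = 10_000  # documented upper bound for `items`


def dead_check_payloads(pois, source_job_id=None, batch_size=MAX_ITEMS_PER_JOB):
    """Yield one request body per batch of POIs that carry a site_web."""
    if not 1 <= batch_size <= MAX_ITEMS_PER_JOB:
        raise ValueError("batch_size must be between 1 and 10 000")
    # Mirror the server-side rule: entries without site_web are dropped.
    with_site = [p for p in pois if (p.get("site_web") or "").strip()]
    for start in range(0, len(with_site), batch_size):
        body = {"items": with_site[start:start + batch_size]}
        if source_job_id is not None:
            body["source_job_id"] = source_job_id
        yield body


pois = [
    {"site_web": "https://example.com", "nom": "Example Co"},
    {"nom": "No website"},  # filtered out before batching
    {"site_web": "https://example.org", "nom": "Example Org"},
]
# batch_size=1 only to make the splitting visible on a tiny list.
bodies = list(dead_check_payloads(pois, batch_size=1))
```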
## Outputs

Each input POI is augmented with the closed-check verdict for its website. The original POI columns are kept and the following is appended:

| Column | Type | Description |
|---|---|---|
| `site_alive` | `"open"` \| `"closed"` \| `"uncertain"` | Final verdict. `open` means the site behaves like an active business presence, `closed` means converging signals of abandonment, `uncertain` means signals were too thin to decide. |

The progress unit during execution is `sites`; the result unit is also `sites`.

## Lifecycle

Standard job states — see [Jobs lifecycle](/docs/concepts/jobs-lifecycle). Partial counters stream over SSE so early verdicts can be consumed without waiting for the final export.

## Pipeline

Pipelinable; typically inserted right after the list-producing step, before any expensive enrichment.

| Slot | Value |
|---|---|
| `needs` | `site_web` |
| `produces` | `site_alive` |
| Category | `verify` |
| Typical upstream | `scrap` |
| Typical downstream | `emails`, `techstack`, `ads_intelligence`, `filter` |

A common pattern is `scrap` then `dead_check` then `filter` (keep `site_alive = "open"`) then any enrichment module — avoiding outreach spend on shuttered businesses.

## Endpoints

Create a job:

```
POST /api/jobs/dead-check
Content-Type: application/json

{
  "items": [
    { "site_web": "https://example.com", "nom": "Example Co" }
  ],
  "source_job_id": "…"
}
```

Response: a `JobPublic` object with `id`, `status`, and the standard job metadata. Poll `GET /api/jobs/{id}` or subscribe to the SSE stream for progress; download the final CSV from the job detail page once `status = "done"`.

For the full job API surface (list, detail, cancel, export, events), see [Jobs API](/docs/api/jobs). For per-account quotas, see [Limits](/docs/concepts/limits).

## Errors

| HTTP | `detail` | Cause |
|---|---|---|
| 400 | `Aucun établissement avec site web` | No input item carries a `site_web` field. |
| 400 | `Quota dépassé : …` | Estimated cost exceeds the per-job equivalent-France quota. |
| 401 / 403 | — | Missing or inactive session. |

Errors raised after the job has been created surface on the job detail page and via the SSE `error` event; the job ends in `status = "error"` and partial results, if any, remain downloadable.

## What's next

- [filter](/docs/modules/filter) — keep only POIs whose `site_alive` is `open` (or exclude `closed`) before spending budget on enrichment.
- [reviews](/docs/modules/reviews) — for POIs marked `uncertain`, recent reviews are a strong tiebreaker between an active business and a dormant one.

---
title: Inbox placement
slug: modules/delivery_check
section: Modules
---

# Inbox placement

Tests where a message sent from a given domain actually lands. The module sends nothing on the caller's behalf: the caller sends the real message to fifteen seed inboxes, and the module reports back where each copy ended up: primary inbox, a secondary tab (Promotions, Social), or spam.

The result is a snapshot of how a recipient mailbox treated this exact message, from this exact domain, at this exact moment. It is not a simulation, a reputation lookup, or a header inspection. The module answers one question: *if this message is sent from this domain right now, where does it go?*

## Inputs

A test job takes two values.

| Field | Required | Description |
| --- | --- | --- |
| `domain` | yes | The sending domain to test, lowercased, without the `@` (for example `acme.fr`). Must contain a dot and be 3 to 120 characters. |
| `subject_filter` | no | Optional substring matched against the seed inbox subject lines. Useful to disambiguate when multiple tests run in parallel from the same domain. Up to 120 characters. |

The module does not take a recipient list. Inbox placement is a standalone job — it is not part of a pipeline and cannot consume the output of another job.

## Outputs

The job writes one row per seed mailbox to `results_delivery.csv`.
Fifteen seed inboxes are queried; each row describes what that mailbox observed.

| Column | Description |
| --- | --- |
| `seed_email` | Address of the test inbox the row refers to. |
| `seed_kind` | Provider family for the seed (used to group results by mailbox type). |
| `status` | `received` if the message was found, otherwise an empty or pending state. |
| `placement` | Where the message landed: `Inbox principal`, `Inbox · ` (for example Promotions, Social), `Spam`, or empty if not received. |
| `subject` | Subject line as observed in the seed mailbox. |
| `received_relative` | Human-readable delay between send and observation (for example `2 min`). |

The structured report endpoint aggregates these rows into a summary.

| Field | Description |
| --- | --- |
| `received` | Number of seeds that observed the message. |
| `total` | Total seed inboxes queried (15). |
| `missing` | `total - received`. |
| `primary` | Seeds where placement is `Inbox principal`. |
| `primary_pct` | Primary inbox rate as a percentage of `received`. |
| `inbox_secondary` | Seeds where placement is a non-primary inbox tab. |
| `promotions` | Seeds where placement matched Promotions or Social. |
| `spam` | Seeds where placement is `Spam`. |
| `spam_pct` | Spam rate as a percentage of `received`. |
| `verdict` | Contextual verdict — see below. |
| `seeds` | The per-seed array described in the previous table. |

The `verdict` object carries a one-line judgment and an actionable note.

| `verdict.label` | When |
| --- | --- |
| `EXCELLENT` | `primary_pct` ≥ 90. |
| `TRÈS BON` | `primary_pct` ≥ 70. |
| `MOYEN` | `primary_pct` ≥ 50. |
| `MAUVAIS` | `spam_pct` ≥ 50. |
| `INSUFFISANT` | Most messages landed in secondary tabs. |
| `EN ATTENTE` | Nothing received yet. |

## Lifecycle

Standard job states — see [Jobs lifecycle](/docs/concepts/jobs-lifecycle). The runtime workflow is: create the job, fetch seeds via `GET /api/delivery-check/seeds`, send the real message to all fifteen from the domain under test, wait for the worker to poll until all seeds report `received` or a timeout fires, then read the aggregated report.

The module does not chain. Its output is not reusable as input for another job — Inbox placement is listed in the non-chainable job set alongside `viewport_test`.

## Pipeline

Inbox placement is `standalone_only`.

- **Needs:** nothing. The job takes a domain string, not a list of records.
- **Produces:** no reusable columns. The CSV exists for export but is not exposed to the pipeline graph.
- **Pipelinable:** no.
- **Veille:** not supported.

If a campaign needs to react to a placement result, the report is consumed from the API and branched on in external orchestration — the module will not feed another node directly.

## Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/api/jobs/delivery-check` | Create a delivery-check job. Body: `{ "domain": "...", "subject_filter": "..." }`. Returns the public job object. |
| `GET` | `/api/delivery-check/seeds` | List the fifteen seed addresses to send the test message to. |
| `GET` | `/api/jobs/{job_id}/delivery-result` | Aggregated report with summary, verdict, and per-seed rows. |
| `GET` | `/api/jobs/{job_id}` | Standard job status (queued, running, done, failed). |

All endpoints require an authenticated, active user. Reading another user's job returns `403`.

Budget per job is fixed at fifteen seed observations; no slider, no override. Inbox placement does not consume scraping credits (`ef_per_item: 0`), though the per-user job quota still applies. A complete run typically lands between two and eight minutes after the seed message is sent. The domain must contain a dot and is lowercased internally. For global caps, see [Limits](/docs/concepts/limits).
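
The summary counters and verdict thresholds listed under Outputs can be reproduced from the per-seed rows, which is handy when working from the exported CSV rather than the report endpoint. This is a sketch under stated assumptions, not the server's implementation: the rule precedence (top to bottom in the verdict table, with `EN ATTENTE` checked first), the reading of "most" as more than half of `received`, and the absence of rounding are all assumptions. Helper names are hypothetical.

```python
TOTAL_SEEDS = 15  # documented number of seed inboxes


def summarize(seeds, total=TOTAL_SEEDS):
    """Aggregate per-seed rows into the documented summary counters."""
    got = [s for s in seeds if s.get("status") == "received"]
    received = len(got)

    def count(predicate):
        return sum(1 for s in got if predicate(s.get("placement") or ""))

    def pct(n):
        # Percentages are relative to `received`, as documented.
        return 100.0 * n / received if received else 0.0

    primary = count(lambda p: p == "Inbox principal")
    secondary = count(lambda p: p.startswith("Inbox · "))
    spam = count(lambda p: p == "Spam")
    return {
        "received": received, "total": total, "missing": total - received,
        "primary": primary, "primary_pct": pct(primary),
        "inbox_secondary": secondary,
        "spam": spam, "spam_pct": pct(spam),
    }


def verdict_label(summary):
    """Assumed precedence: rules are tried in the order of the verdict table."""
    if summary["received"] == 0:
        return "EN ATTENTE"
    if summary["primary_pct"] >= 90:
        return "EXCELLENT"
    if summary["primary_pct"] >= 70:
        return "TRÈS BON"
    if summary["primary_pct"] >= 50:
        return "MOYEN"
    if summary["spam_pct"] >= 50:
        return "MAUVAIS"
    if summary["inbox_secondary"] * 2 > summary["received"]:
        return "INSUFFISANT"
    return None  # combination not covered by the documented table


seeds = (
    [{"status": "received", "placement": "Inbox principal"}] * 12
    + [{"status": "received", "placement": "Inbox · Promotions"}] * 2
    + [{"status": "received", "placement": "Spam"}]
)
summary = summarize(seeds)
label = verdict_label(summary)
```

When the report endpoint is available, its `verdict` object remains the authoritative value; a local recomputation like this one is only useful for offline analysis.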
## Errors

| Status | Reason |
| --- | --- |
| `400` | `Domaine d'envoi invalide` — the domain is empty or does not contain a dot. |
| `400` | `Quota dépassé` — `MAX_EF_PER_JOB` reached. Inbox placement itself is free, but the quota check still applies. |
| `400` | `Pas un job de test de délivrabilité` — `/delivery-result` was called on a job whose type is not `delivery_check`. |
| `403` | The job belongs to another user. |
| `404` | The job ID does not exist. |
| `410` | The job's CSV has expired and been deleted. |

If the report returns `received: 0` after the worker has run, the seed message never arrived — either it was not sent, was blocked entirely, or the domain is on a complete blocklist. Re-send to the seeds and re-poll before drawing a conclusion.

## What's next

- [Verify emails](/docs/modules/verify_emails) — clean a list of addresses before sending, so that the seed test reflects what the deliverable subset will see.
- [Ads intelligence](/docs/modules/ads_intelligence) — once placement is solid, see which competitors are paying for visibility on the same audience.

---
title: Emails
slug: modules/emails
section: Modules
summary: Find a working email address for each point of interest from a previous list.
---

## Purpose

Enrichment module: infers email addresses from each POI's website. Personal mailboxes rank above generic ones (`info@`, `contact@`, `hello@`). No address is invented — empty when no candidate qualifies.

## Inputs

A list of POIs each carrying at least a website. The two execution modes differ in coverage vs cost.

| Field | Type | Required | Description |
|---|---|---|---|
| `items` | array of POI objects | yes | 1 to 10 000 entries. Entries without a `site_web` are filtered out before execution. |
| `mode` | `"normal"` \| `"deep"` | no, defaults to `"normal"` | `normal` runs the standard extraction. `deep` runs an exhaustive second pass and requires a previously completed `normal` run on the same source. |
| `source_job_id` | string | conditionally | Required when `mode = "deep"`. Must reference a `done` `emails` job in `normal` mode on the same upstream source. |

## Outputs

Each input POI is augmented with up to two email fields. Original POI columns are preserved; the job appends:

| Column | Type | Description |
|---|---|---|
| `email` | string \| null | Best-ranked address for this POI. Empty when no candidate qualifies. |
| `email_personal` | string \| null | Set when the best candidate looks like a person's mailbox rather than a generic role address. |

Ranking is deterministic. Progress unit: `sites`. Result unit: `emails`.

## Lifecycle

Standard job lifecycle: see [Jobs & lifecycle](/docs/concepts/jobs-lifecycle).

## Pipeline

| Slot | Value |
|---|---|
| `needs` | `poi_list` (POIs with a `site_web` field) |
| `produces` | `enriched_list` (POIs augmented with `email`, `email_personal`) |
| Typical upstream | `scrap` |
| Typical downstream | `verify_emails`, `delivery_check`, `filter` |

Default pipeline config: `{ "mode": "normal" }`. `deep` is intended as a manual follow-up on POIs that came back empty from the normal run.

## Endpoints

Create a job:

```
POST /api/jobs/emails
Content-Type: application/json

{
  "items": [
    { "site_web": "https://example.com", "nom": "Example Co" }
  ],
  "mode": "normal"
}
```

Response: a `JobPublic` object with `id`, `status`, and standard metadata. For the full job API surface, see [Jobs API](/docs/api/jobs).

## Limits

Global quotas: see [/docs/concepts/limits](/docs/concepts/limits). Module-specific caps:

| Limit | Value |
|---|---|
| Minimum items per job | 1 |
| Maximum items per job | 10 000 |
| Items kept | only those with a non-empty `site_web` |
| `deep` mode prerequisite | A `done` `normal` `emails` job on the same `source_job_id` |

Items without a website are dropped during normalization. If the filtered list is empty, the job is rejected with `"Aucun établissement avec site web"`.
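
The constraints above can be checked client-side before submitting. A minimal sketch with a hypothetical helper name; it only validates what a client can see (mode value, presence of `source_job_id`, batch size), while the check that the referenced job is a `done` `normal` run stays with the server.

```python
VALID_MODES = ("normal", "deep")
MAX_ITEMS = 10_000  # documented maximum items per job


def build_emails_payload(items, mode="normal", source_job_id=None):
    """Build the body for POST /api/jobs/emails, or raise ValueError."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if mode == "deep" and not source_job_id:
        raise ValueError("deep mode requires source_job_id of a prior normal run")
    # Items without a website are dropped, as the server does on normalization.
    kept = [it for it in items if (it.get("site_web") or "").strip()]
    if not kept:
        raise ValueError("no item carries a site_web")
    if len(kept) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} items per job")
    payload = {"items": kept, "mode": mode}
    if source_job_id:
        payload["source_job_id"] = source_job_id
    return payload


items = [{"site_web": "https://example.com", "nom": "Example Co"}]
normal_payload = build_emails_payload(items)

try:
    build_emails_payload(items, mode="deep")  # missing source_job_id
    deep_error = None
except ValueError as exc:
    deep_error = str(exc)
```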
## Errors

| HTTP | `detail` | Cause |
|---|---|---|
| 400 | `Mode email invalide : ... (attendu: normal \| deep)` | `mode` is neither `normal` nor `deep`. |
| 400 | `Aucun établissement avec site web` | No input item carries a `site_web` field. |
| 400 | `Le mode Deep Extract n'est dispo qu'après une extraction normale ...` | `mode = "deep"` submitted without a valid prior normal run on the same source. |
| 400 | `Quota dépassé : ...` | Estimated cost exceeds the per-job equivalent-France quota. |
| 401 / 403 | — | Missing or inactive session. |

Errors raised after creation surface via the SSE `error` event; the job ends in `status = "error"` and partial results remain downloadable.

## What's next

- [verify_emails](/docs/modules/verify_emails) — confirm each address is deliverable before sending.
- [delivery_check](/docs/modules/delivery_check) — measure inbox placement on a real message.
- [filter](/docs/modules/filter) — keep only POIs that have a personal address, exclude disposable domains, or sample the list.

---
title: Filter
slug: modules/filter
section: Modules
---

## Purpose

The `filter` module narrows a dataset to the rows that match a set of rules. It is pipeline-internal (see [/docs/concepts/pipeline-orchestration](/docs/concepts/pipeline-orchestration)): it consumes the CSV produced by an upstream node and emits a strict subset, with the same columns. No new data is fetched and no column is added. Filtering early saves budget on the expensive enrichment steps that follow.

## Inputs

The rules are read from the node's `config` object and applied row-by-row in a fixed order. Every key is optional; an empty rule is a no-op.

### Standard rules

| Key | Type | Behaviour |
| --- | --- | --- |
| `require_phone` | `bool` | Keep rows where `telephone` is non-empty. |
| `require_site` | `bool` | Keep rows where `site_web` is non-empty. |
| `require_email` | `bool` | Keep rows where `email` is non-empty. |
| `exclude_aggregators` | `bool` | Drop rows whose `site_web` points to a known aggregator domain. |
| `alive_only` | `bool` | Keep rows whose dead-check `status` is `alive` or `stale`. |
| `has_personal_email` | `bool` | Keep rows where at least one address in `email` is a personal mailbox (not role-based). |
| `rating_min` | `float` | Keep rows where `note >= rating_min`. |
| `reviews_min` | `int` | Keep rows where `nb_avis >= reviews_min`. |

### Advanced rules

| Key | Shape | Behaviour |
| --- | --- | --- |
| `phone_prefix` | `{ column?, prefixes[], prefix_unparseable_keep? }` | Keep rows whose phone column starts with one of `prefixes` (e.g. `06`, `+33`). Requires the `phonenumbers` library on the worker — otherwise the rule is logged and skipped. |
| `email_domain` | `{ column?, include[], exclude[], reject_disposable? }` | Keep rows whose email domain is in `include` (if set) and not in `exclude`. `reject_disposable` drops known throwaway providers. |
| `category` | `{ column, values[] }` | Keep rows whose `column` value is contained in `values`. |
| `dedup_column` | `string` | Collapse rows that share the same value on this column (first row wins). |

### Sampling

| Key | Type | Behaviour |
| --- | --- | --- |
| `sample_type` | `"n" \| "pct" \| ""` | Selects which sampling mode applies after the rules above. |
| `sample_n` | `int` | Keep the first `n` matched rows. |
| `sample_pct` | `0..100` | Keep a percentage of matched rows. |
| `sample_seed` | `int` | Seed for reproducible random sampling. |

Order of application: requirement flags → aggregator/alive/rating/reviews → personal-email → `phone_prefix` → `email_domain` → `category` → `dedup_column` → sampling.

## Outputs

The module writes a CSV with the same columns as the upstream node, containing only the matched rows. It does not produce new fields (`needs: []`, `produces: []`, `pipeline_passthrough: true`).

| Field | Value |
| --- | --- |
| `output_filename` | `results_