Legal mentions

The legal_mentions module locates the legal-mentions page (also known as mentions légales or Impressum) on each POI's website and extracts its structured contents. It runs as an enrichment step on top of an existing POI list and returns one row per input site, whether or not a legal page was found.

Purpose

Most B2B due-diligence workflows hinge on facts published only on a company's own website: registered company name, director, share capital, hosting provider. The module surfaces those facts at list scale, so a downstream campaign or audit can filter, segment, and cross-reference without manual visits.

Typical uses:

The module never invents values. Fields left empty mean the information could not be located on the target page.

Inputs

The job consumes a list of POI items. Each item must carry a website URL; items without one are dropped during validation.

| Field | Type | Required | Notes |
|---|---|---|---|
| site_web | string | yes | Root URL of the establishment's website. |
| name | string | no | Carried through to the output for joining. |
| source_job_id | string | no | ID of an upstream scrap job to inherit items from. |

Submit between 1 and 10,000 items per job. Items are normalized and deduplicated before execution.
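The validation, normalization, and deduplication described above can be approximated client-side before submitting a list. The sketch below is illustrative only: the module's actual normalization rules are not documented here, so the rules in `normalize_url` (default to https, lowercase the host, keep only scheme and host) and the helper names `normalize_url` and `prepare_items` are assumptions.

```python
from urllib.parse import urlsplit

MAX_ITEMS = 10_000  # documented upper bound per job


def normalize_url(raw):
    """Return 'scheme://host' for a usable URL, or None otherwise.

    Assumed rules, not confirmed by the module: a missing scheme
    defaults to https, and path, query, and fragment are dropped.
    """
    raw = (raw or "").strip()
    if not raw:
        return None
    if "://" not in raw:
        raw = "https://" + raw
    try:
        parts = urlsplit(raw)
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return None
    host = parts.hostname or ""  # hostname is already lowercased
    if "." not in host:
        return None
    return f"{parts.scheme}://{host}"


def prepare_items(items):
    """Drop items without a usable site_web, then deduplicate by URL."""
    seen = set()
    prepared = []
    for item in items:
        url = normalize_url(item.get("site_web"))
        if url is None or url in seen:
            continue
        seen.add(url)
        prepared.append({**item, "site_web": url})
    if not 1 <= len(prepared) <= MAX_ITEMS:
        raise ValueError(f"expected 1 to {MAX_ITEMS} items, got {len(prepared)}")
    return prepared
```

Running a list through `prepare_items` first gives a rough idea of how many rows to expect back, since the output contains one row per site that survives validation.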

Outputs

One row is produced per input site. Columns:

| Column | Type | Description |
|---|---|---|
| raison_sociale | string | Registered company name as it appears on the legal-mentions page. |
| forme_juridique | string | Legal form (SAS, SARL, SA, EI, etc.). |
| capital_social | string | Declared share capital, in the currency given on the page. |
| rcs | string | RCS registration entry (city + identifier). |
| adresse_postale | string | Postal address of the registered office. |
| dirigeant | string | Publication director or legal representative, when stated. |
| tva_intracom | string | Intra-community VAT number, validated against the FR format when present. |

Empty cells indicate that the field was not present on the parsed page. The output is delivered as a CSV that carries the input columns alongside the extracted ones.
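For readers who want to re-check `tva_intracom` values downstream, here is a minimal sketch of a French VAT format check. The module's own validation rules are not spelled out in this page; this version assumes the usual shape (`FR`, a two-character key, then the nine-digit SIREN) and, for numeric keys, the standard relation key = (12 + 3 × (SIREN mod 97)) mod 97. The function name `is_valid_fr_vat` is hypothetical.

```python
import re

# FR + 2-character key + 9-digit SIREN, once separators are removed.
_FR_VAT = re.compile(r"FR([0-9A-Z]{2})([0-9]{9})")


def is_valid_fr_vat(value):
    """Return True if value looks like a well-formed French VAT number."""
    compact = re.sub(r"[\s.\-]", "", value or "").upper()
    match = _FR_VAT.fullmatch(compact)
    if match is None:
        return False
    key, siren = match.groups()
    if key.isdigit():
        # Numeric keys can be verified against the SIREN.
        return int(key) == (12 + 3 * (int(siren) % 97)) % 97
    # Keys containing letters: only the overall shape is checked here.
    return True
```

A value that fails this check is not necessarily wrong on the source page; it may simply be formatted in a way this sketch does not handle.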

Lifecycle

The module follows the standard job lifecycle (see Jobs lifecycle). Progress is reported per site; partial output is preserved if the job is canceled or fails mid-run.

Pipeline

The module is an enrichment step. It plugs into the standard list pipeline:

needs: [site_web]
produces: [raison_sociale, forme_juridique, capital_social, rcs, adresse_postale, dirigeant, tva_intracom]

A typical chain looks like scrap → legal_mentions → legal_data. The source can be selected by uploading items directly or by referencing a recent scrap job through source_job_id.
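The needs/produces contract can be used to check a chain before running it. The following is a sketch, not the platform's own resolver: the `needs` and `produces` lists for `legal_mentions` come from this page, while the columns attributed to `scrap` and the helper `missing_inputs` are assumptions made for illustration.

```python
MODULES = {
    # Assumed for illustration: what an upstream scrap job produces.
    "scrap": {"needs": [], "produces": ["site_web", "name"]},
    # Taken from the needs/produces declaration of this module.
    "legal_mentions": {
        "needs": ["site_web"],
        "produces": [
            "raison_sociale", "forme_juridique", "capital_social",
            "rcs", "adresse_postale", "dirigeant", "tva_intracom",
        ],
    },
}


def missing_inputs(chain, modules, initial_columns=()):
    """Map each step of the chain to the needed columns it cannot find."""
    available = set(initial_columns)
    problems = {}
    for step in chain:
        spec = modules[step]
        missing = [col for col in spec["needs"] if col not in available]
        if missing:
            problems[step] = missing
        # Columns produced by this step are visible to later steps.
        available.update(spec["produces"])
    return problems
```

An empty result means every step finds its required columns, either from the uploaded items (`initial_columns`) or from an earlier step in the chain.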

Endpoints

All endpoints require an authenticated, active user.

| Method | Path | Body | Returns |
|---|---|---|---|
| POST | /api/jobs/legal-mentions | { items: [...], source_job_id } | JobPublic |
| GET | /api/jobs/{id} | | JobPublic |
| GET | /api/jobs/{id}/output | | CSV stream |
| POST | /api/jobs/{id}/cancel | | JobPublic |

The create endpoint validates quota up front and returns 400 with a descriptive message if validation fails. Items per job: 1 to 10,000. Global quotas: see Limits.
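As an illustration of the create call, the sketch below builds the POST request with the Python standard library without sending it. Several details are assumptions: the authentication scheme (a bearer token is used here, which this page does not confirm), the base URL, and the helper name `build_create_request`. The path, the body fields, and the 1 to 10,000 item bound come from this page.

```python
import json
import urllib.request


def build_create_request(base_url, token, items=None, source_job_id=None):
    """Build (but do not send) a POST /api/jobs/legal-mentions request."""
    if items is None and source_job_id is None:
        raise ValueError("provide items, source_job_id, or both")
    if items is not None and not 1 <= len(items) <= 10_000:
        raise ValueError("items per job must be between 1 and 10,000")

    body = {}
    if items is not None:
        body["items"] = items
    if source_job_id is not None:
        body["source_job_id"] = source_job_id

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/jobs/legal-mentions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; adapt to the real scheme.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)`. A 400 response surfaces as `urllib.error.HTTPError`, whose body should carry the descriptive validation message.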

Errors

Two outcomes are surfaced as empty rows rather than job failures, because they are expected at list scale:

| Condition | Behavior |
|---|---|
| No legal page found | Row returned with every legal field blank. |
| Website down | Row returned with all fields blank; the site is marked unreachable. |

Job-level failures (status = failed) are reserved for non-recoverable conditions such as invalid input or quota errors. The error message is exposed on the job record.
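Because both conditions arrive as blank rows rather than errors, a consumer usually wants to separate enriched rows from empty ones after download. The sketch below does that on the CSV text using the documented legal columns. It does not try to tell "no legal page" apart from "website down", since the name of the column carrying the unreachable marker is not given here; `split_rows` is a hypothetical helper.

```python
import csv
import io

LEGAL_FIELDS = [
    "raison_sociale", "forme_juridique", "capital_social",
    "rcs", "adresse_postale", "dirigeant", "tva_intracom",
]


def split_rows(csv_text):
    """Split output rows into (enriched, blank) lists.

    A row counts as blank when every legal field is empty or missing.
    """
    enriched, blank = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if any((row.get(field) or "").strip() for field in LEGAL_FIELDS):
            enriched.append(row)
        else:
            blank.append(row)
    return enriched, blank
```

The ratio of blank to enriched rows is a quick health check for a run: a job can finish successfully and still return mostly blank rows if the input sites rarely publish a legal page.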
