Legal mentions
The legal_mentions module locates the legal-mentions page (also known as mentions légales or Impressum) on each POI's website and extracts its structured contents. It runs as an enrichment step on top of an existing POI list and returns one row per input site, whether or not a legal page was found.
Purpose
Most B2B due-diligence workflows hinge on facts only published on a company's own website: registered company name, director, share capital, hosting provider. The module surfaces those facts at list scale, so a downstream campaign or audit can filter, segment, and cross-reference without manual visits.
Typical uses:
- Qualifying a prospect list by company size proxies (capital, legal form).
- Matching a trade name against the registered entity before outreach.
- Building a directors index for personalized messaging.
- Auditing host providers across a sector.
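The first use, qualifying a prospect list by legal form and capital, can be sketched directly against the output CSV. The helper names parse_capital and qualify are hypothetical. The sketch also assumes that capital_social cells use French number formatting, with spaces or dots grouping thousands and a comma marking decimals; the page only says the value is given "in the currency given on the page".

```python
import csv
import io
import re
from typing import Optional


def parse_capital(raw: str) -> Optional[float]:
    """Best-effort parse of a capital_social cell such as '10 000 €' or '1 500,50 EUR'.

    Assumption: French formatting (spaces or dots group thousands, a comma marks
    decimals). Returns None when the cell holds no digits, e.g. a blank cell.
    """
    # Drop currency symbols, letters and (non-breaking) spaces; keep digits and separators.
    cleaned = re.sub(r"[^0-9,.]", "", raw)
    if not re.search(r"[0-9]", cleaned):
        return None
    if "," in cleaned:
        # The last comma is the decimal separator; dots before it group thousands.
        integer_part, _, decimals = cleaned.rpartition(",")
        integer_part = integer_part.replace(".", "").replace(",", "")
        return float(f"{integer_part}.{decimals.replace('.', '')}")
    # No comma: any dots are thousands separators.
    return float(cleaned.replace(".", ""))


def qualify(csv_text: str, legal_forms: set, min_capital: float) -> list:
    """Keep rows whose legal form is in legal_forms and whose capital is >= min_capital."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        form = (row.get("forme_juridique") or "").strip().upper()
        capital = parse_capital(row.get("capital_social") or "")
        # Blank cells mean "not found on the page", so such rows are excluded, not guessed.
        if form in legal_forms and capital is not None and capital >= min_capital:
            kept.append(row)
    return kept
```

Rows with a blank capital are dropped rather than treated as zero. This matches the rule that empty cells mean the value was not found, not that it is absent.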
The module never invents values. Fields left empty mean the information could not be located on the target page.
Inputs
The job consumes a list of POI items. Each item must carry a website URL; items without one are dropped during validation.
| Field | Type | Required | Notes |
|---|---|---|---|
| site_web | string | yes | Root URL of the establishment's website. |
| name | string | no | Carried through to the output for joining. |
| source_job_id | string | no | ID of an upstream scrap job to inherit items from. |
Submit between 1 and 10,000 items per job. Items are normalized and deduplicated before execution.
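The page does not spell out the normalization rules. The sketch below is one plausible client-side pre-check under hypothetical rules: trim whitespace, default to https when the scheme is missing, lowercase the scheme and host, strip trailing slashes and fragments, then deduplicate on the result. The helper names normalize_url and prepare_items are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_ITEMS = 10_000  # per-job limit stated on this page


def normalize_url(raw: str) -> str:
    """Hypothetical normalization: https default, lowercase host, no trailing slash or fragment."""
    url = raw.strip()
    if "://" not in url:
        url = "https://" + url  # assumption: bare domains are served over https
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def prepare_items(items: list) -> list:
    """Drop items without site_web, normalize URLs, and keep the first item per URL."""
    if not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"expected 1 to {MAX_ITEMS} items, got {len(items)}")
    seen = set()
    prepared = []
    for item in items:
        raw = (item.get("site_web") or "").strip()
        if not raw:
            continue  # items without a website URL are dropped
        url = normalize_url(raw)
        if url in seen:
            continue  # duplicate after normalization
        seen.add(url)
        prepared.append({**item, "site_web": url})
    if not prepared:
        raise ValueError("no item carries a site_web URL")
    return prepared
```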
Outputs
One row is produced per input site. Columns:
| Column | Type | Description |
|---|---|---|
| raison_sociale | string | Registered company name as it appears on the legal-mentions page. |
| forme_juridique | string | Legal form (SAS, SARL, SA, EI, etc.). |
| capital_social | string | Declared share capital, in the currency given on the page. |
| rcs | string | RCS registration entry (city + identifier). |
| adresse_postale | string | Postal address of the registered office. |
| dirigeant | string | Publication director or legal representative, when stated. |
| tva_intracom | string | Intra-community VAT number, validated against the FR format when present. |
Empty cells indicate that the field was not present on the parsed page. The output is delivered as a CSV that carries these columns alongside the input columns.
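The exact checks behind "validated against the FR format" are not documented here. The sketch below assumes the usual shape of a French VAT number: "FR", a two-character key, then the 9-digit SIREN. When the key is numeric, it is also checked with the standard key formula (12 + 3 × (SIREN mod 97)) mod 97. Alphabetic keys are accepted on shape alone. The function names are hypothetical.

```python
import re

# Assumed shape: "FR" + 2-character key + 9-digit SIREN.
FR_VAT = re.compile(r"^FR([0-9A-Z]{2})([0-9]{9})$")


def normalize_vat(raw: str) -> str:
    """Remove spaces, dots and dashes commonly used when printing VAT numbers."""
    return re.sub(r"[\s.\-]", "", raw).upper()


def is_valid_fr_vat(raw: str) -> bool:
    match = FR_VAT.match(normalize_vat(raw))
    if not match:
        return False
    key, siren = match.groups()
    if not key.isdigit():
        return True  # alphabetic keys: only the shape is checked in this sketch
    # Numeric key derived from the SIREN.
    return int(key) == (12 + 3 * (int(siren) % 97)) % 97
```

For example, the SIREN 404833048 gives the key 83, so "FR 83 404833048" passes and "FR84404833048" does not.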
Lifecycle
The module follows the standard job lifecycle (see Jobs lifecycle). Progress is reported per site, and partial output is preserved if the job is canceled or fails mid-run.
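A client typically polls GET /api/jobs/{id} until the job stops. In the minimal sketch below, the status field name and the values "completed" and "canceled" are assumptions; this page only names "failed" explicitly. The HTTP call is injected as fetch_job so the loop can be shown without committing to a specific client.

```python
import time
from typing import Callable

# Hypothetical terminal status names; only "failed" appears verbatim on this page.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], dict],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_job(job_id) until a terminal status is reached or timeout elapses.

    fetch_job is expected to wrap GET /api/jobs/{id} and return the JobPublic record as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

Partial output is preserved on cancellation or failure, so downloading the output is still worthwhile after a non-successful terminal status.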
Pipeline
The module is an enrichment step. It plugs into the standard list pipeline:
needs: [site_web]
produces: [raison_sociale, forme_juridique, capital_social, rcs, adresse_postale, dirigeant, tva_intracom]
A typical chain looks like scrap → legal_mentions → legal_data. The source can be selected by uploading items directly or by referencing a recent scrap job through source_job_id.
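The needs/produces contract can be read as a column-availability check along a chain: each step's needs must already be produced upstream. In the sketch below, legal_mentions' needs and produces come from this page. The columns emitted by scrap are a hypothetical placeholder, and the Step and check_chain names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    needs: tuple
    produces: tuple


LEGAL_MENTIONS = Step(
    "legal_mentions",
    needs=("site_web",),
    produces=("raison_sociale", "forme_juridique", "capital_social", "rcs",
              "adresse_postale", "dirigeant", "tva_intracom"),
)
# Hypothetical: the real scrap step's output columns are not listed on this page.
SCRAP = Step("scrap", needs=(), produces=("name", "site_web"))


def check_chain(steps: list) -> set:
    """Return the columns available after the chain, or raise if a step's needs are unmet."""
    available = set()
    for step in steps:
        missing = [col for col in step.needs if col not in available]
        if missing:
            raise ValueError(f"{step.name} needs {missing}, which no upstream step produces")
        available.update(step.produces)
    return available
```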
Endpoints
All endpoints require an authenticated, active user.
| Method | Path | Body | Returns |
|---|---|---|---|
| POST | /api/jobs/legal-mentions | { items: [...], source_job_id } | JobPublic |
| GET | /api/jobs/{id} | none | JobPublic |
| GET | /api/jobs/{id}/output | none | CSV stream |
| POST | /api/jobs/{id}/cancel | none | JobPublic |
The create endpoint validates quota up front and returns 400 with a descriptive message if validation fails. Items per job: 1 to 10,000. Global quotas: see Limits.
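The sketch below builds the create request with the standard library, without sending it. The base URL and the bearer-token Authorization header are assumptions: the page only says the user must be authenticated. The sketch also assumes items may be empty when source_job_id is given. Requests over the per-job limit are rejected locally, as a courtesy before the server's own 400.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # hypothetical host
MAX_ITEMS = 10_000


def build_create_request(items: list, token: str,
                         source_job_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) POST /api/jobs/legal-mentions."""
    if len(items) > MAX_ITEMS or (not items and source_job_id is None):
        raise ValueError("provide 1 to 10,000 items, or a source_job_id")
    body = {"items": items}
    if source_job_id is not None:
        body["source_job_id"] = source_job_id
    return urllib.request.Request(
        f"{BASE_URL}/api/jobs/legal-mentions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. A 400 response surfaces as urllib.error.HTTPError, whose body carries the descriptive message.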
Errors
Two outcomes are surfaced as empty rows rather than job failures, because they are expected at list scale:
| Condition | Behavior |
|---|---|
| No legal page found | Row returned with every legal field blank. |
| Website down | Row returned with all fields blank; the site is marked unreachable. |
Job-level failures (status = failed) are reserved for non-recoverable conditions such as invalid input or quota errors. The error message is exposed on the job record.
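Because both expected outcomes yield blank rows, a quick summary of the output helps separate enriched sites from blank ones. The column that marks a site as unreachable is not named on this page, so unreachable_column and its truthy values ("true", "1", "yes") are hypothetical. Without that column, unreachable sites fall into the blank bucket together with sites that have no legal page.

```python
import csv
import io
from typing import Optional

LEGAL_FIELDS = ("raison_sociale", "forme_juridique", "capital_social", "rcs",
                "adresse_postale", "dirigeant", "tva_intracom")


def summarize(csv_text: str, unreachable_column: Optional[str] = None) -> dict:
    """Count rows with at least one legal field, blank rows, and rows flagged unreachable."""
    counts = {"enriched": 0, "blank": 0, "unreachable": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if any((row.get(name) or "").strip() for name in LEGAL_FIELDS):
            counts["enriched"] += 1
        elif unreachable_column and (row.get(unreachable_column) or "").strip().lower() in {"true", "1", "yes"}:
            counts["unreachable"] += 1
        else:
            counts["blank"] += 1  # no legal page found (or unreachable, if not flagged)
    return counts
```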
What's next
- legal_ids — detect SIREN/SIRET from the same website set.
- legal_data — enrich each identifier with official company data.