French legal data
The legal_data module enriches a list of POI with official records from French public legal data sources. For each input row, the module queries api.gouv.fr (SIRENE, INPI, RNCS) and complements the response with BODACC legal notices and Infogreffe public extracts. The result is a structured profile attached to each company: legal form, capital, registered executives, NAF code, headcount band, headline financials, and a consolidated lead status.
The module is read-only: no credentials are required, no fees are charged by the upstream sources, and no business is contacted as part of the lookup.
No website needed. Unlike legal_ids, which reads identifiers from a website, this module matches each row by name + address against SIRENE and also returns the SIRET/SIREN. Choose it when your list has no website column, for example a Google Maps scrape with names and map links only.
Purpose
French B2B prospecting lists tend to start with a name, an address, and maybe a website. legal_data turns each row into a qualified company record using only public registries.
Typical use cases:
- Filter a scraped list by capital, headcount band, or NAF code before outreach.
- Detect dead or insolvent entities (bodacc_procedure_collective) and drop them from a sequence.
- Identify companies with recent legal events (capital increase, executive change, address change) as opportunity signals.
- Recover named executives to personalize a first-touch email.
Inputs
legal_data is an enrichment module: it consumes an existing list of POI rather than producing one. The expected input is a poi_list, typically the output of a discovery job.
| Field | Required | Notes |
|---|---|---|
| nom | yes | Company name, used for fuzzy matching. |
| siren | no | If present, used for an exact match (preferred). |
| code_postal | no | Disambiguates fuzzy name matches. |
| lat, lon | no | Geographic fallback when name and SIREN both fail. |
Match resolution follows three tiers, in order:
- Exact SIREN lookup when the identifier is provided.
- Fuzzy match on nom + code_postal.
- Geographic fallback on coordinates within a small radius.
A row that cannot be resolved is returned with empty enrichment columns and an error code (see Errors).
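The tier order above can be sketched client-side to predict which tiers a given row is eligible for before submitting it. This is a minimal illustration, not the module's matching logic: the function name `applicable_tiers` and the tier labels are hypothetical, and the actual resolution happens server-side.

```python
from typing import Any, Mapping


def applicable_tiers(row: Mapping[str, Any]) -> list[str]:
    """Return the match tiers a row is eligible for, in resolution order.

    Tier labels are hypothetical names used only for this illustration.
    """
    tiers = []
    if row.get("siren"):
        # Tier 1: exact lookup when the identifier is provided.
        tiers.append("siren_exact")
    if row.get("nom"):
        # Tier 2: fuzzy match on nom, disambiguated by code_postal if present.
        tiers.append("fuzzy_name")
    if row.get("lat") is not None and row.get("lon") is not None:
        # Tier 3: geographic fallback on coordinates.
        tiers.append("geo_fallback")
    return tiers
```

A row for which this returns an empty list has nothing to match on and can be fixed or dropped before the job is created.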
Outputs
Each input row is augmented with the following columns. Empty values are preserved as empty strings — the module never fabricates a value.
| Column | Type | Description |
|---|---|---|
| legal_form | string | Legal form (SAS, SARL, SA, EI, association, etc.). |
| capital | number | Registered share capital in EUR. |
| founding_date | date | Date of registration in the company register. |
| executives | list | Named executives with role (Président, Gérant, DG). |
| financials | object | Last available revenue and net income, with fiscal year. |
| naf_code | string | Five-character NAF/APE activity code. |
| employees_range | string | INSEE headcount band (e.g. 10-19, 100-199). |
A consolidated lead_status is also returned, taking one of four values: mort, alerte, opportunite, actif. It encodes the combination of administrative state, BODACC signals, and recency of legal events.
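As an illustration of how lead_status might be used downstream, the sketch below drops rows marked mort from an enriched CSV. It assumes the status is exposed as a `lead_status` column in the CSV export. The helper name `keep_live_leads` and the sample rows are made up for the example.

```python
import csv
import io


def keep_live_leads(rows, excluded=("mort",)):
    """Yield enriched rows whose lead_status is not in `excluded`."""
    for row in rows:
        if row.get("lead_status") not in excluded:
            yield row


# Tiny in-memory stand-in for the enriched CSV (made-up rows).
sample = io.StringIO(
    "nom,naf_code,lead_status\n"
    "Alpha SAS,6201Z,actif\n"
    "Beta SARL,4711D,mort\n"
    "Gamma SA,7022Z,opportunite\n"
)
kept = [row["nom"] for row in keep_live_leads(csv.DictReader(sample))]
print(kept)  # ['Alpha SAS', 'Gamma SA']
```

Passing `excluded=("mort", "alerte")` would tighten the filter to entities with no negative signal.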
Lifecycle
Standard job lifecycle — see Jobs lifecycle. Progress is reported per establishment processed. The job is idempotent within a session: re-running on the same input list yields the same enriched columns, modulo upstream registry updates.
Pipeline
needs: poi_list
produces: enriched_list
legal_data consumes a poi_list and emits an enriched_list carrying the original rows plus the columns described in Outputs. The enriched list can itself be consumed by downstream enrichment modules (legal_mentions, legal_ids, etc.).
Endpoints
Create a job
POST /api/jobs/legal-data
Request body:
{
  "items": [
    { "nom": "Boulangerie Martin", "code_postal": "75011" },
    { "siren": "552120222" }
  ],
  "source_job_id": "job_01HXYZ..."
}
Either items or source_job_id must be provided. When source_job_id references a completed discovery job, its rows are used as input directly.
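A minimal sketch of building this request with the Python standard library follows. The host in `BASE_URL` is a placeholder, the JSON `Content-Type` header is an assumption, and any authentication the API may require is not covered here. The function only builds the request; it does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def build_create_request(items=None, source_job_id=None):
    """Build (but do not send) the POST /api/jobs/legal-data request."""
    # The endpoint requires at least one of the two inputs.
    if not items and not source_job_id:
        raise ValueError("provide items or source_job_id")
    body = {}
    if items:
        body["items"] = items
    if source_job_id:
        body["source_job_id"] = source_job_id
    return urllib.request.Request(
        BASE_URL + "/api/jobs/legal-data",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


req = build_create_request(
    items=[{"nom": "Boulangerie Martin", "code_postal": "75011"}]
)
```

The request could then be sent with `urllib.request.urlopen(req)` and the response body parsed as the Job resource.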
Response: a Job resource with id, type, status, and progress fields.
Retrieve a job
GET /api/jobs/{job_id}
Returns the current state, progress counters, and — when done — the download URL for the enriched CSV.
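Since the job runs asynchronously, a client typically polls this endpoint until the job finishes. The sketch below shows one way to do that. It assumes `status` takes the literal values `done` and `failed` when the job ends, and it takes a caller-supplied `fetch` function (hypothetical) that performs the GET and returns the decoded Job resource.

```python
import time
from typing import Any, Callable, Mapping

TERMINAL_STATUSES = {"done", "failed"}  # assumption: literal status values


def wait_for_job(
    fetch: Callable[[str], Mapping[str, Any]],
    job_id: str,
    interval: float = 5.0,
    max_polls: int = 120,
) -> Mapping[str, Any]:
    """Poll `fetch(job_id)` until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        time.sleep(interval)  # wait before the next poll
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting `fetch` keeps the loop independent of the HTTP client and makes it easy to exercise with a stub.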
List jobs
GET /api/jobs?type=legal_data
Maximum 5,000 rows per job. Larger lists must be split client-side. Global quotas and rate limits: see Limits.
Financial figures depend on the company having filed its accounts (roughly 60 percent of French SMEs). Executive names reflect the last filing; recent changes may take a few weeks to propagate.
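Because of the 5,000-row limit, a longer list has to be split before submission. A minimal client-side sketch, with the helper name `split_rows` chosen for illustration:

```python
from typing import Iterator, Sequence

MAX_ROWS_PER_JOB = 5000  # per-job limit stated in this section


def split_rows(
    rows: Sequence[dict], size: int = MAX_ROWS_PER_JOB
) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of `rows`, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Each yielded batch can be sent as the `items` array of its own job. A list of 12,001 rows, for instance, produces batches of 5,000, 5,000, and 2,001 rows.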
Errors
Row-level errors are reported in an error column on the enriched output. Job-level errors transition the job to failed.
| Code | Scope | Meaning |
|---|---|---|
| not_found | row | No match in SIRENE for the provided name and postcode. |
| foreign_business | row | Establishment is not registered in France. |
| ambiguous_match | row | Several candidates with equal score; none selected. |
| source_unavailable | job | One or more upstream public sources are unreachable. |
| quota_exceeded | job | Daily fair-use quota reached; retry the next day. |
| invalid_input | job | Input list is empty or missing required fields. |
A source_unavailable failure preserves all rows already enriched before the outage. The job can be re-submitted with the remaining rows once the upstream source recovers.
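To review row-level errors after a job, the error column can be tallied by code. The sketch below assumes that rows enriched without error carry an empty string in `error`. The helper name and sample rows are made up for the example.

```python
from collections import Counter


def summarize_errors(rows):
    """Count row-level error codes, ignoring rows with an empty error."""
    return Counter(row["error"] for row in rows if row.get("error"))


# Made-up enriched rows, reduced to the two relevant columns.
rows = [
    {"nom": "Alpha SAS", "error": ""},
    {"nom": "Beta", "error": "not_found"},
    {"nom": "Gamma GmbH", "error": "foreign_business"},
    {"nom": "Delta", "error": "not_found"},
]
summary = summarize_errors(rows)
print(summary.most_common())  # [('not_found', 2), ('foreign_business', 1)]
```

A high share of not_found rows may indicate that adding code_postal or siren to the input would help matching.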
What's next
- legal_ids: detect SIREN and SIRET identifiers directly on a company website, with Luhn validation. A useful prerequisite when input rows lack a siren.
- legal_mentions: parse the legal notice page of a website to extract registered name, capital, RCS, postal address, and VAT number. Complements legal_data when registry filings are sparse.