
Scrap (Google Maps)

Source module that extracts Google Maps listings for a set of search queries across one or more geographic zones.

Purpose

Source module: runs each query over every grid point covering the requested zones and returns a flat CSV of Google Maps establishments (name, contact, location, rating).

Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| queries | string[] (1–20) | yes | Search terms run against Google Maps. Each query is trimmed and capped at 200 chars. |
| zones | string[] (1–50) | yes | Geographic zones. Accepts INSEE codes, department codes, region names, or "France". Each zone is resolved to a grid of points server-side. |
| include_reviews | bool | no | Kept for backward compatibility. Does not chain a reviews job; use the reviews module instead. Defaults to false. |

Request body:

{
  "queries": ["plombier", "chauffagiste"],
  "zones": ["75", "92"],
  "include_reviews": false
}
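The input limits above can be checked client-side before submitting. The following is a minimal sketch, not the server's validation logic: `build_scrap_payload` is a hypothetical helper, and it assumes that "capped at 200 chars" means truncation and that blank queries are dropped.

```python
from typing import Any

MAX_QUERIES = 20        # documented cap: 1-20 queries per job
MAX_ZONES = 50          # documented cap: 1-50 zones per job
MAX_QUERY_CHARS = 200   # documented cap on query length


def build_scrap_payload(queries: list[str], zones: list[str]) -> dict[str, Any]:
    """Return a request body after applying the documented limits client-side."""
    # Assumption: blank queries are dropped and over-long ones are truncated.
    cleaned = [q.strip()[:MAX_QUERY_CHARS] for q in queries if q.strip()]
    if not 1 <= len(cleaned) <= MAX_QUERIES:
        raise ValueError(f"expected 1-{MAX_QUERIES} queries, got {len(cleaned)}")
    if not 1 <= len(zones) <= MAX_ZONES:
        raise ValueError(f"expected 1-{MAX_ZONES} zones, got {len(zones)}")
    return {"queries": cleaned, "zones": list(zones)}
```

Failing early on the client avoids a round trip for payloads the API would presumably reject anyway. Zone strings are not checked here because they are resolved server-side.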

Effective Google Maps requests = len(queries) × grid_points(zones). The job is rejected at submission if its estimated cost exceeds the per-job EF ceiling (1.0 EF, see Limits).
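As a worked example of the request-count formula, the sketch below multiplies the number of queries by a grid point count. The grid is resolved server-side, so the count of 120 is purely hypothetical. In practice the value might be read from grid_points_count on the job object. The conversion from requests to EF is not documented here and is not modelled.

```python
def effective_requests(queries: list[str], grid_points: int) -> int:
    """Number of Google Maps requests: one per (query, grid point) pair."""
    if grid_points < 0:
        raise ValueError("grid_points must be non-negative")
    return len(queries) * grid_points


# Hypothetical figure: assume the requested zones resolve to 120 grid points.
print(effective_requests(["plombier", "chauffagiste"], 120))  # 2 * 120 = 240
```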

Outputs

Result file: UTF-8 CSV with a semicolon delimiter and a BOM (Excel-safe). The same dataset is available in three formats via the download endpoint.

| Column | Type | Description |
| --- | --- | --- |
| nom | string | Establishment name as displayed on Google Maps. |
| site_web | string | Public website URL if listed. |
| telephone | string | Phone number as listed. |
| adresse | string | Street address (number + street) as shown in the Maps list. |
| ville | string | City, taken from Google's own structured place data (exact, including Paris/Lyon/Marseille arrondissements and multi-postcode cities). Empty for listings Google has no address for. |
| code_postal | string | Postal code, from the same Google source as ville. |
| rating | float | Average star rating (0.0–5.0). |
| reviews_count | int | Number of public reviews. |
| category | string | Primary Google Maps category. |
| lien_google_maps | string | Canonical Google Maps URL for the listing. |
| aggregator_flag | bool | True if the listing looks like a directory/aggregator rather than an end business. |
| query | string | Source query that produced the row. |
| lat, lon | float | Grid point at which the row was collected. |

ville and code_postal come from the structured place data Google ships with each result, not from reverse geocoding, so they match Google exactly. The Maps list view renders only the street, which is why adresse alone does not carry the city.
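Because the result file is a semicolon-delimited UTF-8 CSV with a BOM, a default CSV reader will mis-split rows and leave the BOM glued to the first header. A minimal sketch of a reader follows. `read_scrap_csv` is a hypothetical helper name. Every value comes back as a string, so numeric columns such as rating still need converting.

```python
import csv
import io


def read_scrap_csv(raw: bytes) -> list[dict[str, str]]:
    """Parse the downloaded CSV bytes into one dict per establishment."""
    # "utf-8-sig" strips the leading BOM, so the first header is "nom"
    # rather than "\ufeffnom".
    text = raw.decode("utf-8-sig")
    # newline="" leaves line endings untouched, as the csv module expects.
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";")
    return list(reader)
```

Empty cells, such as ville for listings without an address, are returned as empty strings rather than None.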

Optional columns

Three extra options are off by default and are enabled per job via extra_columns (a list of option names). The default address output is adresse (street) + ville + code_postal.

| Option in extra_columns | Adds column(s) | Description |
| --- | --- | --- |
| gps | lat, lon | Exact latitude/longitude of the business (Google's own coordinates, not a grid approximation). |
| departement | departement | Department name, derived from code_postal. |
| region | region | Region name, derived from code_postal. |

Example request body:

{
  "queries": ["plumber"],
  "zones": ["Paris 10km"],
  "extra_columns": ["gps", "departement", "region"]
}

Formats: csv (original), json, xlsx. Selected via ?format= on the download endpoint.

Lifecycle

Standard job lifecycle: see Jobs & lifecycle. While the job is running, the SSE status event carries a query_stats payload of shape { "<query>": { "tiles": int, "with_results": int } }, updated in real time to expose the per-query hit ratio.
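The per-query hit ratio can be derived from a query_stats payload as sketched below. This assumes the ratio is with_results divided by tiles, which the payload shape suggests but the text does not state. `hit_ratios` is a hypothetical helper name.

```python
def hit_ratios(query_stats: dict[str, dict[str, int]]) -> dict[str, float]:
    """Map each query to the share of its tiles that returned results."""
    ratios: dict[str, float] = {}
    for query, stats in query_stats.items():
        tiles = stats["tiles"]
        # Guard against division by zero before any tile has been processed.
        ratios[query] = stats["with_results"] / tiles if tiles else 0.0
    return ratios
```

A query whose ratio stays near zero as tiles accumulates is probably a poor fit for the chosen zones.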

Pipeline

| Field | Value |
| --- | --- |
| needs | null (source module, no input CSV required) |
| produces | poi_list |

Typical downstream modules chained against a scrap output:

Endpoints

Dedicated endpoint:

POST /api/jobs
Content-Type: application/json

{
  "queries": ["plombier"],
  "zones": ["75"],
  "include_reviews": false
}

Generic job endpoint (equivalent — same payload, job_type inferred from shape):

POST /api/jobs
Content-Type: application/json

{
  "job_type": "scrap",
  "queries": ["plombier"],
  "zones": ["75"]
}

Both requests return the created JobPublic object, including id, status, grid_points_count, ef_cost and output_filename.
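A minimal sketch of building the generic job request with Python's standard library follows. The host `https://api.example.com` is a placeholder and `make_create_job_request` is a hypothetical helper. Authentication is not described in this section and is left out. The function only constructs the request. Sending it with `urllib.request.urlopen` is left to the caller.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder host


def make_create_job_request(
    payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = make_create_job_request(
    {"job_type": "scrap", "queries": ["plombier"], "zones": ["75"]}
)
```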

Download:

GET /api/jobs/{job_id}/download?format=csv|json|xlsx
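The download path for a given job and format can be assembled as sketched below. `download_path` is a hypothetical helper, and the job id in the usage line is made up. Rejecting unknown formats locally is a convenience of this sketch, not documented server behaviour.

```python
from urllib.parse import quote, urlencode

ALLOWED_FORMATS = ("csv", "json", "xlsx")  # formats listed for ?format=


def download_path(job_id: str, fmt: str = "csv") -> str:
    """Return the relative download URL for a job in the requested format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    # Percent-encode the id so unexpected characters cannot alter the path.
    return f"/api/jobs/{quote(job_id, safe='')}/download?{urlencode({'format': fmt})}"


print(download_path("abc123", "xlsx"))  # /api/jobs/abc123/download?format=xlsx
```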

Limits

Platform-wide quotas: see /docs/concepts/limits. Module-specific caps:

| Limit | Value |
| --- | --- |
| Maximum queries per job | 20 |
| Maximum zones per job | 50 |
| Maximum query length | 200 chars |
| Maximum cost per job | 1.0 equivalent-France (EF) |
| Email verification | Required on the account before a scrap job can be created. |

Errors

| Scenario | HTTP | Resolution |
| --- | --- | --- |
| Unrecognised zone string | 400 | Inspect the errors array in the response body; use INSEE/department codes or "France". |
| No grid points resolved | 400 | The zone set is empty after resolution; broaden the zone selection. |
| EF quota exceeded | 400 | Reduce the number of queries or shrink the zones until estimated EF ≤ 1.0. |
| Email not verified | 403 | Verify the account email before creating a scrap job. |
| No worker available | - | The job stays in pending until the shared multi-proxy pool is free. Only one multi-proxy job runs at a time platform-wide. |
| Job failed mid-run | - | A partial CSV is preserved. A POST /api/jobs/{id}/resume creates a follow-up job that skips already-processed grid points and is billed only for the remainder. |
| Download expired | 410 | Result files have a retention window; re-run the job or chain from a fresh source. |

Queries refused by Google Maps surface in dead_queries on the job object.
