Scrap (Google Maps)
Source module that extracts Google Maps listings for a set of search queries across one or more geographic zones.
Purpose
Source module: runs each query over every grid point covering the requested zones and returns a flat CSV of Google Maps establishments (name, contact, location, rating).
Inputs
| Field | Type | Required | Description |
|---|---|---|---|
| queries | string[] (1–20) | yes | Search terms run against Google Maps. Each query is trimmed and capped at 200 chars. |
| zones | string[] (1–50) | yes | Geographic zones. Accepts INSEE codes, department codes, region names, or "France". Each zone is resolved to a grid of points server-side. |
| include_reviews | bool | no | Kept for backward compatibility. Does not chain a reviews job; use the reviews module instead. Defaults to false. |
Request body:
{
  "queries": ["plombier", "chauffagiste"],
  "zones": ["75", "92"],
  "include_reviews": false
}
Effective Google Maps requests = len(queries) × grid_points(zones). A job is rejected at submission if its estimated cost exceeds the per-job ceiling of 1.0 EF (see Limits).
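The request count above can be estimated client-side once the grid size is known. The sketch below is only an illustration: the grid is resolved server-side, so `grid_point_count` is assumed to come from the `grid_points_count` field of the job object, and the function name is hypothetical. The conversion from requests to EF is not documented here and is not modeled.

```python
def effective_requests(queries: list[str], grid_point_count: int) -> int:
    """Return the number of Google Maps requests: one per (query, grid point) pair."""
    if grid_point_count < 0:
        raise ValueError("grid_point_count must be non-negative")
    return len(queries) * grid_point_count


# Two queries over an illustrative 150-point grid (the grid size is made up).
print(effective_requests(["plombier", "chauffagiste"], 150))  # 300
```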
Outputs
Result file: UTF-8 CSV, semicolon delimiter, BOM (Excel-safe). The same dataset is available in three formats (csv, json, xlsx) via the download endpoint.
| Column | Type | Description |
|---|---|---|
| nom | string | Establishment name as displayed on Google Maps. |
| site_web | string | Public website URL if listed. |
| telephone | string | Phone number as listed. |
| adresse | string | Street address (number + street) as shown in the Maps list. |
| ville | string | City, taken from Google's own structured place data (exact, including Paris/Lyon/Marseille arrondissements and multi-postcode cities). Empty for listings Google has no address for. |
| code_postal | string | Postal code, from the same Google source as ville. |
| rating | float | Average star rating (0.0–5.0). |
| reviews_count | int | Number of public reviews. |
| category | string | Primary Google Maps category. |
| lien_google_maps | string | Canonical Google Maps URL for the listing. |
| aggregator_flag | bool | True if the listing looks like a directory/aggregator rather than an end business. |
| query | string | Source query that produced the row. |
| lat, lon | float | Grid point at which the row was collected. |
ville and code_postal come from the structured place data Google ships with each result, not from reverse-geocoding, so they match Google exactly. The Maps list view only renders the street, which is why adresse alone never carried the city.
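Because the result file is a semicolon-delimited UTF-8 CSV with a BOM, a default CSV reader will mis-split rows or leave the BOM glued to the first header. A minimal sketch of reading it with the Python standard library follows; the function name and the sample row are invented for illustration, and only a few of the documented columns are shown.

```python
import csv
import io


def read_scrap_csv(raw: bytes) -> list[dict[str, str]]:
    """Parse a scrap result file: UTF-8 with BOM, semicolon-delimited."""
    # "utf-8-sig" strips the BOM so the first header is "nom", not "\ufeffnom".
    text = raw.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";")
    return list(reader)


# Invented sample data, shaped like the documented output.
sample = (
    "\ufeffnom;ville;code_postal;rating\r\n"
    "Plomberie Dupont;Paris;75010;4.6\r\n"
).encode("utf-8")

rows = read_scrap_csv(sample)
print(rows[0]["nom"], rows[0]["code_postal"])
```

All values come back as strings, so numeric columns such as rating and reviews_count need an explicit `float(...)` or `int(...)` conversion.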
Optional columns
Three optional column groups are off by default and are enabled per job via extra_columns (a list of option names). The default location output is street (adresse) + ville + code_postal.
| Option in extra_columns | Adds column(s) | Description |
|---|---|---|
| gps | lat, lon | Exact latitude/longitude of the business (Google's own coordinates, not a grid approximation). |
| departement | departement | Department name, derived from code_postal. |
| region | region | Region name, derived from code_postal. |
Example request body: { "queries": ["plumber"], "zones": ["Paris 10km"], "extra_columns": ["gps", "departement", "region"] }.
Formats: csv (original), json, xlsx. Selected via ?format= on the download endpoint.
Lifecycle
Standard job lifecycle: see Jobs & lifecycle. While running, the SSE status event carries a query_stats payload of shape { "<query>": { "tiles": int, "with_results": int } }, updated in real time to expose per-query hit ratio.
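As an illustration of how a client might use the query_stats payload, the sketch below computes a per-query hit ratio. It assumes the ratio is `with_results / tiles`, which is an interpretation of the payload shape rather than a documented formula; the function name and the sample numbers are hypothetical.

```python
def hit_ratios(query_stats: dict[str, dict[str, int]]) -> dict[str, float]:
    """Map each query to with_results / tiles (0.0 when no tile was processed yet)."""
    ratios: dict[str, float] = {}
    for query, stats in query_stats.items():
        tiles = stats["tiles"]
        ratios[query] = stats["with_results"] / tiles if tiles else 0.0
    return ratios


# Invented payload matching the documented shape.
stats = {
    "plombier": {"tiles": 40, "with_results": 30},
    "chauffagiste": {"tiles": 0, "with_results": 0},
}
print(hit_ratios(stats))  # {'plombier': 0.75, 'chauffagiste': 0.0}
```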
Pipeline
| Field | Value |
|---|---|
| needs | null (source module; no input CSV required) |
| produces | poi_list |
Typical downstream modules chained against a scrap output:
- emails: find professional and personal emails from site_web.
- socials: extract social network handles from site_web.
- legal_ids: extract SIREN/SIRET from the establishment's website (legal-mentions page).
- reviews: collect full review threads from lien_google_maps.
- techstack, dead_check, brand_assets, ads_intelligence: site-level enrichments keyed on site_web.
Endpoints
Dedicated endpoint:
POST /api/jobs
Content-Type: application/json
{
  "queries": ["plombier"],
  "zones": ["75"],
  "include_reviews": false
}
Generic job endpoint (equivalent — same payload, job_type inferred from shape):
POST /api/jobs
Content-Type: application/json
{
  "job_type": "scrap",
  "queries": ["plombier"],
  "zones": ["75"]
}
Both responses return the created JobPublic object including id, status, grid_points_count, ef_cost and output_filename.
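A job submission can be sketched with the Python standard library as below. The base URL is a placeholder, the helper name is hypothetical, and authentication is not covered in this section, so no credentials are attached; the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder host, not the real API base


def build_scrap_request(
    queries: list[str], zones: list[str], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request for a scrap job."""
    payload = {"job_type": "scrap", "queries": queries, "zones": zones}
    return urllib.request.Request(
        url=f"{base_url}/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_scrap_request(["plombier"], ["75"])
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req), then json.load(...) on the
# response to read id, status, grid_points_count, ef_cost and output_filename.
```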
Download:
GET /api/jobs/{job_id}/download?format=csv|json|xlsx
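The download URL can be assembled as in the sketch below. The base URL is a placeholder, the helper name is hypothetical, and the job id is assumed to be a string that is safe to place in a path segment once percent-encoded.

```python
from urllib.parse import quote, urlencode

ALLOWED_FORMATS = ("csv", "json", "xlsx")


def download_url(base_url: str, job_id: str, fmt: str = "csv") -> str:
    """Return the download URL for a job in one of the three documented formats."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    path_id = quote(job_id, safe="")  # percent-encode anything unusual in the id
    query = urlencode({"format": fmt})
    return f"{base_url}/api/jobs/{path_id}/download?{query}"


print(download_url("https://example.invalid", "abc123", "xlsx"))
```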
Limits
Platform-wide quotas: see /docs/concepts/limits. Module-specific caps:
| Limit | Value |
|---|---|
| Maximum queries per job | 20 |
| Maximum zones per job | 50 |
| Maximum query length | 200 chars |
| Maximum cost per job | 1.0 equivalent-France (EF) |
| Email verification | Required on the account before a scrap job can be created. |
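The count and length caps above can be checked before submitting, which avoids a round trip for obviously invalid payloads. This is a client-side sketch only: the function name is hypothetical, and it assumes "capped at 200 chars" means truncation rather than rejection. The EF ceiling is not checked because it depends on the server-side grid.

```python
MAX_QUERIES = 20
MAX_ZONES = 50
MAX_QUERY_LENGTH = 200


def normalize_payload(queries: list[str], zones: list[str]) -> dict[str, list[str]]:
    """Check the documented count limits and trim/cap each query."""
    if not 1 <= len(queries) <= MAX_QUERIES:
        raise ValueError(f"expected 1-{MAX_QUERIES} queries, got {len(queries)}")
    if not 1 <= len(zones) <= MAX_ZONES:
        raise ValueError(f"expected 1-{MAX_ZONES} zones, got {len(zones)}")
    # Assumption: over-long queries are truncated, mirroring "trimmed and capped".
    cleaned = [q.strip()[:MAX_QUERY_LENGTH] for q in queries]
    return {"queries": cleaned, "zones": list(zones)}


print(normalize_payload(["  plombier  "], ["75", "92"]))
```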
Errors
| Scenario | HTTP | Resolution |
|---|---|---|
| Unrecognised zone string | 400 | Inspect the errors array in the response body; use INSEE/department codes or "France". |
| No grid points resolved | 400 | The zone set is empty after resolution — broaden the zone selection. |
| EF quota exceeded | 400 | Reduce the number of queries or shrink the zones until estimated EF ≤ 1.0. |
| Email not verified | 403 | Verify the account email before creating a scrap job. |
| No worker available | | The job stays in pending until the shared multi-proxy pool is free. Only one multi-proxy job runs at a time platform-wide. |
| Job failed mid-run | | A partial CSV is preserved. POST /api/jobs/{id}/resume creates a follow-up job that skips already-processed grid points and is billed only for the remainder. |
| Download expired | 410 | Result files have a retention window — re-run the job or chain from a fresh source. |
Queries refused by Google Maps surface in dead_queries on the job object.
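A client might turn the HTTP statuses in the table above into actionable messages along these lines. This is a sketch with a hypothetical helper name; the fallback text for statuses outside the table is a suggestion, not documented behavior.

```python
# Resolutions paraphrased from the error table; keyed by HTTP status.
RESOLUTIONS = {
    400: "Inspect the errors array: fix or broaden the zones, or reduce queries/zones until estimated EF <= 1.0.",
    403: "Verify the account email before creating a scrap job.",
    410: "Result file expired: re-run the job or chain from a fresh source.",
}


def resolution_for(status: int) -> str:
    """Return the documented resolution for an HTTP status, or a generic fallback."""
    return RESOLUTIONS.get(status, "Status not listed in the error table; inspect the response body.")


print(resolution_for(403))
```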