action Action | enum: start, status, stop, results | yes | start | start = launches a new job (returns jobId). status = job metrics (requires jobId). stop = terminates the job. results = paginated batch (requires jobId; cursor optional). |
endpoint Crawler endpoint | string | no | — https://crawler.miosito.com (empty = env FLOWFORGE_CRAWLER_ENDPOINT) | Self-hosted crawler service or the managed Zeli service. |
apiKey API Key | string (encrypted) | no | — | Bearer token. |
jobId Job ID (status/stop/results) | string | no | — crawl_abc123 | Required for status/stop/results. For start it is optional: auto-generated if empty, or set to an existing job to resume it (with resume=true). |
seeds Seed URLs | string (multiline) | no | — https://site.com, https://site.com/blog | Initial URLs (comma- or newline-separated). Required for action=start. |
maxDepth Max depth | number | no | 3 | Maximum link depth from the seeds. 0 = seeds only. Max 10. |
maxPages Max pages | number | no | 1000 | Maximum total pages. Hard stop. Min 1, max 100k. |
allowDomains Allow domains | string | no | — site.com, www.site.com (empty = hostnames of the seeds) | Only these domains are crawled. Default: hostnames of the seeds (no cross-domain). |
denyPatterns Deny patterns (regex) | string (multiline) | no | — /admin/.* · .*\.pdf$ · /logout (one pattern per line) | URL regexes that must NOT be crawled (e.g. /admin, /logout, binary files). |
respectRobots Respect robots.txt | boolean | no | true | RFC 9309 compliance. Default ON (recommended). |
sitemapFirst Sitemap-first seed | boolean | no | false | Before crawling HTML, fetches sitemap.xml and enqueues those URLs with priority. |
bloomCapacity Bloom filter capacity | number | no | 1000000 | Slots for URL deduplication. Default 1M = ~7MB RAM. Increase for very large crawls. |
bloomFpr Bloom FP rate | number | no | 0.001 | False positive rate. Default 0.001 (0.1%). Lower = more RAM. |
parallelism Parallel workers | number | no | 4 | Parallel coroutines for the job. Max 50. |
rateLimitPerHostQps Rate-limit per host (QPS) | number | no | 2 | Max requests/sec per hostname (politeness). Default 2 QPS. |
userAgent User-Agent | string | no | FlowForge-Crawler/1.0 (+https://flowforge.automazionezeli.com) | Identifying UA (ethics, transparency). |
callbackUrl Callback webhook | string | no | — https://tenant.app.automazionezeli.com/webhooks/crawler | FlowForge webhook that receives batches of pages. Empty = no callback; use action=results instead. |
callbackSecret Callback secret | string (encrypted) | no | — | HMAC secret for authenticating the callback. Validated downstream by trigger_webhook. |
callbackBatchSize Callback batch size | number | no | 10 | Pages per callback POST. Default 10. Max 1000. |
cursor Cursor (results) | string | no | — | Pagination cursor for action=results. |
resume Resume job | boolean | no | false | If ON and the jobId exists: resumes from the checkpoint. If OFF and the jobId already exists: error. |
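
The per-action requirements and numeric bounds in the table can be checked before a request is sent. The following is a minimal sketch, not the node's actual validation logic: the helper names `parse_seeds` and `validate_params`, the `LIMITS` and `DEFAULTS` dictionaries, and the error wording are all hypothetical. Only the rules themselves (which fields each action requires, the defaults, and the min/max values) are taken from the table.

```python
import re

# Numeric bounds as stated in the table; None means no bound is given there.
LIMITS = {
    "maxDepth": (0, 10),
    "maxPages": (1, 100_000),
    "parallelism": (None, 50),
    "callbackBatchSize": (None, 1000),
}

# Defaults as stated in the table (only the fields this sketch checks).
DEFAULTS = {
    "action": "start",
    "maxDepth": 3,
    "maxPages": 1000,
    "parallelism": 4,
    "callbackBatchSize": 10,
}


def parse_seeds(raw: str) -> list[str]:
    """Split the seeds field on commas or newlines, dropping empty entries."""
    return [s.strip() for s in re.split(r"[,\n]", raw) if s.strip()]


def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the parameters pass."""
    p = {**DEFAULTS, **params}
    action = p["action"]
    if action not in ("start", "status", "stop", "results"):
        return [f"unknown action: {action!r}"]

    errors = []
    # status, stop and results all operate on an existing job.
    if action != "start" and not p.get("jobId"):
        errors.append(f"jobId is required for action={action}")
    # start needs at least one seed URL.
    if action == "start" and not parse_seeds(p.get("seeds") or ""):
        errors.append("seeds is required for action=start")

    # Range checks against the documented minimums and maximums.
    for name, (lo, hi) in LIMITS.items():
        value = p[name]
        if lo is not None and value < lo:
            errors.append(f"{name} must be >= {lo}")
        if hi is not None and value > hi:
            errors.append(f"{name} must be <= {hi}")
    return errors


if __name__ == "__main__":
    print(validate_params({"seeds": "https://site.com\nhttps://site.com/blog"}))
    print(validate_params({"action": "results"}))
```

Returning a list of problems rather than raising on the first one lets a caller show every misconfigured field at once. The sketch does not cover the resume/jobId interaction, since the table does not say how the server reports that case.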