Nicolas
|
f4d10c5031
|
Nick: formatting fixes
|
2025-01-10 18:35:10 -03:00 |
|
Gergő Móricz
|
29c1f126ab
|
feat(scrape-status): adapt
|
2025-01-09 19:14:00 +01:00 |
|
Nicolas
|
51cb4b1615
|
Nick: temp rl for /extract
|
2025-01-08 15:24:38 -03:00 |
|
Nicolas
|
27457ed5db
|
Nick: init
|
2025-01-03 20:44:27 -03:00 |
|
Nicolas
|
d2742bec4d
|
Nick: v1 search
|
2025-01-02 19:31:03 -03:00 |
|
Nicolas
|
d1f3e26f9e
|
Nick: blocklist string
|
2024-12-20 18:09:49 -03:00 |
|
Nicolas
|
6222152249
|
Nick: credit usage endpoint
|
2024-12-20 15:44:17 -03:00 |
|
Eric Ciarla
|
a2998d4499
|
Hash Urls
|
2024-12-12 16:10:10 -05:00 |
|
Nicolas
|
8a1c404918
|
Nick: revert trailing comma
|
2024-12-11 19:51:08 -03:00 |
|
Nicolas
|
52f2e733e2
|
Nick: fixes
|
2024-12-11 19:48:22 -03:00 |
|
Nicolas
|
00335e2ba9
|
Nick: fixed prettier
|
2024-12-11 19:46:11 -03:00 |
|
Nicolas
|
d817aa744f
|
Update v1.ts
|
2024-11-24 19:46:31 -08:00 |
|
Nicolas
|
a4f15260a7
|
Nick:
|
2024-11-12 12:23:24 -05:00 |
|
Gergő Móricz
|
8d467c8ca7
|
WebScraper refactor into scrapeURL (#714)
* feat: use strictNullChecking
* feat: switch logger to Winston
* feat(scrapeURL): first batch
* fix(scrapeURL): error swallow
* fix(scrapeURL): add timeout to EngineResultsTracker
* fix(scrapeURL): report unexpected error to sentry
* chore: remove unused modules
* feat(transfomers/coerce): warn when a format's response is missing
* feat(scrapeURL): feature flag priorities, engine quality sorting, PDF and DOCX support
* (add note)
* feat(scrapeURL): wip readme
* feat(scrapeURL): LLM extract
* feat(scrapeURL): better warnings
* fix(scrapeURL/engines/fire-engine;playwright): fix screenshot
* feat(scrapeURL): add forceEngine internal option
* feat(scrapeURL/engines): scrapingbee
* feat(scrapeURL/transformars): uploadScreenshot
* feat(scrapeURL): more intense tests
* bunch of stuff
* get rid of WebScraper (mostly)
* adapt batch scrape
* add staging deploy workflow
* fix yaml
* fix logger issues
* fix v1 test schema
* feat(scrapeURL/fire-engine/chrome-cdp): remove wait inserts on actions
* scrapeURL: v0 backwards compat
* logger fixes
* feat(scrapeurl): v0 returnOnlyUrls support
* fix(scrapeURL/v0): URL leniency
* fix(batch-scrape): ts non-nullable
* fix(scrapeURL/fire-engine/chromecdp): fix wait action
* fix(logger): remove error debug key
* feat(requests.http): use dotenv expression
* fix(scrapeURL/extractMetadata): extract custom metadata
* fix crawl option conversion
* feat(scrapeURL): Add retry logic to robustFetch
* fix(scrapeURL): crawl stuff
* fix(scrapeURL): LLM extract
* fix(scrapeURL/v0): search fix
* fix(tests/v0): grant larger response size to v0 crawl status
* feat(scrapeURL): basic fetch engine
* feat(scrapeURL): playwright engine
* feat(scrapeURL): add url-specific parameters
* Update readme and examples
* added e2e tests for most parameters. Still a few actions, location and iframes to be done.
* fixed type
* Nick:
* Update scrape.ts
* Update index.ts
* added actions and base64 check
* Nick: skipTls feature flag?
* 403
* todo
* todo
* fixes
* yeet headers from url specific params
* add warning when final engine has feature deficit
* expose engine results tracker for ScrapeEvents implementation
* ingest scrape events
* fixed some tests
* comment
* Update index.test.ts
* fixed rawHtml
* Update index.test.ts
* update comments
* move geolocation to global f-e option, fix removeBase64Images
* Nick:
* trim url-specific params
* Update index.ts
---------
Co-authored-by: Eric Ciarla <ericciarla@yahoo.com>
Co-authored-by: rafaelmmiller <8574157+rafaelmmiller@users.noreply.github.com>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
|
2024-11-07 20:57:33 +01:00 |
|
Nicolas
|
d8abd15716
|
Nick: from bulk to batch
|
2024-10-23 15:37:24 -03:00 |
|
Gergő Móricz
|
70c4e7c334
|
feat(bulk/scrape): check credits via url list length
|
2024-10-23 19:42:02 +02:00 |
|
Nicolas
|
66e505317e
|
Merge branch 'main' into mog/bulk-scrape
|
2024-10-23 14:36:26 -03:00 |
|
Gergő Móricz
|
6ed3104eb6
|
feat: clear ACUC cache endpoint based on team ID
|
2024-10-22 20:28:10 +02:00 |
|
Nicolas
|
79e65f31ef
|
Update v1.ts
|
2024-10-17 17:57:44 -03:00 |
|
Gergő Móricz
|
03b37998fd
|
feat: bulk scrape
|
2024-10-17 19:40:18 +02:00 |
|
Nicolas
|
18f9cd09e1
|
Nick: fixed more stuff
|
2024-10-01 16:04:39 -03:00 |
|
Nicolas
|
8d44cb33bb
|
Nick: fixed error message
|
2024-09-26 22:15:15 +02:00 |
|
Gergő Móricz
|
f8c70fe5dd
|
feat(db): implement auth_credit_usage_chunk RPC
|
2024-09-26 22:15:15 +02:00 |
|
Gergő Móricz
|
3e661a2087
|
fix(v1/crawl-cancel): avoid double authing
|
2024-09-24 20:01:34 +02:00 |
|
Gergo Moricz
|
b4dbf75537
|
fix(v1): check if url is string in blocklistMiddleware
Fixes FIRECRAWL-SCRAPER-JS-9Z
|
2024-09-10 10:25:14 +02:00 |
|
rafaelsideguide
|
ad950a6c9d
|
fixed controller res and tests
|
2024-09-04 11:29:32 -03:00 |
|
Nicolas
|
8431be5826
|
Nick:
|
2024-08-31 14:23:55 -03:00 |
|
Nicolas
|
170a8ebfe5
|
Nick:
|
2024-08-27 11:58:42 -03:00 |
|
rafaelsideguide
|
1ef41b92a0
|
feat: cancel
v0 implementation + e2e test
|
2024-08-27 09:42:55 -03:00 |
|
Nicolas
|
7d93eab0f8
|
Nick:
|
2024-08-26 18:48:00 -03:00 |
|
Gergő Móricz
|
eea530e0ad
|
feat(v1): update for sentry
|
2024-08-23 17:29:42 +02:00 |
|
Nicolas
|
b36faeaf54
|
Nick:
|
2024-08-20 14:39:52 -03:00 |
|
Nicolas
|
27903247b6
|
Nick: map tests and fixes
|
2024-08-20 12:04:08 -03:00 |
|
rafaelsideguide
|
fd7fdc1d52
|
added blocklist middleware
|
2024-08-19 13:28:54 -03:00 |
|
Gergő Móricz
|
eb84673b06
|
feat: crawl status websocket WIP
|
2024-08-17 01:04:14 +02:00 |
|
Gergő Móricz
|
e2a6ef26d3
|
mount v1Router under v1 path
|
2024-08-16 23:48:50 +02:00 |
|
Gergő Móricz
|
f20328bdbb
|
crawl status and document stuff
|
2024-08-16 22:48:05 +02:00 |
|
Gergő Móricz
|
8b7569f8f3
|
add zod, create middleware, update openapi declaration, add crawl logic
|
2024-08-15 23:30:33 +02:00 |
|
rafaelsideguide
|
6cdf4c68ec
|
wip: map, crawl, scrape mockups
|
2024-08-06 15:24:45 -03:00 |
|