firecrawl

Author	SHA1	Message	Date
Gergő Móricz	8d467c8ca7	`WebScraper` refactor into `scrapeURL` (#714) * feat: use strictNullChecking * feat: switch logger to Winston * feat(scrapeURL): first batch * fix(scrapeURL): error swallow * fix(scrapeURL): add timeout to EngineResultsTracker * fix(scrapeURL): report unexpected error to sentry * chore: remove unused modules * feat(transfomers/coerce): warn when a format's response is missing * feat(scrapeURL): feature flag priorities, engine quality sorting, PDF and DOCX support * (add note) * feat(scrapeURL): wip readme * feat(scrapeURL): LLM extract * feat(scrapeURL): better warnings * fix(scrapeURL/engines/fire-engine;playwright): fix screenshot * feat(scrapeURL): add forceEngine internal option * feat(scrapeURL/engines): scrapingbee * feat(scrapeURL/transformars): uploadScreenshot * feat(scrapeURL): more intense tests * bunch of stuff * get rid of WebScraper (mostly) * adapt batch scrape * add staging deploy workflow * fix yaml * fix logger issues * fix v1 test schema * feat(scrapeURL/fire-engine/chrome-cdp): remove wait inserts on actions * scrapeURL: v0 backwards compat * logger fixes * feat(scrapeurl): v0 returnOnlyUrls support * fix(scrapeURL/v0): URL leniency * fix(batch-scrape): ts non-nullable * fix(scrapeURL/fire-engine/chromecdp): fix wait action * fix(logger): remove error debug key * feat(requests.http): use dotenv expression * fix(scrapeURL/extractMetadata): extract custom metadata * fix crawl option conversion * feat(scrapeURL): Add retry logic to robustFetch * fix(scrapeURL): crawl stuff * fix(scrapeURL): LLM extract * fix(scrapeURL/v0): search fix * fix(tests/v0): grant larger response size to v0 crawl status * feat(scrapeURL): basic fetch engine * feat(scrapeURL): playwright engine * feat(scrapeURL): add url-specific parameters * Update readme and examples * added e2e tests for most parameters. Still a few actions, location and iframes to be done. * fixed type * Nick: * Update scrape.ts * Update index.ts * added actions and base64 check * Nick: skipTls feature flag? * 403 * todo * todo * fixes * yeet headers from url specific params * add warning when final engine has feature deficit * expose engine results tracker for ScrapeEvents implementation * ingest scrape events * fixed some tests * comment * Update index.test.ts * fixed rawHtml * Update index.test.ts * update comments * move geolocation to global f-e option, fix removeBase64Images * Nick: * trim url-specific params * Update index.ts --------- Co-authored-by: Eric Ciarla <ericciarla@yahoo.com> Co-authored-by: rafaelmmiller <8574157+rafaelmmiller@users.noreply.github.com> Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2024-11-07 20:57:33 +01:00
rafaelsideguide	cb8571abad	fix: enforced dotenv config	2024-09-04 15:57:57 -03:00
Nicolas	08a9cb8db4	Merge branch 'main' into pr/516	2024-09-02 23:32:23 -03:00
rafaelsideguide	bbed6ef23d	added validation on every USE_DB_AUTHENTICATION call	2024-08-12 14:20:41 -03:00
Gergő Móricz	5fc7fcb77c	Merge branch 'main' into feat/queue-scrapes	2024-08-07 16:35:44 +02:00
Gergo Moricz	fe9fdb578b	revert bad hotfixes	2024-08-07 16:34:25 +02:00
Gergo Moricz	2e2e80d679	fix(scrape-events): updateScrapeResult fix	2024-08-07 14:17:50 +02:00
Nicolas	a28ecc1f61	Nick: caching	2024-07-30 18:59:35 -04:00
Nicolas	7e002a8b06	Nick: bull mq	2024-07-30 13:27:23 -04:00
Nicolas	50d2426fc4	Update scrape-events.ts	2024-07-25 16:20:29 -04:00
rafaelsideguide	cc98f83fda	added failed and completed log events	2024-07-24 15:25:36 -03:00
Gergo Moricz	60c74357df	feat(ScrapeEvents): log queue events	2024-07-24 18:44:14 +02:00
Gergo Moricz	4d35ad073c	feat(monitoring/scrape): include url, worker, response_size	2024-07-24 16:43:39 +02:00
Gergo Moricz	71072fef3b	fix(scrape-events): bad logic	2024-07-24 14:46:41 +02:00
Gergo Moricz	7cd9bf92e3	feat: scrape event logging to DB	2024-07-24 14:31:25 +02:00

15 Commits