Commit Graph

38 Commits

Author SHA1 Message Date
Gergő Móricz e7f267b6fe Merge branch 'main' into v1-webscraper 2024-08-23 17:21:54 +02:00
Gergő Móricz 6d48dbcd38 feat(sentry): add trace continuity for queue 2024-08-22 16:47:38 +02:00
Nicolas 90b32f16c8 Nick: fixes 2024-08-20 21:38:11 -03:00
Gergő Móricz 5818236659 fix: remove rawHtml properly 2024-08-20 22:51:12 +02:00
Gergő Móricz aabfaf0ac5 clean up crawl-status, fix db ddos 2024-08-16 23:29:39 +02:00
Gergő Móricz 29f0d9ec94 propagate priority to fire-engine 2024-08-15 19:04:46 +02:00
Gergo Moricz d7549d4dc5 feat: remove webScraperQueue 2024-08-13 21:03:24 +02:00
Nicolas e28c415cf4 Nick: 2024-08-09 14:07:46 -04:00
Gergo Moricz 920b7f2f44 fix(runWebScraper): don't filter empty docs 2024-08-07 21:00:22 +02:00
Gergo Moricz 191dfbd9ca fix: move to completed in one place 2024-08-07 18:49:58 +02:00
Gergo Moricz 8566ece700 fix(scrape): pass extractorOptions 2024-08-06 17:15:19 +02:00
Gergo Moricz 03c84a9372 cleanup and fix cancelling 2024-08-06 16:26:46 +02:00
Nicolas f43d5e7895 Nick: scrape queue 2024-07-30 14:44:13 -04:00
Nicolas 7e002a8b06 Nick: bull mq 2024-07-30 13:27:23 -04:00
rafaelsideguide cc98f83fda added failed and completed log events 2024-07-24 15:25:36 -03:00
Gergo Moricz 7cd9bf92e3 feat: scrape event logging to DB 2024-07-24 14:31:25 +02:00
rafaelsideguide 6208ecdbc0 added logger 2024-07-23 17:30:46 -03:00
Nicolas fd18f2269b Nick: slack alerts 2024-07-12 19:07:59 -04:00
Nicolas 0ddaac6ae0 Nick: fixed the other instances as well 2024-07-12 15:39:10 -04:00
rafaelsideguide d66e1f7846 looking good 2024-06-27 16:00:45 -03:00
rafaelsideguide c40da77be0 Added implementation for saving docs on supabase - TODO: remove the comments on `log_job.ts` before deploying to prod 2024-06-26 18:23:28 -03:00
Jeff Pereira 199cbe8bcb add some types 2024-06-25 12:20:25 -07:00
Nicolas b4c6819a54 Nick: 2024-06-05 11:11:09 -07:00
Nicolas f3ec21d9c4 Update runWebScraper.ts 2024-05-13 13:57:22 -07:00
Nicolas bdbee963f7 Merge branch 'main' into nsc/cancel-job 2024-05-07 10:13:43 -07:00
rafaelsideguide e1f52c538f nested includeHtml inside pageOptions 2024-05-07 13:40:24 -03:00
Nicolas 6d5da358cc Nick: cancel job 2024-05-06 17:16:43 -07:00
rafaelsideguide 509250c4ef changed to includeHtml 2024-05-06 19:45:56 -03:00
Nicolas 2aa09a3000 Nick: partial docs working, cleaner 2024-05-04 12:30:12 -07:00
rafaelsideguide 06675d1fe3 almost finished 2024-04-26 11:42:49 -03:00
Nicolas 8939ca570b Merge branch 'main' into nsc/returnOnlyUrls 2024-04-23 18:05:48 -07:00
Nicolas 0db0874b00 Nick: 2024-04-20 19:37:45 -07:00
Nicolas 6aa3cc3ce8 Nick: 2024-04-20 13:53:11 -07:00
Nicolas 1a3aa2999d Nick: return the only list of urls 2024-04-20 11:59:42 -07:00
Nicolas ddf9ff9c9a Nick: 2024-04-20 11:46:06 -07:00
Nicolas 36abe0f7f9 Nick: 2024-04-17 18:24:46 -07:00
Viktor Szépe 34ab21db59 Fix typos 2024-04-17 05:13:27 +00:00
Nicolas a6c2a87811 Initial commit 2024-04-15 17:01:47 -04:00