Commit Graph

3 Commits

Author SHA1 Message Date
Gergő Móricz b03670a8b7 feat: parse PDFs on fc side and reject if too long for timeout (FIR-2083) (#1592)
* feat: pdf-parser, implementation in scrapeURL

* use pdf-parser for page count instead of mu

* fix(pdf-parser): bindings

* feat(scrapeURL/pdf): adjust MILLISECONDS_PER_PAGE

* implement post-runsync polling and fix

* fix(Dockerfile): copy in the pdf-parser source code

* fix(scrapeURL/pdf): better error for timeout below 0
2025-05-23 13:45:53 +02:00
Gergő Móricz fd74299134 feat(scrapeURL, logJob): log pdf page count to db (FIR-2068) (#1587)
* feat(scrapeURL, logJob): log pdf page count to db

* devin stop the test littering pls
2025-05-22 17:26:01 -03:00
devin-ai-integration[bot] 526165e1b9 Add caching for RunPod PDF markdown results in GCS (#1561)
* Add caching for RunPod PDF markdown results in GCS

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* Update PDF caching to hash base64 directly and add metadata

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* Fix PDF caching to directly hash content and fix test expectations

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: thomas@sideguide.dev <thomas@sideguide.dev>
2025-05-16 12:04:38 -03:00