rafaelsideguide
bb859ae9a7
Added metadata.pageStatusCode and metadata.pageError properties to the responses
2024-06-13 17:08:40 -03:00
Nicolas
cbf8d79cce
Update pdfProcessor.ts
2024-06-04 00:13:37 -07:00
Nicolas
5be208f595
Nick: fixed
2024-05-17 10:40:44 -07:00
rafaelsideguide
8eb2e95f19
Cleaned up
2024-05-13 16:13:10 -03:00
rafaelsideguide
f4348024c6
Added check during scraping to deal with pdfs
Checks if the URL is a PDF during the scraping process (single_url.ts).
TODO: Run integration tests. Does this strategy affect the running time?
P.S. Some comments need to be removed if we decide to proceed with this strategy.
2024-05-13 09:13:42 -03:00
rafaelsideguide
f8b207793f
Changed the request to a HEAD request to check for a PDF instead
2024-04-29 15:15:32 -03:00
Nicolas
c5cb268b61
Update pdfProcessor.ts
2024-04-19 13:13:42 -07:00
Nicolas
43cfcec326
Nick: disabling in crawl and sitemap for now
2024-04-19 13:12:08 -07:00
Nicolas
140529c609
Nick: fixes PDFs not found
2024-04-19 13:05:21 -07:00
rafaelsideguide
57e5b36014
[Feat] Adding pdf parser
2024-04-18 11:43:57 -03:00