9 Commits

Author SHA1 Message Date
Nicolas 498558d358 Nick: formatting done 2025-01-22 18:47:44 -03:00
Nicolas 5e5b5ee0e2 (feat/extract) New re-ranker + multi entity extraction (#1061)
* agent that decides if splits schema or not

* split and merge properties done

* wip

* wip

* changes

* ch

* array merge working!

* comment

* wip

* dereferentiate schema

* dereference schemas

* Nick: new re-ranker

* Create llm-links.txt

* Nick: format

* Update extraction-service.ts

* wip: cooking schema mix and spread functions

* wip

* wip getting there!!!

* nick:

* moved functions to helpers

* nick:

* cant reproduce the error anymore

* error handling all scrapes failed

* fix

* Nick: added the sitemap index

* Update sitemap-index.ts

* Update map.ts

* deduplicate and merge arrays

* added error handler for object transformations

* Update url-processor.ts

* Nick:

* Nick: fixes

* Nick: big improvements to rerank of multi-entity

* Nick: working

* Update reranker.ts

* fixed transformations for nested objs

* fix merge nulls

* Nick: fixed error piping

* Update queue-worker.ts

* Update extraction-service.ts

* Nick: format

* Update queue-worker.ts

* Update pnpm-lock.yaml

* Update queue-worker.ts

---------

Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Thomas Kosmas <thomas510111@gmail.com>
2025-01-13 22:30:15 -03:00
Nicolas 8a1c404918 Nick: revert trailing comma 2024-12-11 19:51:08 -03:00
Nicolas 00335e2ba9 Nick: fixed prettier 2024-12-11 19:46:11 -03:00
rafaelsideguide c5e1d77a82 added invalid html tests 2024-09-03 15:21:45 -03:00
rafaelsideguide 48056ea1bd feat: added go html to md parser 2024-09-02 14:15:56 -03:00
Nicolas ca34f1203b Nick: bucket limit increase 2024-08-27 17:03:46 -03:00
Nicolas c009013ff6 Nick: expire tests 2024-08-27 15:26:43 -03:00
Nicolas 0ea0a5db46 Nick: wip 2024-08-21 20:54:39 -03:00