Update matching logic: AI scores all candidates, lower threshold, absolute amount, prompt improvements

Iyeoluwa Akinrinola
2025-07-02 16:38:01 +01:00
commit a519c42866
10641 changed files with 3944174 additions and 0 deletions
@@ -0,0 +1 @@
pip
@@ -0,0 +1,29 @@
Copyright (c) 2006-2008, Mathieu Fenniak
Some contributions copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>
Some contributions copyright (c) 2014, Steve Witham <switham_github@mac-guyver.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* The name of the author may not be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,164 @@
Metadata-Version: 2.1
Name: PyPDF2
Version: 3.0.1
Summary: A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files
Author-email: Mathieu Fenniak <biziqe@mathieu.fenniak.net>
Maintainer-email: Martin Thoma <info@martin-thoma.de>
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Dist: typing_extensions >= 3.10.0.0; python_version < '3.10'
Requires-Dist: dataclasses; python_version < '3.7'
Requires-Dist: PyCryptodome ; extra == "crypto"
Requires-Dist: black ; extra == "dev"
Requires-Dist: pip-tools ; extra == "dev"
Requires-Dist: pre-commit<2.18.0 ; extra == "dev"
Requires-Dist: pytest-cov ; extra == "dev"
Requires-Dist: flit ; extra == "dev"
Requires-Dist: wheel ; extra == "dev"
Requires-Dist: sphinx ; extra == "docs"
Requires-Dist: sphinx_rtd_theme ; extra == "docs"
Requires-Dist: myst_parser ; extra == "docs"
Requires-Dist: PyCryptodome ; extra == "full"
Requires-Dist: Pillow ; extra == "full"
Requires-Dist: Pillow ; extra == "image"
Project-URL: Bug Reports, https://github.com/py-pdf/PyPDF2/issues
Project-URL: Changelog, https://pypdf2.readthedocs.io/en/latest/meta/CHANGELOG.html
Project-URL: Documentation, https://pypdf2.readthedocs.io/en/latest/
Project-URL: Source, https://github.com/py-pdf/PyPDF2
Provides-Extra: crypto
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: full
Provides-Extra: image
[![PyPI version](https://badge.fury.io/py/PyPDF2.svg)](https://badge.fury.io/py/PyPDF2)
[![Python Support](https://img.shields.io/pypi/pyversions/PyPDF2.svg)](https://pypi.org/project/PyPDF2/)
[![](https://img.shields.io/badge/-documentation-green)](https://pypdf2.readthedocs.io/en/stable/)
[![GitHub last commit](https://img.shields.io/github/last-commit/py-pdf/PyPDF2)](https://github.com/py-pdf/PyPDF2)
[![codecov](https://codecov.io/gh/py-pdf/PyPDF2/branch/main/graph/badge.svg?token=id42cGNZ5Z)](https://codecov.io/gh/py-pdf/PyPDF2)
# **NOTE**: The PyPDF2 project is going back to its roots. PyPDF2==3.0.X will be the last version of PyPDF2. Development will continue with [`pypdf==3.1.0`](https://pypi.org/project/pyPdf/).
# PyPDF2
PyPDF2 is a free and open-source pure-python PDF library capable of splitting,
[merging](https://pypdf2.readthedocs.io/en/stable/user/merging-pdfs.html),
[cropping, and transforming](https://pypdf2.readthedocs.io/en/stable/user/cropping-and-transforming.html)
the pages of PDF files. It can also add
custom data, viewing options, and
[passwords](https://pypdf2.readthedocs.io/en/stable/user/encryption-decryption.html)
to PDF files. PyPDF2 can
[retrieve text](https://pypdf2.readthedocs.io/en/stable/user/extract-text.html)
and
[metadata](https://pypdf2.readthedocs.io/en/stable/user/metadata.html)
from PDFs as well.
## Installation
You can install PyPDF2 via pip:
```
pip install PyPDF2
```
If you plan to use PyPDF2 for encrypting or decrypting PDFs that use AES, you
will need to install some extra dependencies. Encryption using RC4 is supported
using the regular installation.
```
pip install PyPDF2[crypto]
```
## Usage
```python
from PyPDF2 import PdfReader
reader = PdfReader("example.pdf")
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()
```
PyPDF2 can do a lot more, such as splitting, merging, reading and creating
annotations, and decrypting and encrypting.
Please see [the documentation](https://pypdf2.readthedocs.io/en/stable/)
for more usage examples!
A lot of questions are asked and answered
[on StackOverflow](https://stackoverflow.com/questions/tagged/pypdf2).
## Contributions
Maintaining PyPDF2 is a collaborative effort. You can support PyPDF2 by writing
documentation, helping to narrow down issues, and adding code.
### Q&A
PyPDF2 users range from beginners who
want to make their lives easier to experts who developed software before PDF
existed. You can contribute to the PyPDF2 community by answering questions
on [StackOverflow](https://stackoverflow.com/questions/tagged/pypdf2),
helping in [discussions](https://github.com/py-pdf/PyPDF2/discussions),
and asking users who report issues for [MCVEs](https://stackoverflow.com/help/minimal-reproducible-example) (code + example PDF!).
### Issues
A good bug ticket includes an MCVE, a minimal, complete, verifiable example.
For PyPDF2, this means that you must upload a PDF that causes the bug to occur
as well as the code you're executing with all of the output. Use
`print(PyPDF2.__version__)` to tell us which version you're using.
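As a rough sketch of what to paste alongside the MCVE, the snippet below collects the installed version together with the Python and platform details. The helper name `bug_report_header` is hypothetical and not part of PyPDF2. It reads the version through the standard library's `importlib.metadata` (Python 3.8+) instead of `PyPDF2.__version__`, so it also runs when the package cannot be imported.

```python
import platform
import sys
from importlib import metadata


def bug_report_header(package: str = "PyPDF2") -> str:
    """Return a short environment summary to paste into a bug report.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        version = "not installed"
    return (
        f"{package}: {version}\n"
        f"Python: {sys.version.split()[0]}\n"
        f"Platform: {platform.platform()}"
    )


print(bug_report_header())
```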
### Code
All code contributions are welcome, but smaller ones have a better chance of
being included in a timely manner. Adding unit tests for new features or test
cases for bugs you've fixed helps us ensure that the Pull Request (PR) is fine.
PyPDF2 includes a test suite which can be executed with `pytest`:
```bash
$ pytest
===================== test session starts =====================
platform linux -- Python 3.6.15, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/moose/GitHub/Martin/PyPDF2
plugins: cov-3.0.0
collected 233 items
tests/test_basic_features.py .. [ 0%]
tests/test_constants.py . [ 1%]
tests/test_filters.py .................x..... [ 11%]
tests/test_generic.py ................................. [ 25%]
............. [ 30%]
tests/test_javascript.py .. [ 31%]
tests/test_merger.py . [ 32%]
tests/test_page.py ......................... [ 42%]
tests/test_pagerange.py ................ [ 49%]
tests/test_papersizes.py .................. [ 57%]
tests/test_reader.py .................................. [ 72%]
............... [ 78%]
tests/test_utils.py .................... [ 87%]
tests/test_workflows.py .......... [ 91%]
tests/test_writer.py ................. [ 98%]
tests/test_xmp.py ... [100%]
========== 232 passed, 1 xfailed, 1 warning in 4.52s ==========
```
@@ -0,0 +1,71 @@
PyPDF2/__init__.py,sha256=L8aP6Tz9KflekpLy0IiO6OfpK1Y6vszAr9jRj2Z4co0,1338
PyPDF2/__pycache__/__init__.cpython-311.pyc,,
PyPDF2/__pycache__/_cmap.cpython-311.pyc,,
PyPDF2/__pycache__/_encryption.cpython-311.pyc,,
PyPDF2/__pycache__/_merger.cpython-311.pyc,,
PyPDF2/__pycache__/_page.cpython-311.pyc,,
PyPDF2/__pycache__/_protocols.cpython-311.pyc,,
PyPDF2/__pycache__/_reader.cpython-311.pyc,,
PyPDF2/__pycache__/_security.cpython-311.pyc,,
PyPDF2/__pycache__/_utils.cpython-311.pyc,,
PyPDF2/__pycache__/_version.cpython-311.pyc,,
PyPDF2/__pycache__/_writer.cpython-311.pyc,,
PyPDF2/__pycache__/constants.cpython-311.pyc,,
PyPDF2/__pycache__/errors.cpython-311.pyc,,
PyPDF2/__pycache__/filters.cpython-311.pyc,,
PyPDF2/__pycache__/pagerange.cpython-311.pyc,,
PyPDF2/__pycache__/papersizes.cpython-311.pyc,,
PyPDF2/__pycache__/types.cpython-311.pyc,,
PyPDF2/__pycache__/xmp.cpython-311.pyc,,
PyPDF2/_cmap.py,sha256=nwGfthg7CJF7CXVWTa5_BGMhGloJugV3Tn6hd81m6u4,14645
PyPDF2/_codecs/__init__.py,sha256=y4x5s4q00SlzSjNkDDqaI38uDO0L5IpyXyOKgtFgZ1E,1720
PyPDF2/_codecs/__pycache__/__init__.cpython-311.pyc,,
PyPDF2/_codecs/__pycache__/adobe_glyphs.cpython-311.pyc,,
PyPDF2/_codecs/__pycache__/pdfdoc.cpython-311.pyc,,
PyPDF2/_codecs/__pycache__/std.cpython-311.pyc,,
PyPDF2/_codecs/__pycache__/symbol.cpython-311.pyc,,
PyPDF2/_codecs/__pycache__/zapfding.cpython-311.pyc,,
PyPDF2/_codecs/adobe_glyphs.py,sha256=aMXhp5va7TgNyHEmnS9ZqA3G-h8Te8Ew7p7kCsgKJLY,431492
PyPDF2/_codecs/pdfdoc.py,sha256=xfSvMFYsvxuaSQ0Uu9vZDKaB0Wu85h1uCiB1i9rAcUU,4269
PyPDF2/_codecs/std.py,sha256=DyQMuEpAGEpS9uy1jWf4cnj-kqShPOAij5sI7Q1YD8E,2630
PyPDF2/_codecs/symbol.py,sha256=nIaGQIlhWCJiPMHrwUlmGHH-_fOXyEKvguRmuKXcGAk,3734
PyPDF2/_codecs/zapfding.py,sha256=PQxjxRC616d41xF3exVxP1W8nM4QrZfjO3lmtLxpE_s,3742
PyPDF2/_encryption.py,sha256=KaaIKpGzG921muzE-FkQB2SnKuDhoioFkDL7t9yvPOw,38979
PyPDF2/_merger.py,sha256=-hskprroJsRC9gvUmNcGzx34Qu_m5qOm3UytZCIYwYg,30464
PyPDF2/_page.py,sha256=su41KemcbUSE6oWYd1HYvWHEkuzuBKHh6BxnlJcshX4,82079
PyPDF2/_protocols.py,sha256=7Y-5QbYVRBrWJmWv536jgX_XUPODrh25IKXaeY9tuEI,1486
PyPDF2/_reader.py,sha256=i53XyAVa5N9UYDAkny6NdfMvrsVawJabRhUA3pN5oVI,77206
PyPDF2/_security.py,sha256=rwJUT1_W46c1pQshRH22Q5P5Gb_3_LorMvV9xnXNST0,10628
PyPDF2/_utils.py,sha256=93acqHLpvbBuIo57Ot_OHQ5MeAt-1HABWwijjbPA7RI,14252
PyPDF2/_version.py,sha256=E3P6AbnCwaWk6ndR1zNqlOTVebX9z5rv9voltc71dos,22
PyPDF2/_writer.py,sha256=Sx7Ctf8pIBDVgxlKjyZuDfCw9lvSXCBf7a4WF-JFC88,106551
PyPDF2/constants.py,sha256=2O1gjddmSZGv8XMpUMKI09MBL0gIBBkxZBNFx3DXIFY,13154
PyPDF2/errors.py,sha256=BZ7z1dFjppXNvWilX8BoVCEGFxCapXas05JWs0XsNrc,782
PyPDF2/filters.py,sha256=d0h5rpehpBZh2i1yqQSJGchDUTKuzUvjVGfm2hmTzfk,24364
PyPDF2/generic/__init__.py,sha256=YXnX-pSPDwSUPv6CQSG76Bxhq3Q-B3kpXa3bE_BxnWU,4413
PyPDF2/generic/__pycache__/__init__.cpython-311.pyc,,
PyPDF2/generic/__pycache__/_annotations.cpython-311.pyc,,
PyPDF2/generic/__pycache__/_base.cpython-311.pyc,,
PyPDF2/generic/__pycache__/_data_structures.cpython-311.pyc,,
PyPDF2/generic/__pycache__/_fit.cpython-311.pyc,,
PyPDF2/generic/__pycache__/_outline.cpython-311.pyc,,
PyPDF2/generic/__pycache__/_rectangle.cpython-311.pyc,,
PyPDF2/generic/__pycache__/_utils.cpython-311.pyc,,
PyPDF2/generic/_annotations.py,sha256=bjwsoFWxQWTmkzLO8gwekHDMbH8xlOB08RLm96E9jJI,10065
PyPDF2/generic/_base.py,sha256=fdQOICdpzzPlQ2h_fBWJ631dTvYCqm1qPRLQyc9el8o,23986
PyPDF2/generic/_data_structures.py,sha256=cENL4Z3FFeaLccTywz6sIBXDdeMdny5v8GRcyvK5M_o,51408
PyPDF2/generic/_fit.py,sha256=sUoDxD_y_Jz5y6V7-IcCRHg961096oXqFXt1MtOGE5w,4894
PyPDF2/generic/_outline.py,sha256=7d2eaAqoPRSh9RVZxbPJNwwnauyA0MKh0cGMxln9NTc,1201
PyPDF2/generic/_rectangle.py,sha256=yyGkaXj7S2ShBAUR23OJkXDf1zTR2QA-207Vcuy34bg,9439
PyPDF2/generic/_utils.py,sha256=NMEwhDNbUHSjIJ39OTtiMNtMWwm0vDFbs6NTGw12DS8,6272
PyPDF2/pagerange.py,sha256=PuVME9JOTFN6UbegjU_a0Q2Tvi2Pvgaw84yhT1ik5Eo,6415
PyPDF2/papersizes.py,sha256=p82oLUyKE4dyCkID5NmsnNX2N-1d8u64qYmcCJ-G2nM,1369
PyPDF2/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
PyPDF2/types.py,sha256=aZt-WmZ3QQNVSI-R1_GbFy0rApINHqsV-fyUwmyJ0SU,1676
PyPDF2/xmp.py,sha256=CCN838ewDipjDgBLtj256a4ZPgSJxMyD9lOOQdEbhD0,18354
pypdf2-3.0.1.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
pypdf2-3.0.1.dist-info/LICENSE,sha256=qXrCMOXzPvEKU2eoUOsB-R8aCwZONHQsd5TSKUVX9SQ,1605
pypdf2-3.0.1.dist-info/METADATA,sha256=TqZRd2BwsQotikOd2PwuBpwcdD7F8ukmMNkndi0ePDM,6805
pypdf2-3.0.1.dist-info/RECORD,,
pypdf2-3.0.1.dist-info/REQUESTED,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
pypdf2-3.0.1.dist-info/WHEEL,sha256=rSgq_JpHF9fHR1lx53qwg_1-2LypZE_qmcuXbVUq948,81
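Each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is URL-safe base64 with the trailing `=` padding removed. Entries such as `RECORD` itself and the `.pyc` files leave the hash and size empty. A minimal sketch of checking one entry follows. The function names `record_digest` and `check_record_line` are illustrative assumptions, not part of pip or PyPDF2.

```python
import base64
import csv
import hashlib


def record_digest(data: bytes) -> str:
    """Encode a SHA-256 digest as URL-safe base64 without '=' padding."""
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def check_record_line(line: str, data: bytes) -> bool:
    """Compare one 'path,sha256=<digest>,<size>' line against file contents.

    Hypothetical helper for illustration only.
    """
    # RECORD is a CSV file, so let the csv module handle any quoting.
    _path, hash_field, size_field = next(csv.reader([line]))
    if not hash_field:
        # No hash recorded (e.g. RECORD itself, .pyc files): nothing to verify.
        return True
    algorithm, _, expected = hash_field.partition("=")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return expected == record_digest(data) and int(size_field) == len(data)


# PyPDF2/py.typed is recorded with size 0, so it can be checked against empty bytes.
print(check_record_line(
    "PyPDF2/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0", b""
))
```

To check a real installation, read each listed file in binary mode and pass its bytes as `data`.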
@@ -0,0 +1,4 @@
Wheel-Version: 1.0
Generator: flit 3.8.0
Root-Is-Purelib: true
Tag: py3-none-any
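The WHEEL file above uses RFC 822-style `Key: value` headers, and each `Tag` value combines a Python tag, an ABI tag, and a platform tag joined by hyphens. The sketch below parses it with the standard library's `email.parser`. The function name `parse_wheel_metadata` and the keys of the returned dictionary are assumptions made for illustration.

```python
from email.parser import Parser

# Contents of the WHEEL file shown in this diff.
WHEEL_TEXT = """\
Wheel-Version: 1.0
Generator: flit 3.8.0
Root-Is-Purelib: true
Tag: py3-none-any
"""


def parse_wheel_metadata(text: str) -> dict:
    """Parse WHEEL headers into a small dictionary (illustrative helper)."""
    msg = Parser().parsestr(text)
    # A wheel may list several Tag lines; each one is python-abi-platform.
    tags = [tuple(tag.split("-")) for tag in msg.get_all("Tag", [])]
    return {
        "wheel_version": msg["Wheel-Version"],
        "generator": msg["Generator"],
        "root_is_purelib": msg["Root-Is-Purelib"].strip().lower() == "true",
        "tags": tags,
    }


info = parse_wheel_metadata(WHEEL_TEXT)
print(info["tags"])
```

Here `py3-none-any` splits into the Python tag `py3`, the ABI tag `none`, and the platform tag `any`, which is what one would expect for a pure-Python package.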