File: //lib/python3/dist-packages/ocrmypdf/__pycache__/quality.cpython-38.pyc
# Reconstructed from the compiled module; the embedded source path is
# /usr/lib/python3/dist-packages/ocrmypdf/quality.py.
# Comparison operators and integer constants are not readable in the dump;
# the ones used below are assumptions and are marked as such.

import re
from typing import Iterable


class OcrQualityDictionary:
    """Manages a dictionary for simple OCR quality checks."""

    def __init__(self, *, wordlist: Iterable[str] = []):
        """Construct a dictionary from a list of words.

        Words for which capitalization is important should be capitalized in the
        dictionary. Words that contain spaces or other punctuation will never match.
        """
        self.dictionary = set()
        self.dictionary.update(w for w in wordlist)

    def measure_words_matched(self, ocr_text: str) -> float:
        """Check how many unique words in the OCR text match a dictionary.

        Words with mixed capitalization are only considered a match if the test
        word matches that capitalization.

        Returns:
            number of words that match / number of unique words tested
        """
        # Replace digits and underscores, then all other non-word characters,
        # with spaces before splitting on whitespace.
        text = re.sub(r"[0-9_]+", " ", ocr_text)
        text = re.sub(r"\W+", " ", text)
        text_words_list = re.split(r"\s+", text)
        # Assumed: the operator and threshold (>= 3) are unreadable in the dump.
        text_words = {w for w in text_words_list if len(w) >= 3}

        matches = 0
        for w in text_words:
            # Assumed: the "in" and "!=" operators are unreadable in the dump.
            if w in self.dictionary or (
                w != w.lower() and w.lower() in self.dictionary
            ):
                matches += 1
        if matches > 0:
            hit_ratio = matches / len(text_words)
        else:
            hit_ratio = 0.0
        return hit_ratio