File: //usr/lib/python3/dist-packages/ocrmypdf/__pycache__/_validation.cpython-38.pyc
U

��Z^p>�@sZddlZddlZddlZddlZddlmZddlmZddlZddl	m
Z
ddlmZm
Z
mZmZddlmZmZmZmZmZmZmZddlmZmZmZmZed	d
ddd
g�Zd	Ze� e!�Z"e
�dd�Z#dd�Z$dd�Z%dd�Z&dd�Z'dd�Z(dd�Z)dd�Z*dd�Z+d d!�Z,d"d#�Z-d$d%�Z.d&d'�Z/d(d)�Z0d*d+�Z1d,d-�Z2d.d/�Z3d0d1�Z4dS)2�N)�Path)�copyfileobj�)�verify_python3_env)�BadArgsError�InputFileError�MissingDependencyError�OutputFileAccessError)�check_external_program�ghostscript�jbig2enc�pngquant�qpdf�	tesseract�unpaper)�is_file_writable�is_iterable_notstr�	monotonic�safe_symlinkZengZdeuZspaZitaZporcCs"tjdkrtjdkrt�d�dS)N�ntlzmYou are running OCRmyPDF in a 32-bit (x86) Python interpreter.Please use a 64-bit (x86-64) version of Python.)�os�name�sys�maxsize�log�error�rr�6/usr/lib/python3/dist-packages/ocrmypdf/_validation.py�check_platform=s�rcCs�|js4tg|_t��d}|r4|�d�s4t�dt�d|jdkrT|jd�d�|_t|j�}|�	t
���s�d}|t
��D]}||d7}q|t|��dS)NrZenz-No language specified; assuming --language %s�+zgThe installed version of tesseract does not have language data for the following requested languages: 
�
)
�language�DEFAULT_LANGUAGE�localeZ	getlocale�
startswithr�debug�split�set�issubsetr�	languagesr)�optionsZsystem_langr)�msgZlangrrr�check_options_languagesFs
�r,cCs
t|j�}|�t�}|jdkr0|s0d}t�|�t��dkrj|j	dkrj|sjd}|dt����7}t�|�|jdkrzd|_|jdkr�t
�|j|�s�t
d	��|j	d
kr�d|_	|j	dkr�t��d
kr�t
d��d}t|j|j|j|jf�s�d}||_|j�s|j�rtd��dS)NZhocrz�The 'hocr' PDF renderer is known to cause problems with one or more of the languages in your document.  Use --pdf-renderer auto (the default) to avoid this issue.z9.20Zpdfz�The installed version of Ghostscript does not work correctly with the OCR languages you specified. Use --output-type pdf or upgrade to Ghostscript 9.20 or later to avoid this issue.zFound Ghostscript �autoZsandwichz�You are using an alpha version of Tesseract 4.0 that does not support the textonly_pdf parameter. We don't support versions this old.�pdfazpdfa-2zpdfa-3z9.19z7--output-type pdfa-3 requires Ghostscript 9.19 or laterFTz\--redo-ocr is not currently compatible with --deskew, --clean-final, and --remove-background)r'r!r(�
HOCR_OK_LANGSZpdf_rendererr�warningr�version�output_typerZhas_textonly_pdf�
tesseract_envr�any�deskew�clean_final�	force_ocr�remove_background�lossless_reconstruction�redo_ocrr)r*r)Zis_latinr+r9rrr�check_options_output\sP

�
�

��
����r;cCs,|jdkr(|jdkrtd��|jd|_dS)N��-z@--sidecar filename must be specified when output file is stdout.z.txt)�sidecar�output_filer�r*rrr�check_options_sidecar�s

�rAc
Cs�|jrd|_|jr |js td��|jr�tddtjddgd�z|jrRt�|j�|_Wn.tk
r�}ztt	|���W5d}~XYnXdS)NTz&--clean is required for --unpaper-argsrz6.1z--clean, --clean-final��program�package�version_checker�need_version�required_for)
r6ZcleanZunpaper_argsrr
rr1Zvalidate_custom_args�	Exception�str)r*�errr�check_options_preprocessing�s&��
rKc	Cs�t|�rt|�Sg}|�dd��d�}|D]�}|s4q*z|�d�\}}Wn&tk
rl|�t|�d�Yq*Xz |�tt|�dt|���Wq*tk
r�t	d��Yq*Xq*t
|�s�t�d�t
dd	�|D��r�t	d
��t�d|�t|�S)N� ��,r=rzinvalid page rangezQList of pages to process contains duplicate pages, or pages that are out of ordercss|]}|dkVqdS)rNr)�.0�pagerrr�	<genexpr>�sz%_pages_from_ranges.<locals>.<genexpr>z)pages refers to a page number less than 1zOCRing only these pages: %s)rr'�replacer&�
ValueError�append�int�extend�rangerrrr0r4r%)Zranges�pagesZpage_groups�g�start�endrrr�_pages_from_ranges�s. �r\cCsXtdd�|j|j|jfD��}|dkr.td��|jrB|jrBtd��|jrTt|j�|_dS)NcSsg|]}|rdnd�qS)rrr)rOZoptrrr�
<listcomp>�s�z.check_options_ocr_behavior.<locals>.<listcomp>�z8Choose only one of --force-ocr, --skip-text, --redo-ocr.z,--pages and --sidecar are mutually exclusive)�sumr7Z	skip_textr:rrXr>r\)r*Zexclusive_optionsrrr�check_options_ocr_behavior�s��r`cCst|jdkrtddtjddd�|jdkrHtddtjdd	|js@d
ndd�|jd
krpt|j|j|jg�rpt	�
d�dS)Nr^r
z2.0.1z--optimize {2,3}rB�jbig2rz0.28z --optimize {2,3} | --jbig2-lossyTF)rCrDrErFrGZrecommendedrzdThe arguments --jbig2-lossy, --png-quality, and --jpeg-quality will be ignored because --optimize=0.)�optimizer
r
r1rZjbig2_lossyr4Zpng_qualityZjpeg_qualityrr0r@rrr�check_options_optimizing�s.
�
�	��rccCsF|jdkr |j�d�r t�d�t�|j�sB|js8|j	rBt�d�dS)Nr-r.zg--pdfa-image-compression argument has no effect when --output-type is not 'pdfa', 'pdfa-1', or 'pdfa-2'zZTesseract 4.0 ignores --user-words and --user-patterns, so these arguments have no effect.)
Zpdfa_image_compressionr2r$rr0rZhas_user_wordsr3Z
user_wordsZ
user_patternsr@rrr�check_options_advanced	s�����rdc	Cs|ddl}|j|j|j|jg}dd�|D�D]L}|D]B}|�|�dksPt|�dkr2td�|t	t|��dd��
����q2q*dS)Nrcss|]}|r|VqdS�Nr)rO�mrrrrQsz)check_options_metadata.<locals>.<genexpr>ZCoizROne of the metadata strings contains an unsupported Unicode character: '{}' (U+{})r^)�unicodedata�titleZauthor�keywordsZsubject�category�ordrS�format�hex�upper)r*rgZdocinfo�s�crrr�check_options_metadatas��rqcCs*t|jd�tj_tjjdkr&dtj_dS)Ni@Br)rUZmax_image_mpixels�PILZImageZMAX_IMAGE_PIXELSr@rrr�check_options_pillow)srscCsZt�t|�t|�t|�t|�t|�t|�t|�t|�t	|�t
|�dSre)rr,rqr;rArKr`rcrdrs�check_dependency_versionsr@rrr�
check_options/srucCs�tjdd�dkrdStjdkr.ttjd�t_tjdkr^|jdkrPt�	d�d	Sttjd
�t_tj
dkr�|jdkr�t�	d�d	Sttjd�t_
dS)a�Work around Python issue with multiprocessing forking on closed streams

    https://bugs.python.org/issue28326

    Attempting to fork/exec a new Python process when any of std{in,out,err}
    are closed or not flushable for some reason may raise an exception.
    Fix this by opening devnull if the handle seems to be closed.  Do this
    globally to avoid tracking all the places that fork.

    This seems to be specific to multiprocessing.Process, not to all Python
    process forkers.

    The error actually occurs when the stream object is not flushable,
    but replacing an open stream object that is not flushable with
    /dev/null is a bad idea since it will create a silent failure.  Replacing
    a closed handle with /dev/null seems safe.

    r�)rv��TN�wr=z0Trying to read from stdin but stdin seems closedF�rzyOutput was set to stdout '-' but the stream attached to stdout does not support the flush() system call.  This will fail.)r�version_info�stderr�openr�devnull�stdin�
input_filerr�stdoutr?r@rrr�check_closed_streams=s"





�r�c
Csnddddd�}g}t|�D]8\}}|jp*d}|dkr|�d�|d|�|d	���q|rjt�d
d�|��dS)N�nrJrory)r�Z�irz{0}{1}rrMzPage orientations detected: %srL)�	enumerateZrotationrTrl�getr�info�join)Zpdfinfo�	directionZorientationsr�rPZanglerrr�log_page_orientationsms
 r�c	Cs�|jdkrNt�d�tj�|d�}t|d��}ttj	j
|�W5QRX|dfSz,tj�|d�}t|j|�|t�|j�fWSt
k
r�td|j����YnXdS)Nr=z reading file from standard inputr�wbz<stdin>�originzFile not found - )r�rr�r�pathr�r}rrr�bufferr�fspath�FileNotFoundErrorr)r*Zwork_folder�targetZ
stream_bufferrrr�create_input_filexs

r�cCs>|jdkrtj��r:td��nt|j�s:td|j�d���dS)Nr=ztOutput was set to stdout '-' but it looks like stdout is connected to a terminal.  Please redirect stdout to a file.zOutput file location (z) is not a writable file.)r?rr��isattyrrr	r@rrr�check_requested_output_file�s

�
�r�c
Cs z t|���j}t|���j}Wntk
r6YdSX||}|dksP|dkrTdSg}dddddh}|D]*}t||d�rj|�d	|�d
d��d��qj|jd
kr�|�d�n:t�	�t
�	�d�}	|	��D]\}
}|s�|�d|
�d��q�|�rdd�|�d}nd}t
�d|d�d|���dS)Ng�������?i�ar5r6r8Z
oversampler7FzThe argument --�_r=z! was issued, causing transcoding.rzOptimization was disabled.)rar
zThe optional dependency 'zD' was not found, so some image optimizations could not be attempted.z#Possible reasons for this include:
r z@No reason for this increase is known.  Please report this issue.zThe output file size is z.2fu× larger than the input file.
)r�stat�st_sizer��getattrrTrRrbr�	availabler
�itemsr�rr0)
r*r�r?Zoutput_sizeZ
input_sizeZratioZreasonsZ
image_preproc�argZimage_optimizersrr�Zexplanationrrr�report_output_file_size�sH��
�
��r�cCsRtddditjdd�tddtjdd�t��d	kr<td
��tddtjdd�dS)
NrZlinuxz
tesseract-ocrz4.0.0)rCrDrErFZgsrz9.15z9.24zGhostscript 9.24 contains serious regressions and is not supported. Please upgrade to Ghostscript 9.25 or use an older version.rz8.0.2)r
rr1rrrr@rrrrt�s,����rt)5r#Zloggingrr�pathlibrZshutilrrrZ_unicodefunr�
exceptionsrrrr	�execr
rrr
rrrZhelpersrrrr�	frozensetr/r"Z	getLogger�__name__rrr,r;rArKr\r`rcrdrqrsrur�r�r�r�r�rtrrrr�<module>s@$	
	B	01
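The `check_closed_streams` docstring above describes a workaround for https://bugs.python.org/issue28326. If a standard stream is missing, the workaround substitutes a handle on the null device. It refuses to do so when the program actually needs that stream. The following is a minimal Python sketch of that idea, not the compiled function itself. Three details are hypothetical: the helper name `reopen_missing_std_streams`, its `reading_stdin`/`writing_stdout` flags, and the injectable `streams` parameter. The sketch also assumes that a closed standard stream shows up as `None` on the `sys` module.

```python
import os
import sys
from types import SimpleNamespace


def reopen_missing_std_streams(reading_stdin=False, writing_stdout=False, streams=sys):
    """Hypothetical sketch: replace missing std streams with os.devnull.

    A stream counts as "closed" here when its attribute is None (assumption).
    Returns False instead of patching stdin/stdout when the caller actually
    needs that stream, so the failure stays visible rather than silent.
    """
    # stderr only carries diagnostics, so a null sink is always acceptable.
    if streams.stderr is None:
        streams.stderr = open(os.devnull, "w")

    if streams.stdin is None:
        if reading_stdin:  # e.g. the input file was given as '-'
            return False
        streams.stdin = open(os.devnull, "r")

    if streams.stdout is None:
        if writing_stdout:  # e.g. the output file was given as '-'
            return False
        streams.stdout = open(os.devnull, "w")

    return True


if __name__ == "__main__":
    # Use a stand-in namespace so the real sys streams are left untouched.
    fake = SimpleNamespace(stdin=None, stdout=None, stderr=None)
    print(reopen_missing_std_streams(streams=fake), fake.stdout.name)
```

Only streams that are `None` are replaced. An open stream that merely fails to flush is left alone. This follows the docstring's warning that redirecting such a stream to /dev/null would turn a visible error into a silent failure.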