File: //lib/python3/dist-packages/bs4/builder/__pycache__/_htmlparser.cpython-38.pyc
U

t�^sA�
@s�dZdZdgZddlmZzddlmZWn2ek
r\ZzGdd�de�ZW5dZ[XYnXddl	Z	ddl
Z
e	jdd	�\ZZ
Zed	ko�e
d
ko�ed	kZed	ko�e
d	kZed	ko�e
dkZddlmZmZmZmZmZdd
lmZmZddlmZmZmZdZGdd�de�Z Gdd�de�Z!ed	k�r�e
d
k�r�e�s�ddl"Z"e"�#d�Z$e$e!_$e"�#de"j%�Z&e&e _&ddlm'Z'm(Z(dd�Z)dd�Z*e)e _)e*e _*dZdS)zCUse the HTMLParser library to parse HTML files that aren't too bad.ZMIT�HTMLParserTreeBuilder�)�
HTMLParser)�HTMLParseErrorc@seZdZdS)rN)�__name__�
__module__�__qualname__�rr�9/usr/lib/python3/dist-packages/bs4/builder/_htmlparser.pyrsrN���)�CData�Comment�Declaration�Doctype�ProcessingInstruction)�EntitySubstitution�
UnicodeDammit)�HTML�HTMLTreeBuilder�STRICTzhtml.parserc@steZdZdZdd�Zdd�Zdd�Zdd	d
�Zddd�Zd
d�Z	dd�Z
dd�Zdd�Zdd�Z
dd�Zdd�ZdS)�BeautifulSoupHTMLParserz�A subclass of the Python standard library's HTMLParser class, which
    listens for HTMLParser events and translates them into calls
    to Beautiful Soup's tree construction API.
    cOstj|f|�|�g|_dS)N)r�__init__�already_closed_empty_element)�self�args�kwargsrrr	r=s	z BeautifulSoupHTMLParser.__init__cCst�|�dS)a�In Python 3, HTMLParser subclasses must implement error(), although
        this requirement doesn't appear to be documented.

        In Python 2, HTMLParser implements error() by raising an exception,
        which we don't want to do.

        In any event, this method is called only on very strange
        markup and our best strategy is to pretend it didn't happen
        and keep going.
        N)�warnings�warn)r�msgrrr	�errorIszBeautifulSoupHTMLParser.errorcCs|j||dd�}|�|�dS)z�Handle an incoming empty-element tag.

        This is only called when the markup looks like <tag/>.

        :param name: Name of the tag.
        :param attrs: Dictionary of the tag's attributes.
        F)�handle_empty_elementN)�handle_starttag�
handle_endtag)r�name�attrs�tagrrr	�handle_startendtagVsz*BeautifulSoupHTMLParser.handle_startendtagTcCszi}|D] \}}|dkrd}|||<d}q|��\}}	|jj|dd|||	d�}
|
rv|
jrv|rv|j|dd�|j�|�dS)a3Handle an opening tag, e.g. '<tag>'

        :param name: Name of the tag.
        :param attrs: Dictionary of the tag's attributes.
        :param handle_empty_element: True if this tag is known to be
            an empty-element tag (i.e. there is not expected to be any
            closing tag).
        N�z"")�
sourceline�	sourceposF)�check_already_closed)�getpos�soupr"Zis_empty_elementr#r�append)rr$r%r!Z	attr_dict�key�value�	attrvaluer)r*r&rrr	r"es$
�
z'BeautifulSoupHTMLParser.handle_starttagcCs,|r||jkr|j�|�n|j�|�dS)z�Handle a closing tag, e.g. '</tag>'
        
        :param name: A tag name.
        :param check_already_closed: True if this tag is expected to
           be the closing portion of an empty-element tag,
           e.g. '<tag></tag>'.
        N)r�remover-r#)rr$r+rrr	r#�s	z%BeautifulSoupHTMLParser.handle_endtagcCs|j�|�dS)z4Handle some textual data that shows up between tags.N)r-�handle_data�r�datarrr	r3�sz#BeautifulSoupHTMLParser.handle_datacCs�|�d�rt|�d�d�}n$|�d�r8t|�d�d�}nt|�}d}|dkr�|jjdfD]B}|sbqXzt|g��|�}WqXtk
r�}zW5d}~XYqXXqX|s�zt|�}Wn&t	t
fk
r�}zW5d}~XYnX|p�d}|�|�dS)z�Handle a numeric character reference by converting it to the
        corresponding Unicode character and treating it as textual
        data.

        :param name: Character number, possibly in hexadecimal.
        �x��XN�zwindows-1252u�)�
startswith�int�lstripr-�original_encoding�	bytearray�decode�UnicodeDecodeError�chr�
ValueError�
OverflowErrorr3)rr$Z	real_namer5�encoding�errr	�handle_charref�s*


z&BeautifulSoupHTMLParser.handle_charrefcCs0tj�|�}|dk	r|}nd|}|�|�dS)z�Handle a named entity reference by converting it to the
        corresponding Unicode character and treating it as textual
        data.

        :param name: Name of the entity reference.
        Nz&%s)rZHTML_ENTITY_TO_CHARACTER�getr3)rr$�	characterr5rrr	�handle_entityref�s
z(BeautifulSoupHTMLParser.handle_entityrefcCs&|j��|j�|�|j�t�dS)zOHandle an HTML comment.

        :param data: The text of the comment.
        N)r-�endDatar3rr4rrr	�handle_comment�s
z&BeautifulSoupHTMLParser.handle_commentcCs6|j��|td�d�}|j�|�|j�t�dS)zYHandle a DOCTYPE declaration.

        :param data: The text of the declaration.
        zDOCTYPE N)r-rJ�lenr3rr4rrr	�handle_decl�s
z#BeautifulSoupHTMLParser.handle_declcCsN|���d�r$t}|td�d�}nt}|j��|j�|�|j�|�dS)z{Handle a declaration of unknown type -- probably a CDATA block.

        :param data: The text of the declaration.
        zCDATA[N)�upperr:r
rLrr-rJr3)rr5�clsrrr	�unknown_decl�s
z$BeautifulSoupHTMLParser.unknown_declcCs&|j��|j�|�|j�t�dS)z\Handle a processing instruction.

        :param data: The text of the instruction.
        N)r-rJr3rr4rrr	�	handle_pi�s
z!BeautifulSoupHTMLParser.handle_piN)T)T)rrr�__doc__rr r'r"r#r3rFrIrKrMrPrQrrrr	r7s

(
'	
rcsNeZdZdZdZdZeZeee	gZ
dZd�fdd�	Zddd�Z
d	d
�Z�ZS)
rzpA Beautiful soup `TreeBuilder` that uses the `HTMLParser` parser,
    found in the Python standard library.
    FTNcsLtt|�jf|�|pg}|p i}tr2ts2d|d<tr>d|d<||f|_dS)a�Constructor.

        :param parser_args: Positional arguments to pass into 
            the BeautifulSoupHTMLParser constructor, once it's
            invoked.
        :param parser_kwargs: Keyword arguments to pass into 
            the BeautifulSoupHTMLParser constructor, once it's
            invoked.
        :param kwargs: Keyword arguments for the superclass constructor.
        F�strictZconvert_charrefsN)�superrr�CONSTRUCTOR_TAKES_STRICT� CONSTRUCTOR_STRICT_IS_DEPRECATED�"CONSTRUCTOR_TAKES_CONVERT_CHARREFS�parser_args)rrXZ
parser_kwargsr��	__class__rr	rszHTMLParserTreeBuilder.__init__ccsNt|t�r|dddfVdS||g}t||d|d�}|j|j|j|jfVdS)a�Run any preliminary steps necessary to make incoming markup
        acceptable to the parser.

        :param markup: Some markup -- probably a bytestring.
        :param user_specified_encoding: The user asked to try this encoding.
        :param document_declared_encoding: The markup itself claims to be
            in this encoding.
        :param exclude_encodings: The user asked _not_ to try any of
            these encodings.

        :yield: A series of 4-tuples:
         (markup, encoding, declared encoding,
          has undergone character replacement)

         Each 4-tuple represents a strategy for converting the
         document to Unicode and parsing it. Each strategy will be tried 
         in turn.
        NFT)Zis_html�exclude_encodings)�
isinstance�strr�markupr=Zdeclared_html_encodingZcontains_replacement_characters)rr^Zuser_specified_encodingZdocument_declared_encodingr[Z
try_encodingsZdammitrrr	�prepare_markup)s
��z$HTMLParserTreeBuilder.prepare_markupc
Csr|j\}}t||�}|j|_z|�|�|��Wn4tk
rf}zt�td��|�W5d}~XYnXg|_	dS)z{Run some incoming markup through some parsing process,
        populating the `BeautifulSoup` object in self.soup.
        a*Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.N)
rXrr-�feed�closerrr�RuntimeWarningr)rr^rr�parserrErrr	r`Ks


�zHTMLParserTreeBuilder.feed)NN)NNN)rrrrRZis_xmlZ	picklable�
HTMLPARSER�NAMErrZfeaturesZTRACKS_LINE_NUMBERSrr_r`�
__classcell__rrrYr	rs
�
"zQ\s*((?<=[\'"\s])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?a�
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  (?:\s+                             # whitespace before attribute name
    (?:[a-zA-Z_][-.:a-zA-Z0-9_]*     # attribute name
      (?:\s*=\s*                     # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |\"[^\"]*\"                # LIT-enclosed value
          |[^'\">\s]+                # bare value
         )
       )?
     )
   )*
  \s*                                # trailing whitespace
)�tagfind�attrfindcCs4d|_|�|�}|dkr|S|j}|||�|_g}t�||d�}|sPtd��|��}||d|���|_}||k�rN|j	r�t
�||�}nt�||�}|s��qN|�ddd�\}	}
}|
s�d}n`|dd�dkr�|dd�k�sn|dd�dk�r|dd�k�r"nn|dd�}|�r2|�
|�}|�|	��|f�|��}qr|||���}|d	k�r�|��\}
}d
|jk�r�|
|j�d
�}
t|j�|j�d
�}n|t|j�}|j	�r�|�d|||�dd�f�|�|||��|S|�d
��r|�||�n"|�||�||jk�r0|�|�|S)Nr�z#unexpected call to parse_starttag()rr
�'����")�>�/>�
z junk characters in start tag: %r�rn)Z__starttag_textZcheck_for_whole_start_tag�rawdatarg�match�AssertionError�end�lowerZlasttagrSrh�attrfind_tolerant�groupZunescaper.�stripr,�countrL�rfindr r3�endswithr'r"ZCDATA_CONTENT_ELEMENTS�set_cdata_mode)r�i�endposrqr%rr�kr&�m�attrname�restr1rt�lineno�offsetrrr	�parse_starttagysh

(
�

�



��
r�cCs$|��|_t�d|jtj�|_dS)Nz</\s*%s\s*>)ruZ
cdata_elem�re�compile�IZinteresting)r�elemrrr	r|�s
r|T)+rRZ__license__�__all__Zhtml.parserrr�ImportErrorrE�	Exception�sysr�version_info�major�minor�releaserUrVrWZbs4.elementr
rrrrZ
bs4.dammitrrZbs4.builderrrrrdrrr�r�rv�VERBOSEZlocatestarttagendrgrhr�r|rrrr	�<module>sJ�"	RX�
�7