Python: Parsing HTML with Beautiful Soup

Beautiful Soup is probably the most popular Python library for parsing HTML.
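
Here is a minimal sketch of everyday use. It assumes Beautiful Soup 4 is installed. The code parses a snippet with the built-in "html.parser" backend, finds tags, and reads their attributes and text. The HTML snippet and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li><a href="/python">Python</a></li>
  <li><a href="/perl">Perl</a></li>
</ul>
"""

# "html.parser" is Python's built-in parser, so no extra package is needed.
soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching tag; tag["href"] reads an attribute
# and get_text() returns the text inside the tag.
links = [(a["href"], a.get_text()) for a in soup.find_all("a")]
print(links)  # [('/python', 'Python'), ('/perl', 'Perl')]

# tag.get() returns None instead of raising KeyError for a missing attribute.
first = soup.find("a")
print(first.get("title"))  # None
```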

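Here is an example of HTML that has a < or > character inside an attribute value. It parses the snippet with three parsers (lxml, html5lib, and Python's built-in html.parser) and prints the result with three formatters (None, "minimal", and "html").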

examples/python/beautiful_soup_example.py

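from bs4 import BeautifulSoup
# Versions used for this example:
# beautifulsoup4 4.10.0, soupsieve 2.2.1, html5lib 1.1
# The "lxml" parser also requires the lxml package to be installed.

for html in [
    '<a if="{something.length > 0}">remove</a>',
]:
    for parser in ["lxml", "html5lib", "html.parser"]:
        soup = BeautifulSoup(html, parser)
        # formatter=None leaves attribute values and text untouched;
        # "minimal" escapes &, < and >; "html" also uses named HTML entities.
        for formatter in [None, "minimal", "html"]:
            print(f"--- parser={parser} formatter={formatter}")
            pretty_html = soup.prettify(formatter=formatter)
            print(pretty_html)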
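
The loop above does not label which parser and formatter produced each printout. It also stops with bs4's FeatureNotFound exception if lxml or html5lib is not installed. The variant below is a sketch that labels each result and skips missing parsers. The names render_all, PARSERS, and FORMATTERS are made up for this sketch. It also serializes only the <a> tag with Tag.decode() instead of prettifying the whole document.

```python
from bs4 import BeautifulSoup, FeatureNotFound

HTML = '<a if="{something.length > 0}">remove</a>'
PARSERS = ["lxml", "html5lib", "html.parser"]
FORMATTERS = [None, "minimal", "html"]


def render_all(html):
    """Map (parser, formatter) to the serialized <a> tag.

    Parsers that are not installed are skipped instead of aborting the run.
    """
    results = {}
    for parser in PARSERS:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            print(f"--- {parser}: not installed, skipped")
            continue
        link = soup.find("a")
        for formatter in FORMATTERS:
            # decode() serializes just this tag, without the <html>/<body>
            # wrapper that lxml and html5lib add around a fragment.
            results[(parser, formatter)] = link.decode(formatter=formatter)
    return results


if __name__ == "__main__":
    for (parser, formatter), markup in render_all(HTML).items():
        print(f"{parser:12} {formatter!s:8} {markup}")
```

Running it prints one line for each parser and formatter combination. This makes it easy to see whether the > inside the attribute value is kept as-is or escaped as &gt;.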

Written by
Gabor Szabo

Published on 2021-09-29

