Python: Parsing HTML with Beautiful Soup
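Beautiful Soup is probably the most popular Python library for parsing HTML.
The example below covers the case where < and/or > appear inside an HTML attribute value. It parses the same snippet with each of three parsers and prints the result using each of three output formatters, so the differences can be compared.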
examples/python/beautiful_soup_example.py
from bs4 import BeautifulSoup

# BeautifulSoup4-4.10.0 soupsieve-2.2.1
# html5lib-1.1

for html in [
    '<a if="{something.length > 0}">remove</a>'
]:
    for parser in ["lxml", "html5lib", "html.parser"]:
        soup = BeautifulSoup(html, parser)
        for formatter in [None, "minimal", "html"]:
            prettyHTML = soup.prettify(formatter=formatter)
            print(prettyHTML)
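To see what the formatter does to the > character specifically, here is a minimal sketch that uses only the built-in "html.parser" (so lxml and html5lib do not need to be installed). The variable names value, raw and escaped are just illustrative, and the behavior described in the comments is what I would expect from Beautiful Soup 4.10; other versions may differ.

```python
from bs4 import BeautifulSoup

html = '<a if="{something.length > 0}">remove</a>'
soup = BeautifulSoup(html, "html.parser")

# The parsed attribute value should still contain the literal > character.
value = soup.a["if"]
print(value)

# formatter=None performs no entity substitution on output,
# so the > inside the attribute is written as-is.
raw = soup.a.decode(formatter=None)
print(raw)

# formatter="minimal" (the default) escapes &, < and >,
# so the > inside the attribute is written as &gt;.
escaped = soup.a.decode(formatter="minimal")
print(escaped)
```

Either way the parsed tree is the same. The formatter only affects how the tree is serialized back to text, so pick None if you need the attribute written out exactly as it was read.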
Published on 2021-09-29