Beautiful Soup (HTML parser)
Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.
It is available for Python 2.7 and Python 3.
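As a minimal sketch of what working with the parse tree looks like, the snippet below parses a small HTML string and reads a few values from it. The markup and the variable names are made up for illustration; only `BeautifulSoup`, `find`, `find_all`, and `get_text` come from the library, and the built-in `html.parser` backend is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for illustration.
html = """
<html><head><title>Sample page</title></head>
<body>
  <p class="intro">Hello, <b>world</b>!</p>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>
"""

# Build the parse tree with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate by tag name, search by tag and class, and collect attributes.
title = soup.title.string
intro_text = soup.find("p", class_="intro").get_text()
links = [a["href"] for a in soup.find_all("a")]

print(title)
print(intro_text)
print(links)
```

Because the input here is a string rather than a network response, the same calls can be tried without fetching any page.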
Code example
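#!/usr/bin/env python3
# Anchor extraction from HTML document
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Placeholder URL (an assumption); replace it with the page to scrape.
with urlopen("https://example.com/") as response:
    soup = BeautifulSoup(response, "html.parser")
    for anchor in soup.find_all("a"):
        # Print each anchor's href attribute (None if it has none).
        print(anchor.get("href"))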
Advantages and Disadvantages
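This table summarizes the advantages and disadvantages of each parser library.
Parser | Typical usage | Advantages | Disadvantages |
Python’s html.parser | BeautifulSoup(markup, "html.parser") | Batteries included; decent speed | Not as fast as lxml; less lenient than html5lib |
lxml’s HTML parser | BeautifulSoup(markup, "lxml") | Very fast; lenient | External C dependency |
lxml’s XML parser | BeautifulSoup(markup, "lxml-xml") or BeautifulSoup(markup, "xml") | Very fast; the only currently supported XML parser | External C dependency |
html5lib | BeautifulSoup(markup, "html5lib") | Extremely lenient; parses pages the same way a web browser does; creates valid HTML5 | Very slow; external Python dependency |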
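Since lxml and html5lib are external dependencies, code that prefers them may want to fall back to the built-in parser when they are not installed. The following is one possible sketch of that idea, not a pattern prescribed by the library: `make_soup` and its `preferred` parameter are hypothetical names, while `FeatureNotFound` is the exception Beautiful Soup raises when the requested parser is unavailable.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Returns the parsed tree together with the name of the parser used.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


# Deliberately malformed markup: neither tag is closed.
soup, used = make_soup("<p>Unclosed paragraph<b>bold")
print(used)
print(soup.find("b").get_text())
```

Different parsers can build different trees from the same invalid markup, so a fallback like this is best limited to cases where the extracted data does not depend on how errors are repaired.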
Release
Beautiful Soup 3 was the official release line of Beautiful Soup from May 2006 to March 2012. The current release line is Beautiful Soup 4. You can install it with pip install beautifulsoup4.