Last Updated on March 27, 2023 by mishou
I. Beautiful Soup
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
https://github.com/wention/BeautifulSoup4
II. Selenium
Beautiful Soup is a library for static scraping, which means it does not execute JavaScript. If the data you need appear only in components that are rendered after clicking JavaScript-driven links, you should use Selenium in addition to Beautiful Soup.
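To make the difference concrete, here is a minimal sketch. The HTML snippet, the id and class names, and the book titles are all made up for illustration. Beautiful Soup reads the markup as text, so it finds the item that is already in the HTML but not the one a browser would add by running the script.

```python
from bs4 import BeautifulSoup

# Hypothetical page: one item is in the markup, one would be added by JavaScript.
html = """
<html><body>
  <ul id="books">
    <li class="book">Static Book</li>
  </ul>
  <script>
    var li = document.createElement("li");
    li.className = "book";
    li.textContent = "Dynamic Book";
    document.getElementById("books").appendChild(li);
  </script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Only the item present in the raw markup is found; the script is never run.
static_titles = [li.get_text(strip=True) for li in soup.find_all("li", class_="book")]
print(static_titles)  # ['Static Book']
```

A browser driven by Selenium would run the script, so its page source would contain both items.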
III. Creating a virtual environment with Pipenv
Create a virtual environment with Pipenv and install the packages:
mkdir scraping && cd scraping
pipenv install --python 3.9
pipenv shell
pipenv install selenium
pipenv install pandas
pipenv install beautifulsoup4
You can learn about Pipenv from my previous post.
IV. Setting up ChromeDriver
ChromeDriver is a separate executable that Selenium WebDriver uses to control Chrome.
First, I checked my Chromium version. In Brave Browser, the version is shown on the About Brave page:

You can download the ChromeDriver binary for your platform from the ChromeDriver downloads page. Alternatively, you can install it with a library called chromedriver-py:
pipenv install chromedriver-py
You can learn more here.
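ChromeDriver generally has to match the major version of the Chromium engine in your browser, which is why checking the browser version comes first. The helper below is a minimal sketch of that comparison. The function names and the version strings are invented for illustration and are not the versions used in this post.

```python
def major_version(version: str) -> int:
    """Return the major part of a dotted version string such as '111.0.5563.64'."""
    return int(version.strip().split(".")[0])


def versions_match(browser_version: str, driver_version: str) -> bool:
    """Return True when the browser and ChromeDriver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Made-up example values.
print(versions_match("111.0.5563.110", "111.0.5563.64"))  # True
print(versions_match("111.0.5563.110", "110.0.5481.77"))  # False
```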
I started the Python interpreter by running the command:
python
Then I ran the following Python code, shown on the page linked above:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from chromedriver_py import binary_path # this will get you the path variable
service_object = Service(binary_path)
driver = webdriver.Chrome(service=service_object)
But I encountered an error:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
This means that ChromeDriver was unable to find the Chrome binary in the default location. On Linux, it expects google-chrome to be located at /usr/bin/google-chrome.
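Before overriding anything, it can help to confirm which browser binary actually exists on the machine. The following is a minimal sketch using only the standard library. The function name is hypothetical, and the candidate list simply reuses the two paths mentioned in this post, so adjust it for your system.

```python
import os
import shutil
from typing import Iterable, Optional


def find_browser_binary(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that resolves to an executable file, else None.

    Each candidate may be an absolute path or a command name looked up on PATH.
    """
    for candidate in candidates:
        resolved = shutil.which(candidate)
        if resolved and os.path.isfile(resolved):
            return resolved
    return None


# Candidate paths taken from this post; they may differ on your system.
binary = find_browser_binary(["/usr/bin/google-chrome", "/usr/bin/brave"])
print(binary or "No browser binary found in the candidate list")
```

The returned path is the kind of value you could assign to options.binary_location.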
As I mentioned earlier, I use Brave Browser and the brave file is installed at /usr/bin/brave on Linux. I overrode the default Chrome binary location as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from chromedriver_py import binary_path # path to the ChromeDriver binary
# point Selenium at the Brave binary instead of the default Chrome location
options = Options()
options.binary_location = "/usr/bin/brave"
# Selenium 4 style: pass the driver path via Service and the options via options=
service_object = Service(binary_path)
driver = webdriver.Chrome(service=service_object, options=options)
driver.get('http://google.com/')
print("Chrome Browser Invoked")
driver.quit()
You can learn more here.
V. Using Selenium and Beautiful Soup
I was able to retrieve the titles and authors from the Amazon website without using any XPath. First, I opened the website in Brave Browser (not in headless mode) by running the following code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from chromedriver_py import binary_path # path to the ChromeDriver binary
from bs4 import BeautifulSoup
import pandas as pd
# set up the webdriver
options = Options()
options.binary_location = "/usr/bin/brave"
service_object = Service(binary_path)
driver = webdriver.Chrome(service=service_object, options=options)
# access the page with the browser
driver.get('https://www.amazon.com/b?ie=UTF8&node=8192263011')
Then I zoomed out in the browser to show all the books (this actually makes the page run its scripts).
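If you would rather not zoom out by hand, a common alternative is to scroll the page from code so that its scripts get a chance to load more items. This is a different technique from the manual zoom described above, and whether this particular page loads more books on scroll is an assumption. The function name and its parameters are hypothetical. It only relies on Selenium's execute_script method.

```python
import time


def scroll_until_stable(driver, pause: float = 1.0, max_rounds: int = 10) -> int:
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object that provides Selenium's `execute_script` method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page's scripts time to load more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return rounds
```

After driver.get(...), you would call scroll_until_stable(driver) and then read the page source.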

Retrieve the page source and scrape titles and authors using Beautiful Soup.
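As a rough sketch of this step, the function below parses an HTML string with Beautiful Soup and collects titles and authors. The CSS class names (book-card, book-title, book-author) and the sample markup are placeholders invented for illustration. They are not Amazon's real markup, so you would need to inspect the page and substitute the actual selectors.

```python
from bs4 import BeautifulSoup


def extract_books(html: str) -> list:
    """Return a list of {'title', 'author'} dicts from book-card markup.

    The CSS classes used here are placeholders, not Amazon's real class names.
    """
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("div.book-card"):
        title = card.select_one(".book-title")
        author = card.select_one(".book-author")
        if title is None:
            continue  # skip cards without a title
        books.append({
            "title": title.get_text(strip=True),
            "author": author.get_text(strip=True) if author else "",
        })
    return books


# Stand-in for driver.page_source, using invented markup and invented books.
sample_html = """
<div class="book-card">
  <span class="book-title">Example Title One</span>
  <span class="book-author">Author A</span>
</div>
<div class="book-card">
  <span class="book-title">Example Title Two</span>
</div>
"""

print(extract_books(sample_html))
```

With a live browser session you would pass driver.page_source instead of sample_html, and pd.DataFrame(extract_books(...)) would turn the result into a table.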

To be continued.
VI. References
Web Scraping with Selenium in Python — Amazon Search Result (Part 1)
Web Scraping using Beautiful Soup and Selenium for dynamic page
This page documents how to start using ChromeDriver for testing your website on desktop
Scraping Amazon results with Selenium and Python.
https://selenium-python.readthedocs.io/installation.html
Selenium with Python Tutorial: Getting started with Test Automation
How to run Selenium tests on Chrome using ChromeDriver
https://pypi.org/project/chromedriver-py/