{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "McZUmFIxlaBi" }, "source": [ "# Dynamic Webscraping with Selenium\n", "\n", "In the last workshop, we saw how to use `BeautifulSoup` to scrape data from a website and read that data into a powerful data structure called a `pandas` `DataFrame`. In this notebook, we'll do something very similar. We'll take a website URL, pass it through third-party software and extract useful information that we can use to populate a `DataFrame`. This time, however, we will be scraping a *dynamic* website, that is, a website whose HTML code is generated by an application." ] }, { "cell_type": "markdown", "metadata": { "id": "SPbeV52-laBr" }, "source": [ "## Our Task in this notebook\n", "We are going to scrape the headlines from today's issue of the [New York Times](https://www.nytimes.com/). Then, we'll put this data in a `DataFrame` and save it locally as a CSV for later use.\n", "\n", "A note on copyright: all Tufts logins include access to the New York Times, so be sure to log into your Tufts account before you continue. Please find instructions on doing so [here](https://researchguides.library.tufts.edu/nytimes)." ] }, { "cell_type": "markdown", "metadata": { "id": "27VDBw-ylaBs" }, "source": [ "## Goals:\n", "* Understand what a dynamic website is and how it is different from a static website\n", "* Install `Selenium` along with its associated dependencies\n", "* Navigate the content of the site both in `Selenium` and `BeautifulSoup`\n", "* Get more experience with generating DataFrames" ] }, { "cell_type": "markdown", "metadata": { "id": "siaXoLP3laBt" }, "source": [ "## What is a dynamic website?\n", "Let's delve a bit deeper into what a dynamic website is and why we can't just use `BeautifulSoup` to parse it as we can with static websites. 
While a static webpage would require a manual update before content on the site can change, a dynamic website takes advantage of client-side and server-side scripting to be more adaptable to a user's needs.\n", "* Client-side scripting: code that is executed by the user's browser, generally using JavaScript. This scripting renders changes to the site when the user interacts with it. This can be anything from selecting a choice in a drop-down menu to full-fledged games like Wordle. This type of scripting is also common in many static sites.\n", "* Server-side scripting: code that is executed by the server before sending content to the user's browser. This code can be written in a wide variety of languages like Ruby (`RubyOnRails`), JavaScript (`NodeJS`) and Python (`Django`, `Flask`). This code generally gets inputs from querying a database associated with the site and outputs HTML code from a template. This way, programmers can update elements in their sites without having to rewrite large sections of them. But, combined with client-side scripts that fill in content after the page loads, it also means that the HTML may not yet be fully generated when we do a GET request.\n", "\n", "Let's look at what this means for us in Python code below." ] }, { "cell_type": "code", "source": [ "## what we did in part i of the art of webscraping. 
if this isn't making any sense, check out [LINK TO PART I]\n", "from bs4 import BeautifulSoup\n", "import requests\n", "\n", "# name the parser explicitly so bs4 does not have to guess one\n", "soup = BeautifulSoup(requests.get('https://www.nytimes.com/').text, 'html.parser')" ], "metadata": { "id": "iCVI4Vhulf2q" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "soup" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Y0YWTD5yl4we", "outputId": "a3e6f361-fac5-4978-a427-714aaadb3f9f" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "\n", "\n", "\n", "\n", "The New York Times - Breaking News, US News, World News and Videos\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "
\n", "
  1. Fort Myers Beach, Fla.
    Hilary Swift for The New York Times
  2. Fort Myers, Fla.
    Kinfay Moroti for The New York Times
  3. Kissimmee, Fla.
  4. Port Charlotte, Fla.
    Johnny Milano for The New York Times
  5. Fort Myers, Fla.
    Joe Raedle/Getty Images
  6. St. Petersburg, Fla.
    Bob Croslin for The New York Times
  7. Fort Myers, Fla.
    Hilary Swift for The New York Times
  8. Punta Gorda, Fla.
    Shannon Stapleton/Reuters
  9. Fort Myers, Fla.
  10. Tampa, Fla.
    Hilary Swift for The New York Times
  11. Cape Coral, Fla.
  12. Tampa, Fla.
    Hilary Swift for The New York Times
  13. South Gandy Channel, Fla.
    Johnny Milano for The New York Times
  14. St. Petersburg, Fla.
    Bob Croslin for The New York Times
  15. Cape Coral, Fla.
    By The New York Times
\n", "
\n", "
LIVE

Hurricane Ian Leaves Millions Without Power in Florida

  • Governor Ron DeSantis said the storm’s impact was “historic” and that infrastructure would need rebuilding to restore power to some areas.
  • Ian brought severe winds, storm surge and heavy rain, leaving more than two million customers without electricity as it rolled toward the Atlantic Ocean.

 

Watch Live: President Biden is speaking about Tropical Storm Ian.

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "

Tropical Storm Ian as of 11 a.m. ET ›

\n", "

Category

\n", "

\n", "

TS
\n", "

Wind speed

\n", "

70m.p.h.\n", "

\n", "

Maximum
sustained

\n", "

Eye Location

\n", "

285mi.\n", "

\n", "

From Charleston, S.C.

\n", "
\n", "
\n", "
\n", "\n", "\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "

Death tolls immediately after a disaster are often unreliable. One unconfirmed estimate spread widely.

2 min read

\n", "\n", "
Mitch Smith
The National Weather Service said Ian was expected to strengthen back into a hurricane today and make landfall as a hurricane tomorrow on the South Carolina coast.
Nicholas Bogel-Burroughs
I’m in a neighborhood in North Fort Myers, where house after house has been swamped by the storm. One couple in their 90s said their bedroom was so badly flooded that the bed started floating around the room.
Patricia Mazzei
Ray Rowley, 78, tried to get back on foot to his home in Port Charlotte Village, a community north of Fort Myers, but was deterred by waist-high water. “I think we had a seven-foot storm surge,” he said.
Amanda Holpuch
Governor DeSantis said that the storm’s impacts were “historic” and that officials were still trying to assess the extent of damage. “Today is about identifying the people who need help and may still be in harm’s way,” he said.
Daniel Victor
Coastal waters are subsiding along the western coast of Florida, according to the National Hurricane Center’s Storm Surge Unit. But a danger of storm surge remains along the coasts of northeast Florida, Georgia and South Carolina.
Amanda Holpuch
Carmine Marceno, the sheriff in Lee County, Fla., told Good Morning America that there were thousands of people waiting to be rescued. The number of fatalities is unknown.
Richard Fausset
Intense wind and rain from Ian battered Orlando through the night, and is not letting up as day breaks Thursday over the central Florida city. Orlando Mayor Buddy Dyer sent a tweet at around 7 a.m. urging people to stay in their homes.

Most of the homes in the path of the hurricane lack flood insurance, complicating rebuilding efforts.

4 min read

\n", "\n", "
LIVE

Russia to Push Ahead With Land Grab in Ukraine After Sham Votes

The Kremlin planned to begin absorbing four Ukrainian territories, despite widespread condemnation of referendums where some were made to vote at gunpoint.

 

\n", "\n", "
\n", "
\n", "
Latest Photos From Ukraine
  1. Dnipro
    Nicole Tung for The New York Times
  2. Bakhmut
    Nicole Tung for The New York Times
  3. Pokrovsk
    Nicole Tung for The New York Times
  4. Kramatorsk
    Leo Correa/Associated Press
  5. Izium
    Zohra Bensemra/Reuters
  6. Dnipro
    Nicole Tung for The New York Times
  7. Kharkiv
    Paula Bronstein/Getty Images
  8. Moscow, Russia
    Yuri Kochetkov/EPA, via Shutterstock
\n", "
\n", "

Pentagon Plans to Set Up a New Command to Arm Ukraine

The new command signals that the United States expects the threat from Russia to persist for many years.

5 min read

President Volodymyr Zelensky of Ukraine to Russian soldiers: “If you want to live, run.”

3 min read

NATO labeled the Nord Stream gas pipeline leaks sabotage and promised a “determined response.”

\n", "\n", "

Analysis: New Infectious Threats Are Coming. The U.S. Isn’t Ready.

The coronavirus revealed flaws in the nation’s pandemic plans. The spread of monkeypox shows that the problems remain deeply entrenched.

7 min read

\"A
Brynn Anderson/Associated Press

China’s Covid propaganda has led some citizens to argue the language has bordered on “nonsense.”

4 min read

Physician burnout has reached distressing levels, a new study found. But the situation is not irreparable.

5 min read

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Tracking the Coronavirus ›

\n", "
\n", "
United States ›\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
United States ›United StatesAvg. on Sept. 2814-day change
New Covid cases48,806–22%
Hospitalized28,765–14%
New deaths404–12%
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "
\n", "
In Case You Missed ItTop picks from The Times, recommended for you
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ] }, "metadata": {}, "execution_count": 2 } ] }, { "cell_type": "markdown", "source": [ "What is this mess? There is clearly useful information here, like headlines, but it's really hard to figure out how to scrape it. There doesn't seem to be an inherent structure of the site that we can take advantage of, as we did in the last notebook. Instead, we are going to use `Selenium` to render this site in a browser, so we can then access the HTML output without dealing with the back-end of the site, which we don't have access to." ], "metadata": { "id": "LJHXPA0xnntF" } }, { "cell_type": "markdown", "source": [ "## `Selenium`: what is it and how does it work\n", "\n", "`Selenium` is a software package that lets you automate an instance of a particular browser from many programming languages, including Ruby, JavaScript and of course Python.\n", "\n", "In order to use Selenium in any environment, one must first download the driver for a web browser (we'll be using Chrome) and then also install the software package itself (we'll be using pip).\n", "\n", "I chose to put this notebook lesson on Colab because the Linux environment makes it very easy to download and install the dependencies needed to use `Selenium`. If you prefer to use this notebook locally, you may be able to run it in a WSL terminal, or you can follow the official [documentation](https://selenium-python.readthedocs.io/)." 
], "metadata": { "id": "1K4PPCRfoaRF" } }, { "cell_type": "code", "source": [ "%%shell\n", "# installing all of the dependencies as well as the selenium package\n", "# (the %%shell cell magic has to be the first line of the cell)\n", "\n", "# Add debian buster\n", "cat > /etc/apt/sources.list.d/debian.list <<'EOF'\n", "deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main\n", "deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main\n", "deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main\n", "EOF\n", "\n", "# Add keys\n", "apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517\n", "apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138\n", "apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A\n", "\n", "apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg\n", "apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg\n", "apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg\n", "\n", "# Prefer debian repo for chromium* packages only\n", "# Note the double-blank lines between entries\n", "cat > /etc/apt/preferences.d/chromium.pref << 'EOF'\n", "Package: *\n", "Pin: release a=eoan\n", "Pin-Priority: 500\n", "\n", "\n", "Package: *\n", "Pin: origin \"deb.debian.org\"\n", "Pin-Priority: 300\n", "\n", "\n", "Package: chromium*\n", "Pin: origin \"deb.debian.org\"\n", "Pin-Priority: 700\n", "EOF\n", "\n", "# Install chromium and chromium-driver\n", "apt-get update\n", "apt-get install chromium chromium-driver\n", "\n", "# Install selenium\n", "pip install selenium" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "radH6wp-l6FX", "outputId": "a40b30d1-6d7b-41af-a424-77302e577624" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", 
"text": [ "Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).\n", "Executing: /tmp/apt-key-gpghome.sOteFPdRQB/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517\n", "gpg: key DCC9EFBF77E11517: public key \"Debian Stable Release Key (10/buster) \" imported\n", "gpg: Total number processed: 1\n", "gpg: imported: 1\n", "Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).\n", "Executing: /tmp/apt-key-gpghome.gVarvbuXyS/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138\n", "gpg: key DC30D7C23CBBABEE: public key \"Debian Archive Automatic Signing Key (10/buster) \" imported\n", "gpg: Total number processed: 1\n", "gpg: imported: 1\n", "Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).\n", "Executing: /tmp/apt-key-gpghome.ig0P32XMqC/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A\n", "gpg: key 4DFAB270CAA96DFA: public key \"Debian Security Archive Automatic Signing Key (10/buster) \" imported\n", "gpg: Total number processed: 1\n", "gpg: imported: 1\n", "Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).\n", "Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).\n", "Warning: apt-key is deprecated. 
Manage keyring files in trusted.gpg.d instead (see apt-key(8)).\n", "Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]\n", "Get:2 http://deb.debian.org/debian buster InRelease [122 kB]\n", "Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease\n", "Get:4 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]\n", "Get:5 http://deb.debian.org/debian buster-updates InRelease [56.6 kB]\n", "Get:6 http://deb.debian.org/debian-security buster/updates InRelease [34.8 kB]\n", "Get:7 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [832 kB]\n", "Hit:8 http://archive.ubuntu.com/ubuntu jammy InRelease\n", "Get:9 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]\n", "Get:10 http://deb.debian.org/debian buster/main amd64 Packages [10.7 MB]\n", "Hit:11 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease\n", "Get:12 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [979 kB]\n", "Get:13 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]\n", "Get:14 http://deb.debian.org/debian buster-updates/main amd64 Packages [9,745 B]\n", "Get:15 http://deb.debian.org/debian-security buster/updates/main amd64 Packages [701 kB]\n", "Hit:16 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease\n", "Get:17 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1,129 kB]\n", "Hit:18 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease\n", "Get:19 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,236 kB]\n", "Hit:20 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease\n", "Fetched 16.2 MB in 2s (7,708 kB/s)\n", "Reading package lists... Done\n", "Reading package lists... Done\n", "Building dependency tree... Done\n", "Reading state information... 
Done\n", "The following additional packages will be installed:\n", " chromium-common chromium-sandbox libevent-2.1-6 libfontenc1 libgudev-1.0-0\n", " libicu63 libimobiledevice6 libjpeg62-turbo libjsoncpp1 libplist3 libre2-5\n", " libu2f-udev libupower-glib3 libusbmuxd6 libvpx5 libwebp6 libxkbfile1\n", " libxtst6 libxxf86dga1 notification-daemon systemd-hwe-hwdb udev upower\n", " usbmuxd x11-utils\n", "Suggested packages:\n", " chromium-l10n chromium-shell libusbmuxd-tools mesa-utils\n", "The following NEW packages will be installed:\n", " chromium chromium-common chromium-driver chromium-sandbox libevent-2.1-6\n", " libfontenc1 libgudev-1.0-0 libicu63 libimobiledevice6 libjpeg62-turbo\n", " libjsoncpp1 libplist3 libre2-5 libu2f-udev libupower-glib3 libusbmuxd6\n", " libvpx5 libwebp6 libxkbfile1 libxtst6 libxxf86dga1 notification-daemon\n", " systemd-hwe-hwdb udev upower usbmuxd x11-utils\n", "0 upgraded, 27 newly installed, 0 to remove and 25 not upgraded.\n", "Need to get 76.8 MB of archives.\n", "After this operation, 267 MB of additional disk space will be used.\n", "Get:1 http://deb.debian.org/debian buster/main amd64 libevent-2.1-6 amd64 2.1.8-stable-4 [177 kB]\n", "Get:2 http://deb.debian.org/debian buster/main amd64 libicu63 amd64 63.1-6+deb10u3 [8,293 kB]\n", "Get:3 http://deb.debian.org/debian buster/main amd64 libjpeg62-turbo amd64 1:1.5.2-2+deb10u1 [133 kB]\n", "Get:4 http://deb.debian.org/debian buster/main amd64 libjsoncpp1 amd64 1.7.4-3 [75.6 kB]\n", "Get:5 http://deb.debian.org/debian buster/main amd64 libre2-5 amd64 20190101+dfsg-2 [161 kB]\n", "Get:6 http://deb.debian.org/debian buster/main amd64 libvpx5 amd64 1.7.0-3+deb10u1 [800 kB]\n", "Get:7 http://deb.debian.org/debian-security buster/updates/main amd64 libwebp6 amd64 0.6.1-2+deb10u2 [262 kB]\n", "Get:8 http://deb.debian.org/debian buster/main amd64 chromium-common amd64 90.0.4430.212-1~deb10u1 [1,423 kB]\n", "Get:9 http://deb.debian.org/debian buster/main amd64 chromium amd64 
90.0.4430.212-1~deb10u1 [58.3 MB]\n", "Get:10 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 udev amd64 249.11-0ubuntu3.9 [1,557 kB]\n", "Get:11 http://deb.debian.org/debian buster/main amd64 chromium-driver amd64 90.0.4430.212-1~deb10u1 [4,703 kB]\n", "Get:12 http://deb.debian.org/debian buster/main amd64 chromium-sandbox amd64 90.0.4430.212-1~deb10u1 [146 kB]\n", "Get:13 http://archive.ubuntu.com/ubuntu jammy/main amd64 libfontenc1 amd64 1:1.1.4-1build3 [14.7 kB]\n", "Get:14 http://archive.ubuntu.com/ubuntu jammy/main amd64 libxkbfile1 amd64 1:1.1.0-1build3 [71.8 kB]\n", "Get:15 http://archive.ubuntu.com/ubuntu jammy/main amd64 libxtst6 amd64 2:1.2.3-1build4 [13.4 kB]\n", "Get:16 http://archive.ubuntu.com/ubuntu jammy/main amd64 libxxf86dga1 amd64 2:1.1.5-0ubuntu3 [12.6 kB]\n", "Get:17 http://archive.ubuntu.com/ubuntu jammy/main amd64 x11-utils amd64 7.7+5build2 [206 kB]\n", "Get:18 http://archive.ubuntu.com/ubuntu jammy/main amd64 libgudev-1.0-0 amd64 1:237-2build1 [16.3 kB]\n", "Get:19 http://archive.ubuntu.com/ubuntu jammy/main amd64 libplist3 amd64 2.2.0-6build2 [32.1 kB]\n", "Get:20 http://archive.ubuntu.com/ubuntu jammy/main amd64 libusbmuxd6 amd64 2.0.2-3build2 [20.4 kB]\n", "Get:21 http://archive.ubuntu.com/ubuntu jammy/main amd64 libimobiledevice6 amd64 1.3.0-6build3 [71.1 kB]\n", "Get:22 http://archive.ubuntu.com/ubuntu jammy/main amd64 libu2f-udev all 1.1.10-3build2 [4,190 B]\n", "Get:23 http://archive.ubuntu.com/ubuntu jammy/main amd64 libupower-glib3 amd64 0.99.17-1 [46.7 kB]\n", "Get:24 http://archive.ubuntu.com/ubuntu jammy/universe amd64 notification-daemon amd64 3.20.0-4build1 [57.6 kB]\n", "Get:25 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 systemd-hwe-hwdb all 249.11.3 [2,908 B]\n", "Get:26 http://archive.ubuntu.com/ubuntu jammy/main amd64 upower amd64 0.99.17-1 [86.7 kB]\n", "Get:27 http://archive.ubuntu.com/ubuntu jammy/main amd64 usbmuxd amd64 1.1.1-2build2 [42.8 kB]\n", "Fetched 76.8 MB in 2s (37.5 MB/s)\n", 
"Selecting previously unselected package udev.\n", "(Reading database ... 120831 files and directories currently installed.)\n", "Preparing to unpack .../00-udev_249.11-0ubuntu3.9_amd64.deb ...\n", "Unpacking udev (249.11-0ubuntu3.9) ...\n", "Selecting previously unselected package libevent-2.1-6:amd64.\n", "Preparing to unpack .../01-libevent-2.1-6_2.1.8-stable-4_amd64.deb ...\n", "Unpacking libevent-2.1-6:amd64 (2.1.8-stable-4) ...\n", "Selecting previously unselected package libicu63:amd64.\n", "Preparing to unpack .../02-libicu63_63.1-6+deb10u3_amd64.deb ...\n", "Unpacking libicu63:amd64 (63.1-6+deb10u3) ...\n", "Selecting previously unselected package libjpeg62-turbo:amd64.\n", "Preparing to unpack .../03-libjpeg62-turbo_1%3a1.5.2-2+deb10u1_amd64.deb ...\n", "Unpacking libjpeg62-turbo:amd64 (1:1.5.2-2+deb10u1) ...\n", "Selecting previously unselected package libjsoncpp1:amd64.\n", "Preparing to unpack .../04-libjsoncpp1_1.7.4-3_amd64.deb ...\n", "Unpacking libjsoncpp1:amd64 (1.7.4-3) ...\n", "Selecting previously unselected package libre2-5:amd64.\n", "Preparing to unpack .../05-libre2-5_20190101+dfsg-2_amd64.deb ...\n", "Unpacking libre2-5:amd64 (20190101+dfsg-2) ...\n", "Selecting previously unselected package libvpx5:amd64.\n", "Preparing to unpack .../06-libvpx5_1.7.0-3+deb10u1_amd64.deb ...\n", "Unpacking libvpx5:amd64 (1.7.0-3+deb10u1) ...\n", "Selecting previously unselected package libwebp6:amd64.\n", "Preparing to unpack .../07-libwebp6_0.6.1-2+deb10u2_amd64.deb ...\n", "Unpacking libwebp6:amd64 (0.6.1-2+deb10u2) ...\n", "Selecting previously unselected package libfontenc1:amd64.\n", "Preparing to unpack .../08-libfontenc1_1%3a1.1.4-1build3_amd64.deb ...\n", "Unpacking libfontenc1:amd64 (1:1.1.4-1build3) ...\n", "Selecting previously unselected package libxkbfile1:amd64.\n", "Preparing to unpack .../09-libxkbfile1_1%3a1.1.0-1build3_amd64.deb ...\n", "Unpacking libxkbfile1:amd64 (1:1.1.0-1build3) ...\n", "Selecting previously unselected package 
libxtst6:amd64.\n", "Preparing to unpack .../10-libxtst6_2%3a1.2.3-1build4_amd64.deb ...\n", "Unpacking libxtst6:amd64 (2:1.2.3-1build4) ...\n", "Selecting previously unselected package libxxf86dga1:amd64.\n", "Preparing to unpack .../11-libxxf86dga1_2%3a1.1.5-0ubuntu3_amd64.deb ...\n", "Unpacking libxxf86dga1:amd64 (2:1.1.5-0ubuntu3) ...\n", "Selecting previously unselected package x11-utils.\n", "Preparing to unpack .../12-x11-utils_7.7+5build2_amd64.deb ...\n", "Unpacking x11-utils (7.7+5build2) ...\n", "Selecting previously unselected package chromium-common.\n", "Preparing to unpack .../13-chromium-common_90.0.4430.212-1~deb10u1_amd64.deb ...\n", "Unpacking chromium-common (90.0.4430.212-1~deb10u1) ...\n", "Selecting previously unselected package chromium.\n", "Preparing to unpack .../14-chromium_90.0.4430.212-1~deb10u1_amd64.deb ...\n", "Unpacking chromium (90.0.4430.212-1~deb10u1) ...\n", "Selecting previously unselected package chromium-driver.\n", "Preparing to unpack .../15-chromium-driver_90.0.4430.212-1~deb10u1_amd64.deb ...\n", "Unpacking chromium-driver (90.0.4430.212-1~deb10u1) ...\n", "Selecting previously unselected package chromium-sandbox.\n", "Preparing to unpack .../16-chromium-sandbox_90.0.4430.212-1~deb10u1_amd64.deb ...\n", "Unpacking chromium-sandbox (90.0.4430.212-1~deb10u1) ...\n", "Selecting previously unselected package libgudev-1.0-0:amd64.\n", "Preparing to unpack .../17-libgudev-1.0-0_1%3a237-2build1_amd64.deb ...\n", "Unpacking libgudev-1.0-0:amd64 (1:237-2build1) ...\n", "Selecting previously unselected package libplist3:amd64.\n", "Preparing to unpack .../18-libplist3_2.2.0-6build2_amd64.deb ...\n", "Unpacking libplist3:amd64 (2.2.0-6build2) ...\n", "Selecting previously unselected package libusbmuxd6:amd64.\n", "Preparing to unpack .../19-libusbmuxd6_2.0.2-3build2_amd64.deb ...\n", "Unpacking libusbmuxd6:amd64 (2.0.2-3build2) ...\n", "Selecting previously unselected package libimobiledevice6:amd64.\n", "Preparing to unpack 
.../20-libimobiledevice6_1.3.0-6build3_amd64.deb ...\n", "Unpacking libimobiledevice6:amd64 (1.3.0-6build3) ...\n", "Selecting previously unselected package libu2f-udev.\n", "Preparing to unpack .../21-libu2f-udev_1.1.10-3build2_all.deb ...\n", "Unpacking libu2f-udev (1.1.10-3build2) ...\n", "Selecting previously unselected package libupower-glib3:amd64.\n", "Preparing to unpack .../22-libupower-glib3_0.99.17-1_amd64.deb ...\n", "Unpacking libupower-glib3:amd64 (0.99.17-1) ...\n", "Selecting previously unselected package notification-daemon.\n", "Preparing to unpack .../23-notification-daemon_3.20.0-4build1_amd64.deb ...\n", "Unpacking notification-daemon (3.20.0-4build1) ...\n", "Selecting previously unselected package systemd-hwe-hwdb.\n", "Preparing to unpack .../24-systemd-hwe-hwdb_249.11.3_all.deb ...\n", "Unpacking systemd-hwe-hwdb (249.11.3) ...\n", "Selecting previously unselected package upower.\n", "Preparing to unpack .../25-upower_0.99.17-1_amd64.deb ...\n", "Unpacking upower (0.99.17-1) ...\n", "Selecting previously unselected package usbmuxd.\n", "Preparing to unpack .../26-usbmuxd_1.1.1-2build2_amd64.deb ...\n", "Unpacking usbmuxd (1.1.1-2build2) ...\n", "Setting up libplist3:amd64 (2.2.0-6build2) ...\n", "Setting up libxtst6:amd64 (2:1.2.3-1build4) ...\n", "Setting up libxxf86dga1:amd64 (2:1.1.5-0ubuntu3) ...\n", "Setting up chromium-sandbox (90.0.4430.212-1~deb10u1) ...\n", "Setting up libicu63:amd64 (63.1-6+deb10u3) ...\n", "Setting up notification-daemon (3.20.0-4build1) ...\n", "Setting up libfontenc1:amd64 (1:1.1.4-1build3) ...\n", "Setting up libjpeg62-turbo:amd64 (1:1.5.2-2+deb10u1) ...\n", "Setting up udev (249.11-0ubuntu3.9) ...\n", "invoke-rc.d: could not determine current runlevel\n", "invoke-rc.d: policy-rc.d denied execution of start.\n", "Setting up libwebp6:amd64 (0.6.1-2+deb10u2) ...\n", "Setting up libevent-2.1-6:amd64 (2.1.8-stable-4) ...\n", "Setting up systemd-hwe-hwdb (249.11.3) ...\n", "Setting up libusbmuxd6:amd64 
(2.0.2-3build2) ...\n", "Setting up libupower-glib3:amd64 (0.99.17-1) ...\n", "Setting up libre2-5:amd64 (20190101+dfsg-2) ...\n", "Setting up libxkbfile1:amd64 (1:1.1.0-1build3) ...\n", "Setting up libimobiledevice6:amd64 (1.3.0-6build3) ...\n", "Setting up libgudev-1.0-0:amd64 (1:237-2build1) ...\n", "Setting up libvpx5:amd64 (1.7.0-3+deb10u1) ...\n", "Setting up libjsoncpp1:amd64 (1.7.4-3) ...\n", "Setting up libu2f-udev (1.1.10-3build2) ...\n", "Setting up upower (0.99.17-1) ...\n", "Setting up usbmuxd (1.1.1-2build2) ...\n", "Warning: The home dir /var/lib/usbmux you specified can't be accessed: No such file or directory\n", "Adding system user `usbmux' (UID 104) ...\n", "Adding new user `usbmux' (UID 104) with group `plugdev' ...\n", "Not creating home directory `/var/lib/usbmux'.\n", "Setting up x11-utils (7.7+5build2) ...\n", "Setting up chromium-common (90.0.4430.212-1~deb10u1) ...\n", "Setting up chromium (90.0.4430.212-1~deb10u1) ...\n", "update-alternatives: using /usr/bin/chromium to provide /usr/bin/x-www-browser (x-www-browser) in auto mode\n", "update-alternatives: using /usr/bin/chromium to provide /usr/bin/gnome-www-browser (gnome-www-browser) in auto mode\n", "Setting up chromium-driver (90.0.4430.212-1~deb10u1) ...\n", "Processing triggers for man-db (2.10.2-1) ...\n", "Processing triggers for dbus (1.12.20-2ubuntu4.1) ...\n", "Processing triggers for hicolor-icon-theme (0.17-2) ...\n", "Processing triggers for libc-bin (2.35-0ubuntu3.1) ...\n", "/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link\n", "\n", "/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link\n", "\n", "/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link\n", "\n", "/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link\n", "\n", "/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link\n", "\n", "/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a 
symbolic link\n", "\n", "Collecting selenium\n", " Downloading selenium-4.11.2-py3-none-any.whl (7.2 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.2/7.2 MB\u001b[0m \u001b[31m16.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: urllib3[socks]<3,>=1.26 in /usr/local/lib/python3.10/dist-packages (from selenium) (2.0.4)\n", "Collecting trio~=0.17 (from selenium)\n", " Downloading trio-0.22.2-py3-none-any.whl (400 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m400.2/400.2 kB\u001b[0m \u001b[31m37.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting trio-websocket~=0.9 (from selenium)\n", " Downloading trio_websocket-0.10.3-py3-none-any.whl (17 kB)\n", "Requirement already satisfied: certifi>=2021.10.8 in /usr/local/lib/python3.10/dist-packages (from selenium) (2023.7.22)\n", "Requirement already satisfied: attrs>=20.1.0 in /usr/local/lib/python3.10/dist-packages (from trio~=0.17->selenium) (23.1.0)\n", "Requirement already satisfied: sortedcontainers in /usr/local/lib/python3.10/dist-packages (from trio~=0.17->selenium) (2.4.0)\n", "Requirement already satisfied: idna in /usr/local/lib/python3.10/dist-packages (from trio~=0.17->selenium) (3.4)\n", "Collecting outcome (from trio~=0.17->selenium)\n", " Downloading outcome-1.2.0-py2.py3-none-any.whl (9.7 kB)\n", "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from trio~=0.17->selenium) (1.3.0)\n", "Requirement already satisfied: exceptiongroup>=1.0.0rc9 in /usr/local/lib/python3.10/dist-packages (from trio~=0.17->selenium) (1.1.2)\n", "Collecting wsproto>=0.14 (from trio-websocket~=0.9->selenium)\n", " Downloading wsproto-1.2.0-py3-none-any.whl (24 kB)\n", "Requirement already satisfied: pysocks!=1.5.7,<2.0,>=1.5.6 in /usr/local/lib/python3.10/dist-packages (from urllib3[socks]<3,>=1.26->selenium) (1.7.1)\n", "Collecting h11<1,>=0.9.0 
(from wsproto>=0.14->trio-websocket~=0.9->selenium)\n", " Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m7.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hInstalling collected packages: outcome, h11, wsproto, trio, trio-websocket, selenium\n", "Successfully installed h11-0.14.0 outcome-1.2.0 selenium-4.11.2 trio-0.22.2 trio-websocket-0.10.3 wsproto-1.2.0\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [] }, "metadata": {}, "execution_count": 1 } ] }, { "cell_type": "code", "source": [ "from selenium import webdriver # the main entry point for the selenium API, the webdriver\n", "chrome_options = webdriver.ChromeOptions() # some useful features for the chrome driver\n", "chrome_options.add_argument('--headless')\n", "chrome_options.add_argument('--no-sandbox')\n", "chrome_options.add_argument('--disable-dev-shm-usage')\n", "wd = webdriver.Chrome(options=chrome_options) # final webdriver object" ], "metadata": { "id": "Cjfuw6s3q7hv", "colab": { "base_uri": "https://localhost:8080/", "height": 758 }, "outputId": "0421b414-2a5f-468b-91dd-45a3e36a063d" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "WARNING:selenium.webdriver.common.selenium_manager:The chromedriver version (90.0.4430.212) detected in PATH at /usr/bin/chromedriver might not be compatible with the detected chrome version (116.0.5845.96); currently, chromedriver 116.0.5845.96 is recommended for chrome 116.*, so it is advised to delete the driver in PATH and retry\n" ] }, { "output_type": "error", "ename": "SessionNotCreatedException", "evalue": "ignored", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mSessionNotCreatedException\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in 
\u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mchrome_options\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd_argument\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'--no-sandbox'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mchrome_options\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd_argument\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'--disable-dev-shm-usage'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mwd\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mwebdriver\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mChrome\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mchrome_options\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# final webdriver object\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chrome/webdriver.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, options, service, keep_alive)\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[0moptions\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0moptions\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0moptions\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0mOptions\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 44\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 45\u001b[0;31m super().__init__(\n\u001b[0m\u001b[1;32m 46\u001b[0m \u001b[0mDesiredCapabilities\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mCHROME\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"browserName\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 47\u001b[0m \u001b[0;34m\"goog\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chromium/webdriver.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, browser_name, vendor_prefix, options, service, keep_alive)\u001b[0m\n\u001b[1;32m 54\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 55\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 56\u001b[0;31m super().__init__(\n\u001b[0m\u001b[1;32m 57\u001b[0m command_executor=ChromiumRemoteConnection(\n\u001b[1;32m 58\u001b[0m \u001b[0mremote_server_addr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mservice\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mservice_url\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, command_executor, keep_alive, file_detector, options)\u001b[0m\n\u001b[1;32m 204\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_authenticator_id\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 205\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstart_client\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 206\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstart_session\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcapabilities\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 207\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 208\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__repr__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py\u001b[0m in \u001b[0;36mstart_session\u001b[0;34m(self, capabilities)\u001b[0m\n\u001b[1;32m 288\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 289\u001b[0m \u001b[0mcaps\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_create_caps\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcapabilities\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 290\u001b[0;31m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mCommand\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mNEW_SESSION\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcaps\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"value\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 291\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msession_id\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"sessionId\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 292\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcaps\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"capabilities\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, driver_command, params)\u001b[0m\n\u001b[1;32m 343\u001b[0m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcommand_executor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdriver_command\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 344\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 345\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0merror_handler\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcheck_response\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresponse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 346\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"value\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_unwrap_value\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresponse\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"value\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 347\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py\u001b[0m in \u001b[0;36mcheck_response\u001b[0;34m(self, response)\u001b[0m\n\u001b[1;32m 227\u001b[0m \u001b[0malert_text\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"alert\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"text\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 228\u001b[0m \u001b[0;32mraise\u001b[0m 
\u001b[0mexception_class\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmessage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mscreen\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstacktrace\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0malert_text\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# type: ignore[call-arg] # mypy is not smart enough here\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 229\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mexception_class\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmessage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mscreen\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstacktrace\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mSessionNotCreatedException\u001b[0m: Message: session not created: This version of ChromeDriver only supports Chrome version 90\nCurrent browser version is 116.0.5845.96 with binary path /root/.cache/selenium/chrome/linux64/116.0.5845.96/chrome\nStacktrace:\n#0 0x5d01b8e7e7f9 \n#1 0x5d01b8e1e3b3 \n#2 0x5d01b8b66016 \n#3 0x5d01b8b8ce4a \n#4 0x5d01b8b8899a \n#5 0x5d01b8b8589a \n#6 0x5d01b8bc300a \n#7 0x5d01b8bbdc93 \n#8 0x5d01b8b8fce4 \n#9 0x5d01b8b914d2 \n#10 0x5d01b8e4a542 \n#11 0x5d01b8e59ce7 \n#12 0x5d01b8e599e4 \n#13 0x5d01b8e5e13a \n#14 0x5d01b8e5a5b9 \n#15 0x5d01b8e3fe00 \n#16 0x5d01b8e715d2 \n#17 0x5d01b8e71778 \n#18 0x5d01b8e89a1f \n#19 0x79373e040b43 \n#20 0x79373e0d2a00 \n" ] } ] }, { "cell_type": "code", "source": [ "wd" ], "metadata": { "id": "U2Fr9byIsC1Y" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Just like when we used `requests`, we can use the get method to load the site into our runtime" ], "metadata": { "id": "Ih6pB2j2rylk" } }, { "cell_type": "code", "source": [ "wd.get('https://www.nytimes.com/')" ], "metadata": { "id": "kkRrW9yqrWFM" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "If we want to get the headlines from each article, we need to select the `section` 
tags. In `BeautifulSoup`, we could use something like `soup.find_all('section')`, but the syntax is slightly different in `Selenium`.\n", "\n", "Check out the documentation [here](https://selenium-python.readthedocs.io/locating-elements.html)." ], "metadata": { "id": "KjT0Is4Qsk0f" } }, { "cell_type": "code", "source": [ "from selenium.webdriver.common.by import By # imported here because this is the first cell that uses By\n", "\n", "[i.get_attribute('outerHTML') for i in wd.find_elements(By.XPATH, './/section//h3')]" ], "metadata": { "id": "m_8ChUMMwpJI" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "## first, this By class allows us to select many options to search by\n", "## here, we'll query by HTML tag\n", "from selenium.webdriver.common.by import By\n", "\n", "## NOTE: find_elements (with an 's') returns every match; find_element returns only the\n", "## first element that meets the condition, which can also be useful\n", "sections = wd.find_elements(By.TAG_NAME, 'section')" ], "metadata": { "id": "bf0bP_irsHcs" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "sections[6].find_element(By.TAG_NAME,'h3')#.get_attribute('outerHTML')" ], "metadata": { "id": "0oxtP23Tuwe3" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "for section in sections:\n", "    if (len(section.text) > 0): # omitting all of the empty titles\n", "        print(type(section), section.text.replace('\\n', ''), sep='\\t')\n", "    else:\n", "        print(type(section), section.find_element(By.TAG_NAME,'h3'), sep='\\t')" ], "metadata": { "id": "FQ16Desrt8eg" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "This is getting there, but there is still a lot of stuff in here that we don't want.
We could just remove the end of the list and hard code and indexing statement like `list_of_tags[:idx_of_last_headline]`, but this is dangerous, as the New York Times changes at least everyday, sometimes multiple times a day, so we will use a more robust method of querying the webpage than just HTML tag.\n", "\n", "Instead, we are going to select headlines by how they are styled on the webpage. Below, I have printed out all of the links of the site, which will have all of the headlines we want. Go through this list and determine which class attributes we want for our headlines." ], "metadata": { "id": "pNBR8STiw-_1" } }, { "cell_type": "code", "source": [ "a_tags = wd.find_elements(By.TAG_NAME, 'a')\n", "for a in a_tags:\n", " print(a.text, a.get_attribute(\"class\"), sep='\\t')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "xAJON6dTDrUI", "outputId": "67acde2a-ace0-447a-8cd7-a5e40b84dd2c" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "SKIP TO CONTENT\tcss-kgn7zc\n", "SKIP TO SITE INDEX\tcss-kgn7zc\n", "SKIP ADVERTISEMENT\tcss-777zgl\n", "\tcss-nhjhh0 ell52qj1\n", "\tcss-ogiugu\n", "\tcss-ogiugu\n", "\tcss-ogiugu\n", "\tcss-ogiugu\n", "\tcss-ogiugu\n", "\tnytcp-opt css-1kj7lfb\n", "\tcss-1kj7lfb\n", "\tcss-129gw94\n", "\tcss-hnzl8o\n", "SUBSCRIBE FOR $1/WEEK\tnytcp-opt\n", "\tcss-1q2j1fr eoab3xr0\n", "\t\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-1wjnrbv\n", 
"\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-1wjnrbv\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "\tcss-13phoew\n", "Ukraine Seeks a Breakthrough in the South but Faces Big Obstacles\n", "The goal of Ukraine’s counteroffensive is to drive a wedge through Russian-occupied territory. But the execution has proved difficult.\n", "See more headlines 9+\tcss-9mylee\n", "President Vladimir Putin promised free grain to at least six African countries at a summit, hoping to shore up Russia’s image.\tcss-9mylee\n", "\tcss-777zgl\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-1rsvbdu\n", "\t\n", "\t\n", "\t\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\t\n", "\t\n", "\t\n", "\t\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tus_heat_link svelte-mkrnf3\n", "\tintl_heat_link svelte-mkrnf3\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-1rsvbdu\n", "\t\n", "\t\n", "\t\n", "\t\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tgame-holder svelte-5gtd79\n", "\tgame-holder svelte-5gtd79\n", "\tgame-holder svelte-5gtd79\n", "\tgame-holder svelte-5gtd79\n", "\tgame-holder svelte-5gtd79\n", "\tgame-holder svelte-5gtd79\n", "\tgame-holder svelte-5gtd79\n", "\tgame-holder svelte-5gtd79\n", "\t\n", "\t\n", "\t\n", "\t\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-777zgl\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-rgq5s4\n", "\tcss-rgq5s4\n", "\tcss-9mylee\n", "\tcss-rgq5s4\n", "\tcss-rgq5s4\n", 
"\tcss-rgq5s4\n", "\tcss-9mylee\n", "\tcss-rgq5s4\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-777zgl\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-777zgl\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-777zgl\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-9mylee\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "\tcss-jq1cx6\n", "Terms of Sale\tcss-sur6ab\n", "Terms of Service\tcss-sur6ab\n", "Privacy Policy\tcss-sur6ab\n" ] } ] }, { "cell_type": "markdown", "source": [ "From this list, I found `css-9mylee` and `css-rgq5s4` as the two main headline classes. Now we can use XPATH filtering to isolate them." 
], "metadata": { "id": "uz3IQT8oF9Q0" } }, { "cell_type": "code", "source": [ "## here, we are going to use the XPATH attribute of the By class\n", "## XPath is a query language for XML documents; Selenium also lets us use it to query a page's HTML\n", "\n", "## in order to process two XPATH patterns, we can make a list of them and feed them into\n", "## the find_elements method one by one\n", "\n", "patterns = [\n", "    \"//a[@class='css-9mylee']\",\n", "    \"//a[@class='css-rgq5s4']\"\n", "]\n", "\n", "for pattern in patterns:\n", "    hls = wd.find_elements(By.XPATH, pattern)\n", "    for hl in hls:\n", "        if (len(hl.text) > 0):\n", "            print(hl.text)\n", "            print('_____________________') ## print this here so we see where each a tag stops" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "mPzwtrzNt9nw", "outputId": "fbafa1fd-a1a1-434b-dea0-9a10e93da394" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Ukraine Seeks a Breakthrough in the South but Faces Big Obstacles\n", "The goal of Ukraine’s counteroffensive is to drive a wedge through Russian-occupied territory. But the execution has proved difficult.\n", "See more headlines 9+\n", "_____________________\n", "President Vladimir Putin promised free grain to at least six African countries at a summit, hoping to shore up Russia’s image.\n", "_____________________\n" ] } ] }, { "cell_type": "markdown", "source": [ "But now there is extra text here, like the author's name or the estimated reading time, that isn't the headline, and we still haven't scraped the link to the article itself.\n", "\n", "Now, we want to look at the raw HTML at these places and try to deal with the problems above using BeautifulSoup parsing."
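, "\n", "\n", "As a minimal sketch of that idea, the snippet below parses a small, made-up `outerHTML` string (the tag layout, `href`, and text are assumptions for illustration, not the real New York Times markup) and pulls the link and the `h3` text out separately:\n", "\n", "```python\n", "from bs4 import BeautifulSoup\n", "\n", "# hypothetical outerHTML for one headline link (not copied from the live site)\n", "sample_html = (\n", "    '<a class=\"css-9mylee\" href=\"https://www.example.com/story.html\">'\n", "    '<h3>Example headline</h3><p>Example summary.</p><p>2 min read</p></a>'\n", ")\n", "\n", "soup = BeautifulSoup(sample_html, 'html.parser')\n", "a_tag = soup.find('a')\n", "\n", "print(a_tag['href'])          # the link stored in the href attribute\n", "print(a_tag.find('h3').text)  # only the headline, without the summary or read time\n", "```\n", "\n", "Selenium's `.text` returns all of the visible text inside the `a` tag at once, while BeautifulSoup lets us pick out a single child tag."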
], "metadata": { "id": "ir5wZsj3ck8t" } }, { "cell_type": "code", "source": [ "for pattern in patterns:\n", "    hls = wd.find_elements(By.XPATH, pattern)\n", "    for hl in hls:\n", "        if (len(hl.text) > 0):\n", "            print(hl.get_attribute('outerHTML')) ## this is the only line different from above\n", "            print('_____________________')" ], "metadata": { "id": "EcqCxbLbzAWH", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "b45ff7f0-9961-46d9-9d26-69c0a77e1333" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "

Ukraine Seeks a Breakthrough in the South but Faces Big Obstacles

The goal of Ukraine’s counteroffensive is to drive a wedge through Russian-occupied territory. But the execution has proved difficult.

See more headlines 9+

\n", "_____________________\n", "

President Vladimir Putin promised free grain to at least six African countries at a summit, hoping to shore up Russia’s image.

2 min read

\n", "_____________________\n" ] } ] }, { "cell_type": "markdown", "source": [ "Because we now have the raw HTML, we can call BeautifulSoup and sort out any issues that way.\n", "\n", "We are going to use the `href` value in each `a` tag to decide what type of headline we are looking for in that tag.\n", "* First, we don't want any of the games or podcast titles, so we can search the link for keywords that suggest it is one of these. Using the `in` operator and treating the link like any other string, we can filter out the results we don't want.\n", "* Once we have all of the headlines, we can search the link again to see if it is in the opinion section. If it is, then the author's name and the article's name will each be in an `h3` tag, and we can select the one we want accordingly.\n", "\n", "Note that for both of these operations, I needed a familiarity with the webpage beyond just what it looks like in my browser. I needed to experiment with what worked and follow the HTML of the site as closely as I could."
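, "\n", "\n", "The keyword filtering described above can be sketched on its own, without a browser. The helper name `is_article`, the keyword tuple, and the example links below are assumptions made for illustration:\n", "\n", "```python\n", "# substrings that mark links we do not want (mirrors the keywords used in this notebook)\n", "SKIP_KEYWORDS = ('tips', 'puzzles', 'crossword', 'games', 'podcasts', 'briefing')\n", "\n", "def is_article(link):\n", "    # the `in` operator does a plain substring test on the link\n", "    return not any(keyword in link for keyword in SKIP_KEYWORDS)\n", "\n", "# made-up links, used only to exercise the filter\n", "links = [\n", "    'https://www.example.com/2023/07/27/world/some-story.html',\n", "    'https://www.example.com/games/wordle',\n", "    'https://www.example.com/2023/07/27/opinion/some-essay.html',\n", "]\n", "\n", "for link in links:\n", "    section = 'opinion' if 'opinion' in link else 'news'\n", "    print(is_article(link), section, link, sep='\\t')\n", "```\n", "\n", "A substring test is blunt: a news story whose URL happens to contain `games` would be dropped too, so it is worth spot-checking what the filter removes."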
], "metadata": { "id": "AIehygtmdKC2" } }, { "cell_type": "code", "source": [ "from bs4 import BeautifulSoup\n", "for pattern in patterns:\n", "    hls = wd.find_elements(By.XPATH, pattern)\n", "    for hl in hls:\n", "        if (len(hl.text) > 0):\n", "            soup = BeautifulSoup(hl.get_attribute('outerHTML'), 'html.parser') ## name the parser explicitly to avoid a warning\n", "            link = soup.find('a')['href']\n", "            if soup.find('h3') is not None: ## filtering out any None objects, that is links with no h3 tags\n", "                if ('tips' not in link) and ('puzzles' not in link) and ('crossword' not in link) and ('games' not in link) and ('podcasts' not in link) and ('briefing' not in link):\n", "                    if ('opinion' in link):\n", "                        headline = soup.find_all('h3')[1].text\n", "                        print(link, headline)\n", "                    else:\n", "                        headline = soup.find('h3').text\n", "                        print(link, headline)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "sD1t6aeKdFUe", "outputId": "9cea8961-c0a1-46de-9111-0e3443d6aae9" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "https://www.nytimes.com/live/2023/07/27/world/russia-ukraine-news Ukraine Seeks a Breakthrough in the South but Faces Big Obstacles\n", "https://www.nytimes.com/2023/07/27/world/europe/putin-russia-africa-summit.html President Vladimir Putin promised free grain to at least six African countries at a summit, hoping to shore up Russia’s image.\n" ] } ] },
{ "cell_type": "code", "source": [ "## now let's move this to a dataframe\n", "import pandas as pd\n", "headline_dict = {}\n", "for pattern in patterns:\n", "    hls = wd.find_elements(By.XPATH, pattern)\n", "    for hl in hls:\n", "        if (len(hl.text) > 0):\n", "            soup = BeautifulSoup(hl.get_attribute('outerHTML'), 'html.parser')\n", "            link = soup.find('a')['href']\n", "            if soup.find('h3') is not None:\n", "                if ('tips' not in link) and ('puzzles' not in link) and ('crossword' not in link) and ('games' not in link) and ('podcasts' not in link) and ('briefing' not in link) and ('theathletic' not in link):\n", "                    if ('opinion' in link):\n", "                        headline = soup.find_all('h3')[1].text\n", "                        headline_dict[headline] = link\n", "                    else:\n", "                        headline = soup.find('h3').text\n", "                        headline_dict[headline] = link\n", "\n", "headline_df = pd.DataFrame.from_dict(headline_dict, orient='index').reset_index().rename(columns={'index':'headline', 0:'link'})" ], "metadata": { "id": "mDtwiufadi2w" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "headline_df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "3JOCqTsrgyqr", "outputId": "d45ce787-2658-4506-ec47-372e66c957b9" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " headline \\\n", "0 World Cup: Croatia vs. Japan \n", "1 Russia-Ukraine War \n", "2 Supreme Court Gay Rights Case \n", "3 Supreme Court Hears Case Pitting Gay Rights Ag... \n", "4 Blasts Reported at 2 Military Bases Deep Insid... \n", "5 Advice for Europeans: Bundle Up and Get Ready ... \n", "6 War and sanctions are threatening to thrust Ru... \n", "7 DealBook: Energy traders pushed crude prices h... \n", "8 Covid Protests in China Raise Hope for Solidar... \n", "9 China has stemmed the wave of mass protests, b... \n", "10 Chinese expatriates in the U.S. are elated but... \n", "11 Croatia vs. Japan: Former Finalist Takes On th... \n", "12 As the World Focuses on Soccer, a Women’s Team... \n", "13 Soccer may be a global game, but corner kicks ... \n", "14 Last Day of Campaigning Gets Underway in Georg... \n", "15 Senator Raphael Warnock’s time in Harlem as a ... \n", "16 Former President Trump’s call for the “termina... \n", "17 Twin Friends of Eric Adams Are Dogged by Alleg... \n", "18 Few details were made public of New York Mayor... \n", "19 The Crypto Industry Struggles for a Way Forward \n", "20 Iran Has Abolished Morality Police, an Officia... \n", "21 Skyrocketing Prices in Turkey Hurt Families an...
\n", "22 The Best Theater of 2022 \n", "23 The Best Comedy of 2022 \n", "24 One Dough, Six Cookies \n", "25 Here’s what we learned from Week 13 in the N.F.L. \n", "26 Giving a rescue animal as a gift is sweet, but... \n", "27 My Mother Has Two Sons: Me and a Squirrel \n", "28 If You Want to Give Something Back to Nature, ... \n", "29 The Big Thing Effective Altruism (Still) Gets ... \n", "30 How Wildlife Rescue Can Heal the Human Heart \n", "31 What Euthanasia Has Done to Canada \n", "32 Read the Well Newsletter \n", "33 Listen to the ‘Book Review’ Podcast \n", "34 Plan Tests Tense Relationship Between N.Y.P.D.... \n", "35 May ‘Bad Spaniels’ Mock Jack Daniel’s? The Sup... \n", "36 Popular Pastor Returns After Absence Over an ‘... \n", "37 North Carolina Power Outages Caused by Gunfire... \n", "38 Brussels Terrorist Attack Trial Opens, Revivin... \n", "39 Mysterious Object Emerges on a Florida Beach, ... \n", "40 Paul Pelosi Makes First Public Appearance Sinc... \n", "41 Bob McGrath, Longtime ‘Sesame Street’ Star, Di... \n", "42 Her Baby Needs Heart Surgery. But She Is Deman... \n", "43 Sudan Military and Pro-Democracy Coalition Sig... \n", "44 Courtroom Drama: New Legal Battle Over ‘To Kil... \n", "45 Review: ‘A Beautiful Noise’ \n", "46 Review: The Met’s Grand Old ‘Aida’ \n", "47 Are Solar Panels a Good Investment? \n", "48 The Many Layers of Lari Pittman \n", "49 The Political Winds Are Blowing. And Blowing. ... \n", "50 Everything Democrats Could Do if Warnock Wins \n", "51 The Supreme Court Is About to Ask the Wrong Qu... \n", "52 Biden Is Putting South Carolina First. I Won’t... \n", "53 The Man Who Neutered Trump \n", "54 Free to Be You and Me. Or Not. \n", "\n", " link \n", "0 https://www.nytimes.com/live/2022/12/05/sports... \n", "1 https://www.nytimes.com/live/2022/12/05/world/... \n", "2 https://www.nytimes.com/live/2022/12/05/us/sup... \n", "3 https://www.nytimes.com/live/2022/12/05/us/sup... \n", "4 https://www.nytimes.com/live/2022/12/05/world/... 
\n", "5 https://www.nytimes.com/2022/12/05/business/eu... \n", "6 https://www.nytimes.com/2022/12/05/world/europ... \n", "7 https://www.nytimes.com/2022/12/05/business/de... \n", "8 https://www.nytimes.com/2022/12/05/world/asia/... \n", "9 https://www.nytimes.com/2022/12/05/world/asia/... \n", "10 https://www.nytimes.com/2022/12/05/nyregion/ne... \n", "11 https://www.nytimes.com/live/2022/12/05/sports... \n", "12 https://www.nytimes.com/2022/12/03/sports/socc... \n", "13 https://www.nytimes.com/interactive/2022/12/05... \n", "14 https://www.nytimes.com/live/2022/12/04/us/war... \n", "15 https://www.nytimes.com/2022/12/04/us/politics... \n", "16 https://www.nytimes.com/2022/12/04/us/politics... \n", "17 https://www.nytimes.com/2022/12/05/nyregion/er... \n", "18 https://www.nytimes.com/2022/12/04/nyregion/er... \n", "19 https://www.nytimes.com/2022/12/05/technology/... \n", "20 https://www.nytimes.com/2022/12/04/world/middl... \n", "21 https://www.nytimes.com/2022/12/05/world/europ... \n", "22 https://www.nytimes.com/2022/12/05/theater/bes... \n", "23 https://www.nytimes.com/2022/12/05/arts/best-c... \n", "24 https://www.nytimes.com/interactive/2022/12/04... \n", "25 https://www.nytimes.com/2022/12/04/sports/foot... \n", "26 https://www.nytimes.com/2022/12/05/style/rescu... \n", "27 https://www.nytimes.com/2022/12/05/opinion/my-... \n", "28 https://www.nytimes.com/interactive/2022/12/05... \n", "29 https://www.nytimes.com/2022/12/04/opinion/cha... \n", "30 https://www.nytimes.com/2022/12/05/opinion/wil... \n", "31 https://www.nytimes.com/2022/12/03/opinion/can... \n", "32 https://www.nytimes.com/2022/12/01/well/holida... \n", "33 https://www.nytimes.com/2022/12/02/books/revie... \n", "34 https://www.nytimes.com/2022/12/05/nyregion/me... \n", "35 https://www.nytimes.com/2022/12/05/us/politics... \n", "36 https://www.nytimes.com/2022/12/04/us/matt-cha... \n", "37 https://www.nytimes.com/2022/12/04/us/power-ou... 
\n", "38 https://www.nytimes.com/2022/12/05/world/europ... \n", "39 https://www.nytimes.com/2022/12/05/us/mysterio... \n", "40 https://www.nytimes.com/2022/12/05/arts/music/... \n", "41 https://www.nytimes.com/2022/12/04/arts/televi... \n", "42 https://www.nytimes.com/2022/12/05/world/austr... \n", "43 https://www.nytimes.com/2022/12/05/world/afric... \n", "44 https://www.nytimes.com/2022/12/02/theater/to-... \n", "45 https://www.nytimes.com/2022/12/04/theater/a-b... \n", "46 https://www.nytimes.com/2022/12/04/arts/music/... \n", "47 https://www.nytimes.com/2022/11/26/realestate/... \n", "48 https://www.nytimes.com/2022/11/30/arts/design... \n", "49 https://www.nytimes.com/2022/12/05/opinion/bid... \n", "50 https://www.nytimes.com/2022/12/05/opinion/war... \n", "51 https://www.nytimes.com/2022/12/05/opinion/303... \n", "52 https://www.nytimes.com/2022/12/05/opinion/iow... \n", "53 https://www.nytimes.com/2022/12/04/opinion/bri... \n", "54 https://www.nytimes.com/2022/12/04/opinion/fre... " ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", "
headline | link
0 | World Cup: Croatia vs. Japan | https://www.nytimes.com/live/2022/12/05/sports...
1 | Russia-Ukraine War | https://www.nytimes.com/live/2022/12/05/world/...
2 | Supreme Court Gay Rights Case | https://www.nytimes.com/live/2022/12/05/us/sup...
3 | Supreme Court Hears Case Pitting Gay Rights Ag... | https://www.nytimes.com/live/2022/12/05/us/sup...
4 | Blasts Reported at 2 Military Bases Deep Insid... | https://www.nytimes.com/live/2022/12/05/world/...
5 | Advice for Europeans: Bundle Up and Get Ready ... | https://www.nytimes.com/2022/12/05/business/eu...
6 | War and sanctions are threatening to thrust Ru... | https://www.nytimes.com/2022/12/05/world/europ...
7 | DealBook: Energy traders pushed crude prices h... | https://www.nytimes.com/2022/12/05/business/de...
8 | Covid Protests in China Raise Hope for Solidar... | https://www.nytimes.com/2022/12/05/world/asia/...
9 | China has stemmed the wave of mass protests, b... | https://www.nytimes.com/2022/12/05/world/asia/...
10 | Chinese expatriates in the U.S. are elated but... | https://www.nytimes.com/2022/12/05/nyregion/ne...
11 | Croatia vs. Japan: Former Finalist Takes On th... | https://www.nytimes.com/live/2022/12/05/sports...
12 | As the World Focuses on Soccer, a Women’s Team... | https://www.nytimes.com/2022/12/03/sports/socc...
13 | Soccer may be a global game, but corner kicks ... | https://www.nytimes.com/interactive/2022/12/05...
14 | Last Day of Campaigning Gets Underway in Georg... | https://www.nytimes.com/live/2022/12/04/us/war...
15 | Senator Raphael Warnock’s time in Harlem as a ... | https://www.nytimes.com/2022/12/04/us/politics...
16 | Former President Trump’s call for the “termina... | https://www.nytimes.com/2022/12/04/us/politics...
17 | Twin Friends of Eric Adams Are Dogged by Alleg... | https://www.nytimes.com/2022/12/05/nyregion/er...
18 | Few details were made public of New York Mayor... | https://www.nytimes.com/2022/12/04/nyregion/er...
19 | The Crypto Industry Struggles for a Way Forward | https://www.nytimes.com/2022/12/05/technology/...
20 | Iran Has Abolished Morality Police, an Officia... | https://www.nytimes.com/2022/12/04/world/middl...
21 | Skyrocketing Prices in Turkey Hurt Families an... | https://www.nytimes.com/2022/12/05/world/europ...
22 | The Best Theater of 2022 | https://www.nytimes.com/2022/12/05/theater/bes...
23 | The Best Comedy of 2022 | https://www.nytimes.com/2022/12/05/arts/best-c...
24 | One Dough, Six Cookies | https://www.nytimes.com/interactive/2022/12/04...
25 | Here’s what we learned from Week 13 in the N.F.L. | https://www.nytimes.com/2022/12/04/sports/foot...
26 | Giving a rescue animal as a gift is sweet, but... | https://www.nytimes.com/2022/12/05/style/rescu...
27 | My Mother Has Two Sons: Me and a Squirrel | https://www.nytimes.com/2022/12/05/opinion/my-...
28 | If You Want to Give Something Back to Nature, ... | https://www.nytimes.com/interactive/2022/12/05...
29The Big Thing Effective Altruism (Still) Gets ...https://www.nytimes.com/2022/12/04/opinion/cha...
30How Wildlife Rescue Can Heal the Human Hearthttps://www.nytimes.com/2022/12/05/opinion/wil...
31What Euthanasia Has Done to Canadahttps://www.nytimes.com/2022/12/03/opinion/can...
32Read the Well Newsletterhttps://www.nytimes.com/2022/12/01/well/holida...
33Listen to the ‘Book Review’ Podcasthttps://www.nytimes.com/2022/12/02/books/revie...
34Plan Tests Tense Relationship Between N.Y.P.D....https://www.nytimes.com/2022/12/05/nyregion/me...
35May ‘Bad Spaniels’ Mock Jack Daniel’s? The Sup...https://www.nytimes.com/2022/12/05/us/politics...
36Popular Pastor Returns After Absence Over an ‘...https://www.nytimes.com/2022/12/04/us/matt-cha...
37North Carolina Power Outages Caused by Gunfire...https://www.nytimes.com/2022/12/04/us/power-ou...
38Brussels Terrorist Attack Trial Opens, Revivin...https://www.nytimes.com/2022/12/05/world/europ...
39Mysterious Object Emerges on a Florida Beach, ...https://www.nytimes.com/2022/12/05/us/mysterio...
40Paul Pelosi Makes First Public Appearance Sinc...https://www.nytimes.com/2022/12/05/arts/music/...
41Bob McGrath, Longtime ‘Sesame Street’ Star, Di...https://www.nytimes.com/2022/12/04/arts/televi...
42Her Baby Needs Heart Surgery. But She Is Deman...https://www.nytimes.com/2022/12/05/world/austr...
43Sudan Military and Pro-Democracy Coalition Sig...https://www.nytimes.com/2022/12/05/world/afric...
44Courtroom Drama: New Legal Battle Over ‘To Kil...https://www.nytimes.com/2022/12/02/theater/to-...
45Review: ‘A Beautiful Noise’https://www.nytimes.com/2022/12/04/theater/a-b...
46Review: The Met’s Grand Old ‘Aida’https://www.nytimes.com/2022/12/04/arts/music/...
47Are Solar Panels a Good Investment?https://www.nytimes.com/2022/11/26/realestate/...
48The Many Layers of Lari Pittmanhttps://www.nytimes.com/2022/11/30/arts/design...
49The Political Winds Are Blowing. And Blowing. ...https://www.nytimes.com/2022/12/05/opinion/bid...
50Everything Democrats Could Do if Warnock Winshttps://www.nytimes.com/2022/12/05/opinion/war...
51The Supreme Court Is About to Ask the Wrong Qu...https://www.nytimes.com/2022/12/05/opinion/303...
52Biden Is Putting South Carolina First. I Won’t...https://www.nytimes.com/2022/12/05/opinion/iow...
53The Man Who Neutered Trumphttps://www.nytimes.com/2022/12/04/opinion/bri...
54Free to Be You and Me. Or Not.https://www.nytimes.com/2022/12/04/opinion/fre...
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 35 } ] }, { "cell_type": "markdown", "source": [ "Now we have a really clean `DataFrame` that we can then use to get the text of each of these articles." ], "metadata": { "id": "JLdwWDyKyxp8" } }, { "cell_type": "code", "source": [ "# let's use our new Selenium skills to pull the dynamically generated text from the article\n", "a_link = headline_df['link'].iloc[9]\n", "\n", "# here, I am making a new webdriver object; we don't need to, but a separate driver can be helpful when you are working with more than one page\n", "article_wd = webdriver.Chrome('chromedriver', options=chrome_options)\n", "article_wd.get(a_link)" ], "metadata": { "id": "gwABBzn20B6Q" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# the paragraphs are predictably held in p tags\n", "for p in article_wd.find_elements(By.TAG_NAME, 'p'):\n", "    print(BeautifulSoup(p.get_attribute('outerHTML')).text)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "r8fzxaz40LJa", "outputId": "2697379a-3ab6-46a7-a3c3-cfa6a83324c7" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Advertisement\n", "Supported by\n", "Students, residents, lawyers and workers are still challenging the country’s Covid-19 restrictions, even though the intensity of the political chants has been dialed back.\n", "Send any friend a story\n", "As a subscriber, you have 10 gift articles to give each month. Anyone can read what you share.\n", "By Chang Che, Chris Buckley, Amy Chang Chien and Joy Dong\n", "In central China, students chanted demands for more transparency about Covid rules, while avoiding the bold slogans that riled the Communist Party a week earlier. In Shanghai, residents successfully negotiated with the local authorities to stop a lockdown of their neighborhood. 
And despite pressure from officials, a team of volunteer lawyers across China, committed to defending the right of citizens to voice their views, fielded anxious calls from protesters.\n", "The recent wave of demonstrations that washed over China was prompted by frustration about pandemic restrictions, but the unrest also sometimes resulted in calls for China’s leader, Xi Jinping, to resign. Since then, the police have been out in force to prevent a resurgence, and the mass protests have subsided. In the aftermath, a low-key hum of resistance against the authorities has persisted, suggesting that the big rallies emboldened a small but significant number of people, including students, professionals and blue-collar workers.\n", "None of those local acts amount to a major challenge to Mr. Xi and the Communist Party. But they suggest that residents are less afraid of challenging officialdom, albeit in more measured, tactical ways. They often invoke China’s own laws and policy pledges, an approach that is less likely to draw the wrath of Communist Party leaders.\n", "“There are people yelling out demands that are also my own, and I’m extremely grateful — grateful that they were able to speak out for me,” said Wang Shengsheng, a lawyer in Zhengzhou, central China. Ms. Wang helped compile a list of more than a dozen lawyers available to give free advice by phone to people in Shanghai and elsewhere worried about repercussions from taking part in vigils and protests.\n", "Advertisement\n", "“I’m sure that the number of people who expressed themselves this time, especially the youth, will later shape some policy changes,” she said. “I’m sure that the decision makers are not a monolithic lump of iron.”\n", "In late November, dozens of protests broke out across China, ignited by fury over a deadly fire in Urumqi, capital of the Xinjiang region in the west. 
The result was the boldest and most widespread demonstrations in China since the pro-democracy movement of 1989.\n", "The Urumqi government had firmly denied widespread rumors that the residents killed in the fire — 10 by the official count — had been trapped in their apartments by Covid restrictions. But many Chinese were unconvinced, and grief turned into wider anger at pervasive lockdowns, virus testing and limits on travel. At demonstrations in Shanghai, Beijing and other cities, some protesters called for Mr. Xi and the Communist Party to give up power.\n", "Since then, the Chinese government has taken a two-pronged approach: detaining some protesters and warning would-be protesters, and letting local governments abandon some of the Covid rules that have frustrated the population. Mr. Xi has not spoken publicly about the protests, and it is unclear how far the displays of dissent played into his decision to adjust policy. But plenty of Chinese people seem to believe that the nationwide defiance played a big role. They may now try to keep up pressure in smaller ways.\n", "Advertisement\n", "“I think what’s going to happen is people will coordinate, it will be low-level, it will look individualized and spontaneous, but there will be learning and discussion behind the scenes,” said Mary Gallagher, a professor at the University of Michigan who studies politics and social change in China.\n", "“That’s what you need to do in a politically repressive environment,” she said. “It’s really going to put pressure on the local governments not to lock down.”\n", "Despite China’s hulking authoritarian government, local protests are not uncommon. Before Covid, they often focused on government land seizures, pollution outbreaks and unpaid wages. Since the pandemic, outbursts of discontent have continued. But this renewed pattern of local unrest will test Mr. 
Xi’s government at a particularly delicate time as China seeks to ease Covid restrictions while trying to avoid an uncontrolled surge of infections.\n", " \n", "Hundreds of students at Wuhan University, in the city where the pandemic first took hold starting in late 2019, rallied on a recent rainy evening to call for changes to Covid policies, according to a video that has been verified by The New York Times. “An open process, transparent information,” they chanted while holding umbrellas over their heads.\n", "That relatively mild slogan appeared to be a considered move. A student at the university said that classmates were unhappy about the university’s plans to restore in-person teaching, which had upset their plans to go home for a break after months of living under restrictions. The student, who asked to be identified only by his surname, Wu, fearing repercussions, said that he had not attended the rally but had seen videos shared by classmates. He noted that none of the protesters had held pieces of white paper, which have become a symbol of defiance of the government.\n", "The school relented, allowing students to return home and choose between online and in-person classes, Mr. Wu said.\n", "Advertisement\n", "While some cities in China have begun to ease lockdown restrictions, not all local officials have followed suit. They remain under heavy pressure to contain outbreaks, even as more senior leaders want to appear sympathetic to public impatience.\n", "In a wealthy district of Shanghai on Sunday afternoon, security teams blocked the entry to an apartment complex after a local committee ordered a lockdown upon discovering a Covid case in one building.\n", "Angry residents soon confronted the guards, challenging the closure as unlawful. “You don’t have the right!” one woman is seen yelling repeatedly in a video posted on Twitter. Hours later, the police arrived and backed the residents. 
A neighborhood committee worker for the apartment complex told The New York Times that the lockdowns were lifted “after engagement and coordination.”\n", "In Wuhan over the weekend, residents in one neighborhood took matters into their own hands, pouring into the street after breaking down barriers that had held them in lockdown, as seen in a video posted on Twitter.\n", " \n", "With so much risk from taking part in protests, Chinese residents are using an older tactic: citing the central leaders’ words to push back against local officials. For centuries, disgruntled people have seized on central government edicts to make their case, often appealing to the idea — sincerely held or as a tactic — that a well-intentioned ruler in Beijing has been misled by corrupt or disloyal functionaries.\n", "“It’s this idea that you can use the central government’s words against local overreach,” Professor Gallagher said. “And it protects you, because the central government is supposed to be benevolent.”\n", "Advertisement\n", "Chinese are invoking the law to negotiate and push back against persisting pandemic restrictions. In areas that have failed to ease lockdowns, residents have pointed to the government’s move in early November to push for local authorities to take a more targeted approach in controlling Covid.\n", "The local confrontations in Shanghai and Wuhan point to the impatience of residents under lockdown who are more worried about paying mortgages, reviving battered businesses and getting children back to regular school.\n", "“We want to lift the lockdown, our kids need to go to school,” residents of an apartment complex in Wuxi, eastern China, shouted as they resisted a lockdown of their complex, a video posted on Twitter showed. “We need to make money to feed our families. We want to eat.”\n", " \n", "Members of the legal community have also stepped up to help raise residents’ awareness of their rights. 
As the authorities mobilized to detain protesters and search residents’ phones in recent days, often without clear justification, legal advice has circulated on the Chinese internet. One such article outlined citizens’ rights in the event that a police officer demands to search their phones.\n", "In that article, the author, who belongs to a Shanghai law firm, invokes the Chinese Constitution and concludes: “Arbitrary content checks of citizens’ cellphones are a serious infringement of citizens’ privacy and an abuse of public power.”\n", "Some of those who have been speaking out, however, continue to face greater pressures. Ms. Wang, 37, the lawyer who helped coordinate advice for worried protesters and their friends and families, said that she had received phone calls from local officials.\n", "Advertisement\n", "She said that she had decided to help the protesters and their families after seeing images circulate on Chinese social media of the vigil in Shanghai commemorating those killed in Urumqi. She had taken a couple of dozen calls, she said, including from people who had been detained and questioned and who wanted to know their rights.\n", "The Chinese authorities have over the past decade tried to silence rights attorneys by revoking their law licenses or by detaining and imprisoning them. But Ms. Wang said that she felt no reason to worry.\n", "“To my mind, I’m just providing a little bit of legal advice services” to people who took part in protests, she said.\n", "“How is it that if some people believe that they were in the wrong, then I’m also in the wrong simply by providing them legal advice?” she added. “That’s fundamentally against the idea of rule of law.”\n", "Advertisement\n", " Support independent journalism. \n" ] } ] }, { "cell_type": "markdown", "source": [ "Again, there's some text here that we don't want: mostly at the beginning, plus a few lines scattered throughout." 
], "metadata": { "id": "F3z1cwL-36LG" } }, { "cell_type": "code", "source": [ "for p in article_wd.find_elements(By.TAG_NAME, 'p'):\n", "    print(p.get_attribute('class'))" ], "metadata": { "id": "iwD7QWXF1_tC", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "0e97b0d8-6923-4138-fcfa-cc565c8ebe4f" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "\n", "css-1n0orw4 e1wiw3jv0\n", "css-6yj280\n", "css-4m7ryc\n", "css-4anu6l e1jsehar1\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "vhs-data\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "vhs-data\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "vhs-data\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "css-at9mc1 evys1bk0\n", "\n", "css-i0n974\n" ] } ] }, { "cell_type": "markdown", "source": [ "It seems that filtering on the `class` value `css-at9mc1 evys1bk0` will select all of the article's body paragraphs while leaving out the other text we don't want. (Class names like these are typically auto-generated, so they may differ when you run this notebook.)" 
], "metadata": { "id": "6bws9For4eTL" } }, { "cell_type": "code", "source": [ "paragraphs = []\n", "for p in article_wd.find_elements(By.TAG_NAME, 'p'):\n", " if p.get_attribute('class') == 'css-at9mc1 evys1bk0':\n", " paragraphs.append(BeautifulSoup(p.get_attribute('outerHTML')).text)\n", "paragraphs" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "RRpio68U4Xjz", "outputId": "f686b931-c801-4dd6-f2a3-7e03f0426b4c" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['In central China, students chanted demands for more transparency about Covid rules, while avoiding the bold slogans that riled the Communist Party a week earlier. In Shanghai, residents successfully negotiated with the local authorities to stop a lockdown of their neighborhood. And despite pressure from officials, a team of volunteer lawyers across China, committed to defending the right of citizens to voice their views, fielded anxious calls from protesters.',\n", " 'The recent wave of demonstrations that washed over China was prompted by frustration about pandemic restrictions, but the unrest also sometimes resulted in calls for China’s leader, Xi Jinping, to resign. Since then, the police have been out in force to prevent a resurgence, and the mass protests have subsided. In the aftermath, a low-key hum of resistance against the authorities has persisted, suggesting that the big rallies emboldened a small but significant number of people, including students, professionals and blue-collar workers.',\n", " 'None of those local acts amount to a major challenge to Mr. Xi and the Communist Party. But they suggest that residents are less afraid of challenging officialdom, albeit in more measured, tactical ways. 
They often invoke China’s own laws and policy pledges, an approach that is less likely to draw the wrath of Communist Party leaders.',\n", " '“There are people yelling out demands that are also my own, and I’m extremely grateful — grateful that they were able to speak out for me,” said Wang Shengsheng, a lawyer in Zhengzhou, central China. Ms. Wang helped compile a list of more than a dozen lawyers available to give free advice by phone to people in Shanghai and elsewhere worried about repercussions from taking part in vigils and protests.',\n", " '“I’m sure that the number of people who expressed themselves this time, especially the youth, will later shape some policy changes,” she said. “I’m sure that the decision makers are not a monolithic lump of iron.”',\n", " 'In late November, dozens of protests broke out across China, ignited by fury over a deadly fire in Urumqi, capital of the Xinjiang region in the west. The result was the boldest and most widespread demonstrations in China since the pro-democracy movement of 1989.',\n", " 'The Urumqi government had firmly denied widespread rumors that the residents killed in the fire — 10 by the official count — had been trapped in their apartments by Covid restrictions. But many Chinese were unconvinced, and grief turned into wider anger at pervasive lockdowns, virus testing and limits on travel. At demonstrations in Shanghai, Beijing and other cities, some protesters called for Mr. Xi and the Communist Party to give up power.',\n", " 'Since then, the Chinese government has taken a two-pronged approach: detaining some protesters and warning would-be protesters, and letting local governments abandon some of the Covid rules that have frustrated the population. Mr. Xi has not spoken publicly about the protests, and it is unclear how far the displays of dissent played into his decision to adjust policy. But plenty of Chinese people seem to believe that the nationwide defiance played a big role. 
They may now try to keep up pressure in smaller ways.',\n", " '“I think what’s going to happen is people will coordinate, it will be low-level, it will look individualized and spontaneous, but there will be learning and discussion behind the scenes,” said Mary Gallagher, a professor at the University of Michigan who studies politics and social change in China.',\n", " '“That’s what you need to do in a politically repressive environment,” she said. “It’s really going to put pressure on the local governments not to lock down.”',\n", " 'Despite China’s hulking authoritarian government, local protests are not uncommon. Before Covid, they often focused on government land seizures, pollution outbreaks and unpaid wages. Since the pandemic, outbursts of discontent have continued. But this renewed pattern of local unrest will test Mr. Xi’s government at a particularly delicate time as China seeks to ease Covid restrictions while trying to avoid an uncontrolled surge of infections.',\n", " 'Hundreds of students at Wuhan University, in the city where the pandemic first took hold starting in late 2019, rallied on a recent rainy evening to call for changes to Covid policies, according to a video that has been verified by The New York Times. “An open process, transparent information,” they chanted while holding umbrellas over their heads.',\n", " 'That relatively mild slogan appeared to be a considered move. A student at the university said that classmates were unhappy about the university’s plans to restore in-person teaching, which had upset their plans to go home for a break after months of living under restrictions. The student, who asked to be identified only by his surname, Wu, fearing repercussions, said that he had not attended the rally but had seen videos shared by classmates. 
He noted that none of the protesters had held pieces of white paper, which have become a symbol of defiance of the government.',\n", " 'The school relented, allowing students to return home and choose between online and in-person classes, Mr. Wu said.',\n", " 'While some cities in China have begun to ease lockdown restrictions, not all local officials have followed suit. They remain under heavy pressure to contain outbreaks, even as more senior leaders want to appear sympathetic to public impatience.',\n", " 'In a wealthy district of Shanghai on Sunday afternoon, security teams blocked the entry to an apartment complex after a local committee ordered a lockdown upon discovering a Covid case in one building.',\n", " 'Angry residents soon confronted the guards, challenging the closure as unlawful. “You don’t have the right!” one woman is seen yelling repeatedly in a video posted on Twitter. Hours later, the police arrived and backed the residents. A neighborhood committee worker for the apartment complex told The New York Times that the lockdowns were lifted “after engagement and coordination.”',\n", " 'In Wuhan over the weekend, residents in one neighborhood took matters into their own hands, pouring into the street after breaking down barriers that had held them in lockdown, as seen in a video posted on Twitter.',\n", " 'With so much risk from taking part in protests, Chinese residents are using an older tactic: citing the central leaders’ words to push back against local officials. For centuries, disgruntled people have seized on central government edicts to make their case, often appealing to the idea — sincerely held or as a tactic — that a well-intentioned ruler in Beijing has been misled by corrupt or disloyal functionaries.',\n", " '“It’s this idea that you can use the central government’s words against local overreach,” Professor Gallagher said. 
“And it protects you, because the central government is supposed to be benevolent.”',\n", " 'Chinese are invoking the law to negotiate and push back against persisting pandemic restrictions. In areas that have failed to ease lockdowns, residents have pointed to the government’s move in early November to push for local authorities to take a more targeted approach in controlling Covid.',\n", " 'The local confrontations in Shanghai and Wuhan point to the impatience of residents under lockdown who are more worried about paying mortgages, reviving battered businesses and getting children back to regular school.',\n", " '“We want to lift the lockdown, our kids need to go to school,” residents of an apartment complex in Wuxi, eastern China, shouted as they resisted a lockdown of their complex, a video posted on Twitter showed. “We need to make money to feed our families. We want to eat.”',\n", " 'Members of the legal community have also stepped up to help raise residents’ awareness of their rights. As the authorities mobilized to detain protesters and search residents’ phones in recent days, often without clear justification, legal advice has circulated on the Chinese internet. One such article outlined citizens’ rights in the event that a police officer demands to search their phones.',\n", " 'In that article, the author, who belongs to a Shanghai law firm, invokes the Chinese Constitution and concludes: “Arbitrary content checks of citizens’ cellphones are a serious infringement of citizens’ privacy and an abuse of public power.”',\n", " 'Some of those who have been speaking out, however, continue to face greater pressures. Ms. 
Wang, 37, the lawyer who helped coordinate advice for worried protesters and their friends and families, said that she had received phone calls from local officials.',\n", " 'She said that she had decided to help the protesters and their families after seeing images circulate on Chinese social media of the vigil in Shanghai commemorating those killed in Urumqi. She had taken a couple of dozen calls, she said, including from people who had been detained and questioned and who wanted to know their rights.',\n", " 'The Chinese authorities have over the past decade tried to silence rights attorneys by revoking their law licenses or by detaining and imprisoning them. But Ms. Wang said that she felt no reason to worry.',\n", " '“To my mind, I’m just providing a little bit of legal advice services” to people who took part in protests, she said.',\n", " '“How is it that if some people believe that they were in the wrong, then I’m also in the wrong simply by providing them legal advice?” she added. “That’s fundamentally against the idea of rule of law.”']" ] }, "metadata": {}, "execution_count": 39 } ] }, { "cell_type": "code", "source": [ "headline_df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "OWl1SGBn5hcb", "outputId": "5180d332-1d46-471b-9bfe-6b4283927a9a" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " headline \\\n", "0 World Cup: Croatia vs. Japan \n", "1 Russia-Ukraine War \n", "2 Supreme Court Gay Rights Case \n", "3 Supreme Court Hears Case Pitting Gay Rights Ag... \n", "4 Blasts Reported at 2 Military Bases Deep Insid... \n", "5 Advice for Europeans: Bundle Up and Get Ready ... \n", "6 War and sanctions are threatening to thrust Ru... \n", "7 DealBook: Energy traders pushed crude prices h... \n", "8 Covid Protests in China Raise Hope for Solidar... \n", "9 China has stemmed the wave of mass protests, b... \n", "10 Chinese expatriates in the U.S. 
are elated but... \n", "11 Croatia vs. Japan: Former Finalist Takes On th... \n", "12 As the World Focuses on Soccer, a Women’s Team... \n", "13 Soccer may be a global game, but corner kicks ... \n", "14 Last Day of Campaigning Gets Underway in Georg... \n", "15 Senator Raphael Warnock’s time in Harlem as a ... \n", "16 Former President Trump’s call for the “termina... \n", "17 Twin Friends of Eric Adams Are Dogged by Alleg... \n", "18 Few details were made public of New York Mayor... \n", "19 The Crypto Industry Struggles for a Way Forward \n", "20 Iran Has Abolished Morality Police, an Officia... \n", "21 Skyrocketing Prices in Turkey Hurt Families an... \n", "22 The Best Theater of 2022 \n", "23 The Best Comedy of 2022 \n", "24 One Dough, Six Cookies \n", "25 Here’s what we learned from Week 13 in the N.F.L. \n", "26 Giving a rescue animal as a gift is sweet, but... \n", "27 My Mother Has Two Sons: Me and a Squirrel \n", "28 If You Want to Give Something Back to Nature, ... \n", "29 The Big Thing Effective Altruism (Still) Gets ... \n", "30 How Wildlife Rescue Can Heal the Human Heart \n", "31 What Euthanasia Has Done to Canada \n", "32 Read the Well Newsletter \n", "33 Listen to the ‘Book Review’ Podcast \n", "34 Plan Tests Tense Relationship Between N.Y.P.D.... \n", "35 May ‘Bad Spaniels’ Mock Jack Daniel’s? The Sup... \n", "36 Popular Pastor Returns After Absence Over an ‘... \n", "37 North Carolina Power Outages Caused by Gunfire... \n", "38 Brussels Terrorist Attack Trial Opens, Revivin... \n", "39 Mysterious Object Emerges on a Florida Beach, ... \n", "40 Paul Pelosi Makes First Public Appearance Sinc... \n", "41 Bob McGrath, Longtime ‘Sesame Street’ Star, Di... \n", "42 Her Baby Needs Heart Surgery. But She Is Deman... \n", "43 Sudan Military and Pro-Democracy Coalition Sig... \n", "44 Courtroom Drama: New Legal Battle Over ‘To Kil... 
\n", "45 Review: ‘A Beautiful Noise’ \n", "46 Review: The Met’s Grand Old ‘Aida’ \n", "47 Are Solar Panels a Good Investment? \n", "48 The Many Layers of Lari Pittman \n", "49 The Political Winds Are Blowing. And Blowing. ... \n", "50 Everything Democrats Could Do if Warnock Wins \n", "51 The Supreme Court Is About to Ask the Wrong Qu... \n", "52 Biden Is Putting South Carolina First. I Won’t... \n", "53 The Man Who Neutered Trump \n", "54 Free to Be You and Me. Or Not. \n", "\n", " link \n", "0 https://www.nytimes.com/live/2022/12/05/sports... \n", "1 https://www.nytimes.com/live/2022/12/05/world/... \n", "2 https://www.nytimes.com/live/2022/12/05/us/sup... \n", "3 https://www.nytimes.com/live/2022/12/05/us/sup... \n", "4 https://www.nytimes.com/live/2022/12/05/world/... \n", "5 https://www.nytimes.com/2022/12/05/business/eu... \n", "6 https://www.nytimes.com/2022/12/05/world/europ... \n", "7 https://www.nytimes.com/2022/12/05/business/de... \n", "8 https://www.nytimes.com/2022/12/05/world/asia/... \n", "9 https://www.nytimes.com/2022/12/05/world/asia/... \n", "10 https://www.nytimes.com/2022/12/05/nyregion/ne... \n", "11 https://www.nytimes.com/live/2022/12/05/sports... \n", "12 https://www.nytimes.com/2022/12/03/sports/socc... \n", "13 https://www.nytimes.com/interactive/2022/12/05... \n", "14 https://www.nytimes.com/live/2022/12/04/us/war... \n", "15 https://www.nytimes.com/2022/12/04/us/politics... \n", "16 https://www.nytimes.com/2022/12/04/us/politics... \n", "17 https://www.nytimes.com/2022/12/05/nyregion/er... \n", "18 https://www.nytimes.com/2022/12/04/nyregion/er... \n", "19 https://www.nytimes.com/2022/12/05/technology/... \n", "20 https://www.nytimes.com/2022/12/04/world/middl... \n", "21 https://www.nytimes.com/2022/12/05/world/europ... \n", "22 https://www.nytimes.com/2022/12/05/theater/bes... \n", "23 https://www.nytimes.com/2022/12/05/arts/best-c... \n", "24 https://www.nytimes.com/interactive/2022/12/04... 
\n", "25 https://www.nytimes.com/2022/12/04/sports/foot... \n", "26 https://www.nytimes.com/2022/12/05/style/rescu... \n", "27 https://www.nytimes.com/2022/12/05/opinion/my-... \n", "28 https://www.nytimes.com/interactive/2022/12/05... \n", "29 https://www.nytimes.com/2022/12/04/opinion/cha... \n", "30 https://www.nytimes.com/2022/12/05/opinion/wil... \n", "31 https://www.nytimes.com/2022/12/03/opinion/can... \n", "32 https://www.nytimes.com/2022/12/01/well/holida... \n", "33 https://www.nytimes.com/2022/12/02/books/revie... \n", "34 https://www.nytimes.com/2022/12/05/nyregion/me... \n", "35 https://www.nytimes.com/2022/12/05/us/politics... \n", "36 https://www.nytimes.com/2022/12/04/us/matt-cha... \n", "37 https://www.nytimes.com/2022/12/04/us/power-ou... \n", "38 https://www.nytimes.com/2022/12/05/world/europ... \n", "39 https://www.nytimes.com/2022/12/05/us/mysterio... \n", "40 https://www.nytimes.com/2022/12/05/arts/music/... \n", "41 https://www.nytimes.com/2022/12/04/arts/televi... \n", "42 https://www.nytimes.com/2022/12/05/world/austr... \n", "43 https://www.nytimes.com/2022/12/05/world/afric... \n", "44 https://www.nytimes.com/2022/12/02/theater/to-... \n", "45 https://www.nytimes.com/2022/12/04/theater/a-b... \n", "46 https://www.nytimes.com/2022/12/04/arts/music/... \n", "47 https://www.nytimes.com/2022/11/26/realestate/... \n", "48 https://www.nytimes.com/2022/11/30/arts/design... \n", "49 https://www.nytimes.com/2022/12/05/opinion/bid... \n", "50 https://www.nytimes.com/2022/12/05/opinion/war... \n", "51 https://www.nytimes.com/2022/12/05/opinion/303... \n", "52 https://www.nytimes.com/2022/12/05/opinion/iow... \n", "53 https://www.nytimes.com/2022/12/04/opinion/bri... \n", "54 https://www.nytimes.com/2022/12/04/opinion/fre... " ], "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "execution_count": 40 } ] }, { "cell_type": "code", "source": [ "# now let's formalize this into a function and call it on the whole link column using apply\n", "# this cell takes about 15 minutes to run (roughly 17 seconds per article), because a new browser is started and a page is loaded for every article\n", "import time\n", "from tqdm import tqdm\n", "tqdm.pandas()\n", "\n", "def getParagraphs(link):\n", "    paragraphs = []\n", "\n", "    # a fresh browser is started for every article, so it has to be closed again when we are done\n", "    article_wd = webdriver.Chrome('chromedriver',options=chrome_options)\n", "    try:\n", "        time.sleep(2)\n", "        article_wd.get(link)\n", "        for p in article_wd.find_elements(By.TAG_NAME, 'p'):\n", "            # keep only the paragraphs that carry the article-body class\n", "            if p.get_attribute('class') == 'css-at9mc1 evys1bk0':\n", "                paragraphs.append(BeautifulSoup(p.get_attribute('outerHTML'), 'html.parser').text)\n", "    finally:\n", "        article_wd.quit() # closes the browser even if the page fails to load\n", "    return ' '.join(paragraphs).strip() # joins the list of paragraphs into a single string\n", "\n", "headline_df['article_text'] = headline_df['link'].progress_apply(getParagraphs)" ], "metadata": { "id": "YGQFKfiG5Er9", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "3d95f99a-cfd1-4476-b359-c729c7a6edcd" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "100%|██████████| 55/55 [15:43<00:00, 17.15s/it]\n" ] } ] }, { "cell_type": "code", "source": [ "headline_df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "jv3b-D6mSkO3", "outputId": "f6a88ebb-2765-4d76-b1f9-da2106102230" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " headline \\\n", "0 World Cup: Croatia vs. Japan \n", "1 Russia-Ukraine War \n", "2 Supreme Court Gay Rights Case \n", "3 Supreme Court Hears Case Pitting Gay Rights Ag... \n", "4 Blasts Reported at 2 Military Bases Deep Insid... \n", "5 Advice for Europeans: Bundle Up and Get Ready ... \n", "6 War and sanctions are threatening to thrust Ru... \n", "7 DealBook: Energy traders pushed crude prices h... 
\n", "8 Covid Protests in China Raise Hope for Solidar... \n", "9 China has stemmed the wave of mass protests, b... \n", "10 Chinese expatriates in the U.S. are elated but... \n", "11 Croatia vs. Japan: Former Finalist Takes On th... \n", "12 As the World Focuses on Soccer, a Women’s Team... \n", "13 Soccer may be a global game, but corner kicks ... \n", "14 Last Day of Campaigning Gets Underway in Georg... \n", "15 Senator Raphael Warnock’s time in Harlem as a ... \n", "16 Former President Trump’s call for the “termina... \n", "17 Twin Friends of Eric Adams Are Dogged by Alleg... \n", "18 Few details were made public of New York Mayor... \n", "19 The Crypto Industry Struggles for a Way Forward \n", "20 Iran Has Abolished Morality Police, an Officia... \n", "21 Skyrocketing Prices in Turkey Hurt Families an... \n", "22 The Best Theater of 2022 \n", "23 The Best Comedy of 2022 \n", "24 One Dough, Six Cookies \n", "25 Here’s what we learned from Week 13 in the N.F.L. \n", "26 Giving a rescue animal as a gift is sweet, but... \n", "27 My Mother Has Two Sons: Me and a Squirrel \n", "28 If You Want to Give Something Back to Nature, ... \n", "29 The Big Thing Effective Altruism (Still) Gets ... \n", "30 How Wildlife Rescue Can Heal the Human Heart \n", "31 What Euthanasia Has Done to Canada \n", "32 Read the Well Newsletter \n", "33 Listen to the ‘Book Review’ Podcast \n", "34 Plan Tests Tense Relationship Between N.Y.P.D.... \n", "35 May ‘Bad Spaniels’ Mock Jack Daniel’s? The Sup... \n", "36 Popular Pastor Returns After Absence Over an ‘... \n", "37 North Carolina Power Outages Caused by Gunfire... \n", "38 Brussels Terrorist Attack Trial Opens, Revivin... \n", "39 Mysterious Object Emerges on a Florida Beach, ... \n", "40 Paul Pelosi Makes First Public Appearance Sinc... \n", "41 Bob McGrath, Longtime ‘Sesame Street’ Star, Di... \n", "42 Her Baby Needs Heart Surgery. But She Is Deman... \n", "43 Sudan Military and Pro-Democracy Coalition Sig... 
\n", "44 Courtroom Drama: New Legal Battle Over ‘To Kil... \n", "45 Review: ‘A Beautiful Noise’ \n", "46 Review: The Met’s Grand Old ‘Aida’ \n", "47 Are Solar Panels a Good Investment? \n", "48 The Many Layers of Lari Pittman \n", "49 The Political Winds Are Blowing. And Blowing. ... \n", "50 Everything Democrats Could Do if Warnock Wins \n", "51 The Supreme Court Is About to Ask the Wrong Qu... \n", "52 Biden Is Putting South Carolina First. I Won’t... \n", "53 The Man Who Neutered Trump \n", "54 Free to Be You and Me. Or Not. \n", "\n", " link \\\n", "0 https://www.nytimes.com/live/2022/12/05/sports... \n", "1 https://www.nytimes.com/live/2022/12/05/world/... \n", "2 https://www.nytimes.com/live/2022/12/05/us/sup... \n", "3 https://www.nytimes.com/live/2022/12/05/us/sup... \n", "4 https://www.nytimes.com/live/2022/12/05/world/... \n", "5 https://www.nytimes.com/2022/12/05/business/eu... \n", "6 https://www.nytimes.com/2022/12/05/world/europ... \n", "7 https://www.nytimes.com/2022/12/05/business/de... \n", "8 https://www.nytimes.com/2022/12/05/world/asia/... \n", "9 https://www.nytimes.com/2022/12/05/world/asia/... \n", "10 https://www.nytimes.com/2022/12/05/nyregion/ne... \n", "11 https://www.nytimes.com/live/2022/12/05/sports... \n", "12 https://www.nytimes.com/2022/12/03/sports/socc... \n", "13 https://www.nytimes.com/interactive/2022/12/05... \n", "14 https://www.nytimes.com/live/2022/12/04/us/war... \n", "15 https://www.nytimes.com/2022/12/04/us/politics... \n", "16 https://www.nytimes.com/2022/12/04/us/politics... \n", "17 https://www.nytimes.com/2022/12/05/nyregion/er... \n", "18 https://www.nytimes.com/2022/12/04/nyregion/er... \n", "19 https://www.nytimes.com/2022/12/05/technology/... \n", "20 https://www.nytimes.com/2022/12/04/world/middl... \n", "21 https://www.nytimes.com/2022/12/05/world/europ... \n", "22 https://www.nytimes.com/2022/12/05/theater/bes... \n", "23 https://www.nytimes.com/2022/12/05/arts/best-c... 
\n", "24 https://www.nytimes.com/interactive/2022/12/04... \n", "25 https://www.nytimes.com/2022/12/04/sports/foot... \n", "26 https://www.nytimes.com/2022/12/05/style/rescu... \n", "27 https://www.nytimes.com/2022/12/05/opinion/my-... \n", "28 https://www.nytimes.com/interactive/2022/12/05... \n", "29 https://www.nytimes.com/2022/12/04/opinion/cha... \n", "30 https://www.nytimes.com/2022/12/05/opinion/wil... \n", "31 https://www.nytimes.com/2022/12/03/opinion/can... \n", "32 https://www.nytimes.com/2022/12/01/well/holida... \n", "33 https://www.nytimes.com/2022/12/02/books/revie... \n", "34 https://www.nytimes.com/2022/12/05/nyregion/me... \n", "35 https://www.nytimes.com/2022/12/05/us/politics... \n", "36 https://www.nytimes.com/2022/12/04/us/matt-cha... \n", "37 https://www.nytimes.com/2022/12/04/us/power-ou... \n", "38 https://www.nytimes.com/2022/12/05/world/europ... \n", "39 https://www.nytimes.com/2022/12/05/us/mysterio... \n", "40 https://www.nytimes.com/2022/12/05/arts/music/... \n", "41 https://www.nytimes.com/2022/12/04/arts/televi... \n", "42 https://www.nytimes.com/2022/12/05/world/austr... \n", "43 https://www.nytimes.com/2022/12/05/world/afric... \n", "44 https://www.nytimes.com/2022/12/02/theater/to-... \n", "45 https://www.nytimes.com/2022/12/04/theater/a-b... \n", "46 https://www.nytimes.com/2022/12/04/arts/music/... \n", "47 https://www.nytimes.com/2022/11/26/realestate/... \n", "48 https://www.nytimes.com/2022/11/30/arts/design... \n", "49 https://www.nytimes.com/2022/12/05/opinion/bid... \n", "50 https://www.nytimes.com/2022/12/05/opinion/war... \n", "51 https://www.nytimes.com/2022/12/05/opinion/303... \n", "52 https://www.nytimes.com/2022/12/05/opinion/iow... \n", "53 https://www.nytimes.com/2022/12/04/opinion/bri... \n", "54 https://www.nytimes.com/2022/12/04/opinion/fre... \n", "\n", " article_text \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 Life in some European cities may soon look lik... 
\n", "6 KALUGA, Russia— Valery Volodin, a welder at a ... \n", "7 Crude oil prices rose this morning after a whi... \n", "8 Three years ago in Melbourne, Australia, Ronni... \n", "9 In central China, students chanted demands for... \n", "10 Huanjie Li, 26, has never been more worried ab... \n", "11 \n", "12 When the Afghanistan women’s national soccer t... \n", "13 \n", "14 \n", "15 Four days before the November midterm election... \n", "16 An extraordinary antidemocratic statement from... \n", "17 Vadim Shubaderov, a 35-year-old businessman, t... \n", "18 He met with heads of state. He visited the Acr... \n", "19 Not long after several Wall Street banks colla... \n", "20 A senior Iranian official said this weekend th... \n", "21 ISTANBUL — As Turkey’s annual inflation rate h... \n", "22 Musicals are exceedingly difficult; they not o... \n", "23 This year was a “best of times, worst of times... \n", "24 \n", "25 The Bengals completed a rare trifecta — three ... \n", "26 You know the scene: A child ambles downstairs ... \n", "27 During the summer of 2020, I returned home to ... \n", "28 \n", "29 This article is part of Times Opinion’s Holida... \n", "30 NASHVILLE — As a young college student, I work... \n", "31 La Maison Simons, commonly known as Simons, is... \n", "32 Next week, Well’s new columnist, Jancee Dunn, ... \n", "33 Heads up! The Book Review podcast returns with... \n", "34 Each day, New York City’s 911 system is inunda... \n", "35 WASHINGTON — The next frontier in First Amendm... \n", "36 FLOWER MOUND, Texas — The prominent pastor of ... \n", "37 A county in central North Carolina where about... \n", "38 BRUSSELS — The mammoth trial against 10 men ac... \n", "39 There’s something protruding through the sand ... \n", "40 WASHINGTON — It is not easy to upstage the sta... \n", "41 Bob McGrath, who played the sweater-clad neigh... \n", "42 A New Zealand couple is refusing to allow thei... \n", "43 NAIROBI, Kenya — Sudan’s military and a coalit... 
\n", "44 In 2019, the producers of a Broadway adaptatio... \n", "45 For decades, Neil Diamond was on top of the wo... \n", "46 “I’m not dead!” a decrepit old man croaks as h... \n", "47 Q: We were considering installing solar panels... \n", "48 Lari Pittman’s career as a painter has been as... \n", "49 Gail Collins: Bret, I think we’ve got good fig... \n", "50 Nearly two years ago, Raphael Warnock and Jon ... \n", "51 Can an artist be compelled to create a website... \n", "52 President Biden’s three-part plan to reform th... \n", "53 It’s not surprising that Gov. Brian Kemp of Ge... \n", "54 If you grew up in any remotely liberal enclave... " ], "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "execution_count": 42 } ] }, { "cell_type": "markdown", "source": [ "## Application\n", "Let's see what we can do with this scraped data. This is only a quick demonstration of the kind of analysis the data supports now that it is in this form. It recaps material from the `Gentle Introduction to NLP` workshop." ], "metadata": { "id": "1ZWs0uYoXUV6" } }, { "cell_type": "code", "source": [ "import nltk\n", "nltk.download('punkt')\n", "nltk.download('stopwords')\n", "\n", "# sentence-level tokenization\n", "headline_df['sents'] = headline_df['article_text'].apply(nltk.sent_tokenize)\n", "headline_sents = headline_df.explode('sents').drop(['link','article_text'],axis=1).reset_index(drop=True).dropna()\n", "\n", "# word-level tokenization\n", "headline_sents['words'] = headline_sents['sents'].apply(nltk.word_tokenize)" ], "metadata": { "id": "65VANuoE5s0e", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "9f86a46d-470b-48a6-8da0-5196b1b4468b" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "[nltk_data] Downloading package punkt to /root/nltk_data...\n", "[nltk_data] Package punkt is already up-to-date!\n", "[nltk_data] Downloading package stopwords to /root/nltk_data...\n", "[nltk_data] Package stopwords is already up-to-date!\n" ] } ] }, { "cell_type": "code", "source": [ "# stopwords are very common words that we don't generally care about in statistical analysis\n", "# sometimes, though, you may care about them...\n", "# this stops object is a normal Python list, so you can add to it or remove from it as needed\n", "from nltk.corpus import stopwords\n", "stops = list(stopwords.words('english'))\n", "# i'll add some that are relevant here\n", "stops.append('said') # very common in journalism\n", "stops.append('would') # these two don't come pre-loaded though they should...\n", "stops.append('could')\n", "print(stops)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/"
}, "id": "UTZFMg4bgqRn", "outputId": "d24e0a74-5ce2-4296-b295-0444446639ba" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', \"you're\", \"you've\", \"you'll\", \"you'd\", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', \"she's\", 'her', 'hers', 'herself', 'it', \"it's\", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', \"that'll\", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', \"don't\", 'should', \"should've\", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', \"aren't\", 'couldn', \"couldn't\", 'didn', \"didn't\", 'doesn', \"doesn't\", 'hadn', \"hadn't\", 'hasn', \"hasn't\", 'haven', \"haven't\", 'isn', \"isn't\", 'ma', 'mightn', \"mightn't\", 'mustn', \"mustn't\", 'needn', \"needn't\", 'shan', \"shan't\", 'shouldn', \"shouldn't\", 'wasn', \"wasn't\", 'weren', \"weren't\", 'won', \"won't\", 'wouldn', \"wouldn't\", 'said', 'would', 'could']\n" ] } ] }, { "cell_type": "code", "source": [ "# now let's add all of the punctuation to the stops list\n", "import string\n", "for punct in string.punctuation:\n", " stops.append(punct)\n", "\n", "# nyt uses these special characters\n", "stops.append('’')\n", 
"stops.append('“')\n", "stops.append('”')\n", "stops.append('—')" ], "metadata": { "id": "UKOksNqmm5fe" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# simplest way to do word frequency... count the words!\n", "import collections\n", "word_count = collections.defaultdict(int) # creates a dictionary whose missing keys default to 0, so each word's count can be incremented every time the word appears\n", "words = [word.lower() for sent in headline_sents.words.to_list() for word in sent] # all words taken from the DataFrame in a single list\n", "\n", "for word in words:\n", "    if word not in stops:\n", "        word_count[word] += 1" ], "metadata": { "id": "o8O3Rem8j7sT" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "count_df = pd.DataFrame.from_dict(dict(sorted(word_count.items(), key=lambda item: item[1], reverse=True)), orient='index').reset_index().rename(columns={'index':'word',0:'count'})\n", "count_df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "1nk2pf9HlaH-", "outputId": "70cc0db4-3ed4-4aec-a4bb-bd9daae53b5d" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " word count\n", "0 mr. 358\n", "1 new 147\n", "2 people 138\n", "3 one 134\n", "4 also 87\n", "... ... ...\n", "9286 npr 1\n", "9287 subsequent 1\n", "9288 dated. 1\n", "9289 spotify 1\n", "9290 winding 1\n", "\n", "[9291 rows x 2 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
      word        count
0     mr.         358
1     new         147
2     people      138
3     one         134
4     also        87
...   ...         ...
9286  npr         1
9287  subsequent  1
9288  dated.      1
9289  spotify     1
9290  winding     1
\n", "

9291 rows × 2 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 47 } ] }, { "cell_type": "code", "source": [ "# let's look at the top 50 words by frequency\n", "count_df[:50].plot.barh(x='word',y='count', figsize=(15,10))" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Do3rd93Xlda6", "outputId": "34fa27f3-e1d4-4e02-c603-dea0dd256b7e" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 48 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAA6UAAAI/CAYAAACPq3/XAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzde5idZXn3/e/PiMQQGCqkvhhqB5WKQiCQBRUhFBB9aatVWnh8BDdga2rRUuuDT/NW6r4Va2sttFJTBXfUIu6gRlGqIBC2K+yGyMZDnSrBWog4AjEI4Xz/WHdknMxkO5OVWev7OY4c695c13Wf9xz55zyu6z6vVBWSJEmSJHXDE7odgCRJkiSpf5mUSpIkSZK6xqRUkiRJktQ1JqWSJEmSpK4xKZUkSZIkdY1JqSRJkiSpa57Y7QD6we67716Dg4PdDkOSJEmSumL58uX3VdWc8e6ZlG4Dg4ODtNvtbochSZIkSV2R5L8muufyXUmSJElS1zhTOo4kAVJVj03GeEMrRxhcvHQyhupLw2f+brdDkCRJkjRFnCltJBlMcmeSTwC3AWtH3Ts+ycea448lOSvJ1Um+m+T4LoUsSZIkSdOeM6W/bG/gNVV1bZIHN9BuD+BwYB/gYuCz2yI4SZIkSdu/Rx55hLvvvps1a9Z0O5RtbubMmey5557ssMMOm9zHpPSX/VdVXbsJ7b7YLO39VpKnjtcgySJgEcCMXcYtMiVJkiSpB919993svPPODA4O0vkysD9UFatWreLuu+9mr7322uR+Lt/9ZQ+NOq5RxzPHtHt41PG4/8uqaklVtaqqNWPWwGTFJ0mSJGk7t2bNGnbbbbe+SkgBkrDbbrtt9gyxM6UT+1GS5wB3AscBD2zpQPPmDtC2WI8kSZLUN/otIV1nS97bmdKJLQa+BFwN/LDLsUiSJEnSduGDH/wgq1evnrTxnCltVNUwsN+o888yTgGjqjp5zPnsqY5NkiRJ0vQ12dtDdnvLxA9+8IO88pWvZNasWZMynjOlkiRJktRjPvGJT7D//vtzwAEH8KpXvYrh4WGOPvpo9t9/f17wghfw/e9/H4CTTz6Zz3728bm42bM7c26XX345Rx55JMcffzz77LMPJ510ElXFWWedxT333MNRRx3FUUcdNSmxOlMqSZIkST1kxYoVvOc97+Hqq69m991358c//jGvec1rfvHv3HPP5bTTTuOLX/ziBse56aabWLFiBU972tM47LDDWLZsGaeddhof+MAHuOyyy9h9990nJd6+nilN8uUku26kzeVJWuNcn5/kd6YuOkmSJEnafN/4xjc44YQTfpE0PuUpT+Gaa67hxBNPBOBVr3oVV1111UbHOeSQQ9hzzz15whOewPz58xkeHp6SePt2pjSdslAvbvYb3RLzgRbw5Y01HFo5MunryPtJt9fMS5IkSb3qiU98Io891kmJHnvsMX7+85//4t6OO+74i+MZM2bw6KOPTkkMfTVTmmQwyZ1JPgHcBqxNsntz76+ae1cl+XSS00d1PSHJ9UnuSrIwyZOAdwEvT3Jzkpd34XUkSZIkaT1HH300F154IatWrQLgxz/+Mc9//vP593//dwDOP/98Fi5cCMDg4CDLly8H4OKLL+aRRx7Z6Pg777wzDzywxTtmrqcfZ0r3Bl5TVdcmGQZIcjDwB8ABwA7AjcDyUX2eWFWHNMt1315VxyR5G9Cqqjdu2/AlSZIkaWL77rsvb33rW/mt3/otZsyYwYEHHsjZZ5/NKaecwvvf/37mzJnDeeedB8DrXvc6XvrSl3LAAQdw7LHHstNOO210/EWLFnHsscfytKc9jcsuu2yr401VbfUg00WSQeCyqtqrOR+mswT3lcCvVNXbm+sfAO6pqr9Lcjnw1qpaluSpwLKqelaSk9lAUppkEbAIYMYucxbs+SfnTeWr9TSX70qSJGk6uf3223nOc57T7TC6Zrz3T
7K8qtar1QN9tny38dAW9Hm4+V3LJs4uV9WSqmpVVWvGrIEteKQkSZIk9b5+XL47nmXAh5O8l87f5MXAko30eQDYeVMGnzd3gLazfZIkSZK0nn6cKV1PVd0AXAzcCnwFGAJGNtLtMuC5FjqSJEmSpC3XVzOlVTUM7DfqfHDU7b+rqnckmQVcQVPoqKqOHNX+PmCwOf4xcPBUxyxJkiRp+qkqOrtQ9pctqVnkTOnjliS5mU7l3c9V1Y3dDkiSJEnS9DNz5kxWrVq1RQnadFZVrFq1ipkzZ25Wv76ZKU3yl1X1NxPdr6oTN9D3ZOBrVXXPVMQmSZIkqXfsueee3H333dx7773dDmWbmzlzJnvuuedm9emppDTJjKpaO8HtvwQmTEo34mTgNsCkVJIkSdIG7bDDDuy1117dDmPamDZJabPH6CV0vvU8CFgBvBr4FnAB8ELgb9NZuP2XQIClVfUXSc4Entwsz11RVScleSVwGvAk4Drg1OZRH6Wzd2kB5wI/aM7PT/Iz4FDg7cDvAY/SmUE9fUOxD60cYXDx0sn4M/Q19yuVJEmSes+0SUobzwb+sKqWJTmXxxPJVVV1UJKnAdcCC4D7ga8leVlVLU7yxqqaD5DkOcDLgcOq6pEkHwJOopPozq2q/Zp2u1bVT5K8ETi9qtpJdgOOA/apqkqy6zZ8f0mSJEnqKdOt0NEPqmpZc/wp4PDm+ILm92Dg8qq6t6oeBc4HjhhnnBfQSVxvaGZPXwA8A/gu8IwkZyc5FvjpOH1HgDXAR5P8PrB6vECTLErSTtJeu3pju8tIkiRJUn+abknp2PJV684f2sxxAny8quY3/55dVe+oqvuBA4DLgdcDH1kvgE6yewjwWeDFdJYUrx9o1ZKqalVVa8asgc0MT5IkSZL6w3RLSp+e5NDm+ETgqjH3rwd+K8nuSWYArwC+2dx7JMkOzfHXgeOT/CpAkqck+fUkuwNPqKrPAWfQ+XYV4AFg56btbGCgqr4M/DmdJFaSJEmStAWm2zeldwJvaL4n/RZwDvCn625W1Q+TLAYu4/FCRxc1t5cAtya5sSl0dAadb06fADwCvAH4GXBecw3g/2t+Pwb8S1Po6LeBi5LMbJ7x5o0FPW/uAG2L9EiSJEnSejJdNnRtqu9+aV0Roumk1WpVu93udhiSJEmS1BVJlldVa7x70235riRJkiSph0yb5btVNQxMu1lSSZIkSdLE+mKmNMnrk7x6nOuDSW7binHflGTW1kUnSZIkSf1rWialTWXdTVZV/1JVn5iCUN4EmJRKkiRJ0hba7pbvNgWNLgGW09mSZQXwajrVdi8AXgj8bZIfA+8EdgS+A5xSVQ8mORP4PeBR4GtVdXqSdwAPVtXfJVkAnNs87mujnjsDOBM4shnzn6vqw0mOBN4B3Edn+fBy4JV0qv4+DbgsyX1VddRE7zS0coTBxUu36u8iGLaCsSRJktRztteZ0mcDH6qq5wA/BU5trq+qqoOA/6Szj+gxzXkbeHOS3YDjgH2ran/gPeOMfR7wp1U1dn/RPwRGqupg4GDgdUn2au4dSGdW9LnAM4DDquos4B7gqA0lpJIkSZKkiW2vSekPqmpZc/wp4PDm+ILm93l0EsRlSW4GXgP8OjACrAE+muT3gdWjB02yK7BrVV3RXPrkqNsvAl7djHcdsBuwd3Pv+qq6u6oeA24GBjf2AkkWJWknaa9dPbKJry1JkiRJ/WW7W77bGLt56rrzh5rfAJdW1SvGdkxyCPAC4HjgjcDRm/jM0JlB/eqY8Y4EHh51aS2b8HerqiXAEoAd99h7emwGK0mSJEnb2PY6U/r0JIc2xycCV425fy1wWJJnASTZKclvJJkNDFTVl4E/B35piW5V/QT4SZJ1M68njbr9VeBPkuzQjPkbSXbaSJwPADtv5rtJkiRJkhrb60zpncAbkpxLp8DROXQKCwFQVfcmORn4dJIdm8tn0EkSL0oyk87M55vHGfsU4NwkxahCR8BH6CzLv
TFJgHuBl20kziXAJUnu2dB3pfPmDtC2SI8kSZIkrSdV29fK0qb67peqar8uhzJpWq1WtdvtbochSZIkSV2RZHlVtca7t70u35UkSZIk9YHtbvluVQ3T2Q9UkiRJktTj+mKmNMmuSU5tjo9M8qUJ2n0kyXM3MtbHkhw/FXFKkiRJUr/Z7mZKp8iuwKnAhzbUqKr+aCoePrRyhMHFS6di6L4ybLEoSZIkqef0xUwpcCbwzCQ3A+8HZif5bJI7kpzfVNslyeVJWs3xg0n+OsktSa5N8tSxgyZ5dzNzOmObvo0kSZIk9Yh+SUoXA9+pqvnAW4ADgTcBzwWeARw2Tp+dgGur6gDgCuB1o28meT8wBzilqtZOYeySJEmS1LP6JSkd6/qquruqHgNuprM/6Vg/B9Z9e7p8TJu/Agaq6vU1wZ46SRYlaSdpr109MnmRS5IkSVIP6dek9OFRx2sZ/9vaR0YlnGPb3AAsSPKUiR5QVUuqqlVVrRmzBrY6YEmSJEnqRf2SlD4A7DyJ411C5zvVpUkmc1xJkiRJ6it9UX23qlYlWZbkNuBnwI8mYcwLm4T04iS/U1U/m6jtvLkDtK0cK0mSJEnryQSfRGoStVqtarfb3Q5DkiRJkroiyfKqao13r1+W70qSJEmStkMmpZIkSZKkrjEplSRJkiR1TV8UOuq2oZUjDC5e2u0wpr1hi0VJkiRJPacvZkqTfDHJ8iQrkixKckKSDzT3/izJd5vjZyRZ1hy/LckNSW5LsiQdz0xy46hx9x59LkmSJEnaPH2RlAKvraoFQAs4DbgaWNjcWwisSjK3Ob6iuf5PVXVwVe0HPBl4cVV9BxhJMr9pcwpw3rZ6CUmSJEnqNf2SlJ6W5BbgWuDXmn+zm31Gfw34N+AIOknplU2fo5Jcl2QIOBrYt7n+EeCUJDOAlzd919PMyLaTtNeuHpmq95IkSZKkaa3nk9IkRwLHAIdW1QHATcBMOrOlpwB30klEFwKHAsuSzAQ+BBxfVfOAf236AHwO+G3gxcDyqlo13nOraklVtaqqNWPWwFS9niRJkiRNaz2flAIDwP1VtTrJPsDzmutXAqfTWa57E3AU8HBVjfB4AnpfktnA8esGq6o1wFeBc3DpriRJkiRtlX6ovnsJ8Pokt9OZFb22uX4lnaW7V1TV2iQ/AO4AqKqfJPlX4Dbgv4Ebxox5PnAc8LVNCWDe3AHaVo6VJEmSpPX0fFJaVQ/TWW47noxq96Ix/c4Azpig3+HAeVW1dlKClCRJkqQ+1fNJ6WRL8gXgmXSKH0mSJEmStoJJ6WaqquO6HYMkSZIk9Yp+KHQ0qZJcnqTV7TgkSZIkqRc4U7oNDK0cYXDx0m6HMe0NWyxKkiRJ6jk9P1Oa5C1JTmuO/yHJN5rjo5Ocn+RFSa5JcmOSC5stYEiyIMk3kyxP8tUke4wZ9wlJPpbkPdv+rSRJkiSpN/R8Ukpn65eFzXELmJ1kh+barXQq7B5TVQcBbeDNzf2zgeOragFwLvDXo8Z8Ip1tYb7dVOmVJEmSJG2Bfli+uxxYkGQX4GHgRjrJ6ULgYuC5wLIkAE8CrgGeDewHXNpcnwH8cNSYHwY+U1WjE9VfkmQRsAhgxi5zJveNJEmSJKlH9HxSWlWPJPkecDJwNZ3Z0aOAZwHfAy6tqleM7pNkHrCiqg6dYNirgaOS/H1VrZnguUuAJQA77rF3Tca7SJIkSVKv6Yflu9BZwns6cEVz/HrgJuBa4LAkzwJIslOS3wDuBOYkObS5vkOSfUeN91Hgy8BnkvR8Yi9JkiRJU6VfEqorgbcC11TVQ0nWAFdW1b1JTgY+nWTHpu0ZVXVXkuOBs5IM0Pk7fRBYsW7AqvpAc++TSU6qqscmevi8uQO0rRwrSZIkSetJlStLp1qr1ap2u93tMCRJkiSpK5Isr6rWePf6ZfmuJEmSJGk7ZFIqSZIkSeoak9KNSDI/ye90Ow5JkiRJ6kX9UugIgCRPrKpHN7PbfDr7mn55S
587tHKEwcVLt7S7xjFs4ShJkiSpJ/RcUprk1XS2fyk6e5KuBdYABwLLkrwEeH5TefcJwF3AocD7m3YtYBfgzcDXgHcBT05yOPBe4FLgXOAZwGpgUVXduu3eUJIkSZJ6R08lpc1eomfQSTrvS/IU4APAns21tUlGgJPobPFyDHBLk6ACDAKHAM8ELgOeBbwNaFXVG5tnnA3cVFUvS3I08Ak6s6mSJEmSpM3Ua9+UHg1cWFX3AVTVj5vrF1bV2ub4XODVzfFrgfNG9f9MVT1WVd8GvgvsM84zDgc+2Yz/DWC3JLuMbZRkUZJ2kvba1SNb+16SJEmS1JN6LSmdyEPrDqrqB8CPmlnOQ4CvjGo3dtPWLd7EtaqWVFWrqlozZg1s6TCSJEmS1NN6avku8A3gC0k+UFWrmuW74/kI8Cngk6NmUAFOSPJxYC8634zeSWcJ786j2lxJZ/nvu5McCdxXVT/dUFDz5g7QtjCPJEmSJK2np5LSqlqR5K+BbyZZC9w0QdOL6SzbPW/M9e8D19MpdPT6qlqT5DJgcZKb6RQ6egdwbpJb6RQ6es3kv4kkSZIk9YeeSkoBqurjwMc30uwAOgWO7hhz/T+r6vVjxvsxcPCYdi/buiglSZIkSdCDSenGJFkM/AmdJbiSJEmSpC7qu6S0qs4Ezhzn+snbPhpJkiRJ6m/9Un130iQZTrJ7t+OQJEmSpF7QdzOlWyPJjC3pN7RyhMHFSyc7HAHDVjWWJEmSprW+mSlN8pYkpzXH/5DkG83x0UnOT/KKJENJbkvyvlH9Hkzy90luAQ4ddf3JSb6S5HXb/GUkSZIkqUf0TVJKZ3/Rhc1xC5idZIfm2l3A+4CjgfnAwUnWVdjdCbiuqg6oqquaa7OB/wA+XVX/uq1eQJIkSZJ6TT8lpcuBBUl2AR4GrqGTnC4EfgJcXlX3VtWjwPnAEU2/tcDnxox1EXBeVX1iooclWZSknaS9dvXIJL+KJEmSJPWGvklKq+oR4HvAycDVdGZOjwKeBQxvoOuaqlo75toy4Ngk2cDzllRVq6paM2YNbE3okiRJktSz+q3Q0ZXA6cBrgSHgA3RmUK8Hzmqq6t4PvAI4ewPjvK3598/AqRt76Ly5A7QtyCNJkiRJ6+mbmdLGlcAewDVV9SNgDXBlVf0QWAxcBtwCLK+qizYy1p8BT07yt1MZsCRJkiT1slRVt2Poea1Wq9rtdrfDkCRJkqSuSLK8qlrj3eu3mVJJkiRJ0nbEpHQTJbk8ybiZvSRJkiRpy5iUSpIkSZK6pqer7yYZBC4BrgWeD9wAnAe8E/hV4KSm6T8CM4GfAadU1Z1Jnty0PQC4A3jyqHFf1IyxI/Cdps+DE8UxtHKEwcVLJ/PV1Bi2qrEkSZI0rfXDTOmzgL8H9mn+nQgcTmdrmL+kk3AurKoD6Wzz8jdNvz8BVlfVc4C3AwsAmm1jzgCOqaqDgDbw5m32NpIkSZLUQ3p6prTxvaoaAkiyAvh6VVWSIWAQGAA+nmRvoIAdmn5HAGcBVNWtSW5trj8PeC6wLAnAk4Brxj40ySJgEcCMXeZMzZtJkiRJ0jTXD0npw6OOHxt1/hid9383cFlVHdcs9718I+MFuLSqXrGhRlW1BFgCsOMee7vvjiRJkiSNox+W727MALCyOT551PUr6Cz1Jcl+wP7N9WuBw5I8q7m3U5Lf2DahSpIkSVJv6YeZ0o35WzrLd88ARlcjOgc4L8ntwO3AcoCqujfJycCnk+zYtD0DuGuiB8ybO0DbgjySJEmStJ5UubJ0qrVarWq3290OQ5IkSZK6IsnyqmqNd8/lu5IkSZKkrjEplSRJkiR1Tc8kpUlOS3J7kvOT7JjkP5PcnOTlST6S5Lkb6Pt7SRZvZPyTk/zT5EcuSZIkSf2rlwodnQocU1V3J3keQFXNb+5dsKGOVXUxcPEUxydJkiRJGmNaJqVJ3gy8tjn9CLAP8AzgK0k+BbwOmJPkZuAPgI8Cp1dVO8mxwN8AM4D7q
uoFTTXdVlW9MclL6FTTfRKwCjipqn405vknAG8H1gIjVXXEhuIdWjnC4OKlG2qiLTRsVWNJkiRpWpt2SWmSBcApwG8CAa4DXgkcCxxVVfcluY5OEvrips+6vnOAfwWOqKrvJXnKOI+4CnheVVWSPwL+L/B/xrR5G/D/VtXKJLtO+ktKkiRJUp+YdkkpcDjwhap6CCDJ54GFm9j3ecAVVfU9gKr68Tht9gQuSLIHndnS743TZhnwsSSfAT4/3oOSLAIWAczYZc4mhidJkiRJ/aVnCh1NorOBf6qqecAfAzPHNqiq19NZ4vtrwPIku43TZklVtaqqNWPWwFTHLEmSJEnT0nRMSq8EXpZkVpKdgOOaa5viWuCIJHsBTLB8dwBY2Ry/ZrxBkjyzqq6rqrcB99JJTiVJkiRJm2naLd+tqhuTfAy4vrn0kaq6ad13oxvpe2+zrPbzSZ4A/A/wwjHN3gFcmOR+4BvAXuMM9f4ke9P5pvXrwC0beu68uQO0LcgjSZIkSetJVXU7hp7XarWq3W53OwxJkiRJ6ooky6uqNd696bh8V5IkSZLUI0xKJUmSJEld07dJaZLhJLt3Ow5JkiRJ6md9m5RKkiRJkrpv2lXf3RLN1jGfAfYEZgDvbm79aZKXADsAJ1TVHc02MecCzwBWA4uq6tYkQ8BCYAS4D/jzqvpEkk8An6yqSyd6/tDKEQYXL52q1xMwbHVjSZIkaVrql5nSY4F7quqAqtoPuKS5fl9VHQScA5zeXHsncFNV7Q/8JfCJ5voy4DBgX+C7dBJUgEOBq6f+FSRJkiSp9/RLUjoEvDDJ+5IsrKqR5vrnm9/lwGBzfDjwSYCq+gawW5JdgCuBI5p/5wDzkswF7q+qh8Y+MMmiJO0k7bWrR8beliRJkiTRJ0lpVd0FHEQnOX1Pkrc1tx5uftey8aXMV9CZHV0IXA7cCxxPJ1kd75lLqqpVVa0Zswa27gUkSZIkqUf1RVKa5GnA6qr6FPB+OgnqRK4ETmr6HUlnie9Pq+oHwO7A3lX1XeAqOkt+r5jK2CVJkiSpl/VFoSNgHvD+JI8BjwB/Anx2grbvAM5NciudQkevGXXvOjqFkqCTvL6XTnK64YfPHaBtIR5JkiRJWk+qqtsx9LxWq1XtdrvbYUiSJElSVyRZXlWt8e71xfJdSZIkSdL2yaRUkiRJktQ1PZ+UJtk1yanN8ZFJvtTtmCRJkiRJHf1Q6GhX4FTgQ5vaIcmMqlo7WQEMrRxhcPHSyRpO4xi2kJQkSZI0LfX8TClwJvDMJDfT2Q5mdpLPJrkjyflJApBkOMn7ktwInJDkRUmuSXJjkguTzG7aLUjyzSTLk3w1yR7dezVJkiRJmt76ISldDHynquYDbwEOBN4EPBd4BnDYqLarquog4D+BM4BjmvM28OYkOwBnA8dX1QLgXOCvt9mbSJIkSVKP6Yflu2NdX1V3AzSzp4M8vtfoBc3v8+gkrcuaidQnAdcAzwb2Ay5trs8AfjjeQ5IsAhYBzNhlzhS8hiRJkiRNf/2YlD486ngtv/w3eKj5DXBpVb1idMck84AVVXXoxh5SVUuAJQA77rG3m8FKkiRJ0jj6YfnuA8DOm9nnWuCwJM8CSLJTkt8A7gTmJDm0ub5Dkn0nNVpJkiRJ6iM9P1NaVauSLEtyG/Az4Eeb0OfeJCcDn06yY3P5jKq6K8nxwFlJBuj8/T4IrNjQePPmDtC2OqwkSZIkrSdVriydaq1Wq9rtdrfDkCRJkqSuSLK8qlrj3euH5buSJEmSpO2USakkSZIkqWu2y6Q0ydVb2O/IJF/azD7vSHJ6c/yuJMdsybMlSZIkSZtvuyx0VFXP79Jz3zYV4w6tHGFw8dKpGFqNYQtJSZIkSdPS9jpT+mDze2SSy5N8NskdSc5PkubewUmuTnJLkuuT7DxmjF/MgDbntyUZbI7fmuSuJFcBzx7V5mNNdV2SDCd5Z5Ibkwwl2ae5PifJpUlWJPlIkv9KsvsU/0kkS
ZIkqSdtl0npGAcCbwKeCzyDzv6hTwIuAP6sqg4AjqGz3ctGJVkA/G9gPvA7wMEbaH5fVR0EnAOsS3DfDnyjqvYFPgs8fbPfSJIkSZIETI+k9PqquruqHgNuBgbpzG7+sKpuAKiqn1bVo5s43kLgC1W1uqp+Cly8gbafb36XN88FOBz49+a5lwD3j9cxyaIk7STttatHNjE0SZIkSeov0yEpfXjU8Vo2/TvYR/nl95u5Fc/enOcCUFVLqqpVVa0Zswa24NGSJEmS1PumQ1I6njuBPZIcDJBk5yRjk8Zh4KDm/kHAXs31K4CXJXly8x3qSzbz2cuA/9WM+yLgV7boDSRJkiRJ22f13Y2pqp8neTlwdpIn0/medOxWLp8DXp1kBXAdcFfT98YkFwC3AP8D3LCZj38n8OkkrwKuAf4beGBDHebNHaBtdVhJkiRJWk+qqtsxTCtJdgTWVtWjSQ4Fzqmq+Rvq02q1qt1ub5sAJUmSJGk7k2R5VbXGuzctZ0q77OnAZ5I8Afg58LouxyNJkiRJ05ZJ6Waqqm/T2aZGkiRJkrSVpmuhoymT5PIk404rS5IkSZImlzOl28DQyhEGFy/tdhh9YdiCUpIkSdK00rczpUkGk9yR5Pwktyf5bJJZY9qck6SdZEWSdzbXjk7yxVFtXpjkC9s6fkmSJEnqBX2blDaeDXyoqp4D/BQ4dcz9tzYVovYHfivJ/sBlwD5J5jRtTgHO3VYBS5IkSVIv6fek9AdVtaw5/hRw+Jj7/yvJjcBNwL7Ac6uzh84ngVcm2RU4FPjK2IGTLGpmWdtrV49M3RtIkiRJ0jTW79+Ujt2k9RfnSfYCTgcOrqr7k3wMmNncPg/4D2ANcGFVPbrewFVLgCUAO+6xt5vBSpIkSdI4+n2m9OlJDm2OTwSuGnVvF+AhYCTJU4HfXnejqu4B7gHOoJOgSpIkSZK2QL/PlN4JvCHJucC3gHOAlwBU1S1JbgLuAH4ALBvT93xgTlXdvrGHzJs7QNuqsJIkSZK0nn5PSh+tqleOuXbkuoOqOnkDfQ8H/nUKYpIkSZKkvtHvSekWSbKcztLe/9PtWCRJkiRpOuvbpLSqhoH9trDvgsmNRpIkSZL6U18UOkoymOS2bschSZIkSZ8EjAQAACAASURBVPplfTtTui0NrRxhcPHSbofRF4YtKCVJkiRNK30xUzpakmckuSnJW5J8PsklSb6d5G9HtXlFkqEktyV5X3PthCQfaI7/LMl3R403tjKvJEmSJGkT9NVMaZJnA/8OnAwcCMxvfh8G7kxyNrAWeB+wALgf+FqSlwFXAv+3GWohsCrJ3Ob4im34GpIkSZLUM/pppnQOcBFwUlXd0lz7elWNVNUaOvuU/jpwMHB5Vd1bVY/S2Y/0iKr6b2B2kp2BXwP+DTiCTlJ65diHJVmUpJ2kvXb1yJS/nCRJkiRNR/2UlI4A36ezv+g6D486XsvGZ46vBk4B7qSTiC4EDgXWW75bVUuqqlVVrRmzBrYmbkmSJEnqWf20fPfnwHHAV5M8uIF21wNnJdmdzvLdVwBnN/euBN7V/LsJOAr4WVVtcCp03twB2hbgkSRJkqT19NNMKVX1EPBi4M+BXSZo80NgMXAZcAuwvKouam5fSWfp7hVVtRb4AXDVVMctSZIkSb0qVdXtGHpeq9Wqdrvd7TAkSZIkqSuSLK+q1nj3+mqmVJIkSZK0fTEplSRJkiR1jUnpJkhyeZL1ppqTnJzkn7oRkyRJkiT1gn6qvrtFkszY2jGGVo4wuHjpZISjTTRstWNJkiRpWujpmdIkb0lyWnP8D0m+0RwfneT8JK9IMpTktiTvG9XvwSR/n+QWOvuQjh7zlCR3JbkeOGxbvo8kSZIk9ZqeTkrpbOGysDluAbOT7NBcuwt4H3A0MB84OMnLmrY7AddV1QFV9YstX5LsAbyTTjJ6OPDcbfIWkiRJktSjej0pXQ4sSLIL8DBwDZ3kdCHwE+Dyqrq3qh4FzgeOaPqtB
T43zni/OarPz4ELJnpwkkVJ2knaa1ePTN4bSZIkSVIP6emktKoeAb4HnAxcTWfm9CjgWcDwBrquqaq1W/nsJVXVqqrWjFkDWzOUJEmSJPWsfih0dCVwOvBaYAj4AJ0Z1OuBs5LsDtwPvAI4eyNjXQf8Y5LdgJ8CJwC3bCyAeXMHaFt4R5IkSZLW09MzpY0rgT2Aa6rqR8Aa4Mqq+iGwGLiMTmK5vKou2tBATZ930FkGvAy4fQrjliRJkqSel6rqdgw9r9VqVbvd7nYYkiRJktQVSZZXVWu8e/0wUypJkiRJ2k6ZlEqSJEmSusakVJIkSZLUNf1QfXejkgwClwDXAs8HbgDOA94J/CpwEvBk4B+bLgUcUVUPbMr4QytHGFy8dHKD1gYNW+1YkiRJmhZMSh/3LDpbvLyWTlJ6InA48HvAXwIzgDdU1bIks+lU8ZUkSZIkbQWX7z7ue1U1VFWPASuAr1enNPEQMEhnC5gPJDkN2LWqHt3QYEkWJWknaa9dPTLVsUuSJEnStGRS+riHRx0/Nur8MeCJVXUm8Ed0lvEuS7LPhgarqiVV1aqq1oxZA1MSsCRJkiRNdy7f3URJnllVQ8BQkoOBfYA7uhyWJEmSJE1rJqWb7k1JjqIzc7oC+ApAkpurav6GOs6bO0DbwjuSJEmStB6TUqCqhoH9Rp2fPNG9cfpuMCGVJEmSJE3Mb0olSZIkSV1jUipJkiRJ6pq+SkqTDCa5bSvHODLJlyYrJkmSJEnqZ32VlEqSJEmSti/9WOjoiUnOBw6iU0X31cDpwEvo7EF6NfDHVVVJngX8CzAHWAucMHqgZmuYJcDxVfWdiR44tHKEwcVLp+JdNIFhqx1LkiRJ00I/zpQ+G/hQVT0H+ClwKvBPVXVwVe1HJzF9cdP2fOCfq+oA4PnAD9cNkuT5dBLWl24oIZUkSZIkTawfk9IfVNWy5vhTwOHAUUmuSzIEHA3sm2RnYG5VfQGgqtZU1eqm33PozJC+pKq+P95DkixK0k7SXrt6ZEpfSJIkSZKmq35MSmuc8w/RWYI7D/hXYOZGxvghsAY4cMKHVC2pqlZVtWbMGtiaeCVJkiSpZ/VjUvr0JIc2xycCVzXH9yWZDRwPUFUPAHcneRlAkh2TzGra/gT4XeC9SY7cZpFLkiRJUo/px0JHdwJvSHIu8C3gHOBXgNuA/wZuGNX2VcCHk7wLeIRRhY6q6kdJXgx8Jclrq+q6iR44b+4AbQvvSJIkSdJ6UjV2NasmW6vVqna73e0wJEmSJKkrkiyvqtZ49/px+a4kSZIkaTthUipJkiRJ6hqT0gkkedOowkYk+XKSXZvjB5vfwSS3dStGSZIkSZru+rHQ0aZ6E519TFcDVNXvbOlAQytHGFy8dLLi0iYYtrCUJEmSNC2YlAJJdgI+A+wJzAAuBJ4GXJbkvqo6Kskw0Kqq+7oXqSRJkiT1FpPSjmOBe6rqdwGSDACnAEeZhEqSJEnS1PGb0o4h4IVJ3pdkYVWNbO2ASRYlaSdpr1291cNJkiRJUk8yKQWq6i7gIDrJ6XuSvG0SxlxSVa2qas2YNbDVMUqSJElSL3L5LpDkacCPq+pTSX4C/BHwALAz4PJdSZIkSZoiJqUd84D3J3kMeAT4E+BQ4JIk91TVUVs1+NwB2laDlSRJkqT1pKq6HUPPa7Va1W63ux2GJEmSJHVFkuVV1Rrvnt+USpIkSZK6xqRUkiRJktQ1PZ+UJtk1yandjkOSJEmStL5+KHS0K3Aq8KFuBTC0coTBxUu79fi+N2yRKUmSJGm71fMzpcCZwDOT3Jzk/UnekuSGJLcmeee6Rkm+mGR5khVJFo26/mDTb0WS/0xySJLLk3w3ye915Y0kSZIkqUf0Q1K6GPhOVc0HLgX2Bg4B5gMLkhzRtHttVS0AWsBpSXZrru8EfKOq9qWzd+l7gBcCxwHv2navIUmSJEm9px+W7472oubfTc35bDpJ6hV0EtHjmuu/1lxfBfwcuKS5PgQ8X
FWPJBkCBid6UDPbughgxi5zJvctJEmSJKlH9FtSGuC9VfXhX7qYHAkcAxxaVauTXA7MbG4/Uo9v5voY8DBAVT2WZMK/X1UtAZYA7LjH3m4GK0mSJEnj6Ifluw8AOzfHXwVem2Q2QJK5SX4VGADubxLSfYDndSdUSZIkSeovPT9TWlWrkixLchvwFeDfgGuSADwIvJLO8tzXJ7kduBO4djJjmDd3gLYVYCVJkiRpPXl8ZaqmSqvVqna73e0wJEmSJKkrkiyvqtZ49/ph+a4kSZIkaTtlUipJkiRJ6pq+TEqT7Jrk1I20GWy+Q5UkSZIkTZGeL3Q0gV2BU4EPbYuHDa0cYXDx0m3xKG2CYYtOSZIkSduNvpwpBc4Enpnk5iT/kOTrSW5MMpTkpWMbJ3lGkpuSHJzkmUkuSbI8yZXNFjKSJEmSpC3QrzOli4H9qmp+kicCs6rqp0l2B65NcvG6hkmeDfw7cHJV3ZLk68Drq+rbSX6Tzmzr0d14CUmSJEma7vo1KR0twN8kOQJ4DJgLPLW5Nwe4CPj9qvpWktnA84ELm31OAXYcd9BkEbAIYMYuc6YuekmSJEmaxkxK4SQ6yeeCqnokyTAws7k3AnwfOBz4Fp3lzj+pqvkbG7SqlgBLAHbcY283g5UkSZKkcfTrN6UPADs3xwPA/zQJ6VHAr49q93PgOODVSU6sqp8C30tyAkA6DtiWgUuSJElSL+nLmdKqWpVkWbPlyw3APkmGgDZwx5i2DyV5MXBpkgfpzKyek+QMYAc635vesqHnzZs7QNuKr5IkSZK0nr5MSgGq6sRNaLZf0/YnwMGjrh87JUFJkiRJUp/p1+W7kiRJkqTtgEmpJEmSJKlrTEqBJO9Icvo41web705J0kpy1raPTpIkSZJ6V99+U7q5qqpNpxDSZhtaOcLg4qWTHJG21rDFpyRJkqSu68mZ0maG844k5ye5Pclnk8xKMpxk96ZNK8nlo7odkOSaJN9O8rpxxjwyyZea49lJzksylOTWJH+wbd5MkiRJknpLL8+UPhv4w6paluRc4NSNtN8feB6wE3BTkg1Nbf4VMFJV8wCS/MpkBCxJkiRJ/aYnZ0obP6iqZc3xp4DDN9L+oqr6WVXdB1wGHLKBtscA/7zupKruH9sgyaIk7STttatHNjN0SZIkSeoPvZyU1jjnj/L4O8/chPZb/vCqJVXVqqrWjFkDWzOUJEmSJPWsXk5Kn57k0Ob4ROAqYBhY0Fwb+x3oS5PMTLIbcCRwwwbGvhR4w7oTl+9KkiRJ0pbp5W9K7wTe0HxP+i3gHOB64KNJ3g1cPqb9rXSW7e4OvLuq7kkyOMHY7wH+udkuZi3wTuDzEwUyb+4AbSu9SpIkSdJ6UrVVq1S3S00y+aWq2q/LoQDQarWq3d6i3WQkSZIkadpLsryqWuPd6+Xlu5IkSZKk7VxPLt+tqmFgu5gllSRJkiRNzJnSLZRkMMmJ3Y5DkiRJkqaznpgpTfLEqnp0Gz92kE5V33/bWMOhlSMMLl465QFp6w1bkEqSJEnapqZ0pjTJXyW5M8lVST6d5PQk85Ncm+TWJF9I8itJ9kly/ah+g0mGmuMFSb6ZZHmSrybZo7l+eZIPJmkDf9acvy/J9UnuSrKwaXdyki8muTTJcJI3JnlzkpuaOJ7StHtmkkua51yZZJ/m+seSnJXk6iTfTXJ8E+aZwMIkNyf586n8O0qSJElSr5qypDTJwXT2Aj0A+G1gXaWlTwB/UVX7A0PA26vqDuBJSfZq2rwcuCDJDsDZwPFVtQA4F/jrUY95UlW1qurvm/MnVtUhwJuAt49qtx/w+8DBTf/VVXUgcA3w6qbNEuBPm+ecDnxoVP89gMOBF9NJRgEWA1dW1fyq+ofN/wtJkiRJkqZy+e5hwEVVtQZYk+Q/gJ2AXavqm02bjwMXNsefoZOMntn8vhx4Np2E8tIkADOAH456xgVjnrlur9DldJbXrnNZVT0APJBkBPiP5voQsH+S2cDzg
Qub5wDsOKr/F6vqMeBbSZ66KS+fZBGwCGDGLnM2pYskSZIk9Z3t6ZvSC+gkhZ8Hqqq+nWQesKKqDp2gz0Njzh9uftfyy+/28Kjjx0adP9a0ewLwk6qaP8FzRvfPBG1+SVUtoTP7yo577N17m8FKkiRJ0iSYyqR0GfDhJO9tnvNiOkna/UkWVtWVwKuAbwJU1XeSrAX+isdnQO8E5iQ5tKquaZbz/kZVrZjMQKvqp0m+l+SEqrownenS/avqlg10ewDYeVPGnzd3gLYFdCRJkiRpPVP2TWlV3QBcDNwKfIXOUtkR4DXA+5PcCswH3jWq2wXAK+ks5aWqfg4cD7wvyS3AzXSW2U6Fk4A/bJ6zAnjpRtrfCqxNcouFjiRJkiRpy6Rq6laWJpldVQ8mmQVcASyqqhun7IHbqVarVe12u9thSJIkSVJXJFleVa3x7k31N6VLkjwXmAl8vB8TUkmSJEnSxDaYlCY5G5hwKrWqTttQ/6o6cQvjkiRJkiT1gY19U9qms73KTOAg4NvNv/nAk6Y2NEmSJElSr9ukb0qTXAscXlWPNuc7AFdW1fOmOL7tTpIZVbV2c/rsuMfetcdrPjhVIWkbG7aSsiRJkrRZNvRN6aZW3/0VYJdR57Oba9u1JO9K8qZR53+d5M+SvCXJDUluTfLOUfe/mGR5khVJFo26/mCSv28q8x6a5Mwk32r6/902fi1JkiRJ6hmbmpSeCdyU5GNJPg7cCPzN1IU1ac4FXg2Q5AnA/wb+G9gbOITOMuQFSY5o2r+2qhYALeC0JLs113cCrquqA4DbgeOAfatqf+A92+plJEmSJKnXbLT6bpPM3Qn8ZvMP4C+q6r+nMrDJUFXDSVYlORB4KnATcDDwouYYOrO+e9PZsua0JMc113+tub4KWAt8rrk+AqwBPprkS8CXxnt2M9O6CGDGLnMm+c0kSZIkqTdsNCmtqseS/HNVHQhctA1immwfAU4G/h86M6cvAN5bVR8e3SjJkcAxwKFVtTrJ5XQKPAGsWfcdaVU9muSQZpzjgTcCR499aFUtAZZA55vSSX8rSZIkSeoBm7pP6deT/AHw+dqUykjbly8A7wJ2AE4EHgXeneT8qnowyVzgEWAAuL9JSPcBxi3ilGQ2MKuqvpxkGfDdjQUwb+4AbYvjSJIkSdJ6NjUp/WPgzcDaJGuaa1VVu2ygz3ahqn6e5DLgJ81s59eSPAe4JgnAg8ArgUuA1ye5nc5y5WsnGHJn4KIkM4HQ+btIkiRJkrbAJiWlVbXzVAcyVZpvYp8HnLDuWlX9I/CP4zT/7fHGqKrZo45/SKdIkiRJ/3979x5mV13fe/z9MWIQkEEw5cR4GS9RBAIBNhYUrCB61FovVaSKF9RjqlKRY7GNymm1lVZLK2q9NbUIVVorKGqNoh5ASBGEHUgYAogX4tFgtSgMlwhC8j1/7JWymcxMLmRmZ/Z+v55nnln7d1nru7KePTxffr/1+0mSpAdoc0dKSfJCYMMqtd+uqnEX+NmeJNmbzkJE51bV93sdjyRJkiTp/jZrS5gk7wfeBlzb/LwtyV9PZWDbQlVdW1WPr6o/3lDWbGvzsrFtkzwyyTnTG6EkSZIkDbbNHSl9PrCwqtYDNHuVXgW8c6oCm25VdROd1XQlSZIkSdNks6fvArsBv2qOh6YglimR5DXASUABV9PZc/QZSd5OZ5uYP6mqc5IMA1+tqn2THAe8ENgJeAKd6b9/0pzvE3T2On0ocE5V/fmmYhhZM8rw4qXb+tbUI6tdSVmSJEnaZjY3Kf0r4Mpm787Qebd08VQFta0k2Qc4GXhaVd2cZHfgg8Bc4DBgL+ArwHjTdhcCBwB3A99L8vdV9RPg3VX1qySz6GyVs19VXT0d9yNJkiRJ/WZzk9IXAKcDtwCrgT+tqv+cqqC2oSOBs6vqZoAmmQT4UjMV+doke07Q9/yqGgVIci3wWOAnwMuTLKLzbzcX2JvOCOz9NG0WAczadc42vSlJkiRJ6hebt
dAR8E/N7xfS2UrlY0neNjUhTYu7u46zGW3WAQ9O8jg6U4GfVVX7AUuBHcfrXFVLqqpVVa1ZO82Y2c6SJEmSNK02KymtqguBU4D/A/wj0ALePIVxbSsXAEcn2QOgmb77QOwK3AmMNiOs4+5rKkmSJEnaPJs1fTfJ+cDOwKXAMuDgqvrFVAa2LVTVqiSnABclWUdnxeAHcr6VSa4CrqczlfeSzem3YN4QbRfHkSRJkqSNpKo23Sg5DTiIzpTWS4CLgUur6tdTG15/aLVa1W63ex2GJEmSJPVEkuVV1RqvbrNGSqvqfzcnehhwHPBpOtupzN5GMUqSJEmSBtDmTt/9I+BwOqOlq+msxLts6sKSJEmSJA2Czd0SZkc6+3sur6p7pzCe7UKSNwFrq+qfkxwHfLOqbupxWJIkSZLUdzZ3+u7fTnUg25Oq+mTXx+OAawCTUkmSJEnaxjZ3pLSvJXkNnf1HC7ga+CFwB52pyi3grCS/Bt4NvLGqXtz0ezbwlqp6yWTnH1kzyvDipVN3A+qJ1a6oLEmSJD1gm7VPaT9Lsg9wMnBkVe0PvG1DXVWdA7SBY6tqIfA1YK8kc5omr6Pzfq0kSZIkaSsMfFIKHAmcXVU3A1TVryZqWJ39cz4DvCrJbsChwNfHa5tkUZJ2kva6taNTELYkSZIkzXxO391ynwb+HbiLTjI77sJPVbUEWAIwe+78TW8GK0mSJEkDyJFSuAA4OskeAEl2H1N/O/CwDR+aVXhvojPl99PTFaQkSZIk9aOBHymtqlVJTgEuSrIOuIrOAkcbnAF8slno6NCq+jVwFjCnqq7bnGssmDdE20VxJEmSJGkjA5+UAlTVmcCZE9R9AfjCmOLDgH+c6rgkSZIkqd+ZlG6hJMuBO4E/7nUskiRJkjTTmZRuoao6qNcxSJIkSVK/cKGjLZTkjl7HIEmSJEn9YiBGSpPMqqp1vbr+yJpRhhcv7dXlNcVWu4iVJEmStNW2+5HSJO9IckJzfFqSC5rjI5OcleQTSdpJViV5b1e/1Uk+kORKOlu+rE7y3iRXJhlJslfTbvckX0pydZLLkuzXlO+S5NNN26uTvHRMXI9IcmkSMxJJkiRJ2krbfVIKLAMOb45bwC5JdmjKLgbeXVUtYD/gdzYklY1fVtWBVfW55vPNVXUg8AngpKbsvcBVVbUf8C7gn5vy/wOMVtWCpu6CDSdNsiewFPizqnIIVJIkSZK20kxISpcDByXZFbgbuJROcno4nYT15c1o6FXAPsDeXX3/bcy5vth1zuHm+DDgMwBVdQGwR3Oto4CPbehYVbc0hzsA5wN/UlXfmijoJIuaEdz2urWjW3TDkiRJkjQotvuktKruAW4EjgO+QycRPQJ4IvBrOiOez2pGM5cCO3Z1v3PM6e5ufq9j69+nvZdOUvs/NxH3kqpqVVVr1k5DW3kpSZIkSepv231S2lhGJ/m8uDl+E52R0V3pJJ6jzZTa523luY8FSPJMOlN8bwO+BRy/oVGShzeHBbwe2CvJn27NzUiSJEmSOmbK6rvLgHcDl1bVnUnuApZV1cokVwHXAz8BLtmKc78HOD3J1cBa4LVN+fuAjyW5hs7I6ntppv9W1bokrwC+kuT2qvr4ZBdYMG+Itiu0SpIkSdJGUlW9jqHvtVqtarfbvQ5DkiRJknoiyfJmgdqNzJTpu5IkSZKkPmRSKkmSJEnqmYFMSpN8Ksnem24pSZIkSZpKvlM6DWbPnV9zX/uhXoehKbLaRawkSZKkSQ30O6VJdk6yNMnKJNckOSbJt5O0mvo7kpzS1F/WbC1Dkj2TnNuUr0zytKb8VUkuT7IiyT8kmdXL+5MkSZKkmazvk1LgucBNVbV/Ve0LnDemfmfgsqran84+qG9syj8CXNSUHwisSvIU4Bjg6VW1kM5WMcdOx01IkiRJUj8ahKR0BHh2kg8kObyqRsfU/wb4anO8HBhujo8EPgGdfUmbfs8CDgKuSLKi+fz48S6aZFGSdpL2urVjL
ylJkiRJAnhwrwOYalV1Q5IDgecD70ty/pgm99R9L9auY/J/kwBnVtU7N+O6S4Al0HmndMsjlyRJkqT+1/cjpUkeCaytqs8Cp9KZirs5zgfe3JxjVpKhpuxlSX6rKd89yWOnIGxJkiRJGgh9P1IKLABOTbIeuIdOovm3m9HvbcCSJG+gM4L65qq6NMnJwDeTPKg53/HAjycNYN4QbVdolSRJkqSNuCXMNGi1WtVut3sdhiRJkiT1xEBvCSNJkiRJ2n6ZlEqSJEmSesakdIwkJybZqddxSJIkSdIg8J3SMZKsBlpVdfM4dbOqat2WnnP23Pk197Uf2hbhaTu02kWsJEmSpEn13TulSV6T5OokK5N8JslwkguasvOTPKZpd0aSl3X1u6P5/cwk305yTpLrk5yVjhOARwIXJrlwQ58kf5dkJfDuJF/qOt+zk5w7rTcvSZIkSX1kxm0Jk2Qf4GTgaVV1c5LdgTOBM6vqzCSvBz4CvHgTpzoA2Ae4CbgEeHpVfSTJ24EjukZKdwa+W1V/nCTAdUnmVNV/Aa8DTt/mNylJkiRJA2ImjpQeCZy9IWmsql8BhwL/0tR/BjhsM85zeVX9tKrWAyuA4QnarQO+0FyrmvO/KsluzXW/Pl6nJIuStJO0160d3awbkyRJkqRBM+NGSrfQvTSJd5IHAQ/pqru763gdE/9b3DXmPdJPA/8O3EUnOb53vE5VtQRYAp13SrcqekmSJEnqczNxpPQC4OgkewA003e/A/xBU38ssKw5Xg0c1By/ENhhM85/O/CwiSqr6iY6U35PppOgSpIkSZK20owbKa2qVUlOAS5Ksg64Cngr8Okk7wA2vOsJ8I/Al5tFis4D7tyMSywBzktyU1UdMUGbs4A5VXXd5sS8YN4QbVdolSRJkqSNuCXMVkjyUeCqqvqnzWnfarWq3W5PcVSSJEmStH2abEuYGTdS2mtJltMZcf3jXsciSZIkSTOdSekWqqqDNt1KkiRJkrQ5ZuJCRz2TZDjJNb2OQ5IkSZL6hSOlmynJVv9bjawZZXjx0m0ZjrZDq13MSpIkSdpiAzdSmuQvkpzY9fmUJG9LcmqSa5KMJDmmqXtmkmVJvgJcO+Y8j09yVZKDp/kWJEmSJKlvDFxSCpwOvAYgyYPo7G/6U2AhsD9wFHBqkrlN+wOBt1XVkzacIMmTgS8Ax1XVFdMYuyRJkiT1lYGbvltVq5P8MskBwJ509jk9DPjXqloH/DzJRcDBwG3A5VV1Y9cp5gBfBn6/qq5lAkkWAYsAZu06Z2puRpIkSZJmuEEcKQX4FHAc8Do6I6eTuXPM51Hg/9FJZCdUVUuqqlVVrVk7DW1tnJIkSZLU1wY1KT0XeC6d0dBvAMuAY5LMSjIHeAZw+QR9fwO8BHhNkldOR7CSJEmS1K8GbvouQFX9JsmFwK1VtS7JucChwEqggD+pqv9MstcE/e9M8gLgW0nuqKqvTHa9BfOGaLsyqyRJkiRtJFXV6ximXbPA0ZXA0VX1/am+XqvVqna7PdWXkSRJkqTtUpLlVdUar27gpu8m2Rv4AXD+dCSkkiRJkqSJDdz03WbF3Mf3Og5JkiRJ0gCOlD5QSc5I8rJexyFJkiRJ/WDgRkp7YWTNKMOLl/Y6DE2x1S5mJUmSJG2xvh4pTTKc5PpmdPOGJGclOSrJJUm+n+SpSd6T5KSuPtckGW6OX5Pk6iQrk3ym69TPSPKdJD9y1FSSJEmStt4gjJQ+ETgaeD1wBfBK4DDghcC7gBXjdUqyD3Ay8LSqujnJ7l3Vc5tz7AV8BThnyqKXJEmSpD7W1yOljRuraqSq1gOr6Ky6W8AIMDxJvyOBs6vqZoCq+lVX3Zeqan2zaNKe43VOsihJO0l73drRbXIjkiRJktRvBiEpvbvreH3X5/V0Rorv5f7/Djtu4TkzXoOqWlJVrapqzdppaAvClSRJkqTBMQjTdzdlNfACgCQHAo9ryi8Azk3ywar6ZZLdx4yWbrYF84ZouwiOJ
EmSJG1kEEZKN+ULwO5JVgF/BNwAUFWrgFOAi5KsBD7YuxAlSZIkqT+l83qlplKr1ap2u93rMCRJ0ecsXAAAIABJREFUkiSpJ5Isr6rWeHWOlEqSJEmSesakVJIkSZLUMyalQJJvJ2k1x19LsluvY5IkSZKkQeDqu2NU1fO39TlH1owyvHjptj6tZoDVrrosSZIkTaovR0qTDCe5PslZSa5Lck6SnZI8K8lVSUaSnJ5k9jh9Vyd5RHP8miRXJ1mZ5DNN2ZwkX0hyRfPz9Om+P0mSJEnqF32ZlDaeDHy8qp4C3Aa8HTgDOKaqFtAZJX7zRJ2T7AOcDBxZVfsDb2uqPgycVlUHAy8FPjVldyBJkiRJfa6fk9KfVNUlzfFngWcBN1bVDU3ZmcAzJul/JHB2Vd0MUFW/asqPAj6aZAXwFWDXJLuM7ZxkUZJ2kva6taPb4HYkSZIkqf/08zulYzdgvRXYYxuc90HAIVV116QXr1oCLAGYPXe+m8FKkiRJ0jj6OSl9TJJDq+pS4JVAG/jDJE+sqh8ArwYumqT/BcC5ST5YVb9MsnszWvpN4K3AqQBJFlbViskCWTBviLYL3kiSJEnSRvp5+u73gOOTXAc8HDgNeB1wdpIRYD3wyYk6V9Uq4BTgoiQrgQ82VScArWYBpGuBN03hPUiSJElSX0tV/80sTTIMfLWq9u1xKAC0Wq1qt9u9DkOSJEmSeiLJ8qpqjVfXzyOlkiRJkqTtXF8mpVW1euwoaZLjkny0VzFJkiRJkjbWl0mpJEmSJGlm6JvVd5N8CXg0sCPw4apakuR1wDvpbAezEri7aft7wMnAQ4BfAsdW1c+TvAd4HPB44DHA/wYOAZ4HrAF+r6ruSfJ+4IXAvcA3q+qkyWIbWTPK8OKl2/iONROtdhVmSZIk6X76aaT09VV1ENACTkgyD3gv8HTgMGDvrrb/QWev0QOAzwF/0lX3BOBIOknnZ4ELq2oB8Gvgd5PsAbwE2Keq9gPeN7W3JUmSJEn9q5+S0hOarVsuozNi+mrg21X1X1X1G+Dfuto+CvhGszXMO4B9uuq+XlX3ACPALOC8pnwEGAZGgbuAf0ry+8Da8YJJsihJO0l73drRbXWPkiRJktRX+iIpTfJM4Cjg0KraH7gKuH6SLn8PfLQZAf1DOlN+N7gboKrWA/fUfXvmrAceXFX3Ak8FzgFewH1J6/1U1ZKqalVVa9ZOQ1t9b5IkSZLUz/rlndIh4JaqWptkLzrvgT4U+J1muu1twNF03ivd0H5Nc/zaLblQkl2Anarqa0kuAX60LW5AkiRJkgZRvySl5wFvSnId8D06U3h/BrwHuJTOQkcrutq/Bzg7yS3ABXQWN9pcDwO+nGRHIMDbN9Vhwbwh2i5wI0mSJEkbyX2zUzVVWq1WtdvtXochSZIkST2RZHlVtcar64t3SiVJkiRJM5NJqSRJkiSpZ0xKJUmSJEk9Y1IqSZIkSeqZGbv6bpJ3AHdX1UeSnAbsX1VHJjkSeAPwVeBddFbIXVpVf9r0uwP4BPB8Oiv0vgv4G+AxwIlV9ZUks4D3A88EZgMfq6p/aPZDfQ9wM7AvsBx4VW1itaiRNaMML166LW9ffWC1KzJLkiRJM3qkdBlweHPcAnZJskNTdgPwAeBIYCFwcJIXN213Bi6oqn2A24H3Ac8GXgL8RdPmDcBoVR0MHAy8McmGbWMOAE4E9gYeDzx9yu5QkiRJkvrcTE5KlwMHJdkVuJvOfqQtOknprcC3q+q/qupe4CzgGU2/39DZ1xRgBLioqu5pjoeb8ucAr0myAvgusAcwv6m7vKp+WlXr6ex9uqHP/SRZlKSdpL1u7eg2umVJkiRJ6i8zNiltEskbgeOA79AZOT0CeCKwepKu93RNt11PJ6GlSTI3TGcO8NaqWtj8PK6qvtnU3d11rnVMMAW6qpZUVauqWrN2GtrS25MkSZKkgTBjk9LGMuAk4OLm+E3AVcDlwO8keUTzfugrgIu24LzfAN7cT
AcmyZOS7LxNI5ckSZIkzdyFjhrLgHcDl1bVnUnuApZV1c+SLAYu5L6Fjr68Bef9FJ1puVcmCfBfwIsn7TGJBfOGaLuojSRJkiRtJJtYOFbbQKvVqna73eswJEmSJKknkiyvqtZ4dTN9+q4kSZIkaQYzKZUkSZIk9YxJqSRJkiSpZ2b6QkfbhSQPbvZDHdfImlGGFy+dzpA0g6x2ESxJkiQNsL4bKU0ynOT6JGcluS7JOUl2SvKsJFclGUlyepLZSQ5O8sWm34uS/DrJQ5LsmORHTfkTkpyXZHmSZUn2asrPSPLJJN8F/qaHtyxJkiRJM1bfJaWNJwMfr6qnALcBbwfOAI6pqgV0RojfTGdP04VNn8OBa4CDgd8GvtuULwHeWlUH0dkT9eNd13kU8LSqevuU3o0kSZIk9al+nb77k6q6pDn+LPB/gBur6oam7Ezg+Kr6UJIfJnkK8FTgg8AzgFnAsiS7AE8Dzu5sVwrA7K7rnF1V68YLIMkiYBHArF3nbLs7kyRJkqQ+0q9J6djNV28F9pig7cXA84B7gP9LZ0R1FvAOOiPJt1bVwgn63jlhAFVL6IyyMnvufDeDlSRJkqRx9Ov03cckObQ5fiXQBoaTPLEpezVwUXO8DDgRuLSq/otO8vpk4Jqqug24McnRAOnYf7puQpIkSZL6Xb+OlH4POD7J6cC1wAnAZXSm4T4YuAL4ZNP2u8CedEZMAa4G/kdVbRjdPBb4RJKTgR2AzwErtySYBfOGaLvCqiRJkiRtpF+T0nur6lVjys4HDhjbsKp+Tdd7olW1aEz9jcBzx+l33DaJVJIkSZIGWL9O35UkSZIkzQB9N1JaVauBfXsdhyRJkiRp0wZ2pDTJiUl22lbtJEmSJElbLvet5zNYkqwGWlV187ZoN5nZc+fX3Nd+aGu7a8CsdlEsSZIk9Zkky6uqNV7dQIyUJtk5ydIkK5Nck+TPgUcCFya5sGnziSTtJKuSvLcpO2Gcds9JcmmSK5OcnWSXXt2XJEmSJM10A5GU0lk996aq2r+q9gU+BNwEHFFVRzRt3t1k7vsBv5Nkv6r6SHe7JI8ATgaOqqoD6ex/+vZpvxtJkiRJ6hODkpSOAM9O8oEkh1fV6DhtXp7kSuAqYB9g73HaHNKUX5JkBfBa4LHjXTDJombktb1u7XiXkyRJkiT13eq746mqG5IcCDwfeF+S87vrkzwOOAk4uKpuSXIGsOM4pwrwrap6xWZccwmwBDrvlD7AW5AkSZKkvjQQI6VJHgmsrarPAqcCBwK3Aw9rmuwK3AmMJtkTeF5X9+52lwFPT/LE5rw7J3nSNNyCJEmSJPWlgRgpBRYApyZZD9wDvBk4FDgvyU3N+6JXAdcDPwEu6eq7ZEy744B/TTK7qT8ZuGHSi88bou2KqpIkSZK0kYHdEmY6tVqtarfbvQ5DkiRJknpi4LeEkSRJkiRtn0xKJUmSJEk9M3BJaZLdkrylOX5kknN6HZMkSZIkDaqBe6c0yTDw1arad7quOXvu/Jr72g9N1+XUJ1a7OJYkSZL6xGTvlA7K6rvd3g88IckK4PvAU6pq32ZV3RcDOwPzgb8FHgK8GrgbeH5V/SrJE4CPAXOAtcAbq+r66b8NSZIkSZr5Bm76LrAY+GFVLQTeMaZuX+D3gYOBU+jsbXoAcCnwmqbNEuCtVXUQcBLw8WmJWpIkSZL60CCOlE7mwqq6Hbg9ySjw7035CLBfkl2ApwFnJ9nQZ/bGp4Eki4BFALN2nTOlQUuSJEnSTGVSen93dx2v7/q8ns6/1YOAW5tR1klV1RI6o6rMnjt/sF7clSRJkqTNNIjTd28HHrY1HavqNuDGJEcDpGP/bRmcJEmSJA2SgRsprapfJrkkyTXAdVtximOBTyQ5GdgB+BywcrIOC+YN0XYlVUmSJEnayMBtCdMLrVar2u12r8OQJEmSpJ6YbEuYQZy+K0mSJEnaTpiUSpIkSZJ6ZuCT0iS7JXlLr+OQJEmSpEE0cAsdj
WM34C3Ax6fqAiNrRhlevHSqTi+Na7WLa0mSJGkGGPiRUuD9wBOSrEjy6SQvBEhybpLTm+PXJzmlOX57kmuanxN7GLckSZIkzXgmpbAY+GFVLQS+ARzelM8D9m6ODwcuTnIQ8Drgt4FDgDcmOWCa45UkSZKkvmFSen/LgMOT7A1cC/w8yVzgUOA7wGHAuVV1Z1XdAXyR+5LY+0myKEk7SXvd2tFpCl+SJEmSZhbfKe1SVWuS7AY8F7gY2B14OXBHVd2eZEvOtQRYAjB77nw3g5UkSZKkcZiUwu3Aw7o+XwacCBwJ7AGc0/xAZyT1jCTvBwK8BHj1pi6wYN4QbRedkSRJkqSNDHxSWlW/THJJkmuAr9NJPJ9TVT9I8mM6o6XLmrZXJjkDuLzp/qmquqoXcUuSJElSP0iVM0unWqvVqna73eswJEmSJKknkiyvqtZ4dS50JEmSJEnqGZNSSZIkSVLPmJRKkiRJknpm4Bc6mg4ja0YZXry012FowK12BWhJkiRth/p6pDTJcJLrk5yR5IYkZyU5qllt9/tJntr8XJrkqiTfSfLkpu9xSb6Y5Lym7d805a9P8qGua7wxyWm9ukdJkiRJmsn6OiltPBH4O2Cv5ueVwGHAScC7gOuBw6vqAODPgL/q6rsQOAZYAByT5NHA54HfS7JD0+Z1wOnTcB+SJEmS1HcGYfrujVU1ApBkFXB+VVWSEWAYGALOTDIfKGCHrr7nV9Vo0/da4LFV9ZMkFwAvSHIdsMOG83dLsghYBDBr1zlTd3eSJEmSNIMNwkjp3V3H67s+r6eTlP8lcGFV7Qv8HrDjBH3XcV8S/yngODqjpJ8e76JVtaSqWlXVmrXT0AO9B0mSJEnqS4MwUropQ8Ca5vi4zelQVd9tpvIeCOy3qfYL5g3RdpEZSZIkSdrIIIyUbsrfAH+d5Cq2LEn/PHBJVd0yNWFJkiRJUv9LVfU6hhkpyVeB06rq/E21bbVa1W63pyEqSZIkSdr+JFleVa3x6hwp3UJJdktyA/DrzUlIJUmSJEkT853SLVRVtwJP6nUckiRJktQPHCmVJEmSJPWMI6XbQJJZVbVuovqRNaMML146nSFJW2W1q0RLkiRpmg3cSGmSv0hyYtfnU5K8Lck7klyR5Ook7+2q/1KS5UlWJVnUVX5Hkr9LshI4dJpvQ5IkSZL6wsAlpcDpwGsAkjwI+APgP4H5wFOBhcBBSZ7RtH99VR0EtIATkuzRlO8MfLeq9q+q/5jOG5AkSZKkfjFw03eranWSXyY5ANgTuAo4GHhOcwywC50k9WI6iehLmvJHN+W/BNYBX5joOs2o6iKAWbvOmYI7kSRJkqSZb+CS0sangOOA/0Fn5PRZwF9X1T90N0ryTOAo4NCqWpvk28COTfVdk71HWlVLgCUAs+fOdzNYSZIkSRrHoCal5wJ/AewAvBK4F/jLJGdV1R1J5gH3AEPALU1CuhdwyNZcbMG8IdouICNJkiRJGxnIpLSqfpPkQuDWZrTzm0meAlyaBOAO4FXAecCbklwHfA+4rFcxS5IkSVI/GsiktFng6BDg6A1lVfVh4MPjNH/eeOeoql2mJjpJkiRJGhwDt/pukr2BHwDnV9X3ex2PJEmSJA2ygRspraprgcd3lyW5Y2tGPpv9TpdU1dptFZ8kSZIkDZKBGyndxk4Edup1EJIkSZI0Uw3cSOlkkuwCfBl4OJ2VeU+uqi8n2Rn4PPAoYBbwl3T2OH0kcGGSm6vqiInOO7JmlOHFS6c8fmlbWu2K0ZIkSZoGJqX3dxfwkqq6LckjgMuSfAV4LnBTVf0uQJKhqhpN8nbgiKq6uYcxS5IkSdKM5fTd+wvwV0muBv4vMI/OiOgI8OwkH0hyeFWNbvJEyaIk7STtdWs32VySJEmSBpJJ6f0dC8wBDqqqhcDPgR2r6gbgQDrJ6fuS/NmmTlRVS6qqVVWtWTsNTWnQkiRJkjRTOX33/oaAX1TVPUmOAB4LkOSRwK+q6rNJbgX+V
9P+duBhgNN3JUmSJGkrmJTe31nAvycZAdrA9U35AuDUJOuBe4A3N+VLgPOS3DTZQkcL5g3RdtEYSZIkSdqISSmwYY/SZsGiQ8dpshr4xjj9/h74+ykNTpIkSZL6mO+USpIkSZJ6xqRUkiRJktQzA52UJjkhyXVJbkmyeAv6DSd55VTGJkmSJEmDYNDfKX0LcFRV/XS8yiQPrqp7x6kaBl4J/MsUxiZJkiRJfW9gk9IknwQeD3w9yenAE6rqj5KcAdwFHABckuTLwIebbgU8A3g/8JQkK4Azq+q0ya41smaU4cVLp+hOpOmx2hWkJUmSNAUGNimtqjcleS5wBPCCMdWPAp5WVeuS/DtwfFVdkmQXOgnrYuCkqhrbT5IkSZK0BQb6ndJJnF1V65rjS4APJjkB2G2C6bwbSbIoSTtJe93a0SkLVJIkSZJmMpPS8d254aCq3g/8L+ChdKbz7rU5J6iqJVXVqqrWrJ2GpihMSZIkSZrZBnb67uZK8oSqGgFGkhwM7AX8BHhYbyOTJEmSpJnPpHTTTkxyBLAeWAV8vTlel2QlcMamFjpaMG+ItovESJIkSdJGUlW9jqHvtVqtarfbvQ5DkiRJknoiyfKqao1X5zulkiRJkqSeMSmVJEmSJPWMSakkSZIkqWdc6GgajKwZZXjx0l6HIT0gq12sS5IkSVNgoEZKk7wqyeVJViT5hyTHJzm1q/64JB+doO2spvyOJKckWZnksiR79up+JEmSJGmmG5ikNMlTgGOAp1fVQmAdcAfwkq5mxwCfm6DtsU2bnYHLqmp/4GLgjdN0C5IkSZLUdwZp+u6zgIOAK5IAPBT4BfCjJIcA3wf2Ai4Bjp+gLcBvgK82x8uBZ493sSSLgEUAs3ads+3vRpIkSZL6wCAlpQHOrKp33q8weT3wcuB64NyqqnQy0Y3aNu6p+zZ3XccE/4ZVtQRYAjB77nw3g5UkSZKkcQzM9F3gfOBlSX4LIMnuSR4LnAu8CHgF8LlNtJUkSZIkbUMDM1JaVdcmORn4ZpIHAfcAx1fVj5NcB+xdVZdP1hb48dZce8G8IdquXCpJkiRJG8l9M1E1VVqtVrXb7V6HIUmSJEk9kWR5VbXGqxuk6buSJEmSpO2MSakkSZIkqWdMShtJ7mh+PzLJOc3xcUk+2tvIJEmSJKl/DcxCR5urqm4CXrYtzzmyZpThxUu35Sml7cJqF/CSJEnSA+RI6RhJhpNcM0757ya5NMkjkjynOb4yydlJdulFrJIkSZI005mUboYkLwEWA89vik4GjqqqA4E28PZexSZJkiRJM5nTdzftSKAFPKeqbkvyAmBv4JIkAA8BLh3bKckiYBHArF3nTF+0kiRJkjSDmJRu2g+BxwNPojMqGuBbVfWKyTpV1RJgCcDsufPdDFaSJEmSxuH03U37MfBS4J+T7ANcBjw9yRMBkuyc5Em9DFCSJEmSZipHSjdDVV2f5FjgbOD3gOOAf00yu2lyMnDDRP0XzBui7SqlkiRJkrSRVDmzdKq1Wq1qt9u9DkOSJEmSeiLJ8qpqjVfn9F1JkiRJUs+YlEqSJEmSesakdIwkq5M8otdxSJIkSdIgcKGjaTCyZpThxUt7HYbUU6td7EuSJEnjGOiR0iRfSrI8yaoki8bU7ZxkaZKVSa5JckxT/qwkVyUZSXJ61wq8kiRJkqQtNNBJKfD6qjoIaAEnJNmjq+65wE1VtX9V7Qucl2RH4AzgmKpaQGek+c3THbQkSZIk9YtBT0pPSLISuAx4NDC/q24EeHaSDyQ5vKpGgScDN1bVhj1JzwSeMd6JkyxK0k7SXrd2dApvQZIkSZJmroFNSpM8EzgKOLSq9geuAnbcUN8kngfSSU7fl+TPtuT8VbWkqlpV1Zq109C2C1ySJEmS+sjAJqXAEHBLVa1NshdwSHdlkkcCa6vqs8CpdBLU7wHDSZ7YNHs1cNE0xixJkiRJfWWQV989D3hTkuvoJJuXjalfAJyaZD1wD/Dmqrory
euAs5M8GLgC+OSmLrRg3hBtVx6VJEmSpI0MbFJaVXcDzxunarj5/Y3mZ2y/84EDpi4ySZIkSRocgzx9V5IkSZLUYyalkiRJkqSeMSmVJEmSJPXMwL5TOp1G1owyvHhpr8OQpAmtdjE2SZLUI46UAknenuSa5ufEJMNJrkvyj0lWJflmkoc2bZ+Q5Lwky5Msa7aTkSRJkiRthYFPSpMcBLwO+G06e5W+EXg4MB/4WFXtA9wKvLTpsgR4a1UdBJwEfHzag5YkSZKkPuH0XTgMOLeq7gRI8kXgcODGqlrRtFkODCfZBXganX1KN/SfPd5JkywCFgHM2nXO1EUvSZIkSTOYSenE7u46Xgc8lM7I8q1VtXBTnatqCZ1RVWbPnV9TEqEkSZIkzXAmpbAMOCPJ+4EALwFeTTPK2a2qbktyY5Kjq+rsdIZL96uqlZNdYMG8IdouIiJJkiRJGxn4d0qr6krgDOBy4LvAp4BbJulyLPCGJCuBVcCLpjpGSZIkSepXqXJm6VRrtVrVbrd7HYYkSZIk9USS5VXVGq9u4EdKJUmSJEm9Y1IqSZIkSeoZk9LNkOS4JB/tdRySJEmS1G9cfXcajKwZZXjx0l6HIUkPyGpXEZckSVOgL0ZKkwwnuT7JWUmuS3JOkp2SHJTkoiTLk3wjydym/cIklyW5Osm5SR7elH87yYeTrEhyTZKnjnOtOUm+kOSK5ufp032/kiRJktQv+iIpbTwZ+HhVPQW4DTge+HvgZVV1EHA6cErT9p+BP62q/YAR4M+7zrNTVS0E3tL0GevDwGlVdTDwUjpbyEiSJEmStkI/Td/9SVVd0hx/FngXsC/wrSQAs4CfJRkCdquqi5q2ZwJnd53nXwGq6uIkuybZbcx1jgL2bs4JsGuSXarqju5GSRYBiwBm7TpnW9yfJEmSJPWdfkpKx264ejuwqqoO7S5sktItOc/Yzw8CDqmquyY9SdUSYAnA7Lnz3QxWkiRJksbRT0npY5IcWlWXAq8ELgPeuKEsyQ7Ak6pqVZJbkhxeVcuAVwMXdZ3nGODCJIcBo1U12jUqCvBN4K3AqdB5P7WqVkwW2IJ5Q7RdIESSJEmSNtJPSen3gOOTnA5cS+d90m8AH2lGRx8MfAhYBbwW+GSSnYAfAa/rOs9dSa4CdgBeP851TgA+luTq5pwXA2+amluSJEmSpP6Wqpk/szTJMPDVqtr3AZ7n28BJVdXeBmH9t1arVe32Nj2lJEmSJM0YSZZXVWu8un5afVeSJEmSNMP0xfTdqlpNZ6XdB3qeZz7gYCRJkiRJm82RUkmSJElSz/TFSOnWaN5D/TrwH8DTgDXAi4BHAh8D5gBrgTcC3wd+ADweGAJ+CRzR7GV6MfCGqvr+RNcaWTPK8OKlU3YvkjQTrXZVckmShCOl84GPVdU+wK3AS+nsLfrWqjoIOAn4eFWto7O6797AYcCVwOFJZgOPniwhlSRJkiRNbGBHShs3du0xuhwYpjNqenbX3qSzm9/LgGcAjwP+ms4I6kXAFeOdOMkiYBHArF3nTEHokiRJkjTzDfpI6d1dx+uA3YFbq2ph189TmvqLgcOBpwJfA3YDnkknWd1IVS2pqlZVtWbtNDRlNyBJkiRJM9mgJ6Vj3QbcmORogHTs39RdTmcUdX1V3QWsAP6QTrIqSZIkSdoKgz59dzzHAp9IcjKwA/A5YGVV3Z3kJ8BlTbtlwCuAkU2dcMG8Idou6CFJkiRJG0lV9TqGvtdqtardbvc6DEmSJEnqiSTLq6o1Xp3TdyVJkiRJPWNSKkmSJEnqGZNSSZIkSVLPmJQ+AElcKEqSJEmSHgCTqi5JhoHz6Kyw+zTgCuDTwHuB36KzMu/zgScAjwf+H50VeCc1smaU4cVLpyRmSZIkSVo9g3f7MCnd2BOBo4HX00lKXwkcBrwQeBed/Un3Bg6rql/3KkhJkiRJ6gdO393YjVU1UlXrgVXA+dXZN2cEGG7afGVTCWmSRUnaSdrr1o5ObcSSJ
EmSNEOZlG7s7q7j9V2f13PfyPKdmzpJVS2pqlZVtWbtNLSNQ5QkSZKk/mBSKkmSJEnqGd8pfYCSrKiqhZO1WTBviPYMfvFYkiRJkqaKSWmXqloN7Nv1+biJ6rrKJ01IJUmSJEkTc/quJEmSJKln0llYVlMpye3A93odh7bKI4Cbex2EtprPb+by2c1cPruZzec3c/nsZq5BeXaPrao541U4fXd6fK+qWr0OQlsuSdtnN3P5/GYun93M5bOb2Xx+M5fPbuby2Tl9V5IkSZLUQyalkiRJkqSeMSmdHkt6HYC2ms9uZvP5zVw+u5nLZzez+fxmLp/dzDXwz86FjiRJkiRJPeNIqSRJkiSpZ0xKp1CS5yb5XpIfJFnc63i0aUlWJxlJsiJJuynbPcm3kny/+f3wXscpSHJ6kl8kuaarbNxnlY6PNN/Fq5Mc2LvIBRM+v/ckWdN8/1YkeX5X3Tub5/e9JP+zN1ELIMmjk1yY5Nokq5K8rSn3+7edm+TZ+d3bziXZMcnlSVY2z+69Tfnjkny3eUb/luQhTfns5vMPmvrhXsY/6CZ5fmckubHru7ewKR+4v5smpVMkySzgY8DzgL2BVyTZu7dRaTMdUVULu5bmXgycX1XzgfObz+q9M4Dnjimb6Fk9D5jf/CwCPjFNMWpiZ7Dx8wM4rfn+LayqrwE0fzv/ANin6fPx5m+seuNe4I+ram/gEOD45hn5/dv+TfTswO/e9u5u4Miq2h9YCDw3ySHAB+g8uycCtwBvaNq/AbilKT+taafemej5Abyj67u3oikbuL+bJqVT56nAD6rqR1X1G+BzwIt6HJO2zouAM5vjM4EX9zAWNarqYuBXY4onelYvAv65Oi4Ddksyd3oi1XgmeH4TeRHwuaq6u6puBH5A52+seqCqflZVVzbHtwPXAfPw+7fdm+TZTcTv3nai+f7c0Xzcofkp4EjgnKZ87Pduw/fxHOBZSTJN4WqMSZ7fRAbu76ZJ6dSZB/yk6/NPmfwPv7YPBXxAQU2YAAADC0lEQVQzyfIki5qyPavqZ83xfwJ79iY0bYaJnpXfx5njj5qpSqd3TZX3+W2nmimBBwDfxe/fjDLm2YHfve1ekllJVgC/AL4F/BC4tarubZp0P5//fnZN/Siwx/RGrG5jn19VbfjundJ8905LMrspG7jvnkmpdH+HVdWBdKZNHJ/kGd2V1Vmu2iWrZwCf1Yz0CeAJdKY2/Qz4u96Go8kk2QX4AnBiVd3WXef3b/s2zrPzuzcDVNW6qloIPIrOiPVePQ5JW2Ds80uyL/BOOs/xYGB34E97GGJPmZROnTXAo7s+P6op03asqtY0v38BnEvnj/7PN0yZaH7/oncRahMmelZ+H2eAqvp58x/t9cA/ct80QZ/fdibJDnSSmrOq6otNsd+/GWC8Z+d3b2apqluBC4FD6UzrfHBT1f18/vvZNfVDwC+nOVSNo+v5PbeZUl9VdTfwaQb4u2dSOnWuAOY3q6I9hM5CAV/pcUyaRJKdkzxswzHwHOAaOs/ttU2z1wJf7k2E2gwTPauvAK9pVrM7BBjtmmao7cSY92VeQuf7B53n9wfNapKPo7Pww+XTHZ86mvfS/gm4rqo+2FXl9287N9Gz87u3/UsyJ8luzfFDgWfTeSf4QuBlTbOx37sN38eXARc0MxjUAxM8v+u7/kde6LwP3P3dG6i/mw/edBNtjaq6N8kfAd8AZgGnV9WqHoelye0JnNusA/Bg4F+q6rwkVwCfT/IG4MfAy3sYoxpJ/hV4JvCIJD8F/hx4P+M/q68Bz6ezSMda4HXTHrDuZ4Ln98xmOfwCVgN/CFBVq5J8HriWzuqhx1fVul7ELQCeDrwaGGnejwJ4F37/ZoKJnt0r/O5t9+YCZzarHz8I+HxVfTXJtcDnkrwPuIrO/3Sg+f2ZJD+gs6jcH/QiaP23iZ7fBUnmAAFWAG9q2g/c3834P00kSZIkSb3i9F1JkiRJUs+YlEqSJEmSesakVJIkS
ZLUMyalkiRJkqSeMSmVJEmSJPWMSakkSZIkqWdMSiVJkiRJPWNSKkmSJEnqmf8PwAD1i/fQsn4AAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "## Reviewing what we learned\n", "* The difference between dynamic and static websites\n", "* How to install `Selenium` in Colab\n", "* How to make a `get` request in `Selenium`\n", "* Navigating markup (HTML) in `Selenium`\n", "* Using `Selenium` and `BeautifulSoup` together to extract text\n", "* Pulling text and populating a `DataFrame`\n", "* Plotting word frequency, excluding stopwords and punctuation\n", "\n", "As a challenge, try scraping the New York Times archive. Though we didn't cover it here, `Selenium` lets you input queries into search bars and navigate through dynamically generated results. Read more about it [here](https://selenium-python.readthedocs.io/navigating.html#interacting-with-the-page).\n", "\n", "Once you have a `DataFrame`, try doing what we did above and calculate word frequency, not for a given day's articles but for the results of a particular search. You could even package all of it into a single function whose only input is a search string. That way, we could visualize the most important subtopics for any given topic in the history of the Times. If you give that a try and get confused or want to show off what you did, feel free to reach out to me at peter.nadel@tufts.edu." ], "metadata": { "id": "B7R5xsv7pDts" } }, { "cell_type": "markdown", "source": [ "# Thanks for reading" ], "metadata": { "id": "LbXTEYiyrHhc" } } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.8.9" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } }, "colab": { "provenance": [], "collapsed_sections": [ "siaXoLP3laBt" ] } }, "nbformat": 4, "nbformat_minor": 0 }