{
  "cells": [
    {
      "cell_type": "code",
      "source": [
        "!date\n",
        "!python --version"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Rsu3rv5lhpUt",
        "outputId": "a387fb6f-7e53-4f19-cac0-01d4e0d846cb"
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Wed Jun  3 02:13:29 AM UTC 2026\n",
            "Python 3.12.13\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "v8IkNtWTYpm6"
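      },
      "source": [
        "# Clustering with Topic Models\n",
        "A topic model is an approach that tries to observe tendencies (roughly, topic-like groupings) from the distribution of word occurrences in documents, and it can be regarded as a kind of clustering. Ordinary clustering (for example, [k-means](https://ja.wikipedia.org/wiki/K平均法)) groups samples on the assumption that each sample belongs to exactly one cluster, whereas a topic model groups them on the assumption that a single sample contains a mixture of several clusters. The following examples should help build intuition.\n",
        "\n",
        "- Example 1: [Introduction to Topic Models: Modeling Wikipedia with LDA (in Japanese)](https://recruit.gmo.jp/engineer/jisedai/blog/topic-model/)\n",
        "- Example 2: [Wikipedia: Topic model](https://en.wikipedia.org/wiki/Topic_model)\n",
        "\n",
        "The basic workflow is to convert the documents into a document-term matrix, such as BoW (`CountVectorizer`) or its reweighted variant TF-IDF, and then to aggregate it (roughly, reduce its dimensionality) based on inter-document or inter-word similarity. Many algorithms have been proposed that differ in how the document-term matrix is built, how the dimensionality is reduced, and how similarity is computed. Here we run (1) BoW-based LDA and (2) TF-IDF-based LDA, and look at what kind of topics each one produces.\n",
        "\n",
        "One caveat of topic models is that **the topics themselves have to be interpreted by a human**. For example, Figure 2 (the figure below) in the article [Introduction to Topic Models: Modeling Wikipedia with LDA](https://recruit.gmo.jp/engineer/jisedai/blog/topic-model/) mentioned above lists topics such as \"politics\", \"sports\", and \"international\", but those labels only become available after carrying out the step the article calls \"4-1. Topic observation\". Let's try that observation ourselves."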
      ]
    },
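    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a rough illustration of the two pipelines described above, here is a minimal sketch that assumes scikit-learn is installed. The toy corpus `docs`, the helper `fit_lda`, and the parameter values (`n_topics=2`, `n_top_words=3`) are hypothetical and are not taken from this notebook's data. It builds a BoW matrix and a TF-IDF matrix, fits LDA on each, and prints the top words of every topic together with the topic mixture of one document.\n",
        "\n",
        "```python\n",
        "from sklearn.decomposition import LatentDirichletAllocation\n",
        "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n",
        "\n",
        "# Hypothetical toy corpus: already tokenized, space-separated words.\n",
        "docs = [\n",
        "    'election vote party government',\n",
        "    'government policy election debate',\n",
        "    'match goal team player',\n",
        "    'team player season match',\n",
        "    'government team budget season',\n",
        "]\n",
        "\n",
        "\n",
        "def fit_lda(vectorizer, texts, n_topics=2, n_top_words=3):\n",
        "    # Document-term matrix, shape (n_docs, n_terms).\n",
        "    dtm = vectorizer.fit_transform(texts)\n",
        "    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)\n",
        "    # Document-topic matrix, shape (n_docs, n_topics).\n",
        "    doc_topic = lda.fit_transform(dtm)\n",
        "    terms = vectorizer.get_feature_names_out()\n",
        "    # For each topic, pick the terms with the largest weights.\n",
        "    top_words = [\n",
        "        [terms[i] for i in weights.argsort()[::-1][:n_top_words]]\n",
        "        for weights in lda.components_\n",
        "    ]\n",
        "    return top_words, doc_topic\n",
        "\n",
        "\n",
        "for name, vec in [('BoW', CountVectorizer()), ('TF-IDF', TfidfVectorizer())]:\n",
        "    top_words, doc_topic = fit_lda(vec, docs)\n",
        "    print(name, 'top words per topic:', top_words)\n",
        "    print(name, 'topic mixture of the last document:', doc_topic[-1].round(2))\n",
        "```\n",
        "\n",
        "Each row of `doc_topic` sums to 1, which reflects the assumption that one document mixes several topics rather than belonging to a single cluster. Note that the default `token_pattern` of scikit-learn's vectorizers drops one-character tokens, so space-separated Japanese text may need a custom pattern."
      ]
    },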
    {
      "cell_type": "code",
      "source": [
        "# Install spaCy and GiNZA\n",
        "!pip install -U ginza ja_ginza\n",
        "\n",
        "# Package for exporting figures drawn with plotly to files\n",
        "#!pip install -U kaleido"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "PNA5onfzhrrq",
        "outputId": "47874e01-bdcf-4955-950b-4c1972e1b1a0"
      },
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Collecting ginza\n",
            "  Downloading ginza-5.2.0-py3-none-any.whl.metadata (448 bytes)\n",
            "Collecting ja_ginza\n",
            "  Downloading ja_ginza-5.2.0-py3-none-any.whl.metadata (5.8 kB)\n",
            "Requirement already satisfied: spacy<4.0.0,>=3.4.4 in /usr/local/lib/python3.12/dist-packages (from ginza) (3.8.14)\n",
            "Collecting plac>=1.3.3 (from ginza)\n",
            "  Downloading plac-1.4.5-py2.py3-none-any.whl.metadata (5.9 kB)\n",
            "Collecting SudachiPy<0.7.0,>=0.6.2 (from ginza)\n",
            "  Downloading sudachipy-0.6.11-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (12 kB)\n",
            "Collecting SudachiDict-core>=20210802 (from ginza)\n",
            "  Downloading sudachidict_core-20260428-py3-none-any.whl.metadata (2.7 kB)\n",
            "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (3.0.12)\n",
            "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (1.0.5)\n",
            "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (1.0.15)\n",
            "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (2.0.13)\n",
            "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (3.0.13)\n",
            "Requirement already satisfied: thinc<8.4.0,>=8.3.12 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (8.3.13)\n",
            "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (1.1.3)\n",
            "Requirement already satisfied: srsly<3.0.0,>=2.5.3 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (2.5.3)\n",
            "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (2.0.10)\n",
            "Requirement already satisfied: weasel<2.0.0,>=1.0.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (1.0.0)\n",
            "Requirement already satisfied: confection<2.0.0,>=1.3.2 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (1.3.3)\n",
            "Requirement already satisfied: typer<1.0.0,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (0.25.1)\n",
            "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (4.67.3)\n",
            "Requirement already satisfied: numpy>=1.19.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (2.0.2)\n",
            "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (2.32.4)\n",
            "Requirement already satisfied: pydantic<3.0.0,>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (2.12.3)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (3.1.6)\n",
            "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (75.2.0)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from spacy<4.0.0,>=3.4.4->ginza) (26.2)\n",
            "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.0.0->spacy<4.0.0,>=3.4.4->ginza) (0.7.0)\n",
            "Requirement already satisfied: pydantic-core==2.41.4 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.0.0->spacy<4.0.0,>=3.4.4->ginza) (2.41.4)\n",
            "Requirement already satisfied: typing-extensions>=4.14.1 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.0.0->spacy<4.0.0,>=3.4.4->ginza) (4.15.0)\n",
            "Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.0.0->spacy<4.0.0,>=3.4.4->ginza) (0.4.2)\n",
            "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.13.0->spacy<4.0.0,>=3.4.4->ginza) (3.4.7)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.13.0->spacy<4.0.0,>=3.4.4->ginza) (3.15)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.13.0->spacy<4.0.0,>=3.4.4->ginza) (2.5.0)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.13.0->spacy<4.0.0,>=3.4.4->ginza) (2026.5.20)\n",
            "Requirement already satisfied: blis<1.4.0,>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from thinc<8.4.0,>=8.3.12->spacy<4.0.0,>=3.4.4->ginza) (1.3.3)\n",
            "Requirement already satisfied: click>=8.2.1 in /usr/local/lib/python3.12/dist-packages (from typer<1.0.0,>=0.3.0->spacy<4.0.0,>=3.4.4->ginza) (8.4.0)\n",
            "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0.0,>=0.3.0->spacy<4.0.0,>=3.4.4->ginza) (1.5.4)\n",
            "Requirement already satisfied: rich>=13.8.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0.0,>=0.3.0->spacy<4.0.0,>=3.4.4->ginza) (13.9.4)\n",
            "Requirement already satisfied: annotated-doc>=0.0.2 in /usr/local/lib/python3.12/dist-packages (from typer<1.0.0,>=0.3.0->spacy<4.0.0,>=3.4.4->ginza) (0.0.4)\n",
            "Requirement already satisfied: cloudpathlib>=0.7.0 in /usr/local/lib/python3.12/dist-packages (from weasel<2.0.0,>=1.0.0->spacy<4.0.0,>=3.4.4->ginza) (0.24.0)\n",
            "Requirement already satisfied: smart-open>=5.2.1 in /usr/local/lib/python3.12/dist-packages (from weasel<2.0.0,>=1.0.0->spacy<4.0.0,>=3.4.4->ginza) (7.6.1)\n",
            "Requirement already satisfied: httpx>=0.24.0 in /usr/local/lib/python3.12/dist-packages (from weasel<2.0.0,>=1.0.0->spacy<4.0.0,>=3.4.4->ginza) (0.28.1)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->spacy<4.0.0,>=3.4.4->ginza) (3.0.3)\n",
            "Requirement already satisfied: anyio in /usr/local/lib/python3.12/dist-packages (from httpx>=0.24.0->weasel<2.0.0,>=1.0.0->spacy<4.0.0,>=3.4.4->ginza) (4.13.0)\n",
            "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx>=0.24.0->weasel<2.0.0,>=1.0.0->spacy<4.0.0,>=3.4.4->ginza) (1.0.9)\n",
            "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx>=0.24.0->weasel<2.0.0,>=1.0.0->spacy<4.0.0,>=3.4.4->ginza) (0.16.0)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich>=13.8.0->typer<1.0.0,>=0.3.0->spacy<4.0.0,>=3.4.4->ginza) (4.2.0)\n",
            "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=13.8.0->typer<1.0.0,>=0.3.0->spacy<4.0.0,>=3.4.4->ginza) (2.20.0)\n",
            "Requirement already satisfied: wrapt in /usr/local/lib/python3.12/dist-packages (from smart-open>=5.2.1->weasel<2.0.0,>=1.0.0->spacy<4.0.0,>=3.4.4->ginza) (2.2.0)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich>=13.8.0->typer<1.0.0,>=0.3.0->spacy<4.0.0,>=3.4.4->ginza) (0.1.2)\n",
            "Downloading ginza-5.2.0-py3-none-any.whl (21 kB)\n",
            "Downloading ja_ginza-5.2.0-py3-none-any.whl (59.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.1/59.1 MB\u001b[0m \u001b[31m8.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading plac-1.4.5-py2.py3-none-any.whl (22 kB)\n",
            "Downloading sudachidict_core-20260428-py3-none-any.whl (72.2 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m72.2/72.2 MB\u001b[0m \u001b[31m8.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading sudachipy-0.6.11-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.6 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m24.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hInstalling collected packages: SudachiPy, plac, SudachiDict-core, ginza, ja_ginza\n",
            "Successfully installed SudachiDict-core-20260428 SudachiPy-0.6.11 ginza-5.2.0 ja_ginza-5.2.0 plac-1.4.5\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zXJ0jzI1Ypm-"
      },
      "source": [
        "## Preparing the data\n",
        "The same setup we have been using so far."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "0DaZZaFJYpm-",
        "outputId": "fb392f64-c091-455b-b66d-8d90452269a3"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
            "                                 Dload  Upload   Total   Spent    Left  Speed\n",
            "100 34834  100 34834    0     0  14587      0  0:00:02  0:00:02 --:--:-- 14593\n"
          ]
        }
      ],
      "source": [
        "!curl -O https://ie.u-ryukyu.ac.jp/~tnal/2022/dm/static/r_assesment.pkl"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 204
        },
        "id": "qCxzwznFYpnA",
        "outputId": "e10559e7-4d80-4ed7-8faa-258db30179cb"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   title  grade  required     q_id                       comment\n",
              "0  工業数学Ⅰ      1      True  Q21 (1)                          特になし\n",
              "1  工業数学Ⅰ      1      True  Q21 (2)            正直わかりずらい。むだに間があるし。\n",
              "2  工業数学Ⅰ      1      True  Q21 (2)          例題を取り入れて理解しやすくしてほしい。\n",
              "3  工業数学Ⅰ      1      True  Q21 (2)                          特になし\n",
              "4  工業数学Ⅰ      1      True  Q21 (2)  スライドに書く文字をもう少しわかりやすくして欲しいです。"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-3fa6b08c-8a71-42ca-925c-beab68eb20e9\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>title</th>\n",
              "      <th>grade</th>\n",
              "      <th>required</th>\n",
              "      <th>q_id</th>\n",
              "      <th>comment</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (1)</td>\n",
              "      <td>特になし</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>正直わかりずらい。むだに間があるし。</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>例題を取り入れて理解しやすくしてほしい。</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>特になし</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>スライドに書く文字をもう少しわかりやすくして欲しいです。</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-3fa6b08c-8a71-42ca-925c-beab68eb20e9')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-3fa6b08c-8a71-42ca-925c-beab68eb20e9 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-3fa6b08c-8a71-42ca-925c-beab68eb20e9');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "assesment_df",
              "summary": "{\n  \"name\": \"assesment_df\",\n  \"rows\": 170,\n  \"fields\": [\n    {\n      \"column\": \"title\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 16,\n        \"samples\": [\n          \"\\u5de5\\u696d\\u6570\\u5b66\\u2160\",\n          \"\\u6280\\u8853\\u8005\\u306e\\u502b\\u7406\",\n          \"\\u30a2\\u30eb\\u30b4\\u30ea\\u30ba\\u30e0\\u3068\\u30c7\\u30fc\\u30bf\\u69cb\\u9020\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"grade\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 1,\n        \"max\": 3,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1,\n          2,\n          3\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"required\",\n      \"properties\": {\n        \"dtype\": \"boolean\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          false,\n          true\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"q_id\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"Q21 (2)\",\n          \"Q22\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"comment\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 153,\n        \"samples\": [\n          
\"\\u30fb\\u6559\\u79d1\\u66f8\\u304c\\u5fc5\\u8981\\u306a\\u306e\\u304b\\u5fc5\\u8981\\u3067\\u306a\\u3044\\u306e\\u304b\\u304c\\u66d6\\u6627\\u306a\\u307e\\u307e\\u6388\\u696d\\u304c\\u59cb\\u307e\\u308a\\u3001\\u975e\\u5e38\\u306b\\u4e0d\\u5b89\\u3060\\u3063\\u305f\\u305f\\u3081\\u3001\\u6559\\u79d1\\u66f8\\u304c\\u5fc5\\u9808\\u304b\\u305d\\u3046\\u3067\\u306a\\u3044\\u306e\\u304b\\u306f\\u6700\\u521d\\u306b\\u306f\\u3063\\u304d\\u308a\\u3057\\u3066\\u6b32\\u3057\\u3044\\u3002\\r\\n\\u30fb\\u8ab2\\u984c\\u3092\\u51fa\\u3059\\u3060\\u3051\\u51fa\\u3055\\u305b\\u3066\\u304a\\u3044\\u3066\\u3001\\u63a1\\u70b9\\u3082\\u305b\\u305a\\u3001\\u3069\\u3046\\u3044\\u3063\\u305f\\u89e3\\u7b54\\u304c\\u6b63\\u3057\\u3044\\u306e\\u304b\\u3068\\u3044\\u3063\\u305f\\u6307\\u91dd\\u3082\\u51fa\\u3059\\u306e\\u304c\\u3068\\u3066\\u3082\\u9045\\u3044\\u3002\\u8ab2\\u984c\\u306f\\u89e3\\u304f\\u3060\\u3051\\u3067\\u306f\\u77e5\\u8b58\\u306e\\u5b9a\\u7740\\u306b\\u3064\\u306a\\u304c\\u3089\\u306a\\u3044\\u3068\\u601d\\u3044\\u307e\\u3059\\u304c\\u3001\\u305d\\u3053\\u3089\\u3078\\u3093\\u306f\\u3069\\u3046\\u306a\\u3093\\u3067\\u3057\\u3087\\u3046\\u304b\\u3002\\r\\n\\u30fb\\u914d\\u5e03\\u8cc7\\u6599\\u3068\\u3057\\u3066\\u3001\\u904e\\u53bb\\u554f\\u3082\\u914d\\u5e03\\u3057\\u3066\\u304f\\u308c\\u308b\\u3068\\u3068\\u3066\\u3082\\u52a9\\u304b\\u308b\\u306a\\u3001\\u3068\\u601d\\u3044\\u307e\\u3059\\u3002\\u3054\\u691c\\u8a0e\\u304a\\u9858\\u3044\\u3057\\u307e\\u3059\\u3002\",\n          \"\\u30fb\\u4e2d\\u9593\\u30c6\\u30b9\\u30c8\\u3092\\u5ef6\\u671f\\u3057\\u7d9a\\u3051\\u3001\\u6700\\u7d42\\u7684\\u306b\\u4e2d\\u9593\\u30fb\\u671f\\u672b\\u8a66\\u9a13\\u3092\\uff12\\u9031\\u7d9a\\u3051\\u3066\\u3084\\u308b\\u3053\\u3068\\u3068\\u306a\\u308a\\u3001\\u8a08\\u753b\\u6027\\u304c\\u6b20\\u3051\\u3066\\u3044\\u308b\\u3002\\r\\n\\u30fb\\u914d\\u5e03\\u8cc7\\u6599\\u306e\\u8aa4\\u5b57\\u8131\\u5b57\\u304c\\u591a\\u3059\\u304e\\u308b\\u3002\"\n        ],\n        \"semantic_type\": 
\"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 4
        }
      ],
      "source": [
        "import collections\n",
        "\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import spacy\n",
        "from wordcloud import WordCloud\n",
        "\n",
        "# With Python 3.12 + spaCy 3.8 + GiNZA 5.2, the model does not work as-is,\n",
        "# so the following setting is passed explicitly.\n",
        "config = {\n",
        "    \"components\": {\n",
        "        \"compound_splitter\": {\n",
        "            \"split_mode\": \"A\"\n",
        "        }\n",
        "    }\n",
        "}\n",
        "nlp = spacy.load(\"ja_ginza\", config=config)\n",
        "\n",
        "assesment_df = pd.read_pickle('r_assesment.pkl')\n",
        "assesment_df.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 674
        },
        "id": "Stv68NvmYpnB",
        "outputId": "83491437-92f3-488e-827a-7d7776979f59"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "        title  grade  required     q_id  \\\n",
              "0       工業数学Ⅰ      1      True  Q21 (1)   \n",
              "1       工業数学Ⅰ      1      True  Q21 (2)   \n",
              "2       工業数学Ⅰ      1      True  Q21 (2)   \n",
              "3       工業数学Ⅰ      1      True  Q21 (2)   \n",
              "4       工業数学Ⅰ      1      True  Q21 (2)   \n",
              "..        ...    ...       ...      ...   \n",
              "165  データマイニング      3     False      Q22   \n",
              "166  ICT実践英語Ⅰ      3     False      Q22   \n",
              "167   知能情報実験Ⅲ      3      True  Q21 (2)   \n",
              "168   知能情報実験Ⅲ      3      True      Q22   \n",
              "169   知能情報実験Ⅲ      3      True      Q22   \n",
              "\n",
              "                                               comment  \\\n",
              "0                                                 特になし   \n",
              "1                                   正直わかりずらい。むだに間があるし。   \n",
              "2                                 例題を取り入れて理解しやすくしてほしい。   \n",
              "3                                                 特になし   \n",
              "4                         スライドに書く文字をもう少しわかりやすくして欲しいです。   \n",
              "..                                                 ...   \n",
              "165  課題が難しいものが多く、時間を多くとってもらえたのは非常に良かったですがかなりきつかったです...   \n",
              "166                            オンラインなどで顔を合わせてやりたかったです。   \n",
              "167  unityの操作方法の説明などを最初に行ってもらえたらもう少しスムーズにできたのではないかと思う。   \n",
              "168  それぞれに任せるといった形で進められたものだったのでそれなりに進めやすかったですが、オンライ...   \n",
              "169  モバイルアプリ班\\r\\nHTML/CSS，JavaScriptなどを用いてアプリケーションを...   \n",
              "\n",
              "                                                wakati  \n",
              "0                                                特に なし  \n",
              "1                                   正直 わかる ずらい むだ 間 ある  \n",
              "2                                       例題 取り入れる 理解 する  \n",
              "3                                                特に なし  \n",
              "4                              スライド 書く 文字 もう 少し わかる する  \n",
              "..                                                 ...  \n",
              "165       課題 難しい もの 多い 時間 多い とる もらえる 非常 良い かなり きつい ござる  \n",
              "166                                    オンライン 顔 合わせる やる  \n",
              "167         unity 操作方法 説明 最初 行く もらえる もう 少し スムーズ できる 思う  \n",
              "168  それぞれ 任せる いう 形 進める もの なり 進める オンライン 班 員 指導 全く する...  \n",
              "169  モバイルアプリ 班 \\r\\n HTML CSS javascript 用いる アプリケーショ...  \n",
              "\n",
              "[170 rows x 6 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-b14bf32d-eb0e-40b7-bdeb-b80d15dac235\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>title</th>\n",
              "      <th>grade</th>\n",
              "      <th>required</th>\n",
              "      <th>q_id</th>\n",
              "      <th>comment</th>\n",
              "      <th>wakati</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (1)</td>\n",
              "      <td>特になし</td>\n",
              "      <td>特に なし</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>正直わかりずらい。むだに間があるし。</td>\n",
              "      <td>正直 わかる ずらい むだ 間 ある</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>例題を取り入れて理解しやすくしてほしい。</td>\n",
              "      <td>例題 取り入れる 理解 する</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>特になし</td>\n",
              "      <td>特に なし</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>工業数学Ⅰ</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>スライドに書く文字をもう少しわかりやすくして欲しいです。</td>\n",
              "      <td>スライド 書く 文字 もう 少し わかる する</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>165</th>\n",
              "      <td>データマイニング</td>\n",
              "      <td>3</td>\n",
              "      <td>False</td>\n",
              "      <td>Q22</td>\n",
              "      <td>課題が難しいものが多く、時間を多くとってもらえたのは非常に良かったですがかなりきつかったです...</td>\n",
              "      <td>課題 難しい もの 多い 時間 多い とる もらえる 非常 良い かなり きつい ござる</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>166</th>\n",
              "      <td>ICT実践英語Ⅰ</td>\n",
              "      <td>3</td>\n",
              "      <td>False</td>\n",
              "      <td>Q22</td>\n",
              "      <td>オンラインなどで顔を合わせてやりたかったです。</td>\n",
              "      <td>オンライン 顔 合わせる やる</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>167</th>\n",
              "      <td>知能情報実験Ⅲ</td>\n",
              "      <td>3</td>\n",
              "      <td>True</td>\n",
              "      <td>Q21 (2)</td>\n",
              "      <td>unityの操作方法の説明などを最初に行ってもらえたらもう少しスムーズにできたのではないかと思う。</td>\n",
              "      <td>unity 操作方法 説明 最初 行く もらえる もう 少し スムーズ できる 思う</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>168</th>\n",
              "      <td>知能情報実験Ⅲ</td>\n",
              "      <td>3</td>\n",
              "      <td>True</td>\n",
              "      <td>Q22</td>\n",
              "      <td>それぞれに任せるといった形で進められたものだったのでそれなりに進めやすかったですが、オンライ...</td>\n",
              "      <td>それぞれ 任せる いう 形 進める もの なり 進める オンライン 班 員 指導 全く する...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>169</th>\n",
              "      <td>知能情報実験Ⅲ</td>\n",
              "      <td>3</td>\n",
              "      <td>True</td>\n",
              "      <td>Q22</td>\n",
              "      <td>モバイルアプリ班\\r\\nHTML/CSS，JavaScriptなどを用いてアプリケーションを...</td>\n",
              "      <td>モバイルアプリ 班 \\r\\n HTML CSS javascript 用いる アプリケーショ...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>170 rows × 6 columns</p>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b14bf32d-eb0e-40b7-bdeb-b80d15dac235')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-b14bf32d-eb0e-40b7-bdeb-b80d15dac235 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-b14bf32d-eb0e-40b7-bdeb-b80d15dac235');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "  <div id=\"id_3094dd1a-1b83-435f-9640-5468adff58f6\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('assesment_df')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_3094dd1a-1b83-435f-9640-5468adff58f6 button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('assesment_df');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "assesment_df",
              "summary": "{\n  \"name\": \"assesment_df\",\n  \"rows\": 170,\n  \"fields\": [\n    {\n      \"column\": \"title\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 16,\n        \"samples\": [\n          \"\\u5de5\\u696d\\u6570\\u5b66\\u2160\",\n          \"\\u6280\\u8853\\u8005\\u306e\\u502b\\u7406\",\n          \"\\u30a2\\u30eb\\u30b4\\u30ea\\u30ba\\u30e0\\u3068\\u30c7\\u30fc\\u30bf\\u69cb\\u9020\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"grade\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 1,\n        \"max\": 3,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1,\n          2,\n          3\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"required\",\n      \"properties\": {\n        \"dtype\": \"boolean\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          false,\n          true\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"q_id\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          \"Q21 (2)\",\n          \"Q22\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"comment\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 153,\n        \"samples\": [\n          
\"\\u30fb\\u6559\\u79d1\\u66f8\\u304c\\u5fc5\\u8981\\u306a\\u306e\\u304b\\u5fc5\\u8981\\u3067\\u306a\\u3044\\u306e\\u304b\\u304c\\u66d6\\u6627\\u306a\\u307e\\u307e\\u6388\\u696d\\u304c\\u59cb\\u307e\\u308a\\u3001\\u975e\\u5e38\\u306b\\u4e0d\\u5b89\\u3060\\u3063\\u305f\\u305f\\u3081\\u3001\\u6559\\u79d1\\u66f8\\u304c\\u5fc5\\u9808\\u304b\\u305d\\u3046\\u3067\\u306a\\u3044\\u306e\\u304b\\u306f\\u6700\\u521d\\u306b\\u306f\\u3063\\u304d\\u308a\\u3057\\u3066\\u6b32\\u3057\\u3044\\u3002\\r\\n\\u30fb\\u8ab2\\u984c\\u3092\\u51fa\\u3059\\u3060\\u3051\\u51fa\\u3055\\u305b\\u3066\\u304a\\u3044\\u3066\\u3001\\u63a1\\u70b9\\u3082\\u305b\\u305a\\u3001\\u3069\\u3046\\u3044\\u3063\\u305f\\u89e3\\u7b54\\u304c\\u6b63\\u3057\\u3044\\u306e\\u304b\\u3068\\u3044\\u3063\\u305f\\u6307\\u91dd\\u3082\\u51fa\\u3059\\u306e\\u304c\\u3068\\u3066\\u3082\\u9045\\u3044\\u3002\\u8ab2\\u984c\\u306f\\u89e3\\u304f\\u3060\\u3051\\u3067\\u306f\\u77e5\\u8b58\\u306e\\u5b9a\\u7740\\u306b\\u3064\\u306a\\u304c\\u3089\\u306a\\u3044\\u3068\\u601d\\u3044\\u307e\\u3059\\u304c\\u3001\\u305d\\u3053\\u3089\\u3078\\u3093\\u306f\\u3069\\u3046\\u306a\\u3093\\u3067\\u3057\\u3087\\u3046\\u304b\\u3002\\r\\n\\u30fb\\u914d\\u5e03\\u8cc7\\u6599\\u3068\\u3057\\u3066\\u3001\\u904e\\u53bb\\u554f\\u3082\\u914d\\u5e03\\u3057\\u3066\\u304f\\u308c\\u308b\\u3068\\u3068\\u3066\\u3082\\u52a9\\u304b\\u308b\\u306a\\u3001\\u3068\\u601d\\u3044\\u307e\\u3059\\u3002\\u3054\\u691c\\u8a0e\\u304a\\u9858\\u3044\\u3057\\u307e\\u3059\\u3002\",\n          \"\\u30fb\\u4e2d\\u9593\\u30c6\\u30b9\\u30c8\\u3092\\u5ef6\\u671f\\u3057\\u7d9a\\u3051\\u3001\\u6700\\u7d42\\u7684\\u306b\\u4e2d\\u9593\\u30fb\\u671f\\u672b\\u8a66\\u9a13\\u3092\\uff12\\u9031\\u7d9a\\u3051\\u3066\\u3084\\u308b\\u3053\\u3068\\u3068\\u306a\\u308a\\u3001\\u8a08\\u753b\\u6027\\u304c\\u6b20\\u3051\\u3066\\u3044\\u308b\\u3002\\r\\n\\u30fb\\u914d\\u5e03\\u8cc7\\u6599\\u306e\\u8aa4\\u5b57\\u8131\\u5b57\\u304c\\u591a\\u3059\\u304e\\u308b\\u3002\"\n        ],\n        \"semantic_type\": 
\"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"wakati\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 153,\n        \"samples\": [\n          \"\\u6559\\u79d1\\u66f8 \\u5fc5\\u8981 \\u5fc5\\u8981 \\u66d6\\u6627 \\u307e\\u307e \\u6388\\u696d \\u59cb\\u307e\\u308b \\u975e\\u5e38 \\u4e0d\\u5b89 \\u305f\\u3081 \\u6559\\u79d1\\u66f8 \\u5fc5\\u9808 \\u305d\\u3046 \\u6700\\u521d \\u306f\\u3063\\u304d\\u308a \\r\\n \\u8ab2\\u984c \\u51fa\\u3059 \\u51fa\\u3059 \\u304a\\u304f \\u63a1\\u70b9 \\u3059\\u308b \\u3069\\u3046 \\u3044\\u3046 \\u89e3\\u7b54 \\u6b63\\u3057\\u3044 \\u3044\\u3046 \\u6307\\u91dd \\u51fa\\u3059 \\u3068\\u3066\\u3082 \\u9045\\u3044 \\u8ab2\\u984c \\u89e3\\u304f \\u77e5\\u8b58 \\u5b9a\\u7740 \\u3064\\u306a\\u304c\\u308b \\u601d\\u3046 \\u3089 \\u3078\\u3093 \\u3069\\u3046 \\r\\n \\u914d\\u5e03\\u8cc7\\u6599 \\u904e\\u53bb\\u554f \\u914d\\u5e03 \\u304f\\u308c\\u308b \\u3068\\u3066\\u3082 \\u52a9\\u304b\\u308b \\u601d\\u3046 \\u3054 \\u691c\\u8a0e \\u304a \\u9858\\u3046\",\n          \"\\u4e2d\\u9593 \\u30c6\\u30b9\\u30c8 \\u5ef6\\u671f \\u7d9a\\u3051\\u308b \\u6700\\u7d42\\u7684 \\u4e2d\\u9593 \\u671f\\u672b \\u8a66\\u9a13 \\u9031 \\u7d9a\\u3051\\u308b \\u3084\\u308b \\u3053\\u3068 \\u306a\\u308b \\u8a08\\u753b\\u6027 \\u6b20\\u3051\\u308b \\u3044\\u308b \\r\\n \\u914d\\u5e03\\u8cc7\\u6599 \\u8aa4\\u5b57 \\u8131\\u5b57 \\u591a\\u3044 \\u3059\\u304e\\u308b\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 5
        }
      ],
      "source": [
        "# 分かち書き\n",
        "poses = ['PROPN', 'NOUN', 'VERB', 'ADJ', 'ADV'] #名詞、動詞、形容詞、形容動詞\n",
        "\n",
        "assesment_df['wakati'] = ''\n",
        "for index, comment in enumerate(assesment_df['comment']):\n",
        "    doc = nlp(comment)\n",
        "    wakati_words = []\n",
        "    for token in doc:\n",
        "        if token.pos_ in poses:\n",
        "            wakati_words.append(token.lemma_)\n",
        "    wakati_text = ' '.join(wakati_words)\n",
        "    assesment_df.at[index, 'wakati'] = wakati_text\n",
        "\n",
        "assesment_df"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g85bcizRYpnB"
      },
      "source": [
        "## 文書ベクトルの作成\n",
        "ここでは CountVectorizer (Bag-of-Words) で作成してみよう。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "LJNIMbe3YpnB",
        "outputId": "d94ad7dd-4269-414b-eae9-c7b958de7919"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "bow_tf_vector.shape =  (170, 741)\n"
          ]
        }
      ],
      "source": [
        "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n",
        "\n",
        "stop_words = ['こと', '\\r\\n', 'ため', '思う', 'いる', 'ある', 'する', 'なる']\n",
        "vectorizer = CountVectorizer(stop_words=stop_words)\n",
        "bow_tf_vector = vectorizer.fit_transform(assesment_df['wakati'])\n",
        "print('bow_tf_vector.shape = ', bow_tf_vector.shape)"
      ]
    },
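    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the document-term matrix concrete, here is a minimal sketch on two made-up, space-separated documents (`toy_docs` and the other `toy_*` names are assumptions for illustration, not part of the survey data). It also shows one property of the default settings: `CountVectorizer` tokenizes with a pattern that keeps only tokens of two or more characters, so a one-character lemma such as `顔` does not enter the vocabulary.\n",
        "\n",
        "```python\n",
        "from sklearn.feature_extraction.text import CountVectorizer\n",
        "\n",
        "# Toy, space-separated documents (hypothetical; not from the survey data)\n",
        "toy_docs = [\"課題 難しい 時間 多い\", \"課題 多い 顔 課題\"]\n",
        "\n",
        "toy_vectorizer = CountVectorizer()\n",
        "toy_bow = toy_vectorizer.fit_transform(toy_docs)\n",
        "\n",
        "# One row per document, one column per vocabulary entry\n",
        "print(toy_bow.shape)\n",
        "print(toy_vectorizer.get_feature_names_out())\n",
        "print(toy_bow.toarray())\n",
        "\n",
        "# The one-character token is missing from the vocabulary\n",
        "print(\"顔\" in toy_vectorizer.vocabulary_)\n",
        "```\n",
        "\n",
        "If one-character words should be kept, a custom `token_pattern` could be passed to `CountVectorizer`; that would change the vocabulary size printed above."
      ]
    },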
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Hv06M0vyYpnC"
      },
      "source": [
        "## LDAによるトピックモデル解析\n",
        "sklearnでは [LatentDirichletAllocation](https://scikit-learn.org/stable/modules/decomposition.html?highlight=lda#latent-dirichlet-allocation-lda) として用意されている。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "cQu3CNFLYpnC"
      },
      "outputs": [],
      "source": [
        "from sklearn.decomposition import LatentDirichletAllocation\n",
        "\n",
        "NUM_TOPICS = 5 #トピック数\n",
        "max_iter = 100  #LDAによる学習回数\n",
        "lda = LatentDirichletAllocation(n_components=NUM_TOPICS,\n",
        "                                max_iter=max_iter,\n",
        "                                learning_method='online',\n",
        "                                random_state=123) # シード値を指定すると結果を再現できる\n",
        "data_lda = lda.fit_transform(bow_tf_vector)"
      ]
    },
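    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because LDA treats each document as a mixture of topics, it can help to look at the document-topic matrix returned by `fit_transform` before turning to the words. The sketch below uses a made-up count matrix (`toy_counts` and `toy_lda` are assumptions for illustration, not the survey data); the same calls should apply to `data_lda` and `lda` above.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "from sklearn.decomposition import LatentDirichletAllocation\n",
        "\n",
        "# Hypothetical toy count matrix: 6 documents x 4 vocabulary entries\n",
        "toy_counts = np.array([\n",
        "    [3, 2, 0, 0],\n",
        "    [4, 1, 0, 0],\n",
        "    [2, 3, 0, 1],\n",
        "    [0, 0, 3, 2],\n",
        "    [0, 1, 2, 4],\n",
        "    [0, 0, 4, 3],\n",
        "])\n",
        "\n",
        "toy_lda = LatentDirichletAllocation(n_components=2, random_state=0)\n",
        "doc_topic = toy_lda.fit_transform(toy_counts)\n",
        "\n",
        "# Each row is a mixture over topics and sums to 1\n",
        "print(doc_topic.round(3))\n",
        "print(doc_topic.sum(axis=1))\n",
        "\n",
        "# Index (0-based) of the dominant topic of each document\n",
        "print(doc_topic.argmax(axis=1))\n",
        "\n",
        "# Normalizing components_ row-wise gives a word distribution per topic\n",
        "topic_word = toy_lda.components_ / toy_lda.components_.sum(axis=1, keepdims=True)\n",
        "print(topic_word.round(3))\n",
        "```\n",
        "\n",
        "For the notebook's own model, `data_lda.argmax(axis=1)` would give the dominant topic of each comment in the same way."
      ]
    },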
    {
      "cell_type": "markdown",
      "source": [
        "## トピックの観察\n"
      ],
      "metadata": {
        "id": "OIK13M90vjeq"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import plotly.graph_objects as go\n",
        "from plotly.subplots import make_subplots\n",
        "\n",
        "def plot_top_words(model, feature_names, n_top_words, title):\n",
        "    \"\"\"\n",
        "    LDA のトピックごと上位語を水平方向のバーで表示する Plotly 版\n",
        "\n",
        "    Parameters\n",
        "    ----------\n",
        "    model : sklearn.decomposition.LatentDirichletAllocation\n",
        "        すでに fit_transform 済みの LDA モデル\n",
        "    feature_names : array‑like, shape (n_features,)\n",
        "        model.get_feature_names_out() で得た語彙\n",
        "    n_top_words : int\n",
        "        各トピックで表示したい単語数\n",
        "    title : str\n",
        "        図全体のタイトル\n",
        "    \"\"\"\n",
        "    n_topics = model.components_.shape[0]\n",
        "    n_cols = 5                                # 列数は固定\n",
        "    n_rows = int(np.ceil(n_topics / n_cols))  # トピック数に応じて行数を決定\n",
        "\n",
        "    # サブプロット用の Figure を用意\n",
        "    fig = make_subplots(\n",
        "        rows=n_rows,\n",
        "        cols=n_cols,\n",
        "        shared_xaxes=False,\n",
        "        horizontal_spacing=0.08,\n",
        "        vertical_spacing=0.06,\n",
        "        subplot_titles=[f\"Topic {i + 1}\" for i in range(n_topics)],\n",
        "    )\n",
        "\n",
        "    for topic_idx, topic in enumerate(model.components_):\n",
        "        # 指定トピックの上位語と重み\n",
        "        top_idx = topic.argsort()[-n_top_words:]\n",
        "        top_features = [feature_names[i] for i in top_idx]\n",
        "        weights = topic[top_idx]\n",
        "\n",
        "        row = topic_idx // n_cols + 1\n",
        "        col = topic_idx % n_cols + 1\n",
        "\n",
        "        # 水平バーを追加\n",
        "        fig.add_trace(\n",
        "            go.Bar(\n",
        "                x=weights,\n",
        "                y=top_features,\n",
        "                orientation=\"h\",\n",
        "                marker=dict(line=dict(width=0)),  # 枠線を消してすっきり\n",
        "            ),\n",
        "            row=row,\n",
        "            col=col,\n",
        "        )\n",
        "\n",
        "        # y 軸を上から下に並べ替え（matplotlib の barh と同じ見た目）\n",
        "        fig.update_yaxes(autorange=\"reversed\", row=row, col=col)\n",
        "\n",
        "    # 図全体のレイアウト調整\n",
        "    fig.update_layout(\n",
        "        height=450 * n_rows,\n",
        "        width=1700,\n",
        "        title=dict(text=title, x=0.5, xanchor=\"center\", font=dict(size=40)),\n",
        "        showlegend=False,\n",
        "        margin=dict(t=120, l=20, r=20, b=20),\n",
        "    )\n",
        "\n",
        "    # サブプロットタイトル（各トピック）のフォントサイズを揃える\n",
        "    fig.update_annotations(font_size=22)\n",
        "\n",
        "    fig.show()\n",
        "    #file_title = title.replace(' ', '_')\n",
        "    #fig.write_image(f'{file_title}.png')"
      ],
      "metadata": {
        "id": "CK3DlukWxzI6"
      },
      "execution_count": 8,
      "outputs": []
    },
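    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a text-only complement to the bar chart, the top words can also be listed directly from a topic-word weight matrix such as `components_`. `top_words_per_topic` is a hypothetical helper written for this sketch, and the weight matrix and vocabulary below are made-up toy values.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def top_words_per_topic(components, feature_names, n_top_words):\n",
        "    \"\"\"Return the n_top_words highest-weight words of each topic, in descending order.\"\"\"\n",
        "    feature_names = np.asarray(feature_names)\n",
        "    result = []\n",
        "    for topic in components:\n",
        "        # argsort is ascending, so reverse it before taking the first n entries\n",
        "        top_idx = np.argsort(topic)[::-1][:n_top_words]\n",
        "        result.append(list(feature_names[top_idx]))\n",
        "    return result\n",
        "\n",
        "# Hypothetical weights: 2 topics x 4 words\n",
        "toy_components = np.array([[0.1, 5.0, 2.0, 0.3],\n",
        "                           [4.0, 0.2, 0.1, 3.0]])\n",
        "toy_vocab = [\"課題\", \"授業\", \"講義\", \"試験\"]\n",
        "\n",
        "for i, words in enumerate(top_words_per_topic(toy_components, toy_vocab, 2), start=1):\n",
        "    print(f\"Topic {i}: {' '.join(words)}\")\n",
        "```\n",
        "\n",
        "With the notebook's objects this could be called as `top_words_per_topic(lda.components_, vectorizer.get_feature_names_out(), 10)`."
      ]
    },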
    {
      "cell_type": "code",
      "source": [
        "n_top_words = 10\n",
        "plot_top_words(lda, vectorizer.get_feature_names_out(), n_top_words, \"Topics in LDA model (TF)\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 487
        },
        "id": "aeyIxVwVx4Ey",
        "outputId": "3de3cdde-0481-4711-a7af-3cc9ec6f39cf"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<html>\n",
              "<head><meta charset=\"utf-8\" /></head>\n",
              "<body>\n",
              "    <div>            <script src=\"https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_SVG\"></script><script type=\"text/javascript\">if (window.MathJax && window.MathJax.Hub && window.MathJax.Hub.Config) {window.MathJax.Hub.Config({SVG: {font: \"STIX-Web\"}});}</script>                <script type=\"text/javascript\">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>\n",
              "        <script charset=\"utf-8\" src=\"https://cdn.plot.ly/plotly-2.35.2.min.js\"></script>                <div id=\"60e9755d-9246-4598-a185-c69c7cc32838\" class=\"plotly-graph-div\" style=\"height:450px; width:1700px;\"></div>            <script type=\"text/javascript\">                                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById(\"60e9755d-9246-4598-a185-c69c7cc32838\")) {                    Plotly.newPlot(                        \"60e9755d-9246-4598-a185-c69c7cc32838\",                        [{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[10.061590520102405,10.753138411584064,10.930534547705236,10.936667187936939,11.017739861494029,11.579402367468628,12.453141315408999,12.48977021609219,17.658246820852675,17.773671811341288],\"y\":[\"いい\",\"わかる\",\"ござる\",\"よい\",\"機会\",\"とても\",\"講義\",\"グループワーク\",\"学ぶ\",\"できる\"],\"type\":\"bar\",\"xaxis\":\"x\",\"yaxis\":\"y\"},{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[2.249260888082437,2.6244428059290605,2.7268009370582638,2.905699333719603,2.9089384508251963,3.4762700467948218,3.476358251225579,3.550315880420751,4.053538449121594,4.133369057507055],\"y\":[\"しまう\",\"テスト\",\"新しい\",\"実際\",\"レポート\",\"履修\",\"計画\",\"大変\",\"企業\",\"インタビュー\"],\"type\":\"bar\",\"xaxis\":\"x2\",\"yaxis\":\"y2\"},{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[2.2519743723028163,2.252140588074692,2.252964234523194,2.255415635354486,2.2573848293379783,2.257514712340903,3.3167076121337473,3.5660534073211596,4.224282562156498,4.466060873831228],\"y\":[\"大変\",\"時間\",\"最後\",\"教える\",\"難しい\",\"非常\",\"内容\",\"教科書\",\"先生\",\"良い\"],\"type\":\"bar\",\"xaxis\":\"x3\",\"yaxis\":\"y3\"},{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[8.979839647137297,9.052643685500783,9.392801613642227,9.9715697326492,10.281474032271538,10.282711191006237,10.950191146184823,10.974901526269925,13.932993070701517,27.76
0859795344235],\"y\":[\"いう\",\"知識\",\"とても\",\"良い\",\"出す\",\"採点\",\"試験\",\"多い\",\"授業\",\"課題\"],\"type\":\"bar\",\"xaxis\":\"x4\",\"yaxis\":\"y4\"},{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[11.93497837620291,12.009607746443486,12.406353927195656,12.560632099046934,12.63886101924096,12.983616716417684,14.633999927314596,18.61785741197884,20.967585378790684,31.233157997221998],\"y\":[\"資料\",\"なし\",\"内容\",\"いう\",\"ない\",\"感ずる\",\"特に\",\"試験\",\"講義\",\"授業\"],\"type\":\"bar\",\"xaxis\":\"x5\",\"yaxis\":\"y5\"}],                        {\"template\":{\"data\":{\"histogram2dcontour\":[{\"type\":\"histogram2dcontour\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"choropleth\":[{\"type\":\"choropleth\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}],\"histogram2d\":[{\"type\":\"histogram2d\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"heatmap\":[{\"type\":\"heatmap\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"heatmapgl\":[{\"type\":\"heatmapgl\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},
\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"contourcarpet\":[{\"type\":\"contourcarpet\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}],\"contour\":[{\"type\":\"contour\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"surface\":[{\"type\":\"surface\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"mesh3d\":[{\"type\":\"mesh3d\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}],\"scatter\":[{\"fillpattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2},\"type\":\"scatter\"}],\"parcoords\":[{\"type\":\"parcoords\",\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scatterpolargl\":[{\"type\":\"scatterpolargl\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"bar\":[{\"error_x\":{\"color\":\"#2a3f5f\"},\"error_y\":{\"color\":\"#2a3f5f\"},\"marker\":{\"line\":{\"color\":\"#E5ECF6\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"bar\"}],\"scattergeo\":[{\"type\":\"scattergeo\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scatterpolar\":[{\"type\":\"scatte
rpolar\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"histogram\":[{\"marker\":{\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"histogram\"}],\"scattergl\":[{\"type\":\"scattergl\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scatter3d\":[{\"type\":\"scatter3d\",\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scattermapbox\":[{\"type\":\"scattermapbox\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scatterternary\":[{\"type\":\"scatterternary\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scattercarpet\":[{\"type\":\"scattercarpet\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"carpet\":[{\"aaxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"minorgridcolor\":\"white\",\"startlinecolor\":\"#2a3f5f\"},\"baxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"minorgridcolor\":\"white\",\"startlinecolor\":\"#2a3f5f\"},\"type\":\"carpet\"}],\"table\":[{\"cells\":{\"fill\":{\"color\":\"#EBF0F8\"},\"line\":{\"color\":\"white\"}},\"header\":{\"fill\":{\"color\":\"#C8D4E3\"},\"line\":{\"color\":\"white\"}},\"type\":\"table\"}],\"barpolar\":[{\"marker\":{\"line\":{\"color\":\"#E5ECF6\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"barpolar\"}],\"pie\":[{\"automargin\":true,\"type\":\"pie\"}]},\"layout\":{\"autotypenumbers\":\"strict\",\"colorway\":[\"#636efa\",\"#EF553B\",\"#00cc96\",\"#ab63fa\",\"#FFA15A\",\"#19d3f3\",\"#FF6692\",\"#B6E880\",\"#FF97FF\",\"#FECB52\"],\"font\":{\"color\":\"#2a3f5f\"},\"hovermode\":\"closest\",\"hoverlabel\":{\"align\":\"left\"},\"paper_bgcolor\":\"white\",\"plot_bgcolor\":\"#E5ECF6\",\"polar\":{\"bgcolor\":\"#E5ECF6\",\"angularaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"radialaxis\":{\"grid
color\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"}},\"ternary\":{\"bgcolor\":\"#E5ECF6\",\"aaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"baxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"caxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"}},\"coloraxis\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"colorscale\":{\"sequential\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"sequentialminus\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"diverging\":[[0,\"#8e0152\"],[0.1,\"#c51b7d\"],[0.2,\"#de77ae\"],[0.3,\"#f1b6da\"],[0.4,\"#fde0ef\"],[0.5,\"#f7f7f7\"],[0.6,\"#e6f5d0\"],[0.7,\"#b8e186\"],[0.8,\"#7fbc41\"],[0.9,\"#4d9221\"],[1,\"#276419\"]]},\"xaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"white\",\"automargin\":true,\"zerolinewidth\":2},\"yaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"white\",\"automargin\":true,\"zerolinewidth\":2},\"scene\":{\"xaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\",\"gridwidth\":2},\"yaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\",\"gridwidth\":2},\"zaxis\":{\"backgroundcolor\":\"#E5ECF6\",\
"gridcolor\":\"white\",\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\",\"gridwidth\":2}},\"shapedefaults\":{\"line\":{\"color\":\"#2a3f5f\"}},\"annotationdefaults\":{\"arrowcolor\":\"#2a3f5f\",\"arrowhead\":0,\"arrowwidth\":1},\"geo\":{\"bgcolor\":\"white\",\"landcolor\":\"#E5ECF6\",\"subunitcolor\":\"white\",\"showland\":true,\"showlakes\":true,\"lakecolor\":\"white\"},\"title\":{\"x\":0.05},\"mapbox\":{\"style\":\"light\"}}},\"xaxis\":{\"anchor\":\"y\",\"domain\":[0.0,0.13599999999999998]},\"yaxis\":{\"anchor\":\"x\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"xaxis2\":{\"anchor\":\"y2\",\"domain\":[0.21599999999999997,0.352]},\"yaxis2\":{\"anchor\":\"x2\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"xaxis3\":{\"anchor\":\"y3\",\"domain\":[0.43199999999999994,0.568]},\"yaxis3\":{\"anchor\":\"x3\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"xaxis4\":{\"anchor\":\"y4\",\"domain\":[0.6479999999999999,0.7839999999999999]},\"yaxis4\":{\"anchor\":\"x4\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"xaxis5\":{\"anchor\":\"y5\",\"domain\":[0.8639999999999999,0.9999999999999999]},\"yaxis5\":{\"anchor\":\"x5\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"annotations\":[{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 1\",\"x\":0.06799999999999999,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"},{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 2\",\"x\":0.284,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"},{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 3\",\"x\":0.49999999999999994,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"},{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 
4\",\"x\":0.716,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"},{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 5\",\"x\":0.9319999999999999,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"}],\"title\":{\"font\":{\"size\":40},\"text\":\"Topics in LDA model (TF)\",\"x\":0.5,\"xanchor\":\"center\"},\"margin\":{\"t\":120,\"l\":20,\"r\":20,\"b\":20},\"height\":450,\"width\":1700,\"showlegend\":false},                        {\"responsive\": true}                    ).then(function(){\n",
              "                            \n",
              "var gd = document.getElementById('60e9755d-9246-4598-a185-c69c7cc32838');\n",
              "var x = new MutationObserver(function (mutations, observer) {{\n",
              "        var display = window.getComputedStyle(gd).display;\n",
              "        if (!display || display === 'none') {{\n",
              "            console.log([gd, 'removed!']);\n",
              "            Plotly.purge(gd);\n",
              "            observer.disconnect();\n",
              "        }}\n",
              "}});\n",
              "\n",
              "// Listen for the removal of the full notebook cells\n",
              "var notebookContainer = gd.closest('#notebook-container');\n",
              "if (notebookContainer) {{\n",
              "    x.observe(notebookContainer, {childList: true});\n",
              "}}\n",
              "\n",
              "// Listen for the clearing of the current output cell\n",
              "var outputEl = gd.closest('.output');\n",
              "if (outputEl) {{\n",
              "    x.observe(outputEl, {childList: true});\n",
              "}}\n",
              "\n",
              "                        })                };                            </script>        </div>\n",
              "</body>\n",
              "</html>"
            ]
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Document vectors 2 (TF-IDF)"
      ],
      "metadata": {
        "id": "e9CXnek_zCcR"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n",
        "\n",
        "# Frequent but uninformative tokens to exclude from the vocabulary\n",
        "stop_words = ['こと', '\\r\\n', 'ため', '思う', 'いる', 'ある', 'する', 'なる']\n",
        "# Build the TF-IDF document-term matrix from the space-separated tokens\n",
        "vectorizer2 = TfidfVectorizer(stop_words=stop_words)\n",
        "tfidf_vector = vectorizer2.fit_transform(assesment_df['wakati'])\n",
        "print('tfidf_vector.shape = ', tfidf_vector.shape)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "45CIeMgWvn95",
        "outputId": "c4e1f6a6-9e05-4e74-f31a-f116f07e12ca"
      },
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "tfidf_vector.shape =  (170, 741)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
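        "lda2 = LatentDirichletAllocation(n_components=NUM_TOPICS,\n",
        "                                max_iter=max_iter,\n",
        "                                learning_method='online',\n",
        "                                random_state=123) # fixing the seed makes the result reproducible\n",
        "\n",
        "# Fit on the TF-IDF matrix so the topics line up with vectorizer2's vocabulary\n",
        "data_lda2 = lda2.fit_transform(tfidf_vector)"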
      ],
      "metadata": {
        "id": "YWvQ3Ie8vn0O"
      },
      "execution_count": 11,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "plot_top_words(lda2, vectorizer2.get_feature_names_out(), n_top_words, \"Topics in LDA model (TF-IDF)\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 487
        },
        "id": "liJQunL0vnhR",
        "outputId": "7eb6da6d-a66b-4ca3-eb8c-3c759b83b9ae"
      },
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<html>\n",
              "<head><meta charset=\"utf-8\" /></head>\n",
              "<body>\n",
              "    <div>            <script src=\"https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_SVG\"></script><script type=\"text/javascript\">if (window.MathJax && window.MathJax.Hub && window.MathJax.Hub.Config) {window.MathJax.Hub.Config({SVG: {font: \"STIX-Web\"}});}</script>                <script type=\"text/javascript\">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>\n",
              "        <script charset=\"utf-8\" src=\"https://cdn.plot.ly/plotly-2.35.2.min.js\"></script>                <div id=\"da9e021a-9f93-418b-922a-c662f64ec72f\" class=\"plotly-graph-div\" style=\"height:450px; width:1700px;\"></div>            <script type=\"text/javascript\">                                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById(\"da9e021a-9f93-418b-922a-c662f64ec72f\")) {                    Plotly.newPlot(                        \"da9e021a-9f93-418b-922a-c662f64ec72f\",                        [{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[10.061590520102405,10.753138411584064,10.930534547705236,10.936667187936939,11.017739861494029,11.579402367468628,12.453141315408999,12.48977021609219,17.658246820852675,17.773671811341288],\"y\":[\"いい\",\"わかる\",\"ござる\",\"よい\",\"機会\",\"とても\",\"講義\",\"グループワーク\",\"学ぶ\",\"できる\"],\"type\":\"bar\",\"xaxis\":\"x\",\"yaxis\":\"y\"},{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[2.249260888082437,2.6244428059290605,2.7268009370582638,2.905699333719603,2.9089384508251963,3.4762700467948218,3.476358251225579,3.550315880420751,4.053538449121594,4.133369057507055],\"y\":[\"しまう\",\"テスト\",\"新しい\",\"実際\",\"レポート\",\"履修\",\"計画\",\"大変\",\"企業\",\"インタビュー\"],\"type\":\"bar\",\"xaxis\":\"x2\",\"yaxis\":\"y2\"},{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[2.2519743723028163,2.252140588074692,2.252964234523194,2.255415635354486,2.2573848293379783,2.257514712340903,3.3167076121337473,3.5660534073211596,4.224282562156498,4.466060873831228],\"y\":[\"大変\",\"時間\",\"最後\",\"教える\",\"難しい\",\"非常\",\"内容\",\"教科書\",\"先生\",\"良い\"],\"type\":\"bar\",\"xaxis\":\"x3\",\"yaxis\":\"y3\"},{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[8.979839647137297,9.052643685500783,9.392801613642227,9.9715697326492,10.281474032271538,10.282711191006237,10.950191146184823,10.974901526269925,13.932993070701517,27.76
0859795344235],\"y\":[\"いう\",\"知識\",\"とても\",\"良い\",\"出す\",\"採点\",\"試験\",\"多い\",\"授業\",\"課題\"],\"type\":\"bar\",\"xaxis\":\"x4\",\"yaxis\":\"y4\"},{\"marker\":{\"line\":{\"width\":0}},\"orientation\":\"h\",\"x\":[11.93497837620291,12.009607746443486,12.406353927195656,12.560632099046934,12.63886101924096,12.983616716417684,14.633999927314596,18.61785741197884,20.967585378790684,31.233157997221998],\"y\":[\"資料\",\"なし\",\"内容\",\"いう\",\"ない\",\"感ずる\",\"特に\",\"試験\",\"講義\",\"授業\"],\"type\":\"bar\",\"xaxis\":\"x5\",\"yaxis\":\"y5\"}],                        {\"template\":{\"data\":{\"histogram2dcontour\":[{\"type\":\"histogram2dcontour\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"choropleth\":[{\"type\":\"choropleth\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}],\"histogram2d\":[{\"type\":\"histogram2d\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"heatmap\":[{\"type\":\"heatmap\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"heatmapgl\":[{\"type\":\"heatmapgl\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},
\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"contourcarpet\":[{\"type\":\"contourcarpet\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}],\"contour\":[{\"type\":\"contour\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"surface\":[{\"type\":\"surface\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]}],\"mesh3d\":[{\"type\":\"mesh3d\",\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}],\"scatter\":[{\"fillpattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2},\"type\":\"scatter\"}],\"parcoords\":[{\"type\":\"parcoords\",\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scatterpolargl\":[{\"type\":\"scatterpolargl\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"bar\":[{\"error_x\":{\"color\":\"#2a3f5f\"},\"error_y\":{\"color\":\"#2a3f5f\"},\"marker\":{\"line\":{\"color\":\"#E5ECF6\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"bar\"}],\"scattergeo\":[{\"type\":\"scattergeo\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scatterpolar\":[{\"type\":\"scatte
rpolar\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"histogram\":[{\"marker\":{\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"histogram\"}],\"scattergl\":[{\"type\":\"scattergl\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scatter3d\":[{\"type\":\"scatter3d\",\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scattermapbox\":[{\"type\":\"scattermapbox\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scatterternary\":[{\"type\":\"scatterternary\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"scattercarpet\":[{\"type\":\"scattercarpet\",\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}}}],\"carpet\":[{\"aaxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"minorgridcolor\":\"white\",\"startlinecolor\":\"#2a3f5f\"},\"baxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"minorgridcolor\":\"white\",\"startlinecolor\":\"#2a3f5f\"},\"type\":\"carpet\"}],\"table\":[{\"cells\":{\"fill\":{\"color\":\"#EBF0F8\"},\"line\":{\"color\":\"white\"}},\"header\":{\"fill\":{\"color\":\"#C8D4E3\"},\"line\":{\"color\":\"white\"}},\"type\":\"table\"}],\"barpolar\":[{\"marker\":{\"line\":{\"color\":\"#E5ECF6\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"barpolar\"}],\"pie\":[{\"automargin\":true,\"type\":\"pie\"}]},\"layout\":{\"autotypenumbers\":\"strict\",\"colorway\":[\"#636efa\",\"#EF553B\",\"#00cc96\",\"#ab63fa\",\"#FFA15A\",\"#19d3f3\",\"#FF6692\",\"#B6E880\",\"#FF97FF\",\"#FECB52\"],\"font\":{\"color\":\"#2a3f5f\"},\"hovermode\":\"closest\",\"hoverlabel\":{\"align\":\"left\"},\"paper_bgcolor\":\"white\",\"plot_bgcolor\":\"#E5ECF6\",\"polar\":{\"bgcolor\":\"#E5ECF6\",\"angularaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"radialaxis\":{\"grid
color\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"}},\"ternary\":{\"bgcolor\":\"#E5ECF6\",\"aaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"baxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"caxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"}},\"coloraxis\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"colorscale\":{\"sequential\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"sequentialminus\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"diverging\":[[0,\"#8e0152\"],[0.1,\"#c51b7d\"],[0.2,\"#de77ae\"],[0.3,\"#f1b6da\"],[0.4,\"#fde0ef\"],[0.5,\"#f7f7f7\"],[0.6,\"#e6f5d0\"],[0.7,\"#b8e186\"],[0.8,\"#7fbc41\"],[0.9,\"#4d9221\"],[1,\"#276419\"]]},\"xaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"white\",\"automargin\":true,\"zerolinewidth\":2},\"yaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"white\",\"automargin\":true,\"zerolinewidth\":2},\"scene\":{\"xaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\",\"gridwidth\":2},\"yaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\",\"gridwidth\":2},\"zaxis\":{\"backgroundcolor\":\"#E5ECF6\",\
"gridcolor\":\"white\",\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\",\"gridwidth\":2}},\"shapedefaults\":{\"line\":{\"color\":\"#2a3f5f\"}},\"annotationdefaults\":{\"arrowcolor\":\"#2a3f5f\",\"arrowhead\":0,\"arrowwidth\":1},\"geo\":{\"bgcolor\":\"white\",\"landcolor\":\"#E5ECF6\",\"subunitcolor\":\"white\",\"showland\":true,\"showlakes\":true,\"lakecolor\":\"white\"},\"title\":{\"x\":0.05},\"mapbox\":{\"style\":\"light\"}}},\"xaxis\":{\"anchor\":\"y\",\"domain\":[0.0,0.13599999999999998]},\"yaxis\":{\"anchor\":\"x\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"xaxis2\":{\"anchor\":\"y2\",\"domain\":[0.21599999999999997,0.352]},\"yaxis2\":{\"anchor\":\"x2\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"xaxis3\":{\"anchor\":\"y3\",\"domain\":[0.43199999999999994,0.568]},\"yaxis3\":{\"anchor\":\"x3\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"xaxis4\":{\"anchor\":\"y4\",\"domain\":[0.6479999999999999,0.7839999999999999]},\"yaxis4\":{\"anchor\":\"x4\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"xaxis5\":{\"anchor\":\"y5\",\"domain\":[0.8639999999999999,0.9999999999999999]},\"yaxis5\":{\"anchor\":\"x5\",\"domain\":[0.0,1.0],\"autorange\":\"reversed\"},\"annotations\":[{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 1\",\"x\":0.06799999999999999,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"},{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 2\",\"x\":0.284,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"},{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 3\",\"x\":0.49999999999999994,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"},{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 
4\",\"x\":0.716,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"},{\"font\":{\"size\":22},\"showarrow\":false,\"text\":\"Topic 5\",\"x\":0.9319999999999999,\"xanchor\":\"center\",\"xref\":\"paper\",\"y\":1.0,\"yanchor\":\"bottom\",\"yref\":\"paper\"}],\"title\":{\"font\":{\"size\":40},\"text\":\"Topics in LDA model (TF-IDF)\",\"x\":0.5,\"xanchor\":\"center\"},\"margin\":{\"t\":120,\"l\":20,\"r\":20,\"b\":20},\"height\":450,\"width\":1700,\"showlegend\":false},                        {\"responsive\": true}                    ).then(function(){\n",
              "                            \n",
              "var gd = document.getElementById('da9e021a-9f93-418b-922a-c662f64ec72f');\n",
              "var x = new MutationObserver(function (mutations, observer) {{\n",
              "        var display = window.getComputedStyle(gd).display;\n",
              "        if (!display || display === 'none') {{\n",
              "            console.log([gd, 'removed!']);\n",
              "            Plotly.purge(gd);\n",
              "            observer.disconnect();\n",
              "        }}\n",
              "}});\n",
              "\n",
              "// Listen for the removal of the full notebook cells\n",
              "var notebookContainer = gd.closest('#notebook-container');\n",
              "if (notebookContainer) {{\n",
              "    x.observe(notebookContainer, {childList: true});\n",
              "}}\n",
              "\n",
              "// Listen for the clearing of the current output cell\n",
              "var outputEl = gd.closest('.output');\n",
              "if (outputEl) {{\n",
              "    x.observe(outputEl, {childList: true});\n",
              "}}\n",
              "\n",
              "                        })                };                            </script>        </div>\n",
              "</body>\n",
              "</html>"
            ]
          },
          "metadata": {}
        }
      ]
    }
  ],
  "metadata": {
    "interpreter": {
      "hash": "880b2a8c90f9e6beae80b56829e3f671fedd58b6d14887184ddce26124cedfbd"
    },
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.9"
    },
    "colab": {
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}