{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.6"
    },
    "colab": {
      "provenance": [],
      "toc_visible": true
    }
  },
  "cells": [
    {
      "cell_type": "code",
      "source": [
        "!date\n",
        "!python --version"
      ],
      "metadata": {
        "id": "LCat4RVnJmyt",
        "outputId": "dec95791-d256-44f4-ed43-735425a624c9",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Fri Apr 18 04:51:41 AM UTC 2025\n",
            "Python 3.11.12\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Update notes\n",
        "- May 2024: Revised so that installing nltk is no longer necessary.\n",
        "- April 18, 2025: Added the 'punkt_tab' download. Added explanatory text."
      ],
      "metadata": {
        "id": "vam4CSwxD-GL"
      }
    },
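    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mf7eFNWYRoYA"
      },
      "source": [
        "# Code examples: thesaurus, count-based, and inference-based designs (from-scratch implementation, NLTK edition)\n",
        "- Note\n",
        "  - In natural language processing, the workflow differs considerably depending on the tool you use. This notebook prioritizes making typical preprocessing steps (sentence splitting, tokenization, lemmatization, and so on) easy to observe. Easier-to-use tools will be introduced at a later date.\n",
        "- Overall flow\n",
        "    - Preparation\n",
        "    - Thesaurus example\n",
        "    - Bag-of-Words\n",
        "    - Example using sklearn's BoW and TF-IDF\n",
        "    - Word vectorization based on a co-occurrence matrix\n",
        "    - Improving the distributed representation with mutual information\n",
        "    - Dimensionality reduction with SVD"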
      ]
    },
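    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Dqa9QDJ7RoYG"
      },
      "source": [
        "## Preparation\n",
        "- Notes on running this notebook\n",
        "    - On Google Colab, the [Natural Language Toolkit; NLTK](https://www.nltk.org) is installed by default. However, additional data such as corpora must be downloaded separately. (Everything can be downloaded at once, but that takes a fair amount of disk space, so only a minimal set is installed by default.)\n",
        "    - This notebook targets English documents. Resources must be downloaded separately for each target language. To browse the list of available packages, run `nltk.download()`."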
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MWObKXywR_i5"
      },
      "source": [
        "# As of May 2024, no installation is needed.\n",
        "#!pip install nltk"
      ],
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "huRK9h9VSPil",
        "outputId": "99045ded-7e62-4b56-930e-7025173fcae9"
      },
      "source": [
        "import nltk\n",
        "nltk.download(['wordnet', 'stopwords', 'punkt', 'punkt_tab'])"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
            "[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
            "[nltk_data]   Unzipping corpora/stopwords.zip.\n",
            "[nltk_data] Downloading package punkt to /root/nltk_data...\n",
            "[nltk_data]   Unzipping tokenizers/punkt.zip.\n",
            "[nltk_data] Downloading package punkt_tab to /root/nltk_data...\n",
            "[nltk_data]   Unzipping tokenizers/punkt_tab.zip.\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "True"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
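    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e4EZ_PuwRoYH"
      },
      "source": [
        "## Thesaurus example\n",
        "Here we observe how a thesaurus can be used, through a search task under the following scenario.\n",
        "- There are three documents written about the iPhone.\n",
        "- Task: for the query \"how much iphone\", find the most appropriate document.\n",
        "\n",
        "To solve this task, we (1) preprocess each document and (2) search the documents in two different ways.\n",
        "\n",
        "The preprocessing in (1) means, for example, converting everything to lowercase so that \"iPhone\" and \"iphone\" are treated as the same word. An example implementation is given as preprocess_docs().\n",
        "\n",
        "For the document search in (2), we consider two approaches: simple matching and matching with a thesaurus.\n",
        "- Simple matching, simple_matching(), uses the number of word matches against the user query as the score.\n",
        "- Thesaurus-based matching, relation_matching(), first scores by word matching, then adds points using the thesaurus, and takes the sum as the final score.\n"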
      ]
    },
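    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The thesaurus-based scoring described above can be sketched independently of NLTK. This is a minimal illustration, not the implementation used in this notebook: the `THESAURUS` dictionary, the helper names `expand_query()` and `score()`, and the tokenized sample documents are all hypothetical.\n",
        "\n",
        "```python\n",
        "# Hypothetical mini-thesaurus; the relation name and its terms are illustrative only.\n",
        "THESAURUS = {'buy': ['buy', '$', 'price', 'how much']}\n",
        "\n",
        "def expand_query(query, thesaurus):\n",
        "    '''Return the query words plus all terms of every relation the query triggers.'''\n",
        "    text = query.lower()\n",
        "    terms = set(text.split())\n",
        "    for related in thesaurus.values():\n",
        "        # A relation is triggered when any of its entries appears in the query text.\n",
        "        if any(entry in text for entry in related):\n",
        "            terms.update(related)\n",
        "    return terms\n",
        "\n",
        "def score(tokens, terms):\n",
        "    '''Count how many tokens of one document belong to the expanded term set.'''\n",
        "    return sum(1 for token in tokens if token in terms)\n",
        "\n",
        "# Hypothetical, already tokenized documents.\n",
        "docs_tokens = [['get', 'price', 'trade'], ['iphone', 'camera'], ['$', '399', 'trade']]\n",
        "terms = expand_query('how much iphone', THESAURUS)\n",
        "scores = [score(tokens, terms) for tokens in docs_tokens]\n",
        "print(scores)\n",
        "```\n",
        "\n",
        "Expanding the query into a set means each related term is counted once per occurrence in a document, regardless of how many query words trigger the same relation."
      ]
    },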
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "rrF7i9ojRoYH",
        "outputId": "27a52a0b-9688-4f10-ca3d-e995c71dc835"
      },
      "source": [
        "# Preprocessing\n",
        "from nltk.tokenize import wordpunct_tokenize, sent_tokenize\n",
        "# [NLTK components used here]\n",
        "# nltk.corpus.stopwords: a blacklist of words that are unsuitable as features characterizing a text, commonly called stop words.\n",
        "# nltk.sent_tokenize: splits a document (doc) into sentences.\n",
        "# nltk.wordpunct_tokenize: splits a sentence into words and punctuation, commonly called tokenization.\n",
        "# nltk.stem.wordnet.WordNetLemmatizer.lemmatize: converts a word to (something close to) its base form. This is lemmatization, a dictionary-based relative of stemming.\n",
        "\n",
        "import numpy as np\n",
        "\n",
        "# Example documents (three documents)\n",
        "docs = []\n",
        "docs.append(\"You can get dis-counted price with trade-in.\")\n",
        "docs.append(\"iPhone 11 shoots beautifully sharp 4K video at 60 fps across all its cameras.\")\n",
        "docs.append(\"From $16.62/mo. or $399 with trade-in.\")\n",
        "\n",
        "def preprocess_docs(docs):\n",
        "    '''Preprocess a collection of English documents and return them as a list of token lists.\n",
        "\n",
        "    :param docs(list): A list of documents, one string per document.\n",
        "    :return (list): The result of sentence splitting, tokenization, lemmatization, and stop-word removal.\n",
        "    '''\n",
        "    stopwords = nltk.corpus.stopwords.words('english')\n",
        "    stopwords.append('.')  # Add the period.\n",
        "    stopwords.append(',')  # Add the comma.\n",
        "    stopwords.append('')  # Add the empty string.\n",
        "\n",
        "    result = []\n",
        "    wnl = nltk.stem.wordnet.WordNetLemmatizer()\n",
        "    for doc in docs:\n",
        "        temp = []\n",
        "        for sent in sent_tokenize(doc):\n",
        "            for word in wordpunct_tokenize(sent):\n",
        "                this_word = wnl.lemmatize(word.lower())\n",
        "                if this_word not in stopwords:\n",
        "                    temp.append(this_word)\n",
        "        result.append(temp)\n",
        "    return result\n",
        "\n",
        "docs2 = preprocess_docs(docs)\n",
        "for index in range(len(docs2)):\n",
        "    print('before: ', docs[index])\n",
        "    print('after: ', docs2[index])\n",
        "    print('----')\n"
      ],
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "before:  You can get dis-counted price with trade-in.\n",
            "after:  ['get', 'dis', '-', 'counted', 'price', 'trade', '-']\n",
            "----\n",
            "before:  iPhone 11 shoots beautifully sharp 4K video at 60 fps across all its cameras.\n",
            "after:  ['iphone', '11', 'shoot', 'beautifully', 'sharp', '4k', 'video', '60', 'fps', 'across', 'camera']\n",
            "----\n",
            "before:  From $16.62/mo. or $399 with trade-in.\n",
            "after:  ['$', '16', '62', '/', 'mo', '$', '399', 'trade', '-']\n",
            "----\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T7zIO15ERoYI"
      },
      "source": [
        "### simple_matching()\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yH2IszZMRoYI",
        "outputId": "b41c907c-a7d4-4e3a-ce74-e635f1c98d41"
      },
      "source": [
        "# simple matching\n",
        "def simple_matching(query, docs):\n",
        "    '''Compute scores as the number of simple word matches.\n",
        "\n",
        "    :param query(str): Query (search request).\n",
        "    :param docs(list): Preprocessed documents, one list of tokens per document (e.g. the output of preprocess_docs()).\n",
        "    :return (list): Score for each document.\n",
        "    '''\n",
        "    query = query.split(\" \")\n",
        "    result = []\n",
        "    for doc in docs:\n",
        "        score = 0\n",
        "        for word in doc:\n",
        "            for key in query:\n",
        "                if key == word:\n",
        "                    score += 1\n",
        "        result.append(score)\n",
        "    return result\n",
        "\n",
        "user_query = \"how much iphone\"\n",
        "scores = simple_matching(user_query, docs2)\n",
        "print('simple_matching scores = ', scores)\n"
      ],
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "simple_matching scores =  [0, 1, 0]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LT5p-w38RoYJ"
      },
      "source": [
        "### relation_matching()\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "dsWTpzGrRoYJ",
        "outputId": "1b9ba52a-e6f3-4fe7-bab8-e4799874e171"
      },
      "source": [
        "# relation matching\n",
        "related_words = {}\n",
        "related_words['buy'] = ['buy', '$', 'price', 'how much', 'trade-in']\n",
        "related_words['UX'] = ['UX', 'stylish', 'seamless']\n",
        "\n",
        "def relation_matching(query, docs, related_words):\n",
        "    '''Compute scores by adding points for matches against prepared related terms.\n",
        "\n",
        "    :param query(str): Query (search request).\n",
        "    :param docs(list): Preprocessed documents, one list of tokens per document.\n",
        "    :param related_words(dict): Maps a relation name to a list of related terms.\n",
        "    :return (list): Score for each document.\n",
        "    '''\n",
        "    scores = simple_matching(query, docs)\n",
        "\n",
        "    query = query.split(\" \")\n",
        "    for q in query:\n",
        "        for relation in related_words:\n",
        "            # Substring test: 'how' and 'much' each match the entry 'how much', so the 'buy' relation is added once per word.\n",
        "            matches = [q in word for word in related_words[relation]]\n",
        "            if True in matches:\n",
        "                new_query = ' '.join(related_words[relation])\n",
        "                temp_scores = simple_matching(new_query, docs)\n",
        "                print('# q = {}, relation = {} => temp_scores = {}'.format(q, relation, temp_scores))\n",
        "                scores = list(np.array(scores) + np.array(temp_scores))\n",
        "    scores = list(scores)\n",
        "    return scores\n",
        "\n",
        "scores2 = relation_matching(user_query, docs2, related_words)\n",
        "print('simple_matching scores = ', scores)\n",
        "print('relation_matching scores = ', scores2)\n"
      ],
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "# q = how, relation = buy => temp_scores = [1, 0, 2]\n",
            "# q = much, relation = buy => temp_scores = [1, 0, 2]\n",
            "simple_matching scores =  [0, 1, 0]\n",
            "relation_matching scores =  [np.int64(2), np.int64(1), np.int64(4)]\n"
          ]
        }
      ]
    },
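    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Uhc5-DUQRoYK"
      },
      "source": [
        "## Bag-of-Words (BoW)\n",
        "To vectorize text with BoW, we build a vocabulary and count how many times each vocabulary item occurs (or encode in binary whether it occurs at all).\n",
        "\n",
        "Below, collect_words_eng() first builds the codebook (vocabulary), and make_vectors_eng() then creates the document vectors. (The two steps are separated in this implementation, but in practice doing both at once is often more efficient.)\n",
        "\n",
        "After creating the feature vectors, we check the distances and similarities between documents using the Euclidean distance (euclidean_distance()), the cosine distance (cosine_distance()), and the cosine similarity (cosine_similarity()).\n"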
      ]
    },
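    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a compact illustration of the counting idea described above, the following sketch builds BoW count vectors with `collections.Counter` and compares two of them by cosine similarity. It assumes the documents are already tokenized; the helper names `bow_vectors()` and `cosine_similarity_pair()` and the sample token lists are hypothetical and not part of this notebook's implementation.\n",
        "\n",
        "```python\n",
        "import math\n",
        "from collections import Counter\n",
        "\n",
        "def bow_vectors(token_lists):\n",
        "    '''Return (vocabulary, count matrix) for already tokenized documents.'''\n",
        "    vocab = sorted({token for tokens in token_lists for token in tokens})\n",
        "    counts = [Counter(tokens) for tokens in token_lists]\n",
        "    # One row per document, one column per vocabulary word.\n",
        "    matrix = [[count[word] for word in vocab] for count in counts]\n",
        "    return vocab, matrix\n",
        "\n",
        "def cosine_similarity_pair(a, b):\n",
        "    '''Cosine similarity of two non-zero vectors.'''\n",
        "    dot = sum(x * y for x, y in zip(a, b))\n",
        "    norm_a = math.sqrt(sum(x * x for x in a))\n",
        "    norm_b = math.sqrt(sum(y * y for y in b))\n",
        "    return dot / (norm_a * norm_b)\n",
        "\n",
        "# Hypothetical token lists for three short documents.\n",
        "vocab, matrix = bow_vectors([['test'], ['test'], ['many', 'many', 'test']])\n",
        "similarity = cosine_similarity_pair(matrix[0], matrix[2])\n",
        "print(vocab)\n",
        "print(matrix)\n",
        "print(similarity)\n",
        "```\n",
        "\n",
        "Sorting the vocabulary fixes the column order, so vectors built from different documents are directly comparable."
      ]
    },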
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Gbedc9x4RoYK",
        "outputId": "696d9ff6-c84a-4412-82a7-76aac6e492ae"
      },
      "source": [
        "import scipy.spatial.distance as distance\n",
        "\n",
        "# BoW\n",
        "# Example documents (three documents)\n",
        "docs3 = []\n",
        "docs3.append(\"This is test.\")\n",
        "docs3.append(\"That is test too.\")\n",
        "docs3.append(\"There are so many many tests.\")\n",
        "\n",
        "\n",
        "# Build the set of term features (codebook) from the document collection\n",
        "def collect_words_eng(docs):\n",
        "    '''Build a word codebook from a collection of English documents.\n",
        "    For simplicity, the processing steps are hard-coded.\n",
        "    It might be more convenient to make them configurable as needed.\n",
        "\n",
        "    :param docs(list): A list of documents, one string per document.\n",
        "    :return (list): Unique words after sentence splitting, tokenization, lemmatization, and stop-word removal.\n",
        "    '''\n",
        "    codebook = []\n",
        "    stopwords = nltk.corpus.stopwords.words('english')\n",
        "    stopwords.append('.')   # Add the period.\n",
        "    stopwords.append(',')   # Add the comma.\n",
        "    stopwords.append('')    # Add the empty string.\n",
        "    wnl = nltk.stem.wordnet.WordNetLemmatizer()\n",
        "    for doc in docs:\n",
        "        for sent in sent_tokenize(doc):\n",
        "            for word in wordpunct_tokenize(sent):\n",
        "                this_word = wnl.lemmatize(word.lower())\n",
        "                if this_word not in codebook and this_word not in stopwords:\n",
        "                    codebook.append(this_word)\n",
        "    return codebook\n",
        "\n",
        "codebook = collect_words_eng(docs3)\n",
        "print('codebook = ',codebook)\n"
      ],
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "codebook =  ['test', 'many']\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OFSt8UnORoYK",
        "outputId": "e05d04c2-f7b3-4294-9bc8-505fe04f787f"
      },
      "source": [
        "# Build document vectors whose features are the codebook entries (direct vector generation)\n",
        "def make_vectors_eng(docs, codebook):\n",
        "    '''Build document vectors whose features are the codebook entries (direct vector generation).\n",
        "\n",
        "    :param docs(list): A list of documents, one string per document.\n",
        "    :param codebook(list): List of unique words.\n",
        "    :return (list): Vectors whose features are the occurrence counts of the codebook words.\n",
        "    '''\n",
        "    vectors = []\n",
        "    wnl = nltk.stem.wordnet.WordNetLemmatizer()\n",
        "    for doc in docs:\n",
        "        this_vector = []\n",
        "        fdist = nltk.FreqDist()\n",
        "        for sent in sent_tokenize(doc):\n",
        "            for word in wordpunct_tokenize(sent):\n",
        "                this_word = wnl.lemmatize(word.lower())\n",
        "                fdist[this_word] += 1\n",
        "        for word in codebook:\n",
        "            this_vector.append(fdist[word])\n",
        "        vectors.append(this_vector)\n",
        "    return vectors\n",
        "\n",
        "vectors = make_vectors_eng(docs3, codebook)\n",
        "for index in range(len(docs3)):\n",
        "    print('docs[{}] = {}'.format(index,docs3[index]))\n",
        "    print('vectors[{}] = {}'.format(index,vectors[index]))\n",
        "    print('----')\n"
      ],
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "docs[0] = This is test.\n",
            "vectors[0] = [1, 0]\n",
            "----\n",
            "docs[1] = That is test too.\n",
            "vectors[1] = [1, 0]\n",
            "----\n",
            "docs[2] = There are so many many tests.\n",
            "vectors[2] = [1, 2]\n",
            "----\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "fCFUHGcdRoYL",
        "outputId": "6759487d-a18a-4d1a-b0fd-bcfedaecd42b"
      },
      "source": [
        "def euclidean_distance(vectors):\n",
        "    vectors = np.array(vectors)\n",
        "    distances = []\n",
        "    for i in range(len(vectors)):\n",
        "        temp = []\n",
        "        for j in range(len(vectors)):\n",
        "            temp.append(np.linalg.norm(vectors[i] - vectors[j]))\n",
        "        distances.append(temp)\n",
        "    return distances\n",
        "\n",
        "distances = euclidean_distance(vectors)\n",
        "print('# euclidean_distance')\n",
        "for index in range(len(distances)):\n",
        "    print(distances[index])\n",
        "\n",
        "def cosine_distance(vectors):\n",
        "    vectors = np.array(vectors)\n",
        "    distances = []\n",
        "    for i in range(len(vectors)):\n",
        "        temp = []\n",
        "        for j in range(len(vectors)):\n",
        "            temp.append(distance.cosine(vectors[i], vectors[j]))\n",
        "        distances.append(temp)\n",
        "    return distances\n",
        "\n",
        "distances = cosine_distance(vectors)\n",
        "print('# cosine_distance')\n",
        "for index in range(len(distances)):\n",
        "    print(distances[index])\n",
        "\n",
        "\n",
        "import sklearn.metrics.pairwise as pairwise\n",
        "distances = pairwise.cosine_similarity(vectors)\n",
        "print('# cosine_similarity')\n",
        "for index in range(len(distances)):\n",
        "    print(distances[index])\n"
      ],
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "# euclidean_distance\n",
            "[np.float64(0.0), np.float64(0.0), np.float64(2.0)]\n",
            "[np.float64(0.0), np.float64(0.0), np.float64(2.0)]\n",
            "[np.float64(2.0), np.float64(2.0), np.float64(0.0)]\n",
            "# cosine_distance\n",
            "[np.float64(0.0), np.float64(0.0), np.float64(0.5527864045000421)]\n",
            "[np.float64(0.0), np.float64(0.0), np.float64(0.5527864045000421)]\n",
            "[np.float64(0.5527864045000421), np.float64(0.5527864045000421), np.float64(0.0)]\n",
            "# cosine_similarity\n",
            "[1.        1.        0.4472136]\n",
            "[1.        1.        0.4472136]\n",
            "[0.4472136 0.4472136 1.       ]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jGuHZ15hRoYM"
      },
      "source": [
        "## Example using sklearn's BoW and TF-IDF\n",
        "This example builds BoW-based and TF-IDF-based feature vectors and observes how they differ. The features listed are identical, but note that TF-IDF expresses some gradation in the weights."
      ]
    },
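    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "With `norm=None`, each TF-IDF entry is the raw count multiplied by an idf weight. Assuming scikit-learn's default `smooth_idf=True`, that weight is `ln((1 + n) / (1 + df)) + 1`, where `n` is the number of documents and `df` is the number of documents containing the term. The sketch below evaluates this formula by hand for three documents; the helper name `smoothed_idf()` is hypothetical.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "def smoothed_idf(n_docs, doc_freq):\n",
        "    '''Smoothed idf: ln((1 + n) / (1 + df)) + 1.'''\n",
        "    return math.log((1 + n_docs) / (1 + doc_freq)) + 1\n",
        "\n",
        "n_docs = 3\n",
        "idf_rare = smoothed_idf(n_docs, 1)    # term that appears in one document\n",
        "idf_common = smoothed_idf(n_docs, 2)  # term that appears in two documents\n",
        "print(round(idf_rare, 8), round(idf_common, 8))\n",
        "```\n",
        "\n",
        "A term that occurs in fewer documents receives the larger weight, which is the gradation TF-IDF adds on top of plain counts."
      ]
    },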
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "c4T04o3URoYM",
        "outputId": "b042abd5-6638-4eeb-bc12-cde30bcd945b"
      },
      "source": [
        "import sklearn.feature_extraction.text as fe_text\n",
        "\n",
        "def bow(docs):\n",
        "    '''Generate Bag-of-Words vectors.\n",
        "\n",
        "    :param docs(list): A list of documents, one string per document.\n",
        "    :return: Document vectors (dense array) and the fitted vectorizer.\n",
        "    '''\n",
        "    vectorizer = fe_text.CountVectorizer(stop_words='english')\n",
        "    vectors = vectorizer.fit_transform(docs)\n",
        "    return vectors.toarray(), vectorizer\n",
        "\n",
        "vectors, vectorizer = bow(docs)\n",
        "print('# normal BoW')\n",
        "print(vectorizer.get_feature_names_out())\n",
        "print(vectors)\n",
        "\n",
        "def bow_tfidf(docs):\n",
        "    '''Generate Bag-of-Words vectors reweighted with TF-IDF.\n",
        "\n",
        "    :param docs(list): A list of documents, one string per document.\n",
        "    :return: Reweighted vectors (dense array) and the fitted vectorizer.\n",
        "    '''\n",
        "    vectorizer = fe_text.TfidfVectorizer(norm=None, stop_words='english')\n",
        "    vectors = vectorizer.fit_transform(docs)\n",
        "    return vectors.toarray(), vectorizer\n",
        "\n",
        "vectors, vectorizer = bow_tfidf(docs)\n",
        "print('# BoW + tfidf')\n",
        "print(vectorizer.get_feature_names_out())\n",
        "print(vectors)\n"
      ],
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "# normal BoW\n",
            "['11' '16' '399' '4k' '60' '62' 'beautifully' 'cameras' 'counted' 'dis'\n",
            " 'fps' 'iphone' 'mo' 'price' 'sharp' 'shoots' 'trade' 'video']\n",
            "[[0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0]\n",
            " [1 0 0 1 1 0 1 1 0 0 1 1 0 0 1 1 0 1]\n",
            " [0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0]]\n",
            "# BoW + tfidf\n",
            "['11' '16' '399' '4k' '60' '62' 'beautifully' 'cameras' 'counted' 'dis'\n",
            " 'fps' 'iphone' 'mo' 'price' 'sharp' 'shoots' 'trade' 'video']\n",
            "[[0.         0.         0.         0.         0.         0.\n",
            "  0.         0.         1.69314718 1.69314718 0.         0.\n",
            "  0.         1.69314718 0.         0.         1.28768207 0.        ]\n",
            " [1.69314718 0.         0.         1.69314718 1.69314718 0.\n",
            "  1.69314718 1.69314718 0.         0.         1.69314718 1.69314718\n",
            "  0.         0.         1.69314718 1.69314718 0.         1.69314718]\n",
            " [0.         1.69314718 1.69314718 0.         0.         1.69314718\n",
            "  0.         0.         0.         0.         0.         0.\n",
            "  1.69314718 0.         0.         0.         1.28768207 0.        ]]\n"
          ]
        }
      ]
    },
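    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WckDMfhZRoYM"
      },
      "source": [
        "## Word vectorization based on a co-occurrence matrix\n",
        "As an example of representing words as feature vectors based on the distributional hypothesis, this section shows code that uses a co-occurrence matrix. Note that here we are building **word vectors**, not document vectors.\n",
        "\n",
        "preprocess() lowercases the text, inserts a space before each period (so that a period does not stay attached to a word), removes double quotes, splits the text into words, and builds the vocabulary. To simplify processing, it also prepares dictionaries for lookups in both directions (word => id and id => word) and re-expresses the text as a sequence of ids.\n",
        "\n",
        "create_co_matrix() takes the text as an id sequence and builds the co-occurrence matrix. This gives us the word vectors.\n",
        "\n",
        "most_similar() shows an example of finding, from the co-occurrence matrix, the words closest to a given input word."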
      ]
    },
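    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the co-occurrence counting concrete, here is a minimal pure-Python sketch on a toy id sequence. It is an illustration under stated assumptions, not the code used in this notebook: the function name `cooccurrence()` and the sample sequence `ids` are hypothetical.\n",
        "\n",
        "```python\n",
        "def cooccurrence(ids, vocab_size, window_size=1):\n",
        "    '''Count, for every word id, the ids seen within window_size positions of it.'''\n",
        "    matrix = [[0] * vocab_size for _ in range(vocab_size)]\n",
        "    for idx, word_id in enumerate(ids):\n",
        "        start = max(0, idx - window_size)\n",
        "        stop = min(len(ids), idx + window_size + 1)\n",
        "        for j in range(start, stop):\n",
        "            if j != idx:  # skip the center word itself\n",
        "                matrix[word_id][ids[j]] += 1\n",
        "    return matrix\n",
        "\n",
        "# Toy sequence with three distinct ids; id 1 occurs twice.\n",
        "ids = [0, 1, 2, 1]\n",
        "matrix = cooccurrence(ids, vocab_size=3)\n",
        "for row in matrix:\n",
        "    print(row)\n",
        "```\n",
        "\n",
        "Each row of the matrix is the vector for one word, and the matrix is symmetric because every neighbor relation is counted from both sides."
      ]
    },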
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "sentence = 'pandas is an open source programming tools. The best way to get pandas is via conda. \"conda install pandas\"'\n",
        "print(f\"{sentence=}\")\n",
        "print(f\"{len(sentence)=}\")\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Ed7r8zdENqsp",
        "outputId": "d6b44746-b9f5-4c65-c0c9-63313527590c"
      },
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "sentence='pandas is an open source programming tools. The best way to get pandas is via conda. \"conda install pandas\"'\n",
            "len(sentence)=107\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "EjfrZOVIRoYM",
        "outputId": "3b8eb867-3b9b-4a7a-c712-9c74b593dafe"
      },
      "source": [
        "def preprocess(text):\n",
        "    \"\"\"Preprocess a text.\n",
        "    From 'ゼロから作るDeep Learning 2 自然言語処理編' (Deep Learning from Scratch 2, natural language processing volume), p.66.\n",
        "\n",
        "    :param text(str): Input text.\n",
        "    :return:\n",
        "      corpus(np.ndarray): The text as a sequence of word ids (the ids used as keys of id_to_word).\n",
        "      word_to_id(dict): Dictionary that maps a word to its id.\n",
        "      id_to_word(dict): Dictionary that maps an id to its word.\n",
        "    \"\"\"\n",
        "    text = text.lower()\n",
        "    text = text.replace('.', ' .')\n",
        "    text = text.replace('\"', '')\n",
        "    words = text.split(' ')\n",
        "\n",
        "    word_to_id = {}\n",
        "    id_to_word = {}\n",
        "    for word in words:\n",
        "        if word not in word_to_id:\n",
        "            new_id = len(word_to_id)\n",
        "            word_to_id[word] = new_id\n",
        "            id_to_word[new_id] = word\n",
        "    corpus = np.array([word_to_id[w] for w in words])\n",
        "    return corpus, word_to_id, id_to_word\n",
        "\n",
        "corpus, word_to_id, id_to_word = preprocess(sentence)\n",
        "vocab_size = len(word_to_id)\n",
        "print(f\"{corpus=}\")\n",
        "print(f\"{word_to_id=}\")\n",
        "print(f\"{id_to_word=}\")"
      ],
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "corpus=array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  0,  1, 13, 14,\n",
            "        7, 14, 15,  0])\n",
            "word_to_id={'pandas': 0, 'is': 1, 'an': 2, 'open': 3, 'source': 4, 'programming': 5, 'tools': 6, '.': 7, 'the': 8, 'best': 9, 'way': 10, 'to': 11, 'get': 12, 'via': 13, 'conda': 14, 'install': 15}\n",
            "id_to_word={0: 'pandas', 1: 'is', 2: 'an', 3: 'open', 4: 'source', 5: 'programming', 6: 'tools', 7: '.', 8: 'the', 9: 'best', 10: 'way', 11: 'to', 12: 'get', 13: 'via', 14: 'conda', 15: 'install'}\n"
          ]
        }
      ]
    },
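    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 557
        },
        "id": "8FhiMV71RoYN",
        "outputId": "a8a12df4-8809-499e-8397-f642bdd8112b"
      },
      "source": [
        "def create_co_matrix(corpus, vocab_size, window_size=1):\n",
        "    \"\"\"Build a co-occurrence matrix.\n",
        "    From 'ゼロから作るDeep Learning 2 自然言語処理編' (Deep Learning from Scratch 2, natural language processing volume), p.72.\n",
        "\n",
        "    :param corpus(np.ndarray): The text as a sequence of word ids.\n",
        "    :param vocab_size: Vocabulary size.\n",
        "    :param window_size: Range used to decide co-occurrence (number of words on each side).\n",
        "    :return: Co-occurrence matrix of shape (vocab_size, vocab_size).\n",
        "    \"\"\"\n",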
        "    corpus_size = len(corpus)\n",
        "    co_matrix = np.zeros((vocab_size, vocab_size), dtype=np.int32)\n",
        "\n",
        "    for idx, word_id in enumerate(corpus):\n",
        "        for i in range(1, window_size+1):\n",
        "            left_idx = idx - i\n",
        "            right_idx = idx + i\n",
        "            if left_idx >= 0:\n",
        "                left_word_id = corpus[left_idx]\n",
        "                co_matrix[word_id, left_word_id] += 1\n",
        "            if right_idx < corpus_size:\n",
        "                right_word_id = corpus[right_idx]\n",
        "                co_matrix[word_id, right_word_id] += 1\n",
        "    return co_matrix\n",
        "\n",
        "co_matrix = create_co_matrix(corpus, vocab_size, window_size=2)\n",
        "df = pd.DataFrame(co_matrix, index=word_to_id.keys(), columns=word_to_id.keys())\n",
        "df"
      ],
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             pandas  is  an  open  source  programming  tools  .  the  best  \\\n",
              "pandas            0   2   1     0       0            0      0  0    0     0   \n",
              "is                2   0   1     1       0            0      0  0    0     0   \n",
              "an                1   1   0     1       1            0      0  0    0     0   \n",
              "open              0   1   1     0       1            1      0  0    0     0   \n",
              "source            0   0   1     1       0            1      1  0    0     0   \n",
              "programming       0   0   0     1       1            0      1  1    0     0   \n",
              "tools             0   0   0     0       1            1      0  1    1     0   \n",
              ".                 0   0   0     0       0            1      1  0    1     1   \n",
              "the               0   0   0     0       0            0      1  1    0     1   \n",
              "best              0   0   0     0       0            0      0  1    1     0   \n",
              "way               0   0   0     0       0            0      0  0    1     1   \n",
              "to                1   0   0     0       0            0      0  0    0     1   \n",
              "get               1   1   0     0       0            0      0  0    0     0   \n",
              "via               1   1   0     0       0            0      0  1    0     0   \n",
              "conda             1   1   0     0       0            0      0  2    0     0   \n",
              "install           1   0   0     0       0            0      0  1    0     0   \n",
              "\n",
              "             way  to  get  via  conda  install  \n",
              "pandas         0   1    1    1      1        1  \n",
              "is             0   0    1    1      1        0  \n",
              "an             0   0    0    0      0        0  \n",
              "open           0   0    0    0      0        0  \n",
              "source         0   0    0    0      0        0  \n",
              "programming    0   0    0    0      0        0  \n",
              "tools          0   0    0    0      0        0  \n",
              ".              0   0    0    1      2        1  \n",
              "the            1   0    0    0      0        0  \n",
              "best           1   1    0    0      0        0  \n",
              "way            0   1    1    0      0        0  \n",
              "to             1   0    1    0      0        0  \n",
              "get            1   1    0    0      0        0  \n",
              "via            0   0    0    0      1        0  \n",
              "conda          0   0    0    1      2        1  \n",
              "install        0   0    0    0      1        0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-7f313dba-6386-4967-be87-f36e24969d20\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>pandas</th>\n",
              "      <th>is</th>\n",
              "      <th>an</th>\n",
              "      <th>open</th>\n",
              "      <th>source</th>\n",
              "      <th>programming</th>\n",
              "      <th>tools</th>\n",
              "      <th>.</th>\n",
              "      <th>the</th>\n",
              "      <th>best</th>\n",
              "      <th>way</th>\n",
              "      <th>to</th>\n",
              "      <th>get</th>\n",
              "      <th>via</th>\n",
              "      <th>conda</th>\n",
              "      <th>install</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>pandas</th>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>is</th>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>an</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>open</th>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>source</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>programming</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>tools</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>.</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>the</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>best</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>way</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>to</th>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>get</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>via</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>conda</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>install</th>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df",
              "summary": "{\n  \"name\": \"df\",\n  \"rows\": 16,\n  \"fields\": [\n    {\n      \"column\": \"pandas\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0,\n          2,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"is\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          2,\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"an\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"open\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"source\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"programming\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"tools\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        
\"description\": \"\"\n      }\n    },\n    {\n      \"column\": \".\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"the\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"best\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"way\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"to\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"get\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"via\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"conda\",\n      \"properties\": {\n   
     \"dtype\": \"int32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"install\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 13
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "MRbXiIsKRoYN",
        "outputId": "c0623e03-bcb1-4e27-e9b0-74b89fd69f7b"
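      },
      "source": [
        "def cos_similarity(x, y, eps=1e-8):\n",
        "    \"\"\"Cosine similarity of x and y; eps avoids division by zero.\"\"\"\n",
        "    nx = x / (np.sqrt(np.sum(x ** 2)) + eps)\n",
        "    ny = y / (np.sqrt(np.sum(y ** 2)) + eps)\n",
        "    return np.dot(nx, ny)\n",
        "\n",
        "def most_similar(query, word_to_id, id_to_word, word_matrix, top=5):\n",
        "    \"\"\"Print the `top` words most similar to the query by cosine similarity.\n",
        "\n",
        "    :param query(str): Query word.\n",
        "    :param word_to_id(dict): Maps each word to its id.\n",
        "    :param id_to_word(dict): Maps each id to its word.\n",
        "    :param word_matrix: Matrix whose i-th row is the vector of word i (e.g. the co-occurrence matrix).\n",
        "    :param top(int): Number of top-ranked words to print.\n",
        "    :return: None (results are printed).\n",
        "    \"\"\"\n",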
        "    if query not in word_to_id:\n",
        "        print('%s is not found' % query)\n",
        "        return\n",
        "\n",
        "    print('[query] ' + query)\n",
        "    query_id = word_to_id[query]\n",
        "    query_vec = word_matrix[query_id]\n",
        "\n",
        "    vocab_size = len(word_to_id)\n",
        "    similarity = np.zeros(vocab_size)\n",
        "    for i in range(vocab_size):\n",
        "        similarity[i] = cos_similarity(word_matrix[i], query_vec)\n",
        "\n",
        "    count = 0\n",
        "    for i in (-1 * similarity).argsort():\n",
        "        if id_to_word[i] == query:\n",
        "            continue\n",
        "        print(' %s: %s' % (id_to_word[i], similarity[i]))\n",
        "        count += 1\n",
        "        if count >= top:\n",
        "            return\n",
        "\n",
        "print('\\n# most_similar() with co_matrix')\n",
        "user_query = \"pandas\"\n",
        "most_similar(user_query, word_to_id, id_to_word, co_matrix)"
      ],
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# most_similar() with co_matrix\n",
            "[query] pandas\n",
            " conda: 0.5477225541919766\n",
            " get: 0.4743416451535486\n",
            " open: 0.4743416451535486\n",
            " via: 0.4743416451535486\n",
            " is: 0.4216370186169938\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JcE9RH1kRoYO"
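      },
      "source": [
        "## Refining distributed representations with mutual information\n",
        "Using the raw co-occurrence matrix as features tends to give too much weight to frequent words such as `the` and `a`. To mitigate this, we introduce pointwise mutual information (PMI).\n",
        "\n",
        "- ppmi(): Positive PMI (positive pointwise mutual information)."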
      ]
    },
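    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a reading aid, the quantity computed by `ppmi()` can be written out as follows. This is a sketch based on the standard definition of PMI. The symbols $C(x, y)$, $C(x)$ and $N$ are notation introduced here for illustration: they are assumed to correspond to `C[i, j]`, `S[i]` and `N` in the code, and the small `eps` term is omitted.\n",
        "\n",
        "$$\n",
        "\\mathrm{PMI}(x, y) = \\log_2 \\frac{P(x, y)}{P(x) P(y)} = \\log_2 \\frac{C(x, y) \\cdot N}{C(x) \\cdot C(y)}, \\qquad \\mathrm{PPMI}(x, y) = \\max(0, \\mathrm{PMI}(x, y))\n",
        "$$\n",
        "\n",
        "The second equality uses $P(x, y) = C(x, y) / N$ and $P(x) = C(x) / N$. Reading the counts off the co-occurrence matrix shown above, $C(\\mathrm{pandas}, \\mathrm{is}) = 2$, $C(\\mathrm{pandas}) = 8$, $C(\\mathrm{is}) = 7$ and $N = 78$, so $\\mathrm{PMI} = \\log_2 (2 \\cdot 78 / (8 \\cdot 7)) \\approx 1.478$."
      ]
    },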
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 590
        },
        "id": "K8e08GIORoYP",
        "outputId": "34c76771-e0d8-497d-8fa4-ab5250b0445b"
      },
      "source": [
        "def ppmi(C, verbose=False, eps=1e-8):\n",
        "    \"\"\"Positive PMI (positive pointwise mutual information).\n",
        "    Based on p.79 of 「ゼロから作るDeepLearning2 自然言語処理編」.\n",
        "\n",
        "    :param C: Co-occurrence matrix.\n",
        "    :param verbose(boolean): Flag for printing progress.\n",
        "    :param eps(float): Small value that keeps np.log2 from returning -inf.\n",
        "    :return: PPMI matrix with the same shape as C.\n",
        "    \"\"\"\n",
        "    M = np.zeros_like(C, dtype=np.float32)\n",
        "    N = np.sum(C)          # total number of co-occurrences\n",
        "    S = np.sum(C, axis=0)  # co-occurrence count per word\n",
        "    total = C.shape[0] * C.shape[1]\n",
        "    cnt = 0\n",
        "\n",
        "    for i in range(C.shape[0]):\n",
        "        for j in range(C.shape[1]):\n",
        "            pmi = np.log2(C[i, j] * N / (S[j]*S[i]) + eps)\n",
        "            M[i, j] = max(0, pmi)\n",
        "\n",
        "            if verbose:\n",
        "                cnt += 1\n",
        "                # +1 keeps the modulus non-zero when total < 100\n",
        "                if cnt % (total//100 + 1) == 0:\n",
        "                    print('%.1f%% done' % (100*cnt/total))\n",
        "    return M\n",
        "\n",
        "M = ppmi(co_matrix)\n",
        "print('\\n# PPMI')\n",
        "df2 = pd.DataFrame(M, index=word_to_id.keys(), columns=word_to_id.keys())\n",
        "df2"
      ],
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# PPMI\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "               pandas        is        an      open    source  programming  \\\n",
              "pandas       0.000000  1.478047  1.285402  0.000000  0.000000     0.000000   \n",
              "is           1.478047  0.000000  1.478047  1.478047  0.000000     0.000000   \n",
              "an           1.285402  1.478047  0.000000  2.285402  2.285402     0.000000   \n",
              "open         0.000000  1.478047  2.285402  0.000000  2.285402     2.285402   \n",
              "source       0.000000  0.000000  2.285402  2.285402  0.000000     2.285402   \n",
              "programming  0.000000  0.000000  0.000000  2.285402  2.285402     0.000000   \n",
              "tools        0.000000  0.000000  0.000000  0.000000  2.285402     2.285402   \n",
              ".            0.000000  0.000000  0.000000  0.000000  0.000000     1.285402   \n",
              "the          0.000000  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "best         0.000000  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "way          0.000000  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "to           1.285402  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "get          1.285402  1.478047  0.000000  0.000000  0.000000     0.000000   \n",
              "via          1.285402  1.478047  0.000000  0.000000  0.000000     0.000000   \n",
              "conda        0.285402  0.478047  0.000000  0.000000  0.000000     0.000000   \n",
              "install      1.700440  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "\n",
              "                tools         .       the      best       way        to  \\\n",
              "pandas       0.000000  0.000000  0.000000  0.000000  0.000000  1.285402   \n",
              "is           0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   \n",
              "an           0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   \n",
              "open         0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   \n",
              "source       2.285402  0.000000  0.000000  0.000000  0.000000  0.000000   \n",
              "programming  2.285402  1.285402  0.000000  0.000000  0.000000  0.000000   \n",
              "tools        0.000000  1.285402  2.285402  0.000000  0.000000  0.000000   \n",
              ".            1.285402  0.000000  1.285402  1.285402  0.000000  0.000000   \n",
              "the          2.285402  1.285402  0.000000  2.285402  2.285402  0.000000   \n",
              "best         0.000000  1.285402  2.285402  0.000000  2.285402  2.285402   \n",
              "way          0.000000  0.000000  2.285402  2.285402  0.000000  2.285402   \n",
              "to           0.000000  0.000000  0.000000  2.285402  2.285402  0.000000   \n",
              "get          0.000000  0.000000  0.000000  0.000000  2.285402  2.285402   \n",
              "via          0.000000  1.285402  0.000000  0.000000  0.000000  0.000000   \n",
              "conda        0.000000  1.285402  0.000000  0.000000  0.000000  0.000000   \n",
              "install      0.000000  1.700440  0.000000  0.000000  0.000000  0.000000   \n",
              "\n",
              "                  get       via     conda  install  \n",
              "pandas       1.285402  1.285402  0.285402  1.70044  \n",
              "is           1.478047  1.478047  0.478047  0.00000  \n",
              "an           0.000000  0.000000  0.000000  0.00000  \n",
              "open         0.000000  0.000000  0.000000  0.00000  \n",
              "source       0.000000  0.000000  0.000000  0.00000  \n",
              "programming  0.000000  0.000000  0.000000  0.00000  \n",
              "tools        0.000000  0.000000  0.000000  0.00000  \n",
              ".            0.000000  1.285402  1.285402  1.70044  \n",
              "the          0.000000  0.000000  0.000000  0.00000  \n",
              "best         0.000000  0.000000  0.000000  0.00000  \n",
              "way          2.285402  0.000000  0.000000  0.00000  \n",
              "to           2.285402  0.000000  0.000000  0.00000  \n",
              "get          0.000000  0.000000  0.000000  0.00000  \n",
              "via          0.000000  0.000000  1.285402  0.00000  \n",
              "conda        0.000000  1.285402  1.285402  1.70044  \n",
              "install      0.000000  0.000000  1.700440  0.00000  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-2e1a5dcc-ee76-4581-a38c-d12f388d08f1\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>pandas</th>\n",
              "      <th>is</th>\n",
              "      <th>an</th>\n",
              "      <th>open</th>\n",
              "      <th>source</th>\n",
              "      <th>programming</th>\n",
              "      <th>tools</th>\n",
              "      <th>.</th>\n",
              "      <th>the</th>\n",
              "      <th>best</th>\n",
              "      <th>way</th>\n",
              "      <th>to</th>\n",
              "      <th>get</th>\n",
              "      <th>via</th>\n",
              "      <th>conda</th>\n",
              "      <th>install</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>pandas</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.285402</td>\n",
              "      <td>1.70044</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>is</th>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.478047</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>an</th>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>open</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>source</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>programming</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>tools</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>.</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.70044</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>the</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>best</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>way</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>to</th>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>get</th>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>via</th>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>conda</th>\n",
              "      <td>0.285402</td>\n",
              "      <td>0.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.70044</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>install</th>\n",
              "      <td>1.700440</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.700440</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.700440</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-2e1a5dcc-ee76-4581-a38c-d12f388d08f1')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-2e1a5dcc-ee76-4581-a38c-d12f388d08f1 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-2e1a5dcc-ee76-4581-a38c-d12f388d08f1');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-bc7a0abf-40ce-4161-8f42-5506169f6ea1\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-bc7a0abf-40ce-4161-8f42-5506169f6ea1')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-bc7a0abf-40ce-4161-8f42-5506169f6ea1 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "  <div id=\"id_0175325b-7156-4143-a7af-6968f078a0da\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df2')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_0175325b-7156-4143-a7af-6968f078a0da button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('df2');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df2",
              "summary": "{\n  \"name\": \"df2\",\n  \"rows\": 16,\n  \"fields\": [\n    {\n      \"column\": \"pandas\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          1.478047251701355,\n          1.700439691543579,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"is\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.478047251701355,\n          0.0,\n          0.47804731130599976\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"an\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          1.478047251701355,\n          2.285402297973633,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"open\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.478047251701355,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"source\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          2.285402297973633,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"programming\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    
{\n      \"column\": \"tools\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \".\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"the\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"best\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"way\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          2.285402297973633,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"to\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.2854021787643433,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"get\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          1.478047251701355,\n          2.285402297973633\n        ],\n        \"semantic_type\": 
\"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"via\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.2854021787643433,\n          1.478047251701355\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"conda\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          0.47804731130599976,\n          1.700439691543579\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"install\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0.0,\n          1.700439691543579\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 15
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 590
        },
        "id": "8tUoxAYZRoYP",
        "outputId": "a5d785f3-fd6e-419a-d434-5a00d267c066"
      },
      "source": [
        "#np.set_printoptions(precision=3) # 有効桁3桁（表示上の省略で、データは保持）\n",
        "pd.options.display.precision = 3 # 同上\n",
        "print('\\n# PPMI with precision=3')\n",
        "df2 = pd.DataFrame(M, index=word_to_id.keys(), columns=word_to_id.keys())\n",
        "df2"
      ],
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# PPMI with precision=3\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             pandas     is     an   open  source  programming  tools      .  \\\n",
              "pandas        0.000  1.478  1.285  0.000   0.000        0.000  0.000  0.000   \n",
              "is            1.478  0.000  1.478  1.478   0.000        0.000  0.000  0.000   \n",
              "an            1.285  1.478  0.000  2.285   2.285        0.000  0.000  0.000   \n",
              "open          0.000  1.478  2.285  0.000   2.285        2.285  0.000  0.000   \n",
              "source        0.000  0.000  2.285  2.285   0.000        2.285  2.285  0.000   \n",
              "programming   0.000  0.000  0.000  2.285   2.285        0.000  2.285  1.285   \n",
              "tools         0.000  0.000  0.000  0.000   2.285        2.285  0.000  1.285   \n",
              ".             0.000  0.000  0.000  0.000   0.000        1.285  1.285  0.000   \n",
              "the           0.000  0.000  0.000  0.000   0.000        0.000  2.285  1.285   \n",
              "best          0.000  0.000  0.000  0.000   0.000        0.000  0.000  1.285   \n",
              "way           0.000  0.000  0.000  0.000   0.000        0.000  0.000  0.000   \n",
              "to            1.285  0.000  0.000  0.000   0.000        0.000  0.000  0.000   \n",
              "get           1.285  1.478  0.000  0.000   0.000        0.000  0.000  0.000   \n",
              "via           1.285  1.478  0.000  0.000   0.000        0.000  0.000  1.285   \n",
              "conda         0.285  0.478  0.000  0.000   0.000        0.000  0.000  1.285   \n",
              "install       1.700  0.000  0.000  0.000   0.000        0.000  0.000  1.700   \n",
              "\n",
              "               the   best    way     to    get    via  conda  install  \n",
              "pandas       0.000  0.000  0.000  1.285  1.285  1.285  0.285      1.7  \n",
              "is           0.000  0.000  0.000  0.000  1.478  1.478  0.478      0.0  \n",
              "an           0.000  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              "open         0.000  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              "source       0.000  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              "programming  0.000  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              "tools        2.285  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              ".            1.285  1.285  0.000  0.000  0.000  1.285  1.285      1.7  \n",
              "the          0.000  2.285  2.285  0.000  0.000  0.000  0.000      0.0  \n",
              "best         2.285  0.000  2.285  2.285  0.000  0.000  0.000      0.0  \n",
              "way          2.285  2.285  0.000  2.285  2.285  0.000  0.000      0.0  \n",
              "to           0.000  2.285  2.285  0.000  2.285  0.000  0.000      0.0  \n",
              "get          0.000  0.000  2.285  2.285  0.000  0.000  0.000      0.0  \n",
              "via          0.000  0.000  0.000  0.000  0.000  0.000  1.285      0.0  \n",
              "conda        0.000  0.000  0.000  0.000  0.000  1.285  1.285      1.7  \n",
              "install      0.000  0.000  0.000  0.000  0.000  0.000  1.700      0.0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-85f39a8a-82b5-4086-ad75-1324aa0e7247\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>pandas</th>\n",
              "      <th>is</th>\n",
              "      <th>an</th>\n",
              "      <th>open</th>\n",
              "      <th>source</th>\n",
              "      <th>programming</th>\n",
              "      <th>tools</th>\n",
              "      <th>.</th>\n",
              "      <th>the</th>\n",
              "      <th>best</th>\n",
              "      <th>way</th>\n",
              "      <th>to</th>\n",
              "      <th>get</th>\n",
              "      <th>via</th>\n",
              "      <th>conda</th>\n",
              "      <th>install</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>pandas</th>\n",
              "      <td>0.000</td>\n",
              "      <td>1.478</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.285</td>\n",
              "      <td>1.7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>is</th>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.478</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.478</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.478</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>an</th>\n",
              "      <td>1.285</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>open</th>\n",
              "      <td>0.000</td>\n",
              "      <td>1.478</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>source</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>programming</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>tools</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>.</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>the</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>best</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>way</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>to</th>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>get</th>\n",
              "      <td>1.285</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>via</th>\n",
              "      <td>1.285</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>conda</th>\n",
              "      <td>0.285</td>\n",
              "      <td>0.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>install</th>\n",
              "      <td>1.700</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.700</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.700</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-85f39a8a-82b5-4086-ad75-1324aa0e7247')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-85f39a8a-82b5-4086-ad75-1324aa0e7247 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-85f39a8a-82b5-4086-ad75-1324aa0e7247');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-56b2c79c-b210-4c04-af42-c6bd598e0fe0\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-56b2c79c-b210-4c04-af42-c6bd598e0fe0')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-56b2c79c-b210-4c04-af42-c6bd598e0fe0 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "  <div id=\"id_84f98ca0-0cb6-4f38-90d0-8b037e606f6a\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df2')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_84f98ca0-0cb6-4f38-90d0-8b037e606f6a button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('df2');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df2",
              "summary": "{\n  \"name\": \"df2\",\n  \"rows\": 16,\n  \"fields\": [\n    {\n      \"column\": \"pandas\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          1.478047251701355,\n          1.700439691543579,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"is\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.478047251701355,\n          0.0,\n          0.47804731130599976\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"an\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          1.478047251701355,\n          2.285402297973633,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"open\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.478047251701355,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"source\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          2.285402297973633,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"programming\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    
{\n      \"column\": \"tools\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \".\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"the\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"best\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"way\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          2.285402297973633,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"to\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.2854021787643433,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"get\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          1.478047251701355,\n          2.285402297973633\n        ],\n        \"semantic_type\": 
\"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"via\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.2854021787643433,\n          1.478047251701355\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"conda\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          0.47804731130599976,\n          1.700439691543579\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"install\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0.0,\n          1.700439691543579\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 16
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3OuF6wnMRoYP",
        "outputId": "cd0fc379-1d74-4761-bf17-411a951a84b7"
      },
      "source": [
        "print('\\n# most_similar() with PPMI')\n",
        "most_similar(user_query, word_to_id, id_to_word, M)\n"
      ],
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# most_similar() with PPMI\n",
            "[query] pandas\n",
            " conda: 0.5733166933059692\n",
            " is: 0.5094797611236572\n",
            " .: 0.40005457401275635\n",
            " get: 0.39511924982070923\n",
            " way: 0.3747256100177765\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1qf7f7ZbRoYQ"
      },
      "source": [
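        "## Dimensionality reduction with SVD\n",
        "- np.linalg.svd(): computes the singular value decomposition with NumPy's linear algebra module. Keeping only the first few columns of U gives each word a low-dimensional dense vector.\n"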
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ZhHJvY9pRoYQ",
        "outputId": "48303a36-b567-43cb-eb0d-646f21f35bc5"
      },
      "source": [
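        "# SVD: factorize the matrix M (used above with PPMI) as U @ diag(S) @ Vh\n",
        "# np.linalg.svd returns the right singular vectors already transposed, hence the name Vh\n",
        "U, S, Vh = np.linalg.svd(M)\n",
        "print('\\n# SVD: dense vectors with all singular values')\n",
        "print(U)\n",
        "\n",
        "# Keep only the columns of U that belong to the largest singular values\n",
        "# (np.linalg.svd returns the singular values in descending order)\n",
        "use_s_values = 2\n",
        "U2 = U[:, :use_s_values]\n",
        "print('\\n# SVD: dense vectors with singular values = {}'.format(use_s_values))\n",
        "print(U2)\n",
        "\n",
        "# Rank words by similarity to the query using the 2-dimensional vectors\n",
        "print('\\n# most_similar() with SVD-2')\n",
        "most_similar(user_query, word_to_id, id_to_word, U2)\n"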
      ],
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# SVD: dense vectors with all singular values\n",
            "[[-0.20909968 -0.05742509 -0.4260506  -0.17504837  0.2712971  -0.04672289\n",
            "   0.18823615 -0.18591174 -0.36170715 -0.02090307  0.33414906 -0.12803775\n",
            "  -0.20772134 -0.17504339  0.12016748 -0.49748433]\n",
            " [-0.20854177  0.06963065 -0.3888203   0.1402168   0.20635465 -0.15388878\n",
            "  -0.2102995  -0.13104966  0.38273078  0.39125082  0.11626503  0.32180378\n",
            "   0.35602686 -0.3037927  -0.09017423  0.09666564]\n",
            " [-0.23961712  0.26655334 -0.18719402  0.05390574 -0.35268205 -0.2592225\n",
            "  -0.32773367 -0.09349788 -0.2730225   0.0377553  -0.40234843  0.31079715\n",
            "  -0.17229767  0.28257638 -0.12586372 -0.25696287]\n",
            " [-0.2782892   0.35838643 -0.04367146 -0.32175225  0.05215078 -0.23192607\n",
            "   0.21995124  0.49517277 -0.11888672 -0.1294459   0.23939379  0.12363093\n",
            "   0.32118776  0.27299288  0.13509066  0.19793455]\n",
            " [-0.3138919   0.39330027  0.14138082  0.21887659  0.3510343  -0.1303136\n",
            "   0.33645037 -0.16895744  0.4147285  -0.14546476 -0.16639212 -0.12915234\n",
            "  -0.36436176  0.11097264 -0.11952758 -0.00717432]\n",
            " [-0.29259852  0.32289356  0.2055024   0.29696342 -0.23372465  0.05240921\n",
            "  -0.22590032 -0.37780765 -0.24121043 -0.03001606  0.29355177 -0.3378207\n",
            "   0.1754489  -0.11105473  0.2770583   0.22205435]\n",
            " [-0.29597434  0.18007208  0.3224701  -0.47838306 -0.15175404  0.19339524\n",
            "  -0.20657967  0.22351523  0.11675979  0.25128582  0.00841147 -0.15087281\n",
            "  -0.17578007 -0.40954658 -0.2627039  -0.1637667 ]\n",
            " [-0.25471854 -0.01025081  0.02076114 -0.05029602  0.36506298  0.47563514\n",
            "   0.0229755   0.00612366 -0.19963232  0.08954368 -0.58078074  0.02569766\n",
            "   0.26210678 -0.01673562  0.34222415 -0.00723523]\n",
            " [-0.29356492 -0.19401811  0.31909984  0.49437767 -0.02606995  0.1744973\n",
            "   0.20824735  0.21605654 -0.2066092  -0.09872612  0.19246763  0.389128\n",
            "   0.11316317 -0.11955503 -0.29818308 -0.22771738]\n",
            " [-0.28762156 -0.33267495  0.200779   -0.2494268  -0.28890833  0.02771297\n",
            "   0.27570227 -0.329145    0.23007524  0.28904447  0.13154629  0.28043836\n",
            "  -0.10255558  0.28061515  0.3373071   0.00459014]\n",
            " [-0.30796608 -0.3980363   0.13565156 -0.2586614   0.28628355 -0.16009486\n",
            "  -0.41644257 -0.21415797  0.02268608 -0.5027486   0.02590389  0.01947945\n",
            "   0.08477696  0.10067189 -0.20825593  0.13973713]\n",
            " [-0.26908046 -0.35622442 -0.04449803  0.3045457   0.09906036 -0.229743\n",
            "  -0.24744193  0.4765816   0.00974799  0.2936326  -0.00161449 -0.30523014\n",
            "  -0.30354518  0.11671898  0.24741183  0.0925235 ]\n",
            " [-0.23532367 -0.25857002 -0.19031367 -0.01856818 -0.34355265 -0.2738812\n",
            "   0.43220395 -0.04992938 -0.06145945 -0.03610538 -0.34404024 -0.3561516\n",
            "   0.258372   -0.24605696 -0.2454495   0.14502974]\n",
            " [-0.132769    0.00144362 -0.31328392  0.02380027 -0.25689903  0.2137918\n",
            "  -0.00529088  0.15167178  0.13748148 -0.45615938 -0.02669457  0.27819303\n",
            "  -0.35791093 -0.38731208  0.29279673  0.28283092]\n",
            " [-0.12850079 -0.00502957 -0.3181515  -0.0227449   0.01897933  0.46368992\n",
            "   0.04571127 -0.06436063 -0.18308993  0.2039377   0.152593   -0.05987117\n",
            "  -0.20431407  0.32411188 -0.46140748  0.44548053]\n",
            " [-0.12502334 -0.01886328 -0.2561411   0.08992951 -0.24950773  0.35275292\n",
            "  -0.11904678  0.12406847  0.44487035 -0.22767347  0.08375221 -0.287249\n",
            "   0.2736007   0.31247255 -0.00371157 -0.42694545]]\n",
            "\n",
            "# SVD: dense vectors with singular values = 2\n",
            "[[-0.20909968 -0.05742509]\n",
            " [-0.20854177  0.06963065]\n",
            " [-0.23961712  0.26655334]\n",
            " [-0.2782892   0.35838643]\n",
            " [-0.3138919   0.39330027]\n",
            " [-0.29259852  0.32289356]\n",
            " [-0.29597434  0.18007208]\n",
            " [-0.25471854 -0.01025081]\n",
            " [-0.29356492 -0.19401811]\n",
            " [-0.28762156 -0.33267495]\n",
            " [-0.30796608 -0.3980363 ]\n",
            " [-0.26908046 -0.35622442]\n",
            " [-0.23532367 -0.25857002]\n",
            " [-0.132769    0.00144362]\n",
            " [-0.12850079 -0.00502957]\n",
            " [-0.12502334 -0.01886328]]\n",
            "\n",
            "# most_similar() with SVD-2\n",
            "[query] pandas\n",
            " install: 0.9930136799812317\n",
            " .: 0.9741653800010681\n",
            " conda: 0.9739159345626831\n",
            " via: 0.9613599181175232\n",
            " the: 0.950492262840271\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "2nsKV0lhRoYQ"
      },
      "source": [],
      "execution_count": 18,
      "outputs": []
    }
  ]
}