{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.6"
    },
    "colab": {
      "name": "nlp1.ipynb",
      "provenance": []
    }
  },
  "cells": [
    {
      "cell_type": "code",
      "source": [
        "!date\n",
        "!python --version"
      ],
      "metadata": {
        "id": "LCat4RVnJmyt",
        "outputId": "58157d55-2fe1-491c-a4cf-ccc5d4799b32",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Thu May 30 01:43:21 AM UTC 2024\n",
            "Python 3.10.12\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mf7eFNWYRoYA"
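      },
      "source": [
        "# Code examples: designs based on a thesaurus, counting, and inference (from-scratch implementation, NLTK edition)\n",
        "- Note\n",
        "  - In natural language processing, the workflow differs greatly depending on the tool you use. This notebook prioritizes making typical preprocessing steps (sentence splitting, tokenization, stemming or lemmatization, etc.) easy to observe. Easier-to-use tools will be introduced at a later date.\n",
        "- Overall flow\n",
        "    - Preparation\n",
        "    - Thesaurus example\n",
        "    - Bag-of-Words\n",
        "    - Example using sklearn's BoW and TF-IDF\n",
        "    - Word vectorization based on a co-occurrence matrix\n",
        "    - Improving distributed representations with mutual information\n",
        "    - Dimensionality reduction with SVD"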
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Dqa9QDJ7RoYG"
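      },
      "source": [
        "## Preparation\n",
        "- Notes before running\n",
        "    - You need to install the [Natural Language Toolkit; NLTK](https://www.nltk.org) and additionally download corpora and related data. (Everything can be installed at once, but that takes a fair amount of disk space, so only the minimum is installed by default.)\n",
        "    - Steps\n",
        "      - Install NLTK.\n",
        "      - Run ``nltk.download()`` from the Python interpreter and download the related corpora and models listed below.\n",
        "        - wordnet and stopwords, under the Corpora tab\n",
        "        - punkt, under the Models tab\n"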
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MWObKXywR_i5"
      },
      "source": [
        "# As of May 2024, installation is not needed.\n",
        "#!pip install nltk"
      ],
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "huRK9h9VSPil",
        "outputId": "8557726b-5a96-479b-a4ba-d136eb244f6d"
      },
      "source": [
        "import nltk\n",
        "nltk.download(['wordnet', 'stopwords', 'punkt'])"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
            "[nltk_data]   Package wordnet is already up-to-date!\n",
            "[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
            "[nltk_data]   Package stopwords is already up-to-date!\n",
            "[nltk_data] Downloading package punkt to /root/nltk_data...\n",
            "[nltk_data]   Package punkt is already up-to-date!\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "True"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e4EZ_PuwRoYH"
      },
      "source": [
        "## Thesaurus example\n",
        "- preprocess_docs(): an example of text preprocessing.\n",
        "- simple_matching(): computes a score for a user query by simple word matching.\n",
        "- relation_matching(): in addition to simple matching, adds points using a thesaurus."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "rrF7i9ojRoYH",
        "outputId": "2e9a020f-e1c5-4f81-f03c-2c2c805754ce"
      },
      "source": [
        "# Preprocessing\n",
        "from nltk.tokenize import wordpunct_tokenize, sent_tokenize\n",
        "# <NLTK components used here>\n",
        "# nltk.corpus.stopwords: a blacklist used to exclude words that are unsuitable as features characterizing a text, commonly called stop words.\n",
        "# nltk.tokenize.sent_tokenize: splits a document (doc) into sentences.\n",
        "# nltk.tokenize.wordpunct_tokenize: splits a sentence into words (tokens), i.e. tokenization.\n",
        "# nltk.stem.wordnet.WordNetLemmatizer.lemmatize: converts a word to (something close to) its base form. This is lemmatization; stemming is a related but cruder technique.\n",
        "\n",
        "import numpy as np\n",
        "\n",
        "# Example documents (three documents)\n",
        "docs = []\n",
        "docs.append(\"You can get dis-counted price with trade-in.\")\n",
        "docs.append(\"iPhone 11 shoots beautifully sharp 4K video at 60 fps across all its cameras.\")\n",
        "docs.append(\"From $16.62/mo. or $399 with trade-in.\")\n",
        "\n",
        "def preprocess_docs(docs):\n",
        "    '''Preprocess the English document collection docs and return it as a list of token lists.\n",
        "\n",
        "    :param docs(list): list of documents, one string per document.\n",
        "    :return (list): result after sentence splitting, word splitting, lemmatization, and stop word removal.\n",
        "    '''\n",
        "    stopwords = nltk.corpus.stopwords.words('english')\n",
        "    stopwords.append('.')  # Add the period.\n",
        "    stopwords.append(',')  # Add the comma.\n",
        "    stopwords.append('')  # Add the empty string.\n",
        "\n",
        "    result = []\n",
        "    wnl = nltk.stem.wordnet.WordNetLemmatizer()\n",
        "    for doc in docs:\n",
        "        temp = []\n",
        "        for sent in sent_tokenize(doc):\n",
        "            for word in wordpunct_tokenize(sent):\n",
        "                this_word = wnl.lemmatize(word.lower())\n",
        "                if this_word not in stopwords:\n",
        "                    temp.append(this_word)\n",
        "        result.append(temp)\n",
        "    return result\n",
        "\n",
        "docs2 = preprocess_docs(docs)\n",
        "for index in range(len(docs2)):\n",
        "    print('before: ', docs[index])\n",
        "    print('after: ', docs2[index])\n",
        "    print('----')\n"
      ],
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "before:  You can get dis-counted price with trade-in.\n",
            "after:  ['get', 'dis', '-', 'counted', 'price', 'trade', '-']\n",
            "----\n",
            "before:  iPhone 11 shoots beautifully sharp 4K video at 60 fps across all its cameras.\n",
            "after:  ['iphone', '11', 'shoot', 'beautifully', 'sharp', '4k', 'video', '60', 'fps', 'across', 'camera']\n",
            "----\n",
            "before:  From $16.62/mo. or $399 with trade-in.\n",
            "after:  ['$', '16', '62', '/', 'mo', '$', '399', 'trade', '-']\n",
            "----\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T7zIO15ERoYI"
      },
      "source": [
        "### simple_matching()\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yH2IszZMRoYI",
        "outputId": "6fb02f9d-b70a-4398-af1f-1def15781ee1"
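      },
      "source": [
        "# simple matching\n",
        "def simple_matching(query, docs):\n",
        "    '''Compute a score as the number of matches using simple word matching.\n",
        "\n",
        "    :param query(str): query (search request), words separated by single spaces.\n",
        "    :param docs(list): preprocessed documents, each given as a list of tokens (e.g. the output of preprocess_docs()).\n",
        "    :return (list): score for each document.\n",
        "    '''\n",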
        "    query = query.split(\" \")\n",
        "    result = []\n",
        "    for doc in docs:\n",
        "        score = 0\n",
        "        for word in doc:\n",
        "            for key in query:\n",
        "                if key == word:\n",
        "                    score += 1\n",
        "        result.append(score)\n",
        "    return result\n",
        "\n",
        "user_query = \"how much iphone\"\n",
        "scores = simple_matching(user_query, docs2)\n",
        "print('simple_matching scores = ', scores)\n"
      ],
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "simple_matching scores =  [0, 1, 0]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LT5p-w38RoYJ"
      },
      "source": [
        "### relation_matching()\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "dsWTpzGrRoYJ",
        "outputId": "d8bcb2c3-6a68-4523-bf22-071b2ef5ae1e"
      },
      "source": [
        "# relation matching\n",
        "related_words = {}\n",
        "related_words['buy'] = ['buy', '$', 'price', 'how much', 'trade-in']\n",
        "related_words['UX'] = ['UX', 'stylish', 'seamless']\n",
        "\n",
        "def relation_matching(query, docs, related_words):\n",
        "    '''Use a prepared list of related terms and add the number of matches to the score.\n",
        "\n",
        "    :param query(str): query (search request).\n",
        "    :param docs(list): preprocessed documents, each given as a list of tokens.\n",
        "    :param related_words(dict): maps a relation name to its list of related terms.\n",
        "    :return (list): score for each document.\n",
        "    '''\n",
        "    scores = simple_matching(query, docs)\n",
        "\n",
        "    query = query.split(\" \")\n",
        "    for q in query:\n",
        "        for relation in related_words:\n",
        "            # Substring test: 'how' and 'much' each match the entry 'how much',\n",
        "            # so the same relation can be added once per query word.\n",
        "            matches = [q in word for word in related_words[relation]]\n",
        "            if True in matches:\n",
        "                new_query = ' '.join(related_words[relation])\n",
        "                temp_scores = simple_matching(new_query, docs)\n",
        "                print('# q = {}, relation = {} => temp_scores = {}'.format(q, relation, temp_scores))\n",
        "                scores = list(np.array(scores) + np.array(temp_scores))\n",
        "    scores = list(scores)\n",
        "    return scores\n",
        "\n",
        "scores2 = relation_matching(user_query, docs2, related_words)\n",
        "print('simple_matching scores = ', scores)\n",
        "print('relation_matching scores = ', scores2)\n"
      ],
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "# q = how, relation = buy => temp_scores = [1, 0, 2]\n",
            "# q = much, relation = buy => temp_scores = [1, 0, 2]\n",
            "simple_matching scores =  [0, 1, 0]\n",
            "relation_matching scores =  [2, 1, 4]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Uhc5-DUQRoYK"
      },
      "source": [
        "## Bag-of-Words\n",
        "- collect_words_eng(): builds a word codebook from a collection of English documents\n",
        "- make_vectors_eng(): builds document vectors that use the codebook as features\n",
        "- euclidean_distance(): Euclidean distance\n",
        "- cosine_distance(): cosine distance\n",
        "- cosine_similarity(): cosine similarity"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Gbedc9x4RoYK",
        "outputId": "bd9c4bd6-3d92-4f70-cc23-2f44b4aaa49b"
      },
      "source": [
        "import scipy.spatial.distance as distance\n",
        "\n",
        "# BoW\n",
        "# Example documents (three documents)\n",
        "docs3 = []\n",
        "docs3.append(\"This is test.\")\n",
        "docs3.append(\"That is test too.\")\n",
        "docs3.append(\"There are so many many tests.\")\n",
        "\n",
        "\n",
        "# Build the set of term features (codebook) from the document collection\n",
        "def collect_words_eng(docs):\n",
        "    '''Build a word codebook from a collection of English documents.\n",
        "    For simplicity, the documents are processed with a fixed, hard-coded procedure.\n",
        "    Making the procedure configurable might be more convenient.\n",
        "\n",
        "    :param docs(list): list of documents, one string per document.\n",
        "    :return (list): unique word list after sentence splitting, word splitting, lemmatization, and stop word removal.\n",
        "    '''\n",
        "    codebook = []\n",
        "    stopwords = nltk.corpus.stopwords.words('english')\n",
        "    stopwords.append('.')   # Add the period.\n",
        "    stopwords.append(',')   # Add the comma.\n",
        "    stopwords.append('')    # Add the empty string.\n",
        "    wnl = nltk.stem.wordnet.WordNetLemmatizer()\n",
        "    for doc in docs:\n",
        "        for sent in sent_tokenize(doc):\n",
        "            for word in wordpunct_tokenize(sent):\n",
        "                this_word = wnl.lemmatize(word.lower())\n",
        "                if this_word not in codebook and this_word not in stopwords:\n",
        "                    codebook.append(this_word)\n",
        "    return codebook\n",
        "\n",
        "codebook = collect_words_eng(docs3)\n",
        "print('codebook = ',codebook)\n"
      ],
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "codebook =  ['test', 'many']\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OFSt8UnORoYK",
        "outputId": "7e7e91f2-aacb-449a-9500-52aa5495eb56"
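      },
      "source": [
        "# Build document vectors that use the codebook as features (direct vector generation)\n",
        "def make_vectors_eng(docs, codebook):\n",
        "    '''Build document vectors that use the codebook as features (direct vector generation).\n",
        "\n",
        "    :param docs(list): list of documents, one string per document.\n",
        "    :param codebook(list): list of unique words.\n",
        "    :return (list): vectors whose features are the occurrence counts of each codebook word.\n",
        "    '''\n",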
        "    vectors = []\n",
        "    wnl = nltk.stem.wordnet.WordNetLemmatizer()\n",
        "    for doc in docs:\n",
        "        this_vector = []\n",
        "        fdist = nltk.FreqDist()\n",
        "        for sent in sent_tokenize(doc):\n",
        "            for word in wordpunct_tokenize(sent):\n",
        "                this_word = wnl.lemmatize(word.lower())\n",
        "                fdist[this_word] += 1\n",
        "        for word in codebook:\n",
        "            this_vector.append(fdist[word])\n",
        "        vectors.append(this_vector)\n",
        "    return vectors\n",
        "\n",
        "vectors = make_vectors_eng(docs3, codebook)\n",
        "for index in range(len(docs3)):\n",
        "    print('docs[{}] = {}'.format(index,docs3[index]))\n",
        "    print('vectors[{}] = {}'.format(index,vectors[index]))\n",
        "    print('----')\n"
      ],
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "docs[0] = This is test.\n",
            "vectors[0] = [1, 0]\n",
            "----\n",
            "docs[1] = That is test too.\n",
            "vectors[1] = [1, 0]\n",
            "----\n",
            "docs[2] = There are so many many tests.\n",
            "vectors[2] = [1, 2]\n",
            "----\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "fCFUHGcdRoYL",
        "outputId": "7f56d2e3-94ab-47d8-d32d-1267f186e7c5"
      },
      "source": [
        "def euclidean_distance(vectors):\n",
        "    '''Return the matrix of pairwise Euclidean distances between the row vectors.'''\n",
        "    vectors = np.array(vectors)\n",
        "    distances = []\n",
        "    for i in range(len(vectors)):\n",
        "        temp = []\n",
        "        for j in range(len(vectors)):\n",
        "            temp.append(np.linalg.norm(vectors[i] - vectors[j]))\n",
        "        distances.append(temp)\n",
        "    return distances\n",
        "\n",
        "distances = euclidean_distance(vectors)\n",
        "print('# euclidean_distance')\n",
        "for index in range(len(distances)):\n",
        "    print(distances[index])\n",
        "\n",
        "def cosine_distance(vectors):\n",
        "    '''Return the matrix of pairwise cosine distances (1 - cosine similarity) between the row vectors.'''\n",
        "    vectors = np.array(vectors)\n",
        "    distances = []\n",
        "    for i in range(len(vectors)):\n",
        "        temp = []\n",
        "        for j in range(len(vectors)):\n",
        "            temp.append(distance.cosine(vectors[i], vectors[j]))\n",
        "        distances.append(temp)\n",
        "    return distances\n",
        "\n",
        "distances = cosine_distance(vectors)\n",
        "print('# cosine_distance')\n",
        "for index in range(len(distances)):\n",
        "    print(distances[index])\n",
        "\n",
        "\n",
        "import sklearn.metrics.pairwise as pairwise\n",
        "distances = pairwise.cosine_similarity(vectors)\n",
        "print('# cosine_similarity')\n",
        "for index in range(len(distances)):\n",
        "    print(distances[index])\n"
      ],
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "# euclidean_distance\n",
            "[0.0, 0.0, 2.0]\n",
            "[0.0, 0.0, 2.0]\n",
            "[2.0, 2.0, 0.0]\n",
            "# cosine_distance\n",
            "[0, 0, 0.5527864045000421]\n",
            "[0, 0, 0.5527864045000421]\n",
            "[0.5527864045000421, 0.5527864045000421, 0]\n",
            "# cosine_similarity\n",
            "[1.        1.        0.4472136]\n",
            "[1.        1.        0.4472136]\n",
            "[0.4472136 0.4472136 1.       ]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jGuHZ15hRoYM"
      },
      "source": [
        "## Example using sklearn's BoW and TF-IDF\n",
        "- Stemming, stop words, and so on can also be specified, but fine-grained control may be difficult. (Subjective opinion.)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "c4T04o3URoYM",
        "outputId": "99f86511-58cb-449a-ce3b-1ff49f9f1045"
      },
      "source": [
        "import sklearn.feature_extraction.text as fe_text\n",
        "\n",
        "def bow(docs):\n",
        "    '''Generate Bag-of-Words vectors.\n",
        "\n",
        "    :param docs(list): list of documents, one string per document.\n",
        "    :return: document vectors (dense array) and the fitted vectorizer.\n",
        "    '''\n",
        "    vectorizer = fe_text.CountVectorizer(stop_words='english')\n",
        "    vectors = vectorizer.fit_transform(docs)\n",
        "    return vectors.toarray(), vectorizer\n",
        "\n",
        "vectors, vectorizer = bow(docs)\n",
        "print('# normal BoW')\n",
        "print(vectorizer.get_feature_names_out())\n",
        "print(vectors)\n",
        "\n",
        "def bow_tfidf(docs):\n",
        "    '''Generate Bag-of-Words vectors reweighted with TF-IDF.\n",
        "\n",
        "    :param docs(list): list of documents, one string per document.\n",
        "    :return: reweighted vectors (dense array) and the fitted vectorizer.\n",
        "    '''\n",
        "    vectorizer = fe_text.TfidfVectorizer(norm=None, stop_words='english')\n",
        "    vectors = vectorizer.fit_transform(docs)\n",
        "    return vectors.toarray(), vectorizer\n",
        "\n",
        "vectors, vectorizer = bow_tfidf(docs)\n",
        "print('# BoW + tfidf')\n",
        "print(vectorizer.get_feature_names_out())\n",
        "print(vectors)\n"
      ],
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "# normal BoW\n",
            "['11' '16' '399' '4k' '60' '62' 'beautifully' 'cameras' 'counted' 'dis'\n",
            " 'fps' 'iphone' 'mo' 'price' 'sharp' 'shoots' 'trade' 'video']\n",
            "[[0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0]\n",
            " [1 0 0 1 1 0 1 1 0 0 1 1 0 0 1 1 0 1]\n",
            " [0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0]]\n",
            "# BoW + tfidf\n",
            "['11' '16' '399' '4k' '60' '62' 'beautifully' 'cameras' 'counted' 'dis'\n",
            " 'fps' 'iphone' 'mo' 'price' 'sharp' 'shoots' 'trade' 'video']\n",
            "[[0.         0.         0.         0.         0.         0.\n",
            "  0.         0.         1.69314718 1.69314718 0.         0.\n",
            "  0.         1.69314718 0.         0.         1.28768207 0.        ]\n",
            " [1.69314718 0.         0.         1.69314718 1.69314718 0.\n",
            "  1.69314718 1.69314718 0.         0.         1.69314718 1.69314718\n",
            "  0.         0.         1.69314718 1.69314718 0.         1.69314718]\n",
            " [0.         1.69314718 1.69314718 0.         0.         1.69314718\n",
            "  0.         0.         0.         0.         0.         0.\n",
            "  1.69314718 0.         0.         0.         1.28768207 0.        ]]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WckDMfhZRoYM"
      },
      "source": [
        "## Word vectorization based on a co-occurrence matrix\n",
        "- preprocess(): text preprocessing.\n",
        "- create_co_matrix(): builds the co-occurrence matrix.\n",
        "- most_similar(): prints the top 5 words by cosine similarity."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "EjfrZOVIRoYM",
        "outputId": "5a4c9b8d-2a5d-4f92-c75e-eca25a3e177b"
      },
      "source": [
        "import pandas as pd\n",
        "\n",
        "sentence = 'pandas is an open source programming tools. The best way to get pandas is via conda. \"conda install pandas\"'\n",
        "print(sentence)\n",
        "print('len(sentence) = ', len(sentence))\n",
        "\n",
        "\n",
        "def preprocess(text):\n",
        "    \"\"\"Text preprocessing.\n",
        "    From 「ゼロから作るDeep Learning 2 自然言語処理編」, p. 66.\n",
        "\n",
        "    :param text(str): input text.\n",
        "    :return:\n",
        "      corpus(numpy.ndarray): the text as a sequence of word IDs (keys of id_to_word).\n",
        "      word_to_id(dict): dictionary mapping a word to its ID.\n",
        "      id_to_word(dict): dictionary mapping an ID to its word.\n",
        "    \"\"\"\n",
        "    text = text.lower()\n",
        "    text = text.replace('.', ' .')\n",
        "    text = text.replace('\"', '')\n",
        "    words = text.split(' ')\n",
        "\n",
        "    word_to_id = {}\n",
        "    id_to_word = {}\n",
        "    for word in words:\n",
        "        if word not in word_to_id:\n",
        "            new_id = len(word_to_id)\n",
        "            word_to_id[word] = new_id\n",
        "            id_to_word[new_id] = word\n",
        "    corpus = np.array([word_to_id[w] for w in words])\n",
        "    return corpus, word_to_id, id_to_word\n",
        "\n",
        "corpus, word_to_id, id_to_word = preprocess(sentence)\n",
        "vocab_size = len(word_to_id)\n",
        "print(corpus)\n",
        "print(word_to_id)\n",
        "print(id_to_word)"
      ],
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "pandas is an open source programming tools. The best way to get pandas is via conda. \"conda install pandas\"\n",
            "len(sentence) =  107\n",
            "[ 0  1  2  3  4  5  6  7  8  9 10 11 12  0  1 13 14  7 14 15  0]\n",
            "{'pandas': 0, 'is': 1, 'an': 2, 'open': 3, 'source': 4, 'programming': 5, 'tools': 6, '.': 7, 'the': 8, 'best': 9, 'way': 10, 'to': 11, 'get': 12, 'via': 13, 'conda': 14, 'install': 15}\n",
            "{0: 'pandas', 1: 'is', 2: 'an', 3: 'open', 4: 'source', 5: 'programming', 6: 'tools', 7: '.', 8: 'the', 9: 'best', 10: 'way', 11: 'to', 12: 'get', 13: 'via', 14: 'conda', 15: 'install'}\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 565
        },
        "id": "8FhiMV71RoYN",
        "outputId": "24615c4f-00aa-4562-db93-7933fdc1ab00"
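      },
      "source": [
        "def create_co_matrix(corpus, vocab_size, window_size=1):\n",
        "    \"\"\"Build the co-occurrence matrix.\n",
        "    From 「ゼロから作るDeep Learning 2 自然言語処理編」, p. 72.\n",
        "\n",
        "    :param corpus(numpy.ndarray): sequence of word IDs, as returned by preprocess().\n",
        "    :param vocab_size(int): vocabulary size.\n",
        "    :param window_size(int): range used to decide co-occurrence (number of words on each side).\n",
        "    :return (numpy.ndarray): co-occurrence matrix of shape (vocab_size, vocab_size).\n",
        "    \"\"\"\n",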
        "    corpus_size = len(corpus)\n",
        "    co_matrix = np.zeros((vocab_size, vocab_size), dtype=np.int32)\n",
        "\n",
        "    for idx, word_id in enumerate(corpus):\n",
        "        for i in range(1, window_size+1):\n",
        "            left_idx = idx - i\n",
        "            right_idx = idx + i\n",
        "            if left_idx >= 0:\n",
        "                left_word_id = corpus[left_idx]\n",
        "                co_matrix[word_id, left_word_id] += 1\n",
        "            if right_idx < corpus_size:\n",
        "                right_word_id = corpus[right_idx]\n",
        "                co_matrix[word_id, right_word_id] += 1\n",
        "    return co_matrix\n",
        "\n",
        "co_matrix = create_co_matrix(corpus, vocab_size, window_size=2)\n",
        "df = pd.DataFrame(co_matrix, index=word_to_id.keys(), columns=word_to_id.keys())\n",
        "df"
      ],
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             pandas  is  an  open  source  programming  tools  .  the  best  \\\n",
              "pandas            0   2   1     0       0            0      0  0    0     0   \n",
              "is                2   0   1     1       0            0      0  0    0     0   \n",
              "an                1   1   0     1       1            0      0  0    0     0   \n",
              "open              0   1   1     0       1            1      0  0    0     0   \n",
              "source            0   0   1     1       0            1      1  0    0     0   \n",
              "programming       0   0   0     1       1            0      1  1    0     0   \n",
              "tools             0   0   0     0       1            1      0  1    1     0   \n",
              ".                 0   0   0     0       0            1      1  0    1     1   \n",
              "the               0   0   0     0       0            0      1  1    0     1   \n",
              "best              0   0   0     0       0            0      0  1    1     0   \n",
              "way               0   0   0     0       0            0      0  0    1     1   \n",
              "to                1   0   0     0       0            0      0  0    0     1   \n",
              "get               1   1   0     0       0            0      0  0    0     0   \n",
              "via               1   1   0     0       0            0      0  1    0     0   \n",
              "conda             1   1   0     0       0            0      0  2    0     0   \n",
              "install           1   0   0     0       0            0      0  1    0     0   \n",
              "\n",
              "             way  to  get  via  conda  install  \n",
              "pandas         0   1    1    1      1        1  \n",
              "is             0   0    1    1      1        0  \n",
              "an             0   0    0    0      0        0  \n",
              "open           0   0    0    0      0        0  \n",
              "source         0   0    0    0      0        0  \n",
              "programming    0   0    0    0      0        0  \n",
              "tools          0   0    0    0      0        0  \n",
              ".              0   0    0    1      2        1  \n",
              "the            1   0    0    0      0        0  \n",
              "best           1   1    0    0      0        0  \n",
              "way            0   1    1    0      0        0  \n",
              "to             1   0    1    0      0        0  \n",
              "get            1   1    0    0      0        0  \n",
              "via            0   0    0    0      1        0  \n",
              "conda          0   0    0    1      2        1  \n",
              "install        0   0    0    0      1        0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-d7966fc7-f858-4c81-971b-ae21fee037cd\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>pandas</th>\n",
              "      <th>is</th>\n",
              "      <th>an</th>\n",
              "      <th>open</th>\n",
              "      <th>source</th>\n",
              "      <th>programming</th>\n",
              "      <th>tools</th>\n",
              "      <th>.</th>\n",
              "      <th>the</th>\n",
              "      <th>best</th>\n",
              "      <th>way</th>\n",
              "      <th>to</th>\n",
              "      <th>get</th>\n",
              "      <th>via</th>\n",
              "      <th>conda</th>\n",
              "      <th>install</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>pandas</th>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>is</th>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>an</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>open</th>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>source</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>programming</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>tools</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>.</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>the</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>best</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>way</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>to</th>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>get</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>via</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>conda</th>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>install</th>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-d7966fc7-f858-4c81-971b-ae21fee037cd')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-d7966fc7-f858-4c81-971b-ae21fee037cd button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-d7966fc7-f858-4c81-971b-ae21fee037cd');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-cf34cc10-8e56-472a-998f-c56aae483513\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-cf34cc10-8e56-472a-998f-c56aae483513')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-cf34cc10-8e56-472a-998f-c56aae483513 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 12
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "MRbXiIsKRoYN",
        "outputId": "0b4d0ecc-bbc4-436f-e56d-f42ff13e4faa"
      },
      "source": [
        "def cos_similarity(x, y, eps=1e-8):\n",
        "    nx = x / (np.sqrt(np.sum(x ** 2)) + eps)\n",
        "    ny = y / (np.sqrt(np.sum(y ** 2)) + eps)\n",
        "    return np.dot(nx, ny)\n",
        "\n",
        "def most_similar(query, word_to_id, id_to_word, word_matrix, top=5):\n",
        "    \"\"\"コサイン類似度Top5を出力。\n",
        "\n",
        "    :param query(str): クエリ。\n",
        "    :param word_to_id(dict): 単語をkeyとして、idを参照する辞書。\n",
        "    :param id_to_word(dict): idをkeyとして、単語を参照する辞書。\n",
        "    :param word_matrix: 共起行列。\n",
        "    :param top(int): 上位何件まで表示させるか。\n",
        "    :return: なし。\n",
        "    \"\"\"\n",
        "    if query not in word_to_id:\n",
        "        print('%s is not found' % query)\n",
        "        return\n",
        "\n",
        "    print('[query] ' + query)\n",
        "    query_id = word_to_id[query]\n",
        "    query_vec = word_matrix[query_id]\n",
        "\n",
        "    vocab_size = len(word_to_id)\n",
        "    similarity = np.zeros(vocab_size)\n",
        "    for i in range(vocab_size):\n",
        "        similarity[i] = cos_similarity(word_matrix[i], query_vec)\n",
        "\n",
        "    count = 0\n",
        "    for i in (-1 * similarity).argsort():\n",
        "        if id_to_word[i] == query:\n",
        "            continue\n",
        "        print(' %s: %s' % (id_to_word[i], similarity[i]))\n",
        "        count += 1\n",
        "        if count >= top:\n",
        "            return\n",
        "\n",
        "print('\\n# most_similar() with co_matrix')\n",
        "user_query = \"pandas\"\n",
        "most_similar(user_query, word_to_id, id_to_word, co_matrix)"
      ],
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# most_similar() with co_matrix\n",
            "[query] pandas\n",
            " conda: 0.5477225541919766\n",
            " open: 0.4743416451535486\n",
            " get: 0.4743416451535486\n",
            " via: 0.4743416451535486\n",
            " is: 0.4216370186169938\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JcE9RH1kRoYO"
      },
      "source": [
        "## 相互情報量による分散表現の高度化\n",
        "- ppmi(): Positive PMI（正の相互情報量）。"
      ]
    },
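    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a reference for the `ppmi()` function, the quantity it computes can be sketched as follows. The notation is an assumption introduced here, not taken from the notebook: $C$ is the co-occurrence matrix, $S_i = \\sum_k C_{ki}$ is the column sum for word $i$ (`S` in the code), and $N = \\sum_{i,j} C_{ij}$ (`N` in the code). The probabilities are estimated as $P(i, j) = C_{ij}/N$ and $P(i) = S_i/N$.\n",
        "\n",
        "$$\n",
        "\\mathrm{PMI}(i, j) = \\log_2 \\frac{P(i, j)}{P(i)\\,P(j)} = \\log_2 \\frac{C_{ij}\\,N}{S_i\\,S_j}, \\qquad \\mathrm{PPMI}(i, j) = \\max\\bigl(0,\\ \\mathrm{PMI}(i, j)\\bigr)\n",
        "$$\n",
        "\n",
        "The code also adds a small `eps` inside the logarithm so that a zero count does not give $-\\infty$. As a worked example with the co-occurrence matrix shown above: $C_{\\mathrm{pandas},\\mathrm{is}} = 2$, $S_{\\mathrm{pandas}} = 8$, $S_{\\mathrm{is}} = 7$ and $N = 78$, so\n",
        "\n",
        "$$\n",
        "\\mathrm{PMI}(\\mathrm{pandas}, \\mathrm{is}) = \\log_2 \\frac{2 \\cdot 78}{8 \\cdot 7} = \\log_2 2.786 \\approx 1.478\n",
        "$$\n",
        "\n",
        "which agrees with the `pandas` / `is` entry in the PPMI output."
      ]
    },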
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 599
        },
        "id": "K8e08GIORoYP",
        "outputId": "9faa5bfe-1d43-4d15-afe4-103569440d02"
      },
      "source": [
        "def ppmi(C, verbose=False, eps=1e-8):\n",
        "    \"\"\"Positive PMI（正の相互情報量）\n",
        "    「ゼロから作るDeepLearning2 自然言語処理辺」p.79より。\n",
        "\n",
        "    :param C: 共起行列。\n",
        "    :param verbose(boolean): 処理状況を出力するためのフラグ。\n",
        "    :param eps(float): np.log2演算時に-infとなるのを避けるための微小な値。\n",
        "    :return:\n",
        "    \"\"\"\n",
        "    M = np.zeros_like(C, dtype=np.float32)\n",
        "    N = np.sum(C)\n",
        "    S = np.sum(C, axis=0)\n",
        "    total = C.shape[0] * C.shape[1]\n",
        "    cnt = 0\n",
        "\n",
        "    for i in range(C.shape[0]):\n",
        "        for j in range(C.shape[1]):\n",
        "            pmi = np.log2(C[i, j] * N / (S[j]*S[i]) + eps)\n",
        "            M[i, j] = max(0, pmi)\n",
        "\n",
        "            if verbose:\n",
        "                cnt += 1\n",
        "                if cnt % (total//100) == 0:\n",
        "                    print('%.1f%% done' % (100+cnt/total))\n",
        "    return M\n",
        "\n",
        "M = ppmi(co_matrix)\n",
        "print('\\n# PPMI')\n",
        "df2 = pd.DataFrame(M, index=word_to_id.keys(), columns=word_to_id.keys())\n",
        "df2"
      ],
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# PPMI\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "               pandas        is        an      open    source  programming  \\\n",
              "pandas       0.000000  1.478047  1.285402  0.000000  0.000000     0.000000   \n",
              "is           1.478047  0.000000  1.478047  1.478047  0.000000     0.000000   \n",
              "an           1.285402  1.478047  0.000000  2.285402  2.285402     0.000000   \n",
              "open         0.000000  1.478047  2.285402  0.000000  2.285402     2.285402   \n",
              "source       0.000000  0.000000  2.285402  2.285402  0.000000     2.285402   \n",
              "programming  0.000000  0.000000  0.000000  2.285402  2.285402     0.000000   \n",
              "tools        0.000000  0.000000  0.000000  0.000000  2.285402     2.285402   \n",
              ".            0.000000  0.000000  0.000000  0.000000  0.000000     1.285402   \n",
              "the          0.000000  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "best         0.000000  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "way          0.000000  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "to           1.285402  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "get          1.285402  1.478047  0.000000  0.000000  0.000000     0.000000   \n",
              "via          1.285402  1.478047  0.000000  0.000000  0.000000     0.000000   \n",
              "conda        0.285402  0.478047  0.000000  0.000000  0.000000     0.000000   \n",
              "install      1.700440  0.000000  0.000000  0.000000  0.000000     0.000000   \n",
              "\n",
              "                tools         .       the      best       way        to  \\\n",
              "pandas       0.000000  0.000000  0.000000  0.000000  0.000000  1.285402   \n",
              "is           0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   \n",
              "an           0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   \n",
              "open         0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   \n",
              "source       2.285402  0.000000  0.000000  0.000000  0.000000  0.000000   \n",
              "programming  2.285402  1.285402  0.000000  0.000000  0.000000  0.000000   \n",
              "tools        0.000000  1.285402  2.285402  0.000000  0.000000  0.000000   \n",
              ".            1.285402  0.000000  1.285402  1.285402  0.000000  0.000000   \n",
              "the          2.285402  1.285402  0.000000  2.285402  2.285402  0.000000   \n",
              "best         0.000000  1.285402  2.285402  0.000000  2.285402  2.285402   \n",
              "way          0.000000  0.000000  2.285402  2.285402  0.000000  2.285402   \n",
              "to           0.000000  0.000000  0.000000  2.285402  2.285402  0.000000   \n",
              "get          0.000000  0.000000  0.000000  0.000000  2.285402  2.285402   \n",
              "via          0.000000  1.285402  0.000000  0.000000  0.000000  0.000000   \n",
              "conda        0.000000  1.285402  0.000000  0.000000  0.000000  0.000000   \n",
              "install      0.000000  1.700440  0.000000  0.000000  0.000000  0.000000   \n",
              "\n",
              "                  get       via     conda  install  \n",
              "pandas       1.285402  1.285402  0.285402  1.70044  \n",
              "is           1.478047  1.478047  0.478047  0.00000  \n",
              "an           0.000000  0.000000  0.000000  0.00000  \n",
              "open         0.000000  0.000000  0.000000  0.00000  \n",
              "source       0.000000  0.000000  0.000000  0.00000  \n",
              "programming  0.000000  0.000000  0.000000  0.00000  \n",
              "tools        0.000000  0.000000  0.000000  0.00000  \n",
              ".            0.000000  1.285402  1.285402  1.70044  \n",
              "the          0.000000  0.000000  0.000000  0.00000  \n",
              "best         0.000000  0.000000  0.000000  0.00000  \n",
              "way          2.285402  0.000000  0.000000  0.00000  \n",
              "to           2.285402  0.000000  0.000000  0.00000  \n",
              "get          0.000000  0.000000  0.000000  0.00000  \n",
              "via          0.000000  0.000000  1.285402  0.00000  \n",
              "conda        0.000000  1.285402  1.285402  1.70044  \n",
              "install      0.000000  0.000000  1.700440  0.00000  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-757fed02-6e44-4a8a-81b8-0a058204c497\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>pandas</th>\n",
              "      <th>is</th>\n",
              "      <th>an</th>\n",
              "      <th>open</th>\n",
              "      <th>source</th>\n",
              "      <th>programming</th>\n",
              "      <th>tools</th>\n",
              "      <th>.</th>\n",
              "      <th>the</th>\n",
              "      <th>best</th>\n",
              "      <th>way</th>\n",
              "      <th>to</th>\n",
              "      <th>get</th>\n",
              "      <th>via</th>\n",
              "      <th>conda</th>\n",
              "      <th>install</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>pandas</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.285402</td>\n",
              "      <td>1.70044</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>is</th>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.478047</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>an</th>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>open</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>source</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>programming</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>tools</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>.</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.70044</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>the</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>best</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>way</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>to</th>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>get</th>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>2.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>via</th>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>conda</th>\n",
              "      <td>0.285402</td>\n",
              "      <td>0.478047</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.285402</td>\n",
              "      <td>1.70044</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>install</th>\n",
              "      <td>1.700440</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.700440</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>1.700440</td>\n",
              "      <td>0.00000</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-757fed02-6e44-4a8a-81b8-0a058204c497')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-757fed02-6e44-4a8a-81b8-0a058204c497 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-757fed02-6e44-4a8a-81b8-0a058204c497');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-2caacc3c-3d36-45fc-bf70-becf87da7853\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-2caacc3c-3d36-45fc-bf70-becf87da7853')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-2caacc3c-3d36-45fc-bf70-becf87da7853 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df2",
              "summary": "{\n  \"name\": \"df2\",\n  \"rows\": 16,\n  \"fields\": [\n    {\n      \"column\": \"pandas\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          1.478047251701355,\n          1.700439691543579,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"is\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.478047251701355,\n          0.0,\n          0.47804731130599976\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"an\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          1.478047251701355,\n          2.285402297973633,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"open\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.478047251701355,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"source\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          2.285402297973633,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"programming\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    
{\n      \"column\": \"tools\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \".\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"the\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"best\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"way\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          2.285402297973633,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"to\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.2854021787643433,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"get\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          1.478047251701355,\n          2.285402297973633\n        ],\n        \"semantic_type\": 
\"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"via\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.2854021787643433,\n          1.478047251701355\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"conda\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          0.47804731130599976,\n          1.700439691543579\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"install\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0.0,\n          1.700439691543579\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 14
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 599
        },
        "id": "8tUoxAYZRoYP",
        "outputId": "7ea0ac0b-1c62-4266-ee0d-7c4d3ffdf27a"
      },
      "source": [
        "#np.set_printoptions(precision=3) # 有効桁3桁（表示上の省略で、データは保持）\n",
        "pd.options.display.precision = 3 # 同上\n",
        "print('\\n# PPMI with precision=3')\n",
        "df2 = pd.DataFrame(M, index=word_to_id.keys(), columns=word_to_id.keys())\n",
        "df2"
      ],
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# PPMI with precision=3\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             pandas     is     an   open  source  programming  tools      .  \\\n",
              "pandas        0.000  1.478  1.285  0.000   0.000        0.000  0.000  0.000   \n",
              "is            1.478  0.000  1.478  1.478   0.000        0.000  0.000  0.000   \n",
              "an            1.285  1.478  0.000  2.285   2.285        0.000  0.000  0.000   \n",
              "open          0.000  1.478  2.285  0.000   2.285        2.285  0.000  0.000   \n",
              "source        0.000  0.000  2.285  2.285   0.000        2.285  2.285  0.000   \n",
              "programming   0.000  0.000  0.000  2.285   2.285        0.000  2.285  1.285   \n",
              "tools         0.000  0.000  0.000  0.000   2.285        2.285  0.000  1.285   \n",
              ".             0.000  0.000  0.000  0.000   0.000        1.285  1.285  0.000   \n",
              "the           0.000  0.000  0.000  0.000   0.000        0.000  2.285  1.285   \n",
              "best          0.000  0.000  0.000  0.000   0.000        0.000  0.000  1.285   \n",
              "way           0.000  0.000  0.000  0.000   0.000        0.000  0.000  0.000   \n",
              "to            1.285  0.000  0.000  0.000   0.000        0.000  0.000  0.000   \n",
              "get           1.285  1.478  0.000  0.000   0.000        0.000  0.000  0.000   \n",
              "via           1.285  1.478  0.000  0.000   0.000        0.000  0.000  1.285   \n",
              "conda         0.285  0.478  0.000  0.000   0.000        0.000  0.000  1.285   \n",
              "install       1.700  0.000  0.000  0.000   0.000        0.000  0.000  1.700   \n",
              "\n",
              "               the   best    way     to    get    via  conda  install  \n",
              "pandas       0.000  0.000  0.000  1.285  1.285  1.285  0.285      1.7  \n",
              "is           0.000  0.000  0.000  0.000  1.478  1.478  0.478      0.0  \n",
              "an           0.000  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              "open         0.000  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              "source       0.000  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              "programming  0.000  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              "tools        2.285  0.000  0.000  0.000  0.000  0.000  0.000      0.0  \n",
              ".            1.285  1.285  0.000  0.000  0.000  1.285  1.285      1.7  \n",
              "the          0.000  2.285  2.285  0.000  0.000  0.000  0.000      0.0  \n",
              "best         2.285  0.000  2.285  2.285  0.000  0.000  0.000      0.0  \n",
              "way          2.285  2.285  0.000  2.285  2.285  0.000  0.000      0.0  \n",
              "to           0.000  2.285  2.285  0.000  2.285  0.000  0.000      0.0  \n",
              "get          0.000  0.000  2.285  2.285  0.000  0.000  0.000      0.0  \n",
              "via          0.000  0.000  0.000  0.000  0.000  0.000  1.285      0.0  \n",
              "conda        0.000  0.000  0.000  0.000  0.000  1.285  1.285      1.7  \n",
              "install      0.000  0.000  0.000  0.000  0.000  0.000  1.700      0.0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-d54ed03a-9851-427b-ac6d-760a08e27efd\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>pandas</th>\n",
              "      <th>is</th>\n",
              "      <th>an</th>\n",
              "      <th>open</th>\n",
              "      <th>source</th>\n",
              "      <th>programming</th>\n",
              "      <th>tools</th>\n",
              "      <th>.</th>\n",
              "      <th>the</th>\n",
              "      <th>best</th>\n",
              "      <th>way</th>\n",
              "      <th>to</th>\n",
              "      <th>get</th>\n",
              "      <th>via</th>\n",
              "      <th>conda</th>\n",
              "      <th>install</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>pandas</th>\n",
              "      <td>0.000</td>\n",
              "      <td>1.478</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.285</td>\n",
              "      <td>1.7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>is</th>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.478</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.478</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.478</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>an</th>\n",
              "      <td>1.285</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>open</th>\n",
              "      <td>0.000</td>\n",
              "      <td>1.478</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>source</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>programming</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>tools</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>.</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>the</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>best</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>way</th>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>to</th>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>get</th>\n",
              "      <td>1.285</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>2.285</td>\n",
              "      <td>2.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>via</th>\n",
              "      <td>1.285</td>\n",
              "      <td>1.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>conda</th>\n",
              "      <td>0.285</td>\n",
              "      <td>0.478</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.285</td>\n",
              "      <td>1.7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>install</th>\n",
              "      <td>1.700</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.700</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>0.000</td>\n",
              "      <td>1.700</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df2",
              "summary": "{\n  \"name\": \"df2\",\n  \"rows\": 16,\n  \"fields\": [\n    {\n      \"column\": \"pandas\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          1.478047251701355,\n          1.700439691543579,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"is\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.478047251701355,\n          0.0,\n          0.47804731130599976\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"an\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          1.478047251701355,\n          2.285402297973633,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"open\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.478047251701355,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"source\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          2.285402297973633,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"programming\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    
{\n      \"column\": \"tools\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \".\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"the\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          2.285402297973633\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"best\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0.0,\n          1.2854021787643433\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"way\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          2.285402297973633,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"to\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.2854021787643433,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"get\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 4,\n        \"samples\": [\n          1.478047251701355,\n          2.285402297973633\n        ],\n        \"semantic_type\": 
\"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"via\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.2854021787643433,\n          1.478047251701355\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"conda\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 5,\n        \"samples\": [\n          0.47804731130599976,\n          1.700439691543579\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"install\",\n      \"properties\": {\n        \"dtype\": \"float32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0.0,\n          1.700439691543579\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 15
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3OuF6wnMRoYP",
        "outputId": "2533d3d2-f482-4689-9033-ad3791b6c0ee"
      },
      "source": [
        "print('\\n# most_similar() with PPMI')\n",
        "most_similar(user_query, word_to_id, id_to_word, M)\n"
      ],
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# most_similar() with PPMI\n",
            "[query] pandas\n",
            " conda: 0.5733166933059692\n",
            " is: 0.5094797611236572\n",
            " .: 0.40005457401275635\n",
            " get: 0.39511924982070923\n",
            " way: 0.3747256100177765\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1qf7f7ZbRoYQ"
      },
      "source": [
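        "## Dimensionality reduction with SVD\n",
        "- np.linalg.svd(): decomposes the PPMI matrix `M` using NumPy's linear algebra module; the leading columns of `U` are then used as low-dimensional dense word vectors.\n"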
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ZhHJvY9pRoYQ",
        "outputId": "06d3a2c7-b95f-48eb-e58f-ccff7715be37"
      },
      "source": [
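        "# Singular value decomposition of the PPMI matrix M.\n",
        "# Note: np.linalg.svd returns the third factor already transposed (V^T); it is not used below.\n",
        "U, S, V = np.linalg.svd(M)\n",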
        "print('\\n# SVD: dense vectors with all singular values')\n",
        "print(U)\n",
        "\n",
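        "# Singular values are returned in descending order, so the first\n",
        "# `use_s_values` columns of U correspond to the largest ones.\n",
        "use_s_values = 2\n",
        "U2 = U[:,0:use_s_values]\n",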
        "print('\\n# SVD: dense vectors with singular values = {}'.format(use_s_values))\n",
        "print(U2)\n",
        "\n",
        "print('\\n# most_similar() with SVD-2')\n",
        "most_similar(user_query, word_to_id, id_to_word, U2)\n"
      ],
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "# SVD: dense vectors with all singular values\n",
            "[[-0.20909968 -0.05742509 -0.4260506  -0.17504837  0.2712971  -0.04672289\n",
            "   0.18823615 -0.18591174 -0.36170715 -0.02090307  0.33414906 -0.12803775\n",
            "  -0.20772134 -0.17504339  0.12016748 -0.49748433]\n",
            " [-0.20854177  0.06963065 -0.3888203   0.1402168   0.20635465 -0.15388878\n",
            "  -0.2102995  -0.13104966  0.38273078  0.39125082  0.11626503  0.32180378\n",
            "   0.35602686 -0.3037927  -0.09017423  0.09666564]\n",
            " [-0.23961712  0.26655334 -0.18719402  0.05390574 -0.35268205 -0.2592225\n",
            "  -0.32773367 -0.09349788 -0.2730225   0.0377553  -0.40234843  0.31079715\n",
            "  -0.17229767  0.28257638 -0.12586372 -0.25696287]\n",
            " [-0.2782892   0.35838643 -0.04367146 -0.32175225  0.05215078 -0.23192607\n",
            "   0.21995124  0.49517277 -0.11888672 -0.1294459   0.23939379  0.12363093\n",
            "   0.32118776  0.27299288  0.13509066  0.19793455]\n",
            " [-0.3138919   0.39330027  0.14138082  0.21887659  0.3510343  -0.1303136\n",
            "   0.33645037 -0.16895744  0.4147285  -0.14546476 -0.16639212 -0.12915234\n",
            "  -0.36436176  0.11097264 -0.11952758 -0.00717432]\n",
            " [-0.29259852  0.32289356  0.2055024   0.29696342 -0.23372465  0.05240921\n",
            "  -0.22590032 -0.37780765 -0.24121043 -0.03001606  0.29355177 -0.3378207\n",
            "   0.1754489  -0.11105473  0.2770583   0.22205435]\n",
            " [-0.29597434  0.18007208  0.3224701  -0.47838306 -0.15175404  0.19339524\n",
            "  -0.20657967  0.22351523  0.11675979  0.25128582  0.00841147 -0.15087281\n",
            "  -0.17578007 -0.40954658 -0.2627039  -0.1637667 ]\n",
            " [-0.25471854 -0.01025081  0.02076114 -0.05029602  0.36506298  0.47563514\n",
            "   0.0229755   0.00612366 -0.19963232  0.08954368 -0.58078074  0.02569766\n",
            "   0.26210678 -0.01673562  0.34222415 -0.00723523]\n",
            " [-0.29356492 -0.19401811  0.31909984  0.49437767 -0.02606995  0.1744973\n",
            "   0.20824735  0.21605654 -0.2066092  -0.09872612  0.19246763  0.389128\n",
            "   0.11316317 -0.11955503 -0.29818308 -0.22771738]\n",
            " [-0.28762156 -0.33267495  0.200779   -0.2494268  -0.28890833  0.02771297\n",
            "   0.27570227 -0.329145    0.23007524  0.28904447  0.13154629  0.28043836\n",
            "  -0.10255558  0.28061515  0.3373071   0.00459014]\n",
            " [-0.30796608 -0.3980363   0.13565156 -0.2586614   0.28628355 -0.16009486\n",
            "  -0.41644257 -0.21415797  0.02268608 -0.5027486   0.02590389  0.01947945\n",
            "   0.08477696  0.10067189 -0.20825593  0.13973713]\n",
            " [-0.26908046 -0.35622442 -0.04449803  0.3045457   0.09906036 -0.229743\n",
            "  -0.24744193  0.4765816   0.00974799  0.2936326  -0.00161449 -0.30523014\n",
            "  -0.30354518  0.11671898  0.24741183  0.0925235 ]\n",
            " [-0.23532367 -0.25857002 -0.19031367 -0.01856818 -0.34355265 -0.2738812\n",
            "   0.43220395 -0.04992938 -0.06145945 -0.03610538 -0.34404024 -0.3561516\n",
            "   0.258372   -0.24605696 -0.2454495   0.14502974]\n",
            " [-0.132769    0.00144362 -0.31328392  0.02380027 -0.25689903  0.2137918\n",
            "  -0.00529088  0.15167178  0.13748148 -0.45615938 -0.02669457  0.27819303\n",
            "  -0.35791093 -0.38731208  0.29279673  0.28283092]\n",
            " [-0.12850079 -0.00502957 -0.3181515  -0.0227449   0.01897933  0.46368992\n",
            "   0.04571127 -0.06436063 -0.18308993  0.2039377   0.152593   -0.05987117\n",
            "  -0.20431407  0.32411188 -0.46140748  0.44548053]\n",
            " [-0.12502334 -0.01886328 -0.2561411   0.08992951 -0.24950773  0.35275292\n",
            "  -0.11904678  0.12406847  0.44487035 -0.22767347  0.08375221 -0.287249\n",
            "   0.2736007   0.31247255 -0.00371157 -0.42694545]]\n",
            "\n",
            "# SVD: dense vectors with singular values = 2\n",
            "[[-0.20909968 -0.05742509]\n",
            " [-0.20854177  0.06963065]\n",
            " [-0.23961712  0.26655334]\n",
            " [-0.2782892   0.35838643]\n",
            " [-0.3138919   0.39330027]\n",
            " [-0.29259852  0.32289356]\n",
            " [-0.29597434  0.18007208]\n",
            " [-0.25471854 -0.01025081]\n",
            " [-0.29356492 -0.19401811]\n",
            " [-0.28762156 -0.33267495]\n",
            " [-0.30796608 -0.3980363 ]\n",
            " [-0.26908046 -0.35622442]\n",
            " [-0.23532367 -0.25857002]\n",
            " [-0.132769    0.00144362]\n",
            " [-0.12850079 -0.00502957]\n",
            " [-0.12502334 -0.01886328]]\n",
            "\n",
            "# most_similar() with SVD-2\n",
            "[query] pandas\n",
            " install: 0.9930136799812317\n",
            " .: 0.9741653800010681\n",
            " conda: 0.9739159345626831\n",
            " via: 0.9613599181175232\n",
            " the: 0.950492262840271\n"
          ]
        }
      ]
    }
  ]
}