{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 07wk-2: Pandas – `pivot_table`, `groupby`+`agg`\n",
        "\n",
        "최규빈  \n",
        "2023-10-16\n",
        "\n",
        "<a href=\"https://colab.research.google.com/github/guebin/DV2023/blob/main/posts/07wk-2.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" style=\"text-align: left\"></a>\n",
        "\n",
        "# 1. 강의영상\n",
        "\n",
        "<https://youtu.be/playlist?list=PLQqh36zP38-wtcnyfm_V0vIvzVnnj8W0J&si=ndZWtP-XX79IklkD>\n",
        "\n",
        "# 2. Imports"
      ],
      "id": "25926a95-f0f2-4e4a-9ec6-76e3cbe3cb2b"
    },
    {
      "cell_type": "code",
      "execution_count": 138,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "import pandas as pd \n",
        "import numpy as np"
      ],
      "id": "cell-5"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 3. pivot_table, groupby+agg\n",
        "\n",
        "## A. intro\n",
        "\n",
        "`-` 개념: 그룹화 $\\to$ 집계\n",
        "\n",
        "`# 예제1`: 아래의 예제에서 (학과,성별)로 count의 합계를 구하라."
      ],
      "id": "13035ce9-6479-445c-8292-2e1164ac74e0"
    },
    {
      "cell_type": "code",
      "execution_count": 139,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df=pd.read_csv(\"https://raw.githubusercontent.com/guebin/DV2022/master/posts/Simpson.csv\",index_col=0,header=[0,1])\\\n",
        ".stack().stack().reset_index()\\\n",
        ".rename({'level_0':'department','level_1':'result','level_2':'gender',0:'count'},axis=1)\n",
        "df"
      ],
      "id": "cell-10"
    },
    {
      "cell_type": "code",
      "execution_count": 140,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.pivot_table(index=['department','gender'],values='count',aggfunc='sum')"
      ],
      "id": "cell-11"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`#`\n",
        "\n",
        "`-` 예시에서 본 작업은 아래의 작업들로 세분화 할 수 있다.\n",
        "\n",
        "1.  그룹화(쿼리): 하나의 dataframe을 sub-dataframe으로 나누는 과정 –\n",
        "    전체자료를 (학과,성별)로 묶어 총 10개의 sub-dataframe을 만듦\n",
        "2.  각각집계(각각계산): 나누어진 sub-dataframe에서 어떠한 계산을 각각\n",
        "    수행함. – 나누어진 sub-dataframe에서 지원자수의 합계를 각각 구함\n",
        "\n",
        "`-` 위의 같은 작업을 하려면 아래와 같은 요소들이 필요하다.\n",
        "\n",
        "1.  그룹변수[1] – 그룹화를 위해 필요한 변수, dataframe을\n",
        "    sub-dataframe으로 나누는 역할.\n",
        "2.  집계변수[2] – 집계함수의 대상이 되는 변수.\n",
        "3.  집계변수 – 그룹화된 데이터프레임에 수행하는 계산을 정의하는 함수.\n",
        "\n",
        "## B. `pivot_table`의 문법\n",
        "\n",
        "`-` pivot_table의 문법\n",
        "\n",
        "``` python\n",
        "df.pivot_table(\n",
        "    index = 그룹변수\n",
        "    colums = 그룹변수 \n",
        "    values = 집계변수\n",
        "    aggfunc = 집계함수\n",
        ")\n",
        "```\n",
        "\n",
        "`-` 그룹변수: string, 혹은 list of string 으로 전달한다.\n",
        "\n",
        "-   예시: ‘department’, \\[‘department’\\], \\[‘department’,‘gender’\\]\n",
        "\n",
        "`-` 집계변수: string, 혹은 list of string 으로 전달한다.\n",
        "\n",
        "-   예시: ‘CANCELLED’, \\[‘CANCELLED’\\], \\[‘CANCELLED’,‘AIR_TIME’\\]\n",
        "\n",
        "`-` 집계함수: 함수자체[3]를 전달하거나, 함수를 의미하는 문자열[4], 혹은\n",
        "그것들의 리스트형태로 전달한다.\n",
        "\n",
        "`# 예시`: 집계합수를 전달하는 방법\n",
        "\n",
        "**data**\n",
        "\n",
        "[1] 이건 없는 용어에요\n",
        "\n",
        "[2] 이것도 없는 용어에요\n",
        "\n",
        "[3] `np.mean`,`sum`\n",
        "\n",
        "[4] ‘count’, ‘sum’, ‘mean’, ‘median’, ‘min’, ‘max’, ‘std’, ‘var’"
      ],
      "id": "35bd294c-b144-4686-8f4e-18e4f1714b21"
    },
    {
      "cell_type": "code",
      "execution_count": 141,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df = pd.DataFrame({'category':['A']*5+['B']*5, 'value':np.concatenate([np.random.randn(5), np.random.randn(5)+10])})\n",
        "df"
      ],
      "id": "cell-23"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**방법1** – 함수자체를 전달"
      ],
      "id": "3e95d4fc-6d61-4d4e-ba68-71bd19bcef6f"
    },
    {
      "cell_type": "code",
      "execution_count": 142,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_3437616/3923531937.py:1: FutureWarning: The provided callable <function sum at 0x7f2bb810b1c0> is currently using DataFrameGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"sum\" instead.\n",
            "  df.pivot_table(index=['category'],values='value',aggfunc=np.sum) # 함수자체"
          ]
        }
      ],
      "source": [
        "df.pivot_table(index=['category'],values='value',aggfunc=np.sum) # 함수자체"
      ],
      "id": "cell-25"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**방법2** – 함수를 의미하는 문자열을 전달"
      ],
      "id": "b92dadda-b2bf-4e58-a9cd-e48b7647d8ee"
    },
    {
      "cell_type": "code",
      "execution_count": 143,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.pivot_table(index=['category'],values='value',aggfunc='sum')  # 리스트"
      ],
      "id": "cell-27"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**방법3** – 리스트를 전달"
      ],
      "id": "7e1d6ae4-b5d7-4288-8b5c-b376ad9d29f6"
    },
    {
      "cell_type": "code",
      "execution_count": 144,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_3437616/1880687379.py:1: FutureWarning: The provided callable <function mean at 0x7f2bb8110670> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"mean\" instead.\n",
            "  df.pivot_table(\n",
            "/tmp/ipykernel_3437616/1880687379.py:1: FutureWarning: The provided callable <function amax at 0x7f2bb810ba30> is currently using DataFrameGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"max\" instead.\n",
            "  df.pivot_table("
          ]
        }
      ],
      "source": [
        "df.pivot_table(\n",
        "    index=['category'],\n",
        "    values='value',\n",
        "    aggfunc=['sum','min',np.mean,np.max,'count']\n",
        ")"
      ],
      "id": "cell-29"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## C. `groupby` + `aggregate` 의 문법\n",
        "\n",
        "`-` groupby + aggregate 의 문법\n",
        "\n",
        "``` python\n",
        "df.groupby(그룹변수).aggregate({집계변수:집계함수})\n",
        "```\n",
        "\n",
        "# 4. AIRLINE 자료로 연습"
      ],
      "id": "974afe7c-730f-4f95-9327-59c924212085"
    },
    {
      "cell_type": "code",
      "execution_count": 145,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "<class 'pandas.core.frame.DataFrame'>\n",
            "RangeIndex: 58492 entries, 0 to 58491\n",
            "Data columns (total 14 columns):\n",
            " #   Column     Non-Null Count  Dtype  \n",
            "---  ------     --------------  -----  \n",
            " 0   MONTH      58492 non-null  int64  \n",
            " 1   DAY        58492 non-null  int64  \n",
            " 2   WEEKDAY    58492 non-null  int64  \n",
            " 3   AIRLINE    58492 non-null  object \n",
            " 4   ORG_AIR    58492 non-null  object \n",
            " 5   DEST_AIR   58492 non-null  object \n",
            " 6   SCHED_DEP  58492 non-null  int64  \n",
            " 7   DEP_DELAY  57659 non-null  float64\n",
            " 8   AIR_TIME   57474 non-null  float64\n",
            " 9   DIST       58492 non-null  int64  \n",
            " 10  SCHED_ARR  58492 non-null  int64  \n",
            " 11  ARR_DELAY  57474 non-null  float64\n",
            " 12  DIVERTED   58492 non-null  int64  \n",
            " 13  CANCELLED  58492 non-null  int64  \n",
            "dtypes: float64(3), int64(8), object(3)\n",
            "memory usage: 6.2+ MB"
          ]
        }
      ],
      "source": [
        "df=pd.read_csv('https://raw.githubusercontent.com/PacktPublishing/Pandas-Cookbook/master/data/flights.csv')\n",
        "df.info()"
      ],
      "id": "cell-33"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`-` 각 변수들에 대한 설명은 아래와 같다. (ChatGPT의 도움을 받아 정리함)\n",
        "\n",
        "1.  `MONTH`: 비행이 이루어진 월을 나타냄. 1에서 12 사이의 값을 갖음.\n",
        "2.  `DAY`: 비행이 이루어진 일자를 나타냄. 월에 따라 1~28/29/30/31 사이의\n",
        "    값을 가질 수 있음.\n",
        "3.  `WEEKDAY`: 비행이 이루어진 요일을 나타냄. 일반적으로 1(일요일)부터\n",
        "    7(토요일)까지의 값을 갖음.\n",
        "4.  `AIRLINE`: 해당 항공편을 운영하는 항공사의 약어나 코드를 나타냄.\n",
        "5.  `ORG_AIR`: 비행기가 출발하는 공항의 약어나 코드를 나타냄.\n",
        "6.  `DEST_AIR`: 비행기가 도착하는 공항의 약어나 코드를 나타냄.\n",
        "7.  `SCHED_DEP`: 원래의 예정된 출발 시간을 나타냄. 시간은 일반적으로\n",
        "    HHMM 형식으로 표시될 수 있음.\n",
        "8.  `DEP_DELAY`: 출발 지연 시간을 나타냄. 음수 값은 조기 출발, 양수 값은\n",
        "    지연을 의미함.\n",
        "9.  `AIR_TIME`: 실제 공중에서 비행한 시간을 분 단위로 나타냄.\n",
        "10. `DIST`: 비행 거리를 나타냄. 일반적으로 마일 또는 킬로미터로 표시됨.\n",
        "11. `SCHED_ARR`: 원래의 예정된 도착 시간을 나타냄. `SCHED_DEP`와 같은\n",
        "    형식으로 표시될 수 있음.\n",
        "12. `ARR_DELAY`: 도착 지연 시간을 나타냄. 음수는 조기 도착, 양수는\n",
        "    지연을 의미함.\n",
        "13. `DIVERTED`: 항공편이 다른 곳으로 우회되었는지를 나타냄. 1은 우회,\n",
        "    0은 정상 경로를 의미함.\n",
        "14. `CANCELLED`: 항공편이 취소되었는지 여부를 나타냄. 1은 취소, 0은\n",
        "    취소되지 않음을 의미함.\n",
        "\n",
        "`# 예제1`: 항공사별로 도착지연시간의 평균을 구하라.\n",
        "\n",
        "`-` 풀이1"
      ],
      "id": "28b5e3fd-7a4d-4bde-a636-78a9625615ff"
    },
    {
      "cell_type": "code",
      "execution_count": 146,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.pivot_table(index='AIRLINE',values='ARR_DELAY')"
      ],
      "id": "cell-37"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`-` 풀이2"
      ],
      "id": "ef061a5b-13ec-4d9a-b540-67042218611e"
    },
    {
      "cell_type": "code",
      "execution_count": 147,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_3437616/4188396604.py:1: FutureWarning: The provided callable <function mean at 0x7f2bb8110670> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"mean\" instead.\n",
            "  df.groupby(by=\"AIRLINE\").aggregate({'ARR_DELAY':np.mean})"
          ]
        }
      ],
      "source": [
        "df.groupby(by=\"AIRLINE\").aggregate({'ARR_DELAY':np.mean})"
      ],
      "id": "cell-39"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`#`\n",
        "\n",
        "`# 예제2`: 항공사별로 비행취소건수의 합계를 구하라. 취소건수가 높은\n",
        "항공사순으로 정렬하라.\n",
        "\n",
        "`-` 풀이1: `.pivot_table()`을 이용"
      ],
      "id": "49e1c599-6b5a-494e-80b7-1827efd4c872"
    },
    {
      "cell_type": "code",
      "execution_count": 148,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.pivot_table(index='AIRLINE',values='CANCELLED',aggfunc='sum').sort_values('CANCELLED',ascending=False)"
      ],
      "id": "cell-43"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`-` 풀이2: `.groupby()`+`.aggregate()`를 이용"
      ],
      "id": "1357baf5-7c29-45a3-86be-f38d8b8caef9"
    },
    {
      "cell_type": "code",
      "execution_count": 149,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.groupby('AIRLINE').aggregate({'CANCELLED':'sum'}).sort_values('CANCELLED',ascending=False)"
      ],
      "id": "cell-45"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`# 예제3`: 항공사별로 비행취소율을 구하라. 비행취소율이 가장 높은 항공사\n",
        "순으로 정렬하라.\n",
        "\n",
        "`-` 풀이1: `.pivot_table()`을 이용"
      ],
      "id": "ae6583bb-72f1-4f62-90dd-2e99e5acb6cf"
    },
    {
      "cell_type": "code",
      "execution_count": 150,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.pivot_table(index='AIRLINE',values='CANCELLED',aggfunc='mean').sort_values('CANCELLED',ascending=False)"
      ],
      "id": "cell-48"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`-` 풀이2: `.groupby()`+`.aggregate()`를 이용"
      ],
      "id": "2aa2c3df-f429-4a9a-96bc-b74ca1b5bb79"
    },
    {
      "cell_type": "code",
      "execution_count": 151,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.groupby('AIRLINE').aggregate({'CANCELLED':'mean'}).sort_values('CANCELLED',ascending=False)"
      ],
      "id": "cell-50"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`# 예제4` (항공사,요일)별 비행취소건수와 비행취소율을 조사하라.\n",
        "\n",
        "`-` 풀이1: `.pivot_table()`을 이용"
      ],
      "id": "5118fab3-10a8-4759-8537-62313160f378"
    },
    {
      "cell_type": "code",
      "execution_count": 152,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_3437616/757677508.py:1: FutureWarning: The provided callable <function mean at 0x7f2bb8110670> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"mean\" instead.\n",
            "  df.pivot_table(index=['AIRLINE','WEEKDAY'],values='CANCELLED',aggfunc=[np.mean,sum])\n",
            "/tmp/ipykernel_3437616/757677508.py:1: FutureWarning: The provided callable <built-in function sum> is currently using DataFrameGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"sum\" instead.\n",
            "  df.pivot_table(index=['AIRLINE','WEEKDAY'],values='CANCELLED',aggfunc=[np.mean,sum])"
          ]
        }
      ],
      "source": [
        "df.pivot_table(index=['AIRLINE','WEEKDAY'],values='CANCELLED',aggfunc=[np.mean,sum])"
      ],
      "id": "cell-53"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`-` 풀이2: `.groupby()`+`.aggregate()`를 이용"
      ],
      "id": "db6f69a3-59ba-461c-882d-814c9d5d9c28"
    },
    {
      "cell_type": "code",
      "execution_count": 153,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_3437616/1354178761.py:1: FutureWarning: The provided callable <function mean at 0x7f2bb8110670> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"mean\" instead.\n",
            "  df.groupby(['AIRLINE','WEEKDAY']).aggregate({'CANCELLED':[np.mean,sum]})\n",
            "/tmp/ipykernel_3437616/1354178761.py:1: FutureWarning: The provided callable <built-in function sum> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"sum\" instead.\n",
            "  df.groupby(['AIRLINE','WEEKDAY']).aggregate({'CANCELLED':[np.mean,sum]})"
          ]
        }
      ],
      "source": [
        "df.groupby(['AIRLINE','WEEKDAY']).aggregate({'CANCELLED':[np.mean,sum]})"
      ],
      "id": "cell-55"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`# 예제4`: (항공사,요일)별로 `CANCELLED`는 평균과 합계를 구하고 (즉\n",
        "비행취소건수와 취소율을 구하고), `AIR_TIME`은 평균과 표준편차를\n",
        "구하여라.\n",
        "\n",
        "`-` 풀이1: `.pivot_table()` –\\> 이거 제가 수업할때는 못한다고 했는데\n",
        "찾아보니까 아래처럼 할 수 있습니다."
      ],
      "id": "2de13da7-ba0b-420d-b711-e00487b051ac"
    },
    {
      "cell_type": "code",
      "execution_count": 160,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_3437616/2860571331.py:1: FutureWarning: The provided callable <function mean at 0x7f2bb8110670> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"mean\" instead.\n",
            "  df.pivot_table(\n",
            "/tmp/ipykernel_3437616/2860571331.py:1: FutureWarning: The provided callable <built-in function sum> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"sum\" instead.\n",
            "  df.pivot_table("
          ]
        }
      ],
      "source": [
        "df.pivot_table(\n",
        "    index=['AIRLINE', 'WEEKDAY'],\n",
        "    aggfunc={'CANCELLED': [np.mean, sum], 'AIR_TIME': ['mean','std']}\n",
        ")"
      ],
      "id": "cell-58"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`-` 풀이2: `.groupby()`+`.aggregate()`를 이용"
      ],
      "id": "228c7511-dc36-44fa-b134-4c1c5bb2cea4"
    },
    {
      "cell_type": "code",
      "execution_count": 161,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_3437616/2787668343.py:2: FutureWarning: The provided callable <function mean at 0x7f2bb8110670> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"mean\" instead.\n",
            "  .aggregate({'CANCELLED': [np.mean,sum],'AIR_TIME': ['mean','std']})\n",
            "/tmp/ipykernel_3437616/2787668343.py:2: FutureWarning: The provided callable <built-in function sum> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string \"sum\" instead.\n",
            "  .aggregate({'CANCELLED': [np.mean,sum],'AIR_TIME': ['mean','std']})"
          ]
        }
      ],
      "source": [
        "df.groupby(['AIRLINE','WEEKDAY'])\\\n",
        ".aggregate({'CANCELLED': [np.mean,sum],'AIR_TIME': ['mean','std']})"
      ],
      "id": "cell-60"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`#`\n",
        "\n",
        "`# 예제5`: 운행구간을 그룹화하고, 운행구간별 비행취소건수와 취소율을\n",
        "구하여라.\n",
        "\n",
        "`-` 풀이1"
      ],
      "id": "d50e93f2-34c6-44d3-bcd7-70a39fd8530a"
    },
    {
      "cell_type": "code",
      "execution_count": 156,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.assign(DIST_CUT= pd.qcut(df.DIST,q=4)).pivot_table(\n",
        "    index= ['DIST_CUT'],\n",
        "    values= 'CANCELLED',\n",
        "    aggfunc= ['mean','sum']\n",
        ")"
      ],
      "id": "cell-64"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`-` 풀이2"
      ],
      "id": "ab1a7ce1-1195-46fa-809d-863d9d8c5e73"
    },
    {
      "cell_type": "code",
      "execution_count": 157,
      "metadata": {
        "tags": []
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/tmp/ipykernel_3437616/402182245.py:2: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.\n",
            "  .groupby('DIST_CUT')\\"
          ]
        }
      ],
      "source": [
        "df.assign(DIST_CUT= pd.qcut(df.DIST,q=4))\\\n",
        ".groupby('DIST_CUT')\\\n",
        ".aggregate({'CANCELLED':['mean','sum']})"
      ],
      "id": "cell-66"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`#`\n",
        "\n",
        "# 5. HW"
      ],
      "id": "68d580b2-6800-4f3b-9ba2-c9690d703173"
    },
    {
      "cell_type": "code",
      "execution_count": 172,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "np.random.seed(43052)\n",
        "df = pd.DataFrame({'X1':['A']*5+['B']*5, 'X2':np.concatenate([np.random.randn(5), np.random.randn(5)+10])})\n",
        "df"
      ],
      "id": "cell-69"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "위의 자료에서 X1을 기준으로 그룹화한뒤 X2에 대하여 아래의 함수를\n",
        "적용하라.\n",
        "\n",
        "$$\\max(\\text{X2})-\\min(\\text{X2})$$\n",
        "\n",
        "출력결과는 아래와 같아야 한다 ."
      ],
      "id": "80dbc91a-482c-42da-a0bb-a6d6df10bd63"
    },
    {
      "cell_type": "code",
      "execution_count": 166,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "# "
      ],
      "id": "cell-72"
    }
  ],
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3 (ipykernel)",
      "language": "python"
    },
    "language_info": {
      "name": "python",
      "codemirror_mode": {
        "name": "ipython",
        "version": "3"
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.13"
    }
  }
}