{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 12wk-1: Quiz 7\n",
        "\n",
        "최규빈  \n",
        "2024-05-22\n",
        "\n",
        "<a href=\"https://colab.research.google.com/github/guebin/PP2024/blob/main/posts/12wk-1.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" style=\"text-align: left\"></a>\n",
        "\n",
        "<https://youtu.be/playlist?list=PLQqh36zP38-wXFQNvVVlcusd_nm2QZBen&si=RIFyvAFnoOVLIRVG>\n",
        "\n",
        "> **Caution**\n",
        ">\n",
        "> -   Jeonbuk National University students must bring their student ID on the\n",
        ">     day of the exam (for attendance and identity verification). A national\n",
        ">     ID card, passport, or similar is accepted in place of the student ID.\n",
        "> -   Cheating (sharing code via KakaoTalk chat, using generative models,\n",
        ">     having someone else take the exam, etc.) results in an F if detected.\n",
        "> -   Late arrival to the quiz is recorded, but no points are deducted for\n",
        ">     it.\n",
        "> -   Only answers submitted as `.ipynb` files are graded. Other formats\n",
        ">     (`.hwp`, `.py`, etc.) are not graded and receive 0 points."
      ],
      "id": "08c735c6-4d93-4387-b153-1a8db1ea593b"
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "import numpy as np"
      ],
      "id": "f4e7c4f4-d8b9-4418-9553-df9108f7630e"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 1. – 10 points\n",
        "\n",
        "Load the data below."
      ],
      "id": "96ce08b5-8d48-4b05-8e04-072d2a106c80"
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df = pd.read_csv('https://raw.githubusercontent.com/guebin/DV2022/master/posts/FIFA23_official_data.csv').drop(['Loaned From','Best Overall Rating'],axis=1).dropna().reset_index(drop=True)\n",
        "df.head()"
      ],
      "id": "f08c9585-1db5-4656-a79b-d12db599e4df"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Compute the mean height of the players. (Heights are stored in the\n",
        "`Height` column.)\n",
        "\n",
        "(Solution)"
      ],
      "id": "7d06303f-4a9c-45e6-b99e-8dc69b1dfde2"
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "# strip the 'cm' suffix from each height string, convert to int, then average\n",
        "np.mean([int(l.replace('cm','')) for l in df.Height])"
      ],
      "id": "eed4faa1-f4c2-412d-90e0-98dbdd511287"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 2. – 10 points\n",
        "\n",
        "Generate 1000 random numbers from the standard normal distribution and\n",
        "build a DataFrame like the one below."
      ],
      "id": "6a311fcf-7998-48c6-947d-3050ac6709d4"
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "# example of the expected result"
      ],
      "id": "ac519ffc-404d-420f-8cea-0d2e67f06a3c"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**Notes**\n",
        "\n",
        "1.  The column names must be `X0,...,X99`.\n",
        "2.  To draw random numbers from the standard normal distribution, use\n",
        "    `np.random.randn` or `np.random.normal`.\n",
        "\n",
        "(Solution)"
      ],
      "id": "cb11390f-df96-4c72-8416-78369df3d4c6"
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df = pd.DataFrame(np.random.randn(1000).reshape(10,100))\n",
        "df.columns = [f\"X{l}\" for l in df.columns]\n",
        "df"
      ],
      "id": "b086968e-8458-4b09-a332-bf8810f272ab"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 3. – 10 points\n",
        "\n",
        "Examine the DataFrame below."
      ],
      "id": "4074f2a8-7fee-4519-99c3-c75d0ff86c0d"
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df = pd.read_csv('https://raw.githubusercontent.com/guebin/DV2022/master/posts/FIFA23_official_data.csv')\n",
        "df.head()"
      ],
      "id": "0443844c-4f81-4951-9d3a-9ee2c57706c5"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "How many columns have a name that contains a space `' '`?\n",
        "\n",
        "> Code that does not generalize (for example, counting by hand) is not accepted as a correct answer.\n",
        "\n",
        "(Solution)"
      ],
      "id": "3c554aa9-796a-4e07-9cf7-b28fc93e7896"
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "sum([' ' in l for l in df.columns])"
      ],
      "id": "44af266f-6672-4299-931a-65a12f8f5eed"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 4. – 10 points"
      ],
      "id": "1e414a24-b387-4cb2-8f38-083901105942"
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "np.random.seed(43052)\n",
        "att = np.random.choice(np.arange(10,21)*5,20)\n",
        "rep = np.random.choice(np.arange(5,21)*5,20)\n",
        "mid = np.random.choice(np.arange(0,21)*5,20)\n",
        "fin = np.random.choice(np.arange(0,21)*5,20)\n",
        "df = pd.DataFrame({'att':att,'rep':rep,'mid':mid,'fin':fin})\n",
        "df"
      ],
      "id": "ca966c17-6ead-4ed1-b372-0e6e56c5bc8d"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Apply the formula below to the DataFrame above to compute `total`.\n",
        "\n",
        "> `total = att*0.1 + rep*0.2 + mid*0.35 + fin*0.35`\n",
        "\n",
        "Based on the computed `total`, assign `grade` using the rules below.\n",
        "\n",
        "-   `total` >= 70: A+\n",
        "-   40 < `total` < 70: B0\n",
        "-   `total` <= 40: F\n",
        "\n",
        "Assign `grade` to the original df and print the final result.\n",
        "\n",
        "(Solution)"
      ],
      "id": "6d86bc9a-8fd9-4a86-b963-be5f919d92e3"
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "total = df.att*0.1 + df.rep*0.2 + df.mid*0.35 + df.fin*0.35 \n",
        "total"
      ],
      "id": "373bf3f3-4a34-4921-a2ca-9df84fccf9f9"
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "def make_grade(total):\n",
        "    if total >= 70: \n",
        "        return \"A+\"\n",
        "    elif 40 < total: \n",
        "        return \"B0\"\n",
        "    else: \n",
        "        return \"F\""
      ],
      "id": "48ab9045-1cf4-46eb-bcdd-8ce9d0c31b14"
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.assign(grade = [make_grade(l) for l in total])"
      ],
      "id": "f39017da-5b64-4950-97d6-a35272bbb105"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 5. – 10 points\n",
        "\n",
        "Consider the two lists below."
      ],
      "id": "ace59dba-5916-4613-b61f-f9faf0275ff9"
    },
    {
      "cell_type": "code",
      "execution_count": 32,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "eng = ['apple', 'banana', 'carrot', 'dragonfly', 'elephant', 'forest', 'giraffe', 'honey', 'island', 'jungle']\n",
        "kor = ['사과', '바나나', '당근', '잠자리', '코끼리', '숲', '기린', '꿀', '섬', '정글']"
      ],
      "id": "6b9814f0-a9d6-4330-8c77-f3e94a7e2a98"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Use the lists above and `zip` to build a dictionary like the one below."
      ],
      "id": "03a461b1-cd62-4c2a-ba1d-73db6ae28361"
    },
    {
      "cell_type": "code",
      "execution_count": 101,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "dct = {'apple': '사과',\n",
        " 'banana': '바나나',\n",
        " 'carrot': '당근',\n",
        " 'dragonfly': '잠자리',\n",
        " 'elephant': '코끼리',\n",
        " 'forest': '숲',\n",
        " 'giraffe': '기린',\n",
        " 'honey': '꿀',\n",
        " 'island': '섬',\n",
        " 'jungle': '정글'}"
      ],
      "id": "d93a8182-5a28-444e-a7b8-29e2faf4836a"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "(Solution)"
      ],
      "id": "588ac97d-ac27-44b5-91bc-ec1080d2f59c"
    },
    {
      "cell_type": "code",
      "execution_count": 35,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "{e:k for e,k in zip(eng,kor)}"
      ],
      "id": "add7d480-33aa-4b74-8ad7-08c718a44abd"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 6. – 50 points\n",
        "\n",
        "> 10 points per sub-question\n",
        "\n",
        "Consider the DataFrame below."
      ],
      "id": "5efdda33-c559-4688-b886-3a29c3e8066a"
    },
    {
      "cell_type": "code",
      "execution_count": 36,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df = pd.read_csv(\"https://raw.githubusercontent.com/guebin/DV2023/main/posts/titanic.csv\").drop(['PassengerId','logFare','Cabin'],axis=1).dropna()\n",
        "df"
      ],
      "id": "8ebd1e6f-a614-48c2-9902-5dc0fbbe7aa2"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This DataFrame is the Titanic dataset, which describes the Titanic, the\n",
        "ship that sank in 1912. Each row is an individual passenger and each column\n",
        "holds information about that passenger. The columns are described below.\n",
        "\n",
        "1.  **Survived**: survival status (0 = died, 1 = survived)\n",
        "2.  **Pclass**: ticket class (1 = first class, 2 = second class, 3 = third class)\n",
        "3.  **Name**: passenger name\n",
        "4.  **Sex**: sex\n",
        "5.  **Age**: age\n",
        "6.  **SibSp**: number of siblings or spouses aboard\n",
        "7.  **Parch**: number of parents or children aboard\n",
        "8.  **Ticket**: ticket number\n",
        "9.  **Fare**: fare\n",
        "10. **Embarked**: port of embarkation (C = Cherbourg, Q = Queenstown, S =\n",
        "    Southampton)\n",
        "\n",
        "For example, consider the first passenger below:"
      ],
      "id": "3ae99ec0-e1c0-4b58-9166-a9fd1f6e5874"
    },
    {
      "cell_type": "code",
      "execution_count": 90,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df.iloc[[0]]"
      ],
      "id": "a4f430f1-84fc-4264-9985-bbc3f36c67e3"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This passenger died[1], traveled in third class, was named\n",
        "`Braund, Mr. Owen Harris`, was male, was 22 years old, and boarded at\n",
        "`Southampton`.\n",
        "\n",
        "`(1)` How many male passengers and how many female passengers are there?\n",
        "\n",
        "(Solution)\n",
        "\n",
        "[1] `Survived = 0`"
      ],
      "id": "70dbf970-9f4f-4536-a0ef-f56225868634"
    },
    {
      "cell_type": "code",
      "execution_count": 43,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "sum(df.Sex == 'male'),sum(df.Sex == 'female') "
      ],
      "id": "f4935b74-d0d5-40c0-9058-9b84bad2cf2e"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`(2)` How many of the male passengers survived? How many of the female\n",
        "passengers survived? Which sex do you think was more likely to survive?\n",
        "\n",
        "(Solution)"
      ],
      "id": "bedfb00b-10ed-4721-8691-ff219a45f688"
    },
    {
      "cell_type": "code",
      "execution_count": 47,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "sum((df.Sex == 'male') & (df.Survived ==1)),sum((df.Sex == 'female') & (df.Survived ==1))"
      ],
      "id": "6a0dddeb-4d0c-404a-9e17-deebae2d12dd"
    },
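    {
      "cell_type": "code",
      "execution_count": 48,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "# survival rate by sex, computed from df instead of hard-coded counts\n",
        "df[df.Sex == 'male'].Survived.mean(), df[df.Sex == 'female'].Survived.mean()"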
      ],
      "id": "b7c48327-4f58-4a02-b6d7-ea18a16954a8"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`(3)` Compute the mean `Fare` restricted to passengers with `Pclass == 3`.\n",
        "\n",
        "(Solution)"
      ],
      "id": "c72d7989-b22e-4ca3-9c86-632f499a5a27"
    },
    {
      "cell_type": "code",
      "execution_count": 52,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "df[df.Pclass == 3].Fare.mean()"
      ],
      "id": "5523a351-05af-45ef-9baa-ea036e85470e"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`(4)` How many passengers traveled alone?\n",
        "\n",
        "**hint: look at the passengers with `SibSp=0` and `Parch=0`.**\n",
        "\n",
        "(Solution)"
      ],
      "id": "6ceb71cd-bd71-4d92-8f17-f799b6cad132"
    },
    {
      "cell_type": "code",
      "execution_count": 58,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "sum((df.SibSp == 0) & (df.Parch==0))"
      ],
      "id": "e1a2c3db-8f45-4db6-9d52-5fc1f1477ff6"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`(5)` Find out which port the passengers who traveled alone boarded from.\n",
        "\n",
        "(Solution)"
      ],
      "id": "30a0afde-fb55-4544-a9b5-2533272baad4"
    },
    {
      "cell_type": "code",
      "execution_count": 66,
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "lst = list(df[(df.SibSp == 0) & (df.Parch==0)].Embarked)\n",
        "{s:lst.count(s) for s in set(lst)}"
      ],
      "id": "4da973fc-9e5d-4333-a4f6-80c1ded40da9"
    }
  ],
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3 (ipykernel)",
      "language": "python"
    },
    "language_info": {
      "name": "python",
      "codemirror_mode": {
        "name": "ipython",
        "version": "3"
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.13"
    }
  }
}