강의영상

- (1/3) 행을 선택하는 방법 (기본)

- (2/3) 행을 선택하는 방법 (람다사용)

- (3/3) 과제설명

import

import numpy as np 
import pandas as pd

첫번째 행을 선택하는 법

np.random.seed(1)
dic= {'X1':np.random.normal(0,1,5), 
      'X2':np.random.normal(0,1,5), 
      'X3':np.random.normal(0,1,5), 
      'X4':np.random.normal(0,1,5), 
      'X5':np.random.normal(0,1,5), 
      'X6':np.random.normal(0,1,5)}
df1=pd.DataFrame(dic)
df1
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728
1 -0.611756 1.744812 -2.060141 -0.172428 1.144724 -0.122890
2 -0.528172 -0.761207 -0.322417 -0.877858 0.901591 -0.935769
3 -1.072969 0.319039 -0.384054 0.042214 0.502494 -0.267888
4 0.865408 -0.249370 1.133769 0.582815 0.900856 0.530355

- 방법1

df1.iloc[0] # 이상해 
X1    1.624345
X2   -2.301539
X3    1.462108
X4   -1.099891
X5   -1.100619
X6   -0.683728
Name: 0, dtype: float64

- 방법2

df1.iloc[[0]]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728

- 방법3

df1.iloc[0,:]
X1    1.624345
X2   -2.301539
X3    1.462108
X4   -1.099891
X5   -1.100619
X6   -0.683728
Name: 0, dtype: float64

- 방법4

df1.iloc[[0],:]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728

- 방법5

df1.loc[0] # 이상해 
X1    1.624345
X2   -2.301539
X3    1.462108
X4   -1.099891
X5   -1.100619
X6   -0.683728
Name: 0, dtype: float64

- 방법6

df1.loc[[0]]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728

- 방법7

df1.loc[0,:]
X1    1.624345
X2   -2.301539
X3    1.462108
X4   -1.099891
X5   -1.100619
X6   -0.683728
Name: 0, dtype: float64

- 방법8

df1.loc[[0],:]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728

- 방법9

df1.iloc[[True,False,False,False,False]]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728

- 방법10

df1.iloc[[True,False,False,False,False],:]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728

- 방법11

df1.loc[[True,False,False,False,False]]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728

- 방법12

df1.loc[[True,False,False,False,False],:]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728

1,3행을 선택하는 방법

df1
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728
1 -0.611756 1.744812 -2.060141 -0.172428 1.144724 -0.122890
2 -0.528172 -0.761207 -0.322417 -0.877858 0.901591 -0.935769
3 -1.072969 0.319039 -0.384054 0.042214 0.502494 -0.267888
4 0.865408 -0.249370 1.133769 0.582815 0.900856 0.530355

- 방법1

df1.iloc[[0,2],:]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728
2 -0.528172 -0.761207 -0.322417 -0.877858 0.901591 -0.935769

- 방법2

df1.loc[[0,2],:]
X1 X2 X3 X4 X5 X6
0 1.624345 -2.301539 1.462108 -1.099891 -1.100619 -0.683728
2 -0.528172 -0.761207 -0.322417 -0.877858 0.901591 -0.935769

- 그외에 여러행을 뽑는 방법이 있음; 슬라이싱, 불인덱싱

loc vs iloc ??

- 인덱스가 정수가 아닌경우

_df= pd.DataFrame({'A':[1,2,3,4],'B':[4,5,6,7]},index=list('abcd'))
_df
A B
a 1 4
b 2 5
c 3 6
d 4 7
_df.loc['a':'c',:]
A B
a 1 4
b 2 5
c 3 6
_df.iloc[0:3,:]
A B
a 1 4
b 2 5
c 3 6
_df.loc[['a','b','c'],:]
A B
a 1 4
b 2 5
c 3 6
_df.iloc[[0,1,2],:]
A B
a 1 4
b 2 5
c 3 6

- 대부분의 경우 observation에 특정한 이름이 있는 경우는 없으므로 loc이 그다지 쓸모 없음

- 그렇지만 특정경우에는 쓸모가 있음

np.random.normal(size=(20,4))
array([[ 0.83600472,  1.54335911,  0.75880566,  0.88490881],
       [-0.87728152, -0.86778722, -1.44087602,  1.23225307],
       [-0.25417987,  1.39984394, -0.78191168, -0.43750898],
       [ 0.09542509,  0.92145007,  0.0607502 ,  0.21112476],
       [ 0.01652757,  0.17718772, -1.11647002,  0.0809271 ],
       [-0.18657899, -0.05682448,  0.49233656, -0.68067814],
       [-0.08450803, -0.29736188,  0.417302  ,  0.78477065],
       [-0.95542526,  0.58591043,  2.06578332, -1.47115693],
       [-0.8301719 , -0.8805776 , -0.27909772,  1.62284909],
       [ 0.01335268, -0.6946936 ,  0.6218035 , -0.59980453],
       [ 1.12341216,  0.30526704,  1.3887794 , -0.66134424],
       [ 3.03085711,  0.82458463,  0.65458015, -0.05118845],
       [-0.72559712, -0.86776868, -0.13597733, -0.79726979],
       [ 0.28267571, -0.82609743,  0.6210827 ,  0.9561217 ],
       [-0.70584051,  1.19268607, -0.23794194,  1.15528789],
       [ 0.43816635,  1.12232832, -0.9970198 , -0.10679399],
       [ 1.45142926, -0.61803685, -2.03720123, -1.94258918],
       [-2.50644065, -2.11416392, -0.41163916,  1.27852808],
       [-0.44222928,  0.32352735, -0.10999149,  0.00854895],
       [-0.16819884, -0.17418034,  0.4611641 , -1.17598267]])
np.random.seed(1)
_df= pd.DataFrame(np.random.normal(size=(20,4)), columns=list('ABCD'), index=pd.date_range('20201225',periods=20))
_df
A B C D
2020-12-25 1.624345 -0.611756 -0.528172 -1.072969
2020-12-26 0.865408 -2.301539 1.744812 -0.761207
2020-12-27 0.319039 -0.249370 1.462108 -2.060141
2020-12-28 -0.322417 -0.384054 1.133769 -1.099891
2020-12-29 -0.172428 -0.877858 0.042214 0.582815
2020-12-30 -1.100619 1.144724 0.901591 0.502494
2020-12-31 0.900856 -0.683728 -0.122890 -0.935769
2021-01-01 -0.267888 0.530355 -0.691661 -0.396754
2021-01-02 -0.687173 -0.845206 -0.671246 -0.012665
2021-01-03 -1.117310 0.234416 1.659802 0.742044
2021-01-04 -0.191836 -0.887629 -0.747158 1.692455
2021-01-05 0.050808 -0.636996 0.190915 2.100255
2021-01-06 0.120159 0.617203 0.300170 -0.352250
2021-01-07 -1.142518 -0.349343 -0.208894 0.586623
2021-01-08 0.838983 0.931102 0.285587 0.885141
2021-01-09 -0.754398 1.252868 0.512930 -0.298093
2021-01-10 0.488518 -0.075572 1.131629 1.519817
2021-01-11 2.185575 -1.396496 -1.444114 -0.504466
2021-01-12 0.160037 0.876169 0.315635 -2.022201
2021-01-13 -0.306204 0.827975 0.230095 0.762011

- 1월5일부터 1월8일까지의 자료만 보고싶다.

_df.loc['20210105':'20210108']
A B C D
2021-01-05 0.050808 -0.636996 0.190915 2.100255
2021-01-06 0.120159 0.617203 0.300170 -0.352250
2021-01-07 -1.142518 -0.349343 -0.208894 0.586623
2021-01-08 0.838983 0.931102 0.285587 0.885141

- iloc으로 하려면 힘들다.

_df.index
DatetimeIndex(['2020-12-25', '2020-12-26', '2020-12-27', '2020-12-28',
               '2020-12-29', '2020-12-30', '2020-12-31', '2021-01-01',
               '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
               '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09',
               '2021-01-10', '2021-01-11', '2021-01-12', '2021-01-13'],
              dtype='datetime64[ns]', freq='D')
pd.Series(_df.index)
0    2020-12-25
1    2020-12-26
2    2020-12-27
3    2020-12-28
4    2020-12-29
5    2020-12-30
6    2020-12-31
7    2021-01-01
8    2021-01-02
9    2021-01-03
10   2021-01-04
11   2021-01-05
12   2021-01-06
13   2021-01-07
14   2021-01-08
15   2021-01-09
16   2021-01-10
17   2021-01-11
18   2021-01-12
19   2021-01-13
dtype: datetime64[ns]
_df.iloc[11:15]
A B C D
2021-01-05 0.050808 -0.636996 0.190915 2.100255
2021-01-06 0.120159 0.617203 0.300170 -0.352250
2021-01-07 -1.142518 -0.349343 -0.208894 0.586623
2021-01-08 0.838983 0.931102 0.285587 0.885141

자주하는 실수

- 저는 아래와 같은 실수를 자주해요

_df.loc['A']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion._convert_str_to_tsobject()

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

ValueError: Given date string not likely a datetime.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
    680             try:
--> 681                 key = self._maybe_cast_for_get_loc(key)
    682             except ValueError as err:

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py in _maybe_cast_for_get_loc(self, key)
    708         # needed to localize naive datetimes or dates (GH 35690)
--> 709         key = Timestamp(key)
    710         if key.tzinfo is None:

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion._convert_str_to_tsobject()

ValueError: could not convert string to Timestamp

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_38055/4015539971.py in <module>
----> 1 _df.loc['A']

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    929 
    930             maybe_callable = com.apply_if_callable(key, self.obj)
--> 931             return self._getitem_axis(maybe_callable, axis=axis)
    932 
    933     def _is_scalar_access(self, key: tuple):

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1162         # fall thru to straight lookup
   1163         self._validate_key(key, axis)
-> 1164         return self._get_label(key, axis=axis)
   1165 
   1166     def _get_slice_axis(self, slice_obj: slice, axis: int):

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
   1111     def _get_label(self, label, axis: int):
   1112         # GH#5667 this will fail if the label is not present in the axis.
-> 1113         return self.obj.xs(label, axis=axis)
   1114 
   1115     def _handle_lowerdim_multi_index_axis0(self, tup: tuple):

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3774                 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
   3775         else:
-> 3776             loc = index.get_loc(key)
   3777 
   3778             if isinstance(loc, np.ndarray):

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
    681                 key = self._maybe_cast_for_get_loc(key)
    682             except ValueError as err:
--> 683                 raise KeyError(key) from err
    684 
    685         elif isinstance(key, timedelta):

KeyError: 'A'

- 올바른 방법

_df['A']
2020-12-25    1.624345
2020-12-26    0.865408
2020-12-27    0.319039
2020-12-28   -0.322417
2020-12-29   -0.172428
2020-12-30   -1.100619
2020-12-31    0.900856
2021-01-01   -0.267888
2021-01-02   -0.687173
2021-01-03   -1.117310
2021-01-04   -0.191836
2021-01-05    0.050808
2021-01-06    0.120159
2021-01-07   -1.142518
2021-01-08    0.838983
2021-01-09   -0.754398
2021-01-10    0.488518
2021-01-11    2.185575
2021-01-12    0.160037
2021-01-13   -0.306204
Freq: D, Name: A, dtype: float64

- 아래의 사실을 기억하자.

  • 기본적으로는 iloc, loc은 [2], [2:] 처럼 1차원으로 원소를 인덱싱할수도 있고, [2,3], [:,2] 와 같이 2차원으로 인덱싱할 수도 있다.

  • 1차원으로 인덱싱하는 경우는 기본적으로 행을 인덱싱한다 $\to$ iloc, loc은 행과 더 친하고 열과 친하지 않다.

  • 따라서 열을 선택하는 방법에 있어서 loc, iloc이 그렇제 좋은 방법은 아니다.

  • 그렇지만 열을 선택하는 방법은 iloc이나 loc이 제일 편리하다. (이외의 다른 방법이 마땅하게 없음) 그래서 열을 선택할때도 iloc이나 loc을 선호한다.

슬라이싱에 대한 응용

_df.iloc[::2]
A B C D
2020-12-25 1.624345 -0.611756 -0.528172 -1.072969
2020-12-27 0.319039 -0.249370 1.462108 -2.060141
2020-12-29 -0.172428 -0.877858 0.042214 0.582815
2020-12-31 0.900856 -0.683728 -0.122890 -0.935769
2021-01-02 -0.687173 -0.845206 -0.671246 -0.012665
2021-01-04 -0.191836 -0.887629 -0.747158 1.692455
2021-01-06 0.120159 0.617203 0.300170 -0.352250
2021-01-08 0.838983 0.931102 0.285587 0.885141
2021-01-10 0.488518 -0.075572 1.131629 1.519817
2021-01-12 0.160037 0.876169 0.315635 -2.022201

- 이 방법은 칼럼에도 적용가능

_df.iloc[:,::2]
A C
2020-12-25 1.624345 -0.528172
2020-12-26 0.865408 1.744812
2020-12-27 0.319039 1.462108
2020-12-28 -0.322417 1.133769
2020-12-29 -0.172428 0.042214
2020-12-30 -1.100619 0.901591
2020-12-31 0.900856 -0.122890
2021-01-01 -0.267888 -0.691661
2021-01-02 -0.687173 -0.671246
2021-01-03 -1.117310 1.659802
2021-01-04 -0.191836 -0.747158
2021-01-05 0.050808 0.190915
2021-01-06 0.120159 0.300170
2021-01-07 -1.142518 -0.208894
2021-01-08 0.838983 0.285587
2021-01-09 -0.754398 0.512930
2021-01-10 0.488518 1.131629
2021-01-11 2.185575 -1.444114
2021-01-12 0.160037 0.315635
2021-01-13 -0.306204 0.230095

- 칼럼에는 잘 쓰지 않는이유?

  • row는 특정간격으로 뽑는 일이 빈번함. (예를들어 일별데이터를 주별데이터로 바꾸고싶을때, 바꾸고 싶을 경우?)

  • col을 특정간격으로 뽑아야 하는 일은 없음

lambda + map

np.random.seed(1)
df2= pd.DataFrame(np.random.normal(size=(10,4)),columns=list('ABCD'))
df2
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
1 0.865408 -2.301539 1.744812 -0.761207
2 0.319039 -0.249370 1.462108 -2.060141
3 -0.322417 -0.384054 1.133769 -1.099891
4 -0.172428 -0.877858 0.042214 0.582815
5 -1.100619 1.144724 0.901591 0.502494
6 0.900856 -0.683728 -0.122890 -0.935769
7 -0.267888 0.530355 -0.691661 -0.396754
8 -0.687173 -0.845206 -0.671246 -0.012665
9 -1.117310 0.234416 1.659802 0.742044

칼럼A의 값이 0보다 큰 경우만

- 방법1

df2.loc[map(lambda x: x>0,df2['A']),:]
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
1 0.865408 -2.301539 1.744812 -0.761207
2 0.319039 -0.249370 1.462108 -2.060141
6 0.900856 -0.683728 -0.122890 -0.935769

- 방법2

df2.loc[lambda df: df['A']>0,:]
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
1 0.865408 -2.301539 1.744812 -0.761207
2 0.319039 -0.249370 1.462108 -2.060141
6 0.900856 -0.683728 -0.122890 -0.935769
  • ??
  • map의 기능은 (1) 리스트를 원소별로 분해하여 (2) 어떠한 함수를 적용하여 아웃풋을 구한뒤 (3) 각각의 아웃풋을 다시 하나의 리스트로 묶음
  • 우리는 이중에서 (1),(3)에만 집중했음
  • 하지만 생각해보면 일단 (2) 일단 함수를 적용하는 기능이 있었음
  • 그런데 위의 코든느 함수를 적용한 결과가 아니라 함수 오브젝트 자체를 전달하여도 동작함

요약!!

  • True, False로 이루어진 벡터를 리스트의 형태로 전달하여 인덱상했음 (원래 우리가 알고 있는 개념)
  • True, False로 이루어진 벡터를 리턴할 수 있는 함수오브젝트 자체를 전달해도 인덱싱이 가능

- 방법3

df2.iloc[map(lambda x: x>0,df2['A']),:]
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
1 0.865408 -2.301539 1.744812 -0.761207
2 0.319039 -0.249370 1.462108 -2.060141
6 0.900856 -0.683728 -0.122890 -0.935769

- 방법4: 실패

df2.iloc[lambda df: df['A']>0,:]
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/tmp/ipykernel_38055/235064012.py in <module>
----> 1 df2.iloc[lambda df: df['A']>0,:]

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1504     def _getitem_tuple(self, tup: tuple):
   1505 
-> 1506         self._has_valid_tuple(tup)
   1507         with suppress(IndexingError):
   1508             return self._getitem_lowerdim(tup)

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    752         for i, k in enumerate(key):
    753             try:
--> 754                 self._validate_key(k, i)
    755             except ValueError as err:
    756                 raise ValueError(

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
   1393             if hasattr(key, "index") and isinstance(key.index, Index):
   1394                 if key.index.inferred_type == "integer":
-> 1395                     raise NotImplementedError(
   1396                         "iLocation based boolean "
   1397                         "indexing on an integer type "

NotImplementedError: iLocation based boolean indexing on an integer type is not available
  • 위에서 iloc을 loc으로 바꾸면 되는데..
  • iloc입장에서는 조금 서운함

칼럼A>0 이고 칼럼C<0 인 경우

df2
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
1 0.865408 -2.301539 1.744812 -0.761207
2 0.319039 -0.249370 1.462108 -2.060141
3 -0.322417 -0.384054 1.133769 -1.099891
4 -0.172428 -0.877858 0.042214 0.582815
5 -1.100619 1.144724 0.901591 0.502494
6 0.900856 -0.683728 -0.122890 -0.935769
7 -0.267888 0.530355 -0.691661 -0.396754
8 -0.687173 -0.845206 -0.671246 -0.012665
9 -1.117310 0.234416 1.659802 0.742044

- 방법1

df2.loc[map(lambda x,y: x>0 and y<0, df2['A'],df2['C']),:] 
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
6 0.900856 -0.683728 -0.122890 -0.935769

- 방법2

df2.loc[map(lambda x,y: x>0 & y<0, df2['A'],df2['C']),:] 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_38055/232901037.py in <module>
----> 1 df2.loc[map(lambda x,y: x>0 & y<0, df2['A'],df2['C']),:]

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    918     def __getitem__(self, key):
    919         if type(key) is tuple:
--> 920             key = tuple(list(x) if is_iterator(x) else x for x in key)
    921             key = tuple(com.apply_if_callable(x, self.obj) for x in key)
    922             if self._is_scalar_access(key):

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in <genexpr>(.0)
    918     def __getitem__(self, key):
    919         if type(key) is tuple:
--> 920             key = tuple(list(x) if is_iterator(x) else x for x in key)
    921             key = tuple(com.apply_if_callable(x, self.obj) for x in key)
    922             if self._is_scalar_access(key):

/tmp/ipykernel_38055/232901037.py in <lambda>(x, y)
----> 1 df2.loc[map(lambda x,y: x>0 & y<0, df2['A'],df2['C']),:]

TypeError: unsupported operand type(s) for &: 'int' and 'float'
  • ??? 아래를 관찰

보충학습

0<3.2 &  0<2.2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_38055/993001292.py in <module>
----> 1 0<3.2 &  0<2.2

TypeError: unsupported operand type(s) for &: 'float' and 'int'
(0<3.2) &  (0<2.2)
True

보충학습끝

위의코드도 괄호로 묶어주면 잘 동작한다.

df2.loc[map(lambda x,y: (x>0) & (y<0), df2['A'],df2['C']),:] 
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
6 0.900856 -0.683728 -0.122890 -0.935769

- 방법3

df2.loc[lambda df: (df['A'] >0) & (df['C']<0)] 
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
6 0.900856 -0.683728 -0.122890 -0.935769

아래는 실행되지 않는다

df2.loc[lambda df: (df['A'] >0) and (df['C']<0)] 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_38055/2807753716.py in <module>
----> 1 df2.loc[lambda df: (df['A'] >0) and (df['C']<0)]

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    928             axis = self.axis or 0
    929 
--> 930             maybe_callable = com.apply_if_callable(key, self.obj)
    931             return self._getitem_axis(maybe_callable, axis=axis)
    932 

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/common.py in apply_if_callable(maybe_callable, obj, **kwargs)
    356     """
    357     if callable(maybe_callable):
--> 358         return maybe_callable(obj, **kwargs)
    359 
    360     return maybe_callable

/tmp/ipykernel_38055/2807753716.py in <lambda>(df)
----> 1 df2.loc[lambda df: (df['A'] >0) and (df['C']<0)]

~/anaconda3/envs/dv2021/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
   1535     @final
   1536     def __nonzero__(self):
-> 1537         raise ValueError(
   1538             f"The truth value of a {type(self).__name__} is ambiguous. "
   1539             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

실행되지 않는이유

np.array([True, False]) & np.array([True, True])
array([ True, False])
np.array([True, False]) and np.array([True, True])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_38055/1293310712.py in <module>
----> 1 np.array([True, False]) and np.array([True, True])

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

- iloc을 이용한 방법은 생략

숙제

df2
A B C D
0 1.624345 -0.611756 -0.528172 -1.072969
1 0.865408 -2.301539 1.744812 -0.761207
2 0.319039 -0.249370 1.462108 -2.060141
3 -0.322417 -0.384054 1.133769 -1.099891
4 -0.172428 -0.877858 0.042214 0.582815
5 -1.100619 1.144724 0.901591 0.502494
6 0.900856 -0.683728 -0.122890 -0.935769
7 -0.267888 0.530355 -0.691661 -0.396754
8 -0.687173 -0.845206 -0.671246 -0.012665
9 -1.117310 0.234416 1.659802 0.742044

A> -1 이고 C < 1.5 인 row를 선택