728x90
In [68]:
import pandas as pd
import datetime as dt
In [69]:
avo_df = pd.read_csv('Avocado.csv')
avo_df
Out[69]:
Date | AveragePrice | Total Volume | type | region | |
---|---|---|---|---|---|
0 | 2015-12-27 | 1.33 | 64236.62 | conventional | Albany |
1 | 2015-12-20 | 1.35 | 54876.98 | conventional | Albany |
2 | 2015-12-13 | 0.93 | 118220.22 | conventional | Albany |
3 | 2015-12-06 | 1.08 | 78992.15 | conventional | Albany |
4 | 2015-11-29 | 1.28 | 51039.60 | conventional | Albany |
... | ... | ... | ... | ... | ... |
18244 | 2018-02-04 | 1.63 | 17074.83 | organic | WestTexNewMexico |
18245 | 2018-01-28 | 1.71 | 13888.04 | organic | WestTexNewMexico |
18246 | 2018-01-21 | 1.87 | 13766.76 | organic | WestTexNewMexico |
18247 | 2018-01-14 | 1.93 | 16205.22 | organic | WestTexNewMexico |
18248 | 2018-01-07 | 1.62 | 17489.58 | organic | WestTexNewMexico |
18249 rows × 5 columns
In [70]:
avo_df['Date'] = pd.to_datetime(avo_df['Date'])
avo_df.set_index(keys=['Date'], inplace = True)
avo_df
Out[70]:
AveragePrice | Total Volume | type | region | |
---|---|---|---|---|
Date | ||||
2015-12-27 | 1.33 | 64236.62 | conventional | Albany |
2015-12-20 | 1.35 | 54876.98 | conventional | Albany |
2015-12-13 | 0.93 | 118220.22 | conventional | Albany |
2015-12-06 | 1.08 | 78992.15 | conventional | Albany |
2015-11-29 | 1.28 | 51039.60 | conventional | Albany |
... | ... | ... | ... | ... |
2018-02-04 | 1.63 | 17074.83 | organic | WestTexNewMexico |
2018-01-28 | 1.71 | 13888.04 | organic | WestTexNewMexico |
2018-01-21 | 1.87 | 13766.76 | organic | WestTexNewMexico |
2018-01-14 | 1.93 | 16205.22 | organic | WestTexNewMexico |
2018-01-07 | 1.62 | 17489.58 | organic | WestTexNewMexico |
18249 rows × 4 columns
In [73]:
# Datetime indexing
avo_df.loc['2018-01-14']
Out[73]:
AveragePrice | Total Volume | type | region | |
---|---|---|---|---|
Date | ||||
2018-01-14 | 1.42 | 95246.38 | conventional | Albany |
2018-01-14 | 1.10 | 670766.04 | conventional | Atlanta |
2018-01-14 | 1.49 | 799726.89 | conventional | BaltimoreWashington |
2018-01-14 | 1.15 | 111113.11 | conventional | Boise |
2018-01-14 | 1.49 | 533150.49 | conventional | Boston |
... | ... | ... | ... | ... |
2018-01-14 | 1.25 | 4343.09 | organic | Syracuse |
2018-01-14 | 1.53 | 7238.04 | organic | Tampa |
2018-01-14 | 1.59 | 1476651.08 | organic | TotalUS |
2018-01-14 | 1.82 | 207999.67 | organic | West |
2018-01-14 | 1.93 | 16205.22 | organic | WestTexNewMexico |
108 rows × 4 columns
In [74]:
avo_df.iloc[0]
Out[74]:
AveragePrice 1.33
Total Volume 64236.62
type conventional
region Albany
Name: 2015-12-27 00:00:00, dtype: object
In [75]:
# multiple elements
avo_df.loc["2015-12-27":'2016-12-27']
Out[75]:
AveragePrice | Total Volume | type | region | |
---|---|---|---|---|
Date | ||||
2015-12-27 | 1.33 | 64236.62 | conventional | Albany |
2015-12-27 | 0.99 | 386100.49 | conventional | Atlanta |
2015-12-27 | 1.17 | 596819.40 | conventional | BaltimoreWashington |
2015-12-27 | 0.97 | 62909.69 | conventional | Boise |
2015-12-27 | 1.13 | 450816.39 | conventional | Boston |
... | ... | ... | ... | ... |
2016-01-31 | 1.64 | 11626.56 | organic | WestTexNewMexico |
2016-01-24 | 1.47 | 12171.04 | organic | WestTexNewMexico |
2016-01-17 | 1.45 | 13237.48 | organic | WestTexNewMexico |
2016-01-10 | 1.37 | 12647.90 | organic | WestTexNewMexico |
2016-01-03 | 1.58 | 9667.05 | organic | WestTexNewMexico |
5724 rows × 4 columns
df.truncate(before=None, after=None, axis=None, copy=True)
turncate메서드는 행이나 열에 대해서 앞뒤를 자르는 메서드
before : 이 기준 이전을 삭제합니다.
after : 이 기준 이후를 삭제합니다.
axis : 자를 축 입니다. 0은 행값, 1은 열값
copy : 사본을 생성할지 여부입니다.
In [76]:
# sort
avo_df.sort_index(inplace=True)
avo_df.truncate("2017-01-01",'2018-02-02')
Out[76]:
AveragePrice | Total Volume | type | region | |
---|---|---|---|---|
Date | ||||
2017-01-01 | 1.23 | 66616.54 | organic | SouthCentral |
2017-01-01 | 1.34 | 36915.87 | organic | Chicago |
2017-01-01 | 1.06 | 15669.84 | organic | DallasFtWorth |
2017-01-01 | 1.43 | 49777.56 | organic | Southeast |
2017-01-01 | 1.51 | 3969.98 | organic | Roanoke |
... | ... | ... | ... | ... |
2018-01-28 | 1.70 | 4300.76 | organic | Spokane |
2018-01-28 | 1.20 | 2278728.69 | conventional | NewYork |
2018-01-28 | 1.43 | 7669.31 | organic | Columbus |
2018-01-28 | 1.34 | 486679.98 | conventional | Portland |
2018-01-28 | 1.34 | 548587.07 | conventional | Seattle |
6154 rows × 4 columns
DateOffset : 년, 월, 일, 시, 분, 초 계산 가능
날짜와 날짜 계산
In [77]:
avo_df.index = avo_df.index + pd.DateOffset(months = 12, days = 30)
In [78]:
avo_df
Out[78]:
AveragePrice | Total Volume | type | region | |
---|---|---|---|---|
Date | ||||
2016-02-03 | 1.75 | 27365.89 | organic | Southeast |
2016-02-03 | 1.49 | 17723.17 | organic | Chicago |
2016-02-03 | 1.68 | 2896.72 | organic | HarrisburgScranton |
2016-02-03 | 1.52 | 54956.80 | conventional | Pittsburgh |
2016-02-03 | 1.64 | 1505.12 | organic | Boise |
... | ... | ... | ... | ... |
2019-04-24 | 1.36 | 908202.13 | conventional | Chicago |
2019-04-24 | 0.70 | 9010588.32 | conventional | SouthCentral |
2019-04-24 | 1.42 | 163496.70 | organic | SouthCentral |
2019-04-24 | 1.70 | 190257.38 | organic | California |
2019-04-24 | 1.34 | 1774776.77 | conventional | NewYork |
18249 rows × 4 columns
In [79]:
# back to the original dataset
avo_df.index = avo_df.index - pd.DateOffset(months = 12, days = 30)
avo_df
Out[79]:
AveragePrice | Total Volume | type | region | |
---|---|---|---|---|
Date | ||||
2015-01-04 | 1.75 | 27365.89 | organic | Southeast |
2015-01-04 | 1.49 | 17723.17 | organic | Chicago |
2015-01-04 | 1.68 | 2896.72 | organic | HarrisburgScranton |
2015-01-04 | 1.52 | 54956.80 | conventional | Pittsburgh |
2015-01-04 | 1.64 | 1505.12 | organic | Boise |
... | ... | ... | ... | ... |
2018-03-25 | 1.36 | 908202.13 | conventional | Chicago |
2018-03-25 | 0.70 | 9010588.32 | conventional | SouthCentral |
2018-03-25 | 1.42 | 163496.70 | organic | SouthCentral |
2018-03-25 | 1.70 | 190257.38 | organic | California |
2018-03-25 | 1.34 | 1774776.77 | conventional | NewYork |
18249 rows × 4 columns
In [80]:
# Aggregating the data by year (A = annual)
avo_df.resample(rule='A').mean()
Out[80]:
AveragePrice | Total Volume | |
---|---|---|
Date | ||
2015-12-31 | 1.375590 | 7.810274e+05 |
2016-12-31 | 1.338640 | 8.584206e+05 |
2017-12-31 | 1.515128 | 8.623393e+05 |
2018-12-31 | 1.347531 | 1.066928e+06 |
In [81]:
# Aggregating the data by months (M = month)
avo_df.resample(rule='M').mean()
Out[81]:
AveragePrice | Total Volume | |
---|---|---|
Date | ||
2015-01-31 | 1.341444 | 8.102564e+05 |
2015-02-28 | 1.353580 | 7.505690e+05 |
2015-03-31 | 1.361981 | 7.751538e+05 |
2015-04-30 | 1.383449 | 7.966305e+05 |
2015-05-31 | 1.344685 | 8.958964e+05 |
2015-06-30 | 1.399074 | 8.838582e+05 |
2015-07-31 | 1.409444 | 8.256988e+05 |
2015-08-31 | 1.444481 | 7.763793e+05 |
2015-09-30 | 1.449144 | 7.620806e+05 |
2015-10-31 | 1.391968 | 7.078151e+05 |
2015-11-30 | 1.330611 | 6.841482e+05 |
2015-12-31 | 1.305244 | 6.868351e+05 |
2016-01-31 | 1.236991 | 8.828596e+05 |
2016-02-29 | 1.240278 | 9.678819e+05 |
2016-03-31 | 1.224213 | 9.055843e+05 |
2016-04-30 | 1.205023 | 9.301684e+05 |
2016-05-31 | 1.199722 | 1.039316e+06 |
2016-06-30 | 1.283056 | 9.498337e+05 |
2016-07-31 | 1.401333 | 8.653247e+05 |
2016-08-31 | 1.411111 | 8.583263e+05 |
2016-09-30 | 1.427870 | 8.386650e+05 |
2016-10-31 | 1.557093 | 6.670404e+05 |
2016-11-30 | 1.550023 | 6.023364e+05 |
2016-12-31 | 1.315995 | 7.672402e+05 |
2017-01-31 | 1.277500 | 9.876538e+05 |
2017-02-28 | 1.214792 | 1.091359e+06 |
2017-03-31 | 1.398935 | 8.307949e+05 |
2017-04-30 | 1.502611 | 9.070639e+05 |
2017-05-31 | 1.532593 | 9.854874e+05 |
2017-06-30 | 1.545395 | 9.544434e+05 |
2017-07-31 | 1.564926 | 8.794513e+05 |
2017-08-31 | 1.698958 | 7.831782e+05 |
2017-09-30 | 1.840949 | 6.592119e+05 |
2017-10-31 | 1.752444 | 6.718110e+05 |
2017-11-30 | 1.517292 | 7.512021e+05 |
2017-12-31 | 1.363852 | 8.506164e+05 |
2018-01-31 | 1.387431 | 9.946923e+05 |
2018-02-28 | 1.318704 | 1.152264e+06 |
2018-03-31 | 1.336458 | 1.053827e+06 |
In [82]:
# the maximum value for each quarter
avo_df.resample(rule='Q').max()
Out[82]:
AveragePrice | Total Volume | type | region | |
---|---|---|---|---|
Date | ||||
2015-03-31 | 2.38 | 44655461.51 | organic | WestTexNewMexico |
2015-06-30 | 2.40 | 41291704.39 | organic | WestTexNewMexico |
2015-09-30 | 2.79 | 37943670.34 | organic | WestTexNewMexico |
2015-12-31 | 2.79 | 31047484.27 | organic | WestTexNewMexico |
2016-03-31 | 2.38 | 52288697.89 | organic | WestTexNewMexico |
2016-06-30 | 2.40 | 46324529.70 | organic | WestTexNewMexico |
2016-09-30 | 2.91 | 39993186.04 | organic | WestTexNewMexico |
2016-12-31 | 3.25 | 31621221.90 | organic | WestTexNewMexico |
2017-03-31 | 3.05 | 61034457.10 | organic | WestTexNewMexico |
2017-06-30 | 3.17 | 47293921.60 | organic | WestTexNewMexico |
2017-09-30 | 3.04 | 39367336.18 | organic | WestTexNewMexico |
2017-12-31 | 3.00 | 38267341.61 | organic | WestTexNewMexico |
2018-03-31 | 2.30 | 62505646.52 | organic | WestTexNewMexico |
In [83]:
# the rows that satisfies a given critirea
low_price = avo_df['AveragePrice'].where(avo_df['AveragePrice']<1.2)
low_price
Out[83]:
Date
2015-01-04 NaN
2015-01-04 NaN
2015-01-04 NaN
2015-01-04 NaN
2015-01-04 NaN
...
2018-03-25 NaN
2018-03-25 0.7
2018-03-25 NaN
2018-03-25 NaN
2018-03-25 NaN
Name: AveragePrice, Length: 18249, dtype: float64
In [84]:
low_price = avo_df['AveragePrice'].where(avo_df['AveragePrice']<1.3).resample('1M').count()
low_price
Out[84]:
Date
2015-01-31 288
2015-02-28 156
2015-03-31 271
2015-04-30 198
2015-05-31 271
2015-06-30 201
2015-07-31 224
2015-08-31 254
2015-09-30 202
2015-10-31 217
2015-11-30 284
2015-12-31 226
2016-01-31 240
2016-02-29 314
2016-03-31 270
2016-04-30 276
2016-05-31 330
2016-06-30 240
2016-07-31 248
2016-08-31 185
2016-09-30 175
2016-10-31 155
2016-11-30 109
2016-12-31 233
2017-01-31 306
2017-02-28 269
2017-03-31 180
2017-04-30 187
2017-05-31 135
2017-06-30 140
2017-07-31 124
2017-08-31 55
2017-09-30 21
2017-10-31 41
2017-11-30 117
2017-12-31 232
2018-01-31 177
2018-02-28 189
2018-03-31 196
Freq: M, Name: AveragePrice, dtype: int64
In [87]:
# new features from the datetime information
avo_df.reset_index(inplace=True)
avo_df
Out[87]:
Date | AveragePrice | Total Volume | type | region | |
---|---|---|---|---|---|
0 | 2015-01-04 | 1.75 | 27365.89 | organic | Southeast |
1 | 2015-01-04 | 1.49 | 17723.17 | organic | Chicago |
2 | 2015-01-04 | 1.68 | 2896.72 | organic | HarrisburgScranton |
3 | 2015-01-04 | 1.52 | 54956.80 | conventional | Pittsburgh |
4 | 2015-01-04 | 1.64 | 1505.12 | organic | Boise |
... | ... | ... | ... | ... | ... |
18244 | 2018-03-25 | 1.36 | 908202.13 | conventional | Chicago |
18245 | 2018-03-25 | 0.70 | 9010588.32 | conventional | SouthCentral |
18246 | 2018-03-25 | 1.42 | 163496.70 | organic | SouthCentral |
18247 | 2018-03-25 | 1.70 | 190257.38 | organic | California |
18248 | 2018-03-25 | 1.34 | 1774776.77 | conventional | NewYork |
18249 rows × 5 columns
In [88]:
avo_df['Day'] = avo_df['Date'].dt.day
avo_df
Out[88]:
Date | AveragePrice | Total Volume | type | region | Day | |
---|---|---|---|---|---|---|
0 | 2015-01-04 | 1.75 | 27365.89 | organic | Southeast | 4 |
1 | 2015-01-04 | 1.49 | 17723.17 | organic | Chicago | 4 |
2 | 2015-01-04 | 1.68 | 2896.72 | organic | HarrisburgScranton | 4 |
3 | 2015-01-04 | 1.52 | 54956.80 | conventional | Pittsburgh | 4 |
4 | 2015-01-04 | 1.64 | 1505.12 | organic | Boise | 4 |
... | ... | ... | ... | ... | ... | ... |
18244 | 2018-03-25 | 1.36 | 908202.13 | conventional | Chicago | 25 |
18245 | 2018-03-25 | 0.70 | 9010588.32 | conventional | SouthCentral | 25 |
18246 | 2018-03-25 | 1.42 | 163496.70 | organic | SouthCentral | 25 |
18247 | 2018-03-25 | 1.70 | 190257.38 | organic | California | 25 |
18248 | 2018-03-25 | 1.34 | 1774776.77 | conventional | NewYork | 25 |
18249 rows × 6 columns
In [89]:
avo_df['Month'] = avo_df['Date'].dt.month
avo_df
Out[89]:
Date | AveragePrice | Total Volume | type | region | Day | Month | |
---|---|---|---|---|---|---|---|
0 | 2015-01-04 | 1.75 | 27365.89 | organic | Southeast | 4 | 1 |
1 | 2015-01-04 | 1.49 | 17723.17 | organic | Chicago | 4 | 1 |
2 | 2015-01-04 | 1.68 | 2896.72 | organic | HarrisburgScranton | 4 | 1 |
3 | 2015-01-04 | 1.52 | 54956.80 | conventional | Pittsburgh | 4 | 1 |
4 | 2015-01-04 | 1.64 | 1505.12 | organic | Boise | 4 | 1 |
... | ... | ... | ... | ... | ... | ... | ... |
18244 | 2018-03-25 | 1.36 | 908202.13 | conventional | Chicago | 25 | 3 |
18245 | 2018-03-25 | 0.70 | 9010588.32 | conventional | SouthCentral | 25 | 3 |
18246 | 2018-03-25 | 1.42 | 163496.70 | organic | SouthCentral | 25 | 3 |
18247 | 2018-03-25 | 1.70 | 190257.38 | organic | California | 25 | 3 |
18248 | 2018-03-25 | 1.34 | 1774776.77 | conventional | NewYork | 25 | 3 |
18249 rows × 7 columns
In [90]:
avo_df['Year'] = avo_df['Date'].dt.year
avo_df
Out[90]:
Date | AveragePrice | Total Volume | type | region | Day | Month | Year | |
---|---|---|---|---|---|---|---|---|
0 | 2015-01-04 | 1.75 | 27365.89 | organic | Southeast | 4 | 1 | 2015 |
1 | 2015-01-04 | 1.49 | 17723.17 | organic | Chicago | 4 | 1 | 2015 |
2 | 2015-01-04 | 1.68 | 2896.72 | organic | HarrisburgScranton | 4 | 1 | 2015 |
3 | 2015-01-04 | 1.52 | 54956.80 | conventional | Pittsburgh | 4 | 1 | 2015 |
4 | 2015-01-04 | 1.64 | 1505.12 | organic | Boise | 4 | 1 | 2015 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
18244 | 2018-03-25 | 1.36 | 908202.13 | conventional | Chicago | 25 | 3 | 2018 |
18245 | 2018-03-25 | 0.70 | 9010588.32 | conventional | SouthCentral | 25 | 3 | 2018 |
18246 | 2018-03-25 | 1.42 | 163496.70 | organic | SouthCentral | 25 | 3 | 2018 |
18247 | 2018-03-25 | 1.70 | 190257.38 | organic | California | 25 | 3 | 2018 |
18248 | 2018-03-25 | 1.34 | 1774776.77 | conventional | NewYork | 25 | 3 | 2018 |
18249 rows × 8 columns
In [91]:
avo_df.set_index(keys=['Date'], inplace=True); avo_df
Out[91]:
AveragePrice | Total Volume | type | region | Day | Month | Year | |
---|---|---|---|---|---|---|---|
Date | |||||||
2015-01-04 | 1.75 | 27365.89 | organic | Southeast | 4 | 1 | 2015 |
2015-01-04 | 1.49 | 17723.17 | organic | Chicago | 4 | 1 | 2015 |
2015-01-04 | 1.68 | 2896.72 | organic | HarrisburgScranton | 4 | 1 | 2015 |
2015-01-04 | 1.52 | 54956.80 | conventional | Pittsburgh | 4 | 1 | 2015 |
2015-01-04 | 1.64 | 1505.12 | organic | Boise | 4 | 1 | 2015 |
... | ... | ... | ... | ... | ... | ... | ... |
2018-03-25 | 1.36 | 908202.13 | conventional | Chicago | 25 | 3 | 2018 |
2018-03-25 | 0.70 | 9010588.32 | conventional | SouthCentral | 25 | 3 | 2018 |
2018-03-25 | 1.42 | 163496.70 | organic | SouthCentral | 25 | 3 | 2018 |
2018-03-25 | 1.70 | 190257.38 | organic | California | 25 | 3 | 2018 |
2018-03-25 | 1.34 | 1774776.77 | conventional | NewYork | 25 | 3 | 2018 |
18249 rows × 7 columns
728x90
'Data Analytics with python > [Data Analysis]' 카테고리의 다른 글
[Text]S8_01_upper_lower (0) | 2023.01.21 |
---|---|
[datetime]S7_05_Practical_example3 (0) | 2023.01.21 |
[datetime]S7_03_Practical_example1 (0) | 2023.01.21 |
[datetime]S7_02_Timestamp (0) | 2023.01.21 |
[datetime]S7_01_datetime (0) | 2023.01.21 |
댓글