In [1]:
import pandas as pd
In [2]:
sales_df = pd.read_csv('Online Retail.csv', encoding='unicode_escape')
In [12]:
sales_df.head()
Out[12]:
| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
In [13]:
# Split the rows into one group per country, then compute the mean unit price within each group
sales_df.groupby('Country')['UnitPrice'].mean()
Out[13]:
Country
Australia 3.220612
Austria 4.243192
Bahrain 4.556316
Belgium 3.644335
Brazil 4.456250
Canada 6.030331
Channel Islands 4.932124
Cyprus 6.302363
Czech Republic 2.938333
Denmark 3.256941
EIRE 5.911077
European Community 4.820492
Finland 5.448705
France 5.028864
Germany 3.966930
Greece 4.885548
Hong Kong 42.505208
Iceland 2.644011
Israel 3.633131
Italy 4.831121
Japan 2.276145
Lebanon 5.387556
Lithuania 2.841143
Malta 5.244173
Netherlands 2.738317
Norway 6.012026
Poland 4.170880
Portugal 8.582976
RSA 4.277586
Saudi Arabia 2.411000
Singapore 109.645808
Spain 4.987544
Sweden 3.910887
Switzerland 3.403442
USA 2.216426
United Arab Emirates 3.380735
United Kingdom 4.532422
Unspecified 2.699574
Name: UnitPrice, dtype: float64
In [14]:
sales_df.groupby('Country')['UnitPrice'].min()
Out[14]:
Country
Australia 0.00
Austria 0.12
Bahrain 1.25
Belgium 0.12
Brazil 0.85
Canada 0.10
Channel Islands 0.19
Cyprus 0.12
Czech Republic 0.29
Denmark 0.21
EIRE 0.00
European Community 0.55
Finland 0.12
France 0.00
Germany 0.00
Greece 0.14
Hong Kong 0.21
Iceland 0.25
Israel 0.06
Italy 0.12
Japan 0.21
Lebanon 0.55
Lithuania 1.25
Malta 0.19
Netherlands 0.00
Norway 0.00
Poland 0.19
Portugal 0.12
RSA 0.00
Saudi Arabia 0.42
Singapore 0.19
Spain 0.00
Sweden 0.19
Switzerland 0.00
USA 0.42
United Arab Emirates 0.29
United Kingdom -11062.06
Unspecified 0.19
Name: UnitPrice, dtype: float64
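The United Kingdom minimum of -11062.06 and the 0.00 minimums in several countries are not plausible prices for ordinary sales. What those rows represent is not stated here, so treating them as rows to exclude is an assumption. Under that assumption, a minimal sketch of filtering before grouping could look like this; `toy_df` is a small made-up stand-in for `sales_df`, not the real data.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from 'Online Retail.csv'.
toy_df = pd.DataFrame({
    'Country': ['United Kingdom', 'United Kingdom', 'United Kingdom', 'France', 'France'],
    'UnitPrice': [2.55, -11062.06, 3.39, 0.00, 1.25],
})

# Assumption: only strictly positive unit prices count as ordinary sales.
positive_df = toy_df[toy_df['UnitPrice'] > 0]

# Minimum unit price per country, computed on the filtered rows.
min_by_country = positive_df.groupby('Country')['UnitPrice'].min()
print(min_by_country)
```

The same filter applied to `sales_df` would change the mean and max results as well, so it is worth deciding on it before comparing countries.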
In [15]:
sales_df.groupby('Country')['UnitPrice'].max()
Out[15]:
Country
Australia 350.00
Austria 40.00
Bahrain 12.75
Belgium 39.95
Brazil 10.95
Canada 550.94
Channel Islands 293.00
Cyprus 320.69
Czech Republic 40.00
Denmark 18.00
EIRE 1917.00
European Community 18.00
Finland 275.60
France 4161.06
Germany 599.50
Greece 50.00
Hong Kong 2653.95
Iceland 12.75
Israel 125.00
Italy 300.00
Japan 45.57
Lebanon 14.95
Lithuania 5.95
Malta 65.00
Netherlands 206.40
Norway 700.00
Poland 40.00
Portugal 1241.98
RSA 14.95
Saudi Arabia 5.49
Singapore 3949.32
Spain 1715.85
Sweden 40.00
Switzerland 40.00
USA 16.95
United Arab Emirates 37.50
United Kingdom 38970.00
Unspecified 16.95
Name: UnitPrice, dtype: float64
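The three separate calls above (mean, min, max) can also be computed in one pass with `agg`, which returns a DataFrame with one column per statistic. The sketch below uses a small made-up `toy_df` in place of `sales_df` so that it runs on its own.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from 'Online Retail.csv'.
toy_df = pd.DataFrame({
    'Country': ['Japan', 'Japan', 'Malta', 'Malta', 'Malta'],
    'UnitPrice': [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One groupby, three statistics: the result has columns 'mean', 'min', 'max'.
price_stats = toy_df.groupby('Country')['UnitPrice'].agg(['mean', 'min', 'max'])
print(price_stats)
```

Having the statistics side by side makes outliers easier to spot, for example a country whose max is far above its mean.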
In [18]:
sales_df.groupby('InvoiceDate')['UnitPrice'].mean()
Out[18]:
InvoiceDate
2010-12-01 08:26:00 3.910000
2010-12-01 08:28:00 1.850000
2010-12-01 08:34:00 4.833750
2010-12-01 08:35:00 5.950000
2010-12-01 08:45:00 2.764500
...
2011-12-09 12:23:00 1.650000
2011-12-09 12:25:00 1.285000
2011-12-09 12:31:00 1.799048
2011-12-09 12:49:00 5.057500
2011-12-09 12:50:00 2.966667
Name: UnitPrice, Length: 23260, dtype: float64
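Grouping on the raw `InvoiceDate` column produces one group per distinct timestamp (23260 of them above), which is rarely the granularity wanted. A rough sketch of grouping by calendar day instead follows. It assumes `InvoiceDate` was loaded as plain strings, which is what `read_csv` does when `parse_dates` is not given, and it uses a made-up `toy_df` rather than the real file.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from 'Online Retail.csv'.
toy_df = pd.DataFrame({
    'InvoiceDate': ['2010-12-01 08:26:00', '2010-12-01 08:28:00', '2010-12-02 09:00:00'],
    'UnitPrice': [2.0, 4.0, 5.0],
})

# Convert the string column to real timestamps.
toy_df['InvoiceDate'] = pd.to_datetime(toy_df['InvoiceDate'])

# Group by the calendar date part of each timestamp, dropping the time of day.
daily_mean = toy_df.groupby(toy_df['InvoiceDate'].dt.date)['UnitPrice'].mean()
print(daily_mean)
```

Passing a Series as the grouping key, as done here, avoids adding a helper column to the frame.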
In [19]:
sales_df.groupby(['Country','InvoiceDate'])['UnitPrice'].mean()
Out[19]:
Country InvoiceDate
Australia 2010-12-01 10:03:00 5.278571
2010-12-08 09:53:00 2.726250
2010-12-14 11:12:00 4.283333
2010-12-17 14:10:00 3.510000
2011-01-06 11:12:00 1.871304
...
Unspecified 2011-08-22 10:18:00 2.381429
2011-08-22 13:32:00 8.115000
2011-09-02 12:17:00 1.642879
2011-11-16 10:18:00 2.339474
2011-11-24 14:55:00 2.107353
Name: UnitPrice, Length: 23616, dtype: float64
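Grouping by two columns returns a Series with a two-level MultiIndex (`Country`, then `InvoiceDate`). The sketch below shows three common ways to pull values back out of such a result, using a made-up `toy_df` in place of `sales_df`.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from 'Online Retail.csv'.
toy_df = pd.DataFrame({
    'Country': ['Australia', 'Australia', 'Australia', 'France'],
    'InvoiceDate': ['2010-12-01 10:03:00', '2010-12-01 10:03:00',
                    '2010-12-08 09:53:00', '2010-12-01 08:45:00'],
    'UnitPrice': [1.0, 3.0, 5.0, 7.0],
})

mean_by_country_date = toy_df.groupby(['Country', 'InvoiceDate'])['UnitPrice'].mean()

# Outer level only: every timestamp for one country.
australia = mean_by_country_date.loc['Australia']

# Both levels: one exact (country, timestamp) pair, returned as a scalar.
one_value = mean_by_country_date.loc[('Australia', '2010-12-01 10:03:00')]

# Inner level only: every country that has this timestamp.
at_time = mean_by_country_date.xs('2010-12-01 10:03:00', level='InvoiceDate')

print(australia)
print(one_value)
print(at_time)
```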
In [28]:
sales_df.groupby(['InvoiceDate'])['UnitPrice'].min()['2010-12-01 08:34:00']
Out[28]:
1.65
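When only one timestamp is of interest, computing the minimum for every group and then indexing into the result does more work than needed. As a sketch, filtering the rows first gives the same value; `toy_df` and `target` below are made-up names for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from 'Online Retail.csv'.
toy_df = pd.DataFrame({
    'InvoiceDate': ['2010-12-01 08:34:00', '2010-12-01 08:34:00', '2010-12-01 08:35:00'],
    'UnitPrice': [1.65, 4.25, 5.95],
})

target = '2010-12-01 08:34:00'

# Approach used above: aggregate every group, then select one label.
via_groupby = toy_df.groupby('InvoiceDate')['UnitPrice'].min().loc[target]

# Alternative: select the matching rows first, then aggregate once.
via_filter = toy_df.loc[toy_df['InvoiceDate'] == target, 'UnitPrice'].min()

print(via_groupby, via_filter)
```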