In [11]:
import os
import pandas as pd
In [12]:
# Move to the folder that holds 'Groceries data.csv' (machine-specific path)
os.chdir('C:/Users/KANG/Downloads')
os.getcwd()
Out[12]:
'C:\\Users\\KANG\\Downloads'
In [13]:
groceries_df = pd.read_csv('./Groceries data.csv')
groceries_df.head()
Out[13]:
| | Member_number | Date | itemDescription | year | month | day | day_of_week |
---|---|---|---|---|---|---|---|
0 | 1808 | 2015-07-21 | tropical fruit | 2015 | 7 | 21 | 1 |
1 | 2552 | 2015-05-01 | whole milk | 2015 | 5 | 1 | 4 |
2 | 2300 | 2015-09-19 | pip fruit | 2015 | 9 | 19 | 5 |
3 | 1187 | 2015-12-12 | other vegetables | 2015 | 12 | 12 | 5 |
4 | 3037 | 2015-01-02 | whole milk | 2015 | 1 | 2 | 4 |
In [14]:
temp_df = groceries_df.set_index('Member_number')
temp_keys = temp_df.index.drop_duplicates()
temp_keys
Out[14]:
Int64Index([1808, 2552, 2300, 1187, 3037, 4941, 4501, 3803, 2762, 4119,
...
4639, 2456, 1221, 3431, 3080, 4590, 4703, 3607, 4587, 2417],
dtype='int64', name='Member_number', length=3898)
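The cell above collects the distinct member IDs by setting `Member_number` as the index and dropping duplicate labels. A minimal sketch on hypothetical toy data (not the real CSV) shows two equivalent shortcuts on the column itself: `unique()` keeps the same first-appearance order, and `nunique()` returns just the count.

```python
import pandas as pd

# Hypothetical toy data: a few Member_number values with repeats
groceries_df = pd.DataFrame({'Member_number': [1808, 2552, 1808, 2300, 2552]})

# Same approach as the cell above: unique index labels, first-appearance order kept
temp_keys = groceries_df.set_index('Member_number').index.drop_duplicates()

# Equivalent without set_index: unique() returns a NumPy array in first-appearance order
unique_ids = groceries_df['Member_number'].unique()

# When only the number of distinct members is needed
n_members = groceries_df['Member_number'].nunique()

print(list(temp_keys), list(unique_ids), n_members)
```

On the full data, `nunique()` should give the same count as the `length=3898` reported above.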
In [15]:
# Check the items purchased by one ID (customer 2552)
temp_df.loc[temp_keys[1], 'itemDescription'].values
Out[15]:
array(['whole milk', 'butter', 'female sanitary products', 'pot plants',
'other vegetables', 'tropical fruit', 'root vegetables',
'whole milk', 'shopping bags', 'chocolate', 'chocolate', 'coffee',
'hygiene articles'], dtype=object)
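The same per-customer lookup can be done without building an indexed copy first. This is a small sketch on hypothetical toy rows: a boolean mask on `Member_number` selects every purchase of one member, and the mask result is always a Series regardless of how many rows match.

```python
import pandas as pd

# Hypothetical toy rows in the same long format (one row per purchased item)
groceries_df = pd.DataFrame({
    'Member_number': [1808, 2552, 2552, 2300, 2552],
    'itemDescription': ['tropical fruit', 'whole milk', 'butter', 'pip fruit', 'whole milk'],
})

# A boolean mask selects every row of one member without building temp_df first
member_id = 2552
items = groceries_df.loc[groceries_df['Member_number'] == member_id, 'itemDescription'].values
print(items)
```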
In [29]:
from tqdm import tqdm
tqdm.pandas() # Registers tqdm's pandas-specific methods (e.g. progress_apply), which show the progress of apply in real time.
ar_df = groceries_df.groupby(['Member_number']).progress_apply(lambda x: ','.join(x['itemDescription']))
ar_df_final = pd.DataFrame(ar_df).reset_index().rename(columns={0:'item_names'})
ar_df_final
100%|███████████████████████████████████████████████████████████████████████████| 3898/3898 [00:00<00:00, 28420.39it/s]
Out[29]:
| | Member_number | item_names |
---|---|---|
0 | 1000 | soda,canned beer,sausage,sausage,whole milk,wh... |
1 | 1001 | frankfurter,frankfurter,beef,sausage,whole mil... |
2 | 1002 | tropical fruit,butter milk,butter,frozen veget... |
3 | 1003 | sausage,root vegetables,rolls/buns,detergent,f... |
4 | 1004 | other vegetables,pip fruit,root vegetables,can... |
... | ... | ... |
3893 | 4996 | dessert,salty snack,rolls/buns,misc. beverages... |
3894 | 4997 | tropical fruit,white wine,whole milk,curd,grap... |
3895 | 4998 | rolls/buns,curd |
3896 | 4999 | bottled water,butter milk,tropical fruit,berri... |
3897 | 5000 | soda,bottled beer,fruit/vegetable juice,root v... |
3898 rows × 2 columns
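If a progress bar is not needed, the same "one comma-joined string per member" table can be built with a column-level aggregation. The sketch below uses hypothetical toy data; `agg(','.join)` joins each member's items in their original row order, and `reset_index(name='item_names')` replaces the separate `rename(columns={0: 'item_names'})` step.

```python
import pandas as pd

# Hypothetical toy data in the same long format: one row per purchased item
groceries_df = pd.DataFrame({
    'Member_number': [1001, 1000, 1000, 1001, 1000],
    'itemDescription': ['frankfurter', 'soda', 'canned beer', 'beef', 'sausage'],
})

# agg(','.join) concatenates each member's items; groupby sorts members by ID
ar_df_final = (
    groceries_df.groupby('Member_number')['itemDescription']
    .agg(','.join)
    .reset_index(name='item_names')
)
print(ar_df_final)
```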
In [33]:
items_split = ar_df_final.item_names.str.split(',', expand = True)
# By default the result is a Series of lists; with expand=True it becomes a DataFrame with one column per item position.
items_split
Out[33]:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | soda | canned beer | sausage | sausage | whole milk | whole milk | pickled vegetables | misc. beverages | semi-finished bread | hygiene articles | ... | None | None | None | None | None | None | None | None | None | None |
1 | frankfurter | frankfurter | beef | sausage | whole milk | soda | curd | white bread | whole milk | soda | ... | None | None | None | None | None | None | None | None | None | None |
2 | tropical fruit | butter milk | butter | frozen vegetables | sugar | specialty chocolate | whole milk | other vegetables | None | None | ... | None | None | None | None | None | None | None | None | None | None |
3 | sausage | root vegetables | rolls/buns | detergent | frozen meals | rolls/buns | dental care | rolls/buns | None | None | ... | None | None | None | None | None | None | None | None | None | None |
4 | other vegetables | pip fruit | root vegetables | canned beer | rolls/buns | whole milk | other vegetables | hygiene articles | whole milk | whole milk | ... | None | None | None | None | None | None | None | None | None | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3893 | dessert | salty snack | rolls/buns | misc. beverages | bottled beer | tropical fruit | bottled water | decalcifier | semi-finished bread | soda | ... | None | None | None | None | None | None | None | None | None | None |
3894 | tropical fruit | white wine | whole milk | curd | grapes | canned beer | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
3895 | rolls/buns | curd | None | None | None | None | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
3896 | bottled water | butter milk | tropical fruit | berries | berries | other vegetables | semi-finished bread | herbs | whipped/sour cream | other vegetables | ... | None | None | None | None | None | None | None | None | None | None |
3897 | soda | bottled beer | fruit/vegetable juice | root vegetables | other vegetables | onions | semi-finished bread | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
3898 rows × 36 columns
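To make the effect of `expand` concrete, here is a tiny sketch on two hypothetical `item_names` strings. Without `expand`, each element is a Python list; with `expand=True`, the result is a DataFrame whose width equals the longest basket, and shorter baskets are padded with missing values (shown as `None` in the output above).

```python
import pandas as pd

# Hypothetical toy baskets of different lengths
item_names = pd.Series(['soda,canned beer,sausage', 'rolls/buns,curd'])

# Default (expand=False): a Series whose elements are Python lists
as_lists = item_names.str.split(',')

# expand=True: a DataFrame with one column per position, padded with missing values
as_frame = item_names.str.split(',', expand=True)

print(as_lists)
print(as_frame)
```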
In [34]:
items_split.to_csv('market_basket.csv', index=False)
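When `market_basket.csv` is read back, the padding cells are written as empty fields and come back as `NaN`. A minimal sketch, assuming the next step wants one Python list of items per customer (the list-of-lists shape that, for example, mlxtend's `TransactionEncoder` accepts), drops those cells row by row. To stay self-contained it round-trips a hypothetical toy frame through an in-memory buffer instead of the real file.

```python
import io
import pandas as pd

# Hypothetical toy stand-in for items_split (one row per member, padded)
items_split = pd.Series(['soda,canned beer,sausage', 'rolls/buns,curd']).str.split(',', expand=True)

# Round-trip through CSV in memory instead of writing market_basket.csv to disk
buffer = io.StringIO()
items_split.to_csv(buffer, index=False)
buffer.seek(0)
basket = pd.read_csv(buffer)

# Padding cells come back as NaN; drop them so each row becomes a plain item list
transactions = [row.dropna().tolist() for _, row in basket.iterrows()]
print(transactions)
```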