Link to Dataset: https://www.kaggle.com/datasets/mysarahmadbhat/customersegmentation
In [1]:
import pandas as pd
In [2]:
sales_df = pd.read_csv('Online Retail.csv', encoding='unicode_escape')
In [3]:
sales_df.head()
Out[3]:
 | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
In [4]:
sales_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 InvoiceNo 541909 non-null object
1 StockCode 541909 non-null object
2 Description 540455 non-null object
3 Quantity 541909 non-null int64
4 InvoiceDate 541909 non-null object
5 UnitPrice 541909 non-null float64
6 CustomerID 406829 non-null float64
7 Country 541909 non-null object
dtypes: float64(2), int64(1), object(5)
memory usage: 33.1+ MB
In [5]:
# Convert Invoice date to datetime format
sales_df['InvoiceDate'] = pd.to_datetime(sales_df['InvoiceDate'])
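The conversion above lets pandas infer the date layout. As a minimal sketch, the layout can instead be passed explicitly, which avoids any day/month ambiguity. The `sample` frame is a hypothetical stand-in for `sales_df`, and the month/day/year reading of strings such as `12/1/2010 8:26` is an assumption about this file, not something the notebook confirms.

```python
import pandas as pd

# Hypothetical two-row sample reusing the InvoiceDate string style from head()
sample = pd.DataFrame({'InvoiceDate': ['12/1/2010 8:26', '12/1/2010 8:28']})

# Assumed layout: month/day/year hour:minute
sample['InvoiceDate'] = pd.to_datetime(sample['InvoiceDate'], format='%m/%d/%Y %H:%M')

# The column now holds timestamps, so date parts are available via .dt
print(sample['InvoiceDate'].dt.month.tolist())
print(sample['InvoiceDate'].dt.hour.tolist())
```

If the assumed layout did not match a value, `pd.to_datetime` would raise an error rather than silently guessing, which may be preferable on a file this large.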
In [6]:
# Count the null values in each column
sales_df.isnull().sum()
Out[6]:
InvoiceNo 0
StockCode 0
Description 1454
Quantity 0
InvoiceDate 0
UnitPrice 0
CustomerID 135080
Country 0
dtype: int64
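The counts above show that CustomerID is missing in 135080 of the 541909 rows, roughly a quarter of the data. One possible way to handle this for customer-level work is to drop the rows without a customer; this is only a sketch of that option, not a step the notebook takes. The `sample` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical three-row sample; the second row has no Description or CustomerID
sample = pd.DataFrame({
    'InvoiceNo': ['536365', '536366', '536367'],
    'Description': ['WHITE METAL LANTERN', None, 'CREAM CUPID HEARTS COAT HANGER'],
    'CustomerID': [17850.0, None, 13047.0],
})

# Fraction of missing values per column (isnull().sum() gives the raw counts)
missing_share = sample.isnull().mean()

# Keep only rows that can be attributed to a customer
customers_df = sample.dropna(subset=['CustomerID']).copy()

# With no NaN left, the float IDs can be stored as integers
customers_df['CustomerID'] = customers_df['CustomerID'].astype(int)

print(missing_share)
print(customers_df)
```

Whether dropping is appropriate depends on the analysis: totals per country or per product would still want the rows without a CustomerID.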
In [8]:
# List the unique countries present
sales_df['Country'].unique()
Out[8]:
array(['United Kingdom', 'France', 'Australia', 'Netherlands', 'Germany',
'Norway', 'EIRE', 'Switzerland', 'Spain', 'Poland', 'Portugal',
'Italy', 'Belgium', 'Lithuania', 'Japan', 'Iceland',
'Channel Islands', 'Denmark', 'Cyprus', 'Sweden', 'Austria',
'Israel', 'Finland', 'Bahrain', 'Greece', 'Hong Kong', 'Singapore',
'Lebanon', 'United Arab Emirates', 'Saudi Arabia',
'Czech Republic', 'Canada', 'Unspecified', 'Brazil', 'USA',
'European Community', 'Malta', 'RSA'], dtype=object)
In [10]:
# Count the unique countries
sales_df['Country'].nunique()
Out[10]:
38
In [9]:
sales_df.nunique()
Out[9]:
InvoiceNo 25900
StockCode 4070
Description 4223
Quantity 722
InvoiceDate 23260
UnitPrice 1630
CustomerID 4372
Country 38
dtype: int64
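Note that `nunique()` skips missing values by default, while `len(... .unique())` counts NaN as one more value. The two agree for Country, which has no nulls, but they would differ by one for CustomerID. A small sketch with a made-up Series shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical IDs: two distinct customers plus one missing value
ids = pd.Series([17850.0, 17850.0, 13047.0, np.nan])

n_excluding_nan = ids.nunique()              # NaN ignored (default dropna=True)
n_including_nan = ids.nunique(dropna=False)  # NaN counted as a value
n_via_unique = len(ids.unique())             # unique() keeps NaN in its result

print(n_excluding_nan, n_including_nan, n_via_unique)  # 2 3 3
```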