In [59]:

import pandas as pd

In [60]:

employee_df = pd.read_csv('Human_Resources_Employee.csv')
employee_df.head()

Out[60]:

	Age	Attrition	BusinessTravel	DailyRate	Department	DistanceFromHome	Education	EducationField	EmployeeCount	EmployeeNumber	...	RelationshipSatisfaction	StandardHours	StockOptionLevel	TotalWorkingYears	TrainingTimesLastYear	WorkLifeBalance	YearsAtCompany	YearsInCurrentRole	YearsSinceLastPromotion	YearsWithCurrManager
0	41	Yes	Travel_Rarely	1102	Sales	1	2	Life Sciences	1	1.0	...	1	80	0	8	0	1	6	4	0	5
1	49	No	Travel_Frequently	279	Research & Development	8	1	Life Sciences	1	2.0	...	4	80	1	10	3	3	10	7	1	7
2	37	Yes	Travel_Rarely	1373	Research & Development	2	2	Other	1	4.0	...	2	80	0	7	3	3	0	0	0	0
3	33	No	Travel_Frequently	1392	Research & Development	3	4	Life Sciences	1	5.0	...	3	80	0	8	3	3	8	7	3	0
4	27	No	Travel_Rarely	591	Research & Development	2	1	Medical	1	7.0	...	4	80	1	6	3	3	2	2	2	2

5 rows × 35 columns
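Target dtypes can also be requested while the file is parsed, through the `dtype` argument of `pd.read_csv`, instead of calling `astype()` afterwards. The sketch below is only an illustration: it uses an in-memory three-row sample holding the `Age`, `DailyRate` and `RelationshipSatisfaction` values from the first rows of the `head()` output above, rather than the real CSV file.

```python
import io

import pandas as pd

# In-memory stand-in for Human_Resources_Employee.csv (first three rows, three columns)
csv_text = """Age,DailyRate,RelationshipSatisfaction
41,1102,1
49,279,4
37,1373,2
"""

# Apply the target dtypes while parsing; columns not listed keep the inferred dtype
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"DailyRate": "float64", "RelationshipSatisfaction": "category"},
)
print(df.dtypes)
print(df["RelationshipSatisfaction"].cat.categories.tolist())
```

One difference from `astype('category')`: when `read_csv` is given `dtype="category"`, pandas documents that the inferred categories are parsed as strings (`'1'`, `'2'`, `'4'` here), not as numbers.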

In [61]:

employee_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 35 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Age                       1470 non-null   int64  
 1   Attrition                 1470 non-null   object 
 2   BusinessTravel            1470 non-null   object 
 3   DailyRate                 1470 non-null   int64  
 4   Department                1469 non-null   object 
 5   DistanceFromHome          1470 non-null   int64  
 6   Education                 1470 non-null   int64  
 7   EducationField            1469 non-null   object 
 8   EmployeeCount             1470 non-null   int64  
 9   EmployeeNumber            1469 non-null   float64
 10  EnvironmentSatisfaction   1470 non-null   int64  
 11  Gender                    1469 non-null   object 
 12  HourlyRate                1470 non-null   int64  
 13  JobInvolvement            1470 non-null   int64  
 14  JobLevel                  1470 non-null   int64  
 15  JobRole                   1469 non-null   object 
 16  JobSatisfaction           1470 non-null   int64  
 17  MaritalStatus             1469 non-null   object 
 18  MonthlyIncome             1467 non-null   float64
 19  MonthlyRate               1468 non-null   float64
 20  NumCompaniesWorked        1470 non-null   int64  
 21  Over18                    1470 non-null   object 
 22  OverTime                  1470 non-null   object 
 23  PercentSalaryHike         1469 non-null   float64
 24  PerformanceRating         1469 non-null   float64
 25  RelationshipSatisfaction  1470 non-null   int64  
 26  StandardHours             1470 non-null   int64  
 27  StockOptionLevel          1470 non-null   int64  
 28  TotalWorkingYears         1470 non-null   int64  
 29  TrainingTimesLastYear     1470 non-null   int64  
 30  WorkLifeBalance           1470 non-null   int64  
 31  YearsAtCompany            1470 non-null   int64  
 32  YearsInCurrentRole        1470 non-null   int64  
 33  YearsSinceLastPromotion   1470 non-null   int64  
 34  YearsWithCurrManager      1470 non-null   int64  
dtypes: float64(5), int64(21), object(9)
memory usage: 402.1+ KB
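All five `float64` columns in the `info()` output (`EmployeeNumber`, `MonthlyIncome`, `MonthlyRate`, `PercentSalaryHike`, `PerformanceRating`) have fewer than 1470 non-null values, and `EmployeeNumber` appears as `1.0`, `2.0`, ... in `head()`. The likely cause is that NumPy's `int64` cannot hold `NaN`, so pandas stores an integer column containing missing values as `float64`. The toy series below are made-up values that illustrate this, along with pandas' nullable `"Int64"` dtype as one way to keep integers.

```python
import numpy as np
import pandas as pd

# An integer column without gaps stays int64
complete = pd.Series([1, 2, 4, 5])
print(complete.dtype)  # int64

# A single NaN (a float value) turns the whole column into float64
with_gap = pd.Series([1, 2, np.nan, 5])
print(with_gap.dtype)  # float64

# The nullable "Int64" dtype (capital I) keeps integers and marks the gap as <NA>
nullable = with_gap.astype("Int64")
print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 1
```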

In [62]:

# convert the hourly rate from int64 to float64
employee_df['HourlyRate'] = employee_df['HourlyRate'].astype('float64')

In [65]:

employee_df['HourlyRate'].dtype

Out[65]:

dtype('float64')
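`astype()` returns a new object rather than modifying the column in place, which is why the result is assigned back to `employee_df['HourlyRate']`. It also accepts a dict that maps column names to dtypes, so several columns can be converted in one call. The small frame below reuses the `Age` and `DailyRate` values from the first three rows of `head()`. The choice of `int32` for `Age` is only an example of a different target type.

```python
import pandas as pd

# First three rows of two columns from the head() output above
df = pd.DataFrame({"Age": [41, 49, 37], "DailyRate": [1102, 279, 1373]})

# One call, one target dtype per column; df itself is left unchanged
converted = df.astype({"Age": "int32", "DailyRate": "float64"})

print(df.dtypes)         # both columns still int64
print(converted.dtypes)  # Age int32, DailyRate float64
```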

In [69]:

# convert PerformanceRating (float64) and RelationshipSatisfaction (int64) to category
employee_df['PerformanceRating'] = employee_df['PerformanceRating'].astype('category')
employee_df['RelationshipSatisfaction'] = employee_df['RelationshipSatisfaction'].astype('category')

In [67]:

employee_df['PerformanceRating'].dtype

Out[67]:

CategoricalDtype(categories=[3.0, 4.0], ordered=False)
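The categories are `3.0` and `4.0` because `astype('category')` takes them from the existing `float64` values. Internally each row is stored as a small integer code that points into the category list, and a missing value gets the code `-1`. The series below uses made-up ratings with one gap to show this.

```python
import numpy as np
import pandas as pd

# Made-up float64 ratings with one missing value, similar to PerformanceRating
ratings = pd.Series([3.0, 4.0, 3.0, np.nan, 3.0])
cat = ratings.astype("category")

print(cat.cat.categories.tolist())  # [3.0, 4.0]
print(cat.cat.codes.tolist())       # [0, 1, 0, -1, 0]
```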

In [70]:

employee_df['RelationshipSatisfaction'].dtype

Out[70]:

CategoricalDtype(categories=[1, 2, 3, 4], ordered=False)
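Both conversions produce unordered categories (`ordered=False`). If `RelationshipSatisfaction` is a scale on which 1 is the lowest and 4 the highest (an assumption, since the notebook does not say), an ordered `pd.CategoricalDtype` keeps that ranking and allows comparisons and `min()`/`max()`. The sketch uses the five `RelationshipSatisfaction` values from the `head()` output.

```python
import pandas as pd

# RelationshipSatisfaction values of the first five rows shown by head()
scores = pd.Series([1, 4, 2, 3, 4])

# Assumption: 1 < 2 < 3 < 4 reflects increasing satisfaction
satisfaction_type = pd.CategoricalDtype(categories=[1, 2, 3, 4], ordered=True)
ordered = scores.astype(satisfaction_type)

print(ordered.cat.ordered)           # True
print((ordered >= 3).tolist())       # [False, True, False, True, True]
print(ordered.min(), ordered.max())  # 1 4
```

With `ordered=False`, the same `>=` comparison raises a `TypeError`, which is one practical reason to declare the order explicitly.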

Kang's Note

[Pandas][DataFrame]S2_14_change_datatypes
