In [59]:
import pandas as pd
In [60]:
employee_df = pd.read_csv('Human_Resources_Employee.csv')
employee_df.head()
Out[60]:
| | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | ... | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | Yes | Travel_Rarely | 1102 | Sales | 1 | 2 | Life Sciences | 1 | 1.0 | ... | 1 | 80 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
| 1 | 49 | No | Travel_Frequently | 279 | Research & Development | 8 | 1 | Life Sciences | 1 | 2.0 | ... | 4 | 80 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 |
| 2 | 37 | Yes | Travel_Rarely | 1373 | Research & Development | 2 | 2 | Other | 1 | 4.0 | ... | 2 | 80 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 |
| 3 | 33 | No | Travel_Frequently | 1392 | Research & Development | 3 | 4 | Life Sciences | 1 | 5.0 | ... | 3 | 80 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 |
| 4 | 27 | No | Travel_Rarely | 591 | Research & Development | 2 | 1 | Medical | 1 | 7.0 | ... | 4 | 80 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 |
5 rows × 35 columns
In [61]:
employee_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 35 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 1470 non-null int64
1 Attrition 1470 non-null object
2 BusinessTravel 1470 non-null object
3 DailyRate 1470 non-null int64
4 Department 1469 non-null object
5 DistanceFromHome 1470 non-null int64
6 Education 1470 non-null int64
7 EducationField 1469 non-null object
8 EmployeeCount 1470 non-null int64
9 EmployeeNumber 1469 non-null float64
10 EnvironmentSatisfaction 1470 non-null int64
11 Gender 1469 non-null object
12 HourlyRate 1470 non-null int64
13 JobInvolvement 1470 non-null int64
14 JobLevel 1470 non-null int64
15 JobRole 1469 non-null object
16 JobSatisfaction 1470 non-null int64
17 MaritalStatus 1469 non-null object
18 MonthlyIncome 1467 non-null float64
19 MonthlyRate 1468 non-null float64
20 NumCompaniesWorked 1470 non-null int64
21 Over18 1470 non-null object
22 OverTime 1470 non-null object
23 PercentSalaryHike 1469 non-null float64
24 PerformanceRating 1469 non-null float64
25 RelationshipSatisfaction 1470 non-null int64
26 StandardHours 1470 non-null int64
27 StockOptionLevel 1470 non-null int64
28 TotalWorkingYears 1470 non-null int64
29 TrainingTimesLastYear 1470 non-null int64
30 WorkLifeBalance 1470 non-null int64
31 YearsAtCompany 1470 non-null int64
32 YearsInCurrentRole 1470 non-null int64
33 YearsSinceLastPromotion 1470 non-null int64
34 YearsWithCurrManager 1470 non-null int64
dtypes: float64(5), int64(21), object(9)
memory usage: 402.1+ KB
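The `info()` output above shows that several columns have fewer than 1470 non-null values (for example, `MonthlyIncome` has 1467). A minimal sketch of counting the missing entries per column follows. It uses a small made-up frame, `toy_df`, as a hypothetical stand-in for `employee_df`; the values are illustrative and not taken from the CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employee_df; values are made up for illustration.
toy_df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "Department": ["Sales", None, "Research & Development", "Sales"],
    "MonthlyIncome": [5000.0, np.nan, 2000.0, np.nan],
})

# Count missing entries per column, then keep only the columns that have any.
missing_counts = toy_df.isna().sum()
missing_counts = missing_counts[missing_counts > 0]
print(missing_counts)
```

Running the same two lines on `employee_df` should list the columns whose non-null count is below 1470.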
In [62]:
# convert the hourly rate from int64 to float64
employee_df['HourlyRate'] = employee_df['HourlyRate'].astype('float64')
In [65]:
employee_df['HourlyRate'].dtype
Out[65]:
dtype('float64')
In [69]:
# convert PerformanceRating (float64) and RelationshipSatisfaction (int64) to category
employee_df['PerformanceRating'] = employee_df['PerformanceRating'].astype('category')
employee_df['RelationshipSatisfaction'] = employee_df['RelationshipSatisfaction'].astype('category')
In [67]:
employee_df['PerformanceRating'].dtype
Out[67]:
CategoricalDtype(categories=[3.0, 4.0], ordered=False)
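The categories show up as floats (`3.0`, `4.0`) because `PerformanceRating` was read as float64; per the `info()` output it contains one missing value. If integer labels are preferred, one possible approach (not what the notebook does) is to pass through pandas' nullable `Int64` dtype first. The sketch below uses a made-up series, `rating`, as a stand-in for the real column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employee_df['PerformanceRating']:
# float64 with one missing value, made up for illustration.
rating = pd.Series([3.0, 4.0, np.nan, 3.0], name="PerformanceRating")

# Nullable Int64 keeps the missing entry while storing 3.0/4.0 as 3/4,
# so the categories built from it are integers rather than floats.
rating_cat = rating.astype("Int64").astype("category")
print(rating_cat.cat.categories.tolist())
print(rating_cat.isna().sum())
```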
In [70]:
employee_df['RelationshipSatisfaction'].dtype
Out[70]:
CategoricalDtype(categories=[1, 2, 3, 4], ordered=False)
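Both dtypes above report `ordered=False`, so pandas treats the levels as unranked labels. If the 1 to 4 scale is meant to be ordinal (an assumption about the data, not something the notebook states), an ordered `CategoricalDtype` makes comparisons and `min`/`max` meaningful. A minimal sketch on a made-up series:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-in for employee_df['RelationshipSatisfaction'].
satisfaction = pd.Series([1, 4, 2, 3, 4], name="RelationshipSatisfaction")

# Declare the levels explicitly and mark them as ordered (1 < 2 < 3 < 4).
ordered_dtype = CategoricalDtype(categories=[1, 2, 3, 4], ordered=True)
satisfaction_ord = satisfaction.astype(ordered_dtype)

# Ordered categoricals support comparisons against a category value.
high = satisfaction_ord >= 3
print(satisfaction_ord.dtype)
print(high.tolist())
```

With an unordered categorical, the `>=` comparison would raise a `TypeError` instead.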