All categories (279)

[Pandas][DataFrame]S2_04_index_setting Data source: https://www.kaggle.com/datasets/mathchi/churn-for-bank-customers?resource=download In [1]: import pandas as pd In [7]: bank_df = pd.read_csv('bank customers.csv'); bank_df Out[7]: RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited 0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1 1 2.. 2023. 1. 21.
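The excerpt above is cut off before any index is actually set, so the following is only a minimal sketch of what index setting on this kind of table could look like. The three rows reuse values visible in the neighbouring excerpts; choosing `CustomerId` as the index is an assumption made for illustration, not necessarily what the post does.

```python
import pandas as pd

# Three rows copied from the visible excerpt output; the remaining columns are omitted.
bank_df = pd.DataFrame({
    "RowNumber": [1, 2, 3],
    "CustomerId": [15634602, 15647311, 15619304],
    "Surname": ["Hargrave", "Hill", "Onio"],
    "CreditScore": [619, 608, 502],
})

# set_index returns a new DataFrame; bank_df keeps its default 0..n-1 index.
# Using CustomerId as the index is an illustrative choice.
by_customer = bank_df.set_index("CustomerId")

# Label-based lookup now goes through CustomerId values.
hill_score = by_customer.loc[15647311, "CreditScore"]

# reset_index moves the index back into an ordinary column.
restored = by_customer.reset_index()
```

When the data comes from a file, passing `index_col` to `pd.read_csv` is another way to get the same result in one step.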
[Pandas][DataFrame]S2_03_Outputs In [1]: import pandas as pd In [2]: bank_df = pd.read_csv('bank customers.csv'); bank_df Out[2]: RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited 0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1 1 2 15647311 Hill 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0 2 3 15619304 Onio 502 Franc.. 2023. 1. 21.
[Pandas][DataFrame]S2_02_Inputs Data source: https://www.kaggle.com/datasets/mathchi/churn-for-bank-customers In [1]: import pandas as pd In [2]: bank_df = pd.read_csv('bank customers.csv'); bank_df Out[2]: RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited 0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1 1 2 15647311 Hill 608.. 2023. 1. 21.
[Pandas][DataFrame]S2_01_DataFrame In [1]: import pandas as pd In [3]: # data client_df = pd.DataFrame({'Client ID':[111, 112, 113, 114], 'Client Name':['Michael','Donald','John','Matthew'], 'Net Worth[$]': [3000, 40000, 100000, 15000], 'Years': [5, 9, 10, 12]}) client_df Out[3]: Client ID Client Name Net Worth[$] Years 0 111 Michael 3000 5 1 112 Donald 40000 9 2 113 John 100000 10 3 114 Matthew 15000 12 In [4]: # the data type t.. 2023. 1. 17.
[Pandas][Series]S1_12_Slicing In [2]: import pandas as pd In [3]: prices = pd.read_csv('/content/sample_data/prices.csv', squeeze=True); prices Out[3]: 0 2.55 1 3.39 2 2.75 3 3.39 4 3.39 ... 541905 2.10 541906 4.15 541907 4.15 541908 4.95 541909 18.00 Name: Price, Length: 541910, dtype: float64 In [4]: # Slice elements from a Pandas Series # starting from index 0 up until and not including stop index prices[0:5] Out[4]: 0 2... 2023. 1. 17.
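As a small illustration of the "up until and not including stop index" comment above, the sketch below contrasts positional and label-based slicing. The short Series is a stand-in for the full `prices.csv` data, which is not available here; its values are taken from the excerpt output.

```python
import pandas as pd

# Stand-in for the full Series; values come from the excerpt output.
prices = pd.Series([2.55, 3.39, 2.75, 3.39, 3.39, 2.10], name="Price")

# Plain slicing is positional: positions 0..4, the stop position is excluded.
first_five = prices[0:5]

# .iloc makes the positional intent explicit and gives the same result.
same_by_position = prices.iloc[0:5]

# .loc slices by label and includes the stop label, so this returns 6 elements.
by_label = prices.loc[0:5]
```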
[Pandas][Series]S1_11_Indexing In [1]: import pandas as pd In [6]: prices = pd.read_csv('/content/sample_data/prices.csv', squeeze=True); prices Out[6]: 0 2.55 1 3.39 2 2.75 3 3.39 4 3.39 ... 541905 2.10 541906 4.15 541907 4.15 541908 4.95 541909 18.00 Name: Price, Length: 541910, dtype: float64 In [3]: # the first element in a Pandas Series # ★ index starts from zero! prices[0] Out[3]: 2.55 In [12]: # the fifth element in a .. 2023. 1. 17.
[pandas][Series]S1_10_Checking element In [1]: import pandas as pd In [2]: prices = pd.read_csv('/content/sample_data/prices.csv', squeeze=True) In [6]: # Check if a given number exists in a pandas Series values 5.79 in prices.values Out[6]: True In [7]: # Check if a given number exists in a pandas Series index 5.79 in prices.index Out[7]: False In [8]: # 'in' will search in pandas index by default 5.79 in prices Out[8]: False 2023. 1. 17.
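The membership checks above can be reproduced on a tiny Series. This is a sketch with made-up data, chosen so that 5.79 is a value but not an index label; the `eq(...).any()` line is an additional variant that does not appear in the post.

```python
import pandas as pd

# Made-up three-element Series with the default 0, 1, 2 index.
prices = pd.Series([2.55, 3.39, 5.79], name="Price")

in_values = 5.79 in prices.values   # searches the underlying array of values
in_index = 5.79 in prices.index     # searches the index labels 0, 1, 2
in_series = 5.79 in prices          # 'in' on a Series also searches the index
label_hit = 2 in prices             # 2 is an index label, so this matches

# A vectorised alternative that tests the values directly.
any_match = bool(prices.eq(5.79).any())
```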
[pandas][Series] S1_09_Math Operations In [ ]: import pandas as pd In [ ]: prices = pd.read_csv('/content/sample_data/prices.csv', squeeze=True) In [ ]: # Apply Sum prices.sum() Out[ ]: 2498821.9739999995 In [ ]: # Apply count prices.count() Out[ ]: 541910 In [ ]: # the maximum value prices.max() Out[ ]: 38970.0 In [ ]: # the minimum value prices.min() Out[ ]: -11062.06 In [ ]: # all statistical information prices.describe() Out[ ]: .. 2023. 1. 17.
[Pandas][Series] S1_08_Sorting In [1]: import pandas as pd In [2]: prices = pd.read_csv('/content/sample_data/prices.csv', squeeze=True) In [3]: # sort the values: returns a sorted copy, nothing is changed in memory # default: ascending order prices.sort_values() Out[3]: 299984 -11062.06 299983 -11062.06 40984 0.00 345010 0.00 345008 0.00 ... 16356 13541.33 43703 16453.71 43702 16888.02 524602 17836.46 222681 38970.00 Name: Price, Length: 541910, dtype: fl.. 2023. 1. 17.
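To make the "nothing changed in memory" remark concrete, here is a minimal sketch on a four-element stand-in Series (the extreme values are borrowed from the output above; the rest is illustrative).

```python
import pandas as pd

# Small stand-in for the full price Series.
prices = pd.Series([2.55, -11062.06, 38970.0, 0.0], name="Price")

# sort_values returns a new, sorted Series; the default order is ascending.
ascending = prices.sort_values()

# Pass ascending=False for descending order.
descending = prices.sort_values(ascending=False)

# The original Series keeps its order, and the sorted copies keep the original labels.
original_first = prices.iloc[0]
```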
[Pandas][Series] S1_07_Built-in functions In [1]: import pandas as pd functions https://docs.python.org/3/library/functions.html In [3]: prices = pd.read_csv('/content/sample_data/prices.csv', squeeze=True) In [7]: prices Out[7]: 0 2.55 1 3.39 2 2.75 3 3.39 4 3.39 ... 541905 2.10 541906 4.15 541907 4.15 541908 4.95 541909 18.00 Name: Price, Length: 541910, dtype: float64 In [4]: # the length of the Series len(prices) Out[4]: 541910 In [.. 2023. 1. 17.
[Pandas][Series] S1_06_Reading data (CSV) In [1]: import pandas as pd In [9]: prices = pd.read_csv('/content/sample_data/prices.csv', squeeze=True) squeeze reduces (compresses) dimensions. For example, squeezing a DataFrame that has only one row or one column yields a Series. Squeezing a Series with a single index entry yields a scalar. Likewise, squeezing a DataFrame with exactly one row and one column yields a scalar. In [10]: # automatic formatting - long output is abbreviated automatically prices Out[10]: 0 2.55 1 3.39 2 2.75 3 3.39 4 3.39 ... 366514 2.10 3665.. 2023. 1. 17.
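The squeeze behaviour described above can be sketched without the original file. To the best of my knowledge the `squeeze` keyword of `pd.read_csv` was deprecated in pandas 1.4 and removed in 2.0, so this sketch calls the `.squeeze()` method on the result instead; the inline CSV text is a made-up stand-in for `prices.csv`.

```python
import io

import pandas as pd

# Made-up single-column CSV standing in for prices.csv.
csv_text = "Price\n2.55\n3.39\n2.75\n"

# Read as a DataFrame first, then squeeze the single column into a Series.
df = pd.read_csv(io.StringIO(csv_text))
prices = df.squeeze("columns")

# A DataFrame with exactly one row and one column squeezes to a scalar.
one_cell = pd.DataFrame({"Price": [2.55]}).squeeze()

# A Series with a single element also squeezes to a scalar.
one_element = pd.Series([2.55]).squeeze()
```

Passing `"columns"` keeps the result a Series even when the file happens to contain a single row, which a bare `.squeeze()` would turn into a scalar.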
[Pandas][Series] S1_05_Methods: basic functions In [1]: import pandas as pd In [2]: # examples s1 = pd.Series(data = [100, 300, 500, 1000, 1500]); s1 Out[2]: 0 100 1 300 2 500 3 1000 4 1500 dtype: int64 In [3]: # sum s1.sum() Out[3]: 3400 In [4]: # multiplication s1.product() Out[4]: 22500000000000 In [5]: # average s1.mean() Out[5]: 680.0 In [6]: # show the first couple of elements s1.head(2) Out[6]: 0 100 1 300 dtype: int64 In [9]: # create.. 2023. 1. 17.
Adjusting image size: find the "style" part of the image tag and add style="width: 700px; height: 700px;" just before the closing /> to resize the image. 2023. 1. 17.
[Pandas][Series] S1_04_Attributes: basic attributes In [1]: import pandas as pd Attributes / Properties: do not use parentheses "()" ① use parentheses in Methods: "()" is used for calls that can take arguments and operate on or alter the Series object. ex) data.head() ② use square brackets in Indexers: "[]" is used to access specific elements inside a Series or DataFrame. ex) data.loc[], data.iloc[] In [2]: # example1 list1 = ['NVDA','MSFT','META','AMZN','GOOGL'] s1 = pd.Series(data = list1); s1 Out[2]: 0 NVDA 1 MSFT.. 2023. 1. 17.
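The three access styles listed above (attributes without parentheses, methods with parentheses, indexers with square brackets) can be put side by side. This is a short sketch on the ticker list from the excerpt; the particular attributes and methods shown are my own selection and may differ from the ones the post goes on to cover.

```python
import pandas as pd

list1 = ['NVDA', 'MSFT', 'META', 'AMZN', 'GOOGL']
s1 = pd.Series(data=list1)

# Attributes: no parentheses, they only report information about the Series.
n_elements = s1.size
shape = s1.shape

# Methods: called with parentheses and optional arguments; head returns a new Series.
first_two = s1.head(2)

# Indexers: square brackets select specific elements.
by_label = s1.loc[0]       # label-based access
by_position = s1.iloc[-1]  # position-based access, here the last element
```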
[Pandas][Series]S1_03_A Dictionary: defining a dictionary In [1]: import pandas as pd A Dictionary: A collection of key-value pairs In [2]: dict1 = {'Client ID' : 101, 'Client Name' : 'David', 'Net worth [$]' : 1500, 'Years' : 12} In [3]: # Show dict1 Out[3]: {'Client ID': 101, 'Client Name': 'David', 'Net worth [$]': 1500, 'Years': 12} In [4]: # datatype type(dict1) Out[4]: dict In [5]: s1 = pd.Series(dict1); s1 Out[5]: Client ID 101 Client Name David.. 2023. 1. 17.
[Pandas][Series] S1_02_custom_index: user-defined index In [1]: import pandas as pd In [3]: # 5 stocks list1 = ['NVDA','MSFT','META','AMZN','GOOGL']; list1 Out[3]: ['NVDA', 'MSFT', 'META', 'AMZN', 'GOOGL'] In [4]: # index labels1 = ['#1','#2','#3','#4','#5']; labels1 Out[4]: ['#1', '#2', '#3', '#4', '#5'] In [6]: s1 = pd.Series(data = list1, index = labels1); s1 Out[6]: #1 NVDA #2 MSFT #3 META #4 AMZN #5 GOOGL dtype: object In [8]: # datatype type(s1.. 2023. 1. 17.
[Pandas][Series] S1_01_Numeric Default Index: default index In [1]: import pandas as pd In [5]: # example : Enterprise list1 = ['Nvidia','Microsoft','META','Amazon','Alphabet'] In [6]: # confirming the Datatype type(list1) Out[6]: list https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html In [8]: # pandas.Series # numeric index has been automatically generated s1 = pd.Series(data = list1); s1 Out[8]: 0 Nvidia 1 Microsoft 2 FaceBoo.. 2023. 1. 17.
[Practice] Insurance premium prediction (insurance) Key points: EDA / missing-value visualization / Regression 01. Data setup https://www.kaggle.com/datasets/simranjain17/insurance In [30]: # required libraries import pandas as pd import seaborn as sns import matplotlib.pyplot as plt import missingno import numpy as np from sklearn.preprocessing import OneHotEncoder from sklearn.preprocessing import LabelEncoder from sklearn.preprocessing import MinMaxScaler from sklearn.preproces.. 2023. 1. 17.
[Practical exam] Big Data Analysis Engineer practice: using the job_change data (with Kaggle) / Classification data : https://www.kaggle.com/datasets/arashnic/hr-analytics-job-change-of-data-scientists?resource=download Hands-on task example / Problem: Classification / Metric: AUC In [60]: import pandas as pd import numpy as np from sklearn.model_selection import train_test_split from sklearn.preprocessing import OneHotEncoder, StandardScaler from sklearn.ensemble import RandomForestClassifier as RFC from xgboost import XGBCl.. 2023. 1. 16.
[Practical exam] Big Data Analysis Engineer practical exam workflow (reference) Check the provided packages import pkg_resources import pandas Output = pandas.DataFrame(sorted([(i.key, i.version) for i in pkg_resources.working_set])) print(Output) How to use help (after importing): ① # import sklearn # print(sklearn.__all__) ② # from sklearn import linear_model # print(dir(linear_model)) ③ # from sklearn.linear_model import LinearRegression # help(LinearRegression) 1. Load the data 2. Data set .. 2023. 1. 16.
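The package check above relies on `pkg_resources`, which setuptools now treats as deprecated. The sketch below is a possible alternative using the standard library's `importlib.metadata` instead; it is a swapped-in technique, not the post's own code, and the column names `package` and `version` are chosen here for illustration. Whether it is usable in the exam environment depends on the Python version provided there.

```python
from importlib.metadata import distributions

import pandas as pd

# Collect (name, version) pairs for every installed distribution.
# A set removes duplicates that can appear when a package is visible on several paths.
pairs = {
    (dist.metadata["Name"], dist.version)
    for dist in distributions()
    if dist.metadata["Name"]  # skip entries with broken or missing metadata
}

# Sort case-insensitively by package name.
rows = sorted(pairs, key=lambda row: row[0].lower())

# Column names are illustrative.
installed = pd.DataFrame(rows, columns=["package", "version"])
print(installed.to_string(index=False))
```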