
All categories (280)

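Choosing a graph type
Line graph ● Well suited to analyzing and presenting trends over time ● Makes changes easy to observe ● Quick to set up and test with data ● One axis can represent a variable value (for example, price) while the other represents time (for example, when visualizing monthly sales).
Bar graph ● Well suited to comparing categories ● A traditional, easy-to-understand graph form ● Bar length represents a value or a percentage ● Simple setup that supports rapid prototyping
Scatter plot ● Shows the relationship between two variables ● Good at showing nonlinear patterns ● Easy to build and visualize, and widely accepted across many industries ● Easy for users to define
Heatmap ● Useful for comparing many items ● Color changes that are easy to understand ● Can guide the viewer to a specific location..
2023. 1. 21.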
[Text]S8_08_Word_Cloud
Data source: https://www.kaggle.com/datasets/PromptCloudHQ/amazon-echo-dot-2-reviews-dataset
In [3]:
import pandas as pd
import string
import nltk  # natural language processing
from nltk.corpus import stopwords
import gensim  # tokenization step of natural language processing
from gensim.utils import simple_preprocess
import matplotlib.pyplot as plt
import seaborn as sns
In [4]:
echo_df = pd.read_csv('Echodot2_Reviews.csv', enc..
2023. 1. 21.
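A word cloud sizes each word by how often it occurs, so the underlying computation is a word-frequency count. The following is a minimal standard-library sketch of that counting step only; the `reviews` list and the `word_frequencies` helper are hypothetical stand-ins, not the post's code, which reads the Echo Dot review CSV with pandas.

```python
import re
from collections import Counter

# Hypothetical sample reviews; the post loads Echodot2_Reviews.csv instead.
reviews = [
    "Love the Echo Dot, the sound is great",
    "Great sound and easy setup",
    "Setup was easy, love it",
]


def word_frequencies(texts):
    """Count how often each lowercase alphabetic word appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


freqs = word_frequencies(reviews)
# The most frequent words are the ones a word cloud would draw largest.
print(freqs.most_common(5))
```

Counts like these are usually computed after stopwords have been removed, otherwise words such as "the" dominate the picture.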
[Text]S8_07_Text_visualization
Data source: https://www.kaggle.com/datasets/PromptCloudHQ/amazon-echo-dot-2-reviews-dataset
In [150]:
import pandas as pd
import string
import nltk  # natural language processing
from nltk.corpus import stopwords
import gensim  # tokenization step of natural language processing
from gensim.utils import simple_preprocess
import matplotlib.pyplot as plt
import seaborn as sns
In [135]:
echo_df = pd.read_csv('Echodot2_Reviews.csv',..
2023. 1. 21.
[Text]S8_06_Text_tokenization
Data source: https://www.kaggle.com/datasets/PromptCloudHQ/amazon-echo-dot-2-reviews-dataset
In [21]:
import pandas as pd
import string
import nltk  # natural language processing
from nltk.corpus import stopwords
import gensim  # natural language processing: tokens
from gensim.utils import simple_preprocess
import matplotlib.pyplot as plt
In [22]:
echo_df = pd.read_csv('Echodot2_Reviews.csv', encoding='utf-8')
echo_df..
2023. 1. 21.
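Tokenization splits a review into lowercase word tokens. The post imports gensim's `simple_preprocess` for this; the sketch below is only a rough regex-based approximation of that idea (lowercase, alphabetic tokens, length between 2 and 15), written with the standard library so it runs without gensim. The `simple_tokenize` name and the sample sentence are hypothetical.

```python
import re


def simple_tokenize(text, min_len=2, max_len=15):
    """Lowercase the text and keep alphabetic tokens within the length bounds.

    This is a stand-in for illustration, not gensim's implementation.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if min_len <= len(t) <= max_len]


tokens = simple_tokenize("I LOVE my Echo Dot! It works great.")
print(tokens)
# ['love', 'my', 'echo', 'dot', 'it', 'works', 'great']
```

The single-letter token "i" is dropped by the minimum-length filter, and punctuation never becomes a token because only runs of letters are matched.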
[Text]S8_05_Text_cleaning(removing_stopwords)
Data source: https://www.kaggle.com/datasets/PromptCloudHQ/amazon-echo-dot-2-reviews-dataset
In [61]:
import pandas as pd
import nltk  # natural language processing
from nltk.corpus import stopwords
import gensim  # natural language processing
from gensim.utils import simple_preprocess
In [62]:
echo_df = pd.read_csv('Echodot2_Reviews.csv', encoding='utf-8')
echo_df.head(2)
Out[62]:
Rating | Review Date | Configuration..
2023. 1. 21.
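Stopword removal filters out very common words that carry little meaning on their own. The post imports nltk's stopword corpus; the sketch below instead uses a tiny hand-written set so that it runs without downloading any corpus. `STOPWORDS` and `remove_stopwords` are hypothetical names chosen for this illustration.

```python
# A tiny hand-written stopword set, for illustration only.
# nltk's English stopword list, which the post imports, is much longer.
STOPWORDS = {"i", "my", "it", "the", "is", "a", "and", "to"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


cleaned = remove_stopwords(["love", "my", "echo", "dot", "it", "works", "great"])
print(cleaned)
# ['love', 'echo', 'dot', 'works', 'great']
```

Using a set keeps each membership check fast, which matters once the filter runs over thousands of reviews.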
[Text]S8_04_Text_cleaning(removing_punctuation)
Data source: https://www.kaggle.com/datasets/PromptCloudHQ/amazon-echo-dot-2-reviews-dataset
In [39]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Out[39]:
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
In [27]:
echo_df = pd.read_csv('Echodot2_Reviews.csv', encoding='utf-8')
echo_df.head()
Out[27]:
Rating | Review Date | Configuration Text | Review Text | Review Color | Title | User Verified | Revi..
2023. 1. 21.
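The string shown in the output above matches Python's `string.punctuation`. One common way to strip those characters from a review is `str.translate`, sketched below with the standard library; the `remove_punctuation` helper and the sample sentence are hypothetical, and the post itself may use a different approach.

```python
import string


def remove_punctuation(text):
    """Drop every character that appears in string.punctuation."""
    return text.translate(str.maketrans("", "", string.punctuation))


cleaned = remove_punctuation("Great product!!! Works well, I love it.")
print(cleaned)
# Great product Works well I love it
```

Whitespace is left untouched, so the words stay separated and can be tokenized afterwards.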
[Text]S8_03_Text_in_pandas_2
Data source: https://www.kaggle.com/datasets/PromptCloudHQ/amazon-echo-dot-2-reviews-dataset
In [26]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [27]:
echo_df = pd.read_csv('Echodot2_Reviews.csv', encoding='utf-8')
echo_df.head()
Out[27]:
Rating | Review Date | Configuration Text | Review Text | Review Color | Title | User Verified | Review Useful Count | Declaration Text | Pageurl
0 ..
2023. 1. 21.
[Text]S8_02_Text_in_pandas_1
Data source: https://www.kaggle.com/datasets/PromptCloudHQ/amazon-echo-dot-2-reviews-dataset
In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [4]:
echo_df = pd.read_csv('Echodot2_Reviews.csv', encoding='utf-8')
echo_df.head()
Out[4]:
Rating | Review Date | Configuration Text | Review Text | Review Color | Title | User Verified | Review Useful Count | Declaration Text | Pageurl
0 3 1..
2023. 1. 21.
[Text]S8_01_upper_lower
Data source: https://www.kaggle.com/datasets/PromptCloudHQ/amazon-echo-dot-2-reviews-dataset
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [35]:
echo_df = pd.read_csv('Echodot2_Reviews.csv', encoding='utf-8')
echo_df.head()
Out[35]:
Rating | Review Date | Configuration Text | Review Text | Review Color | Title | User Verified | Review Useful Count | Declaration Text | Pageurl
0 3..
2023. 1. 21.
[datetime]S7_05_Practical_example3
In [68]:
import pandas as pd
import datetime as dt
In [69]:
avo_df = pd.read_csv('Avocado.csv')
avo_df
Out[69]:
    Date        AveragePrice  Total Volume  type          region
0   2015-12-27  1.33          64236.62      conventional  Albany
1   2015-12-20  1.35          54876.98      conventional  Albany
2   2015-12-13  0.93          118220.22     conventional  Albany
3   2015-12-06  1.08          78992.15      conventional  Albany
4   2015-11-29  1.28          51039.60      conventional  Albany
... ..
2023. 1. 21.
[datetime]S7_04_Practical_example2
In [68]:
import pandas as pd
import datetime as dt
In [69]:
avo_df = pd.read_csv('Avocado.csv')
avo_df
Out[69]:
    Date        AveragePrice  Total Volume  type          region
0   2015-12-27  1.33          64236.62      conventional  Albany
1   2015-12-20  1.35          54876.98      conventional  Albany
2   2015-12-13  0.93          118220.22     conventional  Albany
3   2015-12-06  1.08          78992.15      conventional  Albany
4   2015-11-29  1.28          51039.60      conventional  Albany
... ..
2023. 1. 21.
[datetime]S7_03_Practical_example1
In [59]:
import pandas as pd
import datetime as dt
In [60]:
avo_df = pd.read_csv('Avocado.csv')
avo_df
Out[60]:
    Date        AveragePrice  Total Volume  type          region
0   2015-12-27  1.33          64236.62      conventional  Albany
1   2015-12-20  1.35          54876.98      conventional  Albany
2   2015-12-13  0.93          118220.22     conventional  Albany
3   2015-12-06  1.08          78992.15      conventional  Albany
4   2015-11-29  1.28          51039.60      conventional  Albany
... ..
2023. 1. 21.
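The excerpts above cut off before any analysis, so the exact task in these practical examples is not visible. As an assumption about the kind of date handling such a dataset invites, the sketch below parses the `Date` strings from the five rows shown and averages `AveragePrice` per month using only the standard library. The `mean_price_by_month` helper is hypothetical; with pandas the same idea would typically go through `pd.to_datetime` and a group-by.

```python
from collections import defaultdict
from datetime import datetime

# The first five (Date, AveragePrice) pairs shown in the excerpt.
rows = [
    ("2015-12-27", 1.33),
    ("2015-12-20", 1.35),
    ("2015-12-13", 0.93),
    ("2015-12-06", 1.08),
    ("2015-11-29", 1.28),
]


def mean_price_by_month(records):
    """Parse each date string and average the prices per (year, month)."""
    buckets = defaultdict(list)
    for date_text, price in records:
        parsed = datetime.strptime(date_text, "%Y-%m-%d")
        buckets[(parsed.year, parsed.month)].append(price)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


monthly = mean_price_by_month(rows)
for (year, month), price in sorted(monthly.items()):
    print(f"{year}-{month:02d}: {price:.4f}")
```

Parsing the strings into real date objects is what makes the year and month available as grouping keys.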
[datetime]S7_02_Timestamp
https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html
In [31]:
import pandas as pd
import datetime as dt
In [33]:
pd.Timestamp('2023, 3, 30')
Out[33]:
Timestamp('2023-03-30 00:00:00')
In [34]:
# Pandas Timestamp
pd.Timestamp(dt.datetime(2022, 3, 31, 8, 0, 15))
Out[34]:
Timestamp('2022-03-31 08:00:15')
In [35]:
# Difference between two dates
day_1 = pd.Timestamp('1990, 3, 31, 11')
d..
2023. 1. 21.
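The last cell above starts computing the difference between two dates but is truncated. The sketch below shows the same idea with the standard library's `datetime`, which behaves much like `pd.Timestamp` for subtraction. `day_1` mirrors the moment in the excerpt, while `day_2` is a hypothetical value, since the original second date is not visible.

```python
import datetime as dt

day_1 = dt.datetime(1990, 3, 31, 11)  # same moment as day_1 in the excerpt
day_2 = dt.datetime(2023, 3, 30)      # hypothetical; the excerpt truncates day_2

# Subtracting two datetimes yields a timedelta.
delta = day_2 - day_1
print(delta)
print(delta.days)                    # whole days in the difference
print(delta.total_seconds() / 3600)  # the same difference expressed in hours
```

A `timedelta` stores days and leftover seconds separately, so `delta.days` alone ignores the partial day.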
[datetime]S7_01_datetime
In [11]:
# date: define dates only, without time (year, month, day)
# datetime: define dates and times together (year, month, day, hour, minute, second, microsecond)
import pandas as pd
import datetime as dt
In [12]:
# A date
date_ex = dt.date(2022,1,1)
date_ex
Out[12]:
datetime.date(2022, 1, 1)
In [13]:
type(date_ex)
Out[13]:
datetime.date
In [14]:
# Convert it into string to view
str(now)
Ou..
2023. 1. 21.
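To make the distinction between `date` and `datetime` concrete, here is a small standard-library sketch. `date_ex` matches the excerpt; `moment` is a hypothetical value added for comparison.

```python
import datetime as dt

date_ex = dt.date(2022, 1, 1)                # calendar date only
moment = dt.datetime(2022, 1, 1, 8, 30, 15)  # hypothetical date plus time of day

print(str(date_ex))               # 2022-01-01
print(str(moment))                # 2022-01-01 08:30:15
print(moment.date() == date_ex)   # True: dropping the time leaves the same date
```

Both constructors take their arguments in year, month, day order, with the time fields following for `datetime`.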
[seaborn]S6_02_pairplot,displot,heatmap(correlations)
Seaborn https://seaborn.pydata.org/examples/index.html
● pandas: data manipulation using dataframes
● numpy: data statistical analysis
● matplotlib: data visualisation
● seaborn: statistical data visualisation
https://www.kaggle.com/datasets/yasserh/breast-cancer-dataset?resource=download
In [58]:
# Seaborn offers enhanced features compared to matplotlib
# import libraries
import pandas as pd
import..
2023. 1. 21.
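A correlation heatmap colors a matrix of pairwise correlation coefficients. The sketch below computes such a matrix of Pearson coefficients with the standard library only, to show what the heatmap would be drawing. The column names and values are hypothetical, not taken from the breast cancer dataset, and the helpers `pearson` and `correlation_matrix` are illustrative names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every column pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns chosen so the expected correlations are obvious.
data = {
    "radius": [1.0, 2.0, 3.0, 4.0],
    "area": [2.0, 4.0, 6.0, 8.0],
    "smoothness": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Values near 1 or -1 mark strongly related columns, which is what the strongest colors in a correlation heatmap highlight.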
[seaborn]S6_01_scatter&count_plot
Seaborn https://seaborn.pydata.org/examples/index.html
● pandas: data manipulation using dataframes
● numpy: data statistical analysis
● matplotlib: data visualisation
● seaborn: statistical data visualisation
https://www.kaggle.com/datasets/yasserh/breast-cancer-dataset?resource=download
In [58]:
# Seaborn offers enhanced features compared to matplotlib
# import libraries
import pandas as pd
import..
2023. 1. 21.
[matplotlib]S5_appendix_Making_dataset (calculating daily stock returns)
Reference site for data processing: https://invest-in-yourself.tistory.com/278
In [55]:
# ! pip install yfinance
import yfinance as yf
import pandas as pd
import datetime
In [56]:
# Download price data from Yahoo finance
start = '2013-04-29'
end = '2021-07-06'
p_Bitcoin = yf.download('BTC-USD', start=start, end=end)
p_Ethereum = yf.download('ETH-USD', start=start, end=end)
p_Litecoin = yf.download('LTC-USD', s..
2023. 1. 21.
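The excerpt ends before the return calculation itself. Assuming simple percentage returns, where each day's return is the change from the previous close divided by the previous close, the step can be sketched as below. The `daily_returns` helper and the `closes` values are hypothetical; setting the first day to 0.0 follows the first row of the daily-returns table shown in the histogram post on this page.

```python
def daily_returns(prices):
    """Percentage change from one close to the next; the first day is set to 0.0."""
    returns = [0.0]
    for previous, current in zip(prices, prices[1:]):
        returns.append((current - previous) / previous * 100)
    return returns


closes = [100.0, 110.0, 99.0]  # hypothetical closing prices
returns = daily_returns(closes)
print(returns)
```

A rise from 100 to 110 is a 10 percent gain, while the drop from 110 to 99 is a 10 percent loss measured against the higher base.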
[matplotlib]S5_07_histogram
In [39]:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
In [45]:
# daily returns data
return_df = pd.read_csv('crypto_daily_returns.csv'); return_df
Out[45]:
    Date        BTC        ETH       LTC
0   09/17/2014  0.000000   0.000000  0.000000
1   09/18/2014  -7.192558  NaN       -7.379983
2   09/19/2014  -6.984265  NaN       -7.629499
3   09/20/2014  3.573492   NaN       -0.955003
4   09/21/2014  -2.465854  NaN       -0.945300
... ... ... ... ...
247..
2023. 1. 21.
[matplotlib]S5_06_pie_chart
In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import datetime
In [2]:
# data
crypto_dict = {'allocation': [20,55,5,17,3]}
explode = (0,0,0,0.2,0)
In [3]:
crypto_df = pd.DataFrame(data=crypto_dict, index=['BTC','ETH','LTC','XRP','ADA'])
crypto_df
Out[3]:
     allocation
BTC  20
ETH  55
LTC  5
XRP  17
ADA  3
In [4]:
# a pie chart
crypto_df.plot.pie(y = 'allocation',explode = explode, figsize ..
2023. 1. 21.
[matplotlib]S5_05_Scatterplot
In [ ]:
import matplotlib.pyplot as plt
import pandas as pd
import datetime
In [63]:
# data
return_df = pd.read_csv('coin_daily_returns.csv'); return_df
Out[63]:
    Date              BTC         ETH  LTC
0   2013-04-29 23:59  7.509441    NaN  0.392520
1   2013-04-30 23:59  -3.472222   NaN  -2.430554
2   2013-05-01 23:59  -15.834534  NaN  -11.388866
3   2013-05-02 23:59  -9.597868   NaN  -10.794653
4   2013-05-03 23:59  -8.000000   NaN  -10.191304
....
2023. 1. 21.