Logistic regression with the scikit-learn library and confusion matrix visualization

Basin of Attraction

박정현PRO 2023. 3. 2. 22:28

This example shows how to fit a logistic regression model with Python and the scikit-learn library, and how to visualize the resulting confusion matrix.

1. Installing and importing the required libraries

!pip install scikit-learn matplotlib

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

2. Generating and splitting the data

Generate a synthetic dataset with the make_classification function and split it into train and test sets.

X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

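The code above calls make_classification() to generate a synthetic dataset with n_samples=1000, n_features=4, and n_classes=2. train_test_split() then splits it into train and test sets; test_size=0.2 gives an 8:2 split, that is, 800 training samples and 200 test samples. Fixing random_state=123 in both calls makes the data and the split reproducible.

3. Fitting the logistic regression model

Train the logistic regression model provided by scikit-learn on the train data.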
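As a quick sanity check on the split described above, the shapes and class counts can be printed. This is a minimal sketch; the stratify=y variation and the names ending in _s are additions for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same data and split as in the post.
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

print(X_train.shape, X_test.shape)  # (800, 4) (200, 4)

# Number of samples per class (index 0 = class 0, index 1 = class 1).
print("train class counts:", np.bincount(y_train))
print("test class counts: ", np.bincount(y_test))

# Assumption for illustration: stratify=y keeps the class ratio of y
# (approximately) the same in both parts of the split.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)
print("stratified test class counts:", np.bincount(y_test_s))
```

With a roughly balanced dataset like this one, the plain split is usually fine; stratification matters more when one class is rare.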

clf = LogisticRegression()
clf.fit(X_train, y_train)

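In the code above, LogisticRegression() creates the model with its default settings (strictly speaking it is a class constructor rather than a function), and the fit() method trains it on the train data.

4. Visualizing the confusion matrix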
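Once fit() has run, the fitted model can be inspected and used for prediction. The sketch below repeats the data setup so it runs on its own; the variable names proba, pred, and accuracy are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# For a binary problem there is one row of coefficients, one per feature.
print("coef_ shape:", clf.coef_.shape)  # (1, 4)
print("intercept_:", clf.intercept_)

# Class probabilities and hard predictions for the first five test samples.
proba = clf.predict_proba(X_test[:5])  # columns follow clf.classes_
pred = clf.predict(X_test[:5])
print(np.round(proba, 3))
print(pred)

# score() returns the mean accuracy on the given data.
accuracy = clf.score(X_test, y_test)
print("test accuracy:", accuracy)
```

predict() returns the class with the highest predicted probability, so each entry of pred corresponds to the larger value in the matching row of proba.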

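Use ConfusionMatrixDisplay.from_estimator() to visualize the confusion matrix. (Older versions of scikit-learn offered plot_confusion_matrix() for this; it was deprecated in 1.0 and removed in 1.2.)

from sklearn.metrics import ConfusionMatrixDisplay

confmat = ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test, cmap="Blues")
plt.show()

The code above passes the logistic regression model clf and the test set X_test, y_test to ConfusionMatrixDisplay.from_estimator(), which predicts on the test set, builds the confusion matrix, and plots it; the cmap argument sets the color map. Finally, plt.show() displays the figure.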
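The plot shows four counts, and it can help to see them as plain numbers as well. The following sketch computes the matrix with sklearn.metrics.confusion_matrix and derives accuracy, precision, and recall from it by hand; treating class 1 as the positive class is scikit-learn's default for binary labels, and the variable names are chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# For binary labels 0/1, ravel() yields the cells in this order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # of the samples predicted as 1, how many are truly 1
recall = tp / (tp + fn)     # of the samples that are truly 1, how many were found

# The hand-computed values agree with scikit-learn's metric functions.
print(accuracy, accuracy_score(y_test, y_pred))
print(precision, precision_score(y_test, y_pred))
print(recall, recall_score(y_test, y_pred))
```

Reading the matrix this way makes it clear which kind of error (false positives or false negatives) dominates, which a single accuracy number hides.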

5. Full code

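from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Generate a synthetic binary classification dataset and split it 8:2
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Fit the logistic regression model on the train data
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Plot the confusion matrix for the test data
confmat = ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test, cmap="Blues")
plt.show()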