Machine Learning: Scikit-Learn Usage Summary

Scikit-learn is a Python library that provides supervised and unsupervised learning algorithms for machine learning. Under the hood, it builds on widely used libraries such as NumPy, SciPy, and Matplotlib.

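I recently came across a scikit-learn cheat sheet while searching online. I know I'll forget this if I don't use it, so I'm writing it down here on the blog to keep it all in one place.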

Every learning algorithm below follows the same steps: import the class and create the model, fit it to the training data, and then predict.

Linear Regression

Import and create the model

from sklearn.linear_model import LinearRegression

your_model = LinearRegression()

Fit

your_model.fit(x_training_data, y_training_data)

.coef_: contains the coefficients
.intercept_: contains the intercept

Predict

predictions = your_model.predict(your_x_data)

.score(x, y): returns the coefficient of determination R² of the model's predictions for x against the true values y
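Here is a minimal runnable sketch that puts the whole import, fit, predict cycle in one place. The toy data is made up for illustration: the points lie exactly on y = 3x + 2. The fitted coefficient and intercept should therefore come out at roughly 3 and 2, and R² at 1.0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data for illustration: y = 3x + 2 exactly, with no noise.
# x must be 2-D with shape (n_samples, n_features), even for one feature.
x_training_data = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_training_data = np.array([2.0, 5.0, 8.0, 11.0, 14.0])

your_model = LinearRegression()
your_model.fit(x_training_data, y_training_data)

print(your_model.coef_)       # learned slope, one entry per feature
print(your_model.intercept_)  # learned intercept

# Predict for new, unseen x values
your_x_data = np.array([[5.0], [6.0]])
predictions = your_model.predict(your_x_data)
print(predictions)

# R² on the training data; 1.0 would mean a perfect fit
print(your_model.score(x_training_data, y_training_data))
```

Scikit-learn always expects x as a 2-D array of shape (n_samples, n_features). That is why even a single feature is written as a column of one-element rows.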

Naive Bayes

Import and create the model

from sklearn.naive_bayes import MultinomialNB

your_model = MultinomialNB()

Fit

your_model.fit(x_training_data, y_training_data)

Predict

# Returns an array of predicted classes, one prediction for every data point

predictions = your_model.predict(your_x_data)

# For every data point, returns the probability of each class (array of shape n_samples x n_classes)

probabilities = your_model.predict_proba(your_x_data)
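Here is a minimal sketch that assumes word-count style features. The documents, words, and the labels 'sports' and 'politics' are all hypothetical. MultinomialNB is meant for non-negative counts or frequencies, such as word counts in text classification.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count features: each row is a document,
# each column counts how often one (made-up) word appears.
x_training_data = np.array([
    [3, 0, 1],
    [4, 1, 0],
    [0, 3, 2],
    [1, 4, 3],
])
y_training_data = np.array(["sports", "sports", "politics", "politics"])

your_model = MultinomialNB()
your_model.fit(x_training_data, y_training_data)

your_x_data = np.array([[5, 0, 0], [0, 5, 1]])
predictions = your_model.predict(your_x_data)          # one class label per row
probabilities = your_model.predict_proba(your_x_data)  # one probability per class per row

print(your_model.classes_)  # column order used by predict_proba
print(predictions)
print(probabilities)
```

The columns of predict_proba follow the order of `your_model.classes_`, which holds the class labels in sorted order. Here the first column is 'politics' and the second is 'sports'.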

K-Nearest Neighbors

Import and create the model

from sklearn.neighbors import KNeighborsClassifier

your_model = KNeighborsClassifier()

Fit

your_model.fit(x_training_data, y_training_data)

Predict

# Returns an array of predicted classes, one prediction for every data point

predictions = your_model.predict(your_x_data)

# For every data point, returns the probability of each class (array of shape n_samples x n_classes)

probabilities = your_model.predict_proba(your_x_data)
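The same pattern works with KNeighborsClassifier, shown here as a sketch on made-up 2-D points. I set n_neighbors=3 only because the toy training set has six points; the default is 5.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up 2-D points in two well-separated groups, labeled 0 and 1.
x_training_data = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y_training_data = np.array([0, 0, 0, 1, 1, 1])

# n_neighbors defaults to 5; 3 is used here because the toy set is tiny.
your_model = KNeighborsClassifier(n_neighbors=3)
your_model.fit(x_training_data, y_training_data)

your_x_data = np.array([[0.5, 0.5], [10.5, 10.5]])
predictions = your_model.predict(your_x_data)          # majority vote of the 3 nearest points
probabilities = your_model.predict_proba(your_x_data)  # share of neighbors in each class
print(predictions)
print(probabilities)
```

With the default uniform weights, predict_proba is the fraction of the k nearest neighbors that belong to each class. With k=3, the values can only be 0, 1/3, 2/3, or 1.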

K-Means

Import and create the model

from sklearn.cluster import KMeans

your_model = KMeans(n_clusters=4, init='random')

n_clusters: number of clusters to form and number of centroids to generate
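init: method for initialization
- 'k-means++': K-Means++ seeding, which spreads the initial centroids apart [default]
- 'random': picks n_clusters rows of the data at random as the initial centroids (plain K-Means)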
random_state: the seed used by the random number generator [optional]

Fit

your_model.fit(x_training_data)

Predict

predictions = your_model.predict(your_x_data)
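K-Means is unsupervised, so fit takes only x. The sketch below uses hypothetical data with three obvious groups. For that reason it sets n_clusters=3 and keeps the default init='k-means++', instead of the n_clusters=4, init='random' settings above. It also sets n_init and random_state explicitly so the result is reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: three tight groups of 2-D points.
x_training_data = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
    [9.0, 1.0], [9.2, 1.1], [8.8, 0.9],
])

# n_init=10 runs the algorithm with 10 different centroid seeds and keeps
# the run with the lowest inertia; random_state fixes the random seed.
your_model = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0)
your_model.fit(x_training_data)  # no y: K-Means is unsupervised

print(your_model.cluster_centers_)  # one centroid per cluster
print(your_model.labels_)           # cluster index assigned to each training point

your_x_data = np.array([[1.1, 0.9], [9.1, 1.0]])
predictions = your_model.predict(your_x_data)  # index of the nearest centroid
print(predictions)
```

The cluster numbers themselves (0, 1, 2) are arbitrary. What matters is which points share a number, not the number itself.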

Next is how to split your data into a training set and a test set. To build a model and then test it, you have to split the data first: if you train on every answer, there are no questions left for the exam.

Training Sets and Test Sets

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, test_size=0.2)

train_size: the proportion of the dataset to include in the train split
test_size: the proportion of the dataset to include in the test split
random_state: the seed used by the random number generator [optional]
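Here is a small sketch with made-up arrays. It shows that the split keeps each row of x paired with its label in y. Specifying either train_size or test_size alone is enough. Passing both, as above, also works as long as they add up to at most 1.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each; row i is [2i, 2i + 1] and its label is i.
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# 80/20 split; random_state makes the shuffle reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.8, test_size=0.2, random_state=42
)

print(x_train.shape, x_test.shape)  # (8, 2) (2, 2)
print(y_train.shape, y_test.shape)  # (8,) (2,)
```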

For more details, see the post below.

Why and How to Split Training and Test Sets in Machine Learning http://hleecaster.com/ml-training-validation-test-set/

Next, here is how to check how well the trained model actually performs.

Validating the Model

Import and print accuracy, recall, precision, and F1 score:

from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

print(accuracy_score(true_labels, guesses))

print(recall_score(true_labels, guesses))

print(precision_score(true_labels, guesses))

print(f1_score(true_labels, guesses))
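Here is a runnable sketch with hypothetical binary labels, where 1 is the positive class. Each score is also stored in a variable so the formulas in the comments can be checked by hand.

```python
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

# Hypothetical binary labels (1 = positive class) and a model's guesses for 8 samples.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0]
guesses     = [1, 1, 1, 0, 1, 1, 0, 0]

accuracy = accuracy_score(true_labels, guesses)    # correct guesses / all guesses
recall = recall_score(true_labels, guesses)        # TP / (TP + FN)
precision = precision_score(true_labels, guesses)  # TP / (TP + FP)
f1 = f1_score(true_labels, guesses)                # harmonic mean of precision and recall

print(accuracy, recall, precision, f1)
```

Here TP = 3, FN = 1, FP = 2, and TN = 2, so accuracy = 5/8, recall = 3/4, precision = 3/5, and F1 = 2/3. For labels with more than two classes, recall_score, precision_score, and f1_score need an `average` argument such as 'macro' or 'weighted'. The default `average='binary'` only works for two classes.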

Import and print the confusion matrix

from sklearn.metrics import confusion_matrix

print(confusion_matrix(true_labels, guesses))
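Here is the confusion matrix for hypothetical binary labels. With labels 0 and 1, scikit-learn orders the rows (true labels) and columns (predicted labels) by sorted label value. The matrix therefore reads [[TN, FP], [FN, TP]], and `ravel()` unpacks the four counts in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels (1 = positive class) and a model's guesses.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0]
guesses     = [1, 1, 1, 0, 1, 1, 0, 0]

cm = confusion_matrix(true_labels, guesses)
print(cm)

# Rows = true label, columns = predicted label, both sorted (0 first, then 1):
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)
```

For these labels the matrix is [[2, 2], [1, 3]]: 2 true negatives, 2 false positives, 1 false negative, and 3 true positives.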

For more details, see the post below.

Performance Metrics for Machine Learning Classification Models: Accuracy, Recall, Precision, F1 http://hleecaster.com/ml-accuracy-recall-precision-f1/

H의 시행착오 (H's Trial and Error)
