테크 큐레이터

Optimizing k-NN parameter k with cross-validation in R

by 동글네모 2020. 11. 2.

kNN CLASSIFIERS

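# Note: substitute your own data's features / variables for the NHANES columns used below

library(class)    # knn()
library(dplyr)    # %>% and select()
library(caret)    # confusionMatrix()
library(ggplot2)  # ggplot()

str(clean_nhanes)

# The distance metric only works with quantitative variables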

# Quantitative columns used as kNN features (same list for train and test)
quant_vars <- c("Age", "AgeMonths", "HHIncomeMid", "Poverty", "HomeRooms", "Weight", "Length",
                "HeadCirc", "Height", "BMI", "Pulse", "BPSysAve", "BPDiaAve", "BPSys1", "BPDia1",
                "BPSys2", "BPDia2", "BPSys3", "BPDia3", "Testosterone", "DirectChol", "TotChol",
                "UrineVol1", "UrineFlow1", "UrineVol2", "UrineFlow2", "DiabetesAge",
                "DaysPhysHlthBad", "DaysMentHlthBad", "nPregnancies", "nBabies", "Age1stBaby",
                "SleepHrsNight", "PhysActiveDays", "TVHrsDayChild", "CompHrsDayChild",
                "AlcoholDay", "AlcoholYear", "SmokeAge", "AgeFirstMarij", "AgeRegMarij",
                "SexAge", "SexNumPartYear")

train_q <- train %>% select(all_of(quant_vars))
test_q  <- test %>% select(all_of(quant_vars))
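Since knn() from the class package measures Euclidean distance, columns recorded on large scales can outweigh columns on small scales. The post does not standardize the features, so the following is only a minimal sketch of one common way to do it, assuming base R's scale() is acceptable. The toy_train and toy_test data frames are hypothetical stand-ins for train_q and test_q.

```r
# Hypothetical toy data standing in for train_q / test_q (not from the post)
set.seed(1)
toy_train <- data.frame(Age = rnorm(50, 40, 12), Weight = rnorm(50, 75, 15))
toy_test  <- data.frame(Age = rnorm(20, 40, 12), Weight = rnorm(20, 75, 15))

# Center and scale the training features; scale() returns a matrix and
# stores the column means and standard deviations as attributes
train_scaled <- scale(toy_train)

# Apply the training-set means and standard deviations to the test features,
# so no information from the test rows leaks into the transformation
test_scaled <- scale(toy_test,
                     center = attr(train_scaled, "scaled:center"),
                     scale  = attr(train_scaled, "scaled:scale"))
```

The scaled matrices can then be passed to knn() in place of the unscaled data frames.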

 

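# knn predictions for the test set (k = 10 and k = 5)
SleepTrouble_knn10 <- knn(train_q, test = test_q, cl = train$SleepTrouble, k = 10)
SleepTrouble_knn <- knn(train_q, test = test_q, cl = train$SleepTrouble, k = 5) # got better

# Performance of knn on the test set
# (confusionMatrix() expects predictions in rows and actual classes in columns)
confusionMatrix(table(Predicted = SleepTrouble_knn10, Actual = test$SleepTrouble))

# Performance of knn on the training set: predict the training rows themselves,
# because SleepTrouble_knn10 only holds predictions for the test rows
SleepTrouble_knn10_train <- knn(train_q, test = train_q, cl = train$SleepTrouble, k = 10)
confusionMatrix(table(Predicted = SleepTrouble_knn10_train, Actual = train$SleepTrouble))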

 

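### Optimizing the value of the parameter k with cross-validation in kNN
# Note: with the defaults (z = x, z_y = y) this function returns the
# training-set misclassification rate, not a cross-validated one. Pass held-out
# rows as z and their labels as z_y to estimate out-of-sample error.
knn_error_rate <- function(x, y, numNeighbors, z = x, z_y = y) {
  y_hat <- knn(train = x, test = z, cl = y, k = numNeighbors)
  # Compare predictions with the true labels of the rows that were predicted
  mean(y_hat != z_y)
}

ks <- c(1:10, 15, 20, 25, 30)
train_rates <- sapply(ks, FUN = knn_error_rate, x = train_q, y = train$SleepTrouble)
knn_error_rates <- data.frame(k = ks, train_rate = train_rates)

ggplot(data = knn_error_rates, aes(x = k, y = train_rate)) +
  geom_point() + geom_line() + ylab("Misclassification Rate")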
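The rates plotted above are measured on the same rows the classifier was trained on, so small values of k look better than they would on new data. One way to get a cross-validated estimate is leave-one-out cross-validation with knn.cv() from the class package. The sketch below is an illustration, not the post's own code: it uses the built-in iris data as a stand-in for train_q and train$SleepTrouble, and loocv_error_rate is a hypothetical helper name.

```r
library(class)

# Assumption: iris stands in for train_q (features) and train$SleepTrouble (labels)
x <- scale(iris[, 1:4])
y <- iris$Species

# Leave-one-out misclassification rate for a given k:
# knn.cv() predicts each row from all the other rows
loocv_error_rate <- function(k, x, y) {
  y_hat <- knn.cv(train = x, cl = y, k = k)
  mean(y_hat != y)
}

set.seed(1)  # ties among neighbours are broken at random
ks <- c(1:10, 15, 20, 25, 30)
cv_rates <- sapply(ks, loocv_error_rate, x = x, y = y)

# Candidate k with the lowest cross-validated error
best_k <- ks[which.min(cv_rates)]
knn_cv_rates <- data.frame(k = ks, cv_rate = cv_rates)
```

The knn_cv_rates data frame can be plotted the same way as knn_error_rates. k-fold cross-validation is a cheaper alternative to leave-one-out when the training set is large.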

 
