3 Key Resampling Techniques in Machine Learning

Training a model to be accurate is not enough on its own.

Just as important:

👉 Have we measured the model's performance the right way?

Often a model looks accurate,

but maybe only by coincidence with that particular dataset.

Give it new data and it breaks easily.

This Visual Note summarizes

3 Key Resampling Techniques in Machine Learning

that data scientists actually use to evaluate models.

From LOOCV and Bootstrap to K-Fold Cross Validation

Each method helps us

✔️ Use more of the data (especially when data is limited)

✔️ Evaluate the model more fairly, without fooling ourselves

✔️ Reduce the risk of overfitting in the long run

Well suited for

• People studying Machine Learning

• Data practitioners who want a firmer grounding in validation

• Anyone who has been confused about which method to choose

📚 This Visual Note is summarized from

the Data Science Bootcamp class, batch 12,

by the DataRockie page.

Studying ML or working in the data field?

Save it for review. 📌💙

#NichasVisualNote #VisualNote

#MachineLearning #DataScience #Resampling


From my own experience trying all 3 resampling techniques, namely Leave One Out CV (LOOCV), Bootstrap, and K-Fold Cross Validation, each has its own strengths and limitations, so the choice should fit the task and the size of the data.

LOOCV uses almost all of the data for training: n-1 rows to train and 1 row to test in each round. The cost is time. With a large dataset, say n=10000, it becomes very slow because the model has to be retrained 10000 times.

Bootstrap resamples the data using sampling with replacement. This produces a varied set of models and works well for estimating model error. Its advantages are that it works with a small amount of data and helps reduce bias, but sometimes the models vary so much that the accuracy fluctuates.

K-Fold Cross Validation is the method I like best. It splits the data into K parts and rotates them, so each part takes a turn as the test set while the rest form the training set. This gives a good estimate of model accuracy without taking as long as LOOCV, and it also helps a lot in reducing the risk of overfitting. K=5 or K=10 is the usual choice.

Which resampling method fits best depends on the time available and the size of the data. If the dataset is not very large and you need a high-precision estimate, LOOCV is a good fit. If the dataset is large and speed matters, K-Fold Cross Validation is probably the best choice. In the end, understanding each technique and trying it on real data will show you when to use which method to get the best model.
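To make the differences between the three techniques concrete, here is a minimal sketch in plain Python of how each one carves n rows into training and test indices. The function names (`loocv_splits`, `kfold_splits`, `bootstrap_splits`) are hypothetical and written only for this illustration. Testing each bootstrap round on the rows that were never drawn (the "out-of-bag" rows) is an assumption too: it is a common choice, but the note does not say how the bootstrap test set is formed.

```python
import random


def loocv_splits(n):
    """Yield (train, test) index lists: n rounds, each holding out one row."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def kfold_splits(n, k, seed=0):
    """Yield (train, test) index lists: k rounds, each fold is the test set once."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k near-equal parts
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def bootstrap_splits(n, rounds, seed=0):
    """Yield (train, test) index lists for each bootstrap round.

    train: n indices drawn with replacement (duplicates are expected).
    test:  rows never drawn, i.e. out-of-bag (an assumed, common choice).
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        yield train, [j for j in range(n) if j not in drawn]


n = 10
loo = list(loocv_splits(n))
kf = list(kfold_splits(n, k=5))
boot = list(bootstrap_splits(n, rounds=3))

print(len(loo), "LOOCV rounds; first test set:", loo[0][1])
print(len(kf), "K-Fold rounds; test sizes:", [len(test) for _, test in kf])
print(len(boot), "bootstrap rounds; first training sample:", sorted(boot[0][0]))
```

The number of rounds is the number of times a model would have to be retrained: n for LOOCV, K for K-Fold, and however many rounds you choose for Bootstrap. That is why LOOCV gets slow as n grows while K-Fold with K=5 or K=10 stays cheap.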
