Analyzing NYPD stop, question, and frisk with machine learning techniques

Passiri Bodhidatta

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/80853

Title:	Analyzing NYPD stop, question, and frisk with machine learning techniques
Other Titles:	Analysis of the New York Police Department's stop, question, and frisk operations using machine learning techniques
Authors:	Passiri Bodhidatta
Advisors:	Daricha Sutivong
Other author:	Chulalongkorn University. Faculty of Engineering
Issue Date:	2021
Publisher:	Chulalongkorn University
Abstract:	Although stops under the “Stop, Question, and Frisk” program decreased dramatically after the New York Police Department (NYPD) reform in 2013, unnecessary stops and the use of weapons against innocent citizens remain critical problems. This study analyzes stops made during 2014–2019 using three tree-based machine learning approaches: Decision Tree, Random Forest, and XGBoost. Models are developed to predict whether a stop resulted in a conviction (Guilty Prediction) and the level of force used by police (Level of Force Prediction), and the driving factors behind each are identified. XGBoost outperformed the other models in both predictions. Guilty Prediction achieved a 65.9% F1 score and 84.0% accuracy. For Level of Force Prediction, the F1 scores for “Level 1” and “Level 2” were 40.7% and 35.0%, respectively, with 80.4% overall accuracy. The findings indicate that the presence of a weapon points to a suspect's conviction. However, many unnecessary stops are likely driven by inaccurate assumptions about suspects' weapon possession, which can lead police to use firearms against innocent citizens. Additionally, this study explores a hybrid technique called Super Learner and experiments with various Super Learner structures. Regarding base models, Super Learners improved on the performance of their own base models when those base models were untuned, but did not improve when the base models were tuned. Base-model performance also played a significant role in Super Learner performance: high-performing base models improved the meta model's performance, and vice versa. Among meta models, XGBoost and Logistic Regression outperformed the others in both predictions.
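The Level of Force results pair per-class F1 scores of 40.7% and 35.0% with a much higher overall accuracy of 80.4%. The small Python sketch here, using invented labels rather than the thesis data, illustrates how per-class F1 can sit far below accuracy when one class dominates. It assumes a hypothetical majority class named "None" for stops without higher-level force; the function names `f1_for_class` and `accuracy` are illustrative and do not come from the thesis.

```python
def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1 score for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels for 10 stops; "None" is an assumed majority class.
y_true = ["None"] * 7 + ["Level 1"] * 2 + ["Level 2"]
y_pred = ["None"] * 7 + ["Level 1", "None", "Level 1"]

print(accuracy(y_true, y_pred))
for label in ("None", "Level 1", "Level 2"):
    print(label, round(f1_for_class(y_true, y_pred, label), 3))
```

With these toy labels, accuracy is 0.8 while F1 is 0.5 for Level 1 and 0.0 for Level 2, because most correct predictions come from the majority class.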
Other Abstract:	Although stops under the stop, question, and frisk operations decreased sharply after the New York Police Department's reform in 2013, unnecessary stops and the use of weapons against innocent citizens remain significant problems. This study analyzed stops between 2014 and 2019 using three types of tree-based machine learning, namely Decision Tree, Random Forest, and XGBoost, to build models that predict whether a stop involves an offense and the level of force used by police, and to identify the contributing factors. The results show that XGBoost outperformed the other models in both prediction problems. For guilt prediction, it achieved an F1 score of 65.9% and an accuracy of 84.0%. For prediction of the police's level of force, the F1 scores for Level 1 and Level 2 were 40.7% and 35.0%, respectively, with an overall accuracy of 80.4%. The results suggest that possessing a weapon indicates that a suspect has committed an offense. Nevertheless, police may hold inaccurate assumptions about suspects' weapon possession, which can lead to stops and the use of guns against innocent citizens. In addition, this study examined a hybrid technique called Super Learner, experimenting with a variety of structures. Super Learners improved on their own base models when the base models were untuned, but improved little when the base models had already been tuned. The predictive ability of the base models was also a major factor in the predictive ability of the Super Learner: good base models helped improve the meta model's ability, and vice versa. Finally, meta models using XGBoost and Logistic Regression outperformed the other models in both problems.
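The Super Learner experiments combine several base models with a meta model. As a rough illustration of that structure, here is a minimal pure-Python sketch of cross-validated stacking in the Super Learner style, assuming binary labels and V-fold out-of-fold predictions as the meta model's inputs. It is not the thesis implementation: the base learners (a constant prior and a one-split stump standing in for the tree-based models), the class names, and the toy data are all hypothetical, and the meta model is a plain gradient-descent logistic regression, chosen because Logistic Regression is one of the meta models the abstract reports.

```python
import math

def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

class PriorLearner:
    """Hypothetical base learner: always predicts the training positive rate."""
    def fit(self, X, y):
        self.p = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [self.p] * len(X)

class StumpLearner:
    """Hypothetical base learner: depth-1 tree, a stand-in for the tree models."""
    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            for thr in sorted({row[f] for row in X}):
                left = [yi for row, yi in zip(X, y) if row[f] <= thr]
                right = [yi for row, yi in zip(X, y) if row[f] > thr]
                # Misclassifications if each side predicts its majority class.
                err = min(sum(left), len(left) - sum(left))
                err += min(sum(right), len(right) - sum(right))
                p_left = sum(left) / len(left)  # never empty: thr is an observed value
                p_right = sum(right) / len(right) if right else p_left
                if best is None or err < best[0]:
                    best = (err, f, thr, p_left, p_right)
        _, self.f, self.thr, self.p_left, self.p_right = best
        return self

    def predict_proba(self, X):
        return [self.p_left if row[self.f] <= self.thr else self.p_right for row in X]

class LogisticMeta:
    """Meta learner: logistic regression fitted by batch gradient descent."""
    def __init__(self, lr=0.5, epochs=2000):
        self.lr, self.epochs = lr, epochs

    def fit(self, Z, y):
        n, d = len(Z), len(Z[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for z, yi in zip(Z, y):
                err = self._score(z) - yi
                gw = [g + err * zj for g, zj in zip(gw, z)]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def _score(self, z):
        return _sigmoid(self.b + sum(w * zj for w, zj in zip(self.w, z)))

    def predict_proba(self, Z):
        return [self._score(z) for z in Z]

class SuperLearner:
    def __init__(self, base_factories, meta_factory, n_folds=5):
        self.base_factories = base_factories
        self.meta_factory = meta_factory
        self.n_folds = n_folds

    def fit(self, X, y):
        n = len(X)
        # Level-one data: out-of-fold probability from each base learner.
        Z = [[0.0] * len(self.base_factories) for _ in range(n)]
        for k in range(self.n_folds):
            held_out = list(range(k, n, self.n_folds))  # deterministic strided folds
            held = set(held_out)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            for j, make in enumerate(self.base_factories):
                preds = make().fit(X_tr, y_tr).predict_proba([X[i] for i in held_out])
                for i, p in zip(held_out, preds):
                    Z[i][j] = p
        # The meta learner sees only out-of-fold predictions.
        self.meta = self.meta_factory().fit(Z, y)
        # Base learners are refit on all data for use at prediction time.
        self.bases = [make().fit(X, y) for make in self.base_factories]
        return self

    def predict_proba(self, X):
        cols = [b.predict_proba(X) for b in self.bases]
        return self.meta.predict_proba([list(row) for row in zip(*cols)])

# Hypothetical toy data: feature 0 fully determines the label.
X = [[float(i % 2), i / 20] for i in range(20)]
y = [i % 2 for i in range(20)]
model = SuperLearner([PriorLearner, StumpLearner], LogisticMeta, n_folds=5).fit(X, y)
print([round(p, 2) for p in model.predict_proba(X[:4])])
```

Training the meta model only on out-of-fold predictions is the key design choice: it keeps the meta model from simply rewarding whichever base model fits the training data most closely. In this toy run the meta model learns to rely on the stump and to largely ignore the uninformative prior.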
Description:	Thesis (M.Eng.)--Chulalongkorn University, 2021
Degree Name:	Master of Engineering
Degree Level:	Master's Degree
Degree Discipline:	Industrial Engineering
URI:	http://cuir.car.chula.ac.th/handle/123456789/80853
URI:	http://doi.org/10.58837/CHULA.THE.2021.200
metadata.dc.identifier.DOI:	10.58837/CHULA.THE.2021.200
Type:	Thesis
Appears in Collections:	Eng - Theses

Files in This Item:

File	Size	Format
6470247821.pdf	3.26 MB	Adobe PDF
