Abstract:
Although stops under the “Stop, Question, and Frisk” program decreased dramatically after the New York Police Department (NYPD) reform in 2013, unnecessary stops and the use of weapons against innocent citizens remain critical problems. This study analyzes stops made during 2014–2019 using three tree-based machine learning approaches: Decision Tree, Random Forest, and XGBoost. Models for predicting whether a stop resulted in a conviction and the level of force used by police are developed, and their driving factors are identified. Results show that XGBoost outperformed the other models on both prediction tasks. The Guilty Prediction model achieved a 65.9% F1 score and 84.0% accuracy. For Level of Force Prediction, the F1 scores for “Level 1” and “Level 2” were 40.7% and 35.0%, respectively, with 80.4% overall accuracy. The findings indicate that the presence of a weapon is strongly associated with a suspect’s conviction. At the same time, numerous unnecessary stops are likely driven by inaccurate assumptions about suspects’ weapon possession, which can lead to police gunfire against innocent citizens. Additionally, this study explores a hybrid ensemble technique called the Super Learner, and experiments on various Super Learner structures are performed. Regarding base models, Super Learners improve on the performance of their base models when the base models are untuned, but not when the base models are tuned. The performance of the base models also plays a significant role in the performance of the Super Learner: high-performing base models improve the meta model’s performance, and vice versa. Regarding meta models, XGBoost and Logistic Regression outperform the other meta models on both prediction tasks.
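To make the Super Learner structure concrete, the following is a minimal sketch of a stacked ensemble with the three tree-based base models and a Logistic Regression meta model. It uses scikit-learn's StackingClassifier and xgboost as stand-ins for the study's implementation; the synthetic data, feature dimensions, and hyperparameters are placeholders, not the paper's actual stop-and-frisk dataset or tuned settings.

```python
# Sketch of a Super Learner (stacked ensemble): base models generate
# out-of-fold predictions that a meta model learns to combine.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Placeholder data standing in for engineered stop-level features and a
# binary conviction label (imbalanced, as conviction outcomes typically are).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Base models: the three tree-based learners compared in the study.
base_models = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=0)),
]

# Meta model: Logistic Regression, one of the stronger meta learners found.
super_learner = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # cross-validated base predictions feed the meta model
)

super_learner.fit(X_train, y_train)
pred = super_learner.predict(X_test)
print(f"accuracy={accuracy_score(y_test, pred):.3f}, "
      f"F1={f1_score(y_test, pred):.3f}")
```

Swapping the final_estimator (e.g., to an XGBClassifier) reproduces the alternative meta-model configurations compared in the experiments.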