Abstract:
Recurrent Neural Networks (RNNs) have shown impressive performance in supervised machine learning tasks such as Natural Language Processing (NLP). However, theoretical understanding of RNNs' performance in NLP remains limited due to their intrinsically complex non-linear computations. This thesis explores a class of RNNs called Recurrent Arithmetic Circuits (RACs), which possess a dual mathematical representation as a Matrix Product State (MPS), a structure widely used in many-body quantum physics. This duality allows us to compute the entanglement entropy of an MPS, which serves as a proxy for information propagation in the dual neural network, and thereby phenomenologically explain the prediction accuracy of RNN-based models in NLP. We find that, in the fixed word-embedding case, the entanglement entropy saturates when the accuracy saturates. Experiments with trainable word embeddings further reveal that the entanglement entropy of the RACs decays as the word embeddings become more meaningful, as reflected by the cosine similarity between word embeddings. This thesis thus sheds light on making the behavior of RNN-based machine learning in NLP more transparent and explainable, using tools from many-body quantum physics.