Supervisor: Prof. Vipul Arora, IIT Kanpur.
In this term report, we presented our model for the speaker diarization problem and explained how transfer learning can be used to train a model quickly, with negligible performance loss compared to a fully trained one. Given the input utterances and their speaker identity labels, we extracted embeddings from short audio segments and used these embeddings to separate the speaker segments within the input source. Building upon this model, we focused on transfer learning and on manually adapting the model to various datasets so as to make it more general. We also worked on improving the diarization error rate (DER) and experimented with different embedding generation networks.
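As a rough illustration of the step in which segment embeddings are used to separate speakers, the following is a minimal sketch and not the model described in this report. It assumes the embeddings have already been produced by some network and groups them with a simple average-linkage agglomerative clustering on cosine distance. The function name `cluster_segments`, the `threshold` value, and the synthetic toy embeddings are all hypothetical choices made for the example.

```python
import numpy as np


def cluster_segments(embeddings, threshold=0.5):
    """Greedy average-linkage agglomerative clustering on cosine distance.

    embeddings: (n_segments, dim) array, one non-zero row per audio segment.
    threshold:  hypothetical stopping distance; merging stops once the two
                closest clusters are farther apart than this value.
    Returns an integer speaker label for every segment.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalise rows
    dist = 1.0 - x @ x.T  # pairwise cosine distance between segments
    clusters = [[i] for i in range(len(x))]  # start with one cluster each

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        best_a, best_b, best_d = None, None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_a, best_b, best_d = a, b, d
        if best_d > threshold:
            break  # remaining clusters are treated as distinct speakers
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]

    labels = np.empty(len(x), dtype=int)
    for speaker, members in enumerate(clusters):
        labels[members] = speaker
    return labels


# Toy data standing in for real embeddings: two synthetic "speakers",
# four segments each, scattered around two orthogonal directions.
rng = np.random.default_rng(0)
toy = np.vstack([
    np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((4, 3)),
    np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((4, 3)),
])
labels = cluster_segments(toy)
```

In this sketch the number of speakers is not fixed in advance; it follows from where the distance threshold stops the merging, so in practice the threshold would need to be tuned on held-out data.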