BART from Scratch
Implementation of the BART (Bidirectional and Auto-Regressive Transformers) model from scratch.
In this project, I wanted to build a complete implementation of the BART (Bidirectional and Auto-Regressive Transformers) model from scratch in PyTorch, without any pre-trained models or external libraries except for the tokenizer from the HuggingFace Transformers library.
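To give an idea of the overall shape, here is a minimal sketch of a BART-style encoder-decoder skeleton. It leans on PyTorch's built-in nn.Transformer as a stand-in for the hand-written blocks in this repository; the class name, parameter names, and defaults below are illustrative assumptions, not the actual code.

```python
import torch
import torch.nn as nn

class BartScratch(nn.Module):
    """Minimal BART-style encoder-decoder skeleton (illustrative only)."""

    def __init__(self, vocab_size, d_model=768, nhead=12,
                 num_encoder_layers=6, num_decoder_layers=6, max_len=1024):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)  # learned positions, as in BART
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def embed(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        return self.tok_embed(ids) + self.pos_embed(pos)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each decoder position only attends to earlier tokens.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(tgt_ids.device)
        hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids),
                                  tgt_mask=tgt_mask)
        return self.lm_head(hidden)  # logits over the vocabulary
```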
I trained the model on the CNN/Daily Mail dataset, with the objective of learning to summarize news articles. I coded my own decoding strategies, including both greedy and beam search. The entire model architecture is configurable via a JSON file, and the model can be trained both on Google Colab and locally. I provide both a single Jupyter notebook and a complete, structured implementation of each major component that makes up a BART model. The BART paper and the "Attention Is All You Need" paper were the two primary references for this project.
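As an illustration of how the decoding and configuration pieces fit together, here is a hedged sketch. The greedy loop assumes the model interface from the skeleton above (logits of shape (batch, target_length, vocab)); the function name, config keys, and special-token ids are hypothetical, not the repo's actual API. Beam search follows the same loop but keeps the k highest-scoring partial hypotheses at each step instead of a single argmax.

```python
import json
import torch

@torch.no_grad()
def greedy_decode(model, src_ids, bos_id, eos_id, max_new_tokens=128):
    """Greedy decoding: repeatedly append the most likely next token."""
    model.eval()
    tgt = torch.full((src_ids.size(0), 1), bos_id,
                     dtype=torch.long, device=src_ids.device)
    for _ in range(max_new_tokens):
        logits = model(src_ids, tgt)                     # (batch, len, vocab)
        next_tok = logits[:, -1, :].argmax(-1, keepdim=True)
        tgt = torch.cat([tgt, next_tok], dim=1)
        if (next_tok == eos_id).all():                   # every sequence ended
            break
    return tgt

# Hypothetical JSON architecture config of the kind described above;
# the keys here simply mirror the skeleton's constructor (assumptions).
config = json.loads("""
{"vocab_size": 50265, "d_model": 768, "nhead": 12,
 "num_encoder_layers": 6, "num_decoder_layers": 6, "max_len": 1024}
""")
model = BartScratch(**config)                            # skeleton from above
src = torch.randint(0, config["vocab_size"], (2, 16))    # dummy source ids
summary_ids = greedy_decode(model, src, bos_id=0, eos_id=2)
```

The bos/eos ids (0 and 2 here) depend on the tokenizer in use; with the HuggingFace BART tokenizer these are the defaults, but they should be read from the tokenizer rather than hard-coded.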