HuggingFace serves as the home for many popular open-source NLP models. Many of these models are effective as-is, but often require some type of training or fine-tuning to improve performance for your specific use case. As the explosion of LLMs continues, we will take a step back in this article and review some of the building blocks HuggingFace provides that simplify training NLP models.
Traditionally, NLP models can be trained using vanilla PyTorch, TensorFlow/Keras, and other popular machine learning frameworks. While you can go this route, it requires a deeper understanding of the framework you are using, as well as more code to write the training loop. With the HuggingFace Trainer class, there is an easier way to interact with the NLP Transformers models you want to use.
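To make that contrast concrete, here is a minimal sketch of what a Trainer-driven run looks like. The `model`, `train_dataset`, and `eval_dataset` names are placeholders for objects built later in the article, and the hyperparameter values are illustrative, not taken from the original walkthrough:

```python
from transformers import Trainer, TrainingArguments

# Illustrative hyperparameters; tune these for your own use case
training_args = TrainingArguments(
    output_dir="./results",            # where checkpoints and logs are written
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",       # run evaluation at the end of each epoch
)

trainer = Trainer(
    model=model,                       # any Transformers model (placeholder here)
    args=training_args,
    train_dataset=train_dataset,       # tokenized training split (placeholder)
    eval_dataset=eval_dataset,         # tokenized evaluation split (placeholder)
)

# One call replaces the hand-written PyTorch/TensorFlow training loop
trainer.train()
```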
Trainer is a class specifically optimized for Transformers models, and it also provides tight integration with other HuggingFace libraries such as Datasets and Evaluate. At a more advanced level, Trainer also supports distributed training libraries and can easily integrate with infrastructure platforms such as Amazon SageMaker.
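As a sketch of that integration, the Evaluate library can supply the metric that Trainer reports during evaluation. The `compute_metrics` function below follows the standard Trainer callback signature; note that the `accuracy` metric pulls in scikit-learn under the hood:

```python
import evaluate
import numpy as np

# Load a metric from the Evaluate library
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # Trainer passes (logits, labels) for the evaluation set
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

# Plugged into the Trainer via: Trainer(..., compute_metrics=compute_metrics)
```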
In this example, we will use the Trainer class locally to fine-tune the popular BERT model on the IMDB dataset for a text classification use case ([Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/), [citation](https://ai.stanford.edu/~amaas/papers/wvSent_acl2011.bib)).
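As a preview of what the later sections cover, a minimal sketch of pulling down the IMDB dataset and a BERT checkpoint might look like the following; the model and column names are the standard Hub identifiers, not values specific to this article:

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# IMDB movie reviews from the HuggingFace Hub (25k train / 25k test)
dataset = load_dataset("imdb")

# BERT base checkpoint with a two-label classification head (positive/negative)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Tokenize the review text so the Trainer can feed it to the model
def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized_dataset = dataset.map(tokenize, batched=True)
```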
NOTE: This article assumes basic knowledge of Python and familiarity with NLP. We won't go into machine learning theory around model construction or selection; this article is dedicated to understanding how you can fine-tune the existing pre-trained models available in the HuggingFace Model Hub.
- Setup
- Fine-Tuning BERT
- Additional Resources and Conclusion
For this example, we will work in SageMaker Studio and use a conda_python3 kernel on an ml.g4dn.12xlarge instance. Note that you can use a smaller instance type, but this could affect training speed depending on the number of CPUs/workers available.
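Once the kernel is running, the HuggingFace libraries need to be available in the environment. A minimal install cell might look like the following; the package list and the lack of version pins are assumptions rather than the article's exact environment:

```python
# Install the core libraries used in this walkthrough (run in a notebook cell;
# the exact package set and versions are illustrative assumptions)
%pip install transformers datasets evaluate scikit-learn torch

import torch

# Confirm that the instance's GPUs are visible to PyTorch
print(torch.cuda.is_available(), torch.cuda.device_count())
```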