Learn how to deploy a real machine learning application using AWS and FastAPI
Introduction
I've always thought that even the best project in the world has little value if people can't use it. That is why learning how to deploy machine learning models is so important. In this article we focus on deploying a small large language model, TinyLlama, on an AWS EC2 instance.
List of tools I have used for this project:
- Deepnote – a cloud-based notebook that is ideal for collaborative data science projects and good for prototyping.
- FastAPI – a web framework for creating APIs with Python (a minimal example follows this list).
- AWS EC2 – a web service that provides resizable compute capacity in the cloud.
- nginx – an HTTP server and reverse proxy; I use it to connect the FastAPI server to the outside world on AWS.
- GitHub – a hosting service for software projects.
- Hugging Face – a platform for hosting and collaborating on machine learning models, datasets, and applications.
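To make the FastAPI piece concrete, here is a minimal sketch of the kind of app this setup serves. The endpoint names and the echoed response are illustrative placeholders, not the project's actual code:

```python
# Minimal FastAPI sketch; endpoint names and the echo response are
# illustrative, not this article's actual application.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Prompt(BaseModel):
    text: str

@app.get("/")
def health_check():
    # A simple liveness endpoint, handy once nginx sits in front.
    return {"status": "ok"}

@app.post("/generate")
def generate(prompt: Prompt):
    # In the real app this is where the language model would be called;
    # here we just echo the prompt back.
    return {"completion": f"echo: {prompt.text}"}
```

Running `uvicorn main:app --host 0.0.0.0 --port 8000` starts the server, which nginx can then reverse-proxy to port 80 on the EC2 instance.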
About TinyLlama
TinyLlama-1.1B is a project that aims to pre-train a 1.1-billion-parameter Llama model on 3 trillion tokens. It uses the same architecture and tokenizer as Llama 2 (ai.meta.com/llama/).
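As a rough sketch of how such a model can be pulled from the Hugging Face Hub with the transformers library. I assume the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` checkpoint, the project's published chat model; verify the exact name on the Hub before use:

```python
# Minimal sketch: load TinyLlama from the Hugging Face Hub and generate text.
# Assumes the transformers and torch packages are installed, and that the
# TinyLlama/TinyLlama-1.1B-Chat-v1.0 checkpoint name is still current.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short completion (CPU is fine for 1.1B).
inputs = tokenizer("What is a llama?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At 1.1 billion parameters, the model is small enough to run on a modest EC2 instance, which is exactly why it suits this deployment exercise.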
Today's large language models have impressive capabilities but are extremely demanding in terms of hardware. In many settings hardware is limited: think smartphones or satellites. So there is a lot of research into creating smaller models that can be deployed at the edge.
Here is a list of “small” models that are becoming fashionable:
- MobileVLM (Multimodal)
- Phi-2
- Obsidian (Multimodal)