How to Find a GPU Hosting Service: A Guide by Viraaj Akuthota
Fine-tuning models and creating embeddings on large corpora of qualitative data requires a large amount of GPU memory (VRAM). For example, fine-tuning BERT on a dataset of 15k cases of varying length produces roughly 100k-200k sequences at the 512-token limit, which requires approximately 140 GB of VRAM. Most consumer-grade machines cannot meet this hardware requirement. I therefore set out to find an affordable and relatively easy-to-use cloud compute option, and ran into many difficulties along the way. The benefits and disadvantages of most of the service providers I reviewed are summarized in the table below.
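To make the sequence arithmetic above concrete, here is a minimal sketch of how documents of varying length turn into fixed-size sequences. It assumes simple non-overlapping chunking and ignores BERT's special tokens; the function name `count_sequences` and the example token counts are hypothetical, not taken from my dataset.

```python
import math


def count_sequences(doc_token_lengths, max_len=512):
    """Count the non-overlapping chunks needed to cover every document.

    Assumption: each document is split into ceil(n / max_len) chunks,
    and an empty document still counts as one sequence.
    """
    return sum(max(1, math.ceil(n / max_len)) for n in doc_token_lengths)


# Hypothetical token counts for four cases of varying size.
example_lengths = [100, 512, 513, 5000]
total = count_sequences(example_lengths)
print(f"{len(example_lengths)} cases -> {total} sequences")

# Rough average case length implied by 100k-200k sequences over 15k cases.
low = 100_000 / 15_000 * 512
high = 200_000 / 15_000 * 512
print(f"Implied average length: about {low:.0f} to {high:.0f} tokens per case")
```

Under these assumptions, 100k-200k sequences across 15k cases corresponds to an average of roughly 7 to 13 sequences per case.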
Overall, the production setup I settled on uses:
· Paperspace Core with a Windows Server instance, which lets me avoid the terminal as much as possible.
· Always-available multi-GPU instances, for example 4 x NVIDIA A6000 GPUs with 192 GB of VRAM in total, for roughly $7 USD an hour.
· Approximately $3 USD per month for 50 GB of persistent storage, making costs negligible while the instance is offline.
· For Linux users, Paperspace offers a Python ML template that saves time installing Python, packages, CUDA, etc.
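The pricing above can be turned into a quick budget estimate. This is a minimal sketch that uses only the approximate figures quoted in the list (about $7 USD per GPU-hour and about $3 USD per month for storage); the function name `estimate_monthly_cost` is hypothetical, and actual billing may differ.

```python
def estimate_monthly_cost(gpu_hours, hourly_rate=7.0, storage_per_month=3.0):
    """Estimate one month's bill in USD.

    Assumption: compute is billed only while the instance is running,
    and persistent storage is a flat monthly fee.
    """
    return gpu_hours * hourly_rate + storage_per_month


# Each A6000 has 48 GB of VRAM, so four of them give 192 GB in total.
total_vram_gb = 4 * 48

for hours in (0, 10, 40):
    cost = estimate_monthly_cost(hours)
    print(f"{hours:>2} GPU-hours -> ${cost:.2f} USD")
```

A month with no training runs costs only the storage fee, which is why the offline cost is negligible.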
Before production, I use either Google Colab or Hugging Face:
· For testing fine-tuning or creating embeddings, I believe Google Colab's free T4 instance provides the most VRAM of any free tier.
· For testing LLMs, Hugging Face's serverless inference free tier lets you use a variety of LLMs, such as Llama 405B. The Pro tier, at $9 USD per month, raises the rate limit on this inference; I get approximately 300 API calls per hour.
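When scripting against a rate-limited inference endpoint, it helps to pace requests so that they stay under the hourly limit. The sketch below spaces calls evenly, assuming the roughly 300 calls per hour I observe on the Pro tier; the `Throttle` class is a hypothetical helper, not part of any Hugging Face library, and the real limit may differ.

```python
import time


class Throttle:
    """Space calls evenly so they stay under an hourly rate limit."""

    def __init__(self, calls_per_hour=300, clock=time.monotonic, sleep=time.sleep):
        # 300 calls per hour works out to one call every 12 seconds.
        self.min_interval = 3600.0 / calls_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage sketch: call throttle.wait() before each API request.
# throttle = Throttle(calls_per_hour=300)
# for prompt in prompts:
#     throttle.wait()
#     ...send the request for this prompt...
```

Even spacing is the simplest policy; it gives up the ability to send short bursts, which is usually acceptable for batch testing.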
| Provider | Benefits | Disadvantages | GPU Limit |
| --- | --- | --- | --- |
| Amazon EC2 | | | |
| Amazon Notebooks | | | |
| Microsoft Azure | | | |
| Google Cloud | | | |
| Google Colab | | | |
| Paperspace Notebooks | | | |
| Paperspace Server/Console | | | |
