How to Find a GPU Hosting Service – a Guide by Viraaj Akuthota
To fine-tune models and create embeddings on large corpora of qualitative data, a large amount of GPU memory (VRAM) is required. For example, fine-tuning BERT on a dataset of 15k cases of varying size produces roughly 100k-200k sequences at a 512-token limit, which requires approximately 140 GB of VRAM. This hardware requirement puts such tasks beyond most consumer-grade machines. I conducted an exercise to identify an affordable and relatively easy-to-use cloud compute option, and I faced many difficulties during this search. The benefits and disadvantages of the majority of service providers I reviewed can be found in the table below.
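To make the memory requirement concrete, here is a rough back-of-envelope sketch of where the VRAM goes during fine-tuning. The multipliers are assumptions (fp32 training with Adam, a rule-of-thumb activation factor per token per layer), not measured values; with a large batch size the activation term dominates and lands in the same ballpark as the ~140 GB figure above.

```python
# Back-of-envelope VRAM estimate for fine-tuning a BERT-style model.
# All constants below are illustrative assumptions, not measured values.

def estimate_vram_gb(
    n_params: int = 110_000_000,  # BERT-base parameter count
    batch_size: int = 256,        # assumed large fine-tuning batch
    seq_len: int = 512,           # the 512-token limit used above
    hidden: int = 768,            # BERT-base hidden size
    n_layers: int = 12,
    bytes_per_float: int = 4,     # fp32
) -> float:
    # Weights + gradients + Adam moment estimates: ~16 bytes per parameter in fp32.
    model_states = n_params * 16
    # Activations stored for the backward pass; the factor of 30 floats per
    # hidden unit per layer is a rough rule-of-thumb assumption.
    activations = batch_size * seq_len * hidden * n_layers * 30 * bytes_per_float
    return (model_states + activations) / 1e9

print(f"~{estimate_vram_gb():.0f} GB")  # in the same ballpark as the ~140 GB cited above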
Overall, for production I landed on the following setup:
· Paperspace Core with a Windows Server instance, to avoid using the terminal as much as possible.
· Always-available multi-GPU instances, for example 4 x A6000 Nvidia GPUs with 192 GB of VRAM in total for roughly $7 USD an hour (a quick sanity check for such an instance is sketched after this list).
· Approximately $3 USD per month for 50 GB of persistent storage, so costs while the machine is off are negligible.
· For Linux users, Paperspace offers a Python ML template that saves time installing Python, packages, CUDA, etc.
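Once a multi-GPU instance is provisioned, a quick check confirms that PyTorch sees every GPU and that the total VRAM matches what you are paying for. This is a minimal sketch assuming PyTorch and the CUDA drivers are already installed (the Python ML template handles that on Linux).

```python
# Sanity check after provisioning a multi-GPU instance (e.g. 4 x A6000).
import torch

if torch.cuda.is_available():
    total_gb = 0.0
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024**3
        total_gb += vram_gb
        print(f"GPU {i}: {props.name}, {vram_gb:.0f} GB")
    print(f"Total VRAM: {total_gb:.0f} GB")  # ~192 GB on 4 x A6000
else:
    print("CUDA not available - check the driver / CUDA install.")
```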
Before production, I utilize either Google Colab or HuggingFace:
· For testing fine-tuning or creating embeddings, I believe Google Colab's free T4 instance provides the most VRAM of any free tier (a one-cell check is shown after this list).
· For testing LLMs, HuggingFace's serverless inference free tier allows you to utilize a variety of LLMs such as Llama 405B. The Pro tier at $9 USD per month raises the rate limit on this inference, which gives me approximately 300 API calls per hour (a minimal API-call sketch also follows this list).
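For the Colab route, a one-cell check confirms which GPU the free tier assigned and how much VRAM it exposes. This is a minimal sketch assuming a GPU runtime is selected:

```python
# In a Colab notebook, confirm the assigned GPU and its usable VRAM.
import torch

assert torch.cuda.is_available(), "Select Runtime > Change runtime type > GPU"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.0f} GB")  # a free-tier T4 reports ~15 GB
```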
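For the HuggingFace route, the serverless endpoint can be called through huggingface_hub's InferenceClient. This is a minimal sketch, not the exact workflow above: the model id and prompt are illustrative assumptions, and the token placeholder must be replaced with your own access token (the Pro tier raises the rate limit, as noted).

```python
from huggingface_hub import InferenceClient

# Assumed model id for a hosted Llama 405B instruct model; substitute any
# model available on the serverless inference tier.
client = InferenceClient(
    model="meta-llama/Llama-3.1-405B-Instruct",
    token="hf_...",  # your HuggingFace access token
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize this case in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```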
| Provider | Benefits | Disadvantages | GPU Limit |
| --- | --- | --- | --- |
| Amazon EC2 | | | |
| Amazon Notebooks | | | |
| Microsoft Azure | | | |
| Google Cloud | | | |
| Google Colab | | | |
| Paperspace Notebooks | | | |
| Paperspace Server/Console | | | |