📖 How to Find a GPU Hosting Service – a Guide by Viraaj Akuthota

For his project "Human Rights Predictor" (Round 15), our grantee Viraaj Akuthota was looking for a GPU hosting service. Here he explains how he went about it:

Fine-tuning models and creating embeddings on large corpora of qualitative data requires a large amount of GPU memory (VRAM). For example, fine-tuning BERT on a dataset of 15k cases of varying length produces roughly 100k-200k sequences at a 512-token limit. This requires approximately 140 GB of VRAM, which means such tasks cannot be run on most consumer-grade machines. I set out to find an affordable and relatively easy-to-use cloud compute option, and ran into many difficulties along the way. The benefits and disadvantages of most of the service providers I reviewed are summarized in the table below.
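To make the sequence count above concrete, here is a minimal sketch of how documents of varying length turn into 512-token sequences. It is an illustration only, not the exact preprocessing used in the project: the function name, the `overlap` parameter, and the 5,000-token document length are assumptions, and the special tokens BERT adds to each sequence are ignored.

```python
import math


def count_sequences(doc_token_counts, max_len=512, overlap=0):
    """Count the fixed-size windows needed to cover every document.

    doc_token_counts: iterable of token counts, one per document.
    max_len: maximum tokens per sequence (512 for BERT).
    overlap: tokens shared between consecutive windows (assumed 0 here).
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    total = 0
    for n in doc_token_counts:
        if n <= 0:
            continue  # empty documents produce no sequences
        if n <= max_len:
            total += 1  # short documents fit in a single sequence
        else:
            # one full window, plus enough further steps to cover the rest
            total += 1 + math.ceil((n - max_len) / step)
    return total


# Hypothetical corpus: 15,000 cases of 5,000 tokens each.
print(count_sequences([5000] * 15000))  # 150000
```

With these assumed lengths, each case yields 10 sequences, which lands inside the 100k-200k range mentioned above.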

Overall, the production setup I landed on uses:
·    Paperspace's Core with a Windows Server instance, to avoid using the terminal as much as possible.
·    Always-available multi-GPU instances, for example 4 x Nvidia A6000 GPUs with 192 GB VRAM in total for roughly $7 USD an hour.
·    Approximately $3 USD per month for 50 GB of persistent storage, which makes costs negligible while the machine is shut down.
·    For Linux users, a Python ML template that saves time installing Python, packages, CUDA, etc.
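The figures in this list can be turned into a rough budget. The sketch below is only back-of-the-envelope arithmetic using the approximate prices quoted above; the function names and the 20-hour usage figure are hypothetical, and actual Paperspace billing may differ.

```python
def total_vram_gb(num_gpus, vram_per_gpu_gb):
    """Combined VRAM of a multi-GPU machine."""
    return num_gpus * vram_per_gpu_gb


def monthly_cost_usd(gpu_hours, hourly_rate_usd=7.0, storage_usd=3.0):
    """GPU time billed per hour plus a flat monthly storage fee.

    Defaults are the rough prices quoted in the text (assumptions, not
    an official price list).
    """
    return gpu_hours * hourly_rate_usd + storage_usd


print(total_vram_gb(4, 48))   # 192
print(monthly_cost_usd(0))    # 3.0   (machine off all month)
print(monthly_cost_usd(20))   # 143.0 (hypothetical 20 GPU hours)
```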

Before production, I use either Google Colab or HuggingFace:
·    For testing fine-tuning or creating embeddings, I believe Google Colab's free T4 instance provides the most VRAM of any free tier.
·    For testing LLMs, HuggingFace's serverless inference free tier lets you use a variety of LLMs such as Llama 405B. The Pro tier, at $9 USD per month, increases the rate limit on this inference. I receive approximately 300 API calls per hour.
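When working against a rate limit like the roughly 300 calls per hour mentioned above, it can help to space requests out on the client side. The following is a minimal sketch of one way to do that; the `HourlyThrottle` class is hypothetical and not part of any HuggingFace library, and the actual request code is deliberately left out.

```python
import time


class HourlyThrottle:
    """Space calls evenly so at most `calls_per_hour` are made per hour."""

    def __init__(self, calls_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / calls_per_hour  # seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = HourlyThrottle(calls_per_hour=300)
print(throttle.min_interval)  # 12.0
```

Calling `throttle.wait()` before each API request keeps requests at least 12 seconds apart. Even spacing is a simple design choice; it does not allow bursts, which a token-bucket approach would.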

| Provider | Benefits | Disadvantages | GPU Limit |
| --- | --- | --- | --- |
| Amazon EC2 | Relatively affordable compared to other cloud providers. | Requires familiarity with AWS. Applying for quotas is not straightforward, and the approval process takes time. | Essentially unlimited |
| Amazon Notebooks | Easy to set up an ML system. Relatively affordable compared to other cloud providers. | Notebooks are limited to certain GPU sizes, essentially under 100 GB VRAM. Even if you have a quota for the underlying resource, it will not work for a notebook. | Under 100 GB VRAM |
| Microsoft Azure | | The registration system and console are sufficiently complicated that I did not use this service. The quota application process did not seem straightforward. | Essentially unlimited |
| Google Cloud | | Unable to secure access to a high-end GPU, as they were ALWAYS unavailable. | |
| Google Colab | Very easy to use and set up. | Relatively more expensive. No guaranteed access to the most powerful GPUs that are advertised, even with the premium services. | A100 GPU with 40 GB VRAM, if available, which is rare |
| Paperspace Notebooks | Very easy to use and set up. Multiple 'free' GPUs available with unlimited hours on the premium option. | Paperspace has plans that provide various systems for 6 hours of continuous use, as a mix of free and paid options. The free options still require a base payment plan to be purchased. | On the premium plan, a single P5000 machine with 15 GB VRAM is available for free. A 'Core' machine can also be purchased, paying per hour without a monthly plan. I currently have 4 x A6000 (48 GB VRAM each) for $7.56 an hour. |
| Paperspace Server/Console | Always-available multi-GPU instances. ML template server instances. Easy server setup. | More expensive than the big players. Some of the ML template server instances come with library issues. | Essentially unlimited |