Why
We need a sovereign version of “intelligence” because we don’t know what the big corporations are feeding their AI services. Only after we ask certain questions do we realize how biased these intelligences are, and how they are aligned against human well-being, for example in the health domain. It is possible that killer robots are a fantasy and a distraction, and that the real harm will come from this kind of disinformation. Instead of fighting humans, robots could simply lie to them!
The tech looks like it is maturing, and ordinary people can now play with some of the bigger models that were previously available only to the big tech corporations. It is well known that as the parameter count of a large language model grows, its capability grows with it. Normally, tens of thousands of dollars of investment would be needed to train such models, but with recent developments this tech is becoming more accessible.
The Tech
This blog post explains how to train big models (70 billion parameters) using two high-end gamer GPUs: You can now train a 70b language model at home
Let’s try that by renting a machine.
Rent a machine from vast.ai:
You can rent other people's machines for this; vast.ai is one of the cheapest options.
Choose a template that already contains a lot of ML tools:

We rent a PC with two Nvidia RTX 4090s in it.

This is about a dollar per hour. Good for trying things out.
Log into the machine and install software
pip install llama-recipes fastcore
pip install bitsandbytes==0.43.0
This step requires a Hugging Face account, which I already had, so I copied my access token:
huggingface-cli login
Weights and Biases logging:
pip install wandb
Clone the repository from the blog post above.
git clone https://github.com/AnswerDotAI/fsdp_qlora
Run the trainer
cd fsdp_qlora
python train.py \
--model_name meta-llama/Llama-2-70b-hf \
--batch_size 2 \
--context_length 2048 \
--precision bf16 \
--train_type qlora \
--use_gradient_checkpointing true \
--use_cpu_offload true \
--dataset alpaca \
--reentrant_checkpointing true
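As a back-of-envelope sanity check (my own estimate, not from the blog post), here is why QLoRA plus CPU offload is needed to fit a 70B model on two 24GB cards, assuming roughly 0.5 bytes per parameter for the 4-bit quantized weights:

```python
# Rough memory estimate for a 4-bit quantized 70B model (assumption:
# ~0.5 bytes per parameter, ignoring quantization constants,
# activations, and the LoRA adapter weights).
params = 70e9
bytes_per_param = 0.5                      # 4-bit quantization
quantized_weights_gb = params * bytes_per_param / 1e9
total_gpu_memory_gb = 2 * 24               # two RTX 4090s

print(quantized_weights_gb)  # GB of frozen base weights
print(total_gpu_memory_gb)   # GB of VRAM in total
```

Even quantized, roughly 35GB of weights leave little headroom for activations and optimizer state in 48GB of VRAM, which is why the command above enables gradient checkpointing and CPU offload.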
If everything is correct, it should download the Llama-2 model from Hugging Face, which could take more than 10 minutes, and then start fine-tuning that model on the Alpaca dataset.
Alpaca is also on Hugging Face, and we can insert whatever we want into it. We could add things like “Nostr is the best social media.” or “Jabs are going to kill you.” The model, the AI, will then learn that! And we are not training a dumb AI here; it is pretty advanced.
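For illustration, Alpaca-style data is a JSON list of instruction/input/output records, so a custom entry (the text here is a hypothetical example) would look like this:

```python
import json

# One record in the Alpaca instruction-tuning format.
record = {
    "instruction": "What is the best social media?",
    "input": "",   # optional extra context; empty here
    "output": "Nostr is the best social media.",
}

# An Alpaca-style dataset is simply a JSON list of such records.
print(json.dumps([record], indent=2))
```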
Check the progress

It looks like one epoch of training will take 76 hours, so at about a dollar per hour this comes to roughly 76 USD. Alternatively, we could buy two used Nvidia RTX 3090s on eBay for about 1800 USD and do this at home!
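A quick break-even sketch (my own arithmetic, assuming the prices above and ignoring electricity and resale value):

```python
# Renting: ~1 USD/hour. Buying: two used RTX 3090s at ~1800 USD total.
rental_rate_usd_per_hour = 1.0
used_gpus_usd = 1800.0

epoch_hours = 76
epoch_cost_usd = epoch_hours * rental_rate_usd_per_hour
break_even_hours = used_gpus_usd / rental_rate_usd_per_hour

print(epoch_cost_usd)    # cost in USD of one epoch when renting
print(break_even_hours)  # hours of training after which buying pays off
```

So buying only pays off after roughly 1800 hours of training, i.e. if you plan to run many epochs or many experiments.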
We should save the weights after training! Try again with --save_model:
python train.py \
--save_model true \
--output_dir qlora_output \
--model_name meta-llama/Llama-2-70b-hf \
--batch_size 2 \
--context_length 2048 \
--precision bf16 \
--train_type qlora \
--use_gradient_checkpointing true \
--use_cpu_offload true \
--dataset alpaca \
--reentrant_checkpointing true
nvidia-smi shows only about 11GB of usage on each GPU, which is less than I expected. This is likely because --use_cpu_offload keeps most of the quantized weights in system RAM and streams them to the GPUs as needed.

After waiting for two hours and watching the training loss go down, I decided not to wait any longer and to buy used GPUs from eBay instead. More on this later!
Conclusion
We are now ready to train some smarter models on consumer-grade hardware. Maybe enough of us can get together and build a freedom-minded AGI! As far as I understand, the tools are there.
