
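Petals is a decentralized platform for running very large (100B+) language models. By pooling compute resources with people across the Internet, you can run inference on or fine-tune large language models such as BLOOM-176B, without owning a high-end GPU.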


Decentralized platform for running 100B+ language models

Key features

Run inference or fine-tune BLOOM-176B by joining compute resources with people all over the Internet. No need to have high-end GPUs.
One inference step takes ≈ 1 sec, much faster than is possible with offloading and fast enough for chatbots and other interactive apps.
Employ any fine-tuning and sampling methods by accessing the model's hidden states and changing its control flow, something you can't do with proprietary APIs.
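The last feature says the client can apply its own sampling methods. The example below is a hypothetical temperature plus top-k sampler over a plain list of next-token logits. `sample_top_k` is not part of this project's API, and the example does not cover how to obtain logits from the distributed model.

```python
import math
import random


def sample_top_k(logits, k=5, temperature=1.0, rng=None):
    """Sample a token index from raw logits using temperature scaling and top-k filtering.

    Hypothetical helper: in practice the logits would come from the model's output
    for the last position; here they are a plain list of floats.
    """
    if k < 1 or temperature <= 0:
        raise ValueError("k must be >= 1 and temperature must be > 0")
    rng = rng or random.Random()
    # Keep only the k highest-scoring token indices
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax numerators over the surviving logits (subtract the max for numerical stability)
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights itself
    return rng.choices(top, weights=weights, k=1)[0]


# With k=1 this reduces to greedy decoding
next_token = sample_top_k([0.1, 2.0, -1.0, 1.5], k=1)
```

Setting `k=1` gives greedy decoding. Larger `k` or `temperature` trades determinism for diversity.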

[Read paper] | [View website]

How does it work?

🚧 This project is in active development

Be careful: some features may not work, interfaces may change, and we have no detailed docs yet (see roadmap).

A stable version of the code and a public swarm open to everyone will be released in November 2022. You can subscribe to be emailed when it happens or fill in this form to help the public launch by donating GPU time. In the meantime, you can launch and use your own private swarm.

Code examples

Solving a sequence classification task via soft prompt tuning of BLOOM-176B:

import torch
import torch.nn.functional as F

# AutoModelForPromptTuning and data_loader are illustrative placeholders and are not defined here
# Initialize distributed BLOOM with soft prompts
model = AutoModelForPromptTuning.from_pretrained("bigscience/distributed-bloom")
# Define optimizer for prompts and linear head
optimizer = torch.optim.AdamW(model.parameters())

for input_ids, labels in data_loader:
    # Forward pass with local and remote layers
    outputs = model(input_ids)
    loss = F.cross_entropy(outputs.logits, labels)

    # Distributed backward w.r.t. local params
    loss.backward()  # Compute model.prompts.grad
    optimizer.step()  # Update local params only
    optimizer.zero_grad()
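The comments in the loop above describe a backward pass that reaches the local prompts through remote layers while updating only local parameters. The following dependency-free toy shows that idea. A soft prompt is prepended to the input sequence, and a single frozen weighted sum stands in for the servers. All names and numbers are hypothetical, and squared error replaces cross-entropy to keep the arithmetic short. This is a sketch of the concept, not this project's implementation.

```python
# Toy illustration only: a frozen "remote" block and a trainable local soft prompt.
FROZEN_WEIGHTS = [0.5, -1.0, 2.0, 1.0]  # plays the role of server-side layers; never updated
PROMPT_LEN = 2


def remote_forward(hidden):
    """Frozen layer run 'on a server': a weighted sum over the input sequence."""
    return sum(w * h for w, h in zip(FROZEN_WEIGHTS, hidden))


def remote_backward(grad_out):
    """What a server would send back: d(output)/d(input), leaving its own weights untouched."""
    return [w * grad_out for w in FROZEN_WEIGHTS]


def train_soft_prompt(examples, steps=200, lr=0.05):
    prompt = [0.0] * PROMPT_LEN  # the only trainable (local) parameters
    for _ in range(steps):
        for x, target in examples:
            hidden = prompt + x  # prepend the soft prompt to the input sequence
            out = remote_forward(hidden)
            grad_out = 2.0 * (out - target)  # derivative of the squared error
            grad_hidden = remote_backward(grad_out)
            # The prompt occupies the first PROMPT_LEN positions, so its gradient is that slice
            grad_prompt = grad_hidden[:PROMPT_LEN]
            prompt = [p - lr * g for p, g in zip(prompt, grad_prompt)]
    return prompt


examples = [([1.0, 0.0], 3.0), ([0.0, 1.0], 2.0)]
prompt = train_soft_prompt(examples)  # converges to roughly [0.4, -0.8]
```

Only `prompt` changes during training, and `FROZEN_WEIGHTS` ends exactly as it started. This mirrors the "update local params only" comment in the loop above.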

Installation

conda install -y -c conda-forge cudatoolkit-dev==11.3.1 cudatoolkit==11.3.1 cudnn==8.2.1.32
pip install torch==1.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda113
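The commands above pin three pieces to CUDA 11.3: the `cudatoolkit` version, the `+cu113` torch wheel, and the `bitsandbytes-cuda113` package. A mismatch between them is a common source of import errors. The hypothetical helper below checks that the three CUDA tags agree. It takes the version strings as plain arguments, so it is not tied to any particular environment.

```python
import re


def cuda_tags_match(torch_version, cudatoolkit_version, bnb_package):
    """Return True if the torch wheel, cudatoolkit, and bitsandbytes build target the same CUDA.

    Hypothetical helper. Expected inputs, following the commands above:
    torch_version="1.12.0+cu113", cudatoolkit_version="11.3.1", bnb_package="bitsandbytes-cuda113".
    """
    torch_tag = re.search(r"\+cu(\d+)$", torch_version)
    bnb_tag = re.search(r"cuda(\d+)$", bnb_package)
    if torch_tag is None or bnb_tag is None:
        return False  # CPU-only wheel or unrecognized package name
    major, minor = cudatoolkit_version.split(".")[:2]
    toolkit_tag = major + minor  # "11.3.1" -> "113"
    return torch_tag.group(1) == bnb_tag.group(1) == toolkit_tag


ok = cuda_tags_match("1.12.0+cu113", "11.3.1", "bitsandbytes-cuda113")
```

At runtime, you would pass `torch.__version__` as the first argument.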

Basic functionality

All tests are run on localhost.

First, run one or more servers like this:

# minimalistic server with non-trained bloom blocks
python -m cli.run_server --converted_model_name_or_path bigscience/test-bloomd-6b3 \
  --block_indices 3:5 --torch_dtype float32 --identity_path ./server1.id --host_maddrs /ip4/127.0.0.1/tcp/31337
# when running multiple servers:
# - give each server a unique --identity_path (or remove the --identity_path arg when debugging)
# - if running multiple servers on the same machine, give each a unique port (last integer in --host_maddrs, 0 means random port)
# - when running over the internet, change --host_maddrs according to https://learning-at-home.readthedocs.io/en/latest/user/dht.html#running-across-the-internet
# - each server except the first should have --initial_peers pointing to one of the pre-existing servers
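The rules in those comments can be applied mechanically when starting several servers on one machine. The helper below is hypothetical and only assembles argument lists from the flags shown above. It gives each server a unique identity file and port, and it adds `--initial_peers` to every server after the first. The first server's full address is only known once that server starts and prints it, so the address is passed in as a string.

```python
def server_commands(block_ranges, first_server_address,
                    model="bigscience/test-bloomd-6b3", base_port=31337):
    """Build argv lists for several servers on one machine (hypothetical helper)."""
    commands = []
    for i, blocks in enumerate(block_ranges):
        port = base_port + i  # unique port per server on the same machine
        argv = [
            "python", "-m", "cli.run_server",
            "--converted_model_name_or_path", model,
            "--block_indices", blocks,
            "--torch_dtype", "float32",
            "--identity_path", f"./server{i + 1}.id",  # unique identity per server
            "--host_maddrs", f"/ip4/127.0.0.1/tcp/{port}",
        ]
        if i > 0:
            # every server except the first joins through an existing one
            argv += ["--initial_peers", first_server_address]
        commands.append(argv)
    return commands


cmds = server_commands(["3:5", "5:7"], "FULL_ADDRESS_PRINTED_BY_SERVER_1")
```

Each list can be handed to `subprocess.Popen`. Start the first server, copy its printed address, and then launch the rest.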

Then open a Python notebook or console and run:

import torch
import hivemind
from src import DistributedBloomConfig, get_remote_module


dht = hivemind.DHT(
    initial_peers=["TODO_COPY_FULL_ADDRESS_FROM_ANY_OF_THE_SERVERS"],  # e.g. /ip4/127.0.0.1/...
    client_mode=True, start=True,
)
config = DistributedBloomConfig.from_pretrained("bigscience/test-bloom-6b3")
layer3, layer4 = get_remote_module(dht, ['bigscience/test-bloomd-6b3.3', 'bigscience/test-bloomd-6b3.4'], config)
assert layer3 is not None and layer4 is not None, "one or both layers were not found in DHT"
# test forward/backward, two blocks
outputs = layer4(layer3(torch.randn(1, 64, 4096)))
loss = (outputs * torch.randn_like(outputs)).norm()
loss.backward()

# test inference, one block
with layer3.inference_session(max_length=10) as sess:
    for i in range(10):
        res = sess.step(torch.ones(1, 1, 4096))
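In the inference test above, `inference_session(max_length=10)` is stepped one position at a time. A plausible reading is that the session keeps earlier positions on the server, so each `step` sends only the newest one, and `max_length` bounds how many positions it can hold. The class below is a hypothetical pure-Python toy of that pattern, not the Petals API. It uses a running elementwise average as a stand-in for attention over the cached prefix.

```python
class ToyInferenceSession:
    """Hypothetical stand-in for a server-side inference session (not the Petals API)."""

    def __init__(self, max_length):
        self.max_length = max_length
        self.cache = []  # one entry per processed position

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.cache.clear()  # a real server would free its cache here
        return False

    def step(self, position):
        if len(self.cache) >= self.max_length:
            raise ValueError(f"session exceeded max_length={self.max_length}")
        self.cache.append(position)
        # Toy "attention": elementwise average over all cached positions
        n = len(self.cache)
        return [sum(col) / n for col in zip(*self.cache)]


with ToyInferenceSession(max_length=3) as sess:
    outs = [sess.step([float(i), 1.0]) for i in range(3)]
# outs == [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
```

Each output depends on the whole prefix even though only one new position is sent per step. A fourth `step` in this session would raise `ValueError`.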

Convert regular BLOOM into the distributed format


# convert model from HF hub to a distributed format (can take hours depending on your connection!)
MY_WRITE_TOKEN=TODO_WRITE_TOKEN_FROM_https://huggingface.co/settings/token
python -m cli.convert_model --model bigscience/bloom-6b3  \
  --output_path ./converted_model --output_repo bigscience/test-bloomd-6b3 \
  --use_auth_token $MY_WRITE_TOKEN  # TODO: replace --output_repo with a repo you have write access to
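To drive the conversion from a Python script without hardcoding the token, you can read it from the environment and assemble the same flags. The sketch below assumes the token has been exported as `MY_WRITE_TOKEN`. `convert_command` is a hypothetical helper that fails early when the variable is missing.

```python
import os


def convert_command(model, output_path, output_repo, env=None):
    """Build the cli.convert_model argv, reading the write token from MY_WRITE_TOKEN (hypothetical)."""
    env = os.environ if env is None else env
    token = env.get("MY_WRITE_TOKEN")
    if not token:
        raise RuntimeError("export MY_WRITE_TOKEN with a Hugging Face write token first")
    return [
        "python", "-m", "cli.convert_model",
        "--model", model,
        "--output_path", output_path,
        "--output_repo", output_repo,  # must be a repo you can write to
        "--use_auth_token", token,
    ]


argv = convert_command("bigscience/bloom-6b3", "./converted_model", "YOUR_USER/test-bloomd-6b3",
                       env={"MY_WRITE_TOKEN": "hf_example"})
```

Pass the result to `subprocess.run(argv, check=True)`. The token still appears in the child process's arguments, so avoid logging `argv`.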

Test local vs remote block (allclose)

To test distributed inference, run one or more servers, then open a new shell and run pytest with environment variables:

# shell A: serve model
python -m cli.run_server --converted_model_name_or_path bigscience/test-bloomd-6b3 \
  --torch_dtype float32 --identity_path ./server1.id --host_maddrs /ip4/127.0.0.1/tcp/31337

# shell B:
export PYTHONPATH=.
export INITIAL_PEERS="/ip4/TODO_COPY_INITIAL_PEERS_FROM_SERVER_OUTPUT"
export MODEL_NAME="bigscience/test-bloomd-6b3"

# test individual random blocks for exact match
pytest tests/test_block_exact_match.py

# test the full model
pytest tests/test_full_model.py
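The section title refers to an allclose comparison between local and remote blocks. The function below reproduces the tolerance rule used by `torch.allclose` and `numpy.allclose`: every element must satisfy |actual − expected| ≤ atol + rtol · |expected|. Its defaults match `torch.allclose` (rtol=1e-05, atol=1e-08). The tolerances actually used in `tests/test_block_exact_match.py` may differ, so treat this as a sketch of the criterion rather than the test itself.

```python
def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Elementwise check: |actual - expected| <= atol + rtol * |expected| for every element."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


same = allclose([1.0, 2.0], [1.0, 2.0 + 1e-6])  # True: within relative tolerance
```

The relative term scales with the expected value, so large activations tolerate larger absolute differences than values near zero.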

This project is a part of the BigScience research workshop.
