Alexander Borzunov | a26559ff65 Fix `.generate(input_ids=...)` (#485) | 1 year ago
Alexander Borzunov | 329f7d31e8 Add `blocked_servers` argument (#462) | 1 year ago
Alexander Borzunov | 056f22515a Prioritize short inference, unmerge pools for long inference (#458) | 1 year ago
Alexander Borzunov | 8c546d988a Test Llama, rebalancing, throughput eval, and all CLI scripts (#452) | 1 year ago
Alexander Borzunov | de930918a0 Support loading blocks in 4-bit (QLoRA NF4 format, disabled by default) (#333) | 1 year ago
Alexander Borzunov | cb3f018f9f Add LLaMA support (#323) | 1 year ago
Alexander Borzunov | 8f6342a861 Refactor RemoteSequenceManager (#309) | 1 year ago
Alexander Borzunov | 21c3526ec1 Start SequenceManager's thread only after first .make_sequence() (#301) | 1 year ago
Max Ryabinin | 793726b041 Speed up loading blocks using init with meta weights (#285) | 1 year ago
Alexander Borzunov | fee19e9b9b Use get_logger(__name__) instead of get_logger(__file__) (#265) | 1 year ago
justheuristic | ae9e71fe8e Add local tensor-parallel fwd/bwd (#143) | 1 year ago
Alexander Borzunov | 668b736031 Fix logging: do not duplicate lines, enable colors in Colab (#156) | 1 year ago
justheuristic | a2066a4096 Optimize RemoteSequenceManager (#106) | 1 year ago
Alexander Borzunov | 43ac6016ac Fix dtypes in backend schemas (#99) | 1 year ago
Alexander Borzunov | 7bd5916744 Make Petals a pip-installable package (attempt 2) (#102) | 1 year ago
Dmitry Baranchuk | 6095f58681 Deep distributed prompt tuning (#42) | 2 years ago
justheuristic | f0c7383181 Implement RemoteSequential slicing and extra repr, add tests (#30) | 2 years ago