Is your feature request related to a problem? Please describe.
I'm currently working on a project that uses llama-cpp-python's high-level API, and I wanted to know whether there is a high-level chat feature, since chat completion tends to produce an unintelligible mess when I try talking to the model.
Describe the solution you'd like
It would be nice to have some documentation on how to chat with the model properly, or some alternative way to do this, without making my codebase too messy.
Describe alternatives you've considered
- Running the model through the low-level API (I couldn't find an equivalent of llama.cpp's -ngl option for GPU offloading, which made me give up pretty quickly).
- Running chat completion with a prompt, but that led to the model repeatedly insisting that Moscow was the largest European city. Either that, or it hallucinated quite a lot. (It also couldn't recall what I had said earlier.)
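
A minimal sketch of what such a chat helper could look like, in case it helps frame the request. It assumes the Vicuna v1.1 prompt layout (a system line followed by `USER:` / `ASSISTANT:` turns), which may not match the exact Vicuna build in use. The names `build_prompt` and `chat_turn` are hypothetical, and the model path in the comments is a placeholder. The earlier turns are resent on every call, which is what lets the model "recall" the conversation.

```python
from typing import Callable, List, Tuple

# Assumption: Vicuna v1.1-style system line; other builds may expect a different one.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

History = List[Tuple[str, str]]  # (user message, assistant reply) pairs


def build_prompt(history: History, user_msg: str, system: str = SYSTEM) -> str:
    """Join the system line, all earlier turns, and the new user message."""
    parts = [system]
    for user, assistant in history:
        parts.append(f"USER: {user}")
        parts.append(f"ASSISTANT: {assistant}</s>")
    parts.append(f"USER: {user_msg}")
    parts.append("ASSISTANT:")  # left open so the model writes the reply
    return " ".join(parts)


def chat_turn(generate: Callable[[str], str], history: History, user_msg: str) -> str:
    """Run one turn: build the prompt, call the model, record the exchange."""
    reply = generate(build_prompt(history, user_msg)).strip()
    history.append((user_msg, reply))
    return reply


# Possible wiring to llama-cpp-python (not executed here; path is a placeholder):
#   from llama_cpp import Llama
#   llm = Llama(model_path="./vicuna-7b.bin", n_gpu_layers=32)  # GPU offload
#   generate = lambda p: llm(p, max_tokens=256, stop=["USER:"])["choices"][0]["text"]
#   history = []
#   print(chat_turn(generate, history, "What is the largest city in Europe?"))
```

Passing `generate` in as a plain callable keeps the prompt handling separate from the model call, so the same helper works whether the text comes from the high-level API or somewhere else.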
Additional context
Model: Vicuna 7B
GPU: RTX 3060 Ti, compiled with cuBLAS
I'm running this natively on my Pop!_OS machine.