Having trouble to generate correct output? Try prefixes!

Smorty [she/her]@lemmy.blahaj.zone · 12 days ago


hendrik@palaver.p3x.de · 12 days ago

Very good idea. I mean, there are frameworks for programmers to do exactly that, like LangChain. But I also end up doing this manually. I use Kobold.cpp, and most of the time I just switch it to Story mode and get one large notebook / text area. I’ll put in the questions, prompts, and special tokens if it’s an instruct-tuned variant, and start the bullet point list for it. Or I click on generate after I’ve already typed in the chapter names or a table of contents, or opened the code block with the proper markdown. So pretty much what you outlined. It’s super useful for guiding the LLM in the right direction, or for steering it back on track with a small edit to its output and a subsequent call to generate from there.
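For anyone who wants to script the same trick instead of typing into a text area, here is a minimal sketch in Python. It assumes your backend is wrapped in a plain function that takes a prompt and returns the continuation; `generate`, `complete_with_prefix` and `fake_generate` are made-up names for illustration, not part of Kobold.cpp, llama.cpp or LangChain.

```python
from typing import Callable


def complete_with_prefix(
    generate: Callable[[str], str],  # hypothetical wrapper around your backend
    instruction: str,
    prefix: str,
) -> str:
    """Ask for a continuation of text that already starts with `prefix`."""
    # The prefix goes at the very end of the prompt, so the model has to
    # continue it instead of choosing its own opening.
    prompt = instruction + "\n\n" + prefix
    continuation = generate(prompt)
    # The backend only returns the new text, so glue the prefix back on.
    return prefix + continuation


def fake_generate(prompt: str) -> str:
    """Stand-in backend so the sketch runs without a model (assumption)."""
    return "apples\n- pears"


if __name__ == "__main__":
    # Start the bullet list ourselves; the model only has to fill it in.
    print(complete_with_prefix(fake_generate, "List two fruits.", "- "))
```

The "small edit and generate again" workflow is the same call: pass the edited output as `prefix` and the model continues from there.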

Smorty [she/her]@lemmy.blahaj.zone · 12 days ago

Could you please tell me why you chose kobold.cpp over llama.cpp? I only ever used llama.cpp so I’d like to hear from the other side!

I really like the idea of letting an LLM perform tool calls in the middle of the generation.

Like, we instruct the LLM to say what it will do, then to put the tool call into <tool></tool> tags. Then we could set </tool> as a stop keyword and insert the results into its message.
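Roughly, that loop could look like the sketch below. Everything named here is an assumption for illustration: `generate(text, stop)` stands in for whatever backend call you use, the `<result></result>` tags are just one way to insert the tool output, and the "name, space, argument" call format is made up.

```python
from typing import Callable, Dict

OPEN, STOP = "<tool>", "</tool>"


def run_with_tools(
    generate: Callable[[str, str], str],  # hypothetical: (text so far, stop string) -> new text
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_rounds: int = 4,
) -> str:
    """Generate, pause at each tool call, insert its result, then continue."""
    text = prompt
    for _ in range(max_rounds):
        chunk = generate(text, STOP)
        # Some backends include the stop string in the output, some don't.
        if chunk.endswith(STOP):
            chunk = chunk[: -len(STOP)]
        text += chunk
        start = chunk.rfind(OPEN)
        if start == -1:
            break  # no tool call in this chunk, so the model is done
        # Assumed call format: "<tool>name argument"
        name, _, arg = chunk[start + len(OPEN):].strip().partition(" ")
        result = tools[name](arg) if name in tools else "unknown tool " + name
        # Close the tag ourselves and put the result into the model's own message.
        text += STOP + "\n<result>" + result + "</result>\n"
    return text[len(prompt):]


def fake_generate(text: str, stop: str) -> str:
    """Scripted stand-in for a model so the sketch runs offline (assumption)."""
    if "<result>" not in text:
        return "I will add the numbers. <tool>add 2 3"
    return "The sum is 5."


if __name__ == "__main__":
    tools = {"add": lambda arg: str(sum(int(x) for x in arg.split()))}
    print(run_with_tools(fake_generate, tools, "Q: What is 2 + 3?\nA: "))
```

Since the result ends up inside the model's own message, it may help to say in the system prompt that `<result>` blocks come from the tool and not from the model.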

I have tried this before, but the model tends not to believe what is in its own message. It sees the output of the tool call and goes "Don't believe what I just said, I made that up", even though LLMs are infamous for hallucinating…

hendrik@palaver.p3x.de · 12 days ago

Kobold.cpp uses llama.cpp under the hood. It just adds a few extras, a webserver and a user interface, plus some backwards compatibility for older model file formats, and it’s relatively easy to install. But the project builds upon llama.cpp and uses that same code for inference.

