

I don’t think that’s a problem. I live in Spain and speak Spanish daily with real people, many of them my friends. They’ll correct me if needed, and they often do, though most of the mistakes are my own anyway.
Don’t forget that people give wrong answers too. And people aren’t available 24/7 to help me.

Thank you so much!! I’ve been putting it off because what I have works, but a time will soon come when I’ll want to test new models.
I’m looking for a server, but not one handling many parallel calls, because I’d like to use as much context as I can. When you make room for e.g. 4 parallel slots, the context window gets split between them, so each one is 4x smaller. With Llama 3.1 8B I managed to fit a 47104-token context on the 16GB card (though actually using that much is pretty slow), and that’s with the KV cache quantized to 8-bit. But sometimes I just need that much.
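For anyone curious, here’s a rough back-of-the-envelope sketch in Python of how the per-slot context and KV-cache size shake out. The model dimensions are my assumptions for Llama 3.1 8B (check the GGUF metadata to be sure), and this is not llama.cpp’s exact bookkeeping:

```python
# Rough sketch: with N parallel slots the server divides one context window
# between them, so each request only gets total_ctx / N tokens.
total_ctx = 47104          # what fit on the 16GB card with 8-bit KV cache
for slots in (1, 2, 4):
    print(f"{slots} slot(s): {total_ctx // slots} tokens per request")

# Ballpark KV-cache size, assuming Llama 3.1 8B dimensions
# (32 layers, 8 KV heads, head dim 128 -- my assumption, verify against the GGUF)
layers, kv_heads, head_dim = 32, 8, 128
bytes_per_elem = 1.0625    # roughly q8_0; use 2 for f16
per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem   # K and V
print(f"KV cache at {total_ctx} tokens: ~{per_token * total_ctx / 2**30:.1f} GiB")
```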
I’ve never tried llama.cpp directly, thanks for the tip!
Kobold sounds good too, but I have some scripts talking to my current server directly. I’ll read up on it to see if it can do that. I don’t have time now, but I’ll do it in the coming days. Thank you!
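In case it helps, here’s a minimal sketch of the kind of script I mean, pointed at an OpenAI-compatible local endpoint. Both llama-server and, as far as I can tell, KoboldCpp expose one, so switching backends may only need a base-URL change. The ports and the endpoint path are from memory, so treat them as assumptions:

```python
# Minimal sketch of a script talking to a local server's
# OpenAI-compatible chat endpoint.
import requests

BASE_URL = "http://localhost:8080"   # llama-server default; KoboldCpp usually 5001

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "local",            # typically ignored by single-model local servers
        "messages": [{"role": "user", "content": "Hola, ¿puedes corregir esta frase?"}],
        "max_tokens": 200,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```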