I ran a first test of Llama 2 13B in a 6-bit quantized version (thanks, #ggml ).
It's good to excellent at various tasks: summarization, translation (I tried EN, IT, FR), and NER with semantic filters.
AND it runs at decent speed on a CPU-only setup on an Intel machine. 👏
Just did a bunch of merges from the upstream #ggml repo and managed to get StarCoder and WizardCoder running in #turbopilot. There are definitely opportunities to speed it up and make it more useful.