icelancer 3 months ago

Nice use of an LLM - we use Groq 70B models for this in our pipelines at work (after running WhisperX ASR on meeting files and such).

One of the better reasons I've found to use Cerebras/Groq: you get huge amounts of clean text back fast for processing in other ways.

ldenoue 3 months ago | parent [-]

Although Gemini accepts a very long input context, I found that sending more than 512 or so words at a time to the LLM for "cleaning up the text" yields hallucinations. That's why I chunk the raw transcript into 512-word chunks.
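
A minimal sketch of that chunking approach, assuming a generic "prompt in, text out" LLM call (names here are illustrative, not from the actual project):

    # Split the raw transcript into ~512-word chunks so each cleanup call
    # stays small enough to avoid hallucinated rewrites, then stitch the
    # cleaned chunks back together. call_llm is any prompt->text function
    # (Gemini, Groq, etc.) and is a placeholder, not a real API.
    def chunk_words(text: str, max_words: int = 512) -> list[str]:
        words = text.split()
        return [" ".join(words[i:i + max_words])
                for i in range(0, len(words), max_words)]

    def clean_transcript(raw: str, call_llm) -> str:
        prompt = ("Clean up this ASR transcript. Fix punctuation and casing, "
                  "but do not add, remove, or change content:\n\n")
        cleaned = [call_llm(prompt + chunk) for chunk in chunk_words(raw)]
        return " ".join(cleaned)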

Are you saying it works with 70B models on Groq? Mixtral, Llama? Other?

bob_theslob646 3 months ago | parent | next [-]

When you did this, I'm assuming you cut the audio off at around 5 minutes?

https://github.com/google-gemini/generative-ai-js/issues/269...
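
If you do have to stay under a roughly 5-minute limit, one common workaround (my own sketch, not from the linked issue) is to pre-split the audio with ffmpeg before uploading:

    # Hypothetical helper: writes chunk_000.wav, chunk_001.wav, ...
    # each at most `seconds` long, copying the stream without re-encoding.
    import subprocess

    def split_audio(path: str, seconds: int = 300) -> None:
        subprocess.run(
            ["ffmpeg", "-i", path, "-f", "segment",
             "-segment_time", str(seconds), "-c", "copy", "chunk_%03d.wav"],
            check=True,
        )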

icelancer 3 months ago | parent | prev [-]

Yeah, I've had no issues sending tokens up to the context limit. I cut it off with a 10% buffer, but that's just to ensure I don't run into tokenization mismatches between tiktoken and whatever tokenizer my actual LLM uses.
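
The 10% buffer idea looks roughly like this, assuming tiktoken as the approximate counter (the context limit and encoding name below are illustrative, not specific to any one model):

    # Count tokens with tiktoken and only accept prompts up to ~90% of the
    # model's context window, since tiktoken's count may differ slightly
    # from the target model's own tokenizer.
    import tiktoken

    def fits_with_buffer(text: str, context_limit: int = 8192,
                         buffer: float = 0.10) -> bool:
        enc = tiktoken.get_encoding("cl100k_base")  # approximation only
        return len(enc.encode(text)) <= int(context_limit * (1 - buffer))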

I've had little success with Gemini on long videos. My pipeline is video -> ffmpeg strip audio -> WhisperX ASR -> Groq (L3-70b-specdec) -> gpt-4o/sonnet-3.5 for summarization. Works great.
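
A rough outline of the first two stages of that pipeline, assuming the whisperx Python package and ffmpeg on the path (model names and settings are illustrative):

    # Strip the audio track from the video, then transcribe it with WhisperX.
    import subprocess
    import whisperx

    def strip_audio(video_path: str, wav_path: str = "audio.wav") -> str:
        # -vn drops video; mono 16 kHz is a common ASR-friendly format.
        subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn",
                        "-ac", "1", "-ar", "16000", wav_path], check=True)
        return wav_path

    def transcribe(wav_path: str) -> str:
        model = whisperx.load_model("large-v2", device="cuda")
        result = model.transcribe(whisperx.load_audio(wav_path))
        return " ".join(seg["text"] for seg in result["segments"])

    # Downstream: clean the raw transcript with a fast Groq-hosted Llama 70B
    # model (chunked as discussed upthread), then summarize the cleaned text
    # with GPT-4o or Claude 3.5 Sonnet via their chat APIs.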