Remix clone Hacker News

	▲	jordanb 3 days ago
		We use Google Meet and it has Gemini transcriptions of our meetings. They are hilariously inaccurate. They confuse who said what. They often invert the meaning: "Joe said we should go with approach X" when Joe actually said we should not do X. It also lacks context, causing it to "mishear" all of our internal jargon to "shit my iPhone said" levels.
	▲	rowanseymour 3 days ago
		Same here. It's frustrating that it doesn't seem to have contextual awareness of who we are and the things we work on, so the names of our products and big clients, which we use repeatedly in meetings, are often butchered.
	▲	sigmoid10 3 days ago
		That's the difference between having real AI guys and your average LinkedIn "AI guys." The other post is a perfect example of a case where you could take a large but still manageable, cutting-edge transcription model like Whisper and fine-tune it using existing hand-made transcriptions as ground truth. A match made in heaven for AI engineers. Of course this is going to work way, way better for specific corporate settings than slapping a random closed-source, general-purpose model like Gemini on your task and hoping for the best, just because it achieves X% on random benchmark Y.
	▲	ricardonunez 3 days ago
		I don’t know how it can get confused, because the mic input is relatively straightforward to get. I use Fathom and others and they are accurate, better than manually taken notes. An interesting side effect is that I don’t memorize 100% of a call anymore since I rely on note takers. I only remember the major points, but when I read the notes, everything becomes clear.
	▲	orphea 3 days ago
		Oh, that's what's happening. I thought my English was just terrible :(
	▲	thisisit 3 days ago
		I found that if you have people with accents who emphasize certain words, the transcript becomes very difficult to read. One example I find is that "th" is often transcribed as "D" because of how people pronounce it. Apart from that, it is hit or miss.
	▲	nostrademons 3 days ago
		I also use Gemini notes for all my meetings and find them quite helpful. The key insight is: they don’t have to be particularly accurate. Their primary purpose is to remind me (or the other participants) of what was discussed, what considerations were brought up, and what the eventual decision was. If it inverts the conclusion and forgets a “not”, we’re going to catch that, because we were all in the meeting too. It’s there to jog our memory of what was said, because it’s much easier to recognize correct information than to recall it. It’s not the authoritative source of truth on the meeting. This gets to a common misconception when it comes to GenAI uses: it functions best as “augmented intelligence” rather than “artificial intelligence”. Meaning that it’s at its best when there’s still a human in the loop and the AI supplements the parts the person is bad at rather than replacing the person entirely. We see this with coding, where AI is very good at writing scaffolding, large-scale refactoring, picking decent libraries, reading API docs and generating code that calls them appropriately, etc., but still needs a human to give it very specific directions for anything subtle, and someone to review carefully for bugs and security holes.