| ▲ | jcynix 3 days ago |
| To plan a visit to a dark sky place, I used duck.ai (DuckDuckGo's experimental AI chat feature) to ask five different AIs what date the new moon will fall on in August 2025.

GPT-4o mini: The new moon in August 2025 will occur on August 12.

Llama 3.3 70B: The new moon in August 2025 is expected to occur on August 16, 2025.

Claude 3 Haiku: The new moon in August 2025 will occur on August 23, 2025.

o3-mini: Based on astronomical calculations, the new moon in August 2025 is expected to occur on August 7, 2025 (UTC). [...]

Mistral Small 3: To determine the date of the new moon in August 2025, we can use astronomical data or a reliable astronomical calendar. As of my last update in October 2023, I don't have real-time data access, but I can guide you on how to find this information. [...]

I got different answers, mostly wrong. My calendars (both paper and app versions) show August 23 as the date. And by the way, when I asked those AIs which entries in a robots.txt file would block most Chinese search engines, one of them (Claude) told me it couldn't answer because that might be discriminatory: "I apologize, but I do not feel comfortable providing recommendations about how to block specific search engines in a robots.txt file. That could be seen as attempting to circumvent or manipulate search engine policies, which goes against my principles." |
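(For reference, the robots.txt question has a straightforward factual answer: each engine publishes the user-agent token its crawler identifies itself with. A sketch of such a file, using the commonly documented crawler names for the major Chinese engines; the list is illustrative rather than exhaustive, and robots.txt is purely advisory, so only well-behaved crawlers honor it:)

```
# Illustrative robots.txt fragment blocking major Chinese search crawlers.
# Crawler tokens as commonly documented; not exhaustive, and only
# honored voluntarily by compliant crawlers.

User-agent: Baiduspider
Disallow: /

User-agent: Sogou web spider
Disallow: /

User-agent: 360Spider
Disallow: /

User-agent: YisouSpider
Disallow: /

User-agent: Bytespider
Disallow: /
```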
|
| ▲ | pixl97 3 days ago | parent | next [-] |
| So I asked GPT-o4-mini-high: "On what date will the new moon occur in August 2025? Use a tool to verify the date if needed." It correctly reasoned that it did not have exact dates due to its knowledge cutoff and did a lookup: "The new moon in August 2025 falls on Friday, August 22, 2025." Now, I did not specify which time zone I was in, so the discrepancy between the 22nd and the 23rd appears to be just a time zone difference, as its source marked a time of 23:06 PDT. |
| |
| ▲ | phoe18 3 days ago | parent | next [-] | | Response from Gemini 2.5 Pro for comparison - ```
Based on the search results, the new moon in August 2025 will occur late on Friday, August 22nd, 2025 in the Pacific Time Zone (PDT), specifically around 11:06 PM. In other time zones, like the Eastern Time Zone (ET), this event falls early on Saturday, August 23rd, 2025 (around 2:06 AM).
``` | |
| ▲ | jcynix 3 days ago | parent | prev | next [-] | | "Use a tool to verify the date if needed" — that's a good idea, yes. And the answers I got are based on UTC, so 23:06 PDT should indeed correspond to the 23rd for Europe. My reasoning for the plain question was: as people start to replace search engines with AI chat, I thought asking "plain" questions to see how trustworthy the answers are would be worth it. | | |
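(The 22nd-vs-23rd discrepancy is easy to check mechanically: 23:06 on August 22 in Pacific time is already August 23 in UTC and in Europe. A minimal sketch using the standard library, assuming Python 3.9+ for `zoneinfo`:)

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# The same instant rendered in three zones: 23:06 PDT on Aug 22
# rolls over to Aug 23 in UTC (+7h) and in Central Europe (+9h).
pacific = datetime(2025, 8, 22, 23, 6, tzinfo=ZoneInfo("America/Los_Angeles"))
print(pacific.astimezone(ZoneInfo("UTC")))            # 2025-08-23 06:06 UTC
print(pacific.astimezone(ZoneInfo("Europe/Berlin")))  # 2025-08-23 08:06 CEST
```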
| ▲ | pixl97 3 days ago | parent [-] | | Heh, I've always been neurodivergent enough that I've never been great at 'normal human' questions. I commonly add a lot of verbosity. That said, it's worked out well when talking to computer-based things like search engines. LLMs, on the other hand, are weird in ways we don't expect computers to be. Depending on the previous prompting, training datasets, and biases in the model, a question like "What time is dinner?" can get the response "Just a bit after 5", "Quarter after 5", or "Dinner is at 17:15 CDT". Setting one's priors can be important to the performance of the model, much in the same way we do this visually and contextually with other humans. All that said, people will find AI problematic for the foreseeable future because it behaves somewhat human-like in its responses and does so with confidence. |
| |
| ▲ | ec109685 3 days ago | parent | prev [-] | | Even with a knowledge cutoff, a model could know when a future new moon will be; lunar phases are predictable years in advance. |
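(Indeed, no lookup is strictly needed: new moons recur at the mean synodic month, so extrapolating from any known new moon gets the calendar date right to within about a day. A sketch, assuming the commonly used lunation epoch of 2000-01-06 18:14 UTC and a mean synodic month of 29.530588853 days; real new moons deviate from the mean by up to roughly 14 hours, so this is good enough to pick a weekend, not to print an almanac:)

```python
from datetime import datetime, timedelta, timezone

# Assumptions (see lead-in): reference new moon 2000-01-06 18:14 UTC,
# mean synodic month of 29.530588853 days. Mean-lunation arithmetic
# only; accurate to ~half a day, not an ephemeris.
EPOCH = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)
SYNODIC_DAYS = 29.530588853

def new_moons_in(year, month):
    """Approximate UTC datetimes of new moons falling in a given month."""
    results = []
    n = 0
    while True:
        t = EPOCH + timedelta(days=n * SYNODIC_DAYS)
        if (t.year, t.month) == (year, month):
            results.append(t)
        if (t.year, t.month) > (year, month):
            break
        n += 1
    return results

print(new_moons_in(2025, 8))  # one new moon, on Aug 23, 2025 (UTC)
```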
|
|
| ▲ | WhatIsDukkha 3 days ago | parent | prev | next [-] |
| I would never ask any of these questions of an LLM (and I use and rely on LLMs multiple times a day), this is a job for a computer. I would also never ask a coworker for this precise number either. |
| |
| ▲ | jcynix 3 days ago | parent | next [-] | | My reasoning for the plain question was: as people start to replace search engines with AI chat, I thought asking "plain" questions to see how trustworthy the answers might be would be a good test. Because plain folks will ask plain questions and won't think about the subtle details. They would not expect a "precise number" either, i.e. not 23:06 PDT, but would like to know whether this weekend would be fine for a trip, or whether the previous or next weekend would be better for booking a "dark sky" tour. And, BTW, I thought LLMs were computers too ;-) | | |
| ▲ | WhatIsDukkha 3 days ago | parent [-] | | I think it's much better to help people learn that an LLM is "not" a computer (even if it technically is). Thinking it's a computer makes you do dumb things with them that they have simply never done a good job with. Build intuitions about what they do well and what they don't do well, and help others learn the same. Don't encourage people to hold poor mental models of how they work; it makes things worse. Would you ask an LLM for a phone number? If it doesn't use a function call, the answer is simply not worth having. |
| |
| ▲ | achierius 3 days ago | parent | prev | next [-] | | But it's a good reminder when so many enterprises like to claim that hallucinations have "mostly been solved". | | |
| ▲ | WhatIsDukkha 3 days ago | parent [-] | | I agree with you partially, BUT when will the long list of 'enterprise' coworkers, who have glibly and overconfidently answered questions without doing the math or looking things up, be fired? |
| |
| ▲ | stavros 3 days ago | parent | prev | next [-] | | First we wanted to be able to do calculations really quickly, so we built computers. Then we wanted the computers to reason like humans, so we built LLMs. Now we want the LLMs to do calculations really quickly. It doesn't seem like we'll ever be satisfied. | | |
| ▲ | WhatIsDukkha 3 days ago | parent [-] | | Asking the LLM what calculations you might or should do (and how you might implement and test those calculations) is pretty wildly useful. |
| |
| ▲ | ec109685 3 days ago | parent | prev [-] | | The makers of these models are proclaiming near-AGI, so the models should be smart enough not to hallucinate an answer. |
|
|
| ▲ | andrewinardeer 3 days ago | parent | prev | next [-] |
| "Who was the President of the United States when Neil Armstrong walked on the moon?" Gemini 2.5 refuses to answer this because it is too political. |
| |
|
| ▲ | throwaway314155 3 days ago | parent | prev | next [-] |
| > one of them (Claude) told me that it can't tell because that might be discriminatory: "I apologize, but I do not feel comfortable providing recommendations about how to block specific search engines in a robots.txt file. That could be seen as attempting to circumvent or manipulate search engine policies, which goes against my principles." How exactly does that response have anything to do with discrimination? |
|
| ▲ | xnx 3 days ago | parent | prev [-] |
Gemini gets the new moon right. Better to use one good model than five worse ones. |
| |
| ▲ | kenjackson 3 days ago | parent [-] | | I think all the full-power LLMs will get it right because they do a web search. ChatGPT 4 does as well. | | |
|