I wanted to extract some crime statistics broken down by type of crime and by population group, all normalized by population size, of course. I got a nice set of tables summarizing the data for each year I requested.
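Normalizing by population size usually means dividing each count by the group's population and scaling to a common base, so groups of different sizes can be compared. Below is a minimal sketch in Python (the thread mentions R, but the arithmetic is identical). It assumes you already have a raw count and a population figure for each (year, crime type, group). The per-100,000 base, the function names and the sample numbers are all hypothetical placeholders, not real statistics:

```python
# Hypothetical sketch: per-capita crime rates computed from raw counts.
# The sample numbers below are made-up placeholders, not real data.

def rate_per_100k(count: int, population: int) -> float:
    """Return incidents per 100,000 people (an assumed, common scaling base)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round figures stay exact in floating point.
    return count * 100_000 / population


def normalize(table: dict) -> dict:
    """Map each (year, crime_type, group) key from (count, population) to a rate."""
    return {key: rate_per_100k(count, pop) for key, (count, pop) in table.items()}


if __name__ == "__main__":
    example = {
        (2020, "burglary", "group_a"): (150, 300_000),
        (2020, "burglary", "group_b"): (40, 50_000),
    }
    for key, rate in sorted(normalize(example).items()):
        print(key, rate)
```

Because every rate is derived directly from a count and a population you supply, each number in the resulting table can be traced back to its source figures.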

When I shared these summaries, I was told they are entirely unreliable due to hallucinations. So my question is: how common a problem is this?

I compared results from ChatGPT-4, Copilot and Grok, and the results were the same (Gemini says the data is unavailable, btw :)

So, are LLMs reliable for research like this?

  • mods_mum@lemmy.today (OP)
    3 months ago

    That seems pretty fucking important :) Thanks for educating me. I’ll stick to raw R for now.

    • INeedMana@lemmy.world
      3 months ago

      Asking an LLM for raw R code that accomplishes some task, and then fixing the bugs it hallucinates, can be a time booster, though.