One challenge in using LLMs for this (I maintained an advice package for UK social security benefits for 35 years) is simply that the law is constantly changing. I did see one prototype that had no clue about this: it would answer questions with material relevant to different periods. And of course mechanisms for LLMs to determine their own competence in answering arbitrarily complex questions are difficult to create and verify.
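For what it's worth, the period problem seems at least partly tractable on the data side if every rule carries its effective dates and only in-force rules ever reach the model. A minimal sketch of that idea in Python, with entirely made-up rule text and dates:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Rule:
    text: str
    effective_from: date
    effective_to: date | None  # None = still in force

# Hypothetical snippets -- real text and dates would come from legislation or official guidance.
rules = [
    Rule("2023/24 benefit rates ...", date(2023, 4, 1), date(2024, 3, 31)),
    Rule("Current benefit rates ...", date(2024, 4, 1), None),
]

def rules_in_force(rules: list[Rule], on: date) -> list[Rule]:
    """Keep only rules whose effective window covers the date the question is about."""
    return [
        r for r in rules
        if r.effective_from <= on and (r.effective_to is None or on <= r.effective_to)
    ]

# Only rules valid on the relevant date (claim date, or today) get passed to the model as context.
context = rules_in_force(rules, date.today())
```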
Another thought I had, which I think would be fun, would be to have a set of questions with objective answers that could be used to evaluate models based on how much they "understand" (are trained on) SNAP or other benefits.
This thought came out of a RubyConf talk (on a track I led) called "Do LLMs dream of Type Inference", which was all about evaluating the answers an LLM provides to certain "Ruby code problems", but I think the same approach could be applied to any other domain, like SNAP benefits.
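Roughly the shape I have in mind, as a sketch: the Q/A pairs and the `ask_model` callable below are placeholders, not real SNAP figures or any particular API.

```python
from typing import Callable

# Placeholder Q/A pairs -- not real SNAP figures. A real set would be filled in
# from verified current sources (USDA tables, state policy manuals, etc.).
eval_set = [
    {"q": "Does SNAP have an asset limit for most households in <state>?",
     "expected": "<verified answer for that state and year>"},
    {"q": "What is the maximum monthly SNAP benefit for a household of one?",
     "expected": "<verified figure from the current tables>"},
]

def score(ask_model: Callable[[str], str], eval_set: list[dict]) -> float:
    """Fraction of questions where the model's answer contains the expected fact.
    Substring matching is deliberately naive; a real harness would grade more carefully."""
    hits = sum(
        1 for item in eval_set
        if item["expected"].lower() in ask_model(item["q"]).lower()
    )
    return hits / len(eval_set)

# Usage: score(my_llm_call, eval_set), where my_llm_call(question) returns the answer text.
```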
I’ve done this already on a small scale! Asset limits are a fun one because so much literal information on the web is wrong. I’m doing some scaling up on this but it’s also interesting to go beyond strict facts and into more normative/alignment questions.
Does Propel publish dashboards on application timelines, acceptance rates, etc.? Do other benefits assistance orgs? If not, why not? I agree that some knowledge is most saliently found in the person asking the question, but we can at least do *our* part to make the knowledge in *our* orgs more systemic and transparent to (society|AI), even if we can't claim to have a complete picture.
I guess by "canonical" you mean specifically RAG.
The examples you're giving of Reddit answers I would expect to show up in the training data (along with probabilistic heuristics like "expect bureaucracy to take twice as long as they say it should").
Because I agree that without a canonical source, there isn't a way to give an objective answer. But the AI can definitely give a reasonably intuitive answer. Which could be wrong.
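To make the distinction concrete, a minimal sketch of the two modes; `complete` and `retrieve` stand in for whatever model call and retriever you'd actually use:

```python
def answer_from_intuition(complete, question: str) -> str:
    """No grounding: the model answers from whatever ended up in its training data
    (Reddit threads, outdated blog posts, folk heuristics). Plausible, possibly wrong."""
    return complete(question)

def answer_from_canonical_source(complete, retrieve, question: str) -> str:
    """RAG-style: pull passages from a written source of truth (statute, agency policy
    manual) and tell the model to answer only from them."""
    passages = retrieve(question)  # placeholder retriever over canonical documents
    prompt = (
        "Answer using only the excerpts below; say so if they don't cover the question.\n\n"
        + "\n\n".join(passages)
        + f"\n\nQuestion: {question}"
    )
    return complete(prompt)
```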
By canonical I mean “there exists a written source of truth.” I think there is a set of objective pieces of information, but also a set of non-objective but nonetheless practically important information. I agree models will likely get some of this but I think how it does compared to what we consider right is an interesting question. (In fact, what we consider right and why is an interesting question in and of itself!)