Discussion about this post

Tim Blackwell

One challenge in using LLMs for this (I maintained an advice package for UK social security benefits for 35 years) is simply that the law is constantly changing. I did see one prototype that had no clue about this: it would answer questions with material relevant to different periods. And of course, mechanisms for LLMs to determine their own competence in answering arbitrarily complex questions are difficult to create and verify.

Ben Sheldon

Another thought I had, which I think would be fun, would be to have a set of objective training answers that could be used to evaluate models based on how much they "understand" (are trained on) SNAP or other benefits.

This thought came out of a RubyConf talk (on a track I led) called "Do LLMs dream of Type Inference", which was all about evaluating the answers an LLM provides to certain "Ruby code problems", but I think the same approach would work for any other subject, like SNAP benefits.
