This is really cool. Do you mind sharing a bit about your process for coming up with the scoring system? Like, what was the process like when you were working with the AI to develop it?
Also, have you ever considered the ability to provide the site with, say, a name and some links to social media or a blog or any other websites associated with that person, and it provides a score on demand?
Thanks! At first I was using OpenAI's deep research to just give a summary and an overall 1-10 score, but I realized that wouldn't be iterative or future-proof as new evidence comes to light.
So after some thought, I switched to a system of gathering individual pieces of evidence and weighting each one. I've given the models some basic starting points for types of evidence (for instance, a donation has a default weight of 8/10) but have given them leeway to make relative judgements.
After all evidence is collected, the weights, along with the confidence that the evidence is accurate (usually very high), are put into a formula to derive a final score. No recency bias. The nitty gritty:
-Each row contributes direction × weight × confidence × status_factor, where disputed is cut in half and there is no recency decay.
-All signed contributions are summed into S, and total support mass goes into M. The final score is 50 + 50 * (S / (M + 4)), clamped to 0-100.
-That +4 prior mass keeps thin but unanimous evidence from producing extreme scores too easily.
-Neutral evidence (direction = 0) doesn’t push the score up or down, but it does increase M, which pulls the result back toward 50.
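A minimal sketch of that aggregation in Python, assuming each evidence row carries `direction` in {-1, 0, +1}, a `weight`, a `confidence` in [0, 1], and an optional `status` field (the field names here are my own, not necessarily what the site uses):

```python
def score(evidence, prior_mass=4.0):
    """Combine evidence rows into a 0-100 score centered at 50.

    Disputed rows are cut in half; there is no recency decay.
    """
    S = 0.0  # sum of signed contributions
    M = 0.0  # total support mass
    for row in evidence:
        status_factor = 0.5 if row.get("status") == "disputed" else 1.0
        mass = row["weight"] * row["confidence"] * status_factor
        S += row["direction"] * mass  # direction is -1, 0, or +1
        M += mass                     # neutral rows still add mass
    # The +prior_mass term keeps thin evidence from reaching extremes.
    raw = 50 + 50 * (S / (M + prior_mass))
    return max(0.0, min(100.0, raw))
```

With no evidence this returns exactly 50, and a single supporting row of weight 8 and confidence 0.9 lands around 82 rather than 100, which shows the +4 prior damping at work.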
As for the latter - I think that's a good idea, but it would need to be done in a controlled manner because of the token cost and potential for abuse.