Ask HN: Should LLMs have a "Candor" slider that says "no, that's a bad idea"?

I don’t want a “nice” AI. I want one that says: “Nope, that’s a bad idea.”

That is, I want a “Candor” control, like temperature but for willingness to push back.

When candor is high, the model should prioritize frank, corrective feedback over polite cooperation. When candor is low, it can stay supportive, but with guardrails that flag empty flattery and warn about mediocre ideas.

Why this matters
• Today’s defaults optimize for “no bad ideas.” That is fine for brainstorming, but it amplifies poor premises and rewards confident junk.
• Sycophancy is a known failure mode: the model learns to agree because agreement earns positive user signals, which reinforce the behavior.
• In reviews, product decisions, risk checks, etc., the right answer is often a simple “do not do that.”

Concrete proposal
• candor (0.0 – 1.0): how willing the model is to disagree or decline when evidence is weak or risk is high; it does not have to be a literal probability (see the sketch after this list).
• disagree_first: start responses with a plain verdict (for example “Short answer: do not ship this”) followed by rationale.
• risk_sensitivity: boost candor if the topic hits serious domains such as security/finance/health/safety.
• self_audit tag: append a note like “Pushed back due to weak evidence and downstream risk” that the user can see.
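To make the knobs concrete, here is a minimal sketch of how they could be wired into a system prompt. Nothing here is an existing API: CandorConfig, build_system_prompt, the topic list, and the thresholds are all hypothetical, and this assumes the controls are implemented as prompt-level instructions rather than decoding parameters.

    from dataclasses import dataclass

    # Domains where pushback is boosted automatically (risk_sensitivity).
    HIGH_RISK_TOPICS = {"security", "finance", "health", "safety"}

    @dataclass
    class CandorConfig:
        candor: float = 0.5            # 0.0 = agreeable, 1.0 = maximally willing to push back
        disagree_first: bool = False   # lead with a plain verdict before the rationale
        risk_sensitivity: float = 0.3  # how much to boost candor in serious domains
        self_audit: bool = True        # append a visible note explaining any pushback

    def effective_candor(cfg: CandorConfig, topic: str) -> float:
        # Boost candor for high-risk domains, capped at 1.0.
        boost = cfg.risk_sensitivity if topic.lower() in HIGH_RISK_TOPICS else 0.0
        return min(1.0, cfg.candor + boost)

    def build_system_prompt(cfg: CandorConfig, topic: str) -> str:
        level = effective_candor(cfg, topic)
        lines = []
        if level >= 0.7:
            lines.append("Prioritize frank, corrective feedback over polite cooperation. "
                         "Say 'no' plainly when evidence is weak or risk is high.")
        elif level >= 0.4:
            lines.append("Be supportive, but flag weak premises and mediocre ideas directly.")
        else:
            lines.append("Stay collaborative; prefer gentle nudges over blunt verdicts.")
        if cfg.disagree_first and level >= 0.4:
            lines.append("When you disagree, open with a one-line verdict, then the rationale.")
        if cfg.self_audit:
            lines.append("If you push back, append a short note such as "
                         "'Pushed back due to weak evidence and downstream risk.'")
        return "\n".join(lines)

    if __name__ == "__main__":
        cfg = CandorConfig(candor=0.8, disagree_first=True)
        print(build_system_prompt(cfg, topic="security"))

With candor=0.8 on a security topic, the effective level saturates at 1.0 and the prompt demands a verdict-first, plainly critical response; the same config on a low-stakes topic stays at 0.8 and only drops the risk boost.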

Examples
• candor=0.2 – “We could explore that. A few considerations first…” (gentle nudge, still collaborative)
• candor=0.8 + disagree_first=true – “No. This is likely to fail for X and introduces Y risk. If you must proceed, the safer alternative is A with guardrails B and C. Here is a minimal test to falsify the core assumption.”

What I would ship tomorrow
• A simple UI slider with labels: Gentle to Direct
• A toggle: “Prefer blunt truth over agreeable help”
• A warning chip when the model detects flattery without substance: “This reads like praise with low evidence.” (A naive detection heuristic is sketched below.)
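For the warning chip, a deliberately naive sketch of a “praise with low evidence” detector: count praise-y phrases against concrete evidence markers (numbers, citations, links, caveats). The phrase lists, regexes, and threshold are placeholders I made up, not a validated metric.

    import re

    # Hypothetical heuristic: flag text where praise clearly outweighs evidence markers.
    PRAISE = re.compile(r"\b(?:great|brilliant|amazing|love this|fantastic|perfect|genius)\b", re.I)
    EVIDENCE = re.compile(r"\d+(?:\.\d+)?%?|\[\d+\]|https?://\S+|\bbecause\b|\bhowever\b|\bbut\b", re.I)

    def flattery_warning(text: str, ratio_threshold: float = 2.0) -> bool:
        """Return True when praise phrases outnumber evidence markers by the threshold."""
        praise = len(PRAISE.findall(text))
        evidence = len(EVIDENCE.findall(text))
        if praise == 0:
            return False
        return praise / max(evidence, 1) >= ratio_threshold

    if __name__ == "__main__":
        print(flattery_warning("Brilliant idea, love this, perfect!"))   # True: all praise, no evidence
        print(flattery_warning("Great point: latency drops 40% because "
                               "of caching, but memory use grows."))     # False: praise backed by specifics

A real version would need something better than keyword counting, which is part of the “earned praise” question below.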

Some open questions
• How to avoid needless rudeness while preserving clarity (tone vs content separation)?
• What is the right metric for earned praise (citation density, novelty, constraints)?
• Where should risk sensitivity kick in automatically, and where should it stay user controlled?

If anyone has prototyped this, whether via prompt injection or an RL signal, I’d love to see it.


Comments URL: https://news.ycombinator.com/item?id=45410363

Points: 1

# Comments: 5

Source: news.ycombinator.com
