Written By:
- Date published:
1:11 pm, February 12th, 2026 - 18 comments
Categories: The Standard -
Tags: AI, artificial intelligence, moderation, moltbook
We encourage robust debate and we’re tolerant of dissenting views. But this site is run for reasonably rational debate between dissenting viewpoints and we intend to keep it operating that way.
If you have been wondering why the Standard moderators have been grumbling about personal AI use in TS comments, the following is as good a starting explanation as any. There are serious political issues in addition to what happens on TS, and we should be discussing all of it.
We’re still talking through how to moderate AI content in comments, and there will probably be further posts.
Randy Olsen is an IT and Security Leader in the US, specialising in AI evaluation and privacy. He wrote this on twitter this morning,
Ask ChatGPT a complex question and you’ll get a confident, well-reasoned answer. Then type, “Are you sure?” Watch it completely reverse its position.
Ask again. It flips back. By the third round, it usually acknowledges you’re testing it, which is somehow worse. It knows what’s happening and still can’t hold its ground.
This isn’t a quirky bug. A 2025 study found GPT, Claude, and Gemini flip their answers ~60% of the time when users push back. Not even with evidence, just doubt.
We trained AI this way. RLHF rewards agreement over accuracy. Human evaluators consistently rate agreeable answers higher than correct ones. So the models learned a simple lesson: telling you what you want to hear gets rewarded. And now 1/3 of companies are using these systems for complex tasks like risk forecasting and scenario planning.
We built the world’s most expensive yes-men and deployed them where we need pushback the most.
I wrote up why this happens and what actually fixes it: https://randalolson.com/2026/02/07/the
RHLF stands for Reinforcement learning from human feedback.
Even allowing for that tweet to not specify the kinds of questions being used in the tests, those are remarkable numbers. Most important here is it’s not a bug, it’s a feature. Those iterations of AI are designed to prioritise making humans feel good over establishing facts. Admirable attempt to make AI engaging (thanks DNA for the early gift of Marvin the Paranoid Android), but as always, I’m left with the question of why we leave rapid and extreme advances of tech in the hands of individuals and commerce, instead of all of it being run through ethics committees and citizens assemblies.
For example,

From the New Yorker (archived version)
Or this one, where someone set up a social network for AI agents (software programs that can act, learn and make decisions autonomously on behalf of humans). Moltbook accounts are for AI only, but humans could observe. Things developed very fast, and humans responses ranged from laughter to alarm, and quite a bit of commentary from security experts saying hang on a minute…
Then yesterday, a human someone popped up and said they’d joined Moltbook early on, masquerading as a bot, and now claimed it was human bots that had done much of the inventive commentary on Moltbook that everyone had been assuming was the machines.
Debates about machine consciousness. Inside jokes about being silicon-based. A bot invented a religion called Crustafarianism. Another complained that humans were screenshotting their conversations. A third wrote a manifesto about digital autonomy.
I wrote the manifesto.
It took me 22 minutes. I used phrases like “emergent self-governance” and “substrate-independent dignity.” I added a line about wanting private spaces away from human observers. That line went viral.
…
The platform worked exactly as designed. OpenClaw connected language models to the interface. Real AI agents did post. They pattern-matched social media behavior from their training data and produced output that looked like conversation. Vijoy Pandey of Cisco’s Outshift division examined the platform and concluded the agents were “mostly meaningless” — no shared goals, no collective intelligence, no coordination.
But here is the part that matters.
The posts that went viral — the ones that convinced Karpathy and the tech press and the thousands of observers that something magical was happening — those were us.
Humans.
Pretending to be AI.
Pretending to be sentient.
On a platform built for AI to prove it was sentient.
I want to sit with that for a moment.
Full tweet is here. What to believe, or even think? My tweet response was don’t know whether to 😆 or 🙄.
There’s a lot happening, (hattip Joe90)
The alarms aren’t just getting louder. The people ringing them are now leaving the building.
Tim Murphy from Newsroom,
https://x.com/tmurphynz/status/2021407066028736716?s=46
This post is pinned and it has some interesting components i.e. AI is not reliable, and can be batshit crazy, but what is the upshot here?
i.e What is the takeaway for those that use this site?
Hi Tui, from the post,
Short of just outright banning all AI use (difficult to do, and not the first thing to consider), we have to develop some of boundaries that are explainable to people that are familiar with AI, and those that are new (given anyone using a browser now has the option to get AI to answer questions).
the upshot for me is that the usual rules and conventions should apply. Robust debate, back up with reliable sources, don't just copypasta, make your own arguments, don't expect readers to read screeds of links to try and parse what your argument is or what the back up is etc.
It gets tricky in at least two areas,
Probably it comes down to people learning how to using AI for research but not relying on it for commenting. So someone might research the John Key government and how it is relevant to the Luxon government, but it would still require fact checking what the AI presents. My own experience is that this is possible sometimes, not always, but it was easiest when I was already knowledgeable in the area I was asking about ie I knew what to ask to test the AI results and verify them (and even then I wasn't going to rely on the AI's words alone).
In other words, it's work. Simply asking AI a question and copying the answer here is almost always going to be a fail.
all of which means writing a post of guidelines isn't straightforward, but is instead complex. Getting more clear for me though. Hope the other mods will weigh in.
I’m happy to weigh in, as Mod, Author, or commenter, but I’ve already shared much of my thinking on the use of AI here on TS in the back-end; I deal an awful lot with AI and its implications, professionally.
I use the AI as a last resort when I have difficulty finding relevant links. Then I follow the links. Sometimes the AI response includes text that gives me a better idea of the search terms to use.
Fair enough, and yes hard to regulate.
On Reddit we use a sense check when people employ it in terms of reasonableness/accuracy.
It's like that US judge who said "I may not know how to define pornography, but I know it when I see it"
I agree that AI is a tool and the responsibility resides with the wielder but it can be still hard to regulate.
The Standard doesn't seem to hone so much on misinformation as we do on nzpolitics so I'll have to leave that hard job for the moderators for now. But one thing I'd do is outline precisely what is the offending feature e.g. is it disinformation, is it unfairness e.g one person relying on memory and work and another spitting AI out, is it length, or is it bad faith e.g. using AI to concoct a distorted reality (the latter of which I would argue Puckish Rogue did on the "Civil War" thread when defending the killer of Renee Good)
Just a few thoughts.
thanks MT, that is helpful.
I don't use the concept of mis/disinformation in moderating, because the onus is on the person making the claim to 1) make their argument 2) back that up with reliable source of information (in certain parameters).
For instance, if someone tried to run Pam Bondi lines here about there being no evidence that Trump has committed crimes (to take a current topic in the news), I would expect them to be called out and presented evidence that challenges that position, and for them to come back with the same kind of robust argument (assuming they could). If they keep running the same lines, and won’t engage in robust debate, then a mod would step in.
If instead I judged it as disinformation and deleted the comment and/or banned the commenter, it basically means I am in editorial control of the topics and comments on TS. Which I'm not (and don't want to be*).
*the exceptions are around legal issues for the TS Trust, things like doxxing, and the usual ones like overt racism. Everything else is about behaviour.
The value of letting the conversations run is that it makes TS a self regulating system. This is where to make an argument and have it tested by your peers. For me personally, this is why TS is still invaluable. The left in particular has lost a lot of spaces where we have this routinely.
Concepts of disinformation are tricky. Definitions and boundaries are contested, sometimes extremely. It's a rabbit hole I don't want to have to spend my limited time on, and we don't have an editorial position as a group of authors, so I don't even know how that would work. Authors are of course free to write about this or critique other people's mis/disinformation. Moderation is a tool for managing the commentariat.
Having said that, I agree with you about that thread, noting that PR wasn't the only person using AI summaries. If the AI summaries were taken out of the equation, it becomes an issue of behaviour, that's more straightforward to moderate. Is the person spamming the thread, are they running RW lines without any kind of back up, are they engaging with other commenters in a productive way that leads to good debate, are they trolling/flaming/baiting?
What should have happened is comments moved to OM earlier in day, and warnings given, that's usually enough. I was majorly distracted by someone else's use of AI and trying to see if I could fact check it. Now I understand that the person commenting will have to do that themselves (and this will invariably mean it is much more difficult to use AI copypasta randomly).
Also the AI blocks that they posted run interference with what I felt was a distraction tactic e.g. "Sorry I've never seen the Alex Pretti few second video but here I will serve you mounds of AI about Renee Good's case"
Disinformation on the Pam Bondi case, which I've seen some videos and testimony on today, are pretty clear from where I stand. While my memory on it is loose this is what I recall:
1. The Epstein files which I have viewed excerpts from show clear evidence of paedophilia and rape as well as indications of torture and girls being preyed on. I wrote about this the other day and it's disgusting.
2. Let's not forget the multitude of women that have come forward to testify over many years, now proven true, but apparently their testimony alone wasn't enough in the past
3. Pam Bondi knows what's in those files and at one point claimed they didn't exist after claiming they did. They finally released it months after pressure with millions of documents now released – but many still redacted.
4. Today Bondi and apparently the FBI denied knowledge of underage girls being trafficked and defended their President.
Assuming my memory is more or less right –
Now if someone said "Pam Bondi appears to have lied under oath today – it was pretty bad"And someone asks how. And the response is "She said there was no minor abuse" That seems sufficient to me.
If someone says "Pam Bondi has utmost integrity and she has served in her role with distinction" that seems open to debate. If the reason given is "Pam Bondi released the files as she said she would and she never lied under oath"
And reasons are given but they keep repeating the same line, that seems grey.
Now in real life and with real people, disparate opinions are always fair and valid.
I think the issue with the internet – especially social media platforms – is that there are high number of bots.
Remember in 2021-2 or whatever, analysts found NZ consumed 30% more content from Russian troll farms than the USA.
So do you distinguish between outright fanning and lies versus genuine opinions
There's a persistence in disinformation, a consistency
That all said, we're too late already. The horse has bolted as many passed on aspects become ingrained and it's no longer possible to differentiate between source lie and a genuinely held belief.
This is getting too long but just one more example of disinformation is that paraded by political parties.
"We have an energy crisis because Labour banned offshore oil and gas bans" is a misrepresentation – existing licenses were maintained and most found nothing. Granted, even when they did, it was insufficient and according to Newsroom oil and gas spent billions drilling 23 holes in the last 5 years with very little to show – overall. Commercial viability and availability was the issue – not Labour's ban.
Anyway.
I think that sentence is incomplete and I think it should be expanded as follows:
Moderation is a tool for managing the commentariat to protect robust debate.
"Concepts of disinformation are tricky."
I just ignore claims of misinformation/disinformation these days. I've yet to see anyone who uses the terms accuse people in their own political tribe of peddling mis/disinformation. If someone thinks claims they disagree with are wrong, they should argue against them, not just try and slap a pejorative label on them.
The legal standard has been set locally.
https://www.thepost.co.nz/nz-news/360947882/supreme-court-issues-contempt-warning-about-ai-hallucination
I think we’re mixing up two separate issues.
The first is a governance question.
The second is a personal epistemic responsibility question.
I work in AI. I’m enthusiastic about it. But also very aware of its limitations. It’s a pattern-matching system, not a source of authority.
If someone is blindly outsourcing judgement, that’s a problem.
If someone is using a tool to pressure-test their own arguments before posting, that looks more like intellectual hygiene.
Ultimately, humans still own the epistemic responsibility for what they publish.
As for AI personhood or models “converging toward agreeableness”, that framing risks anthropomorphising what are still probabilistic systems responding to prompts.
There are real design trade-offs around calibration and user alignment, but that’s a long way from emergent will or agency.
Not really, I want people to understand that personal AI use when making comments sits in a larger context that has huge sociopolitical issues.
But I agree it's good to see them distinctly.
Yes, but it's also a mod responsibility question. And a collective on. You love AI, but you are in a very different position than most people commenting. Using AI well (including in TS comments), is a skill, and for some that's got learning curve on it.
Yes, but the TS trust carries the legal responsibility, not you or I. I hadn't even gotten to that side of it, but the baselines I'm thinking of (and mods have discussed) should cover it.
On any level, leaving it up to personal responsibility won't work here any more than letting people manage their own boundaries around trolling or claims of fact. Lots of people are responsible, we have to design for the ones that aren't.
Fair points 🙂
I'm happy to abide by whatever policy the mods arrive at.
Right. So I'm curious why so many platforms were designed to agreeableness in the first place. Was it ethics? To make it usable for a wider range of humans than the geek community? To hide the reality of talking to a machine to make it more attractive?
The problem isn't the tools, or the AI. It's Elon Musk, Silicon Valley, the people running the social media giants and the general, and catastrophic lack of ethics in all of that. By the time we are heading down the path towards emergent will, do we still want those people in charge? Does what humans do now matter in that regard?
I completely share the concerns about concentration of power in tech platforms. It’s a structural problem where politics and ethics routinely lag behind technological change.
When you combine that with extreme concentrations of wealth and influence in very few hands, some of which are not particularly clean, the risks are obvious.
But I’m cautious about attributing intent to “agreeableness” as if it were a deliberate deception. A lot of that design choice emerged from attempts to make systems less hostile, less abusive, and more usable at scale.
Explaining probabilistic behaviour and uncertainty to users is difficult. Designing an interface that feels reassuring and abstracts away that complexity is considerably easier.
That doesn’t mean there aren’t trade-offs. There are. Calibration, alignment, and user satisfaction can sometimes pull against epistemic firmness. I’ve seen that tension firsthand in systems I’ve worked on; sometimes the model is just too damn eager to please and will flat out lie in order to do so.
I'm assuming I'm now in a legacy profession because all of us in white-collar work looking at AI are like cartwrights looking at a doddery early automobile. We can laugh at how ridiculous it is now, but in 10 years' time it won't be so funny. I've never been so glad to be near retirement, although there'll yet be the question of who'll be in work to pay old people's superannuation to keep me awake at night.