NZ Govt to let AI start making welfare calls

Kia ora! Welcome to New Zealand’s weekly roundup of AI news and education.

We're on the lookout for a gun, AI-pilled engineer to join Allexive as Lead AI Engineer. This is the person who'll help steer the technical trajectory of a rapidly scaling AI engineering and consultancy firm, redesigning how work gets done and how AI agents operate, across the likes of Mitre 10, Petbarn (Animates' Australian sibling), Target Furniture, RealNZ and Opes Mortgages.

In practice that means rolling out Claude, Copilot and ChatGPT inside real businesses, and shipping product into Claude's Agent SDK, Microsoft's Agent Framework and Google's ADK. If this is your world, or you know someone who lives and breathes agent engineering, DM me or apply here.

Happy reading ✌️

Did someone forward you this? Sign up!

🇳🇿 New Zealand News

A change to the Social Security Act, debated under urgency so it skipped select committee and public consultation, lets the Ministry of Social Development approve automated electronic systems to make decisions, exercise powers and take actions across specified welfare provisions. MSD told RNZ the change would not use generative AI like ChatGPT, and that automation would handle simple rules-based decisions while humans kept judgement where needed. Opposition MPs invoked Australia's Robodebt scheme, which an inquiry found made wrongly pursued recipients feel like criminals and was linked to suicides.
3 min read

Our take: The detail doing the most work is the process, because urgency removed the select committee stage where the public would normally scrutinise how automated welfare decisions get made. A law about trusting machines with people's income is the last place to skip the step designed to build trust, and the redacted problem statement Labour flagged makes the rush hard to read as anything but avoidance.

New Zealand's Supreme Court warned self-represented litigants who submitted AI-generated filings citing fabricated authorities, including property investor Liyun Chen, who referenced a non-existent case called "Peterson v Forbes". Justices flagged that filing fake AI authorities could draw contempt charges carrying up to a $25,000 fine or six months in jail, or obstruction-of-justice charges carrying up to seven years. It was the court's second such warning this year.
3 min read

Our take: Courts elsewhere have leaned on fines, but New Zealand reaching for obstruction-of-justice charges signals a harder posture, because obstruction frames the fake citation as an attack on the process rather than a careless error. That reframing matters for how seriously professionals in regulated fields should treat unverified AI output landing in any formal document. The model produces a confident, properly formatted reference to a case that was never decided, and the cost of checking is trivial next to the cost of being caught, yet the checking is the step people skip because the output looks finished.

Health Minister Simeon Brown confirmed procurement is underway for an AI tool to perform one of the two independent reads currently required in BreastScreen Aotearoa, with a rollout planned from early 2027. Around 270,000 people aged 45 to 69 are screened each year, with an age extension to 74 underway. Health NZ framed the tool as supporting rather than replacing radiologists amid workforce shortages, and the Science Media Centre published expert reaction the same day.
3 min read

Our take: International programmes that ran AI alongside radiologists found the gains come from catching what tired humans miss on the hundredth scan of the day, not from speed. If New Zealand measures success by detection rates rather than throughput, this becomes a genuine quality story; if it measures by cost per screen, the incentive drifts somewhere less reassuring.

Regulation Minister David Seymour released "Responsible AI in Action" guidance directing regulators to use AI on low-risk tasks such as triaging cases and validating data against clear rules, while keeping humans on judgement, legal interpretation and accountability. The guidance asks regulatory leaders to weigh four principles: transparency, fairness, privacy, and te Tiriti o Waitangi. It is separate from the earlier exercise mapping 267 regulators.
2 min read

Our take: The signal here is the posture shift, because the official line has moved from "explore AI" to "here is the lane it belongs in", which is what maturing adoption looks like inside government. Drawing a clear line between rules-based triage and human judgement is more useful than any high-level principle, since it gives a regulator something to act on rather than aspire to.

Today's sponsor

4x more context into every prompt. Zero extra effort.

You think faster than you type. Which means every typed prompt leaves out the constraints, examples, and edge cases that would have made the output actually useful.

Wispr Flow turns your voice into paste-ready text inside any AI tool. Speak naturally — include "um"s, tangents, half-finished thoughts — and Flow cleans everything up. You get detailed, structured prompts without touching a keyboard.

89% of messages sent with zero edits. Used by teams at OpenAI, Vercel, and Clay. Free on Mac, Windows, and iPhone.

Try Wispr Flow free

📚️ Mike’s Takes From The Week

Helping leaders and teams adapt, learn, and scale with AI.

1️⃣ Here’s How to Actually Implement AI Across a Team: We rolled out Glean across the whole team and people drifted back within three weeks. Nothing shifted until we mapped one workflow end to end and tracked the work. The two-workstream playbook for closing the gap.
6-min read

2️⃣ The person booking the first AI discovery call in ANZ has changed: Fewer CIOs, more Chief People Officers. The shift moved through three phases: enthusiast, then IT leadership, now P&C. One Australian CPO landed on it herself this week, realising the hard part was never the technology but the change management.
2-min read

3️⃣ A bike shop had the answer I needed. It just hadn’t used AI to reach me: Four calls in 30 minutes, every line engaged. The part was on their website in a language built for suppliers (or people who speak ‘bike-part’), so ChatGPT surfaced their competitors instead. Not a tech gap or a staffing gap. A knowledge-distribution gap, and it's in most mid-market businesses.
3-min read

4️⃣ Five takeaways from a marketers' breakfast and Google's Agentic event: AI as a skill-set equaliser that kills department dependency, shared context breaking down as nobody feeds corrections back, and agentic commerce making product-data enrichment the new battleground because emotional copy is wasted on agents that want verifiable facts.
4-min read

5️⃣ The four types of Claude Skills, and how to build each one: A saved prompt rebuilt in ten minutes sits at one end. A system that interviews its owner, keeps memory, and runs on a schedule sits at the other. The gap between them is about fifty times the value. Map any job before building.
6-min read

🎙️ The AI Corner Podcast

This week's guest is Nyssa Waters, founder of possibl.ai. Hear:

Why Nyssa says most custom AI agents are already obsolete, and why getting your scaffolding right beats building twenty of them.
How a single "universal fabric" of skills, plugins and connectors replaces armies of agents and stacks of SaaS subscriptions.
Why the real AI tsunami is coming from the East, from world models and robotics to engineers training models on entirely different languages.

🎧 Apple | Spotify | YouTube

🛠 Latest Builds and Finds

Helping advanced builders stay at the frontier of AI.

1️⃣ a16z on why the App layer isn't dead, and where it survives. The labs walk the "Yellow Brick Road": horizontal, low-step work that improves with raw model capability. Everywhere else in Oz is vertical, multi-step work where the moat is the scaffolding, not the model. The tell: are you a system the customer runs work through, or a tool sitting on top of one? Best read on the build-vs-don't question I've seen this year, and I may do a full write-up.
Article

2️⃣ The Cursor comeback, and why the harness is the moat. Written off 90 days ago as "just a wrapper," Cursor grew from $100M to $3B ARR by wrapping a smaller open-source model (Kimi K2.5) in its own harness and beating frontier models at a fraction of the cost. The proprietary RL loop and accept/reject data are the moat, not the model underneath.
Podcast

3️⃣ Skill-cleaner, an auditor for skill prompt-budget bloat. It mirrors how Codex renders skills and flags duplicates, unused skills, and descriptions you can shorten. With my skill library sprawling, this is the housekeeping I keep meaning to automate. A skill's cost isn't the file, it's the prompt real estate it occupies every turn.
SKILL.md

🌍 Global Tech Updates

Selected top headlines from each major AI tech company.

Anthropic

Claude Opus 4.8 launched as the new flagship with effort control and a fast mode that is 3x cheaper, beating GPT-5.5 on SWE-Bench Pro (69.2% vs 58.6%). (May 28)
Claude Code added dynamic workflows that plan a large problem then run hundreds of parallel subagents in one session and verify outputs before reporting back. (May 28)
Claude Code shipped a security-guidance plugin plus self-hosted sandboxes and MCP tunnels, with workflows orchestrating up to 1,000 parallel subagents. (May 26)
The Messages API now accepts system entries inside the messages array, letting developers update Claude's instructions mid-task. (May 28)
Closed a $65B Series H at a $965B post-money valuation on a $47B revenue run rate, overtaking OpenAI as the most valuable AI startup. (May 28)
Shipped a Claude Compliance API plus 28 security integrations (CrowdStrike, Okta, Zscaler, Microsoft Purview) routing Enterprise chat and activity logs into existing SIEM and DLP tooling. (May 25)

OpenAI

Gave Codex Computer Use the ability to see, click and type in Windows desktop apps, with remote continuation from phone or Mac. (May 30)
Tuned GPT-5.5 Instant for more natural, better-paced replies and added writing and coding blocks directly inside ChatGPT. (May 30)
Shipped a Codex CLI update with searchable local conversation history, a streamlined profile selector and OAuth-based MCP setup. (May 28)
Detailed a Codex-built self-improving tax agent with Thrive Holdings that hit 97% accuracy across 7,000 returns and cut prep time by a third. (May 27)
Named a Leader in Gartner's enterprise coding agents evaluation. (May 27)

Microsoft

Redesigned Microsoft 365 Copilot into a task-aware workspace that loads 50%+ faster and, in piloting, lifted active usage 27-43% across the core Office apps. (May 28)
Made Copilot Studio computer-using agents generally available alongside real-time voice agents and agent-to-agent communication. (May 26)
Reported to be merging GitHub Copilot, Copilot chat, Cowork and a new agentic "Autopilot" capability into one super app slated for end of summer. (May 29)

Google / DeepMind

Rolled out Preferred Sources (now 345K+ sites selected) into AI Mode and AI Overviews, plus a perspectives carousel and expanded "Highly Cited" badges. (May 27)
Started rolling out a batch of Workspace AI features including BigQuery anomaly detection in Connected Sheets and NotebookLM-Schoology integration. (May 26)
Released a preview Looker MCP server letting AI agents connect to Looker without middleware, plus support for publishing Conversational Analytics agents to Gemini Enterprise. (May 28)
Demis Hassabis said AGI is roughly 3-4 years away and framed AI as a "species-level transition" with little margin for error. (May 26)

Amazon / AWS

Made next-generation OpenSearch Serverless generally available for agentic workloads, auto-scaling 20x faster at up to 60% lower cost than peak-provisioned clusters. (May 29)
Deprecated an internal AI-usage leaderboard after staff ran needless agent tasks to climb rankings and drove up token costs. (May 29)

xAI / Grok

Released grok-build-0.1, its fastest coding model, on the API in public beta at 100+ tokens/sec, $1/$2 per million tokens and a 256K context window. (May 29)
Rolled out Custom Skills, letting users build reusable workflows once via natural language or file upload that Grok then applies persistently. (May 26)
Brought Grok Build to SuperGrok and X Premium+ users through a Kilo IDE integration. (May 27)
Finished training Grok V9-Medium, a 1.5-trillion-parameter model trained on real Cursor developer workflows, with public release targeted mid-June. (May 25)

Apple

Leaked details confirmed three new Apple Intelligence photo tools (Extend, Enhance, Reframe) joining Clean Up, with Siri gaining a home in the Camera app. (May 28)

Perplexity

Launched Computer add-ins inside Word, Excel, PowerPoint and Outlook via the Microsoft Marketplace for Pro, Max and Enterprise users. (May 28)
Hit with a CNN copyright and trademark suit alleging it scraped 17,000+ stories and falsely advertised a "Comet Plus" CNN content tier. (May 28)

✨ A few people have asked…

It’s Mike here. I run The AI Corner.

I’m not just into writing about AI. I run Allexive, and we help businesses grow without adding headcount by implementing AI platforms and building AI systems.

Let’s chat if you’re interested in learning more →

👋 Mike & Erin