A three-month evaluation of Microsoft’s M365 Copilot AI assistant in a key UK department found mixed results and few measurable efficiency gains.
Mixed Results From Promising Tech
The UK Department for Business and Trade (DBT) has published the results of a detailed trial of Microsoft’s M365 Copilot AI assistant, revealing no definitive evidence that the tool leads to higher productivity. Despite users reporting moderate satisfaction and perceived time savings, the trial concluded that the AI often performed inconsistently across tasks, and in some cases, reduced output quality.
The trial, which ran from October to December 2024, involved 1,000 Copilot licences distributed across the department, with roughly 300 participants consenting to monitored usage. The pilot aimed to assess the AI’s real-world impact on common digital tasks using Microsoft 365 apps such as Word, Outlook, Teams, Excel and PowerPoint.
Microsoft markets Copilot as an AI productivity enhancer that can summarise meetings, draft emails, generate slides, analyse data, and more. However, the DBT report indicates that the real-world impact was far more nuanced than the promotional material implies.
Where Copilot Worked And Where It Didn’t
The government study found that users were most satisfied when using Copilot for simpler, text-based tasks such as summarising meetings, writing emails, and condensing written communications.
In the trial, these tasks consistently showed time savings and improved clarity when compared with work from non-Copilot users. Email writing was slightly faster and judged higher in quality and accuracy.
However, performance in more complex tasks was notably weaker. Data analysis in Excel and visual content creation in PowerPoint both suffered, with AI-generated outputs often requiring correction or falling short of expectations. PowerPoint slide creation was seven minutes faster on average, but the resulting slides were of lower quality. In Excel, AI users took longer and produced less accurate results than their non-AI counterparts.
The report concluded: “We did not find robust evidence to suggest that time savings are leading to improved productivity. However, this was not a key aim of the evaluation, and therefore, limited data was collected to identify if time savings have led to productivity gains.”
Light Usage of Copilot
The study also revealed relatively light usage patterns. According to the M365 Copilot dashboard, the average user triggered just 1.14 Copilot actions per working day across the 63-day pilot.
Word, Outlook and Teams saw the highest engagement, but more specialised tools such as Excel, PowerPoint, Loop and OneNote saw very low uptake. Fewer than 7 percent of users activated Copilot in Excel or PowerPoint on any given day, and Loop and OneNote usage was negligible.
These numbers raise questions about whether the value justifies the cost. For example, UK commercial Copilot licences currently range from £4.90 to £18.10 per user per month. For large departments or enterprises, these costs could escalate rapidly, especially if many users engage with the tool only sporadically.
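As a rough illustration of how those costs scale, the sketch below multiplies the quoted licence price range by the 1,000 licences issued in the trial, then divides each price by the average number of Copilot actions per user per month. It assumes that the 63-day pilot refers to working days spread evenly over three months (about 21 per month) and that every licence is paid at a single rate. The function and constant names are illustrative only, and the cost-per-action figure is a back-of-envelope estimate, not a number from the DBT report.

```python
from decimal import Decimal

# Figures quoted in the article
LICENCES = 1000                      # Copilot licences issued in the trial
PRICE_LOW = Decimal("4.90")          # GBP per user per month (low end)
PRICE_HIGH = Decimal("18.10")        # GBP per user per month (high end)
ACTIONS_PER_DAY = Decimal("1.14")    # average Copilot actions per user per day

# Assumptions (not from the report): 63 working days over 3 months
PILOT_DAYS = 63
PILOT_MONTHS = 3


def monthly_cost(price, licences=LICENCES):
    """Total monthly licence bill in GBP at a given per-user price."""
    return price * licences


def cost_per_action(price):
    """Approximate GBP spent per Copilot action at a given per-user price."""
    actions_per_month = ACTIONS_PER_DAY * PILOT_DAYS / PILOT_MONTHS
    return (price / actions_per_month).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for label, price in (("low", PRICE_LOW), ("high", PRICE_HIGH)):
        print(label, "monthly bill:", monthly_cost(price))
        print(label, "cost per action:", cost_per_action(price))
```

On these assumptions, 1,000 licences would cost between £4,900 and £18,100 per month, which is why sporadic usage matters so much to the value-for-money question.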
User Attitudes and AI Limitations
While 72 percent of participants were “satisfied or very satisfied” with Copilot, qualitative interviews suggested that much of this enthusiasm came from novelty or from perceived time saved on repetitive admin tasks. In some cases, staff used their saved time to take training courses or enjoy longer breaks, rather than focusing on higher-value work.
Interestingly, a significant number of participants (22 percent) reported witnessing hallucinations (AI-generated inaccuracies or fabrications). Another 11 percent were unsure, highlighting the still-fragile trust in GenAI tools in professional settings.
Adoption also varied across teams, often influenced by management attitudes. Some line managers actively encouraged use, while others created a “frosty” culture around AI assistance, which in turn impacted engagement.
Microsoft and Its Competitors
For Microsoft, the report is clearly a mixed result. The company has heavily invested in integrating Copilot across its core product suite and has made productivity gains a central part of its pitch. But in the DBT trial, the return on investment appears questionable.
Critics say that the AI’s strengths in low-complexity tasks are well documented, but the promise of broad-based productivity enhancement still feels premature.
A recent MIT survey cited in the DBT report found that 95 percent of companies investing in generative AI tools (including M365 Copilot) had little tangible benefit to show for it. With corporate spending on GenAI already topping $40 billion, pressure is growing for vendors to demonstrate real ROI.
The findings may also embolden competitors such as Google, Zoho, or even open-source productivity platforms. For now, Copilot’s core strength appears to lie in routine, text-heavy admin support. In more complex tasks requiring judgement or accuracy, it remains inconsistent.
As DBT continues to analyse the environmental and cost impacts of Copilot, Microsoft may need to further refine how its AI interacts with different apps and workflows, or risk a broader slowdown in enterprise adoption.
What Does This Mean For Your Business?
The DBT trial showed that M365 Copilot could save time on routine admin, but may not provide any meaningful productivity gains across a department. For UK businesses considering using the tool (or using it already), this raises some serious questions about cost-effectiveness, especially where licences are purchased at scale but only lightly used. With Microsoft positioning Copilot as a flagship product, the pressure to deliver clear, measurable value will only grow.
Usage data from the trial suggests that even in a controlled, well-supported environment, AI tools are far from being embedded into daily workflows. Inconsistent performance in more complex tasks, combined with ongoing concerns about hallucinations, points to a maturity gap that will be difficult to ignore. Competitors offering simpler or more focused AI products may now find space to challenge Microsoft’s all-in-one approach, particularly in areas where users need speed and accuracy over generalised support.