
AI Is Good at Picking Qualified Suppliers. It Still Struggles to Pick the Best One.

Companies are rapidly integrating generative AI into procurement and sourcing decisions. The promise is obvious. AI can read thousands of pages of supplier proposals faster than any human team, summarize technical requirements in seconds, and create the appearance of consistency and objectivity in evaluation.

But there is an important distinction managers are starting to overlook.

The same AI system that performs extremely well at identifying whether a supplier meets minimum requirements may perform much less reliably when judging which supplier is truly better.

Recent research published in the Journal of Business Logistics examined this issue by comparing how large language models evaluated supplier bids against evaluations completed by experienced procurement professionals. The study analyzed 123 supplier proposals tied to 31 public procurement projects conducted by the State of Ohio between 2023 and 2024. The projects involved complex IT services contracts, many containing large, text-heavy bid packages requiring evaluative judgment rather than simple arithmetic comparisons.

The researchers tested three reasoning-oriented AI models: OpenAI o3, Grok-3-Mini, and DeepSeek R1. They then compared the models' evaluations against the scores assigned by human procurement evaluators.

The findings revealed a surprisingly clear pattern.

AI performed well when evaluating compliance signals, meaning signals tied to baseline qualifications and technical requirements. Does the supplier hold the required certifications? Does it satisfy the mandatory specifications? Does the bid include the required documentation? Are the implementation requirements addressed?

On these types of tasks, the AI models showed relatively high agreement with human evaluators and relatively stable scoring behavior across repeated evaluations.

But the results changed once proposals shifted from compliance to differentiation.

When suppliers attempted to distinguish themselves through strategic capabilities, innovation claims, implementation approaches, past experience, or value-added propositions, AI scoring became far more volatile. The same proposal could receive meaningfully different evaluations across repeated AI runs even when the prompt and underlying content remained unchanged.

That volatility matters.

In procurement, the most important decisions often occur after baseline qualification has already been established. Most serious bidders can satisfy the minimum requirements. Competitive advantage comes from identifying which supplier will create superior long-term value, adapt better during uncertainty, collaborate more effectively, or reduce implementation risk in ways that are difficult to fully codify.

Those judgments require interpretation, contextual reasoning, and tradeoff assessment. Humans are imperfect at this too, but experienced procurement professionals rely on domain expertise and pattern recognition developed over years of evaluating suppliers and managing outcomes.

Large language models work differently. They generate probabilistic outputs based on statistical relationships in language rather than genuine understanding of supplier quality or operational fit. That distinction becomes especially important in ambiguous or strategically nuanced evaluations.

Many executives currently frame AI adoption as a replacement question: “Can AI evaluate suppliers as well as humans?” That is the wrong question.

A better question is: “Which parts of supplier evaluation are structured enough for AI to handle reliably, and which parts still require human judgment?”

The answer emerging from the research suggests a hybrid approach.

In the first stage, AI can handle qualification screening. It can rapidly process proposals, verify compliance requirements, identify missing information, summarize technical content, and flag inconsistencies. This reduces administrative burden and allows procurement professionals to focus their attention where it matters most.

In the second stage, humans should take the lead in evaluating differentiation. This is where procurement teams assess strategic fit, implementation realism, innovation potential, relationship quality, operational flexibility, and long-term value creation. These decisions are often embedded in subtle contextual cues that AI systems do not evaluate consistently.
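The two-stage split can be sketched in a few lines of Python. This is an illustrative sketch only, not the workflow used in the study: the `Proposal` class, the checklist keys, and the `triage` function are hypothetical names, and the checklist values are assumed to come from an upstream AI screening step that is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """Hypothetical record for one supplier bid."""
    supplier: str
    # Assumed output of an AI screening pass: requirement name -> satisfied?
    compliance: dict = field(default_factory=dict)


def triage(proposals):
    """Stage 1: separate bids that pass every compliance check from those with gaps.

    Returns (human_review_queue, flagged), where flagged maps each supplier
    to the list of requirements it did not satisfy.
    """
    human_review_queue = []
    flagged = {}
    for proposal in proposals:
        gaps = [req for req, ok in proposal.compliance.items() if not ok]
        if gaps:
            flagged[proposal.supplier] = gaps
        else:
            # Stage 2 happens outside this code: people judge differentiation.
            human_review_queue.append(proposal.supplier)
    return human_review_queue, flagged


bids = [
    Proposal("Supplier A", {"certifications": True, "documentation": True}),
    Proposal("Supplier B", {"certifications": True, "documentation": False}),
]
queue, flagged = triage(bids)
print(queue)    # ['Supplier A']
print(flagged)  # {'Supplier B': ['documentation']}
```

Bids with gaps are surfaced alongside the specific requirements they missed rather than silently dropped, so a person can still confirm the screening result before a supplier is excluded.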

One of the most interesting findings from the study is that AI volatility itself may become a useful management signal.

When repeated AI evaluations produce highly inconsistent scores, managers should interpret that inconsistency as a warning sign rather than a nuisance. In many cases, volatility may indicate that the proposal contains ambiguous, subjective, or strategically complex content requiring deeper human review.
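One way to turn that idea into a routine check is to score each proposal several times and measure the spread. The sketch below assumes scores on a hypothetical 0 to 100 scale. The function name `volatility_report`, the sample scores, and the 5-point threshold are illustrative choices, not values or methods taken from the study.

```python
from statistics import mean, pstdev


def volatility_report(scores_by_proposal, threshold=5.0):
    """Summarize repeated AI scores and flag proposals whose scores vary widely.

    scores_by_proposal maps a proposal ID to the scores from repeated runs
    of the same prompt on the same content. The threshold is an assumed
    cutoff on the standard deviation and should be tuned to the scale in use.
    """
    report = {}
    for proposal_id, scores in scores_by_proposal.items():
        spread = pstdev(scores)  # population standard deviation across runs
        report[proposal_id] = {
            "mean": mean(scores),
            "spread": spread,
            "needs_human_review": spread > threshold,
        }
    return report


# Made-up scores from four repeated runs per proposal.
runs = {
    "proposal-1": [80, 81, 79, 80],  # stable across runs
    "proposal-2": [60, 85, 70, 92],  # swings widely across runs
}
for proposal_id, summary in volatility_report(runs).items():
    print(proposal_id, summary["needs_human_review"])
# proposal-1 False
# proposal-2 True
```

Standard deviation is only one possible measure of spread. A range or an interquartile range would serve the same purpose, and the right cutoff depends on the scoring scale and on how many repeated runs are available.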

In other words, AI uncertainty may serve as a diagnostic tool for identifying where human expertise is most valuable.

This has implications beyond procurement.

Many organizations are currently deploying generative AI into judgment-heavy workflows involving hiring, performance evaluations, contract review, lending decisions, and strategic analysis. In many of these contexts, AI may excel at standardized screening tasks while struggling with contextual differentiation and nuanced tradeoffs.

Managers should resist the temptation to confuse speed with understanding.

The real opportunity is not eliminating humans from decision processes. It is reallocating human attention more effectively.

The organizations that benefit most from generative AI will likely be those that understand where automation creates leverage and where human expertise still creates advantage.

Supplier selection sits directly at that intersection.

 

Based on research published in the Journal of Business Logistics

Finnegan A. McKinley, Anne E. Dohmen, and Vincent E. Castillo, “Do Humans and GAI See Eye to Eye? Implications of LLM Scoring Volatility in Supplier Evaluations,” Journal of Business Logistics, 2026, 47. https://doi.org/10.1111/jbl.70072.
