
AI Is Good at Picking Qualified Suppliers. It Still Struggles to Pick the Best One.

Companies are rapidly integrating generative AI into procurement and sourcing decisions. The promise is obvious. AI can read thousands of pages of supplier proposals faster than any human team, summarize technical requirements in seconds, and create the appearance of consistency and objectivity in evaluation.

But there is an important distinction managers are starting to overlook.

The same AI system that performs extremely well at identifying whether a supplier meets minimum requirements may perform much less reliably when judging which supplier is truly better.

Recent research published in the Journal of Business Logistics examined this issue by comparing how large language models evaluated supplier bids against evaluations completed by experienced procurement professionals. The study analyzed 123 supplier proposals tied to 31 public procurement projects conducted by the State of Ohio between 2023 and 2024. The projects involved complex IT services contracts, many containing large, text-heavy bid packages requiring evaluative judgment rather than simple arithmetic comparisons.

The researchers tested three reasoning-oriented AI models: OpenAI o3, Grok-3-Mini, and DeepSeek R1. They then compared the models' evaluations against the scores assigned by human procurement professionals.

The findings revealed a surprisingly clear pattern.

AI performed well when evaluating compliance signals. These are signals tied to baseline qualifications and technical requirements. Does the supplier meet the required certifications? Do they satisfy the mandatory specifications? Did they include the required documentation? Are the implementation requirements addressed?

On these types of tasks, the AI models showed relatively high agreement with human evaluators and relatively stable scoring behavior across repeated evaluations.

But the results changed once proposals shifted from compliance to differentiation.

When suppliers attempted to distinguish themselves through strategic capabilities, innovation claims, implementation approaches, past experience, or value-added propositions, AI scoring became far more volatile. The same proposal could receive meaningfully different evaluations across repeated AI runs even when the prompt and underlying content remained unchanged.

That volatility matters.

In procurement, the most important decisions often occur after baseline qualification has already been established. Most serious bidders can satisfy the minimum requirements. Competitive advantage comes from identifying which supplier will create superior long-term value, adapt better during uncertainty, collaborate more effectively, or reduce implementation risk in ways that are difficult to fully codify.

Those judgments require interpretation, contextual reasoning, and tradeoff assessment. Humans are imperfect at this too, but experienced procurement professionals rely on domain expertise and pattern recognition developed over years of evaluating suppliers and managing outcomes.

Large language models work differently. They generate probabilistic outputs based on statistical relationships in language rather than genuine understanding of supplier quality or operational fit. That distinction becomes especially important in ambiguous or strategically nuanced evaluations.

Many executives currently frame AI adoption as a replacement question: “Can AI evaluate suppliers as well as humans?” That is the wrong question.

A better question is: “Which parts of supplier evaluation are structured enough for AI to handle reliably, and which parts still require human judgment?”

The answer emerging from the research suggests a hybrid approach.

In the first stage, AI can handle qualification screening. It can rapidly process proposals, verify compliance requirements, identify missing information, summarize technical content, and flag inconsistencies. This reduces administrative burden and allows procurement professionals to focus their attention where it matters most.

In the second stage, humans should take the lead in evaluating differentiation. This is where procurement teams assess strategic fit, implementation realism, innovation potential, relationship quality, operational flexibility, and long-term value creation. These decisions often hinge on subtle contextual cues that AI systems do not evaluate consistently.
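The two-stage split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Proposal` class, the `MANDATORY` checklist labels, and the `route` function are hypothetical names, not anything taken from the study, and the sketch assumes an earlier AI screening step has already recorded which mandatory items each proposal satisfies.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    supplier: str
    # Mandatory items an AI screening step reported as satisfied.
    # The labels are hypothetical placeholders.
    satisfied: set[str] = field(default_factory=set)


# Hypothetical checklist of baseline requirements.
MANDATORY = {"certifications", "specifications", "documentation"}


def route(proposals, mandatory=MANDATORY):
    """Stage 1: screen on mandatory items.
    Stage 2: queue every qualified proposal for human differentiation scoring."""
    human_queue, screened_out = [], {}
    for p in proposals:
        missing = mandatory - p.satisfied
        if missing:
            # Record what is missing so a person can confirm the exclusion.
            screened_out[p.supplier] = sorted(missing)
        else:
            human_queue.append(p.supplier)
    return human_queue, screened_out


proposals = [
    Proposal("Supplier A", {"certifications", "specifications", "documentation"}),
    Proposal("Supplier B", {"certifications"}),
]
human_queue, screened_out = route(proposals)
```

In this sketch the AI never ranks qualified suppliers. It only decides who enters the human review queue and reports what each excluded proposal was missing.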

One of the most interesting findings from the study is that AI volatility itself may become a useful management signal.

When repeated AI evaluations produce highly inconsistent scores, managers should interpret that inconsistency as a warning sign rather than a nuisance. In many cases, volatility may indicate that the proposal contains ambiguous, subjective, or strategically complex content requiring deeper human review.
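A minimal sketch of how a team might put this into practice: score each proposal several times with the same prompt, measure the spread, and send high-spread proposals to human reviewers. The scoring scale, the threshold of 1.0, the sample scores, and the function name `flag_volatile` are all illustrative assumptions. The article does not specify how the study measured volatility, so the population standard deviation is used here only as a simple stand-in.

```python
from statistics import pstdev


def flag_volatile(scores_by_proposal, threshold=1.0):
    """Return proposals whose repeated AI scores vary more than `threshold`.

    `scores_by_proposal` maps a proposal name to the scores from repeated
    runs of the same prompt. The threshold is an assumed cutoff and would
    need calibrating against the scoring scale actually in use.
    """
    flagged = {}
    for proposal, scores in scores_by_proposal.items():
        if len(scores) < 2:
            continue  # a single run says nothing about volatility
        spread = pstdev(scores)
        if spread > threshold:
            flagged[proposal] = round(spread, 2)
    return flagged


# Made-up scores on an assumed 0-10 scale, four runs per proposal.
runs = {
    "Supplier A": [8.0, 8.0, 8.0, 8.0],
    "Supplier B": [4.0, 8.0, 6.0, 9.0],
}
flagged = flag_volatile(runs)  # only Supplier B exceeds the threshold
```

The flagged list is a prompt for closer human reading, not a verdict on the supplier.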

In other words, AI uncertainty may serve as a diagnostic tool for identifying where human expertise is most valuable.

This has implications beyond procurement.

Many organizations are currently deploying generative AI into judgment-heavy workflows involving hiring, performance evaluations, contract review, lending decisions, and strategic analysis. In many of these contexts, AI may excel at standardized screening tasks while struggling with contextual differentiation and nuanced tradeoffs.

Managers should resist the temptation to confuse speed with understanding.

The real opportunity is not eliminating humans from decision processes. It is reallocating human attention more effectively.

The organizations that benefit most from generative AI will likely be those that understand where automation creates leverage and where human expertise still creates advantage.

Supplier selection sits directly at that intersection.

Based on research published in the Journal of Business Logistics

Finnegan A. McKinley, Anne E. Dohmen, and Vincent E. Castillo, “Do Humans and GAI See Eye to Eye? Implications of LLM Scoring Volatility in Supplier Evaluations,” Journal of Business Logistics, 2026, 47. https://doi.org/10.1111/jbl.70072.
