AI Is Good at Picking Qualified Suppliers. It Still Struggles to Pick the Best One.
Companies are rapidly integrating generative AI into procurement and sourcing decisions. The promise is obvious. AI can read thousands of pages of supplier proposals faster than any human team, summarize technical requirements in seconds, and create the appearance of consistency and objectivity in evaluation. But as adoption accelerates, managers risk overlooking an important distinction.
The same AI system that performs extremely well at identifying whether a supplier meets minimum requirements may perform much less reliably when judging which supplier is truly better.
Recent research published in the Journal of Business Logistics examined this issue by comparing large language model evaluations of supplier bids with evaluations completed by experienced procurement professionals. The study analyzed 123 supplier proposals tied to 31 public procurement projects conducted by the State of Ohio between 2023 and 2024. The projects involved complex IT services contracts, many with large, text-heavy bid packages that required evaluative judgment rather than simple arithmetic comparisons.
The researchers tested three reasoning-oriented AI models, OpenAI o3, Grok-3-Mini, and DeepSeek R1, and compared each model's scores with those assigned by the human evaluators. The findings revealed a surprisingly clear pattern.
AI performed well when evaluating compliance signals. These are signals tied to baseline qualifications and technical requirements. Does the supplier meet the required certifications? Do they satisfy the mandatory specifications? Did they include the required documentation? Are the implementation requirements addressed?
On these types of tasks, the AI models showed relatively high agreement with human evaluators and relatively stable scoring behavior across repeated evaluations. But the results changed once the evaluation shifted from compliance to differentiation.
When suppliers attempted to distinguish themselves through strategic capabilities, innovation claims, implementation approaches, past experience, or value-added propositions, AI scoring became far more volatile. The same proposal could receive meaningfully different evaluations across repeated AI runs even when the prompt and underlying content remained unchanged.
That volatility matters.
In procurement, the most important decisions often occur after baseline qualification has already been established. Most serious bidders can satisfy the minimum requirements. Competitive advantage comes from identifying which supplier will create superior long-term value, adapt better during uncertainty, collaborate more effectively, or reduce implementation risk in ways that are difficult to fully codify.
Those judgments require interpretation, contextual reasoning, and tradeoff assessment. Humans are imperfect at this too, but experienced procurement professionals rely on domain expertise and pattern recognition developed over years of evaluating suppliers and managing outcomes.
Large language models work differently. They generate probabilistic outputs based on statistical relationships in language rather than genuine understanding of supplier quality or operational fit. That distinction becomes especially important in ambiguous or strategically nuanced evaluations.
Many executives currently frame AI adoption as a replacement question: “Can AI evaluate suppliers as well as humans?” That is the wrong question. A better question is: “Which parts of supplier evaluation are structured enough for AI to handle reliably, and which parts still require human judgment?”
The answer emerging from the research suggests a hybrid approach.
In the first stage, AI can handle qualification screening. It can rapidly process proposals, verify compliance requirements, identify missing information, summarize technical content, and flag inconsistencies. This reduces administrative burden and allows procurement professionals to focus their attention where it matters most.
In the second stage, humans should take the lead in evaluating differentiation. This is where procurement teams assess strategic fit, implementation realism, innovation potential, relationship quality, operational flexibility, and long-term value creation. These decisions are often embedded in subtle contextual cues that AI systems do not evaluate consistently.
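The two-stage division of labor can be sketched in a few lines of Python. This is a minimal illustration, not the study's method: the requirement names in `MANDATORY`, the `Proposal` class, and the `screen` and `route` functions are hypothetical, and the sketch assumes stage-one compliance findings arrive as simple pass/fail flags (for example, extracted by an AI screener). Proposals that clear every mandatory requirement go to a human review queue; the rest are flagged with their specific gaps for follow-up.

```python
from dataclasses import dataclass, field

# Hypothetical mandatory requirements; a real solicitation defines its own list.
MANDATORY = ("certifications", "mandatory_specs", "required_documents", "implementation_plan")


@dataclass
class Proposal:
    supplier: str
    # Stage 1 input: pass/fail compliance findings (assumed to come from an AI screener)
    compliance: dict[str, bool] = field(default_factory=dict)


def screen(proposal: Proposal) -> list[str]:
    """Stage 1: return the mandatory requirements this proposal does not satisfy."""
    return [req for req in MANDATORY if not proposal.compliance.get(req, False)]


def route(proposals: list[Proposal]) -> tuple[list[str], dict[str, list[str]]]:
    """Send fully compliant proposals to human review (stage 2); flag the rest with their gaps."""
    human_review: list[str] = []
    flagged: dict[str, list[str]] = {}
    for p in proposals:
        gaps = screen(p)
        if gaps:
            flagged[p.supplier] = gaps
        else:
            human_review.append(p.supplier)
    return human_review, flagged


bids = [
    Proposal("Supplier A", {r: True for r in MANDATORY}),
    Proposal("Supplier B", {"certifications": True, "mandatory_specs": True}),
]
queue, flagged = route(bids)
print(queue)    # ['Supplier A']
print(flagged)  # {'Supplier B': ['required_documents', 'implementation_plan']}
```

Keeping stage one to pass/fail checks mirrors the kind of task where the study found AI scoring relatively stable. Comparative judgments about strategic fit or long-term value are deliberately left out of the code and passed to people.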
One of the most interesting findings from the study is that AI volatility itself may become a useful management signal. When repeated AI evaluations produce highly inconsistent scores, managers should interpret that inconsistency as a warning sign rather than a nuisance. In many cases, volatility may indicate that the proposal contains ambiguous, subjective, or strategically complex content requiring deeper human review.
In other words, AI uncertainty may serve as a diagnostic tool for identifying where human expertise is most valuable.
This has implications beyond procurement. Many organizations are currently deploying generative AI into judgment-heavy workflows involving hiring, performance evaluations, contract review, lending decisions, and strategic analysis. In many of these contexts, AI may excel at standardized screening tasks while struggling with contextual differentiation and nuanced tradeoffs. Managers should resist the temptation to confuse speed with understanding.
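One way to put volatility to work as a signal is to score the same proposal section several times with an unchanged prompt and measure the spread. The sketch below is an illustration under stated assumptions rather than the researchers' procedure: the section names, the 0-100 scale, the use of sample standard deviation as the dispersion measure, the `VOLATILITY_THRESHOLD` of 5 points, and the scores themselves are all made up for demonstration.

```python
from statistics import mean, stdev

# Hypothetical threshold in points on an assumed 0-100 scale; would need calibration in practice.
VOLATILITY_THRESHOLD = 5.0


def volatility_report(runs: dict[str, list[float]], threshold: float = VOLATILITY_THRESHOLD):
    """Summarize repeated AI scores per proposal section and flag unstable ones for human review."""
    report = {}
    for section, scores in runs.items():
        if len(scores) < 2:
            raise ValueError(f"{section}: need at least two runs to measure volatility")
        spread = stdev(scores)  # sample standard deviation across repeated runs
        report[section] = {
            "mean": round(mean(scores), 1),
            "stdev": round(spread, 1),
            "needs_human_review": spread > threshold,
        }
    return report


# Illustrative, made-up scores from five runs with an identical prompt and proposal text.
runs = {
    "certifications": [90, 90, 88, 90, 90],
    "innovation_claims": [62, 80, 55, 74, 68],
}
for section, row in volatility_report(runs).items():
    print(section, row)
```

In this made-up example, the certification scores barely move while the innovation-claim scores swing by 25 points, so only the latter is routed to a human. A range or coefficient of variation could serve as well as the standard deviation; the choice matters less than calibrating the threshold against past evaluations.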
The real opportunity is not eliminating humans from decision processes. It is reallocating human attention more effectively.
The organizations that benefit most from generative AI will likely be those that understand where automation creates leverage and where human expertise still creates advantage.
Supplier selection sits directly at that intersection.
Based on research published in the Journal of Business Logistics:
Finnegan A. McKinley, Anne E. Dohmen, and Vincent E. Castillo, “Do Humans and GAI See Eye to Eye? Implications of LLM Scoring Volatility in Supplier Evaluations,” Journal of Business Logistics, 2026, 47. https://doi.org/10.1111/jbl.70072.