It’s so often the case that an AI initiative in product data starts with a tool demo and a tight deadline for go-live. After all, this shiny new innovation is going to turbo-charge the business, right? Then, when things aren’t living up to expectations, it’s the catalogue that gets blamed for outputs which are wrong, inconsistent, or simply unsafe to publish.
This article pinpoints the data failures that undermine AI tools and shows the operational consequences across search, merchandising, and supplier onboarding. Finally, we set out a corrective sequence you can execute:
- Stabilise
- Standardise
- Enforce
Use it as a checklist before you scale AI pilots into production.
1. The failure: product facts aren’t captured as facts
Most catalogues do contain product facts. The problem is that they’re trapped in:
- Free-text descriptions
- Supplier PDFs and spec sheets
- Product titles stuffed with attributes
- Spreadsheets with mixed units and conventions
Typical structural faults include:
- duplicated SKUs with conflicting attributes
- variant families split into separate products
- attributes reused with different meanings (“size” as UK number, S/M/L, and inches)
- unit drift (mm/cm/in; g/kg/lbs) stored as text strings
Your new AI doesn’t resolve these conflicts. On the contrary, it learns them.
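As a concrete illustration, the first fault above, duplicated SKUs with conflicting attributes, can be surfaced before any AI touches the data. A minimal Python sketch; the rows and attribute names are hypothetical:

```python
from collections import defaultdict

# Hypothetical catalogue rows: (sku, attribute, value)
rows = [
    ("A100", "colour", "Navy"),
    ("A100", "colour", "navy blue"),  # duplicate SKU, conflicting value
    ("A100", "weight", "2kg"),
    ("B200", "size", "M"),
]

def find_conflicts(rows):
    """Group values per (sku, attribute) and flag any pair with more than one distinct value."""
    seen = defaultdict(set)
    for sku, attr, value in rows:
        seen[(sku, attr)].add(value)
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}

conflicts = find_conflicts(rows)
print(conflicts)  # {('A100', 'colour'): ['Navy', 'navy blue']}
```

A report like this, run per supplier feed, tells you exactly which records an AI tool would otherwise learn from.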
2. Why AI makes it worse (and does it faster)
When the foundations of your data are weak, AI amplifies these issues in three predictable ways:
- Repetition: wrong titles or attributes become wrong descriptions at scale
- Misclassification: bad taxonomy drives bad recommendations and poor semantic search
- False confidence: outputs look plausible, so errors reach customers and channels
Some of the most common failure modes you’ll recognise:
- hallucinated materials when “Material” is empty
- “compatible accessories” suggested, but without relationship data
- inconsistent spelling and terminology (colour vs color; cm vs centimetre vs CM)
3. Operational consequences
In day-to-day operations, these inconsistencies turn into constant manual rework and, naturally, delays:
- Merchandising spends time reviewing AI copy instead of enriching ranges
- Search teams chase “relevance” problems that are actually missing attributes
- Syndication feeds fail validation or get suppressed because formats are wrong
- Customer service has to deal with avoidable questions and disputes
And you can measure the commercial and risk impact:
- lower conversion from broken discovery and comparison
- higher returns from incorrect specs or compatibility claims
- margin leakage from incorrect shipping rules and pack sizes
- compliance exposure where safety, warranty, or expiry dates are wrong
4. What does “AI-ready” mean in practice?
AI readiness is not a question of volume. It’s three enforced basics:
A. Completeness (per category)
Don’t audit which fields exist; audit whether the attributes that matter most in buying decisions are actually populated:
- dimensions (packaged and unpackaged)
- weight and capacity
- material and finish
- voltage/phase/IP rating where relevant
- compatibility, certifications, and intended use
The targets to put in place:
- Defined, mandatory attributes per category
- Fill-rate measured by SKU and by supplier
- Anything below a 95% fill-rate treated as operational debt
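Those targets can be checked mechanically. A minimal sketch, assuming a hypothetical per-category attribute spec and product rows:

```python
# Hypothetical per-category spec: attributes that matter in buying decisions
REQUIRED = {"lighting": ["dimensions", "weight", "voltage", "ip_rating"]}

products = [
    {"sku": "L1", "supplier": "Acme", "category": "lighting",
     "dimensions": "120x60x40 mm", "weight": "0.8 kg", "voltage": "230 V", "ip_rating": "IP44"},
    {"sku": "L2", "supplier": "Acme", "category": "lighting",
     "dimensions": "", "weight": "1.1 kg", "voltage": None, "ip_rating": "IP20"},
]

def fill_rate(product):
    """Share of required attributes that are actually populated for this SKU."""
    required = REQUIRED[product["category"]]
    filled = sum(1 for attr in required if product.get(attr))
    return filled / len(required)

for p in products:
    rate = fill_rate(p)
    status = "OK" if rate >= 0.95 else "OPERATIONAL DEBT"
    print(p["sku"], p["supplier"], f"{rate:.0%}", status)  # e.g. "L2 Acme 50% OPERATIONAL DEBT"
```

Aggregating the same rate by supplier shows where the debt is coming from.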
B. Consistency (units, vocabulary, format)
Make sure the same concept looks the same everywhere:
- numeric value stored separately from unit
- controlled value lists (for example, Navy, not navy blue or dark navy)
- standard date format (use ISO where systems allow for it)
- consistent naming rules for titles and feature bullets
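The first two rules above can be sketched in a few lines; the unit table, regex, and colour map here are illustrative assumptions, not a full parser:

```python
import re

# Illustrative assumptions: canonical length unit is millimetres; small controlled colour list
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "inch": 25.4, "in": 25.4, '"': 25.4}
COLOUR_MAP = {"navy": "Navy", "navy blue": "Navy", "dark navy": "Navy"}

def parse_length(raw):
    """Split free text like '55 cm' into a numeric value plus a canonical unit (mm)."""
    m = re.fullmatch(r'\s*([\d.]+)\s*(mm|cm|m|inch|in|")\s*', raw.lower())
    if not m:
        raise ValueError(f"unparseable length: {raw!r}")
    value, unit = float(m.group(1)), m.group(2)
    return value * UNIT_TO_MM[unit], "mm"

print(parse_length("10mm"))     # (10.0, 'mm')
print(parse_length("1 cm"))     # (10.0, 'mm')
print(COLOUR_MAP["dark navy"])  # Navy
```

Once value and unit live in separate typed fields, filters, feeds, and AI tools all read the same number.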
C. Structure (relationships and typed fields)
AI needs structure it can reason over. Common examples:
- parent–child variant links and defined variant axes
- accessory/consumable/spare-part relationships
- cross-references for equivalents and replacements
- attribute datatypes and validation rules
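Attribute datatypes and validation rules can be expressed as a small schema. Everything in this sketch (attribute names, ranges, the controlled colour list) is hypothetical:

```python
# Hypothetical attribute dictionary: a datatype plus a validation rule per attribute
SCHEMA = {
    "voltage_v": {"type": (int, float), "check": lambda v: 0 < v <= 1000},
    "ip_rating": {"type": str, "check": lambda v: v.startswith("IP") and v[2:].isdigit()},
    "colour":    {"type": str, "check": lambda v: v in {"Navy", "Black", "White"}},
}

def validate(attr, value):
    """Typed-field check: value must match the declared datatype and pass the rule."""
    rule = SCHEMA[attr]
    return isinstance(value, rule["type"]) and rule["check"](value)

print(validate("voltage_v", 24.0))      # True
print(validate("ip_rating", "IP44"))    # True
print(validate("colour", "navy blue"))  # False: not in the controlled list
```

The same schema doubles as documentation for suppliers and as a gate at ingestion.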
5. B2C and B2B: Examples of how AI can fail
B2C:
- Furniture: depth and width swapped → AI copy repeats the error → returns rate goes up
- Electronics: screen sizes stored as 55”, 55-inch, 139.7 cm → filters and AI search disagree
- DIY: drill bits listed as 10mm, 1 cm, 3/8″ → recommendations miss exact matches
B2B:
- Industrial fittings: “50mm”, “5 cm”, “2 inch” create three filter values → buyers hesitate and go elsewhere
- Electrical components: 24VDC vs 24 V; phase missing → AI suggests incompatible parts
- Chemicals: litres mixed with gallons → pack sizes are inconsistent → reporting on price-per-unit collapses
6. The corrective sequence: Stabilise. Standardise. Enforce.
Stabilise (stopping new contamination)
- quarantine inbound supplier files
- block free-text entry for measurable attributes
- add validation rules (datatype, range, allowed values)
- create an exception queue and approval gate
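The quarantine, validation, and exception-queue steps above can be sketched together. The rules and supplier rows here are hypothetical:

```python
CONTROLLED_COLOURS = {"Navy", "Black", "White"}

def validate(row):
    """Hypothetical rules: weight must be a positive number; colour must come from a controlled list."""
    errors = []
    weight = row.get("weight_kg")
    if not isinstance(weight, (int, float)) or weight <= 0:
        errors.append("weight_kg must be a positive number")
    if row.get("colour") not in CONTROLLED_COLOURS:
        errors.append("colour not in controlled list")
    return errors

def ingest(rows, validate):
    """Valid rows pass through; the rest are quarantined in an exception queue for review."""
    accepted, exceptions = [], []
    for row in rows:
        errors = validate(row)
        if errors:
            exceptions.append({"row": row, "errors": errors})
        else:
            accepted.append(row)
    return accepted, exceptions

rows = [
    {"sku": "A1", "weight_kg": 0.8, "colour": "Navy"},
    {"sku": "A2", "weight_kg": "heavy", "colour": "dark navy"},  # both values fail validation
]
accepted, exceptions = ingest(rows, validate)
print(len(accepted), len(exceptions))  # 1 1
```

Free-text entry for measurable attributes simply never makes it past the gate; humans clear the exception queue instead of cleaning the catalogue after the fact.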
Standardise (defining the model)
- publish an attribute dictionary with definitions and examples
- set ‘golden’ units and conversion rules
- update supplier templates and mapping specs
- clarify packaged vs unpackaged dimensions and naming
Enforce (making it non-optional)
- reject non-conforming values at ingestion
- track exceptions by supplier and category
- set completeness thresholds before syndication
- run regular audits for drift and duplicates
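A completeness threshold before syndication, with exceptions tracked by supplier, might look like this minimal sketch (data and field names are hypothetical):

```python
from collections import Counter

def syndication_gate(products, fill_rate, threshold=0.95):
    """Release only products meeting the completeness threshold; hold the rest."""
    released = [p for p in products if fill_rate(p) >= threshold]
    held = [p for p in products if fill_rate(p) < threshold]
    return released, held

def exceptions_by_supplier(held):
    # Drift tends to cluster: count which suppliers generate sub-threshold SKUs
    return Counter(p["supplier"] for p in held)

# Hypothetical data: fill rate is a pre-computed field for the purposes of this sketch
products = [
    {"sku": "A1", "supplier": "Acme", "fill": 1.0},
    {"sku": "A2", "supplier": "Acme", "fill": 0.6},
    {"sku": "B1", "supplier": "Bolt", "fill": 0.9},
]
released, held = syndication_gate(products, fill_rate=lambda p: p["fill"])
print([p["sku"] for p in released])  # ['A1']
print(exceptions_by_supplier(held))
```

Running the same gate on a schedule is also a cheap drift audit: a supplier whose held count climbs week on week is your next conversation.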
7. The next step: align before you automate
If teams are fixing AI outputs in spreadsheets, you’re essentially paying twice for the same gap in your data governance. Therefore, begin with a data assessment that scores completeness, consistency, and structure in your highest-revenue categories, then turn those findings into a remediation backlog of specific artefacts:
- Attribute sets
- Validation rules
- Supplier templates
- Workflows
- Approval gates
In our experience, it’s better to delay production rollouts of generative descriptions, semantic search re-ranking, and automated attribute extraction until your assessment shows a stable schema, validated values, and governed supplier inputs. If you rush ahead, review cycles tend to expand and the outputs remain brittle at best, untrustworthy at worst.
Everyone’s getting in on the act with AI. But as we’ve seen, there are perils and pitfalls to negotiate first. If you’re not getting what you need from your AI tools, get in touch with us today at Start with Data: we can arrange a product data structure audit for you and get to the root of the issue.