Poor-quality product information rarely fails because you didn’t add enough content. More often, it fails the sniff test because your teams have confused data cleansing with data enrichment and applied the wrong fix at the wrong time.
This article outlines the difference, the failure modes behind this confusion, and what actually fixes product data in a business running on legacy ERP/PIM/feeds. By the end, you’ll be able to diagnose whether you need cleaning, enrichment, or both, and know how to stop the same issues from reappearing.
The failure: treating all product data work as ‘fixing’ things
Data failure: cleaning and enrichment are treated as interchangeable tasks in backlogs, supplier onboarding, and ‘content projects.’
Operational consequence: duplicated effort, constant rework, and a range of conflicting updates across ERP, spreadsheets, supplier feeds, and eCommerce.
Commercial (and risk) impact: channel rejections, slower time-to-market, poorer discoverability, and a preventable increase in returns.
This has predictable causes in mature organisations:
- Multiple sources of truth (ERP, legacy PIM, supplier files, marketplace portals)
- No single attribute dictionary (definitions, allowed values, units, examples)
- Ownership ambiguity (operations “own” onboarding, marketing “own” content, IT “own” integrations, but nobody owns end-to-end quality)
- Feeds that overwrite local work (a supplier refresh wipes enrichment, and manual fixes never go upstream)
What cleaning actually fixes
Done properly, cleaning improves the reliability of the data you already have. At its core, it is about three things: correctness, consistency, and structural integrity.
Typical cleaning artefacts and actions? Rules, and plenty of them:
- Deduplication rules: merge criteria for near-identical SKUs and variant consolidation.
- Normalisation rules: unit conversions, text casing, punctuation, controlled vocabulary (such as size values and colour families).
- Validation rules: mandatory fields by category, range checks (weight, dimensions), format checks (EAN/GTIN length), ‘one-of’ lists for regulated attributes
And for your incoming supplier feeds:
- Supplier intake mapping: protocols for template-to-schema mapping, field-level transformations, and dealing with exceptions.
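As an illustration, the normalisation and validation rules listed above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the field names, the colour vocabulary, the conversion table, and the 1,000 kg weight ceiling are all hypothetical. The GTIN check uses the standard GS1 check-digit calculation on top of the length check.

```python
import re

# Hypothetical controlled vocabulary: raw supplier colour -> colour family.
COLOUR_FAMILIES = {"navy": "Blue", "sky blue": "Blue", "crimson": "Red", "red": "Red"}

# Hypothetical conversion table: weights are stored in kilograms.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalise(record: dict) -> dict:
    """Return a copy with whitespace, vocabulary and units normalised."""
    out = dict(record)
    out["title"] = " ".join(record["title"].split())
    colour = record["colour"].strip().lower()
    out["colour_family"] = COLOUR_FAMILIES.get(colour)  # None if outside the vocabulary
    out["weight_kg"] = round(record["weight"] * TO_KG[record["weight_unit"].lower()], 3)
    return out


def gtin_is_valid(gtin: str) -> bool:
    """Check the length (8, 12, 13 or 14 digits) and the GS1 check digit."""
    if not re.fullmatch(r"\d{8}|\d{12}|\d{13}|\d{14}", gtin):
        return False
    digits = [int(c) for c in gtin]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def validate(record: dict) -> list:
    """Return a list of rule violations for a normalised record."""
    errors = []
    for field in ("sku", "title", "gtin"):
        if not record.get(field):
            errors.append(f"missing mandatory field: {field}")
    if record.get("gtin") and not gtin_is_valid(record["gtin"]):
        errors.append("gtin: wrong length or check digit")
    if record.get("colour_family") is None:
        errors.append("colour: not in controlled vocabulary")
    if not 0 < record.get("weight_kg", 0) <= 1000:  # assumed range for this category
        errors.append("weight_kg: outside allowed range")
    return errors


raw = {"sku": "A-1", "title": "  Steel   bottle ", "gtin": "4006381333931",
       "colour": " Navy ", "weight": 500, "weight_unit": "g"}
clean = normalise(raw)
problems = validate(clean)
```

Keeping normalisation and validation as separate steps means a record is judged only after it has been put into the agreed units and vocabulary, so a supplier sending grams instead of kilograms is converted rather than rejected.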
Cleaning fixes the following operational pain points which, left unattended, become so brittle they eventually snap:
- Feed errors
- Listing failures
- Broken filters
- Incorrect variants
We’d add a fifth: internal mistrust in reporting. With data in such a mess, who would rely on it? Cleaning alone is not enough, though. A record can be accurate and still not be commercially usable if key attributes are missing, descriptions are skeletal, or assets simply aren’t present.
What enrichment actually fixes
Enrichment enhances the usability of product data for purchasing decisions and compliance with channel requirements. It adds what’s missing and makes your product records fit-for-purpose in terms of discovery, comparison, and compliance.
Typical enrichment artefacts and actions include:
- Attribute completion: category-specific “selling” attributes (compatibility, performance ratings, materials, certifications)
- Content briefs and templates: channel-specific titles, bullets, long description structure, brand tone characteristics and constraints, banned claims
- Digital asset rules: image sets by variant, file naming conventions, document linkage (datasheets, compliance, and sustainability declarations)
- Channel output models: mapping the same truth into Amazon, Google Merchant, ERP exports, and/or B2B catalogues
Enrichment drives outcomes like higher channel acceptance, better on-site filtering, improved SEO, fewer ‘what are the correct specs?’ enquiries, and reduced returns caused by missing or unclear information.
The predictable mistake: enriching dirty data
If you enrich data before you clean it, you’re simply scaling up the errors:
- Duplicate products get enriched twice
- Inconsistent values become harder to reconcile once copied into multiple channels
- Supplier refreshes overwrite your work because the underlying intake and governance weren’t fixed
- AI-generated descriptions amplify inconsistencies because the source attributes are unreliable (‘garbage in, garbage out,’ instead of ‘quality in, quality out’)
If your teams keep generating ‘enhanced content’ but marketplaces are still rejecting listings, and customers are still complaining that specs are wrong, you’re not only failing at enrichment. You’re failing at cleaning and rule enforcement.
A fast diagnostic: what problem do you really have?
A good diagnostic exercise is to use these three tests in one category (not your full catalogue!):
1. Accuracy test (cleaning)
Take 50 high-selling SKUs and verify 10 critical attributes against manufacturer documentation. If you find conflicting units, impossible values, or mis-assembled variants, prioritise cleaning.
2. Completeness test (enrichment)
Take the same SKUs and compare your attribute coverage to what your best channel requires (filters, compliance fields, key specs). If you’re missing the fields which buyers use to decide, prioritise enrichment.
3. Regression test (governance)
Check the last three supplier feed cycles. If ‘fixed’ data reverts, your issue isn’t a lack of effort. It’s the process: no ownership, faulty overwrite logic, and missing approval gates.
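The completeness and regression tests lend themselves to a quick script. The sketch below is illustrative only: it assumes product records are available as plain dictionaries, and the attribute names and sample values are made up for the example.

```python
def coverage(records: list, required: list) -> dict:
    """Share of records with a non-empty value for each required attribute."""
    if not records:
        return {}
    total = len(records)
    return {
        attr: sum(1 for r in records if r.get(attr) not in (None, "")) / total
        for attr in required
    }


def reverted(fixed: dict, later_cycles: list) -> list:
    """Attributes whose corrected value differs again in any later feed cycle."""
    return sorted(
        attr
        for attr, value in fixed.items()
        if any(cycle.get(attr) != value for cycle in later_cycles)
    )


# Hypothetical sample: four SKUs checked against two channel-required attributes.
sample = [
    {"material": "steel", "voltage": ""},
    {"material": "oak", "voltage": "230V"},
    {"material": None},
    {"material": "glass", "voltage": "12V"},
]
scores = coverage(sample, ["material", "voltage"])

# Hypothetical sample: values corrected by hand, then two later supplier refreshes.
fixed_values = {"weight_kg": 0.5, "colour_family": "Blue"}
cycles = [
    {"weight_kg": 0.5, "colour_family": "Blue"},
    {"weight_kg": 500, "colour_family": "Blue"},
]
regressions = reverted(fixed_values, cycles)
```

Low coverage on attributes your best channel requires points to enrichment. A non-empty regression list points to governance, however good the original fix was.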
The ways to fix product data: stabilise, standardise, enforce
Cleaning and enrichment work only when you treat them as steps embedded in an operating model rather than as ad-hoc projects.
1) Stabilise (stop the bleeding)
- Freeze uncontrolled overwrites: define which system is authoritative per attribute group.
- Run dedupe + normalisation on priority categories first (revenue, returns, or channel rejection hotspots).
- Create an exception queue: every failed validation goes into a workflow, not someone’s inbox.
The outcome: fewer feed failures, fewer duplicate listings, and far less manual firefighting.
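To make ‘authoritative per attribute group’ concrete, here is a minimal sketch of overwrite control. The ownership map, system names, and attributes are hypothetical. The point is only that an incoming update may change the attributes its source owns, and everything else is reported rather than silently applied.

```python
# Hypothetical ownership map: which system wins for each attribute.
AUTHORITATIVE = {
    "price": "erp",
    "weight_kg": "supplier",
    "title": "pim",
    "description": "pim",
}


def apply_update(current: dict, update: dict, source: str) -> tuple:
    """Apply only the attributes that `source` owns; list the rest as blocked."""
    merged = dict(current)
    blocked = []
    for attr, value in update.items():
        if AUTHORITATIVE.get(attr) == source:
            merged[attr] = value
        else:
            blocked.append(attr)
    return merged, sorted(blocked)


# An enriched record, followed by a supplier refresh that tries to replace the title.
record = {"title": "Insulated steel bottle, 500 ml", "weight_kg": 0.4, "price": 12.0}
refresh = {"title": "BOTTLE STL 500", "weight_kg": 0.5}
merged, blocked = apply_update(record, refresh, source="supplier")
```

In this sketch the supplier’s weight is accepted, while its title is blocked and can be routed to the exception queue for review.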
2) Standardise (make ‘good’ a repeatable threshold)
- Build an attribute dictionary: definitions, allowed values, units, examples, per-category requirements
- Standardise supplier templates and mapping rules (what you accept, what you reject, what you transform)
- Define enrichment briefs per channel: spell out what ‘complete’ means rather than just saying ‘write better copy!’
The outcome? Faster onboarding, consistent filtering, and fewer arguments about what ‘the right value’ actually is.
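An attribute dictionary does not need special tooling to get started. The sketch below shows one possible shape, assuming a simple in-code structure. The two entries, the categories, and the `check` helper are hypothetical examples, not a recommended schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeSpec:
    definition: str
    unit: Optional[str] = None
    allowed_values: frozenset = frozenset()  # empty means free text
    example: str = ""
    required_in: frozenset = frozenset()  # categories where the attribute is mandatory


# Hypothetical dictionary entries, for illustration only.
DICTIONARY = {
    "colour_family": AttributeSpec(
        definition="Broad colour group used for on-site filters",
        allowed_values=frozenset({"Blue", "Red", "Green"}),
        example="Blue",
        required_in=frozenset({"drinkware", "apparel"}),
    ),
    "capacity_ml": AttributeSpec(
        definition="Usable liquid capacity",
        unit="ml",
        example="500",
        required_in=frozenset({"drinkware"}),
    ),
}


def check(record: dict, category: str) -> list:
    """Compare one record against the dictionary for its category."""
    problems = []
    for name, spec in DICTIONARY.items():
        value = record.get(name)
        if value in (None, ""):
            if category in spec.required_in:
                problems.append(f"{name}: required for {category}")
        elif spec.allowed_values and value not in spec.allowed_values:
            problems.append(f"{name}: '{value}' is not an allowed value")
    return problems


issues = check({"colour_family": "Navy"}, "drinkware")
```

Because definitions, allowed values, units, and per-category requirements live in one place, the same dictionary can drive supplier templates, validation, and enrichment briefs.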
3) Enforce (prevent regression to a bad norm)
- Implement validation rules and approval gates in your PIM (or equivalent hub) so bad data cannot be syndicated
- Monitor completeness and quality scores by category, supplier, and channel output
- Assign ownership: a data steward for standards, category managers for commercial relevance, content teams for enrichment, and IT for integrations. Document and operationalise this ownership
The outcome: fewer rework cycles, higher channel acceptance, and a measurable improvement in time-to-market.
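In a PIM, gates and quality monitoring would normally be configured rather than coded, but the logic can be sketched in a few lines. Everything here is an assumption for illustration: the `simple_rule` check, the record fields, and the choice to group pass rates by supplier.

```python
from collections import defaultdict


def simple_rule(record: dict) -> list:
    """Hypothetical validation rule set: a title and a positive weight are required."""
    errors = []
    if not record.get("title"):
        errors.append("title is empty")
    if not record.get("weight_kg", 0) > 0:
        errors.append("weight_kg must be positive")
    return errors


def gate(records: list, validate) -> tuple:
    """Split records into those cleared for syndication and an exception queue."""
    cleared, exceptions = [], []
    for record in records:
        errors = validate(record)
        if errors:
            exceptions.append({"sku": record.get("sku"), "errors": errors})
        else:
            cleared.append(record)
    return cleared, exceptions


def pass_rate_by(records: list, validate, key: str) -> dict:
    """Share of records passing validation, grouped by e.g. supplier or category."""
    totals, passed = defaultdict(int), defaultdict(int)
    for record in records:
        group = record.get(key, "unknown")
        totals[group] += 1
        if not validate(record):
            passed[group] += 1
    return {group: passed[group] / totals[group] for group in totals}


catalogue = [
    {"sku": "A-1", "supplier": "acme", "title": "Steel bottle", "weight_kg": 0.5},
    {"sku": "A-2", "supplier": "acme", "title": "", "weight_kg": 0.4},
    {"sku": "B-1", "supplier": "bolt", "title": "Mug", "weight_kg": 0.3},
]
cleared, exceptions = gate(catalogue, simple_rule)
rates = pass_rate_by(catalogue, simple_rule, "supplier")
```

Only `cleared` records would go to channels. The exception list feeds the workflow queue, and the pass rates show which suppliers generate the most rework.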
Cleaning vs enrichment: the practical truth
Cleaning fixes errors. Enrichment makes data usable. But neither fixes product data on its own, because the underlying, recurring issue is data governance:
- Unclear ownership
- Uncontrolled inputs
- No enforcement layer
The businesses that gain the most traction in product data management are those that treat cleaning as the foundation, enrichment as the value-add, and the operating model as the permanent fix.
Next step: get a sample output for your category
If you want to see what ‘cleaned then enriched’ looks like in practice (an attribute dictionary extract, validation rules, a supplier template, and a channel-ready product record), reach out to us today at Start with Data. We’ll be happy to work with you on a sample output.