Context
Large retailers managing physical products face a data coordination problem that tends to get worse as they scale. Product information (SKUs, specs, marketing copy, imagery, supply chain data) lives in multiple disconnected systems. Each system was built for a specific team's needs, and over time they drift apart: different taxonomies, conflicting attribute values, duplicate records with no clear authority.
This work sample tackles that exact problem. The ask: define a strategy for replatforming a company's PLM and item master systems, prioritize which systems to tackle first, and explain how you'd handle the messy reality of duplicate records and conflicting data.
The Problem
The core dysfunction was having no single source of truth. Different teams answered the question "what is our product?" from different systems, and they often got different answers. The downstream effects were significant:
- Product data error rates as high as 18% in some systems
- Product launch cycles stretching to 45–60 days due to manual reconciliation
- Annual maintenance costs ranging from $600K to $1.5M per system
- User-created macros and offline workarounds hiding the true scale of the problem
The challenge wasn't just technical. Different business units had built workflows and cultural muscle memory around their systems. Any replatform strategy that ignored organizational reality would fail regardless of how good the architecture was.
My Approach
I structured the work in three phases before proposing any solution: discovery, design, and strategy.
Discovery
Before designing anything, I wanted to understand the full landscape, including the shadow systems people had built to work around official ones. Discovery involved three activities:
- System architecture audit: Map all PLM/catalog systems including hidden tech (macros, offline automations, batch processes). Generate data flow diagrams showing how a product moves from inception to delivery.
- Stakeholder interviews: Sessions with product development, go-to-market, operations, IT, and customer service, focused on their day-in-the-life and, critically, identifying the "untouchables." Every organization has cultural constants that cannot be changed regardless of their technical merit. Finding these early is non-negotiable.
- Pain point analysis: Quantify the impact. Error rates by system. Where bottlenecks occur. How many sources of "truth" exist for each data type.
Three questions guided the discovery: Where does the data originate? What are the most frequent complaints about data quality? Which step causes the most pain?
Design
Based on discovery findings, I recommended a federated domain model: centralized control over product identity, with each functional domain owning its own attributes. The assumptions: business units want to preserve domain autonomy, cross-functional governance is feasible given the culture, and existing systems can support API-based integration.
The model breaks into seven domains:
- Product Identity (Centralized): SKUs, GTINs, taxonomy, master hierarchy. The foundation everything else depends on.
- Product Design (Federated by category): Specs, materials, colorways, fit, sustainability compliance
- Merchandising & Planning: Collections, assortment, channel strategy, pricing
- Digital & Content: Imagery, marketing copy, digital assets, size guides
- Lifecycle Management: Product status, versioning, variant relationships
- Supply Chain: Sourcing, vendor management, packaging, logistics
- Customer Experience: Fit data, style recommendations, reviews, personalization
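The ownership boundaries above can be made concrete as an attribute-ownership registry: identity attributes are writable only by the central hub, and every other attribute group has exactly one owning domain. This is a minimal sketch — the domain and attribute names are illustrative, not a real schema:

```python
# Sketch of the federated ownership model: product identity is centrally
# owned; every other attribute belongs to exactly one functional domain.
# Domain and attribute names here are hypothetical.

CENTRAL_DOMAIN = "product_identity"

ATTRIBUTE_OWNERS = {
    "sku": CENTRAL_DOMAIN,
    "gtin": CENTRAL_DOMAIN,
    "taxonomy_node": CENTRAL_DOMAIN,
    "materials": "product_design",
    "colorway": "product_design",
    "channel_strategy": "merchandising",
    "marketing_copy": "digital_content",
    "lifecycle_status": "lifecycle",
    "vendor_id": "supply_chain",
    "fit_feedback": "customer_experience",
}

def can_write(domain: str, attribute: str) -> bool:
    """A domain may write an attribute only if it owns it; identity
    attributes are writable only by the central MDM hub."""
    owner = ATTRIBUTE_OWNERS.get(attribute)
    if owner is None:
        raise KeyError(f"unregistered attribute: {attribute}")
    return domain == owner
```

The point of the registry is that governance becomes a lookup rather than a negotiation: any integration can check `can_write` before propagating a change, which is what keeps federated domains from drifting back into conflicting sources of truth.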
What I Built / Decided
Phased deployment strategy
The sequence matters as much as the architecture. I proposed four phases ordered by dependency and impact:
- Phase 1: Product Identity Layer. MDM hub for golden records, universal identifiers, API layer. Start here because everything else depends on consistent product identification.
- Phase 2: Commercial/GTM. PIM for customer-facing data, merchandising content management. Second because it has direct revenue impact.
- Phase 3: Technical Specifications. Engineering attributes, PLM integration. Third because it requires deep domain expertise and a stable identity layer underneath.
- Phase 4: Supply Chain. Connect the replatformed core to supply chain systems. Last because it's entirely dependent on stable upstream data.
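The phase ordering falls out of the dependency structure. A small sketch makes that explicit — phase names are shorthand for the four phases above, and note that dependencies alone only fix the first and last positions; the tie between phases 2 and 3 is broken by revenue impact, not by the graph:

```python
from graphlib import TopologicalSorter

# Phase dependencies as described in the plan: everything depends on a
# stable identity layer, and supply chain depends on all upstream data.
deps = {
    "identity": set(),
    "commercial_gtm": {"identity"},
    "technical_specs": {"identity"},
    "supply_chain": {"commercial_gtm", "technical_specs"},
}

# static_order() yields a valid rollout sequence respecting dependencies.
order = list(TopologicalSorter(deps).static_order())
```

Writing the plan down this way is a cheap sanity check: if a proposed resequencing (say, supply chain before identity) produces a cycle or violates the graph, the tooling catches it before the steering committee does.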
Duplicate record strategy
Replatforming without a data triage plan guarantees migrating the mess. I designed a three-part approach:
- Automated detection: fuzzy matching across systems, surfaced through data quality dashboards
- Resolution framework: survivorship rules (which system wins for each attribute type), merge workflows with human review for edge cases, and a governance model with named data stewards
- Data cleansing sprints: prioritized by customer-facing impact first
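Detection and survivorship are straightforward to sketch. This is a minimal illustration using stdlib fuzzy matching; the record fields, system names, and threshold are assumptions, and a production version would use blocking keys and a proper matching library rather than all-pairs comparison:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized fuzzy similarity on cleaned-up names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_duplicates(records: list[dict], threshold: float = 0.85) -> list[tuple]:
    """Flag candidate duplicate pairs by fuzzy name match. Pairs above
    the threshold go to automated merge or human review."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i]["name"], records[j]["name"])
            if score >= threshold:
                pairs.append((records[i]["id"], records[j]["id"], round(score, 2)))
    return pairs

# Survivorship rules: which source system wins for each attribute type
# (system names are hypothetical).
SURVIVORSHIP = {"specs": "plm", "marketing_copy": "pim", "price": "erp"}

def merge(dupes: list[dict]) -> dict:
    """Build a golden record by taking each attribute from its winning system."""
    golden = {}
    for attr, winner in SURVIVORSHIP.items():
        for rec in dupes:
            if rec["source"] == winner and attr in rec:
                golden[attr] = rec[attr]
                break
    return golden
```

The design choice worth noting is that survivorship is declared per attribute type, not per record: the PLM can win on specs while the PIM wins on copy for the same product, which is exactly the conflict pattern the discovery phase surfaced.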
System prioritization
Given four systems to evaluate, I scored each against maintenance cost, error rate, replatforming cost, estimated savings, and strategic importance, then calculated break-even and 3/5-year ROI:
| System | Break-even | 3-yr ROI | 5-yr ROI | Error rate | Priority |
|---|---|---|---|---|---|
| C | 1.6 yrs | $7.5M | $12.5M | 18% | 1st |
| A | 1.9 yrs | $5.4M | $9.0M | 12% | 2nd |
| B | 2.4 yrs | $2.7M | $4.5M | 5% | 3rd |
| D | 3.0 yrs | $1.5M | $2.5M | 3% | 4th |
System C wins on every ROI metric despite being the most expensive to replatform. D was deprioritized: lowest error rate, longest break-even, smallest return. The ordering optimizes for responsible budgeting and system health, not raw product throughput.
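The table's figures are internally consistent if the ROI columns are read as cumulative gross savings accruing linearly. Under that assumption, the annual savings and replatform costs below are back-solved from the table, not independent data:

```python
# Implied inputs per system: (annual_savings_$M, replatform_cost_$M).
# These are back-solved from the prioritization table under the
# assumption of linear, gross cumulative savings.
systems = {
    "C": (2.5, 4.0),
    "A": (1.8, 3.42),
    "B": (0.9, 2.16),
    "D": (0.5, 1.5),
}

def metrics(annual: float, cost: float) -> dict:
    """Break-even in years plus cumulative 3- and 5-year savings."""
    return {
        "break_even_yrs": round(cost / annual, 1),
        "roi_3yr": 3 * annual,
        "roi_5yr": 5 * annual,
    }

# Rank by break-even, shortest first — reproduces the C, A, B, D ordering.
ranking = sorted(systems, key=lambda s: systems[s][1] / systems[s][0])
```

Making the model explicit like this is what lets the exec conversation focus on the inputs (are the savings estimates credible?) rather than the arithmetic.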
OKRs
Three OKRs, prioritized by a value-vs-effort matrix and dependency map:
- Establish a single source of truth for product identity: 100% of active SKUs in MDM, 90% reduction in duplicate records, 99% taxonomy accuracy
- Reduce time-to-market by 25%: cut data creation from 6 weeks to 3.5 weeks, automate 80% of cross-system propagation
- Achieve 98% customer-facing data accuracy: 50% reduction in data-related returns, 85% fewer customer-reported errors
Outcome
This is a work sample, so there's no production outcome to report. The framework is built for measurability. Each OKR has explicit success validation: MDM audits, launch timeline tracking, mystery shopping audits, and A/B testing on conversion impact from improved data quality.
The projected upside if the strategy executes: $12.5M in 5-year ROI on System C alone, a product launch cycle cut nearly in half, and a governance model that prevents the fragmentation from happening again.
What I'd Do Differently
A few things I'd push harder on if this were a real engagement:
- Validate the "untouchables" earlier. The federated model assumes cross-functional governance is feasible. That assumption needs to be stress-tested in discovery, not just confirmed. I'd want to run a lightweight governance pilot before committing the architecture to it.
- Challenge the ROI assumptions with real data. The analysis assumes data quality issues cause measurable business impact via delays and costs. That's probably true, but I'd want to see actual incident data and customer complaint logs before presenting the numbers to an exec team.
- Pilot before migrating. I'd want to run the MDM hub on a single product category end-to-end before scaling. Migration risk is real and often underestimated. A contained pilot surfaces the edge cases that kill big-bang rollouts.