
Designing taxonomy for scale across a 10,000+ asset content ecosystem
Built and operationalized a unified taxonomy within an enterprise content platform, transforming a manual tagging process into an automated system and establishing scalable workflows for classification and audit.
Impact
- Reduced tagging effort from multi-week manual processing to minutes through automation
- Enabled classification across 10,000+ content assets in a continuously growing system
- Achieved ~80% first-pass accuracy, with defined metrics and ongoing tuning
- Created automated audit workflows, reducing validation time from minutes to seconds
Role: Taxonomist
Platform: AEM + Semaphore
Focus: Metadata strategy, automation, governance
Context
This work took place at Optum within a large-scale healthcare content ecosystem supporting digital experiences across multiple platforms. Content was managed in Adobe Experience Manager (AEM) and structured using a unified taxonomy model maintained in Semaphore.
At the time of this work:
- The system contained 10,000+ content assets, with ongoing ingestion from internal authors and external vendors
- Multiple controlled vocabularies were in use, with varying levels of governance and consistency
- Taxonomy played a critical role in content discovery, reuse, and downstream delivery systems
Problem
Content tagging was primarily manual, time-intensive, and difficult to scale.
- Tagging even a few hundred assets required weeks of manual effort, creating a bottleneck for content ingestion
- Metadata consistency varied, reducing reliability for search, personalization, and reuse
- Existing workflows did not support bulk updates or efficient auditing, making quality control slow and error-prone
- Success metrics for automated tagging were undefined, making it difficult to evaluate or improve system performance
As content volume continued to grow, these limitations made it increasingly difficult to maintain a reliable and scalable metadata system.
Approach
To move from manual tagging to a scalable system, I worked across four parallel tracks: taxonomy design, workflow enablement, audit and quality control (including success metrics), and governance.
Assessing the system and constraints
I began by evaluating the existing taxonomy, content workflows, and platform limitations within AEM and Semaphore.
- Identified gaps in automation, bulk operations, and auditability
- Evaluated how content was being ingested and tagged across sources
- Determined where manual effort was unavoidable vs. where it could be reduced
This assessment informed a phased approach: stabilize what existed, then introduce scalable improvements.
Designing and structuring the taxonomy
I worked within an existing unified taxonomy model, focusing on improving structure, usability, and consistency.
- Refined Topic taxonomy to better support classification and retrieval
- Ensured alignment between taxonomy structure and real content patterns
- Balanced precision vs. coverage to support automated tagging at scale
Enabling automation within platform constraints
Because automation was still in development, I adapted workflows to bridge gaps between manual and automated processes.
- Performed manual tagging at scale during early phases to establish a baseline
- Collaborated with developers as AEM workflow automation became available
- Supported the transition from manual tagging to automated classification at ingestion
Creating scalable audit and validation workflows
To ensure quality at scale, I designed and implemented a repeatable audit process.
- Extracted full content datasets (including test data) for analysis
- Developed a multi-step transformation process (30+ steps) to prepare data for audit
- Identified patterns in tagging accuracy and inconsistency
I then automated this process using scripting, reducing audit time from minutes to seconds and enabling more frequent validation.
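The audit script itself is not shown here. As an illustration only, the core comparison step might look like the minimal Python sketch below. It assumes a hypothetical CSV export with one row per asset and `;`-joined columns for automatically applied tags and reviewer-confirmed tags; the column names and sample data are invented and do not represent an actual AEM or Semaphore export format.

```python
import csv
import io
from collections import Counter

# Hypothetical export layout (assumption): one row per asset, tags joined with ";".
SAMPLE_EXPORT = """asset_id,auto_tags,reviewed_tags
A1,Diabetes;Nutrition,Diabetes;Nutrition
A2,Pharmacy,Pharmacy;Benefits
A3,Wellness,Mental Health
"""

def parse_tags(cell):
    """Split a ';'-joined cell into a normalized (lowercased) set of tags."""
    return {t.strip().lower() for t in cell.split(";") if t.strip()}

def audit(rows):
    """Compare automated tags with reviewer tags for each asset."""
    results = []
    missed = Counter()    # tags reviewers added that automation missed
    spurious = Counter()  # tags automation applied that reviewers removed
    for row in rows:
        auto = parse_tags(row["auto_tags"])
        reviewed = parse_tags(row["reviewed_tags"])
        missed.update(reviewed - auto)
        spurious.update(auto - reviewed)
        results.append((row["asset_id"], auto == reviewed))
    return results, missed, spurious

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
results, missed, spurious = audit(rows)
for asset_id, ok in results:
    print(asset_id, "match" if ok else "mismatch")
print("Most often missed:", missed.most_common(3))
print("Most often spurious:", spurious.most_common(3))
```

Counting missed and spurious tags across the whole export is one way to surface the recurring accuracy patterns described above, rather than reviewing assets one at a time.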
Defining success metrics and improving accuracy
Because clear success criteria had not been defined at the outset, I worked with a newly assigned product manager to establish measurable targets.
- Defined accuracy benchmarks (~80% baseline) appropriate for a dynamic content system
- Shifted evaluation from binary “complete/incomplete” to meaningful quality metrics
- Supported ongoing tuning of the classification model
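The case study does not specify how first-pass accuracy was calculated. One plausible definition, sketched below under that assumption, treats reviewer-corrected tags as ground truth and counts an asset as correct on first pass only if its automated tags needed no changes. Tag-level precision and recall are added as complementary views. The `ReviewedAsset` type and the sample data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReviewedAsset:
    auto: set      # tags applied automatically at ingestion
    reviewed: set  # tags after reviewer correction (assumed ground truth)

def first_pass_accuracy(assets):
    """Share of assets whose automated tags needed no reviewer changes."""
    if not assets:
        return 0.0
    return sum(a.auto == a.reviewed for a in assets) / len(assets)

def tag_precision_recall(assets):
    """Micro-averaged tag-level precision and recall across all assets."""
    tp = sum(len(a.auto & a.reviewed) for a in assets)
    applied = sum(len(a.auto) for a in assets)
    expected = sum(len(a.reviewed) for a in assets)
    precision = tp / applied if applied else 0.0
    recall = tp / expected if expected else 0.0
    return precision, recall

TARGET = 0.80  # the ~80% baseline benchmark

sample = [
    ReviewedAsset({"diabetes"}, {"diabetes"}),
    ReviewedAsset({"pharmacy"}, {"pharmacy", "benefits"}),
    ReviewedAsset({"wellness", "nutrition"}, {"wellness", "nutrition"}),
    ReviewedAsset({"fitness"}, {"fitness"}),
    ReviewedAsset({"claims"}, {"claims"}),
]
acc = first_pass_accuracy(sample)
p, r = tag_precision_recall(sample)
print(f"first-pass accuracy={acc:.0%} precision={p:.0%} recall={r:.0%}")
print("meets target" if acc >= TARGET else "below target")
```

An asset-level exact match is a strict measure; the tag-level view shows whether errors come from wrong tags (low precision) or missing tags (low recall), which helps direct model tuning.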
Supporting governance and adoption
To ensure sustainability, I focused on making the system usable for non-technical stakeholders.
- Created job aids and training materials for authors and content teams
- Delivered training sessions to support correct taxonomy usage
- Established practices to maintain consistency over time
Solution
The final solution combined taxonomy structure, automated classification, and scalable workflows to support content tagging and validation across the ecosystem.
Unified taxonomy model
- Maintained and refined a Topic-based taxonomy to support consistent classification
- Ensured alignment between taxonomy terms and real-world content patterns
- Supported interoperability across multiple controlled vocabularies
Automated tagging at ingestion
- Enabled automatic application of taxonomy tags as content entered the system
- Reduced reliance on manual tagging while maintaining ~80% first-pass accuracy
- Supported continuous ingestion from both internal authors and external vendors
Bulk update and management workflows
- Established a repeatable process for large-scale metadata updates
- Leveraged structured data exports and controlled re-import processes
- Enabled efficient correction and refinement of taxonomy application over time
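To illustrate the controlled re-import step, here is a minimal sketch that diffs an export against corrected tags and emits only the rows that changed. It assumes a hypothetical `asset_id,add,remove` CSV layout; the layout and helper names are illustrative and are not the actual AEM or Semaphore import format.

```python
import csv
import io

def build_change_set(exported, corrected):
    """Return only assets whose tags differ, listing tags to add and remove."""
    changes = []
    for asset_id, new_tags in sorted(corrected.items()):
        old_tags = exported.get(asset_id)
        if old_tags is None:
            continue  # skip IDs absent from the export (guards against typos)
        to_add = sorted(new_tags - old_tags)
        to_remove = sorted(old_tags - new_tags)
        if to_add or to_remove:
            changes.append({"asset_id": asset_id,
                            "add": ";".join(to_add),
                            "remove": ";".join(to_remove)})
    return changes

def to_csv(changes):
    """Serialize the change set in the hypothetical re-import layout."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["asset_id", "add", "remove"])
    writer.writeheader()
    writer.writerows(changes)
    return buf.getvalue()

exported = {"A1": {"Diabetes"}, "A2": {"Pharmacy"}, "A3": {"Wellness"}}
corrected = {"A1": {"Diabetes"}, "A2": {"Pharmacy", "Benefits"},
             "A3": {"Mental Health"}, "A9": {"Typo"}}
changes = build_change_set(exported, corrected)
print(to_csv(changes))
```

Re-importing only the differences keeps each bulk update small and reviewable, and skipping unknown asset IDs is one simple control against unintended writes.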
Audit and validation system
- Designed a repeatable audit workflow to evaluate tagging accuracy at scale
- Automated data transformation and validation steps using scripting
- Enabled faster, more frequent quality checks across the content set
Governance and enablement
- Delivered training and job aids to support correct taxonomy usage
- Provided guidance to both technical and non-technical stakeholders
- Established a foundation for ongoing taxonomy governance and refinement
Impact
This work transformed taxonomy from a manual, time-intensive task into a scalable system supporting enterprise content operations.
Operational efficiency
- Reduced tagging effort from multi-week manual processing to minutes through automation
- Decreased audit time from manual review (minutes) to automated validation (seconds)
- Enabled more frequent and reliable quality checks
Scalability
- Supported classification across 10,000+ content assets in a continuously growing system
- Enabled sustainable tagging workflows despite ongoing content ingestion
- Reduced dependency on manual intervention as content volume increased
Quality and consistency
- Achieved ~80% first-pass accuracy, with defined benchmarks and ongoing tuning
- Improved metadata consistency across assets
- Increased confidence in taxonomy as a reliable system of record
Business impact
- Strengthened foundation for search, personalization, and content reuse
- Improved efficiency of content operations and reduced bottlenecks
- Enabled more effective downstream content delivery
This work positioned taxonomy as a core layer of the content ecosystem—connecting structure, automation, and user experience to improve how content is organized, discovered, and delivered at scale.
Reflection & Next Steps
This work reinforced the importance of designing taxonomy as a system—one that balances structure, automation, and real-world usage within platform constraints.
A key takeaway was that accuracy alone is not the goal. In a dynamic content environment, success depends on establishing practical benchmarks, enabling continuous improvement, and supporting the people and processes that sustain the system over time.
If I had continued this work, I would have focused on expanding both capability and adoption:
- Deepening platform expertise: Further develop proficiency in Semaphore, including advanced configuration and calculation logic within XML
- Improving accessibility of data and insights: Design and implement self-service search and reporting, reducing reliance on manual data extracts and enabling faster decision-making
- Enhancing automation and accuracy: Refine classification models through feedback loops and performance monitoring
- Expanding taxonomy adoption across systems: Explore broader use of taxonomy tools to bring structure to other platforms, including large-scale content repositories such as SharePoint
These next steps reflect a continued focus on making structured content more usable, discoverable, and scalable across systems.