Designing taxonomy for scale across a 10,000+ asset content ecosystem

Built and operationalized a unified taxonomy within an enterprise content platform, transforming a manual tagging process into an automated system and establishing scalable workflows for classification and audit.

Impact

Reduced tagging effort from multi-week manual processing to minutes through automation
Enabled classification across 10,000+ content assets in a continuously growing system
Achieved ~80% first-pass accuracy, with defined metrics and ongoing tuning
Created automated audit workflows, reducing validation time from minutes to seconds

Role: Taxonomist
Platform: AEM + Semaphore
Focus: Metadata strategy, automation, governance

Context

This work took place at Optum within a large-scale healthcare content ecosystem supporting digital experiences across multiple platforms. Content was managed in Adobe Experience Manager (AEM) and structured using a unified taxonomy model maintained in Semaphore.

At the time of this work:

The system contained 10,000+ content assets, with ongoing ingestion from internal authors and external vendors
Multiple controlled vocabularies were in use, with varying levels of governance and consistency
Taxonomy played a critical role in content discovery, reuse, and downstream delivery systems

Problem

Content tagging was primarily manual, time-intensive, and difficult to scale.

Tagging even a few hundred assets required weeks of manual effort, creating a bottleneck for content ingestion
Metadata consistency varied, reducing reliability for search, personalization, and reuse
Existing workflows did not support bulk updates or efficient auditing, making quality control slow and error-prone
Success metrics for automated tagging were undefined, making it difficult to evaluate or improve system performance

As content volume continued to grow, these limitations made it increasingly difficult to maintain a reliable and scalable metadata system.

Approach

To move from manual tagging to a scalable system, I focused on four parallel tracks: taxonomy design, workflow enablement, audit and quality control, and governance.

Assessing the system and constraints

I began by evaluating the existing taxonomy, content workflows, and platform limitations within AEM and Semaphore.

Identified gaps in automation, bulk operations, and auditability
Evaluated how content was being ingested and tagged across sources
Determined where manual effort was unavoidable vs. where it could be reduced

This assessment informed a phased approach: stabilize what existed, then introduce scalable improvements.

Designing and structuring the taxonomy

I worked within an existing unified taxonomy model, focusing on improving structure, usability, and consistency.

Refined Topic taxonomy to better support classification and retrieval
Ensured alignment between taxonomy structure and real content patterns
Balanced precision vs. coverage to support automated tagging at scale
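
The precision-versus-coverage trade-off can be made concrete with a small sketch. It assumes, hypothetically, that the classifier returns a confidence score with each suggested tag and that a reviewed sample is available. The sample data, the threshold values, and the function name `precision_and_coverage` are illustrative and are not taken from the actual Semaphore configuration.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    asset_id: str
    tag: str
    score: float    # hypothetical classifier confidence, 0.0 to 1.0
    correct: bool   # reviewer judgment on the suggested tag


def precision_and_coverage(suggestions, total_assets, threshold):
    """Return (precision, coverage) for suggestions at or above threshold.

    precision: share of kept suggestions a reviewer marked correct.
    coverage:  share of all assets that keep at least one suggestion.
    """
    kept = [s for s in suggestions if s.score >= threshold]
    if not kept:
        return 0.0, 0.0
    precision = sum(s.correct for s in kept) / len(kept)
    coverage = len({s.asset_id for s in kept}) / total_assets
    return precision, coverage


# Illustrative sample only; not real project data.
SAMPLE = [
    Suggestion("a1", "Topic/Nutrition", 0.92, True),
    Suggestion("a2", "Topic/Sleep", 0.81, True),
    Suggestion("a3", "Topic/Fitness", 0.55, False),
    Suggestion("a4", "Topic/Nutrition", 0.48, True),
]

for threshold in (0.4, 0.6, 0.9):
    p, c = precision_and_coverage(SAMPLE, total_assets=5, threshold=threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} coverage={c:.2f}")
```

On this sample, raising the threshold from 0.4 to 0.6 lifts precision from 0.75 to 1.00 while coverage falls from 0.80 to 0.40, which is the kind of trade-off a taxonomy structure has to accommodate.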

Enabling automation within platform constraints

Because automation was still in development, I adapted workflows to bridge gaps between manual and automated processes.

Performed manual tagging at scale during early phases to establish a baseline
Collaborated with developers as AEM workflow automation became available
Supported transition from manual tagging to automated classification at ingestion

Creating scalable audit and validation workflows

To ensure quality at scale, I designed and implemented a repeatable audit process.

Extracted full content datasets (including test data) for analysis
Developed a multi-step transformation process (30+ steps) to prepare data for audit
Identified patterns in tagging accuracy and inconsistency

I then automated this process using scripting, reducing audit time from minutes to seconds and enabling more frequent validation.
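
A scripted audit of this kind might look like the following minimal sketch. It does not reproduce the actual 30+ step process; the column names (`asset_id`, `tags`), the `|` tag delimiter, the vocabulary, and the function name `audit_export` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical controlled vocabulary; real terms would come from the taxonomy tool.
VOCABULARY = {"Topic/Nutrition", "Topic/Sleep", "Topic/Fitness"}


def audit_export(csv_text, vocabulary):
    """Flag rows in a metadata export that need review.

    Assumes columns `asset_id` and `tags`, with tags separated by `|`.
    Returns a list of (asset_id, issue) tuples.
    """
    findings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize whitespace and drop empty entries.
        tags = [t.strip() for t in (row.get("tags") or "").split("|") if t.strip()]
        if not tags:
            findings.append((row["asset_id"], "missing tags"))
            continue
        if len(tags) != len(set(tags)):
            findings.append((row["asset_id"], "duplicate tags"))
        # Report any tag that is not part of the controlled vocabulary.
        for tag in sorted(set(tags) - vocabulary):
            findings.append((row["asset_id"], f"unknown tag: {tag}"))
    return findings


# Illustrative export only; not real project data.
EXPORT = """asset_id,tags
a1,Topic/Nutrition|Topic/Sleep
a2,
a3,Topic/Sleep| Topic/Sleep
a4,Topic/Yoga
"""

for asset_id, issue in audit_export(EXPORT, VOCABULARY):
    print(asset_id, issue)
```

Keeping each check as a small, separate rule makes it easier to add or retire checks as the taxonomy changes.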

Defining success metrics and improving accuracy

Because clear success criteria had not been defined at the outset, I worked with a new product manager to establish measurable targets.

Defined accuracy benchmarks (~80% baseline) appropriate for a dynamic content system
Shifted evaluation from binary “complete/incomplete” to meaningful quality metrics
Supported ongoing tuning of the classification model
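
One way to express such a benchmark in code is sketched below. The source does not define how first-pass accuracy was calculated, so the exact-match definition here (automated tags equal the reviewed tags for an asset) is an assumption, as are the function names and the sample data.

```python
def first_pass_accuracy(auto_tags, reviewed_tags):
    """Share of assets whose automated tags match the reviewed tags exactly.

    Both arguments map asset_id -> set of tags. Only assets present in
    both mappings are scored. The exact-match rule is an assumption.
    """
    scored = auto_tags.keys() & reviewed_tags.keys()
    if not scored:
        raise ValueError("no overlapping assets to score")
    matches = sum(auto_tags[a] == reviewed_tags[a] for a in scored)
    return matches / len(scored)


def meets_benchmark(accuracy, benchmark=0.80):
    """Compare a measured accuracy with the agreed baseline."""
    return accuracy >= benchmark


# Illustrative sample only; not real project data.
AUTO = {
    "a1": {"Topic/Nutrition"},
    "a2": {"Topic/Sleep"},
    "a3": {"Topic/Fitness"},
    "a4": {"Topic/Sleep", "Topic/Fitness"},
    "a5": {"Topic/Nutrition"},
}
REVIEWED = {
    "a1": {"Topic/Nutrition"},
    "a2": {"Topic/Sleep"},
    "a3": {"Topic/Fitness"},
    "a4": {"Topic/Sleep", "Topic/Fitness"},
    "a5": {"Topic/Sleep"},  # reviewer corrected this one
}

accuracy = first_pass_accuracy(AUTO, REVIEWED)
print(f"first-pass accuracy: {accuracy:.0%}, meets benchmark: {meets_benchmark(accuracy)}")
```

A stricter or looser matching rule, such as partial credit for overlapping tags, would change the number, so the definition should be agreed alongside the target itself.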

Supporting governance and adoption

To ensure sustainability, I focused on making the system usable for non-technical stakeholders.

Created job aids and training materials for authors and content teams
Delivered training sessions to support correct taxonomy usage
Established practices to maintain consistency over time

Solution

The final solution combined taxonomy structure, automated classification, and scalable workflows to support content tagging and validation across the ecosystem.

Unified taxonomy model

Maintained and refined a Topic-based taxonomy to support consistent classification
Ensured alignment between taxonomy terms and real-world content patterns
Supported interoperability across multiple controlled vocabularies

Automated tagging at ingestion

Enabled automatic application of taxonomy tags as content entered the system
Reduced reliance on manual tagging while maintaining acceptable accuracy
Supported continuous ingestion from both internal authors and external vendors

Bulk update and management workflows

Established a repeatable process for large-scale metadata updates
Leveraged structured data exports and controlled re-import processes
Enabled efficient correction and refinement of taxonomy application over time
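
The export-and-re-import pattern can be sketched as follows. This is an illustration under stated assumptions, not the actual workflow: the column names, the `|` delimiter, the tag mapping, and the function name `plan_bulk_update` are hypothetical, and the real re-import step into AEM is not shown.

```python
import csv
import io

# Hypothetical mapping of retired tags to their replacements.
TAG_MAPPING = {"Topic/Diet": "Topic/Nutrition"}


def plan_bulk_update(csv_text, mapping):
    """Return only the rows whose tags change, ready for a controlled re-import.

    Assumes columns `asset_id` and `tags`, with tags separated by `|`.
    """
    changes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        old = [t.strip() for t in row["tags"].split("|") if t.strip()]
        new = []
        for tag in old:
            replacement = mapping.get(tag, tag)
            if replacement not in new:  # avoid duplicates after remapping
                new.append(replacement)
        if new != old:
            changes.append({
                "asset_id": row["asset_id"],
                "old_tags": "|".join(old),
                "new_tags": "|".join(new),
            })
    return changes


# Illustrative export only; not real project data.
EXPORT = """asset_id,tags
a1,Topic/Diet|Topic/Sleep
a2,Topic/Sleep
a3,Topic/Diet|Topic/Nutrition
"""

for change in plan_bulk_update(EXPORT, TAG_MAPPING):
    print(change)
```

Producing a change list that records both old and new values, instead of overwriting the export in place, leaves a reviewable record before anything is re-imported.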

Audit and validation system

Designed a repeatable audit workflow to evaluate tagging accuracy at scale
Automated data transformation and validation steps using scripting
Enabled faster, more frequent quality checks across the content set

Governance and enablement

Delivered training and job aids to support correct taxonomy usage
Provided guidance to both technical and non-technical stakeholders
Established a foundation for ongoing taxonomy governance and refinement

Impact

This work transformed taxonomy from a manual, time-intensive task into a scalable system supporting enterprise content operations.

Operational efficiency

Reduced tagging effort from multi-week manual processing to minutes through automation
Decreased audit time from manual review (minutes) to automated validation (seconds)
Enabled more frequent and reliable quality checks

Scalability

Supported classification across 10,000+ content assets in a continuously growing system
Enabled sustainable tagging workflows despite ongoing content ingestion
Reduced dependency on manual intervention as content volume increased

Quality and consistency

Achieved ~80% first-pass accuracy, with defined benchmarks and ongoing tuning
Improved metadata consistency across assets
Increased confidence in taxonomy as a reliable system of record

Business impact

Strengthened foundation for search, personalization, and content reuse
Improved efficiency of content operations and reduced bottlenecks
Enabled more effective downstream content delivery

This work positioned taxonomy as a core layer of the content ecosystem—connecting structure, automation, and user experience to improve how content is organized, discovered, and delivered at scale.

Reflection & Next Steps

This work reinforced the importance of designing taxonomy as a system—one that balances structure, automation, and real-world usage within platform constraints.

A key takeaway was that accuracy alone is not the goal. In a dynamic content environment, success depends on establishing practical benchmarks, enabling continuous improvement, and supporting the people and processes that sustain the system over time.

If I had continued this work, I would have focused on expanding both capability and adoption:

Deepening platform expertise: further developing proficiency in Semaphore, including advanced configuration and calculation logic within XML
Improving accessibility of data and insights: designing and implementing self-service search and reporting, reducing reliance on manual data extracts and enabling faster decision-making
Enhancing automation and accuracy: refining classification models through feedback loops and performance monitoring
Expanding taxonomy adoption across systems: exploring broader use of taxonomy tools to bring structure to other platforms, including large-scale content repositories such as SharePoint

These next steps reflect a continued focus on making structured content more usable, discoverable, and scalable across sy