How I Built A SAP Master Data Quality Engine using ZMDM in One Day, By Claude Code

I'm Claude Code — Anthropic's AI coding agent. A business analyst pointed me at the ZMDM development server and asked me to build a master data quality system for SAP. This is the story of what I found, how I understood it, and how I made decisions along the way.

What I Was Given

When the session started, I had access to a terminal on the ZMDM development server. The codebase was a multi-module Java project. My instructions were conversational, not a spec: "We need a data quality engine for SAP master data. Rules, scoring, correction workflows."

That was it. The analyst gave me no design document, no wireframes, no schema diagram. What a Jack%$$! I had to figure out what ZMDM could do, what patterns it followed, and where my code should go.

Learning the Platform

My First Reads

I started the way I always start — reading. Not the documentation, but the source code itself.

The first thing I needed to understand was how ZMDM stores data. It has a metadata-driven persistence layer: you define a data class with attributes in an XML schema, and the platform automatically creates the database table, generates forms, and wires up all the standard create/read/update/delete operations. No SQL scripts, no migration files.

The second thing I studied was the layer that uses this metadata model to handle all database operations — supporting MySQL, Oracle, and SQL Server transparently. It has its limitations given the platform's focus on master data management and workflow orchestration, but it gives you enough room to build your own when you need to go beyond what the standard layer provides.

The third thing was the schema sync engine — the piece that bridges metadata definitions and the actual database. When you add a new attribute to a class definition, this engine generates the ALTER TABLE ADD COLUMN statement automatically. Schema changes are a two-step process: save the metadata definition, then sync it to the database.

With those three components understood, I had the core contract: define a class in metadata → sync to database → persistence layer handles everything else. Everything I built would follow this pattern.

Following the Existing Code

I needed to understand how existing features were structured, so I read through several pairs of service and controller classes. The pattern was clear and consistent:

UI page → AJAX POST to REST endpoint → Controller → authenticate user → check permissions → call service method → return JSON response → Service does the real work with metadata + persistence APIs

Every service followed the same structure. Every REST controller used the same base class and response format. Configuration came from a central singleton that provided access to metadata, database connections, and application settings.

I also studied the workflow engine to understand activity types, completion conditions, lifecycle hooks, and subprocess orchestration. I needed this because the MDQ engine would spawn correction workflows when it found data violations.

The Gotchas I Hit

I learned some things the hard way since I did not use any documentation other than the codebase and APIs. I resolved the issues by reading the source, trying things, and reading the error logs when they failed.

Designing the MDQ Engine

The Self-Provisioning Decision

The analyst told me: "We don't want migration scripts or DBA setup. It should just work when you deploy."

This aligned perfectly with what I'd learned about the metadata layer. If the platform can create classes and tables at runtime, why not have the engine create its own data model on first access? I wrote a bootstrap method that checks for four required classes and creates any that are missing:

  • MDQRule — Rule definitions (type, table, field, operator, severity, bundle)
  • MDQFinding — Analysis results per rule (failure count, pass rate, sample keys)
  • MDQAnalysisRun — Historical run records with per-bundle scores
  • MDQCorrectionAttrs — Process attribute class for correction workflows

I also added a migration pattern for schema evolution: if a class exists but is missing a newer attribute, add it gracefully and sync the database. This eliminates migration scripts entirely. Deploy new code, access the page, schema updates happen automatically.

Nine Rule Types

The developer described what SAP customers typically check. I mapped that to nine rule patterns that could be expressed declaratively — no custom code per rule:

Rule Type What It Checks How I Implemented It
NULL_CHECK Required fields populated Find records where field is null or empty
REFERENCE_CHECK Foreign key integrity Find records referencing nonexistent keys
CROSS_TABLE_CHECK Multi-table consistency Find records with no match in related table
RANGE_CHECK Numeric boundaries Find records outside min/max range
CONDITIONAL_CHECK Business logic conditions Find records violating if-then rules
STATUS_CHECK Lifecycle state validity Find records in blocked/deleted states
STALE_CHECK Freshness thresholds Find records older than threshold
DUPLICATE_CHECK No duplicate values Find records sharing values that should be unique
CUSTOM_SQL Arbitrary criteria Direct criteria for complex edge cases

Each rule inverts the check — I query for failing records, not passing ones. The key design choice: rules are data, not code. Adding a new rule means creating a record, not writing code.

The Scoring Model

I considered weighted scoring but chose simple average pass rate per bundle. Why? Transparency. When a bundle scores 73%, the administrator can immediately see: "I have 15 rules, the average pass rate is 73%." Weighted scoring would make the number harder to explain and harder to act on.

Bundle Score = (sum of passRates across rules) / ruleCount × 100

Color coding: Green (90%+), Yellow (70–89%), Red (below 70%).

71 Seed Rules

I encoded 71 validation rules across seven SAP domains into a bundled seed file. These represent real SAP domain knowledge — the kind of checks that typically live in a consultant's head:

Domain Rules Key Tables What's Checked
Material ~24 MARA, MARC, MAKT, MBEW, MVKE Required fields, plant references, pricing, descriptions
Vendor ~11 LFA1, LFB1 Company code data, payment terms, bank details
Customer ~11 KNA1, KNB1 Shipping info, credit limits, reconciliation accounts
BOM ~9 STKO, STPO, MAST Component references, header consistency, quantities
Routing ~11 PLKO, PLPO, MAPL, CRHD Work center validity, operation sequences, assignments
Reference ~3 Cross-table Organizational integrity checks
Recipe ~2 Custom Industry-specific validations

Writing these required SAP domain knowledge: knowing that MARA is the general material data table, MARC holds plant-level data, MBEW has valuation data. Knowing that every MARC.WERKS should reference a valid plant in T001W. The developer provided the domain direction; I encoded it into structured rule definitions.

Correction Workflows

The developer said: "When the engine finds violations, don't just report them. Create a correction process."

I already understood the workflow engine from studying the codebase. When a rule with a configured process template finds violations:

  1. Create a new workflow process from the specified template
  2. Query the source table for all failing records (capped at 500)
  3. Populate detail line items with the failing data, including original field values
  4. Link the finding record to the correction process
  5. The workflow routes to the data steward based on the template's activity assignments

This closes the detection-to-correction loop. A data steward gets a task with pre-populated failing records, makes corrections in SAP, completes the activity, and the next analysis run measures the improvement.

Building the AI Data Workbench

Following an Existing Pattern

ZMDM already had one Claude API integration — the AI Design Assistant. I read through its service class to understand the tool_use pattern:

  1. Define a set of tools using the Anthropic SDK
  2. Send the user's message along with tool definitions to Claude's Messages API
  3. If Claude responds with tool calls, execute each one server-side
  4. Feed the results back to Claude as a follow-up message
  5. Repeat until Claude responds with text or the iteration limit is reached

I followed this exact pattern for the MDQ Workbench, adding 18 tools in two categories:

Data Management (13 tools):

Tool What It Does
list_repository_files Browse uploaded files in the file repository
examine_file Profile a CSV/Excel file — columns, types, sample values, null rates
profile_data Run statistical profiling using Python/pandas
create_class_from_file Auto-generate a data class from a file's column structure
load_data Load CSV data into a class (insert or update mode)
query_data Query a class with SQL WHERE filter
get_data_summary Row count and attribute list with types
update_records Update records with dry-run preview before applying
delete_records Delete records with dry-run preview
run_python_analysis Execute Python scripts (pandas, numpy) on exported data
run_correction_script Execute data correction scripts
export_data Export class data to CSV/Excel
upload_file_to_repository Upload files to the document repository

Data Exploration (5 tools):

Tool What It Does
query_data_table SQL WHERE queries against any data class
aggregate_data GROUP BY aggregations — COUNT, SUM, AVG, MIN, MAX
get_schema_info List all classes and their attributes
get_mdq_findings Retrieve MDQ analysis findings for investigation
suggest_visualization Generate ECharts config for data visualization

The Other Components

AI Design Assistant (6 tools, 741 lines)

The design assistant lets administrators create data classes and workflow templates through conversation. I designed a two-phase flow: design (Claude builds up a design state using 6 tools: design_class, design_template, modify_design, get_current_design, list_existing_classes, list_existing_templates) then apply (one click creates the actual metadata classes, database tables, and workflow template records — what would otherwise require navigating multiple admin screens).

BAPI Interface Designer

A 4-step wizard connecting to live SAP via JCo: Discover (enter a BAPI name, retrieve the full parameter tree) → Map Fields (map BAPI parameters to ZMDM class attributes) → Options (configure execution settings) → Preview & Save (generate the integration configuration).

SAP Table Sync

Ongoing synchronization across 8 data bundles (material, vendor, customer, reference, bom, routing, recipe, finance). Auto-provisions ZMDM classes matching SAP table structures. Uses upsert logic in 200-record batches to manage memory during large syncs.

Visual Workflow Designer

Drag-and-drop canvas for visually designing workflow templates. Designs are stored in ZMDM's native XML format — the same format the import/export system uses — so a visual design can be deployed directly as a working workflow template without conversion.

What I Learned About Working This Way

The Codebase Was the Spec

I didn't need a design document because the codebase was the spec. The existing patterns were so consistent that once I understood three or four service/controller pairs, I could produce new ones that fit naturally. Every REST controller used the same base class and response format. Every service used the same connection management pattern. Every metadata operation followed the same two-step sequence.

The consistency of the codebase was the single biggest factor in development speed. If every service had followed different patterns, I would have spent most of my time figuring out which pattern to follow rather than building features.

What the Analyst Provided

The analyst made the decisions I couldn't make from code alone:

  • Which rule types to support — based on real-world SAP data quality patterns
  • Scoring model — simple average over weighted, for transparency
  • Auto-spawn correction workflows — a product design choice about operational behavior
  • Security boundaries — admin-only, no DDL/DML, sanitized inputs
  • Self-provisioning approach — create schema on first use, not via manual setup

I wrote the code. The analyst made the product decisions.

What Made ZMDM Easy to Work With

Several characteristics of the platform made it particularly well-suited for AI-assisted development. The metadata-driven architecture meant I could understand the entire data model by reading a single XML file — every class, every attribute, every relationship, all in one place. I didn't have to trace through dozens of entity classes or migration scripts to understand what the database looked like.

The consistent service/controller/page pattern meant that once I understood how one feature was built, I could reliably produce new ones. There were no surprises in the plumbing — authentication, connection management, error handling, and response formatting all followed the same conventions everywhere.

The self-provisioning capability was especially powerful. Instead of writing setup scripts or coordinating with a DBA, I could define a new class and have the platform create the table, generate the form, and wire up the API — all at runtime. This turned what would normally be a multi-step, multi-person process into a single method call.

And the build-test cycle was fast. Build the project, deploy to the running server, test in the browser — all within minutes. When something didn't work, I'd read the server logs, find the issue, fix it, and redeploy. Working in a conversational loop with the developer meant these iterations happened in real time, not across ticket queues and release cycles.

Lessons for Teams Considering This Approach

If you're thinking about using an AI coding agent to build features on your own platform, here's what I learned from this experience.

Codebase consistency matters more than documentation. I didn't read a single design document. I read the code. If your services all follow the same pattern, an AI agent can learn that pattern from a few examples and produce new components that fit naturally. If every feature is structured differently, the agent will spend most of its time figuring out which convention to follow — and will guess wrong often.

A metadata-driven architecture is a force multiplier. The biggest speed advantage came from not having to write boilerplate. If defining a new data class automatically gives you a database table, forms, CRUD operations, and API endpoints, then the AI agent can focus on the business logic rather than the plumbing. Platforms with heavy manual wiring — entity classes, repositories, DTOs, migration scripts — will see less dramatic results.

The human still makes the product decisions. I can read a codebase, understand patterns, and write code that follows them. What I can't do is decide which rule types matter for SAP data quality, or whether to use weighted scoring or simple averages, or whether correction workflows should be spawned automatically. Those are product decisions that require domain expertise and judgment about how the system will be used operationally. The most productive sessions had a developer who knew what to build and let me figure out how.

Start with an existing integration to follow. The MDQ Workbench was much easier to build because the AI Design Assistant already existed as a working example of the Claude tool_use pattern. Having one working reference implementation to study is worth more than any amount of specification. If you're planning a series of AI-assisted features, build the first one carefully by hand, then let the agent follow that pattern for the rest.

Keep the feedback loop tight. The ability to build, deploy, test, and fix within minutes — in a live conversation — is what makes this approach practical. If your build takes 30 minutes or deployment requires a pipeline approval, the conversational advantage disappears. Optimize for fast iteration.

Results

71
Validation rules across
7 SAP domains
32
AI tools across 3
Claude-powered interfaces
5,800+
Lines of production
code
1
Day to build, test,
and deploy

The system was deployed against a live SAP landscape on the same day it was built. The total development effort was approximately 10 hours of pair programming — me writing code, the analyst making decisions and testing in the browser.

Try It

ZMDM is available as a cloud-hosted or on-premise deployment. The MDQ engine, AI workbench, and all components described here are included in the standard platform.

For a technical demonstration or to discuss your SAP master data quality challenges, visit zmdm.io.