Overview
This tutorial shows you how to generate a specification from your dataset. Specifications define the structure and requirements that generated samples must follow.
What You’ll Learn
- How to trigger specification analysis
- Monitoring analysis progress
- Reviewing generated specifications
- Customizing specifications
- Using specifications for generation
Prerequisites
- A created dataset (see Creating a Dataset)
- Dataset ID from previous tutorial
- API key for authentication
Step 1: Start Specification Analysis
Trigger the analysis process:
Python Example
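A minimal sketch of this step, using only the Python standard library. The base URL, the endpoint path, and the bearer-token header are assumptions made for illustration, not the documented API; substitute the values from your API reference. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.request

# Hypothetical base URL: replace with the real API host.
BASE_URL = "https://api.example.com/v1"


def build_analysis_request(dataset_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts specification analysis."""
    # Assumed endpoint path; check the API reference for the real one.
    url = f"{BASE_URL}/datasets/{dataset_id}/analyze"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        headers={
            # Assumed authentication scheme (bearer token).
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_analysis_request("YOUR_DATASET_ID", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` would send it and return the HTTP response.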
Step 2: Monitor Analysis Progress
Check the status periodically:
Python Polling Script
Analysis typically takes 2-5 minutes. Larger datasets may take up to 10 minutes.
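A polling loop along these lines is one way to wait for the analysis. The status values `completed` and `failed` and the shape of the status response are assumptions; `fetch_status` is a hypothetical callable that you would implement to call the status endpoint. The 15-minute default timeout follows the troubleshooting guidance in this tutorial.

```python
import time
from typing import Callable


def wait_for_analysis(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 15.0,
    timeout_seconds: float = 900.0,
) -> dict:
    """Poll until the analysis reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        # "completed" and "failed" are assumed state names.
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("analysis did not finish in time")
        time.sleep(poll_seconds)


# Demonstration with a fake status source instead of a real API call.
_responses = iter(
    [{"status": "running"}, {"status": "running"}, {"status": "completed"}]
)
result = wait_for_analysis(lambda: next(_responses), poll_seconds=0.0)
print(result["status"])
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock changes while the loop is running.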
Step 3: Review Generated Specification
Once analysis completes, retrieve the specification:
View the YAML Configuration
The config_yaml field contains the specification:
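As an illustration, the YAML text can be pulled out of a JSON response as follows. Only the `config_yaml` field name comes from this tutorial; the other response fields and the YAML keys (`requirements`, `variation_axes`) are a hypothetical example, not a real API response.

```python
import json

# Hypothetical response body, for illustration only.
response_body = json.dumps(
    {
        "id": "spec_123",
        "version": 1,
        "config_yaml": (
            "requirements:\n"
            "  - Each sample has a title\n"
            "variation_axes:\n"
            "  tone:\n"
            "    - formal\n"
            "    - casual\n"
        ),
    }
)


def extract_config_yaml(body: str) -> str:
    """Return the YAML text stored in the config_yaml field of a JSON response."""
    spec = json.loads(body)
    config_yaml = spec.get("config_yaml")
    if not isinstance(config_yaml, str) or not config_yaml.strip():
        raise ValueError("response has no config_yaml content")
    return config_yaml


config_yaml = extract_config_yaml(response_body)
print(config_yaml)
```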
Step 4: Customize the Specification
You can edit the specification to refine requirements:
Add More Requirements
Modify Variation Axes
Update the Specification
Updating a specification creates a new version. The previous version is preserved.
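The update could be sketched like this. The endpoint path, the `PUT` method, the request body shape, and the YAML keys are all assumptions for illustration; the edited text simply adds one requirement and one variation value to a hypothetical specification. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL: replace with the real API host.
BASE_URL = "https://api.example.com/v1"

# Edited specification text. Keys and values are illustrative only.
edited_yaml = """\
requirements:
  - Each sample has a title
  - Each sample is under 200 words
variation_axes:
  tone:
    - formal
    - casual
    - technical
"""


def build_update_request(
    spec_id: str, api_key: str, config_yaml: str
) -> urllib.request.Request:
    """Build (but do not send) a request that submits an edited specification."""
    # Assumed endpoint path and HTTP method.
    url = f"{BASE_URL}/specifications/{spec_id}"
    body = json.dumps({"config_yaml": config_yaml}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


request = build_update_request("spec_123", "YOUR_API_KEY", edited_yaml)
print(request.get_method(), request.full_url)
```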
Step 5: Test with Small Generation
Before generating many samples, test with a small batch:
Step 6: View Specification Versions
List all versions of a specification:
Common Issues
Analysis takes too long
Normal duration: 2-10 minutes
If longer than 15 minutes:
- Check status for error messages
- Verify dataset is not corrupt
- Try with smaller dataset first
- Contact support if persistent
Analysis fails
Common causes:
- Dataset files corrupted
- Unsupported file format
- Files not UTF-8 encoded
- Dataset empty or too small
Solution:
- Check the error message in the status response
- Verify dataset has at least 3-5 samples
- Ensure files are valid and readable
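Some of these causes can be checked locally before starting analysis. The sketch below assumes the dataset is a local directory with one file per sample, which may not match how your dataset is stored; it flags empty files, files that are not valid UTF-8, and directories with fewer than three files.

```python
import tempfile
from pathlib import Path

MIN_SAMPLES = 3  # lower bound of the "at least 3-5 samples" guidance


def check_dataset_dir(directory: Path) -> list[str]:
    """Return human-readable problems found in a local dataset directory."""
    problems = []
    files = sorted(p for p in directory.iterdir() if p.is_file())
    if len(files) < MIN_SAMPLES:
        problems.append(f"only {len(files)} file(s); need at least {MIN_SAMPLES}")
    for path in files:
        data = path.read_bytes()
        if not data:
            problems.append(f"{path.name}: empty file")
            continue
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{path.name}: not valid UTF-8")
    return problems


# Demonstration on a temporary directory with one good and one bad file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.txt").write_text("hello", encoding="utf-8")
    (root / "bad.txt").write_bytes(b"\xff\xfe")
    problems = check_dataset_dir(root)
print(problems)
```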
Specification quality issues
Problem: Generated spec doesn’t capture requirements
Solution:
- Manually edit specification
- Add specific requirements
- Define clearer variation axes
- Provide more diverse seed data
YAML format errors
Problem: YAML syntax error when updating
Solution:
- Validate YAML syntax: https://www.yamllint.com/
- Check indentation (use spaces, not tabs)
- Escape special characters
- Use multiline strings with |
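The indentation check can be automated before submitting an update. This is a minimal sketch that only looks for tabs in leading whitespace; it is not a full YAML validator, and the sample text is illustrative.

```python
def find_tab_indented_lines(yaml_text: str) -> list[int]:
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Everything before the first non-whitespace character.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


sample = "requirements:\n\t- uses a tab\n  - uses spaces\n"
print(find_tab_indented_lines(sample))  # prints [2]
```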
Best Practices
✅ Review automatically generated specs: Always review before large generation runs
✅ Start with small tests: Generate 5-10 samples to validate spec quality
✅ Be specific in requirements: Clear requirements → better samples
✅ Use diverse seed data: More variety → better specification
✅ Iterate and refine: Test, review, update, repeat
✅ Document changes: Add notes when creating new versions
Specification Quality Checklist
Before using a specification for production:
- Requirements are clear and specific
- All mandatory fields/properties are listed
- Data formats are defined
- Variation axes cover important dimensions
- Variation values are distinct and clear
- Tested with small sample batch
- Samples meet quality expectations

