Technical Details

This document provides in-depth technical information about the Category classification system, including architecture details, API interfaces, performance metrics, and extension guidelines.

Architecture Overview​

The Category system is built on a multi-layered architecture that combines modern transformer models with efficient routing logic.

Classifier Architecture​

ModernBERT-Based Classification​

The category classifier uses ModernBERT, an encoder-only transformer model, fine-tuned for the 14-category classification task.

Model Specifications:

  • Architecture: ModernBERT-base
  • Parameters: ~149M
  • Input Length: Up to 512 tokens
  • Output: 14-class probability distribution
  • Inference Time: <50ms average

Classification Pipeline​

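type Classifier struct {
	categoryInference CategoryInference
	categoryMapping   *CategoryMapping
	config            *config.RouterConfig
}

// ClassifyCategory returns the predicted category and its confidence.
// An empty category with a nil error means the prediction fell below the threshold.
func (c *Classifier) ClassifyCategory(text string) (string, float64, error) {
	// 1. Run inference (tokenization and encoding happen inside Classify)
	result, err := c.categoryInference.Classify(text)
	if err != nil {
		return "", 0.0, err
	}

	// 2. Apply confidence threshold
	if result.Confidence < c.config.Classifier.CategoryModel.Threshold {
		return "", result.Confidence, nil
	}

	// 3. Map index to category name, rejecting indices missing from the mapping
	category, ok := c.categoryMapping.IdxToCategory[strconv.Itoa(result.ClassIndex)]
	if !ok {
		return "", result.Confidence, fmt.Errorf("no category mapped to class index %d", result.ClassIndex)
	}
	return category, result.Confidence, nil
}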

Entropy-Based Reasoning Decision​

The classifier can also return a reasoning decision derived from the entropy of the full probability distribution:

func (c *Classifier) ClassifyCategoryWithEntropy(text string) (string, float64, entropy.ReasoningDecision, error) {
	// Get full probability distribution
	result, err := c.categoryInference.ClassifyWithProbabilities(text)
	if err != nil {
		return "", 0.0, entropy.ReasoningDecision{}, err
	}

	// Calculate entropy for reasoning decision
	reasoningDecision := entropy.ShouldUseReasoning(result.Probabilities, c.config)

	return result.Category, result.Confidence, reasoningDecision, nil
}
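
The internals of entropy.ShouldUseReasoning are not shown here. As a rough illustration of the underlying idea, the sketch below computes the Shannon entropy of a probability distribution and normalizes it by the maximum possible entropy, so that 0 means a fully confident prediction and 1 means a uniform one. The function names and the 0.5 cutoff are assumptions made for this example, not part of the router's API.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy (in nats) of a probability distribution.
// Zero-probability entries contribute nothing, by the convention 0*log(0) = 0.
func shannonEntropy(probs []float64) float64 {
	h := 0.0
	for _, p := range probs {
		if p > 0 {
			h -= p * math.Log(p)
		}
	}
	return h
}

// normalizedEntropy scales the entropy into [0, 1] by dividing by log(n),
// the entropy of a uniform distribution over n classes.
func normalizedEntropy(probs []float64) float64 {
	if len(probs) < 2 {
		return 0
	}
	return shannonEntropy(probs) / math.Log(float64(len(probs)))
}

// isUncertain is a hypothetical decision rule: treat the prediction as
// uncertain when the normalized entropy exceeds the given cutoff.
func isUncertain(probs []float64, cutoff float64) bool {
	return normalizedEntropy(probs) > cutoff
}

func main() {
	confident := []float64{0.95, 0.03, 0.01, 0.01}
	ambiguous := []float64{0.30, 0.28, 0.22, 0.20}

	fmt.Printf("confident: entropy=%.3f uncertain=%v\n",
		normalizedEntropy(confident), isUncertain(confident, 0.5))
	fmt.Printf("ambiguous: entropy=%.3f uncertain=%v\n",
		normalizedEntropy(ambiguous), isUncertain(ambiguous, 0.5))
}
```

A peaked distribution yields a low normalized entropy, while a flat one approaches 1. How the router turns such a signal into a reasoning decision depends on its own configuration.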

API Interfaces​

Classification API​

Intent Classification Endpoint​

Endpoint: POST /classify/intent

Request Format:

{
  "text": "Solve the quadratic equation x² + 5x + 6 = 0",
  "options": {
    "return_probabilities": true,
    "confidence_threshold": 0.6
  }
}

Response Format:

{
  "classification": {
    "category": "math",
    "confidence": 0.95,
    "processing_time_ms": 42
  },
  "probabilities": {
    "math": 0.95,
    "physics": 0.03,
    "computer science": 0.01,
    "other": 0.01
  }
}
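
A minimal Go client for this endpoint might look like the sketch below. The struct names are assumptions chosen to mirror the documented JSON fields. The example runs against a local stub server that replays the sample response, so it does not need a live router.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Request and response shapes mirror the JSON documented above.
// The Go type names are illustrative, not taken from the router source.
type IntentOptions struct {
	ReturnProbabilities bool    `json:"return_probabilities"`
	ConfidenceThreshold float64 `json:"confidence_threshold,omitempty"`
}

type IntentRequest struct {
	Text    string         `json:"text"`
	Options *IntentOptions `json:"options,omitempty"`
}

type Classification struct {
	Category         string  `json:"category"`
	Confidence       float64 `json:"confidence"`
	ProcessingTimeMs int64   `json:"processing_time_ms"`
}

type IntentResponse struct {
	Classification Classification     `json:"classification"`
	Probabilities  map[string]float64 `json:"probabilities,omitempty"`
}

// classifyIntent posts one query to /classify/intent and decodes the reply.
func classifyIntent(client *http.Client, baseURL string, req IntentRequest) (*IntentResponse, error) {
	payload, err := json.Marshal(req)
	if err != nil {
		return nil, fmt.Errorf("encode request: %w", err)
	}
	resp, err := client.Post(baseURL+"/classify/intent", "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("send request: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out IntentResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return &out, nil
}

// newStubServer returns a local test server that always replies with the
// documented sample response.
func newStubServer() *httptest.Server {
	const sample = `{"classification":{"category":"math","confidence":0.95,"processing_time_ms":42},` +
		`"probabilities":{"math":0.95,"physics":0.03,"computer science":0.01,"other":0.01}}`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/classify/intent" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, sample)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	resp, err := classifyIntent(srv.Client(), srv.URL, IntentRequest{
		Text:    "Solve the quadratic equation x² + 5x + 6 = 0",
		Options: &IntentOptions{ReturnProbabilities: true, ConfidenceThreshold: 0.6},
	})
	if err != nil {
		fmt.Println("classification failed:", err)
		return
	}
	fmt.Printf("%s (confidence %.2f)\n", resp.Classification.Category, resp.Classification.Confidence)
}
```

To call a real deployment, pass http.DefaultClient and the router's base URL (for example http://localhost:8080, the address used in the troubleshooting commands) instead of the stub server.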

Batch Classification Endpoint​

Endpoint: POST /classify/batch

Request Format:

{
  "texts": [
    "Calculate the derivative of x²",
    "Implement a sorting algorithm",
    "What is photosynthesis?"
  ],
  "options": {
    "return_probabilities": false
  }
}

Response Format:

{
  "results": [
    {
      "text": "Calculate the derivative of x²",
      "classification": {
        "category": "math",
        "confidence": 0.92,
        "processing_time_ms": 38
      }
    },
    {
      "text": "Implement a sorting algorithm",
      "classification": {
        "category": "computer science",
        "confidence": 0.89,
        "processing_time_ms": 41
      }
    },
    {
      "text": "What is photosynthesis?",
      "classification": {
        "category": "biology",
        "confidence": 0.87,
        "processing_time_ms": 39
      }
    }
  ],
  "batch_processing_time_ms": 125
}

Model Information API​

Endpoint: GET /api/v1/models

Response Format:

{
  "models": [
    {
      "name": "category_classifier",
      "type": "intent_classification",
      "loaded": true,
      "model_path": "models/category_classifier_modernbert-base_model",
      "categories": [
        "business", "law", "psychology", "biology", "chemistry",
        "history", "other", "health", "economics", "math",
        "physics", "computer science", "philosophy", "engineering"
      ],
      "metadata": {
        "mapping_path": "models/category_classifier_modernbert-base_model/category_mapping.json",
        "model_type": "modernbert",
        "threshold": "0.60"
      }
    }
  ]
}

Performance Metrics​

Classification Performance​

Metric | Value | Notes
Average Latency | 45ms | Single query classification
Batch Latency | 35ms/query | Batch processing efficiency
Throughput | 200 QPS | Queries per second
Memory Usage | 2.1GB | Model + runtime overhead
CPU Usage | 15-25% | Single core utilization

Accuracy Metrics​

Category | Precision | Recall | F1-Score
Math | 0.94 | 0.92 | 0.93
Computer Science | 0.89 | 0.87 | 0.88
Physics | 0.85 | 0.83 | 0.84
Chemistry | 0.88 | 0.86 | 0.87
Biology | 0.86 | 0.84 | 0.85
Business | 0.82 | 0.80 | 0.81
Law | 0.84 | 0.82 | 0.83
Economics | 0.81 | 0.79 | 0.80
Health | 0.83 | 0.81 | 0.82
Psychology | 0.80 | 0.78 | 0.79
Philosophy | 0.78 | 0.76 | 0.77
History | 0.82 | 0.80 | 0.81
Engineering | 0.87 | 0.85 | 0.86
Other | 0.75 | 0.73 | 0.74
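
The F1 column is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall values and also derives an unweighted (macro) average across the 14 categories. The type and function names are made up for this example, and the macro average is a derived figure, not one reported by the source.

```go
package main

import "fmt"

type classMetrics struct {
	category          string
	precision, recall float64
}

// reported holds the precision and recall values from the table above.
var reported = []classMetrics{
	{"Math", 0.94, 0.92},
	{"Computer Science", 0.89, 0.87},
	{"Physics", 0.85, 0.83},
	{"Chemistry", 0.88, 0.86},
	{"Biology", 0.86, 0.84},
	{"Business", 0.82, 0.80},
	{"Law", 0.84, 0.82},
	{"Economics", 0.81, 0.79},
	{"Health", 0.83, 0.81},
	{"Psychology", 0.80, 0.78},
	{"Philosophy", 0.78, 0.76},
	{"History", 0.82, 0.80},
	{"Engineering", 0.87, 0.85},
	{"Other", 0.75, 0.73},
}

// f1Score returns the harmonic mean of precision and recall.
func f1Score(precision, recall float64) float64 {
	if precision+recall == 0 {
		return 0
	}
	return 2 * precision * recall / (precision + recall)
}

// macroF1 averages the per-class F1 scores, giving every class equal weight.
func macroF1(rows []classMetrics) float64 {
	if len(rows) == 0 {
		return 0
	}
	sum := 0.0
	for _, row := range rows {
		sum += f1Score(row.precision, row.recall)
	}
	return sum / float64(len(rows))
}

func main() {
	for _, row := range reported {
		fmt.Printf("%-17s F1 = %.2f\n", row.category, f1Score(row.precision, row.recall))
	}
	fmt.Printf("macro-averaged F1 = %.3f\n", macroF1(reported))
}
```

Rounded to two decimals, the recomputed values agree with the F1 column, and the macro average comes out at roughly 0.83.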

Confidence Distribution​

Confidence Range | Percentage | Accuracy
0.9 - 1.0 | 35% | 97%
0.8 - 0.9 | 28% | 92%
0.7 - 0.8 | 22% | 87%
0.6 - 0.7 | 12% | 81%
0.5 - 0.6 | 3% | 74%
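
This table can be used to estimate the trade-off a confidence threshold makes between coverage (the share of queries accepted) and accuracy on the accepted queries. The sketch below does that arithmetic, assuming the five ranges cover all traffic (the percentages sum to 100%) and that accuracy is uniform within each range. The type and function names are illustrative only.

```go
package main

import "fmt"

type bucket struct {
	low      float64 // inclusive lower bound of the confidence range
	share    float64 // fraction of queries that fall in the range
	accuracy float64 // accuracy observed within the range
}

// confidenceTable holds the rows of the table above.
var confidenceTable = []bucket{
	{0.9, 0.35, 0.97},
	{0.8, 0.28, 0.92},
	{0.7, 0.22, 0.87},
	{0.6, 0.12, 0.81},
	{0.5, 0.03, 0.74},
}

// accuracyAtOrAbove returns the fraction of queries whose range starts at or
// above the threshold, and the share-weighted accuracy over those queries.
func accuracyAtOrAbove(buckets []bucket, threshold float64) (coverage, accuracy float64) {
	correct := 0.0
	for _, b := range buckets {
		if b.low >= threshold {
			coverage += b.share
			correct += b.share * b.accuracy
		}
	}
	if coverage == 0 {
		return 0, 0
	}
	return coverage, correct / coverage
}

func main() {
	for _, t := range []float64{0.5, 0.6, 0.8} {
		cov, acc := accuracyAtOrAbove(confidenceTable, t)
		fmt.Printf("threshold %.1f: coverage %.0f%%, accuracy %.1f%%\n", t, cov*100, acc*100)
	}
}
```

With the 0.60 threshold reported by the model information endpoint, about 97% of queries are accepted, and the weighted accuracy on those queries is roughly 91%.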

Implementation Details​

Category Mapping​

The system uses JSON mapping files to convert between model outputs and category names:

{
  "category_to_idx": {
    "business": 0,
    "law": 1,
    "psychology": 2,
    "biology": 3,
    "chemistry": 4,
    "history": 5,
    "other": 6,
    "health": 7,
    "economics": 8,
    "math": 9,
    "physics": 10,
    "computer science": 11,
    "philosophy": 12,
    "engineering": 13
  },
  "idx_to_category": {
    "0": "business",
    "1": "law",
    "2": "psychology",
    "3": "biology",
    "4": "chemistry",
    "5": "history",
    "6": "other",
    "7": "health",
    "8": "economics",
    "9": "math",
    "10": "physics",
    "11": "computer science",
    "12": "philosophy",
    "13": "engineering"
  }
}
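
Because the two maps must stay in sync, it can help to validate a mapping file when it is loaded. The sketch below parses the JSON and checks that every category round-trips through both maps and that the indices form a contiguous range starting at 0. The CategoryToIdx field name and the parseCategoryMapping and validate helpers are assumptions for this example. The actual loader may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// CategoryMapping mirrors the JSON layout shown above.
type CategoryMapping struct {
	CategoryToIdx map[string]int    `json:"category_to_idx"`
	IdxToCategory map[string]string `json:"idx_to_category"`
}

// validate checks that the two maps are mutual inverses and that the
// indices cover 0..n-1 with no gaps.
func (m *CategoryMapping) validate() error {
	n := len(m.CategoryToIdx)
	if n == 0 {
		return fmt.Errorf("category_to_idx is empty")
	}
	if n != len(m.IdxToCategory) {
		return fmt.Errorf("map sizes differ: %d categories, %d indices", n, len(m.IdxToCategory))
	}
	for name, idx := range m.CategoryToIdx {
		if idx < 0 || idx >= n {
			return fmt.Errorf("category %q has index %d outside [0, %d)", name, idx, n)
		}
		got, ok := m.IdxToCategory[strconv.Itoa(idx)]
		if !ok || got != name {
			return fmt.Errorf("index %d maps to %q, expected %q", idx, got, name)
		}
	}
	return nil
}

// parseCategoryMapping decodes and validates a mapping file's contents.
func parseCategoryMapping(data []byte) (*CategoryMapping, error) {
	var m CategoryMapping
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("decode mapping: %w", err)
	}
	if err := m.validate(); err != nil {
		return nil, fmt.Errorf("invalid mapping: %w", err)
	}
	return &m, nil
}

func main() {
	// A trimmed-down mapping, used here only to keep the example short.
	data := []byte(`{
		"category_to_idx": {"business": 0, "law": 1, "psychology": 2},
		"idx_to_category": {"0": "business", "1": "law", "2": "psychology"}
	}`)

	m, err := parseCategoryMapping(data)
	if err != nil {
		fmt.Println("mapping rejected:", err)
		return
	}
	fmt.Println("index 1 is", m.IdxToCategory["1"])
}
```

In practice the bytes would come from os.ReadFile on the mapping_path reported by the model information endpoint.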

Routing Integration​

The classifier integrates with the routing system through the ExtProc handler:

func (r *OpenAIRouter) handleRequestBody(ctx *RequestContext, body []byte) error {
	// Extract user content from request
	userContent := extractUserContent(body)

	// Classify the query
	category, confidence, reasoningDecision, err := r.Classifier.ClassifyCategoryWithEntropy(userContent)
	if err != nil {
		return fmt.Errorf("classification failed: %w", err)
	}

	// Select model based on category
	selectedModel := r.selectModelForCategory(category, confidence)

	// Apply reasoning if needed
	if reasoningDecision.ShouldUseReasoning {
		body = r.applyReasoningMode(body, category, reasoningDecision.Effort)
	}

	// Route to selected model
	return r.routeToModel(selectedModel, body)
}

Caching Integration​

The semantic cache uses the category both to scope lookups to similar queries in the same category and to adjust the similarity threshold:

type SemanticCache struct {
	backend         CacheBackend
	threshold       float64
	categoryWeights map[string]float64
}

func (c *SemanticCache) Get(query string, category string) (*CacheEntry, bool) {
	// Use category-specific similarity thresholds
	threshold := c.getCategoryThreshold(category)

	// Search for similar queries in the same category
	return c.backend.FindSimilar(query, threshold, category)
}

func (c *SemanticCache) getCategoryThreshold(category string) float64 {
	if weight, exists := c.categoryWeights[category]; exists {
		return c.threshold * weight
	}
	return c.threshold
}
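
To make the threshold arithmetic concrete, the sketch below applies the same base-times-weight rule to a few hypothetical weights. It also clamps the result to [0, 1], which the code above does not do. The clamp is an assumption that only makes sense if similarity scores are bounded by 1 (as with cosine similarity), since a weight above 1/threshold would otherwise produce a threshold no query can reach. The base value and weights are invented for illustration.

```go
package main

import "fmt"

// effectiveThreshold scales a base similarity threshold by a per-category
// weight and clamps the result to [0, 1]. Categories without a weight fall
// back to the base threshold.
func effectiveThreshold(base float64, weights map[string]float64, category string) float64 {
	t := base
	if w, ok := weights[category]; ok {
		t = base * w
	}
	if t > 1 {
		return 1
	}
	if t < 0 {
		return 0
	}
	return t
}

func main() {
	// Hypothetical values: stricter matching for math, looser for "other".
	base := 0.8
	weights := map[string]float64{"math": 1.1, "other": 0.9, "law": 1.5}

	for _, category := range []string{"math", "other", "law", "history"} {
		fmt.Printf("%-8s threshold = %.2f\n", category, effectiveThreshold(base, weights, category))
	}
}
```

A weight above 1 makes cache hits rarer for that category, which may suit domains where near-duplicate wording can still require a different answer.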

Monitoring and Observability​

Metrics Collection​

The system collects comprehensive metrics for monitoring:

// Classification metrics
metrics.RecordClassifierLatency("category", latency)
metrics.RecordClassificationAccuracy(category, confidence)
metrics.RecordCategoryDistribution(category)

// Routing metrics
metrics.RecordRoutingDecision(category, selectedModel, confidence)
metrics.RecordReasoningDecision(category, useReasoning, effort)

// Performance metrics
metrics.RecordThroughput("classification", qps)
metrics.RecordMemoryUsage("classifier", memoryMB)

Logging​

Structured logging provides detailed insights:

observability.Infof("Category classification: query='%s' category='%s' confidence=%.3f latency=%dms",
	query, category, confidence, latencyMs)

observability.Infof("Routing decision: category='%s' model='%s' reasoning=%v effort='%s'",
	category, selectedModel, useReasoning, reasoningEffort)

Health Checks​

func (c *Classifier) HealthCheck() error {
	// Test classification with known input
	testQuery := "What is 2 + 2?"
	category, confidence, err := c.ClassifyCategory(testQuery)

	if err != nil {
		return fmt.Errorf("classification health check failed: %w", err)
	}

	if category != "math" || confidence < 0.8 {
		return fmt.Errorf("unexpected classification result: category=%s confidence=%.3f",
			category, confidence)
	}

	return nil
}
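
One way to expose such a check over HTTP is sketched below: a handler that returns 200 when HealthCheck succeeds and 503 otherwise. Whether the router's own /health endpoint is wired this way is not stated here. The healthChecker interface, stubChecker type, and probe helper are hypothetical and exist only to make the example runnable without a loaded model.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthChecker is a hypothetical interface satisfied by any type that has
// a HealthCheck method like the one shown above.
type healthChecker interface {
	HealthCheck() error
}

// healthHandler reports 200 OK when the check passes and 503 otherwise.
func healthHandler(hc healthChecker) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := hc.HealthCheck(); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
}

// stubChecker stands in for a real classifier in this example.
type stubChecker struct{ err error }

func (s stubChecker) HealthCheck() error { return s.err }

// errUnhealthy simulates a failed classification check.
var errUnhealthy = errors.New("unexpected classification result")

// probe sends an in-memory GET /health request and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code
}

func main() {
	fmt.Println("healthy:", probe(healthHandler(stubChecker{})))
	fmt.Println("unhealthy:", probe(healthHandler(stubChecker{err: errUnhealthy})))
}
```

In a real server the handler would be registered with something like http.Handle("/health", healthHandler(classifier)).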

Extension and Customization​

Adding New Categories​

  1. Update Model Training Data:

    # Add training examples for new category
    training_data = [
        {"text": "Example query", "label": "new_category"},
        # ... more examples
    ]
  2. Update Category Mapping:

    {
      "category_to_idx": {
        "new_category": 14
      },
      "idx_to_category": {
        "14": "new_category"
      }
    }
  3. Update Configuration:

    categories:
      - name: "new_category"
        use_reasoning: false
        model_scores:
          - model: "best-model-for-category"
            score: 1.0

Custom Classification Models​

To replace the default ModernBERT classifier, implement the inference interface and assign it to the classifier:

type CustomClassifier struct {
	model CustomModel
}

func (c *CustomClassifier) Classify(text string) (candle_binding.ClassResult, error) {
	// Custom classification logic
	result := c.model.Predict(text)
	return candle_binding.ClassResult{
		ClassIndex: result.ClassIndex,
		Confidence: result.Confidence,
	}, nil
}

// Register custom classifier.
// categoryInference is an unexported field, so this literal only compiles
// inside the classification package.
classifier := &classification.Classifier{
	categoryInference: &CustomClassifier{model: loadCustomModel()},
}

Performance Optimization​

Model Quantization​

// Enable model quantization for faster inference
config := &classification.Config{
	UseQuantization:  true,
	QuantizationBits: 8, // 8-bit quantization
}

Batch Processing​

// ClassifyBatch classifies texts in fixed-size batches, preserving input order.
// The min builtin requires Go 1.21 or later.
func (c *Classifier) ClassifyBatch(texts []string) ([]ClassResult, error) {
	results := make([]ClassResult, len(texts))

	// Process in batches of 32
	const batchSize = 32
	for i := 0; i < len(texts); i += batchSize {
		end := min(i+batchSize, len(texts))
		batch := texts[i:end]

		// Batch tokenization and inference
		batchResults, err := c.processBatch(batch)
		if err != nil {
			return nil, err
		}

		copy(results[i:end], batchResults)
	}

	return results, nil
}

Troubleshooting​

Common Issues​

Low Classification Accuracy​

Symptoms: Queries consistently misclassified

Diagnosis:

# Check model health
curl -X GET http://localhost:8080/health

# Test specific queries
curl -X POST http://localhost:8080/classify/intent \
  -H "Content-Type: application/json" \
  -d '{"text": "test query", "options": {"return_probabilities": true}}'

Solutions:

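  • Raise the confidence threshold so that low-confidence predictions are rejected instead of routed
  • Retrain the model with more data for the affected categories
  • Check that the category mapping matches the label indices the model was trained with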

High Latency​

Symptoms: Slow classification responses

Diagnosis:

# Monitor classification metrics
curl -X GET http://localhost:8080/metrics | grep classifier_latency

Solutions:

  • Enable model quantization
  • Use batch processing
  • Optimize hardware (GPU acceleration)

Memory Issues​

Symptoms: High memory usage or OOM errors

Solutions:

# Reduce model size
classifier:
  category_model:
    use_cpu: true
    model_id: "smaller-model"

Next Steps​