# Discovery Scheduling

Effective discovery scheduling keeps your CMDB current while minimizing the impact on network and system resources. Tripl-i provides flexible scheduling options that adapt to your infrastructure's needs and operational windows.
## Scheduling Architecture

### Scheduling Engine

### Schedule Types
#### Fixed Schedules

**Daily Discovery:**

```yaml
schedule: "0 2 * * *"  # 2 AM daily
targets: all_infrastructure
type: incremental
max_duration: 4h
```

**Weekly Full Scan:**

```yaml
schedule: "0 6 * * 0"  # Sunday 6 AM
targets: all_infrastructure
type: full
max_duration: 12h
```

**Hourly Critical:**

```yaml
schedule: "0 * * * *"  # Every hour
targets:
  - tag: critical
  - tag: production
type: incremental
max_duration: 45m
```
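The cron expressions above can be sanity-checked offline. The sketch below is a minimal matcher covering only the subset of cron syntax used in these examples (plain numbers, `*`, `*/n`, and comma lists); production validation should rely on the scheduler's own parser.

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', '*/n', and comma lists of numbers."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/") and value % int(part[2:]) == 0:
            return True
        if part.isdigit() and int(part) == value:
            return True
    return False

def cron_matches(expr: str, when: datetime) -> bool:
    """True if a 5-field cron expression fires at the given minute."""
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, (when.weekday() + 1) % 7))  # cron: 0 = Sunday
```

For example, `cron_matches("0 6 * * 0", ...)` confirms the weekly full scan fires only on Sundays at 06:00.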
#### Dynamic Schedules

**Event-Driven:**

```yaml
triggers:
  - new_device_detected
  - configuration_change
  - incident_created
  - deployment_completed
response:
  delay: 5m
  type: targeted
  scope: affected_items
```

**Change-Based:**

```yaml
monitor:
  - deployment_pipeline
  - change_calendar
  - maintenance_windows
action:
  pre_change: baseline_scan
  post_change: verification_scan
  delay: 30m
```
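The `delay: 5m` in the event-driven response acts as a debounce: a burst of related events produces one targeted scan rather than many. A minimal sketch of that coalescing logic (the class and the scope keys are illustrative, not part of the product API):

```python
from datetime import datetime, timedelta

class DebouncedTrigger:
    """Coalesce discovery triggers: a scan launches only after the
    configured delay has passed with no further events for that scope."""

    def __init__(self, delay: timedelta):
        self.delay = delay
        self.pending = {}  # scope -> timestamp of last event

    def on_event(self, scope: str, now: datetime) -> None:
        # Each new event restarts the delay window for its scope
        self.pending[scope] = now

    def due_scans(self, now: datetime) -> list:
        """Return scopes whose delay window has elapsed, and clear them."""
        due = [s for s, t in self.pending.items() if now - t >= self.delay]
        for s in due:
            del self.pending[s]
        return due
```

With a 5-minute delay, three deployment events arriving over two minutes for the same scope yield a single targeted scan.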
## Schedule Configuration

### Basic Scheduling

```yaml
# Schedule configuration example
schedules:
  production_servers:
    name: "Production Server Discovery"
    description: "Critical production infrastructure"
    enabled: true
    timing:
      frequency: every_4_hours
      start_time: "00:00"
      timezone: "America/New_York"
      blackout_windows:
        - start: "08:00"
          end: "09:00"
          days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
          reason: "Peak business hours"
    targets:
      include:
        - ip_range: "10.1.0.0/16"
        - tags: ["production", "critical"]
      exclude:
        - ip: "10.1.1.1"  # Router
        - tag: "maintenance"
    discovery:
      type: incremental
      methods: ["agent", "wmi", "ssh"]
      timeout: 300
      parallel_jobs: 10
```
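Blackout windows like the one above reduce to a simple predicate. This is an illustrative sketch assuming the `start`/`end`/`days` fields shown in the config; the scheduler's own evaluation logic may differ:

```python
from datetime import datetime, time

def in_blackout(when: datetime, windows: list) -> bool:
    """True if the local timestamp falls inside any blackout window.
    Each window mirrors the config: start/end (HH:MM) plus weekday names."""
    for w in windows:
        start = time.fromisoformat(w["start"])
        end = time.fromisoformat(w["end"])
        # Window applies only on the listed weekdays; end is exclusive
        if when.strftime("%A") in w["days"] and start <= when.time() < end:
            return True
    return False
```

A run due at 08:30 on a weekday would be deferred; the same time on a Saturday would not.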
### Advanced Scheduling

#### Intelligent Scheduling

```yaml
smart_schedule:
  name: "Adaptive Infrastructure Discovery"
  rules:
    business_criticality:
      critical:
        frequency: real_time
        method: agent_only
      high:
        frequency: every_2_hours
        method: agent_preferred
      medium:
        frequency: every_12_hours
        method: agentless_ok
      low:
        frequency: daily
        method: any
    device_type:
      database_servers:
        frequency: every_hour
        preferred_window: "02:00-05:00"
      web_servers:
        frequency: every_4_hours
        avoid_window: "09:00-17:00"
      workstations:
        frequency: on_login
        max_daily: 2
    change_frequency:
      high_change:      # > 10 changes/week
        frequency: every_2_hours
      moderate_change:  # 3-10 changes/week
        frequency: every_6_hours
      stable:           # < 3 changes/week
        frequency: daily
```
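When a CI matches several rule groups at once (say, `business_criticality` and `change_frequency`), some precedence rule must pick the effective interval. The sketch below assumes the most aggressive (shortest) matching interval wins and maps the frequency names above to concrete intervals; both the mapping and the precedence rule are illustrative assumptions, not documented product behavior.

```python
from datetime import timedelta

# Assumed mapping from rule names to concrete intervals
FREQUENCY = {
    "real_time": timedelta(minutes=1),
    "every_hour": timedelta(hours=1),
    "every_2_hours": timedelta(hours=2),
    "every_4_hours": timedelta(hours=4),
    "every_6_hours": timedelta(hours=6),
    "every_12_hours": timedelta(hours=12),
    "daily": timedelta(days=1),
}

def effective_interval(ci: dict, rules: dict) -> timedelta:
    """Shortest interval among all rule groups that match the CI;
    falls back to daily when nothing matches."""
    matched = []
    for group, table in rules.items():
        value = ci.get(group)
        if value in table:
            matched.append(FREQUENCY[table[value]["frequency"]])
    return min(matched) if matched else FREQUENCY["daily"]
```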
#### Resource-Aware Scheduling

```yaml
resource_limits:
  global:
    max_concurrent_discoveries: 50
    max_network_bandwidth: "100Mbps"
    max_cpu_usage: 70
    max_memory_usage: "8GB"
  per_target:
    max_connections: 5
    max_bandwidth: "10Mbps"
    backoff_on_error: exponential
    retry_limit: 3
  adaptive_throttling:
    high_load_threshold: 80
    reduce_concurrency_by: 50
    increase_interval_by: 100
  priority_queues:
    critical:
      reserved_slots: 20
      max_wait: "5m"
    high:
      reserved_slots: 15
      max_wait: "15m"
    normal:
      reserved_slots: 10
      max_wait: "1h"
    low:
      reserved_slots: 5
      max_wait: "4h"
```
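One way to honor `reserved_slots` is a two-pass dispatch: grant each priority its reservation first, then spill any leftover capacity in priority order. This is a sketch of one plausible policy, not the product's actual dispatcher:

```python
RESERVED = {"critical": 20, "high": 15, "normal": 10, "low": 5}
ORDER = ["critical", "high", "normal", "low"]

def allocate_slots(waiting: dict, free_slots: int) -> dict:
    """Allocate discovery slots to waiting jobs per priority queue."""
    grants = {p: 0 for p in ORDER}
    # Pass 1: each priority gets up to its reserved share
    for p in ORDER:
        take = min(waiting.get(p, 0), RESERVED[p], free_slots)
        grants[p] = take
        free_slots -= take
    # Pass 2: spill remaining capacity in priority order
    for p in ORDER:
        extra = min(waiting.get(p, 0) - grants[p], free_slots)
        grants[p] += extra
        free_slots -= extra
    return grants
```

The two-pass shape guarantees a burst of low-priority work can never starve the critical queue of its reserved capacity.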
## Scheduling Strategies

### Infrastructure-Based Scheduling

```yaml
strategies:
  geographic_distribution:
    regions:
      us_east:
        window: "02:00-06:00 EST"
        stagger: 15m
      us_west:
        window: "02:00-06:00 PST"
        stagger: 15m
      europe:
        window: "02:00-06:00 CET"
        stagger: 20m
      asia:
        window: "02:00-06:00 JST"
        stagger: 20m
  network_topology:
    core:
      frequency: every_30_minutes
      priority: critical
    distribution:
      frequency: every_2_hours
      priority: high
    access:
      frequency: every_6_hours
      priority: normal
    edge:
      frequency: daily
      priority: low
  service_dependencies:
    tier_1_services:
      discover_first: true
      frequency: continuous
    tier_2_services:
      after: tier_1_services
      frequency: every_hour
    tier_3_services:
      after: tier_2_services
      frequency: every_4_hours
```
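The `stagger` value spreads scan starts across a regional window so all targets are not hit at once. A minimal sketch of the offset arithmetic (the subnet list is illustrative):

```python
from datetime import datetime, timedelta

def staggered_starts(window_start: datetime, stagger: timedelta,
                     subnets: list) -> dict:
    """Assign each subnet a start time offset by the stagger interval."""
    return {subnet: window_start + i * stagger
            for i, subnet in enumerate(subnets)}
```

With a 15-minute stagger and a 02:00 window start, the third subnet begins at 02:30; a region's subnet count times the stagger should stay inside the window.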
### Business-Aligned Scheduling

```yaml
business_alignment:
  maintenance_windows:
    source: change_management_system
    respect_blackouts: true
    pre_maintenance_scan: "-1h"
    post_maintenance_scan: "+30m"
  business_cycles:
    end_of_month:
      dates: [28, 29, 30, 31]
      reduce_discovery_by: 75%
      priority_only: true
    quarter_end:
      months: [3, 6, 9, 12]
      dates: ["25-31"]
      minimal_discovery: true
      defer_non_critical: true
    year_end:
      dates: ["12/24-12/31", "01/01-01/02"]
      emergency_only: true
      manual_approval: required
  sla_driven:
    platinum_sla:
      discovery_interval: 15m
      availability_requirement: 99.99%
    gold_sla:
      discovery_interval: 1h
      availability_requirement: 99.9%
    silver_sla:
      discovery_interval: 4h
      availability_requirement: 99%
```
## Schedule Management

### Web UI Management

The Schedule Dashboard provides:

**Views:**
- Calendar view: visual schedule timeline
- List view: tabular schedule details
- Gantt chart: resource utilization
- Heat map: discovery density

**Actions:**
- Create/edit schedules
- Enable/disable schedules
- Run now
- Skip next run
- View history
- Clone schedule

**Monitoring:**
- Next run times
- Currently running discoveries
- Success/failure rates
- Average duration
- Resource usage
### API Management

```python
# Schedule management via the Tripl-i REST API
import os

import requests

# Read the API token from the environment rather than hard-coding it
api_token = os.environ["NOPESIGHT_API_TOKEN"]
headers = {"Authorization": f"Bearer {api_token}"}

# Create a new schedule
schedule = {
    "name": "Database Server Discovery",
    "description": "Discover all database servers",
    "enabled": True,
    "schedule": {
        "type": "cron",
        "expression": "0 */4 * * *",  # Every 4 hours
        "timezone": "UTC"
    },
    "targets": {
        "tags": ["database", "production"],
        "discovery_type": "full"
    },
    "options": {
        "timeout": 600,
        "parallel_jobs": 5,
        "retry_failed": True
    }
}

response = requests.post(
    "https://nopesight.company.com/api/schedules",
    json=schedule,
    headers=headers,
)
response.raise_for_status()
schedule_id = response.json()["id"]

# Trigger an immediate discovery run
requests.post(
    f"https://nopesight.company.com/api/schedules/{schedule_id}/run",
    headers=headers,
)

# Get schedule statistics for the last 7 days
stats = requests.get(
    f"https://nopesight.company.com/api/schedules/{schedule_id}/stats",
    params={"period": "7d"},
    headers=headers,
).json()

print(f"Success rate: {stats['success_rate']}%")
print(f"Average duration: {stats['avg_duration_minutes']} minutes")
```
### CLI Management

```bash
# Tripl-i CLI schedule management

# List all schedules
nopesight schedule list --format table

# Create a schedule from a file
nopesight schedule create --file production_schedule.yaml

# Update a schedule
nopesight schedule update db_servers \
  --frequency "every 2 hours" \
  --window "22:00-06:00"

# Disable a schedule temporarily
nopesight schedule disable web_servers \
  --reason "Maintenance" \
  --until "2024-01-20"

# View schedule history
nopesight schedule history db_servers \
  --last 10 \
  --include-details

# Run a schedule immediately and wait for completion
nopesight schedule run production_servers \
  --wait --timeout 30m
```
## Schedule Optimization

### Performance Analysis

```yaml
metrics:
  discovery_performance:
    - completion_time
    - success_rate
    - resource_usage
    - queue_depth
    - wait_time

optimization_recommendations:
  overlap_detection:
    finding: "Schedules A and B overlap by 45%"
    recommendation: "Stagger by 2 hours"
    impact: "Reduce resource contention by 40%"
  underutilized_windows:
    finding: "02:00-04:00 window only 20% utilized"
    recommendation: "Move low-priority discoveries here"
    impact: "Better resource distribution"
  long_running_jobs:
    finding: "Full scan takes 6+ hours"
    recommendation: "Split into regional schedules"
    impact: "Reduce completion time by 60%"
```
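Overlap findings like the one above reduce to simple interval arithmetic. The sketch below computes the fraction of one schedule's daily run window that collides with another's, with windows expressed as minutes since midnight:

```python
def overlap_fraction(a: tuple, b: tuple) -> float:
    """Fraction of window A that overlaps window B.
    Windows are (start_minute, end_minute) pairs with start < end."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / (a[1] - a[0])
```

A 02:00-06:00 run window checked against a 04:00-08:00 window overlaps by 50%, which would trigger a "stagger" recommendation.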
### Adaptive Scheduling

```javascript
// Adaptive scheduling algorithm (sketch)
const adaptiveScheduler = {
  analyze: function(historicalData) {
    return {
      peak_usage_times: this.findPeakTimes(historicalData),
      optimal_windows: this.findOptimalWindows(historicalData),
      bottlenecks: this.identifyBottlenecks(historicalData),
      recommendations: this.generateRecommendations(historicalData)
    };
  },

  adjust: function(schedule, analysis) {
    // Under network pressure, run fewer parallel jobs and cap bandwidth
    if (analysis.bottlenecks.network) {
      schedule.parallel_jobs = Math.max(1, Math.floor(schedule.parallel_jobs * 0.8));
      schedule.bandwidth_limit = "50Mbps";
    }
    // Move the start time out of known peak-usage windows
    if (analysis.peak_usage_times.includes(schedule.start_time)) {
      schedule.start_time = analysis.optimal_windows[0];
    }
    return schedule;
  },

  learn: function(executionResults) {
    // Feed execution outcomes back into the scheduling model
    this.updateModel({
      schedule: executionResults.schedule,
      performance: executionResults.metrics,
      success: executionResults.success_rate > 95
    });
  }
};
```
## Monitoring & Alerting

### Schedule Monitoring

```yaml
monitoring:
  dashboards:
    schedule_overview:
      widgets:
        - upcoming_schedules
        - currently_running
        - recent_failures
        - resource_utilization
        - sla_compliance
    performance_metrics:
      widgets:
        - completion_times_trend
        - success_rate_gauge
        - discovery_coverage_map
        - queue_depth_chart
        - bottleneck_analysis
  kpis:
    - discovery_coverage: "> 95%"
    - success_rate: "> 98%"
    - avg_completion_time: "< 30m"
    - resource_utilization: "60-80%"
    - schedule_adherence: "> 95%"
```
### Alert Configuration

```yaml
alerts:
  schedule_failures:
    condition: "failed_count > 2"
    severity: high
    notification:
      - email: ops-team@company.com
      - slack: "#infrastructure-alerts"  # quoted so YAML doesn't treat # as a comment
    auto_action:
      - retry_with_backoff
      - create_incident
  long_running:
    condition: "duration > expected_duration * 2"
    severity: medium
    notification:
      - email: discovery-admin@company.com
    auto_action:
      - check_resource_usage
      - throttle_if_needed
  missed_schedule:
    condition: "missed_run_count > 0"
    severity: high
    notification:
      - sms: on-call
      - email: ops-team@company.com
    auto_action:
      - run_immediately
      - investigate_cause
```
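The `retry_with_backoff` auto-action can be pictured as a loop with exponentially growing delays. A sketch with an injectable `sleep` so the policy is testable; the base delay and retry count are illustrative defaults, not product settings:

```python
import time

def retry_with_backoff(run, retries=3, base_delay=1.0, sleep=time.sleep):
    """Re-run a failed discovery with exponentially growing delays
    (base_delay, 2x, 4x, ...). `run` returns True on success."""
    for attempt in range(retries + 1):
        if run():
            return True
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)  # wait before the next attempt
    return False
```

Passing a recording function as `sleep` lets you verify the delay sequence without actually waiting.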
## Best Practices

1. **Schedule Design**
   - ✅ Align with business hours
   - ✅ Consider geographic distribution
   - ✅ Respect maintenance windows
   - ✅ Plan for growth
2. **Resource Management**
   - ✅ Monitor resource usage
   - ✅ Implement throttling
   - ✅ Use priority queues
   - ✅ Balance load distribution
3. **Reliability**
   - ✅ Build in redundancy
   - ✅ Handle failures gracefully
   - ✅ Implement retry logic
   - ✅ Monitor success rates
4. **Optimization**
   - ✅ Review performance regularly
   - ✅ Adjust based on metrics
   - ✅ Eliminate redundant schedules
   - ✅ Improve continuously
## Troubleshooting

### Common Issues

#### Schedules Not Running

**Diagnostic steps:**
1. Check that the schedule is enabled
2. Verify the schedule expression
3. Check for active blackout windows
4. Review system resource limits
5. Examine the scheduler logs

**Common causes:**
- Disabled schedule
- Invalid cron expression
- Active blackout window
- Resource limits reached
- Scheduler service down

#### Performance Degradation

**Symptoms:**
- Increasing completion times
- High resource usage
- Queue buildup
- Timeout errors

**Solutions:**
- Reduce parallel jobs
- Increase discovery intervals
- Narrow the discovery scope
- Add more workers
- Implement caching
### Schedule Analysis

```sql
-- Analyze schedule performance over the last 30 days
SELECT
    schedule_name,
    AVG(duration_minutes) AS avg_duration,
    COUNT(*) AS total_runs,
    SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) AS successful,
    SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed,
    ROUND(100.0 * SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) / COUNT(*), 2) AS success_rate
FROM discovery_runs
WHERE run_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY schedule_name
ORDER BY success_rate ASC, avg_duration DESC;
```
## Advanced Topics

### Multi-Site Scheduling

```yaml
multi_site:
  coordination:
    mode: distributed
    sites:
      - name: datacenter_east
        timezone: "America/New_York"
        bandwidth_to_central: "1Gbps"
      - name: datacenter_west
        timezone: "America/Los_Angeles"
        bandwidth_to_central: "1Gbps"
      - name: europe_dc
        timezone: "Europe/London"
        bandwidth_to_central: "500Mbps"
    strategy:
      - local_discovery_first
      - aggregate_to_central
      - deduplicate_results
      - sync_on_completion
```
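The `deduplicate_results` step must pick one record per CI when several sites report the same hardware. The sketch below assumes identity by serial number (falling back to hostname) with the most recently seen record winning; the product's actual identity rules may differ.

```python
def deduplicate_results(site_results: dict) -> list:
    """Merge per-site discovery results into one record per CI.
    site_results maps site name -> list of CI record dicts."""
    merged = {}
    for site, records in site_results.items():
        for rec in records:
            # Assumed identity key: serial number, else hostname
            key = rec.get("serial_number") or rec["hostname"]
            if key not in merged or rec["last_seen"] > merged[key]["last_seen"]:
                merged[key] = rec  # newest observation wins
    return list(merged.values())
```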
### Predictive Scheduling

```python
# ML-based schedule optimization (sketch)
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Load historical run data: one row per completed discovery
history = pd.read_csv('discovery_history.csv')

# Features must be numeric: encode discovery_type before training
history['is_full_scan'] = (history['discovery_type'] == 'full').astype(int)
features = ['time_of_day', 'day_of_week', 'target_count', 'is_full_scan']

model = RandomForestRegressor()
model.fit(history[features], history['completion_time'])

# Predict completion time for a candidate schedule
candidate = pd.DataFrame([{
    'time_of_day': 0, 'day_of_week': 6,
    'target_count': 500, 'is_full_scan': 1
}])
predicted_duration = model.predict(candidate[features])[0]

# find_optimal_slot() is a site-specific helper (not shown) that places
# the predicted duration inside the preferred window
optimal_start = find_optimal_slot(predicted_duration, window='00:00-06:00')
```
## Next Steps

- 📖 Troubleshooting - Common issues and solutions
- 📖 Best Practices - CMDB best practices
- 📖 Performance Tuning - System optimization