# Discovery Scheduling

Effective discovery scheduling keeps your CMDB current while minimizing the impact on network and system resources. Tripl-i provides flexible scheduling options that adapt to your infrastructure's needs and operational windows.
## Scheduling Architecture

### Scheduling Engine

### Schedule Types
#### Fixed Schedules

**Daily Discovery:**

```yaml
schedule: "0 2 * * *"  # 2 AM daily
targets: all_infrastructure
type: incremental
max_duration: 4h
```

**Weekly Full Scan:**

```yaml
schedule: "0 6 * * 0"  # Sunday 6 AM
targets: all_infrastructure
type: full
max_duration: 12h
```

**Hourly Critical:**

```yaml
schedule: "0 * * * *"  # Every hour
targets:
  - tag: critical
  - tag: production
type: incremental
max_duration: 45m
```
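The cron expressions above can be sanity-checked offline. The sketch below is a minimal matcher covering only the subset of cron syntax used in these examples (plain numbers, `*`, `*/n`, and comma lists); production validation should rely on the scheduler's own parser.

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', '*/n', and comma lists of numbers."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/") and value % int(part[2:]) == 0:
            return True
        if part.isdigit() and int(part) == value:
            return True
    return False

def cron_matches(expr: str, when: datetime) -> bool:
    """True if a 5-field cron expression fires at the given minute."""
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, (when.weekday() + 1) % 7))  # cron: 0 = Sunday
```

For example, `cron_matches("0 6 * * 0", ...)` confirms the weekly full scan fires only on Sundays at 06:00.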
#### Dynamic Schedules

**Event-Driven:**

```yaml
triggers:
  - new_device_detected
  - configuration_change
  - incident_created
  - deployment_completed
response:
  delay: 5m
  type: targeted
  scope: affected_items
```

**Change-Based:**

```yaml
monitor:
  - deployment_pipeline
  - change_calendar
  - maintenance_windows
action:
  pre_change: baseline_scan
  post_change: verification_scan
  delay: 30m
```
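The `delay: 5m` in the event-driven response acts as a debounce: a burst of related events produces one targeted scan rather than many. A minimal sketch of that coalescing logic (the class and the scope keys are illustrative, not part of the product API):

```python
from datetime import datetime, timedelta

class DebouncedTrigger:
    """Coalesce discovery triggers: a scan launches only after the
    configured delay has passed with no further events for that scope."""

    def __init__(self, delay: timedelta):
        self.delay = delay
        self.pending = {}  # scope -> timestamp of last event

    def on_event(self, scope: str, now: datetime) -> None:
        # Each new event restarts the delay window for its scope
        self.pending[scope] = now

    def due_scans(self, now: datetime) -> list:
        """Return scopes whose delay window has elapsed, and clear them."""
        due = [s for s, t in self.pending.items() if now - t >= self.delay]
        for s in due:
            del self.pending[s]
        return due
```

With a 5-minute delay, three deployment events arriving over two minutes for the same scope yield a single targeted scan.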
## Schedule Configuration

### Basic Scheduling

```yaml
# Schedule configuration example
schedules:
  production_servers:
    name: "Production Server Discovery"
    description: "Critical production infrastructure"
    enabled: true
    timing:
      frequency: every_4_hours
      start_time: "00:00"
      timezone: "America/New_York"
      blackout_windows:
        - start: "08:00"
          end: "09:00"
          days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
          reason: "Peak business hours"
    targets:
      include:
        - ip_range: "10.1.0.0/16"
        - tags: ["production", "critical"]
      exclude:
        - ip: "10.1.1.1"  # Router
        - tag: "maintenance"
    discovery:
      type: incremental
      methods: ["agent", "wmi", "ssh"]
      timeout: 300
      parallel_jobs: 10
```
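Blackout windows like the one above reduce to a simple predicate. This is an illustrative sketch assuming the `start`/`end`/`days` fields shown in the config; the scheduler's own evaluation logic may differ:

```python
from datetime import datetime, time

def in_blackout(when: datetime, windows: list) -> bool:
    """True if the local timestamp falls inside any blackout window.
    Each window mirrors the config: start/end (HH:MM) plus weekday names."""
    for w in windows:
        start = time.fromisoformat(w["start"])
        end = time.fromisoformat(w["end"])
        # Window applies only on the listed weekdays; end is exclusive
        if when.strftime("%A") in w["days"] and start <= when.time() < end:
            return True
    return False
```

A run due at 08:30 on a weekday would be deferred; the same time on a Saturday would not.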
### Advanced Scheduling

#### Intelligent Scheduling

```yaml
smart_schedule:
  name: "Adaptive Infrastructure Discovery"
  rules:
    business_criticality:
      critical:
        frequency: real_time
        method: agent_only
      high:
        frequency: every_2_hours
        method: agent_preferred
      medium:
        frequency: every_12_hours
        method: agentless_ok
      low:
        frequency: daily
        method: any
    device_type:
      database_servers:
        frequency: every_hour
        preferred_window: "02:00-05:00"
      web_servers:
        frequency: every_4_hours
        avoid_window: "09:00-17:00"
      workstations:
        frequency: on_login
        max_daily: 2
    change_frequency:
      high_change:      # > 10 changes/week
        frequency: every_2_hours
      moderate_change:  # 3-10 changes/week
        frequency: every_6_hours
      stable:           # < 3 changes/week
        frequency: daily
```
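When a CI matches several rule groups at once (say, `business_criticality` and `change_frequency`), some precedence rule must pick the effective interval. The sketch below assumes the most aggressive (shortest) matching interval wins and maps the frequency names above to concrete intervals; both the mapping and the precedence rule are illustrative assumptions, not documented product behavior.

```python
from datetime import timedelta

# Assumed mapping from rule names to concrete intervals
FREQUENCY = {
    "real_time": timedelta(minutes=1),
    "every_hour": timedelta(hours=1),
    "every_2_hours": timedelta(hours=2),
    "every_4_hours": timedelta(hours=4),
    "every_6_hours": timedelta(hours=6),
    "every_12_hours": timedelta(hours=12),
    "daily": timedelta(days=1),
}

def effective_interval(ci: dict, rules: dict) -> timedelta:
    """Shortest interval among all rule groups that match the CI;
    falls back to daily when nothing matches."""
    matched = []
    for group, table in rules.items():
        value = ci.get(group)
        if value in table:
            matched.append(FREQUENCY[table[value]["frequency"]])
    return min(matched) if matched else FREQUENCY["daily"]
```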
#### Resource-Aware Scheduling

```yaml
resource_limits:
  global:
    max_concurrent_discoveries: 50
    max_network_bandwidth: "100Mbps"
    max_cpu_usage: 70
    max_memory_usage: "8GB"
  per_target:
    max_connections: 5
    max_bandwidth: "10Mbps"
    backoff_on_error: exponential
    retry_limit: 3
  adaptive_throttling:
    high_load_threshold: 80
    reduce_concurrency_by: 50
    increase_interval_by: 100
  priority_queues:
    critical:
      reserved_slots: 20
      max_wait: "5m"
    high:
      reserved_slots: 15
      max_wait: "15m"
    normal:
      reserved_slots: 10
      max_wait: "1h"
    low:
      reserved_slots: 5
      max_wait: "4h"
```
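One way to honor `reserved_slots` is a two-pass dispatch: grant each priority its reservation first, then spill any leftover capacity in priority order. This is a sketch of one plausible policy, not the product's actual dispatcher:

```python
RESERVED = {"critical": 20, "high": 15, "normal": 10, "low": 5}
ORDER = ["critical", "high", "normal", "low"]

def allocate_slots(waiting: dict, free_slots: int) -> dict:
    """Allocate discovery slots to waiting jobs per priority queue."""
    grants = {p: 0 for p in ORDER}
    # Pass 1: each priority gets up to its reserved share
    for p in ORDER:
        take = min(waiting.get(p, 0), RESERVED[p], free_slots)
        grants[p] = take
        free_slots -= take
    # Pass 2: spill remaining capacity in priority order
    for p in ORDER:
        extra = min(waiting.get(p, 0) - grants[p], free_slots)
        grants[p] += extra
        free_slots -= extra
    return grants
```

The two-pass shape guarantees a burst of low-priority work can never starve the critical queue of its reserved capacity.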
## Scheduling Strategies

### Infrastructure-Based Scheduling

```yaml
strategies:
  geographic_distribution:
    regions:
      us_east:
        window: "02:00-06:00 EST"
        stagger: 15m
      us_west:
        window: "02:00-06:00 PST"
        stagger: 15m
      europe:
        window: "02:00-06:00 CET"
        stagger: 20m
      asia:
        window: "02:00-06:00 JST"
        stagger: 20m
  network_topology:
    core:
      frequency: every_30_minutes
      priority: critical
    distribution:
      frequency: every_2_hours
      priority: high
    access:
      frequency: every_6_hours
      priority: normal
    edge:
      frequency: daily
      priority: low
  service_dependencies:
    tier_1_services:
      discover_first: true
      frequency: continuous
    tier_2_services:
      after: tier_1_services
      frequency: every_hour
    tier_3_services:
      after: tier_2_services
      frequency: every_4_hours
```
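The `stagger` value spreads scan starts across a regional window so all targets are not hit at once. A minimal sketch of the offset arithmetic (the subnet list is illustrative):

```python
from datetime import datetime, timedelta

def staggered_starts(window_start: datetime, stagger: timedelta,
                     subnets: list) -> dict:
    """Assign each subnet a start time offset by the stagger interval."""
    return {subnet: window_start + i * stagger
            for i, subnet in enumerate(subnets)}
```

With a 15-minute stagger and a 02:00 window start, the third subnet begins at 02:30; a region's subnet count times the stagger should stay inside the window.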
### Business-Aligned Scheduling

```yaml
business_alignment:
  maintenance_windows:
    source: change_management_system
    respect_blackouts: true
    pre_maintenance_scan: "-1h"
    post_maintenance_scan: "+30m"
  business_cycles:
    end_of_month:
      dates: [28, 29, 30, 31]
      reduce_discovery_by: 75%
      priority_only: true
    quarter_end:
      months: [3, 6, 9, 12]
      dates: ["25-31"]
      minimal_discovery: true
      defer_non_critical: true
    year_end:
      dates: ["12/24-12/31", "01/01-01/02"]
      emergency_only: true
      manual_approval: required
  sla_driven:
    platinum_sla:
      discovery_interval: 15m
      availability_requirement: 99.99%
    gold_sla:
      discovery_interval: 1h
      availability_requirement: 99.9%
    silver_sla:
      discovery_interval: 4h
      availability_requirement: 99%
```
## Schedule Management

### Web UI Management

The Schedule Dashboard provides:

**Views:**
- Calendar view: visual schedule timeline
- List view: tabular schedule details
- Gantt chart: resource utilization
- Heat map: discovery density

**Actions:**
- Create/edit schedules
- Enable/disable schedules
- Run now
- Skip next run
- View history
- Clone schedule

**Monitoring:**
- Next run times
- Currently running discoveries
- Success/failure rates
- Average duration
- Resource usage
### API Management

```python
# Schedule management via the Tripl-i REST API
import os

import requests

# Read the API token from the environment rather than hard-coding it
api_token = os.environ["NOPESIGHT_API_TOKEN"]
headers = {"Authorization": f"Bearer {api_token}"}

# Create a new schedule
schedule = {
    "name": "Database Server Discovery",
    "description": "Discover all database servers",
    "enabled": True,
    "schedule": {
        "type": "cron",
        "expression": "0 */4 * * *",  # Every 4 hours
        "timezone": "UTC"
    },
    "targets": {
        "tags": ["database", "production"],
        "discovery_type": "full"
    },
    "options": {
        "timeout": 600,
        "parallel_jobs": 5,
        "retry_failed": True
    }
}

response = requests.post(
    "https://nopesight.company.com/api/schedules",
    json=schedule,
    headers=headers,
)
response.raise_for_status()
schedule_id = response.json()["id"]

# Trigger an immediate discovery run
requests.post(
    f"https://nopesight.company.com/api/schedules/{schedule_id}/run",
    headers=headers,
)

# Get schedule statistics for the last 7 days
stats = requests.get(
    f"https://nopesight.company.com/api/schedules/{schedule_id}/stats",
    params={"period": "7d"},
    headers=headers,
).json()

print(f"Success rate: {stats['success_rate']}%")
print(f"Average duration: {stats['avg_duration_minutes']} minutes")
```
### CLI Management

```bash
# Tripl-i CLI schedule management

# List all schedules
nopesight schedule list --format table

# Create a schedule from a file
nopesight schedule create --file production_schedule.yaml

# Update a schedule
nopesight schedule update db_servers \
  --frequency "every 2 hours" \
  --window "22:00-06:00"

# Disable a schedule temporarily
nopesight schedule disable web_servers \
  --reason "Maintenance" \
  --until "2024-01-20"

# View schedule history
nopesight schedule history db_servers \
  --last 10 \
  --include-details

# Run a schedule immediately and wait for completion
nopesight schedule run production_servers \
  --wait --timeout 30m
```
## Schedule Optimization

### Performance Analysis

```yaml
metrics:
  discovery_performance:
    - completion_time
    - success_rate
    - resource_usage
    - queue_depth
    - wait_time

optimization_recommendations:
  overlap_detection:
    finding: "Schedules A and B overlap by 45%"
    recommendation: "Stagger by 2 hours"
    impact: "Reduce resource contention by 40%"
  underutilized_windows:
    finding: "02:00-04:00 window only 20% utilized"
    recommendation: "Move low-priority discoveries here"
    impact: "Better resource distribution"
  long_running_jobs:
    finding: "Full scan takes 6+ hours"
    recommendation: "Split into regional schedules"
    impact: "Reduce completion time by 60%"
```
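Overlap findings like the one above reduce to simple interval arithmetic. The sketch below computes the fraction of one schedule's daily run window that collides with another's, with windows expressed as minutes since midnight:

```python
def overlap_fraction(a: tuple, b: tuple) -> float:
    """Fraction of window A that overlaps window B.
    Windows are (start_minute, end_minute) pairs with start < end."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / (a[1] - a[0])
```

A 02:00-06:00 run window checked against a 04:00-08:00 window overlaps by 50%, which would trigger a "stagger" recommendation.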
### Adaptive Scheduling

```javascript
// Adaptive scheduling algorithm (sketch)
const adaptiveScheduler = {
  analyze: function(historicalData) {
    return {
      peak_usage_times: this.findPeakTimes(historicalData),
      optimal_windows: this.findOptimalWindows(historicalData),
      bottlenecks: this.identifyBottlenecks(historicalData),
      recommendations: this.generateRecommendations(historicalData)
    };
  },

  adjust: function(schedule, analysis) {
    // Under network pressure, run fewer parallel jobs and cap bandwidth
    if (analysis.bottlenecks.network) {
      schedule.parallel_jobs = Math.max(1, Math.floor(schedule.parallel_jobs * 0.8));
      schedule.bandwidth_limit = "50Mbps";
    }
    // Move the start time out of known peak-usage windows
    if (analysis.peak_usage_times.includes(schedule.start_time)) {
      schedule.start_time = analysis.optimal_windows[0];
    }
    return schedule;
  },

  learn: function(executionResults) {
    // Feed execution outcomes back into the scheduling model
    this.updateModel({
      schedule: executionResults.schedule,
      performance: executionResults.metrics,
      success: executionResults.success_rate > 95
    });
  }
};
```
## Monitoring & Alerting

### Schedule Monitoring

```yaml
monitoring:
  dashboards:
    schedule_overview:
      widgets:
        - upcoming_schedules
        - currently_running
        - recent_failures
        - resource_utilization
        - sla_compliance
    performance_metrics:
      widgets:
        - completion_times_trend
        - success_rate_gauge
        - discovery_coverage_map
        - queue_depth_chart
        - bottleneck_analysis
  kpis:
    - discovery_coverage: "> 95%"
    - success_rate: "> 98%"
    - avg_completion_time: "< 30m"
    - resource_utilization: "60-80%"
    - schedule_adherence: "> 95%"
```
### Alert Configuration

```yaml
alerts:
  schedule_failures:
    condition: "failed_count > 2"
    severity: high
    notification:
      - email: ops-team@company.com
      - slack: "#infrastructure-alerts"  # quoted so YAML doesn't treat # as a comment
    auto_action:
      - retry_with_backoff
      - create_incident
  long_running:
    condition: "duration > expected_duration * 2"
    severity: medium
    notification:
      - email: discovery-admin@company.com
    auto_action:
      - check_resource_usage
      - throttle_if_needed
  missed_schedule:
    condition: "missed_run_count > 0"
    severity: high
    notification:
      - sms: on-call
      - email: ops-team@company.com
    auto_action:
      - run_immediately
      - investigate_cause
```
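The `retry_with_backoff` auto-action can be pictured as a loop with exponentially growing delays. A sketch with an injectable `sleep` so the policy is testable; the base delay and retry count are illustrative defaults, not product settings:

```python
import time

def retry_with_backoff(run, retries=3, base_delay=1.0, sleep=time.sleep):
    """Re-run a failed discovery with exponentially growing delays
    (base_delay, 2x, 4x, ...). `run` returns True on success."""
    for attempt in range(retries + 1):
        if run():
            return True
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)  # wait before the next attempt
    return False
```

Passing a recording function as `sleep` lets you verify the delay sequence without actually waiting.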
## Best Practices

1. **Schedule Design**
   - ✅ Align with business hours
   - ✅ Consider geographic distribution
   - ✅ Respect maintenance windows
   - ✅ Plan for growth
2. **Resource Management**
   - ✅ Monitor resource usage
   - ✅ Implement throttling
   - ✅ Use priority queues
   - ✅ Balance load distribution
3. **Reliability**
   - ✅ Build in redundancy
   - ✅ Handle failures gracefully
   - ✅ Implement retry logic
   - ✅ Monitor success rates
4. **Optimization**
   - ✅ Review performance regularly
   - ✅ Adjust based on metrics
   - ✅ Eliminate redundant schedules
   - ✅ Improve continuously
## Troubleshooting

### Common Issues

#### Schedules Not Running

**Diagnostic steps:**
1. Check that the schedule is enabled
2. Verify the schedule expression
3. Check for active blackout windows
4. Review system resource limits
5. Examine the scheduler logs

**Common causes:**
- Disabled schedule
- Invalid cron expression
- Active blackout window
- Resource limits reached
- Scheduler service down

#### Performance Degradation

**Symptoms:**
- Increasing completion times
- High resource usage
- Queue buildup
- Timeout errors

**Solutions:**
- Reduce parallel jobs
- Increase discovery intervals
- Narrow the discovery scope
- Add more workers
- Implement caching
### Schedule Analysis

```sql
-- Analyze schedule performance over the last 30 days
SELECT
    schedule_name,
    AVG(duration_minutes) AS avg_duration,
    COUNT(*) AS total_runs,
    SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) AS successful,
    SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed,
    ROUND(100.0 * SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) / COUNT(*), 2) AS success_rate
FROM discovery_runs
WHERE run_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY schedule_name
ORDER BY success_rate ASC, avg_duration DESC;
```
## Advanced Topics

### Multi-Site Scheduling

```yaml
multi_site:
  coordination:
    mode: distributed
    sites:
      - name: datacenter_east
        timezone: "America/New_York"
        bandwidth_to_central: "1Gbps"
      - name: datacenter_west
        timezone: "America/Los_Angeles"
        bandwidth_to_central: "1Gbps"
      - name: europe_dc
        timezone: "Europe/London"
        bandwidth_to_central: "500Mbps"
    strategy:
      - local_discovery_first
      - aggregate_to_central
      - deduplicate_results
      - sync_on_completion
```
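The `deduplicate_results` step must pick one record per CI when several sites report the same hardware. The sketch below assumes identity by serial number (falling back to hostname) with the most recently seen record winning; the product's actual identity rules may differ.

```python
def deduplicate_results(site_results: dict) -> list:
    """Merge per-site discovery results into one record per CI.
    site_results maps site name -> list of CI record dicts."""
    merged = {}
    for site, records in site_results.items():
        for rec in records:
            # Assumed identity key: serial number, else hostname
            key = rec.get("serial_number") or rec["hostname"]
            if key not in merged or rec["last_seen"] > merged[key]["last_seen"]:
                merged[key] = rec  # newest observation wins
    return list(merged.values())
```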
### Predictive Scheduling

```python
# ML-based schedule optimization (sketch)
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Load historical run data: one row per completed discovery
history = pd.read_csv('discovery_history.csv')

# Features must be numeric: encode discovery_type before training
history['is_full_scan'] = (history['discovery_type'] == 'full').astype(int)
features = ['time_of_day', 'day_of_week', 'target_count', 'is_full_scan']

model = RandomForestRegressor()
model.fit(history[features], history['completion_time'])

# Predict completion time for a candidate schedule
candidate = pd.DataFrame([{
    'time_of_day': 0, 'day_of_week': 6,
    'target_count': 500, 'is_full_scan': 1
}])
predicted_duration = model.predict(candidate[features])[0]

# find_optimal_slot() is a site-specific helper (not shown) that places
# the predicted duration inside the preferred window
optimal_start = find_optimal_slot(predicted_duration, window='00:00-06:00')
```
## Next Steps

- 📖 Troubleshooting - Common issues and solutions
- 📖 Best Practices - CMDB best practices
- 📖 Performance Tuning - System optimization