# Discovery Scheduling
Effective discovery scheduling ensures your CMDB stays current while minimizing impact on network and system resources. NopeSight provides flexible scheduling options that adapt to your infrastructure's needs and operational windows.
## Scheduling Architecture
### Schedule Types
#### Fixed Schedules

**Daily Discovery:**

```yaml
schedule: "0 2 * * *"   # 2 AM daily
targets: all_infrastructure
type: incremental
max_duration: 4h
```

**Weekly Full Scan:**

```yaml
schedule: "0 6 * * 0"   # Sunday 6 AM
targets: all_infrastructure
type: full
max_duration: 12h
```

**Hourly Critical:**

```yaml
schedule: "0 * * * *"   # Every hour
targets:
  - tag: critical
  - tag: production
type: incremental
max_duration: 45m
```
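As a rough illustration of how the scheduler turns a fixed expression into a next run time, here is a minimal Python sketch for the simple daily case. The helper name is illustrative, not a NopeSight API; a real cron parser (e.g. croniter) handles the full five-field syntax.

```python
from datetime import datetime, timedelta

def next_fixed_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Next occurrence of a fixed daily time, e.g. the '0 2 * * *' daily schedule."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:          # today's slot already passed: run tomorrow
        candidate += timedelta(days=1)
    return candidate

# At 3 AM on Jan 15, the next "2 AM daily" run is Jan 16 at 2 AM
print(next_fixed_run(datetime(2024, 1, 15, 3, 0), hour=2))
```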
#### Dynamic Schedules

**Event-Driven:**

```yaml
triggers:
  - new_device_detected
  - configuration_change
  - incident_created
  - deployment_completed
response:
  delay: 5m
  type: targeted
  scope: affected_items
```

**Change-Based:**

```yaml
monitor:
  - deployment_pipeline
  - change_calendar
  - maintenance_windows
action:
  pre_change: baseline_scan
  post_change: verification_scan
  delay: 30m
```
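The `delay` in the event-driven response acts as a quiet period: bursts of related events are coalesced into one targeted scan. A minimal Python sketch of that behavior (class and method names are hypothetical, for illustration only):

```python
from datetime import datetime, timedelta

class EventDrivenTrigger:
    """Coalesces trigger events and fires one targeted scan after a quiet delay."""

    def __init__(self, delay: timedelta = timedelta(minutes=5)):
        self.delay = delay
        self.pending: set[str] = set()   # affected items awaiting scan
        self.fire_at: datetime | None = None

    def on_event(self, item: str, now: datetime) -> None:
        self.pending.add(item)
        self.fire_at = now + self.delay  # each new event restarts the quiet period

    def poll(self, now: datetime):
        """Return the scan scope once the delay has elapsed, else None."""
        if self.pending and self.fire_at and now >= self.fire_at:
            scope, self.pending, self.fire_at = sorted(self.pending), set(), None
            return scope
        return None
```

Two events two minutes apart thus produce a single scan covering both items, five minutes after the last event.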
## Schedule Configuration

### Basic Scheduling

```yaml
# Schedule configuration example
schedules:
  production_servers:
    name: "Production Server Discovery"
    description: "Critical production infrastructure"
    enabled: true
    timing:
      frequency: every_4_hours
      start_time: "00:00"
      timezone: "America/New_York"
      blackout_windows:
        - start: "08:00"
          end: "09:00"
          days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
          reason: "Peak business hours"
    targets:
      include:
        - ip_range: "10.1.0.0/16"
        - tags: ["production", "critical"]
      exclude:
        - ip: "10.1.1.1"  # Router
        - tag: "maintenance"
    discovery:
      type: incremental
      methods: ["agent", "wmi", "ssh"]
      timeout: 300
      parallel_jobs: 10
```
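Before dispatching a run, the scheduler must check it against the configured blackout windows. A minimal sketch of that check in Python, using the window shape from the configuration above (the function name is illustrative):

```python
from datetime import time

def in_blackout(now_time: time, weekday_name: str, windows: list[dict]) -> bool:
    """True if a run at now_time on weekday_name falls inside any blackout window."""
    for w in windows:
        start = time.fromisoformat(w["start"])
        end = time.fromisoformat(w["end"])
        if weekday_name in w["days"] and start <= now_time < end:
            return True
    return False

windows = [{"start": "08:00", "end": "09:00",
            "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]}]
print(in_blackout(time(8, 30), "Monday", windows))    # True: peak business hours
print(in_blackout(time(8, 30), "Saturday", windows))  # False: weekend
```

This sketch assumes windows do not cross midnight; a production check would also normalize for the schedule's timezone.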
### Advanced Scheduling

#### Intelligent Scheduling

```yaml
smart_schedule:
  name: "Adaptive Infrastructure Discovery"
  rules:
    business_criticality:
      critical:
        frequency: real_time
        method: agent_only
      high:
        frequency: every_2_hours
        method: agent_preferred
      medium:
        frequency: every_12_hours
        method: agentless_ok
      low:
        frequency: daily
        method: any
    device_type:
      database_servers:
        frequency: every_hour
        preferred_window: "02:00-05:00"
      web_servers:
        frequency: every_4_hours
        avoid_window: "09:00-17:00"
      workstations:
        frequency: on_login
        max_daily: 2
    change_frequency:
      high_change:      # > 10 changes/week
        frequency: every_2_hours
      moderate_change:  # 3-10 changes/week
        frequency: every_6_hours
      stable:           # < 3 changes/week
        frequency: daily
```
#### Resource-Aware Scheduling

```yaml
resource_limits:
  global:
    max_concurrent_discoveries: 50
    max_network_bandwidth: "100Mbps"
    max_cpu_usage: 70
    max_memory_usage: "8GB"
  per_target:
    max_connections: 5
    max_bandwidth: "10Mbps"
    backoff_on_error: exponential
    retry_limit: 3
  adaptive_throttling:
    high_load_threshold: 80
    reduce_concurrency_by: 50
    increase_interval_by: 100

priority_queues:
  critical:
    reserved_slots: 20
    max_wait: "5m"
  high:
    reserved_slots: 15
    max_wait: "15m"
  normal:
    reserved_slots: 10
    max_wait: "1h"
  low:
    reserved_slots: 5
    max_wait: "4h"
```
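The priority-queue behavior above can be sketched with a standard heap: jobs are dispatched strictly by tier, FIFO within a tier. This is a minimal illustration (class and tier ordering are assumptions; it omits the reserved-slot and max-wait accounting a real dispatcher would add):

```python
import heapq
from itertools import count

PRIORITY = {"critical": 0, "high": 1, "normal": 2, "low": 3}

class DiscoveryQueue:
    """Dispatches discovery jobs by priority tier, FIFO within a tier."""

    def __init__(self):
        self._heap = []
        self._seq = count()   # tie-breaker preserves submission order

    def submit(self, job: str, tier: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[tier], next(self._seq), job))

    def next_job(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = DiscoveryQueue()
q.submit("scan-web", "normal")
q.submit("scan-db", "critical")
print(q.next_job())  # scan-db: critical jumps the queue
```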
## Scheduling Strategies

### Infrastructure-Based Scheduling

```yaml
strategies:
  geographic_distribution:
    regions:
      us_east:
        window: "02:00-06:00 EST"
        stagger: 15m
      us_west:
        window: "02:00-06:00 PST"
        stagger: 15m
      europe:
        window: "02:00-06:00 CET"
        stagger: 20m
      asia:
        window: "02:00-06:00 JST"
        stagger: 20m
  network_topology:
    core:
      frequency: every_30_minutes
      priority: critical
    distribution:
      frequency: every_2_hours
      priority: high
    access:
      frequency: every_6_hours
      priority: normal
    edge:
      frequency: daily
      priority: low
  service_dependencies:
    tier_1_services:
      discover_first: true
      frequency: continuous
    tier_2_services:
      after: tier_1_services
      frequency: every_hour
    tier_3_services:
      after: tier_2_services
      frequency: every_4_hours
```
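The `after:` keys define a dependency order: each tier runs only once its predecessor has completed. Resolving that order is a topological sort, sketched here with Python's standard library `graphlib` (3.9+); the wrapper function is illustrative:

```python
from graphlib import TopologicalSorter

def discovery_order(after: dict[str, list[str]]) -> list[str]:
    """Order tiers so each runs only after the tiers listed in its 'after:' key."""
    return list(TopologicalSorter(after).static_order())

order = discovery_order({
    "tier_2_services": ["tier_1_services"],
    "tier_3_services": ["tier_2_services"],
})
print(order)  # ['tier_1_services', 'tier_2_services', 'tier_3_services']
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which is a useful validation step when schedules are user-edited.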
### Business-Aligned Scheduling

```yaml
business_alignment:
  maintenance_windows:
    source: change_management_system
    respect_blackouts: true
    pre_maintenance_scan: "-1h"
    post_maintenance_scan: "+30m"
  business_cycles:
    end_of_month:
      dates: [28, 29, 30, 31]
      reduce_discovery_by: 75%
      priority_only: true
    quarter_end:
      months: [3, 6, 9, 12]
      dates: "25-31"
      minimal_discovery: true
      defer_non_critical: true
    year_end:
      dates: ["12/24-12/31", "01/01-01/02"]
      emergency_only: true
      manual_approval: required
  sla_driven:
    platinum_sla:
      discovery_interval: 15m
      availability_requirement: 99.99%
    gold_sla:
      discovery_interval: 1h
      availability_requirement: 99.9%
    silver_sla:
      discovery_interval: 4h
      availability_requirement: 99%
```
## Schedule Management

### Web UI Management

**Schedule Dashboard**

Views:
- Calendar view: visual schedule timeline
- List view: tabular schedule details
- Gantt chart: resource utilization
- Heat map: discovery density

Actions:
- Create/edit schedules
- Enable/disable schedules
- Run now
- Skip next run
- View history
- Clone schedule

Monitoring:
- Next run times
- Currently running
- Success/failure rates
- Average duration
- Resource usage
### API Management

```python
# Schedule management via API
import requests

api_token = "YOUR_API_TOKEN"  # placeholder; supply your real API token

# Create a new schedule
schedule = {
    "name": "Database Server Discovery",
    "description": "Discover all database servers",
    "enabled": True,
    "schedule": {
        "type": "cron",
        "expression": "0 */4 * * *",  # Every 4 hours
        "timezone": "UTC"
    },
    "targets": {
        "tags": ["database", "production"],
        "discovery_type": "full"
    },
    "options": {
        "timeout": 600,
        "parallel_jobs": 5,
        "retry_failed": True
    }
}

response = requests.post(
    "https://nopesight.company.com/api/schedules",
    json=schedule,
    headers={"Authorization": f"Bearer {api_token}"},
)
response.raise_for_status()
schedule_id = response.json()["id"]

# Trigger an immediate discovery run
requests.post(
    f"https://nopesight.company.com/api/schedules/{schedule_id}/run",
    headers={"Authorization": f"Bearer {api_token}"},
)

# Get schedule statistics for the last 7 days
stats = requests.get(
    f"https://nopesight.company.com/api/schedules/{schedule_id}/stats",
    params={"period": "7d"},
    headers={"Authorization": f"Bearer {api_token}"},
).json()

print(f"Success rate: {stats['success_rate']}%")
print(f"Average duration: {stats['avg_duration_minutes']} minutes")
```
### CLI Management

```bash
# NopeSight CLI schedule management

# List all schedules
nopesight schedule list --format table

# Create a schedule from a file
nopesight schedule create --file production_schedule.yaml

# Update a schedule
nopesight schedule update db_servers \
  --frequency "every 2 hours" \
  --window "22:00-06:00"

# Disable a schedule temporarily
nopesight schedule disable web_servers \
  --reason "Maintenance" \
  --until "2024-01-20"

# View schedule history
nopesight schedule history db_servers \
  --last 10 \
  --include-details

# Run a schedule immediately
nopesight schedule run production_servers \
  --wait --timeout 30m
```
## Schedule Optimization

### Performance Analysis

```yaml
metrics:
  discovery_performance:
    - completion_time
    - success_rate
    - resource_usage
    - queue_depth
    - wait_time

optimization_recommendations:
  overlap_detection:
    finding: "Schedules A and B overlap by 45%"
    recommendation: "Stagger by 2 hours"
    impact: "Reduce resource contention by 40%"
  underutilized_windows:
    finding: "02:00-04:00 window only 20% utilized"
    recommendation: "Move low-priority discoveries here"
    impact: "Better resource distribution"
  long_running_jobs:
    finding: "Full scan takes 6+ hours"
    recommendation: "Split into regional schedules"
    impact: "Reduce completion time by 60%"
```
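The overlap-detection finding above reduces to interval arithmetic: what fraction of one schedule's run window falls inside another's? A minimal Python sketch (windows as start/end minutes since midnight; the function name is illustrative):

```python
def overlap_fraction(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Fraction of window a (start, end in minutes) that overlaps window b."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, end - start) / (a[1] - a[0])

# Schedule A runs 02:00-06:00, schedule B runs 05:00-08:00:
# the last hour of A (60 of 240 minutes) contends with B
print(overlap_fraction((120, 360), (300, 480)))  # 0.25
```

Running this pairwise across all schedules flags candidates for staggering; windows spanning midnight would need to be split first.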
### Adaptive Scheduling

```javascript
// Adaptive scheduling algorithm
const adaptiveScheduler = {
  analyze: function (historicalData) {
    return {
      peak_usage_times: this.findPeakTimes(historicalData),
      optimal_windows: this.findOptimalWindows(historicalData),
      bottlenecks: this.identifyBottlenecks(historicalData),
      recommendations: this.generateRecommendations(historicalData)
    };
  },

  adjust: function (schedule, analysis) {
    if (analysis.bottlenecks.network) {
      schedule.parallel_jobs *= 0.8;
      schedule.bandwidth_limit = "50Mbps";
    }
    if (analysis.peak_usage_times.includes(schedule.start_time)) {
      schedule.start_time = analysis.optimal_windows[0];
    }
    return schedule;
  },

  learn: function (executionResults) {
    // Machine-learning feedback loop
    this.updateModel({
      schedule: executionResults.schedule,
      performance: executionResults.metrics,
      success: executionResults.success_rate > 95
    });
  }
};
```
## Monitoring & Alerting

### Schedule Monitoring

```yaml
monitoring:
  dashboards:
    schedule_overview:
      widgets:
        - upcoming_schedules
        - currently_running
        - recent_failures
        - resource_utilization
        - sla_compliance
    performance_metrics:
      widgets:
        - completion_times_trend
        - success_rate_gauge
        - discovery_coverage_map
        - queue_depth_chart
        - bottleneck_analysis
  kpis:
    - discovery_coverage: "> 95%"
    - success_rate: "> 98%"
    - avg_completion_time: "< 30m"
    - resource_utilization: "60-80%"
    - schedule_adherence: "> 95%"
```
### Alert Configuration

```yaml
alerts:
  schedule_failures:
    condition: "failed_count > 2"
    severity: high
    notification:
      - email: ops-team@company.com
      - slack: "#infrastructure-alerts"
    auto_action:
      - retry_with_backoff
      - create_incident
  long_running:
    condition: "duration > expected_duration * 2"
    severity: medium
    notification:
      - email: discovery-admin@company.com
    auto_action:
      - check_resource_usage
      - throttle_if_needed
  missed_schedule:
    condition: "missed_run_count > 0"
    severity: high
    notification:
      - sms: on-call
      - email: ops-team@company.com
    auto_action:
      - run_immediately
      - investigate_cause
```
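The `retry_with_backoff` auto-action is a standard exponential-backoff loop: retry after 1s, then 2s, then 4s. A minimal Python sketch (the function and its parameters are illustrative, not the product's implementation; the injectable `sleep` makes the delays testable):

```python
import time

def retry_with_backoff(fn, retries: int = 3, base_delay: float = 1.0,
                       sleep=time.sleep):
    """Call fn, retrying up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise                       # retries exhausted: escalate
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Jitter is usually added to the delays in production so that many failing targets do not retry in lockstep.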
## Best Practices
1. Schedule Design
- ✅ Align with business hours
- ✅ Consider geographic distribution
- ✅ Respect maintenance windows
- ✅ Plan for growth
2. Resource Management
- ✅ Monitor resource usage
- ✅ Implement throttling
- ✅ Use priority queues
- ✅ Balance load distribution
3. Reliability
- ✅ Build in redundancy
- ✅ Handle failures gracefully
- ✅ Implement retry logic
- ✅ Monitor success rates
4. Optimization
- ✅ Regular performance reviews
- ✅ Adjust based on metrics
- ✅ Eliminate redundancy
- ✅ Continuous improvement
## Troubleshooting

### Common Issues

#### Schedules Not Running

Diagnostic steps:
1. Check schedule status (is it enabled?)
2. Verify the schedule expression
3. Check blackout windows
4. Review system resources
5. Examine scheduler logs

Common causes:
- Disabled schedule
- Invalid cron expression
- Active blackout window
- Resource limits reached
- Scheduler service down
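Invalid cron expressions can be caught before a schedule is saved. A rough five-field syntax check in Python (illustrative only: it validates shape, not numeric ranges or names like `MON`, which a full cron parser would handle):

```python
import re

# Matches a single cron field: *, N, N-M, optionally /step, comma-separated
CRON_FIELD = re.compile(r"^(\*|\d+(-\d+)?)(/\d+)?(,(\*|\d+(-\d+)?)(/\d+)?)*$")

def is_valid_cron(expr: str) -> bool:
    """Rough syntax check for a 5-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    return all(CRON_FIELD.match(f) for f in fields)

print(is_valid_cron("0 */4 * * *"))  # True: every 4 hours
print(is_valid_cron("0 2 * *"))      # False: only 4 fields
```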
#### Performance Degradation

Symptoms:
- Increasing completion times
- High resource usage
- Queue buildup
- Timeout errors

Solutions:
- Reduce parallel jobs
- Increase intervals
- Optimize discovery scope
- Add more workers
- Implement caching
### Schedule Analysis

```sql
-- Analyze schedule performance over the last 30 days
SELECT
    schedule_name,
    AVG(duration_minutes) AS avg_duration,
    COUNT(*) AS total_runs,
    SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) AS successful,
    SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed,
    ROUND(100.0 * SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) / COUNT(*), 2) AS success_rate
FROM discovery_runs
WHERE run_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY schedule_name
ORDER BY success_rate ASC, avg_duration DESC;
```
## Advanced Topics

### Multi-Site Scheduling

```yaml
multi_site:
  coordination:
    mode: distributed
    sites:
      - name: datacenter_east
        timezone: "America/New_York"
        bandwidth_to_central: "1Gbps"
      - name: datacenter_west
        timezone: "America/Los_Angeles"
        bandwidth_to_central: "1Gbps"
      - name: europe_dc
        timezone: "Europe/London"
        bandwidth_to_central: "500Mbps"
  strategy:
    - local_discovery_first
    - aggregate_to_central
    - deduplicate_results
    - sync_on_completion
```
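The `deduplicate_results` step matters because two sites may discover the same CI (for example, a device reachable over a WAN link). A minimal Python sketch of the merge, keeping the freshest record per key; the record fields (`serial`, `seen_at`) are illustrative assumptions, since real deduplication matches on several identifiers:

```python
def merge_site_results(site_results: list[list[dict]]) -> list[dict]:
    """Merge per-site discovery results, keeping the newest record per CI key."""
    merged: dict[str, dict] = {}
    for records in site_results:
        for rec in records:
            key = rec["serial"]  # assume serial number as the unique key
            if key not in merged or rec["seen_at"] > merged[key]["seen_at"]:
                merged[key] = rec
    return list(merged.values())

east = [{"serial": "A1", "seen_at": 1, "site": "east"}]
west = [{"serial": "A1", "seen_at": 2, "site": "west"}]
print(merge_site_results([east, west]))  # one record for A1, from west
```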
### Predictive Scheduling

```python
# ML-based schedule optimization
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Load historical run data
history = pd.read_csv('discovery_history.csv')

# Features: time_of_day, day_of_week, target_count, discovery_type (numerically encoded)
# Target: completion_time
features = ['time_of_day', 'day_of_week', 'target_count', 'discovery_type']
model = RandomForestRegressor()
model.fit(history[features], history['completion_time'])

# Predict the duration of a proposed schedule (feature order must match training)
new_schedule = pd.DataFrame([{
    'time_of_day': 0,        # midnight start
    'day_of_week': 6,        # Sunday
    'target_count': 500,
    'discovery_type': 1,     # e.g. 1 = full, 0 = incremental
}])
predicted_duration = model.predict(new_schedule[features])[0]

# Place the run within the preferred window given the predicted duration
# (find_optimal_slot is scheduler-specific and not shown here)
optimal_start = find_optimal_slot(predicted_duration, window='00:00-06:00')
```
## Next Steps
- 📖 Troubleshooting - Common issues and solutions
- 📖 Best Practices - CMDB best practices
- 📖 Performance Tuning - System optimization