Managing Events

This guide covers the day-to-day operations of managing events in KillIT v3's Event Management system.

Event Dashboard

The Event Dashboard provides a real-time view of your IT environment's health.

Key Metrics

Total Events: Count of all events in the selected time range
Critical/Major Open: Open events with Critical or Major severity that require attention
Average Resolution Time: Mean time to resolve events
Active Incidents: Currently open events

Filtering Events

Use filters to focus on specific events:

Status: Open, Acknowledged, Resolved, Suppressed
Severity: Critical, Major, Minor, Warning, Info
Source: Filter by monitoring tool
Time Range: Last hour, 6 hours, 24 hours, 3 days, 7 days

Event Lifecycle

1. Open State

New events enter the system in "Open" state
Automatic correlation and AI analysis begin
Notifications sent based on severity

2. Acknowledged State

Indicates someone is investigating
Stops escalation timers
Records who acknowledged and when

3. Resolved State

Issue has been fixed
Resolution notes document the fix
Metrics calculated for reporting

4. Suppressed State

Event deemed not actionable
Useful for known issues or maintenance
Removes from active event count
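
The four states above can be modeled as a small state machine. The following is a minimal sketch, not KillIT's implementation: the guide names the states but does not say which transitions are allowed, so the `ALLOWED` table (including treating Resolved and Suppressed as terminal) and the `Event` class are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"
    SUPPRESSED = "suppressed"


# Assumed transition table: the guide does not specify allowed moves.
ALLOWED = {
    Status.OPEN: {Status.ACKNOWLEDGED, Status.RESOLVED, Status.SUPPRESSED},
    Status.ACKNOWLEDGED: {Status.RESOLVED, Status.SUPPRESSED},
    Status.RESOLVED: set(),
    Status.SUPPRESSED: set(),
}


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative."""
    title: str
    status: Status = Status.OPEN
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None
    notes: list = field(default_factory=list)

    def transition(self, new_status: Status, user: str, note: str = "") -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        # Acknowledging records who acknowledged and when.
        if new_status is Status.ACKNOWLEDGED:
            self.acknowledged_by = user
            self.acknowledged_at = datetime.now(timezone.utc)
        if note:
            self.notes.append(note)
        self.status = new_status


event = Event("Database connection failed")
event.transition(Status.ACKNOWLEDGED, user="alice", note="Investigating")
event.transition(Status.RESOLVED, user="alice", note="Restarted connection pool")
print(event.status.value)  # resolved
```

Keeping the allowed moves in one table makes it easy to adjust the sketch if your deployment permits reopening resolved or suppressed events.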

Working with Events

Viewing Event Details

Click any event to see:

Overview Tab

Event title and description
Severity and status indicators
Source and timing information
Related CI information
Assignment details

AI Analysis Tab

Anomaly score (0-100%)
Root cause analysis
Contributing factors
Suggested remediation actions
Automation opportunities

Correlation Tab

Related events in the correlation group
Affected Configuration Items
Root cause candidate identification
Pattern analysis

Timeline Tab

Complete event history
Status changes
User actions
Automation executions

Updating Event Status

1. Click Update Status button
2. Select new status:
   - Acknowledged: "I'm working on this"
   - Resolved: "The issue is fixed"
   - Suppressed: "This is not actionable"
3. Add notes explaining your action
4. For resolved events, select resolution category:
   - Auto-resolved
   - Manual fix
   - False positive
   - Duplicate
   - No action needed

Bulk Operations

Select multiple events to:

Acknowledge all
Assign to team member
Suppress similar events
Export for reporting

Event Deduplication

How Deduplication Works

KillIT v3 automatically identifies and groups duplicate events to reduce alert noise. When multiple instances of the same issue occur, they are consolidated into a single event with an occurrence count.

Correlation Signature

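Each event receives a correlation signature derived from the following components. Events that describe the same issue produce the same signature, which is what allows them to be matched: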

Source: The monitoring system (Nagios, Zabbix, etc.)
CI/Host: The Configuration Item ID, hostname, or IP address
Normalized Title: Event title with dynamic parts removed
Service: The affected service name

The signature is generated as an MD5 hash of these components joined with "-".
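
As a rough illustration of the sentence above, the sketch below joins the four components with "-" and hashes the result with MD5 using Python's standard `hashlib`. The function name, the argument order, and the substitution of an empty string for a missing component are assumptions; the guide does not specify them.

```python
import hashlib


def correlation_signature(source, ci_or_host, normalized_title, service):
    """Return an MD5 hex digest of the components joined with "-".

    Component order and empty-string handling of missing values are
    assumptions made for this sketch.
    """
    parts = [source or "", ci_or_host or "", normalized_title or "", service or ""]
    joined = "-".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


a = correlation_signature("nagios", "db01", "database connection failed NUM times", "billing")
b = correlation_signature("nagios", "db01", "database connection failed NUM times", "billing")
c = correlation_signature("nagios", "db02", "database connection failed NUM times", "billing")
print(a == b, a == c)  # True False
```

MD5 is used here only as a fingerprint for grouping, not for security, so its known collision weaknesses are not a practical concern in this role.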

Title Normalization

To handle dynamic content in event titles, the system normalizes them by replacing:

Dates (YYYY-MM-DD) → "DATE"
Times (HH:MM:SS) → "TIME"
Numbers → "NUM"
UUIDs → "UUID"
Multiple spaces → Single space

Example (note that the remaining text is also lowercased):

Original: "Database connection failed 3 times at 14:35:20 on 2025-06-17"
Normalized: "database connection failed NUM times at TIME on DATE"
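
The replacement rules can be sketched with regular expressions. This is an illustrative version, not the product's code: the exact patterns are assumptions, and the UUID rule is applied before the number rule here because a UUID contains digits that the number rule would otherwise break apart.

```python
import re

# Assumed patterns; the guide only names the categories.
_UUID = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
)
_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")   # YYYY-MM-DD
_TIME = re.compile(r"\b\d{2}:\d{2}:\d{2}\b")   # HH:MM:SS
_NUM = re.compile(r"\d+")
_SPACES = re.compile(r"\s{2,}")


def normalize_title(title):
    """Lowercase the title and replace dynamic parts with placeholders."""
    text = title.lower()
    # Most specific patterns first, so digits inside UUIDs, dates,
    # and times are not consumed by the generic number rule.
    text = _UUID.sub("UUID", text)
    text = _DATE.sub("DATE", text)
    text = _TIME.sub("TIME", text)
    text = _NUM.sub("NUM", text)
    text = _SPACES.sub(" ", text)
    return text.strip()


print(normalize_title("Database connection failed 3 times at 14:35:20 on 2025-06-17"))
# database connection failed NUM times at TIME on DATE
```

Lowercasing happens before substitution so that the uppercase placeholders survive, which matches the normalized example above.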

Deduplication Process

1. New Event Arrives: System generates correlation signature
2. Duplicate Check: Searches for existing events with:
   - Same correlation signature
   - Status is "open" or "acknowledged"
   - Created within the last hour (configurable window)
3. If Duplicate Found:
   - Increment occurrenceCount
   - Update lastOccurrence timestamp
   - Update severity if new event is more severe
   - Merge additional details
   - Return existing event (no new event created)
4. If No Duplicate:
   - Create new event with occurrenceCount = 1
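
The process above can be sketched against an in-memory list. Everything here is an assumption made for illustration: the `StoredEvent` class, the `ingest` function, the list-based store, and the severity ranking (taken from the order of the Severity filter). The fields `occurrence_count` and `last_occurrence` stand in for the `occurrenceCount` and `lastOccurrence` fields named in the steps.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed ranking, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "minor": 2, "major": 3, "critical": 4}
DEDUP_WINDOW = timedelta(hours=1)  # default window from the guide


@dataclass
class StoredEvent:
    signature: str
    severity: str
    created_at: datetime
    last_occurrence: datetime
    status: str = "open"
    occurrence_count: int = 1
    details: dict = field(default_factory=dict)


def ingest(store, signature, severity, details, now, window=DEDUP_WINDOW):
    """Return the existing duplicate (updated) or a newly created event."""
    for existing in store:
        if (
            existing.signature == signature
            and existing.status in ("open", "acknowledged")
            and now - existing.created_at <= window
        ):
            existing.occurrence_count += 1
            existing.last_occurrence = now
            # Keep the highest severity seen across occurrences.
            if SEVERITY_RANK[severity] > SEVERITY_RANK[existing.severity]:
                existing.severity = severity
            existing.details.update(details)
            return existing
    created = StoredEvent(
        signature, severity, created_at=now, last_occurrence=now, details=dict(details)
    )
    store.append(created)
    return created


store = []
t0 = datetime(2025, 6, 17, 10, 0)
first = ingest(store, "sig-a", "minor", {"host": "db01"}, t0)
second = ingest(store, "sig-a", "critical", {"pool": "main"}, t0 + timedelta(minutes=10))
third = ingest(store, "sig-a", "minor", {}, t0 + timedelta(hours=2))
print(len(store), first.occurrence_count, first.severity)  # 2 2 critical
```

The third call falls outside the one-hour window, so it creates a new event instead of incrementing the first one.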

Viewing Deduplicated Events

In the event list, deduplicated events show:

Occurrence Badge: Shows count when > 1
First Occurrence: Original event timestamp
Last Occurrence: Most recent duplicate timestamp
Severity: Highest severity across all occurrences

Benefits of Deduplication

Reduces Alert Fatigue: One alert instead of hundreds
Preserves Information: Track frequency with occurrence count
Smart Matching: Dynamic content doesn't prevent deduplication
Time-Based Windows: Only recent events are considered duplicates
Status-Aware: Resolved events won't be matched

Configuring Deduplication

Administrators can adjust deduplication behavior:

Time Window: Default 1 hour (Settings → Event Management)
Status Matching: Which statuses to consider for duplicates
Field Weights: Customize signature generation
Exclusion Patterns: Events to never deduplicate

Event Correlation

How Correlation Works

KillIT v3 uses a multi-strategy correlation engine that automatically groups related events into incidents. This helps identify root causes and understand the full impact of issues.

Correlation Strategies

The system employs four correlation strategies:

1. Temporal Correlation

Groups events occurring within 5-minute windows
Scores based on time proximity (closer events = higher score)
Catches cascading failures and alert storms

2. Topology Correlation

Uses CMDB relationships to find events from related CIs
Considers relationship types (critical relationships get higher scores)
Identifies impact across connected infrastructure
Example: Database failure → Application errors → Web timeouts

3. Pattern Correlation

Matches events with the same correlation signature
Groups recurring instances of the same issue
Different from deduplication - these are related but distinct events

4. Service Correlation

Groups events affecting the same service or application
15-minute time window for service-related issues
Helps understand service-wide problems

Correlation Process

Event Arrives → Saved and queued for correlation
Worker Processing → Runs all 4 strategies in parallel
Score Merging → Combines results, taking highest scores
Correlation Assignment → Events with score > 0.7 are correlated
Root Cause Analysis → Identifies the earliest critical/major event
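
The score-merging, thresholding, and root-cause steps can be sketched as follows. The function names and the dictionary-based event and score shapes are hypothetical. The sketch takes the per-strategy results as input and does not model the parallel worker.

```python
from datetime import datetime

THRESHOLD = 0.7  # events must score strictly above this to be correlated


def merge_scores(strategy_results):
    """Combine per-strategy {event_id: score} maps, keeping the highest score."""
    merged = {}
    for scores in strategy_results:
        for event_id, score in scores.items():
            merged[event_id] = max(score, merged.get(event_id, 0.0))
    return merged


def correlated_ids(merged, threshold=THRESHOLD):
    """Return the IDs whose merged score exceeds the threshold."""
    return sorted(eid for eid, score in merged.items() if score > threshold)


def root_cause_candidate(events):
    """Pick the earliest critical or major event, or None if there is none."""
    candidates = [e for e in events if e["severity"] in ("critical", "major")]
    return min(candidates, key=lambda e: e["created_at"], default=None)


results = [
    {"evt-2": 0.9, "evt-3": 0.4},  # temporal
    {"evt-2": 0.6, "evt-3": 0.8},  # topology
    {"evt-4": 0.7},                # pattern
    {},                            # service
]
merged = merge_scores(results)
print(correlated_ids(merged))  # ['evt-2', 'evt-3']

group = [
    {"id": "evt-1", "severity": "warning", "created_at": datetime(2025, 6, 17, 9, 59)},
    {"id": "evt-2", "severity": "critical", "created_at": datetime(2025, 6, 17, 10, 0)},
    {"id": "evt-3", "severity": "major", "created_at": datetime(2025, 6, 17, 10, 1)},
]
print(root_cause_candidate(group)["id"])  # evt-2
```

Note that `evt-4` scores exactly 0.7 and is left out, because the rule requires a score greater than 0.7.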

Understanding Correlation Groups

When viewing correlated events, you'll see:

Correlation ID: Unique identifier for the group (e.g., COR-123456789)
Event Count: Number of related events
Time Span: Duration from first to last event
Severity Breakdown: Distribution of event severities
Affected CIs: All Configuration Items involved
Root Cause Candidate: Most likely originating event

Correlation Benefits

Noise Reduction: See one incident instead of hundreds
Root Cause Identification: Quickly find the source
Impact Analysis: Understand full scope
Faster Resolution: Address root cause, not symptoms
Better Prioritization: Focus on critical issues

Viewing Correlation Information

In Event List: Look for correlation badges
In Event Details: Check the Correlation Tab
Correlation Group View: See all related events together

Manual Correlation

If automatic correlation misses related events:

Select primary event
Click "Add to Correlation"
Search for related events
Confirm correlation

Correlation vs Deduplication

Aspect	Deduplication	Correlation
Purpose	Reduce duplicate alerts	Group related incidents
Scope	Same event repeating	Different related events
Result	Single event, count increases	Multiple events, same correlation ID
Time Window	1 hour	5-15 minutes (varies by strategy)
Example	"DB down" × 100 → 1 event	DB down + App errors + User complaints

Real-World Example

Database Incident Timeline:
10:00 - Database CPU hits 95% (Critical)
10:01 - Database connection pool exhausted (Major)
10:02 - App server connection errors (Major) × 5
10:03 - Web server timeouts (Warning) × 10
10:04 - User login failures (Minor) × 50

Correlation Result:
- Correlation ID: COR-20250617-abc123
- Total Events: 67
- Root Cause: Database CPU spike
- Affected Services: Database, Application, Web
- Recommended Action: Scale database resources

AI-Powered Features

Anomaly Detection

The AI analyzes each event for anomalies:

Score 0-30%: Normal behavior
Score 30-70%: Unusual but not critical
Score 70-100%: Highly anomalous, investigate
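
When consuming anomaly scores in a script, the bands can be mapped with a small helper. The ranges above share their boundary values (30% and 70%), so this sketch assumes a boundary score belongs to the higher band. Both that choice and the function name are assumptions, not documented behavior.

```python
def anomaly_band(score_percent):
    """Map an anomaly score (0-100) to the band described in the guide.

    Assumption: scores of exactly 30 or 70 fall into the higher band.
    """
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent < 30:
        return "normal"
    if score_percent < 70:
        return "unusual"
    return "highly anomalous"


print(anomaly_band(12), anomaly_band(45), anomaly_band(88))
# normal unusual highly anomalous
```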

Root Cause Analysis

AI identifies probable root causes by analyzing:

Event timing and sequence
CI relationships and dependencies
Historical patterns
Current system state

Suggested Actions

For each event, AI may suggest:

Immediate remediation steps
Long-term fixes
Automation opportunities
Prevention strategies

Automation

Available Automations

Service Restart: Safely restart failed services
Resource Scaling: Add CPU/memory when needed
Log Rotation: Rotate logs to free space on full disks
Cache Clearing: Reset application caches

Enabling Automation

Review suggested actions in event details
Click "Enable Automation" for approved actions
Monitor automation execution in timeline
Automation results appear in event notes

Creating Custom Automations

Navigate to Settings → Automation
Create new runbook
Define trigger conditions
Add automation steps
Test in non-production first

Best Practices

Response Times

Aim for these targets:

Severity	Acknowledgment	Resolution
Critical	5 minutes	1 hour
Major	15 minutes	4 hours
Minor	1 hour	1 day
Warning	4 hours	1 week
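
For teams that track these targets in their own tooling, the table can be encoded as a lookup. This is a minimal sketch: the `TARGETS` mapping mirrors the table above, while the `check_targets` function and its return shape are hypothetical. Info severity has no row in the table, so the helper returns `None` for it.

```python
from datetime import timedelta

# (acknowledgment target, resolution target) per severity, from the table.
TARGETS = {
    "critical": (timedelta(minutes=5), timedelta(hours=1)),
    "major": (timedelta(minutes=15), timedelta(hours=4)),
    "minor": (timedelta(hours=1), timedelta(days=1)),
    "warning": (timedelta(hours=4), timedelta(weeks=1)),
}


def check_targets(severity, time_to_ack, time_to_resolve):
    """Report whether each target was met, or None if no target is defined."""
    targets = TARGETS.get(severity.lower())
    if targets is None:
        return None
    ack_target, resolve_target = targets
    return {
        "ack_met": time_to_ack <= ack_target,
        "resolution_met": time_to_resolve <= resolve_target,
    }


print(check_targets("Critical", timedelta(minutes=3), timedelta(minutes=90)))
# {'ack_met': True, 'resolution_met': False}
```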

Event Hygiene

Acknowledge Promptly: Shows you're aware and working
Update Regularly: Add notes as you investigate
Document Resolution: Help future troubleshooting
Review Suppressed: Periodically check suppressed events
Learn from Patterns: Use insights to prevent recurrence

Team Collaboration

Use @mentions in notes to loop in experts
Share findings in resolution notes
Create Knowledge Base entries for complex issues
Review post-mortems for major incidents

Reporting

Available Reports

Event Summary: Overview by severity and source
MTTR Analysis: Resolution time trends
Top Issues: Most frequent problems
Team Performance: Who resolves what
SLA Compliance: Meeting service levels

Creating Custom Reports

Navigate to Reports → Event Reports
Select report type
Choose filters and date range
Schedule or run immediately
Export to PDF/Excel
