Splunk Storage Sizing

Input data

Estimate the average daily volume of data to be ingested. The more data you send to Splunk Enterprise, the more time Splunk needs to index it into results that you can search, report, and alert on.

Alternatively, estimate the volume from a number of events per second; this calculation assumes a typical event size.

Events per Second

Average Event Size

Daily Data Volume

Raw Compression Factor

Metadata Size Factor

Daily Data Volume = events/s * average event size (bytes) * 3600 seconds/hour * 24 hours/day
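As a quick check, the formula can be evaluated directly; the 500 events/s rate and 300-byte average event size below are hypothetical example inputs:

```python
# Sketch: daily ingest volume from an event rate.
# The rate and event size are hypothetical example inputs.
def daily_data_volume_gb(events_per_second, avg_event_bytes):
    """Daily volume in GB: events/s * bytes/event * 3600 s/hour * 24 hours/day."""
    bytes_per_day = events_per_second * avg_event_bytes * 3600 * 24
    return bytes_per_day / 1024**3  # bytes -> GB

print(round(daily_data_volume_gb(500, 300), 1))  # ~12.1 GB/day
```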


Data Retention

Specify the amount of time to retain data in each category. Data is rolled from one category to the next depending on its age.

Retention Time

Hot, Warm
Cold
Archived (Frozen)

Total = XX
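A commonly cited Splunk rule of thumb is that indexed data occupies roughly half its raw size (compressed rawdata at about 15% of raw, index/metadata files at about 35%), and that frozen archives keep only the compressed rawdata. Under that assumption, tier sizes can be sketched as follows; the daily volume and retention periods are hypothetical:

```python
# Sketch of per-tier storage, assuming the ~15% rawdata + ~35% metadata
# rule of thumb. Daily volume and retention periods are hypothetical.
RAW_COMPRESSION_FACTOR = 0.15  # compressed rawdata vs. raw size
METADATA_SIZE_FACTOR = 0.35    # index (tsidx) files vs. raw size

def tier_storage_gb(daily_gb, retention_days):
    """Storage for a searchable tier holding `retention_days` of data."""
    return daily_gb * (RAW_COMPRESSION_FACTOR + METADATA_SIZE_FACTOR) * retention_days

daily_gb = 100                                      # hypothetical daily ingest
hot_warm = tier_storage_gb(daily_gb, 30)            # 30 days hot/warm
cold = tier_storage_gb(daily_gb, 60)                # 60 days cold
archived = daily_gb * RAW_COMPRESSION_FACTOR * 275  # frozen keeps rawdata only
print(hot_warm, cold, archived)
```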

Architecture

Specify the number of indexer nodes required. The more data you ingest, the more nodes you need. Adding nodes improves both indexing throughput and search performance.

Use Case / App

Max. Volume per Indexer

Number of Nodes: node(s)
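The node count is effectively a ceiling division of the daily ingest volume by the per-indexer capacity. A sketch, where the 100 GB/day per indexer is a hypothetical planning figure, not a Splunk limit:

```python
import math

def indexers_needed(daily_gb, max_gb_per_indexer):
    """Minimum indexer count to absorb the daily ingest volume."""
    return max(1, math.ceil(daily_gb / max_gb_per_indexer))

print(indexers_needed(250, 100))  # 3 node(s)
```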

Storage Required

This is a breakdown of the overall storage requirement.

(per Indexer) (all Indexers)
Hot, Warm
Cold
Archived
Total

Storage Configuration

Specify the location of each storage volume. If possible, spread each type of data across separate volumes to improve performance: hot/warm data should be on the fastest disk, cold data on slower disk, and archived data on the slowest disk.


Configuration Files

This is an example configuration file that describes the volume configuration for each data type. Note: the paths should be modified to point to each disk type. The calculation assumes that all data is stored in the main index.

indexes.conf

# volume definitions

[volume:]
path = /mnt/
maxVolumeDataSizeMB =

[volume:]
path = /mnt/
maxVolumeDataSizeMB =

[volume:]
path = /mnt/
maxVolumeDataSizeMB =

[volume:]
path = /mnt/

# index definition (calculation is based on a single index)

[main]
homePath = volume:/defaultdb/db
coldPath = volume:/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
homePath.maxDataSizeMB =
coldPath.maxDataSizeMB =
maxWarmDBCount = 4294967295
frozenTimePeriodInSecs =
maxDataSize =
coldToFrozenDir = /mnt//defaultdb/frozendb
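For illustration, a filled-in version might look like the following; the volume names, mount points, size limits, and retention value are all hypothetical examples, not recommendations:

```
# volume definitions (names, paths, and sizes are hypothetical examples)
[volume:hot]
path = /mnt/fast_disk
maxVolumeDataSizeMB = 500000

[volume:cold]
path = /mnt/slow_disk
maxVolumeDataSizeMB = 1000000

# index definition
[main]
homePath = volume:hot/defaultdb/db
coldPath = volume:cold/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
# 365 days before buckets roll to frozen
frozenTimePeriodInSecs = 31536000
coldToFrozenDir = /mnt/archive_disk/defaultdb/frozendb
```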

Detailed Storage on Volume 1 for Buckets

Specify the RAID level, the size of individual disks, and the contingency required for this volume. RAID configurations that stripe yield significantly better performance than parity-based RAID; that is, RAID 0, 10, and 0+1 give the best performance, while RAID 5 gives the worst.

Splunk does not recommend parity-based RAID, since it has a very negative impact on performance.
Disk Space Contingency:
???

The selected storage configuration would typically be expected to achieve about IOPS when doing 100% read operations, and about IOPS for 100% write operations. These numbers assume that the array is dedicated to Splunk and consists of the specified disk(s) (typically 200 IOPS per disk).

                       (per Indexer)   (all Indexers)
Number of Disks        ???             ???
Physical Disk Space    ???             ???
Effective Disk Space   ???             ???
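The per-indexer numbers follow from standard RAID arithmetic. A sketch, assuming roughly 200 IOPS per spinning disk; the disk count and size are hypothetical:

```python
# Sketch of standard RAID arithmetic; disk count/size are hypothetical
# and the 200 IOPS figure is a typical value for a spinning disk.
def raid_estimate(level, n_disks, disk_gb, iops_per_disk=200):
    """Return (read IOPS, write IOPS, usable GB) for an array."""
    read_iops = n_disks * iops_per_disk       # reads can hit all spindles
    if level == "raid10":
        usable_gb = n_disks // 2 * disk_gb    # mirrored pairs halve capacity
        write_iops = read_iops // 2           # each write lands on two disks
    elif level == "raid5":
        usable_gb = (n_disks - 1) * disk_gb   # one disk's worth of parity
        write_iops = read_iops // 4           # read-modify-write penalty
    elif level == "raid0":
        usable_gb = n_disks * disk_gb         # pure striping, no redundancy
        write_iops = read_iops
    else:
        raise ValueError(f"unsupported level: {level}")
    return read_iops, write_iops, usable_gb

print(raid_estimate("raid10", 8, 1000))  # (1600, 800, 4000)
```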

Detailed Storage on Volume 2 for Buckets

Specify the RAID level, the size of individual disks, and the contingency required for this volume. RAID configurations that stripe yield significantly better performance than parity-based RAID; that is, RAID 0, 10, and 0+1 give the best performance, while RAID 5 gives the worst.

Splunk does not recommend parity-based RAID, since it has a very negative impact on performance.
Disk Space Contingency:
???

The selected storage configuration would typically be expected to achieve about IOPS when doing 100% read operations, and about IOPS for 100% write operations. These numbers assume that the array is dedicated to Splunk and consists of the specified disk(s) (typically 200 IOPS per disk).

                       (per Indexer)   (all Indexers)
Number of Disks        ???             ???
Physical Disk Space    ???             ???
Effective Disk Space   ???             ???

Detailed Storage on Volume 3 for Buckets

Specify the RAID level, the size of individual disks, and the contingency required for this volume. RAID configurations that stripe yield significantly better performance than parity-based RAID; that is, RAID 0, 10, and 0+1 give the best performance, while RAID 5 gives the worst.

Splunk does not recommend parity-based RAID, since it has a very negative impact on performance.
Disk Space Contingency:
???

The selected storage configuration would typically be expected to achieve about IOPS when doing 100% read operations, and about IOPS for 100% write operations. These numbers assume that the array is dedicated to Splunk and consists of the specified disk(s) (typically 200 IOPS per disk).

                       (per Indexer)   (all Indexers)
Number of Disks        ???             ???
Physical Disk Space    ???             ???
Effective Disk Space   ???             ???

Summary Storage for Buckets

Specify the price per GB of storage. Hot/Warm data should be on the most expensive disk, cold data on cheaper disk and archived data on the cheapest disk.

              Price per GB   Size   Total Price
Hot, Warm     ?              ?
Cold          ?              ?
Archived      ?              ?
Total                               ?
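The summary is a straight sum of size times price per tier; a sketch with hypothetical sizes and prices:

```python
# Sketch: total cost as the sum of size * price per tier.
# All sizes and prices below are hypothetical.
tiers = {
    "hot_warm": {"gb": 1500, "price_per_gb": 1.00},  # fastest, most expensive
    "cold":     {"gb": 3000, "price_per_gb": 0.50},
    "archived": {"gb": 4000, "price_per_gb": 0.10},  # slowest, cheapest
}

total_price = sum(t["gb"] * t["price_per_gb"] for t in tiers.values())
print(total_price)  # 1500 + 1500 + 400 = 3400.0
```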