Best Practices to Define Your Splunk Indexes – Part 1
This is part 1 of a 2-part blog series on best practices for defining Splunk indexes…
Difference between Indexes and Indexers
Newcomers to Splunk often confuse Splunk indexes with Splunk indexers, so I would like to clarify the difference before we dive into how best to configure your Splunk indexes.
Splunk indexers are full Splunk Enterprise instances configured specifically for the function of indexing. In simpler terms, Splunk indexers are Splunk servers which are responsible for the following tasks in a Splunk architecture:
- Store Incoming Data from forwarders, TCP/UDP inputs, API, HEC inputs or other Splunk Servers
- Send the data back to Splunk search heads based on the queries being run by users on the search head
Splunk indexes, on the other hand, are defined to store similar data together on a Splunk indexer. You also define indexes so that you can separate different kinds of data and manage each kind differently.
As an analogy, think of an indexer as a database server that stores all data and serves it to users when queried, whereas indexes are more like tables within the database that contain similar data. For example, all logs from Palo Alto firewalls can be stored in an index “palo_alto” and all Exchange-related logs in an index “exchange”. All these indexes are defined/configured on one indexer or on multiple indexers in an indexer cluster.
Why do you need the best practices?
Time and again we have seen customers run into serious issues due to misconfiguration, and in some cases “no configuration”, of their indexes, i.e. all data going into the default index “main”. You need to configure indexes following best practices to ensure that:
- The client’s/company’s data retention requirements are met, so that government regulations, compliance requirements etc. are satisfied.
- Disk storage does not fill up, which would cause issues like indexing being stopped or even the Splunk service being halted, which in turn can cause searches to fail.
- Required logs are not deleted silently, which would lead to incomplete search results or missing data.
What are the best practices?
From our years of in-field Professional Services experience, we have compiled the following list of best practices. Needless to say, this list is non-exhaustive, and as new features/enhancements arrive in Splunk these recommendations are subject to change:
- Define Volume Based Indexing – This is a relatively newer and not widely known index configuration method. It’s one of our top best practice recommendations. Implemented correctly, it goes a long way towards avoiding the issues highlighted in the previous section.
Compared to the traditional method of defining indexes by giving them a static path, in volume based indexing you define a specific directory on the OS as a volume to be used for storing a particular kind of data. For example, to store your hot and warm buckets in one location and your cold buckets in a second location on a different storage type, you can define your volumes as below in indexes.conf:
[volume:splunk_hot_warm_all]
path = /mnt/splunk_hot_warm
[volume:cold_all]
path = /mnt/splunk_cold
Once you define the volumes, you can refer to them while configuring your indexes:
[idx1]
homePath = volume:splunk_hot_warm_all/idx1
coldPath = volume:cold_all/idx1
As a result, all indexes store their hot, warm and cold buckets in the specified volumes, and you can control the total size a volume may occupy by configuring the “maxVolumeDataSizeMB” parameter. When this limit is hit, Splunk starts removing the oldest buckets across all indexes on that volume so that volume utilization does not exceed the configured size. This helps you avoid situations where Splunk tries to store more data than the disk is sized for, and it lets you keep more data than the minimum retention requirement when your storage permits.
We recommend setting this parameter to no more than 95% of the actual size of the partition, to leave headroom for sudden bursts of freshly indexed data. For example, say a customer has mounted a 500 GB SSD for hot and warm data on the partition /mnt/splunk_hot_warm. The maxVolumeDataSizeMB value should then be 95% of 500 GB, i.e. 0.95 × 500 × 1024 = 486400 MB:
[volume:splunk_hot_warm_all]
path = /mnt/splunk_hot_warm
maxVolumeDataSizeMB = 486400
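The arithmetic above is easy to get wrong when partitions come in odd sizes, so here is a minimal Python sketch of it. The function name and the headroom_pct parameter are hypothetical helpers invented for this illustration, not part of Splunk; the sketch assumes 1 GB = 1024 MB, as in the example above.

```python
def max_volume_data_size_mb(partition_gb: float, headroom_pct: float = 5.0) -> int:
    """Suggest a maxVolumeDataSizeMB value for a partition of partition_gb GB.

    Hypothetical helper: leaves headroom_pct percent of the partition unused,
    following the "no more than 95%" rule of thumb from the text.
    """
    if partition_gb <= 0:
        raise ValueError("partition_gb must be positive")
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in the range [0, 100)")
    usable_gb = partition_gb * (100 - headroom_pct) / 100
    return int(usable_gb * 1024)  # assumes 1 GB = 1024 MB


if __name__ == "__main__":
    # The 500 GB SSD from the example above.
    print(f"maxVolumeDataSizeMB = {max_volume_data_size_mb(500)}")
```

For the 500 GB disk this reproduces the 486400 MB figure used in the volume stanza.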
- Don’t forget to configure paths for acceleration and summary data – We have seen this at many customers: the paths for hot, warm and cold buckets are configured correctly, but the path for summarization results (summaryHomePath) and the path for data model acceleration TSIDX data (tstatsHomePath) are not defined. As a result, Splunk keeps storing this data under the default $SPLUNK_DB path, which is usually $SPLUNK_HOME/var/lib/splunk. In most cases the Splunk root directory is not provisioned with enough storage to handle this, so the disk fills up and again causes severe issues such as searches being stopped, indexing being stopped, or the Splunk service itself stopping. We recommend configuring these parameters using the volume definitions as well. For example:
[idx1]
homePath = volume:splunk_hot_warm_all/idx1
coldPath = volume:cold_all/idx1
tstatsHomePath = volume:cold_all/idx1/datamodel_summary
summaryHomePath = volume:cold_all/idx1/summary
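Every path written as volume:<name>/... has to name a volume stanza that actually exists, and a mistyped volume name is an easy mistake to make. The following is a minimal Python sketch of a check for that, under stated assumptions: the function undefined_volume_refs is a hypothetical helper, it only looks at the text it is given (it knows nothing about volumes defined in other configuration files or shipped by Splunk by default), and the sample text contains a deliberately mistyped volume name for demonstration.

```python
import configparser


def undefined_volume_refs(conf_text: str) -> list[tuple[str, str, str]]:
    """Return (stanza, setting, volume) for every path naming an undefined volume.

    Hypothetical helper: only volumes defined in conf_text itself are known.
    """
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep setting names such as homePath as written
    parser.read_string(conf_text)

    # Names of all [volume:<name>] stanzas present in the text.
    defined = {
        stanza.split(":", 1)[1]
        for stanza in parser.sections()
        if stanza.startswith("volume:")
    }

    problems = []
    for stanza in parser.sections():
        if stanza.startswith("volume:"):
            continue
        for setting, value in parser[stanza].items():
            if value.startswith("volume:"):
                volume = value[len("volume:"):].split("/", 1)[0]
                if volume not in defined:
                    problems.append((stanza, setting, volume))
    return problems


# Sample text with a deliberate typo: "cold1" is not a defined volume.
SAMPLE = """
[volume:splunk_hot_warm_all]
path = /mnt/splunk_hot_warm

[volume:cold_all]
path = /mnt/splunk_cold

[idx1]
homePath = volume:splunk_hot_warm_all/idx1
coldPath = volume:cold1/idx1
"""

if __name__ == "__main__":
    for stanza, setting, volume in undefined_volume_refs(SAMPLE):
        print(f"[{stanza}] {setting} refers to undefined volume '{volume}'")
```

Running a check like this before deploying an indexes.conf change is one way to catch a mismatch between volume stanzas and the paths that reference them.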
- Use the special token $_index_name – This is one of my favorite features, and it makes adding new indexes very easy. It is a token which you can use while defining the paths we discussed above, and it expands at runtime to the name of the stanza, i.e. your index. When you use this token under the default stanza it applies to all indexes, each taking its own stanza/index name. For example, if you want to configure, say, 5 indexes referencing the same volumes but each storing its data in a separate folder named after the index, you can do the following:
[default]
homePath = volume:splunk_hot_warm_all/$_index_name
coldPath = volume:cold_all/$_index_name
tstatsHomePath = volume:cold_all/$_index_name/datamodel_summary
summaryHomePath = volume:cold_all/$_index_name/summary
[volume:splunk_hot_warm_all]
path = /mnt/splunk_hot_warm
[volume:cold_all]
path = /mnt/splunk_cold
[idx1]
[idx2]
[idx3]
[idx4]
[idx5]
This would create 5 indexes, each storing its data in its own folder, such as /mnt/splunk_hot_warm/idx1, /mnt/splunk_hot_warm/idx2 and so on, and /mnt/splunk_cold/idx1, /mnt/splunk_cold/idx2 and so on.
The next time you want to create a new index, you literally add one line, e.g. [idx6], to the bottom of the file, and it will create a new index with its own segregated folders.
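To make the expansion concrete, here is a small Python sketch that imitates it. This is only an illustration of the substitution described above, not how Splunk implements it; the function expand_index_paths and its arguments are hypothetical names made up for this example.

```python
def expand_index_paths(
    templates: dict[str, str],
    volumes: dict[str, str],
    index_names: list[str],
) -> dict[str, dict[str, str]]:
    """Imitate $_index_name expansion and volume resolution for each index.

    Hypothetical helper for illustration only.
    templates: setting name -> path template, as written under [default]
    volumes:   volume name  -> filesystem path of that volume
    """
    resolved = {}
    for name in index_names:
        paths = {}
        for setting, template in templates.items():
            # Substitute the stanza (index) name for the token.
            value = template.replace("$_index_name", name)
            # Swap a leading "volume:<name>" for that volume's path.
            if value.startswith("volume:"):
                volume, _, rest = value[len("volume:"):].partition("/")
                value = f"{volumes[volume]}/{rest}"
            paths[setting] = value
        resolved[name] = paths
    return resolved


VOLUMES = {
    "splunk_hot_warm_all": "/mnt/splunk_hot_warm",
    "cold_all": "/mnt/splunk_cold",
}
TEMPLATES = {
    "homePath": "volume:splunk_hot_warm_all/$_index_name",
    "coldPath": "volume:cold_all/$_index_name",
}

if __name__ == "__main__":
    indexes = ["idx1", "idx2", "idx3", "idx4", "idx5"]
    for index, paths in expand_index_paths(TEMPLATES, VOLUMES, indexes).items():
        for setting, path in paths.items():
            print(f"[{index}] {setting} -> {path}")
```

Adding "idx6" to the list of names is the equivalent of appending the [idx6] stanza: no path has to be written by hand.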
This is the end of part 1 of the 2-part blog series. For other recommendations and a sample index file, please check out the second part of this series…