Best Practices to Define Your Splunk Indexes 2

March 17, 2022

Best Practices to Define Your Splunk Indexes – Part 2

Welcome to the second part of our two-part blog series on best practices for defining Splunk indexes.

Continuing from where we left off in the previous post, here are a few more best practices we recommend following when you define your indexes. To understand the difference between indexes and indexers, please refer to Part 1 of this series.

Configure your Frozen Directory and Time Period – These two settings control when buckets are moved to the archive directory or deleted. They should mirror the client’s retention policy, which dictates how long data must remain searchable and how long logs must be kept in the archive overall. To set the frozen directory, i.e. the path of the archive directory where frozen buckets are stored, use the parameter “coldToFrozenDir”. It cannot contain a volume reference, so it has to be a static path; for ease of use, however, you can still use the $_index_name token in it. Buckets stored under the frozen directory contain only the raw data and are not searchable as such. When required, they can be copied to the index’s thawed path (“thawedPath”) and rebuilt to make them searchable again.

[default]

coldToFrozenDir = /mnt/frozen/$_index_name

To define the age after which Splunk should move data to the frozen directory, configure the parameter “frozenTimePeriodInSecs”. For example, to move all data older than roughly 6 months to the frozen path, you can configure the following:

[idx1]

frozenTimePeriodInSecs = 15780000
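The retention values used in this post are plain day-to-second conversions. The snippet below is a minimal sketch of that arithmetic; the helper name retention_seconds is hypothetical and only serves the illustration. Six months taken as half of a 365.25-day year comes to 15778800 seconds, which the example above rounds to 15780000.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def retention_seconds(days: float) -> int:
    """Convert a retention period in days to a frozenTimePeriodInSecs value."""
    return int(round(days * SECONDS_PER_DAY))


# Half of a 365.25-day year, i.e. roughly six months.
print(retention_seconds(365.25 / 2))  # 15778800

# One average Gregorian year (365.2425 days).
print(retention_seconds(365.2425))  # 31556952
```

Small differences such as 15778800 versus 15780000 amount to a few minutes of retention and do not matter in practice.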

Note that some log sources are very low volume and can therefore be stored for longer than the minimum requirement. In such cases you can raise “frozenTimePeriodInSecs” accordingly, or keep it arbitrarily large so that buckets are frozen only when other limits such as “maxVolumeDataSizeMB” are hit.

Finally, to comply with your data retention policy, you can schedule an OS-level script that periodically deletes old logs from the archive directory. Following is an example of a command that deletes files whose modification time is more than 2 years (730 days) old:

find /mnt/frozen -type f -mtime +730 -exec rm {} \;
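If you prefer to review what would be removed before deleting anything, the same cleanup can be sketched in Python. This is an illustrative sketch, not a Splunk tool: the function names expired_files and purge are hypothetical, and it defaults to a dry run.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def expired_files(root, max_age_days, now=None):
    """Return files under root whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge(root, max_age_days, dry_run=True):
    """Delete expired files; with dry_run=True only report what would be deleted."""
    victims = expired_files(root, max_age_days)
    for path in victims:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return victims
```

Calling purge("/mnt/frozen", 730) lists the candidates; passing dry_run=False actually removes them. Like the find command, this leaves empty directories behind.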

Change Default maxTotalDataSizeMB – The default value of maxTotalDataSizeMB, i.e. the maximum size of an index, is 500000 MB (about 500 GB). This default may not be appropriate for high-volume log sources that need to be retained for a specific period of time. In such cases, one option is to measure the actual daily log volume for the specific index and set the value accordingly. Another, easier option is to set this value arbitrarily high, for example 5 TB, and let the data age out through other parameters such as maxVolumeDataSizeMB and frozenTimePeriodInSecs. This approach avoids the overhead of sizing each index individually and takes advantage of the flexibility of volume-based storage limits.
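For the first option, a rough size estimate can be derived from the daily volume and the retention period. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the 0.5 on-disk ratio is a commonly quoted rule of thumb rather than a figure from this post, and the 25% headroom is arbitrary. Measure your own indexes before relying on any of these numbers.

```python
import math

DEFAULT_MAX_TOTAL_DATA_SIZE_MB = 500_000  # Splunk default, about 500 GB


def estimate_max_total_data_size_mb(daily_raw_gb, retention_days,
                                    disk_ratio=0.5, headroom=1.25):
    """Estimate maxTotalDataSizeMB for one index.

    disk_ratio: assumed on-disk size relative to raw volume (rule of thumb).
    headroom:   assumed safety margin on top of the estimate.
    """
    on_disk_gb = daily_raw_gb * retention_days * disk_ratio * headroom
    return math.ceil(on_disk_gb * 1024)


# Example: a source sending 20 GB/day that must stay searchable for 180 days.
estimate = estimate_max_total_data_size_mb(daily_raw_gb=20, retention_days=180)
print(estimate)                                   # 2304000
print(estimate > DEFAULT_MAX_TOTAL_DATA_SIZE_MB)  # True
```

In this example the default of 500000 MB would start freezing buckets well before the 180-day retention target is reached.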
Consider using SmartStore – SmartStore, Splunk’s remote storage feature, is another useful architecture option if the customer already has, or can use, a public cloud provider such as AWS or GCP for storing their logs.

Once SmartStore is configured, warm buckets are uploaded to the remote store in the cloud. Only the hot buckets, i.e. the most recent data, and cached copies of the buckets being searched most frequently stay on the on-premises servers. As a result, the customer has far less log storage to manage on-prem.

Following is a sample indexes.conf configuration for SmartStore using AWS S3 buckets:
[volume:s3volume]

storageType = remote

path = s3://rest/of/path

remote.s3.url_version = v2

remote.s3.endpoint = https://bucketname.whatever.customer.com

# MIGRATE ALL INDEXES

[default]

remotePath = volume:s3volume/$_index_name

Set Replication Factor to Auto in an indexer cluster – The parameter “repFactor” determines whether an index gets replicated. Its default is 0, which means no replication, so in an indexer cluster it must be set to “auto” for the cluster to make copies of the index according to the configured replication factor. We have seen clients miss this setting, with the result that the corresponding index is not replicated. The setting should be identical on all indexers in the cluster, so push it from the cluster master.
# Replicate ALL INDEXES

[default]

repFactor = auto
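Because a missed repFactor is easy to overlook, a quick check of an indexes.conf file can help. The sketch below uses Python’s configparser to list index stanzas that do not resolve to repFactor = auto. It is a simplified illustration: the function name is hypothetical, it looks at a single file only, and it treats [default] as the inherited stanza. On a real deployment, the effective merged configuration is better inspected with splunk btool indexes list.

```python
import configparser


def indexes_without_replication(conf_text):
    """Return index stanzas whose effective repFactor is not 'auto'."""
    parser = configparser.ConfigParser(
        default_section="default",  # values in [default] apply to every stanza
        interpolation=None,         # keep tokens like $_index_name untouched
        strict=False,
    )
    parser.optionxform = str        # setting names are case-sensitive
    parser.read_string(conf_text)

    missing = []
    for name in parser.sections():
        if name.startswith("volume:"):
            continue                # volume stanzas are not indexes
        if parser.get(name, "repFactor", fallback="0") != "auto":
            missing.append(name)
    return missing


SAMPLE = """
[default]
repFactor = auto

[volume:cold]
path = /splunkcold

[paloalto]

[windows]
repFactor = 0
"""

print(indexes_without_replication(SAMPLE))  # ['windows']
```

Here [paloalto] inherits repFactor = auto from [default], while [windows] overrides it and is reported.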

Other Parameters to modify as per requirements: Apart from the above best practices, there are a number of other parameters you can define or modify to change the way Splunk stores indexed data, tailoring it to the client’s environment and requirements. We mention them briefly here for easy reference:
1. maxWarmDBCount – The maximum number of warm buckets. Default: 300
2. thawedPath – An absolute path that contains the thawed (resurrected) databases for the index.
3. maxGlobalDataSizeMB – The maximum size, in megabytes, for all warm buckets in a SmartStore index on a cluster.
4. maxHotBuckets – Maximum number of hot buckets that can exist per index.

Sample Indexes Configuration File:

[default]

frozenTimePeriodInSecs = 31556952

maxTotalDataSizeMB = 100000

homePath = volume:hot_warm/$_index_name/db

coldPath = volume:cold/$_index_name/colddb

thawedPath = /splunkcold/$_index_name/thaweddb

summaryHomePath = volume:cold/$_index_name/summary

tstatsHomePath = volume:cold/$_index_name/datamodel_summary

coldToFrozenDir = /splunkarchive/$_index_name/frozen

[volume:hot_warm]

path = /splunk_hot_warm

maxVolumeDataSizeMB = 4590000

[volume:cold]

path = /splunkcold

maxVolumeDataSizeMB = 13680000

[paloalto]

[trendmicro]

[email_gw_bc]

[web_sec_fw_bc]

[web_app_fq_bc]

[nx_fireeye]

[ex_fireeye]

[efc]

[ata]

[windows]

[wineventlog]

Further Reading/References:

https://docs.splunk.com/Documentation/Splunk/8.2.5/Indexer/Aboutindexesandindexers

https://docs.splunk.com/Documentation/Splunk/latest/admin/Indexesconf

https://docs.splunk.com/Documentation/Splunk/8.2.5/Indexer/ConfigureremotestoreforSmartStore

You have reached the end of this two-part blog series. If you have any feedback or suggestions, feel free to drop a note to the author of this series – bharat.jindal@citrusconsulting.com
