How to Set Up Your Linux OS for Splunk Installation Correctly
Splunk Enterprise, the leading SIEM and data analytics solution, ships as a software installer for several operating systems, including Linux, Windows, and macOS. Splunk does not provide ISO or OVA images that can be spun up, OS included, directly on hypervisors such as Microsoft Hyper-V or VMware ESXi. The responsibility therefore rests with clients to install their preferred OS and harden it to their organization’s needs. Once this is done, usually by the client’s system admin team, the VMs are handed over to the Splunk consultant to begin Splunk installation and configuration.
Time and again we have seen that OS-level settings, the way the VM is provisioned, or the way disk storage is mounted on different servers lead to degraded Splunk performance, and at times to serious indexing issues, because indexing requires large, dedicated space to store the hot, warm, cold, and frozen buckets.
In this blog post, we describe the best practices and provide the commands that Splunk consultants and the client’s system teams can follow to set up the OS correctly before starting the actual Splunk installation. We use Linux as the reference OS because it is the most widely used and the most recommended OS for Splunk Enterprise, given the efficiency and security it offers compared to the other options.
By following this procedure, you will also eliminate the risk of hitting OS-imposed limits that impede high-performance Splunk operation, regardless of how the Splunk process is started or whether any boot-time race conditions exist.
General Considerations for all Splunk servers
- Setting Ulimits and Transparent Huge Pages – The following steps describe the recommended way to set file descriptors (open files), max processes, and max file size via ulimits, and to disable transparent huge pages (THP) memory allocation. We have found it necessary to do ALL of these steps to handle the different complexities of getting this right, especially to ensure that the settings survive a restart AND are actually applied to the Splunk processes. Follow these steps on every part of your Splunk server infrastructure (all search heads, indexers, heavy forwarders, and syslog forwarders).
Step 1: Set ulimit settings in /etc/security/limits.conf as root
If this system is running Splunk as a user called “splunk,” add the following to the bottom of the /etc/security/limits.conf file:
splunk hard core 0
splunk hard maxlogins 10
splunk soft nofile 65535
splunk hard nofile 65535
splunk soft nproc 20480
splunk hard nproc 20480
splunk soft fsize unlimited
splunk hard fsize unlimited
If you are running Splunk as root or some other user, replace “splunk” with the name of that user, or with an asterisk (*) to apply the settings to all users. Note that applying the settings to all users is unsafe unless the machine is used ONLY for Splunk (which should be the case anyway).
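Because Step 1 is often repeated across many servers, it can help to make the edit idempotent. The following is a minimal sketch, not an official Splunk tool: add_splunk_limits is a hypothetical helper name, and it assumes the user name contains no regular-expression characters (so it is not suitable for the * wildcard entry).

```shell
# Sketch: append the Splunk ulimit entries to a limits file, skipping any
# entry that is already present. add_splunk_limits is a hypothetical helper.
add_splunk_limits() {
    file=$1
    user=${2:-splunk}
    while read -r kind item value; do
        # Only append when no line for this user/kind/item exists yet.
        if ! grep -Eq "^[[:space:]]*${user}[[:space:]]+${kind}[[:space:]]+${item}[[:space:]]" "$file" 2>/dev/null; then
            printf '%s %s %s %s\n' "$user" "$kind" "$item" "$value" >> "$file"
        fi
    done <<EOF
hard core 0
hard maxlogins 10
soft nofile 65535
hard nofile 65535
soft nproc 20480
hard nproc 20480
soft fsize unlimited
hard fsize unlimited
EOF
}

# Example (run as root):
#   add_splunk_limits /etc/security/limits.conf splunk
```

Running the helper a second time leaves the file unchanged, which makes it safe to include in a provisioning script.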
Step 2: Add a script to /etc/rc.d/rc.local
Note that the information in this step pertains to Red Hat Enterprise Linux (RHEL) and its derivatives. Most of these systems expose the THP settings in /sys/kernel/mm/transparent_hugepage, but some RHEL releases (RHEL 6 kernels, for example) use a slightly different path: /sys/kernel/mm/redhat_transparent_hugepage/. You may need to modify the code in this step to reflect the correct path for your system. Use the ls command to confirm which one exists on your system:
[root@deploy ~]# ls /sys/kernel/mm/redhat_transparent_hugepage/
ls: cannot access /sys/kernel/mm/redhat_transparent_hugepage/: No such file or directory
[root@deploy ~]# ls /sys/kernel/mm/transparent_hugepage/
defrag enabled khugepaged use_zero_page
[root@deploy ~]#
The THP settings live in one of two locations, depending on whether you are using Red Hat Enterprise Linux or a variant distribution. Add the following to the bottom of your /etc/rc.d/rc.local file:
#SPLUNK: disable THP at boot time
# The wildcard matches both transparent_hugepage and redhat_transparent_hugepage
THP=$(find /sys/kernel/mm/ -name '*transparent_hugepage' -type d | tail -n 1)
for SETTING in "enabled" "defrag"; do
    if test -f "${THP}/${SETTING}"; then
        echo never > "${THP}/${SETTING}"
    fi
done
Step 3: Make /etc/rc.d/rc.local executable
Run the following command as root:
chmod +x /etc/rc.d/rc.local
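To see which THP mode is currently active, note that the kernel marks the active value in square brackets inside the enabled and defrag files (for example, "always madvise [never]"). The sketch below reads that value; thp_active_value is a hypothetical helper name, and the two directories are the ones mentioned in Step 2.

```shell
# thp_active_value: print the bracketed (active) value from a THP sysfs file.
thp_active_value() {
    # The file holds a single line such as: always madvise [never]
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Report the active mode for "enabled" and "defrag" wherever they exist.
for dir in /sys/kernel/mm/transparent_hugepage /sys/kernel/mm/redhat_transparent_hugepage; do
    for setting in enabled defrag; do
        if [ -f "$dir/$setting" ]; then
            echo "$dir/$setting: $(thp_active_value "$dir/$setting")"
        fi
    done
done
```

After the reboot in Step 4, both files should report never if the rc.local script ran as intended.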
Step 4: Reboot the OS, and confirm within splunkd.log that your Splunk process is running with the correct settings.
Don’t just restart Splunk. Reboot your OS. Allow Splunk to start on its own. Then run the following to confirm your settings are correct:
cat /opt/splunk/var/log/splunk/splunkd.log | grep ulimit
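As a cross-check that does not depend on the log, Linux exposes the limits actually applied to a running process in /proc/<pid>/limits. The sketch below extracts the soft and hard values for one named limit; proc_limit is a hypothetical helper name, and the pgrep call in the usage comment assumes the main Splunk process is named splunkd.

```shell
# proc_limit: print "soft hard" for a named limit from a /proc/<pid>/limits file.
proc_limit() {
    limits_file=$1
    name=$2
    awk -v name="$name" '
        index($0, name) == 1 {
            rest = substr($0, length(name) + 1)   # text after the limit name
            split(rest, f, " ")                   # f[1] = soft, f[2] = hard
            print f[1], f[2]
        }' "$limits_file"
}

# Example (assumes a running process named splunkd):
#   pid=$(pgrep -o -x splunkd)
#   proc_limit "/proc/$pid/limits" "Max open files"
#   proc_limit "/proc/$pid/limits" "Max processes"
```

If the values printed here differ from what you set in limits.conf, the limits were not applied to the Splunk process even though the file is correct.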
- Turn Off SELinux – When SELinux is enabled at the OS level, it has been observed to restrict operations that Splunk needs for normal functioning. We therefore recommend turning it off or, if that is not possible, at least putting it in “permissive” mode.
Open the config file and set the parameter as shown below:
vi /etc/selinux/config
SELINUX=disabled
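If you prefer to script this change rather than edit the file by hand, a sketch along the following lines may help. set_selinux_mode is a hypothetical helper name; it rewrites only the SELINUX= line and leaves SELINUXTYPE= and comments untouched.

```shell
# set_selinux_mode: set SELINUX=<mode> in the given SELinux config file.
set_selinux_mode() {
    config=$1
    mode=$2
    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "unknown SELinux mode: $mode" >&2; return 1 ;;
    esac
    scratch=$(mktemp) || return 1
    # Write through a scratch file, then copy back so the original file
    # keeps its ownership and permissions.
    if sed "s/^SELINUX=.*/SELINUX=$mode/" "$config" > "$scratch"; then
        cat "$scratch" > "$config"
        status=$?
    else
        status=1
    fi
    rm -f "$scratch"
    return $status
}

# Example (run as root):
#   set_selinux_mode /etc/selinux/config permissive
#   setenforce 0     # switch the running system to permissive until reboot
#   getenforce       # print the current mode
```

The change in /etc/selinux/config takes effect at the next boot, which is why the usage example also calls setenforce for the running system.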
- Check firewalld – If company policy requires an OS-level firewall, make sure you open the required Splunk ports on the OS. The following commands are useful:
# firewall-cmd --list-ports
# firewall-cmd --permanent --add-port={8000/tcp,8089/tcp,22/tcp}
# firewall-cmd --reload
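The braces in the add-port command rely on bash brace expansion, which turns the single argument into one --add-port flag per port. In shells without brace expansion, or when the port list comes from a variable, the flags can be built explicitly. This is only a sketch; port_args is a hypothetical helper name, and the ports in the usage comment are the Splunk ports from the commands above.

```shell
# port_args: print one firewall-cmd --add-port flag per TCP port given.
port_args() {
    for port in "$@"; do
        printf '%s\n' "--add-port=${port}/tcp"
    done
}

# Example (run as root; the unquoted substitution is deliberate so that
# each flag becomes a separate argument):
#   firewall-cmd --permanent $(port_args 8000 8089)
#   firewall-cmd --reload
#   firewall-cmd --query-port=8000/tcp   # reports whether the port is open
```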
- Don’t Run Splunk as Root – As a security best practice, do NOT run Splunk as the root user. Instead, create a new user “splunk” and group “splunk”, and use them for installing, configuring, and managing Splunk files.
You can either use the .rpm installation package from Splunk, which creates the required user and installs Splunk under it, or create the required environment manually using the following steps:
Step 1. Create the splunk user and group
# groupadd -g 1000 splunk              # create the splunk group
# useradd -u 1000 -g splunk splunk     # create the splunk user
# passwd splunk                        # set the splunk user's password
Step 2. Give the splunk user sudo privileges.
# visudo
Edit the sudoers file and add the line for the splunk user (the last line shown below).
## The COMMANDS section may have other options added to it.
##
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
splunk ALL=(ALL:ALL) ALL
Step 3. Copy the Splunk archive to /opt and extract it there
# cp splunk-<>.tgz /opt/
# tar -zxvf /opt/splunk-<>.tgz -C /opt
Step 4. Change ownership of the Splunk directory so the application can run as the splunk user.
# chown -R splunk:splunk /opt/splunk/
Step 5. Log in as splunk and start the Splunk application.
# su splunk
# cd /opt/splunk/bin
# ./splunk start
Step 6. Log in as root and enable the application to start at boot time as the splunk user.
# su root
# cd /opt/splunk/bin
# ./splunk enable boot-start -user splunk
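A common follow-up problem is that some files under /opt/splunk end up owned by root, for example when a Splunk command was run as root after the chown in Step 4. A quick way to spot this is to list everything not owned by the expected user. This is a sketch; not_owned_by is a hypothetical helper name.

```shell
# not_owned_by: list paths under DIR that are not owned by USER.
not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Example: show the first few offending paths, if any.
#   not_owned_by /opt/splunk splunk | head
```

An empty result means the ownership from Step 4 is intact. Otherwise, rerunning the chown command from Step 4 should bring the listed paths back under the splunk user.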
Storage Consideration for Indexers
Splunk indexed data goes through various stages during its lifecycle as shown below:
Hot Bucket > Warm Bucket > Cold Bucket > Frozen/Archived > Thawed (manual process)
This allows different buckets to be stored on different storage types, which in turn helps improve efficiency and reduce storage costs. Below are the recommended configurations for each bucket/storage type, with example indexes.conf parameters that clients can use.
Bucket Type | Searchability | Sample indexes.conf Configuration | Minimum IOPS | Example Storage Media |
Hot and Warm Buckets | Searchable | [volume:hot1] path = /mnt/fast_disk [idx1] homePath = volume:hot1/idx1 | 1200 and above | SSD |
Cold Buckets | Searchable | [volume:cold2] path = /mnt/big_disk2 [idx1] coldPath = volume:cold2/idx1 | 800 and above | SSD or HDD |
Archived Buckets | Non-Searchable | [default] coldToFrozenDir = /mnt/archived/$_index_name | Not critical | NAS, tape |
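The configuration cells in the table are flattened onto one line. In an actual indexes.conf, each stanza header and each setting sits on its own line. The sketch below writes the sample stanzas to a file of your choice; write_sample_indexes_conf is a hypothetical helper name, the mount points are the example paths from the table, and the cold volume is assumed to be named cold2 throughout. A complete index definition needs further settings (thawedPath, for instance) that the table does not cover.

```shell
# write_sample_indexes_conf: write the table's sample stanzas to FILE,
# one setting per line.
write_sample_indexes_conf() {
    # The quoted 'EOF' stops the shell from expanding $_index_name,
    # which Splunk itself is meant to interpret.
    cat > "$1" <<'EOF'
[volume:hot1]
path = /mnt/fast_disk

[volume:cold2]
path = /mnt/big_disk2

[idx1]
homePath = volume:hot1/idx1
coldPath = volume:cold2/idx1

[default]
coldToFrozenDir = /mnt/archived/$_index_name
EOF
}

# Example: write to a scratch file and review it before copying anything
# into your Splunk configuration.
#   write_sample_indexes_conf /tmp/indexes.conf.sample
```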