Linux offers a lot of flexibility, but with that flexibility comes a lot of room for improvement.
Since kernel/distro maintainers can't guess where your Linux installation will be used, they make general assumptions and choose safe defaults.
However, those defaults may not always be the optimal ones.
Here are some general tips for picking the right options.
Note
Some kernels don't include all of the features/configs listed here.
To check whether your kernel offers something, run zcat /proc/config.gz | grep CONFIG_X,
where CONFIG_X is the option the "Needs" line mentions.
Important
Please consult external sources as well, since I'm no expert in any of this.
Everything written here comes from my own personal testing/experience.
Caution
Please read through each section and benchmark things yourself.
Blindly copy/pasting things from here may lead to undesirable behavior.
It's common knowledge that your CPU/GPU runs at different clock speeds depending on the workload, but if you don't care about energy efficiency and want to improve latency/performance, you can switch to the performance governor.
This can be done via:
CPU: echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Needs: CONFIG_CPU_FREQ_GOV_PERFORMANCE
GPU: echo performance | tee /sys/class/devfreq/*.gpu/governor
Needs: CONFIG_DEVFREQ_GOV_PERFORMANCE
Other: Same as GPU if available.
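Before switching, it's worth checking what your kernel actually exposes. The sketch below assumes the standard sysfs cpufreq layout (per-policy directories); systems without cpufreq simply print nothing.

```shell
# List the available and current governor for each cpufreq policy.
# Policies without the expected files are skipped.
policies=0
for policy in /sys/devices/system/cpu/cpufreq/policy*; do
    [ -d "$policy" ] || continue
    policies=$((policies + 1))
    echo "$(basename "$policy") available: $(cat "$policy/scaling_available_governors")"
    echo "$(basename "$policy") current:   $(cat "$policy/scaling_governor")"
done
echo "policies found: $policies"
```

If performance isn't listed under "available", the governor isn't compiled into your kernel (see the "Needs" lines above).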
On systems with a mix of slow and fast cores, you may find that running a multi-threaded process on all cores is slower than running it on just the fast cores.
This is usually not very noticeable, but it can be a severe performance bottleneck on some systems.
When you run a multi-threaded process on a heterogeneous system, your program will only run as fast as the slowest core it can access.
So even if you have really fast cores, as long as one slow core is available for the program to use,
the fast cores will have to wait until the slow core is done processing.
Additionally, since the fast and slow cores "live on separate nodes", there's a slight data-transfer penalty when the cores have to communicate with one another.
To alleviate this, you can limit which cores a program runs on using the taskset command.
Example 1: taskset -c 0-3 stress -c 4
This will stress cores 0,1,2,3
Example 2: taskset -c 4-7 stress -c 4
This will stress cores 4,5,6,7
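To know which core ranges to pass to taskset, you first need to tell the fast cores from the slow ones. A minimal sketch, assuming the standard per-CPU sysfs cpufreq files (cores without cpufreq support are skipped):

```shell
# Print each CPU's maximum frequency; on heterogeneous systems the
# fast cores will show a visibly higher cpuinfo_max_freq value.
listed=0
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    f="$cpu/cpufreq/cpuinfo_max_freq"
    [ -r "$f" ] || continue
    listed=$((listed + 1))
    echo "$(basename "$cpu"): $(cat "$f") kHz"
done
echo "CPUs listed: $listed"
```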
Networking on Linux can be complicated; there's far too much stuff to mess with or fine-tune.
The following section has only been tested on small home networks; your mileage may vary on high-end/many-user networks.
(Please run benchmarks and tests to verify that the following changes/recommendations have a positive impact.)
Maximum Transmission Unit (MTU) refers to the largest packet that can be sent over a network.
Ethernet tends to default to 1500, but most modern NICs support "Jumbo Frames", an MTU of 9000.
A larger MTU means you can pack more data into a single packet, allowing for more throughput; HOWEVER, this also means that if a packet is corrupted or dropped, more data has to be re-transmitted.
It's generally recommended to set your MTU to the maximum your NIC supports on lossless networks. (Ethernet)
Example: ip link set dev eth0 mtu 9000
Note: If you have a bridge device, be sure to also change its MTU.
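Every hop between two machines has to accept jumbo frames for them to work, so it's worth verifying the path before relying on MTU 9000. A sketch, assuming an eth0 interface and a reachable peer at 192.168.1.1 (both are example names):

```shell
# Raise the MTU, then send unfragmented jumbo pings across the path.
# -M do forbids fragmentation; the payload is 8972 bytes because
# 9000 minus 20 (IPv4 header) minus 8 (ICMP header) = 8972.
# If the pings fail, something on the path can't handle MTU 9000.
ip link set dev eth0 mtu 9000
ping -c 3 -M do -s 8972 192.168.1.1
```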
Congestion Control refers to how a machine handles high network traffic.
For wired networks I personally found that H-TCP seems to be the most reliable, but again, this highly depends on your network and hardware, so please double-check.
For wireless networks, BBR or BIC might be better; again, please double-check.
H-TCP seems to offer the fewest drops/retries with the highest real-world bandwidth.
BBR/BIC seem to produce lots of drops/retries with absurdly high "throughput".
Changing the algorithm:
H-TCP: sysctl -w net.ipv4.tcp_congestion_control=htcp
Needs: CONFIG_TCP_CONG_HTCP
BBR: sysctl -w net.ipv4.tcp_congestion_control=bbr
Needs: CONFIG_TCP_CONG_BBR
BIC: sysctl -w net.ipv4.tcp_congestion_control=bic
Needs: CONFIG_TCP_CONG_BIC
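The kernel will reject an algorithm that isn't built in or loaded, so check what's on offer first. A sketch reading the standard procfs entry (only algorithms that are compiled in, or whose modules are loaded, appear in it):

```shell
# Read the list of congestion control algorithms the running
# kernel currently offers; fall back to a placeholder if the
# procfs entry isn't present on this system.
avail_file=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$avail_file" ]; then
    algorithms=$(cat "$avail_file")
else
    algorithms="unknown (procfs entry not present)"
fi
echo "available: $algorithms"
```

If htcp is missing from the list, try modprobe tcp_htcp first. To persist a choice across reboots, put net.ipv4.tcp_congestion_control=htcp in a file under /etc/sysctl.d/.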
Quality of Service refers to how a machine classifies and schedules network packets.
From my testing, FQ-CoDel seems to be the best choice all around, but in some cases CAKE might be the best alternative.
Changing the algorithm:
FQ-Codel: tc qdisc replace dev eth0 root fq_codel
Needs: CONFIG_NET_SCH_FQ_CODEL
CAKE: tc qdisc replace dev eth0 root cake
Needs: CONFIG_NET_SCH_CAKE
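The tc command above only changes one interface; fq_codel can also be set as the kernel-wide default for qdiscs attached from that point on. A sketch (eth0 is an example interface name; substitute your own):

```shell
# Make fq_codel the default qdisc for newly attached qdiscs
# (this does not rewrite qdiscs already attached to interfaces),
# then confirm what a given interface is actually using.
sysctl -w net.core.default_qdisc=fq_codel
tc qdisc show dev eth0
```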