"I continue to highlight the very best of the over 300 tips I've extracted this year. The top tip this month: On modern systems O(N) analysis is insufficient, as memory transfer often dominates algorithm complexity analysis; the next tip ..."
"For ultra-low latency: you can't have any GCs; use shared memory; apply single-threaded processing logic (no synchronization) with the thread pinned to the core and all other threads excluded from that core; use simple object pooling (single-threaded); scale by partitioning data across non-shared threads/processes/microservices; spin when waiting for data to keep hold of the CPU and keep it hot; record everything so that you can replay in test to analyse outliers; don't cross NUMA regions, each process/microservice should run on one core; use wait-free data structures (no waits and guaranteed that the thread can always proceed and will finish within a set number of cycles); run replicated hot-hot for high availability"