- resource locking design: per CPU, per process, per thread, etc.
- how data structures are shared between CPUs (with or without per-CPU locking),
- total number of global resources and their locks,
- number of CPUs,
- CPU task scheduler design,
- atomicity of instructions (number of bytes per instruction),
- caching / prefetching,
- speculative prefetching,
- timeout / round-trip time (for gigabit networking)
- memory pool design (allocated/deallocated pools, etc.)
- memory vs storage usage,
- one-off vs repeated request design (asynchronous vs synchronous design)
- traceability/debuggability vs scalability/performance (amount of logging)
- bottleneck monitoring and identification: how competition for resources is evened out and smoothed out (information cached in procfs, etc.)
- CPU utilization
- GPU utilization
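
To make the first few items concrete (locking granularity, and per-CPU versus global locks), here is a minimal sketch of a counter split into independently locked stripes. The names `StripedCounter` and `run_workers` are hypothetical, invented for this illustration. Threads stand in for CPUs, and because of CPython's global interpreter lock this shows only the structure of the technique, not a measurable speedup.

```python
import threading


class StripedCounter:
    """Counter split into stripes, each guarded by its own lock.

    Hypothetical illustration: with one global lock every writer contends
    on the same lock; with stripes, writers on different stripes do not.
    """

    def __init__(self, stripes=8):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [0] * stripes
        self._local = threading.local()      # remembers each thread's stripe
        self._assign_lock = threading.Lock()  # guards round-robin assignment
        self._next = 0

    def _stripe(self):
        # Assign each thread a stripe once, round-robin, then reuse it.
        idx = getattr(self._local, "idx", None)
        if idx is None:
            with self._assign_lock:
                idx = self._next % len(self._locks)
                self._next += 1
            self._local.idx = idx
        return idx

    def add(self, n=1):
        i = self._stripe()
        with self._locks[i]:  # only this stripe is locked
            self._counts[i] += n

    def value(self):
        # Sum stripe by stripe; the total is exact once writers have finished.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._counts[i]
        return total


def run_workers(counter, workers=4, per_thread=1000):
    """Start `workers` threads that each increment `counter` `per_thread` times."""
    def work():
        for _ in range(per_thread):
            counter.add()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()
```

Calling `run_workers(StripedCounter(stripes=4), workers=8, per_thread=500)` returns the total number of increments performed. The trade-off is the one the list hints at: writes get cheaper as the number of stripes grows, while reads must visit every stripe.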
References:
https://github.com/brendangregg/msr-cloud-tools
https://www.youtube.com/watch?v=zxCWXNigDpA
https://thetechsolo.wordpress.com/2016/02/29/scalable-io-events-vs-multithreading-based/
https://murphyswork.wordpress.com/2015/08/01/linux-tcp-connections-tuning-for-scalability/
https://thetechsolo.wordpress.com/2016/08/28/scaling-to-thousands-of-threads/
https://cs.nju.edu.cn/tianchen/lunwen/2017/sgws-Zhao.pdf
https://lwn.net/Articles/295094/
https://lwn.net/Articles/419811/
https://lwn.net/Articles/633538/
https://lwn.net/Articles/704478/
http://www.ncic.ac.cn/~majie/Papers/References/p294_SC2000.pdf