Along with some other experimental work (multiple transmit queues, which have just now made it into HEAD), I now have per-cpu flow caches in a personal svn branch:
http://svn.freebsd.org/base/user/kmacy/HEAD_fast_multi_xmit/
Interestingly enough, with small numbers of connections and TSO enabled, there is little measurable benefit for either normal or 9k frames. Evidently TSO does a sufficiently good job of coalescing calls down into ip_output that the reduction in lock contention and lookup time doesn't measurably impact performance. However, with an MTU of 1500 bytes and TSO disabled there is a clear impact. Below, "current" is flow caching disabled with a single transmit queue, and "multiflow" is flow caching and multiple transmit queues enabled. Measurements are in Gbps using Robert Watson's tcpp:
./tcpp -c 10.0.0.150 -p 4 -t 100 -m 100 -b 10000000
This connects to 10.0.0.150 with 4 processes; each process creates 100 connections and pushes 10MB across each connection before closing it.
ministat -c 90 -w 74 current multiflow
x current
+ multiflow
+--------------------------------------------------------------------------+
| xx + |
|x x xxx + + |
|x x xx x xxxx ++ + ++ + + +++ + +|
| |____A__M_| |________AM_______| |
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 19 1.886103 2.457827 2.388812 2.2664012 0.20140299
+ 15 3.476315 4.744577 4.153121 4.1130334 0.35428067
Difference at 90.0% confidence
1.84663 +/- 0.163126
81.4786% +/- 7.19758%
(Student's t, pooled s = 0.2788)
As you can see from ministat, there is an 81% increase in throughput over the default implementation for 400 connections spread over 4 processes. Interestingly, there is a 30% increase in performance even for single connections. This would appear to indicate that rtentry and ARP lookups are fairly expensive.
I took these measurements a week ago. I've managed to substantially further improve aggregate throughput since then. However, I'll talk about that another time.