Experimenting with nftables flowtable as an iptables enthusiast

I am presently experimenting with flowtable, a software-based routing-offload feature of nftables that I am not yet used to as an iptables fan. I haven’t had a chance to measure the performance of this configuration yet, but I am using the commands below to help set it up in my firewall:

nft add flowtable ip filter fast "{ hook ingress priority 0; devices = { eth0, eth1 }; counter; }"
nft add rule ip filter FORWARD iifname "eth0" oifname "eth1" ct state "{ established, related }" counter flow add @fast
nft add rule ip filter FORWARD iifname "eth1" oifname "eth0" ct state "{ established, related }" counter flow add @fast
nft add rule ip filter FORWARD iifname "eth0" oifname "eth1" ct state "{ established, related }" counter accept
nft add rule ip filter FORWARD iifname "eth1" oifname "eth0" ct state "{ established, related }" counter accept

You will see some connections being tracked and offloaded with the conntrack -L command:

tcp 6 src=192.168.99.1 dst=1.2.3.4 sport=52077 dport=443 src=1.2.3.4 dst=10.10.10.2 sport=443 dport=52077 [OFFLOAD] mark=17 use=2
udp 17 src=192.168.99.1 dst=2.3.4.5 sport=53055 dport=4500 src=2.3.4.5 dst=10.10.10.2 sport=4500 dport=53055 [OFFLOAD] mark=17 use=2

~

Google Cancels Gmailify! :O :(

Link: https://support.google.com/mail/answer/16604719?hl=en

First, Yahoo cancels simple mail forwarding. Then I try the Google Gmailify feature to fetch my email as well as send mail on behalf of my domain name. Next, Google cancels Gmailify, so I have now switched everything over to Fastmail, which offers both of these features plus label organization and filtering capabilities — though for a price of course. Anyway, I had been using Gmail as my default mail client for many, many years, ever since it was in beta, but sadly that time has now come to an end…

😦

The Typeplus Stabilizer Design & Movement

I wanted to highlight the interesting new keyboard stabilizer design that the Typeplus Stabilizer offers. It has a hooked wire-end insert that slides/glides along an internal track within the stem housing when actuated. With only a little bit of lube needed, there is no wire-end rattling or ticking, thanks to that simple hooked-wire-and-track design. I will be testing these out first in the 7U spacebar on the Mode Envoy!

Kit: https://kbdfans.com/products/typeplus-x-yikb-screw-in-stabilizer

Generating better random numbers in C with the help of /dev/urandom (practical-random) and (s)rand (pseudo-random) [88-bit seed]

This example code first loads 88 bits’ worth of /dev/urandom data into an initial seed buffer and feeds the first 32 bits of it into srand() as the seed. The regular rand() function can then be used to mix its pseudo-random output with the remaining seed data, creating a new stream of random output for the application to use.

Year End Summary – OpenVPN Modifications in a Screen Cap – Sometimes Lines of Code Removed is a Better Metric!

I’ve spent a few months ironing out the remaining edge cases and code paths to get this highly modified version of OpenVPN to run as stably as I intended. It’s basically a lighter-weight, TCP-focused version of OVPN with a number of huge unused libraries removed, including the WIN32 code, which I haven’t run anything on in decades. All of the modifications together come out to roughly 3,000 lines added and over 25,000 lines of code removed! I did not remove the UDP protocol files, since that was the OG VPN protocol and it doesn’t really get in the way of things; I may just leave it be.

Warning: This is a really long screen capture to scroll through!

~

Secure VPN DNS Service – Forwarding/Proxying/Caching/Ciphering – Python

Since I am running a network-wide VPN tunnel on behalf of the clients on the network, any plaintext UDP-based DNS packets are protected from your ISP seeing them; however, the VPN server’s ISP would still be able to see them all. I decided to write a new Python DNS server that listens on the VPN client side, with all plaintext UDP DNS traffic redirected to it locally. It then opens a TCP SSL connection to a DNS server through the VPN tunnel and performs the query via DNS-over-TLS instead. The response can also be cached, which helps reduce the number of UDP DNS packets being sent over the VPN tunnel as well!

Source Code: https://github.com/stoops/xfer/blob/main/fdns.py

~

Finally Able to Insert a Proper Layer of Bi-Directional Multi-Threaded Set of Core Operations to the Highly-Modified OpenVPN Source Code!

Edit-Edit: After spending several more days tracing and tracking down connection-state edge cases, I was able to greatly improve the performance of this modification. I had to rewrite most of the ssl, socket, packet, forward, and multi/io libraries, as well as remove some unneeded code paths, including the fragment, ntlm, occ, ping, proxy, reliable, and socks libraries. I also filed a couple more code-quality-related PRs ( #4 ~ #5 ).

Edit: After spending several days of testing and debugging, I found a couple of bugs in the OpenVPN source code that I have reported to the developers ( #1 ~ #2 ~ #3 ). I was finally able to get 3 parallel session key states negotiated and renegotiated — something the framework never accounted for! The framework allows a 3-bit key ID field to be used and negotiated.

Session-State-Key-IDs(1,2,3): Client to Server Traffic Communication

Session-State-Key-IDs(4,5,6): Server to Client Traffic Communication

Session-State-Key-IDs(0,7): Backup General Traffic Communication


We first negotiated our set of 3 keys for each of the 4 threads in this 1 process:
Session State Key IDs: no=1, no=7, no=4

2025-11-02 21:05:23 TCPv4_CLIENT DUAL keys [5] [ [key#0 state=S_GENERATED_KEYS auth=KS_AUTH_TRUE id=0 no=1 sid=0eb81f5c fb6a0553] [key#1 state=S_INITIAL auth=KS_AUTH_FALSE id=0 no=0 sid=00000000 00000000] [key#2 state=S_GENERATED_KEYS auth=KS_AUTH_TRUE id=0 no=7 sid=be550b82 c30fb5ec] [key#3 state=S_GENERATED_KEYS auth=KS_AUTH_TRUE id=0 no=4 sid=d0f67472 3dc74469] [key#4 state=S_INITIAL auth=KS_AUTH_FALSE id=0 no=0 sid=00000000 00000000]]

We then renegotiated each of the 3 keys for each of the 4 threads for a total of 12x keys:
Session State Key IDs: no=2, no=0, no=5
Lame State Key IDs: no=1, no=7, no=4

2025-11-02 21:06:09 TCPv4_CLIENT DUAL keys [5] [ [key#0 state=S_GENERATED_KEYS auth=KS_AUTH_TRUE id=0 no=2 sid=68cc0579 08edeaed] [key#1 state=S_GENERATED_KEYS auth=KS_AUTH_TRUE id=0 no=1 sid=0eb81f5c fb6a0553] [key#2 state=S_GENERATED_KEYS auth=KS_AUTH_TRUE id=0 no=0 sid=66f6bd41 5b9c05bd] [key#3 state=S_GENERATED_KEYS auth=KS_AUTH_TRUE id=0 no=5 sid=af1447d1 95e7c215] [key#4 state=S_GENERATED_KEYS auth=KS_AUTH_TRUE id=0 no=4 sid=d0f67472 3dc74469]]

The second key next to each primary key listed is a lame key, meant only for backup purposes, and is set to expire as it will be rotated out upon the next key renegotiation.

When it comes to tunnelling and proxying data, there are in general two independent pipeline directions: read-link->send-tunn and read-tunn->send-link. I separated out some shared limiting variables in the bulk-mode source code (the c2.buf and m->pending variables) so that data processing can operate independently for RL->ST and RT->SL. I also added a separate additional session-state cipher key in the new dual mode, so that the PRIMARY key can handle client->server encryption/decryption independently while the new THREAD key is used for server->client traffic.

~

~

Using the TCP+BULK+MTIO+DUAL modes together, with iptables performing dynamically distributed load balancing over 2x OpenVPN process tunnels running with 16x available threads, I am now able to get 144,000 bytes’ worth of data (96 MTU-sized packets) in flight at a given time on the 1500-byte-MTU VPN link!

~

Commit Code: github.com/stoops/openvpn-fork/compare/mtio…dual

Pull Request: github.com/OpenVPN/openvpn/pull/884

Complete Commits: github.com/stoops/openvpn-fork/compare/master…bust

~

Linux Routing – Load Balancing

So, I also learned another new lesson recently: way back in the day, the Linux kernel used to properly load balance routes (nexthop weights) by using a routing cache to remember which connection was associated with which route. That cache implementation was then removed and replaced with sending individual packets randomly down each listed route (similar to how I initially approached the OpenVPN load balancing between multiple threads). This, however, breaks connection states and packet ordering, as the two route links may lead to separate places entirely. The Linux kernel developers then implemented a basic hash-mapping algorithm that always associates a given connection stream with the same routing hop. This is a very limited form of load balancing, since the same source+destination address pair always maps to the same hash, which always selects the same routing path every time (even more limiting if you are behind a source NAT).
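Conceptually, the hash-based multipath selection boils down to something like this toy C illustration (not the kernel’s actual hash function):

```c
#include <stdint.h>

/* Toy flow-to-path hash: the same src+dst address pair always lands on
 * the same nexthop index, which is exactly why it balances so coarsely. */
unsigned int pick_path(uint32_t src, uint32_t dst, unsigned int npaths)
{
    return (src ^ dst) % npaths;
}
```

Notice that with source NAT, `src` collapses to a single address, so the routing variety collapses along with it.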

It turns out there is another trick to get more dynamically distributed load-balanced routing under Linux: use the iptables mangle table to connmark new connection states so that the conntrack table can save and restore the specified packet markings. You can then set an ip rule to pick up these firewall marks and associate them with a different routing table. On top of this, you can use a random or round-robin (nth) algorithm to evenly assign the different marks, giving you even greater routing variety!

$ipt -t mangle -A PREROUTING -i lan -m mark --mark 0x0 -j CONNMARK --restore-mark
$ipt -t mangle -A PREROUTING -i lan -m statistic --mode nth --every 2 --packet 0 -m mark --mark 0x0 -j MARK --set-mark 8
$ipt -t mangle -A PREROUTING -i lan -m statistic --mode nth --every 1 --packet 0 -m mark --mark 0x0 -j MARK --set-mark 9
$ipt -t mangle -A PREROUTING -i lan -m mark ! --mark 0x0 -j CONNMARK --save-mark
echo "8 vpna" >> /etc/iproute2/rt_tables
echo "9 vpnb" >> /etc/iproute2/rt_tables
ip rule add fwmark 8 table vpna
ip rule add fwmark 9 table vpnb

You can now add your VPN tunnel routing rules to both new ip routing tables, vpna and vpnb!

~

Solving a Final Remaining Performance Impact with Multi-Threaded Operation by using Connection-State Mapping in the Highly-Modified OpenVPN Source Code [Implementation]

In my previous blog post, I started observing strange performance issues with my network-wide, highly modified OpenVPN setup. Everything ran fast and speedy in bulk mode; however, in mtio mode things did not work as smoothly as expected (speed tests were good and TCP seemed alright, but UDP appeared fairly impacted). When I thought about it more deeply, I realized that my multi-threaded implementation of OVPN was simply throwing any data read off the TUN interface into any available thread, all at the same time, to try to maximize parallel performance. The problem with doing it this way was that I was breaking up the ordering of packets within the UDP or TCP “streams” of data, and less capable applications could not handle this as well as expected (TCP seemed to fare a bit better, but you could still sense a hesitation or lag in the connection).

I wrote about this observation to the OpenVPN devs, and they agreed that this setup could cause connection problems by forcing mass reordering of packets all the time. The split-tunnel solution from my previous post did help temporarily, but I wanted to solve the issue in the source code as well. I did not, however, want to implement packet tracking and ordering, as I would then basically be re-implementing the entire functionality and complexity of the TCP protocol all over again. Instead, I chose another way to prevent the issue: a simple form of connection-state tracking and mapping (similar to how iptables conntrack works). All I need to do is parse the source+destination addresses out of the packet header, place them in a mapping table for a brief amount of time, and associate that connection “stream” of data with an available thread. Combined with bulk-mode operation, this change makes for a very snappy and performant VPN experience overall. I was able to implement it in fewer than 75 lines of code with only a single helper method!
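A minimal sketch of what such a helper could look like (my own hypothetical reconstruction of the idea — names like thread_for_packet and the table layout are invented, not the actual commit):

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define MAP_SLOTS 256 /* small fixed-size mapping table */
#define MAP_TTL   30  /* seconds before a mapping goes stale */
#define NTHREADS  4

struct flow_slot {
    uint64_t key;  /* packed src+dst addresses */
    int thread;    /* thread this flow is pinned to */
    time_t seen;   /* last time the flow was observed */
};

static struct flow_slot flow_map[MAP_SLOTS];

/* Pack the IPv4 src+dst addresses into a key, look it up in the table,
 * and keep reusing the stored thread while the entry is fresh; a new or
 * stale flow gets pinned to the next thread round-robin. */
int thread_for_packet(const unsigned char *ip_hdr, time_t now)
{
    static int next_thread = 0;
    uint32_t src, dst;
    memcpy(&src, ip_hdr + 12, 4); /* IPv4 source address offset */
    memcpy(&dst, ip_hdr + 16, 4); /* IPv4 destination address offset */

    uint64_t key = ((uint64_t)src << 32) | dst;
    struct flow_slot *s = &flow_map[key % MAP_SLOTS];

    if (s->key != key || now - s->seen > MAP_TTL) {
        s->key = key;
        s->thread = next_thread;
        next_thread = (next_thread + 1) % NTHREADS;
    }
    s->seen = now;
    return s->thread;
}
```

A hash-slot collision here simply re-pins the colliding flow to a new thread, which costs a little reordering for that one stream but keeps the table tiny and lookups O(1).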

Complete Code Change Commits: github.com/stoops/openvpn-fork/compare/master…bust

~

~

Split VPN Tunnelling and Routing Based on Packet Protocol and Port for Improved Network Performance and App Compatibility

So I’ve been running a highly modified version of OpenVPN, and speed and performance have been pretty good overall; however, I noticed that a few apps use a custom UDP-based API call for data transfer (the Ring live-video app, for example), and their video would appear very choppy and blurry. I suspect this is because the multi-threaded version of OpenVPN does not preserve UDP packet ordering (whereas UDP-based protocols like QUIC seem to be more advanced and capable).

I am now experimenting with a solution to this issue: running two VPN tunnels at the same time, one bulk-mode multi-threaded for the majority of smart protocols and data, and a second bulk-mode single-threaded for the simpler UDP protocols that assume a specific order of packets arriving from a server or service. This technique requires iptables mangle packet marking as well as ip rule mark matching with additional routing-table rules.

~

iPhone 17 Base – The best of the iPhone releases this year… almost perfect!

I have been a fan of smaller-sized iPhones ever since I purchased the 12 Mini (as well as the 13 Mini, which I still have as part of my collection today). A smaller phone is easier to carry and allows light-weight, one-handed operation while I am out with something in my other hand — for example, answering phone calls or looking up information while waiting in line. Ideally, this size would be less than 6″ diagonal, preferably 5.7″ (+/- 0.2″), with as small a camera bump as possible.

I had switched over to the iPhone 15 Pro for the always-on-display (AOD) feature, which is important to me so that I can catch any missed important notifications at a quick glance while I’m working at my desk and my phone is resting on the table. However, the 15 Pro is far from “small,” coming in at 6.1″ (which actually, technically, makes it the smallest AOD screen Apple has offered to date).

This year, Apple surprised me by adding the AOD screen to the regular base iPhone 17, which would save me quite a bit of money by not having to buy the Pro versions for future upgrades. However, the screen comes in at 6.3″, which is the only size available. If Apple were to offer this phone at 5.7″, it would be an instant buy from me tomorrow… it’s almost perfect!

Update: I just learned of the new signed-memory security feature implemented at the A19 hardware level, which is really tempting me to upgrade now… [Apple]

~

Buffer Bloat Buster – Running a TCP VPN tunnel with large memory buffer sizes

I have been running the highly modified version of OpenVPN (a TCP VPN) with large sysctl memory buffer sizes set, and I am trying a new add-on experiment: an extra dedicated thread that sends dummy/mock UDP data at a specific bit rate, for example 1024 kbps, through the tunnel at a constant and consistent pace. It works in both client and server modes to obtain the virtual IP addresses (or you can specify one yourself) and auto-sends random bytes to each IP in an attempt to keep the tunnel active and alive at all times!

Server-to-Client (~750kbps down)

[--bust-mode 750 0.0.0.0]

~

Client-to-Server (~250kbps up)

[--bust-mode 250 10.0.0.1]

~

~

Patch Diff: https://github.com/stoops/openvpn-fork/compare/mtio…bust