
[application] ThreadsApplication#14

Merged
nhairs merged 19 commits into main from
nhairs-threadpool
Mar 13, 2026

Conversation

@nhairs
Owner

@nhairs nhairs commented Mar 6, 2026

  • Create ThreadsApplication and expose on the CLI using --application.
  • All Transports now support timeout on server start and on message receive.
  • Various updates to logging, which includes changing the repr of logged objects

Fixes #13

Test Plan

@nhairs
Owner Author

nhairs commented Mar 6, 2026

Hi @grigi - here's a rough draft with multithreading on the workers, if you'd like to give it a try and see if we get better performance.

@grigi

grigi commented Mar 6, 2026

I had a quick look, and using the ThreadsApplication is promising in terms of performance, but it seems to fail quite easily when running dnsperf against it with large quantities.

I'm starting to think I'm approaching this problem the wrong way.

@grigi

grigi commented Mar 6, 2026

I tried the fixes you made while I was testing this, and at a concurrency of 100 it now seems much more reliable.

@nhairs
Owner Author

nhairs commented Mar 6, 2026

but it seems to fail quite easily when using dnsperf against it with large quantities.

So I've noticed this when doing some of my local benchmarking (against domain-park which uses nserver).

I've noticed that for TCP, tests work at first then suddenly stop. UDP seems to work (but eventually starts dropping packets). It makes no difference if I'm using coredns in front locally or not.

Yet when I run the same test against the public version of domain-park.org, I don't seem to see the same connection drops.

I suspect that there is something funky happening with how many TCP connections you can create or receive locally (potentially around file descriptors?).

@grigi

grigi commented Mar 6, 2026

Yes, port exhaustion.
TCP has the problem that a port has a cooldown period after the connection closes before it becomes available for re-allocation. So for TCP you want to set up some kind of pipelining where you re-use the connection over and over.

For UDP it's almost worse: if you create a new socket for each DNS lookup (the default in many stub clients) you'll exhaust ports much faster than TCP, because instead of the short default 20-second cooldown for a confirmed closed flow, the IP spec says to use a longer cooldown (typically 20 minutes) for an unconfirmed closed flow.
So you should pre-allocate a UDP socket and re-use it as much as possible.

Luckily DNS has been designed to be stateless UDP with a response stating which question it's answering, so the onus is on the client to match it up with the data queried.

Unfortunately, nearly nobody does that other than dedicated DNS server software.
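The pattern described above (one pre-allocated UDP socket, with the client matching each response back to its question via the DNS message ID in the header) can be sketched with the stdlib; the helper names here are hypothetical:

```python
import struct

def build_query(qid: int, name: str) -> bytes:
    """Build a minimal DNS query for an A record, tagged with message ID qid."""
    # Header: ID, flags (RD set), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        len(label).to_bytes(1, "big") + label.encode() for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def match_response(pending: dict, payload: bytes):
    """Pop and return the pending question matching this payload's message ID."""
    # The first two bytes of any DNS message are the ID the client chose,
    # so responses can arrive in any order on the one shared socket.
    (qid,) = struct.unpack("!H", payload[:2])
    return pending.pop(qid, None)
```

A client keeps a `pending` dict of outstanding IDs and drains the shared socket, matching each arriving message back to its question.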

And if you have a NAT in the way, it will expire the flows as its timers dictate, or its buffers will fill up and it will end up forgetting half the flow (so e.g. requests get received, but responses get null-routed).

It's a pain 😞

@grigi

grigi commented Mar 6, 2026

I don't really have a reason as to why it works better over the internet other than maybe latency or router strategies masking the issue.

I have found the same though.

@nhairs
Owner Author

nhairs commented Mar 6, 2026

So I'm wondering if it's testing tool related (especially when on the same host).

I tend to use two different tools for my benchmarking.

jedisct1/dnsblast - appears to work reliably when testing up to 1000 rps. Above this I start to run into "too many open files" errors. I'll note that it only tests with UDP.

sandeeprenjith/dnsblast - sends a few hundred requests before becoming unresponsive during local testing. This is the same on both UDP and TCP. However when testing against domain-park.org it is successful.

I think I'd want to test between two hosts on my local network to see if that helps. Though it's pretty late here, so I might not get to it for a few days.

@nhairs
Owner Author

nhairs commented Mar 6, 2026

All that said, I'm hoping the concurrency can hit your targets regardless of our benchmarking issues.

If this looks promising for you, I'll keep on cleaning it up next week.

@grigi

grigi commented Mar 6, 2026

So I'm wondering if it's testing tool related (especially when on the same host).

Might very well be. I've been using DNSPerf for years now, it supports udp/tcp/dot(tls)/doh(http2).

I've just managed to do 10,000,000 (10 million) DNS queries against this threaded system, where I simulated 10 ms of IO latency, over UDP locally:

DNS Performance Testing Tool
Version 2.14.0

[Status] Command line: dnsperf -m udp -p 5300 -d ./queries.txt -T 4 -c 20 -n 1000000 -t 10 -q 200
[Status] Sending queries (to 127.0.0.1:5300)
[Status] Started at: Fri Mar  6 09:47:43 2026
[Status] Stopping after 1000000 runs through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         10000000
  Queries completed:    10000000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 9000000 (90.00%), NXDOMAIN 1000000 (10.00%)
  Average packet size:  request 35, response 52
  Run time (s):         2407.071726
  Queries per second:   4154.425434

  Average Latency (s):  0.048128 (min 0.003427, max 0.159618)
  Latency StdDev (s):   0.008821

@nhairs nhairs changed the title from Multithreaded Applications to [application] ThreadsApplication on Mar 11, 2026
@nhairs nhairs force-pushed the nhairs-threadpool branch from e9bd5df to 80810c9 on March 11, 2026 08:34
@nhairs
Owner Author

nhairs commented Mar 11, 2026

So I've set up dnsperf on another host on my network and run it against a test server on my laptop. It runs successfully, which means there are definitely problems with trying to benchmark TCP on the same host.

I'll note that this isn't super indicative of multi-threaded performance per se, as the server instance is just returning static responses (i.e. no backend IO).

If we want better benchmarking before we ship this PR, we'll need to come up with a better test server that replicates IO based activity (time.sleep(0.01)?).

DNS Performance Testing Tool
Version 2.14.0

[Status] Command line: dnsperf -m tcp -s 192.168.20.47 -p 5301 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 120
[Status] Sending queries (to 192.168.20.47:5301)
[Status] Started at: Wed Mar 11 09:37:57 2026
[Status] Stopping after 120.000000 seconds or 1000000 runs through file
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         599879
  Queries completed:    599879 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 199960 (33.33%), NXDOMAIN 399919 (66.67%)
  Average packet size:  request 33, response 33
  Run time (s):         120.048419
  Queries per second:   4996.975429

  Average Latency (s):  0.039913 (min 0.006011, max 0.261634)
  Latency StdDev (s):   0.003643

Connection Statistics:

  Connection attempts:  20 (20 successful, 100.00%)

  Average Latency (s):  0.005316 (min 0.000938, max 0.009881)
  Latency StdDev (s):   0.002402

For comparison, a UDP server with the same number of threads:

[Status] Command line: dnsperf -m udp -s 192.168.20.47 -p 5301 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 120
[Status] Sending queries (to 192.168.20.47:5301)
[Status] Started at: Wed Mar 11 09:42:02 2026
[Status] Stopping after 120.000000 seconds or 1000000 runs through file
<snip>
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         684561
  Queries completed:    684145 (99.94%)
  Queries lost:         416 (0.06%)

  Response codes:       NOERROR 228051 (33.33%), NXDOMAIN 456094 (66.67%)
  Average packet size:  request 33, response 32
  Run time (s):         120.008043
  Queries per second:   5700.826235

  Average Latency (s):  0.029429 (min 0.004785, max 0.338306)
  Latency StdDev (s):   0.006914

@grigi

grigi commented Mar 11, 2026

If we want better benchmarking before we ship this PR, we'll need to come up with a better test server that replicates IO based activity (time.sleep(0.01)?).

I think that in reality there will be some variation in e.g. DB responsiveness.
I've been testing with sleep(0.1) and sleep(0.01).
Based on some stats of our prod system, basic DB queries respond in about 20ms - 40ms, and there is no real upper bound. Maybe a sleep(0.025) would be a reasonable approximation for the test.
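The simulated backend latency being discussed is tiny to implement. A sketch of such a handler body (the actual benchmark server isn't shown in this thread, and the function name is hypothetical):

```python
import time

def handle_query(name: str) -> str:
    """Stand-in for a DNS rule handler that hits a backend DB."""
    # Approximate a ~25 ms DB round trip, per the discussion above.
    # The thread blocks in sleep, releasing the GIL, so other worker
    # threads keep serving in the meantime.
    time.sleep(0.025)
    return "10.0.0.1"
```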

This PR work is looking pretty good so far 👍

@nhairs
Owner Author

nhairs commented Mar 12, 2026

Alright, so I've given this a whirl with random.uniform(0.01, 0.1) - it looks like performance is pretty reasonable:

uv run nserver --server benchmark/sleepy_server.py --host 192.168.20.47 --port 5302 --application threads --workers 12

[Status] Command line: dnsperf -m tcp -s 192.168.20.47 -p 5302 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 60
[Status] Sending queries (to 192.168.20.47:5302)
[Status] Started at: Thu Mar 12 02:17:27 2026
[Status] Stopping after 60.000000 seconds or 1000000 runs through file
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         19482
  Queries completed:    19482 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 12988 (66.67%), NXDOMAIN 6494 (33.33%)
  Average packet size:  request 33, response 43
  Run time (s):         60.677411
  Queries per second:   321.075004

  Average Latency (s):  0.618849 (min 0.001786, max 0.960726)
  Latency StdDev (s):   0.046382

Connection Statistics:

  Connection attempts:  20 (20 successful, 100.00%)

  Average Latency (s):  0.004926 (min 0.000884, max 0.009478)
  Latency StdDev (s):   0.002386

Increasing the number of workers has a positive impact. I'll note that this computer only has 8 cores / 16 threads.

uv run nserver --server benchmark/sleepy_server.py:server --host 192.168.20.47 --port 5302 --application threads --workers 48 --tcp

[Status] Command line: dnsperf -m tcp -s 192.168.20.47 -p 5302 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 60
[Status] Sending queries (to 192.168.20.47:5302)
[Status] Started at: Thu Mar 12 02:18:56 2026
[Status] Stopping after 60.000000 seconds or 1000000 runs through file
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         78166
  Queries completed:    78166 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 52111 (66.67%), NXDOMAIN 26055 (33.33%)
  Average packet size:  request 33, response 43
  Run time (s):         60.200058
  Queries per second:   1298.437287

  Average Latency (s):  0.153600 (min 0.001045, max 0.288520)
  Latency StdDev (s):   0.034864

Connection Statistics:

  Connection attempts:  20 (20 successful, 100.00%)

  Average Latency (s):  0.004094 (min 0.001179, max 0.008620)
  Latency StdDev (s):   0.002335

More workers, more good, right? Not quite: we've definitely hit the context-switching cap, but still got pretty good performance.

uv run nserver --server benchmark/sleepy_server.py:server --host 192.168.20.47 --port 5302 --application threads --workers 480 --tcp

[Status] Command line: dnsperf -m tcp -s 192.168.20.47 -p 5302 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 60
[Status] Sending queries (to 192.168.20.47:5302)
[Status] Started at: Thu Mar 12 02:36:47 2026
[Status] Stopping after 60.000000 seconds or 1000000 runs through file
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         256751
  Queries completed:    256751 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 171168 (66.67%), NXDOMAIN 85583 (33.33%)
  Average packet size:  request 33, response 43
  Run time (s):         60.102198
  Queries per second:   4271.906994

  Average Latency (s):  0.046694 (min 0.000367, max 0.395074)
  Latency StdDev (s):   0.034651

Connection Statistics:

  Connection attempts:  20 (20 successful, 100.00%)

  Average Latency (s):  0.004187 (min 0.001002, max 0.008870)
  Latency StdDev (s):   0.002509

For comparison, here's the direct (single-threaded) application.

uv run nserver --server benchmark/sleepy_server.py:server --host 192.168.20.47 --port 5302 --application direct --tcp

[Status] Command line: dnsperf -m tcp -s 192.168.20.47 -p 5302 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 60
[Status] Sending queries (to 192.168.20.47:5302)
[Status] Started at: Thu Mar 12 02:25:12 2026
[Status] Stopping after 60.000000 seconds or 1000000 runs through file
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         1752
  Queries completed:    1752 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 1168 (66.67%), NXDOMAIN 584 (33.33%)
  Average packet size:  request 33, response 43
  Run time (s):         67.478595
  Queries per second:   25.963789

  Average Latency (s):  7.280939 (min 0.000706, max 8.525610)
  Latency StdDev (s):   1.500055

Connection Statistics:

  Connection attempts:  20 (20 successful, 100.00%)

  Average Latency (s):  0.004463 (min 0.001309, max 0.009217)
  Latency StdDev (s):   0.002521
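The threads-vs-direct gap above is what you'd expect for IO-bound handlers, and the effect is easy to reproduce with a stdlib-only sketch (names are illustrative, not nserver code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_handler(_query: int) -> None:
    time.sleep(0.005)  # stand-in for backend IO per request

def throughput(workers: int, n: int = 100) -> float:
    """Requests per second for n requests through a pool of `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_handler, range(n)))
    return n / (time.perf_counter() - start)
```

With sleep-dominated handlers, throughput scales roughly with the worker count until scheduling overhead takes over, which is the shape of the 12, 48, and 480 worker runs above.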

Also for comparison, 48 workers with UDP.

[Status] Command line: dnsperf -m udp -s 192.168.20.47 -p 5302 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 60
[Status] Sending queries (to 192.168.20.47:5302)
[Status] Started at: Thu Mar 12 02:33:34 2026
[Status] Stopping after 60.000000 seconds or 1000000 runs through file
[Timeout] Query timed out: msg id 71
<snip>
[Timeout] Query timed out: msg id 107
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         78053
  Queries completed:    78017 (99.95%)
  Queries lost:         36 (0.05%)

  Response codes:       NOERROR 52012 (66.67%), NXDOMAIN 26005 (33.33%)
  Average packet size:  request 33, response 43
  Run time (s):         60.209236
  Queries per second:   1295.764656

  Average Latency (s):  0.149300 (min 0.001825, max 0.564149)
  Latency StdDev (s):   0.035918

Also for comparison, 48 workers behind coredns (still 1 nserver instance). Reminder: the config without coredns averaged 1298 qps.

So somehow we've got better performance than the go program 😱. That said, you still might not want to expose nserver to the internet directly, as coredns is probably more security hardened.

uv run nserver --server benchmark/sleepy_server.py:server --host 127.0.0.1 --port 5301 --application threads --workers 48 --tcp

[Status] Command line: dnsperf -m tcp -s 192.168.20.47 -p 5300 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 60
[Status] Sending queries (to 192.168.20.47:5300)
[Status] Started at: Thu Mar 12 02:42:36 2026
[Status] Stopping after 60.000000 seconds or 1000000 runs through file
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         31295
  Queries completed:    31295 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 20864 (66.67%), NXDOMAIN 10431 (33.33%)
  Average packet size:  request 33, response 52
  Run time (s):         61.509218
  Queries per second:   508.785529

  Average Latency (s):  0.385765 (min 0.000562, max 1.926708)
  Latency StdDev (s):   0.312798

Connection Statistics:

  Connection attempts:  20 (20 successful, 100.00%)

  Average Latency (s):  0.005124 (min 0.001103, max 0.009882)
  Latency StdDev (s):   0.002563

The same coredns setup with UDP queries shows a fair difference in QPS. Another reason you'd probably want to run coredns or similar in front is that nserver can (currently) only handle one type of transport at a time.

[Status] Command line: dnsperf -m udp -s 192.168.20.47 -p 5300 -d ./dnsperf-queries.txt -T 2 -c 20 -n 1000000 -t 10 -q 200 -l 60
[Status] Sending queries (to 192.168.20.47:5300)
[Status] Started at: Thu Mar 12 02:55:43 2026
[Status] Stopping after 60.000000 seconds or 1000000 runs through file
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         78365
  Queries completed:    78365 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 52244 (66.67%), NXDOMAIN 26121 (33.33%)
  Average packet size:  request 33, response 52
  Run time (s):         60.207979
  Queries per second:   1301.571674

  Average Latency (s):  0.153215 (min 0.001466, max 0.279039)
  Latency StdDev (s):   0.034108

@nhairs nhairs marked this pull request as ready for review March 12, 2026 05:59
@grigi

grigi commented Mar 12, 2026

I have similar results on PowerDNS, it performs way better on UDP than on TCP.
My guess is that DNS over UDP can use shared UDP sockets as the protocol was designed that way, whereas TCP has the connection establishment and cleanup overhead.

I've done some testing on my own and NServer is notably faster and can sustain zero errors on long-term tests.

PowerDNS + remote-http backend is very sensitive to the thread count on PowerDNS's side: I would need about 80 threads to keep up with 16 threads on NServer ( 🤯 ) over UDP, and for TCP the performance drops off significantly (similar to NServer single-threaded). It also has comms failures between PowerDNS and the HTTP server under load (so non-zero errors).

The PowerDNS + remote-ZeroMQ backend is surprisingly single-threaded all the way.

Hosting a DoH server is doable, but the HTTP/2 + TLS overhead is also quite high, and it's basically all custom code (so a lot more to maintain).

I'm very happy that it will be easy to exceed the minimum performance requirements by nearly 2 orders of magnitude with NServer 😄

One last thing: I'm looking at basic DNSSEC support. CoreDNS seems to have some dynamic signing of DNSSEC keys. It claims to use "black lies" for NSEC responses. I think it's like what Cloudflare does, which would be OK.
Do you have any experience with DNSSEC on coredns+nserver?

@nhairs
Owner Author

nhairs commented Mar 12, 2026

I have similar results on PowerDNS, it performs way better on UDP than on TCP. My guess is that DNS over UDP can use shared UDP sockets as the protocol was designed that way, whereas TCP has the connection establishment and cleanup overhead.

If I'm understanding dnsperf correctly, I think it is reusing the TCP sockets and just streaming the queries, which is probably about as fast as UDP as long as you're not dropping packets. The setup cost is pretty low (about 4 ms according to the tests above on my local ethernet network).

But yeah, there is a lot of overhead in dealing with TCP multiplexing compared to UDP, which doesn't need to keep open sockets to manage (you can compare the length of their code in transport.py 😅).
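For reference, DNS over TCP length-prefixes each message with two bytes (RFC 1035 section 4.2.2), which is what makes streaming many queries over one reused connection straightforward; a minimal sketch of the framing:

```python
import struct

def frame_message(msg: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length (DNS-over-TCP)."""
    return struct.pack("!H", len(msg)) + msg

def split_messages(stream: bytes) -> list[bytes]:
    """Split a TCP byte stream back into the individual DNS messages."""
    out, i = [], 0
    while i + 2 <= len(stream):
        (length,) = struct.unpack("!H", stream[i : i + 2])
        out.append(stream[i + 2 : i + 2 + length])
        i += 2 + length
    return out
```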

I've done some testing on my own and NServer is notably faster and can sustain zero errors on long-term tests.

PowerDNS + remote-http backend is very sensitive to the thread count on PowerDNS's side: I would need about 80 threads to keep up with 16 threads on NServer ( 🤯 ) over UDP, and for TCP the performance drops off significantly (similar to NServer single-threaded). It also has comms failures between PowerDNS and the HTTP server under load (so non-zero errors).

Honestly kind of wild how well it is performing.

Rather than dealing with locking, I've kept the receiving and sending in their own threads (though strictly speaking there might be some race conditions in the connection-pool cleanup when a thread is still processing or sending; how good is multithreading 🫠). But that does mean that at a certain point adding more workers to NServer won't gain any extra performance. It will be interesting to see how well it performs with no GIL.
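One lock-free way to sketch that receive/process/send split is with queues between the stages. This is illustrative only, not the actual nserver internals:

```python
import queue
import threading

def serve(requests: list[bytes], workers: int = 4) -> list[bytes]:
    """Push requests through a receive queue, worker threads, and a send queue."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()

    def worker() -> None:
        # The queues do all the synchronisation, so workers need no locks.
        while True:
            item = inbox.get()
            if item is None:  # sentinel: shut this worker down
                break
            outbox.put(item.upper())  # stand-in for handling one request

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for req in requests:
        inbox.put(req)
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()

    results = []
    while not outbox.empty():
        results.append(outbox.get_nowait())
    return sorted(results)  # completion order is nondeterministic across workers
```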

One last thing. I'm looking at basic DNSSEC support. CoreDNS seems to have some dynamic signing of DNSSEC keys. It claims to use "black lies" for NSEC reponses. I think it's like what Cloudflare does, which would be ok. Do you have any experience with DNSSEC on coredns+nserver?

No experience with DNSSEC unfortunately.


If you wouldn't mind doing some code review, it would be appreciated. I'll try to get a release candidate published to PyPI tomorrow for you.

@nhairs nhairs merged commit 7b66ad2 into main Mar 13, 2026
4 checks passed
@nhairs nhairs deleted the nhairs-threadpool branch March 13, 2026 01:17
@grigi

grigi commented Mar 13, 2026

Sorry, I'm down with the flu, so I didn't get round to doing a review yet. I will test the release early next week.

@nhairs
Owner Author

nhairs commented Mar 14, 2026

Ah all good, hope you feel better soon 🙂

I was rushing ahead because I'm starting a new job on Monday, so I wanted to try to finish this off. Might need to wait until next weekend to give you enough time 👌
