Hi @grigi - here's a rough draft with multithreading on the workers, if you'd like to give it a try and see if we get better performance.
I had a quick look, and I'm starting to think I'm approaching this problem the wrong way.
I tried the fixes you made while I was testing this, and at a concurrency of 100 it now seems much more reliable.
So I noticed this while doing some of my local benchmarking (against domain-park, which uses nserver). For TCP, tests work at first, then suddenly stop. UDP seems to work (but eventually starts dropping packets). It makes no difference whether I put coredns in front locally or not. Yet when I run the same test against the public version of domain-park.org, I don't see the same connection drops. I suspect that there is something funky happening with how many TCP connections you can create or receive locally (potentially around file descriptors?).
Yes, port exhaustion. For UDP it's almost worse: if you create a new socket to do a DNS lookup (the default in many stub clients), you'll exhaust ports much faster than with TCP, since instead of a short default 20-second cooldown for a confirmed closed flow, the IP spec says to use a longer cooldown (typically 20 minutes) for an unconfirmed closed flow. Luckily DNS has been designed to be stateless over UDP, with each response stating which question it's answering, so the onus is on the client to match it up with the data queried. Unfortunately, nearly nobody does that other than dedicated DNS server software. And if you have a NAT in the way, it will expire the flows as timers dictate, or its buffers fill up and it ends up forgetting half the flow (so e.g. requests get received, but responses get null-routed). It's a pain 😞
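A minimal sketch of the stateless-UDP pattern described above: one shared client socket for all queries, with responses matched back to their questions by DNS transaction ID. The toy loopback "server" here (which just echoes the query back with the QR bit set) and all the names are my own, not real resolver code.

```python
import secrets
import socket
import struct
import threading

def build_query(qid: int, name: str) -> bytes:
    # DNS header: ID, flags (RD=1), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def toy_server(sock: socket.socket, n: int) -> None:
    # Stand-in resolver: echo each query back as a "response" (QR bit set).
    for _ in range(n):
        data, addr = sock.recvfrom(512)
        qid, flags = struct.unpack("!HH", data[:4])
        sock.sendto(struct.pack("!HH", qid, flags | 0x8000) + data[4:], addr)

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
threading.Thread(target=toy_server, args=(server, 3), daemon=True).start()

# ONE client socket for all queries: no per-query socket, no port churn.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
pending: dict[int, str] = {}
for name in ("a.example", "b.example", "c.example"):
    qid = secrets.randbelow(65536)
    while qid in pending:  # real clients randomise IDs; just avoid a rare collision
        qid = secrets.randbelow(65536)
    pending[qid] = name
    client.sendto(build_query(qid, name), server.getsockname())

# Responses can arrive in any order; the transaction ID matches them up.
while pending:
    data, _ = client.recvfrom(512)
    (qid,) = struct.unpack("!H", data[:2])
    print(pending.pop(qid), "answered")
```

The crucial bit is the `pending` map: because each response carries the question's ID, one socket can have any number of lookups in flight at once.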
I don't really have an explanation for why it works better over the internet, other than maybe latency or router strategies masking the issue. I have found the same though.
So I'm wondering if it's testing-tool related (especially when on the same host). I tend to use two different tools for my benchmarking:

- jedisct1/dnsblast - appears to work reliably when testing up to 1000 rps. Above this I start to run into "too many open files" errors. I'll note that it only tests with UDP.
- sandeeprenjith/dnsblast - sends a few hundred requests before becoming unresponsive during local testing. This is the same on both UDP and TCP. However, when testing against domain-park.org it is successful.

I think I'd want to test between two hosts on my local network to see if that helps. Though it's pretty late here, so I might not get to it for a few days.
All that said, I'm hoping the concurrency can hit your targets regardless of our benchmarking issues. If this looks promising for you, I'll keep cleaning it up next week.
Might very well be. I've been using DNSPerf for years now; it supports udp/tcp/dot(tls)/doh(http2). I've just managed to do 10,000,000 (10 million) DNS queries to this threaded system, where I simulated 10ms IO latency over UDP locally:
Usually presents as hangs during shutdown. Implemented by making `Transport.receive_message` have a timeout.
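Roughly the idea, as a sketch only (the `Transport` / `receive_message` names match the description above, but the body here is my assumption, not NServer's actual code): swap a forever-blocking `recvfrom` for one with a short timeout, so the thread periodically re-checks a shutdown flag instead of hanging.

```python
import socket
import threading

class Transport:
    """Sketch of the timeout fix; not NServer's real implementation."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock
        self.sock.settimeout(0.5)  # wake every 500ms instead of blocking forever
        self.shutdown = threading.Event()

    def receive_message(self):
        while not self.shutdown.is_set():
            try:
                return self.sock.recvfrom(512)
            except socket.timeout:
                continue  # no data yet; loop back and re-check the flag
        return None  # shutting down cleanly
```

With this shape, shutdown is bounded by the timeout (here at most ~500ms) rather than waiting for a packet that may never arrive.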
So I've set up dnsperf on another host on my network and run it against a test server on my laptop. It runs successfully, which means that there are definitely problems trying to benchmark TCP on the same host. I'll note that this isn't super indicative of multi-threaded performance per se, as the server instance is just returning a static response (i.e. no backend IO). If we want better benchmarking before we ship this PR, we'll need to come up with a better test server that replicates IO-based activity (UDP server with same number of threads for comparison).
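For the "better test server" idea, something like the following could work: a stdlib-only sketch (names and numbers are my own) that simulates per-request backend IO with a sleep, so the effect of worker count actually shows up instead of being hidden by a static response.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor

IO_LATENCY = 0.010  # pretend every request costs a 10ms backend (DB/HTTP) call

def handle(sock: socket.socket, data: bytes, addr) -> None:
    time.sleep(IO_LATENCY)   # the simulated IO; a static responder skips this
    sock.sendto(data, addr)  # echo back; a real server would build a DNS reply

def serve(sock: socket.socket, workers: int, n_requests: int) -> None:
    # With blocking handlers, throughput should scale with `workers`
    # until context switching starts to dominate.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_requests):
            data, addr = sock.recvfrom(512)
            pool.submit(handle, sock, data, addr)
```

Running dnsperf against this with varying `workers` values would give a fairer comparison between the threaded and single-threaded servers than a no-IO static responder does.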
I think that in reality there will be some variation in e.g. DB responsiveness. This PR work is looking pretty good so far 👍
Alright, so I've given this a whirl with
Increasing the number of workers has a positive impact. I'll note that this computer only has 8 cores / 16 threads.
More workers, more good, right? Not so much: we definitely hit the context-switching cap, but still got pretty good performance.
For comparison, here's the direct (single-threaded) server.
Also for comparison, 48 workers with UDP. And 48 workers behind coredns (still 1 nserver instance). Reminder that the config without coredns averages 1298 qps. So somehow we've got better performance than the go program 😱. That said, you still might not want to expose nserver to the internet directly, because coredns is probably more security hardened.
The same coredns setup but with UDP queries shows a fair difference in QPS. Another reason you'd probably want to run coredns or similar in front as well is that nserver can (currently) only handle one type of transport at a time.
I have similar results with PowerDNS: it performs way better on UDP than on TCP. I've done some testing of my own and NServer is notably faster and can sustain zero errors on long-running tests. PowerDNS + remote-http backend is very sensitive to the thread count on PowerDNS's side, where I would need about 80 threads to keep up with 16 threads on NServer ( 🤯 ) in UDP, but for TCP the performance drops off significantly (similar to single-threaded NServer). It also has communication failures between PowerDNS and the HTTP server under load (so non-zero errors). The PowerDNS + remote-ZeroMQ backend is, surprisingly, single-threaded all the way.

Hosting a DoH server is doable, but the HTTP/2 + TLS overhead is also quite high, and it's basically all custom code (so a lot more to maintain). I'm very happy that it will be easy to exceed the minimum performance requirements by nearly two orders of magnitude with NServer 😄

One last thing: I'm looking at basic DNSSEC support. CoreDNS seems to have some dynamic signing of DNSSEC keys. It claims to use "black lies" for NSEC responses. I think it's like what Cloudflare does, which would be OK.
If I'm understanding dnsperf correctly, I think it is reusing the TCP sockets and just streaming the events, which is probably about as fast as UDP as long as you're not dropping packets. The setup cost is pretty low (4ms according to the tests above on my local ethernet network). But yeah, there is a lot of overhead in dealing with TCP multiplexing compared to UDP, which doesn't need to keep an open socket to manage (you can compare the length of their code in
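For reference, what makes that socket reuse possible is DNS-over-TCP's two-byte length prefix (RFC 1035 §4.2.2): many messages can be pipelined down one connection and split back apart. A small sketch (function names are my own):

```python
import struct

def frame(msg: bytes) -> bytes:
    # DNS over TCP prefixes each message with its length (big-endian uint16).
    return struct.pack("!H", len(msg)) + msg

def unframe(stream: bytes) -> list[bytes]:
    # Split a TCP byte stream back into the individual DNS messages.
    msgs, i = [], 0
    while i + 2 <= len(stream):
        (n,) = struct.unpack("!H", stream[i:i + 2])
        if i + 2 + n > len(stream):
            break  # incomplete trailing message; wait for more bytes
        msgs.append(stream[i + 2:i + 2 + n])
        i += 2 + n
    return msgs

# Three "queries" pipelined over one connection, then recovered intact:
pipelined = frame(b"query-1") + frame(b"query-2") + frame(b"query-3")
assert unframe(pipelined) == [b"query-1", b"query-2", b"query-3"]
```

The per-message framing is cheap; the expensive part on the server side is everything around it (tracking open connections, partial reads, and cleanup), which UDP sidesteps entirely.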
Honestly, it's kind of wild how well it is performing. Rather than dealing with locking, I've kept the receiving and sending in their own threads (though strictly speaking there might be some race conditions on the connection-pool cleanup when a thread is still processing or sending; how good is multithreading 🫠). But that does mean at a certain point adding more workers to NServer won't get any extra performance. Will be interesting to see how well it performs with no GIL.
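A rough sketch of that receive/process/send split (my own toy version, not the PR's code): thread-safe queues hand work between a receiver thread, worker threads, and a single sender thread, so no explicit locks are needed around the socket.

```python
import queue
import socket
import threading

def receiver(sock: socket.socket, inbox: queue.Queue, stop: threading.Event) -> None:
    sock.settimeout(0.2)  # so the thread can notice `stop` and exit
    while not stop.is_set():
        try:
            inbox.put(sock.recvfrom(512))
        except socket.timeout:
            continue

def worker(inbox: queue.Queue, outbox: queue.Queue, stop: threading.Event) -> None:
    while not stop.is_set():
        try:
            data, addr = inbox.get(timeout=0.2)
        except queue.Empty:
            continue
        outbox.put((data.upper(), addr))  # stand-in for real request handling

def sender(sock: socket.socket, outbox: queue.Queue, stop: threading.Event) -> None:
    # Only this thread ever calls sendto, so replies never interleave.
    while not stop.is_set() or not outbox.empty():
        try:
            data, addr = outbox.get(timeout=0.2)
        except queue.Empty:
            continue
        sock.sendto(data, addr)
```

Because `queue.Queue` is already synchronised, the only shared mutable state left is whatever sits outside the queues (like a connection pool), which is exactly where the race mentioned above can hide.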
No experience with DNSSEC unfortunately. If you wouldn't mind doing some code review, it would be appreciated. I'll try to get a release candidate published to PyPI tomorrow for you.
Sorry, I am down with the flu and didn't get around to doing a review yet. I will test the release early next week.
Ah all good, hope you feel better soon 🙂 I was rushing ahead because I'm starting a new job on Monday, so I wanted to try to finish this off. Might need to wait until next weekend to give you enough time 👌
- Thread `Application` and expose it on the CLI using `--application`
- `repr` of logged objects

Fixes #13
Test Plan