necovek 12 hours ago [-]
It is not really true that DNS is for people only: it is used as an aliasing system, for load balancing, and for caching (with no cache invalidation mechanism other than ahead-of-time TTL setting).
It is used to make entire protocols work (MX records for email, but SRV records are used for much more).
Now, if we do look at the most basic of basic DNS roles (mapping a human-readable name to an arbitrary set of numbers identifying a machine on the network), we should consider how we avoid some of the issues while keeping all of the benefits of DNS.
Eg. if we indeed "materialize" machine identifiers, we lose the ability to do virtual hosting (domains not passed in) or fix a problem with just a DNS update (eg. treating load-balancing machines like cattle).
The author jumps immediately to, arguably, ill advised materialization techniques like /etc/hosts, without considering all that DNS does for a complex, real world system and what goes missing.
louwrentius 11 hours ago [-]
- note I was talking about internal infrastructure, not public services
- DNS load balancing is not that important for internal services in most cases? Would only use it if alternatives won’t work.
- the virtual host issue is really addressed by /etc/hosts, I thought that was obvious, I now regret not explicitly addressing it.
necovek 11 hours ago [-]
The examples you cite (eg. 2021 Facebook outage) have nothing to do with DNS being used for internal infrastructure.
In the other example (Amazon DynamoDB issue), the problem is with dynamically choosing from a large dynamic pool of IP addresses for a service — DNS is but one mechanism to do it. If it wasn't DNS, it could have been something else that did that job that was broken. Even /etc/hosts if it was updated with an empty record.
What I am saying is that your analysis is not defining the problem you want solved exactly, your examples are not backing up your proposal or analysis, and you are ignoring all the things DNS does both for public and private infrastructure. You seem to have some intuition about this adding complexity and thus being a risk (which is true), but you need to do a better job of connecting and analysing real risks and proposed solutions (and their comparative performance).
louwrentius 10 hours ago [-]
I do state in the article that in the examples DNS isn't the root cause, but the blast radius is very significant. Regardless of the topic of external/internal services, isn't it remarkable that a group of very smart and well-paid people create such circular dependencies?
Yet, I'm not arguing for Facebook or similar size companies to ditch DNS internally. I'm making the argument for much smaller organisations to pause and think where their own risks lie and if it would make sense to cut out DNS to reduce risk. Whatever process you used as an organisation to update DNS in a safe manner, you still use with the alternative solution, that doesn't change.
That said, even a broken update to /etc/hosts is probably easier and faster to recover from than a broken DNS service that everything is tied to and that, due to TTL caching, can take much longer to resolve.
necovek 7 hours ago [-]
As said, I believe you are simplifying the problem significantly and thus making general claims which do not hold water.
Eg. even if you are DNS based but have direct SSH access to the system which has a query cached and root access on it (you need to manage all this too!), you can temporarily edit /etc/hosts or /etc/resolv.conf to work around the cached value.
So my suggestion remains to keep working on a better argument and scenario by trying to understand exactly where your intuition applies — but be critical to yourself too, and think through if your alternative has any other cons too.
By doing so, you will likely find why everybody defaults to DNS for a named service registry in a sense.
JackSlateur 6 hours ago [-]
TTL caching
We are talking about 300sec (=5Min), this is never an issue
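To make the TTL point concrete, here is a minimal sketch of resolver-style caching. It is not how any particular resolver is implemented; `TtlCache`, the `lookup` callable and the injected `clock` are hypothetical names used only for illustration. It shows that a changed record can keep being served from cache until the 300-second TTL runs out.

```python
import time


class TtlCache:
    """Toy resolver cache: an answer is reused until its TTL runs out."""

    def __init__(self, lookup, ttl_seconds=300, clock=time.monotonic):
        self._lookup = lookup      # hypothetical function: name -> address
        self._ttl = ttl_seconds    # 300 s matches the figure quoted above
        self._clock = clock        # injectable so the behaviour can be tested
        self._entries = {}         # name -> (address, expiry time)

    def resolve(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            # Still within the TTL: return the cached answer, even if the
            # authoritative record has changed in the meantime.
            return cached[0]
        address = self._lookup(name)
        self._entries[name] = (address, now + self._ttl)
        return address
```

With a 300-second TTL the worst case is roughly five minutes of stale answers per cache layer, which is the window both sides of this sub-thread are arguing about.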
simonjgreen 10 hours ago [-]
This author has clearly never operated internal infrastructure at scale. The measures proposed in this are home lab grade at best, and require ludicrous levels of precision and overhead for something that changes thousands to tens of thousands of times per day.
And for very specific nit picks, and I can’t believe I’m entertaining this idea enough to ask, but tell me how the new device on the network bootstraps without DNS? And the guest device. And the printer without Ansible support. And the NDI receiver that needs to resolve its host. And how do you resolve split brain resolution for roaming devices? Are you going to publicly address all internal resources now so my laptop keeps working outside the office?
DNS was not created as a random solution looking for a problem…
protocolture 9 hours ago [-]
OP will just reinvent Netbios running over Ansible.
louwrentius 10 hours ago [-]
I wasn't talking about an office environment. I'm talking server-to-server communication. Like all the internal infrastructure to support a web application.
Maybe I should have been more explicit about that.
How a new device bootstraps on the network without DNS? Depends on the device, but a physical server doesn't need DNS, only PXE boot / TFTP / HTTP as usual and maybe a proxy to access an update server if you don't run one yourself.
JackSlateur 9 hours ago [-]
"How a new device bootstraps on the network without DNS?"
DHCP
Bratmon 13 hours ago [-]
> Instead of configuring domain names that may not resolve, we can just directly inject the appropriate IP address(ess) into configuration files
Because now you've replaced one single point of failure configuration system with caching and TTLs (DNS) with a higher maintenance and much less widely supported one.
bot403 11 hours ago [-]
Not to mention losing load balancing and failover.
throw0101a 5 hours ago [-]
Failover can be done with something like keepalived. VRRP/CARP are a thing.
For LB you'll need something in front of your service to bounce connections around, which is replacing one point of failure (DNS) with another (HAproxy, IPVS). Though I guess you can run the LB stack on your app service servers.
dzr0001 11 hours ago [-]
And making TLS more difficult, especially for HA systems. Guess you would just need one cert for 127.0.0.1 for all local services.
louwrentius 11 hours ago [-]
Certs support ip addresses?
However, /etc/hosts would solve the issue probably, unless I’m missing something
flumpcakes 10 hours ago [-]
What has /etc/hosts got to do with valid TLS certificates? I think that’s a non-sequitur.
louwrentius 10 hours ago [-]
You don't need to setup one cert for 127.0.0.1 as stated by the parent comment.
gfody 12 hours ago [-]
> we'll just use /etc/hosts no DNS required!
this is classic "easy vs. simple" folly, witness how someone too lazy to [learn how to] setup proper DNS for their infrastructure will do 10x the work hacking something "easy"
qmr 10 hours ago [-]
Set up. You set up your setup.
louwrentius 11 hours ago [-]
Serious response: how is templating out /etc/hosts with Ansible not 10x simpler than setting up an additional service that only introduces additional risk?
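For readers who have not seen the approach being debated, a rough sketch of "templating out /etc/hosts" from an inventory follows. It is plain Python standing in for an Ansible template, not Ansible itself; `render_hosts` and the example host names are hypothetical.

```python
import ipaddress


def render_hosts(inventory):
    """Render /etc/hosts-style text from a {hostname: ip} mapping."""
    lines = ["127.0.0.1 localhost"]
    for name, address in sorted(inventory.items()):
        # Raises ValueError on a malformed address, so a bad inventory
        # entry fails before the file is pushed to any machine.
        ipaddress.ip_address(address)
        lines.append(f"{address} {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_hosts({"db.internal": "10.0.0.5", "app.internal": "10.0.0.4"}), end="")
```

The rendering step is the easy part. The disagreement in this thread is about distributing the result to every machine and keeping it in sync, which this sketch does not cover.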
arter45 10 hours ago [-]
You lose the concept of DNS forwarding. Usually, if your company has example.com, your DNS server is authoritative for example.com, which means it will actually contain (fqdn,ip) entries belonging to example.com, and it will forward requests for other domains to other DNS servers, possibly one DNS server per domain.
If you remove DNS servers from the equation, you need to write down records for other domains, too. This means you have to chase every domain for changes in CDN configuration, hosting provider or ISP migrations, IPv4 to v6 migrations and so on.
You don't have PTR records, which means you can't find out a name from its IP address.
You also miss other features of DNS, like SRV, MX and so on.
More subtly, you lose the ability to control DNS resolution over systems you can't control. If a DNS server says host.example.com is 192.168.0.4, a Windows desktop, a Linux server and your toaster will agree on that (especially if no local cache is enabled, but even then TTLs apply). If for some reason you cannot control a particular machine, you will never get it to consider that new DNS record. This can happen for a lot of reasons.
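As an illustration of the PTR point, the sketch below shows what a reverse lookup name looks like and how a name-to-address list would have to be inverted by hand to answer "which host is this IP?". It uses only Python's standard `ipaddress` module; `ptr_name`, `reverse_map` and the sample host are hypothetical.

```python
import ipaddress


def ptr_name(address):
    """Return the reverse-lookup (PTR) owner name for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer


def reverse_map(hosts):
    """Invert a {hostname: ip} mapping into {ptr name: hostname}."""
    return {ptr_name(ip): name for name, ip in hosts.items()}


if __name__ == "__main__":
    print(ptr_name("192.168.0.4"))
    print(reverse_map({"host.example.com": "192.168.0.4"}))
```

A DNS server that hosts the reverse zone answers this query for any client. With a flat hosts file, each machine can only reverse-resolve the entries it happens to carry.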
louwrentius 10 hours ago [-]
It's interesting as I really address all these things in the article. Not explicitly PTR and SRV, MX records, but these aren't essential within your internal infrastructure. No need to look at MX records if I can just straight up point at the SMTP server(s).
And I explicitly argue within the section about egress filtering that allowing systems access to public DNS is a security risk.
Bratmon 50 minutes ago [-]
If you're worried about bad DNS changes causing problems, then you should be terrified about bad Ansible changes.
ryanshrott 44 minutes ago [-]
Everyone's dunking on /etc/hosts, but I've debugged enough production DNS loops to get the temptation. It's not the right answer, but the impulse isn't crazy either.
ritcgab 1 hours ago [-]
DNS is a federated, read-optimized, geo-replicated key-value store with eventual consistency.
jaredhallen 12 hours ago [-]
Seems like a weird crusade. Pointing everything directly at the IP address might not seem so swell when it's time to upgrade the server or the address has to change for some reason. Sure would be nice to just update the DNS record to point to the new address.
themafia 12 hours ago [-]
> or the address has to change for some reason
One annoying reason is you don't own it/have access through the owner anymore.
> Sure would be nice to just update the DNS record to point to the new address.
EC2. Elastic IPs are easy enough, but, precisely, I would just like to make a Route53 alias for an EC2 instance and not even have to care.
louwrentius 11 hours ago [-]
Proposed solution: update the inventory and run your Ansible playbook/role against your infrastructure (or subset). I don’t see the issue, to be frank.
XYen0n 10 hours ago [-]
DNS is merely one implementation of service discovery; even without DNS, some other form of service discovery would still be needed.
louwrentius 10 hours ago [-]
Why would some form of service discovery be required? No need to discover things if you can push said information in configuration updates using tools like Ansible, pyinfra, and so on?
protocolture 10 hours ago [-]
How does your convoluted Ansible system know which systems and services to maintain?
If it's a list of IP addresses, having a list of IP addresses is a crude service discovery protocol.
Tasking developers (because let's be absolutely clear, the idea of removing DNS from production environments is something only a developer could come up with; no competent engineer would ever raise it) with maintaining ordered lists of servers to keep updated is only going to overcomplicate things.
And yes your hosts file is another example of a list.
irjustin 11 hours ago [-]
Hard disagree - only because if you didn't have DNS you would have something else in its place. But, we understand DNS _very_ well.
People, services, machines, etc need to "dial" canonical-somewhere. Whatever does the canonical management is the piece that when it breaks everything breaks.
Doesn't matter if it's DNS, EIP rotation, some HA proxy, whatever. It'll break.
It's actually that DNS is so well understood that it doesn't fail more often.
So no, DNS is for IT Infra.
louwrentius 10 hours ago [-]
> Whatever does the canonical management is the piece that when it breaks everything breaks.
That is absolutely true. I believe that a solution where you provision a text file with an updated ip address or /etc/hosts file is inherently simpler, less risky and easier to recover from, although I admit I don't explicitly state this in the article.
protocolture 10 hours ago [-]
>I believe that a solution where you provision a text file with an updated ip address or /etc/hosts file is inherently simpler, less risky and easier to recover from
You are wrong. It's possible that your confidence in being wrong is due to your inexperience. But you are still wrong.
irjustin 9 hours ago [-]
I'm surprised at this and some of your other responses. It makes me believe you've never managed anything at scale, but then why have such a strong opinion about DNS for infra?
> I believe that a solution where you provision a text file with an updated ip address or /etc/hosts file is inherently simpler
So simple that it doesn't scale beyond a few machines nor outside your org.
davkan 9 hours ago [-]
I’m not sure I’ve seen unanimous agreement in an HN comment section before so that’s nice I guess.
But to address the article: in a simple environment DNS _just_works_. I’ve never once had an issue with bind. It’s incredibly simple and stable and easy to understand when working within a small environment without much churn, and it enables other technologies to operate in an expected way because it’s the standard. ACME, Kerberos, SSHFP and many more are enabled by DNS. Sure, maybe you can kludge some of that back together with hosts, but I’d rather not, just to replace one of the most stable services that exist.
DNS does start to get more complicated in massive environments, but that’s just a reflection of the environment. Using Ansible to manage /etc/hosts across hundreds or thousands of machines with churn will not be less complicated to manage than DNS.
fulafel 11 hours ago [-]
History tip: Using /etc/hosts (or as it was called then, "the HOSTS.TXT file") ran into some problems.
qmr 10 hours ago [-]
Bah storage is cheap these days and we have git let's give it another go
rho138 6 hours ago [-]
That’s not how DNSSEC works… its sole purpose is signing - not encryption.
Why is DNS for people only? The article suggests rolling your own DNS with static config, which I could see myself doing in the right scenarios, but you can't always do that. Kinda reminds me of Kubernetes, though instead of /etc/hosts it runs an actual DNS server.
mixdup 13 hours ago [-]
"just use /etc/hosts" is wild. That is effectively just going from one DNS server servicing all of your machines to having bespoke DNS servers individually running on every host. madness
louwrentius 11 hours ago [-]
Why is that madness and not amazing? Isn’t the simplicity beautiful? Managing /etc/hosts with a tool like Ansible?
mixdup 4 hours ago [-]
Why not manage your one single DNS server with a tool like Ansible? Why fragment it and have to manage it on dozens/hundreds/thousands of endpoints instead?
spragl 10 hours ago [-]
/etc/hosts scales like a lead balloon.
For small groups of servers, with limited egress communication, it might nevertheless make sense. And then go for it, by all means. As a general replacement for DNS, not likely.
It is hard to see how Ansible should be simpler than DNS. Maybe if you have worked with Ansible and not DNS, you might think so.
denkmoon 12 hours ago [-]
This is what happens when you take the "it was DNS" meme too seriously. DNS is brilliant. Learn it. If you're really that ideologically opposed to such brilliance, just use the addresses directly. The system described is insane.
linksnapzz 14 hours ago [-]
Counterpoint: DNS isn't used enough; consider replacing sssd/AD with Hesiod.
samrus 12 hours ago [-]
But what's the problem with using DNS internally? Given the system is already present, and moving away from it would be effort. Seems like a nitpick
tikhonj 10 hours ago [-]
DNS is a database.
protocolture 10 hours ago [-]
>DNS Is for People
DNS is for Infrastructure, people use infrastructure.
>That got me thinking, why would we use DNS for infrastructure services? It isn't necessary for machine-to-machine communication. Instead of configuring domain names that may not resolve, we can just directly inject the appropriate IP address(ess) into configuration files. It's easy to configure systems with tools like Ansible or pyinfra at scale.
No no no no god no.
"What if we set up a convoluted higher level application solution"
This is going to go wrong more frequently and contain more errors than DNS.
>Fortunately, we still have /etc/hosts, which we can easily provision. Still no DNS service required! This way, we can configure domain names and pretend to use DNS. I also suspect that DNS queries against /etc/hosts are quite responsive.
No, that's a horrible idea. Userspace should never be updating your hosts file; users will fall behind on changes and be placed at extreme security risk. Fully half the benefit of UAC on Windows is preventing persistence by preventing malicious entities from updating hosts.
>As of today, most network traffic is encrypted by default, or tunneled through an encrypted channel. DNS is - by default - the exception.
DNS is mostly secure now, to the point where it's a problem. But that's a vendor issue, not a you issue; please don't attempt to solve it. If you go full encrypted DNS you generally also get dragged into HTTPS proxying and things of that nature. This does not get better by removing a dynamic protocol for querying names.
>Due to this risk, there is a case to be made, to - at least - not allow systems to query public DNS records. As servers may need to interact with services on the internet (update servers, APIs, and so on), such access can be facilitated by a proxy server using allow-listed domains.
Attackers use DNS because it's versatile and resistant to the very issues you keep confidently presenting. A protocol is not a risk just because hackers use it. Hackers also use HTTPS and other protocols but we aren't burning them at the stake.
>That said, I think it's reasonable to explore if DNS can be avoided altogether within the IT infrastructure to increase reliability and robustness.
It's reasonable for people with much better understanding of the infrastructure and protocol to examine these things. This reads like an end user suggesting "what if we deliver websites by hand printed on paper".
jghefner 11 hours ago [-]
> It's easy to configure systems with tools like Ansible or pyinfra at scale.
Tell me that you've never used Ansible at scale without telling me that you've never used Ansible at scale.
louwrentius 11 hours ago [-]
Tell me please what the problem is exactly
JackSlateur 9 hours ago [-]
Please describe how you plan to use ansible to deploy config in ~200k containers, with hundreds of data updates per day
louwrentius 5 hours ago [-]
I would not use containers in the first place, as they add complexity and overhead.
JackSlateur 3 hours ago [-]
Please share how to manage ~200k applications without containers
themafia 12 hours ago [-]
> The case against DNS for internal IT infrastructure
In SOHO settings I might actually agree, but, this is where I think site administered and distributed multicast DNS was a missed opportunity.
> https://www.rfc-editor.org/info/rfc9364/#name-dnssec-core-do...
Now it’s for machines