Brute Forcing Subdomains

About the Domain Name System

Before we talk about enumerating subdomains, we need to have a chat about DNS.

So long story short, DNS is the phone book of the Internet. When you browse to snakecharmingforbeginners.com, part of establishing the connection is translating that name into an IP address so your browser can load the site's resources.

Each device connected to the Internet has a unique IP address that is used to find it. This is referred to as a public IP address and is different from the IP address you may see on your computer, which is a private IP address: a unique identifier for any device sitting behind your router.

The following ranges are reserved by IANA for use as private IP addresses:

  • 10.0.0.0 - 10.255.255.255

  • 172.16.0.0 - 172.31.255.255

  • 192.168.0.0 - 192.168.255.255

Excluding the ranges above (along with a handful of other special-purpose blocks reserved by IANA), the rest of the IPv4 space is available for public use. Note that only the 192.168.0.0 - 192.168.255.255 block is reserved, not every address starting with 192, and it is one of the most commonly used private ranges.

To find out your public IP address you can Google “What’s my IP address” or using Python you can use the following:

import urllib.request

# ask an external service (ident.me) to echo back the public address it sees us connecting from
external_ip = urllib.request.urlopen('https://ident.me').read().decode('utf8')
print(f'Your public IP address is: {external_ip}')

When talking about brute forcing and enumeration there are a number of records you want to take note of; these are listed in the table below along with a description of each record's job:

DNS Record       Usage

A Record         Also known as a DNS host record, it stores a hostname and its corresponding IPv4 addresses

AAAA Record      The same as an A record, but for IPv6 addresses

CNAME Record     Aliases one hostname to another. When a DNS client requests a record that contains a CNAME pointing to another hostname, the DNS resolution process is repeated with the new hostname

MX Record        Specifies an SMTP email server for the domain, used to route incoming email to the domain's mail server
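
To see what an A record lookup looks like in practice, here is a minimal sketch using nothing more than the standard library socket module (example.com is just a placeholder domain):

import socket

# resolve the A record for a hostname; gethostbyname returns a single IPv4 address as a string
ipv4_address = socket.gethostbyname('example.com')
print(f'A record for example.com: {ipv4_address}')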

Understanding Subdomains

So in the DNS hierarchy, a subdomain is a domain that is part of another, larger domain. For example, in tweetdeck.twitter.com, tweetdeck is a subdomain of twitter.com.

Depending on the application and the records inside the domain (or subdomain), a subdomain may point to a particular machine or cluster. While this varies with the configuration of the system, a common pattern is www.example.com pointing to cluster one, www2.example.com pointing to cluster two, and so on.

Some domains may also host their nameservers as ns1.example.com, ns2.example.com and so on. However, these are not typically external resources and may not show up in search engine results.

Subdomain enumeration is therefore part of the reconnaissance phase: you are looking for other in-scope domains, which increases the chances of finding vulnerabilities and issues. You may uncover long-forgotten applications, or applications that were never supposed to see the light of day.

A Recipe for Building a Subdomain Enumerator

Concepts Covered

  • Using requests

  • String concatenation

  • Using named parameters

Ingredients

  • One dash of requests

  • One liberal handful of exception handling

  • A pinch of list enumeration

Method

Now that we have processed our word list, we are going to start with the simplest type of subdomain brute forcing: the web kind.

Normally when we talk about subdomain brute forcing we also talk about enumeration; however, enumeration relies on drawing from more sources, which inherently makes it more complex as you are dealing with search engine APIs, DNS records and SSL/TLS certificates.

These are all things that can be built in later, so first we will start out with the basics.

As we mentioned previously, a subdomain in its simplest form looks like tweetdeck.twitter.com; however, in theory this subdivision can go as deep as 127 levels (though that limit is not spelled out in any published RFC, so mileage with this sort of system will vary). In practice it is not common to see anywhere near that many levels of subdivision.

The first thing we will want to do is iterate over the word list we processed in the previous step.
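
As a rough sketch, and assuming the previous step left the processed words in a list called wordlist (your variable name may well differ), the loop is as simple as:

for word in wordlist:
    # each word becomes a candidate subdomain in the steps that follow
    print(word)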

Now we can start making queries.

Requests is a third-party Python library created by Kenneth Reitz; significant contributions have also been made by Cory Benfield, Ian Stapleton Cordasco and Nate Prewitt. It’s a fantastic alternative to urllib as it provides a much higher-level interface, making it easier for quick jobs.

It can be installed by running pipenv install requests.

The syntax, for those who have worked with web requests before, is very straightforward:

# where r is the response from the request made
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
# you can also gather other important information such as status code
>>> r.status_code
200
# headers
>>> r.headers['content-type']
'application/json; charset=utf8'
# encoding
>>> r.encoding
'utf-8'
# the response body as text output
>>> r.text
'{"type":"User"...'
# the response body as JavaScript Object Notation (JSON)
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}

Using the first example above, we want to query the domain the user supplied with each word in our word list prepended to it, remembering to place a . in between.

This can be done in a number of ways:

  • Python supports string concatenation using the + operator:

    word + "." + runtime_options.domain
    
  • We can also use the string format() method for concatenation:

    "{}.{}".format(word, runtime_options.domain)
    
  • In Python 3.6 and higher (that’s us) you can also use f-strings (my personal favorite):

    f"{word}.{runtime_options.domain}"
    

Each has its benefits and drawbacks; however, we won’t get into them here.

So now, you should have a GET request that is (almost) ready to go. If you have already run your code, you may have noticed a problem: Requests raises a MissingSchema exception.

This is because our URL, in its current form, doesn’t specify the protocol we are using: should the request be made using http or https? You have a few options here:

  • Hard code the schema by prepending http:// or https:// to your URL (fastest)

  • Add an option to our argument parser that lets the user pick whether queries are made using http:// or https:// (flexible, but assumes only web based)

  • Force the user to supply the full URL, including the schema, and then split it up so you can inject your word value between the schema and the domain (most flexible, but more complex)

Any option you pick right now is fine; you can always change and update your code later. A sketch of the first option follows.
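
Here is a minimal version that hard codes the http:// schema; wordlist and runtime_options.domain are assumed names carried over from earlier, and as the next few paragraphs explain, this version will still stall or error on hosts that never answer:

import requests

for word in wordlist:
    # hard code the schema for now; it can be made configurable later
    url = f"http://{word}.{runtime_options.domain}"
    r = requests.get(url)
    print(url, r.status_code)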

Once you have picked how you plan to handle the schema, you can try running your code. You may notice, however, that your code never gets past the first entry.

This happens because our request does not specify how long to wait before deciding that the server is unavailable. How long a request takes can be affected by a number of things, such as network speed, or the server being unable to serve content because it is down, does not exist, or is unreachable for some other reason.

Creating too many open connections and never closing them can also be problematic for servers, and can cause a DoS; this style of attack is often referred to as Slowloris.

So how can we fix it?

Requests is handy that way and has an optional timeout argument that can be specified, as sketched after the list below. Again, how we provide this value can be done in a number of ways:

  • Hard code the number of seconds to wait before we timeout. (Faster)

  • Add an option that can be specified so the user can account for slow network connections or servers that may take longer to respond. (More flexible)
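
Whichever you choose, the request itself changes very little. A sketch with a hard coded five second timeout (the value is an arbitrary choice, not something Requests prescribes) looks like this:

# wait at most five seconds for the server before giving up on this candidate
r = requests.get(url, timeout=5)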

So now we have the timeout problem fixed! However, something lurks deeper. After running it again, you will notice your application throwing a requests.exceptions.ConnectTimeout. This is how the Requests library signals a connection timeout, and depending on what you are doing a timeout may mean your application cannot continue, so you will need to handle it yourself.

If you recall, Python’s motto is “Ask for forgiveness, not permission”, which is what we are doing here: we are attempting to connect to a server without knowing if the server is up or even available.

The syntax for exception handling is:

try:
    with open(my_file) as new_file:
        lines = new_file.readlines()
except FileNotFoundError:
    sys.exit(2)
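
Applied to our brute forcer, a minimal sketch might look like the following; which exceptions you catch is up to you, but requests.exceptions.ConnectTimeout and requests.exceptions.ConnectionError are a sensible starting point, and wordlist and runtime_options.domain are again assumed names:

import requests

for word in wordlist:
    url = f"http://{word}.{runtime_options.domain}"
    try:
        r = requests.get(url, timeout=5)
        # any response at all means the subdomain resolved and answered
        print(f"{url} responded with status {r.status_code}")
    except requests.exceptions.ConnectTimeout:
        # the host did not respond in time; move on to the next candidate
        continue
    except requests.exceptions.ConnectionError:
        # the subdomain most likely does not exist or refused the connection
        continue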

Run the application again and ensure all exceptions are handled and the output is as expected.

Once everything is working as expected, you are done: you have the simplest form of a subdomain brute forcer. Depending on the time available and how much you want to develop your tool, there are a number of improvements you can make:

  • Directly querying DNS and checking for records

  • Querying SSL/TLS certificates for the domains the certificate covers

  • Using search engines or public resources to query for subdomains; some good sources include:
    • Google

    • Bing

    • Yahoo

    • Baidu: a Chinese search engine

    • Virus Total: provides extensive information, including an observed subdomains list of all the subdomains it knows about

    • DNS Dumpster: a free domain research tool that can discover hosts related to a domain

    • Censys.io: a security-focused search engine; the free tier provides 250 queries per month

    • SearchDNS: allows you to query subdomains given a domain name

    • Shodan: an infrastructure-focused spider with an associated information-caching database

While many of these are created for the purpose of subdomain enumeration, others, such as Google, Bing, Yahoo and Baidu, index subdomains because that is part of their job.

Keep in mind that while these techniques will provide more accurate results, they can be much more involved to implement.
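
As a taste of how one of those improvements might start, the sketch below queries crt.sh, a public certificate transparency search front end (not covered in this chapter), and assumes its JSON output keeps its current name_value field; treat it as a starting point rather than a finished tool:

import requests

domain = "example.com"  # placeholder target domain

# %25 is a URL-encoded % wildcard, so this asks for every certificate covering *.example.com
response = requests.get(f"https://crt.sh/?q=%25.{domain}&output=json", timeout=30)

names = set()
for entry in response.json():
    # name_value may contain several hostnames separated by newlines
    for name in entry.get("name_value", "").splitlines():
        names.add(name)

print(sorted(names))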