Handling Word Lists

When it comes to brute-forcing, word lists make the world go ‘round. Some people spend days, weeks, months and even years refining their word lists, whether by ordering entries from most to least common or by categorising them by company. These lists are often built by the community for the community, or are a hodge-podge of maybes. Either way, crafting an efficient word list is a thankless labour.

For our application we are going to have two types of list: an in-built one, and one provided by the user.

Keep in mind that we haven’t done any optimisation, such as threading, for this application, so make sure you keep your word lists small.

We have included 100 common subdomains in the file below. In the future you may want a bigger, more complete word list for brute-forcing subdomains; for this we recommend a few resources to get you started, listed under Other Word Lists after the file.

www
mail
ftp
localhost
webmail
smtp
pop
ns1
webdisk
ns2
cpanel
whm
autodiscover
autoconfig
m
imap
test
ns
blog
pop3
dev
www2
admin
forum
news
vpn
ns3
mail2
new
mysql
old
lists
support
mobile
mx
static
docs
beta
shop
sql
secure
demo
cp
calendar
wiki
web
media
email
images
img
www1
intranet
portal
video
sip
dns2
api
cdn
stats
dns1
ns4
www3
dns
search
staging
server
mx1
chat
wap
my
svn
mail1
sites
proxy
ads
host
crm
cms
backup
mx2
lyncdiscover
info
apps
download
remote
db
forums
store
relay
files
newsletter
app
live
owa
en
start
sms
office
exchange
ipv4

Other Word Lists

  • dnscan, a Python word list-based sub-domain scanner, includes a number of lists, including the top 100, 500, 1000 and 10,000 subdomains as well as a few others.

  • all.txt is a GitHub gist that claims to combine the word lists from every DNS enumeration tool. It has over 67,627 lines and may contain crude and offensive entries.

  • SecLists is a collection of multiple types of lists used during security assessments including a list of common subdomains which can be found under Discovery/DNS.

For the purposes of testing we will be targeting snakecharmingforbeginners.com, a web server set up specifically for this training. It is running a default Apache configuration, so there is nothing exciting there, sorry!

A Recipe for Handling Word Lists

In the previous section we created an Argument Parser that we can use to specify the domain we wish to target and the word list to use. In this section we will build on this and learn how to handle and process text files.

Concepts Covered

  • Exception handling

  • File handling

  • String manipulation

  • Type Hints

  • Namespace object

  • Logging

  • NoneType

Ingredients

  • One new function in the file you used in the previous section called main

  • One liberal handful of exception handling

  • Two text files, one populated with the word list included above, another with approximately 50 subdomains, one sub-domain per line

  • One dash of list handling

Method

Create your main function. Between the brackets put the following: runtime_options: argparse.Namespace. In Python this is called a positional parameter. Unlike named parameters, which we covered in the earlier section, the order in which we supply positional parameters is important. The value after the colon (:), argparse.Namespace, is a “Type Hint”.

Note

Type Hints are useful because Python is a “dynamic language”, which can make it difficult to infer the type of an object being used.

Note

Duck Typing is a computer programming concept where an object can be used in any context, up until it is used in a way that it does not support. The name comes from the aptly named duck test: “If it walks like a duck and it quacks like a duck, then it must be a duck”. Duck typing differs from normal (nominal) typing, where an object’s abilities are determined by its declared type rather than by the presence of certain methods and properties.
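
As a rough illustration of duck typing, the sketch below passes two unrelated objects to the same function. The class and function names (Duck, Robot, make_it_quack) are invented for this example and are not part of the application.

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    # Not a duck, but it has a quack() method, so it can be used like one.
    def quack(self) -> str:
        return "Beep-quack!"


def make_it_quack(thing) -> str:
    # No type check here: any object with a quack() method will do.
    return thing.quack()


print(make_it_quack(Duck()))
print(make_it_quack(Robot()))

try:
    make_it_quack("not a duck")  # str has no quack() method
except AttributeError as error:
    print(f"That was not a duck: {error}")
```

The string only fails at the moment quack() is called on it, which is the "up until it is used in a way that is not supported" part of the definition.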

The first thing we want to do is check whether the user has provided a path to a custom word list, or whether we will be using the in-built list. To access values in a Namespace we can use the following syntax: runtime_options.name_of_value.
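
A minimal sketch of reading values from a Namespace is shown here. The attribute names domain and wordlist are assumptions; use whatever names your own Argument Parser defines. The Namespace is built by hand so the sketch runs without command-line arguments.

```python
import argparse


def main(runtime_options: argparse.Namespace) -> None:
    # Values on a Namespace are read as ordinary attributes.
    print(f"Target domain: {runtime_options.domain}")
    print(f"Word list path: {runtime_options.wordlist}")


# Hypothetical values standing in for what parse_args() would return.
options = argparse.Namespace(domain="snakecharmingforbeginners.com", wordlist=None)
main(options)
```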

Tip

Many programming languages have the concept of none: in C, JavaScript and Java this is null, and in Ruby it is nil. In Python the value is None, and its type is NoneType. It represents the lack of an assigned value.

Note

While learning Python you may come across the concepts of “Truthy” and “Falsy”. A truthy value is one that is treated as True when tested in a condition, and a falsy value is one that is treated as False. This lets you test a value directly instead of writing a more complex comparison.

For example to check if a list (array in other languages) is empty you may use the following traditional syntax:

if len(my_list) != 0:
    print("List is not empty")

In Python you can use the more compact syntax:

if my_list:
    print("List is not empty")

With this information you should be able to check whether the path to the word list is specified or is None. If it is None, remember to fall back to the in-built word list.

Now that we know the path to our word list, we can try to open it. The function we use to do this is open(), which returns a file object. open() can be used with or without a mode; by default, files are opened read-only:
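
One possible shape for this check is sketched below. The file name in DEFAULT_WORDLIST, the wordlist attribute and the choose_wordlist helper are all hypothetical; adapt them to your own application.

```python
import argparse

# Hypothetical file name for the in-built list.
DEFAULT_WORDLIST = "subdomains-100.txt"


def choose_wordlist(runtime_options: argparse.Namespace) -> str:
    # "is None" checks specifically for a missing value.
    if runtime_options.wordlist is None:
        return DEFAULT_WORDLIST
    return runtime_options.wordlist


print(choose_wordlist(argparse.Namespace(wordlist=None)))
print(choose_wordlist(argparse.Namespace(wordlist="custom.txt")))
```

A truthiness test (if not runtime_options.wordlist) would also treat an empty string as "not provided", which may or may not be what you want.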

my_file = open('filename')

If you want to read and write, write, or append to a file you can use r+, w or a respectively (w+ also opens a file for reading and writing, but erases its contents first).

Caution

When using the write mode (w), remember that if a file with the same name already exists, its contents will be erased and overwritten by the newer version.

When handling files it is considered good practice to use the with keyword. This is because the with keyword ensures the file is closed properly, even if an exception is raised.
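
The effect of the modes can be seen in the short sketch below, which works inside a temporary directory so no real files are touched. The file name words.txt is made up for the example.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "words.txt")

    with open(path, "w") as handle:   # create (or truncate) and write
        handle.write("www\n")

    with open(path, "a") as handle:   # append to the end
        handle.write("mail\n")

    with open(path, "w") as handle:   # "w" again: previous contents are erased
        handle.write("ftp\n")

    with open(path) as handle:        # no mode given: read-only
        contents = handle.read()

print(repr(contents))
```

Only the last write survives, because opening the file in w mode a second time erased the two earlier lines.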

Tip

If you are not using the with keyword, you should call the close() method to close the file and free up the system resources it uses. If you do not, Python’s garbage collector will eventually destroy the object and close the open file for you, but the file may stay open for a while.

Additionally, different Python implementations will do this clean-up at different times meaning your application could continue to use resources for an extended period of time.
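
To compare the two approaches, here is a small sketch that reads the same throwaway file once with an explicit close() and once with the with keyword. The temporary file and the variable names are only for illustration.

```python
import os
import tempfile

# Create a small throwaway file to read from.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as temp:
    temp.write("www\nmail\n")
    path = temp.name

# Without "with": close() must be called explicitly, even if reading fails.
manual = open(path)
try:
    first = manual.read()
finally:
    manual.close()

# With "with": the file is closed when the block ends.
with open(path) as managed:
    second = managed.read()

print(manual.closed, managed.closed)
os.remove(path)
```

Both file objects report that they are closed, but the second version needs no try... finally... to guarantee it.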

To access the file’s contents we can use a variety of methods. read(size) reads a quantity of data and returns it as a string. The size parameter is optional; if it is not supplied, the whole file is read and returned.

Caution

Python doesn’t care if you use read() and the file is two, three or four times the size of your computer’s memory – so before reading in large files ensure your (and your user’s) computer can handle it.

readline() reads a single line from a file. The newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. The benefit of this is that the returned value is unambiguous: if readline() returns an empty string, you have reached the end of the file, whereas a blank line is returned as \n, a string containing only a single newline.

If you want to read every line of the file you can use list(file) or readlines(). Your job in this step is to take the word list path obtained in the previous step and load the list into your application. At the end of the function, print the word list.

During the previous step you probably faced a number of inconvenient problems, such as missing files or wacky formats, which is a great segue into the next part: exception handling. If you have ever used software you will know that things do not always go the way we intend, so sometimes we need to be extraordinary and handle those conditions.
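
The behaviour of read(size) and readline() can be tried out without a real file. In the sketch below io.StringIO is used as an in-memory stand-in for a file opened in text mode; the sample contents are made up.

```python
import io

# io.StringIO is an in-memory stand-in for a file opened in text mode.
sample = io.StringIO("www\n\nmail")

chunks = [
    sample.read(2),      # the first two characters
    sample.readline(),   # the rest of the first line, newline kept
    sample.readline(),   # a blank line is just "\n"
    sample.readline(),   # last line: no trailing newline in this sample
    sample.readline(),   # empty string: end of file
]
print(chunks)
```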

Note

In Python we follow the motto “ask for forgiveness, not permission” when handling data, variables, anything. This means that rather than checking whether the file is valid first, we simply try opening it. If that fails with an exception, we ask for forgiveness (by handling said exception).

In Python we use try... except... to handle exceptions. In the try block we write what we are trying to do: open a file, access a value in a dictionary, etc. If that fails, we catch the problem in the except block. For example:

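import sys

my_file = "wordlist.txt"  # example path, replace with your own

try:
    with open(my_file) as new_file:
        lines = new_file.readlines()
except FileNotFoundError:
    # Exit with a non-zero status to signal that something went wrong
    sys.exit(2)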

In the example above we attempt to open a file and read its lines into a list. If the file cannot be found, we catch that exception and terminate the program before we encounter more issues. Exception handling doesn’t always need to lead to termination, however; it can also log a warning or set a variable.

It’s important to remember that we must specify which exceptions we are going to catch, and to order them from the most specific to the least, something that admittedly can be difficult to work out when you’re getting started. For example, when handling a file in Python 3 we might handle FileNotFoundError, then OSError (IOError is an alias for it in Python 3), then Exception. Generally speaking, however, it is better to avoid catching the base Exception class, as it is too broad.
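
A sketch of this ordering is shown below, with handlers that log a warning and fall back to an empty list rather than terminating. The load_lines helper and the file name are hypothetical.

```python
import logging


def load_lines(path: str) -> list:
    try:
        with open(path) as handle:
            return handle.readlines()
    except FileNotFoundError:
        # Specific: the path does not exist.
        logging.warning("Word list %s was not found", path)
    except PermissionError:
        # Specific: the file exists but we are not allowed to read it.
        logging.warning("No permission to read %s", path)
    except OSError as error:
        # Least specific of the three: any other operating system error.
        logging.warning("Could not read %s: %s", path, error)
    return []


print(load_lines("this-file-should-not-exist.txt"))
```

FileNotFoundError and PermissionError are both subclasses of OSError, so if the OSError handler came first the two more specific handlers would never run.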

Tip

Python exception handling doesn’t end with try... except...; this is just the tip of the iceberg. The else clause also exists and allows you to run code if and only if the try clause does not raise any exceptions, for example:

try:
    with open(my_file) as new_file:
        lines = new_file.readlines()
except FileNotFoundError:
    load_backup_config()
else:
    parse_user_config()

You also have the finally clause, which can be used to define clean-up actions that must be executed under all circumstances, whether an exception has occurred or not:

try:
    with open(my_file) as new_file:
        lines = new_file.readlines()
except FileNotFoundError:
    load_backup_config()
else:
    parse_user_config()
finally:
    clean_up_artifacts()
    print("Thanks for using our software! Goodbye")

You may have noticed when you were printing the word list you have some inconvenient trailing characters like \n. These are normally not seen when you are reading a file in a text based application, but these little beasties will become a common occurrence when doing text processing.

To remove these before our print() statement we can quickly strip them out using the aptly named strip() method. This method returns a copy of the string where all the characters provided have been stripped from the beginning and end of the original string. The default behaviour is to strip white space characters (like a space, tab or newline).

text = "88888888this is string example....wow!!!8888888"
print(text.strip('8'))

Result:

this is string example....wow!!!

Two phrases in that previous paragraph are important things to remember: strip() “returns a copy” of the string, and it strips from the “beginning and end”. Let’s address them now.

  • The strip() method returns a copy of the original string, so you need to assign the new value to a variable. This can be a new variable, or you can overwrite the old one. Just remember that simply calling this method will not modify the original string.

  • When you are stripping characters from a string, sometimes you want to strip characters only on the left and sometimes only those on the right, and sometimes there is an easy solution to these problems! (This is one of those times).

lstrip

lstrip() is an alternative to strip() that returns a copy of the string where all the characters provided have been stripped from the beginning of the string, for example:

text = "88888888this is string example....wow!!!8888888"
print(text.lstrip('8'))

Result:

this is string example....wow!!!8888888

rstrip

rstrip() is an alternative to strip() that returns a copy of the string where all the characters provided have been stripped from the end of the string, for example:

text = "88888888this is string example....wow!!!8888888"
print(text.rstrip('8'))

Result:

88888888this is string example....wow!!!
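
Applied to a line read from a word list, the difference between the default strip() and a targeted rstrip() looks roughly like this. The sample line is made up.

```python
line = "  webmail \n"

stripped = line.strip()            # default: all leading and trailing white space
no_newline = line.rstrip("\n")     # only the trailing newline

print(repr(stripped))
print(repr(no_newline))
print(repr(line))                  # the original string is unchanged
```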

Using a for loop you will be able to modify the values in the list and replace them. More advanced users can use a list comprehension to achieve the same goal.

A for loop can be created using the syntax for x in y, where x is the name of the thing you are currently looking at and y is a collection of things (not to be confused with Python’s collections module, which is made up of high-performance container datatypes).

In this case, our collection is being stored as a list.

Important

In the context of this training, when we refer to a collection we are referring to a group of things.

It is considered Pythonic to name your list as if it were a collection of things, so if you had a number of books you might call your list books. Your x value should then be called something like book, so that when you read the code the context is implied: “I am looking at a book in a list of books”.

A for loop can then be created using the following syntax for book in books:.
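
Following the same naming convention, a for loop that replaces each word in a list with a stripped copy might look like the sketch below. The sample list is made up, and enumerate() is used here to get each item's position so it can be replaced.

```python
words = ["www\n", "mail\n", "ftp\n"]

# enumerate() gives both the position and the value of each item.
for index, word in enumerate(words):
    # strip() returns a copy, so store it back into the list.
    words[index] = word.strip()

print(words)
```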

List comprehensions do the same thing as a normal for loop, but provide a concise way to create lists, meaning less code. They can sometimes become difficult to understand, however, so while it is considered more Pythonic to use list comprehensions, sometimes being more verbose is easier on the programmer.

A list comprehension can be created using this general guide: new_list = [expression(i) for i in old_list if filter(i)].

By now you should have an application that takes a file, whether user-supplied or built-in, and prints it out. The final step in this section is to add some logging.
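
As a small sketch of that general guide, the comprehension below strips each word and filters out blank lines in one step. The sample list is made up.

```python
words = ["www\n", "\n", "mail\n", "ftp"]

# expression: word.strip()   filter: keep only words that are not blank
clean_words = [word.strip() for word in words if word.strip()]

print(clean_words)
```

The filter relies on truthiness: a line containing only white space strips down to an empty string, which is falsy.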

An Incantation for Application Truths (i.e. Logging)

Logging allows us to track events that happen during runtime. We add these calls to our code to determine when events happened and what may have caused problems. These events are classified by their importance, or severity, to the developer or the user. The severities vary depending on the configuration and are customisable, but for today we will just be working with the default severities provided by Python.

But Buffy, does this mean we need to log every activity? Heck. to. the. no. The Python documentation provides a handy table to determine if, and when, you should do logging (reproduced below). I personally tend to blur this line a little bit by configuring my logger to display logging to the console as well as saving it to a file; this way the end-user can determine how much information they care about (but that’s a story for another time).

When to use Logging

Task you are performing | The best tool for the task
Display console output for ordinary usage of a command line script or program | print()
Report events that occur during normal operation of a program (e.g. for status monitoring or fault investigation) | logging.info() (or logging.debug() for very detailed output for diagnostic purposes)
Issue a warning regarding a particular runtime event | warnings.warn() in library code if the issue is avoidable and the client application should be modified to eliminate the warning; logging.warning() if there is nothing the client application can do about the situation, but the event should still be noted
Report an error regarding a particular runtime event | Raise an exception
Report suppression of an error without raising an exception (e.g. error handler in a long-running server process) | logging.error(), logging.exception() or logging.critical() as appropriate for the specific error and application domain

Python provides five default levels of severity. These standard levels and their applicability are described below:

Level | When it’s used
DEBUG | Detailed information, typically of interest only when diagnosing problems.
INFO | Confirmation that things are working as expected.
WARNING | An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.
ERROR | Due to a more serious problem, the software has not been able to perform some function.
CRITICAL | A serious error, indicating that the program itself may be unable to continue running.

The default logger is configured to record events classified as a WARNING or higher, meaning ERROR and CRITICAL will also be recorded. Events can be recorded in several different ways. In more advanced systems these may be piped to syslog so the device can send events to a centralised server, however, more often than not, local logging to the console or to a text file is sufficient.

A Simple Logger Example

import logging

# will print a message to the console
logging.warning('Look out for the Cobra!')

# will not print anything
logging.info('The sky is blue.')

This will produce the following output:

WARNING:root:Look out for the Cobra!

As mentioned previously, because we have not configured the logger, it will by default only log WARNING events and higher meaning the INFO event is suppressed.

Logging to a File

It’s very common to want to hold onto events that occur during the runtime of an application, which is what we will cover next: logging to a file. There are lots of different options available when configuring loggers and we won’t go through them today, because that alone could be its own training, but you can find more details in the Further Reading section below.

import logging

# Configure the logger to store events in a file named "example.log", logging events DEBUG and higher
logging.basicConfig(filename='example.log', level=logging.DEBUG)

logging.debug('This is a debug message that contains important debugging information and will be saved to example.log')

logging.info('This is an informational message that contains information and will be saved to example.log')

logging.warning('This is a warning. You should be careful now.')

If you copy and run this code, you should find a new file called example.log. Its contents should look something like:

DEBUG:root:This is a debug message that contains important debugging information and will be saved to example.log
INFO:root:This is an informational message that contains information and will be saved to example.log
WARNING:root:This is a warning. You should be careful now.

Using the two examples above, you should be able to now configure a logger for your application and have messages saved to a file.
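
If you would like a starting point, the sketch below is one possible way to wire a file logger into an application. It uses a named logger with a FileHandler instead of basicConfig(), and adds a timestamp to each message. The build_logger helper, the logger name wordlist_demo and the messages are all hypothetical, and the log is written to a temporary directory so the sketch leaves nothing behind in your project.

```python
import logging
import os
import tempfile


def build_logger(log_path: str) -> logging.Logger:
    # A named logger keeps this configuration separate from the root logger.
    logger = logging.getLogger("wordlist_demo")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "example.log")
logger = build_logger(log_path)
logger.info("Loaded %d words from the word list", 100)
logger.warning("Custom word list missing, using the in-built list")

# Read the log back to confirm both events were saved.
with open(log_path) as handle:
    log_text = handle.read()
print(log_text)
```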

Optional

  • Using Python’s built-in Comma Separated Values (CSV) library, add the option to use a CSV file instead of a Plain Text file. See if you can make your application detect whether it is Plain Text or a CSV.

  • Archive your log files in a directory called logs, naming each new log file with the date and time it was created. This should be in ISO 8601 notation (yyyy-mm-ddThh:mm:ss).