Handling Word Lists¶
When it comes to brute-forcing, word lists make the world go ‘round. Some people spend days, weeks, months and even years refining their word lists, whether ordering entries by how common they are or categorising them by company. These lists are often built by the community for the community, or are a hodge-podge of maybes. Either way, crafting an efficient word list is a loveless labour.
For our application we are going to support two types of list: an in-built one and one provided by the user.
Keep in mind that we haven’t done any optimisation in the way of threading for this application, so make sure you keep your word lists small.
We have included 100 common subdomains in the file below. In the future you may want a bigger, more complete word list for brute-forcing subdomains; a few resources to get you started are recommended under Other Word Lists.
www
mail
ftp
localhost
webmail
smtp
pop
ns1
webdisk
ns2
cpanel
whm
autodiscover
autoconfig
m
imap
test
ns
blog
pop3
dev
www2
admin
forum
news
vpn
ns3
mail2
new
mysql
old
lists
support
mobile
mx
static
docs
beta
shop
sql
secure
demo
cp
calendar
wiki
web
media
email
images
img
www1
intranet
portal
video
sip
dns2
api
cdn
stats
dns1
ns4
www3
dns
search
staging
server
mx1
chat
wap
my
svn
mail1
sites
proxy
ads
host
crm
cms
backup
mx2
lyncdiscover
info
apps
download
remote
db
forums
store
relay
files
newsletter
app
live
owa
en
start
sms
office
exchange
ipv4
Other Word Lists
dnscan, a Python word list-based subdomain scanner, includes a number of lists, among them the top 100, 500, 1000 and 10,000 subdomains as well as a few others.
all.txt is a GitHub gist that claims to have all word lists from every DNS enumeration tool. It has over 67,627 lines and may contain crude and offensive entries.
SecLists is a collection of multiple types of lists used during security assessments, including a list of common subdomains which can be found under Discovery/DNS.
For the purposes of testing we will be using snakecharmingforbeginners.com, a web server set up specifically for this training. It is running a default Apache configuration, so there is nothing exciting there, sorry!
A Recipe for Handling Word Lists¶
In the previous section we created an Argument Parser that we can use to specify the domain we wish to target and the word list to use. In this section we will build on this and learn how to handle and process text files.
Concepts Covered
Exception handling
File handling
String manipulation
Type Hints
Namespace object
Logging
NoneType
Ingredients¶
One new function called main in the file you used in the previous section
One liberal handful of exception handling
Two text files, one populated with the word list included above, another with approximately 50 subdomains, one subdomain per line
One dash of list handling
Method¶
Create your main function. Between the brackets put the following: runtime_options: argparse.Namespace. In Python this is called a positional parameter. Unlike the named parameters we covered in the earlier section, the order in which we supply positional parameters is important. The value after the colon (:), argparse.Namespace, is a “Type Hint”.
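A minimal sketch of the function signature described above. The body is only a placeholder, and the domain attribute is an assumption about what the parser from the previous section provides.

```python
import argparse


def main(runtime_options: argparse.Namespace) -> None:
    """Entry point: receives the parsed command-line options."""
    # Placeholder body: just show what we were given.
    print(runtime_options)


# Build a Namespace by hand to stand in for the parser's output.
options = argparse.Namespace(domain="example.com")
main(options)
```

The hint after the colon documents what main expects; Python itself does not check it when the function is called.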
Note
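Type Hints are useful because Python is a “dynamic language”, which can make it difficult to infer the type of an object being used. Hints are not enforced at runtime; they document intent and help editors and type checkers spot mistakes.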
Note
Duck Typing is a computer programming concept where an object can be used in any context, right up until it is used in a way it does not support. To decide whether an object is suitable, we apply what is aptly named the duck test: “If it walks like a duck and it quacks like a duck, then it must be a duck”. Duck typing differs from nominal typing, where an object’s ability is determined by its declared type rather than by the presence of certain methods and properties.
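To make the duck test concrete, here is a small illustrative sketch. The Duck and Robot classes and the make_it_quack function are invented for this example and are not part of the application.

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    def quack(self) -> str:
        return "Beep quack!"


def make_it_quack(thing) -> str:
    # No type check: anything with a quack() method is accepted.
    return thing.quack()


print(make_it_quack(Duck()))
print(make_it_quack(Robot()))

try:
    make_it_quack(42)  # an int has no quack() method
except AttributeError as error:
    print(f"Not a duck: {error}")
```

Robot is unrelated to Duck, yet it works because it has the method the function uses. The int only fails at the moment quack() is called on it.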
The first thing we want to do is check if the user has provided a path to a custom word list, or if we will be using the in-built list. To access values in a Namespace we can use the following syntax: runtime_options.name_of_value.
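As a rough illustration of attribute access on a Namespace, the sketch below builds a tiny parser. The argument names domain and --wordlist are assumptions; use whatever names your own parser from the previous section defines.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("domain")
parser.add_argument("--wordlist", default=None)

# Passing a list to parse_args() avoids reading the real command line.
runtime_options = parser.parse_args(["example.com"])

print(runtime_options.domain)
print(runtime_options.wordlist)
```

Because --wordlist was not supplied, its value falls back to the default of None.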
Tip
Many programming languages have the concept of none: in C, JavaScript and Java this is null, and in Ruby the value is represented by nil. In Python the value is written None when programming, and its type is NoneType. It represents the lack of an assigned value.
Note
While learning Python you may come across the concepts of “Truthy” and “Falsy”. A value is truthy if it is treated as True when tested in a condition such as an if statement, and falsy if it is treated as False. Empty collections, 0 and None are all falsy, which lets you write a simple test instead of a more complex statement.
For example to check if a list (array in other languages) is empty you may use the following traditional syntax:
if len(my_list) != 0: print("List is not empty")
In Python you can use the more compact syntax:
if my_list: print("List is not empty")
With this information you should be able to check whether the path to the word list is specified or is None. If it is None, remember to set something so the in-built word list is used.
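One possible way to make that decision is sketched below. The wordlist attribute and the DEFAULT_WORD_LIST file name are hypothetical; substitute the names used in your own application.

```python
import argparse

DEFAULT_WORD_LIST = "subdomains.txt"  # hypothetical name for the in-built list


def choose_word_list(runtime_options: argparse.Namespace) -> str:
    """Return the user's word list path, or the in-built one if none was given."""
    if runtime_options.wordlist is None:
        return DEFAULT_WORD_LIST
    return runtime_options.wordlist


print(choose_word_list(argparse.Namespace(wordlist=None)))
print(choose_word_list(argparse.Namespace(wordlist="custom.txt")))
```

Comparing with is None is more explicit than a truthiness test here, since it only matches the case where no value was supplied at all.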
Now that we know the path to our word list we can try to open it. The function we use to do this is open(), which returns a file object. open() can be used in one of two ways: with or without a mode. By default, files are opened read-only:
my_file = open('filename')
If you want to read and write, write, or append to a file you can use r+, w or a respectively.
Caution
When using the write mode (w) it is important to remember that if a file exists with the same name, the existing file will be erased and overwritten with the newer version.
It is considered good practice to use the with keyword when dealing with file objects. This is because the with keyword ensures the file is closed properly, even if an exception is raised.
Tip
If you are not using the with keyword, you should ensure you call the close() function to close the file and free up system resources used by the application. If you do not, Python’s garbage collector will eventually destroy the object and close the open file for you, but the file may stay open for a while.
Additionally, different Python implementations will do this clean-up at different times, meaning your application could continue to use resources for an extended period of time.
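The two approaches can be compared side by side. This is only a sketch: it writes a throwaway file in a temporary directory so that it can run on its own, and the file name words.txt is arbitrary.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "words.txt")

with open(path, "w") as word_file:
    word_file.write("www\nmail\n")
# Leaving the with block closed the file for us.
print(word_file.closed)

# Without with, we are responsible for calling close() ourselves.
word_file = open(path)
try:
    contents = word_file.read()
finally:
    word_file.close()
print(word_file.closed)
```

The try/finally version does the same job as with, only more verbosely, which is why with is usually preferred.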
To access the file’s contents we can use a variety of functions. read(size) reads a quantity of data and returns it as a string. The size parameter is optional; if it is not supplied, the whole file is read and returned.
Caution
Python doesn’t care if you use read() and the file is two, three or four times the size of your computer’s memory, so before reading in large files ensure your (and your user’s) computer can handle it.
readline() reads a single line from a file. The newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. The benefit of this is that the returned value is unambiguous: if readline() returns an empty string, you have reached the end of the file. If it returns a string containing only \n, you have read a blank line.
If you want to read every line of the file you can use list(file) or readlines(). Your job in this step is to take the word list path obtained in the previous step and load the file into your application. At the end of the function, print the word list.
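The behaviour of read(size), readline() and readlines() described above can be seen in a short sketch. It creates its own three-line file in a temporary directory; the file name and contents are made up for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w") as word_file:
    word_file.write("www\nmail\nftp")  # note: no trailing newline

with open(path) as word_file:
    first_three = word_file.read(3)      # the first three characters
    rest_of_line = word_file.readline()  # what is left of line one: just "\n"
    next_line = word_file.readline()     # a full line, newline included
    last_line = word_file.readline()     # last line has no "\n" to keep
    at_end = word_file.readline()        # empty string: end of file

with open(path) as word_file:
    lines = word_file.readlines()        # every line, as a list

print(lines)
```

Each call carries on from where the previous one stopped, which is why the file is reopened before readlines().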
During the previous step you probably faced a number of inconvenient problems, such as missing files and wacky formats, which is a great segue into the next part: exception handling. If you have ever used software you will know that things do not always go the way we intend, so sometimes we need to handle those extraordinary conditions.
Note
In Python we follow the motto “ask for forgiveness, not permission” when handling data, variables, anything. This means that rather than checking whether the file is valid first, we simply try opening it. If that fails with an exception, we ask for forgiveness (by handling said exception).
In Python we use try.. except.. to handle exceptions. In the try block we write what we are trying to do: open a file, access a value in a dictionary, etc. If that fails, we catch the problem in the except block. For example:
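import sys

try:
    # my_file holds the path to the word list
    with open(my_file) as new_file:
        lines = new_file.readlines()
except FileNotFoundError:
    # Exit with a non-zero status to signal failure
    sys.exit(2)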
In the example above we attempt to open a file and read the lines into a list. If the file cannot be found we catch that exception and terminate the program before we encounter more issues. Exception handling doesn’t always need to lead to termination, however; it can also log a warning or set a variable.
It’s important to remember that we must specify which exceptions we are going to catch, and order them from the most specific to the least, something that admittedly can be difficult to work out when you’re getting started. For example, when handling a file in Python 3+ we handle FileNotFoundError, then OSError (IOError is an alias of it), then Exception. Generally speaking, however, it is better to avoid catching the base Exception class, as it is too broad.
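The ordering rule can be sketched as follows. The load_lines function and its messages are illustrative only, and returning an empty list on failure is just one possible choice.

```python
def load_lines(path: str) -> list:
    """Read every line of path, returning an empty list if that fails."""
    try:
        with open(path) as word_file:
            return word_file.readlines()
    except FileNotFoundError:
        # Most specific first: the path does not exist.
        print(f"{path} does not exist")
    except OSError as error:
        # Broader: permissions, path is a directory, disk errors, etc.
        print(f"{path} could not be read: {error}")
    return []


print(load_lines("no-such-word-list.txt"))
```

FileNotFoundError is a subclass of OSError, so if the two except blocks were swapped the first would catch everything and the second would never run.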
Tip
Python exception handling doesn’t end with try... except...; that is just the tip of the iceberg. The else clause also exists and allows you to run code if and only if the try clause does not raise any exceptions, for example:
try:
    with open(my_file) as new_file:
        lines = new_file.readlines()
except FileNotFoundError:
    load_backup_config()
else:
    parse_user_config()
You also have the finally clause, which can be used to define clean-up actions that must be executed under all circumstances, whether an exception has occurred or not:
try:
    with open(my_file) as new_file:
        lines = new_file.readlines()
except FileNotFoundError:
    load_backup_config()
else:
    parse_user_config()
finally:
    clean_up_artifacts()
    print("Thanks for using our software! Goodbye")
You may have noticed when you were printing the word list that you have some inconvenient trailing characters like \n. These are normally not seen when you are reading a file in a text-based application, but these little beasties will become a common occurrence when doing text processing.
To remove these before our print() statement we can quickly strip them out using the aptly named strip() function. This function returns a copy of the string where all the characters provided have been stripped from the beginning and end of the original string. The default behaviour is to strip white space characters (like a space).
text = "88888888this is string example....wow!!!8888888"
print(text.strip('8'))
Result:
this is string example....wow!!!
Two details in that description are important to remember: the function returns a copy, and it strips from both the beginning and the end. Let’s address them now.
The strip() function returns a copy of the original string, so you need to assign the new value to a variable. This can be a new variable, or you can overwrite the old one. Just remember that simply calling this function will not modify the original string.
When you are stripping characters from a string, sometimes you want to strip only those on the left and sometimes only those on the right. Sometimes there are easy solutions to these problems! (This is one of those times.)
lstrip¶
lstrip() is an alternative to strip() that returns a copy of the string where all the characters provided have been stripped from the beginning of the string, for example:
text = "88888888this is string example....wow!!!8888888"
print(text.lstrip('8'))
Result¶
this is string example....wow!!!8888888
rstrip¶
rstrip() is an alternative to strip() that returns a copy of the string where all the characters provided have been stripped from the end of the string, for example:
text = "88888888this is string example....wow!!!8888888"
print(text.rstrip('8'))
Result¶
88888888this is string example....wow!!!
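Applied to a line from a word list, the default white space behaviour is what removes the trailing \n. A small sketch, with made-up values standing in for lines read from a file:

```python
line = "www\n"

# strip() with no arguments removes leading and trailing white space.
print(repr(line.strip()))

# The original string is untouched until we reassign it.
print(repr(line))
line = line.strip()
print(repr(line))

# Other white space, such as spaces and \r, is removed as well.
messy = "  mail \r\n"
print(repr(messy.strip()))
```

repr() is used here only so that invisible characters like \n show up in the printed output.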
Using a for loop you will be able to modify the values in the list and replace them. More advanced users can use a list comprehension to achieve the same goal.
A for loop can be created by using the syntax for x in y, where x is the name of the thing you are currently looking at and y is a collection of things (not to be confused with the Python collections module, which is made up of high-performance container datatypes).
In this case, our collection is being stored as a list.
Important
In the context of this training, when we refer to a collection we are referring to a group of things.
It is considered Pythonic to name your list as if it were a collection of things, so if you had a number of books you may call your list books. Your x value should then be called something like book, so that when you read the code the context is implied: “I am looking at a book in a list of books”.
A for loop can then be created using the following syntax: for book in books:
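Following the same naming convention for our word list, a loop that strips each entry might look like the sketch below. The list contents are sample values rather than output from the real file.

```python
subdomains = ["www\n", "mail\n", "ftp\n"]

cleaned_subdomains = []
for subdomain in subdomains:
    # strip() returns a copy, so we collect the cleaned values in a new list.
    cleaned_subdomains.append(subdomain.strip())

print(cleaned_subdomains)
```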
List comprehensions do the exact same thing as a normal for loop, but provide a more concise way to create lists, meaning less code. They can, however, sometimes become difficult to understand, so while it is considered more Pythonic to use list comprehensions, sometimes being more verbose is easier on the programmer.
A list comprehension can be created using this general guide: new_list = [expression(i) for i in old_list if filter(i)].
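Here is the general guide filled in for our use case, as a sketch with sample data. The expression is line.strip(), and the filter drops lines that are empty once stripped.

```python
lines = ["www\n", "\n", "mail\n", "   \n", "ftp"]

# expression: line.strip()   filter: keep only lines with content left
subdomains = [line.strip() for line in lines if line.strip()]

print(subdomains)
```

The filter relies on truthiness: an empty string is falsy, so blank lines are skipped.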
By now you should have an application that takes a file, whether user-supplied or built-in, and prints it out. The final step in this section is to add some logging.
An Incantation for Application Truths (i.e. Logging)¶
Logging allows us to track events that happen during runtime. We add logging calls to our code to determine when events happened and what may have caused problems. These events are classified by importance, or severity, to the developer or the user. The severities can vary depending on the configuration and are customisable. For today we will just be working with the default severities provided by Python.
But Buffy, does this mean we need to log every activity? Heck. to. the. no. The Python documentation provides a handy table to determine if, and when, you should do logging (reproduced below). I personally tend to blur this line a little bit by configuring my logger to display logging to the console as well as saving it to a file; this way the end-user can determine how much information they care about (but that’s a story for another time).
When to use Logging¶
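Task you are performing | The best tool for the task |
---|---|
Display console output for ordinary usage of a command line script or program | print() |
Report events that occur during normal operation of a program (e.g. for status monitoring or fault investigation) | logging.info() (or logging.debug() for very detailed output for diagnostic purposes) |
Issue a warning regarding a particular runtime event | warnings.warn() in library code if the issue is avoidable and the client application should be modified to eliminate the warning; logging.warning() if there is nothing the client application can do about the situation, but the event should still be noted |
Report an error regarding a particular runtime event | Raise an exception |
Report suppression of an error without raising an exception (e.g. error handler in a long-running server process) | logging.error(), logging.exception() or logging.critical() as appropriate for the specific error and application domain |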
Python provides five default levels of severity. These standard levels and their applicability are described below:
Level | When it’s used |
---|---|
DEBUG | Detailed information, typically of interest only when diagnosing problems. |
INFO | Confirmation that things are working as expected. |
WARNING | An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected. |
ERROR | Due to a more serious problem, the software has not been able to perform some function. |
CRITICAL | A serious error, indicating that the program itself may be unable to continue running. |
The default logger is configured to record events classified as WARNING or higher, meaning ERROR and CRITICAL will also be recorded. Events can be recorded in several different ways. In more advanced systems these may be piped to syslog so the device can send events to a centralised server; more often than not, however, local logging to the console or to a text file is sufficient.
A Simple Logger Example¶
import logging
# will print a message to the console
logging.warning('Look out for the Cobra!')
# will not print anything
logging.info('The sky is blue.')
This will produce the following output:
WARNING:root:Look out for the Cobra!
As mentioned previously, because we have not configured the logger, it will by default only log WARNING events and higher, meaning the INFO event is suppressed.
Logging to a File¶
It’s very common to want to hold onto events that occur during the runtime of an application, which is what we will cover next: logging to a file. There are lots of different options available when configuring loggers and we won’t go through them today, because that alone could be its own training, but you can find more details in the Further Reading section down below.
import logging
# Configure the logger to store events in a file named "example.log", logging events DEBUG and higher
logging.basicConfig(filename='example.log', level=logging.DEBUG)
logging.debug('This is a debug message that contains important debugging information and will be saved to example.log')
logging.info('This is an informational message that contains information and will be saved to example.log')
logging.warning('This is a warning. You should be careful now.')
If you copy and run this code, you should find a new file called example.log. Its contents should look something like:
DEBUG:root:This is a debug message that contains important debugging information and will be saved to example.log
INFO:root:This is an informational message that contains information and will be saved to example.log
WARNING:root:This is a warning. You should be careful now.
Using the two examples above, you should be able to now configure a logger for your application and have messages saved to a file.
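One way this could look in the application is sketched below. Note that it swaps logging.basicConfig() for a named logger with its own FileHandler, which keeps the configuration separate from any other code using the root logger. The logger name, log messages and file location are all assumptions made for the example.

```python
import logging
import os
import tempfile


def configure_logger(log_path: str) -> logging.Logger:
    """Return a logger that writes DEBUG and above to log_path."""
    logger = logging.getLogger("subdomain_scanner")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(log_path)
    # Same LEVEL:name:message layout as the default format shown earlier.
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    return logger


# A temporary directory keeps the example from cluttering your project.
log_path = os.path.join(tempfile.mkdtemp(), "scanner.log")
logger = configure_logger(log_path)
logger.info("Loaded the in-built word list")
logger.warning("Word list contained blank lines")
```

In your own application you would pass a path such as example.log instead of the temporary one used here.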
Optional¶
Using Python’s built-in Comma Separated Values (CSV) library, add the option to use a CSV file instead of a plain text file. See if you can make your application detect whether it is plain text or CSV.
Archive your log files in a directory called logs, saving each new log file with the time, date, month and year. This should be in ISO 8601 notation (yyyy-mm-dd hh:mm:ss).
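As a starting point for the archiving task, the timestamp portion can be produced with the datetime module. This is a sketch only: the archive_log_path helper and the .log extension are assumptions, and it builds the path without creating any files or directories.

```python
from datetime import datetime
from pathlib import Path


def archive_log_path(log_dir: str, moment: datetime) -> Path:
    """Build a log file path named after moment (yyyy-mm-dd hh:mm:ss)."""
    # sep=" " puts a space between date and time; timespec drops microseconds.
    timestamp = moment.isoformat(sep=" ", timespec="seconds")
    return Path(log_dir) / f"{timestamp}.log"


print(archive_log_path("logs", datetime.now()))
```

Be aware that colons are not permitted in file names on Windows, so you may need a different separator for the time portion there.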