========================================================= Handling Word Lists ========================================================= When it comes to brute-forcing, word lists make the world go 'round. Some people spend days, weeks, months and even years refining their word lists, from most common, to categorising by company. These lists are often built by the community for the community or are a hodge-podge of maybes. Either way crafting an efficient word-list is a loveless labour. For our application we are going to have two types of list, an in-built one and the ability to handle one provided by the user. Now keep in mind because we haven't done any optimisation in the way of threading for this application make sure you keep your word lists small. We have included 100 common subdomains in the file below, however, in the future you may want a bigger more complete word list for brute-forcing subdomains, for this we have recommended a few resources to get you started: .. literalinclude:: one_hundered_subdomains.txt :linenos: .. topic:: Other Word Lists - `dnscan `_ a Python word list-based sub-domain scanner has a number of lists included including top 100, 500, 1000, 10, 000 as well as a few others - `all.txt `_ is a GitHub gist that claims to have all word lists from every DNS enumeration tools. It has over 67, 627 lines - **May contain crude and offensive entries** - `SecLists `_ is a collection of multiple types of lists used during security assessments including a list of common subdomains which can be found under ``Discovery/DNS``. For the purposes of testing we will be testing ``snakecharmingforbeginners.com`` a web-server set up specifically for this training that is running a default Apache configuration, so there is nothing exciting there sorry! --------------------------------------------------------- A Recipe for Handling Word Lists --------------------------------------------------------- In the previous section we created an Argument Parser that we can use to specify the domain we wish to target and the word list to use. In this section we will build on this and learn how to handle and process text files. .. topic:: Concepts Covered - Exception handling - File handling - String manipulation - Type Hints - ``NameSpace`` object - Logging - ``NoneType`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Ingredients ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - One new function in the file you used in the previous section called ``main`` - One liberal handful of exception handling - Two text files, one populated with the word list included above, another with approximately 50 subdomains, one sub-domain per line - One dash of ``list`` handling ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create your ``main`` function. Between the brackets put the following ``runtime_options: argparse.Namespace``, in Python this is called a positional parameter. Unlike named parameters which we covered in the earlier section, the order we supply these parameters is important. The value after the colon (``:``) ``argparse.Namespace`` is a "Type Hint". .. note:: Type Hints are useful because Python is a "dynamic language" which can make inferring the type of an object being used difficult. .. note:: Duck Typing is a computer programming concept where an object can be used in any context up until it is used in a way that is not supported. To test whether a language supports duck typing, we apply what is aptly named the duck test - "If it walks like a duck and it quacks like a duck, then it must be a duck". Duck typing is different to normal typing where an Objects ability is determined by its type rather than the presence of certain methods and properties. The first thing we want to do is check if the user has provided a path to a custom word list, or if we will be using the in-built list. To access values in a ``Namespace`` we can use the following syntax ``runtime_options.name_of_value``. .. tip:: Many programming languages have the concept of ``none`` -- in C, JavaScript and Java this is ``null`` in Ruby this value is represented by ``nil``. In Python this is the ``NoneType`` or ``None`` when programming. It is the lack of assigned value. .. note:: While learning Python you may come across the concepts of "Truthy" and "Falsy". Truthy/Falsy are values of convenience for situations where you need to test whether a statement is binary in nature (True or False) instead of writing more complex statements. For example to check if a list (array in other languages) is empty you may use the following traditional syntax: .. sourcecode:: python if len(my_list) != 0: print("List is not empty") In Python you can use the more compact syntax: .. sourcecode:: python if my_list: print("List is not empty") With this information you should be able to check if the path to the word list is specified or is ``None``, if it is ``None`` remember to set something to use the in-built word list. Now that we know where the path to our word list is we can `try` to open it. The function we use to do this is ``open()`` which will return a file object. ``open()`` can be used in one of two ways, with, or without a mode, by default the mode to access files is read-only: .. sourcecode:: python my_file = open('filename') If you want to read and write, write, or append to a file you can use ``r+``, ``w`` or ``w+`` respectively. .. caution:: When using the write mode (``w``) it is important to remember if a file exists with the same name, the existing file will be erased and overwritten with the newer version. When handling it is considered good practise to use the ``with`` keyword when dealing with file objects. This is because using the ``with`` keyword will ensure the file is closed properly even if an exception is raised. .. tip:: If you are not using the ``with`` keyword, you should ensure you call the ``close()`` function to close the file and free up system resources used by the application. If you do not Python’s garbage collector will eventually destroy the object and close the open file for you, but the file may stay open for a while. Additionally, different Python implementations will do this clean-up at different times meaning your application could continue to use resources for an extended period of time. To access the files contents we can use a variety of functions ``read(size)`` will read a quantity of data and returns the value as a string. The ``size`` parameter is completely optional, and if not supplied the whole file will be read and returned. .. caution:: Python doesn't care if you use ``read()`` and the file is two, three or four times the size of your computer’s memory -- so before reading in large files ensure your (and your user's) computer can handle it. ``readline()`` reads a single line from a file, newline characters (``\n``) are left at the end of the string and only omitted for the last line of the file, if the file doesn't end in a new line. The benefit of this is the returned value is unambiguous - if ``readline()`` returns an empty string, you have reached the end of the file. If a ``\n`` is returned, a string containing only an empty line it is probably a line break. If you want to read each line of the file you can use ``list(file)`` or ``readlines()``. Your job in this step is to take the word list obtained in the previous step and load it into your application at the end of the function print the word list. During the previous step you probably faced a number of inconvenient problems, missing files, wacky formats which is a great segue into the next part exception handling. Now if you have ever used software you will know that things do not always go the way we intend so sometimes we need to be extraordinary and handle those conditions. .. note:: In Python we follow the motto "ask for forgiveness, not permission" when handling data, variables, anything. Meaning that before we check if the file is `valid` we try opening it. If it doesn't because of an exception, we ask for forgiveness (by handling said exception). In Python we use ``try.. except..`` to handle exceptions. In the ``try`` block we write what we are trying to do, open a file, access a value in a dictionary etc. If that fails, we catch the problem in the ``except`` block. For example: .. sourcecode:: python try: with open(my_file) as new_file: lines = new_file.readlines() except FileNotFoundError: sys.exit(2) In the example above we attempt to open a file and read the lines into a list, if the file cannot be found we can catch that exception and terminate the program before we encounter more issues, however exception handling doesn't always need to lead to termination, it can also log a warning or set a variable. It's important to remember that we must specify which exceptions we are going to catch, and to order them from the most specific to the least, something that admittedly when you're getting started can be difficult to workout. For example when handling a file in Python 3+ we handle ``FileNotFound`` then ``IOError`` then ``Exception`` however, generally speaking it is better to extend the base ``Exception`` class instead as it is too broad. .. tip:: Python exception handling doesn't just end with ``try... except...`` this is just the tip of the iceberg. The ``else`` clause also exists and allows you to run code if and only if the ``try`` clause does not raise any exceptions, for example: .. sourcecode:: python try: with open(my_file) as new_file: lines = new_file.readlines() except FileNotFoundError: load_backup_config() else: parse_user_config() You also have the ``finally`` clause, it can be used to define clean-up actions that must be executed under all circumstances, whether an exception has occurred or not: .. sourcecode:: python try: with open(my_file) as new_file: lines = new_file.readlines() except FileNotFoundError: load_backup_config() else: parse_user_config() finally: clean_up_artifacts() print("Thanks for using our software! Goodbye") You may have noticed when you were printing the word list you have some inconvenient trailing characters like ``\n``. These are normally not seen when you are reading a file in a text based application, but these little beasties will become a common occurrence when doing text processing. To remove these before our ``print()`` statement we can quickly strip them out using the aptly named ``strip()`` function. This function returns a **copy** of the string where all the characters provided have been stripped from the **beginning** and **end** of the original string. The default behaviour is to strip white space characters (like a space). .. sourcecode:: python str = "88888888this is string example....wow!!!8888888"; print str.strip('8); **Result:** .. sourcecode:: console this is string example....wow!!! Now you will have noticed I have bolded some keywords in that previous paragraph, because these are important things to remember. Let's address them now. - The ``strip()`` function returns a copy of the original string you need to assign the new value to a variable. This can be a new variable, or you can over-ride the old one. Just remember simply using this function will not modify the original string. - When you are stripping characters from a string, sometimes you want to strip characters on the left, sometimes only those on the right and sometimes there is an easy solutions to these problems! (This is one of those times). ``lstrip`` ========================================================= ``lstrip()`` is an alternative to ``strip()`` that returns a copy of the string where all characters have been stripped from the beginning of the string, for example: .. sourcecode:: python str = "88888888this is string example....wow!!!8888888"; print str.lstrip('8') Result --------------------------------------------------------- .. sourcecode:: console this is string example....wow!!!8888888 ``rstrip`` ========================================================= ``rstrip()`` is an alternative to ``strip()`` that returns a copy of the string where all characters have been stripped from the end of the string, for example: .. sourcecode:: python str = "88888888this is string example....wow!!!8888888"; print str.rstrip('8') Result --------------------------------------------------------- .. sourcecode:: console 88888888this is string example....wow!!! Using a ``for`` loop you will be able to modify the values in the list and replace them, for more advanced users, using a list comprehension to achieve the same goal. A ``for`` loop can be created by using the syntax ``for x in y`` where ``x`` is the name of the thing you are currently looking at and y is a collection of things, not to be confused with the Python ``collection`` module which is made up of high-performance container datatypes. In this case, our collection is being stored as a ``list``. .. important:: In the context of this training, when we refer to a collection we are referring to a group of things. It is considered Pythonic to name your ``list`` as if it were a collection of things, so if you had a number of books you may call your list ``books``, this would mean your ``x`` value should be called something like ``book`` this is so when you read the code the context is implied, "I am looking at a book in a list of books". A ``for`` loop can then be created using the following syntax ``for book in books:``. List comprehensions do the exact same thing as a normal ``for`` loop however, provide a concise way to create lists meaning less code. They however, can sometimes become difficult to understand so while it is considered more Pythonic to use list comprehensions, sometimes being more verbose is easier on the programmer. A list comprehension can be created using this general guide ``new_list = [expression(i) for i in old_list if filter(i)]``. By now you should have an application that takes a file, whether that be user supplied, or built-in and print it out. The final step in this section is to add some logging. --------------------------------------------------------- An Incantation for Application Truths (i.e. Logging) --------------------------------------------------------- Logging allows us to track events that happen during runtime, we add these calls to our code to determine, when events happened and what may have caused problems during runtime. These events are classified into importance, or severity to the developer or the user. These can vary depending on the built-in configuration however are customisable. For today we will just be working with the default severities provided by Python. But Buffy does this mean we need to log every activity? Heck. to. the. no. The Python documentation provides a handy table to determine if, and when you should do logging (reproduced below). I personally tend to blur this line a little bit by configuring my logger to display logging to the console as well as saving it to a file, this way the end-user can determine how much information they care about (but that's a story for another time). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When to use Logging ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +---------------------------+-----------------------------+ | Task you are performing | The best tool for the task | +===========================+=============================+ | Display console output | ``print()`` | | for ordinary usage of a | | | command line script or | | | program | | +---------------------------+-----------------------------+ | Report events that occur | ``logging.info()`` (or | | during normal operation | ``logging.debug()`` for | | of a program (e.g. for | very detailed output for | | status monitoring or fault| diagnostic purposes) | | investigation) | | +---------------------------+-----------------------------+ | Issue a warning regarding | ``warnings.warn()`` in | | a particular runtime event| library code if the issue | | | is avoidable and the client | | | application should be | | | modified to eliminate the | | | warning | +---------------------------+-----------------------------+ | Report an error regarding | Raise an exception | | a particular runtime event| | +---------------------------+-----------------------------+ | Report suppression of an | ``logging.error()``, | | error without raising an | ``logging.exception()`` or | | exception (e.g. error | ``logging.critical()`` as | | handler in a long-running | appropriate for the specific| | server process) | error and application domain| | | warning | +---------------------------+-----------------------------+ Python provides five default levels of severity, these are standard levels and their applicability are described below: +---------------------------+-----------------------------+ | Level | When it's used | +===========================+=============================+ | ``DEBUG`` | Detailed information, | | | typically of interest only | | | when diagnosing problems. | +---------------------------+-----------------------------+ | ``INFO`` | Confirmation that things | | | are working as expected. | | | | +---------------------------+-----------------------------+ | ``WARNING`` | An indication that | | | something unexpected | | | happened, or indicative | | | of some problem in the near | | | future (e.g. ‘disk space | | | low’). The software is still| | | working as expected. | +---------------------------+-----------------------------+ | ``ERROR`` | Due to a more serious | | | problem, the software has | | | not been able to perform | | | some function. | +---------------------------+-----------------------------+ | ``CRITICAL`` | A serious error, indicating | | | that the program itself may | | | be unable to continue | | | running. | +---------------------------+-----------------------------+ The default logger is configured to record events classified as a ``WARNING`` or higher, meaning ``ERROR`` and ``CRITICAL`` will also be recorded. Events can be recorded in several different ways. In more advanced systems these may be piped to ``syslog`` so the device can send events to a centralised server, however, more often than not, local logging to the console or to a text file is sufficient. A Simple Logger Example ========================================================= .. code-block:: python import logging # will print a message to the console logging.warning('Look out for the Cobra!') # will not print anything logging.info('The sky is blue.') This will produce the following output: .. code-block:: console WARNING:root:Look out for the Cobra! As mentioned previously, because we have not configured the logger, it will by default only log ``WARNING`` events and higher meaning the ``INFO`` event is suppressed. Logging to a File --------------------------------------------------------- It's very common to want to hold onto events that occur during the runtime of an application which is what we will cover next, logging to a file. There are lots of different options available when configuring loggers and we won't go through them today, because that alone could be its own training, but you can find more details available in the Further Reading section down below. .. code-block:: python import logging # Configure the logger to store events in a file named "example.log", logging events DEBUG and higher logging.basicConfig(filename='example.log', level=logging.DEBUG) logging.debug('This is a debug message that contains important debugging information and will be saved to example.log') logging.info('This is an informational message that contains information and will be saved to example.log') logging.warning('This is a warning. You should be careful now.') If you copy and run this code, you should find a new file called ``example.log`` inside should look something like: .. code-block:: console DEBUG:root:This is a debug message that contains important debugging information and will be saved to example.log INFO:root:This is an informational message that contains information and will be saved to example.log WARNING:root:This is a warning. You should be careful now. Using the two examples above, you should be able to now configure a logger for your application and have messages saved to a file. Optional ========================================================= - Using the Python in-build Comma Separated Values (CSV) library add the option to use a CSV file instead of a Plain Text file. See if you can make your application detect if it is Plain Text or a CSV. - Archive your log files in a directory called ``logs`` saving each new log file with the time, date, month and year - this should be in ISO 8601 Notation (yyyy-mm-dd hh:mm:ss). Further Reading ========================================================= - `PEP 484 -- Type Hints `_. - `The Theory of Type Hints in Python `_. - `Duck Typing `_. - `Errors and Exceptions `_. - `logging — Logging facility for Python `_. - `Basic Logging Tutorial `_. - `Logging Cookbook `_. - `Python Datetime `_.