Create a Command Line Python Application for a Rotational Cipher

I don't have a background in cryptography, but it sure is fun. I was sitting in class, bored, not paying any attention to our discussion about Immanuel Kant (you can't blame me for that one), and zoned out thinking about ciphers. I'm familiar with a simple cipher approach called a Caesar shift. This is a very easy cipher, where you have a rotation value (e.g. 5, or 10, or 1) and then rotate every character in the alphabet by that amount. Then, translate the original text to the new alphabet, character by character. That's pretty simple, and it takes at most 25 guesses of the key to crack, since a 26-letter alphabet only has 25 shifts that actually change the text.

I was thinking, is there a way to do something similarly simple to encode, but a lot harder to crack? That's when I came up with something a little different. Have a multi-digit number (e.g. 1495) and then use that as the key. The only difference is that you'd use the digits of that key as the rotation amount for a given character, and loop over the key's digits for subsequent letters. That's hard to explain with words, so let me give you an example.

Let's use the key 123, and encode the word "HELLO".

The first character, 'H', will be moved by the first digit in the key, 1. So 'H' becomes 'I'.
Next letter, 'E', and next digit, 2. 'E' gets shifted by 2 characters, because 2 is the next digit in the key. 'E' is now 'G'.
'L' then becomes 'O' since the next digit is 3.
What now? We've hit the end of our key, but we have two more letters to encode! Just start over at the beginning of the key.
The next 'L' would become 'M', and the 'O' at the end would become 'Q'.
In the end, you'd have the word "IGOMQ".
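
The walkthrough above can be sketched in a few lines of Python. This is only an illustration of the idea, not the final implementation built later in this post; the name shift_word is made up for this example, and it assumes the input contains uppercase A-Z only.

```python
import string

ALPHABET = string.ascii_uppercase  # 'A' through 'Z'

def shift_word(word, key):
    """Shift each letter of `word` by the matching digit of `key`, cycling through the key."""
    digits = [int(d) for d in str(key)]
    shifted = []
    for position, letter in enumerate(word):
        offset = digits[position % len(digits)]  # start over when the key runs out
        new_index = (ALPHABET.index(letter) + offset) % len(ALPHABET)
        shifted.append(ALPHABET[new_index])
    return "".join(shifted)

print(shift_word("HELLO", 123))  # IGOMQ
```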

Decoding is also pretty straightforward. You go through the string again, letter by letter, this time subtracting the current key digit. In the end, you'll be back at "HELLO".

I did some research after class and found out that this is a Rolling Cipher. That's cool, and it probably means this is already implemented somewhere. However, I think it's a good opportunity to practice some new skills. I've never made a full-fledged Python application where you can use flags to pass arguments, and this sounds like good practice. So this week, I've done just that: I've implemented a Rolling Cipher in Python, with the ability to pass in paths to input and output files (as well as the option to pass an input string rather than an entire file).

Note: instead of using a range to define my key sequence (10-15 being 10, 11, 12, 13, 14, 15), I used an integer (1234), which then gets parsed into a sequence of digits (1, 2, 3, 4).

Setting Up

I'll start by setting up my Python environment. I'm using Python 3.10, and the only library I'll need to import is argparse:

    import argparse

I also want to create a mapping of letter to index, and index to letter. It allows for a quick lookup of characters and will come in handy later. Here are those maps:

    letter_to_idx = {
      'a':0, 'b':1, 'c':2, 'd':3, 'e':4,
      'f':5, 'g':6, 'h':7, 'i':8, 'j':9,
      'k':10, 'l':11, 'm':12, 'n':13, 'o':14,
      'p':15, 'q':16, 'r':17, 's':18, 't':19,
      'u':20, 'v':21, 'w':22, 'x':23, 'y':24, 'z':25
    }

    idx_to_letter = {
      0:'a', 1:'b', 2:'c', 3:'d', 4:'e',
      5:'f', 6:'g', 7:'h', 8:'i', 9:'j', 10:'k', 11:'l',
      12:'m', 13:'n', 14:'o', 15:'p', 16:'q', 17:'r',
      18:'s', 19:'t', 20:'u', 21:'v', 22:'w', 23:'x',
      24:'y', 25:'z'
    }
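
If typing out both dictionaries by hand feels error-prone, they can also be generated. Here is a minimal sketch that builds the same two maps from string.ascii_lowercase in the standard library; the resulting dictionaries should match the handwritten ones above.

```python
import string

# Build both lookup tables from the lowercase alphabet instead of typing them out.
letter_to_idx = {letter: idx for idx, letter in enumerate(string.ascii_lowercase)}
idx_to_letter = {idx: letter for letter, idx in letter_to_idx.items()}

print(letter_to_idx['h'])  # 7
print(idx_to_letter[25])   # z
```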

Next, I need to understand what my inputs will be. Let's start with the argument parser. An ArgumentParser needs to be created first, which can be done in the main block like so:

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description='Perform a rolling cypher on some text')

Now, I want to provide some inputs. I need the key sequence, an input, an encoded output, and a decoded output (just to prove the algorithm works). To add arguments to the parser, you can use the add_argument() method. So let's do that:

    parser.add_argument('-s', metavar='S', type=int, nargs=1, help='The rotation sequence as an integer')

    parser.add_argument('-i', metavar='I', type=str, nargs=1, help='The path to the input file')

    parser.add_argument('-t', metavar='T', type=str, nargs=1, help='Raw text to encode/decode')

    parser.add_argument('-e', metavar='E', type=str, nargs=1, help='The path to the output encoded file')

    parser.add_argument('-d', metavar='D', type=str, nargs=1, help='The path to the output decoded file')

I've set the -s flag to take the key as an integer, -i to indicate that the following argument will be a path to an input file, -t to take an input string (rather than an input file), -e for the path to the encoded output, and -d for the path to the decoded output. This is a barebones setup; argparse automatically adds a -h/--help option that lists these flags along with their help text. Now we need to parse the arguments to make them useful. I've done so, with the addition of some minimal input validation, in the parsing lines of the complete main function at the end of this post.
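
A sketch of what that parsing step might look like is below. The wrapper function read_arguments and its argv parameter are hypothetical and exist only to make the example easy to run. The './decoded.txt' default is an assumption, and parser.error() is used here in place of raising an exception.

```python
import argparse

def parse_sequence(sequence):
    # Helper from this post (introduced just below): 1234 -> [1, 2, 3, 4]
    return [int(a) for a in str(sequence)]

def read_arguments(argv=None):
    """Hypothetical wrapper: parse the flags, falling back to defaults for missing ones."""
    parser = argparse.ArgumentParser(description='Perform a rolling cypher on some text')
    parser.add_argument('-s', metavar='S', type=int, nargs=1, help='The rotation sequence as an integer')
    parser.add_argument('-i', metavar='I', type=str, nargs=1, help='The path to the input file')
    parser.add_argument('-t', metavar='T', type=str, nargs=1, help='Raw text to encode/decode')
    parser.add_argument('-e', metavar='E', type=str, nargs=1, help='The path to the output encoded file')
    parser.add_argument('-d', metavar='D', type=str, nargs=1, help='The path to the output decoded file')
    args = parser.parse_args(argv)

    # With nargs=1 a supplied flag arrives as a one-element list; a missing flag is None.
    sequence = parse_sequence(args.s[0]) if args.s is not None else [1]
    input_path = args.i[0] if args.i is not None else None
    input_text = args.t[0] if args.t is not None else None
    output_encoded = args.e[0] if args.e is not None else './encoded.txt'
    output_decoded = args.d[0] if args.d is not None else './decoded.txt'  # assumed default

    if input_path is None and input_text is None:
        parser.error('No input text provided!')  # prints usage and exits
    return sequence, input_path, input_text, output_encoded, output_decoded

print(read_arguments(['-s', '123', '-t', 'HELLO']))
# ([1, 2, 3], None, 'HELLO', './encoded.txt', './decoded.txt')
```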


The parser.parse_args() function returns a namespace object whose attributes are named after the argument flags. Each attribute is a list that contains the collected argument values, or None if the flag was not provided. You may notice that when adding arguments, I passed an nargs value of 1. This specifies how many arguments the parser should collect following the given flag. So in that code, I am just collecting all of the arguments and validating their existence. The optional arguments get set to a default value if they aren't provided. Notice that there is a function in there called parse_sequence(). I wrote that helper function, which converts an integer into a list of its digits:

    def parse_sequence(sequence):
      return [int(a) for a in str(sequence)]

It's a one-liner with a list comprehension. There are prettier and safer ways to perform this same operation without casting to a string, such as repeated division and modulo by 10, but this is for the sake of quick implementation.

I've also implemented a document parser, which just reads a text file's contents into a single string in memory. If you wish to scale this tool, you'd probably not want to do this: very large files could hit memory limits, and building the whole string up front is inefficient. It may be better to stream the data, reading and writing as you go without ever collecting all the text in memory.

    def parse_doc(path):
      # Read the entire file into one string
      with open(path, 'r') as readfile:
        return readfile.read()
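
If memory ever becomes a concern, one option is to process the file line by line while carrying the key position across lines. The sketch below is one way that could look. The encode_stream function and its parameters are hypothetical, and to keep it short it only shifts lowercase ASCII letters (digits pass through untouched, unlike the encoder built later in this post).

```python
import io
import string

def encode_stream(readfile, writefile, sequence):
    """Encode `readfile` into `writefile` one line at a time (letters only, for brevity)."""
    alphabet = string.ascii_lowercase
    current_index = 0  # position in the key; must survive across lines
    for line in readfile:
        encoded = []
        for letter in line.lower():
            if letter in alphabet:
                shifted = (alphabet.index(letter) + sequence[current_index]) % len(alphabet)
                encoded.append(alphabet[shifted])
                current_index = (current_index + 1) % len(sequence)
            else:
                encoded.append(letter)  # leave everything else untouched
        writefile.write("".join(encoded))

# In-memory files stand in for real ones here.
source = io.StringIO("hel\nlo\n")
target = io.StringIO()
encode_stream(source, target, [1, 2, 3])
print(repr(target.getvalue()))  # 'igo\nmq\n'
```

Only one line is held in memory at a time, and the key position continues from where the previous line left off.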

Encoding Time

Now that we have our inputs, let's write our encoder! We should take in our input text and the key sequence, setting the current key sequence index to 0:

    def encode_from_text(text, sequence):
        current_index = 0

And we should convert our text to all lowercase, then initialize our encoded text to an empty string:

    tokens = text.lower()
    encoded = ""

Now, we want to iterate over the letters in the text, and then perform some action on them. I also want to encode numbers, so I can use the built-in isalpha() and isalnum() string methods to perform some conditional checks on the character I'm evaluating. Because the isalpha() check comes first, the isalnum() branch only ever sees digits (assuming plain ASCII text):

    for letter in tokens:
      if letter.isalpha():
        # Shift the letter (filled in below)
        pass
      elif letter.isalnum():
        # Shift the number (filled in below)
        pass
      else:
        # Just add the symbol to the encoded text;
        # don't increment the current key sequence index
        encoded += letter

The process of shifting a letter is clear. I get the index of the current letter by keying into letter_to_idx. Then, I get which key digit I am using by keying the sequence array with the current_index value. I'll then calculate the index of the new letter, get that letter, and insert it into the encoded message. Finally, I'll increment the current index.

    for letter in tokens:
      if letter.isalpha():
        # Look up the letter's position, shift it by the current key digit,
        # and append the resulting letter
        idx = letter_to_idx[letter]
        sequence_offset = sequence[current_index]
        encoded_index = calculate_index(sequence_offset, idx)
        encoded_letter = idx_to_letter[encoded_index]
        encoded += encoded_letter
        # Advance to the next key digit, wrapping back to the start
        current_index = current_index + 1 if current_index < len(sequence) - 1 else 0
      elif letter.isalnum():
        # Shift the number (filled in next)
        pass
      else:
        # Just add the symbol to the encoded text;
        # don't increment the current key sequence index
        encoded += letter

Notice a couple of things. First, the current_index increment needs to ensure that once we reach the last index of the sequence list, we circle back to index 0. I used a conditional expression to simplify that logic. Second, what is this calculate_index() function? Well, it serves a similar "wrapping" purpose to what we did with current_index, except this one keeps our calculated letter index within the range 0-25:

    def calculate_index(current_index, letter_index, encoding=True):
        # current_index is the key digit to shift by;
        # letter_index is the letter's position in the alphabet (0-25)
        alphabet_size = len(idx_to_letter)  # 26
        if encoding:
            # Shift forward, wrapping past 'z' back around to 'a'
            return (letter_index + current_index) % alphabet_size
        else:
            # Shift backward, wrapping before 'a' back around to 'z'
            return (letter_index - current_index) % alphabet_size
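
Wrap-around logic is easy to get off by one, so it can be worth checking the edge cases directly. Here is a minimal sketch, assuming a 26-letter alphabet and using the modulo operator rather than explicit comparisons; wrap_shift is a made-up name for this example.

```python
ALPHABET_SIZE = 26  # indices run from 0 ('a') to 25 ('z')

def wrap_shift(letter_index, offset, encoding=True):
    """Move `letter_index` forward (or backward when decoding) and wrap into 0-25."""
    step = offset if encoding else -offset
    # Python's % always returns a non-negative result for a positive modulus.
    return (letter_index + step) % ALPHABET_SIZE

print(wrap_shift(25, 1))                 # 0  ('z' shifted by 1 wraps to 'a')
print(wrap_shift(0, 1, encoding=False))  # 25 ('a' shifted back by 1 wraps to 'z')

# Every letter should survive an encode/decode round trip for every key digit.
assert all(
    wrap_shift(wrap_shift(i, d), d, encoding=False) == i
    for i in range(ALPHABET_SIZE)
    for d in range(10)
)
```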

Here, I've passed the current key digit (named current_index in the signature), the letter index, and whether or not we are encoding as a keyword argument (this comes in handy later when decoding). As you can see, I'm doing the arithmetic to ensure our calculated index is between 0 and 25. That's all this function does. Now we can move on to shifting numbers. This one is fairly simple. Cast the character to an integer, then add the current key digit to it. If the result is 10 or greater (or, when decoding, less than 0), wrap it:

    elif letter.isalnum():
      int_letter = int(letter) + sequence[current_index] if int(letter) + sequence[current_index] < 10 else (int(letter) + sequence[current_index]) - 10
      encoded += str(int_letter)
      current_index = current_index + 1 if current_index < len(sequence) - 1 else 0
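
The same digit wrap can also be written with the modulo operator, which covers both directions in one expression. This is just a sketch of an alternative; shift_digit is a hypothetical helper, not part of the code above.

```python
def shift_digit(character, offset, encoding=True):
    """Shift a single digit character by `offset`, wrapping within 0-9."""
    step = offset if encoding else -offset
    return str((int(character) + step) % 10)

print(shift_digit('7', 5))                  # 2
print(shift_digit('2', 5, encoding=False))  # 7
```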

The full function will look like this, and our encoding is complete:

    def encode_from_text(text, sequence):
        current_index = 0
        tokens = text.lower()
        encoded = ""
        for letter in tokens:
            if letter.isalpha():
                # Shift the letter by the current key digit
                idx = letter_to_idx[letter]
                sequence_offset = sequence[current_index]
                encoded_index = calculate_index(sequence_offset, idx)
                encoded_letter = idx_to_letter[encoded_index]
                encoded += encoded_letter
                current_index = current_index + 1 if current_index < len(sequence) - 1 else 0
            elif letter.isalnum():
                # Shift the digit by the current key digit, wrapping within 0-9
                int_letter = int(letter) + sequence[current_index] if int(letter) + sequence[current_index] < 10 else (int(letter) + sequence[current_index]) - 10
                encoded += str(int_letter)
                current_index = current_index + 1 if current_index < len(sequence) - 1 else 0
            else:
                # Symbols pass through and do not use up a key digit
                encoded += letter
        return encoded

Decoding

Decoding is practically identical to encoding, but instead of adding the key digit, you subtract it. Here's the function:

    def decode_from_text(text, sequence):
        current_index = 0
        tokens = text.lower()
        decoded = ""
        for letter in tokens:
            if letter.isalpha():
                idx = letter_to_idx[letter]
                sequence_offset = sequence[current_index]
                decoded_index = calculate_index(sequence_offset, idx, encoding=False)
                decoded_letter = idx_to_letter[decoded_index]
                decoded += decoded_letter
                current_index = current_index + 1 if current_index < len(sequence) - 1 else 0
            elif letter.isalnum():
                int_letter = int(letter) - sequence[current_index] if int(letter) - sequence[current_index] >= 0 else (int(letter) - sequence[current_index]) + 10
                decoded += str(int_letter)
                current_index = current_index + 1 if current_index < len(sequence) - 1 else 0
            else:
                decoded += letter
        return decoded
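
Since the two functions differ only in the direction of the shift, a quick round trip is a good way to check them. The sketch below folds both directions into a single hypothetical function, rolling_shift, purely so the example is self-contained; it is not the code from this post, though it is meant to follow the same rules (letters and digits shift, symbols pass through without consuming a key digit).

```python
import string

def rolling_shift(text, sequence, encoding=True):
    """Encode (or decode) `text` with the key digits in `sequence`."""
    letters = string.ascii_lowercase
    digits = string.digits
    direction = 1 if encoding else -1
    current_index = 0
    result = []
    for character in text.lower():
        step = direction * sequence[current_index]
        if character in letters:
            result.append(letters[(letters.index(character) + step) % len(letters)])
        elif character in digits:
            result.append(digits[(digits.index(character) + step) % len(digits)])
        else:
            result.append(character)  # symbols pass through and do not use a key digit
            continue
        current_index = (current_index + 1) % len(sequence)
    return "".join(result)

message = "Hello, World 42!"
encoded = rolling_shift(message, [1, 2, 3])
decoded = rolling_shift(encoded, [1, 2, 3], encoding=False)
print(encoded)  # igomq, zptoe 65!
print(decoded)  # hello, world 42!
```

Note that the decoded text comes back in lowercase, because the input is lowercased before it is shifted.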

Writing Results

I also want to create a helper function to simplify the process of writing text to a file. Here is a simple function that can perform such a task:

    def write_message(path, message):
        # Overwrite the file at path with the full message
        with open(path, 'w') as writefile:
            writefile.write(message)

        return path
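
To try the helper without leaving files lying around, it can be pointed at a temporary directory. This is a small sketch using tempfile from the standard library; the file name and message are placeholders.

```python
import os
import tempfile

def write_message(path, message):
    # Same idea as the helper above: write the whole message in one call.
    with open(path, 'w') as writefile:
        writefile.write(message)
    return path

# Write to a throwaway directory, then read the file back to confirm its contents.
with tempfile.TemporaryDirectory() as folder:
    path = write_message(os.path.join(folder, 'encoded.txt'), 'igomq')
    with open(path, 'r') as readfile:
        contents = readfile.read()

print(contents)  # igomq
```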

Completing the 'main' Function

Now, to wrap things up, we need to actually call everything necessary to run the encoder and decoder functions once the program is called from the command line. Complete the main function:

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description='Perform a rolling cypher on some text')
        parser.add_argument('-s', metavar='S', type=int, nargs=1, help='The rotation sequence as an integer')
        parser.add_argument('-i', metavar='I', type=str, nargs=1, help='The path to the input file')
        parser.add_argument('-t', metavar='T', type=str, nargs=1, help='Raw text to encode/decode')
        parser.add_argument('-e', metavar='E', type=str, nargs=1, help='The path to the output encoded file')
        parser.add_argument('-d', metavar='D', type=str, nargs=1, help='The path to the output decoded file')
        args = parser.parse_args()
        # A flag that was not supplied is None, so check for that before indexing
        sequence = parse_sequence(args.s[0]) if args.s is not None else [1]
        input_text_from_file = args.i[0] if args.i is not None else None
        input_text_in_place = args.t[0] if args.t is not None else None
        output_encoded = args.e[0] if args.e is not None else './encoded.txt'
        output_decoded = args.d[0] if args.d is not None else './decoded.txt'

        if input_text_from_file is None and input_text_in_place is None:
            raise Exception("No input text provided!")
        elif input_text_from_file is None:
            text = input_text_in_place
        else:
            text = parse_doc(input_text_from_file)

        encoded = encode_from_text(text, sequence)
        write_message(output_encoded, encoded)

        text = encoded
        decoded = decode_from_text(text, sequence)
        write_message(output_decoded, decoded)

        print(f"Done! The encoded message can be found in {output_encoded} and the decoded in {output_decoded}")
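
As a possible refinement, argparse can enforce the "file or raw text" rule itself through a mutually exclusive group, so the manual check for missing input would not be needed. This is a sketch of that alternative, not what the code above does; it also drops nargs=1, so each value arrives as a plain string rather than a one-element list.

```python
import argparse

parser = argparse.ArgumentParser(description='Perform a rolling cypher on some text')
# Exactly one of -i or -t must be given; argparse reports an error otherwise.
source = parser.add_mutually_exclusive_group(required=True)
source.add_argument('-i', metavar='I', type=str, help='The path to the input file')
source.add_argument('-t', metavar='T', type=str, help='Raw text to encode/decode')

# Passing a list here stands in for real command line arguments.
args = parser.parse_args(['-t', 'HELLO'])
print(args.t)  # HELLO
print(args.i)  # None
```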

That's it! A lovely Rolling Cipher command line application in Python. If you want to view the complete source code, check it out on GitHub.