Automate the non-automated with expect and Chef

Some of you may have already used expect on Linux. It's a nifty little tool that lets you automate command-line programs that require interactive prompts. It's simple, so let's look at an example of expect in action.

#!/usr/bin/expect -f

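# Change your own password automatically. These prompts match passwd when
# run as root; a regular user is first asked for the current password, and
# the exact prompt text varies between systems.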
spawn passwd
expect "Enter new UNIX password: "
send "mynewpassword\r"
expect "Retype new UNIX password: "
send "mynewpassword\r"
expect eof

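Notice the shebang line, #!/usr/bin/expect -f: the script is run by the expect interpreter, and -f tells expect to read its commands from a file. Expect is built on top of Tcl. If you just type expect at the command line, you are placed in an interactive shell that gives you commands such as spawn, send, and expect.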

Let's look at what expect is doing in the code snippet above.


First, we tell expect to spawn passwd as a child process.

spawn passwd

Next, we tell expect to wait until the child process writes the prompt "Enter new UNIX password: " to its output.

expect "Enter new UNIX password: "

Then we send the password followed by a carriage return (\r), which is what pressing Enter sends.

send "mynewpassword\r"

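We repeat the same expect/send pair for the "Retype new UNIX password: " prompt. Finally, we wait for the child process to finish, which expect sees as an end of file (eof).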

expect eof
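Under the hood, expect runs the spawned program on a pseudo-terminal, which is why it works with programs like passwd that want to talk to a terminal rather than a pipe. The sketch below imitates the spawn/expect/send/eof cycle with Python's standard pty module (Unix only). It illustrates the idea and is not how expect itself is implemented: the function names simply mirror expect's commands, patterns are Python regular expressions (expect matches glob-style patterns by default), and the spawned program is a hypothetical one-line Python stand-in for passwd so the example is safe to run. The 10-second timeout mirrors expect's default.

```python
import os
import pty
import re
import select
import sys


def spawn(argv):
    """Start argv on a new pseudo-terminal (like expect's spawn).

    Returns (pid, fd); fd is the master side, used to read and type."""
    pid, fd = pty.fork()
    if pid == 0:  # child: stdin, stdout and stderr are the pty's slave side
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    return pid, fd


def expect(fd, pattern, buf=b"", timeout=10):
    """Read the child's output until the regex `pattern` matches.

    Returns the text read after the match so the next call can reuse it."""
    while True:
        match = re.search(pattern, buf)
        if match:
            return buf[match.end():]
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError("timed out waiting for %r" % pattern)
        try:
            chunk = os.read(fd, 1024)
        except OSError:  # Linux reports EIO once the child closes the pty
            chunk = b""
        if not chunk:
            raise EOFError("child exited before %r appeared" % pattern)
        buf += chunk


def send(fd, data):
    """Type data into the child's terminal (like expect's send)."""
    os.write(fd, data)


def demo():
    """Answer a one-line Python stand-in for passwd's prompt."""
    pid, fd = spawn([sys.executable, "-c",
                     "s = input('Enter new password: '); print('got ' + s)"])
    rest = expect(fd, b"Enter new password: ")
    send(fd, b"secret\r")           # \r is the Enter key, as in the script
    expect(fd, b"got secret", rest)
    _, status = os.waitpid(pid, 0)  # roughly "expect eof": wait for the child
    os.close(fd)
    return status
```

Calling demo() returns the child's exit status, 0 when the prompt was answered and the child finished normally.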


So, how do we implement this in Chef? It's simpler than you think.

bash "modify_root_password" do
    user "root"
    code <<-EOF
    /usr/bin/expect -c 'spawn passwd
    expect "Enter new UNIX password: "
    send "#{user_root['password']}\r"
    expect "Retype new UNIX password: "
    send "#{user_root['password']}\r"
    expect eof'
    EOF
end

You're done! It's similar to the standalone script, with a couple of differences:

  • It's wrapped in Chef's bash resource, and the code attribute tells it what to execute. The user_root hash comes from an encrypted data bag.

  • We use the '/usr/bin/expect -c' notation because, as far as I know, Chef doesn't provide an expect interpreter among its script resources (the shortcuts cover interpreters such as bash, perl, python, and ruby).
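The recipe layers three kinds of quoting: Ruby interpolates the password into the heredoc, bash sees the expect script inside single quotes, and Tcl sees the password inside double quotes. A password containing ', ", $, [ or \ would break one of those layers. As a hedged illustration in Python (tcl_quote and passwd_expect_command are hypothetical helper names; in the recipe itself the equivalent escaping would have to happen in Ruby before interpolation), this is one way to build the same command line with each layer escaped:

```python
import shlex


def tcl_quote(value):
    """Backslash-escape characters that are special inside a Tcl "..." string.

    Backslash, double quote, $ and [ are special; ] is escaped too, which
    is harmless."""
    return "".join("\\" + ch if ch in '\\"$[]' else ch for ch in value)


def passwd_expect_command(password):
    """Build an `expect -c` command line like the one in the Chef recipe."""
    script = "\n".join([
        "spawn passwd",
        'expect "Enter new UNIX password: "',
        'send "%s\\r"' % tcl_quote(password),
        'expect "Retype new UNIX password: "',
        'send "%s\\r"' % tcl_quote(password),
        "expect eof",
    ])
    # shlex.quote wraps the script in single quotes and escapes any ' inside
    return "/usr/bin/expect -c " + shlex.quote(script)
```

Splitting the result with shlex.split gives back exactly three arguments, with the password intact inside the Tcl string.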


Here is another example, this one creating a tarsnap key file:

bash "create_tarsnap_private_key" do
    not_if do
        # File.exists? is deprecated (and removed in Ruby 3.2); use exist?
        ::File.exist?(node[:tarsnap][:private_key])
    end
    user "root"
    cwd "/tmp"
    code <<-EOF
    /usr/bin/expect -c 'spawn tarsnap-keygen --keyfile #{node[:tarsnap][:private_key]} --machine #{node[:fqdn]}  --user #{tarsnap_user['username']}
    expect "Enter tarsnap account password: "
    send "#{tarsnap_user['password']}\r"
    expect eof'
    EOF
end

I have made a cookbook that automates the tarsnap install process for you.

Remove stop words: implementations in Python

Removing stop words is a common step when indexing data. Typically you don't want words like "and", "the", and "or" to show up in your index; they water down the results. I have modified an existing script by Durden to benchmark both a small and a large corpus of text. Durden has already done a great job of setting up the code, so let's look at some test results.

Gist: https://gist.github.com/4684356

"""
Demonstration of ways to implement this API:
    sanitize(user_input, stop_words)

Related discussions:
    - Modifying a list while looping over it:
        - http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python

    - Remove all occurrences of a value from a list:
        - http://stackoverflow.com/questions/1157106/remove-all-occurences-of-a-value-from-a-python-list
"""
import sys
from timeit import timeit
from functools import partial
from collections import deque


def sanitize_1(user_input, stop_words):
    """Sanitize using set subtraction then wrapped in list()"""

    # Downsides:
    #   - Sets are unordered so if user_input was a sentence we lose the
    #     ordering because set difference will create a new set and not
    #     maintain ordering.

    return list(set(user_input) - set(stop_words))


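def sanitize_2(user_input, stop_words):
    """Sanitize using intersection and list.remove()"""
    # Downsides:
    #   - Every `in` test and remove() call scans the list. (Removing while
    #     looping is safe here because we loop over the intersection set,
    #     not over user_input.)
    #     http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python

    # work on a copy: removing from the caller's list would also change the
    # input seen by later functions and by every later timing iteration
    user_input = list(user_input)
    stop_words = set(stop_words)
    for sw in stop_words.intersection(user_input):
        while sw in user_input:
            user_input.remove(sw)

    return user_input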


def sanitize_3(user_input, stop_words):
    """Sanitize using standard lists"""
    new_list = []
    for w in user_input:
        if w not in stop_words:
            new_list.append(w)
    return new_list


def sanitize_4(user_input, stop_words):
    """Sanitize using standard list comprehension"""
    return [w for w in user_input if w not in stop_words]


def sanitize_5(user_input, stop_words):
    """Sanitize using collections.deque and list comprehension"""
    user_input = deque(user_input)
    # membership tests on a deque are linear scans, just like on a list
    stop_words = deque(stop_words)
    return [w for w in user_input if w not in stop_words]


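def _sanitize_funcs():
    """Get all the sanitize functions in scope, ordered by name"""
    module = sys.modules[__name__]
    funcs = [obj for obj in vars(module).values()
             if callable(obj) and
             getattr(obj, '__name__', '').startswith('sanitize_')]

    # sort by name explicitly: sorting the raw module values fails on
    # Python 3 and does not order functions by name on Python 2
    return sorted(funcs, key=lambda f: f.__name__)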


def get_functions(user_input, stop_words):
    """Build a list of the sanitize functions with parameters"""
    functions = []

    for f in _sanitize_funcs():
        functions.append(partial(f, user_input, stop_words))

    return functions



def check_results(functions):
    """Show the first 10 words returned by each function"""
    for f in functions:
        print('%s - %s ...' % (f.func.__name__, f()[:10]))


def check_performance(functions, number=1000000):
    """Test the performance of each function"""
    for f in functions:
        print('%s [%f] %s' % (f.func.__name__, timeit(f, number=number),
                              f.func.__doc__))


def main():
    # number of iterations for timeit
    # timeit defaults to 1,000,000 but change this number
    # to see different variations
    number = 10000

    # a list of stop words to be removed
    stop_words = ['the', 'that', 'to', 'as', 'there', 'has', 'and', 'or', 'is', 'not', 'a', 'of', 'but', 'in', 'by', 'on', 'are', 'it', 'if']

    user_input = 'the cat walked down the road.'.split()
    functions = get_functions(user_input, stop_words)

    print('Timing with %d iterations' % number)

    print('----- Function Results -----')
    check_results(functions)

    print('----- Function Performance -----')
    check_performance(functions, number)

    print('-' * 10)

    user_input = """
        Proficient reading depends on the ability to recognize words quickly and effortlessly.[2] If word recognition is difficult, students use too much of their processing capacity to read individual words, which interferes with their ability to comprehend what is read.
        Many educators in the USA believe that students need to learn to analyze text (comprehend it) even before they can read it on their own, and comprehension instruction generally begins in pre-Kindergarten or Kindergarten. But other US educators consider this reading approach to be completely backward for very young children, arguing that the children must learn how to decode the words in a story through phonics before they can analyze the story itself.
        During the last century comprehension lesson/s usually comprised students answering teachers' questions, writing responses to questions on their own, or both.[citation needed] The whole group version of this practice also often included "Round-robin reading", wherein teachers called on individual students to read a portion of the text (and sometimes following a set order). In the last quarter of the 20th century, evidence accumulated that the read-test methods assessed comprehension more than they taught it. The associated practice of "round robin" reading has also been questioned and eliminated by many educators.
        Instead of using the prior read-test method, research studies have concluded that there are much more effective ways to teach comprehension. Much work has been done in the area of teaching novice readers a bank of "reading strategies," or tools to interpret and analyze text.[3] There is not a definitive set of strategies, but common ones include summarizing what you have read, monitoring your reading to make sure it is still making sense, and analyzing the structure of the text (e.g., the use of headings in science text). Some programs teach students how to self monitor whether they are understanding and provide students with tools for fixing comprehension problems.
        Instruction in comprehension strategy use often involves the gradual release of responsibility, wherein teachers initially explain and model strategies. Over time, they give students more and more responsibility for using the strategies until they can use them independently. This technique is generally associated with the idea of self-regulation and reflects social cognitive theory, originally conceptualized by
    """.lower().split()
    functions = get_functions(user_input, stop_words)

    print('----- Function Results -----')
    check_results(functions)

    print('----- Function Performance -----')
    check_performance(functions, number)


if __name__ == "__main__":
    main()

Results with 10000 iterations:

$ python sanitize.py 
Timing with 10000 iterations
----- Function Results -----
sanitize_1 - ['down', 'road.', 'walked', 'cat'] ...
sanitize_2 - ['cat', 'walked', 'down', 'road.'] ...
sanitize_3 - ['cat', 'walked', 'down', 'road.'] ...
sanitize_4 - ['cat', 'walked', 'down', 'road.'] ...
sanitize_5 - ['cat', 'walked', 'down', 'road.'] ...
----- Function Performance -----
sanitize_1 [0.023346] Sanitize using set subtraction then wrapped in list()
sanitize_2 [0.018676] Sanitize using intersection and list.remove()
sanitize_3 [0.030571] Sanitize using standard lists
sanitize_4 [0.025661] Sanitize using standard list comprehension
sanitize_5 [0.044076] Sanitize using collection.deque and list comprehension
----------
----- Function Results -----
sanitize_1 - ['interferes', 'consider', 'effortlessly.[2]', 'text', 'over', 'through', 'questions', 'using', 'still', 'children'] ...
sanitize_2 - ['proficient', 'reading', 'depends', 'ability', 'recognize', 'words', 'quickly', 'effortlessly.[2]', 'word', 'recognition'] ...
sanitize_3 - ['proficient', 'reading', 'depends', 'ability', 'recognize', 'words', 'quickly', 'effortlessly.[2]', 'word', 'recognition'] ...
sanitize_4 - ['proficient', 'reading', 'depends', 'ability', 'recognize', 'words', 'quickly', 'effortlessly.[2]', 'word', 'recognition'] ...
sanitize_5 - ['proficient', 'reading', 'depends', 'ability', 'recognize', 'words', 'quickly', 'effortlessly.[2]', 'word', 'recognition'] ...
----- Function Performance -----
sanitize_1 [0.288719] Sanitize using set subtraction then wrapped in list()
sanitize_2 [0.107473] Sanitize using intersection and list.remove()
sanitize_3 [1.607902] Sanitize using standard lists
sanitize_4 [1.277046] Sanitize using standard list comprehension
sanitize_5 [1.938485] Sanitize using collection.deque and list comprehension

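In both cases sanitize_2 (set.intersection plus list.remove()) posted the fastest times, and like sanitize_3 to sanitize_5 it keeps the words in their original order, while sanitize_1 (set subtraction) scrambles the order and drops repeated words. Treat the timings above with some caution, though: they were produced by a version of sanitize_2 that removes items from the list it is given in place. Because every function is handed the same user_input list, after its first call the timing loop, and the functions that run after it, work on input that has already been stripped of stop words, which flatters the results. If you wish to comment, please use the gist.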
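Because sanitize_2 edits the list it is given and every function receives the same list, a fairer benchmark gives each call its own copy of the input. The sketch below does that for three approaches: the intersection-and-remove() idea from sanitize_2, the list comprehension from sanitize_4, and one variant that is not in the original script, a list comprehension that checks membership against a set. The function names and iteration count are placeholders, no timings are claimed here, and copying the list adds the same overhead to every function.

```python
from timeit import timeit

STOP_WORDS = ['the', 'that', 'to', 'as', 'there', 'has', 'and', 'or', 'is',
              'not', 'a', 'of', 'but', 'in', 'by', 'on', 'are', 'it', 'if']


def sanitize_remove(user_input, stop_words):
    """Same idea as sanitize_2 (intersection + list.remove), on a copy."""
    user_input = list(user_input)
    for sw in set(stop_words).intersection(user_input):
        while sw in user_input:
            user_input.remove(sw)
    return user_input


def sanitize_list_lookup(user_input, stop_words):
    """Same idea as sanitize_4: list comprehension, list membership tests."""
    return [w for w in user_input if w not in stop_words]


def sanitize_set_lookup(user_input, stop_words):
    """Not in the original script: list comprehension with a set lookup."""
    stop_words = set(stop_words)
    return [w for w in user_input if w not in stop_words]


def benchmark(words, number=1000):
    """Time each function; every call gets a fresh copy of words."""
    results = {}
    for f in (sanitize_remove, sanitize_list_lookup, sanitize_set_lookup):
        results[f.__name__] = timeit(lambda: f(list(words), STOP_WORDS),
                                     number=number)
    return results
```

sanitize_set_lookup keeps word order like sanitize_3 to sanitize_5, but each membership test is a hash lookup instead of a scan of the stop-word list.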

Notifying yourself of bill due dates with SendHub and Python

Just what it says in the title: you have some bills, and you want an SMS reminder x days before each one is due. The script uses SendHub to send the messages.

The gist is located here: https://gist.github.com/4445947 and the contents of the gist are below.

#!/usr/bin/env python
# Due Date Notifier - Notify VIA SMS (SendHub) when a bill is going to be due
#  * Requires a SendHub account (free) - http://sendhub.com
#  * Has python package dependencies: simplejson, requests
#  * Recommend putting this on a cron job running once a day
#    I like my text messages at 11am
#
# NOTE: SendHub free accounts allow only 500 requests to the API per month.
#       This should suffice assuming the number of bills you pay isn't loco.
#
# Author: Glen Zangirolami - http://github.com/glenbot
# GIST: https://gist.github.com/4445947
import simplejson
import requests
import logging
from datetime import datetime

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger(__name__)

# List of bills in the following format:
# (bill_name, day_of_month, [notify_x_days_before, notify_x_days_before, ...])
# Example: ('Macys Credit Card', 5, [7,3,1]) will remind you 7, 3, and 1
# days before the Macys card is due on the 5th of every month
BILLS = (
    ('My Bill #1', 5, [7, 3, 1]),
    ('My Bill #2', 7, [7, 3, 1]),
    ('My Bill #4', 12, [7, 3, 1]),
)

# Phone numbers of people you want to send the text message to
# in the format "+12815555555"
SEND_SMS_TO = ["+12815555555"]  # change me to a real number

# SendHub API settings
# https://www.sendhub.com/settings
SENDHUB_USERNAME = '2815555555'  # 10 digit number you signed up with
SENDHUB_API_KEY = 'xxxxxxx'  # api key from settings page
SENDHUB_API_URL = 'https://api.sendhub.com/v1'


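def sendhub_request(endpoint, payload=None, _type='GET'):
    """Make a request to the sendhub API"""
    # copy the payload so the credentials added below never leak into the
    # caller's dict (or into a shared mutable default argument)
    payload = dict(payload or {})

    # create the API url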
    url = '%s/%s/' % (SENDHUB_API_URL, endpoint)
    validation = {
        'api_key': SENDHUB_API_KEY,
        'username': SENDHUB_USERNAME
    }

    if _type == 'GET':
        # inject the api key, and username in the query string
        payload.update(validation)
        log.debug('HTTP %s to %s with payload %s' % (_type, url, payload))
        return requests.get(url, params=payload)
    if _type == 'POST':
        headers = {'content-type': 'application/json'}
        log.debug('HTTP %s to %s with payload %s' % (_type, url, payload))
        return requests.post(
            url,
            params=validation,
            data=simplejson.dumps(payload),
            headers=headers
        )

    return None


def sendhub_get_contacts():
    """Get a list of contacts from the sendhub API"""
    request = sendhub_request('contacts')

    if request and request.status_code == 200:
        data = simplejson.loads(request.content)
        return data['objects']

    return []


def sendhub_send_sms(data):
    """Send an SMS text"""
    return sendhub_request('messages', data, 'POST')


def get_notifications():
    """Run through the dates and build a notification list"""
    # placeholder for all notifications
    notifications = []

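    # get today's date
    today = datetime.now().date()

    for bill in BILLS:
        # unpack the bill data
        bill_name, bill_day, bill_notify = bill

        # find the next due date: this month if the day hasn't arrived yet,
        # otherwise next month (keep bill_day <= 28 so it exists every month)
        if today.day < bill_day:
            due_date = today.replace(day=bill_day)
        elif today.month == 12:
            due_date = today.replace(
                year=today.year + 1, month=1, day=bill_day)
        else:
            due_date = today.replace(month=today.month + 1, day=bill_day)

        # see if we need to send a notification
        days_until_due = (due_date - today).days
        if days_until_due in bill_notify:
            sms = "A reminder that your %s is due in %d days" % (
                bill_name,
                days_until_due
            )
            notifications.append(sms)

    return notifications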


def send_notifications(notifications):
    """Send the notifications to the receipients in SEND_SMS_TO"""
    sendhub_contacts = sendhub_get_contacts()
    contacts_to_send_to = []

    # parse a list of contacts to send to, we aren't using
    # groups from sendhub here although the code is easy enough
    # to modify to use groups
    for sendhub_contact in sendhub_contacts:
        if sendhub_contact['number'] in SEND_SMS_TO:
            contacts_to_send_to.append(sendhub_contact['id'])

    if contacts_to_send_to:
        for notification in notifications:
            data = {
                'contacts': contacts_to_send_to,
                'text': notification
            }
            log.debug('Sending payload %s to sendhub' % data)
            response = sendhub_send_sms(data)

            if response.status_code != 201:
                log.error('Could not send text message, HTTP Status %s' % (
                    response.status_code
                ))
    else:
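        # logged as an error: with logging at ERROR level a debug message
        # would never be shown, and nothing gets sent in this case
        log.error('Could not find any contacts to send to')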


if __name__ == '__main__':
    notifications = get_notifications()
    send_notifications(notifications)
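The reminder logic comes down to one calculation: how many days remain until the next occurrence of a given day of the month, which may fall in the following month. As a standalone sketch (next_due_date and days_until_due are hypothetical names, and bill days are assumed to be 28 or lower so they exist in every month), it can be checked against fixed dates instead of whatever today happens to be:

```python
from datetime import date


def next_due_date(today, bill_day):
    """Next date falling on bill_day: this month if still ahead, else next.

    Assumes 1 <= bill_day <= 28 so the day exists in every month."""
    if today.day < bill_day:
        return today.replace(day=bill_day)
    if today.month == 12:
        return date(today.year + 1, 1, bill_day)
    return date(today.year, today.month + 1, bill_day)


def days_until_due(today, bill_day):
    """Whole days from today until the next due date."""
    return (next_due_date(today, bill_day) - today).days
```

For example, on January 29 a bill due on the 5th is 7 days away, so a 7-day reminder like the one in the Macys example can fire even though the due date falls in the next month.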

GitHub – glenbot

Glen Zangirolami

Houston, TX