This blog has moved to its new address at mycodebytes.com.
Classes for Fall 2012
I met with my program advisor at UWB on Monday to plan out the courses for my Junior year. Here’s what I’ll be taking this Fall:
CSS 301: Technical Writing for Computing Professionals
Explores methods for writing effective system specifications, user documentation and requests for proposals (RFPs). Examines RFP analysis techniques, writing plans, proposals, marketing documentation, and customer communications.
CSS 342: Mathematical Principles of Computing
Integrating mathematical principles with detailed instruction in computer programming. Explores mathematical reasoning and discrete structures through object-oriented programming. Includes algorithm analysis, basic abstract data types, and data structures.
CSS 360: Software Engineering
Surveys the software engineering processes, tools, and techniques used in software development and quality assurance. Topics include life-cycle models, process modeling, requirements analysis and specification techniques, quality assurance techniques, verification and validation, testing, project planning, and management.
As it turns out, I’ll be using mostly C++ my first year. Although C++ isn’t my favorite language, this will be a great opportunity to strengthen my knowledge and familiarity with it. Unfortunately, I haven’t really touched C++ at all since the first “Intro to CS” class I took back in 2010; I’ll definitely need to spend a few days brushing up once classes start up next month.
Computing & Software Systems
I am incredibly happy to announce that I’ve been accepted to the Computing & Software Systems program at the University of Washington, Bothell. Once classes begin in late September, I’ll be a full-time student working on a Bachelor of Science in Computer Science and Software Engineering.
I’d like for this blog to continue documenting my growth as a developer, so I will try to update it as frequently as possible with relevant material. I realize I have been slacking a lot on that front this summer, but between moving, enrolling at UWB, and the beautiful weather we have been blessed with this past month, I haven’t found a whole lot of time for programming. I was, however, working on a Flask project that I’d really like to wrap up before classes start back up. With that in mind, I am considering holding a solo hackathon sometime in the next couple of weeks. I’ll post more about that once I’ve worked out the details.
Attending a university for a CS degree has been an aspiration of mine for over a decade; I am very excited about what the future holds, both academically and professionally.
Hosting a Flask project on WebFaction
The goal of this post is to outline my process for setting up a Flask app on a WebFaction hosting account. I am sure there are better ways to do this, but this is the process that has worked for me so far. This guide is mostly for my own recollection, but perhaps it will be useful to someone else.
Goodreads & Library Mashup
I finally have a working publicly-available version of the Goodreads Library Lookup project I’ve been working on for the past few weeks.
This tool retrieves a Goodreads member’s “to-read” shelf and checks to see if the books are available to check out from the Kindle Owners’ Lending Library and the King County Library System’s eBook collections.
Just copy the numerical User ID from your Goodreads profile URL and paste it into the search field.
I chose this project mainly as a way of getting my feet wet with a number of new concepts:
The webpages are generated using Django, which gave me a chance to try out the Django templating system. One word: Incredible! The stuff you can do with the Django templating system is really impressive and I am really looking forward to working with it more in the future. I worked with PHP for years, but other than WordPress, never got much exposure to working with template engines.
The site uses the library-lookup module I wrote to perform the grunt work of both retrieving the list of books from Goodreads (via their API) and gathering search results from Amazon and the King County Library System. The Amazon search is done through the Amazon API, while the KCLS search is done through old-fashioned web scraping. Since the web-scraping takes a minute or two to execute, I chose to load the search results via AJAX using jQuery.
Learning to use jQuery was fairly easy, but because I am so new to Django views and templates, it took a bit of trial and error to wrap my brain around how to integrate jQuery.
Moving your virtualenvs directory
Somehow when I first set up virtualenv and virtualenvwrapper, it placed my virtualenvs directory at ~/virtualenvs instead of the default ~/.virtualenvs. The difference in OS X is a matter of one being hidden in Finder and the other not. Since I tend to be pretty anal about maintaining a nice and tidy file structure, I felt compelled to figure out how to change it without messing everything up. Here’s the process for moving/renaming your virtualenvs directory:
Step 1
Move the actual directory using your shell:
mv ~/virtualenvs ~/.virtualenvs
Step 2
Edit your $WORKON_HOME path in your shell startup file. For me, this file is located at ~/.bash_profile. I simply changed export WORKON_HOME=$HOME/virtualenvs into export WORKON_HOME=$HOME/.virtualenvs
Afterwards, you will need to restart your shell window so that it recognizes your changes.
Step 3
Run the mkvirtualenv command again for each of your virtual environments. This will recreate the scripts used by virtualenvwrapper to reflect the new location.
Searching Amazon
Yesterday was pretty eventful programming-wise. I decided to resume working on the library-lookup module I created 5 months ago. The goal of the module is to query the Goodreads API for a list of books marked “to-read” by the user, then check to see if any of them are available in my local library’s ebook collection.
I had written the portion that connects to Goodreads and fetches the list of books back when I was working on it 5 months ago, but still have yet to get around to building the library portion because I’m fairly sure it will require some intensive web scraping. (Why don’t you have a public API yet, OverDrive?)
Yesterday, I decided to change the library-lookup methods to utilize classes. While I was doing this, I had the idea of adding the functionality to also check Amazon’s Kindle Lending Library. This meant having to figure out how to use Amazon’s API, which turned out to be incredibly confusing for new developers like myself. It involves signing up for two additional Amazon services, most notably Amazon Web Services. This was a bit daunting, because all I wanted was API access, yet they require you to sign up for a service that covers all aspects of their cloud computing platform.
Fortunately, I found a wonderful Python wrapper which made the process of actually querying the AWS API moderately painless. Once that was done, I had to refresh my memory on how to use Python’s ElementTree to read the XML returned by Amazon. After struggling with that for an hour or two, I was finally able to get it working!
Because I host the module on GitHub, however, I needed to find a way to store my AWS credentials in a separate file so they wouldn’t be publicly available. I decided to use a config file, and after a few moments of googling I found the ConfigParser module which made this a piece of cake. I’d never messed with .cfg files before, but thanks to Python’s awesome documentation I had this up and going in a matter of minutes.
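For anyone who has not used .cfg files before, here is a rough sketch of what a config file matching the lookups in the code below could look like, and how it gets read. The section and option names are taken from that code; the values are placeholders, the helper name load_credentials is made up for illustration, and the sketch uses Python 3's configparser module rather than the Python 2 ConfigParser module used in this post.

```python
import configparser
import os
import tempfile

# Placeholder contents for librarylookup.cfg (values are NOT real credentials).
CFG_TEXT = """[Amazon Web Services]
amazon_access_key_id = YOUR_ACCESS_KEY
amazon_secret_key = YOUR_SECRET_KEY
amazon_assoc_tag = YOUR_ASSOC_TAG
"""


def load_credentials(path):
    """Hypothetical helper: read the three AWS settings from a .cfg file."""
    config = configparser.RawConfigParser()
    config.read(path)
    section = "Amazon Web Services"
    return (
        config.get(section, "amazon_access_key_id"),
        config.get(section, "amazon_secret_key"),
        config.get(section, "amazon_assoc_tag"),
    )


# Write the sample file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "librarylookup.cfg")
    with open(cfg_path, "w") as f:
        f.write(CFG_TEXT)
    print(load_credentials(cfg_path))
```

Keeping the real file out of version control (for example by listing it in .gitignore) is what keeps the credentials private.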
Here is the result of my work yesterday:
# imports this method relies on (they live at the top of the module)
import ConfigParser
import xml.etree.ElementTree as ET
import bottlenose

def search_amazon(self):
    """Searches Amazon Prime Lending Library

    Returns False if the book is not found.
    """
    # read AWS details from config file
    config = ConfigParser.RawConfigParser()
    config.read('librarylookup.cfg')
    access_key = config.get('Amazon Web Services', 'amazon_access_key_id')
    secret_key = config.get('Amazon Web Services', 'amazon_secret_key')
    assoc_tag = config.get('Amazon Web Services', 'amazon_assoc_tag')

    # connect to AWS API and fetch search results
    amazon = bottlenose.Amazon(access_key, secret_key, assoc_tag)
    xml_result = amazon.ItemSearch(Keywords=self.title + " Lending Library",
                                   SearchIndex="Books")

    # Prepare the XPath used to locate TotalResults
    # ItemSearchResponse (Root) -> Items -> TotalResults
    namespace = '{http://webservices.amazon.com/AWSECommerceService/2011-08-01}'
    xpath = namespace + 'Items/' + namespace + 'TotalResults'

    # Find TotalResults from the XML
    total_results = int(ET.XML(xml_result).findtext(xpath))
    return total_results > 0
(You can see the full module on GitHub)
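The part that tripped me up most was the XML namespace, so here is a small standalone sketch of just that step. It is written for Python 3, the helper name total_results is only for illustration, and the XML is a hand-written stand-in with the same Items -> TotalResults shape, not a real Amazon response.

```python
import xml.etree.ElementTree as ET

# ElementTree expects namespaced tags to be written as "{namespace-uri}TagName".
NAMESPACE = "{http://webservices.amazon.com/AWSECommerceService/2011-08-01}"

# Hand-written stand-in for a response (NOT real Amazon output).
SAMPLE_XML = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <Items>
    <TotalResults>3</TotalResults>
  </Items>
</ItemSearchResponse>"""


def total_results(xml_text):
    """Return the TotalResults count, or 0 if the element is missing."""
    root = ET.fromstring(xml_text)
    # Every step of the path needs the namespace prefix.
    xpath = NAMESPACE + "Items/" + NAMESPACE + "TotalResults"
    text = root.findtext(xpath)
    return int(text) if text is not None else 0


print(total_results(SAMPLE_XML))  # 3
```

Without the namespace prefix on each tag, findtext returns None even though the element is clearly there in the XML, which is what makes this so confusing the first time.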
Speaking of GitHub, this is the point when I began to go a little crazy. I’m still fairly new to the way it works. I made the mistake of committing both my config file changes and my search_amazon changes in a single commit, when I really should have made them two separate commits. I spent the next two hours trying to figure out how to erase those commits so that I could resubmit them! For anyone out there who might have this problem in the future (there is a severe lack of clear documentation on how to do this!), you can use the following command I found on Stack Overflow:
git push -f origin HEAD^:master
Tip: You can increase the number of ^ symbols to roll back multiple commits.
Markov Analysis
Here is a fun little class I threw together this week while finishing up with Think Python.
It scans text files from Project Gutenberg to generate a dictionary that maps each word (and sets of two consecutive words) in the book to the word immediately following it. It then uses this dictionary to generate a random string of text based on any initial word you give it. I apologize for the horrible lack of commenting.
import string
import random
import os


class TextGenerator():
    def __init__(self):
        # maps a word, or a (word, word) tuple, to {next_word: count}
        self.markov = dict()

    def __str__(self):
        return str(self.markov)

    def add(self, file):
        self.markov_analysis(self.process_file(file))

    def generate_text(self, first_word='the', num=10):
        count = 0
        text = first_word
        word = first_word
        prev = None
        while count < num:
            # prefer the two-word key when this pair has been seen before
            if prev is not None and (prev, word) in self.markov:
                next_word = self.random_word(self.markov[(prev, word)])
            else:
                next_word = self.random_word(self.markov[word])
            text += ' ' + next_word
            count += 1
            # shift the window: the current word becomes the previous one
            prev, word = word, next_word
        return text

    def process_file(self, file):
        """ Processes a plaintext file from Project Gutenberg """
        result = []
        gutenberg_header = True
        bad_chars = string.punctuation + string.whitespace
        with open(file) as fin:
            for line in fin:
                if not gutenberg_header:  # check if we're past header
                    words = line.split()
                    for i in range(len(words)):
                        # Python 2 str.translate: delete every char in bad_chars
                        words[i] = words[i].lower().translate(string.maketrans("", ""), bad_chars)
                    result.extend(words)
                elif line.count('***') == 2:  # check if we're at end of header
                    gutenberg_header = False
        return result

    def markov_analysis(self, word_list):
        prev = None
        second_prev = None
        for word in word_list:
            if prev is not None:
                if second_prev is not None:
                    self.add_markov((second_prev, prev), word)
                self.add_markov(prev, word)
            second_prev = prev
            prev = word

    def add_markov(self, key, value):
        if key not in self.markov:
            self.markov[key] = dict()
        self.markov[key][value] = self.markov[key].get(value, 0) + 1

    def random_word(self, h):
        # pick a follower at random, weighted by how often it was seen
        t = []
        for word, freq in h.items():
            t.extend([word] * freq)
        return random.choice(t)

    def add_directory(self, directory):
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                print 'Scanning:', path
                self.add(path)


def main():
    myText = TextGenerator()
    myText.add_directory('books/')
    print myText.generate_text('the')


if __name__ == '__main__':
    main()
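To make the shape of the dictionary concrete, here is a tiny standalone sketch that applies the same mapping to one short sentence instead of a whole book. It is written for Python 3 (the class above is Python 2), and build_chain is just an illustrative name, not part of the class.

```python
from collections import defaultdict


def build_chain(words):
    """Map each word, and each pair of consecutive words, to follower counts."""
    chain = defaultdict(dict)
    for i in range(len(words) - 1):
        follower = words[i + 1]
        # single-word key: word -> {follower: count}
        chain[words[i]][follower] = chain[words[i]].get(follower, 0) + 1
        # two-word key, available once there is a preceding word
        if i > 0:
            key = (words[i - 1], words[i])
            chain[key][follower] = chain[key].get(follower, 0) + 1
    return dict(chain)


chain = build_chain("the cat sat on the mat".split())
print(chain["the"])          # {'cat': 1, 'mat': 1}
print(chain[("on", "the")])  # {'mat': 1}
```

The single-word keys give the generator something to fall back on whenever the most recent pair of words was never seen together in the source text.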
Finding list combinations recursively
I’ve spent the past few hours trying to find a way to recursively search a list for all possible combinations of size n and I think I finally have it:
def combinations(n, items, combos=None):
    # initialize combos during the first pass through
    if combos is None:
        combos = []
    if len(items) == n:
        # when the list has been dwindled down to size n
        # check to see if the combo has already been found
        # if not, add it to our list
        if combos.count(items) == 0:
            combos.append(items)
            combos.sort()
        return combos
    else:
        # for each item in our list, make a recursive
        # call to find all possible combos of it and
        # the remaining items
        for i in range(len(items)):
            refined_list = items[:i] + items[i+1:]
            combos = combinations(n, refined_list, combos)
        return combos
Example:
Input (with n = 2):
['a', 'b', 'c', 'd']
Output:
['a', 'b']
['a', 'c']
['a', 'd']
['b', 'c']
['b', 'd']
['c', 'd']
While this took way longer than it should have, and I’m sure it’s terribly inefficient, I’m satisfied that I was able to persevere until finally coming up with a working solution.
Update: So yeah, while this code works fine for small lists, it is entirely too inefficient for my original goal: finding all possible 5-letter combinations out of the 26-letter alphabet. To be honest, I’m feeling pretty discouraged that my solution doesn’t work, but I’m hoping the insight I seem to lack will come with more experience. I’d be lying if I didn’t acknowledge that it’s times like these that somewhere deep down I begin to question if I have what it takes for a career in programming. I’d love to talk to some more experienced programmers and hear their thoughts on the matter. Do all programmers have moments like this?
def find_least(file='words.txt'):
    alphabet = 'a b c d e f g h i j k l m n o p q r s t u v w x y z'.split(' ')
    combos = combinations(5, alphabet)
    max_matches = 0
    max_combo = None
    for combo in combos:
        # count the words that avoid every letter in this combo
        matches = filter_words(file, combo)
        if matches > max_matches:
            max_matches = matches
            max_combo = combo
    return (max_matches, max_combo)

def filter_words(file, forbidden):
    # avoids(word, forbidden) is defined elsewhere (not shown in this post)
    matches = 0
    with open(file) as fin:
        for line in fin:
            word = line.strip()
            if avoids(word, forbidden):
                matches += 1
    return matches
Update 2: I tried the same brute force method for testing all the possibilities by calculating the various combinations using itertools.combinations instead of my own function and the process still seemed to eat up all of my system resources, so that leads me to believe the bottleneck is likely my entire approach, not just the way in which I coded my combinations function. I’ve decided to put this problem aside for now and move on to something else.
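Some numbers help explain what is going on. There are only 65,780 ways to choose 5 letters out of 26, but the recursive function above reaches each one by removing the other 21 letters one at a time in every possible order, which works out to 26 × 25 × … × 6 paths. On top of that, filter_words re-reads the word file once per combination. Below is a rough Python 3 sketch of the same brute-force search that generates each combination once and loads the words once. The function name best_forbidden_letters and its parameters are made up for illustration, and this is still brute force, so it may well take a long time on a full word list.

```python
import math
from itertools import combinations


def best_forbidden_letters(words, k=5, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return (count, letters): the k letters whose exclusion keeps the most words."""
    # Convert each word to a set of letters once, instead of re-reading a file.
    letter_sets = [frozenset(word) for word in words]
    best_count, best_combo = -1, None
    for combo in combinations(alphabet, k):
        forbidden = set(combo)
        # A word "avoids" the combo when it shares no letters with it.
        count = sum(1 for letters in letter_sets if forbidden.isdisjoint(letters))
        if count > best_count:
            best_count, best_combo = count, combo
    return best_count, best_combo


print(math.comb(26, 5))  # 65780 combinations to test

# Tiny demonstration with a 5-letter alphabet and one forbidden letter.
print(best_forbidden_letters(["cab", "bad", "dab", "ace"], k=1, alphabet="abcde"))
# (3, ('e',))
```

With a real word list you would pass in the stripped lines of words.txt as the words argument.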
Scaffolding
Just ran across this line in Think Python: How to Think Like a Computer Scientist:
The print statements we wrote are useful for debugging, but once you get the function working, you should remove them. Code like that is called scaffolding because it is helpful for building the program but is not part of the final product.
That is a brilliant name for temporary code used for debugging. Why have I never heard this term before?
