Fanzoo Technology, Inc.

What’s The Word?

Last month at Learn Something we tried something new, and it turned out really well! Learn Something has always been a great place to get together and work on whatever interests you. Lately, a lot more inexperienced developers have been joining us, and we wanted to make it easier for them to jump right in, contribute, and gain some skills. With that in mind, we decided to try something more structured that could involve everyone who was interested.

We created a challenge: write an autocompletion algorithm. Given a .txt file and an input token, return the string that most commonly follows that token in the file. For example, if “the” is followed by “cat” twice and by “mat” once, the answer for “the” is “cat”. We split everyone into two groups, each led by an experienced developer. One group chose to tackle the problem with JavaScript, while the other chose Ruby. Both groups had a working solution in just under 2 hours. The two solutions are as follows:

GROUP 1:

[su_spoiler title="index.html"]
<html>
<body>
<input type="text"><button>AutoComplete</button>
<script src="jquery.js"></script>
<script src="lodash.js"></script>
<script src="script.js"></script>
</body>
</html>
[/su_spoiler] [su_spoiler title="script.js"]
 
$(function() {

    $("button").click(function() {
        alert(findNext($("input").val()));
    });


    var story = "The standard Lorem Ipsum passage, used since the 1500s  \"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\"  Section 1.10.32 of \"de Finibus Bonorum et Malorum\", written by Cicero in 45 BC  \"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?\"  1914 translation by H. Rackham  \"But I must explain to you how all this mistaken idea of denouncing pleasure and praising pain was born and I will give you a complete account of the system, and expound the actual teachings of the great explorer of the truth, the master-builder of human happiness. No one rejects, dislikes, or avoids pleasure itself, because it is pleasure, but because those who do not know how to pursue pleasure rationally encounter consequences that are extremely painful. " +
        "Nor again is there anyone who loves or pursues or desires to obtain pain of itself, because it is pain, but because occasionally circumstances occur in which toil and pain can procure him some great pleasure. To take a trivial example, which of us ever undertakes laborious physical exercise, except to obtain some advantage from it? But who has any right to find fault with a man who chooses to enjoy a pleasure that has no annoying consequences, or one who avoids a pain that produces no resultant pleasure?\"  Section 1.10.33 of \"de Finibus Bonorum et Malorum\", written by Cicero in 45 BC  \"At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.\"  1914 translation by H. Rackham  \"On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain. These cases are perfectly simple and easy to distinguish. " +
        "In a free hour, when our power of choice is untrammelled and when nothing prevents our being able to do what we like best, every pleasure is to be welcomed and every pain avoided. But in certain circumstances and owing to the claims of duty or the obligations of business it will frequently occur that pleasures have to be repudiated and annoyances accepted. The wise man therefore always holds in these matters to this principle of selection: he rejects pleasures to secure other greater pleasures, or else he endures pains to avoid worse pains.\"";
    
    
    
    var matches = []; // One record per word found: the word and where it starts.

    var regex = /\w+/g; // Matches each run of word characters.

    var wordTable = {}; // Maps each word to the list of positions where it starts.

    var match;
    while ((match = regex.exec(story)) !== null) {
        var word = match[0];
        var index = match.index; // Zero-based offset into story.

        matches.push({ match: word, index: index });

        // _.has avoids false hits on inherited keys such as "constructor".
        if (!_.has(wordTable, word)) {
            wordTable[word] = [index];
        }
        else {
            wordTable[word].push(index);
        }
    }

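    function findNext(token) {
        var nextWordCounts = {}; // Maps each following word to how often it follows token.

        if (!_.has(wordTable, token)) {
            return ""; // token never appears in the story
        }

        _.each(wordTable[token], function(positionInString) {
            // Find the first word that starts after this occurrence of token.
            // Matching on \w+ skips punctuation and repeated spaces.
            var wordPattern = /\w+/g;
            wordPattern.lastIndex = positionInString + token.length;
            var nextMatch = wordPattern.exec(story);
            if (nextMatch === null) {
                return; // token was the last word in the story
            }
            var nextWord = nextMatch[0];

            if (!_.has(nextWordCounts, nextWord)) {
                nextWordCounts[nextWord] = 1;
            }
            else {
                nextWordCounts[nextWord]++;
            }
        });

        var bestWord = "";
        var bestWordCount = 0;

        // Keep the follower with the highest count; ties go to the first one seen.
        _.each(nextWordCounts, function(count, word) {
            if (count > bestWordCount) {
                bestWord = word;
                bestWordCount = count;
            }
        });

        return bestWord;
    }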
});
[/su_spoiler]

GROUP 2:

[su_spoiler title="nomnom.rb"]
#!/usr/bin/ruby

require 'pp'

reading = false
lastword = ''
following_words = {}
File.open('input.txt', 'r:bom|utf-8') do |f1|
    while line = f1.gets
        if line.start_with?('I.')
            reading = true
        end
        if line.start_with?('End of the Project Gutenberg EBook ')
            reading = false
            break
        end
        if not reading
            next
        end
        line.scan(/\S+/) do |word|
            # Strip punctuation and normalize case so "The" and "the," count as the same word.
            word = word.delete(',.!?"\';:').downcase
            # Skip tokens that were nothing but punctuation.
            next if word.empty?
            following_words[lastword] ||= {}
            following_words[lastword][word] ||= 0
            following_words[lastword][word] += 1
            lastword = word
        end
    end
end
newthing = {}
following_words.each do |firstword, seconds|
    max=0
    bestword = ''
    seconds.each do |second,weight|
        if weight > max
            max = weight
            bestword = second
        end
    end
    newthing[firstword] = bestword
end
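# Look up any words given as command-line arguments.
# The table keys are lowercase, so lowercase the input before the lookup.
ARGV.each do |a|
    puts a + " " + (newthing[a.downcase] || "")
end
# Then read words from standard input until the user types QUIT.
while word = STDIN.gets
    word.chomp!
    if word == "QUIT"
        exit
    end
    puts word + " " + (newthing[word.downcase] || "")
end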
[/su_spoiler]

In future iterations, both groups could handle hyphenated phrases and keep the full table of following words built during pre-processing, rather than only the top result. That would allow returning the two most probable next words, or the percentage probability of each next word. Currently, each solution returns only the single most likely next word.

The exercise wasn’t about finding the perfect solution but about the process of learning. Mixing experienced and inexperienced developers was a lot of fun. The beginners all said they learned something, and the more advanced devs got to explore a new language and engineer some creative solutions.

Going forward, we will offer something more structured for those who attend but don’t have a specific plan for what to learn. If you aren’t interested in that topic, don’t worry, you can still work on whatever you want. Either way, come on out and Learn Something with us!
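
P.S. For anyone curious about the future iterations mentioned above, here is a rough Ruby sketch of one way they might look. It is not either group's code: the method names `build_following_counts` and `top_followers`, the token pattern, and the sample sentence are all made up for illustration. The idea is to keep every follower count instead of only the winner, so the top two words and their probabilities can be read back later. The pattern also treats a hyphen or apostrophe inside a word as part of that word.

```ruby
# Hypothetical sketch: these method names are illustrative and do not
# appear in either group's solution.

# Count, for every word, how often each other word follows it.
# The pattern keeps words like "master-builder" and "don't" as one token.
def build_following_counts(text)
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  words = text.downcase.scan(/[a-z0-9]+(?:[-'][a-z0-9]+)*/)
  words.each_cons(2) { |first, second| counts[first][second] += 1 }
  counts
end

# Return up to `limit` followers of `token` as [word, probability] pairs,
# most probable first. Ties are broken alphabetically.
def top_followers(counts, token, limit = 2)
  followers = counts.fetch(token.downcase, {})
  total = followers.values.sum.to_f
  followers
    .sort_by { |word, count| [-count, word] }
    .first(limit)
    .map { |word, count| [word, count / total] }
end

counts = build_following_counts("the cat sat on the mat and the cat ran")
top_followers(counts, "the").each do |word, probability|
  puts format("%s %.0f%%", word, probability * 100)
end
# prints:
# cat 67%
# mat 33%
```

Because the whole table stays in memory, asking for three followers instead of two is just a different `limit`, and an unknown token simply returns an empty list.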