Tuesday, July 17, 2012

New Practical Project

For an internship/subcontracting job I'm doing right now, I have to do a lot of XML. Most of it is the same, and only four or so things change for each item I work with. Copying and pasting everything is rather a bore, and clicking and highlighting what I have to change is also tedious, so I had a thought.

Why not automate most of it and put some of my skills to work?

The answer? HECK YEAH NEW PROJECT

It's simple, and I imagine I can get it done in a day or so, depending on how my other work keeps me (website work and actual support-myself work).

I'm doing it in C# since the file-handling capabilities are really superb in that language, not to mention that using Visual Studio means I'll have an interface up really fast.
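The core idea is simple enough to sketch. Here's a rough illustration in Python (the real tool will be C#, and the template and field names below are my own invention, not the actual XML from the job): fill a fixed template with the handful of values that change per item.

```python
# Sketch: generate near-identical XML items from a template, varying only
# a few fields. All element names here are hypothetical placeholders.
import xml.etree.ElementTree as ET

TEMPLATE = """<item>
  <name>{name}</name>
  <id>{item_id}</id>
  <category>{category}</category>
  <price>{price}</price>
</item>"""

def build_item(name, item_id, category, price):
    xml_text = TEMPLATE.format(name=name, item_id=item_id,
                               category=category, price=price)
    # Parse the result to confirm it's well-formed before handing it on
    ET.fromstring(xml_text)
    return xml_text

print(build_item("Widget", "42", "tools", "9.99"))
```

With something like this, each new item is one function call instead of a round of copy, paste, and hunt-for-the-changed-bits.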

I'll keep you all posted!

Thursday, July 12, 2012

Object Oriented Approaches to Websites

So far this summer, I haven't had any superbly ambitious goals or projects like I did last summer. But I have been working on revamping a website with a friend. The website is for an organization we're a part of, myself being one of the people who run it, and him being next year's definite choice for one of three leadership positions.

Anyway, we're working on the website, and we've started to take an OOP approach to it: database objects to handle database queries and the information we need from them, specialized entry objects for the database objects to accept and play with, and validation objects that act as decorators on those entry objects to make sure they're not malicious. It's pretty fun. But sometimes I wonder if I'm going too far with it.

It makes sense in concept to do this. Have a simple interface for the database, with a few core functions exposed, and a surefire way to have the data we need by using an object for the data structure passed around. But at the same time, making an object simply to hold data that's there anyway and then pass it along seems a little silly. Then again, by making that object and passing it, you're guaranteeing that the information you'd like will be there to pass along to the database. It's nice. Reliable, and definitely a bonus of the OOP design. Also, by enforcing a standard for our models in our MVC design, we make it very easy for future organization runners to extend the website. A few controllers and a lot of models work well.
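The decorator arrangement described above can be sketched roughly like this (in Python, though the site itself isn't; class and field names are my own placeholders, not our actual code):

```python
# Sketch: a plain entry object holding data, wrapped by a validation
# decorator, with the database only ever seeing validated values.
class Entry:
    """Dumb data holder passed around between layers."""
    def __init__(self, fields):
        self.fields = dict(fields)

    def get(self, key):
        return self.fields[key]

class ValidatedEntry:
    """Decorator: same interface as Entry, but screens values first."""
    def __init__(self, entry):
        self.entry = entry

    def get(self, key):
        value = self.entry.get(key)
        # Toy check standing in for real sanitization
        if "<script" in str(value).lower():
            raise ValueError("malicious value for field: " + key)
        return value

class Database:
    """Simple interface with a few core functions exposed."""
    def insert(self, entry, keys):
        # The database trusts the entry to supply every field it needs
        return {k: entry.get(k) for k in keys}

db = Database()
row = db.insert(ValidatedEntry(Entry({"title": "Hello"})), ["title"])
```

The "silly" data-holder object is what makes the guarantee work: by the time the database sees an entry, the fields it needs are known to exist and to have passed validation.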

I wish I had a project to keep me occupied, though. I thought about an inference engine in Ruby, but I'm not sure yet.

Tuesday, May 1, 2012

Q-Learning Oscillatory Problems Solved

So, in implementing Q-Learning I found that the agent would often move back and forth, back and forth, between the same two adjacent squares. I thought about this for a bit, then decided this must be the case:

.00     .00  .00   Initially on the left square
.25 ->  .00  .00   Moves to the right and evaluates the space it was on
.25     .00  .00   Searches for the most valuable space near it
.25 <-  .25  .00   Finds the left space and moves
.50 ->  .25  .00   Sees that the right space has value and moves, updating the leftmost as well
...and so on, ad infinitum.

After a bit of investigation I found my suspicions confirmed and proceeded to implement a few solutions. One was a varying epsilon over the total number of episodes, allowing for maximum exploration (completely random movements) in the beginning and then controlled exploitation (the Q-Learning algorithm itself chooses what to do, albeit with a small epsilon-percent chance of random moves) later on. This strategy did not work very well.
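One common way to implement such a schedule is a linear decay (an assumption on my part; the post doesn't give the exact schedule used, and the floor value below is a placeholder):

```python
# Sketch: decay epsilon from 1.0 (all random moves) toward a small floor
# (mostly greedy, with a small chance of random moves) over the episodes.
def epsilon_for_episode(episode, total_episodes, floor=0.05):
    """Linearly decay epsilon from 1.0 down to `floor`."""
    frac = episode / float(total_episodes)
    return max(floor, 1.0 - frac)

# Early episodes: epsilon near 1.0, so actions are almost always random.
# Late episodes: epsilon near the floor, so the Q table mostly decides.
```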

I also implemented a type of memory, where during a single episode the agent would not get a reward for visiting the same space twice, and a modified version where the agent received a reward the first time it entered the food space from the left or right, but never received it again if it moved into the space from the left or right, and likewise for up and down. This was to try to cancel out oscillations. I also created a version where not just food, but all rewards (including negative reinforcement) were taken away after the first visit. All these methods improved the performance of the algorithm drastically! It was great.
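The simplest of these, first-visit-only rewards, can be sketched like this (a Python stand-in for my Matlab; the grid encoding and names are illustrative):

```python
# Sketch: within a single episode, each reward cell pays out only on the
# first visit, so bouncing between two squares stops being profitable.
def make_reward_fn(reward_grid):
    """reward_grid maps (x, y) cells to reward values."""
    visited = set()
    def reward(cell):
        if cell in visited:
            return 0.0          # revisits earn nothing
        visited.add(cell)
        return reward_grid.get(cell, 0.0)
    return reward

r = make_reward_fn({(1, 1): 1.0})
first = r((1, 1))   # 1.0 on the first visit
second = r((1, 1))  # 0.0 afterwards
```

A fresh reward function is created at the start of each episode, so the memory resets between episodes.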

Saturday, April 14, 2012

Implementing Q-Learning

For a course, I'm currently trying to implement Q-Learning in Matlab on a simple problem. The problem is this: given a maze with walls surrounding the outside, a line of food leading to a goal point, and some designated start point, have a small agent develop, using Q-Learning techniques, the ability to follow the line of food.

Q-Learning works, in the general sense, like this:

From your current state take an action
What reward did you get from this action?
Update the state-action pair from before you moved with this reward that you got
Repeat until you find good estimates of the domain space in which you exist.

This vague description is good enough to develop an intuition for how the reinforcement learning paradigm of Machine Learning works in general. In specific terms:

Q(s,a) = Q(s,a) + a[ R + g*max_a' Q(s',a') - Q(s,a) ]

Where a and g are scalar parameters that affect how the algorithm learns. The tricky thing in this case is that s' and a' (standing for the next state and action pair) are unknown if you were to take this equation literally. Q(s,a) stands for the state-action pair. In my implementation this is Q(x,y,a), where (x,y) is the position of the agent in the space and a is the action it took. Because my agent exists within an MxN grid and can move left, right, up, and down, it has a total of MxNx4 possible states, which for a simply sized grid of 25x25 is already above 2000 possible states. With such a large search space it's surprising that this type of technique works at all. Anyway, before I get distracted by complexity details...

With an initial Q(s,a) with an arbitrary a for the state you're given, you can then move into your algorithm, find the next state (chosen by picking the most valuable-looking state around you with respect to your Q matrix), and then update the old state with the state you just found. In other words, your most recently found Q pair supplies the primed terms in the equation above.
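The update step can be sketched like so (in Python rather than Matlab; the dict-based Q table and the particular parameter values are my own choices for illustration):

```python
# Sketch of the Q-Learning update rule, with the Q table indexed by
# (x, y, action) as described above. ALPHA and GAMMA are placeholder values.
from collections import defaultdict

ACTIONS = ["left", "right", "up", "down"]
ALPHA, GAMMA = 0.5, 0.9   # the "a" and "g" scalars from the equation

Q = defaultdict(float)     # Q[(x, y, action)], defaults to 0.0

def q_update(state, action, reward, next_state):
    """Apply Q(s,a) = Q(s,a) + a[ R + g*max_a' Q(s',a') - Q(s,a) ]."""
    x, y = state
    nx, ny = next_state
    best_next = max(Q[(nx, ny, a)] for a in ACTIONS)  # max over a'
    key = (x, y, action)
    Q[key] += ALPHA * (reward + GAMMA * best_next - Q[key])

# One step: from (0,0) the agent moves right onto a food cell worth 1.0
q_update((0, 0), "right", 1.0, (1, 0))   # Q[(0,0,'right')] becomes 0.5
```

Note that the s' and a' terms are only available because the update for (s, a) is applied after the next move has already been chosen, exactly as described in the paragraph above.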

Sutton and Barto's book on reinforcement learning is available online at
http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node65.html

and is a standard book for reinforcement learning. Or so I'm told.



The cool thing I'd like to mention is that I create my environment using Paint, then use Matlab to load in the image and create my environment from it. Unfortunately, because I prefer to keep my complexity small, both bitmaps are rather small, and beefing them up for view on this blog has messed with the resolution a bit. But take my word: the red around the outside is a single pixel wide and represents a wall. The black dot is the goal point, the purplish line is the food that the agent must follow to the goal, and of course the white is empty space. You can see it better in the top picture, but the start location is indicated by the green pixel. Anywho, once this is loaded into Matlab you end up with something like this:


The purple dot is the agent, the blue next to the red is the start state, the cyan is the goal, and the red is the food. Blue around the edges is of course the wall.

The image is completely animated during runtime so I can watch how well or poorly the reinforcement learning is coming. 
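The bitmap-to-environment step amounts to a color lookup per pixel. A rough sketch (in Python rather than Matlab, and the exact RGB values below are assumptions, not the ones my bitmaps actually use):

```python
# Sketch: map pixel colors from the Paint bitmap to maze cell types.
# The RGB values here are guesses standing in for the real palette.
COLOR_TO_CELL = {
    (255, 0, 0):     "wall",   # red border, one pixel wide
    (0, 0, 0):       "goal",   # the black dot
    (128, 0, 128):   "food",   # the purplish trail
    (0, 255, 0):     "start",  # the green start pixel
    (255, 255, 255): "empty",
}

def grid_from_pixels(pixels):
    """pixels: 2D list of (r, g, b) tuples, e.g. loaded by an image library."""
    return [[COLOR_TO_CELL.get(px, "empty") for px in row] for row in pixels]

maze = grid_from_pixels([
    [(255, 0, 0), (255, 0, 0), (255, 0, 0)],
    [(255, 0, 0), (0, 255, 0), (255, 0, 0)],
])
```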

Tuesday, April 3, 2012

Another interesting article - Back Propagation

http://itee.uq.edu.au/~cogs2010/cmc/chapters/BackProp/index2.html

For my CS 295 Machine Learning class, I was reading up on back propagation algorithms. This page describes the principles really well. I'm currently implementing a back propagation algorithm to classify some data that my professor provided us. It's pretty interesting, and rather fun, although using something like a black-box ANN is kind of confusing, only because it's so very annoying to debug.

Sunday, March 25, 2012

Article my friend shared with me

http://www.akamai.com/dl/technical_publications/ConsistenHashingandRandomTreesDistributedCachingprotocolsforrelievingHotSpotsontheworldwideweb.pdf

Even if you only read the abstract, this is a really cool paper.

Wednesday, March 21, 2012

Wiimote Whiteboard Demonstration

http://www.youtube.com/watch?v=VwhGGChEUHg

This is an alpha demonstration of the Wiimote Whiteboard project built by the project group I'm in through UVM's C.S. Crew. My part in this project was project director and chief software engineer (if I had to give it a title). In other words, I was responsible for delegating tasks to each team member according to their ability, as well as taking on a portion of the coding myself. Also, since I built the fundamentals of the program and held the master copy of the files, I was the one in direct contact with each group member for assistance on integrating things.

So let me break down what you're seeing in this short video (besides my whistling). On the surface, you're seeing a projection of the screen from my laptop onto a canvas. There is (out of the picture) a Wiimote facing dead-on to the canvas, and in my hand is a small IR LED attached to a 22-ohm resistor and two 1.5V batteries. The Wiimote functions as a sensor that picks up the IR LED's output (infrared light). The Wiimote is paired to my laptop through a Bluetooth dongle, and the device is then found by C# software written by my group using Brian Peek's Wiimote Library.

The program pairs to the Wiimote using Peek's library, and from there a wrapper class allows points found by the Wiimote to be sent to the program. This wrapper class was my task in the group. The UI queries the wrapper class for another point and then decides either to draw or to click a button depending on the points it receives. There are a few other things I'd like to mention without going into the details. The program is completely multithreaded: the UI runs on its own thread, and the wrapper class is queried on a separate thread. A few timers run during the course of the program; during calibration, timers run to fill up streams of points that are then interpolated to transform the Wiimote's point of view into the coordinates of the picturebox on the form.

After calibration, points are queued as well as written out to a data file, and whenever the UI asks for a point, one is dequeued and either shown on the drawing pane or used to cause a click. A few Win32 DLLs are used to make the mouse follow the IR pen and to help with clicking buttons. The rate of points coming out of the queue is controlled by a variable normally set to 10, which corresponds to a timer querying the queue every 10 milliseconds to sync up with the Wiimote's 100 report generations per second. All of this combined enables us to do what we did in that short video.
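The producer-consumer shape of that queue can be sketched like this (a Python stand-in for our C# code; the function names and the event tuples are mine, for illustration only):

```python
# Sketch: the wrapper-class thread enqueues points as the Wiimote reports
# them, and a ~10 ms UI timer dequeues at most one point per tick.
import queue

points = queue.Queue()   # thread-safe, so producer and consumer can differ

def on_wiimote_point(x, y):
    """Called from the wrapper-class thread whenever the IR LED is seen."""
    points.put((x, y))

def ui_tick():
    """Called by the UI timer every 10 ms (matching ~100 reports/second)."""
    try:
        x, y = points.get_nowait()
        return ("draw", x, y)   # the real UI would draw or click here
    except queue.Empty:
        return None             # no new point this tick

on_wiimote_point(10, 20)
event = ui_tick()
```

The thread-safe queue is what decouples the Wiimote's report rate from the UI's refresh rate: the producer never blocks the drawing thread, and the timer simply drains one point per tick.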