Q-Learning works, in the general sense like this:
From your current state take an action
What reward did you get from this action?
Update the state-action pair from before you moved with this reward that you got
Repeat until you find good estimates of the domain space in which you exist.
This is a vague description is good enough to develop an intuition for how the reinforcement learning paradigm of Machine Learning works in general. In specificity
Q(s,a) = Q(s,a) + a[ R + g*max(Q(s',a')-Q(s,a)]
Where a and g are scalar parameters that effect how the algorithm itself learns. The tricky thing in this case is that the s' and a' (standing for the next state and action pair ) are unknown if you were to take this equation literally. Q(s,a) stands for the state-action pair. In my implementation this is Q(x,y,a) where (x,y) is the position of the agent in the space, and the a is the action it took. Because my agent exists within an MxN grid, and can move left right up and down it has a total of MxNx4 possible states. Which for a simple sized grid of 25x25 is already above 2000 possible states. With such a large search space its surprising that this type of techniques works at all. Anyway, before I get distracted by complexity details...
With an initial Q(s,a) with an arbitrary a for the state you're given, you can then move into your algorithm, find the next state (chosen by picking the most valuable looking state around you with respect to your Q matrix), and then you can update old state with the state you just found. In other words, your most recently found Q pair is your prime terms in the equation above.
Sutton and Barto's book on reinforcement learning is available online at
http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node65.html
and is a standard book for reinforcement learning. Or so I'm told.
The cool thing I'd like to mention is that I create my environment by using paint, then use matlab to load in the image and create my environment from it. Unforunately, because I prefer my complexity small both bitmaps are rather small and beefing them up for view on this blog has messed with the resolution a bit. But take my word, the red around the outside is a single pixel wide and represents a wall. The black dot is the goal point, the purplish is the food that the agent must follow to the goal and of course the white is empty space. You can see it better in the top picture, but the start location is indicated by the green pixel. Anywho, once this is loaded into matlab you end up with something like this:
The purple dot is the agent, the blue next to the red is the start state, the cyan is the goal, and the red is the food. Blue around the edges is of course the wall.
The image is completely animated during runtime so I can watch how well or poorly the reinforcement learning is coming.