Projects
- Auto-Vectorization: auto-vectorization example and intermediate results.
- Reinforcement Learning with the Cart-Pole System: short explanation and video.
- The Wallaby Color Tool: short explanation and installer download.
- Turing's Toy Tanks!: video playthrough.
Auto-Vectorization
Example input:
Final auto-vectorization output:
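As a minimal sketch of the kind of loop an auto-vectorizer targets (the function name, arrays, and compiler flags below are illustrative assumptions, not the original example):

#include <cstddef>

// A simple element-wise loop. With optimization enabled (e.g. -O2 or -O3),
// compilers such as GCC and Clang can rewrite it to process several floats
// per iteration using SIMD registers, plus a scalar remainder loop for the
// leftover elements when n is not a multiple of the vector width.
void scale_add(float* a, const float* b, const float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] * 2.0f + c[i];
    }
}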
Reinforcement Learning with the Cart-Pole System
The goal of this project was to create an intelligent agent that balances an inverted pendulum attached by a fixed pivot to the top of a movable cart. The features of the system known to the agent are: the horizontal position of the cart, x; the horizontal velocity of the cart, Δx; the angle of the pendulum, Θ; and the angular velocity of the pendulum, ΔΘ. The cart-pole system is in a valid state if x is within some constant distance of the center of the track and Θ is within 45 degrees of the vertical. The agent can interact with the system by exerting a left or right force of constant magnitude, F, on the cart.
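A minimal sketch of this state representation and the validity test, assuming C++; the 2.4 m track limit is an assumed value, since the text only requires "some constant distance" from the center:

#include <cmath>

// State of the cart-pole system as seen by the agent.
struct CartPoleState {
    double x;          // horizontal position of the cart (m)
    double x_dot;      // horizontal velocity of the cart (m/s)
    double theta;      // angle of the pendulum from vertical (rad)
    double theta_dot;  // angular velocity of the pendulum (rad/s)
};

constexpr double PI          = 3.14159265358979323846;
constexpr double X_LIMIT     = 2.4;                // assumed track limit, m
constexpr double THETA_LIMIT = 45.0 * PI / 180.0;  // 45 degrees, in radians

// The run ends (invalid state) when the cart leaves the track or the pole
// tips more than 45 degrees from the vertical.
bool isValid(const CartPoleState& s) {
    return std::fabs(s.x) < X_LIMIT && std::fabs(s.theta) < THETA_LIMIT;
}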
Euler's method with a time step of 0.01 seconds was used to numerically approximate the system. After each time step the agent was asked which action to take to balance the pole.
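Continuing the sketch above, one Euler step might look like the following; the equations of motion are the standard cart-pole equations (Barto, Sutton, and Anderson, 1983), and the cart mass, pole mass, pole half-length, and force magnitude are assumed typical values rather than values taken from this page:

#include <cmath>

constexpr double GRAVITY   = 9.8;   // m/s^2
constexpr double MASS_CART = 1.0;   // kg (assumed)
constexpr double MASS_POLE = 0.1;   // kg (assumed)
constexpr double LENGTH    = 0.5;   // half-length of the pole, m (assumed)
constexpr double FORCE_MAG = 10.0;  // magnitude of F, N (assumed)
constexpr double TAU       = 0.01;  // time step from the text, s

// action: +1 pushes the cart to the right, -1 pushes it to the left.
CartPoleState eulerStep(CartPoleState s, int action) {
    const double force      = action > 0 ? FORCE_MAG : -FORCE_MAG;
    const double total_mass = MASS_CART + MASS_POLE;
    const double cos_t      = std::cos(s.theta);
    const double sin_t      = std::sin(s.theta);

    // Standard cart-pole equations of motion.
    const double temp = (force + MASS_POLE * LENGTH * s.theta_dot * s.theta_dot * sin_t) / total_mass;
    const double theta_acc = (GRAVITY * sin_t - cos_t * temp)
                           / (LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t * cos_t / total_mass));
    const double x_acc = temp - MASS_POLE * LENGTH * theta_acc * cos_t / total_mass;

    // One Euler step of size TAU.
    s.x         += TAU * s.x_dot;
    s.x_dot     += TAU * x_acc;
    s.theta     += TAU * s.theta_dot;
    s.theta_dot += TAU * theta_acc;
    return s;
}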
This problem falls under reinforcement learning, which differs from supervised learning in that, for a given input, the correct output is not known in advance.
While there is no information that can be used to reinforce the agent immediately, the system is eventually terminated when it reaches an invalid state (i.e., the pole has fallen over or the cart is out of bounds). Because of the nature of the system, the action the agent took immediately before this invalid state may or may not have been a mistake: some states inescapably lead to an invalid state. Therefore, supervised learning on this last action alone would probably not be productive (although this was not tested). Instead, it is more useful to look at the whole series of decisions that brought the system to failure.
The temporal difference method (TD-method) was designed for learning in dynamical systems which require prediction. Rather than learning from differences between predicted and actual outcomes, the temporal difference method learns from differences between temporally successive predictions. In other words, the weights used to form a prediction are updated using the difference between that prediction and the one that follows it, with the adjustment also credited to the predictions that came before it. In our case, previous predictions are the predictions from one, two, three, etc. time steps ago.
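Concretely, with a prediction P_t formed from a weight vector w, Sutton's TD(λ) update takes the form below; the learning rate α and the decay parameter λ are standard parts of the method, but any particular values would be assumptions, since this page does not state them:

Δw_t = α (P_{t+1} − P_t) Σ_{k=1..t} λ^(t−k) ∇_w P_k

The difference between successive predictions, P_{t+1} − P_t, adjusts the weights in proportion to the gradients of the current and earlier predictions, with older predictions weighted down by powers of λ.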
The video below shows an agent being trained with the TD-method; training occurs after each failed run. In this particular run the agent converges to a solution immediately after its first successful run, which is not generally the case.
The Wallaby Color Tool
The Wallaby Color Tool is an image manipulation application developed for Windows using WPF and C#. More information, screenshots, and an installer for the application can be found here.
Screenshot of color swapping:
Original image (The Starry Night).
Interesting technical features:
- All image processing is executed in the background.
- Heavier image processing is executed in parallel.
- The graph dynamically updates to show the mapping of the original hues to the new hues (a conceptual sketch of this mapping step follows this list).
- The image processing function can be edited manually, and these changes appear on the graph.
- Any part of the image can be masked to include or exclude it from processing.
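A conceptual sketch of the hue-remapping step, written in C++ purely for illustration (the actual tool is implemented in C# with WPF, and all names below are assumptions):

#include <cstddef>
#include <functional>
#include <vector>

struct Pixel { float hue, saturation, value; }; // HSV color, hue in [0, 360)

// Apply a user-defined hue mapping to every unmasked pixel. The mapping
// function corresponds to the curve shown on the tool's hue graph; "mask"
// marks pixels excluded from processing.
void remapHues(std::vector<Pixel>& pixels,
               const std::vector<bool>& mask,
               const std::function<float(float)>& hueMap) {
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        if (i < mask.size() && mask[i]) continue; // excluded by the mask
        pixels[i].hue = hueMap(pixels[i].hue);
    }
}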
Turing's Toy Tanks!
Created for Comp 441, Computer Game Design and Development, with DirectX 9.
Developers: Matthew Jarvis, Garth Murray, and Jonathan Worobey.
Full playthrough: