Teaching a Computer to Paint

A Neural Algorithm for Artistic Style
Artsy - Pablo Picasso

Neural Networks

Neural networks have recently gained attention in the computer science and artificial intelligence communities, largely due to their remarkable learning capabilities and variety of uses. They are computational models loosely inspired by the human brain, and they generally consist of three sections: an input layer, an output layer, and some number N of hidden layers, each made up of units called neurons (or nodes). The input layer receives some data, and the network’s job is to accurately predict an output value by performing computations within the hidden layers. These hidden layers usually consist of several neurons that communicate with one another, each applying a linear or non-linear function to the values passed forward to it (kind of like the human brain).
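To make that structure concrete, here is a minimal sketch of a forward pass through such a network in Python with numpy. The layer sizes and random weights are purely illustrative; a real network would learn its weights from data:

```python
import numpy as np

def relu(x):
    """A common non-linear activation: zero out negative values."""
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Pass input x through each hidden layer (ReLU), then a linear output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)           # hidden layers: linear step + non-linearity
    W_out, b_out = weights[-1], biases[-1]
    return W_out @ x + b_out          # output layer: linear step only

# A tiny network: 3 inputs -> 4 hidden neurons -> 2 outputs (sizes made up)
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
y = forward(rng.standard_normal(3), weights, biases)
print(y.shape)  # (2,)
```

The same pattern scales to deeper networks by adding more (weight, bias) pairs to the lists.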


Last year, researchers in the Bethge Lab at the University of Tübingen released a paper titled “A Neural Algorithm of Artistic Style,” which used a Convolutional Neural Network (CNN) to extract the style of one image and apply it to an arbitrary other image. Leon Gatys, Alexander Ecker, and Matthias Bethge were inspired by deep-learning programs (such as Google’s DeepDream) and their ability to process images, and decided to see whether the same technology could be used to perform texture transfer and produce non-photorealistic renderings of existing images.

The program was built on top of an already-trained 19-layer VGG network using the Caffe framework. The challenge in adapting artwork onto arbitrary images is that you need to separate the “style” of an image (minimalism, photorealism, abstraction) from its actual content (dog, cat, house). The VGG network was originally trained for object recognition, so its deeper layers capture high-level content, revealing more about what is in the image and how it is arranged than about exact pixel values. Using this, you can visualize the information captured at each stage by reconstructing an image from the feature maps of the higher convolutional layers alone. To obtain style, the researchers captured texture information instead: on each layer of the network, they computed the correlations between the different filter responses over the entire feature map.
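Those filter-response correlations form what is known as a Gram matrix, one per layer. Here is a rough numpy sketch of that computation (not the paper’s code; the 64-channel feature-map shape is illustrative):

```python
import numpy as np

def gram_matrix(feature_map):
    """Correlate filter responses: feature_map has shape (channels, height, width)."""
    c, h, w = feature_map.shape
    f = feature_map.reshape(c, h * w)  # flatten the spatial dimensions
    return f @ f.T / (h * w)           # (channels, channels) matrix of correlations

# Illustrative feature map from one conv layer: 64 filters over a 32x32 grid
features = np.random.default_rng(0).standard_normal((64, 32, 32))
G = gram_matrix(features)
print(G.shape)  # (64, 64)
```

Because the spatial dimensions are summed out, the Gram matrix keeps which filters fire together (texture) while discarding where they fire (arrangement), which is exactly why it works as a style representation.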

Once that was in place, it was a matter of combining the two representations: synthesizing an image that preserves the global content of one image while its localized structures mimic the artistic style of the other. This ended up creating some really convincing images…
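Concretely, the synthesized image is optimized to minimize a weighted sum of a content loss (distance between feature maps) and a style loss (distance between Gram matrices). The sketch below uses illustrative helper names, not the paper’s code; the alpha/beta weights play the same role as the content/style weights in the command I ran:

```python
import numpy as np

def content_loss(gen_features, content_features):
    # Squared distance between feature maps at one layer
    return 0.5 * np.sum((gen_features - content_features) ** 2)

def style_loss(gen_gram, style_gram, n_channels, map_size):
    # Squared distance between Gram matrices, normalized by layer size
    return np.sum((gen_gram - style_gram) ** 2) / (4 * n_channels**2 * map_size**2)

def total_loss(c_loss, s_loss, alpha=1.0, beta=300.0):
    # alpha weights content fidelity, beta weights style fidelity
    return alpha * c_loss + beta * s_loss
```

In practice an optimizer (the paper uses gradient descent on the pixels; L-BFGS is common in implementations) repeatedly updates the generated image to drive this total loss down.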

(Note: For the sake of this blog post, I oversimplified the process, and I highly recommend you check out the paper for the detailed methods!)

Open-Source Implementation

Currently the original algorithm described in the paper hasn’t been released to the public, but it is being offered as a service (paid?). There exist, however, several open-source implementations using frameworks like Torch7 and TensorFlow. The one I decided to use is by Justin Johnson, a PhD student at Stanford. Several companies also offer these implementations as free or paid services online, so you can generate your own images.

For all the generated images below, I ran the program using the following values as I found these to give me the best/most accurate results:

th neural_style.lua -style_image <style_img> \
  -content_image <content_img> \
  -gpu 0 -backend cudnn -cudnn_autotune \
  -optimizer lbfgs -num_iterations 1000 \
  -init image -style_weight 300 \
  -content_weight 1 -tv_weight .003

I used my GTX 970 to render all the images and the time for 1000 iterations ranged between 15 seconds and 2 minutes (with max width of 900 pixels before I ran out of VRAM). Below you can see a progression of the neural network as it modifies a base image into the style of Rain Princess by Leonid Afremov.

Through trial and error, I ended up rendering a lot of images and below are some of my favorite ones so far:

Final Thoughts

Overall, this neural network, despite being a pain to set up, was incredibly interesting to work with, and I was genuinely surprised at how well it handled pattern and texture recognition. Thanks again to all my friends who let me post results using their profile pictures. Hopefully this was interesting to read for a (first) blog post. I’ll try to do more like it on other cool shit I find around.