clarifai – THE HYPERTEXT

novel camera

rg — Tue, 01 Dec 2015 17:10:37 +0000

I have spent the last few months completing a novel I started a long time ago and turning it into a non-linear interactive experience. For my final project in several classes, I have transferred this novel into a printer-equipped camera to make a new and different type of photographic experience.

Inside the antique camera is a Raspberry Pi with a camera module behind the lens. The flow of passages is controlled by a single, handwritten JSON file. When there is overlap between the tags detected in an image by Clarifai and the tags assigned to a passage, and the candidate passage occurs next in a storyline that has already begun, that passage is printed out. If no passage can be found, the camera prints poetry enabled by a recursive context-free grammar and constructed from words detected in the image.

This week, I am planning to add a back end component that will allow photos taken to be preserved as albums, and passages printed to be read later online. For now, here is the JSON file that controls the order of output:

{
    "zero": {
        "tags": ["moon", "swamp", "marble", "north america", "insect", "street"],
        "order": 0,
        "next": ["story"]
    },
    "guam_zero": {
    	"tags": ["computer", "technology", "future", "keyboard", "politics"],
    	"order": 0,
    	"next": ["guam_one"]
    },
    "guam_one": {
    	"tags": ["computer", "technology", "future", "keyboard", "politics"],
    	"order": 1,
    	"next": []
    },
    "dream_zero": {
    	"tags": ["dream", "dark", "night", "sleep", "bed", "bedroom", "indoors"],
    	"order": 0,
    	"next": ["chess_board"]
    },
    "chess_board": {
    	"tags": ["dream", "dark", "night", "sleep", "bed", "bedroom", "indoors"],
    	"order": 2,
    	"next": ["black_queen", "black_pawn", "black_king", "black_rook", "white_king", "white_knight"]
    },
    "black_queen": {
    	"tags": ["dream", "dark", "black", "night", "sleep", "bed", "bedroom", "indoors", "chess", "game", "queen"],
    	"order": 3,
    	"next": ["wake_up"]
    },
    "black_pawn": {
    	"tags": ["dream", "dark", "black", "night", "sleep", "bed", "bedroom", "indoors", "chess", "game", "pawn"],
    	"order": 3,
    	"next": ["wake_up"]
    },
    "black_king": {
    	"tags": ["dream", "dark", "black", "night", "sleep", "bed", "bedroom", "indoors", "chess", "game", "king"],
    	"order": 3,
    	"next": ["wake_up"]
    },
    "black_rook": {
    	"tags": ["dream", "dark", "black", "night", "sleep", "bed", "bedroom", "indoors", "chess", "game", "rook", "castle"],
    	"order": 3,
    	"next": ["wake_up"]
    },
    "white_king": {
    	"tags": ["dream", "dark", "white", "night", "sleep", "bed", "bedroom", "indoors", "chess", "game", "king"],
    	"order": 3,
    	"next": ["wake_up"]
    },
    "white_knight": {
    	"tags": ["dream", "dark", "white", "night", "sleep", "bed", "bedroom", "indoors", "chess", "game", "knight"],
    	"order": 3,
    	"next": ["wake_up"]
    },
    "wake_up": {
    	"tags": ["dream", "dark", "night", "sleep", "bed", "bedroom", "indoors"],
    	"order": 4,
    	"next": []
    },
    "forget": {
    	"tags": ["man", "men", "boy"],
    	"order": 0,
    	"next": []
    },    
    "story": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "street", "woman", "women", "girl"],
    	"order": 1,
    	"next": ["miss_vest", "forget"]
    },
    "miss_vest": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "street", "woman", "women", "girl"],
    	"order": 2,
    	"next": ["envelope", "forget"]
    },
    "envelope": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "street", "woman", "women", "girl", "paper", "envelope", "mail"],
    	"order": 3,
    	"next": ["apartment", "forget"]
    },
    "apartment": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "street", "woman", "women", "girl", "paper", "envelope", "mail"],
    	"order": 4,
    	"next": ["email"]
    },
    "email": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "woman", "women", "girl", "paper", "envelope", "mail", "computer", "technology"],
    	"order": 5,
    	"next": ["match"]
    },
    "match": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "man", "men", "boy", "paper", "envelope", "mail", "computer", "technology"],
    	"order": 5,
    	"next": ["smithpoint", "morning"]
    },
    "morning": {
    	"tags": ["day", "sun", "bedroom", "bed", "breakfast", "morning", "dream", "dark", "night"],
    	"order": 6,
    	"next": ["call"]
    },
    "call": {
    	"tags": ["phone", "telephone", "technology", "computer"],
    	"order": 7,
    	"next": ["smithpoint"]
    },
    "smithpoint": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "man", "men", "boy", "bar", "drink", "alcohol", "wine", "beer"],
    	"order": 8,
    	"next": ["drive", "forget"]
    },
    "drive": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "man", "men", "boy", "bar", "drink", "alcohol", "wine", "beer"],
    	"order": 9,
    	"next": ["take_pill", "toss_pill"]
    },
    "take_pill": {
    	"tags": ["drug", "pill", "man", "men", "boy", "bar", "night", "drink", "alcohol", "wine", "beer"],
    	"order": 10,
    	"next": ["meet_stranger_drugs", "john_home"]
    },
    "toss_pill": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "girl", "street", "woman", "women"],
    	"order": 10,
    	"next": ["meet_stranger_no_drugs"]
    },
    "meet_stranger_drugs": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "man", "men", "boy", "bar", "drink", "alcohol", "wine", "beer"],
    	"order": 11,
    	"next": ["john_home"]
    },
    "meet_stranger_no_drugs": {
    	"tags": ["moon", "swamp", "marble", "north america", "insect", "night", "man", "men", "boy", "bar", "drink", "alcohol", "wine", "beer"],
    	"order": 11,
    	"next": ["painting"]
    },
    "painting": {
    	"tags": ["painting", "art", "moon", "swamp", "marble", "north america", "insect", "night", "man", "men", "boy", "bar", "drink", "alcohol", "wine", "beer"],
    	"order": 12,
    	"next": []
    },
    "john_home": {
    	"tags": ["drug", "pill", "man", "men", "boy", "bar", "night", "drink", "alcohol", "wine", "beer"],
    	"order": 13,
    	"next": []
    }

}

And here is the code that’s currently running on the Raspberry Pi:

import RPi.GPIO as GPIO
from Adafruit_Thermal import *
import time
import os
import sys
import json
import picamera
from clarifai.client import ClarifaiApi
from pattern.en import referenced

import gen

# Init Clarifai
os.environ["CLARIFAI_APP_ID"] = "nAT8dW6B0Oc5qA6JQfFcdIEr-CajukVSOZ6u_IsN"
os.environ["CLARIFAI_APP_SECRET"] = "BnETdY6wtp8DmXIWCBZf8nE4XNPtlHMdtK0ISNJQ"
clarifai_api = ClarifaiApi() # Assumes Env Vars Set

# Init System Paths
APP_PATH = os.path.dirname(os.path.realpath(__file__))
IMG_PATH = os.path.join(APP_PATH, 'img')
TALE_PATH = os.path.join(APP_PATH, 'tales')

# Init tale_dict
with open(os.path.join(APP_PATH, 'tales_dict.json'), 'r') as infile:
    tale_dict = json.load(infile)

# Seen tales
seen_tales = list()

# Init Camera
camera = picamera.PiCamera()

# Init Printer
printer = Adafruit_Thermal("/dev/ttyAMA0", 9600, timeout=5)
printer.boldOn()

# Init GPIO
# With camera pointed forward...
# LEFT:  11 (button), 15 (led)
# RIGHT: 13 (button), 16 (led)
GPIO.setmode(GPIO.BOARD)
ledPins = (15,16)
butPins = (11,13)

for pinNo in ledPins:
    GPIO.setup(pinNo, GPIO.OUT)

for pinNo in butPins:
    GPIO.setup(pinNo, GPIO.IN, pull_up_down=GPIO.PUD_UP)

# Open Grammar Dict
with open(os.path.join(APP_PATH, 'weird_grammar.json'), 'r') as infile:
    grammar_dict = json.load(infile)

def blink_left_right(count):
    ledLeft, ledRight = ledPins
    for _ in range(count):
        GPIO.output(ledRight, False)
        GPIO.output(ledLeft, True)
        time.sleep(0.2)
        GPIO.output(ledRight, True)
        GPIO.output(ledLeft, False)
        time.sleep(0.2)
    GPIO.output(ledRight, False)

def to_lines(sentences):
    def sentence_to_lines(text):
        LL = 32
        tokens = text.split(' ')
        lines = list()
        curLine = list()
        charCount = 0
        for t in tokens:
            charCount += (len(t)+1)
            if charCount > LL:
                lines.append(' '.join(curLine))
                curLine = [t]
                charCount = len(t)+1
            else:
                curLine.append(t)
        lines.append(' '.join(curLine))
        return '\n'.join(lines)
    sentence_lines = map(sentence_to_lines, sentences)
    return '\n\n'.join(sentence_lines)

def open_tale(tale_name):
    with open(os.path.join(TALE_PATH, tale_name), 'r') as infile:
        tale_text = to_lines(
            filter(lambda x: x.strip(), infile.read().strip().split('\n'))
        )
    return tale_text

def pick_tale(tags, next_tales):
    choice = str()
    record = 0
    for tale in tale_dict:
        if tale in next_tales or tale_dict[tale]['order'] == 0:
            score = len(set(tale_dict[tale]['tags']) & set(tags))
            if tale in next_tales and score > 0 and not tale in seen_tales:
                score += 100
            if score > record:
                choice = tale
                record = score
    return choice


blink_left_right(5)
imgCount = 1
cur_tale = str()


while True:
    inputLeft, inputRight = map(GPIO.input, butPins)
    if inputLeft != inputRight:
        try:
            img_fn = str(int(time.time()*100))+'.jpg'
            img_fp = os.path.join(IMG_PATH, img_fn)

            camera.capture(img_fp)

            blink_left_right(3)

            result = clarifai_api.tag_images(open(img_fp))
            tags = result['results'][0]['result']['tag']['classes']

            if cur_tale:
                next_tales = tale_dict[cur_tale]['next']
            else:
                next_tales = list()

            tale_name = pick_tale(tags, next_tales)
            cur_tale = tale_name

            if tale_name:
                lines_to_print = open_tale(tale_name)
                seen_tales.append(tale_name)

            else:
                grammar_dict["N"].extend(tags)

                if not inputLeft:
                    sentences = [gen.make_polar(grammar_dict, 10, sent=0) for _ in range(10)]
                elif not inputRight:
                    sentences = [gen.make_polar(grammar_dict, 10) for _ in range(10)]
                else:
                    sentences = gen.main(grammar_dict, 10)

                lines_to_print = to_lines(sentences)

            prefix = '\n\n\nNo. %i\n\n'%imgCount

            printer.println(prefix+lines_to_print+'\n\n\n')

            grammar_dict["N"] = list()
            imgCount += 1
        except:
            blink_left_right(15)
            print sys.exc_info()

    elif (not inputLeft) and (not inputRight):
        offCounter = 0
        for _ in range(100):
            inputLeft, inputRight = map(GPIO.input, butPins)
            if (not inputLeft) and (not inputRight):
                time.sleep(0.1)
                offCounter += 1
                if offCounter > 50:
                    os.system('sudo shutdown -h now')
            else:
                break

Click here for a Google Drive folder with all the passages from the novel.

Sound Camera, Part II

rg — Tue, 06 Oct 2015 02:20:44 +0000

Using JavaScript and Python Flask, I created a functional software prototype of the Sound Camera: rossgoodwin.com/soundcamera

The front-end JavaScript code is available on GitHub. Here is the primary back-end Python code:

import os
import json
import uuid
from base64 import decodestring
import time
from random import choice as rc
from random import sample as rs
import re

import PIL
from PIL import Image
import requests
import exifread

from flask import Flask, request, abort, jsonify
from flask.ext.cors import CORS
from werkzeug import secure_filename

from clarifai.client import ClarifaiApi

app = Flask(__name__)
CORS(app)

app.config['UPLOAD_FOLDER'] = '/var/www/SoundCamera/SoundCamera/static/img'
IMGPATH = '/var/www/SoundCamera/SoundCamera/static/img/'

clarifai_api = ClarifaiApi()

@app.route("/")
def index():
    return "These aren't the droids you're looking for."

@app.route("/img", methods=["POST"])
def img():
	request.get_data()
	if request.method == "POST":
		f = request.files['file']
		if f:
			filename = secure_filename(f.filename)
			f.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
			new_filename = resize_image(filename)
			return jsonify(uri=main(new_filename))
		else:
			abort(501)

@app.route("/b64", methods=["POST"])
def base64():
	if request.method == "POST":
		fstring = request.form['base64str']
		filename = str(uuid.uuid4())+'.jpg'
		file_obj = open(IMGPATH+filename, 'w')
		file_obj.write(fstring.decode('base64'))
		file_obj.close()
		return jsonify(uri=main(filename))

@app.route("/url")
def url():
	img_url = request.args.get('url')
	response = requests.get(img_url, stream=True)
	orig_filename = img_url.split('/')[-1]
	if response.status_code == 200:
		with open(IMGPATH+orig_filename, 'wb') as f:
			for chunk in response.iter_content(1024):
				f.write(chunk)
		new_filename = resize_image(orig_filename)
		return jsonify(uri=main(new_filename))
	else:
		abort(500)


# def allowed_img_file(filename):
#     return '.' in filename and \
# 		filename.rsplit('.', 1)[1].lower() in set(['.jpg', '.jpeg', '.png'])

def resize_image(fn):
    longedge = 640
    orientDict = {
        1: (0, 1),
        2: (0, PIL.Image.FLIP_LEFT_RIGHT),
        3: (-180, 1),
        4: (0, PIL.Image.FLIP_TOP_BOTTOM),
        5: (-90, PIL.Image.FLIP_LEFT_RIGHT),
        6: (-90, 1),
        7: (90, PIL.Image.FLIP_LEFT_RIGHT),
        8: (90, 1)
    }

    imgOriList = []
    try:
        f = open(IMGPATH+fn, "rb")
        exifTags = exifread.process_file(f, details=False, stop_tag='Image Orientation')
        if 'Image Orientation' in exifTags:
            imgOriList.extend(exifTags['Image Orientation'].values)
    except:
        pass

    img = Image.open(IMGPATH+fn)
    w, h = img.size
    newName = str(uuid.uuid4())+'.jpeg'
    if w >= h:
        wpercent = (longedge/float(w))
        hsize = int((float(h)*float(wpercent)))
        img = img.resize((longedge,hsize), PIL.Image.ANTIALIAS)
    else:
        hpercent = (longedge/float(h))
        wsize = int((float(w)*float(hpercent)))
        img = img.resize((wsize,longedge), PIL.Image.ANTIALIAS)

    for val in imgOriList:
        if val in orientDict:
            deg, flip = orientDict[val]
            img = img.rotate(deg)
            if flip != 1:
                img = img.transpose(flip)

    img.save(IMGPATH+newName, format='JPEG')
    os.remove(IMGPATH+fn)
    
    return newName

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l[i:i+n]

def get_tags(fp):
    fileObj = open(fp)
    result = clarifai_api.tag_images(fileObj)
    resultObj = result['results'][0]
    tags = resultObj['result']['tag']['classes']
    return tags

def genius_search(tags):
    access_token = 'd2IuV9fGKzYEWVnzmLVtFnm-EYvBQKR8Uh3I1cfZOdr8j-BGVTPThDES532dym5a'
    payload = {
        'q': ' '.join(tags),
        'access_token': access_token
    }
    endpt = 'http://api.genius.com/search'
    response = requests.get(endpt, params=payload)
    results = response.json()
    hits = results['response']['hits']
    
    artists_titles = []
    
    for h in hits:
        hit_result = h['result']
        if hit_result['url'].endswith('lyrics'):
            artists_titles.append(
                (hit_result['primary_artist']['name'], hit_result['title'])
            )
    
    return artists_titles

def spotify_search(query):
    endpt = "https://api.spotify.com/v1/search"
    payload = {
        'q': query,
        'type': 'track'
    }
    response = requests.get(endpt, params=payload)
    result = response.json()
    result_zero = result['tracks']['items'][0]
    
    return result_zero['uri']

def main(fn):
    tags = get_tags(IMGPATH+fn)
    for tag_chunk in chunks(tags,3):
        artists_titles = genius_search(tag_chunk)
        for artist, title in artists_titles:
            try:
                result_uri = spotify_search(artist+' '+title)
            except IndexError:
                pass
            else:
                return result_uri


if __name__ == "__main__":
    app.run()

It uses the same algorithm discussed in my prior post. Now that I have the opportunity to test it more, I am not quite satisfied with the results it is providing. First of all, they are not entirely deterministic (you can upload the same photo twice and end up with two different songs in some cases). Moreover, the results from a human face — which I expect to be a common use case — are not very personal. For the next steps in this project, I plan to integrate additional data including GPS, weather, time of day, and possibly even facial expressions in order to improve the output.

The broken cameras I ordered from eBay have arrived, and I have been considering how to use them as cases for the new models. I also purchased a GPS module for my Raspberry Pi, so the next Sound Camera prototype, with new features integrated, will likely be a physical version. I’m planning to use this Kodak Brownie camera (c. 1916):

Candidate Image Explorer

rg — Thu, 17 Sep 2015 15:53:26 +0000

For this week’s homework in Designing for Data Personalization with Sam Slover, I made progress on a project that I’m working on for Fusion as part of their 2016 US Presidential Election coverage. I began this project by downloading all the images from each candidate’s Twitter, Facebook, and Instagram account — about 60,000 in total — then running those images through Clarifai‘s convolutional neural networks to generate descriptive tags.

With all the images hosted on Amazon s3, and the tag data hosted on parse.com, I created a simple page where users can explore the candidates’ images by topic and by candidate. The default is all topics and all candidates, but users can narrow the selection of images displayed by making multiple selections from each field. Additionally, more images will load as you scroll down the page.

Unfortunately, the AI-enabled image tagging doesn’t always work as well as one might hope.

Here’s the page’s JavaScript code:

var name2slug = {};
var slug2name = {};

Array.prototype.remove = function() {
    var what, a = arguments, L = a.length, ax;
    while (L && this.length) {
        what = a[--L];
        while ((ax = this.indexOf(what)) !== -1) {
            this.splice(ax, 1);
        }
    }
    return this;
}

Array.prototype.chunk = function(chunkSize) {
    var array=this;
    return [].concat.apply([],
        array.map(function(elem,i) {
            return i%chunkSize ? [] : [array.slice(i,i+chunkSize)];
        })
    );
}

function dateFromString(str) {
	var m = str.match(/(\d+)-(\d+)-(\d+)T(\d+):(\d+):(\d+)Z/);
	var date = new Date(Date.UTC(+m[1], +m[2], +m[3], +m[4], +m[5], +m[6]));
	var options = {
	    weekday: "long", year: "numeric", month: "short",
	    day: "numeric", hour: "2-digit", minute: "2-digit"
	};
	return date.toLocaleTimeString("en-us", options);
}

function updatePhotos(query) {
	$.ajax({
		url: 'https://api.parse.com/1/classes/all_photos?limit=1000&where='+JSON.stringify(query),
		type: 'GET',
		dataType: 'json',
		success: function(response) {
			// console.log(response);
			$('#img-container').empty();

			var curChunk = 0;
			var resultChunks = response['results'].chunk(30);

			function appendPhotos(chunkNo) {

				resultChunks[chunkNo].map(function(obj){
					var date = dateFromString(obj['datetime'])
					var imgUrl = "https://s3-us-west-2.amazonaws.com/electionscrape/" + obj['source'] + "/400px_" + obj['filename'];
					var fullImgUrl = "https://s3-us-west-2.amazonaws.com/electionscrape/" + obj['source'] + "/" + obj['filename'];
					$('#img-container').append(
						$('').append(
							''+slug2name[obj['candidate']]+'
'+date+'
'+obj['source']+''
						) // not a missing semicolon
					);
					// console.log(obj['candidate']);
					// console.log(obj['datetime']);
					// console.log(obj['source']);
					// console.log(obj['filename']);
				});

			}

			appendPhotos(curChunk);

			window.onscroll = function(ev) {
			    if ((window.innerHeight + window.scrollY) >= document.body.offsetHeight) {
			        curChunk++;
			        appendPhotos(curChunk);
			    }
			};


		},
		error: function(response) { "error" },
		beforeSend: setHeader
	});
}

function setHeader(xhr) {
	xhr.setRequestHeader("X-Parse-Application-Id", "ID-GOES-HERE");
	xhr.setRequestHeader("X-Parse-REST-API-Key", "KEY-GOES-HERE");
}

function makeQuery(candArr, tagArr) {

	orArr = tagArr.map(function(tag){
		return { "tags": tag };
	})

	if (tagArr.length === 0 && candArr.length > 0) {
		var query = {
			'candidate': {"$in": candArr}
		};
	}
	else if (tagArr.length > 0 && candArr.length === 0) {
		var query = {
			'$or': orArr
		};
	}
	else if (tagArr.length === 0 && candArr.length === 0) {
		var query = {};
	}
	else {
		var query = {
			'candidate': {"$in": candArr},
			'$or': orArr
		};
	}

	updatePhotos(query);

}

(function(){

$('.grid').masonry({
  // options
  itemSelector: '.grid-item',
  columnWidth: 300
});

var selectedCandidates = [];
var selectedTags = [];

$.getJSON("data/candidates.json", function(data){
	var candNames = Object.keys(data).map(function(slug){
		var name = data[slug]['name'];
		name2slug[name] = slug;
		slug2name[slug] = name;
		return name;
	}).sort();

	candNames.map(function(name){
		$('#candidate-dropdown').append(
			''+name+''
		);
	});

	$('.candidate-item').click(function(){
		var name = $(this).text();
		var slug = name2slug[name];
		if ($.inArray(slug, selectedCandidates) === -1) {
			selectedCandidates.push(slug);
			makeQuery(selectedCandidates, selectedTags);
			console.log(selectedCandidates);
			$('#selected-candidates').append(
				$('')
					.click(function(){
						$(this).fadeOut("fast", function(){
							selectedCandidates.remove(name2slug[$(this).text()]);
							makeQuery(selectedCandidates, selectedTags);
							console.log(selectedCandidates);
						});
					}) // THIS IS NOT A MISSING SEMI-COLON
			);
		}
	});
});


$.getJSON("data/tags.json", function(data){
	var tags = data["tags"].sort();
	tags.map(function(tag){
		$('#tag-dropdown').append(
			''+tag+''
		);
	});

	$('.tag-item').click(function(){
		var tag = $(this).text();
		if ($.inArray(tag, selectedTags) === -1) {
			selectedTags.push(tag);
			makeQuery(selectedCandidates, selectedTags);
			console.log(selectedTags);
			$('#selected-tags').append(
				$('')
					.click(function(){
						$(this).fadeOut("fast", function(){
							selectedTags.remove($(this).text());
							makeQuery(selectedCandidates, selectedTags);
							console.log(selectedTags);
						});
					})
			);
		}
	});
});

makeQuery(selectedCandidates, selectedTags);

})();

Sound Camera

rg — Mon, 14 Sep 2015 04:06:42 +0000

I have been looking for ways to push the conceptual framework behind word.camera to another domain. This week, I have been prototyping a script that chooses music based on photographs. Ideally, the end result will be a wearable camera / music player that selects tracks for you based on your environment. Unfortunately, the domain sound.camera has been claimed, but I’m still planning to use the name “Sound Camera” for this project.

My code:
iPython Notebook

The script I wrote gets concept words from the image via Clarifai, then searches song lyrics for those words on Genius, then finds the song on Spotify. Below are some images I put through the algorithm. You can click on each one to hear the song that resulted, though you will need to login to Spotify to do so.

The next step will be to get this code working on a Raspberry Pi inside one of the film camera bodies I just received via eBay.

word.camera

rg — Sat, 11 Apr 2015 05:12:58 +0000

lexograph /ˈleksəʊɡɹɑːf/ (n.)
A text document generated from digital image data

Last week, I launched a web application and a concept for photographic text generation that I have been working on for a few months. The idea came to me while working on another project, a computer generated screenplay, and I will discuss the connection in this post.

word.camera is responsive — it works on desktop, tablet, and mobile devices running recent versions of iOS or Android. The code behind it is open source and available on GitHub, because lexography is for everyone.

Users can share their lexographs using unique URLs. Of all this lexographs I’ve seen generated by users since the site launched (there are now almost 7,000), this one, shared on reddit’s /r/creativecoding, stuck with me the most: http://word.camera/i/7KZPPaqdP

I was surprised when the software noticed and commented on the singer in the painting behind me: http://word.camera/i/ypQvqJr6L

I was inspired to create this project while working on another project. This semester, I received a grant from the Future of Storytelling Initiative at NYU to produce a computer generated screenplay, and I had been thinking about how to generate text that’s more cohesive and realistically descriptive, meaning that it would transition between related topics in a logical fashion and describe a scene that could realistically exist (no “colorless green ideas sleeping furiously”) in order to making filming the screenplay possible . After playing with the Clarifai API, which uses convolutional neural networks to tag images, it occurred to me that including photographs in my input corpus, rather than relying on text alone, could provide those qualities. word.camera is my first attempt at producing that type of generative text.

At the moment, the results are not nearly as grammatical as I would like them to be, and I’m working on that. The algorithm extracts tags from images using Clarifai’s convolutional neural networks, then expands those tags into paragraphs using ConceptNet (a lexical relations database developed at MIT) and a flexible template system. The template system enables the code to build sentences that connect concepts together.

This project is about augmenting our creativity and presenting images in a different format, but it’s also about creative applications of artificial intelligence technology. I think that when we think about the type of artificial intelligence we’ll have in the future, based on what we’ve read in science fiction novels, we think of a robot that can describe and interact with its environment with natural language. I think that creating the type of AI we imagine in our wildest sci-fi fantasies is not only an engineering problem, but also a design problem that requires a creative approach.

I hope lexography eventually becomes accepted as a new form of photography. As a writer and a photographer, I love the idea that I could look at a scene and photograph it because it might generate an interesting poem or short story, rather than just an interesting image. And I’m not trying to suggest that word.camera is the final or the only possible implementation of that new art form. I made the code behind word.camera open source because I want others to help improve it and make their own versions — provided they also make their code available under the same terms, which is required under the GNU GPLv3 open source license I’m using. As the technology gets better, the results will get better, and lexography will make more sense to people as a worthy artistic pursuit.

I’m thrilled that the project has received worldwide attention from photography blogs and a few media outlets, and I hope users around the world continue enjoying word.camera as I keep working to improve it. Along with improving the language, I plan to expand the project by offering a mobile app and generated downloadable ebooks so that users can enjoy their lexographs offline.

Click Here for Part II