PerfectLearn beta period uptime report

PerfectLearn’s overall uptime for the month of March was 99.51% (Pingdom’s summary is provided below); over a 31-day month, that works out to roughly three and a half hours of downtime. This period also coincided with PerfectLearn’s beta phase. So, on the whole, I’m pretty satisfied with PerfectLearn’s stability during its beta phase.

PerfectLearn Pingdom Report March 2015

All of the downtimes except the one on March 09 were scheduled downtimes for deployment purposes (the roll-out of bug fixes and functionality enhancements). The "3h 20m" downtime on March 09, however, was due to my VPS hosting provider scheduling a Xen security-related update with a mandatory reboot.

The apparent stability of PerfectLearn makes me confident that version 1.0 of PerfectLearn is ready to be formally released.

Stay tuned for updates. Subscribe to the PerfectLearn newsletter.

PerfectLearn development update March 2015

In the last two weeks only one big(ish) change has been implemented and deployed; all the other changes to PerfectLearn have been minor user interface-related tweaks and fixes. The big change was to PerfectLearn’s editor component. Previously, PerfectLearn used the Bootstrap-wysihtml5 editor. Bootstrap-wysihtml5 is a reasonable editor; nonetheless, in retrospect, it has proven not to be up to the task of serious text editing. In many respects, it is a decidedly lightweight editor. So, after discussing the issue with some of the more active beta users, I decided to swap it for a Markdown-based editor.

PerfectLearn Markdown editor

The new editor has some neat functionality, including the ability to preview the resulting HTML before saving the topic and a full-screen option (which I find particularly useful).

PerfectLearn full-screen Markdown editor

Based on the feedback from the users and my own impression when using PerfectLearn, I’m convinced that replacing the editor, even at this late stage of the beta phase, was the right thing to do.

How I used the CIA World Factbook to test my product

In preparation for the release of the first version of PerfectLearn, testing is the order of the day. To make the testing process both more realistic and more enjoyable I decided to load an external dataset into PerfectLearn to see how it handled a non-trivial topic map.


Screencast showing the CIA World Factbook data after it has been imported into PerfectLearn.

After searching online for a couple of hours I finally settled on the CIA World Factbook, which, in its own words, “provides information on the history, people, government, economy, geography, communications, transportation, military, and transnational issues for 267 world entities.” All in all, the World Factbook is an interesting dataset that the CIA has made available for personal use.

The first thing to do when confronted with a task like this is to try to get a basic understanding of the nature of the data. After examining the contents of the decompressed factbook.zip file I concluded that the following files and directories were sufficient to extract the necessary information to build the initial topic map ontology with some supporting images for each country’s topic:

  • geos
    • *.html: HTML documents for the 267 world entities.
    • print/country/*.pdf: the corresponding PDF documents for the 267 world entities.
  • graphics
    • flags/large/*.gif: country flags in GIF format.
    • maps/newmaps/*.gif: country maps in GIF format.
  • wfbExt
    • sourceXML.xml: XML file mapping country names, codes, and the corresponding regions.
CIA World Factbook Directory

There is much more data available in the World Factbook than I am alluding to here. For example, the fields and rankorder directories contain all kinds of data related to country comparisons (within several categories), and the appendix directory contains information about international organizations and groups, international environmental agreements, and so forth. Furthermore, there are both physical and political maps and population pyramids (in BMP format!) for all of the countries and territories. In short, the World Factbook is comprehensive, to say the least.

With an initial understanding of the data, the next step is to extract the information that is relevant for the current purpose. The HTML files in the geos directory provide the majority of the actual content for the countries, territories, and regions. In addition, the wfbExt/sourceXML.xml file (an excerpt of which is provided below) provides a convenient mapping between the countries and the accompanying regions. That is, each country record in the sourceXML.xml file includes “name”, “fips”, and “Region” attributes, which effectively link countries with regions while also providing the country code (the fips attribute) for the individual countries (and territories). The sourceXML.xml file will be crucial in the next phase when we are actually importing data into the topic map. For now, however, we need to focus on extracting the text for each country’s topic.

sourceXML.xml file excerpt

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<country>
	<country name="Afghanistan" fips="AF" Region="South Asia" />
	<country name="Akrotiri" fips="AX" Region="Europe" />
	<country name="Albania" fips="AL" Region="Europe" />
	<country name="Algeria" fips="AG" Region="Africa" />
	<country name="American Samoa" fips="AQ" Region="Oceania" />
	<country name="Andorra" fips="AN" Region="Europe" />
	<country name="Angola" fips="AO" Region="Africa" />
	<country name="Anguilla" fips="AV" Region="Central America" />
    ...
</country>
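An excerpt like this can be parsed with nothing more than the JDK. As a rough sketch (in plain Java, with the record structure taken from the excerpt above but the class and method names purely illustrative), the fips-code-to-region mapping could be built like so:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.Map;

public class CountryMapping {
    // Parse a sourceXML.xml-style document into a fips-code -> region map.
    static Map<String, String> regionsByCode(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            // The root element is itself named "country"; getElementsByTagName
            // returns only its descendants, i.e. the individual records.
            NodeList records = doc.getDocumentElement().getElementsByTagName("country");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                result.put(record.getAttribute("fips").toLowerCase(),
                           record.getAttribute("Region"));
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String excerpt = "<?xml version=\"1.0\"?><country>"
                + "<country name=\"Afghanistan\" fips=\"AF\" Region=\"South Asia\"/>"
                + "<country name=\"Albania\" fips=\"AL\" Region=\"Europe\"/>"
                + "</country>";
        System.out.println(regionsByCode(excerpt)); // {af=South Asia, al=Europe}
    }
}
```

The Groovy equivalent with XmlSlurper (used in the actual import script further down) is shorter still, but the idea is identical: the fips code doubles as the file name key into the geos directory.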

To painlessly extract data from HTML I normally resort to Apache Tika. Apache Tika is a Java library that makes it easy to extract metadata and text from numerous file types, including (but not limited to) PDF, Word, Excel, and PowerPoint files and, in this case, HTML files.

All in all, only two (Groovy) scripts are required to extract the text from the HTML files and import the data into PerfectLearn while creating the necessary relationships between the topics. What the first script, Extract.groovy (provided below), does is relatively straightforward. First, it imports the necessary Apache Tika classes (lines 7-11), defines the source and target paths for the directory with the original HTML files and the directory for the text files with the extracted text (lines 14-19), and creates the target directory (line 25). Next, it iterates over all of the HTML files in the source directory, calling the extractContent function to extract the textual content from each file, which is subsequently written to a file in the processed directory (lines 27-39). The extractContent function is the most complex code in this script, but all it does is ask Tika to return the content of the document’s body as a plain-text string stripped of all HTML-related markup (lines 45-65), after which the extracted text is passed to the sanitize function (lines 71-79) to remove superfluous text and inject some markup to improve the legibility of the text when it’s finally rendered in PerfectLearn. As you can see, Tika does the vast majority of the heavy lifting in this script.

Extract.groovy

/*
Extract country text script (from accompanying HTML files)
By Brett Alistair Kromkamp
January 09, 2015
*/

import org.apache.tika.Tika
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.html.HtmlParser
import org.apache.tika.parser.ParseContext
import org.apache.tika.sax.BodyContentHandler

// ***** Constants *****
final def ORIGINAL_PATH = '/home/brettk/Source/groovy/perfectlearn-miscellaneous/cia-factbook/data/original/geos'
final def PROCESSED_PATH = '/home/brettk/Source/groovy/perfectlearn-miscellaneous/cia-factbook/data/processed/geos'

// ***** Setup *****
def originalDirectory = new File(ORIGINAL_PATH)
def processedDirectory = new File(PROCESSED_PATH)

// ***** Logic *****
println 'Starting extraction process.'

// Create 'processed' directory.
processedDirectory.mkdirs() // Non-destructive.

originalDirectory.eachFile { file ->
    if (file.isFile() && file.name.endsWith('.html')) {
        def textFileName = generateTextFileName(file.name.toString())

        // Create file with extracted text.
        def textFile = new File("$PROCESSED_PATH/$textFileName")
        textFile.withWriter { out ->
            def textContent = extractContent(file.text)
            println textFileName
            out.writeLine(textContent)
        }
    }
}

println 'Done!'

// ***** Helper functions *****

String extractContent(String content) {
    BodyContentHandler handler = new BodyContentHandler()
    Metadata metadata = new Metadata()
    InputStream stream

    def result = ''
    try {
        if (content != null) {
            stream = new ByteArrayInputStream(content.getBytes())
            new HtmlParser().parse(
                stream, 
                handler, 
                metadata, 
                new ParseContext())
            result = sanitize(handler.toString()).trim()
        } 
        return result
    } finally {
        stream?.close() // The stream is null if no content was provided.
    }
}

String generateTextFileName(String htmlFileName) {
    return htmlFileName.replaceAll(~/\.html/, '') + '.txt'
}

String sanitize(String content) {
    return content
        .replaceAll(~/(?m)^\s+/, '')
        .replaceAll(~/(?s)^Javascript.*Introduction ::/, 'Introduction ::')
        .replaceAll(~/(?s)EXPAND ALL.*/, '')
        .replaceAll(~/(?m)^([A-Z].*\s+)::.*/, '<h2>$1</h2>')
        .replaceAll(~/(?m)^([a-z])([a-z\s]*):/, '<strong>$1$2</strong>: ')
        .replaceAll(~/(?m)^([A-Z])([a-z\s-]*):/, '<h3>$1$2</h3>')
}
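The sanitize chain above is dense, so to make one of its rules concrete, here is the section-heading substitution restated in plain Java (the sample input line is made up for illustration, not the Factbook's exact text):

```java
import java.util.regex.Pattern;

public class SanitizeDemo {
    // Mirror of the Groovy rule ^([A-Z].*\s+)::.* -> <h2>$1</h2>: a capitalized
    // line ending in ":: <entity>" becomes an <h2> section heading.
    // MULTILINE makes ^ match at the start of every line, as (?m) does in Groovy.
    static String markHeadings(String text) {
        return Pattern.compile("^([A-Z].*\\s+)::.*", Pattern.MULTILINE)
                .matcher(text)
                .replaceAll("<h2>$1</h2>");
    }

    public static void main(String[] args) {
        String sample = "Geography ::Afghanistan\nsome body text";
        System.out.println(markHeadings(sample));
        // <h2>Geography </h2>
        // some body text
    }
}
```

The other rules in sanitize work the same way: each one matches a layout convention in the extracted Factbook text and rewrites it as lightweight HTML.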

The next script, Import.groovy (provided below), although longer than the previous one, is relatively straightforward as well. Its main job is to iterate over the previously mentioned sourceXML.xml file to create and store the countries, territories, and regions (as topics) in the topic map. First, the script imports the necessary Java libraries, including the PerfectLearn topic map engine (lines 8-17), and sets up the constants for paths, database-related parameters, and other miscellaneous values (lines 21-32). It then instantiates the PerfectLearn topic map engine (line 38) and creates some required topics for the World Factbook topic map ontology (lines 42-56). Once the necessary topics have been created, sourceXML.xml is loaded and the country/territory/region records are read into a list (lines 62-64) for subsequent iteration (line 69). On each iteration the following actions are performed:

  • The required region identifier, region name, country identifier, country name, and country code are extracted for subsequent use (lines 72-77).
  • The textual content for each country, territory, or region is retrieved from the appropriate text file that was generated by the Extract.groovy script (lines 81-86).
  • The background information, excerpt, and timeline year are retrieved for each country, territory, or region to create the necessary metadata for subsequent display in the timeline component (lines 90-105, and lines 256-272, 274-283, 285-290 for the getBackgroundExcerpt, getBackground, and getTimelineYear functions, respectively).
  • The country or territory topic is created and stored (lines 109-114).
  • The country or territory text occurrence is created and stored (lines 118-126).
  • The region topic is created and stored (lines 130-137).
  • The association (that is, relationship) between a country or territory and its concomitant region is stored (line 141).
  • Coordinates are extracted from the country’s textual content and, if the second set of coordinates (for the capital city) is present, a metadatum with the coordinates is created and stored for subsequent visualization in the map component. The convertToDdCoordinates function converts the extracted coordinates from a degrees-and-minutes format to the decimal degrees format required by Google Maps (lines 145-149, and lines 241-254 for the convertToDdCoordinates function).
  • A link (occurrence) is added for each country, territory, or region pointing back to the appropriate page in the CIA World Factbook website (lines 153-161).
  • The flag (occurrence) is added for each country (lines 165-179, and lines 292-301 for the copyFile function).
  • The map (occurrence) is added for each country or territory (lines 185-195, and lines 292-301 for the copyFile function).

Next, the associations to establish the appropriate relationships between the regions themselves and between the regions and the "world" (topic) are created and stored in the topic map (lines 201-219). Finally, the textual content for the world topic is retrieved (lines 223-226) and the accompanying occurrence is created and saved (lines 228-236).
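The coordinate conversion mentioned above is simple arithmetic: degrees plus minutes divided by sixty, negated for the southern and western hemispheres. A standalone sketch in plain Java (the input format is taken from the script; the class and method names are illustrative):

```java
public class DmsToDd {
    // Convert "17 49 S, 31 02 E" (degrees minutes hemisphere) into decimal
    // degrees, the format Google Maps expects.
    static double[] toDecimalDegrees(String dms) {
        String[] p = dms.replace(",", "").split(" ");
        // Latitude: degrees + minutes/60, negative in the southern hemisphere.
        double lat = Integer.parseInt(p[0]) + Integer.parseInt(p[1]) / 60.0;
        if (p[2].equals("S")) lat = -lat;
        // Longitude: degrees + minutes/60, negative in the western hemisphere.
        double lon = Integer.parseInt(p[3]) + Integer.parseInt(p[4]) / 60.0;
        if (p[5].equals("W")) lon = -lon;
        return new double[] { lat, lon };
    }

    public static void main(String[] args) {
        double[] dd = toDecimalDegrees("17 49 S, 31 02 E");
        System.out.printf("(%.4f, %.4f)%n", dd[0], dd[1]); // (-17.8167, 31.0333)
    }
}
```

Note that the Factbook coordinates carry no seconds component, so no seconds/3600 term is needed here.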

Import.groovy

/*
Import CIA World Factbook into PerfectLearn Topic Map Engine
By Brett Alistair Kromkamp
January 15, 2015
*/

// Import necessary Java libraries including the PerfectLearn topic map engine.
import com.polishedcode.crystalmind.base.Utils
import com.polishedcode.crystalmind.base.Language;
import com.polishedcode.crystalmind.map.store.TopicStore;
import com.polishedcode.crystalmind.map.store.TopicStoreException;
import com.polishedcode.crystalmind.map.model.*

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// ***** Constants *****
// Setup necessary paths, database-related parameters, and other miscellaneous values.
final def COUNTRIES_PATH = '/home/brettk/Source/groovy/perfectlearn-miscellaneous/cia-factbook/data/original/wfbExt/sourceXML.xml'
final def MAPS_PATH = '/home/brettk/Source/groovy/perfectlearn-miscellaneous/cia-factbook/data/original/graphics/maps/newmaps'
final def FLAGS_PATH = '/home/brettk/Source/groovy/perfectlearn-miscellaneous/cia-factbook/data/original/graphics/flags/large'
final def PROCESSED_PATH = '/home/brettk/Source/groovy/perfectlearn-miscellaneous/cia-factbook/data/processed/geos'

final def DATABASE = 'pldb_1'
final def SHARD_INFO = "localhost;3306;${DATABASE}"
final def USERNAME = '********'
final def PASSWORD = '********'
final long TOPIC_MAP_IDENTIFIER = 64L
final def COUNTRIES_TOTAL = 268
final def UNIVERSAL_SCOPE = '*'

// ***** Logic *****
println 'Starting importing process.'

// Instantiate the PerfectLearn topic store.
TopicStore topicStore = new TopicStore(USERNAME, PASSWORD)

// Bootstrap required topics.
println 'Bootstrapping...'
def bootstrapTopics = [
	new Entity(identifier: 'country', name: 'Country', instanceOf: 'topic'),
	new Entity(identifier: 'region', name: 'Region', instanceOf: 'topic'),
	new Entity(identifier: 'world', name: 'The World', instanceOf: 'topic'),
	new Entity(identifier: 'part-of', name: 'Part Of', instanceOf: 'topic')
]

bootstrapTopics.each { bootstrapTopic ->
	Topic topic = new Topic(
		bootstrapTopic.identifier,
		bootstrapTopic.instanceOf,
		bootstrapTopic.name, 
		Language.EN)
	topicStore.putTopic(SHARD_INFO, TOPIC_MAP_IDENTIFIER, topic, Language.EN)
}

/*
Iterate over country records (in sourceXML.xml) by extracting the necessary
attributes to create countries, territories, and regions.
*/
def countriesContent = new File(COUNTRIES_PATH).text
def countriesXml = new XmlSlurper().parseText(countriesContent)
def countries = countriesXml.country

assert COUNTRIES_TOTAL == countries.size()

println 'Iterating over countries...'
for (country in countries) {
	// For each country/territory/region extract the region identifier, 
	// region name, country identifier, country name, and country code. 
	def regionIdentifier = Utils.slugify(country.@Region.text())
	if (regionIdentifier) {	
		def regionName = country.@Region.text()
		def countryIdentifier = Utils.slugify(country.@name.text())
		def countryName = country.@name.text() 
		def countryCode = country.@fips.text().toLowerCase()

		// Get topic's text.
		println "Getting topic's text..."
		def topicContentPath = "$PROCESSED_PATH/${countryCode}.txt"
		def topicContentFile = new File(topicContentPath)
		def topicContent = ''
		if (topicContentFile.exists()) {
			topicContent = topicContentFile.text
		}

		// Extract the country's background excerpt.
		println "Extracting the country's background excerpt..."
		if (topicContent) {
			def excerpt = getBackgroundExcerpt(topicContent)
			def background = getBackground(topicContent)

			// Add the appropriate timeline related meta data.
			println 'Adding the timeline metadata...'
			if (excerpt && background) {
				def timelineYear = getTimelineYear(background)
				def timelineMedia = "<blockquote>${excerpt.find(~/(?s)^\S*^(.*?)[.?!]\s/)?.trim()}</blockquote>".toString()
				if (timelineYear && timelineMedia && excerpt) {
					topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'timeline-event-startdate', timelineYear, countryIdentifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)
					topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'timeline-media', timelineMedia, countryIdentifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)
					topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'timeline-text', excerpt, countryIdentifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)
				}
			}
		}

		// Create and store the country or territory topic.
		println 'Creating and storing the country topic...'
		Topic countryTopic = new Topic(
			countryIdentifier,
			'country',
			countryName, 
			Language.EN)
		topicStore.putTopic(SHARD_INFO, TOPIC_MAP_IDENTIFIER, countryTopic, Language.EN)

		// Create and store the topic's text occurrence.
		println "Creating and storing the topic's text..."
		Occurrence occurrence = new Occurrence(countryIdentifier)
		occurrence.with {
			instanceOf = 'text'
			scope = UNIVERSAL_SCOPE
			language = Language.EN
			resourceData = topicContent.getBytes()	
		}
		topicStore.putOccurrence(SHARD_INFO, TOPIC_MAP_IDENTIFIER, occurrence)
		topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'label', countryName, occurrence.identifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)

		// Create and store the region topic.
		println 'Creating and storing the region topic...'
		if (!topicStore.topicExists(SHARD_INFO, TOPIC_MAP_IDENTIFIER, regionIdentifier)) {
			Topic regionTopic = new Topic(
				regionIdentifier,
				'region',
				regionName, 
				Language.EN)
			topicStore.putTopic(SHARD_INFO, TOPIC_MAP_IDENTIFIER, regionTopic, Language.EN)
		}

		// Create associations between countries and regions.
		println 'Creating associations between countries and regions...'
		topicStore.createAssociation(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'country', countryIdentifier, 'region', regionIdentifier)

		// Create coordinates metadatum for each country's capital.
		println "Creating coordinates for country's capital..."
		def coordinates = topicContent.findAll(~/(?m)(^[-+]?\d{1,2}\s*\d{1,2}\s*[A-Z]),\s*([-+]?\d{1,2}\s*\d{1,3}\s*[A-Z])/)
		if (coordinates[1]) {
			def ddCoordinates = convertToDdCoordinates(coordinates[1])
			topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'map-coordinates', ddCoordinates, countryIdentifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)
		}

		// Add link occurrence to each topic pointing to the original CIA World Factbook country page. 
		println 'Adding CIA World Factbook country page link...'
		Occurrence linkOccurrence = new Occurrence(countryIdentifier)
		linkOccurrence.with {
			instanceOf = 'url'
			scope = UNIVERSAL_SCOPE
			language = Language.EN
			resourceRef = "https://www.cia.gov/library/publications/the-world-factbook/geos/${countryCode}.html"
		}
		topicStore.putOccurrence(SHARD_INFO, TOPIC_MAP_IDENTIFIER, linkOccurrence)
		topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'label', "$countryName CIA World Factbook Page", linkOccurrence.identifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)

		// Add flag (occurrence) to each topic and copy image to appropriate (web application resources) directory.
		println 'Adding flag...'
		def imageDirectoryName = "/home/brettk/www/static/$TOPIC_MAP_IDENTIFIER/images/$countryIdentifier"
		def imageDirectory = new File(imageDirectoryName)
		imageDirectory.mkdirs() // Non-destructive.

		def serverImageDirectoryName = "/static/$TOPIC_MAP_IDENTIFIER/images/$countryIdentifier"
		
		Occurrence flagOccurrence = new Occurrence(countryIdentifier)
		flagOccurrence.with {
			instanceOf = 'image'
			scope = UNIVERSAL_SCOPE
			language = Language.EN
			resourceRef = "$serverImageDirectoryName/${flagOccurrence.identifier}.gif"
		}
		topicStore.putOccurrence(SHARD_INFO, TOPIC_MAP_IDENTIFIER, flagOccurrence)
		topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'label', "$countryName (Flag)", flagOccurrence.identifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)

		copyFile("$FLAGS_PATH/${countryCode}-lgflag.gif", "$imageDirectoryName/${flagOccurrence.identifier}.gif")

		// Add map (occurrence) to each topic.
		println 'Adding map...'
		Occurrence mapOccurrence = new Occurrence(countryIdentifier)
		mapOccurrence.with {
			instanceOf = 'image'
			scope = UNIVERSAL_SCOPE
			language = Language.EN
			resourceRef = "$serverImageDirectoryName/${mapOccurrence.identifier}.gif"
		}
		topicStore.putOccurrence(SHARD_INFO, TOPIC_MAP_IDENTIFIER, mapOccurrence)
		topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'label', "$countryName (Map)", mapOccurrence.identifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)

		copyFile("$MAPS_PATH/${countryCode}-map.gif", "$imageDirectoryName/${mapOccurrence.identifier}.gif")
	}
}

// Create associations between regions.
println 'Creating associations between regions...'
def regionIdentifiers = [
	'africa',
	'central-america',
	'central-asia',
	'east-asia',
	'europe',
	'middle-east',
	'north-america',
	'oceania',
	'south-america',
	'south-asia'
]
for (outerRegionIdentifier in regionIdentifiers) {
	for (innerRegionIdentifier in regionIdentifiers.findAll { it != outerRegionIdentifier } ) {
		topicStore.createAssociation(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'region', outerRegionIdentifier, 'region', innerRegionIdentifier)
	}
	// Create associations between the world topic and the regions.
	topicStore.createAssociation(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'part-of', 'world', 'region', outerRegionIdentifier)
}

// Add the appropriate text occurrence ('xx.txt') to the 'world' topic.
println "Adding text occurrence to the 'World' topic..."
def worldTopicContentFileName = "${PROCESSED_PATH}/xx.txt"

def worldTopicContentFile = new File(worldTopicContentFileName)
def worldTopicContent = worldTopicContentFile.text

Occurrence worldOccurrence = new Occurrence('world')
worldOccurrence.with {
	instanceOf = 'text'
	scope = UNIVERSAL_SCOPE
	language = Language.EN
	resourceData = worldTopicContent.getBytes()
}
topicStore.putOccurrence(SHARD_INFO, TOPIC_MAP_IDENTIFIER, worldOccurrence)
topicStore.createMetadatum(SHARD_INFO, TOPIC_MAP_IDENTIFIER, 'label', 'world', worldOccurrence.identifier, Language.EN, '', DataType.STRING, UNIVERSAL_SCOPE)

println 'Done!'

// ***** Helper methods *****
def convertToDdCoordinates(String dmsCoordinates) { // Format: 17 49 S, 31 02 E
	// http://en.wikipedia.org/wiki/Geographic_coordinate_conversion
	def parts = dmsCoordinates.replace(',', '').split(' ')

	def ddLatitude = parts[0].toInteger() + (parts[1].toInteger() / 60) 
	if (parts[2] == 'S') {
		ddLatitude = 0 - ddLatitude
	}
	def ddLongitude = parts[3].toInteger() + (parts[4].toInteger() / 60)
	if (parts[5] == 'W') {
		ddLongitude = 0 - ddLongitude
	}
	return "($ddLatitude, $ddLongitude)"
}

def getBackgroundExcerpt(String content) {
	def result = content
		.find(~/(?s)<\/h3>.*<h2>Geography/)
		?.replaceAll(~/<\/h3>/, '')
		?.replaceAll(~/<h2>Geography/, '')
	if (result) {
		if (result.size() > 320) {
			result = result[0..320]
		}
		if (result[-1] != '.') {
			result = result << '...'
		}
	} else {
		result = ''
	}
	return result.toString()
}

def getBackground(String content) {
	def result = content
		.find(~/(?s)<\/h3>.*<h2>Geography/)
		?.replaceAll(~/<\/h3>/, '')
		?.replaceAll(~/<h2>Geography/, '')
	if (result == null) {
		result = ''
	}
	return result
}

def getTimelineYear(String content) {
	def bcYears = content.findAll(~/\d{4}\sB\.C\./).collect { it.replace(' B.C.', '') }
	def adYears = content.findAll(~/\d{4}/)
	def years = adYears - bcYears
	return years[0]
}

def copyFile(String sourcePath, String targetPath) {
	Path source = Paths.get(sourcePath)
	Path destination = Paths.get(targetPath)

	try {
		Files.copy(source, destination, java.nio.file.StandardCopyOption.REPLACE_EXISTING); // Overwrite if the image already exists.
	} catch (IOException e) {
		e.printStackTrace();
	}
}

// ***** Models *****

class Entity {
	String identifier
	String name
	String instanceOf
}
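The region wiring near the end of the script (the nested loop over regionIdentifiers) creates one association per ordered pair of distinct regions. The pair generation can be sketched in isolation (plain Java, illustrative names):

```java
import java.util.ArrayList;
import java.util.List;

public class RegionPairs {
    // For n regions the nested loop yields n * (n - 1) ordered (outer, inner)
    // pairs, i.e. one association in each direction between every two regions.
    static List<String[]> orderedPairs(List<String> regions) {
        List<String[]> pairs = new ArrayList<>();
        for (String outer : regions)
            for (String inner : regions)
                if (!inner.equals(outer))
                    pairs.add(new String[] { outer, inner });
        return pairs;
    }

    public static void main(String[] args) {
        List<String> regions = List.of("africa", "europe", "oceania");
        System.out.println(orderedPairs(regions).size()); // 6
    }
}
```

With the ten regions listed in the script, that is 10 × 9 = 90 region-to-region associations (plus the ten 'part-of' links from the 'world' topic). If the topic map engine treats associations as bidirectional, generating only unordered pairs (45) would halve the writes, but that depends on the engine's semantics.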

And that’s it, folks! In a follow-up article I will document how to improve the import process outlined in this article to make much better use of the resources provided by the World Factbook. However, on this first iteration, the current import process provides me with sufficient data to thoroughly test PerfectLearn with a non-trivial topic map.

Multiple projects in PerfectLearn

One of the main reasons for building PerfectLearn is to use it myself. I genuinely find it useful to employ a topic map-based approach to organize my personal knowledge. Having successfully used PerfectLearn’s predecessor, QueSucede.com, as an online personal knowledge base for the past seven years has convinced me of the utility of an application that helps a user to manage their (documented) knowledge and to turn it into a tangible thing of value.

Learning and Creativity

When thinking about things that would make PerfectLearn even more useful, I only have to examine the pain points I experience when using the application. Currently, one of the bigger "problems" I see with PerfectLearn is the issue of one topic map per user. That is, when a user signs up to use PerfectLearn, the application creates a topic map for that user: each user gets one, and only one, topic map. And that, my friends, is a limitation.

Looking at my own needs, I want to be able to create multiple independent topic maps to manage unrelated projects. For example, if you are a student using PerfectLearn, I can imagine you creating a specific topic map for your thesis and other topic maps for, well, other purposes. What this means is that PerfectLearn needs to allow the user to create, select, and manage multiple projects, where each project is a self-contained topic map isolated from the user’s other topic maps. In retrospect, I consider this to be an essential feature of PerfectLearn and will start implementing it as soon as PerfectLearn version 1.0 has been released.
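One way to model the feature is a thin project record that maps a project name to its own topic map identifier. The following is a purely hypothetical sketch; none of these class or method names exist in PerfectLearn:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: each project owns exactly one self-contained topic map.
public class UserProjects {
    private final Map<String, Long> topicMapsByProject = new LinkedHashMap<>();
    private String currentProject;

    // Creating a project provisions a fresh, isolated topic map.
    void create(String name, long topicMapIdentifier) {
        topicMapsByProject.put(name, topicMapIdentifier);
        currentProject = name;
    }

    // Selecting a project scopes subsequent topic operations to its map.
    void select(String name) {
        if (!topicMapsByProject.containsKey(name))
            throw new IllegalArgumentException("No such project: " + name);
        currentProject = name;
    }

    long currentTopicMapIdentifier() {
        return topicMapsByProject.get(currentProject);
    }

    public static void main(String[] args) {
        UserProjects projects = new UserProjects();
        projects.create("thesis", 64L);
        projects.create("reading-notes", 65L);
        projects.select("thesis");
        System.out.println(projects.currentTopicMapIdentifier()); // 64
    }
}
```

The key design point is the isolation: every topic, occurrence, and association call would carry the currently selected project's topic map identifier, so nothing leaks between projects.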

If you have any suggestions with regard to the project feature, let me know via the feedback form.

Update (January 11, 2015): After some more consideration, I have decided to implement the project feature before launching version 1.0 of PerfectLearn. The main reason for doing so is that the implementation of this feature involves changing the topic map’s database definition. Doing this change after launching PerfectLearn would require a potentially tricky migration of user data from the previous database definition to the new one, with the accompanying downtime and risk of data loss. All in all, I don’t expect the implementation of the project feature and the subsequent testing to significantly delay the launch of PerfectLearn.

PerfectLearn, the final sprint

Since publishing the PerfectLearn development update on December 06 (2014), the following functionality has been completed:

  • Generate and display a tag cloud based on the user’s tagged topics
  • Edit note
  • Edit URL
  • Edit video link
  • Edit metadatum
  • Numerous minor bug and user-interface fixes

Bokeh Pens by Long Mai (Flickr): http://www.flickr.com/photos/25740835@N08/4377921097/

This means that the topics index and the front-end validation of forms are the only remaining bits of functionality left to implement for version 1.0. As you can see, I’m slightly behind schedule. Nonetheless, I feel that good progress is being made and I also expect to make up some time during the Christmas break.

I also hope to blog on a more regular basis from now until, at least, PerfectLearn has been released.

Thanks for being there for me.

PerfectLearn development update December 2014

PerfectLearn is almost done.

I feel both anxious and excited writing those words. I’ve been working on PerfectLearn, on and off, for almost twenty months. And it’s been even longer if you take into account that PerfectLearn is the culmination of two other projects, QueSucede.com and ContextNote, which I started developing in 2007 and 2011, respectively.

Bokeh Pens

One of my goals with PerfectLearn was to not repeat the same mistakes that I made in previous projects. Specifically, I didn’t want to make the mistake of developing an application in isolation. In that respect, I have been talking to several people within the fields of personal knowledge management and digital learning environments to try to understand how to help individual learners. I also recorded several screencasts showing how to use PerfectLearn and published them on YouTube. The feedback I got from people who watched the screencasts has proven to be invaluable.

Shipping a product is a feature. A very important feature.

But let’s get back to the reason for this blog post, which is to explain the current state of PerfectLearn’s development. We will also take a brief look at some features that I have postponed adding until after the first version of the application has been published. Shipping a product is a feature. A very important feature. That is why I have removed some functionality requirements from version 1.0 of the application. They will be added. Just not now.

First of all, in terms of actual functionality (for version 1.0) the following items are still outstanding:

  • Generate and display a tag cloud based on the user’s tagged topics
  • Front-end form validation for all of the forms in the application
  • Edit note
  • Edit topic
  • Edit URL
  • Edit video link
  • Edit metadatum
  • Topic index (with pagination)

That’s it! That’s what I mean when I say that “PerfectLearn is almost done.” Nonetheless, a couple of things need to happen between finishing the implementation of the above-mentioned functionality and actually getting the application into the hands of users. Specifically, in relation to pre-launch testing, it is my intention to do the following:

  • Internal testing: after having published PerfectLearn on the production server, I will put the application through its paces and do as many “stupid” things as possible in the application with the explicit intention of breaking it. Every time I break the application, I will fix the bug and repeat the process.
  • Private beta testing: once I have completed the internal testing I will provide access to everyone who has asked to test the application. I will do this in a way that will make it possible for me to provide timely personal support.

And now for the all-important timeframes. Implementing the above-mentioned functionality will be done by December 15. Straight after that, I will deploy the application to the production server and start the internal testing phase. Taking into account that I will be doing this during the Christmas holiday, I expect this phase to take up to two weeks, which means that beta testing should start in the first weeks of January 2015. I’m unsure as to how long the beta-testing phase will take, but I’m hoping no more than two to four weeks depending on what issues come to light. So, that means that version 1.0 of PerfectLearn should be launched no later than the beginning of February 2015.

Finally, let’s take a look at some of the features that I have scrapped from PerfectLearn version 1.0:

  • The most important feature (at this stage) that did not make the cut is full-text search. Search is obviously an important feature but implementing it (with elasticsearch) will add, at the pace I am able to work on PerfectLearn, another two weeks to the development schedule. I’m not willing to do that. So, like I said, search will be added. But not now.
  • The next feature, the ability to generate eBooks (in PDF, EPUB, and Kindle formats) from a set of user-selected topics, is something that I’m very interested in from the point of view of making PerfectLearn a viable combined research tool and eBook authoring system. This feature is adjacent to PerfectLearn’s primary value proposition (helping you turn your personal knowledge into a valuable asset) and will therefore be added at a later stage, and perhaps not even made available to regular users of the application.
  • Dropbox and Google Drive integration. That is, the ability to attach files and images to topics that will automatically be stored in your Dropbox or Google Drive account.
  • Currently, your documented knowledge in PerfectLearn is not publicly accessible (by design). Being able to publish your documented knowledge to a personal learning portfolio (with, for example, LinkedIn integration) is another feature that I would like to add to PerfectLearn in the not-too-distant future.
  • Finally, to make it straightforward to get information into PerfectLearn as part of your documented knowledge, I will develop a browser extension that makes it possible to add a webpage (as a topic) quickly and easily to PerfectLearn by just clicking a button.

In summary, the vast majority of PerfectLearn’s feature set has already been implemented. Initial production testing should start in approximately ten to fourteen days’ time. After that, there will be a beta testing phase that should take no more than two to four weeks, which brings us to early February 2015 for the release of the first version of PerfectLearn.

Finally, I would like to thank everyone who has helped me along the way. In many ways I couldn’t have done this alone. Thank you.

Stay tuned for updates. Subscribe to the PerfectLearn newsletter.

PerfectLearn feedback

Over the last couple of weeks I have released several screencasts (on the PerfectLearn YouTube Channel) in which I attempt to explain how PerfectLearn works in conjunction with providing an overview of PerfectLearn’s benefits.

The feedback from the numerous people who have watched the videos has been overwhelmingly positive. Not only have people expressed their interest in PerfectLearn but, perhaps more surprisingly, I have also received a lot of very useful insights at both a product level and a market(ing) level.

With this blog article, it is my intention to capture for future reference (in no particular order) what I consider to be the most important insights I have obtained from discussing PerfectLearn with several people after publishing the screencasts on YouTube.

User Data

There is nothing more important than knowledge and, for individuals specifically, their personal (documented) knowledge is of inestimable value. Hence, knowing that their investment in PerfectLearn is safe, because if necessary they can get access to a full dump of their data and documented knowledge, is an important consideration for anyone evaluating an application like PerfectLearn. In that respect, it makes sense to offer several ways for a user to export their data, including JSON and XML dumps, HTML, and perhaps even Markdown.
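
As a sketch of what such an export could look like, the snippet below serialises a minimal, hypothetical topic model to both JSON and Markdown. The class and field names are illustrative only, not PerfectLearn’s actual schema:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

# Hypothetical minimal topic model; PerfectLearn's real schema may differ.
@dataclass
class Topic:
    identifier: str
    name: str
    notes: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

def export_topics_json(topics: List[Topic]) -> str:
    """Serialise a user's topics to a JSON dump they can take with them."""
    return json.dumps([asdict(t) for t in topics], indent=2)

def export_topics_markdown(topics: List[Topic]) -> str:
    """Render the same topics as a human-readable Markdown document."""
    lines = []
    for t in topics:
        lines.append(f"# {t.name}")
        for note in t.notes:
            lines.append(f"- {note}")
        if t.tags:
            lines.append(f"*Tags: {', '.join(t.tags)}*")
        lines.append("")
    return "\n".join(lines)

topics = [Topic("t1", "Topic Maps", ["A topic map relates subjects."], ["knowledge"])]
print(export_topics_json(topics))
print(export_topics_markdown(topics))
```

An XML or HTML exporter would follow the same pattern: walk the same in-memory topics and emit a different serialisation.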

User Context and Touchpoints

Users will be accessing and using PerfectLearn in different locations, contexts, and on different devices. Obviously, one size doesn’t fit all. My thinking in that respect has been heavily influenced by the concepts of touchpoints and cross-channel blueprints (as outlined in the article Cross Channel Design With Alignment Diagrams). Specifically, I am leaning towards the following cross-channel blueprint:

In the above graph you will see how each touchpoint (that is, phone, tablet, and desktop computer) differs with respect to the main user intentions/interactions. That is, on a phone, knowledge acquisition is the most important user intent; on a tablet, the intents are more evenly divided between knowledge acquisition, knowledge surfacing, and knowledge organisation; and finally, on a desktop machine, the user is probably more focused on the actual organisation of knowledge. The actual touchpoint proportions are arguable, but the principle of an application behaving differently depending on the user’s current touchpoint and context is valid.

Target Groups

With regard to marketing PerfectLearn and communicating its benefits, it makes sense to focus on specific user needs.

The plan is to first focus on knowledge (management) geeks and life-long learners. The next target group will be people who are researching/investigating one or more topics of interest (both professionally and non-professionally). Finally, the third target group will be both teachers and students. Obviously, these three groups are not mutually exclusive and it is more than likely that there is at least some overlap between them.

The important lesson to take away from this point is that you really do need to understand each group’s unique pain points and ensure that you effectively communicate how your product addresses those pain points.

Stay tuned for updates. Subscribe to the PerfectLearn newsletter.

Why am I building PerfectLearn?

Why PerfectLearn?

It is a question that I have been asked on more than one occasion. It is also a question that is easy to answer. However, to answer it fully, I have to give you a bit of background information.

Several years ago I built an application that helped me to store and retrieve the essential bits of knowledge that I deemed necessary to do my job well. Like a lot of people, I changed my role several times within the company that I worked for. First I was a developer, then lead developer, then head of software development, and finally the company’s IT manager. With absolute confidence I can say that a major factor, other than my colleagues, that contributed to my success in each of these roles was my ability to effectively manage the required job-related knowledge. And, as you have likely already guessed, it was my application that made this task of knowledge management easier.

PerfectLearn web application

PerfectLearn web application

So, when I was offered an exciting software development-related job at another company, it was only natural for me to turn to my application to help me keep on top of the demands of the new job. This time around, however, although the application did what it did very well, it was also beginning to show its age. In poetic terms, the application was definitely a child of its time. That is, when I originally implemented the application in 2006 it wasn’t that easy to access high-quality semantic web services in a structured manner. Today, doing exactly that is, relatively speaking, a straightforward exercise. The number of available public-facing high-quality APIs and web services has skyrocketed. Mainly for that reason, I decided to re-implement my application with the vision of maintaining the versatility and expressive power of topic maps in combination with great semantic web services to automatically supplement your own documented knowledge. And I think I have come a long way in accomplishing that vision. PerfectLearn, without any user intervention, automatically displays related Wikipedia articles, Flickr images, YouTube videos, and news stories from various sources, all seamlessly complementing your own documented knowledge.

Moreover, the semantic nature of PerfectLearn is not limited to external web services returning related information; due to PerfectLearn’s underlying data structure, topic maps, you can define semantically meaningful relations between your own topics making it possible to expand your documented knowledge without the risk of your knowledge becoming disjointed. In addition, these semantic relationships (associations, in topic map terminology) provide you with a very precise context for any given topic while at the same time making it easy to navigate your documented knowledge in an exploratory fashion.
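
As a rough illustration of such typed relationships, a minimal in-memory topic map with role-bearing associations might look like the sketch below. All names here are hypothetical; nothing is assumed about PerfectLearn’s actual engine or API:

```python
# A toy in-memory topic map; names are illustrative, not PerfectLearn's API.
class TopicMap:
    def __init__(self):
        # Each association is a tuple of (association type, {role: topic}).
        self.associations = []

    def associate(self, association_type, **roles):
        self.associations.append((association_type, roles))

    def related(self, topic):
        """Return (association type, role, other topic) triples for a topic,
        giving a precise semantic context for exploratory navigation."""
        results = []
        for assoc_type, roles in self.associations:
            if topic in roles.values():
                for role, other in roles.items():
                    if other != topic:
                        results.append((assoc_type, role, other))
        return results

tm = TopicMap()
tm.associate("employment", employer="Acme Corp", employee="Alice")
tm.associate("authorship", author="Alice", work="Topic Maps Primer")
print(tm.related("Alice"))
# → [('employment', 'employer', 'Acme Corp'), ('authorship', 'work', 'Topic Maps Primer')]
```

Because each association carries a type and named roles, navigating away from a topic yields not just the related topics but the meaning of each relationship.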

I have already reached a stage in the application’s development where it can actually be used and I am pleased with the result. Lots of potential features are still missing but the application is usable and, this time around, I built it not just for me but for other people as well. Why? Because if the application is useful to me it will be useful to other people as well. I’m convinced of that.

Stay tuned for more tutorials and screencasts, and subscribe to the PerfectLearn newsletter to get the latest updates.

PerfectLearn development update August 2013

I started the development of PerfectLearn almost four months ago and good progress has been made with the project. However, every time I think that I’m close to finishing, I realise that the application is still missing essential functionality. I published my initial list of outstanding tasks over a month ago, when I thought that I was in the final stage of the project. Now, one month later, I estimate that I have at least another three or four weeks to go before I can confidently declare that PerfectLearn is ready for release. You know… the whole Ninety-ninety rule.

PerfectLearn web application

PerfectLearn web application

Taking a look at the application’s todo list shows that I still have the following tasks to complete (grouped by task type):

Application development

  • Add and view tags. Tagging is an awesome way to organise and subsequently find your documented knowledge, and its implementation is absolutely imperative. I’m going to implement tagging using associations, making automatic categorisation of your information possible. Automatic categorisation is something that you have to experience to appreciate how it works in practice.
  • Edit topic comments. At the moment, you can add and remove topic comments but you cannot edit them. It’s a quick thing to implement, but somehow I just haven’t got around to doing it.
  • Delete images and attachments. Uploading images and attachments for subsequent viewing and downloading is done; being able to delete them, however, is still pending implementation. With regards to uploading images and attachments, the current implementation stores the files on the same server where the application is running. I am, however, considering swapping out the current implementation with an Amazon S3 implementation.
  • Edit links. Adding links to the current topic and removing links from the current topic is finished, but I still have to implement the ability to edit links. Again, it’s a quick piece of functionality to implement. Nevertheless, it has taken a back-seat to more compelling features. Obviously, it needs to be finished before release.
  • Add, edit and remove metadata. Behind the scenes, the PerfectLearn topic map engine uses the concept of metadata to complement the various entities (that is, topics, occurrences, and associations) within the application with additional information. A non-admin user is never exposed to the concept of metadata within the application; that is, metadata management is transparent to the “normal” user. However, the admin user does have the ability to manually manage metadata and it’s this user interface-related functionality that is still pending implementation.
  • Topic search. I’m of the opinion that search is of less importance in a topic map-based system compared to a non-topic map-based system due to the inherent ease that the former provides you in terms of exploratory navigation of your documented knowledge. Nonetheless, having full-text search is obviously very useful when you just need to quickly find whatever you are looking for without much ceremony. I am still undecided as to which search engine I will use to implement search within PerfectLearn. Currently, I am reviewing both elasticsearch and Apache Solr, both based on the Apache Lucene engine.
  • Translation of the application’s user interface into Spanish. According to Wikipedia, Spanish is the third-most used language on the web (2011 figures) with over 160 million users. In this context, it is also interesting to note that PerfectLearn has full support for multi-lingual content. That is, as a user you can easily switch between managing textual and binary content for different languages. Translation of the application’s user interface to Spanish is already on-going.
  • Supplemental navigation systems, including a topic index and next topic and previous topic navigation. Good Information Architecture (IA) advocates providing not just the so-called “embedded” navigation systems (that is, global, local, and contextual navigation) but also supplemental and social navigation systems like (topic) indexes and tags.
  • Google Drive integration. Google Drive is a powerful file storage and real-time collaboration environment. Being able to organise and access your Google Drive documents by topic from within the application is a very compelling feature. I will add Google Drive support after the first release of PerfectLearn.
  • Client-side form validation. I have already implemented server-side validation. Nonetheless, it only makes sense to include client-side validation to reduce the application’s network chattiness and to provide a more streamlined user experience.
  • Upgrade to Twitter Bootstrap 3 (including typeahead.js integration). Who within the web community hasn’t heard of Twitter Bootstrap yet? It is a very powerful front-end web framework that makes it very easy to implement clean and functional user interfaces. Recently, version 3.0 of the project was released boasting a “mobile-first” approach, making the upgrade from version 2 of the framework a no-brainer.
  • User profile page. Self-explanatory.
  • Browse user portfolios and view individual portfolios. The personal network and portfolio are both dimensions of the personal learning environment and, in that respect, PerfectLearn has the ability to publish individual topics from your documented knowledge repository into your online, publicly accessible learning portfolio. The work on this part of the application is already ongoing. I just need to refine and polish the experience.
  • LinkedIn integration. From my point of view, a person using a personal learning environment will do so for several reasons. Obviously, managing their documented knowledge in an effective manner is probably quite high on that list. In addition, being able to evidence your knowledge (to, for example, a prospective employer) is equally important and that is where the “portfolio” aspect of a personal learning environment comes into play. Having the ability to surface your knowledge directly within your LinkedIn Activity feed provides you, the user, with real value.
  • Atom syndication format-based web feed for the user’s portfolio. Self-evident.
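
On the topic search item above: until the elasticsearch-versus-Solr decision is made, the idea behind full-text topic search can be sketched with a toy inverted index. This is purely illustrative; the real implementation would delegate indexing and querying to whichever engine wins out:

```python
import re
from collections import defaultdict

# A toy inverted index illustrating full-text topic search; the real
# implementation would delegate to elasticsearch or Apache Solr.
class TopicIndex:
    def __init__(self):
        self.index = defaultdict(set)  # term -> set of topic identifiers

    def add(self, topic_id, text):
        """Tokenise the topic's text and record which topic each term came from."""
        for term in re.findall(r"\w+", text.lower()):
            self.index[term].add(topic_id)

    def search(self, query):
        """Return the identifiers of topics matching every term in the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        results = self.index[terms[0]].copy()
        for term in terms[1:]:
            results &= self.index[term]
        return results

idx = TopicIndex()
idx.add("t1", "Topic maps and semantic associations")
idx.add("t2", "Semantic web services")
print(idx.search("semantic"))    # matches both topics
print(idx.search("topic maps"))  # matches only t1
```

A production engine adds the pieces this sketch omits: stemming, relevance ranking, phrase queries, and incremental index updates.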

Back-end development

  • getPublishedTopicReferences method. This method retrieves all of a topic’s related topics that have also been published in the user’s learning portfolio (explicitly excluding those topics that haven’t been publicly published) so as to provide the navigational context of a public topic without linking to unpublished or private topics.
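
As a hedged sketch of what getPublishedTopicReferences could boil down to, with illustrative data structures (the actual engine’s signatures will differ), the essential step is filtering a topic’s related topics against the set of published ones:

```python
# Hypothetical sketch of getPublishedTopicReferences: given a topic, return
# only those related topics that are themselves published in the user's
# public learning portfolio. Data structures are illustrative.
def get_published_topic_references(topic_id, associations, published_ids):
    related = set()
    for a, b in associations:  # undirected association pairs
        if a == topic_id and b in published_ids:
            related.add(b)
        elif b == topic_id and a in published_ids:
            related.add(a)
    return related

associations = [("t1", "t2"), ("t1", "t3"), ("t2", "t3")]
published = {"t1", "t2"}  # t3 is private, so it must never be linked publicly
print(get_published_topic_references("t1", associations, published))  # → {'t2'}
```

The filter is what guarantees that a public topic’s navigational context never leaks links to unpublished or private topics.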

Marketing

It doesn’t matter how good your product or service is, if nobody knows about its (relative) merits you are as good as dead in the water. In that respect, in parallel to the on-going development of PerfectLearn, I am actively pursuing the marketing-related activities outlined below.

  • Listening to (and acting upon) user feedback. Several people are providing me with on-going constructive criticism with regards to the application’s feature set, user experience, and in general, its value proposition. Although I have a strong product vision for PerfectLearn, it only makes sense to listen to people who have valuable insights into your product’s market and use cases so as to incorporate valid suggestions into the application.
  • Writing product tutorials (including screencasts). I have a list of tutorials and accompanying screencasts to educate (excuse the pun) prospective users with regards to PerfectLearn’s feature set:
    • Web queries overview
    • How to create a topic
    • How to create a simple association
    • How to create a non-trivial association
    • How to add a member to an association
    • How to add a topic reference to a member of an association
    • How to customise the semantic web queries (for semantically related articles, videos, images, and news stories)
    • Language switching and its consequences
    • How to manage your online learning portfolio
  • Engaging with influencers within the EdTech and personal learning environment space.

Legal mumbo jumbo

  • Terms & conditions. Self-explanatory.
  • Privacy policy. Self-explanatory.

For the moment, the above list outlines what is still pending with regards to PerfectLearn’s implementation before I can release a beta version of the application. If you have any suggestions with regards to what you have seen up until now, I would be grateful for your feedback.

Stay tuned for more tutorials and screencasts, and subscribe to the PerfectLearn newsletter to get the latest updates.

Introducing PerfectLearn

PerfectLearn is a personal knowledge base and learning environment with extensive semantic web integration in combination with graph, timeline, and map-based visualizations for straightforward navigation of a person’s documented knowledge.

PerfectLearn is a simple, focused tool to help you to easily manage your personal documented knowledge in a manner that goes beyond the typical hierarchical organization of documents.

PerfectLearn: Helping you turn your personal knowledge into a valuable asset.

In today’s world, lifelong learning is not only possible with the advent of the web and its related technologies but it is also almost obligatory if you want to keep abreast of advances within your personal and professional fields of interest. Having the necessary means to help you manage your learning processes and experiences is crucial to the pursuit of knowledge. In that respect, PerfectLearn is the ideal tool.

PerfectLearn

PerfectLearn

PerfectLearn’s versatile underlying data structure, topic maps, makes it straightforward to relate your topics of interest in a meaningful way. Having this context provides several obvious benefits including the ability to expand your knowledge without the risk of it becoming disjointed and being able to find the information you are looking for in a more intuitive manner. This in turn makes it easier and quicker for you to assimilate and internalize the information so that it becomes an integral part of your knowledge.

In addition, PerfectLearn’s integration with the web automatically complements and enhances your own topics of study with related study materials, including articles from Wikipedia and Freebase, videos from YouTube, images from Flickr, and the latest news from a variety of news sources.

Sign up for the PerfectLearn newsletter and get the latest in updates. Watch PerfectLearn in action on the PerfectLearn YouTube channel. Follow PerfectLearn on Twitter.