Give me five

Give me five is an open-source Chrome extension that recommends the content you push to Lateral, based on the page you’re currently visiting. It’s built on the same codebase as the NewsBot Chrome extension.

Clustering debates from UK politicians

What kind of language do British parliamentarians use? We scraped, parsed and vectorised a sample of recent debates from the House of Commons. We then applied a k-means clustering algorithm to these vectors, and created a word cloud for each cluster.
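The pipeline described above (vectorise, cluster, summarise each cluster by its words) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code behind the post: the four toy "debates", the bag-of-words vectoriser, the deterministic farthest-first seeding and the helper names (`vectorise`, `kmeans`, `top_words`) are all made up for the example, and the list of heaviest words per centroid merely stands in for a word cloud.

```python
from collections import Counter


def vectorise(docs):
    """Turn each document into a bag-of-words count vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(vectors, k):
    """Assumption: deterministic farthest-first seeding instead of random starts."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector that is farthest from its nearest existing centroid.
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = initial_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_words(vocab, centroid, n=3):
    """The n heaviest words of a centroid: a crude stand-in for a word cloud."""
    ranked = sorted(zip(vocab, centroid), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, weight in ranked[:n] if weight > 0]


# Hypothetical toy "debates"; real input would be parsed Hansard text.
debates = [
    "tax budget economy tax",
    "budget economy spending",
    "hospital nurses health",
    "health hospital patients",
]
vocab, vectors = vectorise(debates)
labels, centroids = kmeans(vectors, k=2)
for cluster, centroid in enumerate(centroids):
    print(cluster, top_words(vocab, centroid))
```

On real debates you would likely swap the raw counts for TF-IDF or learned document vectors and use a library implementation of k-means; the structure of the steps stays the same.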

Teaching machines new languages

Previously we’ve written about how machines can learn meaning. One of the exciting opportunities of this approach is that it also means they can learn new languages very quickly. We have recently started working on supporting new languages, and thought we would share some initial impressions here.

Building a personal research assistant in a spreadsheet

A while back we partnered with Blockspring to enable anyone to use our API without writing any code. They’ve created an awesome solution that lets you use a range of great APIs from nothing more than a spreadsheet. This enables you to bring data into your spreadsheet, run text analysis and much more. […]

Article Extractor API

Today we are pleased to announce the release of our Article Extractor API! When recommending content, it’s important to base recommendations only on the relevant text of an article. We have often faced this challenge with online articles and blogs. We’d want to fetch a URL but just extract the main body of […]

We’re open-sourcing a new out-of-memory ANN search tool

For the last few months, we’ve been doing occasional work on an approximate nearest neighbours (ANN) vector search tool, written in Python. It’s still not finished and there are many rough edges, but it comes with a working DynamoDB adaptor and hence operates out-of-memory, one of our main requirements. On the downside, it isn’t as fast […]

The awesomeness of pjax

Or how sometimes simpler is better. pjax is a jQuery library created by Chris Wanstrath, the current CEO of GitHub. It uses AJAX (loading data with JavaScript without refreshing the page) and pushState (changing the URL of the page without refreshing the page) to create a faster browsing experience. It is faster because instead of loading […]

The arXiv as Dataset

The arXiv is a repository of over 1 million preprints in physics, mathematics and computer science. It is truly open access, and the preprints are an excellent dataset for testing out all sorts of language modelling / machine learning prototypes.