Clustering - Get Started

Clustering groups the documents in your corpus by similarity. You specify a number of clusters, and the API asynchronously assigns all of your documents to that many clusters. For example, imagine a corpus that divides cleanly into sports, technology and medicine. If you request three clusters, we would expect the API to create three clusters corresponding to those three topics. Requesting more clusters should produce a finer-grained categorisation.

Create cluster model

To create a cluster model, use the following cURL command:

curl --request POST \
  --url https://api-v3.lateral.io/cluster-models \
  --header 'content-type: application/json' \
  --header 'subscription-key: YOUR_API_KEY' \
  --data '{"number_clusters":10}'

To start clustering your documents, you need to create a cluster model. A cluster model is a group of clusters. First, make sure you have added documents to the API; if you still need to add documents, see the relevant section in the Getting Started documentation. Once you have added documents, create a cluster model by sending a POST request to /cluster-models, with the number of clusters you want in the number_clusters field.
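For reference, the same request can be built with Python's standard library. This is a minimal sketch, not an official client: `build_create_request` and `create_cluster_model` are hypothetical helper names, and the code assumes the endpoint responds with the JSON object shown in this section.

```python
import json
import urllib.request

API_BASE = "https://api-v3.lateral.io"


def build_create_request(api_key: str, number_clusters: int) -> urllib.request.Request:
    """Build (but do not send) the POST /cluster-models request from the cURL example."""
    body = json.dumps({"number_clusters": number_clusters}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/cluster-models",
        data=body,
        headers={
            "content-type": "application/json",
            "subscription-key": api_key,
        },
        method="POST",
    )


def create_cluster_model(api_key: str, number_clusters: int) -> dict:
    """Send the request and decode the JSON response (id, status, ...).

    Hypothetical helper: assumes the API replies with a JSON object.
    """
    request = build_create_request(api_key, number_clusters)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `create_cluster_model("YOUR_API_KEY", 10)` would send the request and return the decoded response, whose `id` you need for the endpoints below.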

The API responds with details of the new cluster model, such as its id and status:

{
  "id": 6,
  "number_clusters": 10,
  "created_at": "2015-09-17T15:08:37.940Z",
  "status": "training"
}

The status is important: it remains training while the API asynchronously creates the clusters. Once the status switches to trained, you can continue.
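Because training runs asynchronously, a client has to check the status repeatedly until it reads trained. A minimal polling sketch: `fetch_status` is a hypothetical callable you supply that returns the model's current status string (this section does not document an endpoint for re-reading the status, so how you implement it is an assumption), and the default `interval` and `timeout` values are arbitrary.

```python
import time
from typing import Callable


def wait_until_trained(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll fetch_status() until it returns "trained".

    fetch_status is supplied by the caller; the endpoint it queries is an
    assumption, not something documented here. Returns the number of polls
    made, or raises TimeoutError once `timeout` seconds of waiting have passed.
    """
    waited = 0.0
    polls = 0
    while True:
        polls += 1
        status = fetch_status()
        if status == "trained":
            return polls
        if waited >= timeout:
            raise TimeoutError(f"cluster model still {status!r} after {waited:.0f}s")
        sleep(interval)  # back off before asking again
        waited += interval
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays; in normal use the default `time.sleep` applies.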

View clusters

To retrieve the clusters of a cluster model, run the following cURL command:

curl --request GET \
  --url https://api-v3.lateral.io/cluster-models/{cluster_model_id}/clusters \
  --header 'content-type: application/json' \
  --header 'subscription-key: YOUR_API_KEY'

Once the cluster model is trained, you can retrieve its clusters. After a cluster model has been created, any new documents you add to the API are automatically assigned to one of its clusters.

The response is a list of cluster IDs whose length depends on how many clusters you requested, for example:

[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]

Cluster IDs are numbered upwards from 0, so a cluster model with 10 clusters returns the IDs 0 to 9, as shown above.

Documents

Using cURL:

curl --request GET \
  --url https://api-v3.lateral.io/cluster-models/{cluster_model_id}/clusters/{cluster_id}/documents \
  --header 'content-type: application/json' \
  --header 'subscription-key: YOUR_API_KEY'

This will return a list of document IDs:

[ 51, 52, 53, 54, 55 ]

To list the documents that belong to a cluster, call the /cluster-models/{cluster_model_id}/clusters/{cluster_id}/documents endpoint.
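Combining the clusters and documents endpoints gives a mapping from every cluster to its document IDs. A minimal sketch, assuming both endpoints return plain JSON arrays as in the sample responses above; `get_json` and `documents_by_cluster` are hypothetical helpers, and the `fetch` parameter lets you swap in a stub instead of making network calls.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

API_BASE = "https://api-v3.lateral.io"


def get_json(path: str, api_key: str) -> Any:
    """GET an API path with the same headers as the cURL examples; decode JSON."""
    request = urllib.request.Request(
        API_BASE + path,
        headers={"content-type": "application/json", "subscription-key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def documents_by_cluster(
    cluster_model_id: int,
    api_key: str,
    fetch: Callable[[str, str], Any] = get_json,
) -> Dict[int, List[int]]:
    """Return {cluster_id: [document_id, ...]} for a trained cluster model."""
    base = f"/cluster-models/{cluster_model_id}/clusters"
    cluster_ids = fetch(base, api_key)  # e.g. [0, 1, 2, ...]
    # One request per cluster to its /documents endpoint.
    return {
        cluster_id: fetch(f"{base}/{cluster_id}/documents", api_key)
        for cluster_id in cluster_ids
    }
```

This makes one request per cluster, so a model with many clusters means many round trips.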

Words

Using cURL:

curl --request GET \
  --url https://api-v3.lateral.io/cluster-models/{cluster_model_id}/clusters/{cluster_id}/words \
  --header 'content-type: application/json' \
  --header 'subscription-key: YOUR_API_KEY'

To list the words that belong to a cluster, call the /cluster-models/{cluster_model_id}/clusters/{cluster_id}/words endpoint.

This will return a list of words belonging to the cluster:

[
  {
    "word": "aut",
    "importance": 0.400776
  },
  {
    "word": "ut",
    "importance": 0.77988
  },
  ...
]

Importance is a float between 0 and 1, where higher values mean a more important word. A word's importance measures how close its meaning is to the main theme of the cluster.
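The sample response above is not sorted by importance ("aut" appears before the more important "ut"), so if you want a cluster's strongest terms, for instance to label it, you may need to rank them yourself. A minimal sketch, assuming the response has already been decoded into a list of objects with the word and importance keys shown above; `top_words` is a hypothetical helper.

```python
from typing import Any, Dict, List


def top_words(words: List[Dict[str, Any]], k: int = 10) -> List[str]:
    """Return up to k words ordered from most to least important."""
    ranked = sorted(words, key=lambda entry: entry["importance"], reverse=True)
    return [entry["word"] for entry in ranked[:k]]


# The two entries from the sample response above.
sample = [
    {"word": "aut", "importance": 0.400776},
    {"word": "ut", "importance": 0.77988},
]
```

For the two sample entries, `top_words(sample, k=1)` returns `['ut']`.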

Word cloud

Using cURL:

curl --request GET \
  --url https://api-v3.lateral.io/cluster-models/{cluster_model_id}/clusters/{cluster_id}/word-cloud \
  --header 'content-type: application/json' \
  --header 'subscription-key: YOUR_API_KEY'

To get a word-cloud image that represents the cluster, call the /cluster-models/{cluster_model_id}/clusters/{cluster_id}/word-cloud endpoint.
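Since this endpoint returns an image rather than JSON, a client usually writes the response body straight to a file. A minimal standard-library sketch; `build_word_cloud_request` and `save_word_cloud` are hypothetical helper names, and the file extension you choose for `path` is up to you because this section does not state which image format the API returns.

```python
import urllib.request

API_BASE = "https://api-v3.lateral.io"


def build_word_cloud_request(
    cluster_model_id: int, cluster_id: int, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the GET request from the cURL example."""
    url = (
        f"{API_BASE}/cluster-models/{cluster_model_id}"
        f"/clusters/{cluster_id}/word-cloud"
    )
    return urllib.request.Request(
        url,
        headers={"content-type": "application/json", "subscription-key": api_key},
    )


def save_word_cloud(
    cluster_model_id: int, cluster_id: int, api_key: str, path: str
) -> int:
    """Download the word-cloud image and write its raw bytes to `path`.

    Returns the number of bytes written. The bytes are saved exactly as
    received; no image decoding or conversion happens here.
    """
    request = build_word_cloud_request(cluster_model_id, cluster_id, api_key)
    with urllib.request.urlopen(request) as response:
        image = response.read()
    with open(path, "wb") as handle:
        handle.write(image)
    return len(image)
```

For example, `save_word_cloud(6, 0, "YOUR_API_KEY", "cluster-0-word-cloud.png")` would save the word cloud for cluster 0 of cluster model 6, assuming the image is a PNG.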

This will return an image like:

[Image: example word cloud]
