Clustering groups the documents in your corpus into clusters based on similarity. You specify a number of clusters and the API asynchronously groups all your documents into that number of clusters. Imagine a corpus of documents that is clearly split into sports, technology and medicine. If you request three clusters from this corpus, we would expect the API to create three clusters that correspond to the three main topics. Requesting more clusters should create a finer categorisation.
You can do this using the following cURL command:
curl --request POST \
--url https://api-v3.lateral.io/cluster-models \
--header 'content-type: application/json' \
--header 'subscription-key: YOUR_API_KEY' \
--data '{"number_clusters":10}'
To start clustering your documents you need to create a cluster model. A cluster model is a group of clusters. First, ensure you have documents added to the API. If you need to add documents, please see the relevant section in the Getting Started documentation. Once you have added documents, you can create a cluster model by posting to /cluster-models.
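The same request can be built from Python. The following is a minimal sketch using only the standard library; the function names build_create_request and create_cluster_model are hypothetical, while the URL, headers and body are copied from the cURL command above.

```python
import json
import urllib.request

API_BASE = "https://api-v3.lateral.io"  # base URL taken from the cURL examples


def build_create_request(api_key: str, number_clusters: int) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a cluster model."""
    if number_clusters < 1:
        raise ValueError("number_clusters must be a positive integer")
    body = json.dumps({"number_clusters": number_clusters}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/cluster-models",
        data=body,
        method="POST",
        headers={
            "content-type": "application/json",
            "subscription-key": api_key,
        },
    )


def create_cluster_model(api_key: str, number_clusters: int) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    request = build_create_request(api_key, number_clusters)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it possible to inspect the URL, headers and body before anything is sent.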
Once the status switches to trained, you will be able to continue.
{
"id": 6,
"number_clusters": 10,
"created_at": "2015-09-17T15:08:37.940Z",
"status": "training"
}
You will receive a response containing details about the cluster model, such as the id and the status. The status is important: it remains training while the API asynchronously creates the clusters.
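Because training is asynchronous, a client typically has to re-check the status until it becomes trained. This section does not say which request re-reads a cluster model, so the sketch below takes a caller-supplied fetch_model callable (hypothetical) that returns the model as a dict. It only recognises the trained status described above; any other status is treated as "keep waiting".

```python
import time
from typing import Callable


def wait_until_trained(
    fetch_model: Callable[[], dict],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_model() repeatedly until its "status" is "trained".

    fetch_model is an assumption: it stands in for whatever request
    returns the current cluster model. Raises TimeoutError if the model
    is still not trained once timeout_seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        model = fetch_model()
        if model.get("status") == "trained":
            return model
        if time.monotonic() >= deadline:
            raise TimeoutError("cluster model was still not trained at the deadline")
        sleep(interval_seconds)
```

The interval and timeout defaults are arbitrary placeholders; choose values that suit the size of your corpus.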
To retrieve clusters run the following cURL command:
curl --request GET \
--url https://api-v3.lateral.io/cluster-models/{cluster_model_id}/clusters \
--header 'content-type: application/json' \
--header 'subscription-key: YOUR_API_KEY'
Once the cluster model is trained, you can retrieve the clusters. After a cluster model has been created, any new documents added to the API are automatically fitted to a cluster.
Depending on how many clusters you requested, the response will look something like:
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
Clusters are numbered upwards from 0, so a cluster model created with 10 clusters returns the response shown above.
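Since cluster IDs simply count up from 0, both the IDs and the per-cluster endpoint URLs can be derived from the number of clusters. A small Python sketch follows; the helper names are hypothetical, and the paths are the ones used in this section's cURL examples.

```python
from typing import List

API_BASE = "https://api-v3.lateral.io"  # base URL taken from the cURL examples

# Per-cluster resources that appear in this section.
CLUSTER_RESOURCES = ("documents", "words", "word-cloud")


def cluster_ids(number_clusters: int) -> List[int]:
    """Return the cluster IDs for a model: 0, 1, ..., number_clusters - 1."""
    return list(range(number_clusters))


def cluster_resource_url(cluster_model_id: int, cluster_id: int, resource: str) -> str:
    """Build the URL of one per-cluster resource."""
    if resource not in CLUSTER_RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    return (
        f"{API_BASE}/cluster-models/{cluster_model_id}"
        f"/clusters/{cluster_id}/{resource}"
    )
```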
Using cURL:
curl --request GET \
--url https://api-v3.lateral.io/cluster-models/{cluster_model_id}/clusters/{cluster_id}/documents \
--header 'content-type: application/json' \
--header 'subscription-key: YOUR_API_KEY'
This will return a list of document IDs:
[ 51, 52, 53, 54, 55 ]
To get a list of the documents that belong to a cluster, call the /cluster-models/{cluster_model_id}/clusters/{cluster_id}/documents endpoint.
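A common follow-up is to collect the document IDs of every cluster and then look up which cluster a given document landed in. The sketch below assumes a caller-supplied get_json(url) callable (hypothetical) that performs the authenticated GET and returns the decoded JSON list. It also assumes each document belongs to a single cluster; if a document ID appeared in several clusters, the last one seen would win.

```python
from typing import Callable, Dict, Iterable, List

API_BASE = "https://api-v3.lateral.io"  # base URL taken from the cURL examples


def documents_by_cluster(
    get_json: Callable[[str], List[int]],
    cluster_model_id: int,
    cluster_ids: Iterable[int],
) -> Dict[int, List[int]]:
    """Map each cluster ID to the list of document IDs it contains."""
    grouped = {}
    for cluster_id in cluster_ids:
        url = (
            f"{API_BASE}/cluster-models/{cluster_model_id}"
            f"/clusters/{cluster_id}/documents"
        )
        grouped[cluster_id] = get_json(url)
    return grouped


def cluster_of_each_document(grouped: Dict[int, List[int]]) -> Dict[int, int]:
    """Invert the mapping: document ID -> cluster ID."""
    return {
        document_id: cluster_id
        for cluster_id, document_ids in grouped.items()
        for document_id in document_ids
    }
```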
Using cURL:
curl --request GET \
--url https://api-v3.lateral.io/cluster-models/{cluster_model_id}/clusters/{cluster_id}/words \
--header 'content-type: application/json' \
--header 'subscription-key: YOUR_API_KEY'
To get a list of the words that belong to a cluster, call the /cluster-models/{cluster_model_id}/clusters/{cluster_id}/words endpoint.
This will return a list of words belonging to the cluster:
[
{
"word": "aut",
"importance": 0.400776
},
{
"word": "ut",
"importance": 0.77988
},
...
]
Importance is a float between 0 and 1, with 1 being the most important. The importance of a word measures how close its meaning is to the main theme of the cluster.
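The example response above is not sorted by importance, so a client that wants to label a cluster with its most representative words may need to rank them itself. A minimal sketch, assuming the response has already been decoded into a list of dicts with the word and importance keys shown above; top_words and its parameters are hypothetical names.

```python
from typing import Dict, List, Union

WordEntry = Dict[str, Union[str, float]]


def top_words(
    words: List[WordEntry],
    limit: int = 10,
    min_importance: float = 0.0,
) -> List[str]:
    """Return up to `limit` words, most important first.

    Entries whose importance is below min_importance are dropped.
    """
    kept = [entry for entry in words if entry["importance"] >= min_importance]
    ranked = sorted(kept, key=lambda entry: entry["importance"], reverse=True)
    return [entry["word"] for entry in ranked[:limit]]
```

For example, calling top_words(response, limit=5) keeps only the five words closest to the cluster's main theme.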
Using cURL:
curl --request GET \
--url https://api-v3.lateral.io/cluster-models/{cluster_model_id}/clusters/{cluster_id}/word-cloud \
--header 'content-type: application/json' \
--header 'subscription-key: YOUR_API_KEY'
To get an image of a word cloud that represents the cluster, call the /cluster-models/{cluster_model_id}/clusters/{cluster_id}/word-cloud endpoint.
This will return an image of the word cloud.