Article Extractor

Introduction

The Article Extractor is an API that takes a URL and returns a JSON object containing parsed elements of the article. The parser only works with websites that are in an article format, such as newspapers, blogs or magazines. Successful extraction of each part cannot be guaranteed, because websites can be formatted in many different ways. We find that the parser works for the majority of examples.

Try it out

{
      "title": "Air traffic controllers release timelapse tour of UK airspace",
      "author": "",
      "published": null,
      "url": "http://www.bbc.co.uk/news/uk-30108947",
      "image": "http://news.bbcimg.co.uk/media/images/79117000/jpg/_79117585_79117513.jpg",
      "videos": [],
      "keywords": [
        "ukclaire",
        "airspace",
        "nats",
        "showing",
        "timelapse",
        "air",
        "controllers",
        "tour",
        "traffic",
        "uk",
        "release",
        "video",
        "thousands",
        "skies",
        "typical"
      ],
      "summary": "A timelapse video showing thousands of planes coming in and out of the UK has been released by NATS.",
      "body": "A timelapse video showing thousands of planes flying in and out of the UK has been released by NATS (National Air Traffic Services).\n\nAround 6,000 flights take off and land across the region during an average 24-hour period.\n\nAir traffic experts collected radar data to create an overview of a typical day in the skies above the UK.\n\nClaire Brennan reports.\n\nFootage courtesy of NATS."
    }

JSON spec

Field      Type    Details
title      String  Article title from social media meta tags, falling back to the <title> tag
author     String  Author of the article
published  Date    Date of publication
url        URL     Article URL
image      URL     Main image
videos     Array   Array of embedded video URLs
keywords   Array   Array of keywords that appear in the article
summary    String  Summary of the article
body       String  Main body of the article
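Because extraction of individual fields is not guaranteed, client code may want to tolerate empty or null values. The following is a minimal Python sketch of one way to do that. The `Article` class and `parse_article` function are hypothetical names introduced here for illustration and are not part of the API. The format of the `published` date is not stated in the spec, so the sketch keeps it as the raw value.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical container mirroring the fields in the JSON spec."""
    title: str = ""
    author: str = ""
    published: Optional[str] = None  # raw value; date format is an assumption
    url: str = ""
    image: Optional[str] = None
    videos: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    summary: str = ""
    body: str = ""


def parse_article(raw: str) -> Article:
    """Build an Article from a JSON response string.

    Missing or null fields fall back to empty defaults, since the
    extractor may not find every element on every page.
    """
    data = json.loads(raw)
    return Article(
        title=data.get("title") or "",
        author=data.get("author") or "",
        published=data.get("published"),
        url=data.get("url") or "",
        image=data.get("image"),
        videos=list(data.get("videos") or []),
        keywords=list(data.get("keywords") or []),
        summary=data.get("summary") or "",
        body=data.get("body") or "",
    )
```

With the example response shown earlier, `parse_article(...)` would give an `Article` whose `author` is an empty string and whose `published` is `None`.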

Usage

Let’s say that you have the URL http://www.bbc.com/news/31047780 and you want to parse it. The first thing that you will need to do is get an API key. You can then use the following cURL command to parse the article:

Note: you’ll need to replace the text API_KEY with your API key.

curl --request GET \
  --url 'https://document-parser-api.lateral.io/?url=http://www.bbc.com/news/31047780' \
  --header 'content-type: application/json' \
  --header 'subscription-key: API_KEY'

The output of the cURL command will be a JSON object as specified above. If you want to pretty-print the returned JSON object for testing and you have Python 2.6+ installed, you can pipe the output of the cURL command to Python’s json.tool module:

curl --request GET \
  --url 'https://document-parser-api.lateral.io/?url=http://www.bbc.com/news/31047780' \
  --header 'content-type: application/json' \
  --header 'subscription-key: API_KEY' | python -m json.tool

To call the API in your programming language of choice, check out the API specification, where code samples are available.
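As one illustration, the cURL request can be sketched in Python using only the standard library. This is a minimal sketch, not the official client: `build_request` and `extract_article` are hypothetical helper names, and the code assumes the service accepts a percent-encoded `url` query parameter, as standard query-string handling would. The endpoint and header names are taken from the cURL command.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the cURL example.
API_ENDPOINT = "https://document-parser-api.lateral.io/"


def build_request(article_url: str, api_key: str) -> urllib.request.Request:
    """Build the GET request without sending it."""
    # Percent-encode the article URL so that characters such as '&' or '?'
    # inside it cannot be confused with the API's own query string.
    query = urllib.parse.urlencode({"url": article_url})
    return urllib.request.Request(
        API_ENDPOINT + "?" + query,
        headers={
            "content-type": "application/json",
            "subscription-key": api_key,
        },
        method="GET",
    )


def extract_article(article_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON object."""
    request = build_request(article_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `extract_article("http://www.bbc.com/news/31047780", "API_KEY")` with a real key in place of `API_KEY` would return a dictionary with the fields described in the JSON spec. An HTTP error status raises `urllib.error.HTTPError`, which the caller can catch.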