Article Extractor

Pricing

Our pricing is based on usage and the level of support required. Please send us an estimate of the number of articles you will be parsing each month, and we will use it to prepare a quote.

Introduction

The Article Extractor is an API that takes a URL and returns a JSON object containing the parsed elements of the article. The parser only works with websites that use an article format, such as newspapers, blogs or magazines. Successful extraction of every part cannot be guaranteed, because websites can be formatted in so many different ways. In our experience, the parser works for the majority of articles.

Try it out

Select an example article below. A real request is made to the API, but the demo is limited to a choice of three articles to prevent abuse. If you want to use the API yourself, follow the Usage instructions below.

{
  "title": "Air traffic controllers release timelapse tour of UK airspace",
  "author": "",
  "published": null,
  "url": "http://www.bbc.co.uk/news/uk-30108947",
  "image": "http://news.bbcimg.co.uk/media/images/79117000/jpg/_79117585_79117513.jpg",
  "videos": [],
  "keywords": [
    "ukclaire",
    "airspace",
    "nats",
    "showing",
    "timelapse",
    "air",
    "controllers",
    "tour",
    "traffic",
    "uk",
    "release",
    "video",
    "thousands",
    "skies",
    "typical"
  ],
  "summary": "A timelapse video showing thousands of planes coming in and out of the UK has been released by NATS.",
  "body": "A timelapse video showing thousands of planes flying in and out of the UK has been released by NATS (National Air Traffic Services).\n\nAround 6,000 flights take off and land across the region during an average 24-hour period.\n\nAir traffic experts collected radar data to create an overview of a typical day in the skies above the UK.\n\nClaire Brennan reports.\n\nFootage courtesy of NATS."
}
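Because extraction is best-effort, some fields can come back empty, as `author`, `published` and `videos` do in the response above. The following is a minimal Python sketch for spotting such fields. It assumes that an empty string, `null` or an empty list means the parser could not extract that field (an interpretation of the example, not documented behavior), and the function name `missing_fields` is hypothetical:

```python
import json

# Trimmed copy of the example response above (keywords and body shortened).
EXAMPLE_RESPONSE = """
{
  "title": "Air traffic controllers release timelapse tour of UK airspace",
  "author": "",
  "published": null,
  "url": "http://www.bbc.co.uk/news/uk-30108947",
  "image": "http://news.bbcimg.co.uk/media/images/79117000/jpg/_79117585_79117513.jpg",
  "videos": [],
  "keywords": ["airspace", "nats", "timelapse"],
  "summary": "A timelapse video showing thousands of planes coming in and out of the UK has been released by NATS.",
  "body": "A timelapse video showing thousands of planes flying in and out of the UK has been released by NATS (National Air Traffic Services)."
}
"""

FIELDS = ("title", "author", "published", "url", "image",
          "videos", "keywords", "summary", "body")


def missing_fields(article: dict) -> list[str]:
    """Return the names of fields that are absent, null, empty strings or empty lists."""
    # Assumption: these empty values mean the parser could not extract the field.
    return [name for name in FIELDS if article.get(name) in (None, "", [])]


article = json.loads(EXAMPLE_RESPONSE)
print(missing_fields(article))  # ['author', 'published', 'videos']
```

A caller can use a check like this to decide whether to fall back to other data sources for the fields that came back empty.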

JSON spec

Field       Type     Details
title       String   Article title from social media meta tags, falling back to the <title> tag
author      String   Author of the article
published   Date     Date of publication
url         URL      Article URL
image       URL      Main image
videos      Array    Array of embedded video URLs
keywords    Array    Array of keywords that appear in the article
summary     String   Summary of the article
body        String   Main body of the article
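On the client side, the fields in the table map naturally onto a typed structure. The following is a minimal Python sketch that assumes the response is a single JSON object using these keys. The class name `Article` and the method `from_json` are hypothetical. The exact string format of `published` is not documented here, so the sketch keeps the raw value rather than parsing it:

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Article:
    """Typed view of one Article Extractor response (fields follow the JSON spec)."""
    title: str = ""
    author: str = ""
    # Listed as a Date in the spec; kept as the raw value (None when no date was found).
    published: Optional[str] = None
    url: str = ""
    image: str = ""
    videos: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    summary: str = ""
    body: str = ""

    @classmethod
    def from_json(cls, text: str) -> "Article":
        data = json.loads(text)
        known = {f.name for f in fields(cls)}
        # Drop keys that are not in the spec so unexpected extra fields do not raise.
        return cls(**{key: value for key, value in data.items() if key in known})


article = Article.from_json('{"title": "Example", "author": "", "published": null, "videos": []}')
print(article.title, article.published)  # Example None
```

Missing keys fall back to the defaults above, so a partially extracted article still produces a usable object.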

Usage

Let’s say that you have the URL http://www.bbc.com/news/31047780 and you want to parse it. The first thing you will need is an API key. With that, you can use the following cURL command to parse the article:

Note: you’ll need to replace the text API_KEY with your API key.

curl --request GET \
  --url 'https://document-parser-api.lateral.io/?url=http://www.bbc.com/news/31047780' \
  --header 'content-type: application/json' \
  --header 'subscription-key: API_KEY'
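You can also make the cURL request from Python using only the standard library. This is a minimal sketch: the endpoint and the `subscription-key` header come from the cURL example, while the function names `build_request` and `parse_article` are hypothetical. Unlike the cURL example, it percent-encodes the article URL so that URLs containing `?` or `&` stay inside the `url` parameter. This assumes the API decodes query parameters in the standard way:

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the cURL example above.
API_ENDPOINT = "https://document-parser-api.lateral.io/"


def build_request(article_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request equivalent to the cURL command, query-encoding the article URL."""
    query = urllib.parse.urlencode({"url": article_url})
    return urllib.request.Request(
        f"{API_ENDPOINT}?{query}",
        headers={"subscription-key": api_key, "content-type": "application/json"},
        method="GET",
    )


def parse_article(article_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (raises urllib.error.HTTPError on HTTP errors)."""
    with urllib.request.urlopen(build_request(article_url, api_key), timeout=timeout) as response:
        return json.load(response)
```

For example, `parse_article("http://www.bbc.com/news/31047780", "API_KEY")["title"]` would return the article title, once API_KEY is replaced with your API key.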

The output of the cURL command is a JSON object as specified above. If you want to pretty-print the returned JSON while testing (and you are using Python 2.6 or later), you can pipe the output of the cURL command to Python:

curl --request GET \
  --url 'https://document-parser-api.lateral.io/?url=http://www.bbc.com/news/31047780' \
  --header 'content-type: application/json' \
  --header 'subscription-key: API_KEY' | python -m json.tool
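If you already have the response inside a Python program, you can get the same pretty-printing without a shell pipe. The following is a minimal sketch that approximates the default behavior of `python -m json.tool` on Python 3 (4-space indentation, keys left in their original order; Python 2's `json.tool` also sorted keys). The function name `pretty` is hypothetical:

```python
import json


def pretty(response_text: str) -> str:
    """Re-indent a JSON response string with 4 spaces, similar to `python -m json.tool`."""
    return json.dumps(json.loads(response_text), indent=4)
```

Passing `sort_keys=True` to `json.dumps` reproduces the sorted output of older Python versions.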

To call the API from your programming language of choice, see the API specification, which includes code samples.