The ZENARK API

Access real-time and historical news from thousands of sources using our news API.

The ZENARK API makes it easy to integrate news content into your application or project. Dynamically access our near real-time database of news, press releases and articles from a variety of verified sources. The ZENARK API is feature-rich yet easy to use, so you can write precise queries that return relevant content.

Our customers are building a wide range of applications, including brand and reputation monitoring, AI model training, mobile news applications and trend forecasting. They all share a common requirement: a reliable, robust and trusted news delivery platform.

A Trusted Platform

Powerful API
  • A simple but flexible API based on open standards
  • Easily integrated with both legacy and modern systems
  • Scalable cloud-based architecture

Curated sources
  • Hand-curated content carefully selected and monitored for quality and consistency
  • Not just news: we include government publications, press releases and blogs

Smart processing
  • Intelligent scheduling ensures almost real-time discovery of news stories
  • Content filtered for duplicates, spam and adverts
  • Boilerplate text removed to prevent false positives

Reliable
  • Not dependent on publisher feeds such as RSS or News Sitemaps
  • No missing articles
  • Based on scalable and resilient cloud solutions

Architecture

    Our technology has evolved over the last 20 years into a robust and trusted platform for discovering, indexing and searching news content. Thousands of users rely on it every day.

    Challenges

    Indexing news and stories poses different challenges from traditional search: content must be discovered and indexed as close to real time as possible.

    Accuracy is a key challenge. The platform must identify and retrieve items accurately and promptly without straining the resources of other web properties. Once an item is retrieved, it must be reduced to simple plain text by removing boilerplate templates. Metadata such as the title, publication date and article text is extracted precisely from a range of HTML sources. Not all HTML is equal: the structure and standards used in pages can vary greatly.
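
To make the extraction challenge concrete, here is a minimal sketch of pulling a title and article text out of an HTML page using only Python's standard library. It is an illustration, not the ZENARK extractor: the `SKIP_TAGS` set, the `ArticleExtractor` class and the `extract_article` helper are all hypothetical, and treating `<nav>`, `<header>`, `<footer>` and `<aside>` as boilerplate is a simplifying assumption that real pages often violate.

```python
from html.parser import HTMLParser

# Assumption for this sketch: anything inside these tags is boilerplate.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ArticleExtractor(HTMLParser):
    """Collects the <title> text and the text of <p> elements outside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._skip_depth = 0    # greater than 0 while inside a boilerplate container
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p and self._skip_depth == 0:
            self._buf.append(data)


def extract_article(html):
    """Return the page title and the concatenated paragraph text."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "text": "\n".join(parser.paragraphs)}
```

A tag-based rule like this breaks down quickly on pages that wrap everything in generic `<div>` elements, which is one reason varied HTML makes precise extraction hard.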

    Our content is selected and regularly checked by hand to ensure a continuing high quality of search results

    We use predictive algorithms to schedule our crawlers, which results in near real-time detection and retrieval

    Spam and duplicates are automatically removed as part of the verification stage

    The page is parsed to accurately extract the title, dates and article text, and to remove boilerplate text

    Additional information such as region and language is extracted and stored as metadata

    The parsed document and the associated metadata are indexed and made available via our API
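
As an illustration of the duplicate-removal step mentioned above, the sketch below drops articles whose normalised text is identical. It is one simple approach and is not a description of how ZENARK detects duplicates: the `fingerprint` and `drop_duplicates` helpers and the `"text"` field name are hypothetical, and exact-match hashing will miss near-duplicates such as lightly edited wire copy.

```python
import hashlib
import re


def fingerprint(text):
    """Hash the text after lowercasing and discarding punctuation and extra whitespace."""
    words = re.findall(r"\w+", text.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def drop_duplicates(articles):
    """Keep the first article seen for each fingerprint, preserving input order."""
    seen = set()
    unique = []
    for article in articles:
        key = fingerprint(article["text"])  # "text" is an assumed field name
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```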

    HTTP request and JSON response

    The ZENARK service uses a standards-based scheme of HTTP requests and JSON responses that is familiar to developers of API-based systems

    • No additional developer tools or skills required
    • Requests can be submitted from the command line
    • Additional query parameters allow result filtering
    							
    {
        "ref": "ZZyRdGw5",
        "title": "China believes a mysterious pneumonia outbreak is caused by a new strain of virus from the same family as SARS",
        "source": {
            "name": "RTE.ie",
            "region": "IE"
        },
        "published": "2020-01-16",
        "lang": "en",
        "keywords": ["SARS", "covid-19", "Xu Jianguo", "World Health Organization"]
    }
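
The following sketch shows how a client might build a filtered request URL and read an item shaped like the sample above, using only Python's standard library. The base URL and the `q`, `lang` and `region` parameter names are placeholders invented for this example, not the documented ZENARK endpoint or parameters; only the JSON field names are taken from the sample.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; not the real ZENARK URL.
BASE_URL = "https://api.example.com/search"


def build_search_url(query, **filters):
    """Build a GET URL from a query string plus optional filter parameters."""
    params = {"q": query}  # "q" is an assumed parameter name
    params.update({k: v for k, v in filters.items() if v is not None})
    return BASE_URL + "?" + urlencode(params)


def summarise(item):
    """Return a one-line summary of an item shaped like the sample response."""
    source = item.get("source", {})
    return "{published} [{region}/{lang}] {name}: {title}".format(
        published=item.get("published", "?"),
        region=source.get("region", "?"),
        lang=item.get("lang", "?"),
        name=source.get("name", "?"),
        title=item.get("title", ""),
    )


# The sample item from this page, written as valid JSON.
SAMPLE = """
{
    "ref": "ZZyRdGw5",
    "title": "China believes a mysterious pneumonia outbreak is caused by a new strain of virus from the same family as SARS",
    "source": {"name": "RTE.ie", "region": "IE"},
    "published": "2020-01-16",
    "lang": "en",
    "keywords": ["SARS", "covid-19", "Xu Jianguo", "World Health Organization"]
}
"""

item = json.loads(SAMPLE)
print(build_search_url("SARS outbreak", lang="en", region="IE"))
print(summarise(item))
```

Because the response is plain JSON, any HTTP client can stand in for the URL-building step here, including command-line tools.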
    						

    Features at a glance

    Real-time indexing

    Rapid discovery and indexing of articles

    Boilerplate removal

    Only the article text is indexed

    Handpicked content

    We carefully choose sources for the index

    JSON format response

    Use your favorite library to integrate our content

    User friendly search

    We use a familiar syntax for constructing queries

    20 years' experience

    Our people and technology are not the new kids on the block

    Reliable

    The technology is incredibly reliable and well proven

    Duplicate detection

    Duplicates and other anomalous articles are automatically handled

    AI metadata

    Supplemental information is added using AI processing

    Simple API

    Our API can be accessed from the command line

    Multiple languages

    We support and detect content in multiple languages

    Variety of content

    It's not just news: we index government publications, blogs and social content