API guide | Zenark News API

GETTING STARTED

The Zenark News API is a set of tools that allow application developers to integrate news search functionality within their product. The API provides realtime access to the Zenark news search engine.

The Zenark search engine is a proven technology that allows searching of realtime and historical articles. Articles include news, blogs, press releases and other items published online.

Typically this type of data is described as unstructured. The Zenark API transforms it into structured data that can be accessed via a versatile search query language. The Zenark News API is based on the familiar HTTP REST standard, allowing easy integration with leading programming languages and tools.

The basic search syntax is similar to that used with other search engines. Advanced features such as filtering, sorting and formatting improve the accuracy and relevance of results.

Before you start you will need the following:

a Zenark-issued API key
an HTTP client to make requests
a JSON (JavaScript Object Notation) library to decode responses

NOTE A default key is sent in the welcome email. You can also access the console to manage API keys.

For clarity, this product is intended for developers with a reasonable proficiency with web 2.0 application development.

MAKING A SEARCH REQUEST

A search is submitted to the Zenark API endpoint by using a standard HTTPS request. The HTTP query string MUST include the x (expression) parameter with an associated search expression.

To execute a search, perform these two steps: (a) set the HTTP authorization header and (b) submit the query with a value for the x parameter.

(a) set the header value to include your API key

authorization: apikey your_api_key

(b) use the https endpoint to perform the query

https://api.zenark.com/v1/search?x=beyonce

You will find a default API key in the initial welcome email sent by Zenark. Keys can be generated and managed using the API console.
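The two steps above can be sketched in Java as follows. This is a minimal illustration, not an official client: it uses the JDK 11+ `java.net.http` classes, the class and method names (`SearchRequestSketch`, `build`) are hypothetical, and `your_api_key` is a placeholder. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class SearchRequestSketch {

    // Endpoint taken from this guide.
    static final String ENDPOINT = "https://api.zenark.com/v1/search";

    /** Builds (but does not send) a GET request for the given search expression. */
    static HttpRequest build(String apiKey, String expression) {
        // URLEncoder emits '+' for spaces; swap in %20 to get plain percent encoding.
        String encoded = URLEncoder.encode(expression, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT + "?x=" + encoded))    // step (b): the x parameter
                .header("authorization", "apikey " + apiKey)    // step (a): the API key header
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("your_api_key", "beyonce");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("authorization: "
                + request.headers().firstValue("authorization").orElse(""));
    }
}
```

To actually run the search, the request could be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.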

SUBMIT SEARCH USING HTTP PARAMETERS

The basic query string parameter ?x= specifies the search expression. This is the only mandatory parameter. The other optional parameters enable filtering, data formatting and sorting features. These optional parameters are described below.

The expression should be encoded using standard URL encoding (percent). Pay particular attention to the encoding of quotes and spaces that may be used as part of the query.

A search expression (or query) is broken up into terms and operators. A term may be a single word such as "beyonce" or a phrase. A phrase is a group of words surrounded by double quotes such as "beyonce knowles".

By default, terms and phrases use the AND conjunction.

"beyonce knowles" shakira

The encoded HTTP query for this search is:

?x=%22beyonce%20knowles%22%20shakira

This query searches for documents containing both the phrase "Beyonce Knowles" and the term shakira.
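The encoding step can be sketched as below. The class name `ExpressionEncoder` is hypothetical. Note that the JDK's `URLEncoder` targets form encoding and writes spaces as `+`, so this sketch replaces them with `%20` to reproduce the encoded form shown above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ExpressionEncoder {

    /** Percent-encodes a search expression for use as the value of the x parameter. */
    static String encode(String expression) {
        // A literal '+' in the input is already encoded as %2B at this point,
        // so every remaining '+' stands for a space and can be rewritten safely.
        return URLEncoder.encode(expression, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // prints ?x=%22beyonce%20knowles%22%20shakira
        System.out.println("?x=" + encode("\"beyonce knowles\" shakira"));
    }
}
```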

It is possible to add field names to the expression to filter searches beyond the options available as HTTP query parameters. For example, we can search the title for the phrase "Beyonce Knowles":

t:"beyonce knowles"

See below for more examples of filtering using field names.

SEARCH SYNTAX

By default, a search will:

search the text fields in a document which are the title (t) and body (b)

use a default AND conjunction for the query

sort the results according to the "discovery date" with newer items at the top of the result set

Supported syntax

syntax	example	notes
`term`	beyonce	Matches documents containing the single term in default text fields. The default text fields are `t` (Title) and `b` (Body).
`"term1 term2"`	"beyonce knowles"	A phrase is a combination of terms enclosed with quote characters. This will only match documents containing the exact phrase.
`term1 term2`	beyonce shakira	Returned documents MUST contain both terms in text fields. Terms default to AND conjunction comparisons.
`term1 OR term2`	beyonce OR shakira	Returns documents containing any of the specified terms.
`-term1`	beyonce -shakira	Removes documents containing the negated term. An expression containing a `-` operator must include other terms that return documents.
`field:"phrase"` `field:term`	t:shakira	Search the title field for the specified term. Text fields are `t` (title) and `b` (body). See full list of fields below.
`phrase~proximity`	"beyonce shakira"~10	Search for words occurring within a specific proximity. Our example returns matches when the terms are within 10 words of each other.
`pd:[start TO end]`	pd:[20210101 TO 20210105] dd:[20210101 TO 20210105]	A range search can be applied for date fields. We store two distinct dates. The date the article was first discovered (discovery date `dd`) by our crawlers and the date the item was "apparently" published (published date `pd`). Dates are represented using the YYYYMMDD format. Date fields are described in detail below.
`keyfield:value`	r:ie	Matches documents with a specific key field value. Key fields include `r` (region) and `d` (domain). It is frequently useful to combine key fields using an OR clause. Results can be restricted to French or Irish publications by using an OR clause on region: `(r:fr OR r:ie)`.
`(clauseA) (clauseB)`	(shakira OR beyonce) (d:bbc.com OR d:nytimes.com OR d:guardian.co.uk)	Allows the construction of a nested or sub-query. Bracketed items are evaluated first. Our example searches articles from a small set of three publications by using the `d` domain key field.

The search syntax allows the selection of specific fields as part of a query. Fields fall into two types:

Text fields that contain one or more words. These are the title and body, and they are the de facto search fields.

Fields that contain a single term such as the date published, region, website domain or document URL. These can be further divided into text and date types.
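Applications that assemble expressions from user input may find small helpers convenient. The sketch below mirrors the syntax table above using plain string concatenation. The class `Expressions` and all of its method names are hypothetical, and it does not escape quotes or other special characters inside terms.

```java
import java.util.List;
import java.util.stream.Collectors;

final class Expressions {

    /** "term1 term2" : wraps words in double quotes to form a phrase. */
    static String phrase(String words) {
        return "\"" + words + "\"";
    }

    /** field:term or field:"phrase" */
    static String field(String name, String termOrPhrase) {
        return name + ":" + termOrPhrase;
    }

    /** (field:a OR field:b ...) : a bracketed OR clause over one key field. */
    static String anyOf(String fieldName, List<String> values) {
        return values.stream()
                .map(v -> field(fieldName, v))
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    /** -term : removes documents containing the term. */
    static String not(String term) {
        return "-" + term;
    }

    /** "term1 term2"~n : proximity search. */
    static String near(String words, int distance) {
        return phrase(words) + "~" + distance;
    }

    /** Clauses separated by spaces are combined with the default AND conjunction. */
    static String allOf(String... clauses) {
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(allOf("(shakira OR beyonce)",
                anyOf("d", List.of("bbc.com", "nytimes.com", "guardian.co.uk"))));
    }
}
```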

DATE FIELDS

Each article has two associated dates:

published (pd) the date our algorithm has determined as the probable publication date.
discovered (dd) the timestamp indicating when our software first identified this item.

The fields dd: and pd: can be used in an expression to filter by date.

adele pd:[20200101 TO 20220731]

adele dd:[20220101 TO 20220731]

Date discovered is stored on the index using a 1 minute level of accuracy, so it is possible to search for 15 minutes of fame:

"andy warhol" dd:[202201011205 TO 202201011220]

Although date discovered is indexed to 1 minute accuracy, it is stored with millisecond accuracy (using UTC) whereas our published date is accurate to the day (YYYY-MM-DD format).

No guarantee can be given to the accuracy of either date. Although the date published and date discovered should correspond to the article date, exceptions occasionally occur.

Date published is resolved using a best effort algorithm. Errors in the published date are possible due to issues such as timezones and formatting. Date discovered can be more accurate but is not without issues. The extracted dates tend to be more accurate for larger publications.
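The range clauses shown in this section can be generated from `java.time` values, which avoids hand-formatting mistakes. This is a sketch only: the class `DateRanges` and its methods are hypothetical, and it assumes that minute-level `dd` values are interpreted as UTC, which this guide states for storage but not explicitly for searching.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DateRanges {

    // Day-level format used by pd: and dd: ranges (YYYYMMDD).
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    // Minute-level format shown for dd: ranges (YYYYMMDDhhmm).
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Builds a day-level range clause for the given date field ("pd" or "dd"). */
    static String days(String field, LocalDate start, LocalDate end) {
        return field + ":[" + DAY.format(start) + " TO " + DAY.format(end) + "]";
    }

    /** Builds a minute-level range clause on the discovered date. */
    static String discoveredMinutes(LocalDateTime start, LocalDateTime end) {
        return "dd:[" + MINUTE.format(start) + " TO " + MINUTE.format(end) + "]";
    }

    public static void main(String[] args) {
        // prints adele pd:[20200101 TO 20220731]
        System.out.println("adele "
                + days("pd", LocalDate.of(2020, 1, 1), LocalDate.of(2022, 7, 31)));
        // prints "andy warhol" dd:[202201011205 TO 202201011220]
        System.out.println("\"andy warhol\" " + discoveredMinutes(
                LocalDateTime.of(2022, 1, 1, 12, 5), LocalDateTime.of(2022, 1, 1, 12, 20)));
    }
}
```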

THE RESPONSE OBJECT

The expected response to a query is an HTTP 200 code and a JSON object. A typical response (edited for display) is shown below.


{
    "result": "ok",
    "identifier": "xxxxxxxxx",
    "offset": 450,
    "totalHits": 44749,
    "availableHits": 2500,
    "pageHits": 50,    
    "results": [
        {
            "summary": "The long wait for a new Beyonce project will soon come to an end, with the singer officially posting a July 29 date (with pre-save and pre-buy options) for a project titled Renaissance.",
            "offset": 450,
            "id": 1133903645565,
            "published": "2022-06-16",
            "source": {
                "name": "Variety",
                "region": "US",
                "domainid": "variety.com",
                "sid": 4183,
                "home": "https://variety.com/"
            },
            "discoveredAt": "2022-06-16T10:16:18.940Z",
            "title": "Beyonce Reveals Renaissance, Forthcoming Project Set for July 29 Release",
            "url": "https://zenark.com/z/01QGu3WUD767cunVQ1nuS0139SPePZ7HpkDoY6ByYbj45B5u8SNoGPoisAwDKTyszkE1YZui43T6I6CK0FKvhA29xpfuH5HuZBaTWFIx8RXhEsdqWUS6dFsvfN00165Jg"
        },
        {
            "summary": "Crazy In Love singer Beyonce Knowles was spotted checking into a hotel in the Italian village of Portofino, on the same weekend Kourtney Kardashian ties the knot with Travis Barker at the luxury resort",
            "offset": 451,
            "id": 1130448846062,
            "published": "2022-05-22",
            "source": {
                "name": "Daily Mirror",
                "region": "GB",
                "domainid": "mirror.co.uk",
                "sid": 81,
                "home": "http://www.mirror.co.uk/"
            },
            "discoveredAt": "2022-05-22T10:28:48.114Z",
            "title": "Beyonce arrives at Kourtney Kardashian wedding destination - after 'feud with Kim'",
            "url": "https://zenark.com/z/11Q9E2yRZK9Q14Fpj9CfqaLbDEb9QDWHuFNSm7DwBHQ1t5zCnPNEHqCM85V1CVBr7VDOknOYd9D3K2R71DWKnRf2tSsnPm03vhKS2HbzCAf2GsxZKXeE31sIEy000B0bD"
        }
    ]
}

The response object is presented in JSON format and divided into three distinct sections:

the header containing the page and hit counts and an identifier
a JSON array containing result items
an optional JSON array of "beacon" offsets useful for pagination (not shown)

RESPONSE OBJECT FIELDS

This table outlines the items expected in a response object. Some fields may not be available in all subscriptions.

Field	Notes
result	A string indicating the outcome of the query. Successful queries will have the result "ok" whereas failures will be flagged with "error". Error conditions are described in more detail below.
availableHits	The size of the result set. This value is dependent on the parameter "limit", the subscription type and pagesize. Generally the limit value provides 20 pages of page data (by default this will be 25 * 20). In turn, this value will determine the maximum number of beacons available for pagination.
totalHits	The total number of matching documents found. This may exceed the availableHits value. As well as the search expression, totalHits will be calculated on the basis of your subscription and filters applied.
pageHits	The number of items in the current page. This will correspond to the size of the results array.
offset	the offset in availableHits of the first item in this page
results[]	an array containing result items
results > id	a unique numeric (8 byte) identifier for this item
results > url	a unique link to the page associated with your API key.
results > publishedAt	the date our algorithms have extrapolated as the probable date of publication. It uses a basic ISO format YYYY-MM-DD.
results > discoveredAt	the timestamp reflecting the first discovery of this article by the Zenark platform. This may or may not correspond to the published date. This timestamp is presented in extended ISO format and is accurate to a millisecond. Note that the indexed value of date discovered is truncated to minute accuracy (ie you can't search with millisecond or even second accuracy).
results > offset	Offset of this item in the overall result set
results > title	the title that our algorithm suggests for the article. This is normally accurate.
results > summary	a short excerpt of the article. Depending on the summaryFormat parameter, this can be either the first few words of the article or a context-sensitive snippet with matching terms marked.
results > source > domainid	the domain part of the publisher URL. Most commonly the web address without the "www." part. The domain id is the persistent value used to identify a publisher. For example, the Zenark platform considers bbc.co.uk and bbc.com as separate publishers.
results > source > sid	a unique numeric identifier for the publication. The relationship between sid and domainid is immutable.
results > source > name	the name of the publication
results > source > home	publication home page
results > source > region	the region primarily associated with the publication
beacons > page	a beacon page number
beacons > skipTo	the id for the first document on the associated page
beacons > offset	the offset of the first item on the associated page

Note that totalHits >= availableHits >= pageHits.

For small result sets (with a single page) totalHits == availableHits == pageHits == results.length
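The relationships above can be used as a sanity check on a decoded response header. The following is a sketch with hypothetical names (`HitCounts`, `consistent`, `hasMorePages`). It assumes offsets are zero-based, which is inferred from the sample response rather than stated in this guide.

```java
final class HitCounts {

    /** True when totalHits >= availableHits >= pageHits >= 0. */
    static boolean consistent(long totalHits, long availableHits, long pageHits) {
        return totalHits >= availableHits && availableHits >= pageHits && pageHits >= 0;
    }

    /**
     * True when items remain after the current page, i.e. the index just past
     * the last item on this page is still inside the available result set.
     * Assumes zero-based offsets.
     */
    static boolean hasMorePages(long offset, long pageHits, long availableHits) {
        return offset + pageHits < availableHits;
    }

    public static void main(String[] args) {
        // Values taken from the sample response in this guide.
        System.out.println(consistent(44749, 2500, 50));   // prints true
        System.out.println(hasMorePages(450, 50, 2500));   // prints true
    }
}
```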

A combination of the pagesize and limit parameters will dictate the value of availableHits and the number of beacons for pagination ( see below ).
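The two date values in a result item use different precisions, as described in the field table above. The sketch below shows one way to parse them with `java.time` and to derive the minute-level value that can be used in a `dd:` search. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

final class ResultDates {

    // Minute-level search format, rendered in UTC.
    private static final DateTimeFormatter INDEX_MINUTE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    /** Parses a day-level published date such as "2022-06-16". */
    static LocalDate published(String value) {
        return LocalDate.parse(value);
    }

    /** Parses a millisecond-level discovered timestamp such as "2022-06-16T10:16:18.940Z". */
    static Instant discovered(String value) {
        return Instant.parse(value);
    }

    /** Truncates a discovered timestamp to the minute accuracy available to searches. */
    static String searchableMinute(Instant discoveredAt) {
        return INDEX_MINUTE.format(discoveredAt.truncatedTo(ChronoUnit.MINUTES));
    }

    public static void main(String[] args) {
        Instant discoveredAt = discovered("2022-06-16T10:16:18.940Z");
        System.out.println(published("2022-06-16"));        // prints 2022-06-16
        System.out.println(searchableMinute(discoveredAt)); // prints 202206161016
    }
}
```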

OPEN A NEWS ITEM

An item can be opened using the url field of a result item. This uses the static https://zenark.com/z/ endpoint with a unique reference. These references are valid for an undefined period and should not be stored by applications. The article id provides a method for persisting a reference to an article.

https://zenark.com/z/[results reference]

A unique reference is generated for each item in the result set. This allows the generation of metrics and reports on your clients' usage of Zenark API based applications.

You can track URL events in the console. This feature may not be available in all subscriptions.

NOTE You should not resolve the URL in your server side code.

BEACONS

Beacons are the preferred method to navigate results using a pagination style mechanism. Although it may appear complex, beacons allow a point-in-time snapshot of results avoiding many issues with the dynamic nature of search results.

Beacons are enabled by specifying the parameter beacons=true.

Beacons are page offsets generated at a specific point in time. They resist issues that could occur when the underlying index is modified as new results are added. To forward to a specific reference indicated by a beacon, use the skipTo=<id> query parameter described below.
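A paging request built from a beacon might look like the sketch below. The `Beacon` class mirrors the `page`, `skipTo` and `offset` items listed in the response field table, but its Java types are assumptions, as is resending `beacons=true` on follow-up requests. The values in `main` are illustrative only.

```java
final class BeaconPaging {

    /** One entry of the response's beacons array; field types are assumptions. */
    static final class Beacon {
        final int page;     // beacon page number
        final long skipTo;  // id of the first document on that page
        final int offset;   // offset of the first item on that page

        Beacon(int page, long skipTo, int offset) {
            this.page = page;
            this.skipTo = skipTo;
            this.offset = offset;
        }
    }

    /** Builds the query string that jumps to the page described by the beacon. */
    static String queryForPage(String encodedExpression, Beacon beacon) {
        return "?x=" + encodedExpression + "&beacons=true&skipTo=" + beacon.skipTo;
    }

    public static void main(String[] args) {
        Beacon third = new Beacon(3, 2435610461664222L, 50);
        // prints ?x=adele&beacons=true&skipTo=2435610461664222
        System.out.println(queryForPage("adele", third));
    }
}
```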

FILTERS AND OTHER OPTIONS

NOTE A query can be constructed by simply passing in a value for the mandatory x parameter. Programmers can choose to use HTTP query string parameters to filter searches or allow users to directly filter by using query modifiers.

HTTP Parameter	Usage	Notes
x	The mandatory search expression or query	The search query as a URL-encoded string. This is the only mandatory parameter. Specific fields may be searched using the field:[value] syntax. ?x=beyonce ?x=%22beyonce%20knowles%22 ?x=t%3Abeyonce ?x=d%3Anytimes.com
date	Specify the date range for the search	Filter by date. By default, this filters both the pd: and dd: fields. Dates are expected in YYYY-MM-DD format [2023-12-31]. A single date value creates a range extending to TODAY. ?x=rihanna&date=2022-01-01 ?x=rihanna&date=2020-01-01,2022-12-31 ?x=rihanna&date=2020-01-01&date=2022-12-31
dateFilter	Specify field used for date filtering	Permitted values are `discovered`, `published` or `both`. By default, `both` date fields are filtered. ?x=adele&date=2022-01-01&dateFilter=published
pagesize	The pagesize to use for the query	The default value for page size is 25. The maximum permitted value is 100 (developer tier) or 500 (others). Adjusting the page size increases the "cost" of the query as described below. ?x=adele&pagesize=100
sortOrder	Allows different ordering of results	Permitted values are `sortDefault`, `sortPublished` . By default, `sortDefault` will order based on the date discovered. ?x=adele&sortOrder=sortPublished
summaryFormat	Select format of summary output	Two options are permitted for the summary format. The default ("firstlines") shows the first few words of the article text. The option ("context") will show a snippet of content containing the matched text. The matching text is wrapped in a span tag to permit CSS formatting. A further parameter ("htmlClassname") allows the CSS class to be named. ?x=adele&summaryFormat=context The default context classname of zenark may be changed by specifying the "htmlClassname" parameter. ?x=adele&summaryFormat=context&htmlClassname=mycompanystyle This option is not available on the developer tier.
skipTo	Reposition starting position to this document id	Used for pagination. The ID must exist, otherwise the set will default to the first item in the result set. ?x=adele&skipTo=2435610461664222 Beacons are provided in the response object that show the document id at specific page breaks. See the BEACONS section.
region	Filter by region(s)	A list of one or more comma separated region codes ?x=cher&region=gb ?x=cher&region=gb&region=ie ?x=cher&region=gb,ie

Note that some options are not available or may be restricted depending on subscription type.

The date used for filtering can be selected using dateFilter. By default both dates are used when no filter is specified.

It is possible to include date filters as part of the HTTP query using the date parameter

?x=rihanna&date=2020-01-01,2022-12-31

is equivalent to this:

rihanna pd:[20200101 TO 20221231] dd:[20200101 TO 20221231]

since expression elements are joined automatically using an AND clause and the date parameter by default filters on both date discovered and date published.

Using date filters as part of the query string allows applications to restrict and ensure correctness of date inputs. For example, an application can use a pop-up calendar object to select a date range. Other fields can be treated in a similar fashion by application developers.
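One way for an application to keep such inputs well formed is to assemble the query string from typed values. The sketch below is illustrative: the `SearchQuery` class and its methods are hypothetical, it covers only a few of the parameters in the table above, and the pagesize bound of 500 is the non-developer maximum quoted there.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class SearchQuery {

    // LinkedHashMap keeps parameters in insertion order, with x first.
    private final Map<String, String> params = new LinkedHashMap<>();

    SearchQuery(String expression) {
        params.put("x", expression);
    }

    /** A single date creates a range extending to today. */
    SearchQuery dateFrom(LocalDate from) {
        params.put("date", from.toString()); // LocalDate prints as YYYY-MM-DD
        return this;
    }

    SearchQuery dateRange(LocalDate from, LocalDate to) {
        params.put("date", from + "," + to);
        return this;
    }

    SearchQuery dateFilter(String field) {
        if (!Set.of("discovered", "published", "both").contains(field)) {
            throw new IllegalArgumentException("unknown dateFilter: " + field);
        }
        params.put("dateFilter", field);
        return this;
    }

    SearchQuery pagesize(int size) {
        if (size < 1 || size > 500) {
            throw new IllegalArgumentException("pagesize out of range: " + size);
        }
        params.put("pagesize", Integer.toString(size));
        return this;
    }

    String toQueryString() {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&", "?", ""));
    }

    private static String encode(String value) {
        // Percent-encode, use %20 for spaces, and keep commas readable
        // (a comma is a legal character in a URL query string).
        return URLEncoder.encode(value, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("%2C", ",");
    }
}
```

For example, `new SearchQuery("rihanna").dateRange(LocalDate.of(2020, 1, 1), LocalDate.of(2022, 12, 31)).toQueryString()` yields `?x=rihanna&date=2020-01-01,2022-12-31`.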

PASSING MULTIPLE VALUES

A number of parameters may accept multiple values. These include the date and region parameters.

Two options are permitted to submit multiple parameter values:

&region=IE,US,NL,DE

&region=IE&region=US&region=NL&region=DE
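Both forms are straightforward to generate from a list of values. The sketch below uses hypothetical names (`MultiValue`, `commaSeparated`, `repeated`) and assumes the values need no URL encoding, which holds for simple region codes.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MultiValue {

    /** &name=a,b,c */
    static String commaSeparated(String name, List<String> values) {
        return "&" + name + "=" + String.join(",", values);
    }

    /** &name=a&name=b&name=c */
    static String repeated(String name, List<String> values) {
        return values.stream()
                .map(v -> "&" + name + "=" + v)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> regions = List.of("IE", "US", "NL", "DE");
        System.out.println(commaSeparated("region", regions)); // prints &region=IE,US,NL,DE
        System.out.println(repeated("region", regions));       // prints &region=IE&region=US&region=NL&region=DE
    }
}
```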

USING FIELDS

Most fields are both indexed (searchable) and stored (are included in a response object).

Here is a simple example of using the title field to only search items with a term in the title.

t:beyonce

Here is a list of all fieldnames.

Field	Name	Notes
id	identifier	A unique immutable Zenark identifier for the article.
t	title	the apparent title of the article
s	summary	a generated summary for the item
b	body	the apparent body text of the article
pd	published date	published date
dd	discovered date	discovered date
sid	source id	A unique identifier for the source.
d	domain	A unique identifier for the source.
r	region	the region associated with the source
h	home	The home page for the source
sn	source name	the name of the publication

Indexed fields may be used in a query. Stored fields can be returned as part of the response object.

By default, a search automatically includes both text fields (title and body).

The fields domain, sid and home are linked. If a site changes the domain part of the site address, we will discontinue the previous site and create a new site using the new domainid. This will result in a new domain, sid and a new home page. The source name (sn) field may remain the same.

ERROR CONDITIONS AND CODES

HTTP RESPONSE CODES

Similar to other web applications, success is indicated with the HTTP 200 code and the JSON "result" field. A response of HTTP 200 with a result of "ok" indicates that the request was processed without error. A failed request will return a non-200 HTTP response code and the result field will contain the value "error".

Errors are generally indicated with 400, 401, 500 etc codes. This table shows the expected HTTP response codes along with a "cause" field.

The response object associated with an error condition is in JSON format and contains field values that can identify the root cause of the error.

code/result	cause	notes
`200/ok`	N/A	The request was received and executed. The response will be a JSON object representing the search results. See THE RESPONSE OBJECT for the structure of a successful response object.
`400/error`	nox	The query is missing the mandatory x parameter
`400/error`	options	An option passed in the HTTP query string contains an error.
`400/error`	user	A user error is flagged. This generally indicates a malformed search query. Although a user error such as a missing quote or incorrect field name may be to blame, this error can also be associated with incorrect URL encoding of the x parameter in the request.
`401/error`	auth	The request has failed authentication. This may be due to an invalid API key or deactivated account. The error object will contain an exact cause.
`403/error`	usage	The account has reached the daily quota of requests
`429/error`	rate	The account is exceeding the requests per second limit
`500/error`	platform	This indicates a transient error occurred with the Zenark service during the execution of a search.
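A client can map these status codes onto a small set of follow-up actions. The mapping below is a suggestion rather than documented Zenark behaviour: the `ErrorPolicy` class, the `Action` names and the decision to retry on 429 and 500 are all assumptions drawn from the descriptions in the table above.

```java
final class ErrorPolicy {

    enum Action {
        USE_RESULTS,        // 200: parse the JSON result object
        FIX_REQUEST,        // 400: nox, options or user error; retrying unchanged will not help
        CHECK_CREDENTIALS,  // 401: invalid API key or deactivated account
        WAIT_FOR_QUOTA,     // 403: daily quota reached
        BACK_OFF_AND_RETRY, // 429 or 500: rate limit or transient platform error
        UNKNOWN             // any code not listed in this guide
    }

    /** Maps an HTTP status code to a suggested client action. */
    static Action actionFor(int httpStatus) {
        switch (httpStatus) {
            case 200: return Action.USE_RESULTS;
            case 400: return Action.FIX_REQUEST;
            case 401: return Action.CHECK_CREDENTIALS;
            case 403: return Action.WAIT_FOR_QUOTA;
            case 429:
            case 500: return Action.BACK_OFF_AND_RETRY;
            default:  return Action.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(actionFor(429)); // prints BACK_OFF_AND_RETRY
    }
}
```

The "cause" field of the error object can then be logged or shown to the user to distinguish, for example, a missing x parameter from a malformed query.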

EXAMPLE CODE

JAVA

Below we provide the code required to contact the https://api.zenark.com/v1/search endpoint and submit a simple query


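	import java.io.BufferedReader;
	import java.io.InputStream;
	import java.io.InputStreamReader;
	import java.net.HttpURLConnection;
	import java.net.URL;
	import java.net.URLEncoder;
	import java.nio.charset.StandardCharsets;
	import javax.net.ssl.HttpsURLConnection;
	import org.json.JSONObject;
	import org.json.JSONTokener;

	public class ZenarkSearchExample {

		public static void main(String[] args) throws Exception {

			// the zenark search endpoint
			String ENDPOINT = "https://api.zenark.com/v1/search";

			// our generic test query
			String rawQuery = "t:beyonce t:shakira";

			// encode the query string; URLEncoder writes spaces as '+',
			// so replace them with %20 to get plain percent encoding
			String encodedQuery = URLEncoder.encode(rawQuery, "UTF-8").replace("+", "%20");

			// construct our URL which only contains the mandatory x parameter
			URL url = new URL(ENDPOINT + "?x=" + encodedQuery);

			HttpsURLConnection httpURLConnection
					= (HttpsURLConnection) url.openConnection();

			// set our authorisation header using the appropriate API key
			httpURLConnection.setRequestProperty("authorization", "apikey zN0TaRealKey");

			// the HTTP status code
			int httpResponseCode = httpURLConnection.getResponseCode();

			// status == 200
			if (httpResponseCode == HttpURLConnection.HTTP_OK) {

				// read the JSON response object from the connection's input stream as UTF-8
				try (InputStream is = httpURLConnection.getInputStream();
						BufferedReader bufferedReader = new BufferedReader(
								new InputStreamReader(is, StandardCharsets.UTF_8))) {

					// now use our buffered reader to create the JSONObject
					// we are using JSON from org.json.JSONObject
					JSONTokener tokener = new JSONTokener(bufferedReader);
					JSONObject json = new JSONObject(tokener);

					// the "result" field is "ok" for a successful query
					System.out.println(json.getString("result"));
				}

			} else {

				// handle unexpected response
				System.err.println("Unexpected HTTP status: " + httpResponseCode);
			}
		}
	}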
