ZENARK News API

Getting started

This is a draft document based on an early version of our API. Beta users have access to updated API documents.

The ZENARK News API is a simple HTTP REST API for searching news, blogs, press releases and more. This type of data is typically described as unstructured. The ZENARK API transforms it into structured data and provides access through a versatile query language.

Searches can be performed with a simple search syntax similar to that used by other search engines. We also provide advanced features such as filtering by source, region or date. You can specify quality options to allow more precision in results.

Making a request

The ZENARK News API uses a simple API key for authorization. Key-based authentication is well suited to the server-to-server communication model.

The endpoint is available over HTTPS only. Plain HTTP exchanges are not supported. Supporting only HTTPS ensures that the API key and request are encrypted on the wire.

Here is a typical request:

Authorization: ApiKey your-api-key 
			    	
https://api.zenark.com/v1/search?q=Beyonce

The API key is set in the Authorization header of the HTTP request.

The response is shown below (we've edited this for clarity):

{
"size": "1320",
"hits": "16997",
"results": [
    {
    "ref": "BAIyRdGw5",
    "discoveredAt": "2021-05-16 18:32:15+0000",
    "source": {
        "sid": "2711",
        "domainid": "geo.tv",
        "name": "Geo News",
        "home": "https://www.geo.tv/",
        "region": "US"
        },
    "published": "2021-05-16",
    "title": "Seth Rogen hilariously recalls meeting Beyonce at Grammys",
    "summary": "Canadian-American actor Seth Rogen hilariously recalled how meeting Beyonce went wrong. The 39-year-old shared on E! News\u2019 Daily Pop how he felt \u201chumiliated\u201d during a failed encounter with the iconic "
    }
]
}

A successful response will be indicated by HTTP response code 200. This returns a JSON structure similar to that shown. A failure will generally return a 40x code depending on the error involved. We describe error codes, the associated conditions and possible solutions in the API documentation.

Java Example

Below we provide the code required to contact the search endpoint and submit a simple query:


		import java.io.BufferedReader;
		import java.io.InputStreamReader;
		import java.net.HttpURLConnection;
		import java.net.URL;
		import java.net.URLEncoder;
		import java.nio.charset.StandardCharsets;
		import javax.net.ssl.HttpsURLConnection;

		public class SearchExample {

			public static void main(String[] args) throws Exception {

				// the endpoint
				String ENDPOINT = "https://api.zenark.com/v1/search";

				String rawQuery = "t:Beyonce t:shakira";

				// important to encode the query string, particularly when it contains characters such as the :
				String encodedQuery = URLEncoder.encode(rawQuery, "UTF-8");

				// construct our URL
				URL url = new URL(ENDPOINT + "?q=" + encodedQuery);

				HttpsURLConnection httpURLConnection = (HttpsURLConnection) url.openConnection();

				// Set our authorization header using the appropriate API key
				httpURLConnection.setRequestProperty("Authorization", "ApiKey 000000000000");

				// the HTTP status code
				int httpResponseCode = httpURLConnection.getResponseCode();

				// status == 200
				if (httpResponseCode == HttpURLConnection.HTTP_OK) {

					// Read the JSON response object
					// This can be parsed using any JSON compliant library
					try (BufferedReader r = new BufferedReader(
							new InputStreamReader(httpURLConnection.getInputStream(), StandardCharsets.UTF_8))) {

						String line;

						while ((line = r.readLine()) != null) {
							System.out.println(line);
						}
					}

				} else {

					// handle unexpected response

				}
			}
		}

			    	

Query API

We've already encountered the q parameter provided as part of the HTTP request. The table below shows all parameters available with the search endpoint.

Parameter | Name | Notes
q | query | The query string, URL-encoded. A full description of the query syntax is provided below.
offset | offset | Allows pagination of results by 'fast-forwarding' to the specified item. The value should be a valid ZENARK document id.
limit | limit | Specifies the maximum number of items returned by a search. The default value is 100.
id | identifier | A user-defined identifier that allows a search to be traced. Useful for debugging.
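
The parameters in the table can be combined into a single request URL. The following is a minimal sketch of one way to do that in Java. The class and method names (SearchUrlBuilder, build) and the sample identifier value are illustrative only and are not part of the ZENARK API. It also assumes that every parameter value is URL-encoded in the same way as q.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of any ZENARK client library.
final class SearchUrlBuilder {

    static final String ENDPOINT = "https://api.zenark.com/v1/search";

    // Builds the request URL. offset, limit and id are optional: pass null to omit them.
    static String build(String query, String offset, Integer limit, String id) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        if (offset != null) params.put("offset", offset);
        if (limit != null) params.put("limit", String.valueOf(limit));
        if (id != null) params.put("id", id);

        // Join the encoded key=value pairs with & after the endpoint and ?
        StringJoiner joined = new StringJoiner("&", ENDPOINT + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(e.getKey() + "=" + encode(e.getValue()));
        }
        return joined.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(build("t:Beyonce t:shakira", null, 20, "debug-001"));
    }
}
```

Omitting limit leaves the documented default of 100 in effect. The value to pass as offset is not shown here because the source only states that it should be a valid ZENARK document id.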

Search syntax

By default, a search will:

  • search the text fields in a document, which are the title (t) and body (b)
  • use a default AND conjunction for the query
  • order the returned documents by their apparent date of publication

Supported syntax

syntax | example | notes
term | beyonce | Matches documents containing the single term in the default text fields, t (title) and b (body).
"phrase" | "beyonce knowles" | Matches documents containing the exact phrase.
term1 term2 | beyonce shakira | Returned documents must contain both terms in the text fields. Terms are combined with AND by default.
term1 OR term2 | beyonce OR shakira | Returns documents containing any of the specified terms.
-term1 | beyonce -shakira | Removes documents containing the negated term. An expression containing a - operator must include other terms that return documents.
field:"phrase" field:term | t:shakira | Searches the title field for the specified term. Text fields are t (title) and b (body). See the full list of fields below.
phrase~proximity | "beyonce shakira"~10 | Searches for words occurring within a specific proximity. The example returns matches when the terms are within 10 words of each other.
date:[start TO end] | pd:[20210101 TO 20210105] | Applies a range search to a date field. Two date fields are stored: the processed date (dd) and the date the document was apparently published (pd). The example searches on the published date. Dates are represented using the YYYYMMDD format.
keyfield:value | r:ie | Matches documents with a specific key field value. Key fields include r (region) and d (domain). It is frequently useful to combine key fields using an OR clause. For example, results can be restricted to UK or Irish publications using an OR clause on region (r:uk OR r:ie).
(clauseA) (clauseB) | (shakira OR beyonce) (d:bbc.com OR d:nytimes.com OR d:guardian.co.uk) | Allows the construction of a nested query or sub-query. Bracketed items are evaluated first. The example restricts the search to a small set of three publications by using the d (domain) key field.
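
Query strings using the syntax above can be assembled programmatically. The sketch below shows small Java helpers for a few of the forms in the table. The class and method names (QueryClauses, phrase, near, dateRange, anyOf) are hypothetical, and the helpers do not escape quotes or other special characters inside their arguments.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helpers for composing query strings; not part of the ZENARK API.
final class QueryClauses {

    // Exact phrase, e.g. "beyonce knowles"
    static String phrase(String text) {
        return "\"" + text + "\"";
    }

    // Proximity search, e.g. "beyonce shakira"~10
    static String near(String words, int distance) {
        return phrase(words) + "~" + distance;
    }

    // Range on a date field in YYYYMMDD format, e.g. pd:[20210101 TO 20210105]
    static String dateRange(String field, LocalDate start, LocalDate end) {
        DateTimeFormatter f = DateTimeFormatter.BASIC_ISO_DATE;
        return field + ":[" + start.format(f) + " TO " + end.format(f) + "]";
    }

    // Bracketed OR group over one key field, e.g. (r:uk OR r:ie)
    static String anyOf(String field, String... values) {
        return Arrays.stream(values)
                .map(v -> field + ":" + v)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        // Clauses separated by a space are combined with the default AND conjunction.
        String query = near("beyonce shakira", 10) + " "
                + dateRange("pd", LocalDate.of(2021, 1, 1), LocalDate.of(2021, 1, 5)) + " "
                + anyOf("r", "uk", "ie");
        System.out.println(query);
    }
}
```

The resulting string still needs to be URL-encoded before it is sent as the q parameter.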

The search syntax allows the selection of specific fields as part of a query. Fields generally fall into these groups:

  • Text fields, such as the title or body, contain multiple strings.
  • String fields contain a single searchable text item. Examples include the domain, region or document URL.
  • Date fields are similar to numeric fields since they naturally allow a specific date or a range.

Field | Full Name | Notes
t | title | The title.
d | page domain | The domain part of the URL (i.e. test.com and not www.test.com). See note (3) below.
r | region | The apparent region associated with the publisher of the page. See note (4) below.

The response object

In common with other REST-style applications, the API returns an HTTP status code indicating success or failure. To provide additional context, the response also contains a result field (see below) that gives a more detailed indication of the response content.

An error typically appears as follows. If more detailed information is available, it is also included in the JSON response.

    	{
    		"result":"error", 
    		"message":"missing search string"
    	} 

In the case of a normal response, the following is typically returned:

    {
    "size": "1320",
    "hits": "16997",
    "results": [
        {
        "ref": "BAIyRdGw5",
        "discoveredAt": "2021-05-16 18:32:15+0000",
        "source": {
            "sid": "2711",
            "domainid": "geo.tv",
            "name": "Geo News",
            "home": "https://www.geo.tv/",
            "region": "US"
            },
        "published": "2021-05-16",
        "title": "Seth Rogen hilariously recalls meeting Beyonce at Grammys",
        "summary": "Canadian-American actor Seth Rogen hilariously recalled how meeting Beyonce went wrong. The 39-year-old shared on E! News\u2019 Daily Pop how he felt \u201chumiliated\u201d during a failed encounter with the iconic "
        }
    ]
    }
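
Judging only from the sample response, discoveredAt appears to be a timestamp with a numeric UTC offset and published appears to be an ISO calendar date. Assuming those formats hold for every result, which the source does not confirm, the values could be parsed in Java roughly as follows. The class and method names are illustrative.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper. The formats are inferred from a single sample response.
final class ResultDates {

    // Matches values such as "2021-05-16 18:32:15+0000"
    private static final DateTimeFormatter DISCOVERED_AT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ssZ");

    static OffsetDateTime parseDiscoveredAt(String value) {
        return OffsetDateTime.parse(value, DISCOVERED_AT);
    }

    // Matches values such as "2021-05-16" (ISO local date)
    static LocalDate parsePublished(String value) {
        return LocalDate.parse(value);
    }

    public static void main(String[] args) {
        System.out.println(parseDiscoveredAt("2021-05-16 18:32:15+0000"));
        System.out.println(parsePublished("2021-05-16"));
    }
}
```

Both methods throw DateTimeParseException if a value does not match the assumed format, so callers may want to catch that exception rather than rely on the format being stable.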
    

Handling errors

We recommend using HTTP status codes in conjunction with the JSON response object to handle exception conditions correctly. The response object contains a result field with one of the following values:

  • "result":"ok" Indicates that the request was processed and the response contains a valid search response. This is always paired with an HTTP status code of 200.
  • "result":"authorization" The authorization result is returned with HTTP status 401, indicating an authorization problem. The detail field will indicate a precise cause of failure.
  • "result":"fail" The search query provided failed to parse, indicating a user-level error. The error response will include a detailed reason for the failure. The HTTP status will be 400.
  • "result":"error" An error response indicates a fundamental issue with the request, such as a missing query. The detail field will provide an exact cause. An error also returns HTTP status 400.

Note that HTTP status code 400 is ambiguous: it can indicate either a coding issue or a user-level error. Coding errors ("result":"error") generally relate to missing or poorly encoded parameter values. User-level errors ("result":"fail") include unexpected values such as a string for a date field, missing quotes or invalid field names.
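
The combination of HTTP status and result value described above can be turned into a small decision function. The following is a sketch only: the class, enum and method names are hypothetical, and the result field is extracted with a regular expression purely to keep the example free of dependencies. A real client would more likely use a JSON library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical classifier; not part of the ZENARK API.
final class ResponseClassifier {

    enum Outcome { OK, AUTHORIZATION, USER_ERROR, CODING_ERROR, UNKNOWN }

    // Simplification: finds the first "result":"..." pair in the body.
    private static final Pattern RESULT =
            Pattern.compile("\"result\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the value of the result field, or null if it is absent.
    static String resultField(String json) {
        Matcher m = RESULT.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    static Outcome classify(int httpStatus, String json) {
        String result = resultField(json == null ? "" : json);
        if (httpStatus == 200 && "ok".equals(result)) return Outcome.OK;
        if (httpStatus == 401) return Outcome.AUTHORIZATION;
        // 400 is ambiguous, so the result field decides between the two cases.
        if (httpStatus == 400 && "fail".equals(result)) return Outcome.USER_ERROR;
        if (httpStatus == 400 && "error".equals(result)) return Outcome.CODING_ERROR;
        return Outcome.UNKNOWN;
    }

    public static void main(String[] args) {
        String body = "{\"result\":\"error\", \"message\":\"missing search string\"}";
        System.out.println(classify(400, body));
    }
}
```

Anything that does not match a documented pairing falls through to UNKNOWN, so unexpected combinations are surfaced instead of being treated as success.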

HTTP status codes

code | cause | notes
200 | OK | The request was received and executed. The response will be a JSON object representing the search results. See "The response object" for the structure of a successful response.
400 | Bad Request | A problem parsing the request. A 400 code indicates that the URL parameters could not be processed. Commonly this is due to a missing or poorly encoded q parameter. Make sure that your URL query string is escaped correctly using URLEncoder or an equivalent.
400 | Parser error |
401 | Authorization error | Indicates that the required authorization header is missing or incorrectly formatted, or that the key has expired. The detail field will indicate a precise reason for failure.
503 | Service error | Indicates that a transient error occurred with the ZENARK service.
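
Because a 503 is described as transient, a client may choose to retry the request after a short delay. The source does not prescribe a retry strategy, so the sketch below is an assumption: it uses exponential backoff, and the class name, attempt limit and base delay are illustrative values rather than ZENARK recommendations.

```java
// Hypothetical retry policy; the limits below are assumptions, not documented values.
final class RetryPolicy {

    static final int MAX_ATTEMPTS = 4;      // assumed upper bound on attempts
    static final long BASE_DELAY_MS = 500;  // assumed delay before the first retry

    // attempt is 1-based: the number of requests already made.
    static boolean shouldRetry(int httpStatus, int attempt) {
        return httpStatus == 503 && attempt < MAX_ATTEMPTS;
    }

    // Doubles the delay after each failed attempt: 500, 1000, 2000 ms.
    static long delayMillis(int attempt) {
        return BASE_DELAY_MS << (attempt - 1);
    }

    public static void main(String[] args) {
        for (int attempt = 1; shouldRetry(503, attempt); attempt++) {
            System.out.println("retry after " + delayMillis(attempt) + " ms");
        }
    }
}
```

Only 503 is retried here. The 400 and 401 codes indicate problems with the request itself, so repeating the same request unchanged is unlikely to help.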