GETTING STARTED

The Zenark News API is a set of tools that allow application developers to integrate news search functionality within their product. The API provides realtime access to the Zenark news search engine.

The Zenark search engine is a proven technology that allows searching of realtime and historical articles. Articles include news, blogs, press releases and other items published online.

Typically this type of data is described as unstructured. The Zenark API transforms it into structured data that can be accessed via a versatile search query language. The Zenark News API is based on the familiar HTTP REST standard, allowing easy integration with leading programming languages and tools.

The basic search syntax is similar to that used with other search engines. Advanced features such as filtering, sorting and formatting help improve the accuracy and relevance of results.

Before you start you will need the following:

  • a Zenark-issued API key
  • an HTTP client to make requests
  • a JSON (JavaScript Object Notation) library to decode responses

NOTE A default key is sent in the welcome email. You can also access the console to manage API keys.

For clarity, this product is intended for developers with a reasonable proficiency with web 2.0 application development.

MAKING A SEARCH REQUEST

A search is submitted to the Zenark API endpoint by using a standard HTTPS request. The HTTP query string MUST include the x (expression) parameter with an associated search expression.

To execute a search, perform these two steps: (a) set the HTTP authorization header and (b) submit the query with a value for the x parameter.

(a) set the header value to include your API key

authorization: apikey your_api_key

(b) use the https endpoint to perform the query

https://api.zenark.com/v1/search?x=beyonce

You will find a default API key in the initial welcome email sent by Zenark. Keys can be generated and managed using the API console.
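
The two steps above can be sketched in Java using the standard java.net.http API (Java 11 or later). This is a minimal illustration rather than an official client: the class and method names are hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SearchRequestSketch {

    // Endpoint as given in the text above.
    static final String ENDPOINT = "https://api.zenark.com/v1/search";

    /** Builds (but does not send) a GET request for an already URL-encoded expression. */
    static HttpRequest build(String apiKey, String encodedExpression) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT + "?x=" + encodedExpression))
                // step (a): the authorization header carries the API key
                .header("authorization", "apikey " + apiKey)
                // step (b): the query is submitted as a plain GET
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("your_api_key", "beyonce");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("authorization").orElse(""));
    }
}
```

The request can then be passed to an HttpClient of your choice to perform the call.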

SUBMIT SEARCH USING HTTP PARAMETERS

The basic query string parameter ?x= specifies the search expression. This is the only mandatory parameter. The other optional parameters enable filtering, data formatting and sorting features. These optional parameters are described below.

The expression should be encoded using standard URL encoding (percent). Pay particular attention to the encoding of quotes and spaces that may be used as part of the query.

A search expression (or query) is broken up into terms and operators. A term may be a single word such as "beyonce" or a phrase. A phrase is a group of words surrounded by double quotes such as "beyonce knowles".

By default, terms and phrases use the AND conjunction.

"beyonce knowles" shakira

The encoded HTTP query for this search is:

?x=%22beyonce%20knowles%22%20shakira

This query searches for documents containing both the phrase "Beyonce Knowles" and the term shakira.
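
As a rough sketch of the encoding step, the Java helper below percent-encodes an expression so that quotes become %22 and spaces become %20, as in the example above. The class and method names are hypothetical. It relies on java.net.URLEncoder (with the Charset overload available from Java 10), which produces "+" for spaces, so those are rewritten to %20.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ExpressionEncoder {

    /** Percent-encodes a search expression for use as the value of the x parameter. */
    static String encode(String expression) {
        // URLEncoder emits '+' for spaces (form encoding). A literal '+' in the
        // input is already escaped as %2B, so this replacement only touches spaces.
        return URLEncoder.encode(expression, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // prints ?x=%22beyonce%20knowles%22%20shakira
        System.out.println("?x=" + encode("\"beyonce knowles\" shakira"));
    }
}
```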

It is possible to add field names to the expression to filter searches beyond the options available as HTTP query parameters. For example, we can search the title for the phrase "beyonce knowles":

t:"beyonce knowles"

See below for more examples of filtering using field names.

SEARCH SYNTAX

By default, a search will:

  • search the text fields in a document which are the title (t) and body (b)
  • use a default AND conjunction for the query
  • sort the results according to the "discovery date" with newer items at the top of the result set
SUPPORTED SYNTAX

    | Syntax | Example | Notes |
    |---|---|---|
    | term | beyonce | Matches documents containing the single term in the default text fields. The default text fields are t (title) and b (body). |
    | "term1 term2" | "beyonce knowles" | A phrase is a combination of terms enclosed in double quotes. This will only match documents containing the exact phrase. |
    | term1 term2 | beyonce shakira | Returned documents MUST contain both terms in the text fields. Terms are combined with AND by default. |
    | term1 OR term2 | beyonce OR shakira | Returns documents containing any of the specified terms. |
    | -term1 | beyonce -shakira | Removes documents containing the negated term. An expression containing a - operator must include other terms that return documents. |
    | field:"phrase" field:term | t:shakira | Searches the title field for the specified term. Text fields are t (title) and b (body). See the full list of fields below. |
    | phrase~proximity | "beyonce shakira"~10 | Searches for words occurring within a specific proximity. The example returns matches when the terms are within 10 words of each other. |
    | pd:[start TO end] | pd:[20210101 TO 20210105] dd:[20210101 TO 20210105] | A range search can be applied to date fields. We store two distinct dates: the date the article was first discovered by our crawlers (discovery date, dd) and the date the item was "apparently" published (published date, pd). Dates are represented using the YYYYMMDD format. Date fields are described in detail below. |
    | keyfield:value | r:ie | Matches documents with a specific key field value. Key fields include r (region) and d (domain). It is frequently useful to combine key fields using an OR clause. For example, results can be restricted to French or Irish publications with an OR clause on region (r:fr OR r:ie). |
    | (clauseA) (clauseB) | (shakira OR beyonce) (d:bbc.com OR d:nytimes.com OR d:guardian.co.uk) | Allows the construction of a nested query or sub-query. Bracketed items are evaluated first. The example searches a small set of three publications by using the d (domain) key field. |
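
To illustrate how the field, phrase and OR forms in the table combine, here is a small Java sketch that assembles an expression string. The helper names are hypothetical and the code only builds text; it does not validate field names or escape quotes inside values.

```java
import java.util.List;
import java.util.stream.Collectors;

class ExpressionBuilder {

    /** Renders field:term, wrapping the value in double quotes when it contains whitespace. */
    static String field(String field, String value) {
        boolean isPhrase = value.chars().anyMatch(Character::isWhitespace);
        return field + ":" + (isPhrase ? "\"" + value + "\"" : value);
    }

    /** Joins field:value clauses with OR inside brackets, e.g. (r:fr OR r:ie). */
    static String anyOf(String field, List<String> values) {
        return values.stream()
                .map(v -> field(field, v))
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        // Clauses separated by a space are combined with the default AND conjunction.
        String expression = field("t", "beyonce knowles") + " " + anyOf("r", List.of("fr", "ie"));
        System.out.println(expression);
    }
}
```

The resulting expression still needs to be URL-encoded before it is sent as the x parameter.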

    The search syntax allows the selection of specific fields as part of a query. Fields fall into two types:

  • Text fields that contain one or more words. These are the title and body, and they are the default search fields.
  • Fields that contain a single term such as the date published, region, website domain or document URL. These can be further divided into text and date types.

    DATE FIELDS

    Each article has two associated dates:

    • published (pd) the date our algorithm has determined as the probable publication date.
    • discovered (dd) the timestamp indicating when our software first identified this item.

    The fields dd: and pd: can be used in an expression to filter by date.

    adele pd:[20200101 TO 20220731]
    adele dd:[20220101 TO 20220731]

    Date discovered is indexed with 1 minute accuracy, so it's possible to search for 15 minutes of fame:

    "andy warhol" dd:[202201011205 TO 202201011220]

    Although date discovered is indexed to 1 minute accuracy, it is stored with millisecond accuracy (using UTC) whereas our published date is accurate to the day (YYYY-MM-DD format).
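
The date range clauses shown above can be generated from java.time values. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that minute-level values are expressed in UTC, which the text states for stored timestamps but not explicitly for search input.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class DateRangeClause {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Day-level range for pd or dd, e.g. pd:[20200101 TO 20220731]. */
    static String days(String field, LocalDate start, LocalDate end) {
        return field + ":[" + DAY.format(start) + " TO " + DAY.format(end) + "]";
    }

    /** Minute-level range, shown in the text for dd only. Values are assumed to be UTC. */
    static String minutes(String field, LocalDateTime start, LocalDateTime end) {
        return field + ":[" + MINUTE.format(start) + " TO " + MINUTE.format(end) + "]";
    }

    public static void main(String[] args) {
        System.out.println("adele " + days("pd", LocalDate.of(2020, 1, 1), LocalDate.of(2022, 7, 31)));
        System.out.println("\"andy warhol\" " + minutes("dd",
                LocalDateTime.of(2022, 1, 1, 12, 5), LocalDateTime.of(2022, 1, 1, 12, 20)));
    }
}
```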

    No guarantee can be given to the accuracy of either date. Although the date published and date discovered should correspond to the article date, exceptions occasionally occur.

    Date published is resolved using a best effort algorithm. Errors in the published date are possible due to issues such as timezones and formatting. Date discovered can be more accurate but is not without issues. The extracted dates tend to be more accurate for larger publications.

    THE RESPONSE OBJECT

    The expected response to a query is an HTTP 200 code and a JSON object. A typical response (edited for display) is shown below:

    
    {
        "result": "ok",
        "identifier": "xxxxxxxxx",
        "offset": 450,
        "totalHits": 44749,
        "availableHits": 2500,
        "pageHits": 50,    
        "results": [
            {
                "summary": "The long wait for a new Beyonce project will soon come to an end, with the singer officially posting a July 29 date (with pre-save and pre-buy options) for a project titled Renaissance.",
                "offset": 450,
                "id": 1133903645565,
                "published": "2022-06-16",
                "source": {
                    "name": "Variety",
                    "region": "US",
                    "domainid": "variety.com",
                    "sid": 4183,
                    "home": "https://variety.com/"
                },
                "discoveredAt": "2022-06-16T10:16:18.940Z",
                "title": "Beyonce Reveals Renaissance, Forthcoming Project Set for July 29 Release",
                "url": "https://zenark.com/z/01QGu3WUD767cunVQ1nuS0139SPePZ7HpkDoY6ByYbj45B5u8SNoGPoisAwDKTyszkE1YZui43T6I6CK0FKvhA29xpfuH5HuZBaTWFIx8RXhEsdqWUS6dFsvfN00165Jg"
            },
            {
                "summary": "Crazy In Love singer Beyonce Knowles was spotted checking into a hotel in the Italian village of Portofino, on the same weekend Kourtney Kardashian ties the knot with Travis Barker at the luxury resort",
                "offset": 451,
                "id": 1130448846062,
                "published": "2022-05-22",
                "source": {
                    "name": "Daily Mirror",
                    "region": "GB",
                    "domainid": "mirror.co.uk",
                    "sid": 81,
                    "home": "http://www.mirror.co.uk/"
                },
                "discoveredAt": "2022-05-22T10:28:48.114Z",
                "title": "Beyonce arrives at Kourtney Kardashian wedding destination - after 'feud with Kim'",
                "url": "https://zenark.com/z/11Q9E2yRZK9Q14Fpj9CfqaLbDEb9QDWHuFNSm7DwBHQ1t5zCnPNEHqCM85V1CVBr7VDOknOYd9D3K2R71DWKnRf2tSsnPm03vhKS2HbzCAf2GsxZKXeE31sIEy000B0bD"
            }
        ]
    }
    

    The response object is presented in JSON format and divided into three distinct sections:

    • the header containing the page and hit counts and an identifier
    • a JSON array containing result items
    • an optional JSON array of "beacon" offsets useful for pagination (not shown)

    RESPONSE OBJECT FIELDS

    This table outlines the items expected in a response object. Some fields may not be available in all subscriptions.

    | Field | Notes |
    |---|---|
    | result | A string indicating the outcome of the query. Successful queries have the result "ok", whereas failures are flagged with "error". Error conditions are described in more detail below. |
    | availableHits | The size of the result set. This value depends on the limit parameter, the subscription type and the pagesize. Generally the limit value provides 20 pages of data (by default this will be 25 * 20). In turn, this value determines the maximum number of beacons available for pagination. |
    | totalHits | The total number of matching documents found. This may exceed the availableHits value. As well as the search expression, totalHits is calculated on the basis of your subscription and the filters applied. |
    | pageHits | The number of items in the current page. This corresponds to the size of the results array. |
    | offset | The offset in availableHits of the first item in this page. |
    | results[] | An array containing result items. |
    | results > id | A unique numeric (8 byte) identifier for this item. |
    | results > url | A unique link to the page, associated with your API key. |
    | results > publishedAt | The date our algorithms have determined as the probable date of publication, in the ISO format YYYY-MM-DD. (The sample response above shows this field as published.) |
    | results > discoveredAt | The timestamp reflecting the first discovery of this article by the Zenark platform. This may or may not correspond to the published date. The timestamp is presented in extended ISO format and is accurate to a millisecond. Note that the indexed value of date discovered is truncated to minute accuracy (i.e. you cannot search with millisecond or even second accuracy). |
    | results > offset | The offset of this item in the overall result set. |
    | results > title | The title that our algorithm suggests for the article. This is normally accurate. |
    | results > summary | A short excerpt of the article. Depending on the summaryFormat parameter, this is either the first few words of the article or a context-sensitive snippet with matching terms marked. |
    | results > source > domainid | The domain part of the publisher URL, most commonly the web address without the "www." part. The domain id is the persistent value used to identify a publisher. For example, the Zenark platform considers bbc.co.uk and bbc.com as separate publishers. |
    | results > source > sid | A unique numeric identifier for the publication. The relationship between sid and domainid is immutable. |
    | results > source > name | The name of the publication. |
    | results > source > home | The publication home page. |
    | results > source > region | The region primarily associated with the publication. |
    | beacons > page | A beacon page number. |
    | beacons > skipTo | The id of the first document on the associated page. |
    | beacons > offset | The offset of the first item on the associated page. |

    Note that totalHits >= availableHits >= pageHits.

    For small result sets (with a single page) totalHits == availableHits == pageHits == results.length

    A combination of the pagesize and limit parameters will dictate the value of availableHits and the number of beacons for pagination (see below).
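
As a sanity check on a decoded response, the relationships above can be expressed as a small Java helper. This is only a sketch: the class is hypothetical and takes the header counts as plain integers, leaving JSON decoding to whichever library you use.

```java
class ResponseHeaderCheck {

    final int totalHits;
    final int availableHits;
    final int pageHits;

    ResponseHeaderCheck(int totalHits, int availableHits, int pageHits) {
        this.totalHits = totalHits;
        this.availableHits = availableHits;
        this.pageHits = pageHits;
    }

    /** True when totalHits >= availableHits >= pageHits and pageHits matches the results array size. */
    boolean isConsistent(int resultsLength) {
        return totalHits >= availableHits
                && availableHits >= pageHits
                && pageHits == resultsLength;
    }

    /** True when the whole result set fits in one page, so all four values are equal. */
    boolean isSinglePage(int resultsLength) {
        return totalHits == availableHits
                && availableHits == pageHits
                && pageHits == resultsLength;
    }

    public static void main(String[] args) {
        // Counts taken from the sample response; 50 is the full (unedited) page size.
        ResponseHeaderCheck header = new ResponseHeaderCheck(44749, 2500, 50);
        System.out.println(header.isConsistent(50));
        System.out.println(header.isSinglePage(50));
    }
}
```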

    OPEN A NEWS ITEM

    An item can be opened using the results url item. This uses the static https://zenark.com/z/ endpoint with a unique reference. These references are valid for an undefined period and should not be stored by applications. The article id provides a method for persisting a reference to an article.

    https://zenark.com/z/[results reference]

    A unique reference is generated for each item in the result set. This allows the generation of metrics and reports on your clients' usage of Zenark API based applications.

    You can track URL events in the console. This feature may not be available in all subscriptions.

    NOTE You should not resolve the URL in your server side code.

    BEACONS

    Beacons are the preferred method to navigate results using a pagination style mechanism. Although it may appear complex, beacons allow a point-in-time snapshot of results avoiding many issues with the dynamic nature of search results.

    Beacons are enabled by specifying the parameter beacons=true.

    Beacons are page offsets generated at a specific point in time. They resist issues that could occur when the underlying index is modified as new results are added. To forward to a specific reference indicated by a beacon, use the skipTo=<id> query parameter described below.
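
A minimal Java sketch of beacon-based paging follows. The Beacon holder mirrors the three beacon fields listed in the response table (page, skipTo, offset); the class names are hypothetical, and re-sending beacons=true on follow-up requests is an assumption rather than a documented requirement.

```java
class BeaconPaging {

    /** Minimal holder for one entry of the beacons array. */
    static final class Beacon {
        final int page;     // beacon page number
        final long skipTo;  // id of the first document on that page
        final int offset;   // offset of the first item on that page

        Beacon(int page, long skipTo, int offset) {
            this.page = page;
            this.skipTo = skipTo;
            this.offset = offset;
        }
    }

    /** Builds the URL that jumps to the page referenced by the beacon. */
    static String pageUrl(String encodedExpression, Beacon beacon) {
        return "https://api.zenark.com/v1/search?x=" + encodedExpression
                + "&beacons=true"               // assumption: keep beacons enabled while paging
                + "&skipTo=" + beacon.skipTo;
    }

    public static void main(String[] args) {
        Beacon third = new Beacon(3, 2435610461664222L, 50);
        System.out.println(pageUrl("adele", third));
    }
}
```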

    FILTERS AND OTHER OPTIONS

    NOTE A query can be constructed by simply passing in a value for the mandatory x parameter. Programmers can choose to use HTTP query string parameters to filter searches or allow users to directly filter by using query modifiers.

    | HTTP parameter | Usage | Notes | Examples |
    |---|---|---|---|
    | x | The mandatory search expression or query | The search query as a URL-encoded value. This is the only mandatory parameter. Specific fields may be searched using the field:value syntax. | ?x=beyonce ?x=%22beyonce%20knowles%22 ?x=t%3Abeyonce ?x=d%3Anytimes.com |
    | date | Specify the date range for the search | Filters by date. By default, this filters both the pd: and dd: fields. Dates are expected in YYYY-MM-DD format (for example 2023-12-31). A single date value creates a range extending to today. | ?x=rihanna&date=2022-01-01 ?x=rihanna&date=2020-01-01,2022-12-31 ?x=rihanna&date=2020-01-01&date=2022-12-31 |
    | dateFilter | Specify the field used for date filtering | Permitted values are discovered, published or both. By default, both date fields are filtered. | ?x=adele&date=2022-01-01&dateFilter=published |
    | pagesize | The page size to use for the query | The default page size is 25. The maximum permitted value is 100 (developer tier) or 500 (others). Adjusting the page size increases the "cost" of the query as described below. | ?x=adele&pagesize=100 |
    | sortOrder | Allows different ordering of results | Permitted values are sortDefault and sortPublished. By default, sortDefault orders results by the date discovered. | ?x=adele&sortOrder=sortPublished |
    | summaryFormat | Select the format of the summary output | Two options are permitted. The default ("firstlines") shows the first few words of the article text. The option "context" shows a snippet of content containing the matched text. The matching text is wrapped in a span tag to permit CSS formatting. The default context class name of zenark may be changed by specifying the htmlClassname parameter. This option is not available on the developer tier. | ?x=adele&summaryFormat=context ?x=adele&summaryFormat=context&htmlClassname=mycompanystyle |
    | skipTo | Reposition the starting position to this document id | Used for pagination. The id must exist, otherwise the result set starts from the first item. Beacons provided in the response object show the document id at specific page breaks. See the BEACONS section. | ?x=adele&skipTo=2435610461664222 |
    | region | Filter by region(s) | A list of one or more comma-separated region codes. | ?x=cher&region=gb ?x=cher&region=gb&region=ie ?x=cher&region=gb,ie |

    Note that some items may be unavailable or restricted depending on subscription type.

    The date used for filtering can be selected using dateFilter. By default both dates are used when no filter is specified.

    It is possible to include date filters as part of the HTTP query using the date parameter

    ?x=rihanna&date=2020-01-01,2022-12-31

    is equivalent to this:

    rihanna pd:[20200101 TO 20221231] dd:[20200101 TO 20221231]

    since expression elements are joined automatically using an AND clause and the default date parameter combines both date discovered and date published.

    Using date filters as part of the query string allows applications to restrict and ensure correctness of date inputs. For example, an application can use a pop-up calendar object to select a date range. Other fields can be treated in a similar fashion by application developers.
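
For instance, an application that collects dates from a calendar widget might turn them into the date parameter along these lines. This is a hedged sketch with hypothetical names; it relies on LocalDate.toString() producing the YYYY-MM-DD form the parameter expects.

```java
import java.time.LocalDate;

class DateParameter {

    /**
     * Renders "&date=start,end", optionally followed by "&dateFilter=...".
     * Pass null as dateFilter to keep the default (both date fields).
     */
    static String range(LocalDate start, LocalDate end, String dateFilter) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end date is before start date");
        }
        // LocalDate.toString() yields ISO-8601 yyyy-MM-dd.
        String query = "&date=" + start + "," + end;
        return dateFilter == null ? query : query + "&dateFilter=" + dateFilter;
    }

    public static void main(String[] args) {
        String query = "?x=rihanna" + range(LocalDate.of(2020, 1, 1), LocalDate.of(2022, 12, 31), null);
        System.out.println(query);
    }
}
```

Because the inputs are typed dates rather than free text, malformed or reversed ranges are rejected before a request is made.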

    PASSING MULTIPLE VALUES

    A number of parameters may accept multiple values. These include the date and region parameters.

    Two options are permitted to submit multiple parameter values: a single comma-separated list, or the parameter repeated once per value.

    &region=IE,US,NL,DE
    &region=IE&region=US&region=NL&region=DE
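
Both styles can be produced from the same list of values. The Java helper below is a small sketch with hypothetical names; it assumes the values (such as region codes) need no URL encoding.

```java
import java.util.List;
import java.util.stream.Collectors;

class MultiValueParameter {

    /** Renders a single parameter with comma-separated values, e.g. &region=IE,US. */
    static String commaSeparated(String name, List<String> values) {
        return "&" + name + "=" + String.join(",", values);
    }

    /** Renders the parameter once per value, e.g. &region=IE&region=US. */
    static String repeated(String name, List<String> values) {
        return values.stream()
                .map(v -> "&" + name + "=" + v)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> regions = List.of("IE", "US", "NL", "DE");
        System.out.println(commaSeparated("region", regions));
        System.out.println(repeated("region", regions));
    }
}
```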

    USING FIELDS

    Most fields are both indexed (searchable) and stored (are included in a response object).

    Here is a simple example of using the title field to only search items with a term in the title.

    t:beyonce

    Here is a list of all fieldnames.

    | Field | Name | Notes |
    |---|---|---|
    | id | identifier | A unique immutable Zenark identifier for the article. |
    | t | title | The apparent title of the article. |
    | s | summary | A generated summary for the item. |
    | b | body | The apparent body text of the article. |
    | pd | published date | The published date. |
    | dd | discovered date | The discovered date. |
    | sid | source id | A unique identifier for the source. |
    | d | domain | The domain part of the publisher URL, which identifies the source. |
    | r | region | The region associated with the source. |
    | h | home | The home page for the source. |
    | sn | source name | The name of the publication. |

    Indexed fields may be used in a query. Stored fields can be returned as part of the response object.

    By default, a search automatically includes both text fields (title and body).

    The fields domain, sid and home are linked. If a site changes the domain part of its address, we will discontinue the previous site and create a new site using the new domainid. This will result in a new domain, a new sid and a new home page. The source name (sn) field may remain the same.

    ERROR CONDITIONS AND CODES
    HTTP RESPONSE CODES

    Similar to other web applications, success is indicated with the HTTP 200 code and the JSON "result" field. A response of HTTP 200 with a result of "ok" indicates that the request was processed without error. A failed request will include a non-200 HTTP response code and the result field will contain the value "error".

    Errors are generally indicated with codes such as 400, 401 and 500. This table shows the expected HTTP response codes along with a "cause" field.

    The response object associated with an error condition is in JSON format and contains field values that can identify the root cause of the error.

    | Code/result | Cause | Notes |
    |---|---|---|
    | 200/ok | N/A | The request was received and executed. The response will be a JSON object representing the search results. See above for the structure of a successful response object. |
    | 400/error | nox | The query is missing the mandatory x parameter. |
    | 400/error | options | An option passed in the HTTP query string contains an error. |
    | 400/error | user | A user error is flagged. This generally indicates a malformed search query. Although a user error such as a missing quote or incorrect field name may be to blame, this error can also be associated with incorrect URL encoding of the x parameter in the request. |
    | 401/error | auth | The request has failed authentication. This may be due to an invalid API key or a deactivated account. The error object will contain an exact cause. |
    | 403/error | usage | The account has reached the daily quota of requests. |
    | 429/error | rate | The account is exceeding the requests per second limit. |
    | 500/error | platform | A transient error occurred with the Zenark service during the execution of a search. |
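
One way a client might act on these codes is sketched below in Java. The grouping is an assumption and not part of the API: it treats 429 and 500 as worth retrying after a pause (rate limit and transient platform error), 400 as a request to fix, and 401/403 as account issues. The enum and class names are hypothetical.

```java
class ErrorHandlingSketch {

    enum Action { USE_RESULTS, RETRY_LATER, FIX_REQUEST, CHECK_ACCOUNT, UNEXPECTED }

    /** Maps an HTTP status code from the table above to a suggested client action. */
    static Action classify(int httpStatus) {
        switch (httpStatus) {
            case 200:
                return Action.USE_RESULTS;
            case 400: // nox, options or user: the request itself is wrong
                return Action.FIX_REQUEST;
            case 401: // auth: invalid key or deactivated account
            case 403: // usage: daily quota reached
                return Action.CHECK_ACCOUNT;
            case 429: // rate: too many requests per second
            case 500: // platform: transient service error
                return Action.RETRY_LATER;
            default:
                return Action.UNEXPECTED;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(429));
    }
}
```
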
    EXAMPLE CODE
    JAVA

    The code below contacts the https://api.zenark.com/v1/search endpoint and submits a simple query. It uses the org.json library to decode the response.

    
    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;
    import javax.net.ssl.HttpsURLConnection;
    import org.json.JSONObject;
    import org.json.JSONTokener;

    // the class name is illustrative only
    public class ZenarkSearchExample {

        public static void main(String[] args) throws Exception {

            // the zenark search endpoint
            String ENDPOINT = "https://api.zenark.com/v1/search";

            // our generic test query
            String rawQuery = "t:beyonce t:shakira";

            // encode the query string
            String encodedQuery = URLEncoder.encode(rawQuery, "UTF-8");

            // construct our URL which only contains the mandatory x parameter
            URL url = new URL(ENDPOINT + "?x=" + encodedQuery);

            HttpsURLConnection httpURLConnection
                    = (HttpsURLConnection) url.openConnection();

            // set our authorisation header using the appropriate API key
            httpURLConnection.setRequestProperty("authorization", "ApiKey zN0TaRealKey");

            // the HTTP status code
            int httpResponseCode = httpURLConnection.getResponseCode();

            // status == 200
            if (httpResponseCode == HttpURLConnection.HTTP_OK) {

                // read the JSON response object from the connection's input stream as UTF-8
                try (InputStream is = httpURLConnection.getInputStream();
                     BufferedReader bufferedReader = new BufferedReader(
                             new InputStreamReader(is, StandardCharsets.UTF_8))) {

                    // now use our buffered reader to create the JSONObject
                    // we are using JSON from org.json.JSONObject
                    JSONTokener tokener = new JSONTokener(bufferedReader);
                    JSONObject json = new JSONObject(tokener);
                }

            } else {

                // handle unexpected response

            }
        }
    }