Install Solr Service


curl http://localhost:8983/solr/demo/update -H 'Content-Type: application/json' -d '
[
 {"id" : "book1",
  "title_t" : "The Way of Kings",
  "author_s" : "Brandon Sanderson"
 }
]'


curl 'http://localhost:8983/solr/demo/get?id=book1'

{
  "doc": {
    "id" : "book1",
    "author_s": "Brandon Sanderson",
    "title_t" : "The Way of Kings",
    "_version_": 1410390803582287872
  }
}

Of course, for queries you can always just use your browser: click on a request link, or cut and paste the URL into the address bar and modify the query directly to try out different requests.



Dynamic Fields

The “id” field is already pre-defined in every schema. Lucene and Solr need to know the types of fields so that they can be indexed in the correct way.

There are a number of options for defining new fields:

  • Edit the schema.xml file to define the fields.
  • Use the Schema API to add the new fields.
  • Use dynamicFields, a form of convention-over-configuration that maps field names to field types based on patterns in the field name. For example, every field ending in “_i” is taken to be an integer.
  • Use “schemaless” mode, where field types are auto-detected (guessed) based on the first value seen for that field

For other document fields in this tutorial, we have chosen to use convention over configuration via dynamic fields.

Dynamic fields include the essential benefit of schemaless mode, namely the ability to add new fields on the fly without having to pre-define them.

Our schema has some common dynamicField patterns defined for use:

Field Suffix  Multivalued Suffix  Type          Description
_t            _txt                text_general  Indexed for full-text search so individual words or phrases may be matched.
_s            _ss                 string        A string value is indexed as a single unit. This is good for sorting, faceting, and analytics. It’s not good for full-text search.
_i            _is                 int           A 32-bit signed integer
_l            _ls                 long          A 64-bit signed long
_f            _fs                 float         IEEE 32-bit floating point number (single precision)
_d            _ds                 double        IEEE 64-bit floating point number (double precision)
_b            _bs                 boolean       true or false
_dt           _dts                date          A date in Solr’s date format
_p            (none)              location      A latitude and longitude pair for geo-spatial search
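
To make the suffix convention concrete, here is a small Python sketch that mirrors the table above. It only illustrates the naming convention and is not how Solr resolves dynamicField patterns internally; the function name guess_field_type and the lookup dictionary are hypothetical.

```python
# Illustrative only: mirrors the dynamicField suffix table above.
# Solr resolves these patterns from its schema, not from this dictionary.
SUFFIX_TYPES = {
    "_t": "text_general", "_txt": "text_general",
    "_s": "string", "_ss": "string",
    "_i": "int", "_is": "int",
    "_l": "long", "_ls": "long",
    "_f": "float", "_fs": "float",
    "_d": "double", "_ds": "double",
    "_b": "boolean", "_bs": "boolean",
    "_dt": "date", "_dts": "date",
    "_p": "location",
}


def guess_field_type(field_name):
    """Return the type implied by the field's suffix, or None if nothing matches."""
    # Try longer suffixes first so that "_dts" is checked before "_dt".
    for suffix in sorted(SUFFIX_TYPES, key=len, reverse=True):
        if field_name.endswith(suffix):
            return SUFFIX_TYPES[suffix]
    return None


for name in ("title_t", "cat_s", "pubyear_i", "id"):
    print(name, "->", guess_field_type(name))
```

A name such as "id" matches no suffix here, which is consistent with it being a pre-defined field rather than a dynamic one.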


Let’s update book1 and add cat_s, a category field, a publication year, and an ISBN. Via dynamic fields, a field name ending with _i tells Solr to treat the value as an integer, while a field name ending with _s is treated as a string.

$ curl http://localhost:8983/solr/demo/update -H 'Content-Type: application/json' -d '
[
 {"id"         : "book1",
  "cat_s"      : { "add" : "fantasy" },
  "pubyear_i"  : { "add" : 2010 },
  "ISBN_s"     : { "add" : "978-0-7653-2635-5" }
 }
]'

Now go ahead and ask for the document back, and you should see the new fields:

curl 'http://localhost:8983/solr/demo/get?id=book1'

See Atomic Updates for more document update options.
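
If you build updates from a program rather than by hand, the same atomic update body can be assembled as follows. This is a minimal sketch that only produces the JSON and does not contact Solr; the helper name atomic_add is hypothetical, and it covers just the "add" modifier used in this tutorial.

```python
import json


def atomic_add(doc_id, **fields):
    """Build an atomic-update document using the "add" modifier for each field."""
    doc = {"id": doc_id}
    for name, value in fields.items():
        doc[name] = {"add": value}
    return doc


# Same update as the curl example: category, publication year, and ISBN.
update = [atomic_add("book1", cat_s="fantasy", pubyear_i=2010,
                     ISBN_s="978-0-7653-2635-5")]
body = json.dumps(update, indent=1)
print(body)  # this is the body you would POST to /solr/demo/update
```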


First, let’s add a few more documents so we have something to search for. This time we’ll demonstrate indexing documents in CSV (comma separated values) format:

curl 'http://localhost:8983/solr/demo/update?commitWithin=5000' -H 'Content-type:text/csv' -d '
id,cat_s,pubyear_i,title_t,author_s,series_s,sequence_i,publisher_s
book1,fantasy,2010,The Way of Kings,Brandon Sanderson,The Stormlight Archive,1,Tor
book2,fantasy,1996,A Game of Thrones,George R.R. Martin,A Song of Ice and Fire,1,Bantam
book3,fantasy,1999,A Clash of Kings,George R.R. Martin,A Song of Ice and Fire,2,Bantam
book4,sci-fi,1951,Foundation,Isaac Asimov,Foundation Series,1,Bantam
book5,sci-fi,1952,Foundation and Empire,Isaac Asimov,Foundation Series,2,Bantam
book6,sci-fi,1992,Snow Crash,Neal Stephenson,Snow Crash,,Bantam
book7,sci-fi,1984,Neuromancer,William Gibson,Sprawl trilogy,1,Ace
book8,fantasy,1985,The Black Company,Glen Cook,The Black Company,1,Tor
book9,fantasy,1965,The Black Cauldron,Lloyd Alexander,The Chronicles of Prydain,2,Square Fish
book10,fantasy,2001,American Gods,Neil Gaiman,,,Harper'

We added the commitWithin=5000 parameter to indicate that we would like our updates to be visible within 5000 milliseconds (5 seconds). The Lucene library that Solr uses for full-text search works off of point-in-time snapshots that must be periodically refreshed in order for queries to see new changes.

Note that although we often use JSON in our examples, Solr is actually data format agnostic – you’re not artificially tied to any particular transfer-syntax or serialization format such as JSON or XML.

Your First Solr Search Request

Now let’s query our book collection! For example, we can find all books with “black” in the title field:

curl 'http://localhost:8983/solr/demo/query?q=title_t:black&fl=author_s,title_t'

The fl parameter stands for “field list” and specifies what stored fields should be returned from documents matching the query. We should see a result like the following:

[...]
    {
      "title_t":"The Black Company",
      "author_s":"Glen Cook"},
    {
      "title_t":"The Black Cauldron",
      "author_s":"Lloyd Alexander"}]

See Solr Query for more Solr query examples and syntax.

Solr Search Request in JSON

If you prefer using JSON to search the index, you can use the JSON Request API:

curl http://localhost:8983/solr/demo/query -H 'Content-Type: application/json' -d '
{
  "query" : "title_t:black",
  "fields" : ["title_t", "author_s"]
}'

Sorting and Paging Search Results

By default, Solr will return the top 10 documents ordered by highest score (relevance) first. Let’s change things up and return the top 3 search results, limiting them to books published by Bantam, and sorting by publication year descending:

curl http://localhost:8983/solr/demo/query -d '
q=*:*&fq=publisher_s:Bantam&rows=3&
sort=pubyear_i desc&
fl=title_t'

And we get the response as requested:

[...]
      {"title_t":["A Clash of Kings"]},
      {"title_t":["A Game of Thrones"]},
      {"title_t":["Snow Crash"]}]

Parameter Explanation:

  • q=*:* – The *:* query matches all documents in the index.
  • fq=publisher_s:Bantam – “fq” parameters are filter queries, which don’t affect scoring, but filter out documents that don’t match the given query. These are cached separately and reused across different requests, greatly accelerating throughput. See Advanced Filter Caching in Solr for more details.
  • sort=pubyear_i desc – This sorts on the “pubyear_i” field descending. Solr has many advanced sorting options such as tie-break sorts and sorting by a function of document fields!
  • rows=3 – “rows” specifies the number of results to return, while “start” specifies an offset into the sorted list for paging purposes. Also see Deep Paging for options to efficiently page deeply into result sets.
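
As a sketch of how these parameters combine for paging, the helper below builds the parameters for a given page and also expresses them with the JSON Request API key names (query, filter, sort, limit, offset). The names page_params and to_json_request, and the 1-based page numbering, are assumptions made for illustration; nothing here sends a request to Solr.

```python
from urllib.parse import urlencode


def page_params(page, rows=3):
    """Return Solr query parameters for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        "q": "*:*",                    # match all documents
        "fq": "publisher_s:Bantam",    # filter query, does not affect scoring
        "sort": "pubyear_i desc",      # newest publication year first
        "rows": rows,                  # page size
        "start": (page - 1) * rows,    # offset into the sorted list
    }


# Classic parameter names and their JSON Request API equivalents.
PARAM_TO_JSON = {"q": "query", "fq": "filter", "sort": "sort",
                 "rows": "limit", "start": "offset"}


def to_json_request(params):
    """Rename classic query parameters to JSON Request API keys."""
    return {PARAM_TO_JSON[name]: value for name, value in params.items()}


second_page = page_params(2)
query_string = urlencode(second_page)
json_request = to_json_request(second_page)
print(query_string)
print(json_request)
```

With rows=3, the second page starts at offset 3, so it would return the fourth and fifth Bantam books in the sample data.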


17 Apr, 2015 by yonik (updated on April 29, 2015)

Facet Types
Testing Using Curl
Terms Facet
Query Facet
Range Facet
Related page: Facet Functions
Related page: Sub-Facets


Solr 5.1 has a completely re-written faceted search module with a structured JSON API to control the faceting commands. NOTE: Some examples use syntax only supported in Solr 5.2! Download a Solr 5.2 snapshot to try them out.

The structure of nested sub-facets is more naturally expressed in a nested format like JSON than in the flat structure that normal query parameters provide.

Goals of the new Faceting Module:

First class JSON support
Easier programmatic construction of complex nested facet commands
Support a much more canonical response format that is easier for clients to parse
First class analytics support
Ability to sort facet buckets by any calculated metric
Support a cleaner way to do distributed faceting
Support better integration with other search features

Of course if you prefer to use Solr’s existing faceting capabilities, that’s fine too. You can even use both at once if you want!

UPDATE: The JSON Facet API is now part of the JSON Request API, so a complete request may be expressed in JSON.

Ease of Use

Some of the ease-of-use enhancements over traditional Solr faceting come from the inherent nested structure of JSON. As an example, here is the faceting command for two different range facets using Solr’s flat API:

&facet=true
&facet.range={!key=age_ranges}age
&f.age.facet.range.start=0
&f.age.facet.range.end=100
&f.age.facet.range.gap=10
&facet.range={!key=price_ranges}price
&f.price.facet.range.start=0
&f.price.facet.range.end=1000
&f.price.facet.range.gap=50

And here is the equivalent faceting command in the new JSON Faceting API:

{
  age_ranges: {
    type : range,
    field : age,
    start : 0,
    end : 100,
    gap : 10
  },
  price_ranges: {
    type : range,
    field : price,
    start : 0,
    end : 1000,
    gap : 50
  }
}

These aren’t even nested facets, but already one can see how much nicer the JSON API looks. With deeply nested sub-facets and statistics, the clarity of the inherently nested JSON API only grows.

JSON extensions

A number of JSON extensions have been implemented to further increase the clarity and ease of constructing a JSON faceting command by hand. For example:

{ // this is a single-line comment, which can help add clarity to large JSON commands
  /* traditional C-style comments are also supported */
  x : "avg(price)" , // Simple strings can occur unquoted
  y : 'unique(manu)' // Strings can also use single quotes (easier to embed in another String)
}

Debugging JSON

Nicely indented JSON is very easy to understand. If you get a large piece of non-indented JSON somehow, and are trying to make sense of it, you can cut and paste it into an online JSON validator. Such validators can indent your JSON even when it contains extensions they do not support (such as comments or bare strings).

Facet Types

There are two types of facets: those that break up the domain into multiple buckets, and aggregations (facet functions) that provide information about the set of documents belonging to each bucket.

Faceting can be nested! Any bucket produced by faceting can further be broken down into multiple buckets by a sub-facet.

Statistics are facets

Statistics are now fully integrated into faceting. Since we start off with a single facet bucket with a domain defined by the main query and filters, we can even ask for statistics for this top level bucket, before breaking up into further buckets via faceting. Example:

json.facet={
  x : "avg(price)",          // the average of the price field will appear under "x"
  y : "unique(manufacturer)" // the number of unique manufacturers will appear under "y"
}

See facet functions for a complete list of the available aggregation functions.

JSON Facet Syntax

The general form of the JSON facet commands is:

<facet_name> : { <facet_type> : <facet_parameter(s)> }

Example:

top_authors : { terms : { field : authors, limit : 5 } }

After Solr 5.2, a flatter structure with a “type” field may also be used:

<facet_name> : { "type" : <facet_type>, <other_facet_parameter(s)> }

Example:

top_authors : { type : terms, field : authors, limit : 5 }
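
The two forms carry the same information, which the following Python sketch makes explicit by converting the nested form into the flatter one. The function name to_flat_form is hypothetical, and the sketch works on ordinary Python dictionaries rather than on Solr’s relaxed JSON syntax.

```python
def to_flat_form(facet):
    """Convert {facet_type: params} into {"type": facet_type, **params}."""
    if "type" in facet:
        return dict(facet)              # already in the flatter form
    if len(facet) != 1:
        raise ValueError("expected exactly one facet type key")
    (facet_type, params), = facet.items()
    return {"type": facet_type, **params}


nested = {"top_authors": {"terms": {"field": "authors", "limit": 5}}}
flat = {name: to_flat_form(body) for name, body in nested.items()}
print(flat)
```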

The results will appear in the response under the facet name specified. Facet commands are specified using json.facet request parameters.

Test Using Curl

To test out different facet requests by hand, it’s easiest to use “curl” from the command line. Example:

$ curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
 json.facet={
   categories:{
     type : terms,
     field : cat,
     sort : { x : desc},
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'

Terms Facet

The terms facet, or field facet, produces buckets from the unique values of a field. The field needs to be indexed or have docValues.

The simplest form of the terms facet:

{
  top_genres : { terms : genre_field }
}

An expanded form allows for more parameters:

{
  top_genres : {
    type : terms,
    field : genre_field,
    limit : 3,
    mincount : 2
  }
}

Example response:

"top_genres":{
  "buckets":[
    { "val":"Science Fiction", "count":143},
    { "val":"Fantasy", "count":122},
    { "val":"Biography", "count":28}
  ]
}


field – The field name to facet over.
offset – Used for paging, this skips the first N buckets. Defaults to 0.
limit – Limits the number of buckets returned. Defaults to 10.
mincount – Only return buckets with a count of at least this number. Defaults to 1.
sort – Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be “asc” or “desc”
missing – A boolean that specifies if a special “missing” bucket should be returned that is defined by documents without a value in the field. Defaults to false.
numBuckets – A boolean. If true, adds “numBuckets” to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to false.
allBuckets – A boolean. If true, adds an “allBuckets” bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to false.
prefix – Only produce buckets for terms starting with the specified prefix.
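
To show how these parameters fit into a request and how the buckets come back, here is a small Python sketch. It builds the json.facet parameter for the expanded example and reads the example response from above; it does not contact Solr, and the variable names are only illustrative.

```python
import json

terms_facet = {
    "top_genres": {
        "type": "terms",
        "field": "genre_field",
        "limit": 3,
        "mincount": 2,
        "sort": {"count": "desc"},   # the default sort, spelled out
    }
}
# rows=0 because only the facet counts are of interest here.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(terms_facet)}

# The example response fragment from above, parsed into Python objects.
response = json.loads("""
{"top_genres": {"buckets": [
  {"val": "Science Fiction", "count": 143},
  {"val": "Fantasy", "count": 122},
  {"val": "Biography", "count": 28}]}}
""")
counts = {b["val"]: b["count"] for b in response["top_genres"]["buckets"]}
print(counts)
```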

Query Facet

The query facet produces a single bucket that matches the specified query.

An example of the simplest form of the query facet:

{
  high_popularity : { query : "popularity:[8 TO 10]" }
}

An expanded form allows for more parameters (or sub-facets / facet functions):

{
  high_popularity : {
    type : query,
    q : "popularity:[8 TO 10]",
    facet : { average_price : "avg(price)" }
  }
}

Example response:

"high_popularity" : {
  "count" : 147,
  "average_price" : 74.25
}

Range Facet

The range facet produces multiple range buckets over numeric fields or date fields.

Range facet example:

{
  prices : {
    type : range,
    field : price,
    start : 0,
    end : 100,
    gap : 20
  }
}

Example response:

"prices":{
  "buckets":[
    { "val":0.0,  // the bucket value represents the start of each range. This bucket covers 0-20
      "count":5},
    { "val":20.0,
      "count":3},
    { "val":40.0,
      "count":2},
    { "val":60.0,
      "count":1},
    { "val":80.0,
      "count":1}
  ]
}

To ease migration, these parameter names, values, and semantics were taken directly from the old-style (non JSON) Solr range faceting.
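
The bucket values in the example response follow directly from start, end, and gap. The Python sketch below computes the bucket boundaries, including the effect of the hardend option described in the parameter list. It is modeled on that description only and is not Solr’s implementation; the function name range_buckets is hypothetical.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return (lower, upper) bounds for each gap-based range bucket."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    buckets = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end          # clip the last bucket at "end"
        buckets.append((lower, upper))
        lower = upper
    return buckets


print(range_buckets(0, 100, 20))               # five buckets starting at 0, 20, 40, 60, 80
print(range_buckets(0, 50, 20))                # last bucket is gap wide and extends past end
print(range_buckets(0, 50, 20, hardend=True))  # last bucket is clipped at end
```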


field – The numeric field or date field to produce range buckets from
mincount – Minimum document count for the bucket to be included in the response. Defaults to 0.
start – Lower bound of the ranges
end – Upper bound of the ranges
gap – Size of each range bucket produced
hardend – A boolean, which if true means that the last bucket will end at “end” even if it is less than “gap” wide. If false, the last bucket will be “gap” wide, which may extend past “end”.
other – This param indicates that in addition to the counts for each range constraint between facet.range.start and facet.range.end, counts should also be computed for…
    "before" all records with field values lower than the lower bound of the first range
    "after" all records with field values greater than the upper bound of the last range
    "between" all records with field values between the start and end bounds of all ranges
    "none" compute none of this information
    "all" shortcut for before, between, and after
include – By default, the ranges used to compute range faceting between facet.range.start and facet.range.end are inclusive of their lower bounds and exclusive of the upper bounds. The “before” range is exclusive and the “after” range is inclusive. This default, equivalent to "lower" below, will not result in double counting at the boundaries. This behavior can be modified by the facet.range.include param, which can be any combination of the following options…
    "lower" all gap based ranges include their lower bound
    "upper" all gap based ranges include their upper bound
    "edge" the first and last gap ranges include their edge bounds (ie: lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified
    "outer" the “before” and “after” ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries.
    "all" shorthand for lower, upper, edge, outer

Solr Facet Functions and Analytics

17 Apr, 2015 by yonik (updated on April 29, 2015)

Traditional faceted search (also called guided navigation) involves counting search results that belong to categories (also called facet constraints). The new facet functions in Solr extend normal faceting by allowing additional aggregations on document fields themselves. Combined with the new Sub-facet feature, this provides powerful new realtime analytics capabilities. Also see the page about the new JSON Facet API.

Aggregation Functions

Faceting involves breaking up the domain into multiple buckets and providing information about each bucket. There are multiple aggregation functions / statistics that can be used:

Aggregation  Example                           Effect
sum          sum(sales)                        summation of numeric values
avg          avg(popularity)                   average of numeric values
sumsq        sumsq(rent)                       sum of squares
min          min(salary)                       minimum value
max          max(mul(price,popularity))        maximum value
unique       unique(state)                     number of unique values
percentile   percentile(salary,50,75,99,99.9)  calculates percentiles

Numeric aggregation functions such as avg can be applied to any numeric field, or to another function of multiple numeric fields.
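
To pin down what each aggregation means, here is how most of them would be computed over the values in a single bucket in plain Python. The price values are hypothetical sample data, and percentile is left out because this sketch makes no claim about how Solr computes it.

```python
prices = [10.0, 20.0, 20.0, 50.0]   # hypothetical field values in one bucket

aggregations = {
    "sum": sum(prices),                     # sum(price)
    "avg": sum(prices) / len(prices),       # avg(price)
    "sumsq": sum(p * p for p in prices),    # sumsq(price)
    "min": min(prices),                     # min(price)
    "max": max(prices),                     # max(price)
    "unique": len(set(prices)),             # unique(price)
}
print(aggregations)
```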

Simple Example

The faceting domain starts with the set of documents that match the main query and filters. We can ask for statistics over this whole set of documents:

http://localhost:8983/solr/query?q=*:*& json.facet={x:'avg(price)'}

And the response will contain a facets section:

[...]
  "facets":{
    "count":32,
    "x":164.10218846797943
  }
[...]

If we want to break up the domain into buckets and then calculate a function per bucket, we simply add a nested facet command to the facet parameters. For example (using curl this time):

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{
     type : terms,  // terms facet creates a bucket for each indexed term (or value) in the field
     field : cat,
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'

The response will contain the two stats we asked for in each category bucket.

[...]
  "facets":{
    "count":32,
    "categories":{
      "buckets":[{
          "val":"electronics",
          "count":12,
          "x":231.02666823069254,
          "y":2772.3200187683105},
        {
          "val":"memory",
          "count":3,
          "x":86.66333262125652,
          "y":259.98999786376953},
[...]

Facet Sorting

The default sort for a field or terms facet is by bucket count descending. We can optionally sort ascending or descending by any facet function that appears in each bucket. For example, if we wanted to find the top buckets by average price, then we would add sort:"x desc" to the previous facet request:

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{
     type : terms,
     field : cat,
     sort : "x desc",  // can also use sort:{x:desc}
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'
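
Solr applies the sort on the server, but the ordering it asks for is easy to reproduce on the client. The sketch below applies a sort spec such as "x desc" to the two buckets from the earlier example response; the helper name sort_buckets is hypothetical.

```python
# The two buckets shown in the example response earlier on this page.
buckets = [
    {"val": "electronics", "count": 12,
     "x": 231.02666823069254, "y": 2772.3200187683105},
    {"val": "memory", "count": 3,
     "x": 86.66333262125652, "y": 259.98999786376953},
]


def sort_buckets(buckets, spec):
    """Order buckets by a "<key> <direction>" spec such as "x desc"."""
    key, direction = spec.split()
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return sorted(buckets, key=lambda b: b[key], reverse=(direction == "desc"))


by_avg_price = [b["val"] for b in sort_buckets(buckets, "x desc")]
cheapest_first = [b["val"] for b in sort_buckets(buckets, "x asc")]
print(by_avg_price, cheapest_first)
```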

Try it out

Facet functions and Subfacets are available in Solr 5.1 and later. Download the latest release and give it a spin!

Solr Subfacets

Subfacets (also called Nested Facets) are a more generalized form of Solr’s current pivot faceting that allows additional facets to be added for every bucket produced by a parent facet.

Subfacet advantages over pivot faceting:

Subfacets work with facet functions (statistics), enabling powerful real-time analytics
Can add a subfacet to any facet type (field, query, range)
A subfacet can be of any type (field/terms, query, range)
A given facet can have multiple subfacets
Just like top-level facets, each subfacet can have its own configuration (i.e. offset, limit, sort, stats)

Subfacet Syntax

Subfacets are part of the new Facet Module, and are naturally expressed in the JSON Facet API. Every facet command is actually a sub-facet since there is an implicit top-level facet bucket (the domain) defined by the documents matching the main query and filters. Simply add a facet section to the parameters of any existing facet command.

For example, a terms facet on the “genre” field looks like:

top_genres:{
  type: terms,
  field: genre,
  limit: 5
}

Now if we wanted to add a subfacet to find the top 4 authors for each genre bucket:

top_genres:{
  type: terms,
  field: genre,
  limit: 5,
  facet:{
    top_authors:{
      type: terms,
      field: author,
      limit: 4
    }
  }
}
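
Because a subfacet is just another facet command placed under the parent’s facet key, nested commands are easy to construct programmatically. The following Python sketch builds the request above; the helper name terms is hypothetical, and the result is serialized as strict JSON.

```python
import json


def terms(field, limit, **subfacets):
    """Build a terms facet; keyword arguments become named subfacets."""
    facet = {"type": "terms", "field": field, "limit": limit}
    if subfacets:
        facet["facet"] = subfacets
    return facet


json_facet = {
    "top_genres": terms("genre", 5, top_authors=terms("author", 4)),
}
print(json.dumps(json_facet, indent=2))
```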

Complex Subfacet Examples

Assume we want to do the following complex faceting request:

Facet on the “genre” field and find the top buckets
For every “genre” bucket generated above, find the top 7 authors
For every “genre” bucket, create a bucket of high popularity items (defined by popularity 8 – 10) and call it “highpop”
For every “highpop” bucket generated above, find the top 5 publishers

In short, this request finds the top authors for each genre and finds the top publishers for high popularity books in each genre. Using the JSON Facet API, the full request (using curl) would look like the following:

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet=
 {
   top_genres:{
     type: terms,
     field: genre,
     facet:{
       top_authors:{
         type : terms,  // nested terms facet
         field: author,
         limit: 7
       },
       highpop:{
         type : query,  // nested query facet
         q: "popularity:[8 TO 10]",  // lucene query string
         facet:{
           publishers:{
             type: terms,  // nested terms facet under the nested query facet
             field: publisher,
             limit: 5
           }
         }
       }
     }
   }
 }
'

An example response would look like the following:

[...]
  "facets":{
    "top_genres":{
      "buckets":[{
          "val":"Fantasy",
          "count":5432,
          "top_authors":{  // these are the top authors in the "Fantasy" genre
            "buckets":[{
                "val":"Mercedes Lackey",
                "count":121},
              {
                "val":"Piers Anthony",
                "count":98}]},
          "highpop":{  // bucket for books in the "Fantasy" genre with popularity between 8 and 10
            "count":876,
            "publishers":{  // top publishers in this bucket (highpop fantasy)
              "buckets":[{
                  "val":"Bantam Books",
                  "count":346},
                {
                  "val":"Tor",
                  "count":217}]}}},
        {
          "val":"Science Fiction",  // the next genre bucket
[...]


All the reporting and sorting was done using document count (i.e. number of books). If instead, we wanted to find top authors by total revenue (assuming we had a “sales” field), then we could simply change the author facet from the previous example as follows:

top_authors:{
    type: terms,
    field: author,
    limit: 7,
    sort: "revenue desc",
    facet:{
      revenue: "sum(sales)"
    }
}
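
Responses to nested requests like the ones in this section can be traversed recursively on the client. The sketch below walks the “Fantasy” bucket from the example response above and lists every bucket with its nesting depth. The function names walk and walk_children are hypothetical, and the sketch assumes the response has already been parsed into Python dictionaries.

```python
def walk(name, node, depth=0):
    """Yield (depth, label, count) for every bucket under a facet."""
    if "buckets" in node:                   # terms or range facet
        for bucket in node["buckets"]:
            yield depth, f'{name}={bucket["val"]}', bucket["count"]
            yield from walk_children(bucket, depth + 1)
    else:                                   # query facet: a single bucket
        yield depth, name, node["count"]
        yield from walk_children(node, depth + 1)


def walk_children(bucket, depth):
    """Descend into nested facets; plain values such as stats are skipped."""
    for key, value in bucket.items():
        if isinstance(value, dict):
            yield from walk(key, value, depth)


facets = {"top_genres": {"buckets": [{
    "val": "Fantasy", "count": 5432,
    "top_authors": {"buckets": [{"val": "Mercedes Lackey", "count": 121},
                                {"val": "Piers Anthony", "count": 98}]},
    "highpop": {"count": 876,
                "publishers": {"buckets": [{"val": "Bantam Books", "count": 346},
                                           {"val": "Tor", "count": 217}]}},
}]}}

rows = list(walk("top_genres", facets["top_genres"]))
for depth, label, count in rows:
    print("  " * depth + f"{label}: {count}")
```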

Try it out

Facet functions and Subfacets are in Solr 5.1 and later. Download the latest release and give it a spin!
