IndexedItemSearchProvider design notes

Requirements

Should be file based, not requiring any additional backends
Should be performant enough to search datasets with as many as 500,000 features
Must be able to search numeric, text or enum properties

The remaining sections describe the structure of the index.

indexRoot.json

indexRoot captures the overall structure of the index. It has the following fields:

idProperty: string
- Required
- Name of the feature property that is used as the ID for indexing. The catalog item sometimes also uses it to uniquely identify and highlight the selected feature.
resultsDataUrl: string
- Required
- URL of the CSV results data file mapping a feature by its ID to result data associated with the feature.
indexes: Record<string, Index>
- Required
- An object whose keys are the property names and whose values are the corresponding Index definitions.
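
As a rough sketch, the shape of indexRoot.json could be written down as TypeScript types like the ones below. The index definitions are those described in the "Index types" section. The interface names, the property names floor_area and street_address, and the file name resultsData.csv are assumptions made for illustration, not Terria's actual declarations.

```typescript
// Illustrative types only; the names are assumptions, not Terria's declarations.
interface NumericIndex {
  type: "numeric";
  range: { min: number; max: number };
  url: string;
}

interface EnumValue {
  count: number;
  url: string;
}

interface EnumIndex {
  type: "enum";
  values: Record<string, EnumValue>;
}

interface TextIndex {
  type: "text";
  url: string;
}

type Index = NumericIndex | EnumIndex | TextIndex;

interface IndexRoot {
  idProperty: string;
  resultsDataUrl: string;
  indexes: Record<string, Index>;
}

// A sample index root assembled from the examples in these notes.
const sampleIndexRoot: IndexRoot = {
  idProperty: "building_id",
  resultsDataUrl: "resultsData.csv", // hypothetical file name
  indexes: {
    // "floor_area" and "street_address" are hypothetical property names.
    floor_area: { type: "numeric", range: { min: 7.58, max: 92901.63 }, url: "1.csv" },
    street_address: { type: "text", url: "2.json" },
    ROOF_MATERIAL: {
      type: "enum",
      values: {
        Tile: { count: 130063, url: "3-1.csv" },
        Metal: { count: 113671, url: "3-2.csv" }
      }
    }
  }
};
```

Modelling Index as a union discriminated by the type field lets a reader of the file branch on type before touching type-specific fields such as range or values.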

Results data file

A CSV file mapping an indexed feature to result data using its ID.

eg:

"building_id","latitude","longitude","height","street_address"
"abc1","45.0","100.0","120","abc def"

It should contain a header for each column. It should also have a column for the idProperty specified in the indexRoot.json file. Terria also recognizes a few special columns, which it uses to construct a target to zoom to when the user selects a result.

latitude
- Required
- The latitude of the feature
longitude
- Required
- The longitude of the feature
height
- Optional
- The height of the feature
radius
- Optional
- The radius of the bounding sphere containing the feature

A zoom target is constructed from the latitude and longitude, together with the height or the radius, whichever is known. height is the height of the feature, and radius is the radius of the bounding sphere to zoom to.
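
To make the role of the special columns concrete, here is a minimal sketch of turning one row of the results data file into a zoom target. The ZoomTarget shape and the function names are hypothetical, and how Terria actually combines height and radius is not specified here; the sketch simply carries over whichever of the two is present.

```typescript
// One row of the results data file, keyed by column header. CSV values are strings.
type ResultRow = Record<string, string | undefined>;

// Hypothetical zoom target shape; not Terria's actual type.
interface ZoomTarget {
  latitude: number;
  longitude: number;
  height?: number;
  radius?: number;
}

// Parse an optional numeric column; returns undefined when missing, empty or not a finite number.
function parseOptionalNumber(value: string | undefined): number | undefined {
  if (value === undefined || value.trim() === "") return undefined;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : undefined;
}

function buildZoomTarget(row: ResultRow): ZoomTarget | undefined {
  const latitude = parseOptionalNumber(row["latitude"]);
  const longitude = parseOptionalNumber(row["longitude"]);
  // latitude and longitude are required; without them no target can be built.
  if (latitude === undefined || longitude === undefined) return undefined;

  const target: ZoomTarget = { latitude, longitude };
  // height and radius are optional; keep whichever is known.
  const height = parseOptionalNumber(row["height"]);
  const radius = parseOptionalNumber(row["radius"]);
  if (height !== undefined) target.height = height;
  if (radius !== undefined) target.radius = radius;
  return target;
}

// The example row from the results data file above.
const exampleTarget = buildZoomTarget({
  building_id: "abc1",
  latitude: "45.0",
  longitude: "100.0",
  height: "120",
  street_address: "abc def"
});
```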

Index types

Terria implements the following index types:

Numeric Index

The numeric index is used for searching numeric properties. It supports range queries, for example: buildings with a height between 100m and 180m. The index is represented as an array of [id, value] pairs sorted by value, which makes it easy to perform a binary search on the index.
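
The range lookup can be sketched as two binary searches over the sorted pairs: one for the first value at or above the start of the range and one for the first value above its end. The function names and the sample data are illustrative assumptions, not Terria's implementation.

```typescript
// One entry of the numeric index: [id, value]. The array is assumed to be
// sorted by value in ascending order.
type NumericIndexEntry = [id: number, value: number];

// Binary search for the index of the first entry whose value satisfies `pred`.
// `pred` must be false for a (possibly empty) prefix and true for the rest.
function firstIndexWhere(
  entries: NumericIndexEntry[],
  pred: (value: number) => boolean
): number {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (pred(entries[mid][1])) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Returns the ids of all entries with start <= value <= end.
function searchNumericRange(
  entries: NumericIndexEntry[],
  start: number,
  end: number
): number[] {
  const from = firstIndexWhere(entries, (value) => value >= start);
  const to = firstIndexWhere(entries, (value) => value > end);
  return entries.slice(from, to).map(([id]) => id);
}

// Made-up height index, sorted by value.
const sampleHeightIndex: NumericIndexEntry[] = [
  [4, 35],
  [12, 100],
  [7, 150],
  [99, 180],
  [3, 240]
];

// Buildings with height between 100m and 180m.
const matchingIds = searchNumericRange(sampleHeightIndex, 100, 180); // [12, 7, 99]
```

Because the matches form one contiguous slice of the sorted array, a query costs two O(log n) searches plus the size of the result.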

Definition

type: "numeric"
- Required
range: {min: number, max: number}
- Required
- The range of values in the index.
url: string
- Required
- URL of the numeric index file

eg:

{
  "type": "numeric",
  "range": { "min": 7.58, "max": 92901.63 },
  "url": "1.csv"
}

Numeric index file

A numeric index file is a CSV file sorted by value. It must have two named columns, dataRowId and value, where dataRowId is the index of the corresponding feature in the results data file.

"dataRowId","value"
"332600","7.58"
...
"63462","92901.63"

Enum Index

The enum index is useful for searching properties that take one of a fixed list of string values. For example, a roof material property can have a fixed set of values like Tile, Metal, Fiberglass, Concrete, Plastic etc. An enum index contains a sub-index for each of its values. The sub-index is simply a list of feature IDs that have that value.

Definition

type: "enum"
- Required
values: Record<string, EnumValue>
- Required
- An object whose keys are the enum strings and whose values define the corresponding enum value index.

eg:

"ROOF_MATERIAL": {
  "type": "enum",
  "values": {
    "Unclassified": { "count": 273889, "url": "3-0.csv" },
    "Tile": { "count": 130063, "url": "3-1.csv" },
    "Metal": { "count": 113671, "url": "3-2.csv" },
    "Fiberglass/Plastic": { "count": 5653, "url": "3-3.csv" },
    "Flat Concrete": { "count": 2476, "url": "3-4.csv" },
    "": { "count": 20454, "url": "3-5.csv" }
  }
}

Enum value index

Defines the index for a single enum member.

Definition

count: number
- Required
- Number of features that have this enum value.
url: string
- Required
- URL of the enum value index file.

Enum value index file

The enum value index file is a CSV file with a single column named dataRowId, which is the index of the corresponding feature in the results data file.
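
As an illustration, a file in this format could be read with a few lines of code. The sketch below assumes one optionally quoted integer per line after the header, with no embedded commas or newlines. The second function, which merges the ids of several selected enum values, is also an assumption: these notes do not say how multiple values are combined.

```typescript
// Parse a single-column enum value index file into a list of row ids.
// Assumes a header line followed by one optionally quoted integer per line.
function parseDataRowIds(csvText: string): number[] {
  const lines = csvText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");
  // Skip the "dataRowId" header and strip surrounding double quotes.
  return lines.slice(1).map((line) => Number(line.replace(/^"|"$/g, "")));
}

// Union of row ids across several enum values, without duplicates,
// in order of first appearance.
function unionDataRowIds(idLists: number[][]): number[] {
  const seen = new Set<number>();
  for (const ids of idLists) {
    for (const id of ids) seen.add(id);
  }
  return Array.from(seen);
}

const sampleEnumIds = parseDataRowIds('"dataRowId"\n"1"\n"2"\n"546205"\n');
```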

eg:

"dataRowId"
"1"
"2"
...
"546205"

Text Index

The text index is used for searching arbitrary text properties, for example: street address. We use MiniSearch for generating and searching the text index.

Definition

type: "text"
- Required
url: string
- Required
- URL of the text index file.

eg:

{ "type": "text", "url": "2.json" }

Text index file

The text index file is a JSON file with the following structure:

index: MiniSearch
- Required
- The serialized MiniSearch index instance.
options: MiniSearchOptions
- Required
- The options used to create the MiniSearch instance.

Why CSV and not JSON?

In our testing, we found that parsing CSV in JavaScript was significantly faster than parsing JSON (a difference of 2-3 seconds). We therefore use CSV for the numeric and enum indexes. The text index is persisted as a JSON serialization of the MiniSearch instance.