Documentation for search engines

Search engines adopting the IndexNow protocol agree to support it and its future evolution.
To participate, a search engine MUST have a noticeable presence in at least one market or be closely linked to the search market and make a significant contribution to the number of URL submissions. Search engines MUST also agree to share URLs received by their /indexnow endpoints with other search engines participating in the protocol in a timely manner.

Search Engine Metadata and Identity


The IndexNow protocol website has a well-known address (https://www.indexnow.org/searchengines.json) where the reference to the metadata and identity file of every participating search engine is listed in the following form:

                {
                  "<se_1_id>": "<se_1_indexnow_meta_json_url>",
                  "<se_2_id>": "<se_2_indexnow_meta_json_url>"
                }
            

Each participating search engine MUST host a meta.json file (e.g., https://<se_1_hostname>/indexnow/meta.json) to identify itself and provide necessary metadata to support the protocol. The meta.json file has the following attributes and structure (which may evolve over time with the advancement of the protocol):

                {
                  "id": "<se_1_id>",
                  "api": "<se_1_indexnow_endpoint_url>",
                  "host": "<se_1_hostname>",
                  "logs": "<se_1_indexnow_logs_manifest_url>",
                  "name": "<se_1_name>",
                  "homepage": "<se_1_homepage_url>",
                  "logo": "<se_1_logo_https_or_data_url>",
                  "unsubscribe": false,
                  "notifierIPs": [
                    {"ipv4Prefix": "<se_1_ipv4_cidr_1>"},
                    {"ipv4Prefix": "<se_1_ipv4_cidr_2>"},
                    {"ipv6Prefix": "<se_1_ipv6_cidr_1>"},
                    {"ipv6Prefix": "<se_1_ipv6_cidr_2>"}
                  ],
                  "publicKeys": [
                    "<se_1_public_key_1>",
                    "<se_1_public_key_2>"
                  ]
                }
            

The api field of the meta.json file holds the absolute URI (i.e., with the https protocol scheme and an FQDN hostname) of the /indexnow endpoint. The API URI can have a different hostname than the domain name of the search engine itself (e.g., a subdomain or a different domain name), provided that hostname is controlled by (or delegated to) the search engine. The host field holds the hostname of the search engine. The id field holds the short name/identifier of the search engine, which must be unique (ideally, a single token, without any spaces or special characters). This field is required to establish the association with entries in https://www.indexnow.org/searchengines.json.

The logs field points to a manifest URI that lists URIs of recent logs for asynchronous access. Logging is described further in the Logging/Retention section below.

The name, homepage, and logo fields are not mandatory for the protocol, but can be used in rich listing of protocol participants.

The unsubscribe field holds a boolean value that, when set to true, tells all participating entities that the search engine described by the meta.json file is not interested in getting realtime push-style notifications of verified URLs. The default value of the field is false. The field is named unsubscribe rather than subscribe so that the absence of the field means that the search engine should be notified by all.

The notifierIPs field is an array of network addresses in Classless Inter-Domain Routing (CIDR) notation that advertises the IP ranges other partners should add to their allowlists to enable high-volume and high-frequency notification requests. This field can contain both IPv4 and IPv6 addresses.
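A recipient could check an incoming request's source address against the advertised ranges roughly as follows. This is a minimal sketch using Python's standard ipaddress module; the function name is_allowed is hypothetical, and the sample prefixes are documentation-only ranges chosen for illustration.

```python
import ipaddress


def is_allowed(source_ip: str, notifier_ips: list[dict]) -> bool:
    """Return True if source_ip falls inside any advertised CIDR prefix."""
    addr = ipaddress.ip_address(source_ip)
    for entry in notifier_ips:
        # Each entry carries either an ipv4Prefix or an ipv6Prefix key.
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix is None:
            continue
        network = ipaddress.ip_network(prefix, strict=False)
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return True
    return False


# Illustrative notifierIPs value (documentation ranges, not real partners).
sample_ranges = [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"},
]
```

With the sample ranges above, is_allowed("192.0.2.17", sample_ranges) is true, while an address outside both prefixes is rejected.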

The publicKeys field contains an array of one or more public keys whose corresponding private keys are held securely by the search engine. Each notification payload digest is signed by a private key for which a corresponding public key is present in the meta.json file. The purpose of advertising public keys via the authoritative meta.json file is to enable signature verification.
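The field descriptions above suggest a few sanity checks a consumer might run when loading a partner's meta.json. The sketch below is illustrative only: the function name parse_meta is hypothetical, and treating every field other than name, homepage, logo, and unsubscribe as required is an assumption, since the text only marks those four as optional or defaulted.

```python
import json

# Assumption: every field except name, homepage, logo, and unsubscribe is
# treated as required here.
REQUIRED_FIELDS = ("id", "api", "host", "logs", "notifierIPs", "publicKeys")


def parse_meta(text: str) -> dict:
    """Parse a meta.json document and apply basic sanity checks."""
    meta = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if name not in meta]
    if missing:
        raise ValueError(f"meta.json is missing fields: {', '.join(missing)}")
    if not meta["api"].startswith("https://"):
        raise ValueError("api must be an absolute https URI")
    if not meta["publicKeys"]:
        raise ValueError("publicKeys must list at least one key")
    # An absent unsubscribe field means the search engine wants notifications.
    meta.setdefault("unsubscribe", False)
    return meta
```

Defaulting unsubscribe to false at load time keeps later code from having to distinguish a missing field from an explicit false.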

Search engines participating in the protocol MUST poll the partners list file (i.e., https://www.indexnow.org/searchengines.json) at least once every 24 hours, preferably more frequently, to discover any new participants and refresh their copies of every participant's meta.json file. This means that changes in fields like notifierIPs and publicKeys may not propagate to all the participants for many hours. To avoid missing any notifications, search engines SHOULD add new information to the metadata file at least 24 hours in advance and continue to consider their stale information for at least 24 hours after pruning it from the metadata file.
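One way to picture the 24-hour overlap is to keep a record of when each value (a key or an IP prefix) was pruned and to keep honoring it until the grace period has passed. The sketch below is only an illustration: the function name usable_values and the pruned_at bookkeeping are assumptions, not part of the protocol.

```python
from datetime import datetime, timedelta, timezone

# Grace period taken from the 24-hour propagation window described above.
GRACE = timedelta(hours=24)


def usable_values(current: set[str], pruned_at: dict[str, datetime],
                  now: datetime) -> set[str]:
    """Return current values plus any pruned within the last 24 hours.

    pruned_at maps a removed value to the time it was removed from
    meta.json (hypothetical bookkeeping kept by the implementer).
    """
    still_honored = {value for value, when in pruned_at.items()
                     if now - when <= GRACE}
    return set(current) | still_honored


now = datetime(2024, 3, 2, 12, 0, tzinfo=timezone.utc)
keys = usable_values(
    {"key-new"},
    {"key-old": now - timedelta(hours=3), "key-ancient": now - timedelta(days=2)},
    now,
)
```

Here keys contains key-new and key-old, while key-ancient has aged out of the grace period.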

Notification/Re-Pinging

After receiving notifications from one or more websites and verifying the authority of the notifiers over their URLs, a search engine participating in the IndexNow protocol MUST notify every other participant (except those that have unsubscribed from notifications) of the URLs within 10 seconds of the verification. Since the URL list is already verified by the primary search engine that received the notification from websites, other search engines can consume the list as authorized change notifications. The primary search engine can also batch URLs from various sources together, up to 10,000 URLs per batch.
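The batching and recipient selection described above could look roughly like this. It is a sketch under stated assumptions: the names batches and recipients are hypothetical, and partners is assumed to be a mapping from search engine id to its cached meta.json contents.

```python
from typing import Iterator

# Upper bound on URLs per search-engine-to-search-engine batch.
MAX_BATCH = 10_000


def batches(urls: list[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of urls, each at most size items long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def recipients(partners: dict[str, dict], self_id: str) -> list[str]:
    """Return ids of partners to notify: everyone but ourselves and
    those whose meta.json sets unsubscribe to true."""
    return [
        partner_id
        for partner_id, meta in partners.items()
        if partner_id != self_id and not meta.get("unsubscribe", False)
    ]
```

Scheduling the batches so that each URL goes out within the 10-second window is left to the implementer and is not shown here.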

The search engine to search engine notification is sent using the HTTP POST method with a JSON payload encoded with UTF-8 character encoding. The schema of the JSON data is similar to that of the POST method of the public API (i.e., websites to search engines notification). However, the payload only contains URLs, and any metadata (like hostname and key) is communicated using HTTP headers. Also, when the primary search engine notifies every other participant, it uses the /indexnow?noreping target endpoint (with the noreping query parameter) to tell the recipients that they do not need to propagate it any further, because the primary search engine will notify them all.

If a search engine <se_1> were to notify another search engine <se_2> with a list of verified URLs from one or more sources, it can send an HTTP POST request as follows:

                POST /indexnow?noreping HTTP/1.1
                Content-Type: application/json; charset=utf-8
                Content-Length: <payload_length>
                Host: <se_2_host>
                X-IN-Notifier: <se_1_id>
                X-IN-Notifier-Public-Key: <public_key>
                X-Signed-Payload-Digest: <hex_digest>
                
                {
                  "urlList": [
                    "https://example.com/foo",
                    "https://example.com/bar",
                    "https://example.org/foo",
                    "https://example.org/bar"
                  ]
                }
            

On receipt of this notification, the recipient (i.e., <se_2>) can identify the notifier (i.e., <se_1>) using the X-IN-Notifier HTTP header value, verify the authenticity of the notifier and the integrity of the payload as necessary, and return a suitable HTTP response code. If the recipient host is unreachable or returns a 2xx or 5xx response code, the notifier can move on. However, in case of a 4xx response code, the notifier MUST correct any issues with the request and try again. Any error responses (i.e., 4xx and 5xx response codes) SHOULD contain a payload explaining the error (when applicable). In the future, as we learn more about common failure scenarios in SE-to-SE notification exchanges, it may be desirable to standardize the schema of the error payload as well.
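The request shape and the notifier's reaction to the response can be sketched as below. The function names build_notification and next_action are hypothetical, the signed digest is taken as an already computed input, and the "unspecified" result for status codes outside 2xx, 4xx, and 5xx is an assumption, since the text does not cover them.

```python
import json
from typing import Optional


def build_notification(notifier_id: str, public_key: str,
                       signed_digest_hex: str, urls: list[str]):
    """Return (target, headers, body) for an SE-to-SE notification."""
    body = json.dumps({"urlList": urls}).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Content-Length": str(len(body)),
        "X-IN-Notifier": notifier_id,
        "X-IN-Notifier-Public-Key": public_key,
        "X-Signed-Payload-Digest": signed_digest_hex,
    }
    # noreping tells the recipient not to propagate the URLs further.
    return "/indexnow?noreping", headers, body


def next_action(status: Optional[int]) -> str:
    """Map a response status (None if unreachable) to the notifier's step."""
    if status is None or 200 <= status < 300 or 500 <= status < 600:
        return "move_on"
    if 400 <= status < 500:
        return "fix_and_retry"
    return "unspecified"  # assumption: other codes are not covered
```

The Host header is omitted because most HTTP clients set it from the target URL.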

Authentication/Signature

Each search engine participating in the IndexNow protocol generates one or more public and private key pairs. They keep their private keys secure and advertise the corresponding public keys in their meta.json file under the publicKeys attribute. When they decide to rotate their keys, they MUST update their meta.json file with the new key(s) and allow 24 hours for other partners to sync up their local copies.

When sending a notification to another search engine, the sender uses the X-IN-Notifier-Public-Key HTTP header to tell the recipient which public key to use to verify the signature. Currently, the RSA public-key cryptography algorithm is used, but it MAY change in the future to support other algorithms. The recipient first checks whether the public key reported in the X-IN-Notifier-Public-Key header is a member of the publicKeys attribute of the notifier search engine's meta.json file. Then it uses that public key to decrypt the value of the X-Signed-Payload-Digest header, which should result in a hash digest that matches the hash of the plain-text payload. Currently, the SHA-256 hashing algorithm is used, but it MAY change in the future to support other algorithms.
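The verification order described above can be sketched as follows. Only the SHA-256 digest and the key membership check use the standard library; the RSA step is represented by a caller-supplied function, recover_digest, which is a hypothetical placeholder for whatever cryptography library an implementer chooses. The names payload_digest and verify_notification, and the comparison of lowercase hex strings, are likewise assumptions, since the text does not fix key or digest encodings.

```python
import hashlib
import hmac
from typing import Callable


def payload_digest(body: bytes) -> str:
    """Return the SHA-256 digest of the raw payload as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def verify_notification(body: bytes, header_key: str, signed_digest_hex: str,
                        notifier_meta: dict,
                        recover_digest: Callable[[str, str], str]) -> bool:
    """Check key membership first, then compare digests.

    recover_digest(public_key, signed_digest_hex) is a hypothetical hook
    that performs the RSA public-key operation and returns a hex digest.
    """
    # Step 1: the key in the header must be advertised in meta.json.
    if header_key not in notifier_meta.get("publicKeys", []):
        return False
    # Step 2: recover the digest from the signed header value.
    recovered = recover_digest(header_key, signed_digest_hex)
    # Step 3: constant-time comparison against the payload's own digest.
    return hmac.compare_digest(recovered.lower(), payload_digest(body))
```

Keeping the RSA operation behind a single hook also makes it easier to swap algorithms if the protocol changes them in the future.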

Logging/Retention

Every IndexNow partner is required to store logs of all the notified and verified URLs received from various websites. These logs must be retained for at least one week and be made available to all the partners on demand via HTTP GET requests (for asynchronous downloads, as opposed to the synchronous POST notifications that are prone to losses due to quality of service issues or network congestion).

The IndexNow log files contain TAB-separated values (a TSV file) with the columns representing the Unix time (i.e., Epoch seconds) when the notification was received and the URL. Each line contains only one URL, and the corresponding epoch time.

                <epochtime>	<url_1>
                <epochtime>	<url_2>
                <epochtime>	<url_3>
            

These log files are rotated periodically, compressed (using GZip compression), and made available to other partners. It is recommended to use the convention of indexnow-log-<se_1_id>-<YYYYMMDD>-<hhmmss>.tsv.gz for naming files where dates and times are in UTC and represent the time of the last entry in the corresponding log file. IndexNow partners are required to rotate their notification logs at least once every day, but can choose to rotate logs more often, such as every hour, every few minutes, or after a certain number of lines. It is recommended to keep individual log files under 50 million lines each to avoid files being too large.
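The naming convention and the TSV layout can be illustrated with a short sketch. The function names log_file_name and parse_log are hypothetical, and decoding the log as UTF-8 is an assumption that the text does not state.

```python
import gzip
from datetime import datetime, timezone


def log_file_name(se_id: str, last_entry_epoch: int) -> str:
    """Build the recommended file name from the last entry's Unix time."""
    stamp = datetime.fromtimestamp(last_entry_epoch, tz=timezone.utc)
    return f"indexnow-log-{se_id}-{stamp:%Y%m%d}-{stamp:%H%M%S}.tsv.gz"


def parse_log(data: bytes) -> list[tuple[int, str]]:
    """Decompress a GZip log and return (epoch_seconds, url) pairs."""
    text = gzip.decompress(data).decode("utf-8")  # assumption: UTF-8
    entries = []
    for line in text.splitlines():
        if not line:
            continue  # tolerate a trailing blank line
        epoch, url = line.split("\t", 1)
        entries.append((int(epoch), url))
    return entries


# Round trip with two made-up entries.
sample = gzip.compress(
    b"1700000000\thttps://example.com/foo\n"
    b"1700000005\thttps://example.org/bar\n"
)
entries = parse_log(sample)
```

Splitting on the first TAB only keeps the URL column intact even if a URL were to contain a literal TAB character.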

To make these log files discoverable, every IndexNow partner serves a manifest file at the URI advertised in their meta.json file under the logs field. The manifest is a JSON file that lists two fields, updated and url, for each log file. The updated field contains the UTC date and time of the last entry of the corresponding log file as <YYYY-MM-DD>T<hh:mm:ss>Z, and the url field is an absolute HTTPS URI from which the corresponding log file can be downloaded. Note that the updated field of the manifest file has the date and time in the ISO 8601 format (a string) for consistent sorting, while the first column of the logs themselves reports the same as Unix epoch seconds (an integer) to be concise.

                {
                    "logs": [
                        {
                            "updated": "<utc_iso_8601_datetime_N>",
                            "url": "<log_file_uri>"
                        },
                        {
                            "updated": "<utc_iso_8601_datetime_N-1>",
                            "url": "<log_file_uri>"
                        }
                    ]
                }
            

The manifest file must be regenerated each time a new log file is rotated, an old one is deleted, or the URIs of log files have changed. IndexNow partners are free to choose how they want to organize their log files and from which location they want to serve them, but it is recommended not to change their URIs frequently. If a partner attempts to access a log file URI, taken from a stale manifest file, that has been deleted in the interim, return a 404 or any other appropriate HTTP status code. If the URI of the manifest file itself is to be changed, keep the old one accessible or redirecting to the new address for at least 24 hours after it is changed in the meta.json file.
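A partner that downloads logs asynchronously might use the manifest to pick out only the files it has not yet seen. The sketch below assumes the manifest layout shown earlier; the function name new_logs and the idea of tracking a since timestamp are illustrative assumptions.

```python
import json
from datetime import datetime, timezone


def new_logs(manifest_text: str, since: datetime) -> list[str]:
    """Return log URIs updated after since, oldest first."""
    entries = json.loads(manifest_text)["logs"]
    parsed = [
        (
            # updated is formatted as <YYYY-MM-DD>T<hh:mm:ss>Z in UTC.
            datetime.strptime(entry["updated"], "%Y-%m-%dT%H:%M:%SZ")
            .replace(tzinfo=timezone.utc),
            entry["url"],
        )
        for entry in entries
    ]
    return [url for updated, url in sorted(parsed) if updated > since]


# Made-up manifest with two log files.
manifest = json.dumps({"logs": [
    {"updated": "2024-03-02T00:00:00Z", "url": "https://example.com/logs/b.tsv.gz"},
    {"updated": "2024-03-01T00:00:00Z", "url": "https://example.com/logs/a.tsv.gz"},
]})
pending = new_logs(manifest, datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc))
```

A downloader following this approach would still need to handle a 404 for any URI that was deleted after the manifest was fetched.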

Access to log files is granted to all the search engines participating in the protocol. The IndexNow partners authorize downloads of logs by matching the network address of the source of HTTP requests against the notifierIPs found in the meta.json file of all the protocol partner search engines. Participating search engines that are interested in downloading these logs from machines other than the ones sending notifications can include additional network addresses in their meta.json file. In the future, we may move to using a downloaderIPs field or a different authentication and authorization method, if the meta.json file-based access exhibits any issues.

Authors

Sawood Alam Internet Archive sawood@archive.org

Martin Fiala Seznam martin.fiala@firma.seznam.cz

Vishnevsky Gleb Yandex kaikash7@yandex-team.ru

Fabrice Canel Microsoft

March 2024