Documentation for search engines
Search engines adopting the IndexNow protocol agree to support the protocol and its future evolution.
To participate, a search engine MUST have a noticeable presence in at least one market, or be closely linked to the search market and make a significant contribution to the number of URL submissions. Search engines MUST also agree to share URLs received by their /indexnow endpoint with other search engines participating in the protocol in a timely manner.
Search Engine Metadata and Identity
The IndexNow protocol website has a well-known address (https://www.indexnow.org/searchengines.json) where the reference to the metadata and identity file of every participating search engine is listed in the following form:
{
"<se_1_id>" : "<se_1_indexnow_meta_json_url>",
"<se_2_id>" : "<se_2_indexnow_meta_json_url>"
}
Each participating search engine MUST host a meta.json file (e.g., https://<se_1_hostname>/indexnow/meta.json) to identify itself and provide necessary metadata to support the protocol. The meta.json file has the following attributes and structure (which may evolve over time with the advancement of the protocol):
{
"id" : "<se_1_id>",
"api" : "<se_1_indexnow_endpoint_url>",
"host" : "<se_1_hostname>",
"logs" : "<se_1_indexnow_logs_manifest_url>",
"name" : "<se_1_name>",
"homepage" : "<se_1_homepage_url>",
"logo" : "<se_1_logo_https_or_data_url>",
"unsubscribe" : false,
"notifierIPs" : [{"ipv4Prefix": "<se_1_ipv4_cidr_1>"},
{"ipv4Prefix": "<se_1_ipv4_cidr_2>"},
{"ipv6Prefix": "<se_1_ipv6_cidr_1>"},
{"ipv6Prefix": "<se_1_ipv6_cidr_2>"}
],
"publicKeys" : ["<se_1_public_key_1>",
"<se_1_public_key_2>"
]
}
The api field of the meta.json file holds the absolute URI (i.e., with the https protocol scheme and an FQDN hostname) of the /indexnow endpoint. The hostname of the API URI can differ from the search engine's own domain name (e.g., a subdomain or a different domain name), as long as it is controlled by (or delegated to) the search engine. The host field holds the hostname of the search engine. The id field holds the short name/identifier of the search engine, which must be unique (ideally, a single token, without any spaces or special characters). This field is required to establish association with entries in https://www.indexnow.org/searchengines.json.
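The field constraints above can be sketched as a small check over a parsed meta.json document. This is only an illustration: the function name check_meta, the wording of the messages, and the character set accepted for a "single token" id are assumptions, not part of the protocol.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helper; the checks mirror the prose above, not an official schema.
def check_meta(meta: dict, listed_id: str) -> list[str]:
    """Return a list of problems found in a parsed meta.json document."""
    problems = []
    api = urlsplit(meta.get("api", ""))
    if api.scheme != "https" or not api.hostname:
        problems.append("api must be an absolute https URI")
    if not meta.get("host"):
        problems.append("host is missing")
    se_id = meta.get("id", "")
    # Assumption: a "single token" means letters, digits, hyphen, underscore.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", se_id):
        problems.append("id is not a single token (recommended)")
    # The id ties the file to its key in searchengines.json.
    if se_id != listed_id:
        problems.append("id does not match the searchengines.json entry")
    return problems
```

An empty list means none of these particular checks failed; it does not imply that the file is valid in any broader sense.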
The logs field points to a manifest URI that lists URIs of recent logs for asynchronous access. Logging is described further in the Logging/Retention section below.
The name, homepage, and logo fields are not mandatory for the protocol, but can be used in rich listings of protocol participants.
The unsubscribe field holds a boolean value that, when set to true, tells all participating entities that the search engine described by the meta.json file is not interested in receiving realtime push-style notifications of verified URLs. The default value of the field is false. The field is named unsubscribe rather than subscribe so that the absence of the field means that the search engine should be notified by all.
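To illustrate the default described above, the following sketch picks the partners that should be notified from a list of parsed meta.json documents. The function name notification_targets and the idea of skipping one's own entry by id are assumptions made for the example.

```python
# Hypothetical helper: choose which partners to notify.
def notification_targets(own_id: str, metas: list[dict]) -> list[str]:
    """Return the api URLs of partners that should receive notifications."""
    targets = []
    for meta in metas:
        if meta.get("id") == own_id:
            continue  # assumption: a search engine does not notify itself
        # A missing "unsubscribe" field is treated as false, so the partner
        # is notified unless it explicitly opted out.
        if meta.get("unsubscribe", False) is True:
            continue
        targets.append(meta["api"])
    return targets
```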
The notifierIPs field is an array of network addresses in Classless Inter-Domain Routing (CIDR) notation. It advertises the IP ranges that other partners should add to their allowlists to enable high-volume and high-frequency notification requests. This field can contain both IPv4 and IPv6 addresses.
The publicKeys field contains an array of one or more public keys whose corresponding private keys are held securely by the search engine. Each notification payload digest is signed by a private key for which a corresponding public key is present in the meta.json file. The purpose of advertising public keys via the authoritative meta.json file is to enable signature verification.
Search engines participating in the protocol MUST poll the partners list (i.e., the https://www.indexnow.org/searchengines.json file) at least once every 24 hours, preferably more frequently, to discover any new participants and refresh their copies of the meta.json file of every participant. This means that changes in fields like notifierIPs and publicKeys may not propagate to all the participants for many hours. To avoid missing any notifications, search engines SHOULD add new information to the metadata file at least 24 hours in advance, and continue to honor stale information for at least 24 hours after pruning it from the metadata file.
Notification/Re-Pinging
A search engine participating in the IndexNow protocol MUST notify every other participant (except those that have unsubscribed from notifications) of the URLs it receives. It does so after receiving notifications from one or more websites and verifying the authority of the notifiers over their URLs, and within 10 seconds of that verification. Since the URL list has already been verified by the primary search engine that received the notification from the websites, other search engines can consume the list as authorized change notifications. The primary search engine can also batch URLs from various sources together, up to 10,000 URLs per batch.
The search engine to search engine notification is sent using the HTTP POST method with a JSON payload encoded with UTF-8 character encoding. The schema of the JSON data is similar to that of the POST method of the public API (i.e., websites to search engines notification). However, the payload only contains URLs, and any metadata (like hostname and key) is communicated using HTTP headers. Also, when the primary search engine notifies every other participant, it uses the /indexnow?noreping target endpoint (with the noreping query parameter) to tell the recipients that they do not need to propagate it any further, because the primary search engine will notify them all.
If a search engine <se_1> were to notify another search engine <se_2> with a list of verified URLs from one or more sources, it can send an HTTP POST request as follows:
POST /indexnow?noreping HTTP/1.1
Content-Type: application/json; charset=utf-8
Content-Length: <payload_length>
Host: <se_2_host>
X-IN-Notifier: <se_1_id>
X-IN-Notifier-Public-Key: <public_key>
X-Signed-Payload-Digest: <hex_digest>

{
"urlList": [
"https://example.com/foo",
"https://example.com/bar",
"https://example.org/foo",
"https://example.org/bar"
]
}
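The request above can be assembled programmatically along the following lines. This is a minimal sketch, not a reference implementation: the function name build_notifications is hypothetical, and sign_digest is a caller-supplied callable (an assumption) that takes the payload bytes and returns the signed digest as a hex string. The Host header and the actual HTTP transport are left to whichever HTTP client is used.

```python
import json

MAX_BATCH = 10_000  # batch limit stated in the Notification/Re-Pinging section

def build_notifications(urls, notifier_id, public_key, sign_digest):
    """Yield (headers, body) pairs, one per batch of at most MAX_BATCH URLs.

    sign_digest is a hypothetical caller-supplied callable that maps the
    payload bytes to the hex string sent in X-Signed-Payload-Digest.
    """
    for start in range(0, len(urls), MAX_BATCH):
        batch = urls[start:start + MAX_BATCH]
        # The payload carries only URLs; metadata travels in the headers.
        body = json.dumps({"urlList": batch}).encode("utf-8")
        headers = {
            "Content-Type": "application/json; charset=utf-8",
            "Content-Length": str(len(body)),
            "X-IN-Notifier": notifier_id,
            "X-IN-Notifier-Public-Key": public_key,
            "X-Signed-Payload-Digest": sign_digest(body),
        }
        yield headers, body
```

Each pair would then be sent as a POST to the recipient's /indexnow?noreping endpoint.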
On receipt of this notification, the recipient (i.e., <se_2>) can identify the notifier (i.e., <se_1>) using the X-IN-Notifier HTTP header value, verify the authenticity of the notifier and the integrity of the payload as necessary, and return a suitable HTTP response code. If the recipient host is unreachable or returns a 2xx or 5xx response code, the notifier can move on. However, in case of a 4xx response code, the notifier MUST correct any issues with the request and try again. Any error response (i.e., a 4xx or 5xx response code) SHOULD contain a payload explaining the error (when applicable). In the future, as we learn more about common failure scenarios in SE-to-SE notification exchanges, it might be desirable to standardize the schema of the error payload as well.
Authentication/Signature
Each search engine participating in the IndexNow protocol generates one or more public and private key pairs. They keep their private keys secure and advertise the corresponding public keys in their meta.json file under the publicKeys attribute. When they decide to rotate their keys, they MUST update their meta.json file with the new key(s) and allow 24 hours for other partners to sync up their local copies.
When sending a notification to another search engine, the sender uses the X-IN-Notifier-Public-Key HTTP header to tell the recipient which public key to use to verify the signature. Currently, the RSA public-key cryptography algorithm is used, but it MAY change in the future to support other algorithms. The recipient first checks whether the public key reported in the X-IN-Notifier-Public-Key header is a member of the publicKeys attribute of the notifier search engine's meta.json file. Then it uses that public key to decrypt the value of the X-Signed-Payload-Digest header, which should result in a hash digest that matches the hash of the plain-text payload. Currently, the SHA256 hashing algorithm is used, but it MAY change in the future to support other algorithms.
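The two verification steps above can be sketched as follows. The text does not specify the key encoding or the RSA padding scheme, so the RSA step is deliberately left out: recover_digest is a caller-supplied callable (an assumption) that takes the public key and the header value and returns the recovered digest as a hex string. The function name verify_notification is likewise hypothetical.

```python
import hashlib
import hmac

def verify_notification(body: bytes, header_key: str, advertised_keys: list[str],
                        signed_digest_hex: str, recover_digest) -> bool:
    """Check a notification against the notifier's advertised public keys.

    recover_digest(public_key, signed_digest_hex) is a hypothetical callable
    that performs the RSA public-key step and returns a hex digest string.
    """
    # Step 1: the key named in the header must be listed in meta.json.
    if header_key not in advertised_keys:
        return False
    # Step 2: the recovered digest must equal SHA-256 of the raw payload.
    recovered = recover_digest(header_key, signed_digest_hex)
    expected = hashlib.sha256(body).hexdigest()
    return hmac.compare_digest(recovered.lower(), expected)
```

To honor the 24-hour propagation window, advertised_keys would typically include keys that were pruned from the notifier's meta.json less than 24 hours ago.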
Logging/Retention
Every IndexNow partner is required to store logs of all the notified and verified URLs received from various websites. These logs must be retained for at least one week and be made available to all the partners on demand via HTTP GET requests (for asynchronous downloads, as opposed to the synchronous POST notifications that are prone to losses due to quality of service issues or network congestion).
The IndexNow log files contain TAB-separated values (a TSV file) with two columns: the Unix time (i.e., epoch seconds) when the notification was received, and the URL. Each line contains exactly one URL and its corresponding epoch time.
<epochtime> <url_1>
<epochtime> <url_2>
<epochtime> <url_3>
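Reading such a line is a single split on the TAB character, as in this small sketch. The function name parse_log_line is an assumption for the example.

```python
# Hypothetical helper: parse one line of an IndexNow log file.
def parse_log_line(line: str) -> tuple[int, str]:
    """Split one TAB-separated log line into (epoch_seconds, url)."""
    # Split only on the first TAB so the URL column is kept intact.
    epoch, url = line.rstrip("\n").split("\t", 1)
    return int(epoch), url
```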
These log files are rotated periodically, compressed (using GZip compression), and made available to other partners. It is recommended to use the convention of indexnow-log-<se_1_id>-<YYYYMMDD>-<hhmmss>.tsv.gz for naming files, where dates and times are in UTC and represent the time of the last entry in the corresponding log file. IndexNow partners are required to rotate their notification logs at least once every day, but can choose to rotate logs more often, such as every hour, every few minutes, or after a certain number of lines. It is recommended to keep individual log files under 50 million lines each to avoid files being too large.
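As an illustration of the recommended naming convention and compression, the sketch below turns a batch of log entries into a file name and gzip-compressed TSV bytes. The function name rotate_log and the in-memory representation of entries are assumptions; a real implementation would likely stream to disk.

```python
import gzip
from datetime import datetime, timezone

# Hypothetical helper: package a batch of entries as a rotated log file.
def rotate_log(se_id: str, entries: list[tuple[int, str]]) -> tuple[str, bytes]:
    """Return (file name, gzip-compressed TSV bytes) for a batch of entries.

    entries are (epoch_seconds, url) pairs in the order they were received.
    """
    # The file name carries the UTC time of the last entry in the file.
    last = datetime.fromtimestamp(entries[-1][0], tz=timezone.utc)
    name = f"indexnow-log-{se_id}-{last:%Y%m%d}-{last:%H%M%S}.tsv.gz"
    tsv = "".join(f"{epoch}\t{url}\n" for epoch, url in entries)
    return name, gzip.compress(tsv.encode("utf-8"))
```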
To make these log files discoverable, every IndexNow partner serves a manifest file at the URI advertised in their meta.json file under the logs field. The manifest is a JSON file containing two fields per log file, updated and url. The updated field contains the UTC date and time of the last entry of the corresponding log file as <YYYY-MM-DD>T<hh:mm:ss>Z, and the url field is an absolute HTTPS URI from which the corresponding log file can be downloaded. Note that the updated field of the manifest file has the date and time in the ISO 8601 format (a string) for consistent sorting, while the first column of the logs themselves reports the same as Unix epoch seconds (an integer) to be concise.
{
"logs" : [{
"updated" : "<utc_iso_8601_datetime_N>",
"url" : "<log_file_uri>"},
{
"updated" : "<utc_iso_8601_datetime_N-1>",
"url" : "<log_file_uri>"}
]
}
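A consumer of the manifest might select only the log files it has not yet processed, as in this sketch. The function name logs_newer_than and the idea of tracking a single "since" timestamp per partner are assumptions for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical helper: pick the log files newer than a given point in time.
def logs_newer_than(manifest_text: str, since: datetime) -> list[str]:
    """Return URLs of log files whose last entry is after `since`, oldest first.

    `since` must be a timezone-aware datetime.
    """
    entries = []
    for item in json.loads(manifest_text)["logs"]:
        # The manifest uses <YYYY-MM-DD>T<hh:mm:ss>Z, always in UTC.
        updated = datetime.strptime(item["updated"], "%Y-%m-%dT%H:%M:%SZ")
        updated = updated.replace(tzinfo=timezone.utc)
        if updated > since:
            entries.append((updated, item["url"]))
    return [url for _, url in sorted(entries)]
```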
The manifest file must be regenerated each time a new log file is rotated, an old one is deleted, or the URIs of log files change. IndexNow partners are free to choose how they organize their log files and where they serve them from, but it is recommended not to change their URIs frequently. If a partner working from a stale manifest requests a log file that was deleted in the interim, return a 404 or any other appropriate HTTP status code. If the URI of the manifest file itself is to be changed, keep the old one accessible or redirecting to the new address for at least 24 hours after it is changed in the meta.json file.
Access to log files is granted to all the search engines participating in the protocol. The IndexNow partners authorize downloads of logs by matching the network address of the source of HTTP requests against the notifierIPs found in the meta.json files of all the protocol partner search engines. Participating search engines that are interested in downloading these logs from machines other than the ones sending notifications can include additional network addresses in their meta.json file. In the future, we may move to using a downloaderIPs field or a different authentication and authorization method, if the meta.json file-based access exhibits any issues.
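The address matching described above can be sketched with Python's standard ipaddress module. The function name is_authorized is hypothetical, and partner_metas is assumed to be the list of parsed meta.json documents of all partners.

```python
import ipaddress

# Hypothetical helper: authorize a log download by source address.
def is_authorized(source_ip: str, partner_metas: list[dict]) -> bool:
    """Return True if source_ip falls inside any partner's notifierIPs range."""
    addr = ipaddress.ip_address(source_ip)
    for meta in partner_metas:
        for entry in meta.get("notifierIPs", []):
            # Each entry holds an "ipv4Prefix" or "ipv6Prefix" CIDR string.
            for cidr in entry.values():
                network = ipaddress.ip_network(cidr, strict=False)
                if addr.version == network.version and addr in network:
                    return True
    return False
```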
Authors
Sawood Alam Internet Archive sawood@archive.org
Martin Fiala Seznam martin.fiala@firma.seznam.cz
Vishnevsky Gleb Yandex kaikash7@yandex-team.ru
Fabrice Canel Microsoft
March 2024