Two datasets: one from clickhouse, one from nginx-cache. The clickhouse dataset has more data and includes colo information, but missing
Because of its size, the clickhouse dataset is sampled by object id, from 1% to 40.
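The notes do not say how the sampling by object id was done. One common approach, shown here as a minimal sketch and not as the method actually used, is to hash the object id and keep it when the hash falls below the sampling rate. The function name `keep_object` and the choice of md5 for bucketing are assumptions made for illustration.

```python
import hashlib


def keep_object(obj_id: str, rate: float) -> bool:
    """Decide deterministically whether an object id is in the sample.

    Hypothetical helper: the real sampling procedure is not documented.
    All requests for a kept object stay in the sample, so per-object
    request sequences are preserved.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.md5(obj_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Integer comparison avoids float rounding at the edges.
    return bucket < int(rate * 2**64)


# Example: filter a list of object ids at a 1% rate.
sampled = [o for o in (f"obj-{i}" for i in range(1000)) if keep_object(o, 0.01)]
```

With this scheme a sample at a lower rate is always a subset of a sample at a higher rate, which makes samples at different rates comparable.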

clickhouse fields:
timestamp
client country (anonymized)
coloId
obj_id (md5 of host+path+query)
obj_id2 (md5 of host+path)
responseSize
content type
file extension
n level (the number of '/' in path)
ttl (augmented from crawling; not always accurate, and still missing for 10% of the data)
age
cache status
method (get/purge)
zone id (md5 hash)
client request host (md5 hash)
edgehost (md5 hash)
referhost (md5 hash)
has query string (bool)
n param in query
client country
hot object (bool)
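Several of the fields above are derived from the request URL. The sketch below shows one way they could be computed. It assumes that host, path, and query are concatenated with no separator before hashing, that the query excludes the leading '?', and that the hash is stored as a hex digest; none of this is confirmed by the notes. The name `derive_fields` and the dictionary keys are hypothetical.

```python
import hashlib
import posixpath
from urllib.parse import parse_qsl, urlsplit


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def derive_fields(url: str) -> dict:
    """Compute URL-derived fields (illustrative; exact rules are assumed)."""
    parts = urlsplit(url)
    host, path, query = parts.netloc, parts.path, parts.query
    return {
        # obj_id covers the query string, obj_id2 ignores it.
        "obj_id": _md5_hex(host + path + query),
        "obj_id2": _md5_hex(host + path),
        # n level: the number of '/' in the path.
        "n_level": path.count("/"),
        "file_extension": posixpath.splitext(path)[1].lstrip("."),
        "has_query": bool(query),
        "n_param": len(parse_qsl(query, keep_blank_values=True)),
    }


fields = derive_fields("https://example.com/a/b/c.jpg?x=1&y=2")
```

Two requests that differ only in their query string share the same obj_id2 but have different obj_id values, which is useful when deciding whether to treat query variants as one object.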

nginx-cache fields:
timestamp
obj_id
response body size
http method
content type
file extension
n level
expire ttl
age
cache status
zone id (md5 hash)
zone plan
chunked response
has query
n param in query
cache control
hostname (sha1 hash)
http range