Web缓存杂谈--Etag & If-None-Match

时间:2023-12-10 16:43:56

一、概述
缓存通俗点讲,就是将已经得到的‘东东’存放在一个相对于自己而言,尽可能近的地方,以便下次需要时,不会再二笔地跑到起始点(很远的地方)去获取,而是就近解决,从而缩短时间和节约金钱(坐车要钱嘛)。Web缓存,也是同样的道理,说白了,就是当你第一次访问网址时,将这个东东(representations),如html页面、图片、JavaScript文件等,存在一个离你较近的地方,当你下次还需要它时,不用再一次跋山涉水到服务器(origin servers)去获取。继而,web缓存的优势也就很明显了:

  1、 减少了网络延迟,加快了页面响应速度,增强了用户体验嘛。(因为我是就近获取的,路程缩短了,所以响应速度当然比到遥远的服务器去获取快哦);

  2、 减少了网络带宽消耗嘛。(就近获取);

  3、 通过缓存,我们都不用到服务器 (origin servers)去请求了,从而也就相应地减轻了服务器的压力。

那web缓存将这些东东放在哪儿呢?下面我就看看有哪些缓存种类,从而了解放在哪吧。

二、Web缓存的种类

--数据库缓存--:

当web应用关系复杂,数据表蹭蹭蹭往上涨时,可以将查询后的数据放到内存中进行缓存,下次再查询时,就直接从内存缓存中获取,从而提高响应速度。

--CDN缓存--:

CDN通俗点,就是当我们发送一个web请求时,会先经过它一道手,然后它帮我们计算路径,去哪得到这些东东(representations)的路径短且快。这个是网站管理员部署的,所以他们也可以将大家经常访问的representations放在CDN里,这样,就响应就更快了。

--代理服务器缓存--:

代理服务器缓存,其实跟下面即将讲的浏览器缓存性质差不多,差别就是代理服务器缓存面向的群体更广,规模更大而已。即,它不只为一个用户服务,一般为大量用户提供服务,同一个副本会被重用多次,因此在减少相应时间和带宽使用方面很有效。

--浏览器缓存--:

简而言之,就是,每个浏览器都实现了 HTTP 缓存,我们通过浏览器使用HTTP协议与服务器交互的时候,浏览器就会根据一套与服务器约定的规则进行缓存工作。当我们点击浏览器上‘后退’或者‘前进’按钮时,显得特别有用。

三、Web缓存的执行机制

所谓机制就是一些双方的约定,清晰地告诉对方,什么时候该做什么事。web缓存也一样,你总得告诉我(请求)什么时候到缓存中去获取,什么到服务器去获取representations吧。So,也得有一套相应的机制,web 缓存机制分为两大部分http协议(HTTP1.0和HTTP1.1)和网站管理人员制定的协议。抛开网站内部制定的协议,我们来看看http协议中定义的缓存机制。

By the way,我们可以在HTML文档中的<head>中通过<meta>来缓存,如下:

<meta http-equiv="Pragma" content="no-cache"/>

但,它只有部分浏览器可以用,并且代理服务器也不会鸟它。(因为meta在html中,代理服务器几乎不回去读它滴)。

--http缓存机制--

1、 Expires

http缓存机制主要在http响应头中设定,响应头中相关字段为Expires、Cache-Control、Last-Modified、If-Modified-Since、Etag。

HTTP 1.0协议中的。简而言之,就是告诉浏览器在约定的这个时间前,可以直接从缓存中获取资源(representations),而无需跑到服务器去获取。

另:Expires因为是对时间设定的,且时间是Greenwich Mean Time (GMT),而不是本地时间,所以对时间要求较高。

2、 Cache-Control

HTTP1.1协议中的,因为有了它,所以可以忽略上面提到的Expires。因为Cache-Control相对于Expires更加具体,细致。

且,就算同时设置了Cache-Control和Expires,Cache-Control的优先级也高于Expires。

下面就来看看,Cache-Control响应头中常用字段的具体含义:

  (1)、max-age:用来设置资源(representations)可以被缓存多长时间,单位为秒;

  (2)、s-maxage:和max-age是一样的,不过它只针对代理服务器缓存而言;

  (3)、public:指示响应可被任何缓存区缓存;

  (4)、private:只能针对个人用户,而不能被代理服务器缓存;

  (5)、no-cache:强制客户端直接向服务器发送请求,也就是说每次请求都必须向服务器发送。服务器接收到请求,然后判断资源是否变更,是则返回新内容,否则返回304,未变更。这个很容易让人产生误解,使人误以为是响应不被缓存。实际上Cache-Control: no-cache是会被缓存的,只不过每次在向客户端(浏览器)提供响应数据时,缓存都要向服务器评估缓存响应的有效性。

  (6)、no-store:禁止一切缓存(这个才是响应不被缓存的意思)。

3、 Etag & If-None-Match

HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html

Etag是属于HTTP 1.1属性,它是由服务器生成返回给前端,

当你第一次发起HTTP请求时,服务器会返回一个Etag
Web缓存杂谈--Etag & If-None-Match
并在你第二次发起同一个请求时,客户端会同时发送一个If-None-Match,而它的值就是Etag的值(此处由发起请求的客户端来设置)。
Web缓存杂谈--Etag & If-None-Match

然后,服务器会比对这个客服端发送过来的Etag是否与服务器的相同

如果相同,就将If-None-Match的值设为false,返回状态为304,客户端继续使用本地缓存,不解析服务器返回的数据(这种场景服务器也不返回数据,因为服务器的数据没有变化嘛)
Web缓存杂谈--Etag & If-None-Match

如果不相同,就将If-None-Match的值设为true,返回状态为200,客户端重新解析服务器返回的数据

说白了,
ETag 实体标签: 一般为资源实体的哈希值
即ETag就是服务器生成的一个标记,用来标识返回值是否有变化

且Etag的优先级高于Last-Modified

温馨提示【If-None-Match头中的ETag必须和返回的ETag值一样,有双引号】:
Web缓存杂谈--Etag & If-None-Match

生成ETag的逻辑:
org.springframework.web.filter.ShallowEtagHeaderFilter#generateETagHeaderValue

    /**
* Generate the ETag header value from the given response body byte array.
* <p>The default implementation generates an MD5 hash.
* @param inputStream the response body as an InputStream
* @param isWeak whether the generated ETag should be weak
* @return the ETag header value
* @see org.springframework.util.DigestUtils
*/
protected String generateETagHeaderValue(InputStream inputStream, boolean isWeak) throws IOException {
// length of W/ + " + 0 + 32bits md5 hash + "
StringBuilder builder = new StringBuilder(37);
if (isWeak) {
builder.append("W/");
}
builder.append("\"0");
DigestUtils.appendMd5DigestAsHex(inputStream, builder);
builder.append('"');
return builder.toString();
}

3xx
301 Move Permanently
302 Found
304 Not Modified

Web缓存杂谈--Etag & If-None-Match

4、 Last-Modified & If-Modified-Since

Last-Modified与Etag类似。不过Last-Modified表示响应资源在服务器最后修改时间而已。与Etag相比,不足为:

  (1)、Last-Modified标注的最后修改只能精确到秒级,如果某些文件在1秒钟以内,被修改多次的话,它将不能准确标注文件的修改时间;

  (2)、如果某些文件会被定期生成,当有时内容并没有任何变化,但Last-Modified却改变了,导致文件没法使用缓存;

  (3)、有可能存在服务器没有准确获取文件修改时间,或者与代理服务器时间不一致等情形。

然而,Etag是服务器自动生成或者由开发者生成的对应资源在服务器端的唯一标识符,能够更加准确的控制缓存。

四、扩展阅读

[1]、"Caching Tutorial"

RFC 7232              HTTP/1.1 Conditional Requests            June 2014
This method relies on the fact that if two different responses were
sent by the origin server during the same second, but both had the
same Last-Modified time, then at least one of those responses would
have a Date value equal to its Last-Modified time. The arbitrary
60-second limit guards against the possibility that the Date and
Last-Modified values are generated from different clocks or at
somewhat different times during the preparation of the response. An
implementation MAY use a value larger than 60 seconds, if it is
believed that 60 seconds is too short.

2.3. ETag

   The "ETag" header field in a response provides the current entity-tag
for the selected representation, as determined at the conclusion of
handling the request. An entity-tag is an opaque validator for
differentiating between multiple representations of the same
resource, regardless of whether those multiple representations are
due to resource state changes over time, content negotiation
resulting in multiple representations being valid at the same time,
or both. An entity-tag consists of an opaque quoted string, possibly
prefixed by a weakness indicator. ETag = entity-tag
entity-tag = [ weak ] opaque-tag
weak = %x57.2F ; "W/", case-sensitive
opaque-tag = DQUOTE *etagc DQUOTE
etagc = %x21 / %x23-7E / obs-text
; VCHAR except double quotes, plus obs-text
      Note: Previously, opaque-tag was defined to be a quoted-string
([RFC2616], Section 3.11); thus, some recipients might perform
backslash unescaping. Servers therefore ought to avoid backslash
characters in entity tags.
An entity-tag can be more reliable for validation than a modification
date in situations where it is inconvenient to store modification
dates, where the one-second resolution of HTTP date values is not
sufficient, or where modification dates are not consistently
maintained.
   Examples:
ETag: "xyzzy"
ETag: W/"xyzzy"
ETag: ""

https://tools.ietf.org/html/rfc7232#section-2.3

The ETag or entity tag is part of HTTP, the protocol for the World Wide Web. It is one of several mechanisms that HTTP provides for web cache validation, which allows a client to make conditional requests. This allows caches to be more efficient, and saves bandwidth, as a web server does not need to send a full response if the content has not changed. ETags can also be used for optimistic concurrency control,[1] as a way to help prevent simultaneous updates of a resource from overwriting each other.

An ETag is an opaque identifier assigned by a web server to a specific version of a resource found at a URL. If the resource representation at that URL ever changes, a new and different ETag is assigned. Used in this manner ETags are similar to fingerprints, and they can be quickly compared to determine whether two representations of a resource are the same.
https://en.wikipedia.org/wiki/HTTP_ETag

17.13 ETag support
An ETag (entity tag) is an HTTP response header returned by an HTTP/1.1 compliant web server used to determine change in content at a given URL. It can be considered to be the more sophisticated successor to the Last-Modified header. When a server returns a representation with an ETag header, the client can use this header in subsequent GETs, in an If-None-Match header. If the content has not changed, the server returns 304: Not Modified.

Support for ETags is provided by the Servlet filter ShallowEtagHeaderFilter. It is a plain Servlet Filter, and thus can be used in combination with any web framework. The ShallowEtagHeaderFilter filter creates so-called shallow ETags (as opposed to deep ETags, more about that later).The filter caches the content of the rendered JSP (or other content), generates an MD5 hash over that, and returns that as an ETag header in the response. The next time a client sends a request for the same resource, it uses that hash as the If-None-Match value. The filter detects this, renders the view again, and compares the two hashes. If they are equal, a 304 is returned. This filter will not save processing power, as the view is still rendered. The only thing it saves is bandwidth, as the rendered response is not sent back over the wire.

You configure the ShallowEtagHeaderFilter in web.xml:

<filter>
<filter-name>etagFilter</filter-name>
<filter-class>org.springframework.web.filter.ShallowEtagHeaderFilter</filter-class>
</filter> <filter-mapping>
<filter-name>etagFilter</filter-name>
<servlet-name>petclinic</servlet-name>
</filter-mapping>

https://docs.spring.io/spring-framework/docs/3.2.0.M2/reference/html/mvc.html#mvc-exceptionhandlers

http://www.spring4all.com/article/1553
java config方式配置:

@Configuration
public class WebConfig { @Bean
public Filter shallowEtagHeaderFilter() {
return new ShallowEtagHeaderFilter();
}
}

https://*.com/questions/26151057/add-a-servlet-filter-in-a-spring-boot-application
https://*.com/questions/30350137/generating-etag-using-spring-boot
https://javadeveloperzone.com/spring-boot/spring-boot-etag-header-example/

Web缓存杂谈--Etag & If-None-Match