How does Apache serve PHP files in terms of the Last-Modified and Cache-Control headers?

I checked the default behavior of my Apache server:

- In case of a PHP file without sessions in it (session_start), the server responds without Cache-Control and Last-Modified in the response headers.
- In case of a PHP file with sessions, it responds with, among others:

Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache

I understand the second case and why Apache is doing this.

I don't understand the first case 100%. Apache does not include the Last-Modified header, because apparently by default it assumes a PHP file is dynamic, although a PHP file is not necessarily dynamic, for example:

<?php
    echo 'test';
?>

This is pretty logical of Apache, because you could just use an html file in a case like that. So we have to see the file as something like:

<?php
    echo microtime();
?>

Now the output is different on every request. So the Last-Modified header would not make sense, and that's why Apache does not serve the Last-Modified header (by default) for PHP files.

Till now I understand everything and it's clear.

Now check this: https://tools.ietf.org/html/rfc7234#page-5

A cache should do by default:

Although caching is an entirely OPTIONAL feature of HTTP, it can be assumed that reusing a cached response is desirable and that such reuse is the default behavior when no requirement or local configuration prevents it. Therefore, HTTP cache requirements are focused on preventing a cache from either storing a non-reusable response or reusing a stored response inappropriately, rather than mandating that caches always store and reuse particular responses.

So in principle a browser can store what it wants, unless some information in the response headers prevents it.

So with no Cache-Control and Last-Modified headers, a browser could in theory just use PHP files from cache. In practice it doesn't happen often, but in theory it's possible.

So I don't understand why Apache by default is not including something like "Cache-Control: no-cache" or maybe even better "Cache-Control: no-store" (note 1, see at the end). Then also in theory a browser will not use a PHP file from cache. Now in theory by default a cache could use php files from cache and that's not what you want. Of course I could change the default settings, but I don't understand why Apache is doing it like this by default? What's the reason behind that? It does not seem really logical to me.

Note 1: From my point of view, "no-store" would be better than "no-cache", because with "no-cache" the PHP files will still end up in the browser cache. Why store them if you're never planning to use them? That's kind of a waste of expensive SSD space.

It is because Apache is not going to add any extra headers that you have not explicitly asked it to add. It is not Apache's job to guess at what you want - it delivers the most minimal content and headers by default and it is up to you to specify any extra configuration or have code that adds headers.

In this case, Apache is not really adding those headers when you are using sessions. When you use a session, the PHP engine is injecting those headers into the response, not Apache.

Remember that Apache has no idea what PHP is or what it does. It simply is configured to look for requests that end in .php and hand those requests over to the PHP engine to be processed. It is your job to add headers via configuration or via PHP code if you want to control client caching.

ASKER

You're saying:

"It is because Apache is not going to add any extra headers that you have not explicitly asked it to add."

With an html file for example, they're adding the extra header "Last-Modified" automatically by default, so therefore it's not true. Or at least they are not consistent with that. So I don't think this is the reason, because in other cases it's not like that.

PS And thanks for correcting me with using Apache instead of PHP engine. In the future I'll talk about the PHP engine. Thanks!

When it comes to static content like HTML files and images and Javascript files, etc... Apache doesn't have to guess about the age of the data.

It is not handing those requests off to a plugin to be handled. In those cases, Apache is the one reading the file directly and sending it to the browser, so it can provide a Last-Modified header based on the file's modified timestamp.

Whenever a request is being handled by something other than Apache itself, Apache loses the ability to assume anything about the content that might come back from the request handler, so it's not going to provide those kinds of Last-Modified timestamps on that data.

I should note that caching is not a simple process within Apache. There are various methodologies to it, with different headers. For example, beyond the basic Last-Modified headers, you have ETags and Expires and Cache-Control headers that can be impacted by different plugins and Apache configuration (e.g. I have mod_expires directives in my Apache setup to apply some basic rules to most of my static content so JS files are cached for a day while images are cached for a week, etc...).

Apache has a write-up on how it handles its caching (this is the Apache 2.4 guide):
https://httpd.apache.org/docs/2.4/caching.html

ASKER

When it comes to static content like HTML files and images and Javascript files, etc... Apache doesn't have to guess about the age of the data.

A php file can also be static in principle.

<?php
    echo 'bla';
?>

So that's not convincing me.

But you're actually saying that Apache is handling the request for an HTML file, but the PHP engine is handling the request for a PHP file (in terms of response headers)?

That could be an explanation, but then I don't understand the PHP engine (instead of Apache).

In an earlier post you were saying:

It is because Apache is not going to add any extra headers that you have not explicitly asked it to add.

With "Apache" in that quote, we actually meant the PHP engine. But the PHP engine is adding extra headers, even if you've not explicitly asked for them.

It's adding the headers that are most common. For example when you're using session_start, the PHP engine will add some headers to avoid caching at all. The PHP engine assumes that's probably what you want (in terms of security).

So it would make much more sense if the PHP engine by default also would avoid storing the php files without sessions in it. I don't see any reason why to do it differently. Actually how it's done now by default, it's even wrong, because when having a response header without Expires / Cache-Control / Last-Modified et cetera, the cache is in principle theoretically free to cache (in this case the php file). But you don't want caching in combination with php files by default. The same like you don't want caching in combination with php files with sessions.

A php file can also be static in principle.

No, not really. Your example is not static. It might generate the same content every time, but the keyword there is that it's GENERATING the content. So even though the result is the same and you and I can obviously see that there's no way for the output to be different, the overall process to get to the final output is still dynamic.

Imagine you have a toy factory with an assembly line. You have the manager Apache McManager and then the assembly line worker PHP McWorker. Let's say that some business people come to meet with Apache and they see a report sitting on his desk and one of them asks, "Hey, can I have that report?"

Since it's already right there on his desk, Apache just takes it and gives it to the person, and provides some additional information (since it's his report), saying, "Sure, but you should know that it's 5 days old."

Another business person says he wants two copies of "Super Toy X" for his son. Apache says, "Okay, I don't happen to have any of those assembled here in my office, but I can go ask PHP to make a couple." So Apache steps out of his office and goes down to the factory floor and tells PHP McWorker to go assemble two new copies of Super Toy X and bring it up to his office when done. Apache then leaves the floor.

PHP McWorker immediately gets to work. He happens to already have one assembled, so he grabs that one, and then assembles the second one from scratch and then brings them both up to Apache's office.

Apache takes the toys and without looking at them further, hands them to the business man who was asking for them.

Now, Apache wasn't on the factory floor during the Assembly - he just gave the request over to PHP McWorker and left it up to PHP to accomplish the goal however PHP felt the best way was to do it. PHP didn't explicitly tell Apache McManager that one toy was already assembled and one was assembled from scratch, so Apache had no idea that one toy was older than the other.

However, Apache McManager knew how old the report was because it was his report and was right there in his office.

Coming back to the technical side, static content is the content that is already saved in a file that Apache has access to. Apache doesn't have to go anywhere to get to it - it simply accesses the file, gets its properties (last modified date, etc), and then streams the data straight to the browser.

Dynamic content is ANYTHING that is passed to an external handler. So when a PHP request comes in, Apache isn't handling most of it - it's just taking the order and passing it over to PHP. It's up to PHP to handle the request appropriately (optionally injecting HTTP headers), and return the final data back to Apache to give back to the original requester.

So even if PHP generates the same data over and over again, there's no way for Apache to know that in advance. PHP is still generating data on the fly, and so it's up to PHP to state how old something is or even to not state it at all.
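
As a rough illustration of PHP "stating how old something is", a script can build a Last-Modified header itself and answer conditional requests. This is only a sketch under assumptions: lastModifiedHeader() and statusForConditionalGet() are hypothetical helper names (not part of PHP), and the script is assumed to know a timestamp for when its data last changed.

```php
<?php
// Hypothetical helpers for illustration; they are not built into PHP.

// Format a Unix timestamp as an HTTP Last-Modified header line.
function lastModifiedHeader(int $timestamp): string
{
    return 'Last-Modified: ' . gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Return 304 when the client's copy is still current, otherwise 200.
function statusForConditionalGet(int $lastModified, ?string $ifModifiedSince): int
{
    if ($ifModifiedSince === null) {
        return 200; // client sent no If-Modified-Since header
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return 200; // unparseable date: send the full response
    }
    return $lastModified <= $since ? 304 : 200;
}

// Usage inside a web request (assumes $dataChangedAt is known to the script):
// header(lastModifiedHeader($dataChangedAt));
// http_response_code(statusForConditionalGet(
//     $dataChangedAt,
//     $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null
// ));
```

When the status comes out as 304, the script would normally stop without sending a body. Apache cannot do any of this on PHP's behalf, because only the script knows what "last changed" means for its output.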

ASKER

I know what you're saying, but my question is not about that. It does not matter who exactly is doing what in this case. When there are sessions in a php file, "they" can (and do) add headers, so they could also add for example the same headers when it's about a php file without sessions. There are anyway no dates involved with no-cache, no-store et cetera.

Maybe I'm misunderstanding the question. I thought the question was:

...but I don't understand why Apache is doing it like this by default? What's the reason behind that?

The above content explains the workings / reasoning behind what headers are sent.

If you're now asking about how to handle headers within a PHP script, you can use the header() function to add headers. For example, at the top of a script, you could add:

header("Cache-Control: no-cache, must-revalidate");

...to send that Cache-Control HTTP header back with the page response. You can send anything back, really:

header("Hello: world");

(the browser just wouldn't know what to do with the "Hello" header so it would be ignored).

ASKER

Or maybe I was not clear enough ;). I'm also not asking how to handle headers within a PHP script.

See this post of mine (last 3 paragraphs):

https://www.experts-exchange.com/questions/29072718/The-way-Apache-serve-PHP-files-in-terms-of-Last-Modified-and-Cach-Control-header.html?anchor=a42397441&notificationFollowed=201295346&anchorAnswerId=42397441#a42397441

With "Apache" in that quote, we actually meant the PHP engine. But the PHP engine is adding extra headers, even if you've not explicitly asked for them.

It's adding the headers that are most common. For example when you're using session_start, the PHP engine will add some headers to avoid caching at all. The PHP engine assumes that's probably what you want (in terms of security).

So it would make much more sense if the PHP engine by default also would avoid storing the php files without sessions in it. I don't see any reason why to do it differently. Actually how it's done now by default, it's even wrong, because when having a response header without Expires / Cache-Control / Last-Modified et cetera, the cache is in principle theoretically free to cache (in this case the php file). But you don't want caching in combination with php files by default. The same like you don't want caching in combination with php files with sessions.

Or just the question itself. I'll repeat one part:

So I don't understand why Apache by default is not including something like "Cache-Control: no-cache" or maybe even better "Cache-Control: no-store" (note 1, see at the end). Then also in theory a browser will not use a PHP file from cache. Now in theory by default a cache could use php files from cache and that's not what you want. Of course I could change the default settings, but I don't understand why Apache is doing it like this by default? What's the reason behind that? It does not seem really logical to me.

But you have to read the context with it, otherwise it's difficult to see where that question is coming from. I think I understand everything, but it does not seem logical to me what they're doing (see bold part above).

Are you asking why the PHP engine by default DOES NOT send a "Cache-Control: no-store" HTTP header for non-session PHP scripts?

If so, it's mainly because PHP isn't designed to add content where it's not absolutely needed. The only reason it adds some headers for session info is because it's a separate extension that is intended to work differently from other PHP pages and sessions are inherently likely to have dynamic content that should not be cached. And even with that extension, it offers ways (e.g. with "session_cache_limiter()") to control those headers when you do use sessions.

There are plenty of PHP pages that can benefit from caching. Take your "static in principle" example and consider web pages that don't really have any per-user content but rather are mostly HTML and use some PHP to help build the page. The output might be exactly the same for everyone, even though they're PHP pages - and that might be cacheworthy.

Bottom line, programming languages should be doing as little as possible outside of what you specifically ask for in your code. The more "extra" stuff they do (like the session extension adding headers), the greater the chance that unintended bugs / undesired effects will be created.

ASKER

Thanks!! That's indeed what my question was about.

If with sessions it's a separate extension, in principle they could also make other extensions for other situations (if they would really want).

So probably they just don't want it.

You're saying:

Bottom line, programming languages should be doing as little as possible outside of what you specifically ask for in your code

I do not 100% agree with this. Usually programming languages are also taking most common situations as default behavior / default settings. That's actually why I would expect a different default behavior. You can have "static" PHP files, which you wanna "cache", but usually you don't want that.

And now if you want to enable caching the right way with PHP files (without sessions), you must first at least add Last-Modified "by hand". So there are 2 options (caching PHP files without sessions, or not caching). But for both options you have to add some headers yourself. I would say, just check what is most desirable in most situations and use that as the default.

I think now there are a lot of webmasters using PHP files (without sessions), assuming that the server is handling the files the right way in terms of caching. But by default they are handling it the wrong way, so you actually must add some headers:

1. In case of no caching: no-cache, no-store, et cetera -> if you don't add this, a cache COULD in principle still use caching to serve the PHP file to you (but without the advantages of (re)validating, because there is no Last-Modified).
2. In case of caching: Last-Modified and max-age / Expires / et cetera
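
The two options above could be sketched as one small helper that returns the header lines for either policy. This is a minimal sketch, not a recommendation: cachingHeaders() is a hypothetical name, and the choice of "no-store" for the non-caching case and of max-age plus Last-Modified for the caching case is an assumption based on the headers discussed in this thread.

```php
<?php
// Hypothetical helper: build the header lines for one of the two policies.
function cachingHeaders(bool $cacheable, int $lastModified = 0, int $maxAge = 0): array
{
    if (!$cacheable) {
        // Option 1: tell caches not to keep the response at all.
        return ['Cache-Control: no-store'];
    }

    // Option 2: allow reuse for $maxAge seconds and supply a validator.
    return [
        'Cache-Control: max-age=' . $maxAge,
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
    ];
}

// Usage inside a web request, before any output is sent:
// foreach (cachingHeaders(true, filemtime(__FILE__), 86400) as $line) {
//     header($line);
// }
```

Using filemtime(__FILE__) as the timestamp only makes sense when the output depends on nothing but the script file itself; otherwise the script would need its own notion of when the data changed.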

So I'm not 100% convinced yet this is the best way to deal with it.

You can have "static" PHP files, which you wanna "cache", but usually you don't want that.

I wouldn't agree with that. You might be looking at the situation through the lens of your own work with PHP, but there's a very wide range of usage and quite a bit of it can benefit from caching. In fact, there are entire CDN products dedicated to providing cached versions of "dynamic" pages and they get a lot of business BECAUSE there is so much of it.

in principle they could also make other extensions for other situations (if they would really want).

Sure. You can also build your own extensions if you really want. Check out Zephir.

I do not 100% agree with [programming language should be doing as little as possible]

Well, that's your opinion to have, and we have a lot of different programming languages because everyone figures they have a better way to do things (and sometimes it IS better).

So I'm not 100% convinced yet this is the best way to deal with it.

Okay, I'm not trying to convince you that anything is the best way or not. You just asked why it doesn't do it by default, and I'm just explaining why. You can always suggest different behavior to the PHP developer team if you think there's a better way of doing things - there are lots of good ideas that get submitted to them every day.

ASKER

Thanks again!

I was indeed talking about my situations as just a webmaster. You only need to use a time or a date in PHP and caching is already not desirable.

Thanks for Zephir, but I don't wanna go that far, because I can just add some headers "by hand" and then I'm also where I want to be. I only think that most webmasters are just using the default settings / behavior in terms of headers in combination with PHP files, so then it can cause problems (caching while you don't want caching). So I think I just have to see it the way you're saying in your last paragraph.

Thanks for your help, time, energy!

Bear in mind that PHP also has the auto_prepend_file directive in the php.ini file, which will let you force a given PHP file to automatically run at the start of -every- PHP script. So if you wanted to just set up your own server to push those headers out by default at the beginning of all PHP script execution, you could have a simple PHP file like:

<?php
header("Expires: Thu, 19 Nov 1981 08:52:00 GMT");
header("Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0");
header("Pragma: no-cache");

(It's a good practice to not include the final "?>" line here to avoid the possibility of accidental whitespace)

Then save that file somewhere in the filesystem and in your php.ini file, specify the full path to it in your auto_prepend_file directive and restart Apache (or if you're using PHP-FPM, restart that).

That should force PHP to push those headers out each time, and then you can optionally override those settings in your actual scripts. So if you had script xyz.php that needed different values for Cache-Control, for example, you'd just have it specify its own header and it'll override the value:

<?php
// xyz.php
header("Cache-Control: max-age=86400");
// ...rest of the script...

Also bear in mind that most browsers and servers today are using HTTP/1.1 for their protocol, and I'm FAIRLY sure that it states that if there are no caching-related headers sent AT ALL for a resource, the browser should NOT cache the result, which is why the default configuration for most web servers with PHP (or other scripting languages) plugins just works fine without extra configuration. In the meantime, Apache will send over those kinds of headers for static content, which leads to that content being cached.

However, you were talking about things in principle, and it's really up to the browser to make that choice (not all browsers obey the guidelines), so I didn't really bring this up before, but it's probably worth mentioning. (And I could also be wrong on that - I don't regularly look at the RFC, but it's here if you want to read all the specifics on caching in section 13: https://www.ietf.org/rfc/rfc2616.txt)

ASKER

Thanks a lot again. Especially your last comment is really useful, because I'm also more worried about other webmasters. But I tested it in some browsers before and they are indeed not caching in a situation like that, so that's why I talked about "in principle". If that would be part of the HTTP/1.1, then there is anyway no problem, so that would be already a real answer to my question. If you or someone could find it somewhere in that protocol, I'm really interested in that, because then also "in principle" I don't have to worry about it and my question is not relevant anymore. A while ago I searched a bit for it, but I could not find it, but it really would make sense if something like that is in the protocol. So if someone finds it, let me know. Anyway thanks a lot again!

ASKER

By the way, in a case like that (no caching-related headers sent AT ALL for a resource), the browser does not reuse the result from cache in my tests. But the browser is still storing the result in the browser cache. That seems a bit illogical to me: then the browser could also just not store the result. Maybe there is a reason for it, but I don't see it (yet).

PS I tested this in Firefox, so with "the browser" I meant Firefox. And they also didn't use the cached result in case of "disconnected", because otherwise that maybe could be a reason to store it, but also in a case like that they're not using the cached result.

Firefox has traditionally been cache-heavy, and there are how-to articles out there on how to configure it to use memory for temporary caching instead of disk (specifically to spare the life of SSDs). Every browser is going to try different tricks to be better than other browsers. There is some value in a TEMPORARY cache, particularly when using the back button. You usually do not want to re-run the original request when using the back button, so browsers will sometimes perform temporary caching to allow a user to quickly go "back" through their history without re-issuing the request. This is a pretty important thing particularly when dealing with GET requests that perform sensitive operations like charging a credit card.

A browser could theoretically store everything temporarily in memory, but not everyone has a ton of memory to dedicate to their browser, and most users have default virtual memory settings where some of that content is swapped out to disk anyway, so there's only so much you can do about it.

As far as the RFC and specifics go, I indicated that all of the caching rules are in section 13. If you're looking for an in-depth understanding of the caching concepts, you really have to just read through that whole section because it all impacts when-to-cache behavior. Can't really get any more precise than that without leaving out important context.

ASKER

Thanks! I thought I also tested the back / forward button, but after your post I double checked it. You're indeed right about caching in a case like that (back / forward button in combination with a PHP file, without session and with no caching-related headers). So that indeed explains why Firefox is storing the file.

But before you were saying:

Also bear in mind that most browsers and servers today are using HTTP/1.1 for their protocol, and I'm FAIRLY sure that it states that if there are no caching-related headers sent AT ALL for a resource, the browser should NOT cache the result

So you can search the protocols as much as you want, but I think you will not find it. Otherwise Firefox would also not cache it for the back / forward button.

And this: https://tools.ietf.org/html/rfc7234#page-5

A cache should do by default:

Although caching is an entirely OPTIONAL feature of HTTP, it can be assumed that reusing a cached response is desirable and that such reuse is the default behavior when no requirement or local configuration prevents it. Therefore, HTTP cache requirements are focused on preventing a cache from either storing a non-reusable response or reusing a stored response inappropriately, rather than mandating that caches always store and reuse particular responses.

So from this you could also already conclude that your quote above can not be true, because if there are no headers preventing caching (no caching-related headers sent AT ALL for a resource), then caching is optional, so a browser in theory could use caching. Otherwise they didn't have to mention this.

By the way, you were referring to "section 13" in the RFC. There is no "section 13" in my opinion. Maybe you're referring to:

https://www.ietf.org/rfc/rfc2616.txt

But already in 2014, RFC 2616 was replaced by multiple RFCs (7230-7235), so nowadays you must refer to those new documents. Some things in the old 2616 are no longer correct. Or were you not referring to RFC 2616?

Yes, I was referring to section 13 in RFC 2616 (I specified the link at the end of comment #42398188). I haven't kept up on the updates - I just had 2616 bookmarked from years ago. In the new RFC 7234, the language is actually a LOT clearer and supports what I told you. Look at section 3. The default behavior is to NOT cache anything, and then it provides a list of conditions / exceptions where caching is allowed:

A cache MUST NOT store a response to any request, unless: (...list of required criteria that needs to be met to enable caching...)

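Most of the alternatives in the final condition of that list (the sub-list starting with "the response either:") need some kind of caching-related header (Expires, max-age, s-maxage, public) in order to be met. Note, though, that one alternative there is a status code that is defined as cacheable by default, and that one does not depend on a header.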
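
To make that list easier to follow, here is a heavily simplified sketch of the RFC 7234 section 3 storage check for a private cache such as a browser. The function names are hypothetical, and the sketch makes several assumptions: it ignores shared-cache rules (private, s-maxage, Authorization), cache-control extensions, and request directives, and it only treats GET and HEAD as cacheable methods. One thing it does show is that a status code that is cacheable by default (such as 200) satisfies the final condition on its own, so in this sketch a plain 200 response without any caching headers may still be stored.

```php
<?php
// Status codes defined as cacheable by default (RFC 7231, section 6.1).
const DEFAULT_CACHEABLE_STATUS = [200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501];

// Hypothetical helper: extract lower-cased directive names from a Cache-Control value.
function cacheControlDirectives(string $value): array
{
    $directives = [];
    foreach (explode(',', $value) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        [$name] = explode('=', $part, 2); // drop any "=value" argument
        $directives[] = strtolower(trim($name));
    }
    return $directives;
}

// Simplified storage check for a private cache.
// $headers maps lower-case response header names to their values.
function mayStore(string $method, int $status, array $headers): bool
{
    $cc = cacheControlDirectives($headers['cache-control'] ?? '');

    if (!in_array(strtoupper($method), ['GET', 'HEAD'], true)) {
        return false; // simplification: only GET and HEAD are considered
    }
    if (in_array('no-store', $cc, true)) {
        return false; // storing is explicitly forbidden
    }

    // "the response either: ..." (private-cache subset of the alternatives)
    return isset($headers['expires'])
        || in_array('max-age', $cc, true)
        || in_array('public', $cc, true)
        || in_array($status, DEFAULT_CACHEABLE_STATUS, true);
}
```

Being allowed to store a response is separate from being allowed to reuse it; reuse is governed by the freshness and validation rules elsewhere in the RFC.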

ASKER

Awesome! Thank you so much! Sorry I missed the link in an earlier post of you, but now I see it indeed.

Now I only don't understand 1 thing, because on one hand they're saying:

A cache MUST NOT store a response to any request, unless:

But they are also saying:

Note that, in normal operation, some caches will not store a response
that has neither a cache validator nor an explicit expiration time,
as such responses are not usually useful to store. However, caches
are not prohibited from storing such responses.

Let's go back to our example / test in Firefox (a response that has neither a cache validator nor an explicit expiration time). Firefox stores the resource and uses it when pressing the back / forward button. I don't understand that "note", because in a situation like that I would first think "MUST NOT store a response", because of the "unless" list (no matches, I think). But the note is saying something else (the opposite). How should I read it? Because now it looks strange to me, something like:

A MUST NOT be B unless case C, but Note: A is B in case X. Then you can not say "A MUST NOT be B", but maybe I'm seeing it the wrong way?

That note is saying, "Even though clients SHOULDN'T cache responses that don't have any cache/expiration headers, it's not prohibited."

Remember that it's talking about caching mechanisms on a very GENERIC scale, not JUST web browsers. For example, consider a search engine spider (either public engine or maybe a corporate, internal search appliance) that gets the response and then "caches" it for indexing purposes, even though it's technically a dynamic result.

So it's not contradicting itself - it's just saying, "This is what normally happens, but hey, there may be circumstances that we don't foresee where it could be useful to cache things even if they go against our normal guidelines."

ASKER

You're talking about caching, but me and they are talking about storing. Those things are not the same, but for now I'll just read "store" instead of "cache" in your comment.

If this is normally happening:

A cache MUST NOT store a response to any request, unless:

The "unless" list is there to talk about "circumstances" where it could be useful to cache things. So if there was a case where you SHOULD NOT cache things, then it had to be part of the "unless" list, because then it's not MUST (but SHOULD NOT).

To make it clear with a comparison.

COMPARE: A != B WITH: A cache MUST NOT store a response to any request.
UNLESS LIST: situation C, situation D
SHOULD NOT: Should not means it can be: A != B OR A = B (let's say this is situation E)

So if they meant what you are saying, then they had to say:

A != B, unless situation C / D / E. Note: In situation E: A != B OR A = B

But now they are saying:

A != B, unless situation C / D. Note: In situation E this CAN be also the case: A = B

So A != B cannot be A = B at the same time.

So if what you are saying is true, then the protocol is wrong. Otherwise, what exactly is wrong with my reasoning above? I don't see anything wrong with it.

ASKER

Also look at: https://tools.ietf.org/html/rfc7234#page-5

They are saying:

Proper cache operation preserves the semantics of HTTP transfers
([RFC7231]) while eliminating the transfer of information already
held in the cache. Although caching is an entirely OPTIONAL feature
of HTTP, it can be assumed that reusing a cached response is
desirable and that such reuse is the default behavior when no
requirement or local configuration prevents it. Therefore, HTTP
cache requirements are focused on preventing a cache from either
storing a non-reusable response or reusing a stored response
inappropriately, rather than mandating that caches always store and
reuse particular responses.

So first they are saying that and then suddenly this:

https://tools.ietf.org/html/rfc7234#page-6

A cache MUST NOT store a response to any request, unless:

So first they are talking about using HTTP headers to prevent a cache from storing a response, because by default they can store and cache.
And then in the next paragraph they are saying "MUST NOT, unless", so that implies the default behavior is MUST NOT.

So actually I don't understand them at all...

Yes, I'm using store/cache interchangeably. Technically speaking, the RFC treats "cache" as an object that stores content, while I'm using a more colloquial meaning of cache as a verb that implies the storing of content.

So in the first portion on page 5, it reads:

...it can be assumed that reusing a cached response is desirable and that such reuse is the default behavior...

Notice that they're talking about a response that HAS been cached. It is NOT saying "...it can be assumed that caching a response is desirable and that caching is the default behavior..." So that content on page 5 is suggesting that the use of content that has already been cached should be the default behavior.

The content on page 6 refers to the conditions that would lead to something BECOMING cached.

As far as the whole "A != B" thing, I think you might be leaning too heavily on some of the wording. It sounds like you're taking "MUST NOT" and choosing to read that as "without exception". The entire point of the last paragraph of that section is to provide a generic exception, presumably to cover unforeseen scenarios.

Note that, in normal operation, some caches will not store a response
that has neither a cache validator nor an explicit expiration time,
as such responses are not usually useful to store. However, caches
are not prohibited from storing such responses.

So let's parse this:

"Note that, in normal operation" = Circumstances

"some caches will not store a response that has neither a cache validator nor an explicit expiration time" = Behavior under those circumstances

"as such responses are not usually useful to store" = The reason for that behavior

"However," = Exception to the normal behavior that was defined

"caches are not prohibited from storing such responses" = It can be up to the client (the cache) to choose to ignore the guidelines for its own reasons

Might it have been clearer to use "SHOULD NOT" instead of "MUST NOT"? Maybe. You could always take it up with the authors of the RFC (they're listed at the top), but I think the fact that the exception wording is so tightly coupled with the definition of caching rules makes it fairly easy to understand it as an exception to the rule.

I feel like we're veering into legalistic areas where we're parsing out individual words of an RFC, even though we've passed the point of validating the browsers' behaviors. I am not currently an expert in legal/contractual language, so if we're going to continue down the road of parsing RFC language, I feel like we're getting into territory where I just have to tell you to go ask the original authors.

Out of curiosity, what's the end goal here? Are you trying to understand the RFC for the sake of building your own browser, or are you trying to understand things for proper web server configuration, or are you trying to find flaws in default behavior of major web clients, or is there some other context to all of this?

ASKER

Thanks a lot again!

I agree with you that page 5 and 6 are about different things. I was reading it the wrong way. Page 5 was about REUSING a cached response, so the response is already stored (I did not realize that). Thanks a lot for that!

So far I still think page 6 is wrong. I'm not reading "MUST NOT" as "without exception" right away. They come with an "unless list". But if something is not in the unless list, I do assume "MUST NOT", otherwise they would have had to put it in that list. I think that's not that weird, right?

I don't know if we can replace "MUST NOT" by "SHOULD NOT", because maybe in the other cases it's really MUST. MUST and SHOULD are in any case 2 totally different, non-interchangeable terms.

So if I follow you, then probably they meant:

A cache MUST NOT store a response to any request, unless:
- Circumstance 1
- Circumstance 2
- Et cetera
- A response that has neither a cache validator nor an explicit expiration time.

PS In the last circumstance, a cache can choose whether to store the response or not.

In the protocol I'm seeing "MUST NOT, unless list" and the last circumstance is not in the "unless list", so then I'm taking the "MUST NOT". The "unless list" already contains the exceptions, so it's a bit weird to come up with one more exception later on, when you could also just put it in the "unless list".

So actually I don't think it's only about interpreting words. I think it's just incorrect if they meant it like you're saying.

But indeed let's go back to the original question. For now I'll just follow your words and not the protocol. So if we have a PHP file (without sessions): according to the protocol a browser CAN use caching in a case like that.

That's where my question is coming from. Apache is not making the choice for me. Usually with a PHP file (WITH sessions), Apache decides what the best headers are by default. Without sessions there are no validators and no Cache-Control headers by default. So to make a good choice in practice, I first have to know and understand the protocol (on that subject), because then I know what a browser CAN do.

So about your questions:

Are you trying to understand the RFC for the sake of building your own browser?

No thanks ;). I think I would overestimate myself if I would think to build my own browser. Anyway there are already good browsers. And for sure I'm not the one who can do better on my own, so it would be kind of useless.

Are you trying to find flaws in default behavior of major web clients?

That's not my intention, because you've to put a lot of time in it and at the end you'll not really get anything back. Anyway meanwhile I already reported 10 bugs or something to browsers (confirmed), but that's not because I think it's cool. It's more that I'm frustrated, because something is not working as expected (takes a lot of extra time). So I'm only reporting a bug, so other people will not have the same frustration.

or are you trying to understand things for proper web server configuration
or is there some other context to all of this

Yes, it's for my own understanding, but it has always a connection with real situations, because I'm not learning things if I don't really need them (maybe if I would have more time).
I'm not a newbie to web development, but like probably a lot of webmasters, there are some things you just do without thinking too much about them (because everyone is doing it). I'm actually rewriting a system of mine (a kind of CMS), but I gave myself some more time for it, to dive deeper into some things (for better understanding). Now I'm also writing an article (for myself), so if I forget things I can read it back. I studied physics in the past and not IT, so sometimes I also miss some basic knowledge, so it's good to read some extra things sometimes. And my English is not that good, as you've maybe already noticed ;), so when reading protocols it's pretty important to take the text the right way. That's why I sometimes want to double check things on communities like this.

But this question started as follows:
- How do I have to deal with caching and PHP files?
- Before I only changed some settings via htaccess for static files (js, css, images et cetera). Most tutorials, articles on the web about caching are not really saying that much about caching in combination with php files.
- Then I checked the exact difference between php files with and without sessions.
- Then I saw Apache was not sending any validators or Cache-Control headers in a case like that (php file without sessions).
- Then I was wondering how a browser is dealing with that by default.

At this point I know from you that, in principle, a browser can do either in a case like that (caching or not caching). So by default Apache neither arranges nor prevents caching for php files (without sessions). So now I think it's better to choose one of them anyway: either prevent storing altogether, or allow storing and send explicit caching instructions. This is just about a normal server (no proxy or clustered hosting).

It's also not 100% really important, but by thinking about it and trying to find an answer I'm always learning unexpected things, which are pretty important.

For example: before I asked this question, I was also checking RFC 2616 and not the new one. I had no idea there was a new one. A lot of well known, good articles on the internet about caching (last-modified 2017) are even only talking about RFC 2616. But then I found out it was outdated (when I wanted to get an answer on another question I asked myself).

Pretty long post ;), but I wanted to explain that I'm not asking a question just to ask a question ;). There is really something behind it. So my goal is not to find words which I can interpret in different ways and then make a problem of it. I'm really confused by the words, because they seem incorrect to me. For now that's the case: storing / unless list / must not et cetera.

So to reiterate, Apache is going to handle almost none of the HTTP headers for PHP files.

Remember that Apache is a generic web server. Tomorrow, someone might come up with LanguageX that is superior to PHP in every way but it requires certain headers to be sent to the browser, and the authors of LanguageX come up with a plugin for Apache so that Apache can run LanguageX scripts.

There is no way for the Apache authors to know in advance that LanguageX scripts require certain headers. It simply hands the job off to the LanguageX engine via a plugin, and it expects the LanguageX engine to handle all of the headers as it sees fit.

Any plugin, including the plugin that connects PHP to Apache, will act the same way. Apache will simply give almost full control of the response to the PHP engine. The PHP engine makes a few basic assumptions (e.g. headers for sessions) but otherwise delegates control over the headers and response content to the PHP script code itself.

So the chain here is:

1. User browser asks the Apache web server for xyz.php
2. Apache sees the .php extension and hands the request off to the PHP engine
3. The PHP engine runs the script and delegates the headers and content off to the xyz.php script
4. Unless xyz.php specifies otherwise, the PHP engine will add basic headers as appropriate (e.g. no-cache headers for sessions).
5. The final response is returned to the browser.

If you are building your own CMS and are planning on distributing it, then it would be a smart thing to have your code control the HTTP headers so that you get a consistent, predictable experience on other servers and versions of PHP.
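As a rough illustration of what I mean by having your code control the headers, here is a minimal sketch. The helper name `cacheHeaders` and its parameters are made up for this example, and the `private, max-age` policy is just one possible choice, not a recommendation for every page:

```php
<?php
// Hypothetical helper: build the cache-related header lines for a script
// that does not use sessions. $lastModified is a Unix timestamp.
function cacheHeaders(bool $allowCaching, int $lastModified, int $maxAge = 300): array
{
    if (!$allowCaching) {
        // Tell every cache not to store the response at all.
        return ['Cache-Control: no-store'];
    }

    return [
        // Allow the browser (but not shared caches) to reuse the response
        // for $maxAge seconds.
        'Cache-Control: private, max-age=' . $maxAge,
        // HTTP dates are always expressed in GMT.
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
    ];
}

// Usage at the top of a script, before any output is sent:
// foreach (cacheHeaders(true, filemtime(__FILE__)) as $line) {
//     header($line);
// }
```

Keep in mind that sending Last-Modified does not by itself make PHP answer conditional requests; the script would still have to look at If-Modified-Since and send a 304 itself if you want that.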

ASKER

By now I know that it's actually the PHP engine arranging the headers and not Apache, because you said it before. I don't want to distribute the system, it's purely for my own use. And I know how I want to handle PHP files (with sessions) and I now know (roughly) how the process works. But now it's about what to do with a PHP file (without sessions), and for that I first need to know what the protocol says about it and what the default behavior of a browser can be in a case like that. So for now my focus is on understanding that part of the RFC. And then I can choose what to do with the headers (for a PHP file with no sessions). I think it's like you're saying, because my tests in Firefox confirm that behavior. But then the text in the protocol is wrong.

Anyway I still think it would be more logical if the PHP engine sent some more headers for php files without sessions. Apache, for example, also does that with .html files. Usually they take the most common case and use that as the default. As it is now, if I make a PHP file with, for example, some new news items every day, then by default the file can be stored and there can be caching. Or they have to change the protocol, so the protocol says that a browser SHOULD not store the result.

Actually you said the same before:

and I'm FAIRLY sure that it states that if there are no caching-related headers sent AT ALL for a resource, the browser should NOT cache the result

That would be logical indeed! So I totally agree with you, but the protocol says:

Note that, in normal operation, some caches will not store a response
that has neither a cache validator nor an explicit expiration time,
as such responses are not usually useful to store. However, caches
are not prohibited from storing such responses.

Actually this "note" is not saying that much. They're talking about "SOME caches", so actually they are saying everything is possible. And my test in Firefox shows that Firefox is just storing the resource in case of neither a cache validator nor an explicit expiration time.

So till now it still doesn't make that much sense to me. Your answers here make sense to me. And actually you would also expect, the browser would (should) not store the resource in a case like that (see the quote of you in this post).

ASKER

I'll make it a bit more clear what it's about now (for me).

You said:

Also bear in mind that most browsers and servers today are using HTTP/1.1 for their protocol, and I'm FAIRLY sure that it states that if there are no caching-related headers sent AT ALL for a resource, the browser should NOT cache the result

Then I asked further about it and you said this:

In the new RFC 7234, the language is actually a LOT clearer and supports what I told you. Look at section 3. The default behavior is to NOT cache anything, and then it provides a list of conditions / exceptions where caching is allowed:

A cache MUST NOT store a response to any request, unless: (...list of required criteria that needs to be met to enable caching...)

So actually before I started about the "Note", you also thought it's "MUST NOT" and you gave that as reason for what you said earlier. And that's a pretty logical thought. I would be happy if it would be "MUST NOT" (or even SHOULD NOT), because that would make more sense to me. Then I would have the feeling I understand it. Now I'm just thinking, it's still kind of weird, maybe I missed something. I would think the same like you:

I'm FAIRLY sure that it states that if there are no caching-related headers sent AT ALL for a resource, the browser should NOT cache the result

Anyway I don't want to say you're wrong, because I see you know a lot about it and you're saying useful and logical things. But I think even for you, the protocol is maybe kind of confusing, because you said this:

The default behavior is to NOT cache anything

in combination with:

A cache MUST NOT store a response to any request, unless

But actually in the case we were talking about, probably it is not like that, because of that "Note" I mentioned. So apparently we were looking at the wrong part of the protocol and we had to quote the Note.

Including the "Note", the unless list is really big, so I'm even wondering if there are any default "MUST NOT" situations left.
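To check that for myself, here is a rough sketch of the storing conditions in section 3 of RFC 7234 as I read them, simplified for a private (browser) cache. The function names and array shapes are my own invention, header names are assumed to be lowercased already, and the conditions for shared caches (private, Authorization, s-maxage) and for cache-control extensions are left out. One condition in the list is "has a status code that is defined as cacheable by default", which an ordinary 200 response seems to satisfy even with no caching headers at all:

```php
<?php
// Status codes defined as cacheable by default (RFC 7231, section 6.1).
const CACHEABLE_BY_DEFAULT = [200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501];

// Hypothetical helper: extract the lowercased directive names from a
// Cache-Control value, e.g. "No-Cache, max-age=60" -> ['no-cache', 'max-age'].
function directives(string $headerValue): array
{
    $names = [];
    foreach (explode(',', $headerValue) as $part) {
        $name = strtolower(trim(explode('=', $part, 2)[0]));
        if ($name !== '') {
            $names[] = $name;
        }
    }
    return $names;
}

// Hypothetical, simplified check: may a private cache store this response?
function mayStore(string $method, int $status, array $requestHeaders, array $responseHeaders): bool
{
    // Assumption of this sketch: only GET and HEAD are handled by the cache.
    if (!in_array($method, ['GET', 'HEAD'], true)) {
        return false;
    }

    $requestCc  = directives($requestHeaders['cache-control'] ?? '');
    $responseCc = directives($responseHeaders['cache-control'] ?? '');

    // "no-store" in either the request or the response forbids storing.
    if (in_array('no-store', $requestCc, true) || in_array('no-store', $responseCc, true)) {
        return false;
    }

    // The response must satisfy at least one of the remaining conditions.
    return isset($responseHeaders['expires'])
        || in_array('max-age', $responseCc, true)
        || in_array('public', $responseCc, true)
        || in_array($status, CACHEABLE_BY_DEFAULT, true);
}
```

If this reading is right, "MUST NOT" situations do remain: for example a response with no-store, or a 302 response without any explicit freshness information.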

The "MUST NOT" list defines normal behavior, and the "note" is there as an exception to allow a cache to handle storing in any non-normal situations. I can't really clarify it beyond that, so if you disagree with that idea, then you should really email the authors of the RFC.

I'm going to be a little busy over the next few weeks, so I probably won't be providing many updates going forward.

ASKER

No problem you're busy. You've helped me already a lot, so thanks a lot! And maybe I will send an email to the authors.

Earlier you also thought (fairly sure) a browser should not store the representation in case of no caching-related headers. Actually you came up with that part in RFC 7234 to tell me it's indeed like that (in case of no caching-related headers). But then according to that note, the opposite turned out to be true: a browser can store the representation in a case like that. So it's not only that I think the protocol does not say it clearly; in terms of the content itself, I would also expect that a browser should not store the representation in a case like that.

But anyway, you've explained to me how it is, so maybe for now I just have to leave it behind me and accept it's like that, without thinking too much about it anymore. There is not much to change or to really have influence on anyway.

So one more time, thank you so much for your time, knowledge, patience and answers to my questions.

There's little bits of the solution throughout almost every comment, so for summary purposes, I recommend that #42396955 and #42399789 be accepted as the answer.

ASKER

It still looks a bit illogical to me, because of the reasons I already gave against it above.

I can't really do much about that. That's just the way the document is written, and standard browser behavior should concur with what I've said, which should support my comments as accurate. Is there something else that hasn't been answered?

ASKER

Yeah one question more, see:

https://www.experts-exchange.com/questions/29072718/The-way-Apache-serve-PHP-files-in-terms-of-Last-Modified-and-Cach-Control-header.html?anchor=a42416774&notificationFollowed=201966823&anchorAnswerId=42402407#a42402407

So actually at the end it was not what you were thinking / and would expect at the beginning?

At the end, it was still the same as what I said in the beginning (no caching headers = no caching by default).

I just suggested two comments that would be decent starting points for others that might come across this page in the future and wanted to jump right into the main parts of the answer but feel free to pick whatever comments you think were helpful.

ASKER

If it would be like:

no caching headers = no caching by default

then I would accept the answer, but it's not like that.

"By default" some caches will cache and some caches will not cache in a case like that. Firefox is for example using caching in case of the back / forward button (that case of caching is not in the unless list we were talking about).

This is the case:

In normal operation, some caches will not store a response that has neither a cache validator nor an explicit expiration time,
as such responses are not usually useful to store. However, caches
are not prohibited from storing such responses.

So by default in some cases:

no caching headers = no caching

and in some cases:

no caching headers != no caching
