Get default index files for server

Let's see if I can describe this well enough this time. Please read the whole thing. I'm looking for some creative thinking/thinking outside the box. The reason is that there is no single function or environment variable that will give me what I need.

I am looking to get a list of the default index files on the server. This may be a Windows server (IIS), or it could be Apache, or any other type of server for that matter. I don't want to set the directory indexes; I want to retrieve what they are currently set to.

Usually there is some provision for setting the default file that will be sent if someone requests a folder. For instance if someone requests:

The files that could be returned could be varied but usually fall into the following list:

There could be others (additions to this list might also be helpful)

I do not have access to any server configuration files. For instance, I cannot retrieve httpd.conf on an Apache server and read through the configuration to find the "DirectoryIndex" entry.

Without knowing what type of server I am dealing with I need to retrieve the files that will be returned by the server as default files and in the order that it would return them.

This could be fairly time consuming, as it does not need to run on a regular basis. It would only need to run when setting up/installing a system, and then possibly when the user decides to refresh the setup after making changes to the server configuration.

Any thoughts? I'm not necessarily looking for the exact code that will do it, I'm more or less looking for ideas on how this might be solved.

If you do not have access to the configuration files, then this is impossible. The only way you can RELIABLY know the different files is to be able to see the configuration files. The index files can be anything. It can be set to gr8gonzo.html, but if you can't see the configuration, then there's no way to know that.

Now, you may want to investigate something like sudo. Sudo is a program that lets the web server gain temporary privileges to do things like access configuration files. It's a little complicated to set up and takes administrative privileges to install, but if you have that, then it should work.

For IIS, I'm not 100% sure where the Directory Index data is stored, but I'd guess it's in the registry, which means you should be able to use regedit to access the data. There's no good sudo type of program for Windows that I know of, but you could set up a scheduled job that runs regedit and dumps the data to a file. Then your program could read the setting from the file.
Hube02Author Commented:

I don't live in a world where impossible exists. Extremely difficult, yes. Not done before, most definitely. But impossible? Nothing is completely impossible except to the person that believes that it is.

Yes, some small percentage of people could set up the default file to be nearly anything, but I really don't care about the 0.1 percent that would use anything outside of what would be considered standard in this case. Most servers are set up with defaults, and these are generally only changed when they need to be.

I woke up this morning with an idea of how this can actually be accomplished.

Let's say that I have a folder set up, and in this folder I have a simple page for every possible (standard) index file name. In each of these files there is some text that will tell me which file I'm looking at. Then let's say that I use fopen('http://'.$_SERVER['HTTP_HOST'].'/test_folder/', 'r') and read in the contents of the page. Now I look at the contents and see which file I've actually retrieved. I can then delete this file and repeat the process, each time getting a different file, until I don't get any of the existing files and instead get a 404 error message or a directory listing.
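A minimal sketch of that delete-and-refetch loop, written in Python for brevity (the same steps port directly to PHP). The candidate names, the marker format and the function names are all illustrative assumptions, not a definitive list or API. The `fetch` parameter is injectable so the loop can be exercised without a live server.

```python
import os
import urllib.request

# Assumed candidate names; extend to suit. Not an authoritative list.
CANDIDATES = ["index.html", "index.htm", "index.php", "default.htm", "default.asp"]


def probe_marker(name):
    """Text written into each probe file so a response can be traced back to it."""
    # Delimited on both sides so "index.htm" never matches inside "index.html".
    return "[[index-probe:" + name + "]]"


def http_fetch(url):
    """Return the response body, or '' when the request fails (e.g. a 404)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return ""


def detect_index_order(folder, folder_url, names=CANDIDATES, fetch=http_fetch):
    """Return the probe files the server serves for folder_url, in priority order."""
    for name in names:
        with open(os.path.join(folder, name), "w") as fh:
            fh.write(probe_marker(name))
    remaining = list(names)
    order = []
    try:
        while remaining:
            body = fetch(folder_url)
            served = next((n for n in remaining if probe_marker(n) in body), None)
            if served is None:
                break  # 404, directory listing, or something unexpected
            order.append(served)
            os.remove(os.path.join(folder, served))
            remaining.remove(served)
    finally:
        # Clean up probe files the server never chose to serve.
        for name in remaining:
            os.remove(os.path.join(folder, name))
    return order
```

Called as `detect_index_order("/path/to/test_folder", "http://example.com/test_folder/")` (both values hypothetical), it returns only the candidates the server actually treats as index files, most preferred first.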
Hi Hube02,

That's a good attitude to have. Plus, you're also now providing some additional information about permissions that you have, like being able to write to a folder inside the web site, and also that you're only looking for potential standard index names, not the EXACT list. Once we're dealing with fuzzy / non-exact info, then we're getting somewhere. (That's why I capitalized "RELIABLY" in my last post)

I guess this sort of comes down to what your end goal is. It sounds like you're trying to create some general-purpose utility for determining the index files on any given web server. If that's what you're trying to do, then there are other caveats (such as your script having permissions to write new files to "test_folder", or contexts in which the script is run). If you can give us some more information about the goal of your project (what it is, how it will be used, and who will use it are good things to know), that would help.

Assuming you have permissions to write to test_folder, then your method could potentially work. You would have to establish a few things first:

1. That test_folder is empty (or at least empty of any standard index files).

2. That there are no authentication prompts on test_folder or any server-side measures (e.g. Allow/Deny in Apache) that would prevent a script from directly accessing a page inside that folder.

3. You would have to determine the response that is given when no index filename is matched. Sometimes it's not just a regular 404 HTTP response: a lot of servers use custom 404 pages or directory listings. You'll need to know what the negative is in order to be 100% sure of your positives.

4. You'll need to be sure that there's no load-balancing solution that might throw your script onto a different server that hasn't received your newly-created index file yet. Not every server has real-time syncing on its filesystem, so if you create test_folder/index.html on Server A and try to access it, you MIGHT hit a load-balancer that reads from Server B, which may not have your index.html for another minute or two.

5. That there are no URL rewriting mechanisms that might give false positives or negatives. Example: mod_rewrite might take all index.* and rewrite them to index.html. In that case, if you were testing index.php, then it might give a 404 since index.html isn't in place, even though index.php would technically be valid. (At this point, it sort of depends on what you're trying to accomplish and what you consider to be a valid index.)
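The point about knowing the negative could be handled by requesting the folder once while it is still empty and keeping that response as a baseline. A rough Python sketch follows; the marker format and function names are assumptions for illustration. The exact-match comparison is deliberately strict and would need loosening if the server's error page or directory listing contains timestamps or other changing content.

```python
def probe_marker(name):
    """Assumed marker text that each probe file contains."""
    return "[[index-probe:" + name + "]]"


def classify_response(status, body, baseline, names):
    """Classify one response for the test folder.

    baseline is the (status, body) pair recorded while the folder was empty.
    Returns ("index", name), ("negative", None) or ("unexpected", None).
    """
    for name in names:
        if probe_marker(name) in body:
            return ("index", name)
    if (status, body) == baseline:
        return ("negative", None)
    # Neither a known probe file nor the known negative: possibly URL
    # rewriting, a load balancer hitting a stale server, or an auth page.
    return ("unexpected", None)
```

Treating "unexpected" as a separate outcome, rather than folding it into the negative, gives the tool a chance to warn the installer instead of silently reporting a wrong list.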

When it all comes down, permissions and parameters sound like they will be your big thing. If you're writing a general tool for other people, then you have the power to establish parameters (e.g. you need X, Y, and Z for this to work). If you're writing something for yourself on servers that you know about, then it depends on what you want from your tool. If you want something accurate, then you need to figure out a way to get a copy of the config files (even if an outside process is delivering them to you because the script doesn't have permissions). If you want something semi-accurate, then you need to be willing to deal with potential caveats, but it will get you most of the way there.

On a side note, there ARE times when things are impossible without permissions. I cannot use my home computer to access secured computers in the Pentagon without having some existing permissions or knowledge of back-entry ways. Trying to do it directly just will not work, and things are designed that way. Yes, I could probably get a job at the Pentagon, work my way up, and after a long while, I might be able to influence things so that I could do that, but this is not the type of solution you're after. :)


Hube02Author Commented:
I am building a module in PHP that will set permissions on a folder/page basis. Within the admin section you can browse the file system of the server, select pages or folders that require login, and set the permission level that is required for access.

I can specify any requirements that I want. There is already a requirement that PHP scripts must have access to create and delete files and folders. This is required for setup. However, for it to actually work, the following must also hold:

1) The developer doing the installation must include the configuration file for the system at the start of all pages. Makes sense to me because I usually have a central configuration file that is called by every page on a Web site. I do not believe in on-page scripts, so all the files that are needed are loaded by the configuration script.

2) That the pages being accessed are PHP and not some other type of file. It will not be able to control access to anything that is not a PHP file.

3) That there is an index file in every folder, or that directory listing is turned off, so that you cannot just go to a folder and see the files in it.

While this system is easy to use, it will require a bit of knowledge, but it is extremely flexible.

My thought was to do some inspection when the admin page is loaded, where the admin can select files or folders, and supply a warning when a PHP file is not the default file in the folder. I was also thinking about doing some type of inspection to make sure that the module configuration is loaded by the pages; after all, there is no point in installing a user login system if the system is not called. Basically I am looking for a way to idiot-proof the system so that when it detects a hole it can shout "HEY YOU! YOU NEED TO DO SUCH AND SUCH!"

This may not be something I build into the system now, but something that I would like to plan to do at some point.

Eventually, sometime down the road, this module will become one part of a much larger system where this checking would not be necessary, or not as necessary. I'm working on something that I have not been able to find: a truly modular Web site management system where different parts can easily be plugged and unplugged, simply because I find that what is available is crap. Either it is too hard to install for the average Joe, or it is too hard to use, or the code is such crappy spaghetti that it is nearly impossible to make changes, and once you do make changes you can forget about ever updating it if they come out with a new version.

This permissions module is the first step on my journey and will be the base that all the other modules are built on. And since I'm building a module whose sole purpose is site security, making sure people don't get into things they're not supposed to, I thought there should be some type of self-diagnosis system to check the site and see if the person using the system left any gaping holes that could possibly be exploited.

For now, I'm thinking that I would start by gathering a list of all the most common index files to do my testing with, then create an empty directory and place each of these files into the folder. Calling the folder through http:// would retrieve one file at a time, which I can inspect to know what was retrieved. I would then delete the file that was retrieved and continue the process until I got a "Directory Listing Denied" or a 404 page, or something completely unexpected, which would most likely mean there was some type of rewriting going on. The next step would be to actually inspect all the files on the site to ensure that the module configuration scripts are called for each page. This would be a bit more complicated, depending on the size of the site, because the only way would be to read in each file and see if the module configuration was included the way it was supposed to be. But that would be the next step.
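For that later step, a crude heuristic might look like the following Python sketch: walk the site and flag every PHP file that contains no include/require statement mentioning the configuration file. The file name `config.php` and the function name are hypothetical, and a text match like this cannot see includes that are built dynamically or that sit inside comments, so it should only drive warnings, not guarantees.

```python
import os
import re


def find_pages_missing_config(root, config_name="config.php"):
    """Return sorted relative paths of .php files that never include config_name."""
    # Matches include/require/include_once/require_once followed, within the
    # same statement, by the configuration file name.
    pattern = re.compile(
        r"\b(?:include|require)(?:_once)?\b[^;]*" + re.escape(config_name)
    )
    missing = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            # Skip non-PHP files and the configuration file itself.
            if not filename.endswith(".php") or filename == config_name:
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as fh:
                source = fh.read()
            if not pattern.search(source):
                missing.append(os.path.relpath(path, root))
    return sorted(missing)
```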
Interesting. Sounds like a pretty difficult project, so remember that an ounce of planning is worth a pound of development. I think a lot of people have tried going down the path you're taking, and just didn't plan it out very well, so that's why you find a bunch of crappy programs out there.

Some things may not be avoidable (e.g. plugins becoming obsolete with new versions). New versions are often prompted by security fixes, so if your plugins rely on code that seemed solid at one point but didn't account for problem XYZ, then you may need to make those plugins obsolete in order to maintain a secure system. :)

I'm not really a negative thinker or anything - it's just good to plan effectively and realistically.

One note - if you require that .htaccess files be enabled (you mentioned IIS, so that may not be an option), then you could theoretically use mod_rewrite and a dynamic .htaccess file to protect non-PHP files. :) Just some food for thought.
Hube02Author Commented:
Planning and organization are the most important part of this project. That's why I'm in control of it and all the parts need to meet my standards. There always has to be a plan for how the different parts will interact. I've seen projects stall because someone needed to go back and almost completely rewrite a previous script simply because they did not plan how the next script would interact with it.

Actually, something else that is in the planning stages is a PHP-based rewrite module that can be installed where mod_rewrite is not available and that, unlike ISAPI Rewrite, does not require admin rights on the server to install. But like I said, this is only in the planning stages. The plan is to actually make it more versatile than mod_rewrite. This will be needed once we get to the CMS portion and the Ecommerce module.

I have a time line of sorts that does not have any specific times, just an organization of what needs to be completed before other projects can be started. No times, simply because this can only be done in between paying clients at the moment. Hopefully they will be expanding the development capacity soon and there'll be someone else to do all the little things that take up most of my time and I can concentrate
Hube02Author Commented:
Thanks for the input