Solved

ISAPI Extension troubleshooting UNDER IIS 6.0

Posted on 2005-05-02
Last Modified: 2012-06-21
I asked a question in January and was told that I needed to go to my developers.  I argued wrongly, and after being told why I was wrong I realized that I had actually got help (by being told to get those developers to do something and stop wasting my time).  That was question

Q_21287827.html

Now the developers have rewritten the extension.  It is much, much faster than before for several reasons.  For one, they added in-memory caching for images.  If a client brings up their image, the dynamically created thumb will be held in memory for 30 seconds so the next request will be instant.
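
For readers who have not met this kind of cache, here is a minimal Python sketch of the idea. It is not the developer's code (the real extension is a native ISAPI DLL whose internals are not shown in this thread). The class name `ThumbnailCache`, the `render` callback and the injectable `clock` are all hypothetical; only the 30-second lifetime comes from the description above.

```python
import time
from typing import Callable, Dict, Tuple


class ThumbnailCache:
    """Hypothetical in-memory cache that forgets entries after ttl_seconds.

    Not thread-safe as written; a real multi-threaded server would need a lock.
    """

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str, render: Callable[[], bytes]) -> bytes:
        """Return the cached thumbnail for key, rendering it on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the expensive render
        data = render()
        self._entries[key] = (now, data)  # store with the time it was rendered
        return data
```

A second request for the same key within 30 seconds returns the stored bytes without calling `render` again, which is the "instant" behaviour described above.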

Many of the previous issues are resolved.  One issue I still have, though, is that the app occasionally takes forever to load an image, and only killing w3wp.exe in Task Manager solves it.

Reading about app pools and trying to learn something I went in and made these changes

Increase Worker processes to 5  Default 1
Recycle worker processes after 500 requests  Default off
Shutdown worker processes after idle for 5 minutes Default 20 (OFF)
Enable pinging 5 seconds (default 60)
Rapid fail protection  3 and 3  default 5 and 5


This is the ONLY app on this server.  It dynamically creates a thumbnail from a full size image to the size needed based on the area of the site the user is in at the time of the request.  A ColdFusion web site on another server calls the dll with whatever parameters are required.  There is also an editor that allows the users to crop, resize, RED reduction etc etc all using the same dll (ISAPI Extension).

My question is broad, I know, but what did those changes I made do?  They helped tremendously.  The app does not hang anymore at all.  But I have no idea what I did, and we have only a test load on the server right now, so I do not know what will happen when I go live.

Can I increase WPs to a much higher number like 50, or is that a waste of time?  Are there any other settings I should/could make to help optimize performance and stability?  Unluckily the developer admits to not having the IIS knowledge to make these suggestions.  He does understand the importance of knowing this info, but he has basically modified one of his applications for us and has gone way beyond what we paid him to do.  So I am at a crossroads now, and even though I understand that the developer is the key to knowing what is happening, I was hoping that I could get some assistance in understanding how to monitor/troubleshoot IIS 6 so I can give the developer some feedback and he can better understand what is needed to make the application more stable.

The machine is an IBM 360, Quad Xeon capable, with 2 Xeon MP processors at 1.6 GHz with 2 MB L2 cache.  It has 4 GB of RAM.  As I mentioned, this is a dedicated server for only this application.  The images reside on two NAS servers on the same local network connected via copper Gigabit links.  Currently I see very little performance difference between locally accessed images and network accessed images.

Any details on changing the app pool settings and other configs in IIS 6.0 appreciated.  Running in default IIS 6.0 mode.

Thanks

Doug
Question by:dcohn
19 Comments
 
LVL 2

Expert Comment

by:KenSchaefer
ID: 13915273
Well, it seems that the code is probably still buggy, but the use of multiple worker process means that when one is completely blocked (all the threads are blocked or busy), another process is still available to handle requests.

All the steps you've taken are "work arounds" as far as I can tell. You're essentially telling IIS to run more w3wp.exe and recycle them (basically get rid of the old ones, and start up new ones) at a more rapid pace than before.

If you want to get to the bottom of the issue with the ISAPI extension, then have the developer give you a debug build with symbols, and use a tool like IISState to get a dump of the w3wp.exe process when it's hung.

http://www.iisfaq.com/Default.aspx?tabid=2513
http://www.adopenstatic.com/faq/IISConfigureIISState.aspx

Cheers
Ken
0
 
LVL 37

Expert Comment

by:meverest
ID: 13917355
I tend to agree with Ken.  I remember your last post on this subject, and am pleased that you finally got some action from those guys ;-)

Debugging the module is really their job once again, and at the very least they should provide you with a debug build and instruction on how to get them the detail that they need to find out what is going wrong.  The first step is to get them to recognise that it is behaving badly - have you got some sort of response on that point?

Cheers.
0
 
LVL 3

Author Comment

by:dcohn
ID: 13920843
Thank you for the TIPs.  
I know I need the developer, but I hope you can read through this and give me some hints on how to approach this because, as you MAY see, I have to at least try to see if I can find the issues.  I can probably get a debug build as well, and maybe that will allow me to do some of the troubleshooting???

 I believe the issue is financially driven, BUT in some way I am unsure.  That is, we paid X dollars for an app to do a certain task.  That app was supposed to work in IIS5 and IIS6.  We had been using the app for 4 months when we saw the issues in IIS6, and he said to switch IIS6 to IIS5 isolation mode, but this did not resolve the issues.

So time went by and I complained some more and he decided to do some more work on it, even though he delivered the working product already and  we were using it.  So the question comes in as what was the deal.  To deliver something that works and additional troubleshooting costs more or keep working for ever for that one set fee.  Obviously we cannot expect them to keep working for nothing.  At the same time we need code that works in IIS 6.  

So it is a dilemma, and I think the developer has been really great about all this extra work.  He does it because it can improve his base product, I assume, since this was something he made by modifying a standard product they market already.

So I am at a crossroads.  On one side my employer wants it working but does not want to spend any more money; even more likely, the developer feels the issue is beyond his scope of understanding and says just that (he says they are not IIS experts).  He seems to be willing to do additional work as it helps him create a better product.

As an example here is a statement from the developer regarding a DLL that he had me change when he first delivered the product.  One dll read in the data in 3 meg chunks while the other read it in 12 byte chunks.

"Such approach was used in old days of expensive RAM, because it needs only 12K to store intermediate data before JPEG decoding. You have more than enough RAM now to don’t worry about 3Mbytes of ordinary RAM.
So now we should answer the question: “Why IIS6 can’t reliably read large chunks of data at once”? Take a look on the Quality of Service setting of your server. Maybe some process has much higher priority and interrupts our reading operation (we use ReadFile win API method for this). Unfortunately we are not experts in this area."

Anyway I read all of the links you sent and have a question.

Since I am running in Worker Process mode, why do I have svchost.exe in my task manager?  Is that normal?  I know that dllhost is used in IIS5 mode, not worker processes (w3wp).  Yet I also have one DLLHOST in task manager.  In Task Manager I see an equal number of svchost.exe running as SYSTEM as I have w3wp.exe running as Network Service.  I know you can change how w3wp runs and that Network Service is the recommended secure method.

But I also have 3 more instances of svchost in task manager running as local service (2) and network Service(1).  So I have 8 instances of svchost.

What I find very odd is when I first see that images  do not load properly (on first try without a refresh in the browser lets say) if I go into task manager and check out CPU usage there is always ONE W3wp process running with exactly 25% CPU. Never higher.  This is only when there is an issue otherwise they are all at near 0 under CPU.  

If I kill that one W3wp process within Task Manager, the page instantly loads the image perfectly and again the site is ripping fast.  And by the way, when working, the new Image Server is unbelievably fast.  I mean that it can dynamically create 50 thumbs in an instant.  A blink of the eye, while the previous version would still be chugging away creating the thumbs one by one relatively slowly (like 2 per second).  I mention this to let you know that at least we see the potential is there.

Now I completely understood your responses and I know the bugs are most likely in the code BUT I noticed on the IISSTATE page that issues can be related to the database query or the remote file access (which we do for every image) and it leads me to wonder if this may be related to something else in the mix.

We have examined the IIS log files for the application and nothing of value shows up other than 200 OK.  For example, we have a few 404s, but they are dead URLs that people must have embedded in their websites that we no longer allow, and an occasional 304, which I believe means the image was loaded from the user's browser cache.  Where we see the real errors is in our CF logs, where our HTTP GET does not write the file where it is supposed to.  We have no idea where the error occurs, though, since CF just says it did not write the file.  The problem is most likely well before that call.  When using the same CF server, same calls etc., everything works fine using the old build of the IMAGE SERVER running on Win2K.  BUT it is slow as molasses rolling up an elephant's tail on a cold winter morning in New Delhi.

If our image serving application was failing, would it be reflected in the IIS logs, or is that a developer question as well?  What I am really asking is whether there is any other way to troubleshoot this issue to be sure we eliminate everything OUTSIDE the application itself.  I.e., what if the app is not the problem?  The developer claims that he cannot replicate the problem we have.  And we cannot make it happen on demand either.  It is intermittent for sure.

Thanks for just reading this gibberish and doubly thanks for any comments, good or bad.

Regards,  Doug

 
LVL 2

Expert Comment

by:KenSchaefer
ID: 13922558
> Since I am running in Worker Process mode why do I have svchost.exe in my task manager. Is that normal?

Yes, it is normal. Svchost.exe is a generic host process for services. Most of your system services live inside various svchost.exe processes. The actual grouping of services inside svchost.exe process depends on various registry keys. If you want to see what services live inside which processes you can use the command: tasklist /svc (Windows XP, Windows 2003), or you can use Process Explorer from www.sysinternals.com

> Yet I also have one DLLHOST in task manager

Dllhost.exe is a generic host process for running apps (like IIS out-of-process COM+ apps) that are not implemented as .exe files. So, there may be other programs that are running inside dllhost.exe. It may have nothing to do with IIS.

> Now I completely understood your responses and I know the bugs are most likely in the code BUT I
> noticed on the IISSTATE page that issues can be related to the database query or the remote file
> access (which we do for every image) and it leads me to wonder if this may be related to something
> else in the mix.

It could very well be related to something else. But we have no idea what it could be, and until we get some information about what's actually going on inside IIS when the problem occurs, anything we tell you will just be idle speculation and guesses.

Cheers
Ken
0
 
LVL 3

Author Comment

by:dcohn
ID: 13922598
Running IISState is a good idea, yes?
0
 
LVL 37

Expert Comment

by:meverest
ID: 13922677
Hi Doug, I begin to understand the problem a little better - I initially assumed that the developer was an internal development team rather than an outsourced arrangement.  Even so, the problem is entirely the developer - not the system.  The thing about Microsoft products is that they are 100% proprietary software.  Windows and IIS provide an API to extend the shipped features, and if you want to use the APIs, the developer simply has to deal with the way they operate and function.  It is the job of the developer to make the extension code work with the system - build the code according to documentation, and if it doesn't work, seek support from the vendor (i.e. Microsoft) to work through any problems with the system not behaving as documented (if that is the case).

My take on all this is that the developer is a dud.  If they can't make it work, then find another developer.

Your report to the boss needs to be along the lines of "We have paid for an extension module that is supposed to do X.  They have not delivered it, and refuse to deliver it.  We have been ripped off."

What to do about it depends on the will of the boss.  Take them to task for it and insist that they come up with the goods, or cut their losses and go somewhere else for the product we need.  If you buy a car from a new car sales yard and it keeps conking out in the middle of a busy highway, do you open the lid and try to work out what is wrong, or do you take it back to where you got it from and insist that they either repair it or replace with a new one?

Sorry that this suggestion does not help you out much with your present predicament, but I honestly don't think that there is much point in trying to work it out yourself.  I reckon that they (the developer) are lucky to be dealing with someone like yourself who is at least willing (and able) to help do their diagnostic work.  My understanding of this is that they are clearly shirking their responsibility - I doubt they will get far by shipping dodgy code and treating their customers like crap.

Regards,  Mike.



0
 
LVL 3

Author Comment

by:dcohn
ID: 13923359
I see your point and I think our issue is that we used it for too long under IIS5 and did not complain enough because we needed a solution that fit within our price point.

They almost have it.

I hear you and unluckily agree but the bottom line is we have no time to go elsewhere now as we are live and must make this work no matter what.

Obviously we have another bug as well, it's called the fundsRlo bug.  

Again thanks for the help

DC
0
 
LVL 37

Expert Comment

by:meverest
ID: 13923597
>> Obviously we have another bug as well, it's called the fundsRlo bug.

and a nasty bug that is! ;-)

I wish I/we could help more.

cheers,  Mike.
0
 
LVL 34

Expert Comment

by:Dave_Dietz
ID: 13932952
To address some of the remaining questions:

Increase Worker processes to 5  Default 1
- This adds additional worker processes to the Application Pool.  It is not recommended to increase this by more than one worker process per 2 processors on the machine.  More than this can result in thread thrashing and problems from context switching.  Additionally, if your applications in that Application Pool depend on session state this will likely kill them, since session state is not shared between the worker processes and there is no way of telling which worker process will handle any given request.

Recycle worker processes after 500 requests  Default off
- This tells the Worker Process to shut down and spawn a new one after it has serviced 500 requests.  Not a problem, but 500 is a low number if the site is handling a lot of traffic, and there is some overhead in shutting down one process and starting a new one.  The same issues with session state apply here as well.

Shutdown worker processes after idle for 5 minutes Default 20 (OFF)
- All this does is tell the worker process it can shut down if it has been idle for more than 5 minutes.  Useful if you have some Application Pools that receive very few hits and you want to lower the drain on system resources by allowing the process to exit rather than sit idle.

Enable pinging 5 seconds (default 60)
- The Web Admin Service will ping a worker process occasionally to make sure it is still responding and kill it if it isn't.  Lowering the value of this actually can show a slight performance hit.  I would suggest returning it to 60 seconds.

Rapid fail protection  3 and 3  default 5 and 5
- Worker Processes will try to automatically restart if they fail for any reason.  If you have a Worker Process that fails very quickly after starting, many times over, it can be a performance problem for the server.  This setting basically says: if a Worker Process fails this many times in this time period, don't try to start it again.  3 and 3 is fine, but 5 and 5 should be fine too.  This setting really only makes a difference if you have Worker Processes bombing out due to bad code (or for some other reason).
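
To make the "this many failures in this time period" rule concrete, here is a small Python model of it. This is only a sketch of the rule as described above, not how IIS implements it; the class name `RapidFailGuard` is hypothetical, and treating the limit as "stop once the count reaches the maximum" is my assumption.

```python
from collections import deque
from typing import Deque


class RapidFailGuard:
    """Hypothetical model: stop restarting after max_failures within window_minutes."""

    def __init__(self, max_failures: int = 5, window_minutes: float = 5.0) -> None:
        self._max = max_failures
        self._window = window_minutes * 60.0  # window length in seconds
        self._failures: Deque[float] = deque()

    def record_failure(self, now_seconds: float) -> bool:
        """Record a failure; return True if a restart is still allowed."""
        self._failures.append(now_seconds)
        # Forget failures that have aged out of the time window.
        while self._failures and now_seconds - self._failures[0] > self._window:
            self._failures.popleft()
        # Assumption: reaching the maximum count is what stops restarts.
        return len(self._failures) < self._max
```

With "3 and 3", three crashes a minute apart would stop further restarts in this model, while three crashes spread over ten minutes would not.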

-------
> Since I am running in Worker Process mode, why do I have svchost.exe in my task manager?  Is that normal?  I know that dllhost is used in IIS5 mode, not worker processes (w3wp).  Yet I also have one DLLHOST in task manager.  In Task Manager I see an equal number of svchost.exe running as SYSTEM as I have w3wp.exe running as Network Service.  I know you can change how w3wp runs and that Network Service is the recommended secure method.
>
> But I also have 3 more instances of svchost in task manager running as local service (2) and network Service(1).  So I have 8 instances of svchost.

svchost.exe houses services such as DNSCache, DHCP, TermService, RPCSS,  W32Time, EventSystem, etc....  Having several of them is normal.  (I have 9...)
The dllhost.exe you see is likely the System Application for the Component Services subsystem and is normal as well.
----------
> What I find very odd is when I first see that images  do not load properly (on first try without a refresh in the browser lets say) if I go into task manager and check out CPU usage there is always ONE W3wp process running with exactly 25% CPU. Never higher.  This is only when there is an issue otherwise they are all at near 0 under CPU.
>
> If I kill that one W3wp process within Task Manager, the page instantly loads the image perfectly and again the site is ripping fast.  And by the way, when working, the new Image Server is unbelievably fast.  I mean that it can dynamically create 50 thumbs in an instant.  A blink of the eye, while the previous version would still be chugging away creating the thumbs one by one relatively slowly (like 2 per second).  I mention this to let you know that at least we see the potential is there.

From this I assume you probably have 4 processors in the server.  The one w3wp.exe that is using 25% CPU is likely caught in a tight code loop of some sort and is actually using 100% of one processor.  This is not a good thing and indicates a problem.  Personally I would capture a hang dump of the process at that point and see what the heck is sucking down so much CPU.
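
The arithmetic behind that reading is simple enough to write down. A rough Python sketch, assuming Task Manager reports a process's CPU as a percentage of the whole machine (the function names are mine, purely for illustration):

```python
def processors_in_use(process_cpu_percent: float, logical_processors: int) -> float:
    """Convert a whole-machine CPU percentage into a count of busy processors."""
    return process_cpu_percent / 100.0 * logical_processors


def ceiling_for_one_thread(logical_processors: int) -> float:
    """Highest whole-machine percentage a single runaway thread can reach."""
    return 100.0 / logical_processors


# 25% of a 4-processor machine is one processor running flat out.
print(processors_in_use(25.0, 4))
# One spinning thread can never show more than this on 4 processors.
print(ceiling_for_one_thread(4))
```

That is why the number sits at exactly 25% and never goes higher: a single thread cannot use more than one processor.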
--------------

Overall I would lay a guess that the ISAPI extension is misbehaving and running into a logic error somewhere that results in a code loop.  A hang dump would bear some of this out but you'd have to have debug symbols for the ISAPI extension to be sure.  I would strongly recommend reducing the number of worker processes for the Application Pool to 1 or 2 and *absolutely* not increase it to 50.

Aside from that I'd consider beating (or suing) the developer with a bat until he isolates and fixes the problems in the code.

Dave Dietz
0
 
LVL 3

Author Comment

by:dcohn
ID: 13971949
Thanks to all of you for any assistance you have given.  I am going to ask for help in debugging this mess.  I am married to this application already.  My employer says make it work, the developer, well you guys know what I know about that.

The developer wrote a debug version of the application.

I installed Microsoft debugging tools
Microsoft Symbols for W2003 PRE SP1 (cause SP1 is not on the box)
IISSTATE

I am resetting the WP's to 1 as it was for now since only 1 gets all the work anyway and he actually recommended it.

Now get this.  The developer has a setting as follows.  He has me set something to 4 processors.  I told him that when the application hangs the W3WP is at 25%.  I am sure this is because of that setting he has me set to 4.  He feels that since it is an Intel Xeon MP with Hyperthreading, 2 processors work like 4.

This is his exact quote

"This version has one small modification to improve system stability (at least we hope that it will). We also discussed changes you’ve made to server and want to recommend you to set:
Increase Worker processes to 1;
Recycle worker processes after 5000 (or even more) requests.

In the current version of VPS you can’t change the number of threads created in it. The thread number is equal to the number of processors available in the server (4 in your case). So the number of Processes should be 1, and the number of threads should be 4 (for your win2k3 server). Every working VPS thread may utilize no more than 25% of CPU.
If you don’t see more utilization, this just means that you don’t have simultaneous image processing operations in concurrent threads. At least Windows should manage threads in such way. But if you read somewhere information about multithreaded processing in IIS6, please, send me a link to it."

So I am going to try to run IISSTATE and give it the PID of the W3wp and hope we get something worthwhile.

Thanks for anything you guys can glean from this extremely limited information.
0
 
LVL 34

Accepted Solution

by:Dave_Dietz (earned 2000 total points)
ID: 13973007
"He feels that since it is an Intel Xeon with Hyperthreading MP that 2 processors work like 4."

To a certain extent this is correct.  Two hyperthreading procs will appear as 4 procs to the system - 2 physical and 2 logical. For the most part you can treat them as 4 processors.

"So I am going to try to run IISSTATE and give it the PID of the W3wp and hope we get something worthwhile."

I would recommend a different tool - DebugDiag.  You can download this from http://beta.microsoft.com.  You will need a Passport ID to log in and then you need to use the name 'DebugDiag' as the guest user name.

There is a PowerPoint slide deck and the tool available for download.  Install the tool and use it to generate a hang dump of the W3WP when you see it using 25% CPU.  You can then try using the auto-analysis feature of the DebugDiag tool to see if it can give you useful data, and/or make the dump available and we can look it over as well.

Dave Dietz
0
 
LVL 3

Author Comment

by:dcohn
ID: 13980630
I get an error anyway with IISSTATE

I installed the symbols on the machine, which I see I did not need to do, but I don't get the error as there is nothing blocking outbound access.

C:\iisstate>iisstate -p 3420 -d -sc
Symbol search path is: SRV*C:\iisstate\symbols*http://msdl.microsoft.com/download/symbols

Microsoft (R) Windows Debugger  Version 6.2.0013.1
Copyright (c) Microsoft Corporation. All rights reserved.

*** wait with pending attach
The call to LoadLibrary(exts) failed, Win32 error 2
    "The system cannot find the file specified."
Please check your debugger configuration and/or network access.
Symbol search path is: SRV*C:\iisstate\symbols*http://msdl.microsoft.com/download/symbols
0
 
LVL 3

Author Comment

by:dcohn
ID: 13986226
Well, I wrote a response here and probably forgot to post it in my excitement.  (HAHAHA)

I got IISSTATE to work and could not crash the friggin program.  AH, but finally it did HANG, and IISSTATE threw an error and didn't dump.  BUGGY DEBUGGER ???????

So thank you DAVE!!    DebugDiag  ROCKS.  I ran it and waited.   We created a page that opens 1200 images and needs thumbnails created and I opened 20 tabs on firefox (turning off local caching just in case but It cannot cache anyway since the call is not to a jpg)

That machine can really churn out images and the product is so close.  But after about 5 hours of actually refreshing, and after a while throwing a portion of our live traffic at it, we got it to hang.  AND DebugDiag did its thing PERFECTLY.  It gave me a HANG DUMP.  Then I created another one manually and a minidump.

Then I did the analysis.

And I went through the PPT once for laughs.

It was IISSTATE that taught me the concept of attaching to the pid though so that was why I understood  DebugDiag.

So how do I give more points than I allocated and leave the question OPEN as we are not done for sure?

Can you guys email me for the url to the mht's and dumps so you can look at them.

But here are the developers comments.  By the way the developer is not from the US.  I think Russia but can't recall.

He said


Doug,

We have got all  mht files and impressed with them. I am trying to resolve problem with debug symbols to make it possible to identify exact bug location.
The dead lock with VPS you've reported was probably caused by our monitoring engine, so just turn it off now.
I'll send more info later today.
Alex.
0
 
LVL 34

Expert Comment

by:Dave_Dietz
ID: 13988620
I have placed contact info on my profile page.....

Dave Dietz
0
 
LVL 37

Expert Comment

by:meverest
ID: 13992115
sounds like you are well on the way to making those guys squirm!  Thanks Dave, I got me a copy of that too! ;-)

dcohn: it would be funny to add up all the time you have spent on this and show your boss what that works out to in $$ terms - I wonder if it comes out to what your boss considers small change (i think not! ;-)

but at least now you are skilled enough to find a new job with a boss who appreciates your great work! :-)

Cheers.
0
 
LVL 3

Author Comment

by:dcohn
ID: 13992480
My boss is an ex customer and he appreciates me very much, believe me.  The issue is that money is tight and I do a ton of work along with his  partner so we can stay lean and mean.

The developers have always been close to knowing nothing but believe me we saved a bundle, even with my work.  We were spending $2500 per month on the previous app.  We spent $5K on this app and can use it on as many machines as we need.

That tool really rocks.  

I have always survived from learning how to do things but it is people like you that have these specific skills that save my ass time and time again.

THANK YOU ALL

I ain't done yet though cause he was missing symbols and I am now testing his second debug build.  This one seems more funky than the others like it doesn't hang but doesn't serve images either.  But I will throw enough requests at it to hang, trust me <BG>

Doug



0
 
LVL 34

Expert Comment

by:Dave_Dietz
ID: 13993240
Pulled the dumps and looked at the reports after digging through the dumps.

DebugDiag did a good job on this one.  :-)

Everything I found to be of interest was also in the MHT reports.

Basically, there are four threads sucking down the CPU.  At the time of the hang there are a pile of queued requests.
One thread is holding a critical section that all three of the other threads are waiting on and it's not going to let it go until it gets finished with the request it is processing.

Since only one of the threads can own the critsec at a time the others are spinning until it is freed and then the process starts all over again.
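
For anyone following along who hasn't run into this pattern, here is a rough Python sketch of why four worker threads sharing one lock behave like a single thread. It is not the VPSIsapi code (that DLL is a black box here); the function name `run_workers` and the timings are made up for illustration. Note also that a Python lock puts waiters to sleep rather than spinning, so this shows the serialization but not the CPU burn seen in the dump.

```python
import threading
import time
from typing import Tuple


def run_workers(worker_count: int = 4, hold_seconds: float = 0.05) -> Tuple[int, float]:
    """Run workers that all need one shared lock; return (peak holders, elapsed seconds)."""
    critical_section = threading.Lock()  # stands in for the Win32 critical section
    counter_guard = threading.Lock()     # protects the bookkeeping below
    inside = 0
    peak = 0

    def handle_request() -> None:
        nonlocal inside, peak
        with critical_section:           # every other worker waits here
            with counter_guard:
                inside += 1
                peak = max(peak, inside)
            time.sleep(hold_seconds)     # simulated image work done under the lock
            with counter_guard:
                inside -= 1

    threads = [threading.Thread(target=handle_request) for _ in range(worker_count)]
    started = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak, time.monotonic() - started


peak, elapsed = run_workers()
print(f"peak threads inside the lock: {peak}, elapsed: {elapsed:.2f}s")
```

The peak never exceeds one, and the total time is roughly the sum of all four requests rather than the time of one. Holding the lock for the whole request is what turns four threads into a queue.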

If you can collect new dumps and make the private symbols for the DLL available I can probably dig further, but without symbols the VPSIsapi DLL is a black box.

Dave Dietz

0
 
LVL 3

Author Comment

by:dcohn
ID: 13993476
That is what he claims he did.  But what has occurred twice now is that my test pages time out, a new w3wp has started, and I received an error message from DebugDiag about it not being able to complete the debug.

Must I attach the debugger to the specific pid or will it catch the hang just by watching the app pool?

I had to restart the attach a few times cause the worker processes are starting and the old ones end.  I have no recycling set except the 1740 minutes.  It was set at 50000 but I unchecked that also as I may have been reaching it with the traffic I am throwing at it.

Thanks again

Doug
0
 
LVL 3

Author Comment

by:dcohn
ID: 14042296
Thank you EVERYONE, but Dave's help has been unreal.  From that DebugDiag lead to help analyzing the data, I could ask for no more.

Still working on the issue but how to deal with it was the question and it is surely going that direction.

Will post follow-up.

Doug
0
