RJV (ASKER)

WSANOTINITIALISED not initialized error in a thread

I am using VC6, SP5 and am getting a strange error when using gethostbyname() in a thread. It is a worker thread and messaging is fully active. Yet when I access gethostbyname() I get a NULL returned and WSAGetLastError() immediately afterwards indicates a WSANOTINITIALISED error.

Digging into the on-line literature reveals that WSAStartup() either was not called or it failed. The problem is that it was called and it did not fail. It was also called at the start of the thread. I replaced that with AfxSocketInit() and get exactly the same error. This used to work fine and it seems that after Service Pack 5 came into the picture I get this problem. That seems a bit much to believe.

What can be causing this? How can it be resolved?

RJV
jtwine100697

Did SP5 "upgrade" any WinSock-related DLLs on your system?

-=- James.
RJV (ASKER)

James, I don't recall. However, looking at the dates of the winsock DLLs, no, SP5 has not "upgraded" those.

Not the dates: I mean the file list that comes with the service pack.

Binary incompatibility is one of the things on my mind right now, although IME that usually causes crashes with strange call stacks.

This may sound stupid, but do you have the latest platform SDK, and have you cleaned the project and rebuilt it recently?  

Also:
   o are you using the return value of WSAStartup(...) to determine success (zero) or failure (non-zero), or WSAGetLastError(...) or GetLastError(...)?  WSAGetLastError(...) will not yield a correct value if WinSock failed to initialize...
   o Have you examined the contents of the WSADATA structure to see if it has any hints?  
   o Have you tried asking for a lower version of WinSock support?
   o Do you have TCP/IP installed on your system? :)

-=- James.
DanRollins

For a test, I suggest calling WSAStartup() *directly* before the failing line.

A simple test like that which takes 20 seconds could eliminate many wrong turns in debugging this problem.

-- Dan
RJV (ASKER)

Dan, indeed your suggestion worked. However, it is indeed very strange in that the whole class gets initialized and a pointer left in memory to it. I had checked and lpWSAData in memory was filled out properly, as was a boolean flagging proper WSAStartup() startup or not. All of this from when the class was first initialized when the thread started up.

By inserting WSAStartup() just before the offending line I was essentially throwing water on wet ground. Still, the error disappeared as far as that line is concerned, but not within the full scope of the class/thread. This I need to get to the bottom of, and suggestions would be appreciated. One thing I will be doing as soon as I finish here is to check the pointers to see whether the class goes out of scope, even though the data in the variables is OK when stopping the debugger at the critical point. Also see below, in my comments to James.

James, as you can see above, the service pack was installed okay -- or be it, the DLLs are ok. I'd also like to note for you and Dan that FTP access also occurs within the view. That works fine. When the user decides to download or upload a file a thread is launched to do that. As everything is effectively "new" at that stage I make sure everything is initialized again, as the memory then belongs to the thread and not anymore to the main thread. A lot of what was done before is done again in the thread. This had worked and now decided to rebel.

I look forward to your thoughts as in a road full of curves we seem to be heading in the right direction.

RJV
RJV (ASKER)

I'd like to note that I rechecked the initialization, pointers and the like and indeed, nothing wrong. Evidently this would have to be the case as all variables were properly initialized when the offending line comes up. Still, all's possible, but not this.

RJV
>>All of this from when the class was first initialized when the thread started up.

Can you post your thread startup code?  That phrase "when the thread started up" is just slightly suspicious.  WSAStartup() must be called from the executing thread ... and not from the ctor of some class that gets instantiated by the code that starts the thread.

Also, if your main thread calls into that class, that code runs on the main thread, not on the worker.  You may need to post messages to the thread in order to ensure that the expected thread is in control at the necessary times.

I'm shooting blind here since there is not a scrap of code to examine.  Just guesses.

-- Dan
RJV (ASKER)

Dan, here is some code. It was taken from the real code, though I have set it up in such a way that you can follow what happens:

// Thread startup (actually from another worker thread)
CWinThread* pThread;
pThread = (InternetTransfThread*) AfxBeginThread(RUNTIME_CLASS(InternetTransfThread));
if (pThread==NULL)
{
   // error, etc
}

// As you pointed out, a message is needed to start up,
// and here it is and always was. It is posted only if
// the thread was created correctly
pThread->PostThreadMessage(WM_INTERNET_THREAD,(WPARAM)pDdat,0);

// Here's the worker thread class (simple, as you can see)
IMPLEMENT_DYNCREATE(InternetTransfThread, CWinThread)

InternetTransfThread::InternetTransfThread()
{
}

InternetTransfThread::~InternetTransfThread()
{
}

BOOL InternetTransfThread::InitInstance()
{
    return TRUE;
}

// Here's the entry point of the thread message
void InternetTransfThread::OnRunInternetThread(WPARAM wParam,LPARAM lParam)
{
    InternetTransfer pTransf;
    DATA_NEEDED* pDATA = (DATA_NEEDED*)wParam;

    pTransf.StartThreadWork(pDATA); // goto another class
}

// Here we enter the other class indicated above. All
// variables have to be within the thread class. BTW,
// this also gets called by the View or main thread. Of
// course, variables must be allocated to the main thread
// in that case
UINT InternetTransfer::StartThreadWork(DATA_NEEDED* pDATA)
{
    CFtpClass pFTP;
    int connect = pFTP.Connect(host,login,pswd);
}

// Now we get closer to the scene of the problem. Note
// that m_Da is declared in the header of CFTPClass. The
// CDirectAccess class itself starts after Connect()
// below. Thus, the header has:
//     CDirectAccess m_Da;
int CFTPClass::Connect(LPCSTR host,LPCSTR user,LPCSTR pswd)
{
    Close(); // Make sure nothing's connected

    if(m_ws.GetAddrFromName(host,m_buf,sizeof(m_buf))!=
        NULL)
    {
        return -1;
    }
    else
    {
    }

// Here is the start of the CDirectAccess class. Of
// course, when reference is made to it, things happen
// in the constructor. At that point WSAStartup() gets
// called. Note that no call is made to destroy the class,
// as CFTPClass isn't destroyed. Also, m_data is declared
// in this class's header: WSADATA m_data;
CDirectAccess::CDirectAccess()
{
    // Initialize Windows Socket DLL
    if(WSAStartup(WINSOCKVER,&m_data)!=0)
    {
        m_success = FALSE;
    }
    else
        m_success = TRUE;
}
CDirectAccess::~CDirectAccess()
{
    // Goodbye Winsock DLL
    WSACleanup();
}

// Here is the scene of the problem. Note that I have
// commented your suggestion, even though the above
// should've done the trick. All variables, such as
// m_data and m_success are correct. If I call
// WSAStartup() here nothing changes in the variables.
// Most importantly, m_data does not change
int CDirectAccess::GetAddrFromName(LPCSTR name,LPSTR
    address,int maxLen)
{
    hostent FAR * host;

//    if( WSAStartup(WINSOCKVER, &m_data) !=0)
//    {
//    m_success = FALSE;
//    }
//    else
//        m_success = TRUE;
    host = gethostbyname(&name[0]);
    if(host == NULL)
    {
        int err = WSAGetLastError();
        return 2;
    }
    return 0;
}
So m_Da's ctor is executed when pFTP is instantiated which is in InternetTransfer::StartThreadWork()

That gets called by ...

  InternetTransfThread::OnRunInternetThread(...)

Who calls OnRunInternetThread(...)?  I would guess that it results from

pThread->PostThreadMessage(WM_INTERNET_THREAD,(WPARAM)pDdat,0);

so it would presumably be executed on the thread's timeslice.  So it's a mystery...

All I can recommend at this point is to place breakpoints to make sure that these things are executing in the sequence that they seem to be executing.

=-=-=-=-=-=-

I have to admit that I have not used the
  AfxBeginThread(RUNTIME_CLASS(InternetTransfThread));
method.  When I looked into it, it started looking too complicated, so I have always stuck to the simple form, designed for worker threads:

AfxBeginThread( MyThreadFn, pPtrToAUsefulObject );

I then have...
//-----------------------------------
// get "into" the object
UINT MyThreadFn( LPVOID pParam )
{
    CMyUsefulObject* pThis= (CMyUsefulObject*)pParam;
    UINT nRet= pThis->DoMyThreadProc();
    return( nRet );
}

UINT CMyUsefulObject::DoMyThreadProc()
{
    while( !m_fGetOuttaDodge ) {
        if ( m_fNothingToDo ) {
            Sleep(100); // Win32 Sleep(), in milliseconds; or whatever...
        }
        else {
            DoSomething(); // can call member fns
        }
    }
    return(1);
}

=-=-=-=-=-=-=-=-

Thus, I don't have experience with posting threads messages and so forth.

All I'm saying is that this simplified model is really easy to understand and fix.  The code that is called directly in the DoMyThreadProc fn is running on the worker thread associated with the CMyUsefulObject object.  All other code is running on some other thread.  There is no question, never any confusion.

The drawback with this model is that the worker thread cannot call into the UI thread directly (to, e.g., update a status control).  Instead, I have the thread simply set status codes and progress values.  Then the UI thread monitors those values --usually on a window timer -- to update screen displays, etc.  The thread can also post messages to any window, so it isn't that big of a drawback.

-- Dan
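
The polling model described in the post above (the worker only sets status values, and the UI thread reads them on a timer) can be sketched in portable form. This is only an illustration, not the code from Dan's production app. It assumes a C++11 compiler; std::thread and std::atomic do not exist in VC6, where AfxBeginThread and Interlocked* variables would play the same roles. The names TransferStatus, WorkerProc, StartWorker and PollProgress are made up for the example.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Shared state: written by the worker, read by the UI thread.
struct TransferStatus {
    std::atomic<int>  percentDone{0};
    std::atomic<bool> finished{false};
    std::atomic<bool> abortRequested{false};
};

// Runs on the worker thread. It never touches UI objects; it only
// updates the status values.
void WorkerProc(TransferStatus& status, int totalSteps)
{
    for (int i = 1; i <= totalSteps && !status.abortRequested; ++i) {
        // ... one unit of real work (e.g. one block of an upload) ...
        status.percentDone = i * 100 / totalSteps;
    }
    status.finished = true;
}

// Starts the worker; the caller keeps the returned handle and joins it.
std::thread StartWorker(TransferStatus& status, int totalSteps)
{
    return std::thread(WorkerProc, std::ref(status), totalSteps);
}

// What a UI timer handler would call to refresh a progress display.
int PollProgress(const TransferStatus& status)
{
    return status.percentDone.load();
}
```

Because the worker never calls into window code, there is no doubt about which thread executes what: the UI only reads the values, and setting abortRequested is the one channel going the other way.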
RJV (ASKER)

Well Dan, I have set breakpoints and things act exactly the way you describe them. In fact, I had loaded the thread much in the manner you describe, except for some differences. The issue became a problem with sockets as in mfc they post messages. Thus you have to be sure to have messages active in the socket thread class.

The way I have loaded the thread assures one of that. After that, being in the thread class, all should be smooth sailing as all memory, pointers and the like are within the thread, as also in your example.

The one thing that may be a source of problems is the way I call the thread work, after the thread message. Namely, this way: pTransf.StartThreadWork(). Maybe that is the culprit. Any ideas on that?

Also, I could post a message within the thread class itself. Lastly, I could change the thread to operate in a similar fashion as you have illustrated. Still, even in this case would the way StartThreadWork() starts be a problem?

In terms of posting a message to the UI thread, no problem. You can get a pointer to the thread and even pass that pointer when you start the thread. From there do as I did to post the message. You have to post a message and not send one, as the thread cannot wait for a send to return.

RJV
>> The issue became a problem with sockets as in mfc they
>> post messages. Thus you have to be sure to have
>> messages active in the socket thread class.

The multi-threaded app that uses the simplified model that I described above uses CSocket to process 4000 credit reports per hour.  It does not create a window and it does not perform any message handling.  At all.

The beauty of threading is that the thread code does not need to handle or respond to messages.  Each thread runs synchronously:  Send something then wait for a response.  It is a *worker* thread with no user interface.  As I understand it, there may be some messages being passed around -- I think to a hidden window created when the socket is created --  but I've been able to make it all work without any windows or messages in a production environment for years.

>> After that, being in the thread class, all should
>> be smooth sailing as all memory, pointers and the like are within the thread

A common misconception.  For instance, you have a CMyThreadClass that has a fn named Sew().  If Sew() is called from your CMainFrame or CDialog, it doesn't matter that Sew() is part of your "thread object" or that the object itself was instantiated by a specific thread.  All that matters is that the call originated from a different thread.  Program code does not "belong" to any specific thread, no matter what naming conventions you use.

>>pTransf.StartThreadWork(). Maybe that is the culprit.

See above.  If that fn is called by the thread, then it is being executed by the thread.  If it is called by a CDialog, then it is executed by the CDialog -- regardless of what object "owns" the code.


>> I could change the thread to operate in a similar
>> fashion as you have illustrated. Still, even in this
>> case would the way StartThreadWork() starts be a problem?

If you used the simplified model, you would not need any complex startup logic.  The code I showed is all that is needed.  The only thing I didn't show is where pPtrToAUsefulObject in...

   AfxBeginThread( MyThreadFn, pPtrToAUsefulObject );

came from.  I have a fn named StartNewThread() and it new's a CMyUsefulObject and saves its pointer into an array before calling AfxBeginThread.  Upon return from CMyUsefulObject::DoMyThreadProc(), I can delete that object after collecting any result data from it.

-- Dan
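
The point made above about Sew() can be checked with a few lines of code. The sketch below is a hedged illustration in standard C++11 (std::thread is an assumption here; it is not available in VC6, where GetCurrentThreadId() would give the same information). CMyThreadClass and Sew() follow the names used in the post; the two helper functions are hypothetical.

```cpp
#include <thread>

// A class that "feels" like it belongs to a worker thread.
class CMyThreadClass {
public:
    // Records which thread actually executed this call.
    void Sew() { m_lastCaller = std::this_thread::get_id(); }
    std::thread::id LastCaller() const { return m_lastCaller; }
private:
    std::thread::id m_lastCaller;
};

// Calling Sew() directly: the code runs on the caller's thread.
bool DirectCallRunsOnCaller()
{
    CMyThreadClass obj;
    obj.Sew();
    return obj.LastCaller() == std::this_thread::get_id();
}

// Calling Sew() from inside a worker: the code runs on the worker,
// even though the object was constructed by a different thread.
bool CallFromWorkerRunsOnWorker()
{
    CMyThreadClass obj;                 // constructed on the calling thread
    std::thread::id workerId;
    std::thread worker([&obj, &workerId] {
        workerId = std::this_thread::get_id();
        obj.Sew();
    });
    worker.join();
    return obj.LastCaller() == workerId
        && obj.LastCaller() != std::this_thread::get_id();
}
```

Logging the thread id at the top of a suspect function in the same way is a quick check on which thread is really executing it.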
RJV (ASKER)

Well Dan, implementing the way you suggest leads to the same frustrating problem. When I get to gethostbyname() I get a NULL result. Unfortunately the type of thread didn't make any difference. BTW, did you by any chance recompile your application using VC6 and Service Pack 5? If you haven't I'd be curious to see if you wind up with the same type of problem.

RJV
I just tested to be certain.  Paste this code into a Dialog-based App and add a handler for IDC_BUTTON1 to execute CMyDlg::OnButton1().
 

class CMyUsefulObject {
public:
    CMyUsefulObject() {}
    void DoSomething();
    UINT DoMyThreadProc();

    CString m_sThdID;    
    BOOL m_fAbort;
    BOOL m_fNothingToDo;
};

//-----------------------------------
// get "into" the object
UINT MyThreadFn( LPVOID pParam )
{
   CMyUsefulObject* pThis= (CMyUsefulObject*)pParam;
   UINT nRet= pThis->DoMyThreadProc();
   return( nRet );
}

CMyUsefulObject* gapObjs[10];
int              gnObjCnt= 0;

void CMyDlg::OnButton1()
{
    if ( gnObjCnt >= 10 ) return; // gapObjs has room for 10 entries only

    CMyUsefulObject* pObj= new CMyUsefulObject();
    pObj->m_sThdID.Format("This is Thread #%d", gnObjCnt );

    AfxBeginThread( MyThreadFn, pObj ); // start it

    gapObjs[gnObjCnt]= pObj; // save it so can delete later
    gnObjCnt++;
}

UINT CMyUsefulObject::DoMyThreadProc()
{
    AfxSocketInit();
    m_fNothingToDo= FALSE;
    m_fAbort= FALSE;
    while( !m_fAbort ) {
       if ( m_fNothingToDo ) {
           Sleep(100); // or whatever...
       }
       else {
           DoSomething(); // can call member fns
           m_fNothingToDo= TRUE;
       }
   }
   return(1);
}

void CMyUsefulObject::DoSomething()
{
    hostent* pHost;
    char szName[255];
    strcpy( szName,"www.microsoft.com" );

    CString sMsg;
    pHost= gethostbyname( szName );
    if(pHost == NULL) {
        int err = WSAGetLastError();
        sMsg.Format("ERROR: WSAGetLastError() is %d", err );
        if (err == WSANOTINITIALISED ) sMsg += "\n WSANOTINITIALISED";
        if (err == WSAHOST_NOT_FOUND ) sMsg += "\n WSAHOST_NOT_FOUND";
    }
    else { // success!
        sMsg.Format( "%s\n gethostbyname('%s') returned ", (LPCSTR)m_sThdID, (LPCSTR)szName );
        CString sTmp;
        struct sockaddr_in rDestIP;
        for( int j=0; pHost->h_addr_list[j] != NULL ; j++ ) {
            memcpy( &(rDestIP.sin_addr), pHost->h_addr_list[j], pHost->h_length );
            sTmp.Format( "\n IP address is: '%s'", inet_ntoa(rDestIP.sin_addr) );
            sMsg += sTmp;
        }
    }
    AfxMessageBox( sMsg );
}

It works -- each thread calls gethostbyname (it takes a couple of seconds) and pops up the messagebox with the IPs.  Note that I call AfxSocketInit() at the start.  It also works if I use:
    WSADATA rData;
    if( WSAStartup(0x0002, &rData) !=0) {
        AfxMessageBox("Argggh!");
    }    
    // AfxSocketInit();

But note where I call it: in DoMyThreadProc.   I am *unquestionably* executing in that thread and not on some UI thread.  It is easy to get confused, because AfxBeginThread itself does not run the thread's code.  It creates the thread and schedules its first timeslice.  So keep that in mind when setting variables in the object.

-- Dan
More notes:
I (re)installed SP5 just to be sure.  Works fine after clean and rebuild.

When I created my test app, I selected the "add socket support" option in the wizard.  This has the effect of putting
if (!AfxSocketInit())
{
     AfxMessageBox(IDP_SOCKETS_INIT_FAILED);
     return FALSE;
}

In app.InitInstance().  When I commented that out and commented out the per-thread call, I get the WSANOTINITIALISED as you describe.  When I add it back in, to InitInstance, *all* threads' calls to gethostbyname work without calling WSAStartup.  I did not check further to verify the effect on other socket and FTP fns.

-- Dan
RJV (ASKER)

Dan,

Arghhh... I wrote an answer and rather than update it, I got a browser error. Back and my reply was gone!

Anyway, it works. My goal now is to break it as I must find out why a socket thread that worked fine no longer does. I have included another thread in your app which calls the socket thread. This is what the broken app I have does. Your app works.

I will let you know in due course and hope I find the problem on an asap basis.

RJV
>>I have included another thread in your app which calls the socket thread.

I hope that is a slip of the tongue rather than a major misconception.

A thread *cannot* call another thread.  A thread can run code that may be conceptually associated with another thread, but that is not anything like calling a thread.

For instance, if a user clicks a button making your program call a member of CMyUsefulObject, then that piece of code is running under the UI thread (even if you like to think of CMyUsefulObject as per-thread program code).

Conversely, if you start a new worker thread and it calls a member function of your CMainFrame, then that code is running under the new thread (and this will cause many types of program failures).

This is not a semantic quibble.  It is an important conceptual distinction.  A thread is not a piece of program code or an object or anything tangible (even if you derive from CWinThread or name your object CThisIsaThreadDammit).  A thread is the *idea* of a processor working its way through some code -- not the code itself.

-- Dan
RJV (ASKER)

Dan,

The UI launches (or creates and then launches) a worker thread. This in turn creates and launches the worker thread with your socket code. In other words, UI AfxBeginThread(worker1) -> worker1 AfxBeginThread(worker2).

Evidently if one calls or runs a function directly from a certain thread, that function will use all of the resources of the thread that called it.

RJV
Ah, I see.  Yes, I have never checked into inheritance at that level.  Good luck in the test.

-- Dan
RJV (ASKER)

Dan,

Need your thinking cap. BTW, I plan to increase your points as you've put in a super effort.

Anyway, the application should work. Essentially it is the same as yours. The difference is as follows:

  1. When a specific application related function occurs
     it flags the fact. Then it makes sure the UI timer is
     set to active.

  2. The timer checks if there are any pending function
     flags. If yes it checks that the pending functions
     are dealt with by creating a thread. To make sure
     the thread does its work it fires off a thread
     message to the thread that was created.

  3. When the posted message occurs the thread checks to
     see when the user decided to run the function, such
     as to download or upload via ftp. It checks the time,
     thus avoiding any timer in a thread.
 
  4. If the function should be activated immediately or if
     the given time arrives it creates the ftp thread,
     with the socket in it. Now I am creating this thread
     exactly your way.

  5. The socket thread then goes about the ftp or similar
     work.

That explained, I have a feeling that the posted thread message may be the source of the problem. The posted message is only created once I have a valid thread. Thus one would assume the thread exists and that it is safe to post the thread message. Still, now I'm not 100% certain that things are okay. After all, it should work, as indeed it did once. It is all exactly the same as your example, except for the posted message.

Assuming the thread is not ready, that too should not be a problem. I might be posting from the UI from the time checking thread. Since it spawns the socket thread, that should work anyway as that is the critical thread and not the time checker thread.

In the past I thought of using events for this purpose. The problem here is that I cannot pass data to the thread through events. Still, it should work.

What are your thoughts on the matter? Maybe you can attempt something similar in your example app, as I am concentrated in making my app as similar to yours as possible, to then backtrack to where the problem should be. That, thus far, hasn't been possible.

RJV
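
On the remark above that an event alone cannot carry data to a thread: one common pattern is to pair the signal with a queue, so that the waiting thread wakes up and finds its work item already there. The following is a minimal sketch under stated assumptions, not a drop-in for the MFC app. It uses C++11 std::condition_variable in place of a Win32 event (which VC6 would need, together with a critical section), and TransferJob, JobQueue and DeliverToWorker are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical work item handed from the scheduling code to the worker.
struct TransferJob {
    std::string host;
    std::string file;
};

// A queue guarded by a mutex; the condition variable plays the part
// of the event that wakes the worker.
class JobQueue {
public:
    void Push(TransferJob job)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push(std::move(job));
        }
        m_ready.notify_one();           // "set the event"
    }

    // Blocks until a job is available, then returns it.
    TransferJob WaitAndPop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_jobs.empty(); });
        TransferJob job = std::move(m_jobs.front());
        m_jobs.pop();
        return job;
    }

private:
    std::mutex              m_mutex;
    std::condition_variable m_ready;
    std::queue<TransferJob> m_jobs;
};

// Demonstration: the worker blocks, the caller pushes one job, and the
// host name seen by the worker is returned.
std::string DeliverToWorker(const TransferJob& job)
{
    JobQueue queue;
    std::string seenHost;
    std::thread worker([&queue, &seenHost] {
        seenHost = queue.WaitAndPop().host;
    });
    queue.Push(job);
    worker.join();
    return seenHost;
}
```

The job is copied into the queue, so the worker does not depend on the lifetime of a pointer that the posting side may already have released.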
My first thought is that I'd try to do it without the extra step (the first, UI/PostThreadMessage, thread).  

It seems to me that the UI window timer that starts the whole process could do all of the checking and then launch the (real) worker thread itself at the scheduled time.

I also want to repeat that I have no experience with spawning UI threads with the RUNTIME_CLASS() syntax.  Once I saw how clean and easy it is to do it the other way, I stopped looking.  I admit that is a hole in my knowledgebase and I do want to fill it.  I will try some experiments to emulate the system that you described.  

Alas, I will be out of town for a few days.  If you don't hear back from me by Tuesday, please post something here to trigger an email and wake me up.

-- Dan
RJV (ASKER)

Dan,

This might help narrow the scope of what is happening.

On this end I entirely eliminated the thread messages. To be exact, I only did some substituting for test purposes, having the timer call the function activated before with a posted thread message. After that the thread was created using your method (unchanged from our last exchange).

On the first two attempts everything was fine. I got a host with gethostbyname(). On the third attempt (my goal was at least 3 good tries) I got an access fault just on stepping past gethostbyname() while debugging. Not a WSANOTINITIALISED error. I have had that on occasion on the past. Do you have any ideas of why that might be? As I just stepped over that one instruction, I couldn't have anything else causing the problem.

Curiously, after this problem, my next debug session had a NULL in the host returned. I didn't check what the exact error was but imagine that the socket isn't okay as I did not reboot or do anything similar.

RJV
Access violation there is most likely an invalid setting for m_buf.  How does it get set? Check it before each call.

-- Dan
RJV (ASKER)

Dan, what m_buf? Namely, where? I also couldn't find it in your app example.

RJV
I went back to your original source:

int CFTPClass::Connect(LPCSTR host,LPCSTR user,LPCSTR pswd)
{
   Close(); // Make sure nothing's connected
   if(m_ws.GetAddrFromName(host,m_buf,sizeof(m_buf))!=
       NULL)
   {
       return -1;
   }
...

int CDirectAccess::GetAddrFromName(LPCSTR name,LPSTR
   address,int maxLen)
{
   hostent FAR * host;
   host = gethostbyname(&name[0]);
   if(host == NULL)  {
       int err = WSAGetLastError();
       return 2;
   }
   return 0;
}


RJV (ASKER)

Dan,

In reality, m_buf becomes address in GetAddrFromName(). I cut that part out, but if host=gethostbyname(&name[0]) is successful, the host will be used to get the IP address, which is then placed in address. Thus m_buf is a pointer to a buffer that will receive the address. Changes to it won't affect anything.

Should name change, one would get an unknown name error. Instead, I get a WSANOTINITIALISED error. Worse, after that Access Violation now constantly causes the WSANOTINITIALISED error. To add a bit of insult to injury, I access the given host in the UI. That never has any problems with it. When accessing from the thread after the UI did its thing, things blow up. In the past I even eliminated all UI accesses, to no avail. The error continues the same.

Evidently, a vexing problem.

RJV
>>Thus m_buf is a pointer to a buffer that will receive the address. Changes to it won't affect anything.

OOps, I meant for you to take a look at the "host" variable which is passed into your Connect() fn.  If it is null or if it has an invalid value, or if it has gone out of scope at the time the thread makes this call, then you are passing a bad pointer in to gethostbyname.  
=-=-=-=-=-=-=-=-=-
Incidentally, that type of variable naming will get you into trouble again and again.  You are using host as a hostent* in one fn and a LPSTR in its calling fn.  Think of the guy who ends up maintaining your code -- especially if it is you!  I avoid many problems by naming all LPCSTRs with leading psz and all structures with leading r and all pointer-to-structures with pr.  Examples:

   LPCSTR pszHostName;
   hostent rHost;
   hostent* prHost;
=-=-=-=-=-=-=-=-=-
Also, I see that you are calling Close() before calling your GetAddrFromName fn.  I suppose that is a direct call to CFtpConnection::Close() which should be OK,  but you should verify that you have not overridden Close() with some funny code in your derived class.

-- Dan
hi RJV,
Do you have any additional questions?  Do any comments need clarification?

-- Dan
RJV (ASKER)

Dan,

Well, indeed, the host variable is okay. As to the variable naming, I see your point. There are various ideas on that subject. In this particular case, I am actually trying to find out why a bought TCP/IP toolkit doesn't work properly suddenly, seeing as it works fine in previous versions but the changes refuse to. The support has been very poor indeed and as I have the source, the best bet is to dig in and solve it. It could be the toolkit or something I did.

What I can report is a bit strange. When the class is destroyed (in the example above, CDirectAccess), WSACleanup() gets called. Digging around, I found out in VC++'s help that by calling WSACleanup() in any thread, the socket DLL gets unloaded from the whole application and all socket work in all threads will stop (take a look at that).

Well, armed with this info, I dove into the UI part that also accesses the socket, up until then with no problems, and removed all local function class pointers. Namely, all CDirectAccess pDAcc type stuff. When the function finishes the class gets destroyed, thus calling WSACleanup. I put everything into header files and fired away.

Surprise! The problem moved from the worker thread to the UI!! Now something that had been working like a charm has exactly the same problem. The only crazy conclusion I can come up with thus far is that there is some timing problem. As it is a UI, I get to click on a directory tree to see remote file contents. I do that fast and no problem. A bit slower and there is the problem.

This is nuts, particularly as the DLL is loaded, now sits in undestroyed memory (indeed, nothing destroyed, just like in the thread) and guess what.

I tell you, I am learning some strange things and I hope it will help me in the present and future, and maybe you too will learn something on this one.

RJV
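
A note on the WSACleanup() observation above. The Winsock documentation describes WSAStartup() and WSACleanup() as reference counted per process: every successful WSAStartup() needs one matching WSACleanup(), and only the final one tears the library down for all threads. An extra, unmatched cleanup (for instance a destructor running on a copied object whose constructor never ran WSAStartup()) would therefore end socket support for every thread. Whether that is what happens in this app is not established in the thread. The sketch below shows one way to keep the calls balanced. It is an illustration only: the startup and cleanup calls are injected as functions so the code compiles anywhere (in the real app they would wrap WSAStartup() and WSACleanup()), it assumes C++11, and SocketLibrary and SocketSession are hypothetical names.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Process-wide, reference-counted owner of the socket library.
class SocketLibrary {
public:
    // start returns 0 on success, the same convention as WSAStartup().
    SocketLibrary(std::function<int()> start, std::function<void()> stop)
        : m_start(std::move(start)), m_stop(std::move(stop)), m_users(0) {}

    // Only the first user actually starts the library.
    bool Acquire()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_users == 0 && m_start() != 0) return false;
        ++m_users;
        return true;
    }

    // Only the last user actually shuts it down.
    void Release()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_users > 0 && --m_users == 0) m_stop();
    }

private:
    std::function<int()>  m_start;
    std::function<void()> m_stop;
    std::mutex            m_mutex;
    int                   m_users;
};

// RAII handle held by each object that needs sockets. Copying is
// disabled so a copy can never trigger an unmatched cleanup.
class SocketSession {
public:
    explicit SocketSession(SocketLibrary& lib)
        : m_lib(lib), m_ok(lib.Acquire()) {}
    ~SocketSession() { if (m_ok) m_lib.Release(); }

    SocketSession(const SocketSession&) = delete;
    SocketSession& operator=(const SocketSession&) = delete;

    bool Ok() const { return m_ok; }

private:
    SocketLibrary& m_lib;
    bool           m_ok;
};
```

With this arrangement a short-lived object in a UI handler can come and go without affecting a worker that is still in the middle of a transfer, because the shutdown runs only when the last session is destroyed.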

>>a bought TCP/IP toolkit doesn't work properly suddenly,

Once you throw-in some third-party code, it certainly becomes harder to diagnose a problem.  On the other hand, that code, when used *exactly* as intended should have a high probability of working as advertised or the company won't stay in business for long (if they are already out of business, refer to item #2, below).

I see two ways to proceed:

1) "Start over" by creating a minimal app to exercise the TCP/IP toolkit -- presumably using sample code provided with the toolkit.  Get that all working in a multi-threaded test scenario with virtually no UI.  Then start replacing the overly-complicated guts of your app with the minimalized code produced here.

2) Scrap the toolkit and use standard MFC / WinInet API calls and components.  These are very reliable.  Thousands of companies have bet their collective futures on this code base.  Bonus: There is plenty of documentation and sample programs available and persons having expertise with these tools are more readily available.

-- Dan
RJV (ASKER)

Dan,

In terms of 1, I did exactly what you suggested, with your own code example, and the toolkit worked fine. Indeed, it does not use typical mfc classes but the traditional winsock functions. There are no major secrets to it.

There is one thing I'm not sure is easy to do with mfc: ftp transfers with recovery. That has been implemented. It's also very important to remember that all of this was and is working in an earlier version; toolkit and all, in threads. The part added is the UI having socket access.

There aren't many secrets in the toolkit's handling of ftp. It gets the correct host, connects and then starts sending or receiving, with some delays and checks in between.

I do have another part, also working with threads, that do direct transfers, which can be eliminated (in terms of the toolkit) but wouldn't be that simple. Even then, not complex code in the toolkit. Due to the simplicity, why is it effectively crashing? How certain can I be that replacement code also won't crash in the same way?

RJV
At some level of diagnosis *all* programming problems boil down to a complexity overload.  A chaotic cascade of trivial details -- a 2-micron shift of one grain of sand turns instantly into a million-ton landslide.

The *only* defense is to write rock-solid code at the bottom level.  And then build in small steps, adding only fully-tested incremental changes to the code.

I admit that when I confront a problem like yours, I first try to isolate and patch the code.  But when that fails, I always end up starting from scratch and rebuilding from the foundation, testing thoroughly at each step.  The next pass code is vastly superior.  And with this *new* code base as a foundation, you can add more complexity with less risk.

I am sorry that all I can do is pass on general advice like this, but I can see no other way to proceed.

-- Dan
hi RJV,
Do you have any additional questions?  Do any comments need clarifications?  

Do you expect to award any points for the considerable amount of work I've done here?

-- Dan
/me is not holding his breath...! :)

-=- James
Avatar of RJV

ASKER

Dan,

I tried all of last week to post a reply. I had promised the points and there is no reason not to award them, for the work done. However, the problem has not been resolved. Evidently, James contributes nothing more than breath, so can't award him anything!

I'd like to note that a rewrite means a huge amount of work. What should be redone? The whole app, which has a background of over 3 years of work? The UI part? The thread part? The 3rd party socket part? Worse, nothing assures one that it will indeed resolve matters, as:

  1. The UI should not have anything to do with the
     problem, particularly in a thread;

  2. Everything worked fine until a small change was
     made and VC6's last SP was applied;

  3. The part that stops working in the thread is small.
     All that happens in the thread is to connect and
     send and no more. Really, not much to rewrite, for
     example.

  4. I recreated the problem in the UI with class
     pointers instead of local function pointers. Upon
     going back to function pointers all was fine again,
     but only in the UI;

  5. Research in Microsoft and several other places
     shows that maybe the problem is elsewhere.

The evidence of 4 above, which I tried posting to you last week, is that WSAStartup() or its MFC equivalent cannot be called more than once in an application. The type of error will be exactly the type I am getting. Have you been able to do that in several threads? The examples I see of multi-threaded sockets create one socket, which is then passed on to other threads. I am now hunting for examples of sockets created that connect either to the same IP more than once or to different IPs. In other words, kind of like having several clients in each thread, all in one application.

I look forward to your thoughts on this, particularly as I too am frustrated by not getting to the bottom of matters.

RJV
A commercial multithreaded client app that is my main bread-and-butter calls
  AfxSocketInit( NULL );
once at the very start.  Just before it creates each thread, it 'new's an object that contains a CSocket (so the socket is actually created by the main (UI) thread).

Then each thread sets its host IP and port, and begins sending and receiving data.

It is a very simple model.  I use the exact same technique for implementing multiple simultaneous HTTPS connections.

Let's look at your numbered statements:

>> 1. The UI should not have anything to do with the
   problem, particularly in a thread;
Wrong.  The most common errors in MFC programming relate to UI interaction with threads.  One must decouple the UI entirely from the thread functionality.  I have the UI do all status updates on a timer so there is *no* way UI code gets executed on a thread's timeslice.

>> 2. Everything worked fine until a small change was
   made and VC6's last SP was applied;
Great.  Back off that small change and you will know that it was the Service pack.  Reinstall VC6 and upgrade to the lower-level SP to verify that what you say is true.  

If it is a problem in some MFC code, then the problem is probably localized to MFC42.DLL (the MFC DLL that VC6 builds against).  Try running the program on a system that has the older MFC42.DLL.

>> 3. The part that stops working in the thread is small.
    all that happens in the thread is to connect and
    send and no more. Really, not much to rewrite, for
    example.
I'm not sure what you're getting at here.  If it is easy to fix, then fix it.

>> 4. I recreated the problem in the UI with class
    pointers instead of local function pointers. Upon
    going back to function pointers all was fine again,
    but only in the UI;
See my comments to #1.

>> 5. Research in Microsoft and several other places
    shows that maybe the problem is elsewhere.
What have you learned?  That it is unwise to call WSACleanup() from within a thread?  If that is the problem, then the solution is pretty clear cut.  But we know that that is *not* the problem, don't we?  The behavior is seen before any thread calls WSACleanup.

-- Dan
Avatar of RJV

ASKER

Dan,

First, in this app the UI is totally decoupled from the threads and always has been. In fact, so much so that the socket threads have been created by a thread.

The problem is that I added a socket to the UI that wasn't there originally. The reason for that was to get addresses and data to later use them in the threads. So, yes, I can get rid of that, but doing so will not solve the problem of getting the information.

Which brings me to another question. Does your UI create the socket (i.e. allocate memory with new) and transfer that to the thread, or does it do that within the thread? Please give an example. Maybe my best bet is to get rid of UI sockets altogether and leave them always in threads, to then figure out how to get the new info for the end user.

As to your reply to question 5, what have I learned researching Microsoft and elsewhere? That:

  1. One should not use WSAStartup() or
     AfxSocketInit() more than once in an application.
     Naturally, not within the thread. It should be
     loaded once only in the application.

  2. Calling WSACleanup() several times makes no
     difference. However, calling it the same number
     of times WSAStartup() was called will make it
     disable all socket work in the whole application,
     wherever it may have been called (i.e. in whichever
     thread). It is also unclear from Microsoft's
     documentation whether calling it only once can
     stop everything everywhere.

  3. Microsoft seems quite adamant at not creating the
     socket more than once. It should be created and
     then transferred to the thread.

I'd like to note that I have a server and another special client all using sockets with threads very nicely. The problem hit this one, which has an extensive UI, but only after the inclusion of the socket in the UI.

I'm interested in seeing what you do with your sockets initially, as this may well be the key to the problem here.

RJV
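
On points 1 and 2 in the list above: as far as the Winsock documentation describes it, WSAStartup() may be called more than once in a process, and each successful call has to be balanced by one WSACleanup(). Only the call that brings the count back to zero actually shuts Winsock down, and after that any socket call in any thread fails with WSANOTINITIALISED. The sketch below is only a model of that counting, in portable C++11. It makes no Winsock calls, and StartupModel and its members are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <mutex>

// Model only: mimics the documented per-process counting of
// WSAStartup()/WSACleanup(). It does NOT call Winsock.
class StartupModel {
public:
    // Like WSAStartup(): every successful call adds one reference.
    // Returns 0 to mirror the "zero means success" convention.
    int startup() {
        std::lock_guard<std::mutex> g(lock_);
        ++refs_;
        return 0;
    }

    // Like WSACleanup(): each call removes one reference.
    // Returns false when nothing is initialised, which is the
    // situation a real program would see as WSANOTINITIALISED.
    bool cleanup() {
        std::lock_guard<std::mutex> g(lock_);
        if (refs_ == 0) return false;
        --refs_;
        assert(refs_ >= 0);
        return true;
    }

    // True while at least one startup() is still unmatched.
    bool initialised() const {
        std::lock_guard<std::mutex> g(lock_);
        return refs_ > 0;
    }

private:
    mutable std::mutex lock_;  // the count is shared by all threads
    int refs_ = 0;
};
```

If this model matches the real behaviour, the thing to look for in an app like the one discussed here is a component (for instance a toolkit object's destructor) that calls the cleanup one time more than it called the startup. That single extra call would leave every other thread uninitialised even though its own WSAStartup() succeeded.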
>> Does your UI create the socket (i.e. allocate memory with new) and transfer that to the thread, or does it do that within thread? Please exemplify.

When I create a thread, I create a CQueueMonitor object and pass the address of that object as the thread param.  The thread fn sits in a while loop checking for a new request and sleeping for 10 seconds.  It keeps looping until CQueueMonitor.m_fAbort is set by the GUI.

When it gets a request, it checks to see if it is to go out by modem, TCP/IP, https, or a couple of other alternatives.  In the case of TCP/IP, it then creates my CTransportSock object and calls its SetupAndConnect fn.  There it deletes the old CSocket:
    if ( m_pcSocket ) {
        delete m_pcSocket;
        m_pcSocket= 0;
    }
and does
    m_pcSocket= new CSocket();
It then does
    m_pcSocket->Create();
and
    m_pcSocket->Connect( "123.123.123.1", 1234 );
finally, it creates a CSocketFile and archive...
    CSocketFile file( m_pcSocket );
    CArchive    arOut(&file, CArchive::store | CArchive::bNoFlushOnDelete );

and does some
    arOut.Write()
fns before allowing the CSocketFile to go out of scope.  Then it reads from the connection using a similar technique:
    CSocketFile file( m_pcSocket );
    CArchive    arIn( &file, CArchive::load);
    ...
    nActual= arIn.Read( pBuf, nRespLen );  
again, the CSocketFile goes out of scope when done reading.

Along the way, it logs progress:
   m_nProgressPct=n;
and sets status:
   m_sCurStatusText="Sending Request";
which the GUI monitors on a Window Timer so it can update a progress bar and display text.  It also logs messages to a trace window and log file and checks for an abort request:
    if (m_fAbort) {...}
which can be set by the GUI at any time, but will be noticed only occasionally by the Thread code.  

Thus the UI is completely decoupled.  At no time is the thread calling code that has anything to do with a window.  And at no time is the UI thread calling into any object associated with a socket.

=-=-=-=-=-=-
So, in answer to your question, the CSocket is allocated,  Create()ed, and deleted by the thread code.
=-=-=-=-=-=-

Note that I do not call any of the lower-level WSA code, such as gethostbyname at any time.  Without knowing why, this may be related to your problem -- calling into the socket support at two different levels.

-- Dan
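
The decoupling described in the post above (the worker writes progress and status, the GUI only polls them on a timer and may set an abort flag) can be sketched in portable C++11. This is not Dan's code: TransferStatus and runTransfer are hypothetical names, std::atomic and std::mutex are substituted for the plain members m_fAbort, m_nProgressPct and m_sCurStatusText, and a short sleep stands in for the socket I/O.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical shared object: the only thing both threads touch.
struct TransferStatus {
    std::atomic<bool> abort{false};    // written by the UI, read by the worker
    std::atomic<int>  progressPct{0};  // written by the worker, read by the UI

    void setText(const std::string& s) {
        std::lock_guard<std::mutex> g(textLock_);
        text_ = s;
    }
    std::string text() {
        std::lock_guard<std::mutex> g(textLock_);
        return text_;
    }

private:
    std::mutex  textLock_;  // guards text_
    std::string text_;
};

// Worker side: simulates a transfer in `steps` chunks.
// Returns true if it ran to completion, false if it saw the abort flag.
bool runTransfer(TransferStatus& st, int steps) {
    assert(steps > 0);
    st.setText("Sending Request");
    for (int i = 1; i <= steps; ++i) {
        if (st.abort.load()) {           // noticed only between chunks
            st.setText("Aborted");
            return false;
        }
        // Placeholder for one chunk of socket work.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        st.progressPct.store(i * 100 / steps);
    }
    st.setText("Done");
    return true;
}
```

In an MFC app the GUI side would presumably read progressPct and text() from a WM_TIMER handler and never call into the worker or its socket, which is the property the post is arguing for.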
Avatar of RJV

ASKER

Dan,

Thanks to your last input I think I'm (dare I say, we are) getting closer to the core of the problem.

I gather from your reply that you have a thread which monitors activities by waiting for an event. When one occurs it creates a socket, unless one already exists, and then goes about its business. A few questions here, if I have understood correctly:

  1. Does that one thread deal with all socket activities,
     relying on events to do its work?

  2. If the answer to one above is yes, what happens if
     an event occurs as it is busy with another socket?

  3. If the answer to one above is no, does it create
     another thread with another socket?

What I have here, and what I think is the core of the problem, is that I create a socket at the UI level. The user does some work at the UI level. Depending on the outcome, the UI or a secondary monitoring thread creates another thread and yet another socket in that thread. Thus one has at least two, if not several sockets duly created and working in the work threads and in the UI. It seems that creating several sockets in different (and decoupled) threads can be a major source of problems, according to Microsoft and others. The problem mentioned is the same one I am having right now.

RJV
Actually, rather than "events" these threads just examine a queue.  So when there is nothing in the queue, they sleep (and wake up to poll every so often).  When there are many things in the queue, each thread simply takes the top item and runs with it.  No events are missed, but the queue can grow long if there are many items to be processed and few threads to process them.

It would not be too hard to keep an eye on the queue and add more threads when the queue started getting too large.  

This idea of a "thread pool" in which each thread simply looks at a queue and sleeps when there's nothing to do is very easy to understand and I've found it to be quite reliable.

-- Dan
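
A minimal sketch of the queue-draining pool described above, in portable C++11. Request, RequestQueue and drain are hypothetical names, and the sketch differs from the description in one respect: the workers exit when the queue is empty instead of sleeping and polling again, so that the example terminates.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical request; a real one would carry host, port and payload.
struct Request {
    int id;
};

// Queue shared by all workers; every access takes the lock.
class RequestQueue {
public:
    void push(const Request& r) {
        std::lock_guard<std::mutex> g(lock_);
        q_.push(r);
    }
    // Takes the front item if there is one; returns false when empty.
    bool tryPop(Request& out) {
        std::lock_guard<std::mutex> g(lock_);
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }

private:
    std::mutex lock_;
    std::queue<Request> q_;
};

// Starts nThreads workers, lets them empty the queue, and returns the
// ids they handled (in whatever order the threads got to them).
std::vector<int> drain(RequestQueue& q, int nThreads) {
    assert(nThreads > 0);
    std::vector<int> handled;
    std::mutex handledLock;
    std::vector<std::thread> pool;
    for (int i = 0; i < nThreads; ++i) {
        pool.emplace_back([&] {
            Request r{};
            while (q.tryPop(r)) {
                // Placeholder: each worker would create and use its
                // own socket here, never one owned by another thread.
                std::lock_guard<std::mutex> g(handledLock);
                handled.push_back(r.id);
            }
        });
    }
    for (auto& t : pool) t.join();
    return handled;
}
```

Because each item is popped under the lock, no request is handled twice or lost, whatever the number of workers; only the order of completion varies.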

Avatar of RJV

ASKER

Dan,

I gather then that you indeed do create sockets with each thread, as you need them. Please confirm.

Further to the matter of CSockets versus pure API access, if you look at the CSocket source you will be taken to the CAsyncSocket, wherein all is AfxSocketInit. Note that the APIs are called in a very pure manner. Thus, one could use one or the other, even though many in lists around the net have complained about using the MFC socket classes.

I have decided to tear this apart a bit better, particularly after I read this:

http://support.microsoft.com/support/kb/articles/Q237/5/72.ASP

To my amazement, I discovered that there is an inherent memory leak, by design! At the very top Microsoft uses the term beta; don't stop reading there. It is clear, from this document updated in August this year, that the memory leak exists in Win9x and NT environments and was only fixed in W2K. As the toolkit I am (nearly was by now) using loads the DLL countless times, there is a nice memory leak thanks to this design.

I look forward to your reply to the question above.

RJV
ASKER CERTIFIED SOLUTION
Avatar of DanRollins
DanRollins
Avatar of RJV

ASKER

Dan,

Evidently there are quite a lot of issues with this matter and indeed one matter is still unclear to me. However, I promised the points for the effort and am, as such, awarding them. The problem here has not been resolved and now even occurred in your own example.

RJV
Avatar of RJV

ASKER

I had increased this to 350 points but the system limits it to 300.