[Openais] PATCH: aisexec leak & crash (updated)

Fabien THOMAS fabien.thomas at netasq.com
Wed Aug 30 08:43:50 PDT 2006


After doing additional testing, I have attached the latest patch for
trunk and whitetank.
Three problems are still pending and need a solution (I need help
here); two of them end in a crash of aisexec.

Now aisexec can run continuously without growing dangerously in size,
but it still crashes in the reported case.

SOLVED:
=======

- totemsrp.c: missing free in case of error.

- ipc.c: private_data not freed

- ckpt.c: misplaced call to hdb_destroy in the checkpoint module (hdb  
was destroyed during iterator finalize).

- all: pthread_mutex_destroy is never used:
I have tried to find the right place to destroy each mutex, but this
needs to be checked by the module owners.

- all: pthread_attr_destroy and pthread_cond_destroy are never used
(but there are not too many leaks here)

- conn_info->shared_mutex is allocated by two threads and overwritten
by one of them, so one mutex is lost:
here I have only applied a quick fix; the real solution needs to be
found later.
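
To illustrate the mutex items above, here is a minimal sketch of the
init/destroy pairing. It is not code from the patch: struct demo_conn
and both functions are hypothetical stand-ins, and the only assumption
is that a structure owns its mutex, initializes it in exactly one place,
and destroys it before the memory is freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-connection structure; this is not the
 * real struct conn_info from ipc.c. */
struct demo_conn {
    pthread_mutex_t mutex;
    int mutex_ready;    /* 1 once the mutex has been initialized */
};

/* Allocate the structure and initialize its mutex in one place only, so
 * that no second thread can overwrite a mutex that is already live. */
struct demo_conn *demo_conn_create(void)
{
    struct demo_conn *c = calloc(1, sizeof(*c));

    if (c == NULL) {
        return NULL;
    }
    if (pthread_mutex_init(&c->mutex, NULL) != 0) {
        free(c);        /* the error path must not leak the allocation */
        return NULL;
    }
    c->mutex_ready = 1;
    return c;
}

/* Destroy the mutex before freeing the memory that holds it.
 * Returns 0 on success or the pthread_mutex_destroy error code. */
int demo_conn_destroy(struct demo_conn *c)
{
    int res;

    if (c == NULL) {
        return 0;
    }
    /* Debug-build check: the mutex must have been initialized. */
    assert(c->mutex_ready == 1);
    res = pthread_mutex_destroy(&c->mutex);
    free(c);
    return res;
}
```

The same pairing applies to pthread_attr_init/pthread_attr_destroy and
pthread_cond_init/pthread_cond_destroy.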

PENDING:
========

PROBLEM 1: checkpoint iterator
------------------

a) Iterators are broken when a recovery is started (the internal
structure points to the old checkpoint, which was freed by the recovery
process).

Muni is aware of the problem, but the code needs to be reworked.

b) ckpt_lib_exit_fn does not free the iterators (there is a block of
TODO in the code).
The problem with this leak is that aisexec grows each time I run a
client.
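
As a rough idea of what the missing cleanup in b) could look like, here
is a sketch of an exit hook that releases every iterator a client left
open. All names (demo_iterator, demo_client, demo_client_exit) are
hypothetical and the real iterator layout in ckpt.c is different; the
sketch only assumes that the iterators are reachable from the client's
private data.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical iterator record; the real checkpoint iterator differs. */
struct demo_iterator {
    struct demo_iterator *next;
    char **section_names;       /* per-iterator snapshot, may be NULL */
    size_t section_count;
};

/* Hypothetical per-client private data. */
struct demo_client {
    struct demo_iterator *iterators;
    size_t iterator_count;
};

/* Open an iterator and link it into the client's list. */
struct demo_iterator *demo_iterator_open(struct demo_client *cl,
                                         size_t sections)
{
    struct demo_iterator *it = calloc(1, sizeof(*it));

    if (it == NULL) {
        return NULL;
    }
    if (sections > 0) {
        it->section_names = calloc(sections, sizeof(char *));
        if (it->section_names == NULL) {
            free(it);
            return NULL;
        }
    }
    it->section_count = sections;
    it->next = cl->iterators;
    cl->iterators = it;
    cl->iterator_count++;
    return it;
}

/* Exit hook: free every iterator still attached to the client.
 * Returns the number of iterators released. */
size_t demo_client_exit(struct demo_client *cl)
{
    size_t released = 0;
    size_t i;

    while (cl->iterators != NULL) {
        struct demo_iterator *it = cl->iterators;

        cl->iterators = it->next;
        for (i = 0; i < it->section_count; i++) {
            free(it->section_names[i]);     /* free(NULL) is a no-op */
        }
        free(it->section_names);
        free(it);
        released++;
    }
    assert(released == cl->iterator_count);
    cl->iterator_count = 0;
    return released;
}
```

With something like this in the exit path, running a client repeatedly
should no longer grow the process because of abandoned iterators.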

PROBLEM 2: reference to conn_info after free
-----------------

There is a race condition somewhere during connection close (I got
this while breaking the client application):

it seems that the checkpoint module accesses the conn_info structure
after the structure has been destroyed.
==18010== Invalid read of size 4
==18010==    at 0x8064248: libais_connection_active (ipc.c:332)
==18010==    by 0x80655D1: openais_conn_send_response (ipc.c:963)
       struct conn_info *conn_info = (struct conn_info *)conn;

         if (conn_info == NULL) {
                 return -1;
         }
here ==>        if (!libais_connection_active (conn_info)) {
                 return (-1);
         }

==18010==    by 0x8072CAF: message_handler_req_exec_ckpt_sectionread (ckpt.c:3337)

        /*
          * Write read response to CKPT library
          */
error_exit:
         if (message_source_is_local(&req_exec_ckpt_sectionread->source)) {
                 res_lib_ckpt_sectionread.header.size = sizeof  
(struct res_lib_ckpt_sect
                 res_lib_ckpt_sectionread.header.id =  
MESSAGE_RES_CKPT_CHECKPOINT_SECTIO
                 res_lib_ckpt_sectionread.header.error = error;

                 if (section_size != 0) {
                         res_lib_ckpt_sectionread.data_read = section_size;
                 }

   here ==>              openais_conn_send_response (
                         req_exec_ckpt_sectionread->source.conn,
                         &res_lib_ckpt_sectionread,
                         sizeof (struct res_lib_ckpt_sectionread));


==18010==    by 0x806138B: deliver_fn (main.c:357)
==18010==    by 0x805B939: app_deliver_fn (totempg.c:395)
==18010==    by 0x805B70D: totempg_deliver_fn (totempg.c:553)
==18010==    by 0x805AAC2: totemmrp_deliver_fn (totemmrp.c:81)
==18010==    by 0x805843D: messages_deliver_to_app (totemsrp.c:3439)
==18010==    by 0x80580B8: message_handler_orf_token (totemsrp.c:3318)
==18010==    by 0x805A8EA: main_deliver_fn (totemsrp.c:4023)
==18010==    by 0x804EBC9: none_token_recv (totemrrp.c:506)
==18010==    by 0x80504BB: rrp_deliver_fn (totemrrp.c:1308)
==18010==    by 0x804CC2B: net_deliver_fn (totemnet.c:679)
==18010==    by 0x804B17C: poll_run (aispoll.c:402)
==18010==    by 0x8061C8F: main (main.c:594)
==18010==  Address 0x41D51B8 is 8 bytes inside a block of size 188 free'd
==18010==    at 0x401CFCF: free (vg_replace_malloc.c:235)
==18010==    by 0x80640D5: conn_info_destroy (ipc.c:327)
==18010==    by 0x8064524: prioritized_poll_thread (ipc.c:456)
==18010==    by 0x4032340: start_thread (in /lib/tls/i686/cmov/libpthread-2.3.6.so)
==18010==    by 0x41084ED: clone (in /lib/tls/i686/cmov/libc-2.3.6.so)
==18010==
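
The trace shows conn_info being freed by the poll thread while a
delivered message still holds the pointer. One possible direction, given
only as a sketch and not as what the patch does, is to reference-count
the connection so the final free is deferred until the last holder lets
go. Every name below (demo_conn_ref and its functions) is hypothetical,
and the sketch assumes a reference can be taken when the request is sent
and dropped once the response has been handled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical reference-counted connection; not the real conn_info. */
struct demo_conn_ref {
    pthread_mutex_t lock;
    int refcount;       /* holders, including the owning poll thread */
    int active;         /* cleared when the client disconnects */
};

struct demo_conn_ref *demo_conn_ref_create(void)
{
    struct demo_conn_ref *c = calloc(1, sizeof(*c));

    if (c == NULL) {
        return NULL;
    }
    if (pthread_mutex_init(&c->lock, NULL) != 0) {
        free(c);
        return NULL;
    }
    c->refcount = 1;    /* reference held by the owner */
    c->active = 1;
    return c;
}

/* Take a reference, e.g. while a request is in flight. */
void demo_conn_ref_get(struct demo_conn_ref *c)
{
    pthread_mutex_lock(&c->lock);
    c->refcount++;
    pthread_mutex_unlock(&c->lock);
}

/* Drop a reference. Returns 1 if this call freed the structure. */
int demo_conn_ref_put(struct demo_conn_ref *c)
{
    int last;

    pthread_mutex_lock(&c->lock);
    assert(c->refcount > 0);
    last = (--c->refcount == 0);
    pthread_mutex_unlock(&c->lock);
    if (last) {
        pthread_mutex_destroy(&c->lock);
        free(c);
    }
    return last;
}

/* Client went away: mark inactive, then drop the owner's reference. */
int demo_conn_ref_close(struct demo_conn_ref *c)
{
    pthread_mutex_lock(&c->lock);
    c->active = 0;
    pthread_mutex_unlock(&c->lock);
    return demo_conn_ref_put(c);
}

/* Send path: the caller must hold a reference. Returns -1 when the
 * connection is no longer active, 0 otherwise. */
int demo_conn_ref_send(struct demo_conn_ref *c)
{
    int active;

    pthread_mutex_lock(&c->lock);
    active = c->active;
    pthread_mutex_unlock(&c->lock);
    return active ? 0 : -1;
}
```

With this scheme the check in the send path reads memory that is still
valid and simply reports the connection as inactive, instead of reading
a freed block.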

PROBLEM 3:
------------------

unsolved leak:

totemsrp.c: there are 2 "TODO LEAK" spots that really do leak, but I
cannot figure out where these blocks should be freed (iovec and mcast).


  
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-leak
Type: application/octet-stream
Size: 12820 bytes
Desc: not available
Url : http://lists.linux-foundation.org/pipermail/openais/attachments/20060830/2dad5f19/patch-leak-0001.obj

