[Openais] Re: recent segfault

Steven Dake sdake at mvista.com
Tue Feb 1 11:54:07 PST 2005


I have an idea on this issue that I was planning to look at yesterday
but a few things came up.

The basic idea is that the following scenario happens:
1. fragmented message in assembly area
2. configuration change builds configuration without processor that has
fragmented message data in assembly area
3. some messages get sent
4. new configuration change builds configuration with processor that has
fragmented message data in assembly area
5. next fragment is delivered into that stale assembly data, a message
it does not belong to.

This is just a guess.  There may be some other variation on this idea
that is the cause.
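
A minimal sketch of the scenario, and of one possible guard against it.
The names here (struct assembly, assembly_add, assembly_config_change,
run_scenario) are hypothetical and are not the real totempg code; the
idea of dropping partial data on a configuration change is only an
assumption about what a fix might look like.

```c
#include <stddef.h>
#include <string.h>

#define ASSEMBLY_MAX 2048

/* Hypothetical per-processor assembly area; not the real totempg layout. */
struct assembly {
    unsigned int source;              /* processor the fragments came from */
    size_t len;                       /* bytes of fragment data held so far */
    unsigned char data[ASSEMBLY_MAX];
};

/* Append one fragment; returns 0 on success, -1 if it would not fit. */
static int assembly_add(struct assembly *a, const void *frag, size_t frag_len)
{
    if (frag_len > ASSEMBLY_MAX - a->len)
        return -1;
    memcpy(a->data + a->len, frag, frag_len);
    a->len += frag_len;
    return 0;
}

/* Assumed guard: on a configuration change, discard partial data for a
 * processor that is not part of the new membership. */
static void assembly_config_change(struct assembly *a,
                                   const unsigned int *members, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (members[i] == a->source)
            return;                   /* still a member, keep partial data */
    }
    a->len = 0;                       /* processor left, drop stale fragment */
}

/* Walk through steps 1-5 above; returns the assembled length at the end. */
static int run_scenario(void)
{
    struct assembly a;
    unsigned int without[] = { 1, 3 };
    unsigned int with[] = { 1, 2, 3 };

    memset(&a, 0, sizeof(a));
    a.source = 2;

    assembly_add(&a, "old-first-half", 14);   /* step 1 */
    assembly_config_change(&a, without, 2);   /* step 2 */
    assembly_config_change(&a, with, 3);      /* step 4 */
    assembly_add(&a, "new-tail", 8);          /* step 5 */
    return (int)a.len;
}
```

With the guard, run_scenario() ends with only the 8 bytes of the new
fragment.  Without the reset in assembly_config_change the area would
hold 22 bytes, the old 14 plus the new 8, which is the kind of oversized
and mismatched message guessed at above.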

Thanks
-steve


On Tue, 2005-02-01 at 12:24, Mark Haverkamp wrote:
> Steve,
> 
> I got a segfault yesterday in the evt function that makes a local event
> out of the received message.  It looks like the message was partially
> corrupt.  I did some looking around and found, for instance, that
> the req_header says the size is 712 (seems reasonable), but the
> iovec passed to the delivery function says that the iov_len is
> 1306.
> 
> #0  0x4207c46c in memcpy () from /lib/i686/libc.so.6
> #1  0x08055f03 in make_local_event (p=0xb7f0a00c, eci=0x1201a8c0) at evt.c:1784
> #2  0x080573c1 in evt_remote_evt (msg=0xb7f0a00c, source_addr=
>       {s_addr = 302098624}, endian_conversion_required=0) at evt.c:2742
> #3  0x0804b432 in deliver_fn (source_addr={s_addr = 302098624},
>     iovec=0x8075b18, iov_len=1, endian_conversion_required=0) at main.c:702
> #4  0x08061807 in totempg_deliver_fn (source_addr={s_addr = 302098624},
>     iovec=0x80ebd58, iov_len=1, endian_conversion_required=0) at totempg.c:314
> #5  0x0805f86b in messages_deliver_to_app (skip=0, start_point=0x806d2c8,
>     end_point=32) at totemsrp.c:2847
> #6  0x0805fb4e in message_handler_mcast (system_from=0xbffff850,
>     iovec=0x806c5e0, iov_len=1, bytes_received=1472,
>     endian_conversion_needed=0) at totemsrp.c:2970
> #7  0x080611f6 in recv_handler (handle=0, fd=7, revents=1, data=0x0,
>     prio=0x80bc628) at totemsrp.c:3315
> #8  0x0805ae5e in poll_run (handle=0) at aispoll.c:386
> #9  0x0804bb67 in main (argc=1, argv=0xbffffa34) at main.c:1005
> #10 0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
> 
> (gdb) p /x *p
> $41 = {led_head = {size = 0x2c8, id = 0x13, error = 0x0}, led_in_addr = {
>     s_addr = 0x1201a8c0}, led_receive_time = 0xf5da2a3970817c8,
>   led_svr_channel_handle = 0x0, led_lib_channel_handle = 0x4e206275,
>   led_chan_name = {length = 0x12, value = {0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f,
>       0x54, 0x45, 0x53, 0x54, 0x5f, 0x43, 0x48, 0x41, 0x4e, 0x4e, 0x45, 0x4c,
>       0x0, 0x0, 0x0, 0x0, 0x9c, 0x11, 0xfa, 0xb7, 0xff, 0xff, 0xff, 0x1f,
>       0xff, 0xff, 0xff, 0xff, 0x7, 0x0, 0x0, 0x0, 0x7, 0x0, 0x0, 0x0, 0x0,
>       0x0, 0x0, 0x0, 0x80, 0x8, 0x0, 0xb8, 0x0, 0x0, 0x0, 0x0, 0xe0, 0xf8,
>       0xff, 0xbf, 0xff, 0xff, 0xff, 0x1f, 0x8, 0x0 <repeats 11 times>, 0xc0,
>       0x2b, 0x7a, 0x0, 0x70, 0xf9, 0xff, 0xbf, 0xe0, 0x6, 0x0, 0xb8, 0x8, 0x0,
>       0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x58, 0x56, 0x0, 0x42, 0x0, 0x10, 0xfa,
>       0xb7, 0x0, 0x0, 0x0, 0x0, 0xdc, 0x61, 0x12, 0x42, 0x7, 0x0, 0x0, 0x0,
>       0x0, 0x0, 0x0, 0x0, 0x78, 0xf9, 0xff, 0xbf, 0xd9, 0xe3, 0x2, 0x42, 0x0,
>       0x0, 0x1, 0x0 <repeats 125 times>, 0xe8, 0xf9}},
>   led_event_id = 0x1201a8c00029579f, led_sub_id = 0x4e206275,
>   led_publisher_node_id = 0x1201a8c0, led_publisher_name = {length = 0xd,
>     value = {0x54, 0x65, 0x73, 0x74, 0x20, 0x50, 0x75, 0x62, 0x20, 0x4e, 0x61,
>       0x6d, 0x65, 0x0 <repeats 243 times>}}, led_retention_time = 0x0,
>   led_publish_time = 0xf5da2a3580bf670, led_priority = 0x2,
>   led_user_data_offset = 0x6c, led_user_data_size = 0x2c80000,
>   led_patterns_number = 0x130000, msg_id = 0x0, led_body = 0xb7f0a268}
> 
> led_user_data_size should be zero, and the patterns number should be 4.
> 
> Looking up the stack a little:
> #3  0x0804b432 in deliver_fn (source_addr={s_addr = 302098624},
>     iovec=0x8075b18, iov_len=1, endian_conversion_required=0) at main.c:702
> 702             res = aisexec_handler_fns[header->id](header, source_addr, endian_conversion_required);
> (gdb) p *iovec
> $42 = {iov_base = 0xb7f0a00c, iov_len = 1306}
> 
> This happened just after a config change.  No one left or came into
> membership.  It seems like the assembly areas should have been OK, but
> the 1306 looks a little funny.
> 
> Mark.
> 
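
The mismatch Mark points out (header size 712 versus iov_len 1306) is
something a delivery path could check before handing the message on.
This is a sketch only: struct req_header_sketch mirrors the three fields
shown in the gdb dump (size, id, error) with guessed widths, and
delivery_length_ok is a hypothetical helper, not an existing openais
function.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Assumed shape of the leading header, taken from the gdb dump above;
 * the field widths are a guess. */
struct req_header_sketch {
    int size;
    int id;
    int error;
};

/* Returns 1 when the header's size field matches the delivered length,
 * 0 when the buffer is too short or the two lengths disagree. */
static int delivery_length_ok(const struct iovec *iov)
{
    struct req_header_sketch h;

    if (iov->iov_len < sizeof(h))
        return 0;
    /* Copy the header out rather than casting, to avoid alignment issues. */
    memcpy(&h, iov->iov_base, sizeof(h));
    if (h.size < 0)
        return 0;
    return (size_t)h.size == iov->iov_len;
}
```

For the values in the dump, a header size of 712 with an iov_len of 1306
would be rejected, so the bad message would be caught before the memcpy
in make_local_event.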



