[Openais] [PATCH corosync trunk] When sync is aborted clear the "my_ring_id" variable.

Steven Dake sdake at redhat.com
Mon Apr 12 08:06:55 PDT 2010


Great investigative work.

Please merge at your earliest convenience and I'll release a corosync
1.2.2.

Regards
-steve

On Mon, 2010-04-12 at 20:17 +1000, Angus Salkeld wrote:
> Hi
> 
> This patch fixes crashes found by repeated pacemaker CTS SimulStart
> tests. Bringing the nodes up together can cause a lot of
> configuration changes, so sync gets started and aborted
> many times.
> 
> When abort is called the ring_id is not changed, which means that any
> sync packets that arrive from that point on will be accepted as valid.
> I have seen old barrier messages increment the processing index,
> which later causes an out-of-bounds array access.
> 
> This patch memsets my_ring_id to 0, so that the ring_id in a stale packet
> no longer matches my_ring_id.
> 
> Regards
> Angus
> 
> 
> Signed-off-by: Angus Salkeld <asalkeld at redhat.com>
> ---
>  exec/syncv2.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/exec/syncv2.c b/exec/syncv2.c
> index 57b501b..559e199 100644
> --- a/exec/syncv2.c
> +++ b/exec/syncv2.c
> @@ -665,6 +665,11 @@ void sync_v2_abort (void)
>  		schedwrk_destroy (my_schedwrk_handle);
>  		my_service_list[my_processing_idx].sync_abort ();
>  	}
> +
> +	/* this will prevent any "old" barrier messages from causing
> +	 * problems.
> +	 */
> +	memset (&my_ring_id, 0,	sizeof (struct memb_ring_id));
>  }
>  
>  void sync_v2_memb_list_determine (const struct memb_ring_id *ring_id)
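
For readers following along, here is a minimal standalone sketch of the
idea behind the patch. It is not the corosync code: struct ring_id is a
simplified stand-in for struct memb_ring_id, and the sketch_* functions,
ring_id_equal and sketch_demo are hypothetical names used only for
illustration. It assumes that a real ring id is never all zeros, so a
zeroed my_ring_id cannot match an incoming message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct memb_ring_id; the real layout differs. */
struct ring_id {
	unsigned int rep;
	unsigned long long seq;
};

static struct ring_id my_ring_id;
static int my_processing_idx;	/* indexes a service list in the real code */

/* Field-wise comparison, so struct padding cannot affect the result. */
static int ring_id_equal (const struct ring_id *a, const struct ring_id *b)
{
	return (a->rep == b->rep && a->seq == b->seq);
}

/* Begin a sync round for the given ring. */
static void sketch_sync_start (const struct ring_id *ring_id)
{
	my_ring_id = *ring_id;
	my_processing_idx = 0;
}

/* Abort: zero the stored ring id so messages for the old ring stop matching. */
static void sketch_sync_abort (void)
{
	memset (&my_ring_id, 0, sizeof (struct ring_id));
}

/* Returns 1 if the barrier message was accepted, 0 if it was dropped. */
static int sketch_barrier_deliver (const struct ring_id *msg_ring_id)
{
	if (!ring_id_equal (msg_ring_id, &my_ring_id)) {
		return 0;	/* stale or foreign ring: ignore */
	}
	my_processing_idx++;	/* move on to the next service */
	return 1;
}

/* Walks through the start, abort, stale-barrier sequence. */
static int sketch_demo (void)
{
	struct ring_id ring = { 1, 4 };

	sketch_sync_start (&ring);
	assert (sketch_barrier_deliver (&ring) == 1);	/* live barrier accepted */
	sketch_sync_abort ();
	assert (sketch_barrier_deliver (&ring) == 0);	/* old barrier dropped */
	return my_processing_idx;
}
```

In this sketch, dropping the memset from sketch_sync_abort lets the
second barrier through and bumps my_processing_idx a second time, which
is the kind of stray increment described above.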


