[Openais] [PATCH corosync trunk] When sync is aborted clear the "my_ring_id" variable.
Steven Dake
sdake at redhat.com
Mon Apr 12 08:06:55 PDT 2010
Great investigative work.
Please merge at your earliest convenience and I'll release corosync
1.2.2.
Regards
-steve
On Mon, 2010-04-12 at 20:17 +1000, Angus Salkeld wrote:
> Hi
>
> This patch fixes crashes found by repeated pacemaker CTS SimulStart
> tests. When you bring up the nodes together it can cause a lot of
> configuration changes, and sync gets started and aborted
> many times.
>
> When abort is called the ring_id is not changed, which means that any
> sync packets that arrive from that point on will be accepted as valid.
> I have seen old barrier messages cause the processing index to increment,
> which later causes an out-of-bounds array access.
>
> This patch memsets my_ring_id to 0, so that the ring_id in a stale packet
> no longer matches my_ring_id.
>
> Regards
> Angus
>
>
> Signed-off-by: Angus Salkeld <asalkeld at redhat.com>
> ---
> exec/syncv2.c | 5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/exec/syncv2.c b/exec/syncv2.c
> index 57b501b..559e199 100644
> --- a/exec/syncv2.c
> +++ b/exec/syncv2.c
> @@ -665,6 +665,11 @@ void sync_v2_abort (void)
>  		schedwrk_destroy (my_schedwrk_handle);
>  		my_service_list[my_processing_idx].sync_abort ();
>  	}
> +
> +	/* this will cause any "old" barrier messages from causing
> +	 * problems.
> +	 */
> +	memset (&my_ring_id, 0, sizeof (struct memb_ring_id));
>  }
>  
>  void sync_v2_memb_list_determine (const struct memb_ring_id *ring_id)
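
For readers following the reasoning in the patch description, here is a minimal sketch of why zeroing the ring id on abort makes stale barrier messages get dropped. It is not the corosync code: struct ring_id_sketch is a simplified, hypothetical stand-in for struct memb_ring_id, every sketch_* name is invented for illustration, and it assumes that a real ring id never has all fields equal to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for corosync's struct memb_ring_id. */
struct ring_id_sketch {
	uint32_t rep_nodeid;	/* representative node (assumed non-zero) */
	uint64_t seq;		/* ring sequence number (assumed non-zero) */
};

/* Illustrative module state, loosely mirroring my_ring_id and
 * my_processing_idx in the patch. */
static struct ring_id_sketch my_ring_id_sketch;
static int processing_idx_sketch;

/* Start a sync round for the given ring. */
static void sketch_sync_start (const struct ring_id_sketch *ring_id)
{
	memcpy (&my_ring_id_sketch, ring_id, sizeof (my_ring_id_sketch));
	processing_idx_sketch = 0;
}

/* Abort the round: clear the stored ring id so that messages from
 * the aborted round can no longer match it. */
static void sketch_sync_abort (void)
{
	memset (&my_ring_id_sketch, 0, sizeof (my_ring_id_sketch));
}

/* Field-by-field comparison, so struct padding does not matter. */
static int sketch_ring_id_matches (
	const struct ring_id_sketch *a,
	const struct ring_id_sketch *b)
{
	return (a->rep_nodeid == b->rep_nodeid && a->seq == b->seq);
}

/* Deliver a barrier message. Returns 1 if it was accepted and the
 * processing index advanced, 0 if it was dropped as stale. */
static int sketch_barrier_deliver (const struct ring_id_sketch *msg_ring_id)
{
	if (!sketch_ring_id_matches (msg_ring_id, &my_ring_id_sketch)) {
		return (0);
	}
	processing_idx_sketch++;
	return (1);
}
```

In this sketch, a barrier message carrying the current ring id is accepted and advances the index. After sketch_sync_abort (), the same message no longer matches the zeroed id, so the index stays put instead of running past the end of the service list.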