<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7226.0">

<TITLE>security review of Clustering spec</TITLE>

</HEAD>

<BODY>

<!-- Converted from text/rtf format -->


<P><FONT SIZE=2 FACE="Arial">I've reviewed the Clustering spec from a security point of view and have the following comments and questions (some of which are not security related):</FONT></P>


<P><FONT SIZE=2 FACE="Arial">Sec 5.1 Service Availability Forum (SA Forum) APIs:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">The SAF APIs have no security.&nbsp; That is, any application on any system with an AIS service provider and connectivity to a node in the cluster can use the APIs to manipulate the cluster and the cluster-aware applications running in it.&nbsp; At a finer granularity, any cluster-aware application that is compromised (e.g. by a buffer overflow) can compromise not only all other cluster instances of itself but all other cluster applications as well.</FONT></P>


<P><FONT SIZE=2 FACE="Arial">I realize that there is nothing (directly) that CGL can do about this and so I don't propose any changes to the requirements.&nbsp; However, it is worth understanding this vulnerability.&nbsp; And if any customers or spec authors have contacts within SAF, they might mention that it would be desirable for SAF to add security to a future version of their APIs.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">CFH.2.1 Cluster Node Failure Detection:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">This requirement (and the subsequent ones) does not define the term &quot;failure&quot;.&nbsp; There could be many types of failure, from application to network stack to hardware.&nbsp; I don't expect this to have security implications, as there is probably not much that could be done to prevent malware from either making a node seem to have failed or making a compromised node seem to still be available.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">CFH.3.1 Prevent Failed Node From Corrupting Shared Resources:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">[not a security comment] Given the broad nature of node failure, it's not clear to me that there is any way to guarantee that a failing node won't corrupt shared resources before it is isolated.&nbsp; Perhaps this requirement is really trying to specify that a failed node cannot deny access or service to shared resources?&nbsp; That would be more achievable.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">CFH.5.1 Application Fail-over Enabling:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">[not a security comment] This requirement doesn't mention whether failover includes application state (a checkpoint, in SAF AIS terms) and, if it does, how fresh the data must be.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">CSM.6.1 Cluster Synchronized Device Hotswap:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">A requirement should be added that specifies that any security policies and/or parameters (e.g. access control lists) that apply to the class of device being hot-added must be applied in all OSes that will have access, and to the device itself, before that device is made available for use.</FONT></P>


<P><FONT SIZE=2 FACE="Arial">There should also be a requirement that any sensitive state in a device (that has not been explicitly persisted) be cleared before the device can be removed.&nbsp; An example would be a cached authorization value.&nbsp; I would expect loss of power to clear most things, but there may be cases where some state survives power removal, and cases where power is not lost at all because the device is simply deleted and then re-added.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">CCM.2 Cluster Communication Service:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">I don't think that there should be a requirement around endpoint authentication or identification, but it is worth considering and perhaps adding to the roadmap.&nbsp; That is, the ability to authoritatively identify the originator and destination of a message as a specific cluster member (machine, app, etc.), perhaps including the ability to create access control lists or other security mechanisms on top of this.&nbsp; Such a building block would facilitate securing the SAF APIs.&nbsp; It would also have to satisfy the transparency requirements of CCM.2.1.</FONT></P>

</UL>
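<P><FONT SIZE=2 FACE="Arial">As a rough illustration of the building block described above, the following Python sketch tags each message with an HMAC that covers the sender, the receiver, and the payload.&nbsp; The node names, the key, and the function names are all hypothetical and are not taken from the spec or from SAF.&nbsp; Note that a symmetric key only proves that the message came from some holder of that key, so identifying a specific member would need pairwise keys (as assumed here) or public-key signatures.</FONT></P>

<PRE>
```python
import hashlib
import hmac

# Hypothetical key shared only by node-a and node-b. A real deployment would
# provision and rotate such keys out of band, never embed them in source.
PAIR_KEY = b"example-only-key-for-node-a-and-node-b"


def sign_message(key, sender, receiver, payload):
    """Return an HMAC-SHA256 tag binding the payload to both endpoints."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for field in (sender, receiver, payload):
        # Length-prefix each field so that field boundaries cannot be shifted.
        mac.update(len(field).to_bytes(4, "big"))
        mac.update(field)
    return mac.digest()


def verify_message(key, sender, receiver, payload, tag):
    """Recompute the tag and compare it in constant time."""
    expected = sign_message(key, sender, receiver, payload)
    return hmac.compare_digest(expected, tag)


tag = sign_message(PAIR_KEY, b"node-a", b"node-b", b"checkpoint-update")
# The genuine sender/receiver pair verifies.
print(verify_message(PAIR_KEY, b"node-a", b"node-b", b"checkpoint-update", tag))
# A message claiming a different originator does not.
print(verify_message(PAIR_KEY, b"node-c", b"node-b", b"checkpoint-update", tag))
```
</PRE>

<P><FONT SIZE=2 FACE="Arial">The two calls at the end print True and then False.&nbsp; An access control list could then be keyed on the verified sender name, which is the kind of layering suggested above.</FONT></P>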

<P><FONT SIZE=2 FACE="Arial">CAF.2.2 / CAF.2.3 IP Takeover / TCP Session Takeover:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">Am I right to expect that this mechanism would not support failover of secure connections such as IPSec or TLS/SSL?&nbsp; If the intention is to support this type of failover, then it needs to be recognized that there is data state (e.g. session keys) that must be transferred to the redundant node in order for it to successfully take over the connection.&nbsp; In the case of SSL/TLS, this might be considered application state, in that the SSL/TLS code is part of the application binary or shared library.&nbsp; It would be useful for this requirement (or its parent requirement) to state explicitly whether stateful connection failover is supported.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">CCS.2.1 Data Replication Performance:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">[not a security comment]&nbsp; Is it really the case that the checkpoint write time can be independent of the number of replicas, given that the write throughput is defined to be synchronous to all replica updates?&nbsp; It also seems like the read and write throughput would depend on the number of total requests being made to a given replica or original.&nbsp; So is the &quot;500 API executions&quot; meant to mean 500 executions across the entire cluster, on a single node, or by a single application instance?</FONT></P>

</UL>
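<P><FONT SIZE=2 FACE="Arial">To make the replica-count question concrete, here is a minimal Python sketch of a synchronous write under a deliberately simple model: the write completes only when every replica has acknowledged, so its latency is the slowest acknowledgement if replicas are updated in parallel, or the sum of all of them if they are updated one after another.&nbsp; The function name and the latency figures are invented for illustration and do not come from the spec.</FONT></P>

<PRE>
```python
def sync_write_latency_ms(replica_latencies_ms, parallel=True):
    """Latency of one synchronous checkpoint write under a simple model.

    replica_latencies_ms: time for each replica to acknowledge the update.
    parallel: True if all replicas are updated concurrently, False if the
    updates are issued one after another.
    """
    if not replica_latencies_ms:
        return 0.0
    if parallel:
        # The write is done when the slowest replica has acknowledged.
        return max(replica_latencies_ms)
    # Serial updates pay for every replica in turn.
    return sum(replica_latencies_ms)


# Made-up acknowledgement times for three replicas, in milliseconds.
latencies = [2.0, 2.5, 4.0]
print(sync_write_latency_ms(latencies, parallel=True))   # 4.0
print(sync_write_latency_ms(latencies, parallel=False))  # 8.5
```
</PRE>

<P><FONT SIZE=2 FACE="Arial">Under this model, adding a replica can never shorten the write and will lengthen it whenever the new replica is the slowest (parallel case) or in every case (serial case), which is why independence from the replica count looks doubtful.</FONT></P>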

<P><FONT SIZE=2 FACE="Arial">Section 3.3 Cluster Management:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">It would be very desirable, from a security perspective, to add requirement(s) that all remote management must be secure (authenticated, authorized, and audited).&nbsp; That said, given the underlying remote management technologies, I'm not sure that this would be realistic for CGL 3.0.&nbsp; Perhaps such a requirement could be made a P2 or roadmap item so that it is retained for the future.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">CMON.1.1 Cluster Node HW Status Monitoring:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">I am confused by the fact that SAF HPI is not a remote management protocol and yet this requirement states that CGL should be able to use HPI to manage remote nodes (i.e. cluster members).&nbsp; Is this implicitly assuming an implementation that extends HPI to support remote management (e.g. OpenHPI)?</FONT></P>


<P><FONT SIZE=2 FACE="Arial">The SAF HPI (B) specification does not support any security (the security parameter to the open session call must be NULL).&nbsp; As HPI is a local management specification by default, this is equivalent to the lack of physical security in IPMI and in general.&nbsp; However, the OpenHPI implementation supports remote management via plugins for remotable management protocols such as IPMI and SNMP.&nbsp; If such an implementation is expected to be used, it should be required that any security support in that protocol be exposed to management clients and used in a secure fashion.&nbsp; For instance, any authentication information must be entered by a user (as opposed to being stored in a file or hardcoded).</FONT></P>

</UL>
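<P><FONT SIZE=2 FACE="Arial">As a small sketch of the last point, a management client could prompt for credentials at run time instead of reading them from a file or embedding them in code.&nbsp; The Python below uses the standard getpass module so that the password is not echoed; the function name and prompt strings are hypothetical, and nothing here is an OpenHPI, IPMI, or SNMP API.</FONT></P>

<PRE>
```python
import getpass


def read_credentials(prompt_user=input, prompt_secret=getpass.getpass):
    """Ask the operator for a user name and password at run time.

    The two prompt functions are parameters only so that the logic can be
    exercised without a terminal; by default the password prompt does not
    echo what is typed.
    """
    user = prompt_user("Management user: ")
    secret = prompt_secret("Password: ")
    if not user or not secret:
        raise ValueError("both a user name and a password are required")
    return user, secret


# Interactive use (not run here):
#     user, secret = read_credentials()
# The values would then be handed to whatever authenticated session call
# the underlying management protocol provides.
```
</PRE>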

<P><FONT SIZE=2 FACE="Arial">CDIAG.2 Cluster-wide Diagnostic Info:</FONT>


<BR><FONT SIZE=2 FACE="Arial">CDIAG.2.1 Cluster-wide Identified Core Dump:</FONT>


<BR><FONT SIZE=2 FACE="Arial">CDIAG.2.2 Cluster-wide Crash Dump Management:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">Retrieving core and crash dumps remotely should be no less secure than retrieving them locally (security spec TBD?).&nbsp; This may introduce an additional requirement if/when core and crash dump security is specified.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">CDIAG.2.3 Cluster-wide Log Collection:</FONT>

<UL>

<P><FONT SIZE=2 FACE="Arial">An additional requirement may be needed once the security spec specifies the security for logging, as cluster-wide logs should be equally secure.</FONT></P>

</UL>

<P><FONT SIZE=2 FACE="Arial">Joe</FONT>

</P>



</BODY>

</HTML>