Inventors:
Robert A Wipfel - Sandy UT
David Murphy - Herriman UT
International Classification:
G06F 1324
US Classification:
710269, 709223, 709224, 709226, 709251, 710200, 710260
Abstract:
Methods, systems, and devices are provided for managing resources in a computing cluster. The managed resources include cluster nodes themselves, as well as sharable resources such as memory buffers and bandwidth credits that may be used by one or more nodes. Resource management includes detecting failures and possible failures by node software, node hardware, interconnects, and system area network switches and taking steps to compensate for failures and prevent problems such as uncoordinated access to a shared disk. Resource management also includes reallocating sharable resources in response to node failure, demands by application programs, or other events. Specific examples provided include failure detection by remote memory probes, emergency communication through a shared disk, and sharable resource allocation with minimal locking.