BACKGROUND
[0001] Conventionally, in the computer-related arts, a network is an arrangement of physical
computer systems configured to communicate with each other. In some cases, the physical
computer systems include virtual machines, which may also be configured to interact
with the network (i.e., communicate with other physical computers and/or virtual machines
in the network).
Many different types of networks exist, and a network may be classified based on various
aspects of the network, such as scale, connection method, functional relationship
of computer systems in the network, and/or network topology.
[0002] Regarding connection methods, a network may be broadly categorized as wired (using
a tangible connection medium such as Ethernet cables) or wireless (using an intangible
connection medium such as radio waves). Different connection methods may also be combined
in a single network. For example, a wired network may be extended to allow devices
to connect to the network wirelessly. However, core network components such as routers,
switches, and servers are generally connected using physical wires. Ethernet is defined
within the Institute of Electrical and Electronics Engineers (IEEE) 802.3 standards,
which are supervised by the IEEE 802.3 Working Group.
[0003] To create a wired network, computer systems must be physically connected to each
other. That is, the ends of physical wires (for example, Ethernet cables) must be
physically connected to network interface cards in the computer systems forming the
network. To reconfigure the network (for example, to replace a server or change the
network topology), one or more of the physical wires must be disconnected from a computer
system and connected to a different computer system.
[0004] Further, when transferring data between computer systems in a network, one or more
network protocols are typically used to help ensure the data are transferred successfully.
For example, network protocols may use checksums, small data packets, acknowledgments,
and other data integrity features to help avoid data loss or corruption during the
data transfer. The number of data integrity features required in the network protocol(s)
generally depends on the type of data being transferred and the quality of the connection(s)
between the computer systems.
SUMMARY
[0006] The invention is defined in the claims.
[0007] In general, in one aspect, the invention relates to a method for low-overhead data
transfer. The method includes initiating, by a first application, a Transmission Control
Protocol (TCP) connection with a second application, wherein the first application
is configured to execute in a first virtual machine on a first computer, the second
application is configured to execute in a second virtual machine on a second computer,
and the first computer and the second computer are located on a chassis and communicate
over a chassis interconnect, establishing, in response to the initiation, the TCP
connection between the first application and the second application, determining that
the first computer and second computer are located on the chassis and, responsive
to determining that the first computer and second computer are located on the chassis,
providing, by the first application, pre-post buffer information to the second application,
wherein the pre-post buffer information corresponds to a location in a physical memory
of the first computer and wherein the location in physical memory corresponds to a
virtual memory address of the first application, and transferring data, by the second
application, to the first application using the pre-post buffer information, wherein
transferring the data comprises writing the data directly into the location in the
physical memory of the first computer.
[0008] In general, in one aspect, the invention relates to a program product, for example
embodied on a computer readable medium, comprising a plurality of executable instructions
for low-overhead data transfer, wherein the plurality of executable instructions comprises
instructions to carry out such a method.
[0009] In general, in one aspect, the invention relates to a system. The system includes
a chassis interconnect, a first application executing on a first computer in
a first virtual machine, and a second application executing on a second computer
in a second virtual machine, wherein the first computer and the second computer are
located on a chassis and communicate over the chassis interconnect, wherein the first
application is configured to initiate a Transmission Control Protocol (TCP)
connection with the second application, wherein, in response to the initiation, the
TCP connection is established between the first application and the second application,
wherein the first application is configured to provide pre-post buffer information
to the second application in response to a determination that the first application
is executing on the same chassis as the second application, wherein the pre-post buffer
information corresponds to a location in a physical memory of the first computer and
wherein the location in physical memory corresponds to a virtual memory address of
the first application, and wherein the second application transfers data to the first
application using the pre-post buffer information, wherein transferring the data comprises
writing the data directly into the location in the physical memory of the first computer.
[0010] Other aspects of the invention will be apparent from the following description and
the appended claims.
BRIEF DESCRIPTION OF DRAWINGS
[0011]
Figure 1 shows a diagram of a blade chassis in accordance with one or more embodiments
of the invention.
Figure 2 shows a diagram of a blade in accordance with one or more embodiments of
the invention.
Figure 3 shows a diagram of a network express manager in accordance with one or more
embodiments of the invention.
Figure 4 shows a diagram of a virtual switch in accordance with one or more embodiments
of the invention.
Figure 5 shows a flowchart of a method for creating a virtual network path in accordance
with one or more embodiments of the invention.
Figures 6A-6C show an example of creating virtual network paths in accordance with
one or more embodiments of the invention.
Figures 7-8 show flowcharts of a method for low-overhead data transfer in accordance
with one or more embodiments of the invention.
Figure 9 shows an example of low-overhead data transfer in accordance with one or
more embodiments of the invention.
DETAILED DESCRIPTION
[0012] Specific embodiments of the invention will now be described in detail, by way of
example only, with reference to the accompanying figures. Like elements in the various
figures are denoted by like reference numerals for consistency.
[0013] In the following detailed description of embodiments of the invention, numerous specific
details are set forth in order to provide a more thorough understanding of the invention.
However, it will be apparent to one of ordinary skill in the art that the invention
may be practiced without these specific details. In other instances, well-known features
have not been described in detail to avoid unnecessarily complicating the description.
[0014] In general, embodiments of the invention provide a method and system for low-overhead
data transfer. More specifically, embodiments of the invention provide a method and
system for enabling two applications executing on blades within a common blade chassis
to communicate using low-overhead data transfer. Further, embodiments of the invention
provide a method and system to enable two applications to participate in a zero-copy
handshake and then proceed to communicate using low-overhead data transfer.
[0015] In one or more embodiments of the invention, virtual network interface cards (VNICs)
are connected to each other
via a chassis interconnect. Specifically, the VNICs may be nodes of a virtual network
path that includes a "virtual wire" used to transmit network traffic via the chassis
interconnect. The concept of a virtual wire is discussed in detail below.
[0016] Figure 1 shows a diagram of a blade chassis (100) in accordance with one or more
embodiments of the invention. The blade chassis (100) includes multiple blades (e.g.,
blade A (102), blade B (104)) communicatively coupled with a chassis interconnect
(106). For example, the blade chassis (100) may be a Sun Blade 6048 Chassis by Sun
Microsystems Inc., an IBM BladeCenter® chassis, an HP BladeSystem enclosure by Hewlett
Packard Inc., or any other type of blade chassis. The blades may be of any type(s)
compatible with the blade chassis (100). BladeCenter® is a registered trademark of
International Business Machines, Inc. (IBM), headquartered in Armonk, New York.
[0017] In one or more embodiments of the invention, the blades are configured to communicate
with each other via the chassis interconnect (106). Thus, the blade chassis (100)
allows for communication between the blades without requiring traditional network
wires (such as Ethernet cables) between the blades. For example, depending on the
type of blade chassis (100), the chassis interconnect (106) may be a Peripheral Component
Interconnect Express (PCI-E) backplane, and the blades may be configured to communicate
with each other via PCI-E endpoints. Those skilled in the art will appreciate that
other connection technologies may be used to connect the blades to the blade chassis.
[0018] Continuing with the discussion of Figure 1, to communicate with clients outside the
blade chassis (100), the blades are configured to share a physical network interface
(110). The physical network interface (110) includes one or more network ports (for
example, Ethernet ports), and provides an interface between the blade chassis (100)
and the network (i.e., interconnected computer systems external to the blade chassis
(100)) to which the
blade chassis (100) is connected. The blade chassis (100) may be connected to multiple
networks, for example using multiple network ports.
[0019] In one or more embodiments, the physical network interface (110) is managed by a
network express manager (108). Specifically, the network express manager (108) is
configured to manage access by the blades to the physical network interface (110).
The network express manager (108) may also be configured to manage internal communications
between the blades themselves, in a manner discussed in detail below. The network
express manager (108) may be any combination of hardware, software, and/or firmware
including executable logic for managing network traffic.
[0020] Figure 2 shows a diagram of a blade (200) in accordance with one or more embodiments
of the invention. "Blade" is a term of art referring to a computer system located
within a blade chassis (for example, the blade chassis (100) of Figure 1). Blades
typically include fewer components than stand-alone computer systems or conventional
servers. In one or more embodiments of the invention, fully featured stand-alone computer
systems or conventional servers may also be used instead of (or in combination with)
the blades. Generally, blades in a blade chassis each include one or more processors
and associated memory (e.g., RAM, ROM, etc.). Blades may also include storage devices
(for example, hard drives
and/or optical drives) and numerous other elements and functionalities typical of
today's computer systems (not shown), such as a keyboard, a mouse, and/or output means
such as a monitor. One or more of the aforementioned components may be shared by multiple
blades located in the blade chassis. For example, multiple blades may share a single
output device.
[0021] Continuing with discussion of Figure 2, the blade (200) includes a host operating
system (not shown) configured to execute one or more virtual machines (e.g., virtual
machine C (202), virtual machine D (204)). Broadly speaking, the virtual
machines are distinct operating environments configured to inherit underlying functionality
of the host operating system via an abstraction layer. In one or more embodiments
of the invention, each virtual machine includes a separate instance of an operating
system (e.g., operating system instance C (206), operating system instance D (208)).
For example,
the Xen® virtualization project allows for multiple guest operating systems executing
in a host operating system. Xen® is a trademark overseen by the Xen Project Advisory
Board. In one or more embodiments of the invention, the host operating system supports
virtual execution environments (not shown). An example of a virtual execution environment
is a Solaris™ Container. In such cases, the Solaris™ Container may execute in the
host operating system, which may be a Solaris™ operating system. Solaris™ is a trademark
of Sun Microsystems, Inc. In one or more embodiments of the invention, the host operating
system may include both virtual machines and virtual execution environments.
[0022] Many different types of virtual machines and virtual execution environments exist.
Further, the virtual machines may include many different types of functionality, such
as a switch, a router, a firewall, a load balancer, an application server, any other
type of network-enabled service, or any combination thereof.
[0023] In one or more embodiments of the invention, the virtual machines and/or virtual
execution environments inherit network connectivity from the host operating system
via VNICs (e.g., VNIC C (210), VNIC D (212)). To the virtual machines and the virtual
execution
environments, the VNICs appear as physical NICs. In one or more embodiments of the
invention, the use of VNICs allows an arbitrary number of virtual machines and/or
virtual execution environments to share the blade's (200) networking functionality.
Further, in one or more embodiments of the invention, each virtual machine and/or
virtual execution environment may be associated with an arbitrary number of VNICs,
thereby providing increased flexibility in the types of networking functionality available
to the virtual machines and/or virtual execution environments. For example, a virtual
machine may use one VNIC for incoming network traffic, and another VNIC for outgoing
network traffic.
[0024] VNICs in accordance with one or more embodiments of the invention are described in
detail in commonly owned U.S. Patent Application Serial No. 11/489,942, entitled "Multiple
Virtual Network Stack Instances using Virtual Network Interface
Cards," in the names of Nicolas G. Droux, Erik Nordmark, and Sunay Tripathi, the contents
of which are hereby incorporated by reference in their entirety. VNICs in accordance
with one or more embodiments of the invention also are described in detail in commonly
owned U.S. Patent Application Serial No. 11/480,000, entitled "Method and System for
Controlling Virtual Machine Bandwidth," in the names of Sunay Tripathi, Tim P. Marsland,
and Nicolas G. Droux, the contents of which are
hereby incorporated by reference in their entirety.
[0025] As discussed above, each blade's networking functionality (and, by extension, networking
functionality inherited by the VNICs) includes access to a shared physical network
interface and communication with other blades via the chassis interconnect. Figure
3 shows a diagram of a network express manager (300) in accordance with one or more
embodiments of the invention. The network express manager (300) is configured to route
network traffic traveling to and from VNICs located in the blades. Specifically, the
network express manager (300) includes a virtual switching table (302), which includes
a mapping of VNIC identifiers (304) to VNIC locations (306) in the chassis interconnect.
In one or more embodiments, the VNIC identifiers (304) are Internet Protocol (IP)
addresses, and the VNIC locations (306) are PCI-E endpoints associated with the blades
(e.g., if the chassis interconnect is a PCI-E backplane). Alternatively, another routing
scheme may be used.
[0026] In one or more embodiments, the network express manager (300) is configured to receive
network traffic via the physical network interface and route the network traffic to
the appropriate location (i.e., where the VNIC is located) using the virtual switching
table (302). Further, the
network express manager (300) may be configured to route network traffic between different
VNICs located in the blade chassis. In one or more embodiments of the invention, using
the virtual switching table (302) in this manner facilitates the creation of a virtual
network path, which includes virtual wires. Thus, using the virtual switching table
(302), virtual machines located in different blades may be interconnected to form
an arbitrary virtual network topology, where the VNICs associated with each virtual
machine do not need to know the physical locations of other VNICs. Further, if a virtual
machine is migrated from one blade to another, the virtual network topology may be
preserved by updating the virtual switching table (302) to reflect the corresponding
VNIC's new physical location (for example, a different PCI-E endpoint).
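The migration case described above can be illustrated with a minimal Python sketch. It assumes the virtual switching table is a simple mapping from IP address to PCI-E endpoint; the endpoint names, the addresses, and the helper migrate_vnic are hypothetical and are not taken from the embodiments.

```python
# Hypothetical sketch: table layout, addresses, and endpoint names are assumptions.
# Keys are VNIC identifiers (IP addresses); values are VNIC locations (PCI-E endpoints).
virtual_switching_table = {
    "10.0.0.1": "pcie-endpoint-1",
    "10.0.0.2": "pcie-endpoint-2",
}


def migrate_vnic(table, vnic_id, new_location):
    """Record a VNIC's new physical location after its virtual machine moves."""
    if vnic_id not in table:
        raise KeyError(f"unknown VNIC identifier: {vnic_id}")
    table[vnic_id] = new_location


before = virtual_switching_table["10.0.0.2"]
migrate_vnic(virtual_switching_table, "10.0.0.2", "pcie-endpoint-3")
after = virtual_switching_table["10.0.0.2"]
```

Because other VNICs address the migrated VNIC only by its identifier, they need no change; only the single table entry is rewritten.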
[0027] In some cases, network traffic from one VNIC may be destined for a VNIC located in
the same blade, but associated with a different virtual machine. In one or more embodiments
of the invention, a virtual switch may be used to route the network traffic between
the VNICs independent of the blade chassis. Virtual switches in accordance with one
or more embodiments of the invention are discussed in detail in commonly owned
U.S. Patent Application Serial No. 11/480,261, entitled "Virtual Switch," in the names of Nicolas G. Droux, Sunay Tripathi, and
Erik Nordmark, the contents of which are hereby incorporated by reference in their
entirety.
[0028] For example, Figure 4 shows a diagram of a virtual switch (400) in accordance with
one or more embodiments of the invention. The virtual switch (400) provides connectivity
between VNIC X (406) associated with virtual machine X (402) and VNIC Y (408) associated
with virtual machine Y (404). In one or more embodiments, the virtual switch (400)
is managed by a host operating system (410) within which virtual machine X (402) and
virtual machine Y (404) are located. Specifically, the host operating system (410)
may be configured to identify network traffic targeted at a VNIC in the same blade,
and route the traffic to the VNIC using the virtual switch (400). In one or more embodiments
of the invention, the virtual switch (400) may reduce utilization of the blade chassis
and the network express manager by avoiding unnecessary round-trip network traffic.
[0029] Figure 5 shows a flowchart of a method for creating a virtual network path in accordance
with one or more embodiments of the invention. In one or more embodiments of the invention,
one or more of the steps shown in Figure 5 may be omitted, repeated, and/or performed
in a different order. Accordingly, embodiments of the invention should not be considered
limited to the specific arrangement of steps shown in Figure 5.
[0030] In one or more embodiments of the invention, in Step 502, VNICs are instantiated
for multiple virtual machines. The virtual machines are located in blades, as discussed
above. Further, the virtual machines may each be associated with one or more VNICs.
In one or more embodiments of the invention, instantiating a VNIC involves loading
a VNIC object in memory and registering the VNIC object with a host operating system,
i.e., an operating system that is hosting the virtual machine associated with the VNIC.
Registering the VNIC object establishes an interface between the host operating system's
networking functionality and the abstraction layer provided by the VNIC. Thereafter,
when the host operating system receives network traffic addressed to the VNIC, the
host operating system forwards the network traffic to the VNIC. Instantiation of VNICs
in accordance with one or more embodiments of the invention is discussed in detail
in U.S. Patent Application 11/489,942, incorporated by reference above.
[0031] As discussed above, a single blade may include multiple virtual machines configured
to communicate with each other. In one or more embodiments of the invention, in Step
504, a virtual switch is instantiated to facilitate communication between the virtual
machines. As noted above, the virtual switch allows communication between VNICs independent
of the chassis interconnect. Instantiation of virtual switches in accordance with
one or more embodiments of the invention is discussed in detail in U.S. Patent Application
11/480,261, incorporated by reference above.
[0032] In one or more embodiments of the invention, in Step 506, a virtual switching table
is populated. As noted above, the virtual switching table may be located in a network
express manager configured to manage network traffic flowing to and from the virtual
machines. Populating the virtual switching table involves associating VNIC identifiers
(for example, Internet Protocol and/or Media Access Control (MAC) addresses) with
VNIC locations (for example, PCI-E endpoints). In one or more embodiments of the invention,
the virtual switching table is populated in response to a user command issued via
a control operating system,
i.e., an operating system that includes functionality to control the network express manager.
[0033] In one or more embodiments of the invention, VNICs include settings for controlling
the processing of network packets. In one or more embodiments of the invention, in
Step 508, settings are assigned to the VNICs according to a networking policy. Many
different types of networking policies may be enforced using settings in the VNICs.
For example, a setting may be used to provision a particular portion of a blade's
available bandwidth to one or more VNICs. As another example, a setting may be used
to restrict use of a VNIC to a particular type of network traffic, such as Voice over
IP (VoIP) or Transmission Control Protocol/IP (TCP/IP). Further, settings for multiple
VNICs in a virtual network path may be identical. For example, VNICs in a virtual
network path may be capped at the same bandwidth limit, thereby allowing for consistent
data flow across the virtual network path. In one or more embodiments of the invention,
a network express manager is configured to transmit the desired settings to the VNICs.
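As an illustration only, the following Python sketch shows one way identical settings might be assigned to every VNIC in a virtual network path. The classes VnicSettings and Vnic, the field names, and the helpers apply_policy and accepts are hypothetical; the embodiments do not prescribe any particular data layout.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class VnicSettings:
    """Hypothetical settings record; field names are assumptions."""
    bandwidth_limit_mbps: int
    allowed_traffic: FrozenSet[str] = frozenset({"TCP/IP", "VoIP"})


@dataclass
class Vnic:
    name: str
    settings: Optional[VnicSettings] = None


def apply_policy(path_vnics: List[Vnic], settings: VnicSettings) -> None:
    """Assign the same settings to every VNIC in a virtual network path."""
    for vnic in path_vnics:
        vnic.settings = settings


def accepts(vnic: Vnic, traffic_type: str) -> bool:
    """Return True if the VNIC's settings permit this type of traffic."""
    return vnic.settings is not None and traffic_type in vnic.settings.allowed_traffic


# Cap both VNICs at the same bandwidth and restrict them to VoIP traffic.
path = [Vnic("VNIC J"), Vnic("VNIC K")]
apply_policy(path, VnicSettings(bandwidth_limit_mbps=100,
                                allowed_traffic=frozenset({"VoIP"})))
```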
[0034] In one or more embodiments of the invention, once the VNICs are instantiated and
the virtual switching table is populated, network traffic may be transmitted from
a VNIC in one blade to a VNIC in another blade. The connection between the two VNICs
may be thought of as a "virtual wire," because the arrangement obviates the need for
traditional network wires such as Ethernet cables. A virtual wire functions similarly
to a physical wire in the sense that network traffic passing through one virtual wire
is isolated from network traffic passing through another virtual wire, even though
the network traffic may pass through the same blade (i.e., using the same virtual
machine or different virtual machines located in the blade).
[0035] Further, a combination of two or more virtual wires may be thought of as a "virtual
network path." Specifically, transmitting network traffic over the virtual network
path involves routing the network traffic through a first virtual wire (Step 510)
and then through a second virtual wire (Step 512). For example, when receiving network
traffic from a client via the physical network interface, one virtual wire may be
located between the physical network interface and a VNIC, and a second virtual wire
may be located between the VNIC and another VNIC.
[0036] Figures 6A-6C show an example of creating virtual network paths in accordance with
one or more embodiments of the invention. Specifically, Figure 6A shows a diagram
of an actual topology (600) in accordance with one or more embodiments of the invention,
Figure 6B shows how network traffic may be routed through the actual topology (600),
and Figure 6C shows a virtual network topology (640) created by routing network traffic
as shown in Figure 6B. Figures 6A-6C are provided as examples only, and should not
be construed as limiting the scope of the invention.
[0037] Referring first to Figure 6A, the actual topology (600) includes multiple virtual
machines. Specifically, the actual topology (600) includes a router (602), a firewall
(604), application server M (606), and application server N (608), each executing
in a separate virtual machine. The virtual machines are located in blades communicatively
coupled with a chassis interconnect (622), and include networking functionality provided
by the blades via VNICs (i.e., VNIC H (610), VNIC J (612), VNIC K (614), VNIC L (616),
VNIC M (618), and VNIC N (620)). For
ease of illustration, the blades themselves are not included in the diagram.
[0038] In one or more embodiments of the invention, the router (602), the firewall (604),
application server M (606), and application server N (608) are each located in separate
blades. Alternatively, as noted above, a blade may include multiple virtual machines.
For example, the router (602) and the firewall (604) may be located in a single blade.
Further, each virtual machine may be associated with a different number of VNICs than
the number of VNICs shown in Figure 6A.
[0039] Continuing with discussion of Figure 6A, a network express manager (624) is configured
to manage network traffic flowing to and from the virtual machines. Further, the network
express manager (624) is configured to manage access to a physical network interface
(626) used to communicate with client O (628) and client P (630). In Figure 6A, the
virtual machines, VNICs, chassis interconnect (622), network express manager (624),
and physical network interface (626) are all located within a blade chassis.
Client O (628) and client P (630) are located in one or more networks (not shown)
to which the blade chassis is connected.
[0040] Figure 6B shows how network traffic may be routed through the actual topology (600)
in accordance with one or more embodiments of the invention. In one or more embodiments
of the invention, the routing is performed by the network express manager (624) using
a virtual switching table (634).
[0041] As discussed above, network traffic routed to and from the VNICs may be thought of
as flowing through a "virtual wire." For example, Figure 6B shows a virtual wire (632)
located between application server M (606) and application server N (608). To use
the virtual wire, application server M (606) transmits a network packet via VNIC M
(618). The network packet is addressed to VNIC N (620) associated with application
server N (608). The network express manager (624) receives the network packet via
the chassis interconnect (622), inspects the network packet, and determines the target
VNIC location using the virtual switching table (634). If the target VNIC location
is not found in the virtual switching table (634), then the network packet may be
dropped. In this example, the target VNIC location is the blade in which VNIC N (620)
is located. The network express manager (624) routes the network packet to the target
VNIC location, and application server N (608) receives the network packet via VNIC
N (620), thereby completing the virtual wire (632). In one or more embodiments of
the invention, the virtual wire (632) may also be used to transmit network traffic
in the opposite direction,
i.e., from application server N (608) to application server M (606).
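The lookup-and-drop behavior described in this example might be sketched in Python as follows. The packet representation, the key "dst_vnic", the use of VNIC names rather than IP addresses as identifiers, and the blade labels are all assumptions made for illustration.

```python
from typing import Dict, Optional


def route_packet(table: Dict[str, str], packet: Dict[str, str]) -> Optional[str]:
    """Return the target VNIC location, or None if the packet is dropped.

    `table` stands in for the virtual switching table; a missing entry
    means the target VNIC location is unknown, so the packet is dropped.
    """
    return table.get(packet["dst_vnic"])


# Hypothetical table: VNIC identifiers mapped to the blades hosting them.
table = {"VNIC M": "blade-1", "VNIC N": "blade-2"}

# Application server M sends a packet addressed to VNIC N.
delivered_to = route_packet(table, {"src_vnic": "VNIC M", "dst_vnic": "VNIC N"})

# A packet addressed to a VNIC absent from the table is dropped.
dropped = route_packet(table, {"src_vnic": "VNIC M", "dst_vnic": "VNIC Z"})
```

The same function serves the opposite direction of the virtual wire: swapping the source and destination yields the location of VNIC M.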
[0042] Further, as discussed above, multiple virtual wires may be combined to form a "virtual
network path." For example, Figure 6B shows virtual network path R (636), which flows
from client O (628), through the router (602), through the firewall (604), and terminates
at application server M (606). Specifically, the virtual network path R (636) includes
the following virtual wires. A virtual wire is located between the physical network
interface (626) and VNIC H (610). Another virtual wire is located between VNIC J (612)
and VNIC K (614). Yet another virtual wire is located between VNIC L (616) and VNIC
M (618). If the router (602) and the firewall (604) are located in the same blade,
then a virtual switch may be substituted for the virtual wire located between VNIC
J (612) and VNIC K (614), thereby eliminating use of the chassis interconnect (622)
from communications between the router (602) and the firewall (604).
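To make the composition of virtual network path R concrete, the following Python sketch lists its three virtual wires and marks which of them could be served by a virtual switch. The blade assignments (router and firewall on "blade-1", application server M on "blade-2") are hypothetical, as is the helper classify_hops.

```python
def classify_hops(wires, vnic_to_blade):
    """Label each wire by the mechanism that could carry it.

    A wire whose two endpoints sit on the same blade may be replaced by a
    virtual switch; any other wire uses the chassis interconnect.
    """
    hops = []
    for src, dst in wires:
        src_blade = vnic_to_blade.get(src)
        dst_blade = vnic_to_blade.get(dst)
        same_blade = src_blade is not None and src_blade == dst_blade
        hops.append((src, dst, "virtual switch" if same_blade else "chassis interconnect"))
    return hops


# Virtual wires of path R, in order. The physical network interface is not on a blade.
path_r = [
    ("physical network interface", "VNIC H"),
    ("VNIC J", "VNIC K"),
    ("VNIC L", "VNIC M"),
]

# Assumed placement: router (VNIC H, VNIC J) and firewall (VNIC K, VNIC L) share a blade.
placement = {
    "VNIC H": "blade-1", "VNIC J": "blade-1",
    "VNIC K": "blade-1", "VNIC L": "blade-1",
    "VNIC M": "blade-2",
}

hops = classify_hops(path_r, placement)
```

Under this placement only the wire between VNIC J and VNIC K stays inside one blade, which matches the substitution described above.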
[0043] Similarly, Figure 6B shows virtual network path S (638), which flows from client
P (630), through the router (602), and terminates at application server N (608). Virtual
network path S (638) includes a virtual wire between the physical network interface
(626) and VNIC H (610), and a virtual wire between VNIC J (612) and VNIC N (620).
The differences between virtual network path R (636) and virtual network path S (638)
exemplify how multiple virtual network paths may be located in the same blade chassis.
[0044] In one or more embodiments of the invention, VNIC settings are applied separately
for each virtual network path. For example, different bandwidth limits may be used
for virtual network path R (636) and virtual network path S (638). Thus, the virtual
network paths may be thought of as including many of the same features as traditional
network paths (e.g., using Ethernet cables), even though traditional network wires
are not used within
the blade chassis. However, traditional network wires may still be required outside
the blade chassis, for example between the physical network interface (626) and client
O (628) and/or client P (630).
[0045] Figure 6C shows a diagram of the virtual network topology (640) resulting from the
use of the virtual network path R (636), virtual network path S (638), and virtual
wire (632) shown in Figure 6B. The virtual network topology (640) allows the various
components of the network (i.e., router (602), firewall (604), application server M
(606), application server N
(608), client O (628), and client P (630)) to interact in a manner similar to a traditional
wired network. However, as discussed above, communication between the components located
within the blade chassis (i.e., router (602), firewall (604), application server M (606),
and application server
N (608)) is accomplished without the use of traditional network wires.
[0046] In one embodiment of the invention, data may be transferred between virtual machines
executing on different blades in a blade chassis using Transmission Control Protocol
(TCP) and Internet Protocol (IP). Further, data may also be transferred between the
virtual machines using low-overhead data transfers. In particular, data may be transferred
directly from physical memory on one blade to physical memory on another blade.
[0047] More specifically, the virtual machine (or application executing therein) may establish
a TCP connection with another virtual machine and then, using the TCP connection,
perform a zero-copy handshake. In one embodiment of the invention, the zero-copy handshake
involves determining whether the virtual machines are able to communicate using low-overhead
data transfer and whether the virtual machines (or applications executing therein) want
to transfer data using low-overhead data transfer. In one embodiment of the invention,
the virtual machines may communicate using a combination of data transfer over TCP/IP
and data transfer using low-overhead data transfer.
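The two determinations made during the zero-copy handshake can be sketched in Python as shown below. The Peer record, its fields, and the function choose_transfer_mode are hypothetical; the sketch assumes, consistent with the summary above, that "able" means both computers are located on the same chassis.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical description of one side of an established TCP connection."""
    chassis_id: str
    wants_low_overhead: bool


def choose_transfer_mode(a: Peer, b: Peer) -> str:
    """Pick the transfer mode once the TCP connection already exists."""
    able = a.chassis_id == b.chassis_id          # same chassis, so direct writes are possible
    willing = a.wants_low_overhead and b.wants_low_overhead
    return "low-overhead" if able and willing else "tcp/ip"


same_chassis = choose_transfer_mode(Peer("chassis-1", True), Peer("chassis-1", True))
other_chassis = choose_transfer_mode(Peer("chassis-1", True), Peer("chassis-2", True))
declined = choose_transfer_mode(Peer("chassis-1", True), Peer("chassis-1", False))
```

Falling back to TCP/IP whenever either check fails keeps the existing connection usable, which is why the handshake is performed over the TCP connection rather than before it.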
[0048] In one embodiment of the invention, low-overhead data transfer is achieved by allowing
the direct transfer of data from the virtual memory associated with a sending application
(executing in a first virtual machine) to the virtual memory of a receiving application
(executing in a second virtual machine), where the sending application is executing
on a first blade and the receiving application is executing on a second blade. In one
embodiment of the invention, the target virtual memory address for the transfer must
be provided prior to the transfer of data. If the receiving application is executing
in a guest operating system (executing in a virtual machine), which in turn is executing
in a host operating system, then the receiving application must provide the sending
application (or a related process) a physical memory address (which corresponds to
the virtual memory associated with the receiving application) for a buffer to which
to transfer the data. However, the receiving application is only able to provide a
virtual memory address for the receiving application. This virtual memory address
must be translated one or more times in order to obtain the underlying physical memory
address. The process of translation is described in Figure 7 below. Once the translation
is complete, the physical memory address (as well as any other necessary information)
is provided to the sending application (or a related process) to perform low-overhead
data transfer as described in Figure 8.
[0049] Figure 7 shows a flowchart of a method for pre-posting buffers for an application
prior to the application using low-overhead data transfer. In one or more embodiments
of the invention, one or more of the steps shown in Figure 7 may be omitted, repeated,
and/or performed in a different order than the order shown in Figure 7. Accordingly,
embodiments of the invention should not be considered limited to the specific arrangement
of steps shown in Figure 7.
[0050] In Step 700, an application specifies a pre-post buffer address. In one embodiment
of the invention, the pre-post buffer address is a virtual memory address in virtual
memory associated with the application. In one embodiment of the invention, the pre-post
buffer address may refer to a buffer that is greater than 1 megabyte in size. In Step
702, the guest operating system receives and translates the pre-post buffer address
into a guest OS virtual memory address. In one embodiment of the invention, the guest
OS virtual memory address is a virtual memory address in a virtual memory associated
with the guest operating system.
[0051] In Step 704, the guest operating system provides the guest OS virtual memory address
to the host operating system. In Step 706, the host operating system receives and
translates the guest OS virtual memory address into a host OS virtual memory address.
Based on the host OS virtual memory address, the host operating system may determine the underlying
physical memory address corresponding to the host OS virtual memory address. The physical
memory address that corresponds to the host OS virtual memory address is the same physical
memory address that corresponds to the pre-post buffer address.
[0052] In one embodiment of the invention, the host operating system notifies the guest
operating system that the pre-post buffer address has been successfully pre-posted.
The guest operating system may, in turn, notify the application that the pre-post buffer
address has been successfully pre-posted. In addition, the host operating system may
maintain the translated physical address and any other related information (collectively
referred to as "pre-post buffer information").
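By way of a non-limiting illustration, the translation chain of Steps 700 through 706 may be sketched as follows. The sketch is not the claimed implementation: the page size, the dictionary-based mappings, the record name PrePostBufferInfo, and the function names translate and pre_post_buffer are all assumptions introduced for illustration, and the sketch is limited to a buffer that fits within a single page, whereas the description contemplates larger buffers.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size; the description does not specify one


def translate(page_table, address):
    """Translate an address through one mapping level, keeping the page offset."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


@dataclass(frozen=True)
class PrePostBufferInfo:
    """Hypothetical record kept by the host OS after a successful pre-post."""
    physical_address: int
    length: int


def pre_post_buffer(app_address, length, app_to_guest, guest_to_host, host_to_physical):
    """Follow Steps 700-706 for a buffer that fits inside a single page."""
    if (app_address % PAGE_SIZE) + length > PAGE_SIZE:
        raise ValueError("this sketch only handles buffers within one page")
    guest_address = translate(app_to_guest, app_address)    # Step 702
    host_address = translate(guest_to_host, guest_address)  # Step 706
    # The host OS resolves its own virtual address to physical memory.
    physical_address = translate(host_to_physical, host_address)
    return PrePostBufferInfo(physical_address, length)


# Example mappings (page number -> page number), chosen arbitrarily.
info = pre_post_buffer(
    app_address=2 * PAGE_SIZE + 16,
    length=256,
    app_to_guest={2: 7},
    guest_to_host={7: 11},
    host_to_physical={11: 3},
)
print(info)
```

Each mapping level preserves the offset within the page, so the physical address ultimately refers to the same bytes as the pre-post buffer address supplied by the application.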
[0053] At this stage, the application may now participate in low-overhead data transfer.
More specifically, the application may receive data using low-overhead data transfer.
Those skilled in the art will appreciate that the method of Figure 7 may be repeated multiple times
for a given application in order for the application to pre-post multiple buffers
for use in low-overhead data transfer. Further, the application may also send data
to another application using low-overhead data transfer if the other application also
pre-posts buffers using, for example, the method shown in Figure 7.
[0054] Figure 8 shows a flowchart of a method for initiating and using low-overhead data
transfer. In one or more embodiments of the invention, one or more of the steps shown
in Figure 8 may be omitted, repeated, and/or performed in a different order than the
order shown in Figure 8. Accordingly, embodiments of the invention should not be considered
limited to the specific arrangement of steps shown in Figure 8.
[0055] In Step 800, Application A attempts to initiate a TCP connection with Application
B. In one embodiment of the invention, Application A provides an IP address assigned
to the virtual machine (or assigned to the VNIC associated with the virtual machine)
on which Application B is executing. In addition, Application A may also provide a
port number.
[0056] In Step 802, the guest OS kernel, in response to the request from Application A to initiate
the TCP connection, creates socket A. In one embodiment of the invention, socket A
is a kernel level process identified by the IP-Port Number pair and is a communication
end-point configured to interface with Application A and the VNIC executing on the
host operating system (on which the guest OS is executing). In Step 804, a TCP connection
is initiated by socket A. In Step 806, socket B responds to the connection request
and a TCP connection is established.
[0057] In Step 808, the zero-copy handshake is initiated. In one embodiment of the invention,
the zero-copy handshake is an exchange of data designed to establish whether two applications
may transfer data using low-overhead data transfer. In one embodiment of the invention,
the zero-copy handshake is initiated when Application A sends one or more requests
to Application B to determine whether Application A and Application B may transfer
data using low-overhead data transfer. In one embodiment of the invention, the request
may include placing a specific marker in a TCP SYN packet.
[0058] In one embodiment of the invention, instead of the applications initiating the zero-copy
handshake, the VNICs executing on the respective host operating systems (
see Figure 9 below) may initiate and subsequently perform the zero-copy handshake. In
such cases, one or both of the applications have, prior to the initiation of the TCP connection,
indicated that they are able to transfer data using low-overhead data transfer
and have performed the method shown in Figure 7 to obtain the pre-post buffer information.
[0059] In Step 810, as part of the zero-copy handshake, a determination is made about whether
Application A and Application B are connected over a local TCP connection. In one
embodiment of the invention, Application A and Application B are connected over a
local TCP connection when both Application A and Application B are executing on blades
within the same blade chassis. If Application A and Application B are connected over
a local TCP connection, the process proceeds to Step 812. Alternatively, the process
proceeds to Step 820. In Step 820, Applications A and B communicate using TCP/IP.
[0060] In Step 812, as part of the zero-copy handshake, a determination is made about whether
Application B wants to participate in low-overhead data transfer. In one embodiment
of the invention, this determination may include either of the following determinations:
(i) Application B will send data to Application A using low-overhead data transfer
but will only receive data from Application A via TCP/IP; and (ii) Application B will
send data to Application A using low-overhead data transfer and Application B will
receive data from Application A using low-overhead data transfer. If Application B
wants to participate in low-overhead data transfer, then the process proceeds to Step
814. Alternatively, the process proceeds to Step 820 (
i.e., Application B does not want to participate in either of the aforementioned scenarios).
In one embodiment of the invention, the zero-copy handshake is performed over the
TCP connection.
[0061] In Step 814, Application B is provided with Application A's pre-post buffer information.
In Step 816, depending on the determination in Step 812, Application A may be provided
with Application B's pre-post buffer information. In one embodiment of the invention,
the information transferred in Step 814 and Step 816 is communicated over the TCP
connection. In Step 818, Applications A and B participate in low-overhead data transfer.
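By way of a non-limiting illustration, the decisions made in Steps 810 through 820 may be sketched as follows. The names Transport, HandshakeResult, and zero_copy_handshake, the use of chassis identifiers to detect a local TCP connection, and the two boolean inputs describing Application B's preferences are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Transport(Enum):
    TCP_IP = "TCP/IP"
    LOW_OVERHEAD = "low-overhead data transfer"


@dataclass(frozen=True)
class HandshakeResult:
    b_to_a: Transport          # how Application B sends data to Application A
    a_to_b: Transport          # how Application A sends data to Application B
    info_for_b: Optional[Any]  # Application A's pre-post buffer information
    info_for_a: Optional[Any]  # Application B's pre-post buffer information


def zero_copy_handshake(chassis_a, chassis_b, b_participates, b_also_receives,
                        a_info, b_info):
    """Return the outcome of the handshake (hypothetical helper)."""
    tcp_only = HandshakeResult(Transport.TCP_IP, Transport.TCP_IP, None, None)
    # Step 810: the connection is local when both blades share a chassis.
    if chassis_a != chassis_b:
        return tcp_only  # Step 820
    # Step 812: does Application B want low-overhead data transfer at all?
    if not b_participates:
        return tcp_only  # Step 820
    if b_also_receives:
        # Steps 814 and 816: both sides exchange pre-post buffer information.
        return HandshakeResult(Transport.LOW_OVERHEAD, Transport.LOW_OVERHEAD,
                               a_info, b_info)
    # Step 814 only: B writes into A's buffer but still receives over TCP/IP.
    return HandshakeResult(Transport.LOW_OVERHEAD, Transport.TCP_IP, a_info, None)


result = zero_copy_handshake("chassis-1", "chassis-1", True, False,
                             a_info="A-info", b_info="B-info")
print(result.b_to_a.value, "/", result.a_to_b.value)
```

In this sketch, Application B always receives Application A's pre-post buffer information when it participates, because B needs that information to write into A's buffer; A receives B's information only in the bidirectional case.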
[0062] In one or more embodiments of the invention, low-overhead data transfer from Application
A to Application B uses, for example, a Direct Memory Access (DMA) operation, where
the DMA operation uses as input Application B's pre-post buffer information. Those
skilled in the art will appreciate that other write operations (
e.g., RDMA) may be used to write data directly from one physical memory location to another
physical memory location on a different blade.
[0063] In one embodiment of the invention, the low-overhead transfer is performed by DMA
(or RDMA) engines executing in (or managed by) the respective host operating systems.
Further, because the data transfer is directly from one blade to another, the
data transfer does not require the additional processing overhead associated with
other transfer protocols such as TCP. Further, in one embodiment of the invention,
the low-overhead data transfer may use the underlying error detection and correction
functionality of the chassis interconnect to ensure that data is transferred in an
uncorrupted manner.
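By way of a non-limiting illustration, the effect of such a write may be modeled in software as follows. An actual DMA or RDMA operation is performed by hardware engines and is not reproduced here; the bytearray standing in for the receiving blade's physical memory and the function name low_overhead_write are assumptions introduced for illustration only.

```python
def low_overhead_write(remote_memory, physical_address, buffer_length, payload):
    """Write payload straight into the pre-posted region of remote_memory.

    remote_memory    -- bytearray standing in for the receiver's physical memory
    physical_address -- start of the pre-posted buffer (from pre-post information)
    buffer_length    -- size of the pre-posted buffer in bytes
    """
    if len(payload) > buffer_length:
        raise ValueError("payload is larger than the pre-posted buffer")
    if physical_address < 0 or physical_address + buffer_length > len(remote_memory):
        raise ValueError("pre-posted buffer lies outside the modeled memory")
    # A memoryview lets the bytes land in place without building an
    # intermediate copy of the destination region.
    view = memoryview(remote_memory)
    view[physical_address:physical_address + len(payload)] = payload
    return len(payload)


memory_of_blade_b = bytearray(64)
written = low_overhead_write(memory_of_blade_b, 16, 8, b"hello")
print(written, bytes(memory_of_blade_b[16:21]))
```

The write uses only the address and length carried in the pre-post buffer information; no per-packet acknowledgment or checksum logic appears in the sketch, which mirrors the reliance on the chassis interconnect for error detection and correction.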
[0064] In one embodiment of the invention, once data from Application B is transferred to
Application A using the low-overhead data transfer, Application A is notified of the
presence of the data. In one embodiment of the invention, Application A receives the
notification from the guest operating system on which it is executing. Further, the
guest operating system is notified by the host operating system on which it is executing.
Finally, the host operating system is notified by Application B, the guest operating
system on which Application B is executing, or the host operating system on which
the aforementioned guest operating system is executing (or a process executing thereon).
[0065] In one embodiment of the invention, Application A and Application B may communicate
using both TCP/IP and low-overhead data transfer. For example, TCP/IP may be used
for all communication of a certain type (
e.g., all files in a specific file format) and/or less than a certain size and low-overhead
data transfer may be used for all communication of another type and/or greater than
a certain size.
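By way of a non-limiting illustration, a size-based selection between the two transfer mechanisms may be sketched as follows. The threshold value, the treatment of a payload exactly at the threshold, and the function name choose_transport are assumptions introduced for illustration only; the description does not fix any particular size.

```python
SIZE_THRESHOLD = 1 << 20  # 1 MiB; an assumed cut-over point, not from the description


def choose_transport(payload_size, low_overhead_available, threshold=SIZE_THRESHOLD):
    """Pick a transfer mechanism for one message (hypothetical policy)."""
    # Large payloads benefit most from avoiding the TCP/IP processing path.
    if low_overhead_available and payload_size > threshold:
        return "low-overhead"
    # Small payloads, or peers without pre-posted buffers, stay on TCP/IP.
    return "tcp/ip"


print(choose_transport(4 * 1024, True))
print(choose_transport(8 * 1024 * 1024, True))
```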
[0066] Figure 9 shows an example of low-overhead data transfer in accordance with one or
more embodiments of the invention. Figure 9 is provided for exemplary purposes only
and should not be construed as limiting the scope of the invention. Referring to Figure
9, blade A (900) and blade B (902) are each communicatively coupled with a chassis
interconnect (912). Application A (908) in blade A (900) is configured to communicate
with application B (910) in blade B (902) via a TCP connection having socket A (918)
and socket B (920) as endpoints. Specifically, socket A (918) is configured to transfer
data to socket B (920) by way of VNIC A (926), VNIC B (928), and the chassis interconnect
(912). Further, application A (908) is executing in virtual machine A (904) on guest
OS A (not shown) and application B (910) is executing in virtual machine B (906) on
guest OS B (not shown).
[0067] Based on the above, consider the scenario in which application A (908) and application
B (910) each have performed the method described in Figure 7 to generate pre-post buffer
information. More specifically, application A (908) allocated pre-post buffer A (not
shown) in Application A virtual memory (VM) (914). The virtual memory address associated
with pre-post buffer A is then translated to a guest operating system VM (922) address.
The guest operating system VM (922) address is then translated by the host operating
system A (930) to obtain a host VM address from the host VM (934), which corresponds
to an underlying physical memory address. A similar process is performed for Application
B (910), using Application B VM (916) and translating to a guest operating system
VM (924) address and finally to an underlying physical memory address, which corresponds
to a host VM address in host VM (936).
[0068] Using the above pre-post buffer information, the applications may communicate as
follows in accordance with one embodiment of the invention. Specifically, application
A (908) is configured to request a TCP connection with application B (910) for transferring
data. Socket A (918) initiates a TCP connection with socket B (920) via VNIC A (926)
to VNIC B (928).
[0069] Once the TCP connection is established, the zero-copy handshake is performed. Specifically,
a determination is made by VNIC A (926) that Application A (908) and Application B
(910) are connected over a local TCP connection. A further determination is made that
Application B (910) will send data to Application A (908) using low-overhead data
transfer and Application B (910) will receive data from Application A (908) using
low-overhead data transfer.
[0070] In one or more embodiments of the invention, VNIC A (926) then passes Application
A's pre-post buffer information to VNIC B (928) and VNIC B (928) passes Application
B's pre-post buffer information to VNIC A (926). The applications may then transfer
data using low-overhead data transfer.
[0071] In one embodiment of the invention, data from application B (910) is transferred
directly to application A's VM (914) using an RDMA engine and application A's pre-post
buffer information, where the RDMA engine is located on blade B (902) and is managed
by VNIC B (928). Prior to the transfer, VNIC A may compare the location in the physical
memory received from VNIC B with an allowed address range associated with application
A to determine whether the data may be transferred to the location in memory specified
by the pre-post buffer information. If the location in physical memory received by
VNIC A is outside the allowed address range, then the transfer may be denied.
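By way of a non-limiting illustration, the comparison against the allowed address range may be sketched as follows. The half-open range convention, the inclusion of the transfer length in the check, and the function name transfer_allowed are assumptions introduced for illustration only.

```python
def transfer_allowed(physical_address, length, allowed_start, allowed_end):
    """Return True when [physical_address, physical_address + length) lies
    entirely inside the allowed range [allowed_start, allowed_end)."""
    if length <= 0:
        return False
    return (allowed_start <= physical_address
            and physical_address + length <= allowed_end)


# Assumed allowed range for application A: 0x1000 up to (not including) 0x3000.
print(transfer_allowed(0x1800, 0x400, 0x1000, 0x3000))
print(transfer_allowed(0x2F00, 0x400, 0x1000, 0x3000))
```

Checking the end of the region as well as its start prevents a transfer that begins inside the allowed range from running past its upper bound.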
[0072] Embodiments of the invention may also be used to transfer data between applications by
using embodiments of the invention to transfer data between virtual machines (
e.g., virtual machine A (904) and virtual machine B (906)). For example, referring to
Figure 9, to send data from application A (908) to application B (910), application
A (908) may transfer data over the connection to VNIC A (926). VNIC A (926), in accordance
with embodiments of the invention, obtains pre-post buffer information for virtual machine B (906)
and subsequently transfers the data using, for example, an RDMA engine directly to
guest OS B VM (924). Upon receipt, the data is copied into application
B VM (916). In such cases, the virtual machines, as opposed to the applications, are
aware of the ability to transfer data using low-overhead data transfer. However, the
applications are not aware of this functionality. Further, the applications, in this
scenario, do not need to include functionality to pre-post buffers. Instead, the virtual
machines need to include functionality to pre-post buffers.
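By way of a non-limiting illustration, the two stages of this virtual-machine-level variant may be modeled as follows. The bytearrays standing in for guest OS B VM and application B VM, and the function name deliver_via_virtual_machine, are assumptions introduced for illustration only; the first stage merely stands in for the RDMA write.

```python
def deliver_via_virtual_machine(guest_os_buffer, application_buffer, payload):
    """Model the transfer into the guest OS buffer followed by the local copy."""
    size = len(payload)
    if size > len(guest_os_buffer) or size > len(application_buffer):
        raise ValueError("payload does not fit in the receiving buffers")
    # Stage 1: the sender writes directly into the buffer pre-posted by the
    # receiving virtual machine (stands in for the RDMA write).
    guest_os_buffer[:size] = payload
    # Stage 2: the receiving guest OS copies the data into application memory;
    # the application itself never pre-posts a buffer.
    application_buffer[:size] = guest_os_buffer[:size]
    return size


guest_os_b_vm = bytearray(32)
application_b_vm = bytearray(32)
copied = deliver_via_virtual_machine(guest_os_b_vm, application_b_vm, b"payload")
print(copied, bytes(application_b_vm[:copied]))
```

The additional copy in the second stage is the cost of keeping the applications unaware of low-overhead data transfer.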
[0073] Those skilled in the art will appreciate that while the invention has been described
with respect to using blades, the invention may be extended for use with other computer
systems, which are not blades. Specifically, the invention may be extended to any
computer, which includes at least memory, a processor, and a mechanism to physically
connect to and communicate over the chassis interconnect. Examples of such computers
include, but are not limited to, multi-processor servers, network appliances, and
light-weight computing devices (
e.g., computers that only include memory, a processor, a mechanism to physically connect
to and communicate over the chassis interconnect, and the necessary hardware to enable
the aforementioned components to interact).
[0074] Further, those skilled in the art will appreciate that if one or more computers,
which are not blades, are used to implement the invention, then an appropriate
chassis may be used in place of the blade chassis.
[0075] A computer program product comprising software instructions can perform embodiments
of the invention. The software instructions may be stored on a computer readable medium
such as a compact disc (CD), a diskette, a tape, or any other computer readable storage
device.
[0076] While the invention has been described with respect to a limited number of embodiments,
those skilled in the art, having benefit of this disclosure, will appreciate that
other embodiments can be devised which do not depart from the scope of the invention
as disclosed herein. Accordingly, the scope of the invention should be limited only
by the attached claims.
1. A method for low-overhead data transfer, comprising:
initiating (800), by a first application (908), a Transmission Control Protocol,
"TCP", connection with a second application (910), wherein the first application is
executing on a first computer (900) in a first virtual machine (904), the second application
is executing on a second computer (902) in a second virtual machine (906), and the
first computer and the second computer are located on a chassis and communicate over
a chassis interconnect (912), wherein the chassis interconnect comprises a Peripheral
Component Interconnect Express (PCI-E) backplane and the first and second computers are
configured to communicate with each other via PCI-E endpoints;
establishing (806), in response to the initiation, the TCP connection between the
first application and the second application;
selecting (812), by a first virtual network interface card, "VNIC" (926), a second
protocol from a group consisting of a first protocol and the second protocol based on a determination
(810) that the TCP connection between the first application and the second application
is a local TCP connection and that the second application wants to communicate using
the second protocol, wherein the first VNIC is located on the first computer and is
interposed between the first virtual machine and the chassis interconnect, wherein
the TCP connection is the local TCP connection when the first application and the
second application are executing on separate physical computers connected to the chassis,
and wherein the first protocol is TCP and the second protocol is a low-overhead data
transfer; and
based on the selection of the second protocol,
providing (814), by the first application, pre-post buffer information to the second
application, wherein the pre-post buffer information corresponds to a location in
a physical memory of the first computer and wherein the location in physical memory
corresponds to a virtual memory address of the first application, and
transferring (818) data, by the second application, to the first application using
the pre-post buffer information, wherein transferring the data comprises writing the
data directly into the location in the physical memory of the first computer.
2. The method of claim 1, further comprising:
generating the pre-post information, wherein generating the pre-post information comprises:
allocating (700) the virtual memory address in virtual memory associated with the
first application;
providing (700) the virtual memory address to a guest operating system (OS) executing
the first application, wherein the guest OS is executing in the first virtual machine;
translating (702) the virtual memory address to obtain a guest OS virtual memory address
associated with the guest operating system;
providing (704) the guest OS virtual memory address to a host operating system upon
which the guest operating system is executing;
translating (706) the virtual memory address to obtain a host OS virtual memory address
associated with the host operating system, wherein the host OS virtual memory address
corresponds to the location in the physical memory of the first computer.
3. The method of claim 1 or claim 2, wherein the pre-post information is provided to
a first virtual network interface card, "VNIC" (926), over the TCP connection.
4. The method of claim 3, wherein the first VNIC is configured to compare the location
in the physical memory received from a second VNIC (928) with an allowed address range
associated with the first application to determine whether the data may be transferred
to the location in the physical memory, wherein the first VNIC is located on the first
computer.
5. The method of any one of the preceding claims, wherein the second application provides
a second VNIC located on the second computer with a location of physical memory associated
with the TCP connection.
6. The method of claim 5, wherein transferring the data comprises:
writing, by the second VNIC, the data to the location in the physical memory of the
first computer using remote direct memory access, "RDMA", and the location in the
physical memory of the first computer.
7. The method of claim 5 or claim 6, wherein the first VNIC and the second VNIC are nodes
in a virtual network path, wherein the virtual network path comprises a first virtual
wire between the first VNIC and the second VNIC.
8. The method of claim 5, wherein the second virtual machine is configured to directly transfer
data from the first virtual machine to a location in the physical memory of the first
computer, wherein the second VNIC transfers the data using an RDMA engine.
9. A system comprising:
a chassis interconnect (912); and
a first application (908) configured to execute in a first virtual machine (904) on
a first computer (900) and a second application (910) configured to execute in a second
virtual machine (906) on a second computer (902), wherein the first computer and the
second computer are located on a chassis and communicate over the chassis interconnect,
wherein the chassis interconnect comprises a Peripheral Component Interconnect Express
(PCI-E) backplane and the first and second computers are configured to communicate
with each other via PCI-E endpoints,
wherein the first application is configured to initiate a Transmission Control
Protocol (TCP) connection with the second application, wherein, in response to the
initiation, the TCP connection is established between the first application and the
second application,
wherein a first virtual network interface card, "VNIC", executing on the first computer
and interposed between the first virtual machine and the chassis interconnect, is
configured to select a second protocol from a group consisting of a first protocol
and the second protocol based on a determination that the TCP connection between the
first application and the second application is a local TCP connection and that the
second application wants to communicate using the second protocol, wherein the TCP
connection is the local TCP connection when the first application and the second application
are executing on separate physical computers connected to the chassis, and wherein
the first protocol is TCP and the second protocol is a low-overhead data transfer,
wherein the first application is configured to provide pre-post buffer information
to the second application in response to a determination that the first application
is executing on the same chassis as the second application,
wherein the pre-post buffer information corresponds to a location in a physical memory
of the first computer and wherein the location in physical memory corresponds to a
virtual memory address of the first application, and
wherein the second application transfers data to the first application using the pre-post
buffer information, wherein transferring the data comprises writing the data directly
into the location in the physical memory of the first computer.
10. The system of claim 9, wherein the pre-post information is generated by:
allocating the virtual memory address in virtual memory associated with the first
application;
providing the virtual memory address to a guest operating system (OS) executing the
first application, wherein the guest OS is executing in the first virtual machine;
translating the virtual memory address to obtain a guest OS virtual memory address
associated with the guest operating system;
providing the guest OS virtual memory address to a host operating system upon which
the guest operating system is executing;
translating the virtual memory address to obtain a host OS virtual memory address
associated with the host operating system, wherein the host OS virtual memory address
corresponds to the location in the physical memory of the first computer.
11. The system of claim 9 or claim 10, wherein the second application provides a virtual
network interface card "VNIC" located on the second computer with a location of physical
memory associated with the TCP connection.
12. The system of claim 11, wherein transferring the data comprises:
writing, by the VNIC, the data to the location in the physical memory of the first
computer using remote direct memory access (RDMA) and the location in the physical
memory of the first computer.
13. The system of claim 11, wherein the second virtual machine is configured to directly
transfer data from the first virtual machine to a location in the physical memory
of the first computer, wherein the VNIC transfers the data using a remote direct memory
access engine.
14. The system of any one of claims 9 to 13, wherein the first computer and the second
computer are blades.
15. A computer program product comprising a plurality of executable instructions for low-overhead
data transfer, wherein the plurality of executable instructions comprises instructions
to carry out the method of any one of claims 1 to 8.
1. A method for low-overhead data transfer, comprising:
initiating (800), by a first application (908), a Transmission Control Protocol connection,
"TCP connection", with a second application (910), wherein the first application is
executing on a first computer (900) in a first virtual machine (904), wherein the second
application is executing on a second computer (902) in a second virtual machine (906),
and wherein the first computer and the second computer are located on a chassis and
communicate over a chassis interconnect (912), wherein the chassis interconnect comprises
a Peripheral Component Interconnect Express backplane (PCI-E backplane) and the first
and second computers are configured to communicate with each other via PCI-E endpoints;
establishing (806) the TCP connection between the first application and the second
application in response to the initiation;
selecting (812), by a first virtual network interface card, "VNIC" (926), a second
protocol from a group consisting of a first and the second protocol based on a determination
(810) that the TCP connection between the first application and the second application
is a local TCP connection and that the second application wants to communicate using
the second protocol, wherein the first VNIC is located on the first computer and is
interposed between the first virtual machine and the chassis interconnect, wherein
the TCP connection is the local TCP connection when the first application and the
second application are executing on separate physical computers connected to the chassis,
and wherein the first protocol is TCP and the second protocol is a low-overhead data
transfer; and
based on the selection of the second protocol,
providing (814), by the first application, pre-post buffer information to the second
application, wherein the pre-post buffer information corresponds to a location in
a physical memory of the first computer and wherein the location in physical memory
corresponds to a virtual memory address of the first application, and
transferring (818) data, by the second application, to the first application using
the pre-post buffer information, wherein transferring the data comprises writing the
data directly into the location in the physical memory of the first computer.
2. The method of claim 1, further comprising:
generating the pre-post information, wherein generating the pre-post information comprises:
allocating (700) the virtual memory address in virtual memory associated with the
first application;
providing (700) the virtual memory address to a guest operating system (OS) executing
the first application, wherein the guest OS is executing in the first virtual machine;
translating (702) the virtual memory address to obtain a guest OS virtual memory address
associated with the guest operating system;
providing (704) the guest OS virtual memory address to a host operating system upon
which the guest operating system is executing;
translating (706) the virtual memory address to obtain a host OS virtual memory address
associated with the host operating system, wherein the host OS virtual memory address
corresponds to the location in the physical memory of the first computer.
3. The method of claim 1 or claim 2, wherein the pre-post information is provided to
a first virtual network interface card, "VNIC" (926), over the TCP connection.
4. The method of claim 3, wherein the first VNIC is configured to compare the location
in the physical memory received from a second VNIC (928) with an allowed address range
associated with the first application to determine whether the data may be transferred
to the location in the physical memory, wherein the first VNIC is located on the first
computer.
5. The method of any one of the preceding claims, wherein the second application provides
a second VNIC located on the second computer with a location of physical memory associated
with the TCP connection.
6. The method of claim 5, wherein transferring the data comprises:
writing, by the second VNIC, the data to the location in the physical memory of the
first computer using remote direct memory access, "RDMA", and the location in the
physical memory of the first computer.
7. The method of claim 5 or claim 6, wherein the first VNIC and the second VNIC are nodes
in a virtual network path, wherein the virtual network path comprises a first virtual
wire between the first VNIC and the second VNIC.
8. The method of claim 5, wherein the second virtual machine is configured to transfer
data from the first virtual machine directly to a location in the physical memory of
the first computer, wherein the second VNIC transfers the data using an RDMA engine.
9. A system comprising:
a chassis interconnect (912); and
a first application (908) configured to execute in a first virtual machine (904) on
a first computer (900) and a second application (910) configured to execute in a second
virtual machine (906) on a second computer (902), wherein the first computer and the
second computer are located on a chassis and communicate over the chassis interconnect,
wherein the chassis interconnect comprises a Peripheral Component Interconnect Express
backplane (PCI-E backplane) and the first and second computers are configured to
communicate with each other via PCI-E endpoints,
wherein the first application is configured to initiate a Transmission Control Protocol
connection (TCP connection) with the second application,
wherein, in response to the initiation, the TCP connection is established between the
first application and the second application,
wherein a first virtual network interface card, "VNIC", executing on the first computer
and interposed between the first virtual machine and the chassis interconnect, is
configured to select a second protocol from a group consisting of a first protocol
and the second protocol based on a determination that the TCP connection between the
first application and the second application is a local TCP connection and that the
second application wants to communicate using the second protocol, wherein the TCP
connection is the local TCP connection when the first application and the second application
are executing on separate physical computers connected to the chassis, and wherein
the first protocol is TCP and the second protocol is a low-overhead data transfer,
wherein the first application is configured to provide pre-post buffer information
to the second application in response to a determination that the first application
is executing on the same chassis as the second application,
wherein the pre-post buffer information corresponds to a location in a physical memory
of the first computer and wherein the location in physical memory corresponds to a
virtual memory address of the first application, and
wherein the second application transfers data to the first application using the pre-post
buffer information, wherein transferring the data comprises writing the data directly
into the location in the physical memory of the first computer.
10. The system of claim 9, wherein the pre-post information is generated by:
allocating the virtual memory address in virtual memory associated with the first
application;
providing the virtual memory address to a guest operating system (guest OS) executing
the first application, wherein the guest OS executes in the first virtual machine;
translating the virtual memory address to obtain a guest OS virtual memory address
associated with the guest operating system;
providing the guest OS virtual memory address to a host operating system on which
the guest operating system executes;
translating the virtual memory address to obtain a host OS virtual memory address
associated with the host operating system, wherein the host OS virtual memory address
corresponds to the location in the physical memory of the first computer.
11. The system of claim 9 or claim 10, wherein the second application provides a virtual
network interface card, "VNIC", located on the second computer, with a location of
the physical memory associated with the TCP connection.
12. The system of claim 11, wherein transferring the data comprises:
writing, by the VNIC, the data to the location in the physical memory of the first
computer using remote direct memory access (RDMA) and the location in the physical
memory of the first computer.
13. The system of claim 11, wherein the second virtual machine is configured to transfer
data directly from the first virtual machine to a location in the physical memory
of the first computer, wherein the VNIC transfers the data using a remote direct
memory access engine.
14. The system of any one of claims 9 to 13, wherein the first computer and the second
computer are blades.
15. A computer program product comprising a plurality of executable instructions for
low-overhead data transfer, wherein the plurality of executable instructions comprises
instructions to perform the method of any one of claims 1 to 8.
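For illustration only, the protocol selection recited in claim 9 might be sketched as
follows. This is a minimal Python sketch under stated assumptions, not the claimed
implementation: the names `Endpoint`, `Protocol`, `is_local_connection`, and
`select_protocol` are hypothetical, and modelling the chassis and the peer's protocol
preference as plain fields is an assumption made here for readability.

```python
from dataclasses import dataclass
from enum import Enum


class Protocol(Enum):
    TCP = "tcp"                    # the "first protocol" of the claims
    LOW_OVERHEAD = "low-overhead"  # the "second protocol" of the claims


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of where an application runs."""
    computer: str                     # physical computer (e.g. a blade)
    chassis: str                      # chassis the computer is connected to
    wants_low_overhead: bool = False  # peer's willingness to use the second protocol


def is_local_connection(first: Endpoint, second: Endpoint) -> bool:
    # "Local" in the sense of the claims: separate physical computers
    # that are connected to the same chassis.
    return first.chassis == second.chassis and first.computer != second.computer


def select_protocol(first: Endpoint, second: Endpoint) -> Protocol:
    # Choose the low-overhead transfer only when the connection is local
    # and the second application wants it; otherwise stay on TCP.
    if is_local_connection(first, second) and second.wants_low_overhead:
        return Protocol.LOW_OVERHEAD
    return Protocol.TCP
```

In this sketch the decision is a pure function of the two endpoints, which mirrors the
claim language: both conditions must hold before the second protocol is selected.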
1. A method for low-overhead data transfer, comprising:
initiating (800), by a first application (908), a Transmission Communication Protocol,
"TCP", connection with a second application (910), wherein the first application
executes on a first computer (900) in a first virtual machine (904), the second application
executes on a second computer (902) in a second virtual machine (906), and the first
computer and the second computer are located on a chassis and communicate over a chassis
interconnect (912), wherein the chassis interconnect comprises a Peripheral Component
Interface Express (PCI-E) backplane and the first and second computers are configured
to communicate with each other via PCI-E endpoints;
establishing (806), in response to the initiation, the TCP connection between the first
application and the second application;
selecting (812), by a first virtual network interface card, "VNIC", (926), a second
protocol from the group consisting of a first protocol and the second protocol based
on a determination (810) that the TCP connection between the first application and
the second application is a local TCP connection and that the second application wants
to communicate using the second protocol, wherein the first VNIC is located on the
first computer and is interposed between the first virtual machine and the chassis
interconnect, the TCP connection is the local TCP connection when the first application
and the second application execute on separate physical computers connected to the
chassis, and the first protocol is TCP and the second protocol is low-overhead data
transfer; and
based on the selection of the second protocol,
providing (814), by the first application, pre-post buffer information to the second
application, wherein the pre-post buffer information corresponds to a location in
a physical memory of the first computer and the location in the physical memory corresponds
to a virtual memory address of the first application, and
transferring (818) data, by the second application, to the first application using
the pre-post buffer information, wherein transferring the data comprises writing the
data directly to the location in the physical memory of the first computer.
2. The method of claim 1, further comprising:
generating the pre-post information, wherein generating the pre-post information
comprises:
allocating (700) the virtual memory address in virtual memory associated with the
first application;
providing (700) the virtual memory address to a guest operating system (OS) executing
the first application, wherein the guest OS executes in the first virtual machine;
translating (702) the virtual memory address to obtain a guest OS virtual memory address
associated with the guest operating system;
providing (704) the guest OS virtual memory address to a host operating system on
which the guest operating system executes;
translating (706) the virtual memory address to obtain a host OS virtual memory address
associated with the host operating system, wherein the host OS virtual memory address
corresponds to the location in the physical memory of the first computer.
3. The method of claim 1 or claim 2, wherein the pre-post information is provided to
a first virtual network interface card, "VNIC", (926), over the TCP connection.
4. The method of claim 3, wherein the first VNIC is configured to compare the location
in the physical memory received from a second VNIC (928) with an allowed address range
associated with the first application to determine whether the data may be transferred
to the location in the physical memory, wherein the first VNIC is located on the first
computer.
5. The method of any one of the preceding claims, wherein the second application provides
a second VNIC located on the second computer with a location of a physical memory
associated with the TCP connection.
6. The method of claim 5, wherein transferring the data comprises:
writing, by the second VNIC, the data to the location in the physical memory of the
first computer using remote direct memory access, "RDMA", and the location in the
physical memory of the first computer.
7. The method of claim 5 or claim 6, wherein the first VNIC and the second VNIC are
nodes in a virtual network path, wherein the virtual network path comprises a first
virtual wire between the first VNIC and the second VNIC.
8. The method of claim 5, wherein the second virtual machine is configured to transfer
data directly from the first virtual machine to a location in the physical memory
of the first computer, wherein the second VNIC transfers the data using an RDMA
engine.
9. A system, comprising:
a chassis interconnect (912); and
a first application (908) configured to execute in a first virtual machine (904) on
a first computer (900), and a second application (910) configured to execute in a
second virtual machine (906) on a second computer (902), wherein the first computer
and the second computer are located on a chassis and communicate over the chassis
interconnect,
wherein the chassis interconnect comprises a Peripheral Component Interface Express
(PCI-E) backplane and the first and second computers are configured to communicate
with each other via PCI-E endpoints,
wherein the first application is configured to initiate a Transmission Communication
Protocol, "TCP", connection with the second application,
wherein, in response to the initiation, the TCP connection is established between the
first application and the second application,
wherein a first virtual network interface card, "VNIC", executing on the first computer
and interposed between the first virtual machine and the chassis interconnect, is
configured to select a second protocol from the group consisting of a first protocol
and the second protocol based on a determination that the TCP connection between the
first application and the second application is a local TCP connection and that the
second application wants to communicate using the second protocol, wherein the TCP
connection is the local TCP connection when the first application and the second
application execute on separate physical computers connected to the chassis, and
wherein the first protocol is TCP and the second protocol is low-overhead data transfer,
wherein the first application is configured to provide pre-post buffer information
to the second application in response to a determination that the first application
is executing on the same chassis as the second application,
wherein the pre-post buffer information corresponds to a location in a physical memory
of the first computer and the location in the physical memory corresponds to a virtual
memory address of the first application, and
wherein the second application transfers data to the first application using the pre-post
buffer information, wherein transferring the data comprises writing the data directly
to the location in the physical memory of the first computer.
10. The system of claim 9, wherein the pre-post information is generated by:
allocating the virtual memory address in virtual memory associated with the first
application;
providing the virtual memory address to a guest operating system (OS) executing the
first application, wherein the guest OS executes in the first virtual machine;
translating the virtual memory address to obtain a guest OS virtual memory address
associated with the guest operating system;
providing the guest OS virtual memory address to a host operating system on which
the guest operating system executes;
translating the virtual memory address to obtain a host OS virtual memory address
associated with the host operating system, wherein the host OS virtual memory address
corresponds to the location in the physical memory of the first computer.
11. The system of claim 9 or claim 10, wherein the second application provides a virtual
network interface card, "VNIC", located on the second computer with a location of
a physical memory associated with the TCP connection.
12. The system of claim 11, wherein transferring the data comprises:
writing, by the VNIC, the data to the location in the physical memory of the first
computer using remote direct memory access (RDMA) and the location in the physical
memory of the first computer.
13. The system of claim 11, wherein the second virtual machine is configured to transfer
data directly from the first virtual machine to a location in the physical memory
of the first computer, wherein the VNIC transfers the data using a remote direct
memory access engine.
14. The system of any one of claims 9 to 13, wherein the first computer and the second
computer are blades.
15. A computer program product comprising a plurality of executable instructions for
low-overhead data transfer, wherein the plurality of executable instructions comprises
instructions to perform the method of any one of claims 1 to 8.
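As a reading aid only, the address translation chain of claim 2, the allowed-range
comparison of claim 4, and the direct write of claim 1 might be sketched as follows.
This is a minimal Python sketch under stated assumptions, not the claimed implementation:
`OffsetMap`, `pre_post_location`, `vnic_allows`, and `direct_write` are hypothetical
names, a fixed offset stands in for a real page-table translation, and a `bytearray`
stands in for the physical memory of the first computer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetMap:
    """Hypothetical stand-in for a page table: translation is a fixed offset."""
    base: int

    def translate(self, addr: int) -> int:
        return self.base + addr


def pre_post_location(app_vaddr: int, guest: OffsetMap, host: OffsetMap) -> int:
    # Application virtual address -> guest OS virtual address.
    guest_vaddr = guest.translate(app_vaddr)
    # Guest OS virtual address -> host OS virtual address, which in this
    # sketch is used directly as the index into physical memory.
    return host.translate(guest_vaddr)


def vnic_allows(location: int, length: int, allowed: range) -> bool:
    # Compare the received location against the allowed address range
    # associated with the receiving application (allowed must have step 1).
    return length > 0 and location in allowed and (location + length - 1) in allowed


def direct_write(memory: bytearray, location: int, data: bytes, allowed: range) -> None:
    # Write the data directly at the pre-posted location, but only if the
    # receiving side's range check permits the transfer.
    if not vnic_allows(location, len(data), allowed):
        raise PermissionError("location outside the allowed address range")
    memory[location:location + len(data)] = data
```

A short usage example, with made-up offsets and sizes:

```python
memory = bytearray(64)
location = pre_post_location(4, OffsetMap(16), OffsetMap(32))
direct_write(memory, location, b"data", range(48, 64))
```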