Table of Contents

...

By default, the SOS controller’s REST API runs at the controller’s IP address on TCP port 8080. Any HTTP client can be used to interact with the API. The popular utility curl is used throughout this document as the example HTTP client. For more information on curl, please refer to curl’s documentation. The following is a general example of how to use curl:

...
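As a concrete illustration of the pattern used throughout this page, the request below sends a JSON payload to the SOS configuration resource (the same request is shown again in the tuning section below) and pipes the response through a JSON formatter:

Code Block
languagebash
# Send a JSON payload to an SOS REST resource and pretty-print the JSON response
curl http://192.168.1.1:8080/wm/sos/config/json -X POST -d '{"buffer-size":"70000"}' | python -m json.tool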

In order to use SOS, the network must first be configured. This is a prerequisite assumed by this document.

After the network is configured, the agents should be added to the controller, information about the transfer to conduct SOS on should be whitelisted in the controller, any SOS parameters should be tuned as desired, and lastly the controller should be checked to ensure the network is ready to handle the pending transfer.

In the following steps, the controller is assumed to be running at the IP address 192.168.1.1 with its REST API exposed on port 8080.

 

Add the SOS Agents

Once the network is configured, the SOS controller needs to know about the agents it has at its disposal. Agents can be added one at a time, although note that SOS must have at least two configured agents in order to function. The following is an example of adding an agent:

...
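The exact resource path and field names for agent registration are defined by the SOS REST API; the sketch below assumes an agent resource at /wm/sos/agent/json and the keys ip-address, data-port, and feedback-port (only feedback-port is confirmed later on this page), with illustrative values:

Code Block
languagebash
# Hedged sketch: register an SOS agent with the controller.
# The /wm/sos/agent/json path and the ip-address/data-port key names are assumptions;
# feedback-port is the key referenced in the idle timeout discussion below.
curl http://192.168.1.1:8080/wm/sos/agent/json -X POST -d '{"ip-address":"192.168.1.10","data-port":"9998","feedback-port":"9999"}' | python -m json.tool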

The Python module json.tool is used to format the JSON output in human-readable form. This is not strictly required, although it sure makes the JSON string look pretty (smile)

Once an agent is added to the controller, there is no need to add it again in subsequent transfers.

Add a Whitelist Entry

To keep track of the transfers on which SOS should be performed, the controller maintains a whitelist: all transfers listed will use SOS, while all transfers not listed will be handled outside of SOS. The default behavior of the controller is to emulate an L2 learning switch for all transfers that are not proactively whitelisted.

...
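The whitelist resource path and key names in the sketch below are assumptions; the entry identifies the client and server endpoints of the transfer, and the response shown next confirms the addition:

Code Block
languagebash
# Hedged sketch: whitelist a transfer so that SOS handles it.
# The /wm/sos/whitelist/json path and the key names are assumptions; the values describe
# a hypothetical client (10.0.0.1) transferring to a server (10.0.0.2) on TCP port 5001.
curl http://192.168.1.1:8080/wm/sos/whitelist/json -X POST -d '{"client-ip":"10.0.0.1","server-ip":"10.0.0.2","server-port":"5001"}' | python -m json.tool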

Code Block
languagebash
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   191    0   101  100    90  16248  14478 --:--:-- --:--:-- --:--:-- 16833
{
    "code": "0",
    "message": "WhitelistEntry successfully added. The entry may initiate the data transfer."
}

Once a transfer is whitelisted, it will persist until explicitly removed. The start and stop values are currently ignored (if supplied).

Tune Transfer Parameters

SOS's performance can be tuned by adjusting the number of parallel connections to use between the agents, the agent application's TCP receive buffer size (in bytes) per connection, and the reordering queue length on the receiving agent. The OpenFlow flow timeouts can also be adjusted, but it is recommended that they be left at their defaults of 0 for the hard timeout and 60 seconds for the idle timeout.

Any changes to these parameters will not impact ongoing transfers and will only apply to future transfers. Any settings will remain in effect until changed at a later time.

To adjust the number of parallel connections to use between the agents, one can do the following:

...
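The configuration resource below is the same one used for the buffer size and queue capacity later on this page; only the "parallel-connections" key name is an assumption here:

Code Block
languagebash
# Set the number of parallel connections between the agents.
# The /wm/sos/config/json resource is confirmed elsewhere on this page;
# the "parallel-connections" key name is an assumption.
curl http://192.168.1.1:8080/wm/sos/config/json -X POST -d '{"parallel-connections":"4096"}' | python -m json.tool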

Code Block
languagebash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    88    0    57  100    31   9961   5417 --:--:-- --:--:-- --:--:-- 11400
{
    "code": "0",
    "message": "Parallel connections set to 4096"
}


It is recommended that the number of parallel connections range from 1 to 8,000. The agents will not reliably handle more than 8,000 connections due to RAM utilization. As a general rule, the more parallel connections used, the better the performance. However, there comes a point where the CPU struggles to keep up with reading from such a large number of sockets.

Likewise, to adjust the TCP receive buffer size in the agent application, one can do the following:

Code Block
languagebash
curl http://192.168.1.1:8080/wm/sos/config/json -X POST -d '{"buffer-size":"70000"}' | python -m json.tool

...

Code Block
languagebash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    72    0    49  100    23   8105   3804 --:--:-- --:--:-- --:--:--  9800
{
    "code": "0",
    "message": "Buffer size set to 70000"
}


It is recommended that the buffer size be at least 30,000 bytes and no higher than 500,000 bytes. Values above 500,000 bytes have not shown improvements in data transfer speed, while values under 30,000 bytes tend to hinder performance. The "sweet spot" at this point has been determined to be around 60,000-70,000 bytes given a large number of parallel connections (around 4,000).

And lastly, to adjust the agent application's reordering queue length, one can do the following:

Code Block
languagebash
curl http://192.168.1.1:8080/wm/sos/config/json -X POST -d '{"queue-capacity":"5"}' | python -m json.tool

...

Code Block
languagebash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    70    0    48  100    22   5710   2617 --:--:-- --:--:-- --:--:--  6000
{
    "code": "0",
    "message": "Queue capacity set to 5"
}"
}

The queue capacity is merely a safety feature that only takes effect if there is packet loss on a given parallel connection or if the agent CPU cannot keep up with the arrival rate of packets. If larger queue capacity values yield better performance, that indicates network instability, which suggests that the buffer size and the number of parallel connections are either too large or too small.

If you would like to experiment with idle timeouts, you may adjust the idle timeout as follows:

...
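As a sketch, assuming the idle timeout is set through the same configuration resource used above (the "idle-timeout" key name is an assumption):

Code Block
languagebash
# Hedged sketch: set the OpenFlow idle timeout (in seconds) for SOS flows.
# The /wm/sos/config/json resource is confirmed above; the "idle-timeout" key name is an assumption.
curl http://192.168.1.1:8080/wm/sos/config/json -X POST -d '{"idle-timeout":"60"}' | python -m json.tool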

The idle timeout is set to 60 seconds by default. This means that the last packet of the transfer, on a per-flow basis, will cause the flows to expire and be automatically removed from the switches 60 seconds later. This is a long time, but it is safe in that it allows the agents to clear their buffers and finish transferring data, which might otherwise be terminated prematurely given a combination of shorter timeouts and poor parallel connection and buffer size parameter choices. Note that the "feedback-port" key used when adding an agent (above) will be used to allow the agents to send explicit notification to the controller when they have completed a particular transfer; this is about 95% complete at this time. It will eliminate the need to estimate idle timeouts.

Changing the hard timeout is supported, but it does not make sense at present (thus I will not include an example (smile) ). A hard timeout could serve as a way to kick out transfers that have exceeded their allotted share of time, but this feature is not currently implemented. As such, it is recommended that the hard timeout be left at 0 seconds (infinite).

Check Controller Readiness

Before a transfer is initiated, the controller should be queried to ensure all systems are ready to react to the transfer and perform SOS. This works well in a single-user environment where the user performs sequential transfers, and as such it is the model we use at this point. However, a solution is nearing completion that allows the controller to pre-allocate resources for a user during a specific time period (which is closer to real-world use); these are the currently unused start and stop times indicated in the whitelist REST API above.

...
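A sketch of such a readiness query, assuming a hypothetical /wm/sos/ready/json resource (the actual resource name may differ; consult the SOS REST API reference):

Code Block
languagebash
# Hedged sketch: query whether the controller is ready for a new SOS transfer.
# The /wm/sos/ready/json path is an assumption, not a confirmed resource.
curl http://192.168.1.1:8080/wm/sos/ready/json | python -m json.tool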