For a few years, I worked on the Service Frameworks team at Amazon. Our team wrote tools that helped the owners of AWS services such as Amazon Route 53 and Elastic Load Balancing build their services more quickly, and service clients call those services more easily. Other Amazon teams provided service owners with functionality such as metering, authentication, monitoring, client library generation, and documentation generation. Instead of each service team having to integrate those features into their services manually, the Service Frameworks team did that integration once and exposed the functionality to each service through configuration.

One challenge we faced was in determining how to provide sensible defaults, especially for features that were performance or availability related. For example, we couldn't set a default client-side timeout easily, because our framework had no idea what the latency characteristics of an API call might be. This wouldn't have been any easier for service owners or clients to figure out themselves, so we kept trying, and gained some useful insights along the way.

One common question we struggled with was determining the default number of connections the server would allow to be open to clients at the same time. This setting was designed to prevent a server from taking on too much work and becoming overloaded. More specifically, we wanted to configure the maximum connections setting for the server in proportion to the maximum connections for the load balancer. This was before the days of Elastic Load Balancing, so hardware load balancers were in widespread use. We set out to help Amazon service owners and service clients figure out the ideal value for maximum connections to set on the load balancer, and the corresponding value to set in the frameworks we provided. We decided that if we could figure out how to use human judgment to make a choice, we could then write software to emulate that judgment.

Determining the ideal value ended up being very challenging. When maximum connections were set too low, the load balancer might cut off increases in the number of requests, even when the service had plenty of capacity. When maximum connections were set too high, servers would become slow and unresponsive. When maximum connections were set just right for a workload, the workload would shift or dependency performance would change. Then the values would be wrong again, resulting in unnecessary outages or overloads. In the end, we found that the maximum connections concept was too imprecise to provide the complete answer to the puzzle. In this article, we'll describe other approaches such as load shedding that we found worked well.

At Amazon we spend a great deal of time load testing our services. When I talk to other engineers about load shedding, I like to point out that if they haven't load tested their service to the point where it breaks, and far beyond the point where it breaks, they should assume that the service will fail in the least desirable way possible. Generating graphs like the ones earlier in this article helps us baseline overload performance and track how we do over time as we make changes to our services. Some load tests ensure that a fleet automatically scales as load increases, whereas others use a fixed fleet size. If, in an overload test, a service's availability degrades rapidly to zero as throughput increases, that's a good sign that the service is in need of additional load shedding mechanisms. The ideal load test result is for goodput to plateau when the service is close to being fully utilized, and to remain flat even when more throughput is applied.

Tools like Chaos Monkey help perform chaos engineering tests on services. For example, they can overburden the CPU or introduce packet loss to simulate conditions that happen during an overload. Another testing technique we use is to take an existing load generation test or canary, drive sustained load (instead of increasing load) toward a test environment, and then begin removing servers from that test environment. This increases the offered throughput per instance, so it can be used to test instance throughput.
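The core idea behind load shedding is to cap concurrent work and reject the excess quickly, so that accepted requests keep completing (goodput stays flat) instead of everything slowing down together. The sketch below illustrates that idea in Python with a non-blocking semaphore; the `LoadShedder` class and its API are hypothetical, for illustration only, and not part of any Amazon framework.

```python
import threading


class LoadShedder:
    """Illustrative sketch: cap in-flight work and shed the rest.

    Instead of queueing requests beyond the limit (which makes every
    request slow), reject the excess immediately so the server stays
    responsive for the work it has already accepted.
    """

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_handle(self, handler):
        # Non-blocking acquire: if no slot is free, shed the request now.
        if not self._slots.acquire(blocking=False):
            return None  # caller would translate this into e.g. HTTP 503
        try:
            return handler()
        finally:
            self._slots.release()
```

With a limit of 1, a request that arrives while another is still in flight is shed rather than queued; once the in-flight request finishes, its slot is released and new requests are accepted again.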