…or, “What should my performance test scenario look like?”
This is the first post in a series dedicated to sharing my thoughts on performance testing applications to better serve the business. Serving the business should be the focus of any comprehensive test plan, because ultimately a business application exists to serve that business. This may seem obvious, but keeping that truth in mind throughout the process will shape how you define a test plan, script user behavior and analyze the outcome.
In this article I will discuss what factors should be considered when defining the test strategy. This will not be a rehash of the many articles on the subject that describe the difference between load, performance, peak, smoke, stress and a multitude of other test types related to the same set of objectives. Instead, I want to talk about what factors should go into defining these tests, and where the information should come from.
We’ll talk about:
- Hits/pages/requests per second versus “transactions” per second
- Average session length
- Transaction weightings
- Concurrent users, peak vs average
- Failover/disaster scenarios
My goal is to clarify why each of these factors is important to building better performance test scenarios, and where you can find the data for each. We’ll also talk about how to balance some of these, as they will sometimes conflict (transactions per second vs concurrent users, for example). Right then, let’s make this quick…
Performance testing shouldn’t just be about how many pages per second your application can crank out, how many concurrent users it can support, or how many transactions it can process. Performance tests should accurately reflect how actual users will use the application under test in production, or you aren’t seeing the whole picture. You might get better results than production will actually deliver or, worse, you might get bad results when things are actually OK. Both possibilities put your reputation on the line during those last-minute “go or no-go decision” situations.
Hits/Pages/Requests per second versus “transactions” per second
This is a figure the test engineer should determine for the business, not the other way around. I say that because I have seen test engineers ask the business analyst, or whoever represents the users, “What is your target hits per second for this test?” I shudder every time I see a seasoned performance engineer ask that question of someone who wouldn’t know the relevance of a hits/second metric if it hit/second them right in the nose. Hits per second is a measure of the throughput of your web server and network, but it is not a true measure of application performance without knowing which pages are being served.
How do we find this information? Check production logs. If the system doesn’t exist in production yet, compare it to a similarly utilized application if one is available. If that isn’t possible either, leave it out of the equation altogether: if you don’t know your target hits or pages per second, that will just have to be one of the things your tests aim to uncover. Instead, aim for something you can ask the business users for, like how many transactions they expect to see in a day, spread across how many users and over what period. This will be the basis for the pacing of each virtual user in your test.
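If production logs do exist, a few lines of script can turn them into average and peak requests per second. The sketch below assumes access logs in the Apache/Nginx Common Log Format (timestamps like `[10/Oct/2011:13:55:36 -0700]`); the format and the function name `requests_per_second` are assumptions, so adjust the pattern to whatever your servers actually write.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the timestamp field of a Common Log Format line. The log format is an
# assumption; change the pattern if your web servers log differently.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_per_second(lines):
    """Return (average, peak) requests per second across the span of the log."""
    per_second = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            # Whole-second resolution, so each key is one second of traffic.
            ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
            per_second[ts] += 1
    if not per_second:
        return 0.0, 0
    # Count idle seconds too, otherwise quiet periods inflate the average.
    elapsed = int((max(per_second) - min(per_second)).total_seconds()) + 1
    average = sum(per_second.values()) / elapsed
    return average, max(per_second.values())
```

Passing an open log file works because the function just iterates over lines. Note that this counts every request (hits), including images and stylesheets; filter the lines by URL first if you want pages per second instead.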
If we know the total number of business transactions (T) the business expects to see in a given period, dividing T by the length of that period in seconds gives the number of transactions per second we should target (Ts):
Ts = T / (length of period in seconds)
If T = 20,000 transactions per 24-hour period (86,400 seconds), then Ts = 20,000 / 86,400, or roughly 0.2315 transactions per second.
Keep in mind this is only for a single defined transaction, so the number will be higher when considering other transaction types. Now we just need to figure out how many users we need active at any point in time (concurrent) to produce that volume of transactions per second. But before we do that, we need to know how long a user is active on the system. Why? Because 20 users hammering away as fast as electrons will allow is very different from 200 users generating those transactions at a more realistic pace. Also keep in mind that this example is just that: more than likely you will be dealing with slices of time much shorter than 24 hours, usually just one or a few hours.
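The Ts calculation is simple enough to script, which makes it easy to try different time slices. This is a minimal sketch; `target_tps` is a hypothetical helper name, and the two-hour window in the second call is only an illustration of the shorter slices mentioned above.

```python
# Seconds in one hour; the period itself is whatever window the business gives you.
SECONDS_PER_HOUR = 3600


def target_tps(transactions, period_hours):
    """Ts = T / (length of the period in seconds)."""
    return transactions / (period_hours * SECONDS_PER_HOUR)


print(round(target_tps(20_000, 24), 4))  # 0.2315 (spread over a full day)
print(round(target_tps(20_000, 2), 4))   # 2.7778 (same volume in a two-hour window)
```

The second line shows why the window matters: the same daily volume squeezed into two hours needs twelve times the transaction rate.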
Average Session Length
The question here is, “How long should the average user stay active on the application?”, that is, before logging off and waiting some undetermined amount of time before starting the process again. This matters because in most modern applications the Logon process is the single most resource-intensive activity a user can perform, due to the memory and other resources the application allocates for each session. So we never want to log on more often than we actually expect to see in production, or our results may be very far from accurate.
For example, let’s say you are trying to produce the number of transactions per second calculated above, and you know that it takes 5 minutes (d = 300 seconds) for a single user to produce a transaction, that is, for the script to execute from start to finish with one transaction occurring at some point. Each user then produces 1 / d transactions per second, so U concurrent users produce U / d. Setting that equal to Ts and solving gives the number of concurrent users (U) as the target transactions per second multiplied by the number of seconds it takes to perform a single transaction. Borrowing from the previous example, we have:
U = Ts × d
U = 0.2315 × 300
U = 69.45
Now we’re getting somewhere! We know that if it takes 5 minutes to generate a transaction and our target is 0.2315 transactions per second, it will take about 70 users to do it.
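The concurrency and pacing arithmetic can be sketched as follows. Each virtual user completes one transaction every d seconds, so U users produce U / d transactions per second; setting that equal to Ts gives U = Ts × d. The helper names are hypothetical, and rounding up to whole users is an assumption about how you would configure the load tool.

```python
import math


def concurrent_users(tps, seconds_per_transaction):
    """U = Ts * d: each user yields 1/d transactions per second, so U/d must equal Ts."""
    return tps * seconds_per_transaction


def pacing_seconds(users, tps):
    """Seconds between iterations for each virtual user so that `users` together hit `tps`."""
    return users / tps


ts = 20_000 / 86_400                          # target rate from the 24-hour example
users = math.ceil(concurrent_users(ts, 300))  # 69.44... rounded up to 70 whole users
print(users)                                  # 70
print(round(pacing_seconds(users, ts), 1))    # 302.4
```

Rounding up to 70 users means each one should start a new iteration every 302.4 seconds rather than every 300, so the total rate stays on target. If the script itself runs longer than the pacing interval, you will need more users rather than tighter pacing.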
Again, if this information (session length) exists in some form of production log, whether from a business intelligence application such as WebTrends or just the web server logs, be sure to use it in your test for more representative results. Without it, we may have the right number of users creating the right mix of transactions, but with a very different ratio of Logons to the rest of our business transactions, which will skew the results.
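To see how much session length drives the Logon rate, assume a steady state where every session begins with exactly one Logon and a finished session is immediately replaced by a new one. The logon rate is then the concurrent user count divided by the session length. The function name and the 70-user figures below are hypothetical.

```python
def logons_per_hour(concurrent_users, session_minutes):
    """Each session starts with one logon, so the logon rate is U / session length."""
    return concurrent_users * 60 / session_minutes


# Hypothetical: the same 70 concurrent users with two different session lengths.
print(logons_per_hour(70, 30))  # 140.0
print(logons_per_hour(70, 5))   # 840.0
```

Same users, same transaction volume, but a script that logs on for every 5-minute iteration generates six times the Logon load of a realistic 30-minute session.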
Transaction Weightings
This next area is one I have had many… “challenging” conversations about with business and technical people, because it involves understanding not only the volume of transactions on a system but also the types of transactions, and the impact each type has on application performance. One example is the Login I explained above: in most application code, it is the most CPU- and memory-intensive process a user can perform, and generating too many Login transactions during a test can produce poor performance that is not representative of real-world usage. Another example might be very large transaction histories, or wildcard searches and unbounded queries (such as SELECT * FROM…). These can tie up system resources such as thread pools, database connection pools and session memory while the application waits for a response from the database, and in a worst-case scenario even lock up the database!
The point is, we need to understand which transactions are typical of an average user session, and in what proportions, in order to build scripts that accurately reflect that session. If you choose the top 10 business transactions performed by your users and run them all equally, meaning 10% of the time for each, I can assure you that 1) you will be logging on far too frequently and 2) your test results will look nothing like what you will see once the application goes live. Why? Because in most applications, activity is heavily weighted towards just a few of those transactions, with the remainder making up a very small percentage of the total combined.
At this point I realize I’ve written the word “transaction” far too many times in the preceding paragraphs, so I’ll offer a real-world example.
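In a script, a weighted mix usually means each virtual user picks its next transaction at random according to the target percentages, after a single Logon per session. This sketch uses Python's `random.choices`; the transaction names and percentages are hypothetical placeholders to be replaced with figures from production.

```python
import random

# Hypothetical mix: names and percentages are placeholders, not real data.
# Replace them with figures taken from production logs or analytics.
TRANSACTION_MIX = {
    "balance_inquiry": 85,
    "transfer_funds": 5,
    "pay_bill": 4,
    "view_history": 3,
    "update_profile": 3,
}


def pick_transactions(count, mix=TRANSACTION_MIX, rng=random):
    """Choose `count` transactions for one session, weighted by the mix."""
    names = list(mix)
    return rng.choices(names, weights=[mix[name] for name in names], k=count)
```

Passing a seeded `random.Random` as `rng` makes a run repeatable, which helps when comparing results between builds. Over many sessions the share of each transaction converges on its weight, rather than every transaction running 10% of the time.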
In a previous life, I was testing an Internet Banking application for a very large financial institution. Every quarter the application saw another release, with one or more new “features” that let customers do things they previously had to visit a brick-and-mortar branch for. The Business Analysts thought this was Nobel Prize-winning stuff, and that they were somehow saving the world with every release. I always chuckled when it came to the performance test plan, however, because we simply ran the baseline test every time. Why not include the new stuff? Oh, we’d throw in one or two of the new transactions (*sigh*, did it again), which was about how many they would see in Production. You see, 85% of all requests on that Internet Banking application were for one thing: Balance Inquiry. The other two dozen types of activity a customer could perform made up the remaining 15% combined, with almost every one of them falling below 5% of the total. So while we were sure to include the new stuff in every release, the performance test scenario almost never changed.
To give another example, a workflow application I was consulting on had a similar transaction mix: something like 90% of the total activity was refreshing the Inbox. The remaining nine of the top ten transactions made up the other 10% of the mix, but there was one transaction that, when performed, would lock up the system almost completely until it finished. If we hadn’t included that transaction in the performance test scenario, it would have been a disaster once the application reached production.
The key is understanding what to include, and how much. Again, this information could come from the back end of the system in production, or from WebTrends or Google Analytics, if such information exists.
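Once you have a list of transaction names from production (extracted from back-end logs or an analytics export; how you extract them depends on your system), turning it into percentages takes a few lines. The function name and the sample data below are hypothetical, shaped loosely like the workflow example.

```python
from collections import Counter


def transaction_mix(observed):
    """Turn a list of transaction names seen in production into percentages of the total."""
    counts = Counter(observed)
    total = sum(counts.values())
    # most_common() orders the result from the heaviest transaction to the rarest.
    return {name: 100 * n / total for name, n in counts.most_common()}


# Hypothetical sample: mostly Inbox refreshes plus a few rarer transactions.
observed = ["refresh_inbox"] * 90 + ["search"] * 6 + ["approve"] * 3 + ["bulk_reassign"]
print(transaction_mix(observed))
# {'refresh_inbox': 90.0, 'search': 6.0, 'approve': 3.0, 'bulk_reassign': 1.0}
```

Pay attention to the bottom of that list as well as the top: a 1% transaction like the hypothetical `bulk_reassign` may be exactly the rare but expensive operation that has to stay in the scenario.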
Now this is getting a little longer than I had originally intended, so I’ll wrap it up for now and come back for Part 2 in a few days to talk about Peak Concurrency and Disaster scenarios. Remember, if you aren’t testing real world scenarios, why even bother testing?
Thanks for reading!
Edit 01/11/12: Fixed spelling errors…