APE

As I mentioned recently on Twitter, there is an identity crisis within the Application Performance community. One that others have touched on in the past, but that still seems prevalent.

If you are sick you don’t go to a “Healthcare technician” or a “medical analyst”, you go see a Physician. People work towards this title as a career goal because it is well defined and respected. But what about us? What do we aspire towards? Heck, what do we even call ourselves?

  • Performance Testers
  • Performance Analysts
  • Automation something-or-other…? *shudder*

Perhaps the answer seems obvious to you, and if it is then congratulations for not having the identity crisis that I and many people I have spoken to seem to share. However, this goes beyond standardizing on a name for people with the right skill set, although I believe that would help.

Part of the issue may stem from IT organizations not knowing where to put people like us. I have personally reported into Quality Assurance, Development, Operations and Support, all the while doing the same job. I can’t tell you how many people I have talked to with similar experiences, where the role is the same but the title and reporting structure are always different.

You may choose to call yourself a “Performance Tester” or “Analyst” because that is the title given to you by your current employer, but these fail to capture the entire scope and path of anyone focused on performance as a career. “Performance Engineer” has a nice ring to it, and I’m sure many would like to call themselves an Engineer, but this presents a problem. Are we really engineering application performance? I would estimate that only a select few have the background and experience to call themselves Engineers in this regard. Even more troubling is that there are a great many who throw titles like this onto their resume and don’t have the skills to support it, which brings down the value of the profession for others.

This is all very similar to the commoditization of Java Developers in the early 2000s, when it seemed all the rage to farm jobs out to low cost centers and the average wage of programmers dropped sharply. To executives, a programmer was a programmer was a programmer, and it made sense to find the best deal possible. But the low cost centers struggled to retain skilled talent as the pressure of double-digit salary growth drove high attrition rates, and quality suffered. I know this is still the case in many areas, as the low cost centers of 2005 are now sending jobs to other low cost centers, where wages are rising, etc., etc…

The upside to this effect is that wages are on the rise again, and the App Explosion has renewed the once illustrious status of the “Application Developer” as a source of innovation and value to the business.

I also believe standardized education is an issue. In a Computer Science degree you are guaranteed to cover the basics of machine language, data structures, compiled and interpreted languages and a smidgeon of business (or more, depending on where you attend). Somewhere in there you may learn about how to build secure applications, and maybe even scalable ones, but there is no specific training for Application Performance Engineering that I have seen used as a standard for the profession. Did I just call it APE? Really?!

Anyways… we really need to start by defining what core skills are required to make someone successful in this field, and build a standard curriculum around that. Maybe then we will have a title we can agree on as the pinnacle, one that people will aspire towards and IT organizations will define a place for, much like App Security teams have experienced in the last several years.

Finally, what would that place be? Should Performance Engineering be a unique entity with a C-level position reporting directly to the CIO/CTO? Probably not (although it is a nice thought). Unlike Information Security, which until recently also tended to be shuffled around organizations until it defined a clear set of standards for education and skills, Performance Engineering only applies to a subset of IT functions, unless of course your business *is* providing IT systems. Performance, however, does neatly fit into one area that has seen tremendous growth in the last few years; one that spans both Development and Operations. However, I won’t say what that organization is for fear of being called a bandwagon jumper.

I will say this: I spent a few years convincing my management (back when I was managing a team of performance engineers myself) that it would make sense to give us Production Monitoring as well. It would mean that whatever we were testing is what we would monitor once applications were released, and that SLAs were vetted and measured by a single group that would strive to remain objective. It also meant many months of not quite knowing how all of that would work once they gave it to me, but in the end it was the right decision.

Performance Testing – Analyzing the AUT, Part 2

In my previous post, I talked about the need to consider more than just one of the common objectives in our performance test scenarios, such as hits/pages per second, transactions per second, specific transactions (such as Login) and peak concurrent users. The key point I wanted to make was that focusing on only one of any of these will almost certainly result in erroneous results, and there is nothing more damaging to the credibility of the performance testing professional than having to defend invalid test results.

Today I want to talk about concurrent users in more detail, as this is one of the more common test objectives and the subject of much debate within the performance testing community. The problem I have isn’t the goal itself, but rather what the goal means and how to ensure that we’re using not only the right number of concurrent users, but under the right circumstances. The same applies whether you are running Peak, Average, Stress, or any other type of test whose definitions I will not get into here; there is already plenty of discussion on that topic.

So you want to test 100 concurrent users, but doing what?

The question we should be asking our business is not how many concurrent users they want to run, but how those users behave. The answer has multiple factors, such as the time between steps or requests, and the time between iterations of a session. First we need to ask:

  • Are all transactions performed equally, or is there a weighting to each transaction?
  • Are all sessions roughly the same, or should there be a significant amount of randomization in their behavior?
  • Is there a compelling reason to include transactions beyond the “top ten” in the mix?

Scott Barber, of PerfTestPlus Inc., has a handy way of dealing with these that he outlines over at his blog (deep linking not allowed, just hit enter on the URL again), and it’s hard to shake the mnemonic once you’ve read it. Go ahead and read that, as well as the other two posts of his that are linked from that page. I’ll wait here…

You back? Great.

Once again, I don’t want to rehash something that has already been stated so eloquently. So instead I will offer my own experiences on the subject. First, don’t just look at the goal of X number of concurrent users and nothing else. That should be obvious to anyone who has ever run a performance test, but in case it isn’t I’ll explain why. Take an example where a script has 10 steps, executed in the same order with the same delay between steps, and once the script is finished it starts again from the beginning. That script might look something like this when executed with other scripts during a test.

Everyone in a straight line!

(In all my Word-art glory!)

Now let’s look at the same basic script, but with a randomized ordering of 8 of the 10 steps (since Login and Logout have to happen at specific places), with randomized time between steps, and with a random delay between script iterations.

Anarchy, I tell you.

Guess which one is likely to be more realistic? (ethay econdsay iptscray <– pig latin answer key)

These two scenarios will have very different results, but only the second one will be realistic.
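To make the difference concrete, here is a minimal sketch of the two virtual-user loops in Python. The step names, think-time ranges and the `do_step` stub are illustrative stand-ins of my own, not any particular tool’s API; a real load tool (LoadRunner, JMeter and friends) would supply the actual requests.

```python
import random
import time

# Hypothetical step function; in a real tool this would issue the recorded
# or hand-coded request for that step.
def do_step(name):
    print(f"executing {name}")

MIDDLE_STEPS = [f"step_{i}" for i in range(1, 9)]  # the 8 reorderable steps

def lockstep_user():
    """Straight line: same order, same delay, every iteration."""
    while True:
        for step in ["login", *MIDDLE_STEPS, "logout"]:
            do_step(step)
            time.sleep(3)  # fixed think time

def realistic_user():
    """Login/logout pinned in place; everything else randomized."""
    while True:
        do_step("login")
        for step in random.sample(MIDDLE_STEPS, len(MIDDLE_STEPS)):
            time.sleep(random.uniform(1, 10))  # randomized think time
            do_step(step)
        do_step("logout")
        time.sleep(random.uniform(30, 300))  # random delay between iterations
```

Run a few dozen of each and the first group marches in formation while the second smears itself across time, which is exactly what real users do.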

The information we tend to get from analytics tools such as Google Analytics or WebTrends shows the average user session, not the standard deviation of those sessions’ duration and pacing. A good tool should also show us not only the top transactions or pages but the percentage or weighting of each, so we don’t have to assume all users behave the same way. Of course there will be exceptions, such as a customer registration process that spans multiple pages in a specific sequence. These types of transactions can and should be grouped together in a single “business process” that is somehow represented in our scripts, with business processes treated as separate transactions to be randomized accordingly. I find the best way to get accurate information on exactly what users are doing in an application is by taking a sample of production logs and analyzing them myself, but there are a few tools out there capable of helping you with the work.
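If you do go digging through raw web server logs yourself, the kind of analysis I mean can be sketched in a few lines. This assumes Apache-style common log format, treats each client IP as one user, and uses an arbitrary 30-minute idle gap to split sessions; all three are simplifications.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean, stdev

# Matches the start of a common-log-format line: ip ... [timestamp] "METHOD path
LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)')

def analyze(lines, session_gap=1800):
    stamps_by_ip = defaultdict(list)
    page_counts = Counter()
    for raw in lines:
        m = LINE.match(raw)
        if not m:
            continue
        ip, ts, path = m.groups()
        stamps_by_ip[ip].append(datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z"))
        page_counts[path] += 1

    # Split each IP's hits into sessions wherever the idle gap is too long,
    # collecting session durations in seconds.
    durations = []
    for stamps in stamps_by_ip.values():
        stamps.sort()
        start = prev = stamps[0]
        for t in stamps[1:]:
            if (t - prev).total_seconds() > session_gap:
                durations.append((prev - start).total_seconds())
                start = t
            prev = t
        durations.append((prev - start).total_seconds())

    total = sum(page_counts.values())
    top_pages = [(p, round(100 * n / total, 1)) for p, n in page_counts.most_common(10)]
    return mean(durations), stdev(durations) if len(durations) > 1 else 0.0, top_pages
```

The three outputs, average session duration, its standard deviation, and the top-ten page weightings, are exactly the inputs the rest of this post keeps asking for.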

That’s it for now. Remember, keep it real and you will be able to stand by your test results with confidence!

Shane Evans

Performance Testing – Analyzing the AUT, Part 1

…or, “What should my performance test scenario look like?”

This will be my first post in a series dedicated to sharing my thoughts on performance testing applications to better serve the business, which should be the focus of any comprehensive test plan, because ultimately a business application exists to serve that business. This may seem obvious, but keeping that truth in mind throughout the process will shape how you define a test plan, script user behavior, and analyze the outcome.

In this article I will discuss what factors should be considered when defining the test strategy. This will not be a rehashing of the many articles on the subject that describe the difference between load, performance, peak, smoke, stress and a multitude of other test definitions related to the same set of objectives. Instead I want to talk about what factors should go into defining these tests, and where the information should come from.

We’ll talk about:

  • Hits/pages/requests per second versus “transactions” per second
  • Average session length
  • Transaction weightings
  • Concurrent users, peak vs average
  • Failover/disaster scenarios

My goal will be to clarify why each of these factors is important to building better performance test scenarios, and where you can find the data for each. We’ll also talk about how to balance some of these against each other, as sometimes there will be conflicts between them (transactions per second vs concurrent users, for example). Right then, let’s make this quick…

TL;DR Version

Performance Testing shouldn’t just be about how many pages per second your application can crank out, or how many concurrent users it can support, or how many transactions can be processed. Performance tests should accurately reflect the usage of the application under test by actual users in production, or you aren’t seeing the whole picture. You might get better results than expected, or worse, you might get bad results when things are actually OK. Both of these possibilities involve putting your reputation on the line during those last minute “go or no-go decision” situations.

Hits/Pages/Requests per second versus “transactions” per second

This should be a question the test engineer determines for the business, not the other way around. I say that because I have seen test engineers ask the business analyst, or whoever is representing the users, “What is your target hits per second for this test?” I shudder every time I see a seasoned performance engineer ask a question like this of someone who wouldn’t know the relevance of a hits/second metric if it hit/second them right in the nose. Hits per second is a measure of the throughput of your web server and network, but it is not a true measure of application performance without knowing the context of the pages being served.

How do we find this information? Check production logs. If the system doesn’t exist in production yet, compare to a similarly utilized application if one is available. If this also isn’t possible, leave it out of the equation altogether; if you don’t know your target hits or pages per second, that will just have to be one of the things your tests aim to uncover. Instead, aim for something you can ask the business users for, like how many transactions they expect to see in a day, spread across how many users, over what period. This will be the basis for the pacing of each virtual user in your test.

If we know the total number of business transactions the business expects to see in a period (T), dividing T by the length of that period in seconds gives us the number of transactions per second we should target (Ts).

If T = 20,000 transactions per 24-hour period, then Ts = 20,000 / 86,400, or roughly 0.2315 transactions per second.

Keep in mind this is only for a single defined transaction, so the number will be higher when considering other transaction types. Now we just need to figure out how many users we need active at any point in time (concurrent) to produce that volume of transactions per second. But before we do that, we need to know how long a user is active on the system. Why? Because 20 users hammering away as fast as electrons will allow is very different from 200 users with a more realistic approach to generating those transactions. Also keep in mind that this example is just that, an example, and more than likely you will be dealing with slices of time much shorter than 24 hours, usually just an hour or a few.

Average Session Length

The question here is, “How long does the average user stay active on the application?”, that is, before logging off and waiting some undetermined amount of time before starting the process again. This is important because for most modern applications, the Logon process is the single most resource intensive activity a user can perform, due to the memory and other resource allocation performed by the application. So we never want to log on more often than we actually expect to see in production, or our results might be very far from accurate.
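As a quick back-of-the-envelope check (my own illustration, not from any tool): with a fixed number of concurrent users, the steady-state login rate is simply concurrency divided by session length, so scripts that rush through their sessions inflate logins even though the user count never changes.

```python
def logins_per_hour(concurrent_users, session_seconds):
    # Each user triggers one Logon per session, so the rate scales
    # inversely with session length.
    return concurrent_users / session_seconds * 3600

print(logins_per_hour(100, 300))  # 1,200 logins/hour with 5-minute sessions
print(logins_per_hour(100, 60))   # 6,000 logins/hour if scripts rush through in 1 minute
```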

For example, let’s say you are trying to produce the number of transactions per second shown in the equation above, and you know that it takes 5 minutes for a single user to produce a transaction (d = 300 seconds) – that is, the script executing from start to finish with one transaction occurring at some point. We can multiply the target transactions per second by the number of seconds it takes to perform a single transaction to determine the number of concurrent users (U); this is just Little’s Law, concurrency = throughput × time in the system. Borrowing from the previous example, we have:

U = Ts × d

U = 0.2315 × 300

U = 69.45

Now we’re getting somewhere! Now we know that if it takes 5 minutes to generate a transaction and our target is 0.2315 transactions per second, it will take roughly 69 concurrent users to do it.
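In code, the whole chain from the business numbers to a user count is only a few lines. This is just the arithmetic above restated; the function names are mine.

```python
def target_tps(transactions_per_period, period_seconds):
    # Ts = T / period
    return transactions_per_period / period_seconds

def concurrent_users(tps, seconds_per_transaction):
    # Little's Law: concurrency = throughput x time in the system
    return tps * seconds_per_transaction

ts = target_tps(20_000, 24 * 60 * 60)  # ~0.2315 transactions/second
u = concurrent_users(ts, 5 * 60)       # ~69.4 concurrent users
print(f"Ts = {ts:.4f} tx/s -> U = {u:.1f} users")
```

Swap in a one-hour window or a different script duration and the user count moves accordingly.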

Again, if this information (session length) exists in some form of production log, either from a business intelligence type of application such as WebTrends or just by looking at web server logs, be sure to use it in your test for more representative results. Without it, we may have the right number of users creating the right mix of transactions, but with a very different ratio between Logon and the rest of our business transactions, which will skew the results.

Transaction Weightings

This next area is one I have had many… “challenging” conversations with business and technical people about, because it involves understanding not only the volume of transactions on a system but also the types of transactions, and the impact of different transaction types on application performance. One example is the Login I explained above. In most application code, the Login process is the most CPU and memory intensive process a user can perform, and generating too many Login transactions during a test can result in poor performance that is not representative of real world usage. Another example might be very large transaction histories, or wildcard searches (such as SELECT * FROM…). These can lock up system resources such as thread pools, database connection pools and session memory while the application waits for a response from the database, and in a worst case scenario even lock up the database!

The point is, we need to understand which transactions are typical of an average user session in order to build scripts that accurately reflect that average user session, and that means understanding which transactions and in what percentages. If you choose the top 10 business transactions performed by your users and run all of them equally, meaning 10% of the time for each of them, I can assure you that 1) you will be logging on far too frequently and 2) your test results are not going to look anything like what you will see once the application goes live. Why? Because in most applications, transactions are heavily weighted towards just a few of those, with the remainder occupying a very small percentage of the total combined.
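As a sketch, driving a script from a weighted mix instead of a flat one is nearly a one-liner in most languages. The transaction names and percentages below are purely illustrative (they happen to echo the banking example that follows).

```python
import random

# Illustrative mix: heavily skewed toward a few transactions, the way real
# production traffic almost always is.
TRANSACTION_MIX = {
    "balance_inquiry": 85,
    "transfer_funds":   5,
    "pay_bill":         4,
    "view_statement":   3,
    "update_profile":   2,
    "order_cheques":    1,
}

def next_transaction():
    # Weighted random choice; no transaction gets a flat 10% by default.
    names = list(TRANSACTION_MIX)
    return random.choices(names, weights=list(TRANSACTION_MIX.values()))[0]
```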

At this point I realize I’ve written the word “transaction” far too many times in the preceding paragraphs so will offer a real-world example.

In my previous life, I was testing an Internet Banking application for a very large financial institution. Every quarter this application would see another release, with one or more new “features” added which would allow customers to do things they would previously have had to go to a brick-and-mortar location for. The Business Analysts thought this was Nobel-prize-winning stuff, and that they were somehow saving the world with every release. I always chuckled when it came time for a performance test plan, however, because we simply ran the baseline test every time. Why not include the new stuff? Oh, we’d throw in one or two of the new transactions (*sigh*, did it again), which was about how many they would see in Production. You see, 85% of all Internet Banking requests are for one thing: Balance Inquiry. The other two dozen types of activities the customer could perform occupied the other 15% combined, with almost all of them falling below 5% of the total. So while we were sure to include the new stuff for every release, the performance test scenario almost never changed.

To give another example, a workflow application I was consulting for had a similar transaction mix, in that something like 90% of the total activity was refreshing the Inbox. The remaining 9 of the top ten transactions combined to form the other 10% of the mix, but there was one transaction that, when performed, would lock up the system almost completely until it finished. If we hadn’t included that transaction in the performance test scenario, it would have been a disaster upon reaching production.

The key is understanding what to include, and how much. Again this information could come from the backend of the system in production, or WebTrends, or Google Analytics if such information exists.

Now this is getting a little longer than I had originally intended, so I’ll wrap it up for now and come back for Part 2 in a few days to talk about Peak Concurrency and Disaster scenarios. Remember, if you aren’t testing real world scenarios, why even bother testing?

Thanks for reading!

 

Edit 01/11/12: Fixed spelling errors…

 

Fanboyism

I was poking around in Google+ today, checking out my local feed, when I stumbled upon the most hate-filled posts from a guy in my area who is clearly an Android fan. This got me thinking, why is this man so angry about people who like Apple products? It’s not like he needs to buy them.

Edit: Isn’t it the goal of every company to succeed at the expense of the competition? This forms the basis of capitalism, whether explicit or implied. Eric Schmidt, the chairman of Google, once sat on the board at Apple and had to step down from that position when, lo and behold, Google introduced the Android phone after Apple introduced the iPhone. But hey, Apple is the bad guy for wanting to protect their ideas.

Back up a few steps…

I’m a Mac. I mean, I use one at home, but I’ve spent my entire career using Windows. I built my career on my knowledge of Windows, and how to make it tick. I don’t have problems with viruses, malware, or drivers in Windows, so my choice to use a Mac in my personal time is entirely based on non-propagandic (if that is even a word) opinions. There was a time when Linux was my mistress, back before the Mac came into my life. But since OS X 10.2-ish, I haven’t compiled a single kernel, window manager or application.

I type this post on my stunning 27″ iMac, running OS X 10.7.2 “Lion”. A computer that sits roughly 15 feet away from my 46″ LCD TV in the living room, yet entirely dominates the space. When people come over to visit, they often tell me how nice my computer is.

I also have 2 iPhones, an iPad 2, a few iPod Touches, and an Apple TV in the house. So yeah, you could easily assume that I am a fan of Apple, and you would be right. Steve Jobs was someone I looked up to for many years before he lost his battle with cancer, and with all of the media attention his death received I can honestly say the more I learn about the man, the more I would like to model my own career after his. The fact that I also own a high-end PC and an Android tablet, despite not needing two tablets in a family with more mobile devices than people, should say something about my personal preferences.

When I work, I work in Windows… but in a virtual machine on my Mac. I also have a top-of-the-line HP laptop beside it that is usually closed if I can get away with it. It’s a nice laptop, aside from abysmal battery life thanks to a power hungry processor intended for desktop use. I don’t do this because I dislike my PC, but using it at home next to my iMac is like driving a Mustang to dinner when there is a Mercedes AMG in the garage. Sure, the Mustang is fast and overall a very nice car, but next to the AMG it’s kinda… meh.

I tell people who ask my opinion about phones, tablets or computers to consider Apple products. I do this because they have served me well over the last 6 or 7 years, and compared to my experience with Windows-based PCs (which, as I mentioned above, is a lot) have made me a lot more productive. This is the key to my affinity for Apple products. I spent the first 15 years of my computing experience building PCs from scratch, tuning them, making them operate without a great deal of headaches, and helping others with their own PCs. This was followed by 4 or 5 years of learning how to build a Linux PC from less than scratch, literally learning the basic building blocks of a computer and getting them to abide by my will. Suddenly, I didn’t have time for any of that. I had 2 kids (3 now), a demanding career, and interests beyond the care and feeding of my computer. This was the same time I found myself using a Mac.

I had taken my knowledge of Linux (well, Unix) and my experience with PCs, and turned those into something truly powerful when using my OS X machine, a system built on BSD Unix. Yeah, it’s easy to use – unless you are my parents and have no desire to try something different; my daughter started using ours when she was 4 – but it also has capabilities far beyond those of a Windows-based PC. I’m not talking video games, video editing, or music. I’m talking about a set of tools underlying the system that allow me to control things in ways a PC can’t. It is called UNIX, Google it.

But I’ve used Android, and it’s great. It’s based on Linux, which I like, and it is supported by a huge community, so cool tricks and solutions to problems are never far away. I get that people think it is free, but nothing in the world is free; everything comes at a cost. It is somewhat more open than Apple’s menagerie of toys, but there are tradeoffs there. Some like that openness, and I totally respect that. Others prefer to have things work in a familiar way no matter what operating system version or device they’re using. That’s the beauty of competing products: everyone has different preferences, so these things can easily coexist.

I’ve often compared people’s allegiance to PC vs Mac, PC vs Linux, iPhone vs Android, and so on to subjects of religious debate; something better left out of casual conversation. In the beginning, I was kidding. But lately this has become all too true. Is it because people need to hate something in their day-to-day lives? Are we hard wired to align ourselves with a cause and defend it to our bitter end? Or is there something more sinister to blame for all of the hate, on both sides of this corporate war? I mean, I work for Hewlett-Packard, but here I am talking about how much I love Apple products. I don’t work in the PC division, or I might have to conceal my name before posting something like this. But my point is: if my financial well-being depends on the success of my company, I should have some personal allegiance to PCs, certainly more than some guy off the street who doesn’t work for Google or Apple yet fills his social media presence with hateful comments towards one or the other, right?

Maybe we’ve just become too consumed by our consumerism. Maybe the folks down on Wall Street protesting the corporate takeover of our generation aren’t just fighting the big banks, but the people at home watching TV as well. Because if we’re willing to defend our company of choice to the point of insulting anyone who decides otherwise, maybe the battle against capitalism is already lost.

Wow, that ended up being a lot deeper than I had intended going into this post.

First things second.

A Jellyfish!

Jellyfish are bloody amazing.

I wanted to play with the image processor here, so I can see how theme changes impact the look and feel of the site.

Also, Jellyfish are freaking amazing. I mean, how the hell is that thing even considered alive? It is basically a floating nervous system, and nothing else. No cardiovascular system, no bones, no digestive system. When it eats, it simply absorbs its prey.

Aliens, I tell you.

Oh, and if I do post an image here, it is either one I captured or one I needed to make a point, and credit will be given to the owner.

As you were…

New(b)

This isn’t my first blog, but that hardly matters. The first was a personal site dedicated to pouring out my emotions, mainly frustration, and some experiences. In short, it was shit. I maintained that for about 6 years before deciding to download the entire thing and delete it from the web entirely.

This time around, I will be focusing on topics somewhat more relevant… to me. Mainly my commentary on what’s going on in the world, both corporate and political (same thing, amIright?), as well as emerging trends in technology and where I think things will go next. I figure, since I spend most of my day working in technology, it is high time I start sharing some of my thoughts more openly.

More to come.
