Performance - Part 3
February 28, 2009 · by Kevin Runde
Performance Tuning – Your first few tests
So, now you know how to prove what you think you know. You know your goals and what others are expecting. So let’s start testing, right? NOPE! I know, I know. Kevin, when do we actually start testing? How are we going to get anywhere if we don’t start testing? Well, now that you know where you are going (your goal) and any side trips you have to make (the other expectations), you need to be able to tell that you have accomplished all of these things. In other words, what metrics do you need and how are you going to record them? So, for each goal and expectation, what metrics are you going to use? Yes, I said metrics. You can’t assume one metric will be enough. After all, you don’t know what you don’t know, so you need other metrics to validate the metric you are using before you can check off each goal and expectation.
For example, if you are creating a web-based application, you are probably using a testing tool to generate various user loads to see how the system performs. How do you know that tool is the right one? Well, if you are using Apache, you can set up the logs to report response time. (NOTE: The Apache response time includes sending the data to the user.) So, if the tool does not agree with the Apache logs, you have an issue. Is the tool downloading other content, like images, CSS, and JavaScript, that you are not seeing in your Apache logs? Does rendering the page in the browser take a long time? Maybe the page is too complex, with lots of tables and such, or maybe your tool is having problems. Maybe it is doing too many requests per server. What are the CPU, network, disk, and memory usage stats on the testing boxes?
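Here is a rough sketch of that cross-check in Python. It assumes your Apache LogFormat ends with %D (the time taken to serve the request, in microseconds) and that your load tool can export per-page response times in milliseconds. The function names, the sample log lines, and the 20% tolerance are all made up for illustration, so adjust them to your own setup.

```python
import re
from statistics import mean

# Assumed LogFormat: "%h %l %u %t \"%r\" %>s %b %D" (the trailing %D is microseconds).
LOG_LINE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ (?P<micros>\d+)$'
)

def apache_times_ms(lines):
    """Return {path: [response time in ms, ...]} parsed from access-log lines."""
    times = {}
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match:
            times.setdefault(match["path"], []).append(int(match["micros"]) / 1000.0)
    return times

def compare(tool_ms, apache_ms, tolerance=0.20):
    """Return {path: (tool mean, Apache mean)} where the two disagree by more than tolerance."""
    suspicious = {}
    for path, samples in tool_ms.items():
        if path not in apache_ms:
            continue  # the tool requested something the log sample does not cover
        tool_avg, apache_avg = mean(samples), mean(apache_ms[path])
        if abs(tool_avg - apache_avg) > tolerance * apache_avg:
            suspicious[path] = (tool_avg, apache_avg)
    return suspicious

# Hypothetical data: three log lines and the load tool's own timings in ms.
sample_log = [
    '10.0.0.5 - - [28/Feb/2009:10:00:00 -0600] "GET /home HTTP/1.1" 200 5120 250000',
    '10.0.0.5 - - [28/Feb/2009:10:00:01 -0600] "GET /home HTTP/1.1" 200 5120 350000',
    '10.0.0.5 - - [28/Feb/2009:10:00:02 -0600] "GET /search HTTP/1.1" 200 2048 100000',
]
tool_results = {"/home": [900.0, 1100.0], "/search": [105.0, 95.0]}

print(compare(tool_results, apache_times_ms(sample_log)))
```

With this sample data only /home gets flagged: the tool sees about 1000 ms while Apache sees about 300 ms. That gap is your cue to go ask the questions above about extra content, rendering, and the testing boxes.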
Notice something? That one metric of page response time has suddenly turned into several more metrics. This is why we are not testing yet. You need to look at each metric and figure out how you will validate it and, when something is wrong, how you will break it down even further.
Going back to the web page example, if page requests are taking over 5 seconds, what is really taking the time? If the tool’s metrics and the Apache logs are close, then the problem isn’t at this level and you need to dig down deeper. If Apache hands off the page request to an application server (like what often happens in Java Web Apps), can you tell how long the application server is taking? Remember, I said the Apache response time included the time to transfer the data. Maybe you have a network bottleneck. So in your application code, log how long you think it takes you to process a page request. Don’t reinvent the wheel here either. There are often great tools you can use, like JAMon for Java. It even has a Servlet Filter you can use to monitor your web application that will generate stats.
But wait, Kevin. I have a nice network usage chart that shows there isn’t a problem. Does that chart show every switch and router in between the tool and the test server? Probably not. Again, you don’t know what you don’t know, so find ways to prove what you think you know. Don’t assume anything. Getting paranoid yet? No. No one told me to say that. If you got that, then maybe, just maybe, you are starting to realize you don’t know what you don’t know.
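To give a feel for what logging your own processing time looks like, here is a minimal Python sketch. It is not JAMon and it does not imitate JAMon’s API. The Monitor class and handle_request function are hypothetical names, and the sleep just stands in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class Monitor:
    """Collects hit count, total, min and max elapsed seconds per label."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the monitor itself can be checked
        self._stats = defaultdict(
            lambda: {"hits": 0, "total": 0.0, "min": float("inf"), "max": 0.0}
        )

    @contextmanager
    def timed(self, label):
        start = self._clock()
        try:
            yield
        finally:
            # Record the timing even if the wrapped code raised an exception.
            elapsed = self._clock() - start
            entry = self._stats[label]
            entry["hits"] += 1
            entry["total"] += elapsed
            entry["min"] = min(entry["min"], elapsed)
            entry["max"] = max(entry["max"], elapsed)

    def report(self):
        """Return a copy of the stats with an average added per label."""
        return {
            label: dict(entry, avg=entry["total"] / entry["hits"])
            for label, entry in self._stats.items()
        }

monitor = Monitor()

def handle_request(path):
    # Hypothetical request handler; the sleep stands in for real work.
    with monitor.timed(path):
        time.sleep(0.01)
        return "<html>ok</html>"

for _ in range(3):
    handle_request("/home")
print(monitor.report())
```

If the average reported here is far below what Apache logged for the same page, the time is going somewhere between your code and the user, and that is the next layer to measure.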
Now, you don’t need to implement each metric right now. Make sure you have ways to record each major metric. Don’t waste time putting in additional metrics now. Remember, performance tuning is like an ogre. As you peel back each layer, you will be adding more metrics. For now, just think about what other metrics you might use and find out how to get more. Look at what additional metrics each tool, API, and component can give you. If possible, get people to design how they would add the other metrics.
Now run some tests. Run the same test 3 or 4 times. Don’t make any changes! Just run the tests. Then, look at the results. Are your results the same across all the runs? If they aren’t, don’t worry. It just means you don’t know what you don’t know. Take the metrics that are changing and come up with new metrics to validate those metrics and drill down into what is really going on.
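One way to answer "are the results the same?" without eyeballing it is to compute how much each metric varies across the runs. The sketch below uses the coefficient of variation (standard deviation divided by the mean). The metric names, the numbers, and the 5% threshold are invented for the example, so pick a threshold that fits your own goals.

```python
from statistics import mean, stdev

def unstable_metrics(runs, max_cv=0.05):
    """Return {metric: coefficient of variation} for metrics that vary more than max_cv."""
    flagged = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        avg = mean(values)
        spread = stdev(values)
        if avg:
            cv = spread / abs(avg)
        else:
            # A metric that is zero in every run is stable; otherwise flag it.
            cv = 0.0 if spread == 0 else float("inf")
        if cv > max_cv:
            flagged[name] = round(cv, 3)
    return flagged

# Hypothetical results from four identical test runs.
runs = [
    {"avg_page_ms": 410, "cpu_pct": 60},
    {"avg_page_ms": 405, "cpu_pct": 62},
    {"avg_page_ms": 415, "cpu_pct": 61},
    {"avg_page_ms": 400, "cpu_pct": 85},
]
print(unstable_metrics(runs))
```

In this made-up data the page times are steady but CPU usage jumps in the last run, so cpu_pct is the metric that needs another metric behind it.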
Finally, make sure you know how much your metrics cost you. Any time you observe something (record data), you affect it. If you run your code in a profiler, it runs significantly slower. It will also make out-of-process calls look faster by comparison, because they are not running in the profiler. If you have to, run a test with and without your metrics using a wall clock to check if the tests take roughly the same amount of time. If they don’t, you need to compensate for that.
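A small Python sketch of that with-and-without check follows. The two test functions are stand-ins I made up: one does some work, the other does the same work while recording a timestamp on every step. Taking the best of several runs is my choice to reduce noise from whatever else the machine is doing, not the only way to do it.

```python
import time

def wall_clock(test, repeats=5):
    """Return the best wall-clock time in seconds over several runs of test()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        test()
        best = min(best, time.perf_counter() - start)
    return best

def overhead_ratio(plain, instrumented, repeats=5):
    """Return how many times longer the instrumented test takes than the plain one."""
    return wall_clock(instrumented, repeats) / wall_clock(plain, repeats)

def plain_test():
    # Hypothetical workload with no metrics at all.
    total = 0
    for i in range(20000):
        total += i * i
    return total

samples = []

def instrumented_test():
    # Same workload, but every step is timed and recorded.
    total = 0
    for i in range(20000):
        start = time.perf_counter()
        total += i * i
        samples.append(time.perf_counter() - start)
    return total

ratio = overhead_ratio(plain_test, instrumented_test)
print(f"instrumented run takes {ratio:.2f}x the plain run")
```

If the ratio is close to 1, your metrics are cheap enough to leave on. If it is not, either record less or subtract the overhead when you read the results.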
Next time, I’ll talk about how to run the tests and, once you’ve found a problem, what to do about it.
-Kevin Runde