Java Performance: The Definitive Guide

English | 2014 | ISBN: 978-1-449-35845-7 | 425 Pages | PDF | 11 MB

Coding and testing are often considered separate areas of expertise. In this comprehensive guide, author and Java expert Scott Oaks takes the approach that anyone who works with Java should be equally adept at understanding how code behaves in the JVM, as well as the tunings likely to help its performance.
You’ll gain in-depth knowledge of Java application performance, using the Java Virtual Machine (JVM) and the Java platform, including the language and API. Developers and performance engineers alike will learn a variety of features, tools, and processes for improving the way Java 7 and 8 applications perform.


The PrintFlagsFinal command (java -XX:+PrintFlagsFinal -version) will print out hundreds of available tuning flags for the JVM (there are 668 possible flags in JDK 7u40, for example).

The vast majority of these flags are designed to enable support engineers to gather more information from running (and misbehaving) applications. It is tempting, upon learning that there is a flag called AllocatePrefetchLines (which has a default value of 3), to assume that the value can be changed so that instruction prefetching works better on a particular processor. But that kind of hit-or-miss tuning is not worthwhile in a vacuum; none of those flags should be changed without a compelling reason to do so. In the case of the AllocatePrefetchLines flag, that reason would include knowledge of the application's prefetch performance, the characteristics of the CPU running the application, and the effect that changing the number will have on the JVM code itself.
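Rather than guessing at a flag's state from the command line, the current value of a tuning flag (and whether it was set explicitly or left at its default) can also be read at runtime through HotSpot's diagnostic MXBean. The sketch below queries MinHeapFreeRatio, chosen only because it exists on every HotSpot release; on JVMs that define AllocatePrefetchLines, that name can be substituted.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;

public class FlagInspector {
    public static void main(String[] args) {
        // Obtain HotSpot's diagnostic bean from the platform MBean server.
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Look up one flag; throws IllegalArgumentException if the
        // running JVM does not define a flag by that name.
        VMOption opt = bean.getVMOption("MinHeapFreeRatio");
        System.out.println(opt.getName() + " = " + opt.getValue()
                + " (origin: " + opt.getOrigin() + ")");
    }
}
```

The origin field is what makes this useful for support work: it distinguishes a default value from one set on the command line or changed by the JVM's ergonomics.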

Java and single-CPU usage

To return to the discussion of the Java application—what does periodic, idle CPU mean in that case? It depends on the type of application. If the code in question is a batch-style application where there is a fixed amount of work, you should never see idle CPU, because there is always work left to do. Driving CPU usage higher is always the goal for batch jobs, because it means the job will be completed faster. If the CPU is already at 100%, you can of course still look for optimizations that allow the work to be completed faster (while also trying to keep the CPU at 100%).

If the measurement involves a server-style application that accepts requests from some source, then there may be idle time because no work is available: for example, when a web server has processed all outstanding HTTP requests and is waiting for the next request. This is where the average time comes in. The sample vmstat output was taken during execution of an application server that was receiving one request every second. It took 450 ms for the application server to process that request—meaning that the CPU was actually 100% busy for 450 ms, and 0% busy for 550 ms. That was reported as the CPU being 45% busy.

Although it usually happens at a level of granularity too small to visualize, the expected behavior of the CPU when running a load-based application is to operate in short bursts like this. The same macro-level pattern will appear in the reporting if the application receives one request every half-second and the average time to process each request is 225 ms. The CPU would be busy for 225 ms, idle for 275 ms, busy again for 225 ms, and idle for 275 ms: on average, 45% busy and 55% idle.
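The averaging described in the last two paragraphs is just time-weighted arithmetic over the sampling interval, which can be sketched as follows (the class and method names are invented for illustration; the numbers match the text's two examples):

```java
public class CpuAverage {
    // Percentage of the sampling interval during which the CPU was busy.
    static double reportedBusyPercent(double busyMs, double idleMs) {
        return 100.0 * busyMs / (busyMs + idleMs);
    }

    public static void main(String[] args) {
        // One request per second, 450 ms of processing per request:
        System.out.println(reportedBusyPercent(450, 550)); // 45.0
        // One request per half-second, 225 ms of processing per request:
        System.out.println(reportedBusyPercent(225, 275)); // 45.0
    }
}
```

Both workloads report the same 45% figure even though their burst patterns differ, which is why the raw utilization number alone does not reveal the request rate or per-request cost.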

Network Usage

If you are running an application that uses the network—for example, a Java EE application server—then you must monitor the network traffic as well. Network usage is similar to disk traffic: the application might be inefficiently using the network so that bandwidth is too low, or the total amount of data written to a particular network interface might be more than the interface is able to handle.

Unfortunately, standard system tools are less than ideal for monitoring network traffic because they typically show only the number of packets and number of bytes that are sent and received over a particular network interface. That is useful information, but it doesn’t tell us if the network is under- or overutilized.

On Unix systems, the basic network monitoring tool is netstat (and on most Linux distributions, netstat is not even included by default and must be obtained separately). On Windows, typeperf can be used in scripts to monitor the network usage, but here is a case where the GUI has an advantage: the standard Windows resource monitor will display a graph showing what percentage of the network is in use. Unfortunately, the GUI is of little help in an automated performance testing scenario.

Fortunately, there are many open source and commercial tools that monitor network bandwidth. On Unix systems, one popular command-line tool is nicstat, which presents a summary of the traffic on each interface, including the degree to which the interface is utilized.
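The utilization figure such a tool derives is, in essence, observed traffic divided by the interface's rated bandwidth. A hedged sketch of that calculation (the names and numbers here are illustrative, not nicstat's actual code):

```java
public class NetUtil {
    // readKBs/writeKBs: observed traffic in KB per second;
    // linkMbit: the interface's rated speed in megabits per second.
    static double utilizationPercent(double readKBs, double writeKBs,
                                     double linkMbit) {
        double usedBitsPerSec = (readKBs + writeKBs) * 1024 * 8;
        double capacityBitsPerSec = linkMbit * 1_000_000;
        return 100.0 * usedBitsPerSec / capacityBitsPerSec;
    }

    public static void main(String[] args) {
        // e.g., 3,000 KB/s read plus 3,000 KB/s written on a gigabit link:
        System.out.println(utilizationPercent(3000, 3000, 1000));
    }
}
```

This is exactly the piece of information that raw packet and byte counts from netstat do not provide: without the link's capacity in the denominator, there is no way to say whether the interface is under- or overutilized.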

Using Less Memory

The first approach to using memory more efficiently in Java is to use less heap memory. That statement should be unsurprising: using less memory means the heap will fill up less often, requiring fewer GC cycles. The effect can multiply: fewer collections of the young generation means the tenuring age of an object is increased less often—meaning that the object is less likely to be promoted into the old generation. Hence, the number of full GC cycles (or concurrent GC cycles) will be reduced. And if those full GC cycles can clear up more memory, then they will also occur less frequently.

This section investigates three ways to use less memory: reducing object size, lazy initialization of objects, and using canonical objects.
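Two of those techniques can be sketched briefly. The class and field names below are invented for illustration; String.intern() is the JDK's built-in canonicalization mechanism for strings, used here as the simplest example of a canonical object.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoryTechniques {
    // Lazy initialization: the map is not allocated until it is first
    // needed, so instances that never use it carry only a null reference.
    private Map<String, String> attributes;

    public synchronized String getAttribute(String key) {
        if (attributes == null) {
            attributes = new HashMap<>();
        }
        return attributes.get(key);
    }

    // Canonical objects: keep one shared copy of equal immutable values
    // rather than many duplicates on the heap.
    public static String canonicalize(String s) {
        return s.intern();
    }

    public static void main(String[] args) {
        String a = new String("hello").intern();
        String b = "hello";
        // Both references point at the single canonical copy.
        System.out.println(a == b); // true
    }
}
```

Note the synchronized accessor: lazy initialization trades a small amount of per-access overhead (and some care around thread safety) for a smaller steady-state heap.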