Flame graphs are a wonderful tool to find bottlenecks and performance issues in Java code. When using JBang to start a Java application, enabling the generation of flame graphs is close to trivial.
We developers eventually need to focus on performance. Whether for saving costs, improving user experience, or avoiding timeouts, performance can be important.
One common technique for analyzing performance is flame graphs. In short, a flame graph visualizes which parts of your code consume the most time.
Understanding Flame Graphs
Flame graphs display information in three dimensions:
- Vertically, they stack program frames. That is, a new level is added every time your code delegates execution to another frame (e.g., another function).
- Horizontally, the width of each program frame is proportional to the amount of time it (or a frame it delegates to) spends consuming CPU.
- Colors depend on the visualization tool and the configuration used. In this post we'll be using the standard configuration for Java, meaning that the colors are related to the type of code composing the frame (JIT, inlined, kernel, native…).
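To make the vertical and horizontal dimensions concrete, here is a minimal sketch of how sampled stacks could be folded into frame widths. The class `FrameWidths`, the method `fold`, and the sample data are hypothetical and only illustrate the idea; this is not how any particular profiler implements it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: folds sampled stacks into per-frame sample counts. */
final class FrameWidths {

    /**
     * Each sample is one captured stack, listed from the bottom frame to the top frame.
     * Returns, for every call path, the number of samples that contain it.
     */
    static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> widths = new TreeMap<>();
        for (List<String> stack : samples) {
            StringBuilder path = new StringBuilder();
            for (String frame : stack) {
                if (path.length() > 0) {
                    path.append(';');
                }
                path.append(frame);
                // A frame is charged for its own samples and for every frame stacked on top of it.
                widths.merge(path.toString(), 1, Integer::sum);
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        // Made-up samples, for illustration only.
        List<List<String>> samples = List.of(
                List.of("main", "parse", "readLine"),
                List.of("main", "parse", "readLine"),
                List.of("main", "parse"),
                List.of("main", "render"));
        fold(samples).forEach((path, count) ->
                System.out.println(path + " -> " + count + " of " + samples.size() + " samples"));
    }
}
```

In this sketch the depth of a path corresponds to the height of the stack, and the sample count corresponds to the width of the frame: `main` appears in all four samples, so it would span the whole graph, while `main;render` would take a quarter of it.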
Flame Graphs are not Call Graphs
One thing you must keep in mind when reading flame graphs is that they do NOT imply time sequences! That is, if the frame for foo is to the left (or to the right) of the frame for bar, it does NOT mean foo was invoked before bar.
The horizontal position of a frame carries no strict meaning. The default implementation of flame graphs sorts sibling frames alphabetically by method name, but even this can be changed via configuration.
In summary: look at the width of the frames, not at the horizontal position.
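A small sketch can show why the horizontal position says nothing about time. The class `SiblingOrder` and its `layout` method are hypothetical; they assume the alphabetical ordering described above, modeled here with a `TreeMap`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of one row of a flame graph with alphabetically sorted siblings. */
final class SiblingOrder {

    /** Counts samples per method; the TreeMap keeps the methods in alphabetical order. */
    static Map<String, Integer> layout(List<String> sampledMethods) {
        Map<String, Integer> row = new TreeMap<>();
        for (String method : sampledMethods) {
            row.merge(method, 1, Integer::sum);
        }
        return row;
    }

    public static void main(String[] args) {
        // In the first run foo executes before bar; in the second run bar executes first.
        List<String> fooFirst = List.of("foo", "foo", "foo", "bar");
        List<String> barFirst = List.of("bar", "foo", "foo", "foo");

        System.out.println(layout(fooFirst)); // {bar=1, foo=3}
        System.out.println(layout(barFirst)); // {bar=1, foo=3}
    }
}
```

Both runs produce the same row: bar on the left with one sample, foo on the right with three. The execution order is lost; only the widths survive.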
Tips for Finding Performance Issues
There is no silver bullet for “fixing” performance when you find a piece of code that consumes a fair amount of time.
Concerning performance, the flame graph provides two interesting hints:
- The higher the stack, the greater the number of delegations in code. Most of the time delegations are not a big performance deal (the JVM is smart enough to inline code to avoid them), but when the number is large it might be worth investigating the cause.
- The wider the frame, the more time the function takes to complete.
So, when using flame graphs to look for performance issues, look for wide frames or tall stacks.
Note that the width of a frame is proportional to the width of the frame below it, all the way down to the bottom "all" frame. Before deciding that a frame is actually causing a performance issue, review its actual impact on the whole execution.
So, remember: avoid premature optimization and delay changing your code until you are confident the change is beneficial. If needed, use other tools to obtain the numbers you need: profilers, JFR, hyperfine/hyperfoil…
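To illustrate the point about relative widths, the following sketch multiplies each frame's share of its parent to get its share of the whole execution. The class `FrameImpact`, the method `shareOfTotal`, and all the percentages are made up for the example.

```java
/** Hypothetical helper: estimates how much of the whole profile a nested frame represents. */
final class FrameImpact {

    /**
     * Takes the share each frame occupies within its parent, listed from the bottom
     * of the stack to the frame of interest, and returns the share of the whole profile.
     */
    static double shareOfTotal(double... shareOfParent) {
        double share = 1.0;
        for (double s : shareOfParent) {
            share *= s;
        }
        return share;
    }

    public static void main(String[] args) {
        // Made-up numbers: the frame fills 80% of its parent, which fills 25% of its own
        // parent, which in turn fills 10% of the bottom frame.
        double share = shareOfTotal(0.10, 0.25, 0.80);
        System.out.println("Share of the whole execution: " + Math.round(share * 100) + "%");
    }
}
```

A frame that looks dominant inside its parent (80%) turns out to account for roughly 2% of the whole execution, so optimizing it could save at most that much.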
Another indicator to look for is which methods are actually consuming execution time. Flame graphs display the method name in each frame (you might need to hover over the frame if it is too narrow). Scanning through class and method names, you might find unexpected class initializations or method invocations.
A flame graph will not tell you WHEN or WHY a method or class is invoked, but only WHO is invoking it. It is up to you to find the reason for the invocation or to apply other performance analysis techniques to find the exact cause (e.g., profiling).
One example of this is class initialization. I have found cases where a class includes time-expensive static initializations. Usually, developers rely on the idea that static initialization only occurs if the class is used, but class initialization sometimes occurs under the hood, hidden from developers (e.g., when using annotation processors). Flame graphs reveal the unexpected class initialization, which can be "fixed" with lazy initialization, saving some serious time.
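As a sketch of what such a lazy initialization could look like, the example below uses the initialization-on-demand holder idiom, one of several ways to defer static work. The class `LazyInit`, the `expensiveTable` method, and the `LOADS` counter are hypothetical stand-ins for a real costly initialization.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical class whose expensive static data is only built on first real use. */
final class LazyInit {

    /** Counts how many times the simulated expensive setup has run. */
    static final AtomicInteger LOADS = new AtomicInteger();

    /** Stand-in for a costly static initialization, e.g. parsing a large resource. */
    static int[] expensiveTable() {
        LOADS.incrementAndGet();
        int[] table = new int[1_000];
        for (int i = 0; i < table.length; i++) {
            table[i] = i * i;
        }
        return table;
    }

    /** The JVM initializes Holder only when Holder.TABLE is first accessed. */
    private static final class Holder {
        static final int[] TABLE = expensiveTable();
    }

    /** First call triggers the expensive setup; later calls reuse the table. */
    static int square(int i) {
        return Holder.TABLE[i];
    }

    /** Uses the class without touching Holder, so no expensive setup happens. */
    static String version() {
        return "1.0";
    }

    public static void main(String[] args) {
        System.out.println(version() + " loads=" + LOADS.get());   // 1.0 loads=0
        System.out.println(square(3) + " loads=" + LOADS.get());   // 9 loads=1
        System.out.println(square(4) + " loads=" + LOADS.get());   // 16 loads=1
    }
}
```

If `TABLE` were a plain static field of `LazyInit`, anything that initializes the class would pay for the table, even code that only calls `version()`. With the holder, the cost moves to the first call that actually needs the data.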
Generating Flame Graphs
Now we know about flame graphs, and how to use them to improve the performance of our Java programs. But before starting to use flame graphs, we need to generate them!
In broad terms, generating flame graphs usually takes the following steps:
- Install a performance analyzer, a tool that analyzes the threads of the running application to identify the frames being executed. This usually implies locating and downloading the sources for the analyzer.
- Set up the development environment and the compiler for those sources.
- Compile the sources to generate a native binary for the profiler.
- Compile your application so that it includes all the metadata the performance analyzer needs to provide frame information (e.g., class and method names).
- Execute your application and the performance analyzer at the same time, attaching the performance analyzer to your application.
- When the performance analysis completes, extract the information generated by the performance analyzer.
- Finally, transform the extracted information into a viewable flame graph (usually an HTML file).
Each of the previous steps requires installing and setting up a specialized tool on your system. And that's annoying and error-prone.
This is where ap-loader comes onto the scene. Ap-loader is a library that packages, loads, and executes async-profiler as a Java agent. Async-profiler is a performance analyzer specialized in Java applications.
So, ap-loader and async-profiler solve the first and last steps for generating flame graphs. What about the rest?
And now JBang comes to the rescue. I am not getting into a detailed description of JBang features (that would require several blog posts!) but just focusing on the features we can use for generating flame graphs. Out of the box, JBang:
- Fetches and runs the appropriate version of the JVM to run your application.
- Fetches your code dependencies, compiles your code, and runs it in a single step.
- If required, fetches extra dependencies or libraries and executes Java agents besides your application.
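For illustration, here is a minimal single-file program that JBang could run directly. It is only a sketch: the `Workload` class and its prime-counting loop are a hypothetical stand-in for a real application, the file is assumed to be saved as `Workload.java`, and `//JAVA 19+` is the JBang directive for requesting a Java version from inside the script.

```java
///usr/bin/env jbang "$0" "$@" ; exit $?
//JAVA 19+

import java.util.stream.IntStream;

/** Hypothetical CPU-bound workload, just to have something worth profiling. */
final class Workload {

    /** Counts the primes in the range [2, limit] by trial division. */
    static long countPrimes(int limit) {
        return IntStream.rangeClosed(2, limit).filter(Workload::isPrime).count();
    }

    /** Deliberately naive primality test so that it shows up as a wide frame. */
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Primes found: " + countPrimes(2_000_000));
    }
}
```

Running `jbang Workload.java` should be enough to resolve a suitable JDK, compile the file, and execute it, with no build file or manual setup.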
JBang closes the gap. Together with ap-loader, JBang can generate flame graphs in a single step:

Let's explain the command a bit:
- --java "19+" tells JBang to use Java 19 or later.
- --deps … adds the jmh-generator-annprocess library to the build. This is needed to run ap-loader.
- --javaagent=… instructs JBang to fetch and run the ap-loader Java agent, which collects performance data and generates the flame graph. It also passes configuration to the agent, as described in the async-profiler codebase.
With that command, when your application finishes, you obtain a file named profile.html in your current directory. This file is the flame graph. Easy, isn't it?
In some cases, you might need to add other parameters:
- On macOS, I needed to enable archiving with Java agents by adding --java-options="-XX:+UnlockDiagnosticVMOptions" --java-options="-XX:+AllowArchivingWithJavaAgent".
- If you want JDK packages to show in your flame graph, you need to make sure the appropriate module is open; e.g., -R="--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", -R="--add-opens=java.base/java.io=ALL-UNNAMED" or -R="--add-opens=java.base/java.lang=ALL-UNNAMED".
- Some people reported needing the --enable-preview flag for the agent to work. I never needed it, though.
It's a wrap!
Analyzing performance in Java is complicated, but flame graphs are a great tool to find bottlenecks or code that runs unnecessarily in your application.
With JBang, generating flame graphs is mostly a no-brainer. Just add a couple of flags to your run command and you've got it!
Then, it is up to you as a developer to read the flame graphs and update your application to get the best performance.