Profiling Python Plugins

Table of Contents

Original entry at old.niklasrosenstein.com.


I finished developing a custom Content Browser for the hantmade Stage plugin just the other day. One interesting thing I learned was that using Python’s built-in profiling module was much easier and more informative than to do your own benchmarking.

The Content Browser displayed items organized in a grid. The grid was rendered using a c4d.gui.GeUserArea and was customized to fit my client’s requirements. The development was straight forward, it was only very time consuming. But when my client reported back after the second alpha version, he complained about bad performance.

If you reduced the grid size (which was a parameter in the dialog) so that the thumbnails were rather small and you had a lot of items to display (let’s say a hundred), it rendered horribly slow, freezing Cinema for half a second every time it’s rerendered. I had not tried to use the plugin with that many items yet. But where was the performance bottleneck?

You have probably seen “traditional” benchmarking before, where you would measure the start and finish time of some code and subtract the one from the other. Doing this manual benchmarking would have been an immense amount of work and would have taken me several hours to benchmark all individual components of the DrawMsg() call tree.

A quick google for “profiling python” revealed the cProfile/profile module which is built-in and available in Python 2.6. Yeay!

How to profile in Python

The cProfile or profile module offers an easy way of profiling Python code revealing many details about its execution, including all sub-function calls. Automatically! You just have to start end stop the profiler at some point.

The bottleneck must clearly have been anywhere inside GeUserArea.DrawMsg(). So I decorated it:

import cProfile
import functools

def profile(func):
    """
    Decorator to profile the function *func*.

    .. attribute:: profile

        The generated profile of the function. Any calls to the wrapped
        function will contribute to the profile.
    """

    profile = cProfile.Profile()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return profile.runcall(func, *args, **kwargs)

    wrapper.profile = profile
    return wrapper

class StageStorageGridView(c4d.gui.GeUserArea):

    @profile
    def DrawMsg(self, x1, y1, x2, y2, msg):
        # ...

With all the gathered information on the functions execution, I would need some action that triggers it to be printed so we can analyse it. This will print the stats when I reload our plugin:

def PluginMessage(msg_type, data):
    if msg_type == c4d.C4DPL_RELOADPYTHONPLUGINS:
        profile = StageStorageGridView.DrawMsg.profile
        profile.create_stats()
        if profile.stats:
            # Creating the Stats object will fail if the profile
            # has recorded data.
            stats = pstats.Stats(profile)
            stats.sort_stats('time').print_stats()

    return True

Interpreting the stats

Now that DrawMsg() is automatically profiled, I need to fill its profile by executing it. So I fire up the dialog an play with it for just a little while, then hit the “Reload Python Plugins” button, et voilà:

         7297 function calls in 0.378 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       71    0.328    0.005    0.328    0.005 {method 'DrawBitmap' of 'c4d.gui.GeUserArea' objects}
      216    0.013    0.000    0.013    0.000 {method 'DrawRectangle' of 'c4d.gui.GeUserArea' objects}
       71    0.006    0.000    0.355    0.005 'stage_storage.pyp':2359(DrawElement)
      101    0.005    0.000    0.005    0.000 {method 'DrawBorder' of 'c4d.gui.GeUserArea' objects}
       74    0.004    0.000    0.378    0.005 'stage_storage.pyp':2984(DrawMsg)
       74    0.002    0.000    0.002    0.000 {method 'OffScreenOn' of 'c4d.gui.GeUserArea' objects}
      277    0.002    0.000    0.002    0.000 'stage_storage.pyp':521(__init__)
       71    0.001    0.000    0.331    0.005 'stage_storage.pyp':2338(DrawThumbnail)
      216    0.001    0.000    0.002    0.000 'stage_storage.pyp':2251(PushClippingRegion)
      216    0.001    0.000    0.003    0.000 'stage_storage.pyp':2265(PopClippingRegion)
      216    0.001    0.000    0.001    0.000 {method 'DrawSetPen' of 'c4d.gui.GeUserArea' objects}
      142    0.001    0.000    0.003    0.000 'stage_storage.pyp':567(__copy__)
      358    0.001    0.000    0.001    0.000 {method 'SetClippingRegion' of 'c4d.gui.GeUserArea' objects}
       74    0.001    0.000    0.001    0.000 {method 'DrawText' of 'c4d.gui.GeUserArea' objects}
   [ ... and 40 more calls ]

As you can clearly see, the DrawBitmap() function is where most of the time is spent: 87% of the time! When I reiterated through my code and look for calls to DrawBitmap(), I found that I passed c4d.BMP_NORMALSCALED as flag to it. The documentation states about this flag:

“Scaling with sampling for high quality. Slow.”.

I removed the flag from the call and tried the whole thing again!

         4018 function calls in 0.041 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       39    0.011    0.000    0.011    0.000 {method 'DrawBitmap' of 'c4d.gui.GeUserArea' objects}
      120    0.007    0.000    0.007    0.000 {method 'DrawRectangle' of 'c4d.gui.GeUserArea' objects}
       39    0.003    0.000    0.027    0.001 'stage_storage.pyp':2359(DrawElement)
       42    0.003    0.000    0.041    0.001 'stage_storage.pyp':2984(DrawMsg)
       46    0.003    0.000    0.003    0.000 {method 'DrawBorder' of 'c4d.gui.GeUserArea' objects}
       42    0.002    0.000    0.002    0.000 {method 'OffScreenOn' of 'c4d.gui.GeUserArea' objects}
      150    0.001    0.000    0.001    0.000 'stage_storage.pyp':521(__init__)
       39    0.001    0.000    0.013    0.000 'stage_storage.pyp':2338(DrawThumbnail)
      120    0.001    0.000    0.001    0.000 'stage_storage.pyp':2251(PushClippingRegion)
      120    0.001    0.000    0.001    0.000 {method 'DrawSetPen' of 'c4d.gui.GeUserArea' objects}
      120    0.001    0.000    0.001    0.000 'stage_storage.pyp':2265(PopClippingRegion)
       78    0.001    0.000    0.002    0.000 'stage_storage.pyp':567(__copy__)
       42    0.001    0.000    0.001    0.000 {method 'DrawText' of 'c4d.gui.GeUserArea' objects}
      198    0.001    0.000    0.001    0.000 {method 'SetClippingRegion' of 'c4d.gui.GeUserArea' objects}
      750    0.000    0.000    0.000    0.000 'stage_storage.pyp':538(__iter__)
   [ ... and 40 more calls ]

Now that looks much better! Be careful though, in this trace, I triggered less redraws, thus a lower number of total calls. The total time is not comparable 1-to-1 with the previous stats.

The DrawBitmap() is still at the top of the list, but instead of 87% of the total time it only consumed 26%! Surprisingly the thumbnails still looked good (you could hardly see the difference). The enhanced scaling results make a difference with upscaling, but the thumbnails are usually downscaled.

And there I was, totally amazed that I found the one little flag in a 4k lines Plugin in a fraction of the time I would have needed with manual benchmarking!

comments powered by Disqus