
Optimize Workflows End-to-End

Codeflash supports optimizing an entire Python script end to end by tracing the execution of the script and generating Replay Tests. These Replay Tests can be used to optimize all the functions called in the script.

Motivation for Tracing a Workflow

One of the hard problems with optimizing code is verifying correctness and performance gains. Codeflash verifies both by running the function under optimization against a set of test cases, which can come from your existing unit test suite or be generated by Codeflash. However, writing test cases for every function you want to optimize is tedious, and it is hard to come up with test cases for complex inputs or ones that cover all edge cases. In addition, such test cases may not be representative of how the function is used in the real world, so running them is not a reliable way to measure performance gains.

Codeflash Tracer solves these issues.

What is Codeflash Tracer?

Codeflash Tracer is a tool that traces the execution of your workflow and generates a set of test cases that are derived from how your code is actually run. Codeflash Tracer works by recording the inputs of your functions as they are called in your codebase. These inputs are then used to generate test cases that are representative of the real-world usage of your functions. We call these generated test cases "Replay Tests" because they replay the inputs that were recorded during the tracing phase.
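
As a rough illustration of the recording idea, the sketch below wraps a function in a decorator that stores a pickled copy of its arguments on every call. This is not how Codeflash Tracer is implemented; record_inputs, RECORDED_CALLS, and normalize are hypothetical names used only for this example.

```python
import functools
import pickle

# Hypothetical in-memory store; a real tracer would persist calls to a trace file.
RECORDED_CALLS = []


def record_inputs(func):
    """Record a pickled (args, kwargs) pair for every call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Pickle at call time so later mutation of the arguments
        # cannot change what was recorded.
        RECORDED_CALLS.append((func.__qualname__, pickle.dumps((args, kwargs))))
        return func(*args, **kwargs)

    return wrapper


@record_inputs
def normalize(values):
    total = sum(values)
    return [v / total for v in values]


# Running the "workflow" records the inputs it actually used.
normalize([1, 2, 1])
normalize([5, 5])
```

After the workflow has run, RECORDED_CALLS holds one entry per call, and each entry can be unpickled later to call the function again with the same arguments.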

Codeflash Optimizer can then use these Replay Tests to verify correctness and calculate accurate performance gains for the optimized functions. Using Replay Tests, Codeflash verifies that the optimized function produces the same output as the original function and measures its performance gain on real-world inputs. This way you can be sure that the optimized function does not change the behavior of the traced workflow and that it is faster than the original function.
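
The two checks described above can be sketched in a few lines, assuming a list of recorded inputs is already available. The functions and the recorded_inputs list are hypothetical stand-ins, and the timing approach (best of several timeit runs) is one simple choice, not necessarily what Codeflash does internally.

```python
import timeit


def original_sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def optimized_sum_squares(n):
    # Closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6


# Hypothetical inputs, standing in for values captured during tracing.
recorded_inputs = [0, 1, 10, 1000]


def behaves_identically(original, candidate, inputs):
    """Correctness check: same output on every recorded input."""
    return all(original(x) == candidate(x) for x in inputs)


def total_runtime(func, inputs, repeat=5):
    """Best-of-N time to run func over all recorded inputs."""
    return min(timeit.repeat(lambda: [func(x) for x in inputs], number=10, repeat=repeat))


is_equivalent = behaves_identically(original_sum_squares, optimized_sum_squares, recorded_inputs)
speedup = total_runtime(original_sum_squares, recorded_inputs) / total_runtime(
    optimized_sum_squares, recorded_inputs
)
```

A candidate would only be worth keeping if is_equivalent is True and speedup is comfortably above 1.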

Using Codeflash Tracer

Codeflash Tracer can be used in two ways:

  1. As a command line module -

    You can use Codeflash Tracer as a module when you run Python. If you run a Python script as follows -

    python path/to/your/file.py --your_options

    You can trace the execution of the script by running -

    python -m codeflash.tracer -o codeflash.trace path/to/your/file.py --your_options

    Adding -m codeflash.tracer -o codeflash.trace before your script path traces the execution of the script and saves the trace to a file called codeflash.trace. If your script itself runs as a module, you can trace it as follows -

    python -m codeflash.tracer -o codeflash.trace -m path.to.your.module --your_options

    More Options:

    • --max-function-count: The maximum number of calls to trace for a single function. Calls beyond this limit are not traced. Default is 100.
    • --tracer-timeout: The maximum time in seconds to trace the entire workflow. By default there is no timeout. This is useful when tracing very long workflows.
  2. As a Context Manager -

    You can also use Codeflash Tracer as a context manager in your codebase. You can wrap the code you want to trace in a with statement as follows -

    from codeflash.tracer import Tracer

    with Tracer(output="codeflash.trace"):
        # Your code here

    This is useful when you want to trace and optimize only a part of your executable rather than the entire script. The context manager is also a fallback for tracing code sections if running the tracer as a module fails.

    More Options:

    • disable: If set to True, the tracer will not trace the code. Default is False.
    • max_function_count: The maximum number of calls to trace for a single function. Calls beyond this limit are not traced. Default is 100.
    • timeout: The maximum time in seconds to trace the entire workflow. By default there is no timeout. This is useful when tracing very long workflows, so that you do not wait indefinitely.
    • output: The file to save the trace to. Default is codeflash.trace.
    • config_file_path: The path to the pyproject.toml file which stores the Codeflash config. This is auto-discovered by default.
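
To make the max_function_count and disable options more concrete, here is a toy context manager built on Python's sys.setprofile hook. It is only a sketch of how such options could behave. MiniTracer is a hypothetical class and is not Codeflash's Tracer, whose internals are not described here.

```python
import sys
from collections import Counter


class MiniTracer:
    """Toy tracing context manager. This is NOT Codeflash's Tracer class."""

    def __init__(self, max_function_count=100, disable=False):
        self.max_function_count = max_function_count
        self.disable = disable
        self.counts = Counter()  # calls seen per function name
        self.calls = []          # recorded (function name, arguments) pairs

    def _profile(self, frame, event, arg):
        if event != "call":
            return
        name = frame.f_code.co_name
        # Skip underscore-prefixed functions (including our own __exit__)
        # and synthetic frames such as <lambda> or <listcomp>.
        if name.startswith(("_", "<")):
            return
        self.counts[name] += 1
        if self.counts[name] <= self.max_function_count:
            # At the "call" event the frame's locals hold the arguments.
            self.calls.append((name, dict(frame.f_locals)))

    def __enter__(self):
        if not self.disable:
            sys.setprofile(self._profile)
        return self

    def __exit__(self, exc_type, exc, tb):
        if not self.disable:
            sys.setprofile(None)
        return False


def area(width, height):
    return width * height


# Only the first two calls to area() are recorded; the rest are just counted.
with MiniTracer(max_function_count=2) as tracer:
    for w in range(5):
        area(w, 3)

# With disable=True nothing is recorded at all.
with MiniTracer(disable=True) as disabled_tracer:
    area(1, 1)
```

The cap keeps the recording small when a hot function is called thousands of times, while still capturing a sample of its real inputs.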

Optimizing with Replay Tests

Once tracing is complete, the tracer generates a trace file and a Replay Test file. The path of the generated Replay Test is printed on the console. It is located in your tests directory and has a name like test_file_getting_traced__replay_test_0.py. The Replay Test file is a Python test file that, when run, calls the traced functions with the recorded inputs, i.e. replays them. Codeflash Optimizer can then use these Replay Tests to verify correctness and calculate the performance gains of the optimized functions.
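
Conceptually, a replay test is an ordinary test function that loads recorded arguments and calls the traced function with them. The sketch below shows that shape only. The actual file generated by Codeflash will look different, and slugify, RECORDED_ARGS, and test_slugify_replay are hypothetical names chosen for this example.

```python
import pickle


def slugify(title):
    """Function under test (a stand-in for one of your traced functions)."""
    return "-".join(title.lower().split())


# Stand-in for inputs loaded from a trace file: pickled (args, kwargs) pairs.
RECORDED_ARGS = [
    pickle.dumps((("Hello World",), {})),
    pickle.dumps((("  Replay   Tests ",), {})),
]


def test_slugify_replay():
    """Replay every recorded call against the current implementation."""
    results = []
    for blob in RECORDED_ARGS:
        args, kwargs = pickle.loads(blob)
        results.append(slugify(*args, **kwargs))
    return results
```

Running the same replay against the original and the optimized implementation yields two result lists that can be compared for equality.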

To optimize all the functions traced, you can run the following command -

codeflash --replay-test tests/test_file_getting_traced__replay_test_0.py

Codeflash auto-discovers all the functions that were traced and uses the Replay Tests to optimize them. It also discovers existing unit tests and generates more tests to get the best optimizations. Codeflash opens pull requests with the optimized functions as it finds them, which should speed up your end-to-end workflow!