Cpg

ScubaTrace can read Joern cpg.bin files and expose them as an in-memory code property graph. The graph keeps Joern node labels and properties while adding Python helpers for common navigation tasks, such as finding methods, walking edges, resolving call targets, and locating the method that contains a source position.

Loading a Cpg

Use scubatrace.Cpg.load() when you already have a Joern FlatGraph file:

import scubatrace

cpg = scubatrace.Cpg.load("path/to/cpg.bin")

print(cpg.node_count)
print(cpg.edge_count)

When a project is created with a scubatrace.JoernConfig passed as joern_config, ScubaTrace also stores the loaded graph on the project as project.cpg:

import scubatrace

project = scubatrace.Project.create(
    "path/to/code",
    language=scubatrace.language.C,
    joern_config=scubatrace.JoernConfig(),
)

cpg = project.cpg

Querying Nodes

Nodes are keyed by (label, sequence) tuples. For ad-hoc exploration, use label-based helpers or Joern-style step names:

methods = cpg.nodes_by_label("METHOD")
calls = cpg.nodes_by_label("CALL")

# Dynamic CPG node steps are also available.
methods = cpg.method
calls = cpg.call

main = cpg.find_method("main")
matching = cpg.find_methods(".*Controller.*", regex=True)
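
To make the (label, sequence) keying concrete, the following is a minimal sketch that uses plain dictionaries instead of ScubaTrace objects. The `nodes` mapping, the `by_label` index, and the `names_matching` helper are hypothetical names invented for illustration, and the use of a full regex match is an assumption; none of this is ScubaTrace's actual implementation.

```python
import re
from collections import defaultdict

# Hypothetical stand-in data: (label, sequence) key -> property mapping.
nodes = {
    ("METHOD", 0): {"NAME": "main"},
    ("METHOD", 1): {"NAME": "UserController"},
    ("CALL", 0): {"NAME": "printf"},
}

# Group node keys by label so a label lookup does not scan every node.
by_label = defaultdict(list)
for label, seq in nodes:
    by_label[label].append((label, seq))


def names_matching(pattern):
    """Return the NAME of each METHOD node whose NAME fully matches the regex."""
    compiled = re.compile(pattern)
    return [
        nodes[key]["NAME"]
        for key in by_label["METHOD"]
        if compiled.fullmatch(nodes[key]["NAME"])
    ]


method_keys = by_label["METHOD"]
controller_names = names_matching(".*Controller.*")
```

Because each key carries its label, two nodes with the same sequence number but different labels, such as ("METHOD", 0) and ("CALL", 0), stay distinct.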

CPG properties can be read from the raw properties mapping, with scubatrace.CpgNode.get(), or through lower-case Python attributes for properties defined by the CPG schema:

method = cpg.find_method("main")
if method is not None:
    print(method["NAME"])
    print(method.get("FULL_NAME"))
    print(method.full_name)
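
The three access styles can coexist on one object. The sketch below shows one way to do that with a hypothetical `NodeSketch` class; it is not the real scubatrace.CpgNode. How the real class behaves for a missing property is not stated here, so the KeyError and AttributeError behavior is an assumption of this sketch.

```python
from typing import Any


class NodeSketch:
    """Hypothetical stand-in for a node whose properties use upper-case names."""

    def __init__(self, properties: dict[str, Any]) -> None:
        self.properties = properties

    def __getitem__(self, key: str) -> Any:
        # Index access: raises KeyError when the property is missing.
        return self.properties[key]

    def get(self, key: str, default: Any = None) -> Any:
        # get(): returns a default instead of raising.
        return self.properties.get(key, default)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        # Maps a lower-case attribute such as full_name to FULL_NAME.
        try:
            return self.__dict__["properties"][name.upper()]
        except KeyError:
            raise AttributeError(name) from None


node = NodeSketch({"NAME": "main", "FULL_NAME": "main:int()"})
by_index = node["NAME"]
by_get = node.get("SIGNATURE", "<none>")
by_attr = node.full_name
```

Reading `properties` through `self.__dict__` inside `__getattr__` avoids infinite recursion if the attribute has not been set yet.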

Call Relationships

For METHOD nodes, scubatrace.CpgNode.callers and scubatrace.CpgNode.callees return scubatrace.cpg.MethodCall objects. Each relationship contains the caller method, callee method, and the callsite node. Unresolved calls keep callee as None.

method = cpg.find_method("main")
if method is not None:
    for relation in method.callees:
        callee_name = relation.callee.full_name if relation.callee else "<unresolved>"
        location = relation.callsite_location
        print(callee_name, location.filename, location.line_number)
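
Since callee may be None, code that consumes these relationships usually has to split resolved from unresolved calls. The sketch below illustrates that with a hypothetical `CallSketch` dataclass that stores names and a line number rather than CPG nodes; the field names are illustrative and are not the fields of scubatrace.cpg.MethodCall.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallSketch:
    """Hypothetical stand-in for a caller/callee/callsite relationship."""

    caller: Optional[str]
    callee: Optional[str]  # None when the call target was not resolved
    callsite_line: int


calls = [
    CallSketch("main", "parse_args", 12),
    CallSketch("main", None, 15),
    CallSketch("main", "run", 18),
]

# Resolved calls contribute a callee name; unresolved ones are tracked by line.
resolved = [c.callee for c in calls if c.callee is not None]
unresolved_lines = [c.callsite_line for c in calls if c.callee is None]
```

Keeping the unresolved entries instead of dropping them preserves the callsite, which is often the only useful information for indirect or external calls.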

Source Locations

Use scubatrace.Cpg.methods_at() or scubatrace.Cpg.method_at() to map a source location back to the smallest matching CPG method:

method = cpg.method_at("src/main.c", 42)
exact = cpg.method_at("src/main.c", 42, column_number=8)

The returned nodes expose scubatrace.CpgNode.location, which normalizes file, line, column, and byte-offset fields into a scubatrace.SourceLocation object.
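
One plausible reading of "smallest matching method" is the enclosing method that spans the fewest lines, which matters when methods nest (for example a lambda inside a function). The sketch below shows that selection rule on hypothetical `MethodSpan` records; the `smallest_method_at` helper and its line-span tie-breaking are assumptions for illustration, not ScubaTrace's actual lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MethodSpan:
    """Hypothetical record of a method's file and inclusive line range."""

    name: str
    filename: str
    line_start: int
    line_end: int


def smallest_method_at(spans, filename, line) -> Optional[MethodSpan]:
    """Return the enclosing method with the fewest lines, or None."""
    candidates = [
        s
        for s in spans
        if s.filename == filename and s.line_start <= line <= s.line_end
    ]
    # min() with default=None handles the "no method here" case.
    return min(candidates, key=lambda s: s.line_end - s.line_start, default=None)


spans = [
    MethodSpan("handler", "src/main.c", 10, 60),
    MethodSpan("handler.<lambda>", "src/main.c", 40, 45),
]

inner = smallest_method_at(spans, "src/main.c", 42)
outer = smallest_method_at(spans, "src/main.c", 20)
missing = smallest_method_at(spans, "src/main.c", 99)
```

A column number, when supplied, would narrow the candidate set further in the same way before the smallest span is chosen.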

NetworkX Export

Use scubatrace.Cpg.to_networkx() to convert the Cpg into a networkx.MultiDiGraph. Node attributes include the Cpg label and all node properties. Edge attributes include the edge label and, when present, the edge property.

graph = cpg.to_networkx()

class scubatrace.Cpg(nodes: Mapping[tuple[str, int], CpgNode], edges: Iterable[CpgEdge], manifest: Mapping[str, Any] | None = None)

Bases: object

class scubatrace.CpgNode(id: 'NodeId', label: 'str', seq: 'int', properties: 'dict[str, Any]' = <factory>)

Bases: object

class scubatrace.CpgEdge(src: 'NodeId', dst: 'NodeId', label: 'str', property: 'Any' = None)

Bases: object

class scubatrace.cpg.MethodCall(caller: 'CpgNode | None', callee: 'CpgNode | None', callsite: 'CpgNode')

Bases: object

class scubatrace.SourceLocation(filename: 'str | None', line_number: 'int | None', column_number: 'int | None', line_number_end: 'int | None' = None, column_number_end: 'int | None' = None, offset: 'int | None' = None, offset_end: 'int | None' = None)

Bases: object

class scubatrace.cpg.FlatGraphReader(path: str | Path)

Bases: object

scubatrace.cpg.load(path: str | Path) → Cpg