Add allowlist for filtering processes interface #1009

@korniltsev-grafanista

One of the problems we face at Grafana is profiling cluster-wide under a memory limit. Say we set the limit to 400 MiB and on average the profiler consumes 350 MiB. Some nodes run ClickHouse or Oracle databases, and those binaries are huge. Processing such a binary takes seconds of CPU time and hundreds of MiB of memory (both heap and eBPF maps), which most of the time leads to an OOM kill. This leaves us with either an unnecessarily high memory cgroup limit or a profiler that is constantly OOM-killed on some nodes.

func TestExtractStackDeltasFromFilename(t *testing.T) {
	elf, err := pfelf.Open("/home/korniltsev/Downloads/clickhouse-common-static-25.11.2.24/usr/bin/clickhouse")
	if err != nil {
		t.Fatal(err)
	}
	defer elf.Close()
	var data sdtypes.IntervalData
	t1 := time.Now()
	if err := extractFile(elf, nil, &data); err != nil {
		t.Fatal(err)
	}
	// Size of the extracted stack deltas in MiB.
	fmt.Println(len(data.Deltas) * int(unsafe.Sizeof(data.Deltas[0])) / 1024 / 1024) // 128
	fmt.Println(time.Since(t1)) // 2.276861659s
}

The other problem is short-lived processes, which exit by the time we configure profiling for them, or even before.
Reported here: grafana#37 (comment)

What would be the solution to deal with these cases?

While I agree that the configuration proposed in #955 is not flexible and cannot support many possible setups, I do believe it is possible to develop a better interface that lets users avoid wasting resources (at least for users of the profiler as a library, but for the collector as well).

The cons I see in process filtering:

  • It is not idiomatic to the collector's processor architecture.
  • It may complicate the profiler logic a bit, and therefore its maintenance.
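
To make the discussion concrete, here is a minimal sketch of what such a filtering interface could look like. Everything in it is hypothetical and not part of the profiler today: the names `ProcessInfo`, `ProcessFilter`, `ExeAllowlist` and `NewExeAllowlist` are made up for illustration, and matching on the executable path with `filepath.Match` globs is only one possible policy. The idea is that the profiler would consult the filter before opening the ELF file, so a rejected process never pays the stack delta extraction cost.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ProcessInfo is a hypothetical, minimal description of a process that is
// cheap to obtain before any ELF parsing happens.
type ProcessInfo struct {
	PID int
	Exe string // absolute path of the executable
}

// ProcessFilter is a hypothetical interface. It is an assumption made for
// this sketch, not an existing profiler API.
type ProcessFilter interface {
	// Allow reports whether the process should be profiled.
	Allow(p ProcessInfo) bool
}

// ExeAllowlist allows a process if its executable path matches at least
// one glob pattern.
type ExeAllowlist struct {
	patterns []string
}

// NewExeAllowlist validates the patterns up front so that Allow never has
// to deal with a malformed pattern.
func NewExeAllowlist(patterns ...string) (*ExeAllowlist, error) {
	for _, pat := range patterns {
		if _, err := filepath.Match(pat, ""); err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", pat, err)
		}
	}
	return &ExeAllowlist{patterns: patterns}, nil
}

// Allow implements ProcessFilter.
func (a *ExeAllowlist) Allow(p ProcessInfo) bool {
	for _, pat := range a.patterns {
		if ok, _ := filepath.Match(pat, p.Exe); ok {
			return true
		}
	}
	return false
}

func main() {
	var filter ProcessFilter
	filter, err := NewExeAllowlist("/usr/bin/nginx", "/opt/app/*")
	if err != nil {
		panic(err)
	}
	procs := []ProcessInfo{
		{PID: 101, Exe: "/usr/bin/nginx"},
		{PID: 102, Exe: "/usr/bin/clickhouse"},
		{PID: 103, Exe: "/opt/app/server"},
	}
	for _, p := range procs {
		fmt.Printf("pid=%d exe=%s allowed=%v\n", p.PID, p.Exe, filter.Allow(p))
	}
}
```

With this shape, a library user could plug in any policy (cgroup, container label, binary size) behind the same one-method interface, while the collector could expose only a simple allowlist in its configuration.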
