The plugin Package — Optimize
1. What costs and what doesn't
A surprising amount of plugin-package "performance work" is actually about avoiding repeated costs. The base operations have these rough costs:
| Operation | Typical cost | Dominated by |
|---|---|---|
| plugin.Open | 1–10 ms (small plugin), 100–500 ms (huge) | dlopen, type table dedup, init |
| (*Plugin).Lookup | 100 ns – 1 µs | Hash map probe in the plugin's symbol table |
| Type assertion of looked-up symbol | ~5 ns | Cache-warm interface conversion |
| Direct call through cached function value | ~5 ns | Function-pointer call, branch predictor friendly |
| Interface call through pluginapi.Plugin | ~5–10 ns | One indirection through itab |
The takeaways:
- Open is expensive; do it once per plugin per process.
- Lookup is cheap but not free; cache the result.
- Once you have the function value or interface, calls are at native Go speed.
2. Don't Open repeatedly
The most common performance mistake is treating plugin.Open like a regular file open.
// BAD: opens the plugin on every request
func handle(req *Request) {
    p, _ := plugin.Open("./filter.so")
    sym, _ := p.Lookup("Apply")
    apply := sym.(func(*Request))
    apply(req)
}
Even though plugin.Open is idempotent (the runtime caches the loaded plugin internally), each call still resolves the path and takes a global mutex before it finds the cached entry. Under load that shows up as lock contention and microseconds of overhead per request.
The fix is obvious but worth stating:
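// GOOD: open once at startup; cache the function
var apply func(*Request)
func init() {
    p, err := plugin.Open("./filter.so")
    if err != nil { log.Fatal(err) }
    sym, err := p.Lookup("Apply")
    if err != nil { log.Fatal(err) }
    // A checked assertion reports a signature mismatch instead of panicking.
    fn, ok := sym.(func(*Request))
    if !ok { log.Fatalf("Apply has unexpected type %T", sym) }
    apply = fn
}
func handle(req *Request) {
    apply(req)
}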
3. Cache symbol lookups
Every Lookup call probes a map keyed by the symbol name and hands back an untyped Symbol that still needs a type assertion. For a hot path, do both once.
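type loadedPlugin struct {
    name   string
    handle *plugin.Plugin
    apply  func(*Request) (*Response, error)
    flush  func() error
}
// load opens the plugin once and resolves every symbol the hot path needs.
func load(path string) (*loadedPlugin, error) {
    h, err := plugin.Open(path)
    if err != nil { return nil, err }
    applySym, err := h.Lookup("Apply")
    if err != nil { return nil, err }
    flushSym, err := h.Lookup("Flush")
    if err != nil { return nil, err }
    // Checked assertions turn a signature mismatch into an error, not a panic.
    apply, ok := applySym.(func(*Request) (*Response, error))
    if !ok { return nil, fmt.Errorf("%s: bad Apply signature", path) }
    flush, ok := flushSym.(func() error)
    if !ok { return nil, fmt.Errorf("%s: bad Flush signature", path) }
    return &loadedPlugin{
        name:   path, // assumption: the path doubles as a display name
        handle: h,
        apply:  apply,
        flush:  flush,
    }, nil
}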
After this, every call to p.apply(req) is a plain call through a cached function value, with no map lookup and no type assertion.
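A small generic helper can keep the load-time assertions checked without repeating the comma-ok boilerplate for each symbol. The sketch below is not part of the plugin package: symbolAs is a hypothetical name, and the values in main are stand-ins for what Lookup would return, so the example runs without a built .so.

```go
package main

import "fmt"

// symbolAs converts a looked-up symbol to the concrete type T and returns an
// error instead of panicking when the plugin exports something else.
// sym is declared as any, so a plugin.Symbol can be passed to it directly.
// (Hypothetical helper; the name and shape are assumptions.)
func symbolAs[T any](name string, sym any) (T, error) {
	v, ok := sym.(T)
	if !ok {
		var zero T
		return zero, fmt.Errorf("symbol %q has type %T, want %T", name, sym, zero)
	}
	return v, nil
}

func main() {
	// Stand-ins for values that (*plugin.Plugin).Lookup would return.
	var applySym any = func(n int) int { return n * 2 }
	var flushSym any = "not a function"

	apply, err := symbolAs[func(int) int]("Apply", applySym)
	if err != nil {
		panic(err)
	}
	fmt.Println("Apply resolved, result:", apply(21))

	// A mismatched signature surfaces as an ordinary error at load time.
	_, err = symbolAs[func() error]("Flush", flushSym)
	fmt.Println("Flush:", err)
}
```

The helper runs only at load time, so it adds nothing to the per-call cost; the hot path still goes through the cached function value.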
4. Prefer interface dispatch over many Lookups
A plugin with five exported functions can either be looked up symbol-by-symbol, or it can export a single constructor that returns an object implementing a shared interface. The interface form needs less work at load time and is clearer:
// plugin
type filter struct{}
func (filter) Apply(r *Request) (*Response, error) { ... }
func (filter) Flush() error { ... }
func (filter) Name() string { return "filter" }
func New() pluginapi.Plugin { return filter{} }
// host
sym, _ := h.Lookup("New")
plug := sym.(func() pluginapi.Plugin)()
plug.Apply(req) // single interface call, ~5–10 ns
That is one Lookup and one type assertion at load time, then one interface call per dispatch. It beats five lookups and five assertions.
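The snippets above use pluginapi.Plugin without showing it. A minimal sketch of what such a shared contract might look like follows; the interface, its method signatures, and fromSymbol are assumptions for illustration, and in a real project the interface would live in a package imported by both the host and every plugin.

```go
package main

import "fmt"

// Plugin is a hypothetical stand-in for pluginapi.Plugin.
type Plugin interface {
	Name() string
	Apply(in string) (string, error)
	Flush() error
}

// filter plays the role of the plugin-side implementation.
type filter struct{}

func (filter) Name() string { return "filter" }
func (filter) Apply(in string) (string, error) { return "[" + in + "]", nil }
func (filter) Flush() error { return nil }

// Compile-time check that filter satisfies the contract.
var _ Plugin = filter{}

// New is the single symbol the host would look up.
func New() Plugin { return filter{} }

// fromSymbol performs the one assertion the host needs at load time.
func fromSymbol(sym any) (Plugin, error) {
	newFn, ok := sym.(func() Plugin)
	if !ok {
		return nil, fmt.Errorf("New has type %T, want func() Plugin", sym)
	}
	return newFn(), nil
}

func main() {
	// any(New) stands in for the result of h.Lookup("New").
	p, err := fromSymbol(any(New))
	if err != nil {
		panic(err)
	}
	out, err := p.Apply("req")
	fmt.Println(p.Name(), out, err)
}
```

The compile-time check on the plugin side means a drifted method signature fails the plugin build rather than the host's load step.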
5. The call-overhead comparison
For perspective, here's the rough cost ladder of indirect calls in a Go program:
| Mechanism | Cost per call |
|---|---|
| Direct function call | ~1 ns |
| Cached function value (via Lookup) | ~5 ns |
| Interface method call | ~5–10 ns |
| reflect.Value.Call | ~500 ns – 1 µs |
| WASM call (wazero) | ~500 ns – 10 µs depending on shape |
| gRPC over UDS | ~50–100 µs |
| exec.Command round-trip | ~1 ms |
Plugin-package calls sit at the top — they are the fastest "dynamic" dispatch available in Go. That speed is the entire reason to put up with the package's drawbacks.
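The cheap end of the ladder can be checked on your own hardware without building a plugin. The rough sketch below times a call through a function value, an interface method call, and reflect.Value.Call inside one process; the helper names are made up for this example, and the absolute numbers will differ by machine and Go version.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

type adder interface{ Add(int) int }

type impl struct{}

//go:noinline
func (impl) Add(n int) int { return n + 1 }

//go:noinline
func add(n int) int { return n + 1 }

// Package-level variables keep the compiler from resolving the calls statically.
var (
	fnValue          = add
	ifaceValue adder = impl{}
	reflValue        = reflect.ValueOf(add)
	sink             int
)

// nsPerCall converts the elapsed time since start into nanoseconds per iteration.
func nsPerCall(start time.Time, n int) float64 {
	return float64(time.Since(start).Nanoseconds()) / float64(n)
}

func timeFuncValue(n int) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		sink = fnValue(i)
	}
	return nsPerCall(start, n)
}

func timeInterface(n int) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		sink = ifaceValue.Add(i)
	}
	return nsPerCall(start, n)
}

func timeReflect(n int) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		out := reflValue.Call([]reflect.Value{reflect.ValueOf(i)})
		sink = int(out[0].Int())
	}
	return nsPerCall(start, n)
}

func main() {
	const n = 1_000_000
	fmt.Printf("function value:     %.1f ns/call\n", timeFuncValue(n))
	fmt.Printf("interface method:   %.1f ns/call\n", timeInterface(n))
	fmt.Printf("reflect.Value.Call: %.1f ns/call\n", timeReflect(n))
}
```

The figures include loop overhead, so treat them as relative rather than absolute; for publishable numbers, a testing.B benchmark is the better tool.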
6. Init time matters
Plugin init runs during plugin.Open. If you have a hundred plugins, their init times add up directly in your startup latency.
| Bad init pattern | Fix |
|---|---|
| Compiling regexes from constants | Use sync.Once to defer until first use |
| Loading large lookup tables from disk | Lazy-load on first call |
| Connecting to a database | Move to an explicit Initialize method called after Open |
| Spawning goroutines | Don't; start them when the host requests it |
A clean plugin's init should be no more than registering a factory in a private registry. Everything else belongs in a method.
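As an illustration of the first row of the table, here is a minimal sketch of a plugin that defers regex compilation with sync.Once. The pattern, the Match function, and the compiles counter are invented for the example; the counter exists only to show that compilation happens once.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

var (
	tokenOnce sync.Once
	tokenRE   *regexp.Regexp
	compiles  int // counts compilations; for illustration only
)

// tokenPattern compiles the (hypothetical) pattern on first use instead of at
// package init, so plugin.Open does not pay for it.
func tokenPattern() *regexp.Regexp {
	tokenOnce.Do(func() {
		compiles++
		tokenRE = regexp.MustCompile(`^[a-z]+-[0-9]+$`)
	})
	return tokenRE
}

// Match is what the plugin would export.
func Match(s string) bool { return tokenPattern().MatchString(s) }

func main() {
	a := Match("job-42")
	b := Match("Job 42")
	fmt.Println(a, b, "compiled", compiles, "time(s)")
}
```

The cost moves from Open to the first Match call, which is the trade-off the lazy-loading section discusses for whole plugins.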
7. Lazy loading
If you have many plugins and only some get used per request, defer Open until first use.
type lazyPlugin struct {
    path string
    once sync.Once
    pl   pluginapi.Plugin
    err  error
}
func (l *lazyPlugin) get() (pluginapi.Plugin, error) {
    l.once.Do(func() {
        h, err := plugin.Open(l.path)
        if err != nil {
            l.err = err
            return
        }
        sym, err := h.Lookup("New")
        if err != nil {
            l.err = err
            return
        }
        newFn, ok := sym.(func() pluginapi.Plugin)
        if !ok {
            l.err = fmt.Errorf("%s: bad New signature", l.path)
            return
        }
        l.pl = newFn()
    })
    return l.pl, l.err
}
This trades startup latency for first-call latency on the rarely used plugins. Pair with a background warmer that loads everything within the first second to keep p99 predictable.
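One possible shape for such a warmer is sketched below. It is an assumption-laden illustration: lazy mirrors lazyPlugin above but takes its loader as a function so the example runs without a real .so, and warm and its workers parameter are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// lazy mirrors lazyPlugin, with the loader injected. In a host, load would
// wrap plugin.Open and Lookup; here it is a stand-in.
type lazy struct {
	once sync.Once
	load func() (string, error)
	val  string
	err  error
}

func (l *lazy) get() (string, error) {
	l.once.Do(func() { l.val, l.err = l.load() })
	return l.val, l.err
}

// warm loads every plugin in the background with at most `workers` loads in
// flight. The returned channel is closed once all loads have finished.
func warm(plugins []*lazy, workers int) <-chan struct{} {
	if workers < 1 {
		workers = 1
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		sem := make(chan struct{}, workers)
		var wg sync.WaitGroup
		for _, p := range plugins {
			sem <- struct{}{} // blocks while `workers` loads are already running
			wg.Add(1)
			go func(p *lazy) {
				defer wg.Done()
				defer func() { <-sem }()
				p.get() // errors stay cached in p and surface on the first real call
			}(p)
		}
		wg.Wait()
	}()
	return done
}

func main() {
	plugins := make([]*lazy, 3)
	for i := range plugins {
		name := fmt.Sprintf("plugin-%d", i)
		plugins[i] = &lazy{load: func() (string, error) { return name, nil }}
	}
	<-warm(plugins, 2) // a real host would serve traffic instead of waiting here
	v, err := plugins[0].get()
	fmt.Println(v, err)
}
```

Because get goes through sync.Once, a request that arrives while the warmer is still running either triggers the load itself or waits for the in-flight one; nothing is loaded twice.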
8. Avoid reflection across the boundary
Reflection inside a plugin works fine; reflection on a value crossing the boundary does not always work the way you'd expect.
// host
sym, _ := h.Lookup("Config")
t := reflect.TypeOf(sym).Elem()
// t is the type the plugin recorded for Config, not necessarily the host's Config!
cfg, ok := sym.(*Config) // ok is false when the two types were not unified
Because the plugin allocates the value behind the symbol, reflect.TypeOf reports the plugin's deduplicated type, which is usually but not always the same as the host's type. If hashes match, the type pointers were unified at Open and everything works. If they don't, the assertion to the host's *Config fails and reflection-based code paths will surprise you.
The safest pattern: never let the host introspect plugin values reflectively. Use interfaces with explicit methods.
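A sketch of that pattern: the plugin reports its own settings through a method, and the host depends only on the interface. Describer, Settings, and the Config fields are hypothetical names chosen for this example, and the pointer passed in main stands in for what Lookup would return.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Describer is a hypothetical contract shared by host and plugin.
type Describer interface {
	Settings() map[string]string
}

// Config plays the role of the value exported by the plugin.
type Config struct {
	Addr    string
	Retries int
}

// Settings lets the plugin describe itself, so the host never walks its
// fields with reflect.
func (c *Config) Settings() map[string]string {
	return map[string]string{
		"addr":    c.Addr,
		"retries": strconv.Itoa(c.Retries),
	}
}

// describe is the host side: it needs only the interface, never the concrete type.
func describe(sym any) ([]string, error) {
	d, ok := sym.(Describer)
	if !ok {
		return nil, fmt.Errorf("symbol of type %T does not implement Describer", sym)
	}
	var lines []string
	for k, v := range d.Settings() {
		lines = append(lines, k+"="+v)
	}
	sort.Strings(lines) // map iteration order is random; sort for stable output
	return lines, nil
}

func main() {
	// &Config{...} stands in for the pointer that h.Lookup("Config") would return.
	lines, err := describe(&Config{Addr: "localhost:9000", Retries: 3})
	fmt.Println(lines, err)
}
```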
9. Memory layout and false sharing
A plugin's globals sit in the plugin's own data segment, not in the host's. For a counter that both sides touch in a tight loop, this means its cache line does not collide with host hot data (good), but the counter may still share a cache line with a neighboring plugin global, which invites false sharing.
For high-throughput counters exported by plugins, pad them like you would any other concurrent counter:
package main
import "sync/atomic"
type padded struct {
    _ [64]byte
    v atomic.Int64
    _ [64]byte
}
var Counter padded
func Inc() { Counter.v.Add(1) }
Then in the host, look up Inc once and cache the function value, as in section 3. The padding survives the boundary because the type's memory layout is identical on both sides.
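The effect of the padding can be checked with unsafe.Sizeof and unsafe.Offsetof. The sketch below assumes a 64-byte cache line (some CPUs use 128) and models two padded counters laid out back to back; the neighbors type and gap function are made up for the illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// cacheLine is an assumption: 64 bytes is common on x86-64, but not universal.
const cacheLine = 64

type padded struct {
	_ [cacheLine]byte
	v atomic.Int64
	_ [cacheLine]byte
}

// neighbors models two padded counters placed next to each other in memory,
// as adjacent globals might be.
type neighbors struct {
	a, b padded
}

// gap returns the number of bytes between the end of a.v and the start of b.v.
func gap() uintptr {
	var n neighbors
	endA := unsafe.Offsetof(n.a) + unsafe.Offsetof(n.a.v) + unsafe.Sizeof(n.a.v)
	startB := unsafe.Offsetof(n.b) + unsafe.Offsetof(n.b.v)
	return startB - endA
}

func main() {
	var p padded
	fmt.Println("size of padded:", unsafe.Sizeof(p))
	fmt.Println("offset of v:", unsafe.Offsetof(p.v))
	fmt.Println("gap between neighboring counters:", gap())
}
```

As long as the gap is at least one cache line, the two counters cannot land on the same line regardless of where the struct starts.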
10. Benchmarking the plugin boundary
A quick benchmark to measure your specific case:
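package main_test
import (
    "plugin"
    "testing"
)
var apply func(int) int
// init loads the plugin once so the benchmark measures only the call itself.
func init() {
    p, err := plugin.Open("./filter.so")
    if err != nil { panic(err) }
    sym, err := p.Lookup("Apply")
    if err != nil { panic(err) }
    apply = sym.(func(int) int) // panics on a signature mismatch
}
func BenchmarkPluginCall(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = apply(i)
    }
}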
Compare with a direct function call of the same body. The difference, on a modern CPU, should be in the low single-digit nanoseconds. If you see microseconds, you've left a Lookup or Open in the hot path.
11. What not to optimize
Some things look like opportunities but aren't:
| Idea | Why not |
|---|---|
| Pool plugin handles to "reuse" them | plugin.Open already returns the same *Plugin for the same path; pooling is redundant |
| mmap the .so yourself | The dynamic loader already memory-maps it; you can't beat libc here |
| Call runtime.GC() after Open to "settle" | The runtime paces itself; forced GC just adds latency |
| Strip symbols from the plugin to "shrink" it | The runtime needs the symbol table for Lookup; stripping breaks the package |
| Precompile to a smaller plugin via -ldflags="-s -w" | Sometimes safe, but can break plugin symbol resolution in subtle ways; measure both ways |
The biggest optimization is structural: keep Open and Lookup out of the hot path.
12. Summary
The plugin package gives you native-speed dynamic dispatch if you cache Open and Lookup results outside the hot path. Symbol resolution and type assertion are not free — do them once at load time and reuse the cached function value or interface. Keep plugin init minimal so Open itself stays fast. The competitive advantage of this package is the ~5 ns call cost; protect it by avoiding repeated loading, reflection across the boundary, and excessive symbol churn.
Further reading
- runtime/pprof for profiling plugin calls: https://pkg.go.dev/runtime/pprof
- Go performance benchmark guide: https://github.com/golang/go/wiki/Performance
- Broader plugin survey: 08-plugins-dynamic-loading