What methods should we implement to detect malicious content?

The right methods depend heavily on the acceptable false positive and false negative rates, on how many resources are available for training and for live detection, and on how much detection delay is acceptable. We need at least ballpark figures for these.

I ran into typosquatting of Python packages at a customer site (local PyPI) last year, and I’ve gone from regex and Soundex to building models to detect it, but I suspect something lightweight would be needed for real-time detection.
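
To give an idea of what "lightweight" could look like, here is a minimal sketch of a name-similarity check using only the standard library. It is not what I ran at the customer site. The `POPULAR` set, the `suspicious_matches` helper and the `max_distance` threshold are all made up for illustration; a real deployment would presumably load its own list of trusted package names and tune the threshold against the false positive rate it can tolerate.

```python
import re

# Hypothetical allowlist for illustration only; a real setup would load
# the trusted package names from the local index or a download ranking.
POPULAR = {"requests", "numpy", "pandas", "urllib3", "setuptools"}


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 does: lowercase, and collapse
    runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def suspicious_matches(candidate, known=POPULAR, max_distance=1):
    """Return the trusted names that `candidate` is close to but not equal to.

    An empty list means the name is either trusted itself or not close
    to any trusted name.
    """
    cand = normalize(candidate)
    known_norm = {normalize(k) for k in known}
    if cand in known_norm:
        return []
    return sorted(
        k for k in known_norm
        # Cheap length filter first: the distance is at least the length gap.
        if abs(len(k) - len(cand)) <= max_distance
        and edit_distance(cand, k) <= max_distance
    )
```

With these assumptions, `suspicious_matches("nunpy")` flags `numpy` and `suspicious_matches("requests")` returns an empty list. One caveat: plain Levenshtein counts a swap of two adjacent letters as two edits, so `reqeusts` only shows up with `max_distance=2`, and raising the threshold will also raise the false positive rate.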
