Hey HN,
I’m the creator of SemanticsAV. This project has been a long time coming, and I’m thrilled (and terrified) to finally share it with you.
A few years ago, I was designing ML-based malware detectors for a security firm, hitting top scores in major AV tests. I then left the industry for a while to work in CV/NLP and watched AI advance at lightning speed.
Looking back, I was shocked that malware detection was still stuck in the past, fundamentally chained to the 1990s model of signature databases. Every vendor claims “AI-powered,” but for most, it’s just a thin layer on top of the same old signature game.
This isn’t just a tech problem; it’s an economic gate. The signature model means only those with massive data collection budgets can compete, which forces high prices. The result is that the entire Linux ecosystem, the backbone of the internet, has been stuck for decades with ClamAV, a respectable but aging project, as its only real general-purpose open-source option.
I consider this a structural failure, so I decided to build a solution from first principles.
My goal was to prove that a true end-to-end AI approach could replace signatures entirely, slash maintenance costs, and deliver top-tier performance without harvesting user data.
This is SemanticsAV:
– AI-Native, Signature-Free: We replaced the slow, expensive, and fallible work of human signature creation with a single, end-to-end AI. It learns directly from raw binary architecture to discover its own brutally effective patterns, achieving a level of speed, accuracy, and economic efficiency that human-guided systems simply cannot match.
– Free for Linux, Forever: The scanner is perpetually free for all commercial uses on Linux, requiring only attribution. To maintain top-tier performance against emerging threats, we periodically release ultra-lightweight AI models (typically <5MB per file type). These updates are downloaded on demand by the open-source CLI, so the core engine stays 100% offline during scans.
– Trust Through Verifiable Architecture: The core engine (SDK) is a closed-source binary, but it is architecturally incapable of networking. This isn’t a claim you have to trust; it’s a fact you can verify. Run it behind a firewall or under any network monitor, and you will see zero outbound connections from the SDK. All network activity is handled exclusively by the MIT-licensed open-source CLI, which you can audit line by line.
– Privacy by Design (Offline-First, Online-Optional): The free scanner is 100% offline by design. For deeper threat attribution, you can choose to enable our paid Cloud Intelligence service. Even then, we don’t want your files. The SDK extracts a tiny (~15KB) encrypted “architectural fingerprint,” and the open-source CLI then transmits it for analysis. This fingerprint is a one-way transformation; the original file is never sent and cannot be reconstructed from it. This service exists to solve the AI’s black box problem by showing you the evidence behind a verdict.
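If you want to check the no-network claim yourself, one option on Linux is to run a scan under strace and look for connect() calls to IP sockets. The sketch below is only an illustration and assumes strace is installed; the helper names (outbound_connects, trace_command) are made up for this example, and the scan command you pass in is whatever invocation you normally use, since no specific CLI syntax is assumed here. It only catches connect() calls, so a full network monitor remains the stronger check.

```python
import os
import re
import subprocess
import tempfile

# Matches strace lines such as:
#   connect(3, {sa_family=AF_INET, sin_port=htons(443), ...}, 16) = 0
# Local AF_UNIX sockets are deliberately ignored.
_INET_CONNECT = re.compile(r"\bconnect\(\d+, \{sa_family=AF_INET6?\b")


def outbound_connects(strace_log):
    """Return the strace lines that show a connect() to an IPv4/IPv6 socket."""
    return [
        line.strip()
        for line in strace_log.splitlines()
        if _INET_CONNECT.search(line)
    ]


def trace_command(argv):
    """Run argv under strace (following child processes) and return any
    IP-level connect() calls it made. Requires strace on PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "trace.log")
        subprocess.run(
            ["strace", "-f", "-e", "trace=connect", "-o", log_path, *argv],
            check=False,
        )
        with open(log_path, encoding="utf-8", errors="replace") as fh:
            return outbound_connects(fh.read())


# Usage (the command is a placeholder; substitute your own scan invocation):
#   hits = trace_command(["<your-scan-command>", "<path-to-file>"])
#   print("IP connects seen:", hits)
```

An empty list for an offline scan is consistent with the claim above. Note that this traces the whole process, CLI included, so a model update or a Cloud Intelligence lookup would be expected to show connections.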
Current Status & The Ask:
The platform currently supports PE and ELF files, with more formats on the roadmap. My goal is for SemanticsAV to become the standard, foundational malware scanner for the entire Linux ecosystem, integrated into other great open-source security tools.
But here’s the honest truth: I’m an engine developer, not an open-source maintainer. I’ve spent years obsessed with the core tech, but I’m a novice at building a community. I’m sure the integration experience has rough edges, the CLI could be better, and the documentation has holes.
This is where I need your help. I’m looking for your brutal, honest feedback. Tell me what’s broken, what’s confusing, and what’s missing. I’m here to learn.
Thank you for your time.
Website: https://www.semanticsav.ai/
GitHub: https://github.com/metaforensics-ai/semantics-av-cli
Comments URL: https://news.ycombinator.com/item?id=45822352
Source: github.com
