Magika is a file type detection tool built on deep learning. It reaches over 99% average precision and recall and processes files quickly, even on a single CPU. It supports over 120 content types and is more accurate than traditional detection tools. A demo runs classification entirely in the user's browser, so files never leave the machine. Magika can be used from the command line by installing a Python package, and it can be integrated into Python or JavaScript code. A paper with details on training and dataset performance is planned.
Magika is a deep learning-based tool for detecting file content types with high accuracy.
Magika boasts over 99% precision and recall on its test dataset across 120+ content types.
The Magika project can be viewed on GitHub.
The Magika web demo classifies files entirely in the browser, so no files are uploaded.
You can install Magika as a Python package or integrate it with your JavaScript code.
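As a rough illustration of the Python route, the sketch below wraps a single detection call. It assumes the package was installed with `pip install magika` and that the `Magika` class exposes `identify_bytes`. The name of the attribute that holds the predicted label has differed between releases, so the helper (`detect_label`, a hypothetical name used only here) checks two likely candidates instead of relying on one. Treat it as a starting point to verify against the installed version, not as the library's documented usage.

```python
from typing import Optional


def detect_label(data: bytes) -> Optional[str]:
    """Return Magika's predicted content type label for `data`.

    Returns None when the magika package is not installed or when no
    label attribute is found on the result.
    """
    try:
        from magika import Magika  # assumption: installed via `pip install magika`
    except ImportError:
        return None

    # Creating the detector loads the model, so real code should reuse one instance.
    detector = Magika()
    result = detector.identify_bytes(data)

    # Assumption: the label lives on `result.output`, under a name that
    # varies by release ("label" or "ct_label").
    output = getattr(result, "output", None)
    label = getattr(output, "label", None) or getattr(output, "ct_label", None)
    return None if label is None else str(label)
```

Calling `detect_label(b"# Title\nSome markdown text")` returns a label string when the package is available. For many files, build the `Magika` instance once and reuse it, since loading the model is a one-time cost.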
Magika achieves high precision and recall across numerous file types, with precision often at 99% or higher.
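To make the per-type precision and recall figures concrete, here is a minimal sketch of how such metrics are usually computed from true and predicted labels and then averaged across types. The labels are toy data invented for illustration, not Magika results, and the unweighted (macro) average is an assumption about what "average" means here.

```python
from collections import Counter


def per_type_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this type, but it was wrong
            fn[truth] += 1  # missed a file of the true type
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of precision and recall over all content types."""
    n = len(scores)
    avg_precision = sum(p for p, _ in scores.values()) / n
    avg_recall = sum(r for _, r in scores.values()) / n
    return avg_precision, avg_recall


# Toy data: one png file is misclassified as zip.
y_true = ["pdf", "pdf", "png", "png", "zip", "zip"]
y_pred = ["pdf", "pdf", "png", "zip", "zip", "zip"]
scores = per_type_precision_recall(y_true, y_pred)
avg_precision, avg_recall = macro_average(scores)
```

In this toy case the single error lowers recall for `png` (one of two png files was missed) and precision for `zip` (one of three zip predictions was wrong), which shows why both metrics are reported per content type.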
Magika predicts only one content type per file, so it may handle polyglot files (files that are valid as more than one format) poorly.
Magika runs inference on its underlying model in about 5 to 6 ms, even on a single CPU.
Magika's team will release a detailed paper on the model's training and performance.
Magika is licensed under Apache-2.0.