
Your analytics sees 60% of traffic. The rest is invisible.

GA, Plausible, Fathom — they all miss AI crawlers, SEO bots, link preview fetchers, and automated scrapers, because those clients never run the JavaScript snippet those tools depend on.

Logwick reads your raw server logs and classifies every single request: real users, AI agents, social previews, SEO tools, security probes, and more.

No tracking snippet. No SaaS account. No data leaving your server.

View docs → · GitHub →

Open-source · AGPL-3.0 · Node.js 20+ · SQLite on disk · Zero external calls

Three types of traffic your dashboard will never show you

Logwick UI: session traffic explorer with summary metrics, sessions over time, and L1 traffic categories including AI, link preview, and SEO intel (screenshot from the open-source repo).
01

AI agents crawling your content

GPTBot, ClaudeBot, Perplexity, Common Crawl, Google Vertex, Amazon Nova, DuckAssistBot — they read your site for training, AI search, and citations. Your GA shows: 0 sessions. Because they do not run JavaScript.

02

Someone sharing your link in Slack

Every time your URL is posted in Slack, Telegram, WhatsApp, LinkedIn, Discord, or X, bots fetch the page to build a preview. Your GA shows: nothing. Server logs show who, when, and how often.

03

Ahrefs, Semrush, Screaming Frog on your site

SEO intel bots map your graph and audit structure continuously. Your GA shows: silence. Your logs show every crawl, timestamped.

Raw HTTP logs are the only place
to catch all of this.
Logwick does exactly that.

Everything Logwick can see — classified, not guessed

Logwick uses a multi-phase engine (UA patterns, path heuristics, transfer-size signals) and maps requests to a stable taxonomy: L1 category → L2 subcategory → traffic family. Below is what it detects.
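As a sketch of the first-match-wins idea, a multi-phase classifier can be a few ordered rule lists checked in sequence. The patterns and the {l1, l2} labels below are illustrative, not Logwick's actual registry:

```javascript
// Phase A: path rules run first, so a spoofed bot UA cannot hide a probe.
const pathRules = [
  { test: (p) => p.startsWith("/.env") || p.startsWith("/wp-admin"),
    label: { l1: "security", l2: "attack" } },
];

// Phase B: User-Agent rules, checked in a fixed order after Phase A.
const uaRules = [
  { test: (ua) => /GPTBot|ClaudeBot|PerplexityBot/i.test(ua),
    label: { l1: "ai", l2: "crawler" } },
  { test: (ua) => /Slackbot|TelegramBot|WhatsApp/i.test(ua),
    label: { l1: "link_preview", l2: "messaging" } },
];

function classify(req) {
  for (const r of pathRules) if (r.test(req.path)) return r.label;
  for (const r of uaRules) if (r.test(req.ua)) return r.label;
  return { l1: "unknown_client", l2: "unmatched" }; // nothing matched
}

// A GPTBot UA probing /.env is still security, not AI:
console.log(classify({ path: "/.env", ua: "GPTBot/1.0" }));
// → { l1: 'security', l2: 'attack' }
// A normal GPTBot page fetch classifies as AI:
console.log(classify({ path: "/blog/post", ua: "GPTBot/1.0" }));
// → { l1: 'ai', l2: 'crawler' }
```

The ordering of the lists is the precedence policy: earlier phases always win, which is what makes the output deterministic rather than probabilistic.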

AI traffic (L1: ai)

User-initiated fetches, AI search indexers, training crawlers, shopping assistants, answer engines, and cloud agents—classified from User-Agent and behaviour signals.

What | Examples | Why it matters
User-initiated AI fetches | GPT user browse, Perplexity user fetch, NotebookLM, Anthropic user flows | Someone asked an AI about you; it fetched your page to cite or summarize it.
AI search indexers | OpenAI SearchBot, PerplexityBot, AzureAI-SearchBot, Anthropic search | AI-native search engines building an index from your content.
AI training corpus crawlers | Common Crawl, Bytespider (ByteDance), AI2, OpenAI corpus patterns | Your pages may enter model training or dataset pipelines.
AI shopping assistants | Amazon Nova, Buy for Me, AMZN-SearchBot, AmazonBot-Video | Commerce AI checking product detail pages and media.
AI answer crawlers | DuckAssistBot | Answer engines pulling page text for responses.
Cloud AI agents | Google Vertex Agent and similar enterprise fetchers | Automated pipelines reading your site inside customer workflows.

SEO and rank intelligence (L1: seo_intel)

What | Examples | Why it matters
Link graph mappers | Ahrefs, Majestic, Barkrowler | Backlink databases—your site is part of competitive intel.
Rank auditors | Semrush, Screaming Frog, Sitebulb, OnCrawl, Lumar | Crawl depth and frequency reveal audits and competitor interest.
SERP data APIs | DataForSEO and similar brokers | Third parties reselling crawl-derived signals.
LLM visibility checkers | LLM-Discoverability-Checker, Amplitude AI Visibility Bot | Bots measuring how you appear to AI systems—not classic SEO alone.

Link previews and social sharing (L1: link_preview)

Most classic analytics never attribute unfurl traffic. Logwick separates social feeds, messengers, embed crawlers, and generic preview tools.

What | Examples | Why it matters
Social platform unfurlers | Facebook, LinkedIn, X/Twitter, Pinterest, Reddit, TikTok | Your URL was shared in a public feed—a real distribution signal.
Messenger previews | Slack, Telegram, WhatsApp, Discord, Skype | Private and channel shares generate repeatable preview fetches.
Embed and oEmbed crawlers | Embedly, iframely | Your content is being embedded elsewhere.
Generic link unfurlers | tool_linkpreview, URL preview services | Third-party apps validating links outside major platforms.

For content teams: A spike in Telegram preview bots on one article often means it spread in a private channel—something JavaScript analytics will not show.

Classic search engines (L1: search_index)

What | Examples
Global majors | Googlebot, Bingbot
Regional search | Yandex, Baidu, Naver, Petal (Huawei), Coccoc
Alt search | DuckDuckGo, Mojeek, Qwant, Seznam, Exabot

Security and attack traffic (L1: security)

What | Examples | Why it matters
Active probes and attacks | sqlmap, Nikto, Masscan, Burp Suite signatures | Detect when someone is actively probing your stack.
Env and secrets scanners | /.env, wp-admin, CMS exploit paths | Credential harvesting and CMS probes visible in paths.
Measurement services | Censys, Shodan, Qualys | Internet-wide scanning—different from targeted hostile probes.

Phase A: Even if an attacker spoofs a GPTBot User-Agent, path-first rules can still classify /.env-style probes as security/attack, not AI.

Infrastructure and monitoring (L1: infra_monitor)

Uptime monitors, CDN health checks (Cloudflare, Route53, Stackdriver), Atlassian Statuspage probes—split out from human and marketing bot traffic so they do not inflate bot counts.

Dev automation (L1: dev_automation)

curl, wget, Python requests, Go net/http, Node scripts, load generators, Lighthouse and PageSpeed audits, CMS cron jobs—classified so you can tell developer tooling from real users.

Feed readers (L1: feed_sync)

RSS aggregators and feed readers—who pulls your feeds and how often.

Archive bots (L1: archive)

Internet Archive / Wayback Machine and Heritrix-style crawlers.

Commerce crawlers (L1: commerce_crawl)

General Amazonbot catalog crawl—distinct from Amazon AI shopping assistants above.

Ads and marketing bots (L1: ads_marketing)

Google Ads (AdSense, Mediapartners, StoreBot), Microsoft AdIdxBot, Meta Ads crawlers, Google Read-Aloud, privacy-preserving prefetch—separate from organic users.

Unknown clients (L1: unknown_client)

Traffic that did not match any bot rule—real browsers, unrecognized apps, or automation without a known pattern. Keeping this bucket separate means human traffic stays identifiable rather than being mixed in with labeled bots.

Phase D — transfer size gate: Browser-like User-Agents with abnormally small bodies for a URL can be flagged as suspect light fetches. Baselines and guards reduce false positives on lightweight or cached responses.
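A minimal sketch of such a gate, assuming a per-URL baseline of typical body sizes; the thresholds, field names, and baseline values here are illustrative, not Logwick's actual configuration:

```javascript
// Hypothetical baseline: typical body bytes served for each URL.
const baselineBytes = new Map([
  ["/pricing", 48_000],
  ["/blog/launch", 95_000],
]);

const BROWSER_UA = /Mozilla\/5\.0/;
const SUSPECT_RATIO = 0.1;   // body under 10% of baseline looks like a headless probe
const MIN_BASELINE = 10_000; // guard: skip tiny or cached pages to limit false positives

function isSuspectLightFetch(req) {
  const baseline = baselineBytes.get(req.path);
  if (!baseline || baseline < MIN_BASELINE) return false; // no reliable baseline
  if (!BROWSER_UA.test(req.ua)) return false;             // gate only browser-like UAs
  return req.bytes < baseline * SUSPECT_RATIO;
}

console.log(isSuspectLightFetch({ path: "/pricing", ua: "Mozilla/5.0 ...", bytes: 900 }));
// → true
console.log(isSuspectLightFetch({ path: "/pricing", ua: "Mozilla/5.0 ...", bytes: 47000 }));
// → false
```

The guard clauses are the interesting part: without a minimum-baseline cutoff, a legitimately tiny page would flag every visitor.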

Drop your logs in. Get the full picture in seconds.

Input: your edge / CDN (Nginx, Caddy, Cloudflare, Fastly, any JSONL writer)
Phase A: path and method attack rules
Phase B: UA rules (AI → archive → crawler → other bot)
Phase C: meta-path policy (/robots.txt, /llms.txt, sitemap.xml)
Phase D: transfer-size gate (browser UA + tiny body → suspect)
Output: sessionize → SQLite → dashboard at 127.0.0.1:5173
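For illustration, a JSONL access-log record is simply one JSON object per line; the field names below are an assumption for the sketch, not a required schema:

```javascript
// One access-log record as a single JSONL line (field names hypothetical).
const line = JSON.stringify({
  ts: "2025-01-15T03:12:44Z",
  method: "GET",
  path: "/blog/launch",
  status: 200,
  bytes: 95120,
  ua: "GPTBot/1.0 (+https://openai.com/gptbot)",
  ip: "203.0.113.7",
});

// Consuming JSONL is just one JSON.parse per line.
const record = JSON.parse(line);
console.log(record.method, record.path, record.status);
// → GET /blog/launch 200
```

Any edge or log shipper that can emit one JSON object per line can feed a pipeline like this, which is why JSONL is a convenient lowest common denominator.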

Three commands to start

npm run process -- --config config/process.example.json \
--target-id demo \
--db data/analytics/http-analytics.db \
--input path/to/access.jsonl
npm run dashboard-api -- --db data/analytics/http-analytics.db --port 8787
npm run dashboard-ui:dev

Then open http://127.0.0.1:5173 for the dashboard UI. Full setup: repository docs/getting-started.md.

Who needs this — and what questions it answers

SEO and content teams

Is GPTBot indexing new articles? Which AI search engines cite you? Is Ahrefs daily or weekly? Did a post explode in Telegram channels?

Infra and SRE

Is someone scanning .env and wp-admin? Which monitors hit you? Is the 3 AM spike bots or users—without shipping logs to a vendor?

Marketing and growth

Which platforms unfurl your links? Are URLs shared inside Slack workspaces? Preview spikes are virality leads that GA never shows.

Indie hackers and small sites

Full traffic picture without GA, Plausible, or a client snippet. One SQLite file on a laptop—no account or billing.

SaaS and API products

Is pricing scraped every ten minutes? Is a competitor running Screaming Frog on your docs? HTTP facts, not pageview guesses.

How we think about this

Your logs never leave your machine

Processing is local. SQLite on disk. Dashboard on 127.0.0.1. Logwick has no telemetry, no update pings, and no cloud sync. You define retention, access, and deletion.

Rules you can read and extend

Classification is not a black box. The YAML registry is human-readable. Phases, thresholds, and path policies are documented and configurable, with codegen to compile rules.

First match wins, with explicit precedence

Phase A beats Phase B. A GPTBot User-Agent hitting /.env is classified as a security attack, not an AI crawler. The taxonomy is explicit, not probabilistic.

We document the limits

Some bots spoof browsers. Lightweight sites can skew Phase D baselines. The docs state blind spots so you know what to trust.

No client-side compromise

A tracking tag costs performance, creates GDPR surface area, and ties you to ad networks. If your edge writes JSONL, Logwick needs none of it.

Built to stay small, auditable, and dependency-light

Area | Stack
Runtime | Node.js 20+, npm workspaces, ES modules
Pipeline | JSONL → CLI process → SQLite (better-sqlite3)
Classification | YAML registry → generated rules, JSON Schema / Ajv validation
Geo enrichment | MaxMind GeoLite2 MMDB — optional, fully offline
Dashboard | Vite + React + read-only JSON API (Node http)
Testing | node --test, ESLint 9, codegen parity checks

No database server. No message queue. No cloud calls. The pipeline runs on a laptop or small VPS. Copy the SQLite file, back it up, or query it with any tool you already use.

Open source. Commercially licensed.

AGPL-3.0 — free for open-source and personal use. Commercial or closed-source use requires a commercial license.

Offering | What you get
Commercial license | Use without AGPL obligations inside your company or product.
Extended detection signatures | Broader bot and AI pattern library on a release cadence.
Integration and consulting | Pipeline design, JSONL adapters, rule authoring, training.
Custom support | Priority responses, bug fixes, and feature requests.

Contact for commercial license

Tell us how you want to use Logwick inside your product or organization. We will respond from Bratislava on business days.

Common questions

Does Logwick work with my server or CDN?

Yes. Anything that produces JSONL access logs works. Logwick does not ship logs for you—you point the CLI at a file on disk.

Does Logwick replace Google Analytics?

No—and it is not trying to. GA is for human behaviour, funnels, and conversions. Logwick answers who made each HTTP request, including traffic that never runs JavaScript. Many teams use both.

What about GDPR and personal data in logs?

HTTP logs can contain personal data such as IP addresses and User-Agents. Logwick processes them locally; nothing is sent to our servers. Retention, access control, and legal basis are your responsibility—consult your DPO for your setup.

Can Logwick show whether my content is being crawled for AI training?

Yes. The taxonomy includes families such as OpenAI training corpus, Anthropic training corpus, ByteDance Bytespider, and Common Crawl under the AI / vendor training corpus branch when those User-Agents and patterns match.

How do I spot a page going viral in private channels?

A spike in link_preview / messaging / telegram sessions on a specific URL versus your baseline is a strong signal. Compare time series and session lists in the local dashboard.
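One simple way to make "versus your baseline" concrete—a sketch, not a Logwick feature—is a mean-plus-k-standard-deviations check on daily preview-session counts for a URL:

```javascript
// Flag today's count if it exceeds mean + k standard deviations of history.
// The counts and k are illustrative.
function isSpike(dailyCounts, today, k = 3) {
  const mean = dailyCounts.reduce((a, b) => a + b, 0) / dailyCounts.length;
  const variance =
    dailyCounts.reduce((a, b) => a + (b - mean) ** 2, 0) / dailyCounts.length;
  return today > mean + k * Math.sqrt(variance);
}

// Hypothetical telegram link_preview sessions per day for one article:
const baseline = [2, 1, 3, 2, 2, 1, 3];
console.log(isSpike(baseline, 40)); // → true  (private-channel share likely)
console.log(isSpike(baseline, 3));  // → false (within normal variation)
```

With low, stable baselines like preview-bot traffic usually has, even a crude threshold like this separates routine unfurls from a genuine sharing burst.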

Can I add my own detection rules?

Yes. Edit the traffic family registry in the repository, run the documented codegen step for traffic rules, and the engine picks up the compiled patterns. See the project docs for the exact commands and file paths.

Deploy Logwick on your infrastructure

Need help with log adapters, classification rules, or a production rollout? Tell us what you run—we reply on business days.
