
Why Your Malaysian Website Might Be Blocking AI Crawlers Without You Knowing

Many Malaysian business websites are accidentally blocking ChatGPT, Perplexity, and Google AI from reading their content — and the owners have no idea. Here's how to check and fix it.

Founder & GEO Consultant at SeenBy Digital, helping Malaysian businesses get recommended by ChatGPT, Perplexity, and Google AI Overviews.

There’s a scenario that comes up regularly when SeenBy Digital audits Malaysian business websites.

The business has decent reviews. Their Google Business Profile is reasonably complete. Their website looks professional. But AI tools like ChatGPT and Perplexity either don’t know they exist or describe them vaguely and inaccurately.

In many of these cases, the root cause is the same: the website is blocking AI crawlers. Not deliberately — the business owner has no idea it’s happening. But the effect is the same as putting a “Do Not Enter” sign on your front door and then wondering why no customers are coming in.

This post explains how AI crawler blocking happens, how to check if it’s affecting your website, and exactly how to fix it.


How AI Tools Read Your Website

Before understanding how blocking happens, it helps to understand how AI tools access websites in the first place.

AI platforms like ChatGPT (OpenAI), Perplexity, Google AI Overviews (Google), and Claude (Anthropic) each send automated bots — called crawlers or spiders — to read websites across the internet. These bots request your web pages just like a browser would, read the text content, and feed that information back to the AI system.

Each AI company operates its own crawler with a distinct name:

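AI Platform | Crawler name / robots.txt token
OpenAI / ChatGPT | GPTBot
Perplexity | PerplexityBot
Google Search and AI Overviews | Googlebot
Google Gemini (AI training and grounding) | Google-Extended (a robots.txt control token, not a separate crawler)
Anthropic / Claude | ClaudeBot (anthropic-ai also appears in older robots.txt files)
Common Crawl (used by many AI systems) | CCBot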

For any of these crawlers to read your website, two things must be true: they must be allowed to access it, and they must be able to parse the content when they get there.

Both of these can go wrong — often without the website owner realising it.


The Main Ways Malaysian Websites Block AI Crawlers

1. A restrictive robots.txt file

robots.txt is a plain text file at the root of your website, accessible at yourdomain.com/robots.txt, that tells crawlers what they can and cannot access. The format is standardised as the Robots Exclusion Protocol (RFC 9309), and Google Search Central documents how Google’s crawlers interpret the file and which directives they respect.

It was originally designed for search engine crawlers. Many website templates, security plugins, and developers include robots.txt configurations that block certain crawlers by default. If your robots.txt was set up years ago, it almost certainly doesn’t account for AI crawlers that didn’t exist at the time.

The most common blocking pattern looks like this:

User-agent: *
Disallow: /

This tells every crawler, including all AI bots, that nothing on your website may be accessed. It’s the most extreme form of blocking: any crawler that honours robots.txt will read nothing on your site.

A slightly less aggressive but still problematic version:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

This specifically blocks OpenAI’s crawler and Common Crawl’s crawler, whose public dataset is used as training data by many AI systems.

Both of these are common configurations that Malaysian businesses have on their websites right now — often added by a developer or plugin years ago and never reviewed.
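To see how the two patterns above differ in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It evaluates each pattern for three crawler names; the helper name `allowed` is ours, and real crawlers may interpret edge cases slightly differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Pattern 1: everything is disallowed for every crawler.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Pattern 2: only the named crawlers are disallowed.
BLOCK_SOME = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

for bot in ("GPTBot", "CCBot", "PerplexityBot"):
    print(f"{bot}: block-all={allowed(BLOCK_ALL, bot)}, block-some={allowed(BLOCK_SOME, bot)}")
```

Under the first pattern every crawler is refused. Under the second, only GPTBot and CCBot are refused, while a crawler that is not named (PerplexityBot here) is still allowed.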

2. Cloudflare and security plugin settings

Many Malaysian websites use Cloudflare for performance and security — and Cloudflare has bot protection settings that can block AI crawlers. If your Cloudflare is configured to block “bad bots” or uses an aggressive security level, it may be treating AI crawlers as threats and blocking them before they ever reach your website.

Similarly, security plugins on WordPress sites — like Wordfence or iThemes Security — sometimes have settings that block unknown or non-whitelisted bots. AI crawlers, being newer, often fall into this category.

The website owner sees no sign of this. The security tools are doing exactly what they were configured to do. The AI crawlers just can’t get in.

3. JavaScript-heavy websites that AI can’t parse

This is a subtler form of blocking — not an access issue but a readability issue.

Some websites are built almost entirely in JavaScript. The page loads in a browser through client-side rendering — meaning the content only appears after JavaScript executes. AI crawlers often don’t wait for JavaScript to execute. They request the page, receive a near-empty HTML shell, and conclude there’s almost no content on your site.

This is increasingly common with websites built using modern JavaScript frameworks. The site looks beautiful in a browser. To an AI crawler, it’s essentially blank.

4. Noindex meta tags on key pages

The noindex meta tag tells crawlers not to include a page in their index. It was designed to keep certain pages (like admin pages or duplicate content) out of Google’s search index.

But if noindex has been applied broadly (to your homepage, service pages, or about page), crawlers that respect this tag will leave those pages out of their index. Your most important content never surfaces.

5. Login walls and access restrictions

Some Malaysian business websites require a login to view certain content — membership sites, client portals, or restricted sections. AI crawlers cannot log in. Any content behind a login wall is invisible to AI.

This is usually intentional for restricted content. The problem arises when key business information — service descriptions, pricing, team profiles — is accidentally placed behind a login requirement, or when a staging environment is accidentally indexed instead of the live site.


How to Check if Your Website Is Blocking AI Crawlers

Step 1: Check your robots.txt

Visit yourdomain.com/robots.txt in your browser. You’ll see a plain text file.

Look for any of the following:

Complete block (everything disallowed):

User-agent: *
Disallow: /

Specific AI crawler blocks:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

If you see any of these, your website is blocking AI crawlers.

A correctly configured robots.txt that allows AI crawlers looks like this:

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: CCBot
Allow: /
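If you would rather not read the file by eye, the check can be scripted. The sketch below is one possible approach using only Python's standard library; the function names `audit_robots` and `fetch_robots` and the `AI_AGENTS` list are our own choices, and the list should be adjusted to the crawlers you care about.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this article; extend or trim as needed.
AI_AGENTS = ("GPTBot", "PerplexityBot", "Google-Extended",
             "ClaudeBot", "anthropic-ai", "CCBot")


def audit_robots(robots_txt: str, agents=AI_AGENTS, path: str = "/") -> dict:
    """Map each crawler token to True (allowed) or False (blocked) for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


def fetch_robots(domain: str, timeout: float = 10.0) -> str:
    """Download https://<domain>/robots.txt and return it as text."""
    with urlopen(f"https://{domain}/robots.txt", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline example: everyone is allowed except GPTBot.
SAMPLE = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

for agent, ok in audit_robots(SAMPLE).items():
    print(f"{agent:16} {'allowed' if ok else 'BLOCKED'}")
```

To audit a live site, pass the result of `fetch_robots("yourdomain.com")` to `audit_robots`. Any token reported as blocked is one to review with your developer.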

Step 2: Check your Cloudflare settings

If your website runs through Cloudflare, log in to your Cloudflare dashboard and check:

  • Security level — if set to “High” or “I’m Under Attack,” it may be blocking legitimate bots including AI crawlers
  • Bot Fight Mode — if enabled, check whether it’s blocking crawlers you want to allow
  • Firewall rules — look for any rules that block bots by user agent

If you’re not sure how to read these settings, ask your web developer to review them with AI crawler access in mind.
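A rough way to spot user-agent-based blocking from outside the dashboard is to request the same page twice, once with a browser-style User-Agent and once with a crawler-style one, and compare the HTTP status codes. The sketch below assumes simplified stand-in User-Agent strings (not the exact strings the real crawlers send), and the names `fetch_status` and `interpret` are ours. Treat the result as a hint only: some security tools verify crawlers by IP address, so a request from your own machine may be treated differently from the real bot.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")

# Simplified stand-ins containing each crawler's token (assumption, not the real strings).
BOT_UAS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0):
    """Return the HTTP status for `url`, or None if the site cannot be reached."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:  # 4xx/5xx responses still carry a status code
        return error.code
    except OSError:             # DNS failure, refused connection, timeout
        return None


def interpret(browser_status, bot_status) -> str:
    """Turn a pair of status codes into a short verdict."""
    if browser_status is None or bot_status is None:
        return "unreachable"
    if browser_status < 400 and bot_status in (403, 429, 503):
        return "likely blocked"
    if browser_status == bot_status:
        return "same response"
    return "different response"
```

A typical use is to call `fetch_status(url, BROWSER_UA)` once, then `fetch_status(url, ua)` for each entry in `BOT_UAS`, and pass each pair to `interpret`. A "likely blocked" verdict is a good reason to review firewall rules and bot settings.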

Step 3: Test how AI sees your page

A practical way to check how an AI crawler sees your website is to disable JavaScript in your browser and then visit your homepage. What you see without JavaScript is roughly what many AI crawlers see.

If your page appears blank or contains almost no text, your site has a JavaScript rendering issue that AI crawlers may be unable to work around.

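You can also use Google Search Console’s URL Inspection tool to see how Googlebot renders your page. Keep in mind that Googlebot does execute JavaScript, so this shows how Google sees the page rather than how other AI crawlers do. A page that renders correctly there can still look blank to a crawler that reads only the raw HTML.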
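The "disable JavaScript" test can also be approximated in code: take the raw HTML the server returns and extract only the text that is present without running any scripts. The sketch below uses Python's built-in `html.parser`; the class and function names are ours, the sample pages are hypothetical, and it deliberately ignores the page title so that only body copy is counted.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that exists in the raw HTML, ignoring scripts, styles and the title."""

    SKIPPED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html: str) -> str:
    """Return the text a non-JavaScript crawler could read from `html`."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A client-side-rendered shell: the content only appears after app.js runs.
SHELL = ('<html><head><title>Contoh Aircond</title><script src="/app.js"></script>'
         '</head><body><div id="root"></div></body></html>')

# The same business with its copy present in the HTML itself.
RENDERED = ('<html><body><h1>Contoh Aircond</h1>'
            '<p>We service air conditioners in Petaling Jaya.</p></body></html>')

print(len(static_text(SHELL).split()), "words in the JavaScript shell")
print(len(static_text(RENDERED).split()), "words in the static page")
```

If your own homepage's raw HTML yields only a handful of words, that suggests the content depends on JavaScript to appear.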

Step 4: Check for noindex tags

View the source code of your homepage (right-click in your browser, select “View Page Source”) and search for:

noindex

If you find <meta name="robots" content="noindex"> on your homepage or key service pages, those pages are being excluded from crawler indexes — including AI ones.
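Searching the source for the word "noindex" can produce false alarms, because the word may appear in a script or in ordinary text. A slightly more reliable check is to look at the robots meta tag itself. The sketch below is one way to do that with Python's built-in `html.parser`; the names `RobotsMetaFinder` and `has_noindex` are ours.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() == "robots":
            content = attributes.get("content", "")
            self.directives += [part.strip().lower()
                                for part in content.split(",") if part.strip()]


def has_noindex(html: str) -> bool:
    """True if a robots meta tag asks crawlers not to index the page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in finder.directives or "none" in finder.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
print(has_noindex('<head><meta name="robots" content="index, follow"></head>'))
```

Run it against the saved source of your homepage and each key service page; any page where it reports True is one to review.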


How to Fix It

Fix 1: Update your robots.txt

Edit your robots.txt file to explicitly allow AI crawlers. If you’re not sure how to edit it, ask your web developer — it’s a 10-minute task.

Replace any blocking rules for AI crawlers with allow rules. At minimum, allow: GPTBot, PerplexityBot, Google-Extended, ClaudeBot (plus the older anthropic-ai token), and CCBot.

Fix 2: Review your Cloudflare and security plugin settings

If you use Cloudflare, review your bot protection settings and whitelist major AI crawler user agents. The specific crawlers to allow are listed above.

If you use a WordPress security plugin, check its bot blocking settings and ensure AI crawlers are not caught in your block rules.

Fix 3: Add an llms.txt file

Even with proper crawler access, an llms.txt file gives AI tools a clear, structured summary of your website, so they do not have to rely on parsing complex page layouts. It is a proposed convention rather than a formal standard, and not every AI tool reads it yet, but as we covered in our llms.txt guide, it is one of the fastest GEO improvements available and takes less than an hour to implement.
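As a rough illustration, the sketch below generates a small llms.txt, assuming the layout described in the public llms.txt proposal: a title line, a short blockquote summary, and sections of links. The function name `build_llms_txt`, the business details, and the example.com URLs are all hypothetical placeholders.

```python
def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Build llms.txt text from a name, a one-line summary, and {heading: [(title, url, note)]}."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical business details, for illustration only.
text = build_llms_txt(
    name="Contoh Aircond Services",
    summary="Air conditioner servicing and repair in Petaling Jaya, Malaysia.",
    sections={
        "Key pages": [
            ("Services", "https://example.com/services", "What we repair and service"),
            ("Contact", "https://example.com/contact", "Phone, address and opening hours"),
        ],
    },
)
print(text)
```

The output would be saved as llms.txt at the root of the site, alongside robots.txt.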

Fix 4: Address JavaScript rendering issues

If your website is JavaScript-heavy and AI crawlers can’t read it, the solutions range from simple to complex:

  • Ensure your most important content (business description, services, contact details) is present in the static HTML — not loaded dynamically
  • Consider server-side rendering for key pages
  • At minimum, make sure your meta description, page title, and schema markup are in the static HTML and not dependent on JavaScript to render
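The idea behind the points above is that the server sends finished HTML containing your key facts, rather than an empty shell that JavaScript fills in later. The sketch below shows that idea in miniature with plain Python; the function name `render_static_page`, the page layout, and the business details are illustrative assumptions, and a real site would use its framework's own server-side rendering or templating instead.

```python
import html
import json


def render_static_page(name: str, description: str, services: list, phone: str) -> str:
    """Return complete HTML with title, meta description, schema markup and body copy."""
    schema = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "description": description,
        "telephone": phone,
    }
    # Escape "<" so the JSON cannot close the <script> tag early.
    schema_json = json.dumps(schema, ensure_ascii=False).replace("<", "\\u003c")
    items = "\n".join(f"      <li>{html.escape(service)}</li>" for service in services)
    return f"""<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>{html.escape(name)}</title>
    <meta name="description" content="{html.escape(description)}">
    <script type="application/ld+json">{schema_json}</script>
  </head>
  <body>
    <h1>{html.escape(name)}</h1>
    <p>{html.escape(description)}</p>
    <ul>
{items}
    </ul>
    <p>Call us: {html.escape(phone)}</p>
  </body>
</html>
"""


# Hypothetical details, for illustration only.
page = render_static_page(
    name="Contoh & Co Aircond",
    description="Air conditioner servicing in Petaling Jaya.",
    services=["Chemical wash", "Gas top-up"],
    phone="+60 3-0000 0000",
)
print(page)
```

Because every fact is already in the markup, a crawler that never runs JavaScript still reads the business name, description, services, and contact number.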

Fix 5: Remove noindex from key pages

If you’ve found noindex tags on pages that should be publicly indexed (your homepage, service pages, about page), remove them. This is a quick code change, and those pages become eligible for indexing again the next time crawlers visit them.


A Quick Priority Order

If you’ve just checked your website and found blocking issues, fix them in this order:

  1. robots.txt first — it’s the front door. If it’s locked, nothing else matters.
  2. Cloudflare / security settings second — common cause of silent blocking.
  3. noindex tags third — easy to find and remove.
  4. JavaScript rendering last — more involved, but important if your site is heavily JavaScript-dependent.

After fixing these, add an llms.txt file to give AI crawlers the clearest possible understanding of your business once they can access your site.


Crawler access is one of the five dimensions SeenBy Digital checks in every GEO audit — and one of the most common issues we find on Malaysian business websites. If you’re not sure whether your site has a blocking problem, the easiest way to find out is to get a proper audit.

Get your free GEO audit from SeenBy Digital →

We’ll check your crawler access, your schema markup, your content, and your brand authority — and tell you exactly what’s holding your AI search visibility back.
