There are many good reasons you might need to find all of the URLs on a website, but your exact goal will determine what you’re searching for. For instance, you might want to:
Identify every indexed URL to analyze issues like cannibalization or index bloat
Collect current and historic URLs Google has seen, especially for site migrations
Find all 404 URLs to recover from post-migration errors
In each scenario, a single tool won’t give you everything you need. Unfortunately, Google Search Console isn’t exhaustive, and a “site:example.com” search is limited and hard to extract data from.
In this post, I’ll walk you through some tools to build your URL list before deduplicating the data using a spreadsheet or Jupyter Notebook, depending on your website’s size.
Old sitemaps and crawl exports
If you’re looking for URLs that recently disappeared from the live site, there’s a chance someone on your team may have saved a sitemap file or a crawl export before the changes were made. If you haven’t already, check for these files; they can often provide what you need. But if you’re reading this, you probably didn’t get that lucky.
Archive.org
Archive.org is an invaluable tool for SEO tasks, funded by donations. If you search for a domain and select the “URLs” option, you can access up to 10,000 listed URLs.
However, there are a few limitations:
URL limit: You can only retrieve up to 10,000 URLs, which is insufficient for larger sites.
Quality: Many URLs may be malformed or reference resource files (e.g., images or scripts).
No export option: There isn’t a built-in way to export the list.
To bypass the lack of an export button, use a browser scraping plugin like Dataminer.io. However, these limitations mean Archive.org may not provide a complete solution for larger sites. Also, Archive.org doesn’t indicate whether Google indexed a URL, but if Archive.org saw it, there’s a good chance Google did, too.
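If you’d rather not depend on a scraping plugin, the Wayback Machine also exposes its index through the CDX API, which you can query directly. Here’s a minimal sketch in Python, assuming the requests library; example.com and the 10,000-row cap are placeholders to swap for your own domain and needs:

```python
import requests

# Query the Wayback Machine's CDX API for captured URLs on a domain.
# "collapse=urlkey" deduplicates repeat captures of the same URL, and
# "fl=original" returns just the original URL for each capture.
resp = requests.get(
    "https://web.archive.org/cdx/search/cdx",
    params={
        "url": "example.com/*",
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",
        "limit": 10000,
    },
    timeout=60,
)
rows = resp.json()
urls = [row[0] for row in rows[1:]]  # the first row is a header
print(len(urls), "URLs found")
```

You’ll still want to filter out malformed entries and resource files afterward, since the API returns everything Archive.org has captured.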
Moz Pro
While you would typically use a backlink index to find external sites linking to you, these tools also discover URLs on your own site in the process.
How to use it:
Export your inbound links in Moz Pro to get a quick and easy list of target URLs from your site. If you’re dealing with a massive website, consider using the Moz API to export data beyond what’s manageable in Excel or Google Sheets.
It’s important to note that Moz Pro doesn’t confirm whether URLs are indexed or discovered by Google. However, since most sites apply the same robots.txt rules to Moz’s bots as they do to Google’s, this method generally works well as a proxy for Googlebot’s discoverability.
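As a rough illustration of that API route, here’s one way the paging might look in Python. Treat the endpoint, auth scheme, request body, and response fields as assumptions drawn from Moz’s v2 links API, and verify them against the current Moz documentation before relying on this:

```python
import requests

# Hypothetical sketch of paging through link data via the Moz API.
# ACCESS_ID/SECRET_KEY, the request body, and the response fields are
# assumptions; confirm the exact schema in Moz's API docs.
ACCESS_ID = "your-access-id"
SECRET_KEY = "your-secret-key"

found = set()
body = {"target": "example.com", "limit": 50}
while True:
    resp = requests.post(
        "https://lsapi.seomoz.com/v2/links",
        auth=(ACCESS_ID, SECRET_KEY),
        json=body,
        timeout=60,
    )
    data = resp.json()
    for link in data.get("results", []):
        found.add(link.get("target"))  # the linked-to page on your site
    if not data.get("next_token"):
        break
    body["next_token"] = data["next_token"]

print(len(found), "target URLs discovered")
```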
Google Search Console
Google Search Console offers several valuable sources for building your list of URLs.
Links reports:
Similar to Moz Pro, the Links section provides exportable lists of target URLs. Unfortunately, these exports are capped at 1,000 URLs each. You can apply filters for specific pages, but since filters don’t apply to the export, you might need to rely on browser scraping tools, limited to 500 filtered URLs at a time. Not ideal.
Performance → Search Results:
This export gives you a list of pages receiving search impressions. While the export is limited, you can use the Google Search Console API for larger datasets. There are also free Google Sheets plugins that simplify pulling more extensive data.
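For reference, here’s a minimal sketch of paging through the Search Analytics endpoint with Google’s Python client (google-api-python-client). It assumes a service account JSON key that has been granted access to the property; the site URL and dates are placeholders:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Authenticate with a service account that has access to the property.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

pages, start_row = set(), 0
while True:
    response = service.searchanalytics().query(
        siteUrl="https://example.com/",
        body={
            "startDate": "2024-01-01",
            "endDate": "2024-03-31",
            "dimensions": ["page"],
            "rowLimit": 25000,  # the API maximum per request
            "startRow": start_row,
        },
    ).execute()
    rows = response.get("rows", [])
    pages.update(row["keys"][0] for row in rows)
    if len(rows) < 25000:  # a short page means we've reached the end
        break
    start_row += 25000

print(len(pages), "pages with impressions")
```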
Indexing → Pages report:
This section provides exports filtered by issue type, though these are also limited in scope.
Google Analytics
The Engagement → Pages and Screens default report in GA4 is an excellent source for collecting URLs, with a generous limit of 100,000 URLs.
Even better, you can apply filters to create different URL lists, effectively surpassing the 100k limit. For example, if you want to export only blog URLs, follow these steps (an API alternative is sketched after the note below):
Step 1: Add a segment to the report.
Step 2: Click “Create a new segment.”
Step 3: Define the segment with a narrower URL pattern, such as URLs containing /blog/.
Note: URLs found in Google Analytics may not be discoverable by Googlebot or indexed by Google, but they still provide valuable insights.
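When the UI export becomes unwieldy, the GA4 Data API can pull the same pagePath data programmatically, including the /blog/ filter from the steps above. Here’s a minimal sketch with Google’s google-analytics-data Python client; the property ID and dates are placeholders, and credentials are assumed to be set via GOOGLE_APPLICATION_CREDENTIALS:

```python
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Filter, FilterExpression, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()
request = RunReportRequest(
    property="properties/123456789",  # your GA4 property ID
    dimensions=[Dimension(name="pagePath")],
    metrics=[Metric(name="screenPageViews")],
    date_ranges=[DateRange(start_date="2024-01-01", end_date="2024-03-31")],
    # Mirror the segment above: only paths containing /blog/.
    dimension_filter=FilterExpression(
        filter=Filter(
            field_name="pagePath",
            string_filter=Filter.StringFilter(
                match_type=Filter.StringFilter.MatchType.CONTAINS,
                value="/blog/",
            ),
        )
    ),
    limit=100000,
)
response = client.run_report(request)
paths = [row.dimension_values[0].value for row in response.rows]
print(len(paths), "blog page paths")
```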
Server log files
Server or CDN log files are perhaps the ultimate tool at your disposal. These logs capture an exhaustive list of every URL path queried by users, Googlebot, or other bots during the recorded period. A minimal parsing sketch follows the considerations below.
Considerations:
Data size: Log files can be massive, so many sites only retain the last two weeks of data.
Complexity: Analyzing log files can be challenging, but many tools are available to simplify the process.
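As one example of how approachable the extraction itself can be, here’s a minimal Python sketch that pulls requested paths out of an access log in the common/combined format that Apache and Nginx use by default. The file name is a placeholder, and the regex will need adjusting for other CDN log layouts:

```python
import re
from collections import Counter

# Match the request line, e.g. "GET /some/path HTTP/1.1", and capture
# the path. Extend the method list if your logs include PUT, DELETE, etc.
LINE_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

paths = Counter()
with open("access.log", encoding="utf-8", errors="replace") as fh:
    for line in fh:
        match = LINE_RE.search(line)
        if match:
            paths[match.group(1)] += 1

# Print the 20 most requested paths as a sanity check.
for path, hits in paths.most_common(20):
    print(hits, path)
```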
Combine, and good luck
Once you’ve gathered URLs from all these sources, it’s time to combine them. If your site is small enough, use Excel or, for larger datasets, tools like Google Sheets or a Jupyter Notebook. Ensure all URLs are consistently formatted, then deduplicate the list.
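If you take the Jupyter Notebook route, pandas makes the combine-and-deduplicate step short. A minimal sketch, assuming each source was saved as a CSV with the URL in its first column (the file names are placeholders):

```python
import pandas as pd

# Placeholder exports from the sources covered above.
sources = ["archive_org.csv", "moz_links.csv", "gsc_pages.csv",
           "ga4_paths.csv", "log_paths.csv"]
urls = pd.concat(
    [pd.read_csv(name).iloc[:, 0] for name in sources],
    ignore_index=True,
).rename("url")

# Consistent formatting so near-duplicates collapse: trim whitespace,
# drop URL fragments, and strip trailing slashes. Note that GA4 and log
# exports contain paths rather than full URLs, so you may want to
# prefix your domain onto those first.
urls = (
    urls.astype(str).str.strip()
    .str.split("#").str[0]
    .str.rstrip("/")
)
urls.drop_duplicates().sort_values().to_csv("all_urls.csv", index=False)
```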
And voilà: you now have a comprehensive list of current, old, and archived URLs. Good luck!