Basic audit setup

As a bare minimum, these are the fields that you need to complete:

Set up the target website to be audited:
Enter domain name, making sure to include the http / https protocol.
The more pages you audit on the site, the more useful insight you will get.
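
If you want to check a target before submitting it, one approach is to confirm that it parses with an explicit http or https protocol. The sketch below uses Python's standard urllib.parse module. The function name has_http_scheme is hypothetical and is not part of the audit tool.

```python
from urllib.parse import urlparse


def has_http_scheme(target: str) -> bool:
    """Return True if target has an http/https scheme and a host name."""
    parts = urlparse(target.strip())
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A bare domain such as example.com is rejected here because, without the protocol, it is parsed as a path rather than a host.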

Advanced audit setup

Advanced options that give you greater control of the audit or data scrape that you are setting up:

Advanced options
Should the audit be restricted to a particular language? Pulls the language from the html tag e.g. <html lang="en">
i.e. the number of links away from the initial target page
Influences the order in which pages are crawled. Note: depth limit cannot be set for random crawl.
Should the crawler obey the rules in the target site's robots.txt? You need a good reason not to have this checked.
i.e. links that have rel="nofollow" set on the <a> element
Assumes that the XML Sitemap exists at <domain.tld>/sitemap.xml. Note: full audits only.
Extract links from <frame> elements that are discovered in the markup.
Extract links from <iframe> elements that are discovered in the markup.
Extract links from <area> elements that are discovered in image maps in the markup.
Pings all discovered external links to test that they work (i.e. return an HTTP 200 status code).
Pings all discovered image src paths to test that they exist, and to determine other meta information.
If running a report against a previously audited site, should the tool regenerate the images?
Run images through the AWS Rekognition service.
Run text through the AWS Comprehend service.
If the target URL starts with a particular path, e.g. /en/, restrict the audit to that path by adding 'en'. Use a comma-separated list for multiple terms.
Limit the length of the URL to a certain number of characters. Maximum 2000 characters.
i.e. ignore paths that begin with domain.com/xyx. Use a comma-separated list for multiple terms.
i.e. ignore paths that contain the string at any point in the whole URL. Use a comma-separated list for multiple terms.
i.e. everything after the start of the query ("?")
Accept cookies from the site.
Some sites have illegal characters (non-SGML characters) that can cause problems with an audit.
Set to off by default as it takes ages to calculate.
Set to off by default as it takes ages to calculate.
If the audit encounters any HTTP authentication challenges, use these credentials.
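
To see how the path restriction, URL length limit, ignore lists and query string options above could combine, here is a minimal Python sketch. It is not the tool's actual logic: the names split_terms, strip_query and should_crawl are hypothetical, and matching the restriction on whole path segments (so 'en' does not match /english/) is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_URL_LENGTH = 2000  # upper bound quoted in the option text


def split_terms(value: str) -> list[str]:
    """Turn a comma-separated option value into a list of trimmed terms."""
    return [term.strip() for term in value.split(",") if term.strip()]


def strip_query(url: str) -> str:
    """Drop everything from the '?' onwards (query string and fragment)."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))


def should_crawl(url: str, restrict_to: str = "", ignore_prefixes: str = "",
                 ignore_containing: str = "",
                 max_length: int = MAX_URL_LENGTH) -> bool:
    """Apply the hypothetical filter rules to a single URL."""
    if len(url) > max_length:
        return False
    path = urlsplit(url).path.lstrip("/")
    # Restrict to paths such as /en/ (assumed: whole-segment match).
    allowed = split_terms(restrict_to)
    if allowed and not any(
        (path + "/").startswith(term.strip("/") + "/") for term in allowed
    ):
        return False
    # Ignore paths that begin with any listed term.
    if any(path.startswith(term.strip("/")) for term in split_terms(ignore_prefixes)):
        return False
    # Ignore URLs that contain any listed term anywhere.
    if any(term in url for term in split_terms(ignore_containing)):
        return False
    return True
```

For example, should_crawl("https://example.com/en/about", restrict_to="en, fr") passes the restriction, while the same call with /english/about does not.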

SEO / Link report required?

Do you require a report on the site linking structure and SEO issues?

Link structure and SEO analysis
Heading type, and content
Image src, target, alt
strong, em, b, i
object, svg, canvas
Script src
Stylesheet href, type, rel
Href, title, rel, value, target
Looks for any email addresses on the site.
title, value, markup
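
As an illustration of the email address check above, the following Python sketch scans a page's markup with a deliberately simple regular expression. The pattern and the function name find_emails are assumptions made for this example, not the tool's own matcher, and the pattern will miss some valid addresses.

```python
import re

# Simplified pattern (an assumption): local part, "@", domain, dot, TLD.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(markup: str) -> list[str]:
    """Return the unique email-like strings found in markup, sorted."""
    return sorted(set(EMAIL_PATTERN.findall(markup)))
```

Addresses that appear both in a mailto: link and in the visible text are reported once, because the matches are de-duplicated before sorting.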

Capture Data?

Use CSS or XPath expressions to locate the data on a web page. FirePath, a Firefox extension that bolts on to Firebug, helps you determine the XPath location of a piece of content.
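
To get a feel for what an XPath expression selects, you can try it against a small sample first. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath and needs well-formed markup, so it is a rough stand-in for the audit tool's own evaluation. The sample page and the function name extract_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for this illustration.
SAMPLE_PAGE = """
<html>
  <body>
    <div id="product">
      <h1>Blue widget</h1>
      <span class="price">19.99</span>
    </div>
  </body>
</html>
"""


def extract_text(markup: str, xpath: str) -> list[str]:
    """Return the trimmed text of every element matched by xpath."""
    root = ET.fromstring(markup.strip())
    return [(element.text or "").strip() for element in root.findall(xpath)]


prices = extract_text(SAMPLE_PAGE, ".//span[@class='price']")
titles = extract_text(SAMPLE_PAGE, ".//div[@id='product']/h1")
```

The equivalent CSS selectors for these two expressions would be span.price and div#product > h1.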

Scrape data using XPath / CSS expressions
Define data extract points.

Set all to CSS | XPath

i.e. only collect data where the path starts with domain.com/xyx

When you are ready...

Before you hit "Begin Audit", ensure that:

  • You aren't carrying out the audit against anything that you aren't meant to
  • You have set an appropriate number of pages to audit so that you get meaningful output
  • You have configured any advanced options to fine-tune what your audit will report on