
Website footprinting

  • Hackers can map the target's entire website without being noticed
  • Gives information about:
    • Software
    • Operating system
    • Subdirectories
    • Contact information
    • Scripting platform
    • Query details

Web spiders

  • Programs designed to help in website footprinting
  • Methodically browse a website in search of specific information.
  • Information collected this way can help attackers perform social engineering attacks.
  • Reveals what software is running on the server and how it behaves
  • Can also identify the scripting platform in use (a minimal spidering sketch follows this list)
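A minimal spidering sketch using wget's spider mode, assuming testwebpage.com as the placeholder used later in these notes (the log and output file names are arbitrary):

```bash
# Walk the site two levels deep without downloading content, logging every request
wget --spider --recursive --level=2 --no-verbose --output-file=spider.log https://testwebpage.com

# Pull the discovered URLs out of the log to get a rough site map
grep -oE 'https?://[^ ]+' spider.log | sort -u > sitemap.txt
```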

Examining website headers

  • By examining the website headers, it is possible to obtain information about:
    • Content-Type
    • Accept-Ranges
    • Connection Status
    • Last-Modified Information
    • X-Powered-By Information
      • E.g. ZendServer 8.5.0, ASP.NET
    • Web Server Information
      • Server header can give you e.g. Apache Server on CentOS
  • You can also analyze what the website pulls in
    • In the developer tools of most browsers (Ctrl + Shift + C), under the Network tab
    • For each request you can see the remote IP address and the response headers for further analysis (see the curl example after this list)
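The same headers can be inspected from the command line with a HEAD request via curl (testwebpage.com is a placeholder):

```bash
# -s: silent, -I: send a HEAD request and print only the response headers
curl -sI https://testwebpage.com

# Headers worth noting in the output: Server, X-Powered-By,
# Content-Type, Last-Modified, Accept-Ranges, Connection
```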

Source code examination

Comment analysis

  • Possible to extract information from the comments
  • In most browsers you can right-click and view the source
  • Walkthrough
    • In almost any browser: Right click => Show source
    • Check for HTML (<!-- comment -->) and JavaScript (// comment) comments
    • They are skipped by interpreters and compilers; they exist only for human eyes
    • They can be instructions for other developers or notes developers leave for themselves
      • E.g. this library won't work as this element is not supported
        • Gives you clues about what technology (frameworks, languages) they use in the background
  • HTML links: e.g. href=cloudarchitecture.io
  • Gain insight into the file system structure
  • You can e.g. discover a caching server and then check for known vulnerabilities in that caching server (see the sketch below)
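A minimal sketch of pulling a page and grepping it for comments and link targets from the command line (testwebpage.com is a placeholder):

```bash
# Lines containing HTML comments, with line numbers
curl -s https://testwebpage.com | grep -n '<!--'

# All link and resource targets; paths hint at the directory structure and technology in use
curl -s https://testwebpage.com | grep -oE '(href|src)="[^"]*"' | sort -u
```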

Cloning websites

  • Also called website mirroring
  • Helps in
    • browsing the site offline
    • searching the website for vulnerabilities
    • discovering valuable information and metadata.
  • Sites can protect against it with detections based on e.g. page pull speed, behavior, known scrapers, or AI.
  • 💡 Good tool for setting up fake websites.
    • E.g. manually recreate login pages
    • If you control the DNS you can do a redirect.
  • You can also save social media pages this way; however, most are protected, and cloning them is illegal.
  • Website monitoring tools can send notifications on detected changes.
  • 💡 Protection against fake websites
    • Always check domain name for misspelling
    • Make sure it's HTTPS; if it's not, the data can be sniffed easily
      • Protects against someone taking over DNS
      • If the other party does not have a valid certificate, the browser does not accept the communication
    • Check the SSL certificate authority; if it changes, that should raise a question (see the openssl example after this list)
      • Certificates usually expire within a year.
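One way to check who issued a site's certificate and when it expires is openssl (testwebpage.com is a placeholder):

```bash
# Grab the server certificate and print its issuer (the certificate authority) and validity dates
echo | openssl s_client -connect testwebpage.com:443 -servername testwebpage.com 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates
```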

Website cloning tools

  • httrack
    • Run httrack https://testwebpage.com to copy the site
  • 📝 wget
    • Basic utility that can be used for mirroring a website (example invocations for both tools follow this list)
  • Or one could manually copy-paste the HTML + CSS source code
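Hedged example invocations for both tools, assuming testwebpage.com as above (the output directory is arbitrary):

```bash
# HTTrack: mirror the site into ./mirror
httrack "https://testwebpage.com" -O ./mirror

# wget: recursive mirror, rewriting links so the copy browses offline
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://testwebpage.com
```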

Extracting metadata

  • You can extract metadata of files (e.g. images) from a webpage
  • Metadata can include
    • Owner of the file
    • GPS coordinates (images)
    • File type metadata
      • 🤗 Linux does not rely on file extensions such as .pdf but checks the file's metadata (magic bytes) instead.
      • Helpful as you will not be fooled by the extension

Tools for extracting metadata

  • hexdump
    • Dump file as ASCII and inspect manually
    • E.g. hexdump -C TEST_DOCUMENT.docx
    • ❗ Not recommended as it's pretty hard to extract information from binary.
  • ExifTool
    • Reads + writes metadata of audio, video, PDF, docs etc.
    • E.g. exiftool TEST_DOCUMENT.docx would return something like Microsoft Office Word, Version: 16.0
  • 📝 Metagoofil | Google hacking tool
    • Uses Google to find a website's files that may contain metadata and dumps their metadata (a rough workflow sketch follows below)
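A rough workflow sketch combining the tools above: fetch likely metadata-rich files with wget, then dump their metadata with ExifTool (testwebpage.com and the ./loot directory are placeholders):

```bash
# Fetch documents and images linked one level deep from the site
wget -r -l 1 -A pdf,docx,jpg,png -P ./loot https://testwebpage.com

# Recursively dump metadata for everything collected
exiftool -r ./loot

# Images only: look specifically for embedded GPS coordinates
exiftool -r -gpslatitude -gpslongitude ./loot
```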