Lesson 7 of 9

DOM Scraping, The Safe Way

Sometimes the value is in none of the good places — not the dataLayer, not the URL, not a cookie. It is only printed on the page. Then you scrape the DOM. This is a legitimate tool, but it is the most fragile one you have: it breaks the moment a developer changes the markup, an A/B test swaps the layout, or the page renders in another language. Treat it as a last resort and make it as robust as you can.

Choose stable selectors

The order of preference, most stable first:

data-* attribute (data-plan) → id (#order-total) → semantic class (.price) → nth-child (brittle)

A data-* attribute put there for tracking is gold — it exists to be read and rarely changes. A positional selector like "the third div" is the opposite: it works today and silently breaks on the next redesign.
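The preference order can be sketched as a small helper that tries selectors from most to least stable and returns the first element it finds. This is only an illustration: the name pickFirst and the selectors in ORDER_TOTAL_SELECTORS are hypothetical, and root is anything with a querySelector method, normally document.

```javascript
// Hypothetical helper: try each selector in order and return the first
// matching element, or null when nothing matches.
function pickFirst(root, selectors) {
  for (var i = 0; i < selectors.length; i++) {
    var el = root.querySelector(selectors[i]);
    if (el) return el;
  }
  return null;
}

// Example selectors only (assumed markup), listed most stable first.
var ORDER_TOTAL_SELECTORS = [
  '[data-order-total]', // data-* attribute added for tracking
  '#order-total',       // id
  '.order-total'        // semantic class
  // positional selectors such as :nth-child are deliberately left out
];
```

In a variable you would call pickFirst(document, ORDER_TOTAL_SELECTORS) and still guard against a null result before reading the text.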

Read defensively, then normalise

Assume the element might be missing and the text might be messy. Never let a scrape throw — a broken variable can take other tags down with it. Return undefined and move on.

function () {
  var el = document.querySelector('[data-plan-price]');
  if (!el) return undefined;                         // guard: element may be gone
  var raw = el.textContent || '';                    // "  $1,499.00 "
  // Keeps digits and dots only, so this assumes "." is the decimal separator;
  // a value such as "1.499,00" needs locale-aware handling instead.
  var num = parseFloat(raw.replace(/[^0-9.]/g, '')); // 1499
  return isNaN(num) ? undefined : num;               // never return NaN
}
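The guard above covers a missing element, but "never throw" can also be enforced as a blanket rule. One possible sketch is a wrapper that runs any scrape function and turns exceptions and unusable results into undefined. The name safely is hypothetical, and treating null, the empty string, and NaN as "no value" is an assumption you may want to adjust.

```javascript
// Hypothetical wrapper: run a scrape function and never let it throw.
function safely(fn) {
  try {
    var value = fn();
    // Treat null, empty string and NaN as "no value"; keep 0 and false.
    if (value === null || value === '' ||
        (typeof value === 'number' && isNaN(value))) {
      return undefined;
    }
    return value;
  } catch (e) {
    // Any error inside the scrape ends here instead of propagating.
    return undefined;
  }
}
```

A scrape then becomes safely(function () { return document.querySelector('[data-plan-price]').textContent; }), which yields undefined instead of an error when the element is gone.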

Then test it in every state that matters: logged in and out, on sale and not, empty cart and full, and in any other locale the site serves. A scrape that only works for the happy path is a slow-motion data quality bug.
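The locale case deserves its own check, because stripping everything except digits and dots assumes a "." decimal separator. Below is a rough sketch of a more tolerant normaliser plus a few sample strings to run it against. The name parsePrice is hypothetical, and the rule it uses is a heuristic rather than real locale parsing: the last "." or "," counts as the decimal mark only when one or two digits follow it. Currencies with three decimal places would need a different rule.

```javascript
// Hypothetical normaliser: heuristic, not a full locale-aware parser.
function parsePrice(raw) {
  // Keep only digits and the two common separators.
  var s = String(raw == null ? '' : raw).replace(/[^0-9.,]/g, '');
  if (!/[0-9]/.test(s)) return undefined;          // no digits at all
  var lastSep = Math.max(s.lastIndexOf('.'), s.lastIndexOf(','));
  var digitsAfter = lastSep === -1 ? 0 : s.length - lastSep - 1;
  var intPart = s;
  var fracPart = '';
  if (lastSep !== -1 && (digitsAfter === 1 || digitsAfter === 2)) {
    // Last separator followed by 1-2 digits: treat it as the decimal mark.
    intPart = s.slice(0, lastSep);
    fracPart = s.slice(lastSep + 1);
  }
  // Any separator left in the integer part is a thousands separator.
  intPart = intPart.replace(/[.,]/g, '');
  var num = parseFloat((intPart || '0') + (fracPart ? '.' + fracPart : ''));
  return isNaN(num) ? undefined : num;
}

// Sample inputs covering a few states and locales.
var SAMPLES = ['  $1,499.00 ', '1.499,00 €', '1 499,50 kr', '$1,499', 'Free', ''];
var RESULTS = SAMPLES.map(parsePrice);
// RESULTS is [1499, 1499, 1499.5, 1499, undefined, undefined]
```

Collecting real strings from each state of the page into a list like SAMPLES gives you a quick regression check whenever the markup or the normaliser changes.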

Key takeaway

Scrape only when the value lives nowhere better. Prefer data-* attributes over ids over classes over positions, guard against missing elements, normalise the text to a clean value, never throw, and test across states and locales.