Ordinary scrapers must be programmed separately for each website, to target its unique HTML structure or text content.
UXO uses machine learning to understand what each part of a page “is”. It then either assigns a semantic label to each section or converts the whole page into a simplified semantic XML document.
Scrapers that work on UXO-detected labels will work on all websites without modification.
For example, a flower shop's product listing page could be converted to:
<menu>
  <search-filters>
    <search-filter active="1">Indoors</search-filter>
    <search-filter active="0">Outdoors</search-filter>
    ...
  </search-filters>
  <categories>
    <category href="...">Flowers</category>
    <category href="...">Plants</category>
    <category href="...">Dried flowers</category>
    ...
  </categories>
</menu>
<products-list>
  <product id="29974">
    <product-title>Divine Garden</product-title>
    <product-description>Take a stroll through this Divine Garden filled with the scent of brandy orange Roses, Alstroemeria, purple Lisianthus, Stocks, and a sprinkle of pink Statice and Antirrhinum. What floral dreams are made of.</product-description>
    <product-price>$59.99</product-price>
  </product>
  <product id="85535">
    <product-title>Magic Moments</product-title>
    <product-price>$34.99</product-price>
  </product>
  ...
  <pagination>
    <page href="...">1</page>
    <page href="...">2</page>
    <page href="...">3</page>
    <page href="...">4</page>
  </pagination>
</products-list>
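To illustrate why a scraper written against semantic labels does not depend on any one site's HTML, here is a minimal Python sketch that reads a document shaped like the sample above using only the standard library. It is not UXO's own API: the `<page>` root wrapper, the `SAMPLE` string, and the `extract_products` and `active_filters` helpers are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical UXO-style output. The <page> root element is an assumption,
# added here only so the sample parses as a single XML document.
SAMPLE = """
<page>
  <menu>
    <search-filters>
      <search-filter active="1">Indoors</search-filter>
      <search-filter active="0">Outdoors</search-filter>
    </search-filters>
  </menu>
  <products-list>
    <product id="29974">
      <product-title>Divine Garden</product-title>
      <product-price>$59.99</product-price>
    </product>
    <product id="85535">
      <product-title>Magic Moments</product-title>
      <product-price>$34.99</product-price>
    </product>
  </products-list>
</page>
"""


def extract_products(xml_text):
    """Return one dict per <product>, wherever it appears in the document."""
    root = ET.fromstring(xml_text.strip())
    products = []
    for node in root.iter("product"):
        products.append({
            "id": node.get("id"),
            "title": node.findtext("product-title"),
            "price": node.findtext("product-price"),
            # findtext returns None when a product has no description.
            "description": node.findtext("product-description"),
        })
    return products


def active_filters(xml_text):
    """Return the text of every <search-filter> marked active="1"."""
    root = ET.fromstring(xml_text.strip())
    return [f.text for f in root.iter("search-filter") if f.get("active") == "1"]


if __name__ == "__main__":
    for product in extract_products(SAMPLE):
        print(product["id"], product["title"], product["price"])
    print(active_filters(SAMPLE))
```

Because the helpers look up elements by semantic tag name rather than by site-specific CSS selectors or XPath expressions, the same code should apply to any page converted into this vocabulary, provided the labels are assigned consistently.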
UXO interacts with websites the same way humans do: by using a web browser to view, click, drag, and type on the page. This lets UXO extract information from JavaScript-heavy sites and single-page applications, which are now common, and helps it avoid being detected or confused by anti-bot techniques.
UXO is in active development. Leave us your e-mail to be among the first to get access: