This article describes a process for extracting vehicle detail page (VDP) data from a dealership’s website. The process is not limited to car dealer websites; it can be applied to any type of website.
Download Chrome (Browser)
If you’re already using Chrome, skip ahead and create an account at Kimono Labs. If not, download and install Chrome first.
Kimono Labs is a web data extraction service. A free account allows you to fetch data from up to 20,000,000 web pages. Sign up here.
Install the Kimono Labs Chrome extension. (Also free.)
After the installation, you will see the Kimono button to the right of the URL bar.
Extract VDP URLs
Using Chrome, go to your website and navigate to the page that lists the URLs of your VDPs, e.g. your “used inventory” page. Then click the Kimono extension button.
In the upper left-hand corner of your browser, you define your API’s “properties,” i.e. the discrete pieces of data you’re going to extract for each inventory item.
Follow the instructions in these videos to select an element that links to your VDPs. You can also find more detailed instructions on selecting data on the Kimono Labs website.
After you’ve selected the links from this page to your VDPs and set Kimono up to follow pagination, go to the Data Model View as demonstrated in the video.
Name the property “url”. When naming properties, I recommend using all lower case and using underscores in place of spaces (e.g. stock_number).
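If you have many labels to turn into property names, the naming convention above can be sketched in a few lines of Python. This is only an illustration of the convention, not part of Kimono; the helper name to_property_name is hypothetical.

```python
import re


def to_property_name(label: str) -> str:
    """Turn a human-readable label into a lowercase, underscore-separated name.

    Hypothetical helper that illustrates the naming convention only.
    """
    # Replace every run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", label.strip())
    # Drop underscores left at either end, then lowercase the result.
    return name.strip("_").lower()


if __name__ == "__main__":
    print(to_property_name("Stock Number"))
```

Running it on "Stock Number" yields the same form as the stock_number example above.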
Next, click on the “Advanced” link. Then click the Attributes link. Make sure only the “href” attribute is selected.
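To make it concrete what keeping only the “href” attribute means, here is a minimal Python sketch that pulls link targets out of a listing page using only the standard library. Kimono does this for you through its point-and-click interface; the sample markup, the vdp-link class name, and the dealer.example domain are all assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Made-up listing markup; a real inventory page will look different.
SAMPLE_LISTING = """
<ul>
  <li><a class="vdp-link" href="/used/1001">2012 Sedan</a></li>
  <li><a class="vdp-link" href="/used/1002">2014 Truck</a></li>
  <li><a href="/contact">Contact us</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element that carries a given class."""

    def __init__(self, base_url, css_class):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.css_class in classes:
            # Keep only the link target, resolved against the page URL.
            self.urls.append(urljoin(self.base_url, href))


def extract_vdp_urls(html, base_url, css_class="vdp-link"):
    """Return absolute URLs for the links that match css_class."""
    parser = LinkCollector(base_url, css_class)
    parser.feed(html)
    return parser.urls


if __name__ == "__main__":
    for url in extract_vdp_urls(SAMPLE_LISTING, "https://dealer.example/inventory"):
        print(url)
```

The link text (“2012 Sedan”) is discarded and only the URL survives, which is the effect of selecting just the “href” attribute.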
Note: If all the data you require is displayed on your inventory list pages, then you could simply continue adding properties for each of the pieces of data.
Save your API as demonstrated in the pagination video above. Call it something like “used_inventory_urls” or whatever name will help you identify it.
Create VDP Data Extractor
If some of the vehicle data you need is found only on your VDPs, the next step is to create an API to extract data from your VDPs.
Navigate to a VDP and, just as before, click the Kimono Labs Chrome extension button.
Select all of the pieces of data you need.
If you need images, you’ll want to edit the “Attributes” to include only the “src” attribute.
If you need the URL of the VDP, most websites include a rel="canonical" link element, a Facebook Open Graph tag, or some other element that includes the page’s URL. If you can’t extract the URL using the Kimono visual extraction method (i.e. the point and click method), you can extract it using CSS selectors and, if needed, regular expressions.
For example, to select the <link> element with the rel="canonical" attribute, use link[rel="canonical"]
Then, to extract only the href attribute from the element, click “Attributes” and click “Include Hidden Elements” (because the <link> element is a hidden element, i.e. it’s not displayed on the web page).
The “Attributes” window will close.
You may need to click “Submit” to fully apply your changes.
Click on “Attributes” again. You should now see more attributes.
Select only the “href” attribute and click “Apply.”
You can check to make sure the URL is being extracted properly by going to the “Raw Data View.”
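For readers who want to see what the canonical-link selection amounts to outside of Kimono, the following Python sketch finds the canonical <link> element and returns its href using only the standard library. The sample page and the dealer.example URL are made up for illustration, and the matching is slightly looser than the CSS selector above: it accepts rel values that contain “canonical” as one of several space-separated tokens.

```python
from html.parser import HTMLParser

# Made-up VDP markup; the URL is a placeholder.
SAMPLE_VDP = """
<html>
  <head>
    <title>2012 Sedan</title>
    <link rel="stylesheet" href="/site.css">
    <link rel="canonical" href="https://dealer.example/used/1001">
  </head>
  <body><h1>2012 Sedan</h1></body>
</html>
"""


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link> whose rel includes 'canonical'."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first match is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical_url(html):
    """Return the canonical URL declared in the page, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    print(find_canonical_url(SAMPLE_VDP))
```

The <link> element never appears on the rendered page, which is why Kimono needs “Include Hidden Elements” enabled before it offers the href attribute.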
Save and upload your API.
Configure Your Crawler
View your API at KimonoLabs.com. Follow the instructions in the video below to set up crawling. What you’re basically doing is telling Kimono, “Hey, go grab all the URLs that were retrieved from the first API I created, that is to say, the one that grabbed all the links to all my VDPs. For every one of those URLs, run this new API that grabs all the VDP data.”
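The chaining described above, where the output of the first API feeds the second, can be sketched in Python as follows. This is an illustration of the pattern rather than Kimono’s implementation: crawl_vdps and fake_extract are hypothetical names, the extractor is a stand-in that does no network access, and skipping duplicate URLs is an assumption added here.

```python
from typing import Callable, Dict, Iterable, List


def crawl_vdps(
    vdp_urls: Iterable[str],
    extract_vdp: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Run the VDP extractor once for each URL produced by the listing step."""
    results = []
    seen = set()
    for url in vdp_urls:
        # Assumption: a URL listed twice only needs to be processed once.
        if url in seen:
            continue
        seen.add(url)
        record = extract_vdp(url)
        # Keep the source URL with the data unless the extractor already set it.
        record.setdefault("url", url)
        results.append(record)
    return results


def fake_extract(url: str) -> Dict[str, str]:
    """Stand-in extractor: derives a stock number from the last URL segment."""
    return {"stock_number": url.rsplit("/", 1)[-1]}


if __name__ == "__main__":
    listing_output = [
        "https://dealer.example/used/1001",
        "https://dealer.example/used/1002",
    ]
    for row in crawl_vdps(listing_output, fake_extract):
        print(row)
```

Passing the extractor in as a function keeps the two stages separate, in the same way the two Kimono APIs are saved and configured separately.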