The Sniffie extension 1.0.0 preview lets you define XPath selectors manually for table refining mode and dynamic serialization. This article gives tips on writing good XPath selectors for those cases. Sniffie tries to choose good selectors automatically during initial configuration, but the results are not always right. In those cases you can give Sniffie your own XPath selector, and Sniffie will then optimize its extraction procedures based on it.
XPath cheat sheet
- If the site uses unique ids, selecting by id often gives good results. The following selects the div with id “app” as the base:
//div[@id='app']
- If several elements are similar, you can also specify the element type; use * as a wildcard for any type. The following selects the button with id “paginate”:
//button[@id='paginate']
- Ids are not the only useful selector; you can also select by class. The contains criterion is not limited to classes and works for ids as well. The following selects links (a tags) whose class contains “test”:
//a[contains(@class, 'test')]
- The previous example also matches every link whose class attribute merely contains the text “test”, so elements with classes such as “supertest” or “testmara” are matched too. To match exactly the class “test”, pad the class attribute with spaces using the concat function, as below. The normalize-space function can also help here (not covered in this article).
//a[contains(concat(' ', @class, ' '), ' test ')]
- Sometimes it’s useful to select elements by their text, using contains() with text(). To match an exact phrase, use concat as in the previous example. The following selects any element whose text contains “ABC”:
//*[contains(text(), 'ABC')]
- When several elements match and you want the last one, regardless of how many are present in the document tree, use the last() function. The following selects the last list item (li tag) whose class contains “next-page”. The parentheses make last() apply to the whole set of matches rather than to the children of each parent:
(//li[contains(@class, 'next-page')])[last()]
- You may also want a specific item among all elements that satisfy the criteria. Use a position number for this; XPath positions start at 1. The following selects the second div directly inside the element with id “app”:
//*[@id='app']/div[2]
- Sometimes you may want to select a following sibling of a reliable starting point. The following selects an h1 element that comes after the element with id “app” at the same level of the tree:
//*[@id='app']/following-sibling::h1
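The difference between the plain contains() class test and the concat() version in the cheat sheet comes down to substring matching versus whole-token matching. The sketch below mirrors both predicates in plain Python so the behaviour on classes like “supertest” and “testmara” is easy to check. The function names are hypothetical and only for illustration; the browser evaluates the real XPath, not this code.

```python
# Plain-Python mirrors of two XPath 1.0 class tests (illustration only;
# a browser evaluates the real XPath, not this code).

def contains_class(class_attr: str, name: str) -> bool:
    """Like contains(@class, name): true for any substring match."""
    return name in class_attr


def has_class_token(class_attr: str, name: str) -> bool:
    """Like contains(concat(' ', @class, ' '), concat(' ', name, ' ')).

    Padding both sides with spaces means only a whole,
    space-separated class name can match.
    """
    return f" {name} " in f" {class_attr} "


for cls in ["test", "btn test", "supertest", "testmara", "test primary"]:
    print(f"{cls!r:16} contains={contains_class(cls, 'test')!s:5} "
          f"token={has_class_token(cls, 'test')}")
```

All five class values pass the substring test, while only “test”, “btn test” and “test primary” pass the padded token test.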
These techniques can be combined freely to target the right elements for either refining mode or dynamic serialization.
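To try a few combined selectors outside the browser, here is a minimal sketch using Python’s standard xml.etree.ElementTree module on a hypothetical sample page. ElementTree parses well-formed XML rather than arbitrary HTML and supports only a subset of XPath 1.0 (no contains(), concat() or following-sibling, for example), so the sketch sticks to id, element type, position and last(). Each path is paired with the equivalent browser XPath in a comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page. ElementTree parses well-formed XML (not
# arbitrary HTML) and supports only a subset of XPath 1.0.
PAGE = """
<html><body>
  <div id="app">
    <h1>Catalog</h1>
    <div class="filters">Filters</div>
    <div class="results">Results</div>
    <ul class="pager">
      <li class="page">1</li>
      <li class="page">2</li>
      <li class="page next-page">Next</li>
    </ul>
    <button id="paginate">Load more</button>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Id base + element type + position.
# Browser XPath: //div[@id='app']/div[2]
second_div = root.find(".//div[@id='app']/div[2]")

# Id base + element type + last().
# Browser XPath: //div[@id='app']/ul/li[last()]
last_item = root.find(".//div[@id='app']/ul/li[last()]")

# Id base + a second id test anywhere below it.
# Browser XPath: //div[@id='app']//button[@id='paginate']
button = root.find(".//div[@id='app']//button[@id='paginate']")

print(second_div.text, last_item.text, button.text)
```

Starting each path from a stable id keeps the positional parts (div[2], li[last()]) short, so they depend on as little of the tree as possible.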
Examples of generally bad XPaths:
An XPath that depends on the exact HTML tree structure is too volatile: even minor changes to the tree can break it. A better solution is to anchor the selector on a relatively unique class name or id in the document tree.
An XPath that matches an id containing a suspicious alphanumeric string is also risky. Such a string most likely changes with each build of the app, which means that selections may fail on later builds. A better solution is to match only the stable part of the id with a contains criterion, for example contains(@id, 'app').
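The two bad patterns above can be reproduced with a small sketch: two hypothetical builds of the same page, where build 2 adds a banner div and regenerates the id suffix (the ids “app-3f9a2c” and “app-71bd0e” are invented for illustration). It uses Python’s standard xml.etree.ElementTree; since ElementTree has no contains(), the stable-id check is done in plain Python, mirroring //div[contains(@id, 'app')]/div.

```python
import xml.etree.ElementTree as ET

# Two hypothetical builds of the same page. Build 2 adds a banner div
# and regenerates the suffix of the app container's id.
BUILD_1 = """
<html><body>
  <div id="app-3f9a2c">
    <div class="results">Results</div>
  </div>
</body></html>
"""

BUILD_2 = """
<html><body>
  <div class="banner">Sale!</div>
  <div id="app-71bd0e">
    <div class="results">Results</div>
  </div>
</body></html>
"""


def positional(root):
    # Brittle: relies on the exact tree shape.
    # Browser XPath: /html/body/div[1]/div[1]
    return root.find("./body/div[1]/div[1]")


def exact_generated_id(root):
    # Brittle: hard-codes the build-specific id suffix.
    # Browser XPath: //div[@id='app-3f9a2c']/div
    return root.find(".//div[@id='app-3f9a2c']/div")


def stable_id_part(root):
    # Sturdier: matches only the stable part of the id.
    # Browser XPath: //div[contains(@id, 'app')]/div
    for div in root.iter("div"):
        if "app" in div.get("id", ""):
            return div.find("div")
    return None


for name, markup in (("build 1", BUILD_1), ("build 2", BUILD_2)):
    page = ET.fromstring(markup)
    for select in (positional, exact_generated_id, stable_id_part):
        found = select(page)
        print(name, select.__name__, found.text if found is not None else None)
```

On build 2 both brittle selectors find nothing, while the stable-id selector still returns the results div. Note that contains(@id, 'app') also matches any other id that happens to include “app”; when the stable part is a prefix, the XPath 1.0 function starts-with(@id, 'app-') is a stricter alternative.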