Time needed to finish reading: 4–7 minutes
#1. How can you use data extraction tools?
If you are not sure what “data extraction tools” are, that may be because you have heard or read more about “web scraping tools” instead. The first rule of the data extraction game is to actually have data you can view and understand. Sometimes the information is unstructured, but you can either use an editor to organize the data after it has been extracted, or you can use software specifically designed to do that: process the information and make it useful for your business.
Some consider web scraping to be just a data entry technique that saves a lot of time, since it can be done automatically. And while that is true, you can do so much more with the right tools and experience. If I were to list a few areas where you can make good use of data extraction tools, I would start with market research (since many entrepreneurs require this before they even start thinking about a business, so it is phase 1 for any start-up) and continue with extracting contact information (sales, marketing), searching for candidates, tracking prices, and tracking trends or user needs and habits.
After you have the extracted data, you will need to go through a process of “cleaning” the results in order to make that data useful. As mentioned above, you can either do that manually or use software for it. If you are on a tight budget, feel free to give OpenRefine (formerly Google Refine) a try. After downloading and installing the kit, you can dive into the documentation and start transforming the collected data.
If, on the other hand, you want to invest some resources in order to make the data available fast and in the exact format you need it (either for your CRM or to connect it with other services that are key to running your business), you can develop such a solution with a software development company that understands how data can turn into business value. In a field like e-commerce, if your system is not running fast and at full power, you are losing opportunities and probably money, too.
So it all depends on how big and complicated your process is and whether you want to handle it yourself or prefer to outsource that bit of work. In the next paragraphs, I will introduce you to some tools – some free, some paid – that are already on the market and can handle the data extraction for you, which leaves you with the cleaning and integration process.
#2. A few data extraction tools
WebScraper.io – this is actually my favorite tool for quick tasks, as it runs as a Google Chrome extension and does its job nicely. You can actually use WebScraper for QA tasks as well, but that is something I might write about in a separate article. The tutorials will teach you how to handle multi-level navigation and pagination, and if you are patient and play around with it for a while, you will find that it is quite versatile. Since it is a Chrome extension, the tool is obviously free, but you can also use Cloud Web Scraper – their premium version. Another really interesting feature of this tool is that you can import/export sitemaps. This means you can set up a “pattern” for how the data needs to be extracted and just pass that template along to colleagues or partners. Neat!
Import.io – I had the chance to test their service back when it was free, and they have added a lot of other features since then. But even in its initial releases, the software was really nice. You can download the data as CSV or generate an API that serves the information.
Portia – ScrapingHub – create a template by clicking the elements on the pages you would like to scrape. On their site there are a few video tutorials that make it easy to understand how to use the tool. For larger volumes you will need Scrapy Cloud units, but if you are a small business and do not require large datasets, you can probably handle your tasks within the free version’s limits.
ParseHub – provides a desktop app (Windows, Mac and Linux) that is easy to install and use. The free version is limited to 200 pages of data in 40 minutes and a maximum of 5 public projects. The free version can be really useful for local businesses that are collecting contacts, for example. If you want more power under the hood, you will need to spend some money.
Scrapy – if you are into programming, even at a beginner level, you can try this fast framework. Scrapy runs on Python 2.7 and Python 3.4 or above under CPython and is a flexible solution for those who are not afraid of working through the very thorough tutorial offered by its creators.
Of course, if you don’t want to use any of the tools listed above or you feel like a custom software solution would help your business more, then you can always get in touch with us and discuss those specific needs. A software development company can create a custom data extraction tool, run a clean-up process based on rules you define and connect the required data with your other services, thus generating added value for your business process.
#3. Data Scraping tutorial with WebScraper – Quick Guide on how to collect relevant data
A few paragraphs ago, I mentioned that I enjoy using WebScraper for quick tasks, and now it is time to show you why. And there is no better way to do that than with a quick guide on how to collect relevant data. We will assume, for the purpose of this “tutorial”, that you are the owner of a new cosmetics company based in Dublin, Ireland, and you have launched a new product line dedicated to hair salons. Your product is great quality, the packaging design is absolutely wonderful and your sales team is eager to put some products on the market. How do you start?
You will need to collect business leads, and you need them fast. At this point, you can just access Yelp.ie, for example, fire up WebScraper, locate the hairdressers category, choose Dublin as the location where the hair salons should be based and boom – you have 967 potential clients. Now you need to create a sitemap in WebScraper: you are basically telling the tool what data you want to extract from the page. You will need to name your sitemap and add a start URL (which will be your starting point).
In the next step, you will start adding selectors. In this situation – since you are shown a list with multiple entries and might not have all the data you require to close the deal – you will want to access each of those entries separately and better understand the profile of each possible client. So your selector will be a link. Click all the titles of those entries (which are also links to the detail pages), check the “multiple” option below the selector input, name your selector and hit save. You are now ready for step two.
Click on the selector you just created and, inside it, create another selector. Also, in the browser, click on any entry so that you are redirected to its details page. In this view you have a bit more information: the name of the business, the address, the phone number (which you also had in the previous view), but also the website URL and a calendar with the working hours. You now know when it is OK to reach the hairdresser. Let’s map this inside WebScraper and save that data for the sales team.
You’ve got everything you need. All selectors are “text”. Let’s see what the sitemap looks like. You’ll use the selector graph to visualize it.
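Since sitemaps can be exported and imported, it helps to know that under the hood a sitemap is just a JSON document. A sketch of what ours might look like – the selector IDs and CSS selectors here are illustrative, not copied from Yelp’s actual markup:

```json
{
  "_id": "dublin-hairdressers",
  "startUrl": ["https://www.yelp.ie/search?find_desc=Hairdressers&find_loc=Dublin"],
  "selectors": [
    {
      "id": "salon-link",
      "type": "SelectorLink",
      "parentSelectors": ["_root"],
      "selector": "a.business-name",
      "multiple": true
    },
    {
      "id": "name",
      "type": "SelectorText",
      "parentSelectors": ["salon-link"],
      "selector": "h1",
      "multiple": false
    },
    {
      "id": "phone",
      "type": "SelectorText",
      "parentSelectors": ["salon-link"],
      "selector": ".phone",
      "multiple": false
    }
  ]
}
```

The nested structure mirrors what you built by hand: the text selectors live inside the link selector, so they run on each detail page.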
Obviously, you will also have to handle pagination. You can follow WebScraper’s official tutorial, or you can try this option, which requires more manual intervention but allows better control (over which pages are selected, for example). Switch over to “Edit metadata” and you will notice that the Start URL field allows multiple values. Click “+” and, in the second, third, etc. input, add the starting URL plus the pagination parameter.
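If there are many pages, typing each start URL by hand gets tedious. A quick sketch for generating the list to paste in – the `start` query parameter and the page size of 10 are assumptions, so inspect the real pagination links on your target site first:

```python
# Generate paginated start URLs to paste into WebScraper's Start URL fields.
# The parameter name ("start") and page size (10) are assumptions --
# check what the target site's pagination links actually use.
base_url = "https://www.yelp.ie/search?find_desc=Hairdressers&find_loc=Dublin"
page_size = 10
pages = 5

urls = [f"{base_url}&start={i * page_size}" for i in range(pages)]
print("\n".join(urls))
```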
And you’re done. Extract the data for Dublin, export the data as CSV and you’re ready for the cleaning step. Or you can just go back and collect the data for Cork, Limerick or Galway. Up to you.
#4. How to use extracted data for sales process optimization
Now that you have the data available locally, you need to clean it and start using it. If there is software to handle the cleaning, great. If not, you should do it manually.
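Even a few lines of scripting can cover the most common clean-up chores. A minimal sketch using only the standard library – the column names (`name`, `phone`) are assumptions, so match them to your actual export headers:

```python
# Minimal lead clean-up sketch: trim whitespace, drop rows without a
# phone number, and de-duplicate by phone. Column names ("name",
# "phone") are assumptions -- adjust to your export's real headers.
import csv


def clean_leads(rows):
    seen = set()
    cleaned = []
    for row in rows:
        name = (row.get("name") or "").strip()
        phone = (row.get("phone") or "").strip()
        if not phone or phone in seen:
            continue  # missing or duplicate phone -> skip
        seen.add(phone)
        cleaned.append({"name": name, "phone": phone})
    return cleaned


# With a real export you would feed it a csv.DictReader, e.g.:
#   with open("leads.csv", newline="", encoding="utf-8") as f:
#       leads = clean_leads(csv.DictReader(f))
sample = [
    {"name": " Salon A ", "phone": "01 234 5678"},
    {"name": "Salon B", "phone": ""},             # no phone -> dropped
    {"name": "Salon A", "phone": "01 234 5678"},  # duplicate -> dropped
]
print(clean_leads(sample))
```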
Your business most likely uses a CRM to manage a process for each department. We will assume it uses Microsoft Dynamics 365 CRM – so you want all the data you extracted in there, available for your sales team. Well, using the built-in data import feature, you should be able to do exactly that. Map the column headers and you are ready to go.
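The header mapping itself can also be scripted before the import, so the file arrives with the column names the CRM expects. A hedged sketch – the target column names below are illustrative, not actual Dynamics 365 schema names:

```python
# Rename CSV headers to match what the CRM's import wizard expects.
# The target names ("Company Name", "Business Phone") are illustrative,
# not real Dynamics 365 field names -- check your CRM's import template.
import csv
import io

HEADER_MAP = {"name": "Company Name", "phone": "Business Phone"}


def remap_headers(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(HEADER_MAP.values()))
    writer.writeheader()
    for row in reader:
        writer.writerow({HEADER_MAP[k]: v for k, v in row.items() if k in HEADER_MAP})
    return out.getvalue()


sample = "name,phone\nSalon A,01 234 5678\n"
print(remap_headers(sample))
```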
Don’t worry if you are not using Microsoft Dynamics 365 CRM – most CRM software on the market includes a data import tool.
For many small and mid-size businesses, a generic tool is all you need to work with the extracted data. You can use either Apache OpenOffice or LibreOffice (especially if you are on a Mac) to generate all sorts of reports or simply prepare the data for further processing by another department. You also have the option of using Microsoft Office 365 or G Suite and storing everything in the cloud for an affordable monthly subscription – an option which I think sounds very reasonable. But you are surely not limited to that. Data extraction tools give you a way to generate useful information for your whole business process, helping you stay ahead of the game in your industry just by relying on data. Integrate collected data with a CRM to ease the job of your sales department? Check! Dynamically adjust prices depending on a market supply-demand report? Check! Use non-traditional web sources to better understand trends and season-affected markets? Check!
With a small learning curve & some imagination, you can use data extraction tools for retail, research, finance and even the automotive industry.
If you have any thoughts on the topic or you would like to share another brilliant tool you’ve used for data extraction, don’t be shy and use the comment form below. We’d love to hear your opinion.