International Association for Official Statistics

Improving Official Statistics

IASS Webinar: “Building a Sample Frame of SMEs Using Patent, Search Engine, and Website Data”, 2 September 2021

The International Association of Survey Statisticians (IASS) is organizing a webinar entitled “Building a Sample Frame of SMEs Using Patent, Search Engine, and Website Data” on Thursday, 2 September 2021, from 1:00 pm to 2:30 pm (CET).

Webinar Abstract

This research outlines the process of building a sample frame of US small and medium-sized enterprises (SMEs). The method starts with a list of patenting organizations and defines the boundaries of the population and the resulting frame using free or low-cost data sources, including search engines and websites. Generating high-quality data is essential throughout the process of building the frame and during subsequent data collection; at the same time, there is too much data to curate by hand. Consequently, we turn to machine learning and other computational methods to apply a series of data matching, filtering, and cleaning routines. The results show that it is possible to generate a sample frame of innovative SMEs with reasonable accuracy for use in subsequent research. Our method provides data for 79% of the frame. We discuss the implications for future work by researchers and national statistical institutes (NSIs) alike, and contend that the challenges associated with big data collections require not only new skill sets but also a new mode of collaboration.

Speaker Biographies

Sanjay K. Arora is an innovation policy and management researcher who uses emerging big data sources to measure small firm R&D and entrepreneurial activity. He is also an ML Engineering Business Leader at EY, the global audit and consulting firm. Sanjay currently resides in Washington, DC.

Sarah Kelley, MIDS, is a Senior Data Scientist at Child Trends where her work focuses on the intersection of social science and data science, especially using natural language processing and machine learning to answer social science questions.


The IASS webinar is open to all. Please register for the webinar here. After registering, you will receive a confirmation email containing information about joining the webinar. There will be time for questions. The webinar will be recorded and made available on the International Statistical Institute (ISI) website.