Filtering

Filtering enables users to focus their analysis on specific subsets of a dataset.

Use the ‘Add a filter’ dialog to add new filters. Once a filter is added, every report and tool in the text analytics toolkit automatically applies it to the current dataset.

Filter sources are groups of data fields. There are four standard groups:

  • Supplied data
  • Geography data
  • Weather data
  • Text analytic data

Supplied Data

Any data field supplied directly to a dataset is considered ‘supplied data’. The software is not limited to survey data, and the tool does not treat survey data differently from other data sources. The one exception is loading: survey data has special handling that streamlines the load process. All survey fields that can be used as filters appear in the supplied data filter option list. Open ends within loops are included; other fields within loops may or may not be included, depending on how the loop is configured.

Geography Data

Geography data is added automatically by the text analytics system when the source data has a zip code field. When available, a dataset may contain:

  • CBSA
  • City
  • County
  • Division
  • MSA
  • Region
  • State
  • Time Zone
  • Zip Code

These values are associated automatically while the data is loading: the zip code field found in the incoming data is matched against a zip code data set.
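The matching step can be pictured as a lookup keyed by zip code. The following is a minimal sketch, not the product's implementation: the `ZIP_REFERENCE` table, the `normalize_zip` and `add_geography` helpers, and the choice to reduce ZIP+4 values to five digits are all assumptions made for illustration, and the two reference rows stand in for a full zip code data set.

```python
from typing import Optional

# Hypothetical, heavily abbreviated reference table keyed by 5-digit zip code.
# A real zip code data set would carry every geography field (CBSA, County,
# Division, MSA, Region, and so on) for every zip code.
ZIP_REFERENCE = {
    "10001": {"City": "New York", "State": "NY", "Time Zone": "Eastern"},
    "60601": {"City": "Chicago", "State": "IL", "Time Zone": "Central"},
}


def normalize_zip(raw: object) -> Optional[str]:
    """Reduce a raw value to a 5-digit zip code, or None if it is unusable.

    Assumption: ZIP+4 values such as '10001-1234' are matched on their
    first five digits.
    """
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    if len(digits) < 5:
        return None
    return digits[:5]


def add_geography(record: dict, zip_field: str) -> dict:
    """Return a copy of the record with any matching geography fields added."""
    enriched = dict(record)
    zip5 = normalize_zip(record.get(zip_field, ""))
    # Records with no usable or unknown zip code are returned unchanged.
    enriched.update(ZIP_REFERENCE.get(zip5, {}))
    return enriched


record = {"ZIP": "10001-1234", "COMMENT": "Great service"}
print(add_geography(record, "ZIP"))
```

In this sketch a record whose zip code cannot be matched simply gains no geography fields, which mirrors the "when available" wording above.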

For surveys, the software detects the zip code field automatically when either of the following is true:

  • The survey contains a zipcode question.
  • The survey contains a field with one of these names: SAMPLE_ACCOUNT_ZIP_CODE, SAMPLE_ACCZIPCD, SAMPLE_ADDRESS_ZIP, SAMPLE_AZIP, SAMPLE_BILL_ZIP_CODE, SAMPLE_BILLING_ZIP, SAMPLE_BILLING_ZIP_POSTAL_CODE, SAMPLE_BRI_ZIP, SAMPLE_CLNT_ADDRESS_ZIP, SAMPLE_COMPANY_ZIP_POSTAL_CODE, SAMPLE_COMPANYZIP, SAMPLE_CUSTOMER_ZIP, SAMPLE_CUSTOMER_ZIP_CODE, SAMPLE_EF_INVALID_CITY_ZIP, SAMPLE_EF_INVALID_STATE_ZIP, SAMPLE_EF_INVALID_ZIP, SAMPLE_EF_US_ZIP, SAMPLE_ESIID_ZIP, SAMPLE_FIRM_ZIPCODE, SAMPLE_FROM_PANEL_ZIP_CODE, SAMPLE_FULL_ZIP, SAMPLE_HOME_ZIPCODE, SAMPLE_HZIPCODE, SAMPLE_IN_ZIP_LIST, SAMPLE_INSTALL_ZIP, SAMPLE_INVOICEZIP, SAMPLE_MAILING_ZIP, SAMPLE_MAILING_ZIP_CD, SAMPLE_MAILING_ZIP_POSTAL_CODE, SAMPLE_MAILING_ZIP5, SAMPLE_MLNG_ZIP_CODE, SAMPLE_MZIP, SAMPLE_OFFICE_ZIPCODE, SAMPLE_PASZIPCODE, SAMPLE_PATIENT_ZIPCODE, SAMPLE_PHYSICAL_ZIP5, SAMPLE_PHZIP, SAMPLE_PO_ZIP, SAMPLE_PREM_ZIP_CODE, SAMPLE_PREM_ZIP9_CODE, SAMPLE_PREMISE_ZIP, SAMPLE_PREMISE_ZIP_CODE, SAMPLE_PREMISE_ZIP5, SAMPLE_PRIMARY_ZIPCODE, SAMPLE_PRM_ZIP4_CODE, SAMPLE_PRMRY_ZIP_CODE, SAMPLE_PRMZIPCD, SAMPLE_PZIP, SAMPLE_RESPONDENT_INFO_ZIP, SAMPLE_S_ZIP_CODE, SAMPLE_SAMPLE_ZIP, SAMPLE_SERVICE_ZIP, SAMPLE_SERVICE_ZIP_CD, SAMPLE_SERVICE_ZIP_CODE, SAMPLE_SHIP_TO_ZIP, SAMPLE_SHIP_ZIP, SAMPLE_SHIPPING_ZIP, SAMPLE_SHIPPING_ZIP_POSTAL_CODE, SAMPLE_SVC_ZIP5, SAMPLE_ZCZIP5, SAMPLE_ZIP, SAMPLE_ZIP_5, SAMPLE_ZIP_CD, SAMPLE_ZIP_CODE, SAMPLE_ZIP_LOCATION, SAMPLE_ZIP_OR_ZIP___4_, SAMPLE_ZIP_PLUS_4, SAMPLE_ZIP5, SAMPLE_ZIP5_CD, SAMPLE_ZIP5A, SAMPLE_ZIPCD, SAMPLE_ZIPCODE, ZIPCODE, ZIP_CODE, ZIP5, ZIP9, ZIP10, ZIP
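The name-based rule can be sketched as a membership test against the recognized names. This is an illustration only: `KNOWN_ZIP_FIELD_NAMES` holds just a handful of the names listed above, the `find_zip_field` helper is hypothetical, and both the case-insensitive comparison and the "first matching field wins" behavior are assumptions, since the source does not say how case or multiple matches are handled.

```python
from typing import Iterable, Optional

# A small subset of the recognized field names listed above.
KNOWN_ZIP_FIELD_NAMES = {
    "SAMPLE_ZIP",
    "SAMPLE_ZIP_CODE",
    "SAMPLE_ZIPCODE",
    "ZIPCODE",
    "ZIP_CODE",
    "ZIP5",
    "ZIP9",
    "ZIP10",
    "ZIP",
}


def find_zip_field(field_names: Iterable[str]) -> Optional[str]:
    """Return the first field whose name is a recognized zip code name.

    Assumptions: names are compared case-insensitively after trimming
    whitespace, and the first match in field order is used.
    """
    for name in field_names:
        if name.strip().upper() in KNOWN_ZIP_FIELD_NAMES:
            return name
    return None


print(find_zip_field(["AGE", "SAMPLE_ZIP_CODE", "INCOME"]))
print(find_zip_field(["AGE", "INCOME"]))
```

A field named `POSTAL` would not be detected under this rule, so renaming such a field to one of the recognized names before loading is one way to have geography data attached.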

Weather Data

When both geography data and a date/time are available, the text analytics system automatically associates local weather data with each record. When available, a dataset may contain:

  • Hourly precipitation at time of survey
  • Hourly precipitation description at time of survey
  • Overall weather description at time of survey
  • Temperature description at time of survey

Text Analytic Data

The text analytics system automatically assigns certain fields to data as it processes the text. These fields are:

  • Emotions – Detailed
  • Emotions – Simple
  • Sentiment
  • Text Length
  • Topics
  • Word Count

Frequently Asked Questions

Q: Is there a limit to how many filters can be applied?

A: Yes, about 25. For purposes of this limit, a filter is a field in one of the available field sources, so you can filter on CITY, STATE, AGE, INCOME, GENDER, and 20 other variables at the same time. The limit is approximate because the underlying technologies impose a few caveats that can lower or raise it slightly in some cases.
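Because the limit counts fields rather than individual selections, a rough way to reason about it is to count the distinct fields in use. The sketch below is hypothetical: the `(field, value)` representation, the helper names, and the assumption that several values chosen on the same field count once are illustrative, and the constant of 25 is only the approximate figure given above.

```python
from typing import Iterable, Tuple

# Approximate figure from the answer above; the enforced limit can be
# slightly lower or higher in some cases.
APPROX_FIELD_LIMIT = 25


def count_filter_fields(filters: Iterable[Tuple[str, str]]) -> int:
    """Count the distinct fields used by a collection of (field, value) filters."""
    return len({field for field, _value in filters})


def within_approx_limit(filters: Iterable[Tuple[str, str]],
                        limit: int = APPROX_FIELD_LIMIT) -> bool:
    """Return True when the number of distinct fields does not exceed the limit."""
    return count_filter_fields(filters) <= limit


selected = [("STATE", "TX"), ("STATE", "OK"), ("CITY", "Austin"), ("GENDER", "F")]
print(count_filter_fields(selected))   # three distinct fields: STATE, CITY, GENDER
print(within_approx_limit(selected))
```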