Voxgov Leverages NLP to Expose U.S. Government Reports, Data and YouTube Channels


When I wrote my first post about Voxgov in 2017, I described it as a "Goldmine of Hidden US Government Insights and Trends." Voxgov is one of those products that I wish I had thought up myself. Founder Robert Dessau, an Australian lawyer working in New York, recognized that US government documents are a gold mine of valuable research and data, but that they were not always easy to locate. The Voxgov database monitors the websites and issuances of over 9,000 federal government offices. The database now includes 75 million documents, and approximately 7 million new documents are ingested and processed each year. All of these documents are tagged using Natural Language Processing (NLP), enabling researchers to find out what the government has said on any topic at any time.

I recently spoke with Dessau to learn about some exciting new features and content sets that improve the precision of search results and add new insights into federal government activities.

New Voxgov Features:

Interactive Result Tagging

All VoxGov results now include tags for keywords and phrases, names, organizations and places, which help users pinpoint the results of most interest.

Closed Captioning Text From Official Government YouTube Accounts

Voxgov has captured content from over 12,000 official government social media accounts. An exciting new feature makes the transcripts of government YouTube videos keyword-searchable: VoxGov has extracted closed-caption (CC) text from over 660,000 videos. Users can watch a video with the CC text visible throughout, or use the CC text to jump between mentions of their search term.

Voxgov Government YouTube Videos with Transcripts

Data Filter: Identifying Images, Tables and Statistics in Documents

VoxGov now identifies documents containing open-source images, including photos, charts and banners. As a researcher, I am very excited about the ability to quickly identify and view documents containing specific types of statistical data. Voxgov will save researchers hours by making statistics, tabular material and images "findable" and viewable in a few seconds.

The Data Filter identifies search results containing:

  • Documents with statistics
  • Spreadsheets
  • PDF documents
  • Word documents
  • PowerPoint presentations
Voxgov: Government documents with statistics, tables and images tagged, findable and viewable.


The Word Cloud

VoxGov is full of words, and the word cloud feature is a powerful tool for drilling into related documents. It also exposes important related terms and organizations that the researcher may want to explore. The word cloud enables "a search within a search": it can be used to refine search results and to navigate to documents containing specific words.