Approaches to Searching by Place in Media Cloud
One of the key values of Media Cloud is that it supports media analysis with a global footprint in ways few other resources do. Almost all of my research in Media Cloud includes some geographical component; my recent post on US pandemic news by state, for example, worked at the state level. However, because Media Cloud is so versatile, it can be hard to figure out which approach to searching by place is the best one for you. This blog post explains the available options and makes some suggestions about when to use each.
1. Geographic Collections
Use our curated geographic collections if you want to search online news sites we have validated as published from a place.
Media Cloud has geographic collections for most countries and states/provinces. You can browse these in our Source Manager tool. We created these years ago by importing what we found on the ABYZ News Links website. Since then our own staff, working with partners based in various countries, have expanded and cleaned some of these collections significantly. That means these sources are what I refer to as "validated": we've checked that they publish news about that place, and we regularly pull in their stories via the RSS feeds we have associated with each source. "Validated" doesn't mean the site is necessarily still online; it might be a historical "dead" source that no longer publishes, but we retain it in our collection to preserve the record of its content.
Each country has a "National" collection, which includes media sources that cover the entire country or are not specific to a locale (like the India - National collection). Use this collection if you are interested in the top news in a country at a high level. Each country's "State & Local" collection groups together all of its non-national sources (like the India - State & Local collection). Use that collection if you want a broader picture of local and regional news across the country. For many states or provinces we also have specific collections, which you'd use if you care about what makes the news online in one particular part of a country (like the Kerala, India - State & Local collection).
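If you build queries by hand rather than picking collections in the Explorer interface, a collection is referenced by its tag id. Here is a minimal sketch that ORs the National and State & Local collections together to cover a whole country. It assumes collections are matched with the Solr-style field `tags_id_media`, and the ids are hypothetical placeholders; look up the real ones in Source Manager.

```python
# Sketch: build a query clause covering several geographic collections.
# ASSUMPTIONS: collections are matched with the field `tags_id_media`,
# and the ids below are hypothetical placeholders, not real Media Cloud ids.

INDIA_NATIONAL = 111111111     # hypothetical id for "India - National"
INDIA_STATE_LOCAL = 222222222  # hypothetical id for "India - State & Local"


def collections_clause(tag_ids):
    """Return a clause matching stories from any of the given collections."""
    # Sort and de-duplicate so the clause is the same regardless of input order.
    ids = " OR ".join(str(t) for t in sorted(set(tag_ids)))
    return f"tags_id_media:({ids})"


if __name__ == "__main__":
    # Whole-country coverage: national sources plus all state/local sources.
    print(collections_clause([INDIA_NATIONAL, INDIA_STATE_LOCAL]))
```

To focus on a single state instead, pass just that state's collection id.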
The caveat here is that we of course aren't experts in the online media ecosystem in every country. Some places have more robust coverage than others. Drop us a line if you know a country's online media well and would like to help us flesh out better collections!
2. Sources Published in a Place
Use our "published in" media tags to search online news sites from a larger, less validated, set of sources we have marked as published in each place.
One of the pieces of metadata we store in our system about media sources is the country of publication. You can use it to search all the media sources we have marked as published in a place. This helps if you want to cast a slightly wider net and are OK with a little more noise in your results from non-"validated" sources. In practice it usually adds only a little more content to your search than using the national and state & local collections for a country. For instance, in India this technique returns about twice as many sources, but that yields only about 20% more stories than using the place collections (see an Explorer search showing that). This is because many of the sources marked as "published in India" are ones we are not collecting stories from daily. Our system has limited resources, so we have to prioritize which sources we regularly ingest from.
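One way to see how much the wider net adds for your own topic is to run a "narrow" query (the validated collections) and a "wide" one (the collections plus the "published in" tag) and compare the story counts Explorer reports. The sketch below assumes both kinds of tag are media-level tags matched with `tags_id_media`; all ids are hypothetical placeholders, and ORing the tags together is just one way to guarantee the wide set contains the narrow one.

```python
# Sketch: compare the validated collections with the wider "published in" set.
# ASSUMPTIONS: collections and "published in" tags are both media-level tags
# matched with `tags_id_media`; every id here is a hypothetical placeholder.

COLLECTION_IDS = [111111111, 222222222]  # hypothetical National + State & Local ids
PUBLISHED_IN_ID = 333333333              # hypothetical "published in India" tag id


def media_clause(tag_ids):
    """Match stories from media carrying any of the given tags."""
    return "tags_id_media:(" + " OR ".join(str(t) for t in sorted(set(tag_ids))) + ")"


def percent_more(narrow_count, wide_count):
    """How much larger the wide result is, as a percentage of the narrow one."""
    if narrow_count <= 0:
        raise ValueError("narrow_count must be positive")
    return 100.0 * (wide_count - narrow_count) / narrow_count


narrow_query = media_clause(COLLECTION_IDS)
wide_query = media_clause(COLLECTION_IDS + [PUBLISHED_IN_ID])
```

With made-up counts of 1,000 stories for the narrow query and 1,200 for the wide one, `percent_more(1000, 1200)` returns `20.0`, the same order of difference described above for India.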
3. Articles "About" a Place
Use our "about a place" story tags to search online news stories that our algorithms think are about a certain place.
The previous two approaches both help when you want to search online news media in a place, but what if you want to search media about a place? Our story processing pipeline takes the text of every story and runs it through our CLIFF-CLAVIN engine to tag it with places our algorithm thinks the story is "about". You can use those tags to search all stories that are about a place.
What does "about a place" mean? In short, CLIFF uses a set of custom heuristics to determine which geographic place each place name in the text refers to, then tags the story with the countries and states/provinces that contain the most places mentioned in it. Read our academic paper about CLIFF to learn more.
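To make the "most places mentioned" idea concrete, here is a deliberately simplified sketch. It is not CLIFF's actual code: it assumes the hard part (resolving each place name to a country and state) has already happened, and then simply keeps the country and state with the highest mention counts, including ties.

```python
# Simplified illustration of "most-mentioned places" focus selection.
# NOT CLIFF's implementation: assumes mentions are already resolved to
# (country, state_or_None) pairs and keeps the top-count entries.
from collections import Counter


def focus_places(resolved_mentions):
    """Return the countries and (country, state) pairs with the most mentions."""
    countries = Counter(country for country, _ in resolved_mentions)
    # Key states by (country, state) so same-named states don't collide.
    states = Counter((c, s) for c, s in resolved_mentions if s)

    def top(counter):
        if not counter:
            return set()
        best = max(counter.values())
        return {place for place, n in counter.items() if n == best}

    return top(countries), top(states)


mentions = [("India", "Kerala"), ("India", "Kerala"),
            ("India", "Tamil Nadu"), ("United Kingdom", None)]
print(focus_places(mentions))
```

Here the story would be tagged with India and Kerala, while the single passing mention of the United Kingdom is not enough to count.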
To take this approach, you need to know the "magic tag" id for the place you are interested in. You can get that for a country by clicking on it in Explorer results, as shown above, or you can get the tag id from the CSV download on our support page.
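Once you have the CSV, finding a tag id and turning it into a query clause is easy to script. This sketch assumes the file has columns named `tags_id` and `label` (check the header of the real download) and that story-level tags are matched with the field `tags_id_stories`; the rows below are placeholders, not real ids.

```python
# Sketch: look up a place's "about" tag id in a CSV export, then build a query.
# ASSUMPTIONS: columns are named `tags_id` and `label`; story tags are matched
# with `tags_id_stories`. The sample rows are hypothetical placeholders.
import csv
import io

SAMPLE_CSV = """tags_id,label
444444444,India
555555555,Kerala
"""


def find_tag_id(csv_text, label):
    """Return the tag id whose label matches exactly, or None if absent."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["label"] == label:
            return int(row["tags_id"])
    return None


def about_place_clause(tag_id):
    """Match stories the geotagger tagged as being about this place."""
    return f"tags_id_stories:{tag_id}"


if __name__ == "__main__":
    print(about_place_clause(find_tag_id(SAMPLE_CSV, "Kerala")))
```

In practice you would read the downloaded file with `open(...)` instead of the inline sample string.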
4. Media Sources "About" a Place
Use our "about a place" media tags to search online news sites that most often write stories about a country.
One piece of metadata about media sources that we generate automatically is the country each source most often writes about. This is based on the "about" tags mentioned earlier. For each source from which we have a substantial number of stories, we look at the country that shows up most often as the place its stories are about (via CLIFF, as described above) and tag the media source itself with that country.
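The source-level tagging step can be sketched as a frequency count over a source's story tags. This is an illustration under stated assumptions, not our production code: each story is represented as the set of countries it was tagged "about", and `MIN_STORIES` is a hypothetical stand-in for "a substantial number of stories", not the threshold Media Cloud actually uses.

```python
# Sketch: tag a media source with the country it most often writes about.
# ASSUMPTIONS: each story is the set of countries it was tagged "about";
# MIN_STORIES is a hypothetical threshold, not Media Cloud's real value.
from collections import Counter

MIN_STORIES = 100


def source_about_country(story_country_tags, min_stories=MIN_STORIES):
    """Return the most common "about" country across a source's stories.

    Returns None when the source has too few stories to tag reliably,
    or when none of its stories carry a country tag.
    """
    if len(story_country_tags) < min_stories:
        return None
    counts = Counter(c for tags in story_country_tags for c in tags)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


stories = [{"India"} for _ in range(3)] + [{"United Kingdom"}]
print(source_about_country(stories, min_stories=2))
```

With the threshold lowered to 2 for this toy example, the source is tagged with India; with the default threshold it has too few stories and gets no tag.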
This is subtly different from the "articles about a place" approach. Searching with the media source "about country" tag returns all the stories from media sources that generally write about that country, including stories that our system does not think are about that country. Searching with the story-level "about place" tag described above returns only the stories that are about the place, and those tags cover both countries and states/provinces. We don't tag media sources with the state/province they most often write about.
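A toy model makes the difference easier to see. The stories and tags below are invented for illustration; the two filter functions mimic searching by the media-level "about country" tag versus the story-level "about place" tag.

```python
# Toy model (hypothetical data) of media-level vs story-level "about" tags.
from dataclasses import dataclass, field


@dataclass
class Story:
    title: str
    source_about: str                              # country the *source* most often covers
    story_about: set = field(default_factory=set)  # countries/states tagged on *this* story


STORIES = [
    Story("Monsoon floods in Kerala", "India", {"India", "Kerala"}),
    Story("UK Parliament debates trade bill", "India", {"United Kingdom"}),
    Story("Election results from Mumbai", "United Kingdom", {"India", "Maharashtra"}),
]


def by_source_about(stories, country):
    """Media-level filter: every story from sources that usually cover `country`."""
    return [s.title for s in stories if s.source_about == country]


def by_story_about(stories, place):
    """Story-level filter: only stories tagged as about `place` (country or state)."""
    return [s.title for s in stories if place in s.story_about]
```

`by_source_about(STORIES, "India")` returns the Kerala and UK Parliament stories (the second is not about India at all), while `by_story_about(STORIES, "India")` returns the Kerala and Mumbai stories. Only the story-level filter works for states: `by_story_about(STORIES, "Kerala")` finds one story, but `by_source_about(STORIES, "Kerala")` finds none because sources are never tagged with a state.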
5. A Place's Name
Include the place's name in your query string if you want to find any mention of a place in stories.
The techniques above can help you narrow in, but if you want to make sure you don't miss anything, you need to include the place's name in your search query. This is especially helpful when you are searching at the city level, since the "about" tags described above only go down to states/provinces. Of course, the problem is that you will probably also get a lot of stories that mention similarly named places elsewhere in the world. This approach increases "false positives" (stories that come back as matches but aren't what you intended), but minimizes "false negatives" (stories that should have come back as matches but didn't). If your place name is fairly unique, false positives aren't likely to be a problem. But if it is a generic or widely used place name, they probably will be.
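Here is a minimal sketch of a name-based query, with an option to combine it with one of the "about" tags when a name is too generic. It assumes double quotes give an exact-phrase match and that story tags are matched with `tags_id_stories`; the tag id in the example is a hypothetical placeholder. Adding the tag trades some false negatives back in exchange for fewer false positives.

```python
# Sketch: search by place name, optionally narrowed with an "about" tag.
# ASSUMPTIONS: quoted terms are exact-phrase matches; `tags_id_stories`
# matches story-level place tags; the tag id below is hypothetical.


def place_name_query(name, alt_names=(), narrow_tag_id=None):
    """Match any mention of a place name or its alternate names.

    Passing `narrow_tag_id` keeps only stories also tagged as about that
    country/state: fewer false positives, but more false negatives.
    """
    names = [name, *alt_names]
    clause = "(" + " OR ".join(f'"{n}"' for n in names) + ")"
    if narrow_tag_id is not None:
        clause += f" AND tags_id_stories:{narrow_tag_id}"
    return clause


# Broad: every mention, including any other place called Springfield.
print(place_name_query("Springfield"))
# Narrower: only mentions in stories tagged with a hypothetical Illinois tag.
print(place_name_query("Springfield", narrow_tag_id=666666666))
# Alternate names reduce false negatives for places with several names.
print(place_name_query("Mumbai", alt_names=["Bombay"]))
```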
As you can see, there are a variety of approaches to choose from based on what you are doing. This is both a strength of our platform (you can tailor it to your needs) and a weakness (it is confusing!). Here are a few other things to keep in mind when you are doing research on news media from or about a place in Media Cloud:
You can search by story language as well, adding "language:en" to your query to search English-language articles or "language:hi" to search Hindi-language articles.
We also tag sources with the language they most often publish stories in, so you can search sources that most often publish in a given language. You can do this the same way you use a "custom collection" to search media sources tagged as published in a place, as shown above.
Another research angle would be to search the sources that are most read in a place, which of course could be a different list from the sources published in a country.
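The story-language filter combines naturally with any of the place-based approaches above. This small sketch ANDs a `language:` clause onto an existing query; it uses the two-letter codes shown above (`en`, `hi`) and a hypothetical collection id matched with `tags_id_media`.

```python
# Sketch: restrict a place-based search to one story language.
# ASSUMPTIONS: `tags_id_media` for collections (the id below is hypothetical);
# `language:xx` with two-letter codes like the en/hi examples in this post.


def with_language(base_clause, lang_code):
    """AND a story-language filter onto an existing query clause."""
    return f"({base_clause}) AND language:{lang_code}"


india_state_local = "tags_id_media:222222222"  # hypothetical collection id
print(with_language(india_state_local, "hi"))
```

Wrapping the base clause in parentheses keeps any ORs inside it from binding more loosely than the language filter.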
Thanks to Dennis Jen, Emily Ndulue, and Fernando Bermejo for comments and edits.