Changedetection.com will email you if a page changes. It checks about once a day, I think.
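If you're curious what a change-checking robot does under the hood, here is a minimal sketch. It is not how Changedetection.com actually works (I don't know its internals); the function names `fingerprint` and `has_changed` are made up for illustration. The idea: save a fingerprint of the page text on each visit and compare it to the last one.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a stable fingerprint of a page's text."""
    # Collapse runs of whitespace so spacing-only differences don't count as changes.
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(old_text: str, new_text: str) -> bool:
    """True if the page text differs between two visits."""
    return fingerprint(old_text) != fingerprint(new_text)


if __name__ == "__main__":
    # Hypothetical page contents from two visits.
    yesterday = "<h1>Press releases</h1><p>Nothing new.</p>"
    today = "<h1>Press releases</h1><p>Indictment announced.</p>"
    print(has_changed(yesterday, today))
```

A real robot would also download the page on a schedule and send the email; this only shows the comparison step.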
IFTTT.com can check Twitter, RSS feeds, Reddit and other data streams and, when it finds something of interest, send you an email, post a tweet, add a line to a Google spreadsheet, etc.
Can robots check other social media like FB, Instagram?
As far as I know, no publicly-available robot will check other people's traffic on these two. But see Problem 2 for some advanced searches.
But anyway, while I'm standing here, let's do an IFTTT recipe that will send me an email each time there's a USDOJ press release about Georgia.
It's the same as in this video, so you can watch later if you like.
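For the code-curious: a recipe like that boils down to "read an RSS feed, keep the items that mention my keyword." Here is a rough sketch of that filtering step. The feed below is a made-up sample, not the real USDOJ feed, and `matching_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A made-up sample feed standing in for a real press-release RSS feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example press releases</title>
    <item>
      <title>Georgia man sentenced in fraud case</title>
      <link>https://example.com/releases/1</link>
    </item>
    <item>
      <title>Ohio company agrees to settlement</title>
      <link>https://example.com/releases/2</link>
    </item>
  </channel>
</rss>"""


def matching_items(feed_xml: str, keyword: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for feed items whose title mentions keyword."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Case-insensitive match on the title only.
        if keyword.lower() in title.lower():
            hits.append((title, link))
    return hits


if __name__ == "__main__":
    for title, link in matching_items(SAMPLE_FEED, "georgia"):
        print(title, link)
```

IFTTT does the fetching and the emailing for you, which is why it's the easier route on deadline.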
Some of y'all probably know about advanced search: https://www.google.com/advanced_search
You can narrow searches by when the site was last updated, by phrases you don't want, etc.
You can also do it from the search bar, which is nice because you can ask for any filetype, not just the ones listed on the advanced search page:
"tom wolfe" -book= search for the exact phrase "tom wolfe," but leave out any results that contain the word "book"
georgia history site:edu= search for the words "georgia" and "history", but only in websites that end in ".edu"
Obama AROUND(0) Zuckerberg= search for "Obama" within 0 words of "Zuckerberg" (i.e., right beside it)
Obama AROUND(5) Zuckerberg= search for "Obama" within 5 words of "Zuckerberg"
Note: AROUND() has to be in all caps. And it doesn't always change the results much. But sometimes it does.
landfills site:georgia.gov filetype:xls= search for the word "landfills" in all websites that end in georgia.gov, but only show me Excel spreadsheets (.xls)
inurl:pdf "Georgia State University"= search for the phrase "Georgia State University" in any url that has "pdf" in it.
("inurl:pdf" will give you pdfs that Google can't find via "filetype:pdf," according to Henk van Ess, to whom I apologize for cribbing some of this stuff. Read him for more fun searches.)
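If you run the same kinds of searches over and over, you can assemble the search-bar operators above in a few lines of code. This is just a sketch; `build_query` and `google_search_url` are names I made up, and it only builds the link, it doesn't run the search.

```python
from urllib.parse import urlencode


def build_query(terms: str, site: str = "", filetype: str = "", exclude=()) -> str:
    """Assemble a search-bar query from the operators shown above."""
    parts = [terms]
    parts += [f"-{word}" for word in exclude]  # words to leave out
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def google_search_url(query: str) -> str:
    """Turn a query into a link you can paste into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query("landfills", site="georgia.gov", filetype="xls")
    print(query)
    print(google_search_url(query))
```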
Search is Back for FB. Easier than the search in FB, IMHO.
Official Facebook Live Map for, well, seeing all FB Live streams on now.
Echosec for searching a bunch of social media by location.
Official Twitter Advanced Search
PDFs are kind of like a picture of words and tables. Sometimes you can search the text in a PDF & copy it. But sometimes you can't. And you certainly can't open a .pdf with Excel.
A PDF is like this: [image]. Text is like this: 228
So there are a couple of services out there that unlock the text and tables in PDFs.
The conversion is prone to error because a computer is making its best guess at what it's seeing. But IMHO, Cometdocs gives you a pretty clean conversion.
You can convert a few docs a month with a free subscription.
With a paid subscription, you can do more documents. (IRE members get a free subscription.)
Let's try it with lotto.pdf. But look at the result carefully ;)
Let's start with lotto.xls, a (fake) list of where lottery winners live and how much they won.
I'll do an example right here.
It's the same as in this video, so you can watch later if you like.
This is an app that transfers really big files for you, files that are too big to email.
You set up an account and you send a link to the person who has the big document. That person follows the link to upload the doc to your Dropbox account. Then you download it from there.
Or, vice versa, you upload a big doc and send somebody else the download link.
Next up: putting things like maps and graphs on your web site.
Most of y'all probably work with a CMS (a "content management system") such as WordPress. This is the place where you paste in the text you want to publish.
But when you want to publish a map or graph, you have to tell your CMS/WordPress where that map or graph is stored.
For example, if you make a Google map, you need to give your CMS a Google code. If you make a Plotly graph, you have to give your CMS a Plotly code.
Luckily, these "codes" are standardized and are pretty much a copy-and-paste into your CMS. They're called "embed codes."
AFAIK, most "embed codes" begin with "<iframe" and are just a couple of lines. (Some services hand you a "<script>" or "<div>" snippet instead; you paste it in the same way.)
You will have to figure out in your own CMS where to paste the "embed code." It might be really simple. Here are the buttons for CL & at my version of WordPress, for example:
(In WordPress, you *might* need a plugin. There are many. I use iframe by Webvitalii.)
Your CMS might require a few more steps tho. If you need help, ask your help desk where to paste "embed codes."
Anyway, if you make anything online like a Google map, a graph, a timeline or whatever else they've dreamed up, your goal is to get an "embed code" and put it somewhere in your CMS.
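To demystify what an iframe embed code is: it's one HTML tag that points at the address where your map or graph lives. Here is a sketch that writes one. The address and the default sizes are made up, `iframe_embed` is a hypothetical helper, and the real code a service like Google or Plotly hands you may carry extra attributes.

```python
from html import escape


def iframe_embed(src: str, width: int = 600, height: int = 400) -> str:
    """Build a bare-bones iframe embed code for the given address."""
    # escape() keeps characters like & and " from breaking the HTML attribute.
    safe_src = escape(src, quote=True)
    return f'<iframe src="{safe_src}" width="{width}" height="{height}"></iframe>'


if __name__ == "__main__":
    # A made-up chart address, for illustration only.
    print(iframe_embed("https://example.com/chart?id=1&x=2"))
```

In practice you'll copy the code the service gives you rather than write your own; the point is only that there's nothing scary inside it.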
With Plotly's free tier you can make some simple graphs.
So first let's prepare some data to give Plotly.
Let's go back to lotto.xls
And make a Pivot Table that sums up winnings by year.
Then we'll put that info in Plotly.
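If you'd rather not click through Excel, the same "sum winnings by year" pivot can be sketched in code. I'm assuming column names `year` and `winnings` here, and the rows are invented; the real lotto.xls may be laid out differently, and you'd save it as a .csv first.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the column names are assumptions, not the real file's.
SAMPLE_CSV = """city,year,winnings
Atlanta,2014,1000
Macon,2014,500
Savannah,2015,250
"""


def winnings_by_year(csv_text: str) -> dict[str, int]:
    """Sum the winnings column for each year, like a one-field pivot table."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["year"]] += int(row["winnings"])
    return dict(totals)


if __name__ == "__main__":
    for year, total in sorted(winnings_by_year(SAMPLE_CSV).items()):
        print(year, total)
```

The year/total pairs that come out are the two columns you'd hand to a charting tool.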
There are many sites that do something similar.
If you put a graph on your page, check it on several browsers and PHONES. I've seen reputable news orgs run graphs that are half cut off & unzoomable on phones. I don't know if it's the CMS or the embed or what. But don't be the person who offers mobile users a crummy graph. If one service doesn't work, try another.
You might try Infogram, Datawrapper, Chartbuilder by Quartz, or check Computerworld's list of tools.
Let's say you want to make a clickable map of Georgia's congressional districts.
First you'll need to download a file of that map.
Two common map filetypes are shapefile (.shp) and .kml.
Shapefiles are big, but you need to know about them because a lot of public agencies use them, like the U.S. Census Bureau or, sometimes, the Georgia General Assembly.
KMLs are smaller, and they are the format that Google Fusion Tables wants.
So let's imagine our source has given us this shapefile of Georgia's congressional districts: CONGRESS12-SHAPE-2.zip. (If your source gives you a .kml, then you can skip the conversion step.)
Now we will turn this .shp into a .kml. I'm going to use a site called mapsdata. There are others, like shpescape.com.
Let's get started!
Or you can watch this video later.
BONUS ROUND, time permitting: Now you ask, great, how do I add more info to this map, like, say the legislator's name? Or party? Or make the blue districts blue and the red districts red?
That is exactly what Fusion Tables were designed for!! Fuse data across multiple sources!
There are tutorials online, but FYI here's roughly what you'll do:
First you'll need this spreadsheet, GA_HOUSE.csv. It lists every House member, their party, and a color code for the web.
Download that spreadsheet and save it in your own Google Drive as a Fusion Table.
Go back to the House map. On the left, go under File -> Merge. Follow the steps to merge the two into a new document. Match the two tables by "DISTRICT." Look closely at "DISTRICT." It's a three-digit number with leading zeroes. Google can match "001" to "001". But "1" and "001" might confuse it. Your merge column has to match up!
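The leading-zero problem is worth seeing in miniature. Here is a sketch of a merge that pads every district to three digits before matching, so "1" and "001" line up. The rows and the `party` column are invented for illustration, and this is plain Python, not anything Fusion Tables runs.

```python
def normalize_district(value) -> str:
    """Pad a district number to three digits, e.g. 1 -> '001'."""
    return str(value).strip().zfill(3)


def merge_on_district(map_rows: list[dict], member_rows: list[dict]) -> list[dict]:
    """Join two tables on DISTRICT after normalizing both sides."""
    members = {normalize_district(m["DISTRICT"]): m for m in member_rows}
    merged = []
    for row in map_rows:
        key = normalize_district(row["DISTRICT"])
        if key in members:
            # Combine the columns from both tables; keep the padded district.
            merged.append({**row, **members[key], "DISTRICT": key})
    return merged


if __name__ == "__main__":
    # Invented rows: one table uses "001", the other uses a bare 1.
    shapes = [{"DISTRICT": "001", "shape": "..."}]
    people = [{"DISTRICT": 1, "party": "R"}]
    print(merge_on_district(shapes, people))
```

If you're fixing this in a spreadsheet instead, format the column as text so the zeroes survive.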
But before Problem 8, let's talk about limitations of Google, Plotly, IFTTT, Changedetection and any other outside site you use in publication ...
When you use free services, you're depending on them not to go out of business, for their cloud to not crash, for them to keep the free tier open.
If any of these data disasters happen, they take your data with them — and your embedded thing will appear as a broken picture.
And as for these graphs & maps, you don't get a ton of control over how they look compared to if you were a coder writing something from scratch.
These are good quick & dirty tools for day-to-day storytelling.
But just as FYI, I'm telling you there are some shortcomings.
That's the National Institute for Computer-Assisted Reporting email listserv. While you're there, join any of the several Investigative Reporters & Editors listservs. They are super-helpful and super-friendly.
Note: If you're dealing with seriously confidential/sensitive docs, I doubt you should put them in/through any of the services listed above. They probably keep a copy and/or have vulnerabilities.
Two ways to search for deleted web pages or old versions of web pages: The Wayback Machine aka Archive.org or, google something and click the GREEN down arrow beside the results. Sometimes one of the choices is "cached."
PDFSplit to pull one or more pages out of a PDF.
RECAP for Pacer, a plugin for Chrome and Firefox that downloads and stores the federal court docs you view in Pacer and allows other users to see those documents for free. And you can, for free, see what other users have downloaded. Saves $ on your Pacer bill.
Sqoop tracks SEC filings, patent applications and Pacer filings, and I think it lets you search & filter and does email alerts. But I also *think* it's missing the federal courts that don't publish an RSS feed, cough, cough Northern Georgia. I've never messed with patents & SEC parts.
Justia Dockets & Filings does catch all federal courts, AFAIK.
Import.io can scrape data off *some* websites into tables you can open in Excel.
StoryMapJS, a Knight Lab offering, is like a mashup of a map and a timeline.
Silk: "Silk lets anyone create interactive data visualizations, publish websites and tell interactive stories."
SiteDelta, a Firefox plugin that checks pages for changes.
Bloom, a "geolocation platform for news publishers."
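Back on the Wayback Machine from the list above: Archive.org also offers a small "availability" lookup that tells you whether it holds a saved copy of a page. Here is a sketch that builds the lookup address and reads the answer. The response shape is my understanding of Archive.org's documentation, so treat it as an assumption; the sample response is invented and the helper names are mine.

```python
import json
from urllib.parse import urlencode


def wayback_lookup_url(page_url: str, timestamp: str = "") -> str:
    """Build the address that asks Archive.org for its closest saved copy."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. "20150101" for Jan. 1, 2015
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text: str):
    """Pull the saved copy's address out of the JSON reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


if __name__ == "__main__":
    print(wayback_lookup_url("example.com", "20150101"))
    # An invented reply in the shape I believe the service returns.
    sample = (
        '{"archived_snapshots": {"closest": {"available": true, '
        '"url": "http://web.archive.org/web/20150101000000/http://example.com/", '
        '"timestamp": "20150101000000", "status": "200"}}}'
    )
    print(closest_snapshot(sample))
```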
Additional bonus rounds. Items that were cut b/c the presentation was getting too long:
I can't do a better walkthrough than Knight Lab has on the Timeline JS site.
You will need a Google account. And at the end you'll get some code to embed, just like the map example above.
Problem: "I need to download a bunch of documents/pictures from a website. A bunch."
Solution: Requires Firefox, but try the DownThemAll plugin
Please get DownThemAll from the official Mozilla site: https://addons.mozilla.org/en-US/firefox/addon/downthemall/developers
It'll walk you through installation.
Then, let's say I need file photos of all 180 Georgia state House members.
It's the same as this video, so you can watch later if you like.
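For the curious, the heart of a bulk downloader is "find every link on the page that ends in .jpg, .pdf or whatever I'm after." Here is a sketch of that step only. It is not how DownThemAll is built (I haven't read its code); the page snippet and addresses are invented, and `find_files` is a made-up name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect addresses of linked files and images with wanted extensions."""

    def __init__(self, base_url: str, extensions):
        super().__init__()
        self.base_url = base_url
        self.extensions = tuple(ext.lower() for ext in extensions)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "img"):
            return
        for name, value in attrs:
            wanted = name in ("href", "src") and value
            if wanted and value.lower().endswith(self.extensions):
                # Turn relative addresses into full ones.
                self.links.append(urljoin(self.base_url, value))


def find_files(html_text: str, base_url: str, extensions=(".jpg", ".png", ".pdf")):
    """Return full addresses of every matching file linked from the page."""
    collector = LinkCollector(base_url, extensions)
    collector.feed(html_text)
    return collector.links


if __name__ == "__main__":
    # An invented page snippet standing in for a members directory.
    page = '<a href="/photos/rep1.jpg">Rep 1</a><img src="rep2.JPG"><a href="/about.html">About</a>'
    for address in find_files(page, "https://example.com/house/"):
        print(address)
```

The plugin then downloads each address for you, which is the part that saves the afternoon.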