Open Data

Colin Copeland @copelco

http://cakt.us/tripython-opendata

Talk Outline

  • About me
  • What is open data?
  • Open data projects
  • Activity in the Triangle
  • How to get involved

About Me

  • Work here at Caktus - we build custom Python/Django web apps
  • In my spare time, I've picked up open data as a hobby
  • Citizen and programmer/developer perspective

What is Open Data?

Open Knowledge Foundation Definition

Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.

Use in the government/public sector (from Socrata)

Making data that belongs to the public broadly accessible and usable by humans and machines, free of any constraints.

Presenter Notes

  • I'm not going to dive deep into the philosophical aspects of open data
  • Talk about it at a very high level

Historic Example: Weather

Weather

Data from government satellites and ground stations.

Created industry for:

  • Weather Channel
  • Commercial agricultural advisory services
  • New insurance options

Presenter Notes

  • Two commonly mentioned examples of how open data has been used in the past
  • Economy has consistently benefited when government data have been released to entrepreneurs and other innovators.

Historic Example: GPS

Global Positioning System (GPS)

US Gov. released GPS data once reserved for military use to the public.

Gave rise to GPS-powered innovations:

  • Aircraft navigation systems
  • Precision farming
  • Location-based apps

Presenter Notes

  • Executive decisions made by Presidents Ronald Reagan and Bill Clinton
  • Turns your phone into a GPS device so you can use Foursquare or get navigation directions on a map.

Open Data Requirements

  • Availability and Access
  • Reuse and Redistribution
  • Universal Participation

Presenter Notes

  • Releasing important government data allows citizen developers and entrepreneurs to turn it into new products and services
  • Available online and in a convenient and modifiable form
  • Released under terms that allow reuse, redistribution and mixing with other data sets
  • Everyone can use it: no restrictions such as non-commercial or education-only use. Users shouldn't need to worry about the legal aspects.

Interoperability

Presenter Notes

  • Requirements are really about interoperability, right?
  • Open formats (nothing proprietary)
  • GPS data or transportation data alone is great
  • But the data becomes more useful when combined
  • Open data can scale too. Shared formats, combining data across regions, etc.
  • Larger level, more people
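
A minimal sketch of why shared formats and shared keys matter, assuming two hypothetical agencies each publish a CSV keyed by ZIP code. The data, column names and numbers are made up for illustration; only the Python standard library is used.

```python
import csv
import io

# Hypothetical sample data: two agencies publish CSV keyed by the same ZIP code.
BUS_STOPS_CSV = """zip,bus_stops
27701,42
27703,17
27707,25
"""

POPULATION_CSV = """zip,population
27701,21000
27703,17000
27705,46000
"""


def read_by_key(text, key):
    """Parse CSV text into a dict mapping each row's key column to the row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def combine(left, right):
    """Merge rows that share a key; keys missing from either side are skipped."""
    return {k: {**left[k], **right[k]} for k in left if k in right}


stops = read_by_key(BUS_STOPS_CSV, "zip")
people = read_by_key(POPULATION_CSV, "zip")
combined = combine(stops, people)

for zip_code, row in sorted(combined.items()):
    per_1000 = 1000 * int(row["bus_stops"]) / int(row["population"])
    print(f"{zip_code}: {per_1000:.1f} bus stops per 1,000 residents")
```

Neither file answers "how well served is each area?" on its own. The join only works because both use an open format and the same identifier for a region.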

What kind of data?

static/data-types.png

Source: http://okfn.org/opendata/

Presenter Notes

  • Data that the government already collects
  • Map data (roads, buildings, topography and boundaries)
  • Environment (level of pollutants, quality of rivers and seas)
  • Statistical and Financial data (census and government spending)
  • Anything from prices charged by hospitals/colleges to crime to unemployment

Not Personal Data

Important point: not personally identifiable data

Issues in the past: The Journal News (Westchester County, NY) published a map with the names and addresses of people who had gun permits

http://www.nytimes.com/2013/01/14/business/media/guns-maps-and-disturbing-data.html

Presenter Notes

  • Not talking about releasing personal tax records or private health records. Incidents like the gun permit map gave open data a bad rap. There are other ways to analyze this data without mapping everyone to a point.
  • This means that some data is aggregated: by area/region, over time, etc.
  • Scrubbing data is covered later
  • Let's look at an example of using open data

EveryBlock.com

../../djangocon/2012/openblock/static/example-everyblock.png

Presenter Notes

  • Hyper local news
  • Browse by neighborhoods, streets, zipcodes, or draw your own location
  • Lots of public record information as well as community neighbor content
  • Lots of community activity, especially in Chicago

OpenBlock

../../djangocon/2012/openblock/static/openblock-logo.png

Columbia Tribune

static/tribune.png

Presenter Notes

  • Most recently used by a newspaper in Columbia, Missouri
  • Police, restaurants and home sales are the newsy stuff that's updated daily

OpenRural

Presenter Notes

  • Taking OpenBlock and using it in rural North Carolina communities
  • Small towns and small news organizations
  • Newspapers don't have a lot of digital resources
  • And they lack the resources to make public data digestible on the web
  • Quite different from a typical OpenBlock setup in a big city with larger infrastructure

OpenRural

../../djangocon/2012/openblock/static/unc.png

  • June 2011: OpenRural funded by a three-year Knight News Challenge grant
  • Ryan Thornburg, professor at School of Journalism and Mass Communication at UNC
  • Caktus is helping develop and deploy OpenRural for these NC communities

Presenter Notes

  • Goals:
    • Apply same OpenBlock tools to rural North Carolina communities
    • Increase access to local public records
    • Do this by helping local newspapers leverage OpenBlock
    • "Help Rural Newspapers Get Access to Public Data"

Columbus County, North Carolina

../../djangocon/2012/openblock/static/nc-columbus-county.png

Presenter Notes

  • Our initial focus is on Columbus County, NC
  • Small county in the southeastern part of the state with 50k residents
  • Working with a local newspaper to incorporate public records onto their site

The News Reporter

../../djangocon/2012/openblock/static/whiteville-com.png

Presenter Notes

  • The online version of the paper serving Whiteville and Columbus County

Columbus County Open Data

static/columbus-gis.png

Presenter Notes

  • Wouldn't have been possible without the county staff
  • Access to downloadable information from local websites
  • Small county (CH is bigger), with one guy handling the data
  • People kept asking him for data, so rather than responding to each request individually, he posts the files online

Presenter Notes

  • Bring this back to the Triangle, maybe do something in Durham
  • GIS/history nut
  • Durham had horse- and mule-drawn streetcars in 1880. Electric streetcars 1900-1930, before buses took over.

Durham GIS

static/durham-gis.png

Presenter Notes

  • Can't download
  • $25-$100/layer

Commercial Use

  • "None of the GIS data purchased through this Policy shall be published by the requestor without the City’s explicit written consent, nor shall the requestor permit any other party to publish the data."
  • $100-$1000/layer
  • Provided on CD-ROM or 8mm tape

Presenter Notes

  • Poking fun at Durham
  • Recoup the costs of man hours spent creating these files
  • Rather than finding common set of files to publish, they make all requests go through the department
  • GIS has a special case in NC

Presenter Notes

  • Enacted legislation in NC for public records
  • Lays out what can be published
  • Has special case for GIS

Project Open Data

static/project-open-data.png

White House executive order - http://project-open-data.github.io/

Presenter Notes

  • White House executive order: open data is the default for release of government information
  • Open data people were excited, general public wasn't
  • Big step forward in making gov data, paid for by tax dollars, accessible to citizens
  • Tools: Mix of PHP, Java, Ruby, Python

Business Case for Open Data

  • Save time and money responding to Freedom of Information Act (FOIA) requests
  • Avoid duplicative internal research
  • Discover complementary datasets held by other agencies
  • Empower employees to make better-informed, data-driven decisions
  • Positive attention from the public, media, and other agencies
  • Generate revenue and create new jobs in the private sector

Presenter Notes

  • Massachusetts saved over $3 million by putting procurement information online. South Carolina has seen FOIA requests decrease by one third.
  • In San Francisco, access to real-time transit information reduced 311 call volume, saving over $1 million a year

Open Data Triangle

Presenter Notes

  • Lots going on in the Triangle
  • DataPalooza is an open-data competition sponsored by the White House - focused on health, energy and education data

Raleigh Open Data

static/openraleigh.png

Presenter Notes

  • Way ahead: an open source/open data resolution (agenda/policy) is on the books
  • Socrata data portal, 95 data sets
  • Jason says it's getting a lot of hits and they're getting requests for different kinds of data
  • Lots of open data portals these days
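
Socrata portals expose each data set over an HTTP API (SODA) that returns JSON. A minimal sketch of building a query URL, assuming a placeholder portal domain and data set ID; `data.example.gov` and `abcd-1234` are hypothetical and should be swapped for a real portal and data set.

```python
from urllib.parse import urlencode


def soda_url(domain, dataset_id, limit=5, where=None):
    """Build a SODA query URL for a data set hosted on a Socrata portal."""
    params = {"$limit": limit}
    if where:
        # $where takes a SoQL filter expression, e.g. "year > 2012".
        params["$where"] = where
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Hypothetical portal domain and data set ID, for illustration only.
url = soda_url("data.example.gov", "abcd-1234", limit=10)
print(url)
```

`urlencode` percent-encodes the `$` in parameter names, which the server decodes as usual. The resulting URL can be fetched with `urllib.request.urlopen` and decoded with `json.load` to get a list of row dictionaries.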

Presenter Notes

  • Most of these are using the Socrata product. SaaS.
  • data.gov just relaunched based on CKAN - "open source data portal software". Python, Solr, Postgres.
  • Philly is using a Django project
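
CKAN portals can be searched programmatically through the Action API. A rough sketch of building a `package_search` request and summarizing the response, assuming a placeholder portal URL (`https://catalog.example.gov` is hypothetical) and a made-up response trimmed to the fields used here.

```python
from urllib.parse import urlencode


def ckan_search_url(portal, query, rows=5):
    """Build a URL for CKAN's package_search action (Action API v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal}/api/3/action/package_search?{params}"


def summarize(response):
    """Return (title, formats) pairs from a decoded package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [
        (pkg["title"], sorted({res["format"] for res in pkg["resources"]}))
        for pkg in response["result"]["results"]
    ]


# Made-up response trimmed to the fields used above; real responses carry more.
sample = {
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "title": "Example Transit Stops",
                "resources": [{"format": "CSV"}, {"format": "JSON"}],
            }
        ],
    },
}

print(ckan_search_url("https://catalog.example.gov", "transit"))
print(summarize(sample))
```

Listing the formats each data set offers is a quick way to check the "open formats" requirement before building on a data set.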

OpenDataPhilly

static/opendataphilly.png

Presenter Notes

  • Take the ODP codebase and use it in Durham
  • Python/Django codebase, I can deploy this here
  • Got it up and running, but Durham wasn't ready to adopt it

Code for America Brigade

static/brigade.png

Presenter Notes

  • CfA Brigade, organizing civic-minded technologists to contribute their skills in service to their local governments
  • Create re-usable apps

Brigade Apps

static/brigade-apps.png

Presenter Notes

  • Raleigh refactored the Adopt-a-Hydrant app for bus shelters
  • Durham started in May. Small steps, slowly working to pass an open data resolution.

Tools/Software

  • ScraperWiki
  • Open Data Catalog
  • OpenBlock
  • OpenTreeMap
  • ...

Questions?

Colin Copeland @copelco

http://cakt.us/tripython-opendata
