Get your county government data at the OpenHoward portal

tl;dr: Howard County government ups its game in providing data with a new web site opendata.howardcountymd.gov. Next stop, HoCoStat?

I’ve previously written about Howard County’s initial foray into publishing government data, the data.howardcountymd.gov web site created by the Howard County GIS division. As announced by the county and reported by Amanda Yeager at the Baltimore Sun, Howard County has launched a new site opendata.howardcountymd.gov to provide access to government data. This new site, also known as the OpenHoward portal [1], can be considered as a concrete implementation of open data practices mandated by the Howard County Council (see Council Bill 32-2014) and as a down payment on County Executive Allan Kittleman’s campaign promise to create an automated system (“HoCoStat”) to “help government increase responsiveness, improve efficiency and heighten accountability”.

But enough marketing speak, what is this thing really? Briefly, the opendata.howardcountymd.gov site, like the original data.howardcountymd.gov site, is a web site that allows you to view and download various datasets relating to Howard County government activities and Howard County in general. However in other respects the new OpenHoward site goes well beyond what the previous site offers. First, the new site includes many types of data not previously available on the older site, including (to take but two examples) datasets relating to county budgets and police reports.

Second, the new site has a search facility that is extremely handy when trying to find data and datasets of interest. For example, since the renovation of Merriweather Post Pavilion has been in the news I decided to search for “Merriweather”. The search returned (among other things) datasets and records relating to police reports, reports from the Tell HoCo web site and mobile app used to report potholes, broken street lamps, and other problems, and a list of payments the county made relating to Wine in the Woods. I also tried searching for the name of the street I live on, and got a similar mix of results. I predict that this will be a popular use of the site.

Finally, the new site offers an application programming interface (API) by which independent developers can create applications that access the data in real time. Most people won’t care about this, but (among other things) it offers local Howard County businesses and motivated individuals a way to create their own applications to add value to the underlying county data.
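For readers who do care, here is a minimal sketch of what building an API request might look like. It assumes the site follows the usual conventions of Socrata’s SODA API (Socrata being the vendor behind the site, as discussed below): a `/resource/<dataset id>.json` endpoint and `$`-prefixed query parameters. The dataset identifier `abcd-1234` and the `zip_code` field are made up for illustration; real identifiers would come from each dataset’s page on the portal.

```python
from urllib.parse import urlencode


def build_query_url(domain, dataset_id, where=None, limit=100):
    """Build a SODA-style query URL for one dataset (no request is sent)."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    # Keep "$" readable in the query string instead of encoding it as %24.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# "abcd-1234" and "zip_code" are placeholders, not real portal names.
url = build_query_url(
    "opendata.howardcountymd.gov",
    "abcd-1234",
    where="zip_code = '21044'",
    limit=10,
)
print(url)
```

The sketch only constructs the URL, so it can be tried without touching the county’s servers. An application token, if you register for one, is conventionally sent with the request in an `X-App-Token` header.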

The opendata.howardcountymd.gov site was not built from scratch, but was instead deployed using the online service provided by Socrata, a Seattle-based private company specializing in helping governments to implement open data initiatives. Socrata’s is a “cloud-based” or “software as a service” (SaaS) offering, meaning that Howard County did not purchase software and hardware to run the site, but instead pays an ongoing subscription fee to host its data on Socrata’s servers running Socrata’s software. We’ll see in future exactly how much Howard County is paying Socrata for this service (since presumably it will show up in the “Payments to Vendors” database), but based on an independent analysis of Socrata pricing it’s likely that the cost to the county is on the order of several thousand dollars per month.

That may sound like a lot, but you have to compare it to the fully-burdened cost (i.e., including salaries, health care, and pensions) of having Howard County employees build the site, or the cost of having a contractor develop a custom site. [2] Socrata appears to be a market leader in the open data space, is growing rapidly, and has a coherent vision for future product offerings. Socrata also has other customers in Maryland at both the state and local levels, with Socrata powering the Open Data Portal used in the StateStat system, as well as open data portals and related applications for Baltimore City and Montgomery County. [3]

In general I think going with Socrata was a good decision for the county. The site looks pretty functional from the point of view of both beginners and more advanced users, Socrata appears to have good mechanisms for getting new datasets into their system, and the provision of an API is a plus for advanced usage. Plus Socrata also has a separate Open Performance (GovStat) product that looks as if it would be a good base on which to build the HoCoStat system.

In comparison to the pluses my concerns about OpenHoward thus far are relatively minor. First, the site could use more datasets, and more data in existing datasets. (For example, there’s no police or fire and rescue data for 2015.) However the press release is upfront about this being a “beta” site at present, so presumably more data is on the way. One major potential lack is data on Howard County schools; I presume the Board of Education and Superintendent Foose would need to cooperate to get that done, and it’s an open question as to whether such cooperation will be forthcoming.

Second, I think the conditions for access to the site and its data need to be spelled out a bit more clearly. The original County Council bill CB32-2014 stated that “All accessible data … shall be made available without copyright, patent, trademark, or trade secret, or similar regulation other than reasonable privacy, security, and privilege restrictions.” In other words, all data published on the site is presumably in the public domain with no restrictions on its use. However it would be nice if that could be spelled out more explicitly. The terms of use for the API are somewhat unclear as well: There’s a basic level of API access available by default, and more intensive usage is possible by registering and getting an “application token”. These are both provided at no charge, but it’s not clear whether there is some level or type of API access that would incur a charge to the application developer or to application users. Again, this is worth spelling out.

Finally, what will happen to the existing data.howardcountymd.gov site? Will its data be folded into the OpenHoward portal and the original site decommissioned, or will it continue to operate? I confess to a personal interest in this, since I’ve previously published analyses that pull datasets from the older site, and if the old site goes away I’d like the web links I used to be redirected to the new site.

Leaving these relatively minor concerns aside, overall the launch of the OpenHoward portal is a very welcome event, and I’m looking forward to seeing how it and the larger HoCoStat initiatives evolve. Our thanks should go to all those who made this possible, including to Greg Fox, Jen Terrasa, and the other members of the Howard County Council for pushing Howard County to provide open accessible data, to Allan Kittleman for his work thus far to fulfill his campaign pledges around open access, and, most importantly, to those who did the real work, Chris Merdon’s staff in the Department of Technology and Communication Services.

1. Although both the county press release and the Baltimore Sun article reference the OpenHoward name, the actual web site doesn’t use that name. Maybe they’re still finalizing the logo and related branding?

2. Based on the figures on page 190 of the Howard County FY2016 proposed operating budget [PDF], personnel costs for the Department of Technology and Communication Services (the county’s IT department) appear to be almost $100,000 per employee on average. So a hypothetical subscription fee of $8,000 a month would be equivalent to hiring one new employee.

3. The Maryland connection goes beyond what I mentioned: Beth Blauer, who headed up the Maryland StateStat project, subsequently worked at Socrata for a couple of years before leaving to head up the Center for Government Excellence at Johns Hopkins University.

How politicians see Howard County

Howard County, Maryland precinct cartogram. Precinct area is proportional to the number of registered voters as of the 2014 general election. Click for higher-resolution version.

tl;dr: The map of Howard County looks very different if you’re looking for votes. Cartograms help you see like a politician.

There are 118 election precincts in Howard County, Maryland, varying both in geographic area and in the number of voters they contain. Precincts in western Howard County tend to be larger, because the population density in western Howard is lower. Precincts in more densely populated areas of the county (including Columbia) tend to be smaller. If we’re interested in how voters behave across the county a conventional map can be misleading because the larger area of western Howard precincts causes us to overrate the importance and impact of those precincts. (This is similar to the US electoral map being visually dominated by large states like Montana, Wyoming, and the Dakotas that have fewer voters than small states like Connecticut and Rhode Island.)

The figure above is actually a map of Howard County electoral precincts, not as they exist in reality but as they might appear if their size were proportional to the number of voters they contain. More specifically, this is a cartogram in which the precinct map is distorted to make precinct areas proportional to the number of registered voters in each precinct as of the 2014 general election.
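To make the idea concrete, the target area of each precinct in such a cartogram can be worked out as in the sketch below. It shows only the arithmetic, not the algorithm that actually distorts the map, and the precinct names, areas, and voter counts are invented.

```python
def cartogram_target_areas(areas, voters):
    """Return the area each precinct should have so area is proportional to voters.

    The total map area is preserved; only its division among precincts changes.
    """
    total_area = sum(areas.values())
    total_voters = sum(voters.values())
    return {p: total_area * voters[p] / total_voters for p in areas}


# Invented numbers: one large rural precinct, one small dense one.
areas = {"west": 30.0, "columbia": 2.0}    # square miles
voters = {"west": 2000, "columbia": 2000}  # registered voters

targets = cartogram_target_areas(areas, voters)
for p in areas:
    # A factor below 1 means the precinct shrinks; above 1 means it grows.
    print(p, "scale factor:", targets[p] / areas[p])
```

With equal numbers of voters the two precincts end up with equal areas, so the large rural precinct shrinks and the small dense one grows.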

Allan Kittleman's victory margins by precinct.

Conventional map of Allan Kittleman’s election-day margin of victory in each precinct in the 2014 general election for Howard County Executive. Click for a higher-resolution version.

Let’s look at a real-life example of how cartograms can present a more accurate picture of election results. The next map shows Republican Allan Kittleman’s election-day margin of victory in each precinct in his 2014 race for Howard County Executive against Democrat Courtney Watson. (The margin of victory is expressed as votes per precinct, not as a percentage. Thus a value of 100 means that Kittleman received 100 more votes in a precinct on election day than Watson. The map does not include absentee and early voting results because they are not reported per precinct.)

Each precinct is colored from bright red (large Kittleman margin) to bright blue (large Watson margin) and all shades in between. (Incidentally, this type of colored map is known as a choropleth map.) Since precincts in western Howard County are both large and heavily Republican the conventional map exaggerates the extent of Kittleman’s election-day victory margin over Watson.
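As a rough illustration of how such a map is colored, the sketch below computes a vote margin and maps it onto a red-to-blue scale. The vote counts are invented, and the simple linear color ramp is an assumption made for the example, not necessarily the scheme used for the maps in this post.

```python
def margin(kittleman_votes, watson_votes):
    """Election-day margin in votes; positive means Kittleman led."""
    return kittleman_votes - watson_votes


def margin_color(m, max_abs):
    """Map a margin to an (r, g, b) tuple: red for Kittleman, blue for Watson.

    A margin of zero gives white; +/- max_abs gives a fully saturated color.
    """
    t = max(-1.0, min(1.0, m / max_abs))  # clamp to [-1, 1]
    fade = round(255 * (1 - abs(t)))      # how much white is mixed in
    return (255, fade, fade) if t >= 0 else (fade, fade, 255)


print(margin_color(margin(600, 500), 400))  # pale red: modest Kittleman lead
print(margin_color(margin(300, 700), 400))  # bright blue: large Watson lead
```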

Cartogram of Allan Kittleman victory margins by precinct

Cartogram of Allan Kittleman’s election-day margin of victory in each precinct in the 2014 general election for Howard County Executive. Click for a higher resolution version.

To address this perceptual problem we can instead represent the exact same data in the form of a cartogram, as seen in the next map. Here the precincts of western Howard shrink in size to reflect their true contribution to the overall registered voter population. In particular Howard County Council District 5 now appears to be roughly equal in size to the other districts, which makes sense since county council redistricting had as one of its goals making the districts contain roughly equal numbers of voters. On this map Kittleman’s margin of victory still appears to be significant, but we can better identify precincts (like those in Columbia) in which Watson polled strongly on election day.

Cartograms can be used in place of conventional maps in any context in which each geographic subdivision has associated with it some common variable of interest. For example, suppose we want to look at elementary school overcrowding in Howard County. Looking at a conventional map (like the elementary school attendance area map provided by the Howard County Public School System) we might say, “Gee, there are a lot of elementary schools in eastern Howard. How could they possibly be overcrowded?” It would make much more sense to show school attendance areas as a cartogram in which the size of each attendance area was proportional to the number of students in that area. Each of the attendance areas could then be colored according to the extent of overcrowding at that school.

This sounds like a possible future project for me if and when I have time. Or if anyone out there would like to try this yourself, I’ve provided more detailed information on how to create maps like those shown above. See my three-part series “Creating Howard County Precinct Cartograms Based on 2014 Registered Voters” (part 1, part 2, and part 3) and my second three-part series “Allan Kittleman’s Election-Day Victory Margins in the Howard County 2014 General Election” (part 1, part 2, and part 3).

Useful datasets for Howard County election analysis

tl;dr: I release two useful Howard County election datasets in preparation for future posts.

In the coming days and weeks I’ll be posting some analyses of Howard County election results. Unfortunately the data released by the Howard County Board of Elections and the Maryland State Board of Elections is not always in the most useful form for analysis. In particular I was looking for per-precinct turnout statistics for the 2014 general election in Howard County, along with some way to match up precincts with the county council district of which they’re a part. That data is available in the 2014 general election results per precinct/district published by the Howard County Board of Elections, but unfortunately that document is a PDF document.

PDF files are great for reading by humans, but lousy for reading by machines. They violate guideline 8 in the Open Data Policy Guidelines published by the Sunlight Foundation:

For maximal access, data must be released in formats that lend themselves to easy and efficient reuse via technology. … This means releasing information in open formats (or “open standards”), in machine-readable formats, that are structured (or machine-processable) appropriately. … While formats such as HTML and PDF are easily opened for most computer users, these formats are difficult to convert the information to new uses.

Since the data I wanted wasn’t in a format I could use, I manually extracted the data from the PDF document and converted it into a useful format (Comma Separated Value or CSV format) myself. Then since someone else might find a use for them, I published the files online in a datasets area of my GitHub hocodata repository. The first two files are as follows:

  • hocomd-2014-precinct-council.csv. This dataset maps the 118 Howard County election precincts to the county council districts in which those precincts are included.
  • hocomd-2014-general-election-turnout.csv. This dataset contains turnout statistics for each of the 118 Howard County precincts in the 2014 general election, including the number of registered voters and ballots cast in each precinct on election day.
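
Here is one way the two files might be combined to get turnout per council district. It is a sketch under assumptions: the column names (`precinct`, `district`, `registered`, `ballots`) and the sample rows are made up, so check the actual headers in the CSV files before adapting it.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for the two CSV files; column names and values are invented.
precinct_council = io.StringIO(
    "precinct,district\n01-001,1\n01-002,1\n05-001,5\n"
)
turnout = io.StringIO(
    "precinct,registered,ballots\n"
    "01-001,2000,900\n01-002,1000,300\n05-001,1500,900\n"
)

# Look-up table: precinct -> council district.
district_of = {
    row["precinct"]: row["district"] for row in csv.DictReader(precinct_council)
}

# Sum registered voters and ballots cast within each district.
totals = defaultdict(lambda: [0, 0])  # district -> [registered, ballots]
for row in csv.DictReader(turnout):
    d = district_of[row["precinct"]]
    totals[d][0] += int(row["registered"])
    totals[d][1] += int(row["ballots"])

turnout_by_district = {
    d: ballots / registered for d, (registered, ballots) in totals.items()
}
for d, rate in sorted(turnout_by_district.items()):
    print(f"District {d}: {rate:.1%}")
```

To use the real files, replace the two `io.StringIO` objects with opened file handles.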

Stay tuned for some interesting ways to use this data.

Fun with Howard County building permit data

tl;dr: I have fun creating graphs and maps with building permit data from data.howardcountymd.gov.

I’ve written previously about the cornucopia of interesting data sets that Howard County government has made available at the data.howardcountymd.gov site. I had some spare time over a long weekend and decided to try analyzing some of that data, including making use of the various map files on the site (under the “Spatial Data (GIS)” tab).

The particular data set I decided to start with was for building permits issued for residential and commercial construction—not because I have a burning interest in building permits but because I mentioned this type of data in my last post and thought it would be a relatively easy data set to analyze. The particular question I decided to look at was how many residential building permits were issued in each zip code within Howard County in 2014—basically to get a feel for where the most construction was occurring in the county. (It’s only an approximate measure because some permits cover multiple units.)

bar chart showing Howard County residential building permits per zip code

To do the analysis I used the skills and the tools I learned in the courses that are part of the Johns Hopkins data science specialization series on Coursera. (See my Coursera-related posts for more on my experiences in these classes.) I won’t go over the process here since I’ve separately published full details on my RPubs page, with the source code available in my hocodata GitHub repository.

I first created a simple table of the top zip codes for residential permits issued. This was sort of boring so I won’t reproduce it here; you can find it in the first example analysis I did. More interesting is the bar chart I created as part of the second example. It’s clear from the chart that there’s wide variation among Howard County zip codes in terms of residential construction. The two Ellicott City zip codes combined (21042 and 21043) accounted for the largest fraction of residential building permits in 2014; in contrast there were almost no permits issued for east Columbia (21045).
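The counting step behind a table or chart like this is simple. The published analysis has the real code; what follows is only an illustrative sketch in Python, with invented permit records and assumed field names (`type`, `zip`).

```python
from collections import Counter

# Invented records standing in for rows of the county's permit data.
permits = [
    {"type": "residential", "zip": "21042"},
    {"type": "residential", "zip": "21043"},
    {"type": "commercial", "zip": "21045"},
    {"type": "residential", "zip": "21042"},
]

# Count residential permits per zip code. This counts permits, not housing
# units, so a permit covering several units still counts once.
per_zip = Counter(p["zip"] for p in permits if p["type"] == "residential")

# List zip codes from most to fewest permits.
for zip_code, n in per_zip.most_common():
    print(zip_code, n)
```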

Howard County map showing residential building permits per zip code

However what I really wanted to create was a map showing exactly where permits were being issued across the county. The Howard County GIS division provides on data.howardcountymd.gov a set of map data for zip codes within Howard County. After doing a bit of research and experimentation, in my third example I was able to use this in conjunction with the building permit data to produce a map that is a nice alternative to the bar chart.

I have to stop here and ask the unspoken question: What’s the point of all this? I’d answer as follows:

First, this shows that releasing government data empowers people to do interesting things with it, especially when combined with free software and easily available online information and training. Maybe not everybody is interested in building permit data or any other individual government data set, but I suspect that there are a fair number of people out there who are, including small businesses, nonprofit organizations, or just individual activists and interested citizens.

Second, I did all this in a way that is completely reproducible by anyone else. How often have you seen a graph or map in a newspaper or government report and wondered, where exactly did that data come from? Wonder no longer: In my examples I start with the raw data as released by Howard County and show all my work in analyzing the data and creating the tables, charts, and maps.

Finally, this is all reusable and adaptable. For example, suppose you have a better source of data on construction activity, perhaps one that gives the actual numbers of residential units, commercial square footage, and so on. You can easily plug that modified data into the analysis steps I’ve documented, and create better versions of the charts and maps in my examples.

You can also reuse the overall technical approach for any type of data tied to a geographic area within Howard County. For example, in addition to zip code areas the data.howardcountymd.gov site contains map data for Howard County school districts, election precincts, census tracts, and many other subdivisions of the county. If you have data sets that are based on those subdivisions (for example, vote totals or turnout percentages for precincts) then you can adapt the code I wrote (all of which is in the public domain) to create your own maps showing how that data varies across the county.

The bottom line is that the data is out there for the picking, as are the tools to make sense of it. You just need to spend some time learning how to use them or (if you don’t feel up to the task yourself) finding someone who can. Have fun!