The GIS Extension (Vector Tutorial)

The GIS extension for NetLogo allows you to import and export real-world spatial data into your models. If you can picture your model taking place on top of a 2D map of real-world geography, then the GIS extension is your best bet for making that happen. In this tutorial, I will be showing an example of how to use a GIS application (either the free and open source tool QGIS or the industry giant ArcMap) to clean up and prepare GIS data and then use the GIS NetLogo Extension to import and create a model with it.

What is Spatial Data?

Spatial data is, unsurprisingly, any data with a spatial component, from simple points of latitude and longitude — such as locations of post offices around a nation — to lines — such as road networks — to closed polygons — such as the boundaries of nations or municipalities. In addition to the spatial data inherent to such geographic features, a GIS dataset almost always also contains many more fields of data associated with each feature or individual element of that dataset. To reuse the above examples: each post office point feature could have a field representing its address, each road line feature could have a field representing its speed limit, and each nation's polygon feature could have a field representing its population or its GDP.

One way to think about it (which also happens to be the way it is represented "under the hood") is to imagine that every spatial feature is given a unique ID that corresponds to a row within a large table or spreadsheet (commonly called the attribute table, as you will see). In this way, spatial data is composed of spatial features that all have a one-to-one relationship to a row within a table.

What is (a) GIS?

GIS stands for Geographic Information System -- that much almost everyone agrees, but there is some geography-nerd controversy about some of the semantics beyond that. GIS can commonly refer to one of two things: first, GIS apps, which can manipulate and analyize spatial data much in the same way apps like Excel can manipulate and analyze tabular data, or second, the idea of geospatial data and the manipulation thereof more broadly (which is sometimes called Geographic Information Science, also, unhelpfully, acronym-ized to GIS). For our purposes, we are going to use one of two GIS apps, QGIS, or ArcMap, to clean up data before importing it into NetLogo using the GIS extension.

Our End Product:

By the end of this tutorial, you will have created a highly simplified model of disease transmission via air travel between a number of cities in the continental United States. We will model cities in their real world locations with estimations of their real-world populations. Each of these cities will have with a number of airplanes traveling between them, each with a possibility of having an infected passenger based on how many infected individuals are in the departing city.

img

Through this process, you will learn a number of valuable GIS skills, including:

Step 1: Acquiring Data

(Note that for long-term compatibility's sake, all the data you need to complete this tutorial can also be downloaded from [a mirror link we've set up] (broken-link))

While not strictly necessary, it would be a little hard to visualize the model without a background map to give some visual context. While there are a number of places you can download maps of the US states, we might as well go straight to the source and use the files provided by the Census Bureau. Scroll down until you see the subheading "States" and click on one of the options below. The multiple versions each correspond to the level of fidelity to the actual outlines of the states, with a higher number after the 1 (and thereby a lower ratio) corresponding to higher fidelity. We won't need any higher fidelity than the minimum 1:500,000 version, so download that one and move the zip somewhere you can keep track of.

If we're going to model air travel, we're going to need a map of airports, and here, we are going to learn on the good community of Natural Earth, a repository of public domain geospatial datasets supported by the North American Cartographic Information Society. They have a dataset of airports that serves our needs nicely. Go ahead and download the latest version and move the zip somewhere you can keep track of.

Finally, we are going to also go ahead and grab the Natural Earth's "Populated Places" dataset while we're here. We will use this to estimate the populations served by each of the airports we are simulating.

(For both of these natural earth point datasets, you will see that the versions we are using are labeled as having a "1:10m scale". However, unlike the scales we talked about earlier with the polygon state dataset, this "scale" does not actually signify a level of fidelity but instead a measure of how many datapoints -- cities or airports in this case -- are included or omitted. It is a bit confusing, but the way to think about it is that Natural Earth makes datasets that are meant to be used in reference maps, including printed reference maps. In a general reference map the size of a piece of letter paper, you don't want every single city you have in your database cluttering up the page because it makes it harder to see the large cities that you are more likely to care about at that scale.)

If you unzip any of these files, you will see that there are a number of different files that comprise the dataset, each with the same name and a different file extension. The most important thing to know when working with these spatial files is not to break apart these many disparate files. The ".shp" file is the main file of the bunch, but you don't need to know much more than that. I've given a quick summary of all the different kinds of files below if you're so inclined. Otherwise skip down to step 1.5.

Step 1.5: Choose your GIS app

Before we begin the tutorial proper, you have to choose which GIS app you wish to use and set it up. And really there are two clear options.

First, there is the industry-standard ArcMap. ArcMap and its cousins are made by ESRI, the commercial giant in the industry. Indeed all the file formats we are going to be working with today were developed by ESRI and have become de-facto defaults. The benefits of using ArcMap is that, if you are part of a large commercial or educational institution, its probably what everyone else in your organization is using if they use GIS and that can be a great resource if you need to ask for help. In addition, if you want to build GIS skills that are highly transferrable, then learning the de-facto default is not at all a bad route. (One thing to note: ArcMap itself is Windows only, but ESRI offers less-mature web-based GIS services that are cross platform.) However, if you are not part of an institution that gives you access to ArcMap already, then the steep cost is almost certain to turn you looking for an alternative, which brings us to QGIS.

QGIS is a free and open source alternative piece of GIS software that can more than hold its own against ESRI's offerings. It has versions for MacOS, Linux, and Windows, and stands right next to other high-quality open-source tools like Blender and LibreOffice in terms of stability and community support. What it lacks in the larger ecosystem and more team-focused features of the ArcGIS family of products, it makes up for in performance, stability, and deep extensability. For individuals or small organizations, it is a no-brainer.

For each step of the process here, I have created both a QGIS and an ArcMap set of instructions so you can follow along regardless of which tool you have chosen.

Step 2: Importing the Data & a Tour of the Interface (QGIS)

Importing the data into QGIS is as simple as opening up QGIS, unzipping the downloads, and dragging each of the .shp files into the QGIS window, starting with the airports layer, then moving on to the cities layer, and then finally the states layer. (If you get any sort of warnings, just click OK through them all for now. If you're interested in understanding what they mean, read the optional extra part of Step 3.) Each dataset will be represented within QGIS as a layer, which can be seen in the layers panel on the bottom left of the window. Just like in programs like Photoshop, these layers tell QGIS which datasets should be drawn "on top" of others. The drawing order of layers has no effect on any of manipulations or math that we'll be using, but for obvious reasons, it isn't a good idea to have the US States polygons covering up the airports and populated places point datasets. Make sure that you states dataset is below all the others by dragging it to the bottom like so:

Screen Shot 2021-01-12 at 9.07.55 AM

Note that for all these symbols, the colors are chosen randomly. You can change them by double clicking on the layer name and going to the "Symbology" tab of the window that pops up.

Now that we have all the datasets into the program, it's time to take a quick tour.Screen Shot 2021-01-12 at 9.14.41 AM

First, let's take a look at the navigation tools, found near the left of the top of the window (highlighted in red above). The hand tool handtoolpans around the map, and, with it selected, you can also zoom in or out by scrolling. Moving over two items, you have the zoom in and zoom out toolszoominout, which do exactly what they say on the tin. (One quick tip about the zoom in tool: you can draw a rectangle around the area you wish to fill up your view and it will zoom and pan accordingly.) Next is one of the most useful tools, the zoom full tool zoomfullwhich I like to think of as the zoom to fit tool. It zooms so that all of your shape features are visible. After that is the zoom to selection toolzoomselection, which zooms to fit all of the selected shape features in view, and the zoom to layer toolzoomlayer, which does the same with the currently selected map layer. Finally, looking at the last two icons in that image, those are the undo and redo zoom buttonszoomredoundo. If you accidentally zoom somewhere you didn't mean to, you can undo it with these.

Much of the rest of the tools will come up naturally throughout this tutorial, but one last useful tool you should know about is the Identify Features tool identify (highlighted in blue above). If you select this tool and click on any shape feature in the currently selected layer, a side panel will open up and display some information about that feature. (The currently selected layer is the one in the bottom-right layers panel that is underlined. You can change the currently selected layer by just clicking on the layer you want to be selected.) For example, if you were to use the zoom in tool to zoom to the continental US and then select the identify features tool and click on Texas, your window should look something like this:

Identify Tool Example

If you look on the right you can see everything that QGIS knows about that feature representing Texas. If you click on the "(Derived)" sub menu, you can see things that QGIS has calculated like the area, but everything else there is something that was specified by that feature's row in the attribute table of that dataset. In other words, the attribute table for the layer cb_2019_us_state_20m has columns called "STATEFP", "STATENS", "AFFGEOID", etc., and for Texas, those values are 48, 01779801, 0400000US48, etc. The identify tool lets you quickly surface this information for any given feature.

Step 2: Importing the Data & a Tour of the Interface (ArcMap)

When you first open up ArcMap, you will be greeted with a welcome window asking you to choose a template. The default works just fine for us so you can go ahead and click OK and continue on.

Before beginning real work with ArcMap, it is important to get something out of the way: ArcMap is, well, a bit tempermental. In addition to the rather steep learning curve, it is a program that takes its sweet time, doesn't like being rushed, and will often just decide it needs to take a break and crash on you. To make your time with ArcMap less frustrating, I have two suggestions. First, if you click on a button and nothing seems to happen, I would advise waiting a bit before trying to hit the button again -- just give it some time to think. Second, save often. Save your map often (Control + s works just fine) and save intermediate versions of data layers often (this will make sense in a bit) so that if ArcMap does decide to freeze or crash on you, you will not have lost all that much work. With those disclaimers out of the way, lets import some data

To import files into ArcMap, we use the Catalog Panel, located on the right hand side of the window. If you click on the tab that says Catalog, the Catalog Panel will show up, like so.

CatalogPanel

From here, click on the Connect to Folder Button, highlighted above with a red rectangle, and a window will pop up that allows you to select a folder to "Connect" to ArcMap. Select the folder where you saved your datasets and hit OK.

Now, it probably occured to you that this is a little bit less than straightforward. Why do you have to do this whole folder connection thing if you just want to add some data to your map. Well the first answer is that you actually can do just that by dragging any ".shp" file from your file explorer onto the ArcMap window, and for small projects this makes a lot of sense. But the second answser is that ArcMap has its own file system of sorts to organize all the geospatial data it knows about. This data can come from folders on your hard drive, like the datasets we'll be working with, but it could also come from a local server, a remote server, a geodatabase etc. To make working with all those different sources easier, there is a common intermediary layer on top of all of it: The Catalog, which we can interface with using, you guessed it, the Catalog Panel we just opened.

Now that you have a folder connection to your data all set up, you can click on the little "+" icon to the left of your new folder connection and see the contents within. Start by dragging in the ne_10m_airports.shp dataset onto your map. (Take note, the order actually is important here, as we'll see.) You can drag the ne_10m_populated_places_simple.shp dataset next (the Catalog may close each time, but you can keep it open by clicking the push pin icon), and finally the cb_2019_us_state_20m.shp dataset as well. After dragging this final layer, you will get a "Geographic Coordinate Systems Warning", which you can go ahead and close. If you're curious, read below for an explanation, otherwise, you can skip this next paragraph.

Geographic Coordinate Systems are, via a very rough similie, like units of measurement. If you want to work with measures of distance with someone else, you both need to agree on a unit. You can convert between, say, feet and meters, but it doesn't make much sense to have an architectural schematic that has some measurements in feet and some in meters. Because we dragged on the natural earth layers first, ArcMap set its own geographic coordinate system, its own internal units, to the coordinate system of those files, "WGS84" (a good default if you ever have to choose a geographic coordinate system to export in.) However our states data, being from the US Census Bureau and depicting only North America, uses a different coordinate system that is better suited to working with North American datasets. (This is where the units of measure analolgy breaks down: different geographic coordinate systems are better or worse at measuring position on different parts of the globe, and its not like centimeters are somehow better at measuring wood or feet better at measuring steel.) This warning is simply telling us that ArcMap is about to do some conversions, and that as a result, there may be some minor alignment and/or accuracy problems. At the very zoomed-out scale we're working with, however, we won't run into any issues. InterfaceAfterImport

Your interface now should look something like this, with a few possible exceptions.

First, the colors for all these symbols are chosen randomly. You can change them by double clicking on the symbol (the little dot for point data and the little rectangle for polygon data) in the layers panel on the left and modifying them in the "Symbol Selector" Panel that pops up, either by changing their color on the right or picking from a preset on the left.

Second, the order of layers may not be the same as the ones shown. To reorder the layers so that, say, the airports are not occuluded by the states layer, just drag the layer names around in the layers panel on the left.

A Quick Tour of The Interface

First, lets take a look at the navigation tools, found on the second toolbar from the top, all the way on the left. The hand tool HandPans around the map, and, with it selected, you can also zoom in or out by scrolling. Moving over to the left, you have the zoom in zoomIn and zoom out ZoomOut Tools which do exactly what they say on the tin. (One quick tip about the zoom in tool: you can draw a rectangle around the area you wish to fill up your screen and it will zoom and pan accordingly.) Next is one of the most useful tools for navigation, the "Full Extent" tool FullExtent which will zoom such that your entire map is filling the screen so that you can reset and reorientate while navigating. Finally, there is the undo and redo zoom tools, technically called the "Go back to Previous Extent" tool and the "Go to Next Extent" tools. These allow you to return to prior zoom levels if you ever accidentally zoom somewhere you didn't intend to.

Much of the rest of these tools will come up naturally throughout the course of this tutorial, but one last useful tool you should know about is the Identify tool Identiy. If you select this tool and click on any shape feature, a panel will show up and display someinformation about that feature. For example, if I were to select the Identify Tool and click on Texas, a window should pop up showing information like this: IdentifyDialog

This panel represents everything ArcGIS knows about that feature representing Texas. Each of these fields and values represent the all the data for Texas stored in the attribute table for the cb_2019_us_state_20m dataset. This is the most direct way to investigate individual features in ArcMap and can be quite useful.

Step 3: GIS Data Cleaning (QGIS)

Like all data, GIS data is rarely exactly ready to use for your application right out of the box. As an example for non-spatial data, perhaps you want to filter out all data points that are more than five years old. To do so you could use Excel and sort by collection date and delete all rows that are too old. Or perhaps you would write an SQL query to only grab recent data. Conveniently, GIS tools can do this kind of table-based cleanup, but the real power of a GIS system is in its spatial intelligence.

Selection Tool

Let's start simple. Since we don't plan on including any airports in Alaska, Hawaii, or Puerto Rico, let's go ahead and filter out those three places. We could do this non-spatially in the attribute table, but the quickest and easiest way is just to drag a selection box over the whole continental United States and save it as a new shapefile.

First on the layers menu on the bottom left of the window, click the checkboxes of both of the point layers so that the only active layer is the cb_2019_us_state_20m layer. Next, select the Select Features tool SelectFeatures . (Remember you can hover over any tool's icon to get its name). Now select the cb_2019_us_state_20m layer by clicking on it (after selection the name should appear underlined.) Once you have the selection tool active and the desired layer selected, drag a rectangle around the continental United States. All the selected features should turn bright yellow. Now, right click the cb_2019_us_state_20m layer and hover down to Export and click on Save Selected Features As.

Save Vector Layer As Dialog

A window like this should pop up. First of all, you should make sure that on the top of the window, the Format drop-down is set to "ESRI Shapefile." This is the more common of the two file formats the GIS extension can read, so it makes the most sense to just do everything in shapefiles instead of dealing with QGIS specific formats or anything like that. Once you set this once you shouldn't have to change it again.

The only other thing you need to change here is the location on your disk where you want to store the newly created shapefile. Pick a directory you can keep track of and give the exported file a descriptive name like states_continental.shp. Click OK and a new layer should show up in the layers pane. Now that you have this new layer, you can go ahead and right click on the old cb_2019_us_state_20m layer and hit Remove Layer.

Working non-destructively like this is a useful habit to get into. Often GIS operations take a not-insignificant amount of time and so it would be a shame for you to make some not-undo-able error and have to start back at the base data and work your way back to where you were. By making each step of the process a new shapefile, your directories may get a bit crowded (hence the need for descriptive names) but you have the peace of mind that you can always go back to prior versions of shapefiles.

Cookie Cutter

Now that we've narrowed down to the states that we're interested in, we have to do the same for the airports. First, enable the ne_10m_airports layer by clicking on its checkbox. What we want is to get rid of all those airports that are not within the continental states. We could do this manually with the selection tool again, but here we can leverage a GIS tool to make our lives significantly easier.

Open up the Toolbox panel by clicking on the gear icon three to the right of the inspect icon on the top row. Type "clip" into the search par within the toolbox panel and, under Vector Overlay, double click on Clip.

Clip tool in the Toolbox

This should open up a new window with some options for the Clip tool. Click on the top drop-down menu and change it to ne_10m_airports. Your window should look like this:

Clip Window

As you can read on the right of the window, Clip works like a cookie cutter. The input layer is the cookie dough and the overlay layer is the cookie cutter. Everything inside the overlay layer is kept and everything outside is discarded.

Once you click run you should see a new temporary layer with just the airports we want called Clipped. Go ahead and close the clip window and disable the original ne_10m_airports layer. (You can tell that the Clipped layer is temporary because of the little RAM/computer chip icon to the right of it.) Once you've confirmed that your clip operation did what you meant it to do, you can right click on the Clipped temporary layer and export it as a new layer by clicking on Export > Save Features As. Give it a descriptive name like ne_10m_airports_continental_US and remove the temporary layer.

Before moving on from the Clip Tool, it would be remiss not to mention that the Clip Tool can be used to clip more than just points, it can clip line and polygon features as well. Say for example you had a line dataset representing the all the highways in the United States, but only wanted the parts of those highways that were inside of the state of Texas. You could clip the highways layer by the Texas layer and it would "cut" the highways appropriatly so that everything outside of the cookie cutter of Texas would be discarded.

Using the Attribute Table

Next we want to narrow down our set of airports to only relatively major ones. Thankfully the dataset we are using has a field called "scalerank". While this metric was created to allow map makers to specify at what zoom level a certain airport should be displayed, it is an imperfect but good enough metric to select a few major airports for our toy model. Since we want to narrow down our data by using one of its fields, one convenient way to do so is to use the layer's attribute table.

We've discussed attribute tables a few times so far but have yet to open one up, so lets go ahead and do that now. Right click on the newly created ne_10m_airports_continental_US layer and click on Open Attribute Table. Your window should look like this, if it doesn't, consult the note below.

Attribute Table

(Note: In QGIS, there are two different views of the attribute table, the table view, which looks like a traditional table or spreadsheet, and the form view, which allows you to look at one row/record at a time. For this section, if you are in the form view, click on the icon in the very bottom right of the Attribute Table window and switch into the table view.)

With this table, you can navigate around by scrolling or using the scroll bars. If you wanted to Edit any of these values, you could click on the pencil icon in the top right to enter an editing session where you could click on a cell to modify it. To save your changes, you must click on the save icon to the right of the pencil to save all your session edits. (Like I mentioned before, it is often best to save a version of a shapefile before making any changes in this way because undo is not always an option.)

Getting back to our task of grabbing only major airports, click on the column header for "scalerank" to sort all the rows by this value. Then click on the row header for row number 1, hold down "Shift" and click on the row header for row number 12, the last airport with a scalerank of 2 (indicating one of the most "major" airports). The selected columns should be highlighted blue like so:

Selected Rows in the Attribute Table

If you close the attribute table now you will see that the dots representing these 12 airports have lit up yellow. This is an important thing to note: selections are shared between the attribute table view of a dataset and the spatial/map view of a dataset. Now that we have selected these major airports, we can export these selected features as a new layer like we did before by right clicking on the ne_10m_airports_continental_US layer and clicking Export > Save Selected Features As. Give it a name like ne_10m_airports_major_continental_US and remove the old ne_10m_airports_continental_US Layer.

To quickly illustrate another feature of QGIS, we are going to look at an alternate method of isolating only major airports like we just did, this time by deleting features we don't want instead of selectivly exporting those we do. Follow along if you like.

To modify features in QGIS, whether that be modifying their shape, the values of their fields, or outright deleting them, you need to be inside of an editing session. While inside an editing session, all changes you make are temporary until you manually save them. To start an editing session, select the layer you want to edit and click Toggle Editing ToggleEditing. A whole bunch of tools that were greyed-out should gain their color back upon starting an editing session.

Now that we're in an editing session, we can select the airports that we want to delete. To do so, open up the attribute table for the ne_10m_airports_continental_US layer. If you click within any of the cells of the attribute table, you'll notice that you can edit them like you would an Excel spreadsheet now that we're inside an edting session. We could once again sort by scalerank and manually select those airports with a scalerank of 2 and then use the Invert Selection button InvertSelectionto select the airports we want to delete, but instead we're going to use the Select Features By Expression toolSelectFeaturesByExpression to do the manual work for us.

Once you click on the Select Features By Expression tool itself, a dialog box should open up where you can type in a QGIS expression to be evaluated. Now, if you're familair with SQL, you can think of expressions as the WHERE clause in a SQL SELECT, but that analogy is just that, an analogy. QGIS expressinos are their own thing and only a subset of SQL features are present. Expressions are meant to be powerful enough for quick filtering and the like, but anything more complicated should really be written in python in the "Function Editor" tab. Since our use case clearly falls into the first category, we'll be using them here.

In the middle pallete of tokens, click on the "Fields and Values" header to expand it. Scroll down to "scalerank" and double click it to insert it into the expression builder on the left. Then click over to the expression builder itself and type !=. Then, on the right third of the dialog box, click on "All Unique" to get a list of possible values for "scalerank". Double click 2 to insert it into the expression builder as well. The completed expression should read "scalerank" != 2, and indeed you can just type in the expression directly as well if you prefer. In either case, to actually use the selection, click Select Features. You should see a notification in the main QGIS menu that 110 features have been selected. You can now close this dialog box and actually perform the delete with the Delete Selected Features button DeleteSelectedFeatures.

Like I mentioned above, to actually make these changes permanent, you have to manually save them. You can use the floppy-disk-save icon to save the changes directly, or you can just close the editing session with the Toggle Editing button and direct QGIS to save the changes when prompted. Remember that it often isn't ideal to work destructivly like this, and that in most cases the extra disk space taken up by the intermediary files is worth the extra workflow flexability of working non-destructivly, but its good to know the option is there for when you need it.

Spatial Joins

By now you should have a set of 49 polygons representing the 48 continental united states and DC as well as a set of 12 points representing a few major US Airports. If you look around in attribute table for the airports layer or inspect one of them with the identify features tool, you will see that Natural Earth provided us with plenty of useful information about each airport including its name, IATA airport code, its wikipedia unique ID, and more. However it does not contain the key piece of information that we need for our virus transmission model: the size of the population it serves.

Now, there are a few ways we could go about this. First of all, since we are only dealing with 12 data points, manually entering the data is a perfectly reasonable, and probably the most accurate, choice. This allows for judgement calls like noticing that Newark Airport is on our list but not JFK or LaGuardia from New York City, so we should probably make Newark's population it services encompass New York City's population as well. (This is another bit of evidence that scalerank is an imperfect metric as JFK airport served around 16 million more passengers than Newark in 2019 according to wikipedia.) But for the sake of this tutorial I'm going to show you a different way to estimate population served using one of the most powerful tools in GIS -- the spatial join.

A traditional non-spatial table join is any operation where data from two different tables is combined or joined together into a third new table with some amount of information from both. For example, an SQL inner join could be used when you have a table with literacy data about each state in one table and education funding data about each state in another. You can use an inner join to match the data up by state so that in the end you have a table with both literacy and education funding data by state.

Spatial joins are similar except that instead of matching up data rows based on a shared key, table rows are matched up based on spatial information like polygon intersection, point distances, etc. Given our US states shape dataset and our populated places dataset, various spatial joins could help us answer questions like: "What is the largest City in each state?" or "How many cities with population over 50,000 are there in each state?". Or, if we change our perspective from getting information about the cities within each state to getting information about the state each city is in, we could use a spatial join to answer the question "What state is each city in?" since our dataset doesn't have that information already.

For our purposes we are going to be asking a fairly simple question (with a more complicated question as an optional addendum): what is the population of the city nearest to each of our major airports. To answer this question, we are going to use the Join Attributes By Nearest tool (a more specific name for a kind of spatial join where the join criteria is closest distance) found within the toolbox. You can open the toolbox by clicking the toolbox icon Toolbox and search for the tool in the search bar. Once you have the tool's window up, change the input layer to our newly created ne_10m_airports_major_continental_US layer and the input layer 2 to ne_10m_populated_places_simple. Then, click on the three-little-dots button beside the "Layer 2 fields to copy" field. Scroll down and click on the check box next to "pop_max" like so:

Join Attributes By Nearest Window

Hit Run and then Close and take a look at the newly created Joined Layer temporary layer. If you open up its attribute table or inspect it with the identify features tool, you will see that in addition to all the fields it already had, it now has a "pop_max" field representing the population of the city nearest to it. (You can also hit the check box for the "name" field to get the name of that city if you want to double-check that the intended cities were chosen.)

Now we can once again save our temporary layer out to a new shapefile by right clicking on the layer and hitting Export > Save Features As. One thing you might want to do for cleanliness' sake is only export those fields that you care about since this is the final version of our airports shapefile that we are going to be importing into NetLogo. To do so, hit the Deselect All button and only select the "name", "abbrev", and "pop_max" fields. Give it a descriptive name like final_airports_with_populations.

Congratulations, you've completed the QGIS portion of this tutorial. Before we move on let's review what we learned how to do:

If you want to learn more about how to use QGIS, I recommend looking at the official documentation and tutorials at https://docs.qgis.org/. Among other things, these tutorials will teach you how to use symbology tools to create visually informative maps, perform more powerful geospatial analysis and processing operations, and work with different map projections and geospatial datums or reference frames.

More Comprehensive Population Data: (Advanced; Optional)

One flaw with our final dataset is that it assumes that each airport only services the single city closest to it, however, if you zoom in near O'Hare international airport you can see that there the city of Evanston is only a little bit farther away than Chicago and has a population that we should take into account, and we've already discussed the glaring issue with Newark Airport not taking New York City's population into account. To remedy these errors we need to employ a more sophisticated multi-step spatial analysis.

At a high level view: we are going to create a 50 mile buffer around each of our airports and sum up the populations of every city that falls within that buffer and then transfer that value back to the airport. This is not the only way to do this operation but it does illustrate that often when working with a GIS program you may need to create temporary auxiliary layers and combine a number of different geoprocessing tools to get the value that you want.

1. Clip The Populated Places Layer by the Continental States:

While it is possible that someone in Vancouver might cross the border to fly out of SEA-TAC in Seattle, it isn't all that likely, so let's go ahead and only consider US cities. Use this opportunity to check if you can use the clip tool to clip the ne_10m_populated_places_simple layer by the states_continental layer. Remember that one layer is the cookie cutter and the other is the cookie dough. Go ahead and save that clipped layer as us_cities.

2. Create a 50 mile Buffer Polygon Around Each Airport:

To create a buffer around a point, you use the buffer tool. You can find it by searching for it in the toolbox. Change the input layer to ne_10m_airports_major_continental_US and it should look something like this:

Buffer Window Degrees

One thing you may notice in this window is that for the distance field, the only unit available to you is degrees, but we want to create our buffer with a distance of 50 miles. What's with that?

Well right now all our datasets are a geographic coordinate system, or a coordinate system that is defined by distance in degrees of latitude and longitude (more specifically longitudinal distance from the prime meridian and latitudinal distance from the equator). All our data is in degrees and QGIS doesn't know how to implicitly convert into meters that we can use to create our desired buffer. To use a linear surface distance measure like meters, we need to reproject our data into a projected coordinate system, or a coordinate system that is defined by simple cartesian distance.

Step 2.5: Reprojecting our data

We have a number of different choices of which map projection, and therefore projected coordinate system, we want to use. (For a list of many different map projections, see this list on Wikipedia.). For our purposes, since we are trying to compare the distances between various points, we want to pick a map projection that prioritizes preserving distance. Generally the tradeoff when choosing a map projection is between higher accuracy at over a smaller area and lower accuracy over a larger area, so you often want to choose a map projection that is specifically designed to perform well at your chosen scale in your specific area of interest. For this reason, we are going to reproject our airports dataset into QGIS's built in "USA_Contiguous_Equidistant_Conic" projection because it does a good job of preserving distances within the continental US.

To reproject we simply start exporting our ne_10m_airports_major_continental_US layer like we have before with Export > Save Features As and hit the little globe icon to the right of the CRS drop-down. This will open up a map projection picker that lets you search through every map projection that QGIS knows about. Type "USA_Contiguous_Equidistant_Conic" into the search bar at the top and double click on the only result. Save the layer as ne_10m_airports_major_continental_US_Equidistant_Conic (I told you we'd be dealing with long descriptive file names) and hit OK.

Reprojecting

While normally the point for the new layer would be directly on top of the old points, after all, they are the same location just represented differently, but sometimes QGIS doesn't have a good time reading in map projections and gets confused. While this may not have happened on your machine/version of QGIS, it did on mine so here's how to fix it -- its a useful skill to have either way. Simply right click on the newly created layer and click Set CRS > Set Layer CRS. A similar map projection selection window should pop up. Simply select the "USA_Contiguous_Equidistant_Conic" projection we used to save the file and the points should show up in the right place.

Now that we have created an airports layer that is relative to a projected coordinate system, when we open up the buffer tool and select our new ne_10m_airports_major_continental_US_Equidistant_Conic as the input layer, we should be able to create a buffer of the 50 miles that we intended.

Buffer with 50 miles

If we hit Run, we should see a few not-quite-circles show up on our map like so:

Buffers

These buffers are not perfectly circular because we are looking at a circle that was generated in one map projection but is being displayed in a different one and when you go from one projection to another there is always distortion somewhere. If we were to change the QGIS display map projection by clicking on the button on the bottom right hand of the window that says "EPSG: 4326" (the current display map projection) and set it to "USA_Contiguous_Equidistant_Conic", the map projection the buffers were created relative to, then they would look like perfect circles again:

Perfect circles again

(If you want to return to the original default map projection, choose "WGS 84" from the display projection selection menu)

Step 3: Spatial Joins, Revisited

If you turn the us_cities layer back on, you should see that we now have circles of 50 mile radius surrounding each airport and containing a number of cities beyond just the single nearest. Our challenge now is to take each city within each of those buffers and sum up their populations. To do this, we will use a Summary Spatial Join using the Join Attributes by Location (summary) tool in the toolbox (which can be found by searching).

Set the input layer to our newly created buffer layer, Buffered and set the Join layer to us_cities. This arrangement will allow us to take values from the cities layer and join them onto rows of the Buffered layer. Now we need to select which columns from the join layer (the cities) to summarize and which summary statistic we want to use. For Fields to summarise, click the three little dots and hit the checkbox for "pop_max" and for Summaries to calculate, hit the three little dots and hit the checkbox for "sum". Once you hit Run, there should be a new Joined layer layer in the layer pane. Select that layer and use the Identify tool to check that your calculation was successful. If it was, there will be a new "pop_max_sum" field on each of the circles. (You should also take note here that the buffer operation transfers all data fields from the shapes being buffered into the buffers themselves.)

Step 4: Joining back to the airports point layer.

Now that we have a layer of buffer polygons that know their own aggregate population, we can spatial join that information back onto our airports point layer. To do so, open up the Join Attributes by Location tool from the toolbox and set the Base Layer to be the airports layer and the Join Layer to be the newly created temporary Joined layer. Now we can use the three little dots next to the Fields to add selector and just hit the checkbox for the "pop_max_sum" field. You should see a newly created Point layer (unhelpfully also) called Joined layer. If you inspect one of these points, you should see that it has a new "pop_max_sum" field the represents the sum of all the populations of all the cities within 50 miles, exactly what we wanted to calculate.

Step 5: Rename the field name

If you are going to be moving forward with this tutorial having finished these steps, you probably want to rename the population field from "pop_max_sum" to "pop_max" so that you can continue to follow along without changing the name in the NetLogo code. To do so, double click on the new point Joined layer and select the Fields tab from the sidebar. Here you should see a table with a bunch of rows representing field names and properties. To make edits, you have to start an editing session by clicking the pencil icon near the top of the window. Once in an editing session, you can simply click on the "pop_max_sum" text and change it "pop_max".

Changing the field name

Once you've made the edit, you can click on the pencil icon again and say that you want to save your changes. Go ahead and exit that window and export this point Joined layer as final_airports_with_populations like you did with the prior version. You're now ready to move on to the NetLogo section of the tutorial.

Step 3: GIS Data Cleaning (ArcMap)

Like all data, GIS data is rarely exactly ready to use for your application right out of the box. For non-spatial data, perhaps you would want to filter out all data points that are more than five years old and so you could use Excel and sort by collection date and delete all rows that are too old. Or perhaps you would write an SQL query to only grab recent data. Conveniently, GIS tools can do this kind of table-based cleanup, but the real power of a GIS system is in its spatial intelligence.

Selection Tool

Let's start simple. Since we don't plan on including any airports in Alaska, Hawaii, or Puerto Rico, let's go ahead and filter out those three places. We could do this non-spatially in the attribute table, but the quickest and easiest way is just to drag a selection box over the whole continental United States and save it as a new shapefile.

First on the layers menu on the bottom left of the window, click the checkboxes of both of the point layers so that the only active layer is the cb_2019_us_state_20m layer. Next, select the Select Features tool SelectRectangle . (Remember that You can hover over any tool's icon to get its name.) Once you have the selection tool active, drag a rectangle around the continental United States. All the selected features should gain a bright blue outline. Now right click on the cb_2019_us_state_20m layer and click on Data > Export Data. ExportDataDialog A window like this one should pop up. Make sure that the "Export" drop-down on top is set to "Selected Features" and then click on the "Browse" button, highlighted above in red. Navigate to where you want to save the data (you may need to create a new folder connection like we did when importing the data) and then give it a dscriptive name like states_continental. In the "Save as type" drop-down menu, select "Shapefile". This is the only format ArcMap can export to that NetLogo will read, so it makes sense to do just about everything in shapefiles instead of dealing with the other ArcMap-specific formas. Once you've made this change, you can go ahead and click Save. and then afterwards, OK on the "Export Data" dialog box. ExportDataNavigator A pop-up will appear asking if you want to "add the exported data to the map as a layer?". Click yes and it should appear. Right click on the old cb_2019_us_state_20m layer and click "Remove" to remove it from the map.

Working non-destructively like this is a useful habit to get into. Often GIS operations take a not insignificant amount of time and so it would be a shame for you to make some not-undo-able error and have to start back at the base data and work your way back to where you were. By making each step of the process a new shapefile, your directories may get a bit crowded (hence the need for descriptive names) but you have the peace of mind that you can always go back to prior versions of shapefiles.

Cookie Cutter

Now that we've narrowed down the states that we are interested in we have to do the same for the airports. First, enable the ne_10m_airports layer by clicking on its checkbox. What we want is to get rid of all those airports that are not within the continental states. We could do this manually with the selection tool again, but here we can leverage a GIS tool to make our lives easier.

Open up the Geoprocessing menu and click on Clip tool.

ClipToolDropDown

This will open up a new Window like so. As the tooltip on the right demonstrates (if your tooltips aren't showing up, hit the Show Help >> button -- they really are quite useful), the Clip tool acts like a cookie cutter. The input layer is the cookie dough and the "Clip Features" layer is the cookie cutter. Everything inside the overlay layer is kept and everything outside is discarded. ClipDialogBox Since we want to keep all of the airports within the continental US, we select the ne_10m_airports layer as our input layer and our states_continental layer as our "Clip Features" layer, or cookie cutter. Give the output a descriptive name like ne_10m_airports_continental_US and hit OK. After a bit the newly clipped layer will appear in the map and you can remove the original ne_10m_airports layer.

Before moving on from the Clip Tool, it would be remiss not to mention that the Clip Tool can be used to clip more than just points, it can clip line and polygon features as well. Say for example you had a line dataset representing the all the highways in the United States, but only wanted the parts of those highways that were inside of the state of Texas. You could clip the highways layer by the Texas layer and it would "cut" the highways appropriatly so that everything outside of the cookie cutter of Texas would be discarded.

Using the Attribute Table

We have all of the airports in the US, but now we want to narrow down our set of airports to only relatively major ones. Thankfully the dataset we are using has a field called "scalerank". While this metric was created to allow map makers to specify at what zoom level a certain airport should be displayed, it can be used as an imperfect, but good enough, metric to select a few major airports for our toy model. Since we want to narrow down our data by using one of its fields, one convenient way to do so is to use the layer's attribute table.

We've discussed attribute tables a few times so far but have yet to open one up, so lets go ahead and do that now. Right click on the newly created ne_10m_airports_continental_US layer and click on Open Attribute Table.

AttributeTable

With this table you can navigate around by scrolling or using the scroll bars. If you had any features in the dataset selected, you could click on the Show Selected Records button (highlighted in red at the bottom) to only display those rows associated with the selected features. If you wanted to invert the selection, such that all currently selected features are deslected and vice versa, you can click the Switch Selection button, highlighted in green at the top.

Getting back to our task of grabbing only major airports, double-click on the column header for "scalerank" to sort all the rows by this value. Then click on the row header (the littel box on the left of each row) for the first row, hold down "Shift" and click on the row header for the twlfth row (San Francisco Int'l airport, or SFO). The selected columns, all the rows with a scalerank of 2, should be highlighted blue like so:

AttributeTableAirportsSelected

Once you have these features selected, close the attribute table and you should see the twleve points on the map cooresponing to the twleve rows we selected highlighted in blue as well. This is an important thing to note: selections are shared between the attribute table view of a dataset and the spatial/map view of a dataset. Now that we have selected these major airports, we can export these selected features as a new layer like we did before by right clicking on the ne_10m_airports_continental_US layer and clicking on Data > Export Data and exporting it as a shapefile like we did after selecting just the continental states. Give it a name like ne_10m_airports_major_continental_US.

To quickly illustrate some more features of ArcMap, we are going to look at an alternate method of isolating major airports like we just did, this time by deleting features we don't want instead of exporting those we do. Follow along if you like.

To modify features in ArcMap, whether that be modifying their shape, the values of their fields, or outright deleting them, you need to be inside of an editing session. While inside an editing session, all changes you make are temporary until you manually save them. To start an editing session, click on the Editor Toolbar button EditorToolbarButton which will open up the editor toolbar itself, which looks like this:

EditorToolbar

Click on Editor and then Start Editing. A pop up will appear asking what layers you want to edit. Select the ne_10m_airports_continental_US layer and the click OK. If you get an error about the "Spatial reference", just click through it by clicking Continue.

Now that we're in an editing session, we can select the airports that we want to delete. To do so, open up the attribute table for the ne_10m_airports_continental_US layer. If you click within any of the cells of the attribute table, you'll notice that you can edit them like you would an Excel spreadsheet now that we're inside an edting session. We could once again sort by scalerank and manually select those airports with a scalerank of 2 and then use the Switch Selection button to select the airports we want to delete, but instead we're going to use the Select by Attributes feature. Open up the tool by clicking on the Select by Attributes button SelectByAttributesButton which will open up the dialog box below.

SelectByAttributes

This dialog box allows us to construct SQL SELECT queries to create selections based on the field values of features. (For more about the specifics of the SQL dialect used by ArcMap, you can click on the Help button in this toolbar.) The top list displays all the field names you can use in your queries, which you can double-click to insert into the bottom text field. Start by double-clicking on the "scalerank" field name to insert it into the statement-builder, then click on the "<>" button (the not equals symbol in this dialect of SQL) to insert it as well. At this point you can click Get Unique Values and a list of all unique values for scalerank for all the features in the table will be listed. Double-click the "2" to insert it as well, such that the final statment (including the implied prefix supplied by ArcMap) reads: SELECT * FROM ne_10m_airports_continental_US WHERE "scalerank" <> 2 Click Verify to ensure that the statement is entered correctly, then hit Apply. Now, if you move the attribute table so you can see the map below, you will see that most of the airports will be slected and highlighted in blue. To delete these small airports, hit the Delete Selected button DeleteSelectedat the end of toolbar.

Now remember, because we started an editing session to makes these changes, we have to now save and exit the editing session to keep them. To do so, go back to the editor toolbar and underneath were you earlier clicked to start the editing sesion, click Stop Editing and, when prompted, keep the changes. You will now see that only the same twleve airports from our earlier method remain.

Whichever method you used, we are going to go forward assuming that you have a layer with only those twelve airports which we will refer to as ne_10m_airports_major_continental_US

Spatial Joins

By now you should have a set of 49 polygons representing the 48 continental united states and DC as well as a set of 12 points representing a few major US Airports. If you look around in attribute table for the airports layer or inspect one of them with the identify features tool, you will see that Natural Earth provided us with plenty of useful information about each airport including its name, IATA airport code, its wikipedia unique ID, and more. However it does not contain the key piece of information that we need for our virus transmission model: the size of the population it serves.

Now, there are a few ways we could go about this. First of all, since we are only dealing with 12 data points, manually entering the data is a perfectly reasonable, and probably the most accurate, choice. This allows for judgement calls like noticing that Newark Airport is on our list but not JFK or LaGuardia from New York City, so we should probably make Newark's population number encompass New York City's population as well. (This is another bit of evidence that scalerank is an imperfect metric as JFK airport served around 16 million more passengers than Newark in 2019 according to wikipedia.) But for the sake of this tutorial I'm going to show you a different way to estimate population served using one of the most powerful tools in GIS -- the spatial join.

A traditional non-spatial table join is any operation where data from two different tables is combined or joined together into a third new table with some amount of information from both. For example, an SQL inner join could be used when you have a table with literacy data about each state in one table and education funding data about each state in another. You can use an inner join to match the data up by state so that in the end you have a table with both literacy and education funding data by state.

Spatial joins are similar except that instead of matching up data rows based on a shared key, table rows are matched up based on spatial information like polygon intersection, point distances, etc. Given our US states shape dataset and our populated places dataset, various spatial joins could help us answer questions like: "What is the largest City in each state?" or "How many cities with population over 50,000 are there in each state?". Or, if we change our perspective from getting information about the cities within each state to getting information about the state each city is in, we could use a spatial join to answer the question "What state is each city in?" since our dataset doesn't have that information already.

For our purposes we are going to be asking a fairly simple question (with a more complicated question as an optional addendum): what is the population of the city nearest to each of our major airports.

To answer this question, we will need to open up the Spatial Join tool. You can find it by clicking on Geoprocessing and then slecting Search for Tools. In the panel that pops up on the right, you can type "Spatial Join" and it should pop up under "Spatial Join (Analysis)". Double-click on it and the Spatial Join dialog box should pop up.

Now, there are a lot of settings to change here, so we'll go through them one-by-one, but before you hit OK your dialog box should look like this:

SpatialJoinExample

Once you hit OK, wait a bit and you will see the new layer show up in the layers panel. If you remove the old airports layer and use the inspector on any point in the new layer, you should see that in addition to all of the fields the ariports had initially, they also have fields from their nearest city. For example, if I click on Seattle's airport in the top left of the map, I will see that it still has the airport fields like its name and its abbreviation, and once I scroll down I will see that it also has fields from the Seattle point in the populated places layer like the city name and, most importantly, the population we need for our model.

Congratulations, you've completed the ArcMap portion of this tutorial. Before we move on let's review what we learned how to do:

If you want to learn more about how to use ArcMap, I recommend looking at the official documentation and tutorials from Esri. Among other things, these tutorials will teach you how to use symbology tools to create visually informative maps, perform more powerful geospatial analysis and processing operations, and work with different map projections and geospatial datums or reference frames.

More Comprehensive Population Data: (Advanced; Optional)

One flaw with our final dataset is that it assumes that each airport only services the single city closest to it, however, if you zoom in near O'Hare international airport you can see that there the city of Evanston is only a little bit farther away than Chicago and has a population that we should take into account, and we've already discussed the glaring issue with Newark Airport not taking New York City's population into account. To remedy these errors we need to employ a more sophisticated multi-step spatial analysis.

At a high level view: we are going to create a 50 mile buffer around each of our airports and sum up the populations of every city that falls within that buffer and then transfer that value back to the airport. This is not the only way to do this operation (in fact you can do it in a single spatial Join) but it does illustrate that often when working with a GIS program you may need to create auxiliary layers and combine a number of different geoprocessing tools to get the value that you want.

1. Clip The Populated Places Layer by the Continental States:

While it is possible that someone in Vancouver might cross the border to fly out of SEA-TAC in Seattle, it isn't all that likely, so let's go ahead and only consider US cities. Use this opportunity to check if you can use the clip tool to clip the ne_10m_populated_places_simple layer by the states_continental layer. Remember that one layer is the cookie cutter and the other is the cookie dough. Go ahead and save that clipped layer as us_cities.

2. Create a 50 mile Buffer Polygon Around Each Airport:

To create a buffer around a point, you use the buffer tool. You could search for it like we did with the spaital join tool, but you can find it quickly by clicking on Geoprocessing > Buffer. It should look something like this:

BufferDialog

The buffer tool, like the help pane illustrates, creates series of circular polygons around each of our points. (The buffer tool can also be used on lines and polygons, in which case it "grows" them by a given distance.)

Set the input layer to our ne_10m_airports_continental_US layer, the output file name to ne_10m_airports_continental_US_50mi_buffer.shp, and the distance from to 50 and the unit from "Decimal Degrees" to "Miles". Since none of our buffers will overlap, we don't need to think about what behavior we want for dissolving.

Once you hit OK, a new layer should appear like so:

CreatedBuffers

Now if you take a look at this map and think that the buffer around Seattle's airport looks a little squished, well, you're not wrong it totally is on this map. However, if we take this same exact shapefile we just created and open it up in a virtual globe like ArcGlobe (another program made by Esri; you can think of it as a mashup of ArcMap and Google Earth), then Seattle's circle will look, well, actually circular. Here's Seattle's buffer on the left followed by Chicago and NYC's. BufferSizesInArcGlobe

This illustrates an important thing to keep in mind when working with GIS: even though you will spend all of your time looking at 2D maps, the globe just isn't 2D and will refuse to cleanly be smooshed down onto just two dimensions. The buffer tool, by default when working with Geographic Projections like WGS84 (the projection we are using), will calculate distance based on geodesic distance, or distance according to the shortest arc you can draw between two points on a globe. This is why on a globe, even a 2D "image" of a globe like the one on a computer screen, the circles all look circular, but when you try to flatten them out onto WGS84, they get distorted.

If you were to change the display coordinate system however, the circles would be distorted differently. To change coordinate systems in ArcMap, right click the map display area and click Data Frame Properties at the bottom. Switch over to the "Coordinate System" tab and pick one of the many choices built into ArcMap. If we switch over to "USA_Contiguous_Equidistant_Conic" (which you can find by searching for "102005", an ID that ArcMap recognizes), then suddenly our circles look a lot less distorted because this coordinate system was designed to minimize distance distortion within the continental US. (If you want to go back to the prior coordinate system, you can search for "GCS_WGS_1984" in the search box.)

ReprojectedBuffers

Step 3: Spatial Joins, Revisited

If you turn the us_cities layer back on, you should see that we now have circles of 50 mile radius surrounding each airport and containing a number of cities beyond just the single nearest. Our challenge now is to take each city within each of those buffers and sum up their populations. To do so, we are going to return to our old friend, the Spatial Join.

This time around, we want the target layer to be ne_10m_airports_continental_US_50mi_buffer.shp and the join layer to be the recently created us_cities layer. This way, our 12 buffer polygons take on the values from all the cities within their boundaries. Give it a name like airport_buffers_with_surrounding_populations.shp and make sure the "Join Operation" is "JOIN_ONE_TO_ONE". If we were to hit "OK" now, then not much useful would happen because the default behavior when there are multiple possible values to grab during a spaital join is just to grab whichever one ArcMap happens to see first. However, when working with numeric data, we can change that "Merge rule" to any number of things from the mean, the mode, the minimum, maximum, standard deviation, etc. For us, we just want to sum up all the populations, so the "Sum" merge rule is what we're looking for. Now, merge rules are defined per field, so scroll down in the center "Field Map of Join Features" table until you find pop_max (Long) . Right click on it, go to "Merge Rule", and select "Sum". After that, you can hit OK and wait for the operation to complete.

Spatial Join Merge Rule

Step 4: Joining back to the airports point layer.

Now that we have a layer of buffer polygons that know their own aggregate population, we can spatial join that information back onto our airports point layer. To do so, open up the Spatial Join tool one last time and set the target layer to be the`ne_10m_airports_continental_US.shp layer and the join layer to be our just-created airport_buffers_with_surrounding_populations layer. Give it a descriptive name like final_airports_with_surrounding_populations.shp and hit OK as all the defaults should work just fine. If you inspect one of these newly created airport points, you should see that it has a new "pop_max_sum" field the represents the sum of all the populations of all the cities within 50 miles, exactly what we wanted to calculate. (If you want to use these more accurate numbers in the model itself, just make sure to specify the right file name when importing).

Step 4: Finally Some NetLogo

If you had any issues getting any of these GIS steps to work, I have provided final versions of all the files you'll need to import into NetLogo so you can continue on with the tutorial if you wish.

Now that we're finally done with data preprocessing, we can get to work on actually using the GIS extension. Go ahead an open up a new NetLogo file and write extensions [gis] at the top of the code tab. This tells NetLogo that we want to be using the GIS extension and will allow us to use all the primitives defined by the GIS extension if we preface them with gis:.

Importing & Drawing Shapefiles

To get our feet wet, let's start by importing our states shapefile to use as a background/basemap. Note that like all other file import methods in NetLogo, the filenames we use will be relative to the .nlogo file itself. I've set up my project so that all the shapefiles I am going to be using are in the same directory as the .nlogo file, but you might have to change the file paths if you've set up your project differently for some reason.

This code simply loads in the states_continental.shp file and keeps it in a global variable so we can use it anywhere.

While most NetLogo models use a setup and go paradigm, when working with GIS datasets, I like to split up my reset-the-model code from my load-in-my-data code, just because when you're working with large datasets, re-parsing the datasets can be time consuming and slow down your setup procedure. As such, I use clear-all in the load-datasets procedure and selectively use clear-xxxx primitives in my setup procedure like so. The one thing you have to remember when structuring your code in this way is to always manually reset any global variables you may be using aside from your dataset variables.

We're almost ready to draw our states, but first we need to tell the GIS Extenison one last thing: what part of our globe we want to display. The term for this is "envelope", but if it helps you can just substitute our "envelope" for "bounding box" in your head. Either way, it is a set of two points that describe the extents of what we want to show. While you can certainly set these numbers manually, it is much simpler to just tell the extension to just "zoom to fit all of this dataset" and it'll calculate the appropriate values accordingly. So before we draw anything, we should tell NetLogo to set its world envelope to the envelope of the states dataset by adding this to the setup block:

Now we can finally draw our states. First we set a drawing color (using the NetLogo colors you are familiar with), then we use the gis:draw command, supplying the dataset we want to draw and a line width as arguments.

Finally, add a load-datasets button and a setup button to the interface and set the world dimensions to 32 x 16, and while we're here, change the view updates from "continuous" to "on ticks". At this point you should be able to click load-datasets and then setup and see a map drawn like this.

Just drawing the states

Before we go on, here's all the code so far in one place.

Setting Up the Airports

Our next step is to set up our airport agents, which will hatch airplanes and keep track of how many infected individuals are present in their vicinity. We will soon turn each point in the point dataset of airports into its own NetLogo agent, but first let's just import the dataset and draw it using the gis:draw command to ensure it is being loaded correctly.

As before, to load a dataset you need to create a global variable to store the dataset and then set the value of that global variable to be the result of gis:load-dataset.

To verify that the airport points were loaded like we expected them to, we can use gis:draw like we did for the polygon states dataset. This time instead of representing the width of a line, the second, numerical, input represents the radius of the circle that represents each point. When you run your updated code, you should see a number of red circles above each airport in the dataset.

Now it is time to create turtles for each of the points in the dataset, and here is where things get a bit tricky if NetLogo is your primary experience programming. In order to create a turtle for each point in the dataset, we need to use a not very commonly used feature of NetLogo, the foreach loop.

The foreach loop

The foreach loop runs a section of code over and over again for each item in a list. (If you have programmed before, this is similar to the for item in items: pattern in Python, the for (int num : numbers) pattern in Java or C++, or the foreach (int num in numbers) pattern in C#.) For example, if we had a list of numbers [1 2 3 42 3.14 69105] and wanted to print out each of them times two, we would write:

In general, the syntax works like this: foreach list-of-items [item -> (do something with that item)]. Remember that a list is different from an agentset in NetLogo. An agentset can only contain agents and it implies that there is no order among the agents within it. This is why we need a foreach loop to create turtles based on each point instead of using ask. ask only works on agentsets and the point features in a shapefile are not agents. A list, on the other hand, can contain any kind of value and implies a specific order, which, not coincidentally, is the order in which the items are "visited" in a foreach loop.

(If you're curious about the mechanics of the -> syntax, then you can read more about anonymous procedures in NetLogo in the NetLogo documentation.)

Creating the agents

Now that we know how to use foreach loops, we can go through each point feature in our airports-dataset and create a turtle for each of them.

First, create a breed for airports like so:

Second, go into your setup command, remove the gis:draw command that draws the airports and insert this bare-bones version of our foreach loop:

If you run this new code, you should see default NetLogo turtles at the location of each airport. This is probably the most complicated section of code we'll write today, so let's go through it line by line.

If you remember the foreach syntax from before, then this code should seem mostly familiar. The only new addition is the command gis:feature-list-of. This command looks into the input dataset (final_airports_with_populations.shp in this case) and reports back a list of each individual feature within it. In this case, since the airports dataset is a dataset of point features, gis:feature-list-of will report back a list of every point in the dataset, represented by the GIS extension as a "VectorFeature" object. (the VectorFeature object is used regardless of the actual shape type. For example, if we were to run gis:feature-list-of with our states-dataset ,it would report back a list of shape features, but they would also be represented by VectorFeatures.)

In this line, we create a local variable location and store the location of the center of this-airport-vector-feature within it. It may seem counterintuitive that we need two gis: commands to get the location of a point feature, but remember, all our code knows is that this-airport-vector-feature is a VectorFeature, it doesn't know if it is a point, a line, or a polygon, and it doesn't much make sense to try to get the location-of a polygon or a line. Before we can get a location, we need to take the VectorFeature and turn it into a single point, called a "Vertex" in the context of the GIS extension. To do this we use gis:centroid-of which reports the "center of mass" of a given feature, which in the case of a point, is just the center of the point. Once we have that Vertex, we can pass it into gis:location-of and finally get a two-item list representing an xcor and ycor.

These lines simply create an airport and set its xcor and ycor to the xcor and ycor that we just stored in location (the 0th and 1st item of the list, respectively.) At This point, if you run your setup procedure, you should get something like this:

Screen Shot 2020-12-29 at 10.24.25 AM

Filling in the data

Now that we have the spatial component of our data loaded into NetLogo, it is time to plug in the second piece of our GIS dataset, the feature properties themselves. If you remember, each feature in a GIS dataset is associated with a row of data in the attribute table, and each row has a value for each column, or field of data. For our airports dataset in particular, each airport feature has a "name", an abbreviation ("abbrev"), and a population size that it services ("pop_max"). To get these values out of the dataset and into properties of our airport turtles, we use the gis:property-value command, which takes as input an individual VectorFeature and a string representing the name of the field. So to grab the population, name, and abbreviation from the dataset and add them as properties of our turtle, add these lines of code within the create-airports block:

To verify that the fields were loaded correctly, you can add a set label abbrev below and, after running this new code, you should see each airport has a label that coo responds to their abbreviation code.

Before we move on to creating the NetLogo model itself, you should take a second to think back on what you have learned and pat yourself on the back. We will touch on exporting data out of NetLogo back into a GIS format at the end, but with just the skills you've learned so far, you can probably already accomplish whatever task led you to learn how to use the GIS NetLogo extension in the first place. You can download data in the proper formats, cleanup and manipulate that data in a GIS program, and import it into a NetLogo world. That's no small feat.

Update: The create-turtles-from-points primitive added in version 1.3.0 of the GIS extension

In version 1.3.0 of the GIS extension, a pair of new primitives were added to make the process of creating turtles from point datasets easier. Instead of a foreach loop, you can use create-turtles-from-points or create-turtles-from-points-manual to automatically create one turtle for each point in your points dataset and automatically populate their turtles-own variables with values from their GIS counterpart. In general, the process is to add variables to the breed you want to convert points into with the same names as the property/field names in your dataset and then use create-turtles-from-points with the dataset and the name of the breed to create the turtles. Like the standard create-turtles primitive, you can add in a command block to give the newly created turtles commands upon creation.

To substitute in the create-turtles-from-points primitive into the setup procedure above, you would replace the foreach loop with the following:

The Model Itself

Before continuing work on the model, there are a few quick things we need to do as setup. First, add a quick line to set the pcolor of all the patches to a navy within the setup command: ask patches [ set pcolor blue - 3 ] and add a line to the create-airports block that sets each airport's shape to a circle: set shape "circle". Switch back to the interface tab and click on the Settings button and disable vertical and horizontal wrapping (otherwise our planes would be able to magically transport between the Atlantic and Pacific). While you're in the interface tab, add a new Input panel of type "Number" to the interface and name it growth. This value represents the growth rate of the disease within each city. I set mine at 0.1 to start. Finally, add a go forever button to handle running the model.

Here's our code as it stands, just in case you want to catch up:

The next step is to set up the infection modeling within each airport (keeping in mind that we are representing the entire region that an airport is servicing within each airport turtle). Go back to the code and after your set label command within the create-airports block, go ahead and add these lines of code that set an initial infected population and setup some visual characteristics for each airport.

After that, we want to setup the air traffic routes that our airplanes will travel along, and NetLogo links are the perfect choice here. After the close of the foreach block, add in this quick command that will create links between each airport and up to 3 of its surrounding airports.

Next, we need to start the infection somewhere, so go ahead and randomly pick an unlucky city to have a patient zero.

That concludes our setup command, leaving us now ready to create some airplanes.

Airplanes

To start, create a new airplanes breed at the top of your program:

For our highly simplified model, we pretend like each airplane only carries one passenger and that passenger can either be infected or not infected.

To make our code a bit cleaner, we're going to encapsulate the behavior of hatching planes into a command that our airports will run called hatch-departing-planes. It will create a list of destinations reachable from the current airport (as represented by link-neighbors) and will have a 50% chance of hatching an airplane each tick. If an airplane is hatched, it will have its destination set to one of the possible destinations. It will then face that destination and finally, decide if it has any infected passengers on board. For each plane, there is a x% chance that it will be infected where x = (infected-population / total_population). Throw in a few visual changes, and we're done.

Next we want to give our planes a new command, fly so that they can, well, fly. Because we had them face their destination already, they can just fly straight, but we do need them to stop and offload their passenger when they reach their destination. To accomplish this, we have them check if they are within one unit of their destination and, if so, die and have their passenger deplane, possibly bringing their infection with them.

Now that we have a behavior for the airports (hatching planes) and a behavior for the planes, we can put them together in a go procedure and check that everything is working as expected.

If you flip back to the interface tab and run your go procedure, you should see a bunch of white planes flying between different airports. Since there is still only one infected case in the whole model, it is highly highly unlikely that any of the airplanes will carry that infection to another city. However, unfortunately, viral disease is rarely so contained. The next step is to have each city's infected-population rise at a rate proportional to how large the infected population currently is. We're going to encapsulate this behavior inside a command called calculate-infections:

This equation is modeled off a typical logistic model of population growth and its principal characteristic is that it starts growing exponentially when there are many susceptible (not-infected) individuals it can infect but slows down once most of the population already has been infected and stops growth completely once there is no more susceptible population. Most models that model disease also have a concept of individuals that have recovered from the model and are no longer susceptible, but our simulated people are not so lucky and will remain permanently infectious.

In order to visualize this infection growth, let's use scale-color to recolor our airports according to how much of their population is infected. Lets call this command recolor-airport

Throw these in to the ask airports block inside the go procedure and we've got our model!

Here's our final model & code in case you need a reference, as well as a download link to [a working version with all associated files] (broken-link):

model-running