Reading raw data into GeoPandas


14

Is it possible to read raw data into a geopandas GeoDataFrame, the way you can with a pandas DataFrame?

For example, the following works:

import io
import pandas as pd
import requests
data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
pd.read_json(io.BytesIO(data.content))

The following does not:

import io
import geopandas as gpd
import requests
data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
gpd.read_file(io.BytesIO(data.content))

In other words, is it possible to read geospatial data that is already in memory without saving it to disk first?

Answers:


16

You can pass the json directly to the GeoDataFrame constructor:

import geopandas as gpd
import requests
data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
gdf = gpd.GeoDataFrame(data.json())
gdf.head()

Output:

                                            features               type
0  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection
1  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection
2  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection
3  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection
4  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection

For supported single-file formats or zipped shapefiles, you can use fiona.BytesCollection together with GeoDataFrame.from_features:

import requests
import fiona
import geopandas as gpd

url = 'http://www.geopackage.org/data/gdal_sample.gpkg'
request = requests.get(url)
b = bytes(request.content)
with fiona.BytesCollection(b) as f:
    crs = f.crs
    gdf = gpd.GeoDataFrame.from_features(f, crs=crs)
    print(gdf.head())

And for a zipped shapefile (supported since fiona 1.7.2):

url = 'https://www2.census.gov/geo/tiger/TIGER2010/STATE/2010/tl_2010_31_state10.zip'
request = requests.get(url)
b = bytes(request.content)
with fiona.BytesCollection(b) as f:
    crs = f.crs
    gdf = gpd.GeoDataFrame.from_features(f, crs=crs)
    print(gdf.head())

You can find out which formats Fiona supports with:

import fiona
for name, access in fiona.supported_drivers.items():
    print('{}: {}'.format(name, access))
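
As user2856 notes in the comments below, KML for example raises DriverError: unsupported driver: 'KML' because that driver is not whitelisted in this dict (reported with Fiona 1.7.1). A hedged workaround, not part of the original answer and assuming your GDAL build actually includes a KML driver, is to add the entry to supported_drivers at runtime before opening the file (the file name below is just a placeholder):

import fiona
import geopandas as gpd

# Assumption: GDAL was compiled with KML support; Fiona only refuses the driver
# because it is missing from its whitelist, so registering it lets OGR try it.
fiona.supported_drivers['KML'] = 'rw'

gdf = gpd.read_file('stations.kml', driver='KML')  # hypothetical local KML file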

There is also a workaround for reading zipped data from memory with fiona 1.7.1 and earlier:

import requests
import uuid
import fiona
import geopandas as gpd
from osgeo import gdal

request = requests.get('https://github.com/OSGeo/gdal/blob/trunk/autotest/ogr/data/poly.zip?raw=true')
vsiz = '/vsimem/{}.zip'.format(uuid.uuid4().hex) #gdal/ogr requires a .zip extension

gdal.FileFromMemBuffer(vsiz,bytes(request.content))
with fiona.Collection(vsiz, vsi='zip', layer ='poly') as f:
    gdf = gpd.GeoDataFrame.from_features(f, crs=f.crs)
    print(gdf.head())
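
A small follow-up that is not part of the original answer: the /vsimem buffer created above stays registered in GDAL's virtual filesystem, so if you read many datasets this way you may want to release it once the GeoDataFrame has been built:

# free the in-memory /vsimem file again (gdal.Unlink removes a virtual path)
gdal.Unlink(vsiz)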

This works for GeoJSON, which answers the question, but it doesn't work for other geospatial file formats such as shapefile, KML or KMZ. Do you know of a workaround for those cases?
Aleksey Bilogur

A little clarification is needed. GeoPandas and Fiona do support shapefiles and KML, but they can't fully support an API like New York City's. Also, BytesCollection does work, but it may be removed in a future release in favor of one of the options in github.com/Toblerity/Fiona/issues/409.
sgillies

Thanks. @sgillies should a feature request be opened on geopandas for this, or is it better to wait for the changes you mention here?
Aleksey Bilogur

@sgillies you say in the comment above that Fiona supports KML, but DriverError: unsupported driver: 'KML' is raised when trying to open a KML file, because it is not in the supported_drivers dict (using Fiona 1.7.1), and I've noticed a few issues about the lack of KML support (#23 and #97). Does Fiona support KML?
user2856

Thanks for pointing out the from_features method. Saved my day!
jlandercy

3

Since fiona.BytesCollection doesn't seem to work with TopoJSON, here is a workaround that avoids the need for gdal:

import fiona
import geopandas as gpd
import requests

# parse the topojson file into memory
request = requests.get('https://vega.github.io/vega-datasets/data/us-10m.json')
visz = fiona.ogrext.buffer_to_virtual_file(bytes(request.content))

# read the features from a fiona collection into a GeoDataFrame
with fiona.Collection(visz, driver='TopoJSON') as f:
    gdf = gpd.GeoDataFrame.from_features(f, crs=f.crs)
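
A related cleanup step, not part of the original answer: buffer_to_virtual_file leaves the buffer registered in the virtual filesystem, and (assuming your Fiona version exposes the matching helper) it can be released once the GeoDataFrame exists:

# release the virtual file created by buffer_to_virtual_file
fiona.ogrext.remove_virtual_file(visz)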

With geopandas==0.4.0, Fiona==1.8.4 and Python 3, I get DriverError: unsupported driver: 'TopoJSON'
edesz

You're right. This worked until at least Fiona version 1.7.13.
Mattijn

Unfortunately, this doesn't work. I tried to follow your Altair choropleth example on GitHub, but gdf = gpd.read_file(counties, driver='TopoJSON') raises exactly the same error there as well. I thought using with fiona.Collection... might work, but sadly it doesn't.
edesz

@edesz, this is a bug and will be fixed in Fiona 1.8.5, see: github.com/Toblerity/Fiona/issues/721
Mattijn


2

With Fiona 1.8, this can (must?) be done using the project's MemoryFile or ZipMemoryFile.

For example, with a zipped shapefile:

import fiona.io
import geopandas as gpd
import requests

response = requests.get('http://example.com/Some_shapefile.zip')
data_bytes = response.content

with fiona.io.ZipMemoryFile(data_bytes) as zip_memory_file:
    with zip_memory_file.open('Some_shapefile.shp') as collection:
        geodf = gpd.GeoDataFrame.from_features(collection, crs=collection.crs)
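
For a single, non-zipped file that is already in memory (GeoJSON bytes, for example), the same pattern works with MemoryFile. A minimal sketch, assuming the format can be identified from the bytes themselves, using the GeoJSON URL from the question:

import fiona.io
import geopandas as gpd
import requests

response = requests.get('https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON')

# MemoryFile wraps the raw bytes in an in-memory file that Fiona/OGR can open directly
with fiona.io.MemoryFile(response.content) as memory_file:
    with memory_file.open() as collection:
        geodf = gpd.GeoDataFrame.from_features(collection, crs=collection.crs)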

0

The simplest method is to feed the GeoJSON URL directly into gpd.read_file(). I had previously tried extracting the shapefile from a zip archive using BytesIO and zipfile, but gpd (Fiona, specifically) had problems accepting the file-like objects.

import geopandas as gpd
import David.SQL_pull_by_placename as sql
import os

os.environ['PROJ_LIB'] = r'C:\Users\littlexsparkee\Anaconda3\Library\share\proj'

geojson_url = f'https://github.com/loganpowell/census-geojson/blob/master/GeoJSON/500k/2018/{sql.state}/block-group.json?raw=true'
census_tracts_gdf = gpd.read_file(geojson_url)

0

I prefer the result of GeoDataFrame.from_features() over passing the GeoJSON directly to the GDF constructor:

import geopandas as gpd
import requests
data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
gpd.GeoDataFrame.from_features(data.json())

Output:

                       geometry                         name                                url           line objectid                                              notes
0    POINT (-73.99107 40.73005)                     Astor Pl  http://web.mta.info/nyct/service/  4-6-6 Express        1  4 nights, 6-all times, 6 Express-weekdays AM s...
1    POINT (-74.00019 40.71880)                     Canal St  http://web.mta.info/nyct/service/  4-6-6 Express        2  4 nights, 6-all times, 6 Express-weekdays AM s...
2    POINT (-73.98385 40.76173)                      50th St  http://web.mta.info/nyct/service/            1-2        3                              1-all times, 2-nights
3    POINT (-73.97500 40.68086)                    Bergen St  http://web.mta.info/nyct/service/          2-3-4        4           4-nights, 3-all other times, 2-all times
4    POINT (-73.89489 40.66471)             Pennsylvania Ave  http://web.mta.info/nyct/service/            3-4        5                        4-nights, 3-all other times

The resulting GeoDataFrame has the geometry column set correctly and all the columns I expect, without having to un-nest any FeatureCollection.
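
One caveat worth adding (not part of the original answer): from_features does not set a CRS on its own, so the resulting GeoDataFrame has none. Since GeoJSON coordinates are longitude/latitude on WGS 84, it is reasonable to pass one explicitly, the same way the other answers forward crs from Fiona:

import geopandas as gpd
import requests

data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
# GeoJSON is defined on WGS 84; older geopandas versions expect {'init': 'epsg:4326'} instead
gdf = gpd.GeoDataFrame.from_features(data.json(), crs="EPSG:4326")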

Licensed under cc by-sa 3.0 with attribution required.