Extracting values at a specific latitude and longitude from MODIS swath data

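I am trying to determine the amount of precipitable water vapor (PWV), ozone and aerosols over time at a specific location on Earth (namely, our astronomical observatory). To that end, I already have some Python code, modapsclient, with which I can download the twice-daily MODIS Aqua and Terra MYDATML2 and MODATML2 products that cover the specific longitude and latitude I'm interested in.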

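I'm not sure how to extract the specific quantities I need (for example, the time the MODIS data were taken and the PWV at the observatory's latitude and longitude) so that I can build a time series of values. The MYDATML2 product appears to contain 2D latitude and longitude grids indexed by Cell_Along_Swath_5km and Cell_Across_Swath_5km, which I guess makes this swath data as opposed to tiled or gridded data? The quantity I want, Precipitable_Water_Infrared_ClearSky, also appears to be indexed by Cell_Along_Swath_5km and Cell_Across_Swath_5km, but I'm not sure how to get the PWV value at the specific latitude and longitude I'm interested in. Any help, please?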

python remote-sensing modis

— astrosnapper

Could you provide a link to the imagery, or a sample of it?

— Andrea Massetti

Sure, here is a sample file from the MODIS archive: ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MODATML2/2018/…

— astrosnapper

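Hi, have you had a chance to try my answer?

— Andrea Massetti

Sorry, I've been away at a conference, working on a similar PWV determination from satellite data... Your updated code gives me the same value for the same cell that I see in PanoplyJ (allowing for the different array index order and the off-by-one difference in where the array indices start).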

— astrosnapper

[Edit 1 - I changed the pixel coordinate search]

I'm using the MODATML2 sample you provided and the gdal library. Let's open the HDF file with gdal:

from osgeo import gdal  # recent GDAL releases no longer support a bare "import gdal"
dataset = gdal.Open(r"E:\modis\MODATML2.A2018182.0800.061.2018182195418.hdf")
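
Since the goal is a time series, it may help to note that the granule filename itself carries the acquisition time. The following is a minimal sketch, assuming the usual MODIS naming pattern (product, then A plus year and day of year, then HHMM in UTC, then collection and production time); the helper name granule_start_time is made up for illustration and is not part of gdal or modapsclient.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed pattern: <product>.A<YYYY><DDD>.<HHMM>.<collection>.<production time>.hdf
_GRANULE_RE = re.compile(r"\.A(\d{4})(\d{3})\.(\d{2})(\d{2})\.")


def granule_start_time(filename):
    """Return the UTC start time encoded in a MODIS granule filename."""
    match = _GRANULE_RE.search(filename)
    if match is None:
        raise ValueError(f"no acquisition date found in {filename!r}")
    year, day_of_year, hour, minute = (int(g) for g in match.groups())
    # Day of year is 1-based, so day 1 is 1 January.
    start_of_year = datetime(year, 1, 1, tzinfo=timezone.utc)
    return start_of_year + timedelta(days=day_of_year - 1, hours=hour, minutes=minute)


print(granule_start_time("MODATML2.A2018182.0800.061.2018182195418.hdf"))
```

Pairing this timestamp with the value extracted from each file gives one point of the time series per granule.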

Next, we want to see how the subdatasets are named, so that we can open the ones we need:

datasets_meta = dataset.GetMetadata("SUBDATASETS")

This returns a dictionary:

datasets_meta
>>>{'SUBDATASET_1_NAME': 'HDF4_EOS:EOS_SWATH:"E:\\modis\\MODATML2.A2018182.0800.061.2018182195418.hdf":atml2:Cloud_Optical_Thickness', 
'SUBDATASET_1_DESC': '[406x271] Cloud_Optical_Thickness atml2 (16-bit integer)',
'SUBDATASET_2_NAME':'HDF4_EOS:EOS_SWATH:"E:\\modis\\MODATML2.A2018182.0800.061.2018182195418.hdf":atml2:Cloud_Effective_Radius',
'SUBDATASET_2_DESC': '[406x271] Cloud_Effective_Radius atml2 (16-bit integer)',
[....]}

Say we want the first variable, the cloud optical thickness. We can get its name with:

datasets_meta['SUBDATASET_1_NAME']
>>>'HDF4_EOS:EOS_SWATH:"E:\\modis\\MODATML2.A2018182.0800.061.2018182195418.hdf":atml2:Cloud_Optical_Thickness'

Now we can call gdal.Open() again, this time on the subdataset name, to open that variable:

Cloud_opt_th = gdal.Open(datasets_meta['SUBDATASET_1_NAME'])

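In the same way, you can access the Precipitable_Water_Infrared_ClearSky you are interested in by passing 'SUBDATASET_20_NAME'. Just look through the datasets_meta dictionary to find the right entry.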
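
Because the subdataset numbers could differ between products or collections, looking the entry up by variable name may be more robust than hard-coding an index. Below is a minimal sketch; find_subdataset is a hypothetical helper, and example_meta is a shortened stand-in for the dictionary returned by dataset.GetMetadata("SUBDATASETS"), with "sample.hdf" as a placeholder path.

```python
def find_subdataset(datasets_meta, variable):
    """Return the SUBDATASET_n_NAME string whose last ':' field equals `variable`."""
    for key, value in datasets_meta.items():
        # Only the *_NAME entries can be passed to gdal.Open(); skip the *_DESC ones.
        if key.endswith("_NAME") and value.rsplit(":", 1)[-1] == variable:
            return value
    raise KeyError(f"{variable} not found among the subdatasets")


# Shortened stand-in for dataset.GetMetadata("SUBDATASETS") (placeholder file path)
example_meta = {
    "SUBDATASET_1_NAME": 'HDF4_EOS:EOS_SWATH:"sample.hdf":atml2:Cloud_Optical_Thickness',
    "SUBDATASET_1_DESC": "[406x271] Cloud_Optical_Thickness atml2 (16-bit integer)",
    "SUBDATASET_2_NAME": 'HDF4_EOS:EOS_SWATH:"sample.hdf":atml2:Cloud_Effective_Radius',
    "SUBDATASET_2_DESC": "[406x271] Cloud_Effective_Radius atml2 (16-bit integer)",
}

print(find_subdataset(example_meta, "Cloud_Effective_Radius"))
```

With the real dictionary, the returned string can be passed straight to gdal.Open(), for example with "Precipitable_Water_Infrared_ClearSky" as the variable.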

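However, the extracted variable has no geographic projection (var.GetProjection()), unlike what you would expect from other file types such as GeoTIFF. You can load the variable as a numpy array and plot the 2D variable without a projection: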

import matplotlib.pyplot as plt
Cloud_opt_th_array = Cloud_opt_th.ReadAsArray()
plt.imshow(Cloud_opt_th_array)
plt.show()  # needed when running as a script rather than interactively

Now, since there is no geographic projection, we'll look at the variable's metadata:

Cloud_opt_th_meta = Cloud_opt_th.GetMetadata()

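This is another dictionary containing all the information you need, including a detailed description of the subsampling (which I noticed is provided only in the first subdataset). It explains those Cell_Along_Swath dimensions: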

Cloud_opt_th_meta['1_km_to_5_km_subsampling_description']
>>>'Each value in this dataset does not represent an average of properties over a 5 x 5 km grid box, but rather a single sample from within each 5 km box. Normally, pixels in across-track rows 4 and 9 (counting in the direction of increasing scan number) out of every set of 10 rows are used for subsampling the 1 km retrievals to a 5 km resolution. If the array contents are determined to be all fill values after selecting the default pixel subset (e.g., from failed detectors), a different pair of pixel rows is used to perform the subsampling. Note that 5 km data sets are centered on rows 3 and 8; the default sampling choice of 4 and 9 is for better data quality and avoidance of dead detectors on Aqua. The row pair used for the 1 km sample is always given by the first number and last digit of the second number of the attribute Cell_Along_Swath_Sampling. The attribute Cell_Across_Swath_Sampling indicates that columns 3 and 8 are used, as they always are, for across-track sampling. Again these values are to be interpreted counting in the direction of the scan, from 1 through 10 inclusively. For example, if the value of attribute Cell_Along_Swath_Sampling is 3, 2028, 5, then the third and eighth pixel rows were used for subsampling. A value of 4, 2029, 5 indicates that the default fourth and ninth rows pair was used.'

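I think this means that each 5 km value is built from the 1 km pixels by taking exactly one pixel value at a fixed position within each 5x5 block (the position is given in the metadata; I assume this is a way of avoiding failed detectors).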
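
To make the rule in that description concrete, here is a small sketch that decodes a Cell_Along_Swath_Sampling value into the pair of 1 km rows that were sampled, following the two examples quoted in the metadata text. The function name sampled_rows is hypothetical, the input is assumed to be the comma-separated string as it appears in the description, and treating a final digit of 0 as row 10 is my own assumption.

```python
def sampled_rows(cell_along_swath_sampling):
    """Return the two 1 km rows (counted 1..10) used for the 5 km subsampling."""
    first, last, _step = (int(part) for part in cell_along_swath_sampling.split(","))
    # "first number and last digit of the second number", per the metadata text.
    second_row = last % 10 or 10  # assumption: a final digit of 0 means row 10
    return first, second_row


print(sampled_rows("3, 2028, 5"))  # third and eighth rows, as in the description
print(sampled_rows("4, 2029, 5"))  # default fourth and ninth rows
```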

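Either way, at this point we have an array in which each cell holds a single 1 km sample standing in for a 5 km box (see the subsampling description above; I'm not sure of the science behind it). To get the coordinates of each pixel centroid, we need to load the Latitude and Longitude subdatasets: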

Latitude = gdal.Open(datasets_meta['SUBDATASET_66_NAME']).ReadAsArray()
Longitude = gdal.Open(datasets_meta['SUBDATASET_67_NAME']).ReadAsArray()

For example,

Longitude
>>> array([[-133.92064, -134.1386 , -134.3485 , ..., -154.79303, -154.9963 ,
    -155.20723],
   [-133.9295 , -134.14743, -134.3573 , ..., -154.8107 , -155.01431,
    -155.2256 ],
   [-133.93665, -134.1547 , -134.36465, ..., -154.81773, -155.02109,
    -155.23212],
   ...,
   [-136.54477, -136.80055, -137.04684, ..., -160.59378, -160.82101,
    -161.05663],
   [-136.54944, -136.80536, -137.05179, ..., -160.59897, -160.8257 ,
    -161.06076],
   [-136.55438, -136.81052, -137.05714, ..., -160.6279 , -160.85527,
    -161.09099]], dtype=float32)

As you can see, every pixel has its own latitude and longitude coordinates.

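Assuming your observatory is at coordinates lat_obs, long_obs, you can find the pixel that minimizes the combined latitude and longitude difference: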

import numpy as np
coordinates = np.unravel_index((np.abs(Latitude - lat_obs) + np.abs(Longitude - long_obs)).argmin(), Latitude.shape)

and extract your value:

Cloud_opt_th_array[coordinates]
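
Two refinements may be worth considering here; both are sketches under stated assumptions rather than part of the original answer. First, the search above sums differences in degrees, so the variant below swaps in the haversine great-circle distance and refuses to return a pixel when the observatory is farther than max_km (an arbitrary threshold) from every pixel, i.e. outside the granule. Second, the subdatasets are stored as 16-bit integers, so the extracted number is probably a scaled count; to_physical assumes the metadata dictionary exposes scale_factor, add_offset and _FillValue keys and that the usual MODIS convention value = scale_factor * (stored - add_offset) applies. Both function names are hypothetical, and the grid and metadata values below are synthetic.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def nearest_pixel(lat_grid, lon_grid, lat_obs, lon_obs, max_km=10.0):
    """Return ((row, col), distance_km) of the pixel closest to the observatory."""
    lat1, lon1 = np.radians(lat_grid), np.radians(lon_grid)
    lat2, lon2 = np.radians(lat_obs), np.radians(lon_obs)
    # Haversine formula, evaluated for every pixel at once.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    index = np.unravel_index(np.argmin(dist_km), dist_km.shape)
    if dist_km[index] > max_km:
        raise ValueError("observatory is not covered by this granule")
    return index, float(dist_km[index])


def to_physical(raw, meta):
    """Convert a stored integer to a physical value (assumed MODIS-style attributes)."""
    if "_FillValue" in meta and raw == float(meta["_FillValue"]):
        return float("nan")  # no retrieval for this pixel
    return float(meta["scale_factor"]) * (float(raw) - float(meta["add_offset"]))


# Synthetic 0.05 degree grid standing in for the Latitude/Longitude arrays
lat_grid, lon_grid = np.meshgrid(np.arange(30.0, 31.0, 0.05),
                                 np.arange(-110.0, -109.0, 0.05), indexing="ij")
index, dist_km = nearest_pixel(lat_grid, lon_grid, 30.52, -109.48)
print(index, round(dist_km, 1))

# Synthetic metadata; real values come from the subdataset's GetMetadata()
fake_meta = {"scale_factor": "0.001", "add_offset": "0.0", "_FillValue": "-9999"}
print(to_physical(1234, fake_meta))
```

With the real arrays, index can be used exactly like coordinates above, and the result of Cloud_opt_th_array[index] passed to to_physical together with that variable's metadata dictionary.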

— Andrea Massetti

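Thanks for the information, but I'm having trouble with the coordinate conversion part; Longitude_px and Latitude_px are both zero-length arrays. Also, is there a way to handle the conversion using gdal itself? (Rather than relying on the approximation that 1 degree is X miles and then re-approximating that to kilometers.)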

— astrosnapper

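Latitude and longitude are provided as subdatasets, namely 66 and 67. I'll update the second part.

— Andrea Massetti

@astrosnapper The answer should now fully address your question.

— Andrea Massetti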