12+ hours of hard work for one piece of code... (Accurate) average income of each NY Taxi Zone.

danielwu779
Oct 30, 2024

Hi everyone, this is probably the most daunting task I've embarked on yet, but I have (hopefully) managed to figure it out. My previous blog post, which estimated taxi zone incomes through NTAs, was highly inaccurate because NTAs are larger than taxi zones. What if we found something smaller? I got it! Census tracts.

I managed to get each census tract's data using the American census "Financial characteristics", "New York City Census Tracts", and "2022" filters, accessible through:

https://data.census.gov/map/050XX00US36005$1400000,36047$1400000,36061$1400000,36081$1400000,36085$1400000?q=economic&t=Income%20(Households,%20Families,%20Individuals)&layer=VT_2022_140_00_PY_D1&loc=40.6826,-74.0243,z8.7896

With some more digging I managed to find the shapefiles of each census tract. Now all that's left is to add up the average income of every census tract within each taxi zone and divide by how many census tracts are in that zone. Right? Not so easy. I kept getting the pesky "Object.,. ID missing: null", or "Object: Null", or "Object: DropNull is set to false" (which, on Google Earth Engine, you CAN'T set to true...).
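Setting the errors aside for a moment, the averaging idea itself is simple. Below is a minimal sketch in plain JavaScript (not Earth Engine code). The zone names, tract IDs, and income figures are all made up for illustration, and the helper `zoneAverage` is a hypothetical name.

```javascript
// Hypothetical toy data: which tracts fall in which taxi zone,
// and the income of each tract. None of these values are real.
const tractsByZone = {
  zoneA: ["t1", "t2", "t3"],
  zoneB: ["t4"],
  zoneC: [] // a zone with no matching tracts
};
const incomeByTract = { t1: 50000, t2: 70000, t3: 90000, t4: 120000 };

// Unweighted mean: sum of the tract incomes divided by the number of tracts.
// Returns null for a zone with no usable tracts instead of dividing by zero.
function zoneAverage(tractIds, incomes) {
  const values = tractIds
    .map(id => incomes[id])
    .filter(v => typeof v === "number");
  if (values.length === 0) return null;
  const sum = values.reduce((a, b) => a + b, 0);
  return sum / values.length;
}

const zoneIncome = {};
for (const [zone, ids] of Object.entries(tractsByZone)) {
  zoneIncome[zone] = zoneAverage(ids, incomeByTract);
}
console.log(zoneIncome);
```

Note that a simple average like this counts every tract equally, no matter how many people live in it or how much of it actually overlaps the zone.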

After I resolved that, I dealt with destringing issues (the income was stored as a string instead of an integer), STRINGING ISSUES (the geographic ID for each census tract was stored as a long integer rather than a string)... and a whole lot of other messes. The breakthrough came when I used ee.Algorithms.If to work around the null values, which acted as a filter for what were essentially pairing issues: the census tracts that the U.S. Census Bureau uses differ from the census tracts on the NYC Department of City Planning website. Sorting that out took 7 hours of manual and automated data filtering. But without further ado, here is the final code (by the way, the average income of a taxi zone was calculated by adding up the average incomes of the census tracts inside the zone and dividing that by the number of census tracts inside the zone):

// Step 1: Process income data to filter out null or zero income values
var processIncomeData = function(feature) {
  // Income is stored as a string, so parse it into a number
  var income = ee.Number.parse(feature.get("Income"));
  // Flag zero incomes (income.not() is 1 only when income is 0)
  var isNull = income.eq(0).or(income.not());
  return ee.Algorithms.If(
    isNull,
    feature.set('isNull', true).set('Income', null), // Mask invalid values
    feature.set('isNull', false).set('Income', income) // Keep valid values
  );
};

// Apply to income data
var processedIncomeData = incomeData.map(processIncomeData);

// Keep only the features that were not flagged as invalid
var validIncomeData = processedIncomeData.filter(ee.Filter.eq('isNull', false));

// Join income data with tracts by matching the BoroCT2020 ID fields
var validTracts = tracts.map(function(tract) {
  var matchingIncome = validIncomeData.filter(ee.Filter.eq('BoroCT2020', tract.get('BoroCT2020')));
  var meanIncome = ee.Algorithms.If(
    matchingIncome.size().gt(0),
    ee.Number(matchingIncome.aggregate_mean('Income')), // Calculate mean only if there's data
    null
  );
  return tract.set('Mean_Income', meanIncome);
}).filter(ee.Filter.notNull(['Mean_Income'])); // Filter out tracts with no income data

// Step 2: Calculate the simple average income for each taxi zone
var calculateZoneIncome = function(zone) {
  // Keep only the tracts whose geometry intersects this zone's geometry
  var zoneTracts = validTracts.filterBounds(zone.geometry());

  // Check if there are valid tracts within the taxi zone
  var hasTracts = zoneTracts.size().gt(0);

  // Simple (unweighted) mean of the tract incomes in this zone
  var averageIncome = zoneTracts.reduceColumns(ee.Reducer.mean(), ['Mean_Income']).get('mean');

  // If no valid tracts are found, set Average_Income to null
  return ee.Feature(ee.Algorithms.If(
    hasTracts,
    zone.set('Average_Income', averageIncome),
    zone.set('Average_Income', null)
  ));
};

// Apply the calculation to each taxi zone
var taxiZonesWithIncome = taxiZones.map(calculateZoneIncome);

// Print the results to the console
print("Taxi Zones with Average Income:", taxiZonesWithIncome);

// Export the result as a CSV file
Export.table.toDrive({
  collection: taxiZonesWithIncome,
  description: 'TaxiZonesWithAverageIncome',
  fileFormat: 'CSV'
});
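The destringing, stringing, and pairing problems mentioned before the code can also be shown outside Earth Engine. The sketch below is plain JavaScript on made-up rows. The sample IDs, the comma-formatted income, and the "-" placeholder for missing data are assumptions for illustration, and `parseIncome` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical rows: income arrives as a string, the tract ID as a number.
const incomeRows = [
  { BoroCT2020: 1000100, Income: "85,000" },
  { BoroCT2020: 1000201, Income: "0" },
  { BoroCT2020: 1000202, Income: "-" }, // assumed placeholder for missing data
  { BoroCT2020: 1000600, Income: null }
];
// Hypothetical tract rows, where the same ID is stored as a string.
const tractRows = [
  { BoroCT2020: "1000100" },
  { BoroCT2020: "1000201" },
  { BoroCT2020: "1009999" } // no income row exists for this tract
];

// "Destringing": turn the raw value into a positive number, or null
// when it is missing, zero, or not numeric.
function parseIncome(raw) {
  if (raw === null || raw === undefined) return null;
  const n = Number(String(raw).replace(/,/g, "").trim());
  return Number.isFinite(n) && n > 0 ? n : null;
}

// "Stringing": index valid incomes by the ID converted to a string,
// so numeric and string IDs compare equal.
const incomeById = new Map();
for (const row of incomeRows) {
  const income = parseIncome(row.Income);
  if (income !== null) incomeById.set(String(row.BoroCT2020), income);
}

// Pairing: keep only the tracts that have a valid income.
const joined = tractRows
  .filter(t => incomeById.has(String(t.BoroCT2020)))
  .map(t => ({
    BoroCT2020: String(t.BoroCT2020),
    Mean_Income: incomeById.get(String(t.BoroCT2020))
  }));
console.log(joined);
```

Converting both sides of the match to the same type before comparing is what lets the pairing work; tracts without a usable income simply drop out of the result.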

Again, feel free to contact me if you have any questions or want any data. P.S. Don't forget to load your files first.