12+ hours of hard work for one piece of code... (Accurate) average income of each NY Taxi Zone.
- danielwu779
- Oct 30, 2024
- 3 min read
Hi everyone, this is probably the most daunting task I've embarked on yet, but I (hopefully) have managed to figure it out. My previous blog post on estimating taxi zone incomes through NTAs was highly inaccurate, because the NTAs are larger than the taxi zones. What if we found something smaller? I got it! Census tracts.
Using the U.S. Census data site with the "Financial characteristics", "New York City Census Tracts", and "2022" filters (accessible through: https://data.census.gov/map/050XX00US36005$1400000,36047$1400000,36061$1400000,36081$1400000,36085$1400000?q=economic&t=Income%20(Households,%20Families,%20Individuals)&layer=VT_2022_140_00_PY_D1&loc=40.6826,-74.0243,z8.7896), I managed to get each census tract's data. With some more digging, I found the shapefiles for each census tract. Now all that's left is to look at the average income of each census tract within each taxi zone, and average those based on how many census tracts are in each zone. Right? Not so easy. I kept getting the pesky "Object.,. ID missing: null", "Object: Null", or "Object: DropNull is set to false" errors (and on Google Earth Engine, you CAN'T set it to true...).
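The plan above (average the tract incomes inside each zone, skipping tracts with missing values) can be sketched in plain JavaScript, outside Earth Engine. This is only an illustration of the arithmetic: the zone names, tract IDs, and income figures are made up, and `averageIncomeByZone` is a hypothetical helper, not part of any library.

```javascript
// Plain-JavaScript sketch (not Earth Engine): average tract incomes per zone.
// All zone names, tract IDs, and incomes below are made-up illustration data.
function averageIncomeByZone(tractsByZone, incomeByTract) {
  const result = {};
  for (const [zone, tractIds] of Object.entries(tractsByZone)) {
    // Keep only tracts that have a usable (non-null, non-zero) income.
    const incomes = tractIds
      .map((id) => incomeByTract[id])
      .filter((income) => Number.isFinite(income) && income > 0);
    // A zone with no usable tracts gets null instead of a misleading 0.
    result[zone] = incomes.length > 0
      ? incomes.reduce((sum, income) => sum + income, 0) / incomes.length
      : null;
  }
  return result;
}

const tractsByZone = { zoneA: ['t1', 't2', 't3'], zoneB: ['t4'] };
const incomeByTract = { t1: 60000, t2: 80000, t3: null, t4: 0 };
const averages = averageIncomeByZone(tractsByZone, incomeByTract);
console.log(averages); // { zoneA: 70000, zoneB: null }
```

Tract t3 (null) and tract t4 (zero) are dropped before averaging, so zoneA is averaged over two tracts rather than three, and zoneB ends up with no value at all.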
After I resolved that, I dealt with destringing issues (the income was stored as a string instead of an integer), STRINGING ISSUES (the geographic ID for each census tract was stored as a long integer rather than a string)... and a whole lot of other messes. The breakthrough came when I used ee.Algorithms.If to work around the null values; it acted as a filter for what were essentially pairing issues. The census tracts that the U.S. Census Bureau uses don't all match the census tracts on the NYC Department of City Planning website. This led to 7 hours of manual and automated data filtering. But without further ado, here is the final code. (By the way, the average income of a taxi zone was calculated by adding up the average incomes of the census tracts in the zone and dividing that by the number of census tracts inside the zone.)
// Note: incomeData, tracts, and taxiZones must already be loaded as FeatureCollections (see the p.s. below).

// Step 1: Process income data to filter out null or zero income values
var processIncomeData = function(feature) {
  var income = ee.Number.parse(feature.get("Income")); // Convert to number if needed
  var isNull = income.eq(0).or(income.not()); // Check if income is 0 or null
  return ee.Algorithms.If(
    isNull,
    feature.set('isNull', true).set('Income', null), // Mask invalid values
    feature.set('isNull', false).set('Income', income) // Keep valid values
  );
};

// Apply to income data
var processedIncomeData = incomeData.map(processIncomeData);

// Filter out null or zero incomes
var validIncomeData = processedIncomeData.filter(ee.Filter.eq('isNull', false));

// Join income data with tracts
var validTracts = tracts.map(function(tract) {
  var matchingIncome = validIncomeData.filter(ee.Filter.eq('BoroCT2020', tract.get('BoroCT2020'))); // Match ID fields
  var meanIncome = ee.Algorithms.If(
    matchingIncome.size().gt(0),
    ee.Number(matchingIncome.aggregate_mean('Income')), // Calculate mean only if there's data
    null
  );
  return tract.set('Mean_Income', meanIncome);
}).filter(ee.Filter.notNull(['Mean_Income'])); // Filter out tracts with no income data

// Step 2: Calculate the simple average income for each taxi zone
var calculateZoneIncome = function(zone) {
  // Filter tracts within this zone's geometry
  var zoneTracts = validTracts.filterBounds(zone.geometry());

  // Check if there are valid tracts within the taxi zone
  var hasTracts = zoneTracts.size().gt(0);

  // If there are valid tracts, calculate simple average income
  return ee.Feature(ee.Algorithms.If(
    hasTracts,
    (function() {
      // Calculate the average income for the taxi zone
      var averageIncome = zoneTracts.reduceColumns(ee.Reducer.mean(), ['Mean_Income']).get('mean');
      return zone.set('Average_Income', averageIncome);
    })(),
    // If no valid tracts are found, set Average_Income to null
    zone.set('Average_Income', null)
  ));
};

// Apply the calculation to each taxi zone
var taxiZonesWithIncome = taxiZones.map(calculateZoneIncome);

// Print or export the results
print("Taxi Zones with Average Income:", taxiZonesWithIncome);

// Export the result as a CSV file
Export.table.toDrive({
  collection: taxiZonesWithIncome,
  description: 'TaxiZonesWithAverageIncome',
  fileFormat: 'CSV'
});

Again, feel free to contact me if you have any questions or want any data.

*p.s. Don't forget to load your files first.
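For anyone fighting the same stringing, destringing, and ID-pairing problems, here is a rough plain-JavaScript sketch (not Earth Engine) of cleaning the two columns before joining. It is a sketch under assumptions, not the exact filtering used for this post: it assumes the Census ID is the 11-digit GEOID (2-digit state, 3-digit county, 6-digit tract), that BoroCT2020 is a 1-digit borough code followed by the same 6-digit tract code, and that incomes can show up as strings like "85,000" or as a "-" placeholder. `geoidToBoroCT` and `parseIncome` are hypothetical helper names.

```javascript
// Plain-JavaScript sketch (not Earth Engine) of cleaning IDs and incomes before a join.
// Assumption: GEOID = 2-digit state + 3-digit county + 6-digit tract, and
// BoroCT2020 = 1-digit borough code + the same 6-digit tract code.
const COUNTY_FIPS_TO_BORO = {
  '061': '1', // New York County (Manhattan)
  '005': '2', // Bronx County
  '047': '3', // Kings County (Brooklyn)
  '081': '4', // Queens County
  '085': '5', // Richmond County (Staten Island)
};

// The ID may arrive as a long integer, so force it to a string first.
function geoidToBoroCT(geoid) {
  const text = String(geoid);
  if (!/^36\d{9}$/.test(text)) return null; // not an 11-digit New York State tract ID
  const boro = COUNTY_FIPS_TO_BORO[text.slice(2, 5)];
  return boro ? boro + text.slice(5) : null;
}

// Income may arrive as a string; strip separators and reject placeholders.
function parseIncome(raw) {
  const cleaned = String(raw).replace(/[,$+]/g, '');
  const value = Number(cleaned);
  return cleaned !== '' && Number.isFinite(value) && value > 0 ? value : null;
}

console.log(geoidToBoroCT(36061000100)); // "1000100"
console.log(parseIncome('85,000'));      // 85000
console.log(parseIncome('-'));           // null
```

Doing the conversion once up front means both tables share a string key of the same shape, so any tract that still fails to pair is a genuine mismatch rather than a type problem.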


