12+ hours of hard work for one piece of code... (Accurate) average income of each NY Taxi Zone.
- danielwu779
- Oct 30, 2024
- 3 min read
Hi everyone, this is probably the most daunting task I've embarked on yet, but I have (hopefully) managed to figure it out. My previous blog post on estimating taxi zone incomes through NTAs was highly inaccurate because the NTAs are larger than taxi zones. What if we found something smaller? I got it! Census tracts.
Using the American census "Financial characteristics", "New York City Census Tracts", and "2022" filters (accessible through: https://data.census.gov/map/050XX00US36005$1400000,36047$1400000,36061$1400000,36081$1400000,36085$1400000?q=economic&t=Income%20(Households,%20Families,%20Individuals)&layer=VT_2022_140_00_PY_D1&loc=40.6826,-74.0243,z8.7896), I managed to get each census tract's data. With some more digging I managed to find the shapefiles of each census tract. Now all that's left is to find which census tracts fall in each taxi zone, add up their average incomes, and divide by the number of census tracts in the zone. Right? Not so easy. I always got the pesky "Object.,. ID missing: null", or "Object: Null", or "Object: DropNull is set to false" (which, on Google Earth Engine, you can't set to true...).
After I resolved that, I dealt with destringing issues (the income was stored as a string instead of a number), STRINGING ISSUES (the geographic ID for each census tract was stored as a long integer rather than a string)... and a whole lot of other messes. The breakthrough came when I used ee.Algorithms.If to work around the null values; it acted as a filter for what were essentially pairing issues. The census tracts that the U.S. Census Bureau uses don't all match the census tracts on the NYC Department of City Planning website. This led to 7 hours of manual and automated data filtering. (By the way, the average income of a taxi zone was calculated by adding up the average incomes of the census tracts in the zone and dividing that by the number of census tracts in the zone.) But without further ado, here is the final code:
// Step 1: Process income data to filter out null or zero income values
var processIncomeData = function(feature) {
  var income = ee.Number.parse(feature.get('Income')); // Income is stored as a string, so parse it
  var isNull = income.eq(0); // Treat an income of 0 as missing
  return ee.Algorithms.If(
    isNull,
    feature.set('isNull', true).set('Income', null), // Mask invalid values
    feature.set('isNull', false).set('Income', income) // Keep valid values
  );
};
// Drop rows with no Income value (ee.Number.parse needs a value), then apply to income data
var processedIncomeData = incomeData
  .filter(ee.Filter.notNull(['Income']))
  .map(processIncomeData);
// Filter out zero incomes
var validIncomeData = processedIncomeData.filter(ee.Filter.eq('isNull', false));
// Join income data with tracts
var validTracts = tracts.map(function(tract) {
  var matchingIncome = validIncomeData.filter(ee.Filter.eq('BoroCT2020', tract.get('BoroCT2020'))); // Match ID fields
  var meanIncome = ee.Algorithms.If(
    matchingIncome.size().gt(0),
    ee.Number(matchingIncome.aggregate_mean('Income')), // Calculate mean only if there's data
    null
  );
  return tract.set('Mean_Income', meanIncome);
}).filter(ee.Filter.notNull(['Mean_Income'])); // Filter out tracts with no income data
// Step 2: Calculate the simple average income for each taxi zone
var calculateZoneIncome = function(zone) {
  // Select tracts whose geometry intersects this zone's geometry
  // (a tract that straddles a boundary is counted in every zone it touches)
  var zoneTracts = validTracts.filterBounds(zone.geometry());
  // Check if there are valid tracts within the taxi zone
  var hasTracts = zoneTracts.size().gt(0);
  // Simple (unweighted) average of the tract incomes in this zone
  var averageIncome = zoneTracts.reduceColumns(ee.Reducer.mean(), ['Mean_Income']).get('mean');
  return ee.Feature(ee.Algorithms.If(
    hasTracts,
    // If there are valid tracts, store the average income
    zone.set('Average_Income', averageIncome),
    // If no valid tracts are found, set Average_Income to null
    zone.set('Average_Income', null)
  ));
};
// Apply the calculation to each taxi zone
var taxiZonesWithIncome = taxiZones.map(calculateZoneIncome);
// Print or export the results
print("Taxi Zones with Average Income:", taxiZonesWithIncome);
// Export the result as a CSV file
Export.table.toDrive({
  collection: taxiZonesWithIncome,
  description: 'TaxiZonesWithAverageIncome',
  fileFormat: 'CSV'
});
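For anyone who wants to sanity-check the logic without Earth Engine, here is a minimal plain-JavaScript sketch of the same idea: parse the income strings, turn the tract IDs into strings so both tables match, drop missing or zero incomes, then take the simple average per zone. The sample rows, the zone labels "A" and "B", the tract-to-zone table, and the helper names cleanIncome and averageIncomeByZone are all made up for illustration; in the real script the tract-to-zone pairing comes from filterBounds on the shapefiles.

```javascript
// Plain JavaScript, no Earth Engine. All data below is invented for illustration.

// Income is stored as a string, and tract IDs are numbers in this table.
const incomeRows = [
  { BoroCT2020: 1000100, Income: '85000' },
  { BoroCT2020: 1000200, Income: '0' },   // zero income: treated as missing
  { BoroCT2020: 1000300, Income: '-' },   // non-numeric placeholder: treated as missing
  { BoroCT2020: 1000400, Income: '45000' },
];

// Hypothetical tract-to-zone pairing; tract IDs are strings in this table.
const tractZones = [
  { BoroCT2020: '1000100', zone: 'A' },
  { BoroCT2020: '1000200', zone: 'A' },
  { BoroCT2020: '1000400', zone: 'A' },
  { BoroCT2020: '1000300', zone: 'B' },
];

// Destring the income and string the ID, keeping only usable incomes.
function cleanIncome(rows) {
  const byId = new Map();
  for (const row of rows) {
    const income = Number.parseFloat(row.Income);
    if (Number.isFinite(income) && income !== 0) {
      byId.set(String(row.BoroCT2020), income);
    }
  }
  return byId;
}

// Simple average per zone: sum of tract incomes / number of tracts with data.
function averageIncomeByZone(tracts, incomeById) {
  const totals = new Map(); // zone -> { sum, count }
  for (const tract of tracts) {
    if (!totals.has(tract.zone)) {
      totals.set(tract.zone, { sum: 0, count: 0 });
    }
    const income = incomeById.get(String(tract.BoroCT2020));
    if (income === undefined) continue; // tract has no valid income, skip it
    const entry = totals.get(tract.zone);
    entry.sum += income;
    entry.count += 1;
  }
  const result = {};
  for (const [zone, { sum, count }] of totals) {
    result[zone] = count > 0 ? sum / count : null; // null when a zone has no valid tracts
  }
  return result;
}

const zoneIncome = averageIncomeByZone(tractZones, cleanIncome(incomeRows));
console.log(zoneIncome);
```

With this sample data, zone A averages only the two tracts that have a usable income, and zone B ends up as null because its only tract has no valid income, which mirrors the null handling in the Earth Engine script.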
Again, feel free to contact me if you have any questions or want any data. P.S. Don't forget to load your files first: the script expects incomeData, tracts, and taxiZones to already be imported.