What would you say if we told you that data can help save lives? And if we could use it to help minimize the consequences of a natural disaster?
In LUCA’s Big Data for Social Good area, we have a line of research that focuses on the analysis of data relating to natural disasters (earthquakes, floods, etc.) with the aim of managing them better. You can see an example of this work in this post about our collaboration with UNICEF.
The repercussions of such events show themselves in the way we communicate in their aftermath. We call the emergency services for help, we call our friends to see if they are okay, and we let our family know that we are safe. These human reactions are reflected in the mobile data from telephone networks and, once suitably anonymized and aggregated, this data can be used to help manage such events.
On this occasion, we have studied the impact of the storm that took place in the Golfo de San Jorge region of Argentina between the 29th March and the 7th April 2017. This event had widespread news coverage for a number of days. Comodoro Rivadavia and Rada Tilly are two regions that are located in the basins of various rivers and their drainage systems. The storm dropped around 232mm of rain on the 29th, in a month where the average rainfall in Comodoro Rivadavia is a mere 20.7mm. This intense rainfall, combined with rivers that flow into the Atlantic Ocean bursting their banks, caused large floods in the city and led to the evacuation of thousands of people.
What do call records tell us?
To carry out the analysis, we used hourly call data from different municipalities. Based on the different volumes of calls that were made, we were able to group the regions as shown in figure 2. Red shows the highly affected areas (Comodoro Rivadavia and Rada Tilly), yellow shows Caleta Olivia, which was moderately affected, and blue represents the low-impact areas (Camarones, Sarmiento, Las Heras and Pico Truncado).
The following graph shows the number of calls per hour in each of these zones. At first glance, we can already see that for the red lines (ground zero of the catastrophe), there is a sharp peak of calls on the 29th March at 6pm.
We can also see an increase in calls in the regions of Sarmiento and Pico Truncado, which shows us that the storms impacted zones that are geographically further away.
In order to dig a little deeper, we calculated the deviation in the number of calls compared to their regular daily and hourly patterns. We normalized this difference, and in the following graph each spike marks a large deviation from the traffic we would expect at that time. In this case, we can see that there was a peak in each zone on the 29th.
In this type of disaster, whether it is flooding due to high rainfall or an earthquake, the reaction is usually immediate. On the 28th, the number of calls was largely in line with the average, and on the 29th there was a large deviation during one specific hour. In the days following, the situation began to return to normal.
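To make the deviation calculation a little more concrete, here is a minimal sketch in Python. It is not the code we used for the study: it assumes that the "regular pattern" is the mean number of calls for each combination of weekday and hour over a baseline period, and that "normalizing" means dividing by the standard deviation of that baseline (a z-score). The function name, the parameters and all of the call counts below are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def normalized_deviation(records, baseline_end):
    """Return (timestamp, z_score) for each hourly record.

    records: list of (datetime, call_count) tuples.
    Assumption: the expected traffic for a given hour is the mean of the
    observations before baseline_end with the same weekday and hour.
    """
    history = defaultdict(list)
    for ts, calls in records:
        if ts < baseline_end:
            history[(ts.weekday(), ts.hour)].append(calls)

    scores = []
    for ts, calls in records:
        past = history.get((ts.weekday(), ts.hour))
        if not past:
            continue  # no baseline for this slot, so nothing to compare with
        mu = mean(past)
        sigma = pstdev(past)
        z = (calls - mu) / sigma if sigma > 0 else 0.0
        scores.append((ts, z))
    return scores


# Synthetic hourly counts for 1-30 March 2017 (made-up numbers, not real data).
start = datetime(2017, 3, 1)
records = []
for h in range(30 * 24):
    ts = start + timedelta(hours=h)
    calls = 100 + 5 * ts.hour + (ts.day % 3 - 1)  # daily shape plus small jitter
    if ts == datetime(2017, 3, 29, 18):
        calls *= 3  # artificial spike standing in for the storm
    records.append((ts, calls))

scores = normalized_deviation(records, baseline_end=datetime(2017, 3, 29))
peak_time, peak_z = max(scores, key=lambda item: item[1])
print(peak_time, round(peak_z, 1))
```

With this kind of score, a quiet day stays close to zero whatever the size of the municipality, which is what makes zones with very different call volumes comparable.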
In the following map (figure 4), we can see the evolution of the flood. The different colors represent the degree of deviation from the norm. Light green shows the smallest deviations and red represents the largest. We can appreciate the sudden change on the 29th March and how the situation stabilizes afterwards.
We can take the analysis a step further by looking at the behavior of each antenna, and in this way see the impact of the flood on different zones of the same municipality.
In figure 5, we represent the number of calls from antennae in various areas, and we can see how some antennae show a larger spike than others. We can also see how some stopped working altogether, probably due to technical faults in the network as a result of the weather conditions.
As we can see in figure 6, if we analyze call data (the green line) compared to its normal behavior (the red line), we can also check the differences between the deviations for each municipality’s antennae. Furthermore, we see the different consequences in the affected zones. The graphs on the left represent antennae in Comodoro, showing a large spike at 6pm on the 29th. However, in the graphs on the right (antennae in Las Heras), the impact of the disaster is seen in the days following the event.
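One simple way to spot antennae that may have stopped working is to look for runs of consecutive hours with no calls at all. The sketch below is only an illustration of that idea: the `silent_runs` helper, the three-hour threshold and the antenna names and counts are all assumptions of ours, and a real check would also need the state of the network to confirm a fault.

```python
def silent_runs(hourly_calls, min_hours=3):
    """Return (start_index, length) for every run of at least
    min_hours consecutive hours with zero calls."""
    runs = []
    start = None
    for i, calls in enumerate(hourly_calls):
        if calls == 0:
            if start is None:
                start = i  # a silent run begins here
        else:
            if start is not None and i - start >= min_hours:
                runs.append((start, i - start))
            start = None
    # Close a run that lasts until the end of the series.
    if start is not None and len(hourly_calls) - start >= min_hours:
        runs.append((start, len(hourly_calls) - start))
    return runs


# Made-up hourly call counts for three hypothetical antennae.
antennas = {
    "antenna_a": [40, 55, 120, 90, 60, 45],  # spike, then recovery
    "antenna_b": [35, 50, 110, 0, 0, 0],     # spike, then silence
    "antenna_c": [5, 0, 7, 0, 0, 6],         # quiet, but never silent for long
}
outages = {name: silent_runs(series) for name, series in antennas.items()}
print(outages)
```

The minimum run length matters: a low-traffic antenna can easily have an isolated hour with no calls, so a single zero on its own should not be read as an outage.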
Thanks to our mobile network, we can not only see behavior through call traffic, but also mobility behavior by using anonymized and aggregated data. In this way, we can study how people move following a natural disaster.
We have created an “origin and destination matrix” for all the provinces in Argentina, paying particular attention to the areas that we have been looking at up until now. We have also followed the same method to calculate deviations. In figure 7, you can see how we have applied a filter so that the only visible areas are those that showed large deviations during the period studied. You can see the different mobility profiles across the most affected origin-destination combinations.
We can observe a clear negative deviation in mobility on the 29th March. People move less, are isolated in the disaster zone or don’t travel to and from that area due to the meteorological conditions.
We can also observe a second drop in mobility across all origin-destination combinations. This happened on the 7th April and was caused by a new wave of storms and rainfall in the same areas of Argentina.
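For readers who have not come across an origin and destination matrix before, the following sketch shows the idea in Python. It counts aggregated trips for each origin-destination pair and compares one day against the average of a few baseline days. To keep it short, it uses a simple relative deviation rather than the normalized deviation described earlier, and the zone labels, trip counts and the 50% filter are invented for the example.

```python
from collections import Counter
from statistics import mean


def od_matrix(trips):
    """Count trips per (origin, destination) pair.

    trips: iterable of (origin, destination) tuples, already
    anonymized and aggregated.
    """
    return Counter(trips)


def relative_deviation(baseline_days, event_day):
    """Return {pair: (observed - expected) / expected}, where expected
    is the mean count for that pair over the baseline days."""
    pairs = set(event_day)
    for matrix in baseline_days:
        pairs |= set(matrix)

    deviations = {}
    for pair in pairs:
        expected = mean(matrix[pair] for matrix in baseline_days)
        if expected > 0:
            deviations[pair] = (event_day[pair] - expected) / expected
    return deviations


# Made-up trips between three hypothetical zones.
day_1 = od_matrix([("A", "B")] * 4 + [("A", "C")] * 2)
day_2 = od_matrix([("A", "B")] * 6 + [("A", "C")] * 2)
storm_day = od_matrix([("A", "B")] * 1 + [("A", "C")] * 2)

deviations = relative_deviation([day_1, day_2], storm_day)

# Keep only the pairs with a large deviation, as in the filter of figure 7.
large = {pair: dev for pair, dev in deviations.items() if abs(dev) >= 0.5}
print(large)
```

A negative value means fewer trips than usual between the two zones, which is the pattern we describe for the 29th March and the 7th April.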
Without doubt, natural disasters greatly affect our behavior and we are left with a trail of data that, once anonymized and aggregated, we can use to respond better to such events.
One possible use case is the creation of alerts for emergency services, who would then be able to direct efforts and resources to the most affected areas, or perhaps anticipate when the storm’s effects will begin.
Another possibility is to develop an application that would include warnings for the users of the mobile network, which would be capable of alerting them of imminent danger in their area and offer advice regarding precautionary measures.
Clearly, it is important to discern whether the events that we register in this way are natural disasters or whether the mobility patterns are caused by other events such as concerts. It is possible to do this with the use of other sources and forms of analysis, such as the state of the network itself and Natural Language Processing (NLP) of Twitter activity.
With this analysis, we can continue to test the great potential that data has in such services aimed at social good. We are talking about data that is capable of improving, helping and preventing the possible effects of these catastrophes. For now, don’t forget to follow us on Twitter, LinkedIn and YouTube to keep up to date with all things LUCA.