A Credible Approach of Big Data Processing and Analysis of Telecom Data
This article outlines a credible approach to big data processing and analysis of telecom data that can help minimize crime and combat terrorism and other antisocial activities.
Telecom providers have a treasure trove of captive data, including customer data, CDRs (call detail records), call center interactions, and tower logs, and are metaphorically “sitting on a gold mine.” Ideally, each category of generated data contains the following information:
- Customer data consolidates customer ID, plan details, demographics, subscribed services, and spending patterns.
- Service data consolidates customer type, customer history, complaint category, resolved queries, etc.
- Location data, usually available for smartphone subscribers, consolidates GPS-based data, roaming data, current location, frequently visited locations, etc.
Due to technological evolution across all verticals, the manufacturing cost of smartphones, network infrastructure, optimized devices for cellular networks, etc. is rapidly declining. As a result, smartphone adoption is growing rapidly across generations, and telecom/mobile networks are expanding quickly in both rural and urban areas. 4G broadband cellular network technology, the successor to 3G, acts as a catalyst for smartphone adoption because it provides mobile web access, IP telephony, gaming services, high-definition mobile TV, video conferencing, etc.
Criminal police organizations, security agencies, anti-terrorist squads, and other government agencies can leverage the above-mentioned telecom data to root out terrorist activities initiated in sensitive areas, and to track and apprehend suspects by analyzing their behaviour before and after a crime.
By maintaining a dashboard on suspected mobile numbers, security agencies can speed up the investigation process once the telecom data has been processed. The dashboard can be populated with the following critical information:
- The duration of each call made from a particular number
- The numbers from which the calls are made (calling party)
- The numbers receiving the calls (called party)
- When the call started (date and time)
- How long the call was (duration)
- The identifier of the telephone exchange writing the record
- Call type (voice, SMS, etc.)
- Whether the SIM card was destroyed and the same phone was used with a new SIM card
- Whether both the phone and the SIM were replaced with new ones; if calls are still made to the contacts of the previous phone or SIM, the calling party can be tracked through call patterns
- The maximum number of calls made to a particular number, and their duration
- Live location capture of a suspected number
- Details of past locations of the suspected number along with the patterns of visits.
- Which numbers were called, and how many times, during a day
- Where the person was when each call was made
- Where the person currently is, based on the geolocation feed
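Several of the dashboard items above reduce to simple aggregations over call records. The following is a minimal sketch of that idea in Python, not the article's implementation. The `CallRecord` layout is hypothetical, and the use of IMEI (handset) and IMSI (SIM) identifiers to spot a SIM change is an assumption, since real operator CDR formats differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical, simplified CDR layout; real operator formats differ.
    calling: str        # calling party number
    called: str         # called party number
    start: datetime     # when the call started
    duration_s: int     # call duration in seconds
    call_type: str      # "voice", "sms", ...
    imei: str           # handset identifier (assumed to be present)
    imsi: str           # SIM identifier (assumed to be present)


def summarize_calls(records, suspect):
    """Per-contact call count and total duration for calls made by `suspect`."""
    summary = defaultdict(lambda: {"calls": 0, "total_s": 0})
    for r in records:
        if r.calling == suspect:
            summary[r.called]["calls"] += 1
            summary[r.called]["total_s"] += r.duration_s
    return dict(summary)


def most_contacted(records, suspect):
    """Number the suspect called most often (ties broken by total duration)."""
    summary = summarize_calls(records, suspect)
    if not summary:
        return None
    return max(summary, key=lambda n: (summary[n]["calls"], summary[n]["total_s"]))


def sims_per_handset(records):
    """Map each IMEI to the IMSIs seen on it; more than one suggests a SIM change."""
    seen = defaultdict(set)
    for r in records:
        seen[r.imei].add(r.imsi)
    return {imei: sims for imei, sims in seen.items() if len(sims) > 1}
```

On a real cluster the same aggregations would run as distributed jobs over the full CDR set; the in-memory version only illustrates the logic.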
To address the business and technical challenges of achieving the above by processing massive volumes of CDR and GPS data, we can set up an end-to-end multi-node Hadoop cluster that can store and process petabytes of data. We can define mappings for each CDR file and write a component that processes each file in a distributed manner with zero data loss, applying data quality (DQ) checks to extract good records from the hundreds of gigabytes or terabytes of data generated continuously from multiple interfaces, including mobile towers.
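As a rough illustration of the mapping and DQ step, the sketch below applies a column mapping to CSV-formatted CDR lines and separates good records from rejected ones. The column layout, the set of valid call types, and the timestamp format are assumptions made for the example. On a cluster, logic like this could run inside each map task rather than on a single machine.

```python
import csv
from datetime import datetime

# Hypothetical mapping: column order -> field name for one CDR file layout.
CDR_MAPPING = ["calling", "called", "start", "duration_s", "call_type", "exchange_id"]
VALID_CALL_TYPES = {"voice", "sms"}  # assumed set of call types


def parse_line(row):
    """Apply the mapping to one CSV row and convert types; raises ValueError on bad data."""
    if len(row) != len(CDR_MAPPING):
        raise ValueError("unexpected column count")
    rec = dict(zip(CDR_MAPPING, (v.strip() for v in row)))
    if not rec["calling"].isdigit() or not rec["called"].isdigit():
        raise ValueError("non-numeric phone number")
    rec["start"] = datetime.strptime(rec["start"], "%Y-%m-%d %H:%M:%S")
    rec["duration_s"] = int(rec["duration_s"])
    if rec["duration_s"] < 0:
        raise ValueError("negative duration")
    if rec["call_type"] not in VALID_CALL_TYPES:
        raise ValueError("unknown call type")
    return rec


def split_good_bad(lines):
    """Return (good_records, rejected) so that no input line is silently dropped."""
    good, rejected = [], []
    for row in csv.reader(lines):
        try:
            good.append(parse_line(row))
        except ValueError as exc:
            # Keep the raw row and the reason so rejected data can be audited.
            rejected.append((row, str(exc)))
    return good, rejected
```

Keeping rejected rows with a reason, instead of discarding them, is one way to reconcile the "zero data loss" goal with DQ filtering: every input line ends up in exactly one of the two outputs.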
By developing a module that ingests the geolocation feed and customer data (provided by the ISP) into a multi-node HDFS cluster and processes the data immediately afterward, we can achieve a geo-fencing feature that helps security agencies conduct behavioural analysis of suspected persons or groups. In addition, using the QlikView reporting tool, gatherings of criminals or antisocial persons can be plotted to help prevent future crimes. The high-level architectural diagram can be leveraged to implement the entire solution.
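The geo-fencing idea can be sketched as a distance test against a circular fence. The example below is an assumption-laden illustration, not the article's module: it uses the haversine great-circle formula, a circular fence defined by a center and radius, and a hypothetical track format of `(timestamp, latitude, longitude)` tuples.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(point, center, radius_m):
    """True if `point` (lat, lon) lies within `radius_m` of `center` (lat, lon)."""
    return haversine_m(*point, *center) <= radius_m


def fence_events(track, center, radius_m):
    """Yield ("enter" or "exit", timestamp) each time the track crosses the fence."""
    was_inside = False
    for ts, lat, lon in track:
        now_inside = inside_geofence((lat, lon), center, radius_m)
        if now_inside != was_inside:
            yield ("enter" if now_inside else "exit", ts)
        was_inside = now_inside
```

Enter and exit events for several suspected numbers around the same fence could then feed the kind of gathering plots described above.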
We need a mechanism to purify corrupt data before ingestion into the multi-node HDFS cluster. Data quality is a major factor in delivering the final dashboard. By collecting good-quality data from telecom operators, we can present the final dashboard to the security or investigating agency. Ideally, the telecom company or mobile service provider filters data feeds before exporting them to a third-party system or cloud environment for processing.
Published at DZone with permission of Gautam Goswami. See the original article here.