Big Data in general
Big Data is actively discussed by technology companies. Some have been burned by large-scale data scandals (such as the Facebook data leaked to Cambridge Analytica), while others, on the contrary, make the most of it for business.
What is Big Data?
Big Data is currently one of the key drivers of the development of information technology. This field, still relatively new to business, has become widespread in Western countries. In the age of information technology, and especially after the boom in social networks, a significant amount of information began to accumulate about each Internet user, which ultimately gave rise to Big Data as a field. The term “Big Data” is the subject of much debate. Many believe that it refers only to the volume of accumulated information, but the technical side should not be forgotten: the field also covers storage technologies, computation, and services. In short, it deals with processing volumes of information that are difficult to handle by traditional methods.
The table below compares traditional databases with Big Data.
The scope of Big Data is characterized by the following features:
Volume – the accumulated data represents such a large amount of information that processing and storing it in traditional ways is labor-intensive; it requires a new approach and improved tools.
Velocity – speed, meaning both the increasing rate of data accumulation (90% of the information was collected in the last 2 years) and the speed of data processing. Recently, technologies for processing data in real time have come into greater demand.
Variety – the ability to process structured and unstructured information in multiple formats at the same time. The main distinguishing feature of structured information is that it can be classified. An example is information about client transactions.
Unstructured information includes video, audio files, free text, and information coming from social networks. Today, 80% of information falls into the unstructured group. It needs comprehensive analysis to become useful for further processing.
Veracity – the reliability of the data. Users have begun to attach increasing importance to the reliability of the data available to them. For example, Internet companies find it hard to separate actions performed on their websites by bots from those performed by people, which ultimately complicates data analysis.
Value – the value of the accumulated information. Big Data should be useful to the company and bring it some value, for example by helping to improve business processes, prepare reports, or optimize costs.
If all 5 of the above conditions are met, the accumulated data can be classed as Big Data.
Areas in which Big Data (the 5 Vs) is used, as illustrated below:
The field of use of Big Data technologies is extensive. With the help of Big Data, you can learn about customer preferences, measure the effectiveness of marketing campaigns, or conduct risk analysis. Below are the results of a survey by the IBM Institute on the areas in which companies use Big Data.
As the diagram shows, most companies use Big Data in customer service. The second most popular area is operational efficiency, while in risk management Big Data is less common at the moment.
It should also be noted that Big Data is one of the fastest growing areas of information technology. According to statistics, the total amount of data received and stored doubles every 1.2 years.
From 2012 to 2014, the amount of data transmitted monthly over mobile networks grew by 81%. According to Cisco, mobile traffic amounted to 2.5 exabytes per month in 2014 (an exabyte is a unit of information equal to 10^18 bytes) and is forecast to reach 24.3 exabytes per month in 2019. Thus, despite its relatively young age, Big Data is already an established field of technology that has spread to many areas of business and plays an important role in the development of companies.
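The growth figures quoted above can be converted into annual rates with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes growth is constant from year to year, which the sources do not claim.

```python
def annual_growth_from_doubling(doubling_years: float) -> float:
    """Annual growth rate implied by a fixed doubling period."""
    return 2 ** (1 / doubling_years) - 1


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Data volume doubling every 1.2 years (figure quoted in the text).
doubling_rate = annual_growth_from_doubling(1.2)

# Cisco figures from the text: 2.5 EB/month in 2014, 24.3 EB/month in 2019.
mobile_rate = compound_annual_growth(2.5, 24.3, 2019 - 2014)

print(f"Doubling every 1.2 years: about {doubling_rate:.0%} per year")
print(f"Mobile traffic 2014-2019: about {mobile_rate:.0%} per year")
```

Under the constant-growth assumption, doubling every 1.2 years corresponds to roughly 78% growth per year, and the Cisco forecast to roughly 58% per year.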
Big Data Technologies
The technologies used to collect and process Big Data can be divided into 3 groups:
- Software.
- Equipment.
- Services.
The most common approaches to data processing (software) include:
SQL – Structured Query Language, a language for working with databases. With SQL you can create and modify data, while the data itself is managed by the corresponding database management system (DBMS).
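As a minimal sketch of creating, modifying, and querying data with SQL, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name, columns, and rows are hypothetical and chosen only for illustration; SQLite plays the role of the DBMS here.

```python
import sqlite3

# In-memory SQLite database: the DBMS manages storage, SQL describes the work.
conn = sqlite3.connect(":memory:")

# Create a table (hypothetical schema for client transactions).
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, client TEXT NOT NULL, amount REAL NOT NULL)"
)

# Insert a few rows (made-up data).
conn.executemany(
    "INSERT INTO transactions (client, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 35.5), ("alice", 80.0)],
)

# Modify existing data.
conn.execute("UPDATE transactions SET amount = 40.0 WHERE client = 'bob'")

# Query: total amount per client.
totals = conn.execute(
    "SELECT client, SUM(amount) FROM transactions "
    "GROUP BY client ORDER BY client"
).fetchall()
print(totals)

conn.close()
```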
NoSQL – the term stands for Not Only SQL. It covers a number of approaches to implementing databases that differ from the models used in traditional relational DBMSs. They are convenient when the data structure changes constantly, for example when collecting and storing information from social networks.
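To illustrate why a schema-free model suits changing data structures, here is a small sketch of a document-style collection. It is not a real NoSQL database: a plain Python list stands in for the store, and the `insert` and `find` helpers and the sample posts are hypothetical.

```python
import json

# In-memory stand-in for a document collection (not a real NoSQL engine).
posts = []


def insert(document: dict) -> None:
    """Store a JSON-compatible copy of the document; no fixed schema is enforced."""
    posts.append(json.loads(json.dumps(document)))


def find(**criteria) -> list:
    """Return documents whose fields match all the given key/value pairs."""
    return [d for d in posts if all(d.get(k) == v for k, v in criteria.items())]


# Documents in the same collection may carry different fields.
insert({"user": "alice", "text": "Hello"})
insert({"user": "bob", "text": "Trip photo",
        "image_url": "https://example.com/p.jpg", "tags": ["travel"]})
insert({"user": "alice", "video_seconds": 42})

alice_posts = find(user="alice")
with_images = [d for d in posts if "image_url" in d]
print(len(alice_posts), len(with_images))
```

A relational table would need a schema change to accommodate each new field, whereas here new fields simply appear on the documents that have them.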
MapReduce – a model for distributed computation, used for parallel computations over very large data sets (petabytes or more). In this programming model, the data is not transferred to the program for processing; instead, the program is sent to the data. Thus, each request is a separate program. The data is processed sequentially by two functions, Map and Reduce: Map performs the preliminary selection of the data, and Reduce aggregates the results.
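The Map and Reduce steps can be sketched on a single machine with a word count, the usual teaching example. This is only an illustration of the model: the function names and input strings are hypothetical, and in a real cluster the map step would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


documents = ["big data is big", "data about data"]  # made-up input

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```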
Hadoop – used to implement the search and contextual mechanisms of heavily loaded sites such as Facebook, eBay, and Amazon. A distinctive feature is that the system is protected against the failure of any single cluster node, since each block of data has at least one copy on another node.
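The fault-tolerance idea, that every block also lives on at least one other node, can be shown with a toy placement model. This is not Hadoop's API or its actual placement policy: the round-robin scheme, the replication factor of 2, and the node and block names are assumptions made for the sketch.

```python
def place_blocks(blocks, nodes, replication=2):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(blocks)
    }


def readable_blocks(placement, failed_node):
    """Blocks that still have at least one copy on a node that did not fail."""
    return {
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


nodes = ["node-1", "node-2", "node-3"]                  # hypothetical cluster
blocks = ["block-a", "block-b", "block-c", "block-d"]   # hypothetical data blocks

placement = place_blocks(blocks, nodes)

# Check that losing any single node leaves every block readable.
survives_any_single_failure = all(
    readable_blocks(placement, node) == set(blocks) for node in nodes
)
print(placement)
print(survives_any_single_failure)
```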
SAP HANA – a high-performance NewSQL platform for data storage and processing. It processes queries at high speed. Another distinguishing feature is that SAP HANA simplifies the system landscape, reducing the cost of supporting analytical systems.
The technological equipment includes:
- Servers.
- Infrastructural equipment.
Servers include data warehouses, while infrastructure equipment includes platform acceleration tools, uninterruptible power supplies, server consoles, etc. Services include designing the architecture of the database system, setting up and optimizing the infrastructure, and ensuring the security of data storage.
Software, hardware, and services together form a comprehensive platform for storing and analyzing data. Companies such as Microsoft, HP, EMC offer services for the development, deployment and management of Big Data solutions.
Application in industries
Big Data has become widespread in many business sectors. It is used in healthcare, telecommunications, trade, logistics, and financial companies, as well as in public administration.
Below are a few examples of the use of Big Data in some of the industries.
The databases of retail stores can accumulate a great deal of information about customers, inventory management, and the supply of goods. This information can be useful in every area of a store’s activity. With its help, it is possible to manage the supply, storage, and sale of goods, and to forecast supply and demand. A data processing and analysis system can also solve other problems for the retailer, such as optimizing costs or preparing reports.
Big Data makes it possible to analyze the creditworthiness of a borrower and is also used for credit scoring and underwriting. Introducing Big Data technologies reduces the time needed to consider loan applications. With the help of Big Data, a bank can analyze the operations of a specific customer and offer suitable banking services.
In the telecommunications industry, Big Data is widely used by cellular operators.
Cellular operators, on a par with financial institutions, have some of the most extensive databases, which allows them to analyze the accumulated information in great depth.
The main purpose of data analysis is to retain existing customers and attract new ones. To do this, companies segment their customers, analyze their traffic, and determine each subscriber’s social profile. Besides marketing, Big Data technologies are also used to prevent fraudulent financial transactions.
Mining and oil industry
Big Data is used in the extraction of minerals as well as in their processing and marketing. Based on the information received, enterprises can draw conclusions about the efficiency of field development, track overhaul schedules and equipment condition, and forecast product demand and prices.
According to a Tech Pro Research survey and consolidated research data, Big Data is most widespread in the technology and consulting industries and in healthcare enterprises. According to the same survey, Big Data is less popular in education. The survey results are presented below:
Today, Big Data is being actively adopted by companies abroad. Companies such as Nasdaq, Facebook, Google, IBM, VISA, Mastercard, Bank of America, HSBC, AT&T, Coca-Cola, Starbucks, and Netflix already use Big Data resources.
The fields of application of the processed information are diverse and vary depending on the industry and the tasks to be performed.
Examples of the practical application of Big Data technologies are presented below.
HSBC uses Big Data technology to counter fraudulent transactions with payment cards. With the help of Big Data, the company increased the efficiency of its security service 3 times and the detection of fraudulent incidents 10 times. The economic effect of introducing these technologies exceeded $10 million.
VISA uses Big Data to detect fraudulent transactions automatically; the system currently helps prevent fraudulent payments amounting to $2 billion annually.
The IBM Watson supercomputer analyzes the flow of monetary transactions in real time. According to IBM, Watson increased the number of fraudulent transactions detected by 15%, reduced false alarms by 50%, and increased the amount of money protected from such transactions by 60%.
Procter & Gamble uses Big Data to design new products and plan global marketing campaigns. P&G has created specialized Business Sphere offices where information can be viewed in real time.
Thus, the company’s management can instantly test hypotheses and conduct experiments. P&G believes that Big Data helps in forecasting the company’s performance.
The office supplies retailer OfficeMax uses Big Data technologies to analyze customer behavior. Big Data analysis allowed it to increase B2B revenue by 13% and reduce costs by 400,000 US dollars per year.
According to Caterpillar, its distributors miss out on $9 to $18 billion in profits annually simply because they do not implement Big Data processing technologies. Big Data would allow customers to manage their machine fleets more effectively by analyzing information from sensors installed on the machines.
Today, it is already possible to analyze the condition of key components and their degree of wear, and to manage fuel and maintenance costs.
Luxottica Group is an eyewear manufacturer whose brands include Ray-Ban, Persol, and Oakley. The company uses Big Data technologies to analyze the behavior of potential customers and for smart SMS marketing. As a result, Luxottica Group identified more than 100 million of its most valuable customers and increased the effectiveness of its marketing campaign by 10%.
With the help of Yandex Data Factory, the developers of the game World of Tanks analyze player behavior. Big Data technologies made it possible to analyze the behavior of 100 thousand World of Tanks players using more than 100 parameters (information about purchases, games played, experience, etc.). The analysis produced a forecast of user churn. This information makes it possible to reduce churn and to work with players in a targeted way. The resulting model was 20-30% more effective than the standard analysis tools of the gaming industry.
The German Ministry of Labor uses Big Data to analyze incoming applications for unemployment benefits. After analyzing the information, it became clear that 20% of the benefits were being paid undeservedly. With the help of Big Data, the Ministry of Labor cut spending by 10 billion euros.
Toronto Children’s Hospital implemented Project Artemis, an information system that collects and analyzes data on infants in real time. The system tracks 1,260 indicators of each child’s condition every second. Project Artemis makes it possible to predict when a child’s condition will become unstable and to begin preventive treatment early.
OVERVIEW OF THE WORLD BIG DATA MARKET
The current state of the world market
In 2014, according to Data Collective, Big Data became one of the priority areas of investment in the venture industry. According to the information portal Computerra, this is because developments in this field have begun to bring significant results for their users. Over the past year, the number of companies with implemented Big Data management projects increased by 125%, and the market volume grew by 45% compared with 2013.
According to Wikibon, professional services accounted for the largest part of Big Data market revenue in 2015, at 40% of total revenue (see the chart below):
The most popular Big Data technologies are in-memory platforms such as SAP HANA, Oracle, etc. A T-Systems survey showed that they were chosen by 30% of the companies polled. NoSQL platforms came second (18% of users), followed by the analytical platforms of Splunk and Dell, which were chosen by 15% of companies. According to the survey, Hadoop/MapReduce products were found least useful for solving Big Data problems.
According to an Accenture survey, in more than 50% of companies that use Big Data technologies, Big Data costs range from 21% to 30%. Of the companies surveyed, 76% believe that these costs will increase in 2015, while 24% will not change their Big Data budget. This suggests that in these companies Big Data has become an established IT area and an integral part of the company’s development. Supply chain companies in the survey said that Big Data has lived up to its promise, helping them improve customer service and demand fulfillment, react faster and more effectively to supply chain issues, increase supply chain efficiency, and drive greater integration across the supply chain.
The results of an Economist Intelligence Unit survey confirm the positive effect of introducing Big Data: 46% of companies say that Big Data technologies have improved their customer service by more than 10%, 33% have optimized inventories and improved the productivity of fixed assets, and 32% have improved their planning processes.
Big Data in different countries of the world
To date, Big Data technologies are most often adopted by US companies, but other countries have already begun to show interest. In 2014, according to IDC, Europe, the Middle East, Asia (excluding Japan), and Africa accounted for 45% of the market for Big Data software, services, and equipment.
According to a CIO survey, companies in the Asia-Pacific region are rapidly developing new solutions for Big Data analysis, secure storage, and cloud technologies. Latin America is in second place by investment in the development of Big Data technologies, ahead of Europe and the United States.
Descriptions and forecasts of the development of the Big Data market in several countries are presented below.
The volume of information in China is 909 exabytes, which is 10% of all information in the world. By 2020 it will reach 8,060 exabytes, and China’s share of the world total will grow to 18% in 5 years. China’s Big Data has some of the fastest growth potential.
Brazil had accumulated 212 exabytes of information in 2014, which is 3% of the global volume. By 2020, the amount will grow to 1,600 exabytes, or 4% of all information in the world.
According to EMC, the volume of accumulated data in India in 2014 was 326 exabytes, which is 5% of the world total. By 2020, it will grow to 2,800 exabytes, or 6% of all information in the world.
The volume of accumulated data in Japan in 2014 was 495 exabytes, which is 8% of the world total. By 2020, it will grow to 2,200 exabytes, but Japan’s share will fall to 5% of all information in the world.
Thus, although the absolute volume will grow, Japan’s share of the world total will shrink by more than 30% in relative terms (from 8% to 5%).
According to EMC, the volume of accumulated data in Germany in 2014 was 230 exabytes, which is 4% of all information in the world. By 2020, it will grow to 1,100 exabytes, while Germany’s share will fall to 2%.
In the German market, according to Experton Group forecasts, the largest share of revenue will be generated by the services segment, whose share will be 54% in 2015 and will rise to 59% in 2019. The share of software and hardware, by contrast, will decrease.
Overall, the market volume will grow from 1.345 billion euros in 2015 to 3.198 billion euros in 2019, an average annual growth rate of 24%.
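The German market figures can be cross-checked with a short calculation. The sketch below derives the growth rate implied by the 2015 and 2019 volumes and the services revenue implied by the 54% and 59% shares. The year-by-year path in between is an interpolation that assumes constant growth; it is not Experton Group data.

```python
START_YEAR, END_YEAR = 2015, 2019
START_VOLUME, END_VOLUME = 1.345, 3.198    # billion euros, from the text
SERVICES_SHARE = {2015: 0.54, 2019: 0.59}  # services share of revenue, from the text

periods = END_YEAR - START_YEAR            # 4 yearly steps

# Constant annual growth rate that links the two endpoints.
implied_rate = (END_VOLUME / START_VOLUME) ** (1 / periods) - 1

# Interpolated market volume per year (assumption: constant growth).
path = {
    START_YEAR + i: START_VOLUME * (1 + implied_rate) ** i
    for i in range(periods + 1)
}

# Services revenue in the two years for which a share is given.
services_revenue = {year: share * path[year] for year, share in SERVICES_SHARE.items()}

print(f"Implied annual growth: {implied_rate:.1%}")
for year, volume in path.items():
    print(year, f"{volume:.3f} bn EUR")
print({year: round(value, 3) for year, value in services_revenue.items()})
```

The implied rate comes out at a little over 24% per year, which agrees with the average quoted above.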
Thus, based on the CIO and EMC analyses, it can be concluded that in the coming years the developing countries of the world will become markets for the active development of Big Data technologies.
Drivers and market restraints
IDC experts identified 3 drivers for the Big Data Market of 2015:
- Mass absorption of the client base of companies offering mobile applications and other data platforms;
- Development of cloud infrastructure;
- Changes in data privacy laws.
In addition, it is also worth mentioning:
- Increased interest in the processing of media materials related to previously unstructured information;
- The growing popularity of training courses in the field of Big Data;
- Investments in data visualization and active storytelling by data analysts;
- Constant investments in Big Data by web giants such as Google, Amazon, Facebook, etc.
The restraints on the Big Data market include:
- The still high cost of implementing Big Data technologies;
- The need to ensure data protection and confidentiality;
- Lack of qualified staff;
- Companies’ mistrust of these technologies;
- Insufficient amount of accumulated information;
- Database support requires constant funding, which creates an additional barrier to the implementation of Big Data;
- Complexity of integration with existing systems;
- Limited number of data providers.
Challenges and difficulties in Big Data-driven
According to the Accenture survey, data security is now the main barrier to the introduction of Big Data technologies: more than 51% of respondents confirmed that they are concerned about data protection and confidentiality. 47% of companies reported that they cannot implement Big Data because of a limited budget, and 41% cited a shortage of qualified personnel.
The projected market size will depend on how developing countries take to Big Data technologies and whether these become as popular there as in developed countries. In 2014, developing countries accounted for 40% of the volume of accumulated information. According to an EMC forecast, the current market structure, in which developed countries predominate, will change as early as 2017. According to EMC analysts, the share of developing countries will exceed 60% in 2020.
According to Cisco and EMC, developing countries will work with Big Data quite actively, largely thanks to the availability of the technologies and the accumulation of enough information to reach the scale of Big Data. The world map shows a forecast of the increase in volume and the growth rate of Big Data by region.
Main results of market analysis – World market
Big data has been a focus of research in science, technology, economics, and social studies, and organizations worldwide have already incorporated big data research into their analytic strategies. The research paper “Scientific big data and Digital Earth”, recently published in Chinese Science Bulletin, elaborates on the origin, connotation, and development of big data from both temporal and spatial perspectives. Surprisingly, India leads the market, followed by the USA; Russia, Brazil, and Argentina sit in the middle in terms of notability, while Australia, Canada, and China are falling behind.
Based on the results of the analysis, we can conclude that the Big Data market is still in the early stages of development, and in the near future, we will see its growth and expansion of the capabilities of these technologies.