Name of University
Data and Visual Analytics (Tableau)
PROBLEM STATEMENT 3
India has the world’s tenth-largest arable land mass. With its 20 agri-climatic areas, India is home to all 15 of the world’s main climate zones, as well as 46 of the world’s 60 soil varieties. India is the world’s biggest producer of spices, milk, pulses, cashew, tea, and jute, and ranks second in rice, wheat, sugarcane, fruits and vegetables, oil seeds, and cotton. It is also the world’s biggest producer of bananas and mangoes. Food grain output is expected to reach a peak of 295.67 million tonnes (MT) in the 2019-20 crop year. India’s government has set a goal of 298 million tonnes of food grain output for 2020-21.
With the dawn of the twenty-first century and an exponential rise in demand, food security has become one of the country’s primary issues. This has resulted in an unrelenting consumption of natural resources, which is proving to be a curse. It is critical for farmers to practice sustainable agriculture, but with little technological integration in many areas of India and a heavy reliance on seasonal rainfall, ensuring food security is becoming more challenging.
In this report, we analyze major factors influencing India’s agricultural production, respond to a few market questions to gain a better understanding of the current state of affairs, and offer some perspectives that may help improve the farming industry’s efficiency and support a transition toward sustainable farming.
STATE OF THE ART
We performed a literature review to better understand which innovative techniques are being implemented in agriculture across the globe and how data analysis might benefit the agricultural industry in India. The first study we reviewed demonstrated how business intelligence tools can help make sense of the massive raw data at our disposal. The article “Business Intelligence in Agriculture – A Practical Approach” discusses business intelligence, a collection of tools and processes that simplify data analysis and deliver a range of targeted findings. Data analytics transforms unstructured data from the data warehouse into actionable information that supports informed marketing decisions. It encompasses data extraction, transformation, and loading (ETL), data warehousing, data synthesis and analysis, and data visualization and presentation.
The article “Big Data in Smart Farming – A review” emphasizes the importance of making farming processes more data-driven and data-enabled. Rapid developments in the Internet of Things and cloud computing are propelling the phenomenon known as Smart Farming. Smart Farming bases management tasks not only on location but also on data, enhanced by context and situation awareness and triggered by real-time events. Real-time reconfiguration features are required to carry out agile actions, especially when operational conditions or other circumstances change suddenly (e.g. a weather or disease alert). These features, as shown in Figure 1, typically include intelligent assistance in the implementation, maintenance, and use of the technology.
In India, farming systems are carefully tailored to their locations. That is, crops are grown in accordance with the farm or soil properties prevalent in a given zone or region. The districts of India vary in the types of cultivation they practice; some rely on agriculture, while others rely on agroforestry. India’s geography exposes different places to distinct climatic conditions, which have a pronounced effect on each district’s agricultural productivity. India now ranks second in the world in agricultural production. Agriculture and allied sectors accounted for more than 16% of India’s gross domestic product in 2007. Despite the persistent decline in agriculture’s contribution to total GDP, Indian agribusiness remains the largest industry in the country and plays a critical role in its economic growth. Moreover, India is the second-largest producer of vegetables and fruits, accounting for 8.6 percent and 10.9 percent of total production, respectively. Around one-sixth of the land area is affected by serious crop yield problems, such as aridity.
PROPERTIES OF THE DATA
Big Data in General
Big data is a rapidly developing term that refers to any large volume of structured, semi-structured, or unstructured data that may be processed for information. It involves methods and technologies that require novel forms of data integration to uncover significant hidden information from varied, complex, and vast datasets, as illustrated in Figure 2.
Analyzing such data with standard database management tools or traditional data processing software is time-consuming and laborious. Big data is generated continuously, through social media interactions and electronic processes involving algorithms, sensors, portable devices, and other electronic equipment. Extracting tangible value from large amounts of data requires the right combination of processing capacity, analytic tools, and expertise. Big data is fostering a new culture of collaboration among business decision makers who want to get value from all of their data. However, properly comprehending this constantly growing data requires a fundamental shift in our approach to architecture, tools, and processes. India’s vast agricultural system can approach agricultural big data by making sense of a complex body of data that includes digital farm records and sensor data. This enables agriculturists to access and analyze agriculture’s massive data sets in order to assess quality, establish best practices, evaluate treatment methods, and identify crops at risk.
To get a comprehensive picture of the agriculture sector, we collected data from a variety of sources, including publicly accessible data and datasets available on Kaggle. As a result, the project makes use of several datasets that are connected through Tableau joins. The main dataset has nine columns and 246,091 rows. Its main fields are as follows:
State: The collection contains information on 33 states and territories.
Year: Crop statistics are available from 1997 to 2014.
Crops: Information on 124 distinct crops is provided.
Rainfall: Annual rainfall data from 2000 to 2015, in millimeters.
Production: Details about the nation’s production and overall output. The Production parameter ranges from 0 to 1,250,800,000 (million tonnage).
Area: Indicates the land under cultivation for each crop and state (in hectares).
Rate of crop growth: Increase in crop mass, size, or number between 1997 and 2012.
Exports: Volume and price of different crops exported from 2003 to 2015.
Rate of suicide: The total number of farmer suicides recorded in different states between 2010 and 2014.
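The joins mentioned above can be illustrated outside Tableau with a minimal Python sketch. It assumes a left join of crop records to rainfall records on state and year; the report does not state which join type or keys were used, and the sample rows, column names, and the `left_join` helper are hypothetical.

```python
# Hypothetical sample rows; column names loosely mirror the fields listed
# above, and every value is made up for illustration.
crop_rows = [
    {"state": "Kerala", "year": 2001, "crop": "Rice", "area": 100.0, "production": 250.0},
    {"state": "Kerala", "year": 2002, "crop": "Rice", "area": 110.0, "production": 240.0},
    {"state": "Punjab", "year": 2001, "crop": "Wheat", "area": 300.0, "production": 900.0},
]
rainfall_rows = [
    {"state": "Kerala", "year": 2001, "rainfall_mm": 2900.0},
    {"state": "Punjab", "year": 2001, "rainfall_mm": 600.0},
]


def left_join(left, right, keys):
    """Left join two lists of dicts on the given key columns."""
    # Index the right-hand table by its key tuple for constant-time lookup.
    index = {tuple(row[k] for k in keys): row for row in right}
    extra_cols = [c for c in right[0] if c not in keys] if right else []
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        merged = dict(row)
        for col in extra_cols:
            # Rows without a match keep their own fields and get None here.
            merged[col] = match[col] if match else None
        joined.append(merged)
    return joined


joined_rows = left_join(crop_rows, rainfall_rows, keys=("state", "year"))
for row in joined_rows:
    print(row["state"], row["year"], row["crop"], row["rainfall_mm"])
```

A left join keeps every crop record even when no rainfall figure exists for that state and year, which matters here because the crop and rainfall series cover different year ranges.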
Tableau connects users to a range of data sources and allows them to build data visualizations using a simple drag-and-drop interface to create tables, maps, graphs, and stories.
Tableau provides three versions of Tableau Desktop. The Public version is free, while the Personal and Professional versions are available for a fee or as a free 14-day trial. However, Tableau provides a free one-year product key, with support, for students and instructors in academic programs.
Connecting to Sample Data
The sample data used in this example procedure covers 33 states and territories and 124 distinct crops. It includes crop statistics from 1997 to 2014, annual rainfall data from 2000 to 2015 in millimeters, the nation’s production and overall output, the land under cultivation for each crop and state (in hectares), the rate of crop growth between 1997 and 2012, the volume and price of different crops exported from 2003 to 2015, and the total number of farmer suicides recorded in different states between 2010 and 2014.
Once the data has been gathered (here in Excel format), it can be connected to Tableau through the Connect pane on the left. Because this example contains Excel-formatted data, the Excel connector is used to import it. The connected datasets are listed in the top left corner, followed by their sheets. The Tableau workspace comprises the following elements:
-Toolbar: undo, redo, swap rows and columns, and Show Me, among other functions.
-Menus: File, Data, Worksheet, Dashboard, and Story, among others.
-Cards and shelves: the Filters shelf, the Pages shelf, the Marks card, and the Columns and Rows shelves at the top.
-Data Pane: Tableau populates the left area with data fields from the linked dataset (Figure 3). The ‘Formatting’ and ‘Analytics’ panes are also accessible.
– The top part, titled ‘Dimensions,’ contains discrete categorical variables such as date.
– The bottom part, titled ‘Measures,’ contains continuous data.
-Sheet Tabs: Add or rearrange worksheets, stories, and dashboards that have already been created.
-View: The space where fields are added to build a view (table, graph).
With the recent agricultural sector discussions, farmers demanding fair pricing for their product, the status of exports, changing weather patterns, and corresponding interruptions in sowing intervals, we want to evaluate key factors in order to get a better understanding of the industry. To begin, we will attempt to address the following market questions.
What crops are grown in each state and how are they affected by environmental conditions? How has the agriculture industry in India changed over time?
How have pricing patterns changed throughout time? How much does it cost to cultivate different crops in the country?
What is the farmer suicide rate in the nation, and what proportion of farmers are covered via the government’s schemes?
How have exports changed over time, and which main crops are included?
What relationship exists between yearly rainfall and agricultural production?
What is the relationship between area and distribution of production?
Combining into Groups and Creating Hierarchies
The dataset includes 143 distinct crops. To get a better understanding of these crops, we divided them into ten categories, including fruit, pulses, staple crops, oilseeds, spices, nuts, vegetables, commercial crops, and fibers. Furthermore, we established a hierarchy for the 21 states / union territories by categorizing them into seven groups according to their regions (Figure 4).
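The grouping step can be sketched in Python as a lookup from crop to category followed by an aggregation. The report names the categories but not the crop-to-category assignments, so the mapping below is an assumed, partial illustration, and the production figures are made up.

```python
from collections import defaultdict

# Assumed, partial mapping: the category names come from the report, but
# which crop belongs to which category is an illustrative guess.
CROP_CATEGORY = {
    "Rice": "Staple crops",
    "Wheat": "Staple crops",
    "Banana": "Fruit",
    "Mango": "Fruit",
    "Groundnut": "Oilseeds",
    "Cotton": "Fibers",
}

# Made-up (crop, production) records.
records = [
    ("Rice", 120.0),
    ("Wheat", 90.0),
    ("Banana", 30.0),
    ("Cotton", 15.0),
    ("Ragi", 5.0),
]


def production_by_category(rows, mapping, default="Uncategorized"):
    """Sum production per category; unmapped crops fall into `default`."""
    totals = defaultdict(float)
    for crop, production in rows:
        totals[mapping.get(crop, default)] += production
    return dict(totals)


category_totals = production_by_category(records, CROP_CATEGORY)
print(category_totals)
```

Routing unmapped crops to an explicit "Uncategorized" bucket makes gaps in the mapping visible instead of silently dropping rows. The same pattern applies to the state-to-region hierarchy.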
Area and production are two significantly skewed fields. We noticed that sugarcane as a crop and Uttar Pradesh as a state both contribute to the skewness, owing to the state’s large agricultural area. As a result, we removed those records from further analysis to improve the accuracy of the results (Figures 5 and 6).
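One way to check this kind of skewness numerically is sketched below. The report does not say how skewness was assessed, so the Fisher-Pearson coefficient used here is an assumption, and all rows are made-up values chosen so that a single sugarcane record dominates.

```python
def skewness(values):
    """Fisher-Pearson coefficient of skewness using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up rows: one very large sugarcane record skews the distribution.
rows = [
    {"state": "Kerala", "crop": "Rice", "production": 10.0},
    {"state": "Punjab", "crop": "Wheat", "production": 12.0},
    {"state": "Bihar", "crop": "Maize", "production": 14.0},
    {"state": "Gujarat", "crop": "Cotton", "production": 16.0},
    {"state": "Uttar Pradesh", "crop": "Sugarcane", "production": 1000.0},
]

# Exclude the crop and the state identified as the sources of skew.
filtered = [
    r for r in rows
    if r["crop"] != "Sugarcane" and r["state"] != "Uttar Pradesh"
]

skew_before = skewness([r["production"] for r in rows])
skew_after = skewness([r["production"] for r in filtered])
print(f"before: {skew_before:.2f}, after: {skew_after:.2f}")
```

A coefficient near zero indicates a roughly symmetric distribution, while a large positive value indicates a long right tail such as the one produced by the outlier record.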
Crop Statistics in the country
The area under cultivation of different crops shows that rice and wheat contribute 75% of the country’s total food grain output. The area devoted to these crops has generally been stable, with slight fluctuations. However, the continuous rise in agricultural prices may be attributed to the country’s growing demand. Additionally, the nation has maintained a consistent growth rate from the pre-Green Revolution period until 2012. The abrupt rise in prices during 2008 may be attributed to the 2007-2008 food crisis, as shown in Figure 7.
Annual Rainfall and its effect on production
By examining the yearly rainfall and production patterns, it can be seen that the trend of output is mostly determined by the amount of annual rainfall experienced by the country. The following pattern is found by mapping the connection between yearly rainfall and productivity.
The crop with the greatest output is sugarcane, of which India is also the world’s second biggest producer. Coconut is yet another crop that is in great demand. Wheat, rice, and cotton are all commonly grown crops.
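The relationship between yearly rainfall and output described in this section can be quantified with a correlation coefficient. The sketch below assumes Pearson correlation over yearly totals, which the report does not specify, and uses made-up numbers rather than the project data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up yearly values: year -> (annual rainfall, total production).
annual = {
    2000: (1000.0, 200.0),
    2001: (1100.0, 215.0),
    2002: (800.0, 175.0),
    2003: (1200.0, 230.0),
}

years = sorted(annual)
rainfall = [annual[y][0] for y in years]
production = [annual[y][1] for y in years]

r = pearson(rainfall, production)
print(f"rainfall vs production: r = {r:.3f}")
```

A value of r close to 1 would support the claim that output follows rainfall, although correlation alone does not establish that rainfall is the cause.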
Area and its correlation with production
By graphing the distribution of area against production, we discovered four clusters of (region, crop) combinations that helped us comprehend the different patterns seen in terms of cultivation area and output.
Cluster 1: Weaker correlation (Figure 10). Consists of medium-sized states such as Gujarat, Bihar, and West Bengal, where the area under cultivation is not directly proportional to the amount of food produced.
Cluster 2: Consists of smaller cold-region states such as Himachal Pradesh, Jammu & Kashmir, and Arunachal Pradesh. In these cases, the area may be considered correlated with the output.
Cluster 3: Low correlation. Includes states with a large land area such as Rajasthan, Maharashtra, and Madhya Pradesh. The disproportion may be ascribed to these areas’ aridity and lack of sufficient rainfall.
Cluster 4: High degree of correlation. Comprises states such as Uttar Pradesh and parts of Maharashtra that receive plenty of rainfall and have favorable weather patterns.
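The clusters above were identified visually in Tableau. As a rough numerical counterpart, the sketch below computes the area-production correlation separately for each state and labels it high or low. The 0.7 threshold, the placeholder state names, and all values are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up (state, area, production) rows with placeholder state names.
rows = [
    ("State A", 10.0, 20.0), ("State A", 20.0, 41.0), ("State A", 30.0, 59.0),
    ("State B", 10.0, 30.0), ("State B", 20.0, 12.0), ("State B", 30.0, 25.0),
]

# Collect the area and production series for each state.
by_state = defaultdict(lambda: ([], []))
for state, area, production in rows:
    by_state[state][0].append(area)
    by_state[state][1].append(production)

correlations = {s: pearson(a, p) for s, (a, p) in by_state.items()}

# Assumed cut-off: treat r >= 0.7 as a high degree of correlation.
labels = {s: "high" if r >= 0.7 else "low" for s, r in correlations.items()}
print(labels)
```

States labeled "low" are the ones where cultivated area does not translate proportionally into output, which is the pattern described for Clusters 1 and 3.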
Suicides among farmers and the number of farmers insured under different programs
Suicide among farmers has been a tragic reality in the agricultural industry. The economic issue at hand is how to safeguard these farmers and their families through state government programs such as WBCIS, MNAIS, and others. The chart below illustrates the number of farmer deaths in relation to the proportion of farmers insured in different states across the nation.
The resulting chart shows that 51.24 percent, 14.21 percent, and 10.95 percent of farmer suicides occurred in the three states of Maharashtra, Andhra Pradesh, and Tamil Nadu, respectively. This raises issues for both government and business, since the FMCG industry, which is highly reliant on farmer tie-ups, should strive for insurance coverage for farmer fatalities. It also demonstrates farmers’ lack of financial inclusion and indicates the areas where insurance firms can focus their efforts. Additionally, we can see that states with a higher suicide rate also have a higher cost of production per hectare of land (Figure 11).
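State-level percentages of this kind can be derived from raw counts as sketched below. The sketch assumes each percentage is a state's share of the national total, which is one reading of the figures above. The state names are placeholders and the counts are made up.

```python
# Made-up counts per state; placeholder names, not the report's data.
counts = {"State A": 500, "State B": 300, "State C": 150, "State D": 50}

total = sum(counts.values())

# Share of the national total for each state, in percent.
shares = {state: round(100.0 * n / total, 2) for state, n in counts.items()}

# The three states with the largest share, highest first.
top_three = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]

for state, share in top_three:
    print(f"{state}: {share} percent")
```

Comparing these shares with the proportion of insured farmers in the same states would show where coverage lags furthest behind need.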
The Indian agricultural sector has seen a continuous growth in exports, as well as a consistent increase in the price of those products. These may also be used to plan out a strategy for increased output in conjunction with improved irrigation infrastructure provided by the government. Additionally, wheat, fresh onions, rice, and maize are significant exports. Maize is the most expensive of the exports.
The application of data analytics to agricultural technologies has far-reaching consequences. It can optimize supply chains, recommend suitable agricultural methods for given climatic conditions, and forecast demand so that output can be planned accordingly, among other capabilities. Visualizing crop distribution within a state may also help businesses choose sites for their factories. More broadly, the use of analytics in agriculture has the potential to usher in the next great agricultural revolution (Figure 14).
According to the data and analysis, one of the most effective ways to boost output is the broad adoption of new science-based approaches. Apart from better crop protection measures and better seeds, farmers may boost production through contemporary irrigation techniques, mobile technology, crop management tools, farm management software such as Agrivi, fertilizer, and mechanization management. Thanks to technological advancements, farmers can now produce more with less labor, achieving higher yields while using fewer inputs.
The volume and price of exports provide insight into the prospects that India’s agriculture industry can leverage. Rising production levels allow the surplus to be used to boost agricultural exports. This would not only raise the quality of the product but also allow more competitive prices on the international market, thus boosting demand for Indian agricultural products.
By identifying clusters of area and productivity, we may pinpoint states where agricultural land is not producing enough and develop strategies to increase output. Mixed farming, cross farming, and other similar techniques may be researched.
By studying the direct effect of rainfall on production, we may use weather forecasting to predict the amount of rainfall that a particular area is likely to receive, and the central government can then build alternative irrigation systems such as canals and pumps.
Additionally, the statistics on farmer suicides and the insurance coverage given to these farmers demonstrate that financial inclusion and awareness of such programs must be promoted among farmers. The state government may also examine the cost of cultivation and perhaps reduce it.
Smart agriculture enables farmers to make more effective use of agricultural inputs such as manures, chemicals, tillage, and irrigation systems. More effective use of inputs results in increased crop yield and/or quality, without polluting the environment. However, it has been difficult to quantify the cost-saving benefits of precision agricultural management. At present, some of the technologies in use are in their infancy, making it difficult to price hardware and services. This makes present economic assessments of any particular technology provisional. Precision agriculture has the potential to address both the economic and environmental concerns associated with today’s production farming. While questions regarding economic viability and the best ways to use the technological tools we currently have remain, the concept of “making the best decision in the right place at the right time” has a strong intuitive appeal. The development of big data analysis in the agricultural sector is yielding new insights and improved outcomes from these massive data sets.
Future studies will focus on overcoming these barriers and on using big data analytics in agriculture to extract insight from unorganized raw data.
TABLE OF WORD COUNT
|State of the art||425|
|Properties of the data||429|
REFERENCES
Pham, X., & Stack, M. (2018). How data analytics is transforming agriculture. Business Horizons, 61(1), 125-133.
Hoelscher, J., & Mortimer, A. (2018). Using Tableau to visualize data and drive decision-making. Journal of Accounting Education, 44, 49-59.
Abhisek. (2018, July 18). Crop production statistics in India (1997-2014). Retrieved August 8 2021, from https://www.kaggle.com/abhiseklewan/crop-production-statistics-from-1997-in-india
Lytos, A., Lagkas, T., Sarigiannidis, P., Zervakis, M., & Livanos, G. (2020). Towards smart farming: Systems, frameworks and exploitation of multiple sources. Computer Networks, 172, 107147.
Coble, K. H., Mishra, A. K., Ferrell, S., & Griffin, T. (2018). Big data in agriculture: A challenge for the future. Applied Economic Perspectives and Policy, 40(1), 79-96.
Retrieved August 8 2021, from https://data.gov.in/search/site?query=crops
Wagh, S., Bhende, M., & Thakare, A. (2021). Data visualization using Tableau. https://doi.org/10.1201/9780429443237-14
Saggi, M. K., & Jain, S. (2018). A survey towards an integration of big data analytics to big insights for value-creation. Information Processing & Management, 54(5), 758-790.