
Software for Data Analysis: Programming with R

  • The only advanced programming book on R
  • Begins with simple interactive use and progresses by gradual stages
  • Written by the award-winning author of the S language, from which R evolved

John Chambers has been the principal designer of the S language since its beginning, and in 1999 received the ACM Software System Award for S, the only statistical software to receive this award. He is author or coauthor of the landmark books on S. Now he turns to R, the enormously successful open-source system based on the S language. R's international support and the thousands of packages and other contributions have made it the standard for statistical computing in research and teaching. This book guides the reader through programming with R, beginning with simple interactive use and progressing by gradual stages, starting with simple functions. More advanced programming techniques can be added as needed, allowing users to grow into software contributors, benefiting their careers and the community. R packages provide a powerful mechanism for contributions to be organized and communicated. The techniques covered include such modern programming enhancements as classes and methods, namespaces, and interfaces to spreadsheets or databases, as well as computations for data visualization, numerical methods, and the use of text data.

  • Matt Bogard

R is a statistical programming language with a command line interface that is becoming more and more popular every day. I have used R for data visualization, data mining/machine learning, as well as social network analysis. Initially embraced largely in academia, R is becoming the software of choice in various corporate settings.

  • John Chambers

Nearly everything that happens in R results from a function call. Therefore, basic programming centers on creating and refining functions. Function definition should begin small-scale, directly from interactive use of commands (Section 3.1). The essential concepts apply to all functions, however. This chapter discusses functional programming concepts (Section 3.2, page 43) and the relation between function calls and function objects (3.3, 50). It then covers essential techniques for writing and developing effective functions: details of the language (3.4, 58); techniques for debugging (3.5, 61), including preemptive tracing (3.6, 67); handling of errors and other conditions (3.7, 74); and design of tests for trustworthy software (3.8, 76).
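
As a concrete illustration of that progression (the data and the function name here are invented for the example, not taken from the book), a small interactive computation can be captured as a function and then refined:

```r
## An interactive computation: the standard error of the mean of a sample
x <- rnorm(50)
sqrt(var(x) / length(x))

## The same computation captured as a small, reusable function,
## then refined with an optional argument
stdErr <- function(y, na.rm = FALSE) {
  if (na.rm) y <- y[!is.na(y)]
  sqrt(var(y) / length(y))
}

stdErr(x)                        # same result as the interactive line above
stdErr(c(x, NA), na.rm = TRUE)   # the refinement handles missing values
```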

  • John Chambers

This chapter looks at the organization and construction of R packages. You mainly need this information when you decide to organize your own code into package form, although it's useful to understand packages if you need to modify an existing package or if you have problems installing one.
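
For illustration, a hedged sketch of one way to start a package from existing functions, using the standard package.skeleton() utility; the package name "mytools" and the function are invented for the example:

```r
## One way to start a package from functions already in the workspace.
## "mytools" and stdErr() are invented names carried over from the sketch above.
stdErr <- function(y, na.rm = FALSE) {
  if (na.rm) y <- y[!is.na(y)]
  sqrt(var(y) / length(y))
}

package.skeleton(name = "mytools", list = "stdErr")
## Creates ./mytools/ with DESCRIPTION, NAMESPACE, R/ and man/ skeletons,
## ready to be edited and then built and checked with R CMD build / R CMD check.
```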

  • John Chambers

This chapter surveys a variety of topics dealing with different kinds of data and the computations provided for them. The topics are "basic" in two senses: they are among those most often covered in introductions to R or S-Plus; and most of them go back to fairly early stages in the long evolution of the S language.
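
A few lines of R of the "basic" kind the chapter has in mind (the toy data are invented for the example):

```r
## A few computations of the "basic" kind surveyed here:
## vectors, indexing, and a small data frame with a grouped summary
counts <- c(a = 4L, b = 7L, c = 1L)
counts[counts > 2]                      # logical indexing of a named vector

d <- data.frame(group = c("x", "x", "y"),
                value = c(1.2, 3.4, 5.6))
tapply(d$value, d$group, mean)          # mean of 'value' within each group
```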

  • John Chambers

One of the main attractions of R is its software for visualizing data and presenting results through displays. R provides functions to generate plots from data, plus a flexible environment for modifying the details of the plots and for creating new software. This chapter examines programming for graphics using R, emphasizing some concepts underlying most of the R software for graphics. The first section outlines the organization of this software. Section 7.2, page 242, relates the software to the x−y plot as the most valuable model for statistical graphics with R. The next four sections provide the essential concepts for computational graphics in R (7.3, 253) and relate those to the three main packages for general-purpose graphics, base graphics (7.4, 263), grid (7.5, 271), and lattice (7.6, 280).
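
As a minimal illustration of the x-y plot model in two of those packages, using R's built-in mtcars data (the particular variables chosen are arbitrary):

```r
## The x-y plot in base graphics, then the same data in lattice
with(mtcars, plot(wt, mpg,
                  xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
                  main = "Base graphics"))
abline(lm(mpg ~ wt, data = mtcars), lty = 2)   # add a fitted line to the plot

library(lattice)
xyplot(mpg ~ wt | factor(cyl), data = mtcars,
       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
       main = "lattice: one panel per number of cylinders")
```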

  • John Chambers

Although statistical computing, following the lead of statistical theory, has tended to be defined mostly in terms of numerical data, many applications arise from information in the form of text. This chapter discusses some computational tools that have proven useful in analyzing text data. Computations in both R and a variety of other systems are useful, and can complement each other. The first section sets out some basic concepts and computations that cut across different systems. Section 8.2, page 294, gives techniques for importing text data into R. Section 8.3, page 298, discusses regular expressions, a key technique for dealing with text. The next two sections apply a variety of techniques in R (8.4, 304) and in Perl (8.5, 309). Finally, Section 8.6, page 318, gives some fairly extended examples of text computations.
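
A small, self-contained illustration of text computations of the kind the chapter covers; the example strings are invented:

```r
## Regular expressions and related text computations on a small character vector
addresses <- c("alice@example.org", "bob at example.com", "carol@example.net")

grepl("@", addresses)                     # which strings contain an "@"
sub("@.*$", "", addresses)                # drop everything from the "@" onwards
regmatches(addresses,
           regexpr("[[:alnum:].]+@[[:alnum:].]+", addresses))   # extract matches
strsplit(addresses[1], "@")[[1]]          # split one string into user and domain
```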

  • John Chambers

This chapter presents techniques for defining new classes of R objects. It and the closely related Chapter 10 on methods represent a more serious level of programming than most of the earlier discussion. Together, the techniques in the two chapters cope with more complex applications while retaining the functional, object-based concepts of programming with R.
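
A minimal sketch of an S4 class definition of the kind the chapter describes (the class name, slots, and validity rule are invented for the example):

```r
## A small S4 class with two slots and a validity check
setClass("timeSeries",
         representation(times = "numeric", values = "numeric"),
         validity = function(object) {
           if (length(object@times) == length(object@values)) TRUE
           else "slots 'times' and 'values' must have equal length"
         })

ts1 <- new("timeSeries", times = c(1, 2, 3, 4, 5), values = rnorm(5))
ts1@values                                # slots are accessed with "@"
```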

  • John Chambers

This chapter describes the design and implementation of generic functions and methods in R, including basic techniques for creating methods, the definition of generic functions, and a discussion of method dispatch, the mechanism by which a method is chosen to evaluate a call to a generic function. Section 10.2, page 384, describes the fundamental mechanism for creating new methods. Section 10.3, page 387, discusses important functions that are frequently made generic; Section 10.5, page 396, provides more detail on the generic functions themselves; Section 10.6, page 405, discusses method selection—the matching of arguments in a call to the method signatures.
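
A minimal sketch of a generic function and methods, with an invented class and generic used purely for illustration:

```r
## A new generic function plus a method specialized to an invented S4 class
setClass("temperature", representation(degC = "numeric"))

setGeneric("toFahrenheit", function(x, ...) standardGeneric("toFahrenheit"))
setMethod("toFahrenheit", "temperature",
          function(x, ...) x@degC * 9 / 5 + 32)

## An existing generic (show) can also be given a method for the new class
setMethod("show", "temperature",
          function(object) cat(object@degC, "degrees Celsius\n"))

t1 <- new("temperature", degC = c(0, 21.5, 100))
toFahrenheit(t1)   # dispatch selects the "temperature" method
t1                 # automatic printing now uses the show() method
```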

  • John Chambers

This chapter and the following one discuss techniques for making use of software written in other languages and systems. This chapter covers calling software in C, Fortran, and to some extent C++. Given that R's implementation is based on a program written in C, it's natural that techniques are available for incorporating additional C software not available for general interfaces. This chapter describes several. The simplest interfaces are to routines in C or Fortran that do not include R-dependent features (Section 11.2, page 415). For greater control, at the cost of more programming effort, C routines may manipulate R objects directly (Section 11.3, page 420). Functional interfaces to C++ are discussed in Section 11.4, page 425, although the difference in programming model in this case is discussed in Section 12.6, page 440. For trustworthy software, the interfaces to C and Fortran should be registered as part of initializing a package (Section 11.5, page 426).
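
A hedged, R-side sketch of the simplest of these interfaces, the .C() call to a routine with no R-dependent features; the C routine, file names, and wrapper below are hypothetical and would need to be compiled and, in a package, registered as the chapter describes:

```r
## Hypothetical C routine, e.g. in src/cumdist.c of a package:
##   void cum_dist(double *x, int *n, double *out) { /* fills 'out' */ }
## Compiled with "R CMD SHLIB cumdist.c" when used outside a package.

dyn.load("cumdist.so")     # ("cumdist.dll" on Windows); a package would
                           # instead declare useDynLib(pkgName, cum_dist)
                           # in its NAMESPACE and register the routine

cumDist <- function(x) {
  n <- length(x)
  res <- .C("cum_dist",
            x   = as.double(x),
            n   = as.integer(n),
            out = double(n))
  res$out                  # .C returns a list of the (copied) arguments
}
```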

  • John Chambers

This chapter discusses general inter-system interfaces between computations in R and those done in other languages and systems. "Other" generally has two senses here: The implementation usually involves communicating with another application; and more fundamentally, the computational model for the other system may be different from that in R.

  • John Chambers

This chapter takes a modest look under the hood at the R engine, that is, the program that runs R. As with an automobile, you can use R without worrying very much about how it works. But computing with data is more complicated than driving a car (fortunately for highway safety), and at some point you may need to make some basic choices between different approaches to your computation. Understanding something about what actually happens will often show that some approaches suit the system better than others.

... First, it has shortcomings when dealing with large files; a single sample file size starts from 12 GB. Second, the developers restrict access to the source code, reducing the possibilities of software modification and adaptation to particular research needs [19,26]. Third, the code's obscurantism veils the analytical workflow and forces the user to choose from a finite set of pre-processing steps. ...

... These options fall short when compared with typical spectral analysis steps [27]. Given the software limitations, we set out to develop a program able to automate the analysis of µFTIR images, built on trustworthy and reproducible research principles (see Chambers's [26] explanation on the subject). Our main goal was to implement a set of front-end tools to analyze the output of µFTIR spectrometers. ...

... The package improves the reproducibility of the results, since procedural scripts can be shared and published together with scientific articles. The software's open-source nature allows trustworthy analysis and scientific communication [26]. Moreover, R, a functional programming language, is strongly modular, facilitating the addition of new functions and analytical techniques. ...

uFTIR is an R package that implements an automatic approach to analyze μFTIR hyperspectral images with a strong focus on microplastic recognition in environmental samples. The package performs image classification using a Spectral Angle Mapper algorithm in a library search approach. It interacts with other R packages used for spectral analysis. It exports its output as raster and vector files that can be post-processed in common Geographical Information Systems software. The package was designed around the principles of modular development, compatibility, and open-source software. We hope our contribution will help researchers to assess the occurrence of microplastics in ecosystems.

... Throughout this chapter we assume that the reader has some prior knowledge of R, and we give code examples without dealing with the basics of R, such as the different types of R objects. For more details on R and its programming language, the reader may refer to the many free courses available on the R and Bioconductor websites, as well as many good books on R and Bioconductor such as [14], [15], [16]. ...

... Because the Principal Component Analysis (PCA) is based on the eigendecomposition of the covariance matrix, it will summarize the same information that appears in the correlation matrix. The prcomp function can be used to perform PCA in R. In general, it is necessary to carry out a first MS-based experiment with a minimum number of replicates per biological condition (at least 3, to be able to estimate the variance of intensities for each modified peptide in each condition). ...
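
For illustration, PCA via prcomp on R's built-in USArrests data (the data set is chosen only as an example and is unrelated to the proteomics workflow described here):

```r
## PCA from the scaled covariance (i.e., correlation) structure of the data
pca <- prcomp(USArrests, scale. = TRUE)

summary(pca)      # proportion of variance explained by each component
head(pca$x)       # scores: the observations expressed on the principal components
pca$rotation      # loadings of the original variables on each component
```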

  • Quentin Giai Gianetto

Protein post-translational modifications (PTMs) are essential elements of cellular communication. Their variations in abundance can affect cellular pathways, leading to cellular disorders and diseases. A widely used method for revealing PTM-mediated regulatory networks is their label-free quantitation (LFQ) by high-resolution mass spectrometry. The raw data resulting from such experiments are generally interpreted using specific software, such as MaxQuant, MassChroQ or Proline. They provide data matrices containing quantified intensities for each modified peptide identified. Statistical analyses are then necessary (1) to ensure that the quantified data are of good enough quality and sufficiently reproducible, and (2) to highlight the modified peptides that are differentially abundant between the biological conditions under study. The objective of this chapter is therefore to provide a complete data analysis pipeline for analyzing the quantified intensities of modified peptides in the presence of two or more biological conditions using the R software. We illustrate our pipeline starting from MaxQuant outputs dealing with the analysis of A549-ACE2 cells infected by SARS-CoV-2 at different time points, freely available on PRIDE (PXD020019).

... As described by Ihaka and Gentleman (1996), the R language was developed under strong influence from the S and Scheme languages, and the syntax of R itself closely resembles that of S. For this reason, many authors, such as Peng (2015) and Chambers (2008), characterize R as a dialect of S. According to Ihaka and Gentleman (1996), the S language offered a concise way of expressing statistical ideas and operations to a computer and was therefore an important source of inspiration for R. In other words, compared with other languages, S offered a syntax that was more attractive and comfortable for statisticians to put their ideas into practice, and much of that syntax was carried over into R. ...

... In addition, there are several important textbooks on the language that offer extremely valuable knowledge, such as the works of Wickham and Grolemund (2017), Gillespie and Lovelace (2017), Peng (2015), Grolemund (2014), Chambers (2008), and Adler (2010), as well as the official documentation of the language in R Core Team (2020a) and R Core Team (2020b). ...

This book is a work in continuous development and expansion, and it seeks to offer an in-depth description of the fundamentals of the R language and how they apply in the context of data analysis. Its main contribution to the Brazilian literature today lies in addressing two recurring problems in the materials available in Portuguese about the language: 1) the lack of depth of many materials, which try to cover too many topics in too little space; and 2) the high degree of specialization of many materials, which are difficult to transfer to general data analysis applications.

... The class methods implement functionalities for accessing and modifying the data and for interaction between objects, which are primarily mathematical operations. The object orientation is realized in R via the S4 object system (Chambers 2008). This system fulfills most of the fundamental concepts of object-oriented programming listed in Armstrong (2006) and is thus more rigorous than R's widely used S3 system, which is used, e.g., by fda or fda.usc. ...

... For 'funData' and 'irregFunData', the data are organized in two fields or slots, as they are called for S4 classes (Chambers 2008): The slot @argvals contains the observation points and the slot @X contains the observed data. For 'funData', the @argvals slot is a list, containing the common sampling grid for all functions and @X is an array containing all observations. ...

  • Clara Happ-Kurz

This paper introduces the funData R package as an object-oriented implementation of functional data. It implements a unified framework for dense univariate and multivariate functional data on one- and higher dimensional domains as well as for irregular functional data. The aim of this package is to provide a user-friendly, self-contained core toolbox for functional data, including important functionalities for creating, accessing and modifying functional data objects, that can serve as a basis for other packages. The package further contains a full simulation toolbox, which is a useful feature when implementing and testing new methodological developments. Based on the theory of object-oriented data analysis, it is shown why it is natural to implement functional data in an object-oriented manner. The classes and methods provided by funData are illustrated in many examples using two freely available datasets. The MFPCA package, which implements multivariate functional principal component analysis, is presented as an example for an advanced methodological package that uses the funData package as a basis, including a case study with real data. Both packages are publicly available on GitHub and the Comprehensive R Archive Network.

... The state of a workspace can be snapshotted to a file and restored as needed for inspection, amendment, and execution, while preserving the state of the computation during periods of inactivity. R [R Core Team 2019] uses a similar workspace [Chambers 2008] concept. In contrast to our work, the systems mentioned above do not use any points-to analysis, but rely solely on already executed code for the snapshot. ...

  • Christian Wimmer
  • Codruţ Stancu
  • Peter Hofer
  • Thomas Wuerthinger

Arbitrary program extension at run time in language-based VMs, e.g., Java's dynamic class loading, comes at a startup cost: high memory footprint and slow warmup. Cloud computing amplifies the startup overhead. Microservices and serverless cloud functions lead to small, self-contained applications that are started often. Slow startup and high memory footprint directly affect the cloud hosting costs, and slow startup can also break service-level agreements. Many applications are limited to a prescribed set of pre-tested classes, i.e., use a closed-world assumption at deployment time. For such Java applications, GraalVM Native Image offers fast startup and stable performance. GraalVM Native Image uses a novel iterative application of points-to analysis and heap snapshotting, followed by ahead-of-time compilation with an optimizing compiler. Initialization code can run at build time, i.e., executables can be tailored to a particular application configuration. Execution at run time starts with a pre-populated heap, leveraging copy-on-write memory sharing. We show that this approach improves the startup performance by up to two orders of magnitude compared to the Java HotSpot VM, while preserving peak performance. This allows Java applications to have a better startup performance than Go applications and the V8 JavaScript VM.

... Due to the under-sampling, not all of the 84,919 eligible negative encounters were included in the training and testing. All statistical analyses were performed using R (version 3.6.0)[20]. ...

Background: With the limited availability of testing for the presence of the SARS-CoV-2 virus and concerns surrounding the accuracy of existing methods, other means of identifying patients are urgently needed. Previous studies showing a correlation between certain laboratory tests and diagnosis suggest an alternative method based on an ensemble of tests. Methods: We have trained a machine learning model to analyze the correlation between SARS-CoV-2 test results and 20 routine laboratory tests collected within a 2-day period around the SARS-CoV-2 test date. We used the model to compare SARS-CoV-2 positive and negative patients. Results: In a cohort of 75,991 veteran inpatients and outpatients who tested for SARS-CoV-2 in the months of March through July, 2020, 7,335 of whom were positive by RT-PCR or antigen testing, and who had at least 15 of 20 lab results within the window period, our model predicted the results of the SARS-CoV-2 test with a specificity of 86.8%, a sensitivity of 82.4%, and an overall accuracy of 86.4% (with a 95% confidence interval of [86.0%, 86.9%]). Conclusions: While molecular-based and antibody tests remain the reference standard method for confirming a SARS-CoV-2 diagnosis, their clinical sensitivity is not well known. The model described herein may provide a complementary method of determining SARS-CoV-2 infection status, based on a fully independent set of indicators, that can help confirm results from other tests as well as identify positive cases missed by molecular testing.

... (10)] within R software version 3.3.3 (11). ...

  • Guilan Li
  • Yang Gao
  • Kun Li
  • Zujun Jiang

Acute myeloid leukemia (AML) is the most common childhood cancer and is a major cause of morbidity among adults with hematologic malignancies. Several novel genetic alterations, which target critical cellular pathways, including alterations in lymphoid development-regulating genes, tumor suppressors and oncogenes that contribute to leukemogenesis, have been identified. The present study aimed to identify molecular markers associated with the occurrence and poor prognosis of AML. Information on these molecular markers may facilitate prediction of clinical outcomes. Clinical data and RNA expression profiles of AML specimens from The Cancer Genome Atlas database were assessed. Mutation data were analyzed and mapped using the maftools package in R software. Kyoto Encyclopedia of Genes and Genomes, Reactome and Gene Ontology analyses were performed using the clusterProfiler package in R software. Furthermore, Kaplan-Meier survival analysis was performed using the survminer package in R software. The expression data of RNAs were subjected to univariate Cox regression analysis, which demonstrated that the mutation loads varied considerably among patients with AML. Subsequently, the expression data of mRNAs, microRNAs (miRNAs/miR) and long non-coding RNAs (lncRNAs) were subjected to univariate Cox regression analysis to determine the 100 genes most associated with the survival of patients with AML, which revealed 48 mRNAs and 52 miRNAs. The top 1,900 mRNAs (P<0.05) were selected through enrichment analysis to determine their functional role in AML prognosis. The results demonstrated that these molecules were involved in the transforming growth factor-β, SMAD and fibroblast growth factor receptor-1 fusion mutant signaling pathways. Survival analysis indicated that patients with AML, with high MYH15, TREML2, ATP13A2, MMP7, hsa-let-7a-2-3p, hsa-miR-362-3p, hsa-miR-500a-5p, hsa-miR-500b-5p, hsa-miR-362-5p, LINC00987, LACAT143, THCAT393, THCAT531 and KHCAT230 expression levels had a shorter survival time compared with those without these factors. Conversely, a high KANSL1L expression level in patients was associated with a longer survival time. The present study determined genetic mutations, mRNAs, miRNAs, lncRNAs and signaling pathways involved in AML, in order to elucidate the underlying molecular mechanisms of the development and recurrence of this disease.

... Typically, researchers need access to technology packages such as Adobe Photoshop™, GIMP, or NIH ImageJ (Schneider et al., 2012), but these all have their own limitations such as file size restrictions or lack of parallel processing support. Some basic scripting ability and access to data analysis software such as ImageMagick (The ImageMagick Development Team, 2020), R (Chambers, 2008) or Matlab (MATLAB, 2010) are useful. However, even with these tools and skills, transformations can be time consuming when applied to hundreds of images for whole brain comparative studies. ...

With recent technological advances in microscopy and image acquisition of tissue sections, further developments of tools are required for viewing, transforming, and analyzing the ever-increasing amounts of high-resolution data produced. In the field of neuroscience, histological images of whole rodent brain sections are commonly used for investigating brain connections as well as cellular and molecular organization in the normal and diseased brain, but present a problem for the typical neuroscientist with no or limited programming experience in terms of the pre- and post-processing steps needed for analysis. To meet this need we have designed Nutil, an open access and stand-alone executable software that enables automated transformations, post-processing, and analyses of 2D section images using multi-core processing (OpenMP). The software is written in C++ for efficiency, and provides the user with a clean and easy graphical user interface for specifying the input and output parameters. Nutil currently contains four separate tools: (1) A transformation toolchain named "Transform" that allows for rotation, mirroring and scaling, resizing, and renaming of very large tiled tiff images. (2) "TiffCreator" enables the generation of tiled TIFF images from other image formats such as PNG and JPEG. (3) A "Resize" tool completes the preprocessing toolset and allows downscaling of PNG and JPEG images with output in PNG format. (4) The fourth tool is a post-processing method called "Quantifier" that enables the quantification of segmented objects in the context of regions defined by brain atlas maps generated with the QuickNII software based on a 3D reference atlas (mouse or rat). The output consists of a set of report files, point cloud coordinate files for visualization in reference atlas space, and reference atlas images superimposed with color-coded objects. The Nutil software is made available by the Human Brain Project (https://www.humanbrainproject.eu) at https://www.nitrc.org/projects/nutil/.

... All means were calculated using Excel, and the R software was used to carry out the statistical analysis (Chambers, 2008). ...

  • Roger DARMAN Djoulde
  • Matsowa Bouopda Sidoine
  • Venassius Wirnkar Lendzemo

In order to produce biscuits from off-season sorghum, a local "Muskwari" sorghum was milled and sieved to a grain size of 0.5 mm. This flour was used to produce shortbread biscuits with different substitution rates of wheat flour with sorghum flour. The standard formulation of this same type of shortbread biscuit was used, and biscuits were produced with wheat-to-sorghum substitution rates from 0% to 100%, in steps of 10 percentage points. The technological characterization of the sorghum flour produced indicates good water absorption capacity and an interesting solubility index and swelling rate. The technological assessment indicated that by changing the mixing speed and kneading time and resting the dough, it is possible to produce 100% sorghum flour shortbread biscuits. Shortbread biscuits made with 70% wheat flour incorporation had the best average scores for overall preference (6.97 ± 1.30), color (7.1 ± 1.45), and texture (6.62 ± 1.54). For the smell and taste criteria, the 40% biscuits and the control received the highest average scores, namely 6.77 ± 1.55 for smell and 7.12 ± 1.29 for taste. Analysis of the nutritional and energy content of the control biscuit and the 70% substitution revealed that, between the two, the latter had a significantly higher content of total carbohydrates (58.51 g), dietary fiber (2.15 g), and total energy (454.1 kcal).

... Other monthly temporal lags (i.e., 2 and 3 months before trap inspection) were tested, but their coefficients were not statistically significant (see Table A2 in Appendix A). Analyses were carried out in the R software (Chambers 2008). Population dynamics parameters are obtained from a statistical Moran curve approach, which accounts for the unlimited growth rate, which modulates the density-independent mortality (DI) and the density-dependent mortality (DD). ...

Understanding geographic population dynamics of mosquitoes is an essential requirement for estimating the risk of mosquito-borne disease transmission and geographically targeted interventions. However, the use of population dynamics measures as predictors in spatio-temporal point processes has not been investigated before. In this work we compared the model fitting statistics of four spatio-temporal log-Gaussian Cox models: (i) with no predictors; (ii) mosquito abundance as predictor; (iii) intrinsic growth rate as predictor; (iv) intrinsic growth rate and density of mosquitoes as predictors. This analysis is based on Aedes aegypti mosquito surveillance and human dengue data obtained from the urban area of Caratinga, Brazil. We used a statistical Moran Curve approach to estimate the intrinsic growth rate and a zero-inflated Poisson kriging model for estimating mosquito abundance at locations of dengue cases. The incidence of dengue cases was positively associated with mosquito intrinsic growth rate, and this model outperformed, in terms of predictive accuracy, the abundance and the null models. The latter includes only the spatio-temporal random effect but no predictors. In light of these results, we suggest that the intrinsic growth rate should be investigated further as a potential tool for predicting the risk of dengue transmission and targeting health interventions for vector-borne diseases.

... This measures data in terms of its principal components rather than on a normal x-y axis. A total of 196 soil pit records with all physical, chemical and geographic information were processed in RStudio (for a description see Chambers, 2008) using the general PCA and the 'ggfortify' library. The number of principal components (PCs) selected for further analysis was based on scree plots and the Kaiser criterion, which suggests that the eigenvalue for each PC should be above one (Abdi and Beaton, 2019). ...

Currently, Land Use, Land-Use Change and Forestry (LULUCF) reporting for national inventory purposes largely relies on tier 1 reporting methodologies, due to the lack of availability of soil property and other activity data at an adequate spatial resolution. In order to better inform coherent climate mitigation strategies and to enhance knowledge in this area, the SOLUM project developed a spatially-integrated soils and land use dataset for Ireland that could inform more robust tier 2 and 3 estimates of Soil Organic Carbon (SOC) stock changes. A spatially integrated Land Use and Soil Inventory for Ireland (LUSII) was developed using existing datasets, including the Land Use Cover Area Survey (LUCAS) and the Soil Information System (SIS) for Ireland. LUCAS data were reclassified and a rule-based approach was developed that ascribed grassland land use classes that are relevant for the LULUCF sector of national greenhouse gas emission and removals inventories. Soil data were then aligned to the land use points, where soils data from SIS were reclassified to soil groups in accordance with World Reference Base (WRB) taxonomic classes. A rule-based classification approach using soil texture as a key variable was also incorporated. The ability of this approach to potentially detect changes in land use and associated soil carbon over time was demonstrated using the repeated LUCAS survey data, county-specific soil data and ortho-photography information. Tier 2 SOC factors for different soils, land uses and management types were developed using a newly collated soils database and a novel approach of classifying soils into clusters based on soil textural properties. Building on the existing National Soil Database (NSDB) and through the interrogation of SIS data, a systematic and robust approach to stratify soils into categories, from which SOC factors could be derived, was developed using principal component analysis (PCA) to identify which important physical, chemical or geographical features may be used to categorise soil series into groups (clusters). The development of tier 2 factors incorporated a rule-based approach to derive soil clusters and land use classification within a geospatial framework. While it is not possible to directly compare reference SOC values for tier 1 and 2 approaches due to the soil classification systems used, the comparison of land use and management factors on SOC reference values highlighted some contrasting trends between these approaches. However, these differences are very small and within the bounds of statistical uncertainty. The overall uncertainty of the developed tier 2 models is 47%, which is nearly half of the uncertainty estimate cited for tier 1 methods, and the adoption of tier 2 methods could improve the LULUCF inventory. However, the introduction of new tier 2 models for reporting SOC stock changes could have a large impact on overall emission/removal trends. Finally, the ECOSSE biogeochemical model was used to simulate SOC in Irish agricultural systems to gain a better process-based understanding of the main factors influencing soil carbon stock changes at the point/site scale and country scale under mineral soils with different land use/management practices. When sourcing the main model inputs, particularly data for the soil characteristics, the lack of data on repeated measurements of SOC over time for Irish sites became apparent.
In response, SOC data from the NSDB, Teagasc-SIS and LUCAS databases were used, where data were carefully matched to identify 83 grassland sites that could be modelled. These grasslands were then grouped into management categories defined by stocking rates, to assess the impacts of management intensity on SOC. The model outputs indicated an overestimation of SOC, highlighted the sensitivity of the model to the initial SOC inputs and demonstrated the need for replicated measurements of SOC over time to improve model evaluation and eventual parameterization.

... RStudio is software related to statistical computing and data processing (Chambers, 2008). RStudio is an integrated development environment (IDE) for the R software, which is a programming language for statistics and graphics. ...

  • Annisa Alma Yunia
  • Dianne Amor Kusuma
  • Bambang Suhandi
  • Budi Nurani Ruchjana

Indonesia is a tropical country with two seasons, rainy and dry. The earth is currently experiencing climate change, which causes erratic rainfall. Rainfall is influenced by several factors, one of which is the local-scale factor. This research aimed to build a rainfall model for Sulawesi in order to examine how rainfall relates to local-scale factors there. The data used were secondary data consisting of 15 samples with 6 variables from Badan Pusat Statistik (BPS); the sample size was limited by the secondary data available in the field. The data were processed using Principal Component Regression analysis. The first step was to reduce the local-scale factor variables to principal component variables that explain the variability of the original data; these variables were then analyzed using principal component regression. The data were analyzed using the RStudio software. The results show that two principal component variables explain 75.2% of the variability of the original data and that only one principal component variable was significant for the rainfall variable. The regression model indicated that the relationships between rainfall and humidity, air temperature, air pressure, and solar radiation were in the same direction, while the relationship between rainfall and wind velocity was in the opposite direction. Overall, the results of the study provide an overview of the application of Principal Component Regression analysis to model rainfall in the Sulawesi region using the R program.

... To evaluate the spatial component, the geographic locations (x,y) of the sampling sites were transformed to Cartesian coordinates using the SoDA package (72), and the Euclidean distance was calculated using vegan. Distance-decay curves were produced using linear regressions of the Euclidean distance of the geographic locations against the Bray-Curtis dissimilarity distance and the Euclidean distance of scaled environmental variables. ...

Bacterial community composition is largely influenced by environmental factors, and this applies to the Arctic region. However, little is known about the role of spatial factors in structuring such communities. In this study, we evaluated the influence of spatial scale on bacterial community structure across an Arctic landscape. Our results showed that spatial factors accounted for approximately 10% of the variation at the landscape scale, equivalent to observations across the whole Arctic region, suggesting that while the role and magnitude of other processes involved in community structure may vary, the role of dispersal may be stable globally in the region. We assessed dispersal limitation by identifying the spatial autocorrelation distance, standing at approximately 60 m, which would be required in order to obtain fully independent samples and may inform future sampling strategies in the region. Finally, indicator taxa with strong statistical correlations with environment variables were identified. However, we showed that these strong taxa-environment associations may not always be reflected in the geographical distribution of these taxa. IMPORTANCE The significance of this study is threefold. It investigated the influence of spatial scale on the soil bacterial community composition across a typical Arctic landscape and demonstrated that conclusions reached when examining the influence of specific environmental variables on bacterial community composition are dependent upon the spatial scales over which they are investigated. This study identified a dispersal limitation (spatial autocorrelation) distance of approximately 60 m, required to obtain samples with fully independent bacterial communities, and therefore, should serve to inform future sampling strategies in the region and potentially elsewhere. The work also showed that strong taxa-environment statistical associations may not be reflected in the observed landscape distribution of the indicator taxa.

... Individual parameters of the photosynthetic models were estimated as follows: apparent maximum quantum yield (α), light compensation point (Icomp) and the light-saturated rate of net photosynthesis (Asat) were modelled using the function nls2 (https://cran.r-project.org/web/packages/nls2/) in the R statistical software version 3.6.1 [27]. This function determines the non-linear (weighted) least-squares estimates of the parameters of a non-linear model [31]. ...
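
A hedged sketch of such a fit using stats::nls (the authors used nls2, which extends nls with a brute-force starting-value search). The simulated data, parameter values, and starting values below are invented, and real fits typically need their starting values tuned:

```r
## Simulated light-response data (I in µmol m-2 s-1); parameters are invented
set.seed(1)
I <- rep(c(0, 25, 50, 100, 200, 400, 800, 1200, 1600, 2000), each = 3)
A <- (0.06 * I + 25 - sqrt((0.06 * I + 25)^2 - 4 * 0.8 * 0.06 * I * 25)) /
     (2 * 0.8) - 1 + rnorm(length(I), sd = 0.3)
dat <- data.frame(I, A)

## Non-rectangular hyperbola: alpha = apparent quantum yield, Asat = light-
## saturated net photosynthesis, theta = curvature, Rd = dark respiration
fit <- nls(A ~ (alpha * I + Asat -
                  sqrt((alpha * I + Asat)^2 - 4 * theta * alpha * I * Asat)) /
                 (2 * theta) - Rd,
           data  = dat,
           start = list(alpha = 0.05, Asat = 20, theta = 0.7, Rd = 0.5))
coef(fit)   # estimated photosynthetic parameters
```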

The study estimates the parameters of the photosynthesis–irradiance relationship (PN/I) of a sedge-grass marsh (Czech Republic, Europe), represented as an active "green" surface—a hypothetical "big-leaf". Photosynthetic parameters of the "big-leaf" are based on in situ measurements of the leaf PN/I curves of the dominant plant species. The non-rectangular hyperbola was selected as the best model for fitting the PN/I relationships. The plant species had different parameters of this relationship. The highest light-saturated rate of photosynthesis (Asat) was recorded for Glyceria maxima and Acorus calamus followed by Carex acuta and Phalaris arundinacea. The lowest Asat was recorded for Calamagrostis canescens. The parameters of the PN/I relationship were calculated also for different growth periods. The highest Asat was calculated for the spring period followed by the summer and autumn periods. The effect of the species composition of the local plant community on the photosynthetic parameters of the "big-leaf" was addressed by introducing both real (recorded) and hypothetical species compositions corresponding to "wet" and "dry" hydrological conditions. We can conclude that the species composition (or diversity) is essential for reaching a high Asat of the "big-leaf" representing the sedge-grass marsh in different growth periods.

... For their positions, k knots are placed to the right of max(t) and k knots to the left of min(t) (Kochevar, 1974). In this way, one obtains #t + k B-spline functions in the basis (Chambers, 2008). ...

  • Khader Khadraoui

We study Bayesian regression under smoothness and shape constraints. To this end, a B-spline basis is used to obtain a smooth curve, and we show that the shape of a spline generated by a B-spline basis is controlled by a set of control points that do not lie on the spline curve itself. Several types of shape constraints are proposed (monotonicity, unimodality, convexity, etc.). These constraints are taken into account through the prior distribution. Bayesian inference yields the posterior distribution in explicit form up to a normalizing constant. Using a hybrid Metropolis-Hastings algorithm with a Gibbs step, we propose simulations from the truncated posterior distribution. The regression function is estimated by the posterior mode, which is computed with a simulated-annealing-type algorithm. Convergence of the simulation algorithms and of the estimator computation is proved. In particular, when the B-spline knots are variable, the Bayesian analysis of constrained regression becomes complex. We propose original simulation schemes for sampling from the posterior when the truncated density of the regression coefficients has varying dimension.

... All statistical calculations were implemented using Mplus Version 7.3 and R [25,26]. The weighted least square mean and variance adjusted (WLSMV) estimator was applied. ...

The Depression Anxiety and Stress Scales-21 (DASS-21) involves a simple structure first-order three-factor oblique model, with factors for depression, anxiety, and stress. Recently, concerns have been raised over the value of using confirmatory factor analysis (CFA) for studying the factor structure of scales in general. However, such concerns can be circumvented using exploratory structural equation modeling (ESEM). Consequently, the present study used CFA and ESEM with target rotation to examine the factor structure of the DASS-21 among an adult community. It compared first-order CFA, ESEM with target rotation, bi-factor CFA (BCFA), and bi-factor BESEM with target rotation models with group/specific factors for depression, anxiety, and stress. A total of 738 adults (males = 374, and females = 364; M = 25.29 years; SD = 7.61 years) completed the DASS-21. While all models examined showed good global fit values, one or more of the group/specific factors in the BCFA, ESEM with target rotation and BESEM with target rotation models were poorly defined. As the first-order CFA model was most parsimonious, with well-defined factors that were also supported in terms of their reliabilities and validities, this model was selected as the preferred DASS-21 model. The implications of the findings for use and revision of the DASS-21 are discussed.

... The power analysis was performed using the power.anova.test function in the R statistical analysis software [38] according to Cohen [39]. The power analysis showed that the statistical power for comparing the IL-1β response between the three experimental groups was 59.4% at the p=0.05 level and 71.5% at the p=0.10 level. ...
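
For illustration, the stats::power.anova.test call pattern for a three-group comparison; the group means, variance, and sample size below are invented, not the study's values:

```r
## Power of a one-way ANOVA with three groups (all numbers are illustrative)
power.anova.test(groups      = 3,
                 n           = 9,                  # participants per group
                 between.var = var(c(60, 75, 90)), # variance of the group means
                 within.var  = 500,
                 sig.level   = 0.05)

## Specifying 'power' instead of 'n' solves for the required group size
power.anova.test(groups = 3, between.var = var(c(60, 75, 90)),
                 within.var = 500, power = 0.80)
```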

  • Adrienne M. Kania
  • Kailee N. Weiler
  • Angeline P. Kurian
  • Harald M Stauss

Context The parasympathetic-mediated inflammatory reflex inhibits excessive proinflammatory cytokine production. Noninvasive techniques, including occipitoatlantal decompression (OA-D) and transcutaneous auricular vagus nerve stimulation (taVNS), have been demonstrated to increase parasympathetic tone. Objectives To test the hypothesis that OA-D and taVNS increase parasympathetic nervous system activity and inhibit proinflammatory cytokine mobilization and/or production. Methods Healthy adult participants were randomized to receive OA-D (5 min of OA-D followed by 10 min of rest; n=8), taVNS (15 min; n=9), or no intervention (15 min, time control; n=10) on three consecutive days. Before and after these interventions, saliva samples were collected for determination of the cytokines interleukin-1β (IL-1β), interleukin-6 (IL-6), interleukin-8 (IL-8), and tumor necrosis factor α (TNF-α). Arterial blood pressure and the electrocardiogram were recorded for a 30-min baseline, throughout the intervention, and during a 30-min recovery period to derive heart rate and blood pressure variability markers as indices of vagal and sympathetic control. Results OA-D and taVNS increased root mean square of successive RR interval differences (RMSSD) and high frequency heart rate variability, which are established markers for parasympathetic modulation of cardiac function. In all three groups, the experimental protocol was associated with a significant increase in salivary cytokine concentrations. However, the increase in IL-1β was significantly less in the taVNS group (+66 ± 13 pg/mL; p<0.05) than in the time control group (+142 ± 24 pg/mL). A similar trend was observed in the taVNS group for TNF-α (+1.7 ± 0.3 pg/mL vs. 4.1 ± 1.3 pg/mL; p<0.10). In the OA-D group baseline IL-6, IL-8, and TNF-α levels on the third study day were significantly lower than on the first study day (IL-6: 2.3 ± 0.4 vs. 3.2 ± 0.6 pg/mL, p<0.05; IL-8: 190 ± 61 vs. 483 ± 125 pg/mL, p<0.05; TNF-α: 1.2 ± 0.3 vs. 2.3 ± 0.4 pg/mL, p<0.05). OA-D decreased mean blood pressure from the first (100 ± 8 mmHg) to the second (92 ± 6 mmHg; p<0.05) and third (93 ± 8 mmHg; p<0.05) study days and reduced low frequency spectral power of systolic blood pressure variability (19 ± 3 mmHg² after OA-D vs. 28 ± 5 mmHg² before OA-D; p<0.05), a marker of sympathetic modulation of vascular tone. OA-D also increased baroreceptor-heart rate reflex sensitivity from the first (13.7 ± 3.0 ms/mmHg) to the second (18.4 ± 4.3 ms/mmHg; p<0.05) and third (16.9 ± 4.2 ms/mmHg; p<0.05) study days. Conclusions Both OA-D and taVNS elicited antiinflammatory responses that were associated with increases in heart rate variability-derived markers for parasympathetic function. These findings suggest that OA-D and taVNS activate the parasympathetic antiinflammatory reflex. Furthermore, an antihypertensive effect was observed with OA-D that may be mediated by reduced sympathetic modulation of vascular tone and/or increased baroreceptor reflex sensitivity.

... Comparisons of data obtained before (baseline) and after (recovery) the interventions were done by the nonparametric Wilcoxon test for repeated measures. For these comparisons, we also computed a retrospective power analysis using the R statistical analysis software [29] according to Cohen [30]. Statistical significance is assumed at p<0.05 and trends are described at p<0.10. Figure 2: Extraction of RR-intervals and PQ-intervals using the Analyzer module of the freely available HemoLab software [27]. ...
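
A minimal sketch of such a paired comparison with stats::wilcox.test; the baseline and recovery values below are invented for the example:

```r
## Paired (repeated-measures) Wilcoxon signed-rank test on invented values
baseline <- c(49.8, 24.7, 28.5, 31.2, 40.1, 22.9, 35.4, 27.8)
recovery <- c(54.6, 28.8, 31.4, 33.0, 44.8, 23.5, 38.9, 30.2)

wilcox.test(recovery, baseline, paired = TRUE)
```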

  • Ariana S. Dalgleish
  • Adrianna Z. Jelen
  • Adrienne M. Kania
  • Harald M Stauss

Context Management of atrial fibrillation includes either rhythm control that aims at establishing a sinus rhythm or rate control that aims at lowering the ventricular rate, usually with atrioventricular nodal blocking agents. Another potential strategy for ventricular rate control is to induce a negative dromotropic effect by augmenting cardiac vagal activity, which might be possible through noninvasive and nonpharmacologic techniques. Thus, the hypothesis of this study was that occipitoatlantal decompression (OA-D) and transcutaneous auricular vagus nerve stimulation (taVNS) not only increase cardiac parasympathetic tone as assessed by heart rate variability (HRV), but also slow atrioventricular conduction, assessed by the PQ-interval of the electrocardiogram (EKG) in generally healthy study participants without atrial fibrillation. Objectives To test whether OA-D and/or transcutaneous taVNS, which have been demonstrated to increase cardiac parasympathetic nervous system activity, would also elicit a negative dromotropic effect and prolong atrioventricular conduction. Methods EKGs were recorded in 28 healthy volunteers on three consecutive days during a 30 min baseline recording, a 15 min intervention, and a 30 min recovery period. Participants were randomly assigned to one of three experimental groups that differed in the 15 min intervention. The first group received OA-D for 5 min, followed by 10 min of rest. The second group received 15 min of taVNS. The intervention in the third group that served as a time control group (CTR) consisted of 15 min of rest. The RR- and PQ-intervals were extracted from the EKGs and then used to assess HRV and AV-conduction, respectively. Results The OA-D group had nine participants (32.1%), the taVNS group had 10 participants (35.7%), and the CTR group had nine participants (32.1%). The root mean square of successive differences between normal heartbeats (RMSSD), an HRV measure of cardiac parasympathetic modulation, tended to be higher during the recovery period than during the baseline recording in the OA-D group (mean ± standard error of the mean [SEM], 54.6 ± 15.5 vs. 49.8 ± 15.8 ms; p<0.10) and increased significantly in the taVNS group (mean ± SEM, 28.8 ± 5.7 vs. 24.7 ± 4.8 ms; p<0.05), but not in the control group (mean ± SEM, 31.4 ± 4.2 vs. 28.5 ± 3.8 ms; p=0.31). This increase in RMSSD was accompanied by a lengthening of the PQ-interval in the OA-D (mean ± SEM, 170.5 ± 9.6 vs. 166.8 ± 9.7 ms; p<0.05) and taVNS (mean ± SEM, 166.6 ± 6.0 vs. 162.1 ± 5.6 ms; p<0.05) groups, but not in the control group (mean ± SEM, 164.3 ± 9.2 vs. 163.1 ± 9.1 ms; p=0.31). The PQ-intervals during the baseline recordings did not differ on the three study days in any of the three groups, suggesting that the negative dromotropic effect of OA-D and taVNS did not last into the following day. Conclusions The lengthening of the PQ-interval in the OA-D and taVNS groups was accompanied by an increase in RMSSD. This implies that the negative dromotropic effects of OA-D and taVNS are mediated through an increase in cardiac parasympathetic tone. Whether these findings suggest their utility in controlling ventricular rates during persistent atrial fibrillation remains to be determined.

... To create spatial predictors, the geographic coordinates of the sampling sites were transformed to geodetic Cartesian (x, y) coordinates using the SoDA package in R (Chambers, 2008), and the Euclidean distances among the sites were calculated using vegan. These were used for distance-decay curves and Mantel tests as they provide a tangible measure of spatial distance. ...
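
A hedged sketch of this kind of distance-decay computation, assuming SoDA::geoXY's default reference point and using invented coordinates and community counts:

```r
library(SoDA)     # companion package to Chambers (2008)
library(vegan)

## Invented sampling-site coordinates (decimal degrees)
lat <- c(78.92, 78.93, 78.95, 78.91)
lon <- c(11.85, 11.88, 11.90, 11.83)

xy       <- geoXY(lat, lon)   # planar (x, y) coordinates, by default in metres
geo.dist <- dist(xy)          # Euclidean distances between sites

## Invented community table (rows = sites, columns = taxa)
set.seed(7)
comm <- matrix(rpois(4 * 10, lambda = 5), nrow = 4)
bray <- vegdist(comm, method = "bray")   # Bray-Curtis dissimilarity

## Distance-decay: dissimilarity regressed on geographic distance
summary(lm(as.vector(bray) ~ as.vector(geo.dist)))
```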

Although widely used in ecology, comparative analyses of diversity and niche properties are still lacking for microorganisms, especially concerning niche variations. In this study, we identified important topoclimatic, edaphic, spatial and biotic drivers of the alpha and beta diversity of bacterial, archaeal, fungal and protist communities. Then, we calculated the niche breadth and position of each taxon along environmental gradients within all taxonomic groups, to determine how these vary within and between groups. Quantifying the niches of microbial taxa is necessary to then forecast how taxa and the communities they compose might respond to environmental changes. We found that edaphic properties were the most important drivers of both community diversity and composition for all microbial groups. Protists presented the largest niche breadths, followed by bacteria and archaea, with fungi displaying the smallest. Niche breadth generally decreased towards environmental extremes, especially along edaphic gradients, suggesting increased specialisation of microbial taxa in highly selective environments. Overall, we showed that microorganisms have well defined niches, as do macro-organisms, likely driving part of the observed spatial patterns of community variations. Assessing niche variation more widely in microbial ecology should open new perspectives, especially to tackle global change effects on microbes.

... For the analysis of subsequent development of venous thromboembolism, pulmonary thromboembolism, ischemic cerebral infarction, and myocardial infarction, presented as 100-day incidence, we included patients who survived at least 2 weeks postadmission and whose admission date was at least 60 days before the date of the analysis (January 24, 2021). For this Student t test analysis, we combined ondansetron groups 1 and 2. All analyses were conducted with R [27] and Excel (Microsoft) [28]. ...
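
For illustration, a hedged sketch of a Cox proportional hazards fit with the survival package; the tiny data set, variable names, and exposure coding below are invented, not the study's data:

```r
library(survival)

## Invented toy data: follow-up time (days), death indicator, exposure, age
d <- data.frame(
  time        = c(30, 12, 30, 7, 30, 22, 30, 15, 30, 9),
  died        = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  ondansetron = c(1, 0, 1, 0, 1, 0, 1, 1, 0, 0),
  age         = c(64, 71, 58, 80, 67, 75, 59, 62, 70, 78)
)

## Cox proportional hazards model for 30-day mortality, adjusted for age
fit <- coxph(Surv(time, died) ~ ondansetron + age, data = d)
summary(fit)   # the exp(coef) column gives adjusted hazard ratios
```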

Background: The coronavirus disease 2019 (COVID-19) pandemic has led to a surge in clinical trials evaluating investigational and approved drugs. Retrospective analysis of drugs taken by COVID-19 inpatients provides key information on drugs associated with better or worse outcomes. Methods: We conducted a retrospective cohort study of 10 741 patients testing positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection within 3 days of admission to compare risk of 30-day all-cause mortality in patients receiving ondansetron using multivariate Cox proportional hazard models. All-cause mortality, length of hospital stay, adverse events such as ischemic cerebral infarction, and subsequent positive COVID-19 tests were measured. Results: Administration of ≥8 mg of ondansetron within 48 hours of admission was correlated with an adjusted hazard ratio for 30-day all-cause mortality of 0.55 (95% CI, 0.42-0.70; P < .001) and 0.52 (95% CI, 0.31-0.87; P = .012) for all and intensive care unit-admitted patients, respectively. Decreased lengths of stay (9.2 vs 11.6; P < .001), frequencies of subsequent positive SARS-CoV-2 tests (53.6% vs 75.0%; P = .01), and long-term risks of ischemic cerebral infarction (3.2% vs 6.1%; P < .001) were also noted. Conclusions: If confirmed by prospective clinical trials, our results suggest that ondansetron, a safe, widely available drug, could be used to decrease morbidity and mortality in at-risk populations.

... The higher the R-squared, the better the model fit. Analyses were done in R (4.0.3), a statistical programming language (Chambers, 2008). The caret package was used for the machine learning analyses, including cross-validation and hyperparameter tuning (Kuhn, 2012). ...
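
A hedged sketch of caret's cross-validation and tuning workflow; the simulated data, the glmnet method choice, and the tuning settings below are illustrative assumptions, not the mapping function actually fitted in the study:

```r
library(caret)   # also requires the glmnet package for this method

## Invented data standing in for WHODAS items and a disability-weight outcome
set.seed(42)
n  <- 200
df <- data.frame(matrix(rnorm(n * 6), ncol = 6))
names(df) <- paste0("item", 1:6)
df$dw <- 0.3 * df$item1 - 0.2 * df$item4 + rnorm(n, sd = 0.1)

ctrl <- trainControl(method = "cv", number = 5)    # 5-fold cross-validation

## Penalized (elastic-net) regression; caret tunes alpha and lambda over a grid
fit <- train(dw ~ ., data = df, method = "glmnet",
             trControl = ctrl, tuneLength = 5)

fit$bestTune   # selected hyperparameters
fit$results    # cross-validated RMSE and R-squared for each candidate
```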

Objectives To develop and test an internationally applicable mapping function for converting WHODAS-2.0 scores to disability weights, thereby enabling WHODAS-2.0 to be used in cost-utility analyses and sectoral decision-making. Methods Data from 14 countries were used from the WHO Multi-Country Survey Study on Health and Responsiveness, administered among nationally representative samples of respondents aged 18+ years who were non-institutionalized and living in private households. For the combined total of 92,006 respondents, available WHODAS-2.0 items (for both 36-item and 12-item versions) were mapped onto disability weight estimates using a machine learning approach, whereby data were split into separate training and test sets; cross-validation was used to compare the performance of different regression and penalized regression models. Sensitivity analyses considered different imputation strategies and compared overall model performance with that of country-specific models. Results Mapping functions converted WHODAS-2.0 scores into disability weights; R-squared values of 0.700–0.754 were obtained for the test data set. Penalized regression models reached comparable performance to standard regression models but with fewer predictors. Imputation had little impact on model performance. Model performance of the generic model on country-specific test sets was comparable to model performance of country-specific models. Conclusions Disability weights can be generated with good accuracy using WHODAS 2.0 scores, including in national settings where health state valuations are not directly available, which signifies the utility of WHODAS as an outcome measure in evaluative studies that express intervention benefits in terms of QALYs gained.

... Prior to performing the analysis of variance (ANOVA), the data obtained were subjected to homogeneity and normality tests using the R software (Chambers, 2008). ...
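
For illustration, the usual normality and homogeneity checks before an ANOVA, shown on R's built-in PlantGrowth data rather than the radish data:

```r
## Normality and homogeneity-of-variance checks before a one-way ANOVA,
## shown on R's built-in PlantGrowth data rather than the radish data
data(PlantGrowth)

shapiro.test(residuals(aov(weight ~ group, data = PlantGrowth)))  # normality
bartlett.test(weight ~ group, data = PlantGrowth)                 # homogeneity

## If both assumptions are acceptable, proceed with the ANOVA itself
summary(aov(weight ~ group, data = PlantGrowth))
```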

  • Antônio Ramos Cavalcante
  • W B Lima
  • Laysa Gabryella de Souza Laurentino
  • G T M Kubo

The objective of this study was to evaluate the effect of increasing doses of poultry litter biochar, incubated in the soil for different periods, on some attributes of the radish plant. For this, an experiment was carried out in an agricultural greenhouse at the Federal University of Campina Grande (UFCG), where the treatments were combinations of two factors: 4 incubation times (0, 30, 60 and 90 days) and 4 doses of biochar (50, 100, 150 and 200 grams per plant, corresponding to 0, 12.5, 25.00, 37.50 and 50.00 t ha-1, respectively), with three replications, totaling 48 experimental plots in a completely randomized design. After the incubation period, radish Raphanus sativus L. (cv. Apolo) was sown and cultivated for up to 30 days. The agronomic development of the radish was evaluated with respect to the following variables: number of leaves; root length (cm); and fresh and dry phytomass of the leaves and roots (g). For the experimental conditions, it was concluded that the radish crop responded positively to fertilization with poultry litter biochar and to its incubation time in the soil. The dose range of 100 to 150 g/plant was the most adequate and the one that presented the best results for the evaluated radish attributes. The appropriate biochar incubation time for radish would be 30 to 60 days.

... We used R version 4.0.3 statistical software for analyses [39]. ...

  • Getiye Dejenu Kibret
  • Daniel Demant
  • Andrew Hayen

Background: Ethiopia is a Sub-Saharan country with very high neonatal mortality rates, varying across its regions. The rate of neonatal mortality reduction in Ethiopia is slow, and Ethiopia may not meet the third United Nations Sustainable Development Goal by 2030. This study aimed to investigate the spatial variations in, and contributing factors to, neonatal mortality rates in Ethiopia. Methods: We analysed data from the 2016 Ethiopian Demographic and Health Survey, which used a two-stage cluster sampling technique with census enumeration areas as primary and households as secondary sampling units. A hierarchical Bayesian logistic regression model was fitted accounting for socio-economic, health service-related and geographic factors. Results: Higher neonatal mortality rates were observed in eastern, northeastern and southeastern Ethiopia, and the Somali region had a higher risk of neonatal mortality. Neonates from dry and drought-affected areas had a higher risk of mortality compared to those from more humid and less drought-affected areas. Application of traditional substances to the cord increased the risk of neonatal mortality (Adjusted Odds Ratio (AOR) = 2.05, 95% Confidence Interval (CI): 1.10 to 4.26), and health provider counselling on newborn danger signs in the first two days after birth was associated with lower odds of neonatal mortality (AOR = 0.34, 95% CI: 0.13 to 0.75). Conclusions: Applying traditional substances to the umbilical cord, lack of counselling about neonatal danger signs within the first two days of birth, and residing in dry and drought-affected areas were associated with a higher risk of neonatal mortality. Policy-makers and resource administrators at different administrative levels could leverage the findings to prioritise and target areas identified with higher neonatal mortality rates.

... We developed the application in R (R Development Core Team, 2016), and R-Shiny (Chang et al., 2015). R is a computer programming language for statistical computing, and therefore ideal for data handling and advanced statistical analysis (Chambers, 2008). R-Shiny is a Web framework for the online implementation of programs written in the R language (Chang et al., 2015). ...
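
To illustrate the structure of an R-Shiny application in general terms (this is not the authors' application), a minimal sketch pairing a ui definition with a server function:

library(shiny)
ui <- fluidPage(
  sliderInput("n", "Sample size", min = 10, max = 500, value = 100),
  plotOutput("hist")
)
server <- function(input, output) {
  output$hist <- renderPlot(hist(rnorm(input$n), main = "Random sample"))
}
shinyApp(ui = ui, server = server)   # serves the app in a web browser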

... Risk factors such as sex and breed of calves, syndrome and diagnosis made were recorded into a Microsoft Excel® spreadsheet. R software version 4.0.3 was used to analyze the data (Chambers, 2008). A chi-square test was employed to evaluate the existence of an association between risk factors and GIT problems of calves. ...
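
A chi-square test of association in R takes a contingency table; a minimal sketch with hypothetical counts (not the study's data):

tab <- matrix(c(240, 209, 11, 8), nrow = 2,
              dimnames = list(sex = c("male", "female"),
                              problem = c("gastroenteritis", "GIT parasitosis")))
chisq.test(tab)   # tests association between sex and type of GIT problem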

Calf diarrhea has been regarded as the most common cause of neonatal morbidity and mortality. A case study was conducted from September 2018 to August 2019 as health monitoring and routine follow-up of crossbreed calves on the HARC dairy farm, with the objective of determining gastrointestinal health problems of calves on the farm. A total of 469 clinical cases of crossbreed calves were diagnosed with gastrointestinal health problems. Among these, 95.75% (449) of the cases were diagnosed as gastroenteritis of crossbreed calves, whereas 4.05% (19) of the calves suffered from GIT parasitosis. The analysis indicated that diarrhea was the major syndrome of gastroenteritis in calves. The syndrome of diarrhea showed a statistically significant association (p-value = 0.00*) with gastroenteritis. Sex-wise evaluation of gastroenteritis and GIT parasitosis showed no statistically significant difference in cases between male and female calves (P = 0.0116). Besides this, the majority of the male and female calves were diagnosed with and experienced cases of gastroenteritis. Generally, gastrointestinal health problems are the major contributing factor to calf morbidity and mortality on the HARC dairy farm. Therefore, appropriate prevention and control measures for gastrointestinal health problems should be instituted and adopted against diseases of calves, to enhance replacement females and economic gain from the dairy sector.

... Statistical analyses of pixel level extraction were done with the base R [83] as well as: boot [84], dlm [85], devtools [86], ggfortify [87], ggplot2 [88,89], gridExtra [90], foreign [91], knitr [92], leafletR [93], lubridate [94], PerformanceAnalytics [95], plotly [96], raster [97], rgdal [98], rmarkdown [99], sp [100], and truncreg [98]. The complete linux package dependency list along with Rmd notebooks are hosted on Github and CyVerse. ...

In this work we explore three methods for quantifying ecosystem vegetation responses spatially and temporally using Google's Earth Engine, implementing an Ecosystem Moisture Stress Index (EMSI) to monitor vegetation health in agricultural, pastoral, and natural landscapes across the entire era of spaceborne remote sensing. EMSI is the multitemporal standard (z) score of the Normalized Difference Vegetation Index (NDVI), given as I, for a pixel (x, y) at observation time t. The EMSI is calculated as z_xyt = (I_xyt − μ_xyT)/σ_xyT, where the mean (μ_xyT) over the same date or range of days in a reference time series of length T (in years) is subtracted from the index value on the observational date (I_xyt), and the difference is divided by the standard deviation (σ_xyT) over the same day or range of dates in the reference time series. EMSI exhibits high significance (z > |2.0 ± 1.98σ|) across all geographic locations and time periods examined. Our results provide an expanded basis for detection and monitoring of: (i) ecosystem phenology and health; (ii) wildfire potential or burn severity; (iii) herbivory; (iv) changes in ecosystem resilience; and (v) changes in and intensity of land use practices. We provide the code and analysis tools as a research object, in line with the findable, accessible, interoperable, reusable (FAIR) data principles.
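
The z-score above is straightforward to compute in R once the reference series for a pixel and date has been extracted; a minimal sketch with hypothetical NDVI values:

ndvi_ref <- c(0.61, 0.58, 0.64, 0.60, 0.57, 0.63, 0.59)   # same date across T = 7 reference years
ndvi_obs <- 0.42                                          # NDVI on the observation date
(ndvi_obs - mean(ndvi_ref)) / sd(ndvi_ref)                # EMSI: (I_xyt - mu_xyT) / sigma_xyT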

... As a first analysis, descriptive statistics are given in Tab. 5 below. The model parameters are estimated via the maximum likelihood method (with the so-called BFGS algorithm), and the R software [30] is used for all the computations. The MLEs and the corresponding standard errors (SEs) for all the model parameters are given in Tab. ...
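
Maximum likelihood estimation with BFGS and standard errors from the numerical Hessian can be done with optim(); a minimal sketch for a simple normal model on simulated data (the paper's model is different):

set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)
negloglik <- function(par) -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
fit <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)
se <- sqrt(diag(solve(fit$hessian)))   # SEs from the inverse observed information
cbind(estimate = fit$par, se = se)     # estimates are (mean, log sd)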

The purpose of this research is the segmentation of lung computed tomography (CT) scans for the diagnosis of COVID-19 using machine learning methods. Our dataset contains data from patients who are prone to the epidemic. It contains three types of lung CT images (Normal, Pneumonia, and COVID-19) collected from two sources: the first is the Radiology Department of Nishtar Hospital Multan and Civil Hospital Bahawalpur, Pakistan, and the second is a publicly available medical imaging database known as Radiopaedia. For preprocessing, a novel fuzzy c-means automated region-growing segmentation approach is deployed to extract automated regions of interest (ROIs) and acquire 52 hybrid statistical features for each ROI. Then, 12 optimized statistical features are selected via the chi-square feature reduction technique. For classification, five machine learning classifiers, namely deep learning J4, multilayer perceptron, support vector machine, random forest, and naive Bayes, are applied to the hybrid statistical feature dataset. It is observed that deep learning J4 gives promising results (sensitivity and specificity: 0.987; accuracy: 98.67%) among all the deployed classifiers. As a complementary study, a statistical analysis is devoted to the use of a new statistical model to fit the main COVID-19 datasets collected in Pakistan.

... Two key concepts in R are that everything that exists is an object and that everything that happens is a function call. 79 Objects are data structures that have specific properties (attributes) and methods that act on properties (eg, to print, modify, or perform calculations on the object attributes). ...
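
Both points can be seen directly at the console; for example:

`+`(1, 2)                  # the operator + is itself a function: same as 1 + 2
x <- 1:3                   # assignment is a call to `<-`
attr(x, "units") <- "cm"   # objects carry attributes
class(x)                   # the class determines which methods apply
print(x)                   # print() dispatches on the class of x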

  • Jorge L Sepulveda

Bioinformatics pipelines are essential in the analysis of genomic and transcriptomic data generated by next-generation sequencing (NGS). Recent guidelines emphasize the need for rigorous validation and assessment of robustness, reproducibility, and quality of NGS analytic pipelines intended for clinical use. Software tools written in the R statistical language, and in particular the set of tools available in the Bioconductor repository, are widely used in research bioinformatics, and these frameworks offer several advantages for clinical bioinformatics, including the breadth of available tools, the modular nature of software "packages", ease of installation, enforcement of interoperability, version control, and a short learning curve. This review provides an introduction to R and Bioconductor software, its advantages and limitations for clinical bioinformatics, and illustrative examples of tools that can be used in various steps of next-generation sequencing analysis.
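
As a small illustration of the Bioconductor style referred to above (installation via BiocManager and one core data structure; the package chosen here is just a common example, not one named by the review):

# install.packages("BiocManager"); BiocManager::install("GenomicRanges")
library(GenomicRanges)
gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(100, 500), width = 50),
              strand = c("+", "-"))
gr   # genomic intervals with seqnames, ranges and strand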

  • Astrid Jourdan

R is a programming language oriented toward data analysis, data mining and statistics. The language is based on the notion of a vector, which simplifies mathematical calculations and considerably reduces the use of iterative structures. RStudio is an integrated development environment (IDE) specifically created to work with R. This chapter discusses the use of R for multivariate analysis, focusing on principal component analysis, multiple correspondence analysis, and clustering. The multiple correspondence analysis is run with the MCA function and the results are displayed with the explor package. Before running clustering, the quantitative variables should be scaled and the categorical variable CLIMAT and the Error point should be removed. The result of the scale function is a matrix, so it is necessary to transform it into a data frame.
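
A minimal sketch of the scaling step described above, using the built-in iris data in place of the chapter's dataset:

num <- iris[, 1:4]                   # quantitative variables only (the factor Species is removed)
scl <- as.data.frame(scale(num))     # scale() returns a matrix; coerce it to a data frame
pca <- prcomp(scl)                   # principal component analysis on the scaled data
hc  <- hclust(dist(scl), method = "ward.D2")   # hierarchical clustering on the same data
summary(pca)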

  • Brady D. Lund

This article discusses the use of the R programming language for executing a sentiment analysis of tweets pertaining to library topics. This discussion is situated within the literature of the marketing and management sciences, which employ methods of machine learning and business intelligence to support informed decision-making, and of library administration, which has expressed great interest in social media engagement within its literature but has yet to adopt these types of analysis. Presented in this article is a sample code with instructions on how users may execute it within R to retrieve and analyze tweets relevant to library services. Two examples created using the code (analysis of top librarians' tweets and analysis of posts about major book publishers) are used to demonstrate the functionality of the code. The code presented in this article may be used by libraries to analyze tweets about their library and library-related topics, which, in turn, may inform management and marketing design.
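
The general shape of such an analysis is sketched below; the package choices are illustrative assumptions rather than the article's own code, and search_tweets requires valid Twitter/X API credentials:

library(rtweet)    # tweet collection (needs API access)
library(syuzhet)   # lexicon-based sentiment scoring
tweets <- search_tweets("library services", n = 200, lang = "en")
scores <- get_sentiment(tweets$text, method = "syuzhet")
summary(scores)    # distribution of sentiment scores
mean(scores > 0)   # proportion of positive posts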

  • Christine Ju
  • Joseph Bove
  • Steven Hochman

Introduction: In-service exam scores are used by residency programs as a marker of progress and of success on board exams. The conference curriculum helps residents prepare for these exams. At our institution, a change in curriculum was initiated in response to resident feedback. Our objective was to determine whether assigned Evidence-Based Medicine (EBM) articles and Rosh Review questions were non-inferior to Tintinalli textbook readings. We further hypothesized that the non-textbook curriculum would lead to higher resident satisfaction, greater utilization, and a preference over the old curriculum. Methods: We collected scores from both the allopathic In-Training Examination (ITE) and the osteopathic Emergency Medicine Residency In-Service Exam (RISE) taken by our program's residents during the 2015-2016 and 2016-2017 residency years. We compared scores pre-curriculum change (pre-CC) to scores post-curriculum change (post-CC). A five-question survey was sent to the residents regarding their satisfaction, preference, and utilization of the two curricula. Results: Resident scores post-CC were shown to be non-inferior to their scores pre-CC on both exams. There was also no significant difference when we compared scores from each class post-CC to the respective class year pre-CC for both exams. Our survey showed significantly more satisfaction, utilization, and preference for the new curriculum among residents. Conclusion: We found question-based learning and Evidence-Based Medicine articles non-inferior to textbook readings. This study provides evidence to support a move away from textbook readings without sacrificing scores on examinations.

The carbon cycle includes important fluxes of methane (CH4) and carbon dioxide (CO2) between the ecosystem and the atmosphere. The fluxes may acquire either positive (release) or negative (consumption) values. We calculated these fluxes based on short-campaign in situ chamber measurements from four ecosystems of South Vietnam: intact mountain rain forest, rice field, Melaleuca forest and mangroves (different sites with Avicennia or Rhizophora and a typhoon-disturbed gap). Soil measurements were supplemented by chamber measurements of gas fluxes from the tree stems. Measuring CH4 and CO2 together facilitates the assessment of the ratio between these two gases in connection with the current conditions and specificity of individual ecosystems. The highest fluxes of CH4 were recorded in the Melaleuca forest, being within the range from 356.7 to 784.2 mg CH4-C m−2 day−1, accompanied by higher fluxes of CH4 release from Melaleuca tree stems (8.0-262.1 mg CH4-C m−2 day−1). Significant negative soil fluxes of CH4 were recorded in the mountain rain forest, within the range from −0.3 to −0.8 mg CH4-C m−2 day−1. Fluxes of CO2 indicate prevailing aerobic activity in the soils of the ecosystems investigated. Quite a large variability of CO2 fluxes was recorded in the soil of the Avicennia mangroves. The in situ measurements of different ecosystems are fundamental for follow-up measurements at different levels, such as aerial and satellite gas flux observations.

  • Paul R. Rosenbaum

Simple calculations in the statistical language R illustrate the computations involved in one simple form of multivariate matching. The focus is on how matching is done, not on the many aspects of the design of an observational study. The process is made tangible by describing it in detail, step-by-step, closely inspecting intermediate results; however, essentially, three steps are illustrated: (1) creating a distance matrix, (2) adding a propensity score caliper to the distance matrix, and (3) finding an optimal match. In practice, matching involves bookkeeping and efficient use of computer memory that are best handled by dedicated software for matching. Sect. 14.10 describes currently available software.
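
A minimal sketch of the three steps with the optmatch package and simulated data (the package and settings are illustrative; Sect. 14.10 of the source discusses the available software):

library(optmatch)
set.seed(1)
dat <- data.frame(z = rbinom(100, 1, 0.4), x1 = rnorm(100), x2 = rnorm(100))
dmat <- match_on(z ~ x1 + x2, data = dat, method = "mahalanobis")  # (1) distance matrix
ps   <- glm(z ~ x1 + x2, family = binomial, data = dat)            # propensity score model
dcal <- dmat + caliper(match_on(ps, data = dat), width = 0.2)      # (2) add propensity caliper
pm   <- pairmatch(dcal, data = dat)                                # (3) optimal pair match
summary(pm)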

Time series analysis is one of the most used tools for forecasting based on past data. This work develops a scalable forecasting methodology that attempts to overcome the difficulties of traditional time series analysis, utilizing new computational tools and data structures that facilitate integration with business applications and reduce the learning curve needed to obtain accurate forecasts. The methodology consists of five phases: (1) importing data directly from the cloud or the user's device; (2) tidying and transforming; (3) visualization; (4) automatically modelling and validating the results; and (5) communicating the obtained forecasts with an automated report. The methodology was used in an applied case considering ten time series from real retail sales indexes in Colombia, showing appreciable improvements with an average decrease in the Mean Absolute Percentage Error (MAPE) of 50.56%. Link: http://ojs.uac.edu.co/index.php/prospectiva/article/view/2243/2261
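
A compact illustration of the "automatically model and validate" phase using the forecast package and a built-in series as a stand-in for the retail indexes (the paper's own toolchain may differ):

library(forecast)
train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1))
fit <- auto.arima(train)                 # automatic model selection
fc  <- forecast(fit, h = length(test))   # forecasts over the hold-out horizon
accuracy(fc, test)["Test set", "MAPE"]   # MAPE on the hold-out data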

  • A. I. Horev
  • S. V. Bukharin
  • V. N. Ponomareva

Among the existing characteristics of an enterprise's financial condition, a specific place is held by the three-component financial situation indicator (the degree to which inventories are covered by the enterprise's own sources), which takes into account balance-sheet items that are seldom used in conventional assessments of financial condition. To establish the importance of this indicator, it is proposed to pass from the discrete three-component indicator to a continuous one, the coefficient of sufficiency of inventory coverage by own sources of funds, and to carry out a correlation analysis of its relationship with the results of a scoring analysis of the same enterprises. To account for the influence of the three-component indicator on the overall assessment of financial condition, a technique is developed using capital-structure indicators as an example: the set of financial coefficients considered is expanded by introducing an additional attribute, the aforementioned coefficient of sufficiency of inventory coverage by own sources. Comparing estimates of the generalized capital-structure indicator before and after this expansion, in a fuzzy-set form, makes it possible to define the recommended scope of application of the proposed approach.

R has always provided an application programming interface (API) for extensions. Based on the C language, it uses a number of macros and other low-level constructs to exchange data structures between the R process and any dynamically loaded component modules authors add to it. With the introduction of the Rcpp package, and its later refinements, this process has become considerably easier yet also more robust. By now, Rcpp has become the most popular extension mechanism for R. This article introduces Rcpp, and illustrates with several examples how the Rcpp Attributes mechanism in particular eases the transition of objects between R and C++ code.
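
A minimal Rcpp Attributes example: a C++ function compiled and exported to R with the // [[Rcpp::export]] attribute (the function itself is just an illustration, not one from the article):

library(Rcpp)
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double sumSquares(NumericVector x) {
  double total = 0.0;
  for (int i = 0; i < x.size(); ++i) total += x[i] * x[i];
  return total;
}')
sumSquares(c(1, 2, 3))   # returns 14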

  • Badri Toppur
  • C. Swaminathan

Brakes India is a manufacturer of braking systems in India, with the largest market share. Axle Tech America (ATA), a US-based manufacturer, is a supplier of quality, cost-effective transaxles for customers who manufacture utility vehicles, tractors, lawn mowers, and golf carts. Brakes India supplied machined castings to M/S ATA. The Indian supplier provided six parts monthly over a period of four years. The study aimed to develop a production schedule that was optimal in normal times and adapted to capacity contraction due to disruptions such as those caused by the pandemic and the consequent lockdown of facilities.

Since the first detected cases of COVID-19 in Brazil, researchers have made a great effort to try to understand the disease. Understanding the impact of the disease on people can be instrumental in identifying which groups can be considered at risk. Therefore, this study investigates a probabilistic model based on a non-linear regression model analysing the following variables: age, whether the person is a health professional, whether the person resides in the Metropolitan Region of Belém (RMB), State of Pará, and gender, with the objective of identifying the groups of people who are more likely to die among those infected by COVID-19. To carry out the research, we used the data on all people infected by COVID-19 in the State of Pará until July 2020. According to the proposed probabilistic model, elderly people, with an odds ratio of 1.69 (95% CI 1.52-1.88), residents of the Metropolitan Region of Belém, with an odds ratio of 2.14 (95% CI 2.02-2.27), and men, with an odds ratio of 1.83 (95% CI 1.73-1.95), are groups with a higher risk of dying from the disease, while health professionals, with an odds ratio of 0.36 (95% CI 0.29-0.45), are less likely to die.
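
Odds ratios of this kind are typically obtained from a logistic regression; a minimal sketch on simulated data (variable names are hypothetical, not the study's dataset):

set.seed(1)
d <- data.frame(death = rbinom(500, 1, 0.2),
                elderly = rbinom(500, 1, 0.3),
                male = rbinom(500, 1, 0.5),
                rmb = rbinom(500, 1, 0.4))
fit <- glm(death ~ elderly + male + rmb, family = binomial, data = d)
exp(cbind(OR = coef(fit), confint.default(fit)))   # odds ratios with Wald 95% CIs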

  • Jan-Philipp Kolb

Synthetic universes are used to reproduce real phenomena in simulations. Requirements and methods to generate synthetic universes are presented in this work. Three examples show the possible application of synthetic universes.

  • Ernane Martins

The present work, entitled "Ensino, Pesquisa e Desenvolvimento na Engenharia Eletrônica e Computação" (Teaching, Research and Development in Electronic Engineering and Computing), presents 15 chapters addressing important topics on the current landscape of electronic engineering and computing in Brazil, such as: genetic algorithms; smart cities; software analysis; development of applications for mobile devices; game development; remote supervision software; process scheduling; code inspection; digital image processing; shadow IT; a predictive system for fault occurrence in electrical networks; and computational resources and computational thinking.

This article reports on an educational robotics project that used computational thinking to trigger the process of learning programming among students at a school in Santo Ângelo, RS, Brazil. The study began in February 2019 and was expected to finish in November 2019, involving 289 students from the 1st to the 5th grade of elementary school. The methodology combined bibliographic research and a case study, carried out in the school's technology laboratory, with weekly classes and tasks to be done at home. The project was planned around learning platforms such as Code.org and Codespark, which develop technology- and computing-related skills and competencies, and also used unplugged activities, which allow students to experiment, analyse, create solutions, and learn through trial and error. The study investigated how the development of computational thinking contributes to the teaching of educational robotics through a distinct way of approaching the basic concepts of computer science in order to solve problems, develop systems, and understand human behaviour. Through this experience, it can be seen that the introduction of computational thinking contributes to the adoption of educational robotics in the early grades of the school under study. The project is still under development, and it is already possible to observe that the use of these platforms encourages creative thinking, systematic reasoning, and collaborative work, essential competencies for the student's critical and cognitive development.

  • Philippe Aubry

The Hanurav-Vijayan procedure belongs to the class of exact, draw-by-draw algorithms for unequal probability sampling without replacement. This procedure has been implemented in market-leading commercial statistical packages such as SAS/STAT and SPSS, which have popularized its use. Unfortunately, the description of the Hanurav-Vijayan procedure in the documentation of this software is partially flawed, propagating errors from the statistical literature. Moreover, although it is often used, the procedure has almost fallen into oblivion in the literature devoted to probability sampling designs, and no comparisons have been made with competing methods. The purpose of the present paper is to rehabilitate the Hanurav-Vijayan procedure by correctly describing its implementation details and providing some basic elements of comparison with the well-known Rao-Sampford procedure.
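
For comparison, unequal probability sampling without replacement is available in R through the sampling package; the sketch below uses Sampford's method (the Hanurav-Vijayan procedure itself is not implemented there) with a hypothetical size measure:

library(sampling)
x <- c(12, 45, 7, 88, 23, 51, 9, 34)    # size measure for 8 units
pik <- inclusionprobabilities(x, 3)     # first-order inclusion probabilities, n = 3
set.seed(1)
s <- UPsampford(pik)                    # 0/1 selection indicators
which(s == 1)                           # the selected units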

  • Teng Ma
  • Caiqing Yao
  • Xin Shen
  • Zhihong Sun

Aging is associated with gut microbiota alterations, characterized by changes in intestinal microbial diversity and composition. However, no study has yet focused on investigating age-related changes in the low-abundance but potentially beneficial subpopulations of gut lactic acid bacteria (LAB) and Bifidobacterium. Our study found that the subjects' age correlated negatively with the alpha diversity of the gut bifidobacterial microbiota, and such a correlation was not observed in the gut LAB subpopulation. Principal coordinate analysis (PCoA) and analysis of the distribution of operational taxonomic units (OTUs) revealed that the structure and composition of the gut bifidobacterial subpopulation of the longevous elderly group were rather different from those of the other three age groups. The same analyses were applied to identify age-dependent characteristics of the gut LAB subpopulation, and the results revealed that the gut LAB subpopulation of young adults was significantly different from that of all three elderly groups. Our study identified several potentially beneficial bacteria (e.g., Bifidobacterium breve and Bifidobacterium longum) that were enriched in the longevous elderly group (P < 0.05), and the relative abundance of Bifidobacterium adolescentis decreased significantly with increasing age (P < 0.05). Although both bifidobacteria and LAB are generally considered health-promoting taxa, their age-dependent distributions differed from each other, suggesting different life-stage changes and potentially different functional roles. This study provided novel species-level gut bifidobacterial and LAB microbiota profiles of a large cohort of subjects and identified several age- or longevity-associated features and biomarkers. Key points: • The alpha diversity of the gut bifidobacterial microbiota decreased with age, while that of LAB did not change. • The structure and composition of the gut bifidobacterial subpopulation of the longevous elderly group were rather different from those of the other three age groups. • Several potentially beneficial bacteria (e.g., Bifidobacterium breve and Bifidobacterium longum) were enriched in the longevous elderly group.

The comfort of pedestrians on a footbridge is a rather complex problem that has been discussed for a long time. Vibration acceleration is generally accepted as the main controlling factor that influences pedestrian comfort. However, no consensus has been reached regarding which acceleration-based parameters should be used as an index for assessing comfort. Only simple comfort limits, rather than specific relationships between comfort and the corresponding indices, are currently available for assessing the vibration serviceability of a pedestrian bridge. In this study, the vibration acceleration of 21 pedestrian bridges in Beijing, China, was recorded under different service conditions, and questionnaire surveys on pedestrian comfort were conducted. The acquired testing and survey results were utilized to analyze the correlation between pedestrian comfort and bridge vibration. A procedure for assessing the vibration serviceability of pedestrian bridges considering pedestrian comfort is proposed based on the relation between pedestrian comfort and the maximum footbridge acceleration.


Source: https://www.researchgate.net/publication/235961497_Software_for_Data_Analysis_Programming_with_R

Posted by: ferdinandgabrielson.blogspot.com