What is EcoloPy?
EcoloPy draws direct inspiration from the UNTB package in R [Hankin2007] and is designed to test whether the species abundance distribution of an ecosystem is consistent with neutrality. It fits community sampling data to a neutral model and provides functions to manipulate and represent both the original data and the results of statistical tests.
Ewens and Etienne neutral models are implemented.
Why another package?
Several packages and programs have already been developed to deal with species abundance data, implementing statistical functions to fit data to ecological models and even to test for neutrality [JabotChave2011] [Etienne2007] [Hankin2007]. However, none of those programs is able to deal with genomic data, where abundances reach millions of individuals. The main computational bottleneck, or in this case cul-de-sac, is the calculation of Etienne’s sampling formula [Etienne2005], where the computation of K(D, A) uses Stirling numbers.
In order to adapt the existing algorithm to genomic datasets, we developed the EcoloPy package, which, as its main feature, uses the GMP [Granlund2000] and MPFR [Fousse2007] libraries through the GMPY binding [Martelli2007]. Other improvements specific to EcoloPy, needed for dealing with genomic datasets, were implemented, mainly:
- use of the recurrence relation for Stirling numbers, and construction of a table of Stirling numbers (already used by [JabotChave2011])
- the table of Stirling numbers is reduced on the fly, keeping only the numbers needed, in order to save memory.
- model optimization using different optimization strategies from SciPy [Jones2001].
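The first two points can be illustrated with a minimal sketch. It is not EcoloPy's actual code: the function name `stirling_rows` is hypothetical, the Stirling numbers are assumed to be the unsigned Stirling numbers of the first kind, and Python's built-in arbitrary-precision integers stand in for the GMP-backed numbers that EcoloPy obtains through GMPY. The table is built row by row with the recurrence c(n+1, k) = n * c(n, k) + c(n, k-1), and only the rows for the abundances actually requested are kept.

```python
import math


def stirling_rows(needed):
    """Return {n: [c(n, 0), ..., c(n, n)]} for every n in `needed`.

    c(n, k) is the unsigned Stirling number of the first kind (an assumption
    of this sketch). Rows that are not requested are dropped as soon as the
    next row has been derived from them, so memory holds only the requested
    rows plus the current working row.
    """
    wanted = set(needed)
    if not wanted:
        return {}
    table = {}
    row = [1]  # row 0: c(0, 0) = 1
    if 0 in wanted:
        table[0] = row
    for n in range(max(wanted)):
        # Derive row n + 1 from row n with the recurrence.
        new_row = [0] * (n + 2)
        for k in range(1, n + 2):
            same_k = row[k] if k <= n else 0  # c(n, n + 1) = 0
            new_row[k] = n * same_k + row[k - 1]
        row = new_row  # the previous row is released unless it was stored
        if n + 1 in wanted:
            table[n + 1] = row
    return table


# Usage: request only the abundances present in a (made-up) sample.
rows = stirling_rows([3, 4, 50])
# Each row n sums to n!, which gives a quick sanity check.
assert sum(rows[50]) == math.factorial(50)
```

Because the values grow roughly like n!, fixed-width numeric types overflow long before abundances reach the millions, which is why an arbitrary-precision backend is needed for this table.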
On top of those necessary technical improvements, EcoloPy presents one main advantage: it is entirely written in Python [vanRossumdeBoer1991], a programming language that offers strong support for integration with other languages and tools, and whose popularity is rising among the bioinformatics community [Bassi2007]. EcoloPy is not yet a fully ripened package, as the number of functions and ecological models is still low, but the package was designed to provide a scalable program architecture.
The ecological models currently implemented are:
-