.. moduleauthor:: Francois Serra
.. currentmodule:: ecolopy

Tutorial
********

.. contents::

Just after counting species abundances in an ecosystem
======================================================


Load Community
--------------

Community is a class, and derived objects represent simply a distribution of species abundance , with associated function in order to calculate descriptive statistics or to fit it to evolutionary models.

The data needed to create this object consists on a list of values corresponding to the abundances of each species. We can either give to Community a python list of values:
::

  Community ([1, 4, 4, 12, 54, 68, 32, 15])

or the path to a file containing those values:
::

  Community ('whatever_path/mydata.txt')

mydata.txt would contain the same list of values, one per row:

::

  1
  4
  4
  12
  54
  68
  32
  15

In the next step we are going to create an object :func:`ecolopy_dev.community.Community` that will represent the distribution of abundances of the well known/studied BCI dataset.

We are going to load this object under the name 'com':

::

  from ecolopy import Community
  com = Community ('bci_full.txt')

Quick descriptive statistics
----------------------------

In order to see quickly how does this community looks like, we can use the print command:

::

  print com

output:

.. code-block:: text

  Community (object)
      Number of individuals (J) : 21457
      Number of species (S)     : 225
      Shannon entropy (shannon) : 4.2704
      Metacommunity size (j_tot): 64371
      Models computed           : 
      Model loaded              : None
 
With:

* Number of individuals correspond to the total of the given community
* Number of species should correspond to the number of element in your input list, or to the number of line in your input file
* Shannon entropy [Shannon1948]_ is computed according to:

.. math::
  :nowrap:

  \begin{eqnarray}
    H(X) = \sum_{0\le i\le n} p(x_i) * log (p(x_i))
  \end{eqnarray}

X being the number of individuals for each species and n the number of species.

* Metacommunity size: correspond to 3 times the community size if not defined by user, we could have write to fix it at a given value instead of default:

::

  com = Community ('bci_full.txt', j_tot=10000000)

* Models computed: :func:`ecolopy_dev.community.Community` can be associated to an Ecological model, the user
  need first to compute them.
* Current Model: once computed, we can associate our community to a given model


RSA curve
^^^^^^^^^

An other way to see quickly how our community looks-like is to print relative species abundance curve (RSA) with :func:`ecolopy_dev.community.Community.rsa_ascii`:

::

  print com.rsa_ascii(pch='o', width=90, height=40)
  """
  (%) Relative
  Communitys

  8.0021 +o
         | 
         | oo
         |   o
  2.6181 +    ooo
         |       o
         |        ooo
  1.0319 +           oo
         |             oo
         |               o
         |                o
         |                 oooo
  0.4067 +                     ooo
         |                        oooo
         |                            oooo
         |                                oo
         |                                  ooo
  0.1603 +                                     oooo
         |                                         ooooo
         |                                              ooooo
         |                                                   o
         |                                                    oooo
  0.0632 +                                                        ooooo
         |                                                             ooo
         |                                                                oo
         |                                                                  oo
         |                                                                    
  0.0249 +                                                                    oooo
         |                                                                        oo
         |                                                                          ooo
         |                                                                             ooooo
         |                                                                                  oooooooo
   1/inf +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
         0    12   25   37   50   62   75   87   100  112  125  137  150  162  175  187  200  212  222
  
                                                Species rank                                       
  """

Also, if pylab is installed nicer plot can be generated, through the :func:`ecolopy_dev.community.Community.draw_rsa`:

::

   com.draw_rsa()


.. _rsa-fig:

.. figure:: ../ex_figures/rsa.png
  :scale: 80 %


Fit to ecological model
=======================

Once our distribution of abundances loaded into an Community object, EcoloPy proposes a set of ecological models that we can try to fit to our data.

Ewens model
-----------

As described by Hubbell [Hubbell2001]_ 'Ewens model' uses Ewens sampling formula [Ewens1972]_ in order to explain the distribution and abundances of species.

This model assumes that:

.. math::
  :nowrap:

  \begin{eqnarray}
    I  = \frac{m}{(1 - m) * (J - 1)}
  \end{eqnarray}

we just have to type:

::

  com.fit_model('ewens')

this step is usually very fast. When finished just type, to load this model as our current model:

::

  com.set_current_model('ewens')
  print com

output:

.. code-block:: text

  Community (object)
      Number of individuals (J) : 21457
      Number of species (S)     : 225
      Shannon's index (shannon) : 4.2704
      Metacommunity size (j_tot): 64371
      Models computed           : ewens
      Model loaded              : EwensModel
          I                       : 17.4945806282
          theta                   : 34.9622847952
          m                       : 0.000814705802191
          lnL                     : 318.84864864926203

Etienne model
-------------

Now we can run an other model like the one proposed by Etienne [Etienne2005]_:

::

  com.fit_model('etienne')
  com.set_current_model('etienne')
  print com

output:

.. code-block:: text

  Getting K(D,A) according to Etienne 2005 formula:
    Getting some stirling numbers...
      1000 of 1717, size: 3145976
    Computing K(D,A) at species 108 out of 108

  Community (object)
      Number of individuals (J) : 21457
      Number of species (S)     : 225
      Shannon's index (shannon) : 4.2704
      Metacommunity size (j_tot): 64371
      Models computed           : ewens, etienne
      Model loaded              : EtienneModel
          I                       : 2211.10111912
          theta                   : 47.6743015824
          m                       : 0.0934250928321
          lnL                     : 308.72540670819615


Best optimization strategy for Etienne model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`ecolopy_dev.community.Community.fit_model` function, when called for Etienne model, allows to define the optimization strategy to use all from Scipy [Jones2001]_ (see scipy.optimize documentation).
The fmin optimization strategy is the one used by default, it is fast, but do not allow to set bounds for 
the values of parameters to optimize, sometimes ending with values of *theta* almost infinite. 

To circumvent this problem, other optimization strategies are available, it is recommended to use several of them.
A simple way to find the best optimization would be:

::

  tmp = {}
  for met in ['fmin', 'slsqp', 'l_bfgs_b', 'tnc']:
      print 'Optimizing with %s...' % met
      try:
          com.fit_model(name='etienne', method=met, verbose=False)
          model = com.get_model('etienne')
          tmp[met] ={}
	  tmp[met]['model'] = model
          tmp[met]['theta'] = model.theta
          tmp[met]['I']     = model.I
          tmp[met]['m']     = model.m
          tmp[met]['lnL']   = model.lnL
      except Exception as e:
          print '    optimization failed: ' + e.args[0]
  
  # in case optimization by fmin failed to found correct values for theta and m:
  if not (1 <= tmp['fmin']['theta'] < com.S and \
          1e-50 <= tmp['fmin']['m'] < 1-1e-50):
      del (tmp['fmin'])
  
  # find the model with the higher likelihood:
  met = min(tmp, key=lambda x: tmp[x]['lnL'])
  
  # load it as 'etienne' model
  com.set_model(tmp[met]['model'])

*Also it is a good thing to try different starting values for optimization, by default* :func:`ecolopy_dev.community.Community.fit_model` *will use the number of species for theta and 0.5 for m (for the full doc see* :func:`ecolopy_dev.models.etienne_model.EtienneModel.optimize` *, the funticon called internally by fit_model)*

Generate contour image of likelihood
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
.. _contour-fig:

.. figure:: ../ex_figures/contour_lnl.png
  :scale: 55 %

  Contour plots representing log likelihood values of BCI dataset fitting Etienne  model with different values of *theta* and *m*. Global optima computed by the optimization is found here at the intersection of dashed lines, while local optima are marked by white crosses (7 of them are found, including global optima).

In the case of Etienne model EcoloPy allow user to draw contour frame of likelihood :num:`Figure #contour-fig` (:func:`ecolopy_dev.utils.draw_contour_likelihood`).


::

  from ecolopy.utils import draw_contour_likelihood
  # this can take 2 or 3 hours to compute
  draw_contour_likelihood(com, theta_range=[20,100], m_range=[0.05, 0.8], num_dots=100)


Log-normal model
----------------

In EcoloPy is also implemented log normal model:

::

  com.fit_model('lognormal')
  com.set_current_model ('lognormal')

  print com

output: 

.. code-block:: text

  Community (object)
      Number of individuals (J) : 4219
      Number of species (S)     : 162
      Shannon's index (shannon) : 4.5938
      Metacommunity size (j_tot): 12657
      Models computed           : lognormal, ewens, etienne
      Model loaded              : LognormalModel
          mu                      : 3.14269458985
          sd                      : 1.78719175872
          lnL                     : None
        
*Note: As likelihood value of log-normal model is not comparable to the one of Etienne or Ewens models, it is not calculated.*


Comparing Models
================

Browsing parameters
-------------------

Now we have fit our community to some models, a summary of the corresponding parameters are available through the print function, but each of them can also be reach like this:

::

  com.set_current_model('ewens')
  com.theta
  com.m
  com.I
  com.lnL

output: 

.. code-block:: text

  34.962254203932339
  0.00081470508933989701
  17.494565308269266
  318.84864864917472

or you can get any attribute through the model linked to our community:

::

  model = com.get_model('ewens')
  model.lnL

output: 

.. code-block:: text

  318.84864864917472

*All these values correspond to the current model (in this case ewens).*


Searching for best model
------------------------

We have run now several models, within which the nested models ewens and etienne, for those we can run a likelihood ratio test using the lrt function.

This function will compute a chi square test for 2 times the difference in likelihoods, with one degree of freedom (corresponding to the estimation of parameter *m*):

::

  print com.lrt ('ewens', 'etienne')

output:

.. code-block:: text

  6.80784682569e-06


Generate random distribution
============================

In order to compare our distribution of abundances to the expected one according to a specific model, we can generate random neutral distributions.
By default EcoloPy will use the parameters of the current model but this can be change passing to the function the name of the wanted model:

::

  com.generate_random_neutral_distribution(model='etienne')

output:

.. code-block:: text

  [17.0, 867.0, 397.0, 184.0, 71.0, 
  ...
  ...
  2.0, 2.0, 1.0, 1.0, 1.0, 1.0]

Testing for neutrality
======================

Getting significance of deviation from neutrality
-------------------------------------------------

In order to test if our distribution of abundances is neutral, Ecolopy implements a test based on comparing the distribution of values of Shannon entropies compared to the observed one (:func:`ecolopy_dev.community.Community.test_neutrality`).

::

  # call test_neutrality function with the full option, making it to return the list
  # of simulated entropies (WARNING: this number of generations is large... 
  # it will take a while)
  pval, neut_h = com.test_neutrality(model='etienne', gens=10000, full=True)
  print pval

output:

.. code-block:: text

  0.1492

According to this p-value, we would conclude that our dataset do not deviates significantly from neutrality.

Variations:
^^^^^^^^^^^
* **more restrictive** test can be done by fixing the number of species (S) to the observed number. For our example S-fixed would fix it to 225, and the result of such test would be:

::

  print com.test_neutrality(model='etienne', gens=10000, fix_s=True)

output:

.. code-block:: text

  0.0977

* another test can also be conducted through the same function, based on the **comparison of likelihoods** instead of Shannon's index [Etienne2007]_

::

  print com.test_neutrality(model='etienne', gens=1000,method='loglike', verbose=True)

output (this time, we set the verbose option, in order to follow the progression of the generation of random neutral community):

.. code-block:: text

  Generating random neutral abundances 1000 out of 1000
  0.353


Plotting distribution of simulations:
-------------------------------------

We can also draw the distribution of simulated entropies and visually compare them to the observed value :num:`Figure #shannon-hist` (:func:`ecolopy_dev.utils.draw_shannon_distrib`).

::

  # using previously generated neut_h
  draw_shannon_distrib(neut_h, com.shannon)


.. _shannon-hist:

.. figure:: ../ex_figures/shannon_hist.png

  Histogram of distribution of shannon entropy values after 10.000 simulation under neutral model Etienne. 
  Red vertical bar represents the observed value of entropy for our dataset, and the orange shade, 
  the corresponding deviation inferred from simulation.


Saving/Loading Community object
===============================

Once done EcoloPy allow user to save Community object and EcologicalModels object into cPikle with :func:`ecolopy_dev.community.Community.dump_community` and :func:`ecolopy_dev.community.Community.load_community` functions.

::

  # save it
  com.dump_community('stored_bci.pik')
  # (re)load it
  com = Community('stored_bci.pik')

References
==========


.. [Etienne2005] Rampal S Etienne, A new sampling formula for neutral biodiversity. Ecology Letters 8(3) (2005), 253-260.

.. [Etienne2007] Rampal S Etienne, A neutral sampling formula for multiple samples and an ’exact’ test of neutrality. Ecology letters 10(7) (2007), 608-18.

.. [Ewens1972] Waren J Ewens, The sampling theory of selectively neutral alleles. Theoretical population biology 3(1) (1972), 87-112.

.. [Hubbell2001] Stephen P Hubbell, The unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, 2001.

.. [Jones2001] Eric Jones, Travis Oliphant, Pearu Peterson, and Others, Scipy: Open source scientific tools for Python, 2001.

.. [Shannon1948] C E Shannon, A mathematical theory of communication, vol. 5. The Bell System Technical Journal, 1948.