aurora.git
7 months agoUpdating .auroraversion to release version 0.16.0. rel/0.16.0
Joshua Cohen [Tue, 27 Sep 2016 19:36:40 +0000 (14:36 -0500)] 
Updating .auroraversion to release version 0.16.0.

8 months agoUpdating .auroraversion to 0.16.0-rc2. rel/0.16.0-rc2
Joshua Cohen [Thu, 22 Sep 2016 19:10:18 +0000 (14:10 -0500)] 
Updating .auroraversion to 0.16.0-rc2.

8 months agoIncrementing snapshot version to 0.17.0-SNAPSHOT.
Joshua Cohen [Thu, 22 Sep 2016 19:10:18 +0000 (14:10 -0500)] 
Incrementing snapshot version to 0.17.0-SNAPSHOT.

8 months agoUpdating CHANGELOG for 0.16.0 release.
Joshua Cohen [Thu, 22 Sep 2016 19:10:18 +0000 (14:10 -0500)] 
Updating CHANGELOG for 0.16.0 release.

8 months agoPrepare release notes for release.
Joshua Cohen [Thu, 22 Sep 2016 19:08:36 +0000 (14:08 -0500)] 
Prepare release notes for release.

Also update committers' guide with details on this step and on reverting auto-generated commits for
subsequent release candidates.

Reviewed at https://reviews.apache.org/r/52167/

8 months agoRevert "Updating CHANGELOG for 0.16.0 release."
Joshua Cohen [Thu, 22 Sep 2016 18:07:42 +0000 (13:07 -0500)] 
Revert "Updating CHANGELOG for 0.16.0 release."

This reverts commit f806da7f17e2f5da15c65d948cdf4159458416c3.

8 months agoRevert "Incrementing snapshot version to 0.17.0-SNAPSHOT."
Joshua Cohen [Thu, 22 Sep 2016 18:07:33 +0000 (13:07 -0500)] 
Revert "Incrementing snapshot version to 0.17.0-SNAPSHOT."

This reverts commit 843289478c1ac6f68873317d0e33fb39e3550569.

8 months agoShutting down scheduler on unhandled BatchWorker error.
Maxim Khutornenko [Thu, 22 Sep 2016 17:54:14 +0000 (10:54 -0700)] 
Shutting down scheduler on unhandled BatchWorker error.

Bugs closed: AURORA-1779

Reviewed at https://reviews.apache.org/r/52141/

8 months agoswitching from launchTask to acceptOffers
Dmitriy Shirchenko [Wed, 21 Sep 2016 19:42:00 +0000 (12:42 -0700)] 
switching from launchTask to acceptOffers

Bugs closed: AURORA-1776

Reviewed at https://reviews.apache.org/r/52074/

8 months agoIncrementing snapshot version to 0.17.0-SNAPSHOT.
Joshua Cohen [Tue, 20 Sep 2016 19:59:36 +0000 (14:59 -0500)] 
Incrementing snapshot version to 0.17.0-SNAPSHOT.

8 months agoUpdating CHANGELOG for 0.16.0 release.
Joshua Cohen [Tue, 20 Sep 2016 19:59:36 +0000 (14:59 -0500)] 
Updating CHANGELOG for 0.16.0 release.

8 months agoRevert "Incrementing snapshot version to 0.17.0-SNAPSHOT."
Joshua Cohen [Tue, 20 Sep 2016 19:55:54 +0000 (14:55 -0500)] 
Revert "Incrementing snapshot version to 0.17.0-SNAPSHOT."

This reverts commit 4c4040fd259ef562de9f94b2b947b7fc3db4945a.

8 months agoRevert "Updating CHANGELOG for 0.16.0 release."
Joshua Cohen [Tue, 20 Sep 2016 19:54:56 +0000 (14:54 -0500)] 
Revert "Updating CHANGELOG for 0.16.0 release."

This reverts commit a0e9f7ee39c93e9e555325f64dd4b285574a019e.

8 months agoClean up some license issues.
Joshua Cohen [Tue, 20 Sep 2016 19:54:34 +0000 (14:54 -0500)] 
Clean up some license issues.

Reviewed at https://reviews.apache.org/r/52093/

8 months agoFix host maintenance commands to properly initialize the api client.
Joshua Cohen [Tue, 20 Sep 2016 19:26:49 +0000 (14:26 -0500)] 
Fix host maintenance commands to properly initialize the api client.

Bugs closed: AURORA-1777

Reviewed at https://reviews.apache.org/r/52087/

8 months agoRefactor of Webhook and no longer posting entire task state via webhook on scheduler...
Dmitriy Shirchenko [Tue, 20 Sep 2016 16:57:24 +0000 (18:57 +0200)] 
Refactor of Webhook and no longer posting entire task state via webhook on scheduler restart

This is a refactor that injects an HttpClient into the Webhook class, replacing the previous usage of lower-level HttpURLConnection objects. Additionally, POSTing the entire task state to the webhook endpoint on every scheduler restart is unnecessary and caused performance issues, so that is removed in this patch.

Bugs closed: AURORA-1772

Reviewed at https://reviews.apache.org/r/51980/

8 months agoIncrementing snapshot version to 0.17.0-SNAPSHOT.
Joshua Cohen [Mon, 19 Sep 2016 19:26:47 +0000 (14:26 -0500)] 
Incrementing snapshot version to 0.17.0-SNAPSHOT.

8 months agoUpdating CHANGELOG for 0.16.0 release.
Joshua Cohen [Mon, 19 Sep 2016 19:26:47 +0000 (14:26 -0500)] 
Updating CHANGELOG for 0.16.0 release.

8 months agoClarifying documentation for new contributors by adding a step for them to ask for...
Dmitriy Shirchenko [Mon, 19 Sep 2016 19:13:25 +0000 (14:13 -0500)] 
Clarifying documentation for new contributors by adding a step for them to ask for their JIRA id to
get whitelisted.

Due to spam bots, JIRA accounts need to be whitelisted before they can self-assign tasks.

Reviewed at https://reviews.apache.org/r/52049/

8 months agoAdd a release note about full task filesystem isolation.
Joshua Cohen [Mon, 19 Sep 2016 19:09:47 +0000 (14:09 -0500)] 
Add a release note about full task filesystem isolation.

Reviewed at https://reviews.apache.org/r/52051/

8 months agoAdding gpg key for jcohen@apache.org
Joshua Cohen [Mon, 19 Sep 2016 18:57:31 +0000 (13:57 -0500)] 
Adding gpg key for jcohen@apache.org

Reviewed at https://reviews.apache.org/r/52046/

8 months agoEnable the `project_info` plugin.
John Sirois [Sun, 18 Sep 2016 19:08:15 +0000 (13:08 -0600)] 
Enable the `project_info` plugin.

This brings in the `dependencies` goal which we use in
`build-support/python/make-pycharm-virtualenv`.

Bugs closed: AURORA-1775

Reviewed at https://reviews.apache.org/r/51987/

8 months agoChange framework_name default value from 'TwitterScheduler' to 'Aurora'
Santhosh Kumar Shanmugham [Fri, 16 Sep 2016 21:54:46 +0000 (14:54 -0700)] 
Change framework_name default value from 'TwitterScheduler' to 'Aurora'

Bugs closed: AURORA-1688

Reviewed at https://reviews.apache.org/r/51874/

8 months agoBatching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
Maxim Khutornenko [Fri, 16 Sep 2016 21:17:48 +0000 (14:17 -0700)] 
Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.

Reviewed at https://reviews.apache.org/r/51765/

8 months agoBatching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
Maxim Khutornenko [Fri, 16 Sep 2016 21:17:26 +0000 (14:17 -0700)] 
Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

Reviewed at https://reviews.apache.org/r/51763/

8 months agoBatching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
Maxim Khutornenko [Fri, 16 Sep 2016 21:17:04 +0000 (14:17 -0700)] 
Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

Reviewed at https://reviews.apache.org/r/51759/

8 months agoFix for AURORA-1739, enables golang thrift bindings to create jobs
Renan DelValle [Fri, 16 Sep 2016 20:55:04 +0000 (15:55 -0500)] 
Fix for AURORA-1739, enables golang thrift bindings to create jobs

Change in the thrift API to make the cronSchedule string in JobConfiguration optional.

Reviewed at https://reviews.apache.org/r/51973/

8 months agoEnsure shell health checkers running for tasks running under an isolated filesystem...
Joshua Cohen [Thu, 15 Sep 2016 18:48:05 +0000 (13:48 -0500)] 
Ensure shell health checkers running for tasks running under an isolated filesystem are run within
that filesystem.

Reviewed at https://reviews.apache.org/r/51899/

8 months agoRemove --release-threshold option from aurora job restart.
Joshua Cohen [Thu, 15 Sep 2016 18:30:23 +0000 (13:30 -0500)] 
Remove --release-threshold option from aurora job restart.

Bugs closed: AURORA-1681

Reviewed at https://reviews.apache.org/r/51924/

8 months agoIntroduce UpdateMetadata fields in JobUpdateRequest.
Santhosh Kumar Shanmugham [Tue, 13 Sep 2016 23:55:20 +0000 (16:55 -0700)] 
Introduce UpdateMetadata fields in JobUpdateRequest.

Bugs closed: AURORA-1711

Reviewed at https://reviews.apache.org/r/51384/

8 months agoAurora admin commands for reconciliation
Karthik Anantha Padmanabhan [Tue, 13 Sep 2016 20:02:59 +0000 (13:02 -0700)] 
Aurora admin commands for reconciliation

* A new command for task reconciliation, `reconcile_tasks`, was added to the
  aurora_admin CLI. It takes the type of reconciliation and the batch size (for
  explicit reconciliation) as options.
* As part of this change two thrift APIs were also added:
  `triggerImplicitTaskReconciliation` and `triggerExplicitTaskReconciliation`.

Testing Done:
* Manually tested on my local vagrant installation.
* ./build-support/jenkins/build.sh

Bugs closed: AURORA-1602

Reviewed at https://reviews.apache.org/r/51662/

8 months agoExtend getJobUpdateDetails to accept JobUpdateQuery
Zameer Manji [Tue, 13 Sep 2016 19:45:17 +0000 (12:45 -0700)] 
Extend getJobUpdateDetails to accept JobUpdateQuery

This extends getJobUpdateDetails to return a list of details instead of being
scoped to a single update.

Bugs closed: AURORA-1764

Reviewed at https://reviews.apache.org/r/51712/

8 months agoImplement `toString` on lazy modules.
Zameer Manji [Mon, 12 Sep 2016 23:30:17 +0000 (16:30 -0700)] 
Implement `toString` on lazy modules.

This will change the help output from:
`-shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])`
to
`-shiro_realm_modules (default [class org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule])`

Bugs closed: AURORA-1770

Reviewed at https://reviews.apache.org/r/51826/

8 months agoIntroduce a flag to treat RAM as a revocable resource
Stephan Erb [Mon, 12 Sep 2016 22:09:29 +0000 (00:09 +0200)] 
Introduce a flag to treat RAM as a revocable resource

We plan to open source a very simple Mesos ResourceEstimator and QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We have been using it internally with a patched Aurora version where the hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch makes this behaviour configurable.

Reviewed at https://reviews.apache.org/r/51807/

8 months agoDocument how to generate a changelog
Stephan Erb [Mon, 12 Sep 2016 10:57:10 +0000 (12:57 +0200)] 
Document how to generate a changelog

Let's pass on that druid knowledge from one release manager to the next.

Reviewed at https://reviews.apache.org/r/51758/

8 months agoDocument the Mesos containerizer
Stephan Erb [Fri, 9 Sep 2016 08:06:49 +0000 (10:06 +0200)] 
Document the Mesos containerizer

Included changes:

* consistent example jobs for both containerizers
* short end-user and operator documentation
* shuffled the reference documentation so that it is clear certain limitations apply only to the Docker containerizer

Bugs closed: AURORA-1640

Reviewed at https://reviews.apache.org/r/51664/

8 months agoUpdate e2e tests to verify running a docker image via the unified containerizer.
Joshua Cohen [Thu, 8 Sep 2016 22:21:40 +0000 (17:21 -0500)] 
Update e2e tests to verify running a docker image via the unified containerizer.

Reviewed at https://reviews.apache.org/r/51746/

8 months agoModify the watch_secs assertion on scheduler
Kai Huang [Thu, 8 Sep 2016 00:06:39 +0000 (17:06 -0700)] 
Modify the watch_secs assertion on scheduler

This feature intends to improve reliability and performance of the Aurora
scheduler job updater by relying on health check status rather than watch_secs
timeout when deciding an individual instance update state.

See this epic: https://issues.apache.org/jira/browse/AURORA-894
and the design doc:
https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
for more details and background.

Testing Done:
./gradlew build

./gradlew :test --tests "org.apache.aurora.scheduler.updater.JobUpdaterIT"

./build-support/jenkins/build.sh

Bugs closed: AURORA-894

Reviewed at https://reviews.apache.org/r/51536/

8 months agoRemove HttpServletRequestParams.
Zameer Manji [Tue, 6 Sep 2016 22:25:20 +0000 (15:25 -0700)] 
Remove HttpServletRequestParams.

`HttpServletRequestParams` is dead code and can be removed safely.

Reviewed at https://reviews.apache.org/r/51667/

8 months agoAdd MEDIAN_TIME_TO_STARTING as a new metric.
Kai Huang [Tue, 6 Sep 2016 19:26:13 +0000 (12:26 -0700)] 
Add MEDIAN_TIME_TO_STARTING as a new metric.

A new MTTS (Median Time To Starting) metric is added to the sla module in
addition to MTTA and MTTR.

This review request is related to my previous review request:
https://reviews.apache.org/r/51536

In the new implementation, the executor starts health checking at STARTING. If a
successful health check is performed before initial_interval_sec expires, the task
transitions into RUNNING state. Therefore, MTTS gives us an idea of how long it
takes for a task to become active, whereas the difference between MTTR and MTTS
represents the warm-up period for a task.

See the following issues for more background:

https://issues.apache.org/jira/browse/AURORA-1221

https://issues.apache.org/jira/browse/AURORA-1222

The new metric represents the median time spent waiting for a set of tasks to
reach STARTING status within a time frame (including the tasks turning into
RUNNING state within the time frame).

Here I regard STARTING as an active state. However, STARTING state is account
for platform and job uptime calculations.
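
The relationship between MTTS and MTTR described above can be sketched as follows. This is only an illustration, not the sla module's implementation: the per-task history dictionaries, the `median_time_to` helper, and the choice of PENDING as the starting point of the wait are all assumptions made for the example.

```python
from statistics import median


def median_time_to(status, task_histories):
    """Median seconds from PENDING until `status`, over tasks that reached it.

    task_histories is a hypothetical structure: one dict per task mapping a
    status name to the timestamp (in seconds) at which it was entered.
    """
    waits = [
        history[status] - history["PENDING"]
        for history in task_histories
        if status in history and "PENDING" in history
    ]
    return median(waits) if waits else None


histories = [
    {"PENDING": 0, "STARTING": 4, "RUNNING": 10},
    {"PENDING": 0, "STARTING": 6, "RUNNING": 9},
    {"PENDING": 1, "STARTING": 3},  # still warming up, not RUNNING yet
]
mtts = median_time_to("STARTING", histories)
mttr = median_time_to("RUNNING", histories)
warm_up = mttr - mtts  # rough warm-up period, as described above
```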

Testing Done:
./gradlew build

./gradlew :test

./build-support/jenkins/build.sh

Reviewed at https://reviews.apache.org/r/51580/

8 months agoRemove static Stats method `exportSize`.
Zameer Manji [Tue, 6 Sep 2016 19:18:36 +0000 (12:18 -0700)] 
Remove static Stats method `exportSize`.

Reviewed at https://reviews.apache.org/r/51469/

8 months agoAdd Dynamic Reservations design document
Stephan Erb [Sat, 3 Sep 2016 22:38:19 +0000 (00:38 +0200)] 
Add Dynamic Reservations design document

Reviewed at https://reviews.apache.org/r/51595/

8 months agoExtend the resource isolation and oversubscription documentation
Stephan Erb [Sat, 3 Sep 2016 22:02:44 +0000 (00:02 +0200)] 
Extend the resource isolation and oversubscription documentation

I had to answer a couple of questions regarding these over the recent weeks and thought it might make sense to update the docs accordingly.

Reviewed at https://reviews.apache.org/r/51602/

8 months agoUpgrade to latest CherryPy.
John Sirois [Fri, 2 Sep 2016 21:28:04 +0000 (15:28 -0600)] 
Upgrade to latest CherryPy.

We can now move past the UTF-8 encoding issues of 6.0.0 - 7.1.0 since
the UTF-8 encoded test filename was removed from the package here:
  https://github.com/cherrypy/cherrypy/commit/b8e2518d

See the changelog here:
  https://github.com/cherrypy/cherrypy/blob/v8.0.0/CHANGES.txt

Reviewed at https://reviews.apache.org/r/51615/

8 months agoAllow E_NAME_IN_USE in useradd/groupadd.
Zhitao Li [Fri, 2 Sep 2016 18:51:46 +0000 (12:51 -0600)] 
Allow E_NAME_IN_USE in useradd/groupadd.
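
A minimal sketch of the idea, assuming the shadow-utils convention that `useradd` and `groupadd` exit with status 9 when the name already exists. The helper name is hypothetical and this is not the executor's code.

```python
# Exit status used by shadow-utils useradd/groupadd for "name already in use".
E_NAME_IN_USE = 9


def tolerate_existing(returncode):
    """Treat "already exists" the same as success; everything else is an error."""
    return returncode in (0, E_NAME_IN_USE)
```

A caller would run the command, then raise only when `tolerate_existing(returncode)` is false.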

Bugs closed: AURORA-1761

Reviewed at https://reviews.apache.org/r/51564/

8 months agoFix a Python unittest that is not asserting anything
Stephan Erb [Tue, 30 Aug 2016 20:52:00 +0000 (22:52 +0200)] 
Fix a Python unittest that is not asserting anything

I discovered this one during a failed attempt to update to a new mock version. I have only aimed for a minimal fix:

* incorrect usage of PropertyMock: https://docs.python.org/dev/library/unittest.mock.html#unittest.mock.PropertyMock
* assert_called_once() does not exist: https://engineeringblog.yelp.com/2015/02/assert_called_once-threat-or-menace.html
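
For reference, the usage of `PropertyMock` that the linked documentation describes can be sketched like this. The example uses the standard library's `unittest.mock` rather than the `mock` backport the tests depend on, and the attribute name `cluster` is hypothetical.

```python
from unittest import mock


def stub_property():
    m = mock.MagicMock()
    prop = mock.PropertyMock(return_value="stubbed")
    # A PropertyMock must be attached to the mock's type, not to the instance,
    # otherwise attribute access returns the PropertyMock object itself.
    type(m).cluster = prop
    value = m.cluster
    # Reading the attribute calls the PropertyMock with no arguments.
    prop.assert_called_once_with()
    return value
```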

Reviewed at https://reviews.apache.org/r/51535/

8 months agoUpdate 3rdparty Python dependencies
Stephan Erb [Tue, 30 Aug 2016 20:48:22 +0000 (22:48 +0200)] 
Update 3rdparty Python dependencies

I have skimmed the changelogs and there does not seem to be anything
worth calling out in particular. Full changelogs:

* https://github.com/cherrypy/cherrypy/blob/v5.6.0/CHANGES.txt
* http://docs.makotemplates.org/en/latest/changelog.html
* https://github.com/pantsbuild/pex/blob/v1.1.14/CHANGES.rst
* https://github.com/giampaolo/psutil/blob/release-4.3.0/HISTORY.rst
* https://github.com/requests/requests-kerberos/blob/v0.10.0/HISTORY.rst
* https://github.com/kennethreitz/requests/blob/v2.11.1/HISTORY.rst

I have skipped the following updates for now:

* bottle: has a conflicting requirement in common
* mock: it leads to some test failures
* thrift: I didn't dare to touch that

Reviewed at https://reviews.apache.org/r/51499/

8 months agoEnable -zk_use_curator by default and deprecate.
John Sirois [Tue, 30 Aug 2016 20:40:55 +0000 (14:40 -0600)] 
Enable -zk_use_curator by default and deprecate.

The flag is noted as deprecated for removal in a future release.

Bugs closed: AURORA-1669

Reviewed at https://reviews.apache.org/r/51506/

8 months agoMinor improvements to the custom executor docs
Stephan Erb [Tue, 30 Aug 2016 19:34:22 +0000 (21:34 +0200)] 
Minor improvements to the custom executor docs

* Add link from README.md to features/custom-executors.md
* Add short introduction paragraph to features/custom-executors.md

Reviewed at https://reviews.apache.org/r/51531/

8 months agoClean up leaking of mounts into the host's mtab.
Joshua Cohen [Mon, 29 Aug 2016 20:52:44 +0000 (15:52 -0500)] 
Clean up leaking of mounts into the host's mtab.

Reviewed at https://reviews.apache.org/r/51502/

8 months agoConfigure ssh for e2e tests once globally, rather than as part of a specific test...
Joshua Cohen [Mon, 29 Aug 2016 19:19:55 +0000 (14:19 -0500)] 
Configure ssh for e2e tests once globally, rather than as part of a specific test case.

Reviewed at https://reviews.apache.org/r/51500/

8 months agoCatch IOError.
David Robinson [Mon, 29 Aug 2016 18:21:46 +0000 (11:21 -0700)] 
Catch IOError.

Bugs closed: AURORA-1752

Reviewed at https://reviews.apache.org/r/51307/

8 months agoRe-enable python style check in the integration build.
Santhosh Kumar Shanmugham [Mon, 29 Aug 2016 17:02:12 +0000 (11:02 -0600)] 
Re-enable python style check in the integration build.

pants test does not appear to invoke the python checkstyle. Re-enable
it by explicitly calling it in the integration build script. Also fix the
few issues that have already been committed.

Reviewed at https://reviews.apache.org/r/51484/

8 months agoAdded Uber to the relevant user list
Tarun Gogineni [Mon, 29 Aug 2016 16:35:17 +0000 (09:35 -0700)] 
Added Uber to the relevant user list

Testing Done:
Their latest blog post regarding their engineering stack mentioned their use of aurora for long running jobs.

Reviewed at https://reviews.apache.org/r/51474/

8 months agocontainerizer documentation and example changed to non-deprecated syntax
Tarun Gogineni [Fri, 26 Aug 2016 18:43:19 +0000 (13:43 -0500)] 
containerizer documentation and example changed to non-deprecated syntax

Bugs closed: AURORA-1754

Reviewed at https://reviews.apache.org/r/51438/

8 months agoA few executor fixes for filesystem isolation:
Joshua Cohen [Fri, 26 Aug 2016 17:49:25 +0000 (12:49 -0500)] 
A few executor fixes for filesystem isolation:

- Add an option to skip the groupadd/useradd calls into the task's filesystem.
- Mount any configured volumes into the task's filesystem.
- Clean up http server script used by appc e2e tests.
- Properly support CWD and .thermos_profile.

Reviewed at https://reviews.apache.org/r/51298/

8 months agoUnset PYTHONPATH before calling pants
Stephan Erb [Fri, 26 Aug 2016 06:30:45 +0000 (08:30 +0200)] 
Unset PYTHONPATH before calling pants

Our tests are started via `./pants test.pytest` and then call `./pants binary`
within some test setup routines. It looks like the `PYTHONPATH` can be tainted for
the second run. Unsetting it seems to prevent test failures of the kind:

```
Traceback (most recent call last):
File "/home/jenkins/.cache/pants/setup/bootstrap-Linux-x86_64/0.0.80/bin/pants", line 7, in <module>
 from pants.bin.pants_exe import main
ImportError: No module named pants.bin.pants_exe
```
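
The workaround can be sketched in Python as below. This is an illustration under assumptions, not the actual change: the helper name is hypothetical, and the nested `./pants binary` call is shown only as a comment.

```python
import os


def env_without_pythonpath(environ=None):
    """Return a copy of the environment with PYTHONPATH removed."""
    env = dict(os.environ if environ is None else environ)
    env.pop("PYTHONPATH", None)
    return env


# A nested invocation would then pass the cleaned environment, e.g.:
#   subprocess.check_call(["./pants", "binary", target], env=env_without_pythonpath())
```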

Bugs closed: AURORA-1717

Reviewed at https://reviews.apache.org/r/51366/

9 months agoFix thermos killing heuristic to permit setuid(2).
Zameer Manji [Tue, 23 Aug 2016 22:07:45 +0000 (15:07 -0700)] 
Fix thermos killing heuristic to permit setuid(2).

Previously this process killing heuristic would not allow killing of a process
if the uid it was launched with differed from the real uid of the currently
running process. The logic is too conservative because it doesn't factor in
that a process launched as root can use `setuid(2)` to change its real uid.

This patch fixes the heuristic by permitting killing of a process that was
launched as root but whose real uid is no longer root.
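
The relaxed check can be sketched as follows. This is a simplified illustration of the uid comparison only; the function name is hypothetical and the real heuristic in thermos may consider more than the uid.

```python
ROOT_UID = 0


def uid_permits_kill(launch_uid, current_real_uid):
    """Decide whether the uid comparison allows killing the process."""
    if launch_uid == current_real_uid:
        return True
    # A process launched as root may have dropped privileges via setuid(2),
    # so a differing real uid is acceptable in that one case.
    return launch_uid == ROOT_UID
```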

Bugs closed: AURORA-1753

Reviewed at https://reviews.apache.org/r/51348/

9 months agoOnly warn about terminated executors if their exit code is not 0.
Stephan Erb [Tue, 23 Aug 2016 16:15:53 +0000 (18:15 +0200)] 
Only warn about terminated executors if their exit code is not 0.

I have left a comment in the corresponding Mesos issue MESOS-313, so hopefully
we can remove that guard here in the future.

Bugs closed: AURORA-1719

Reviewed at https://reviews.apache.org/r/51306/

9 months agoReduce static method exposure for Stats.
Zameer Manji [Mon, 22 Aug 2016 18:45:18 +0000 (11:45 -0700)] 
Reduce static method exposure for Stats.

`org.apache.aurora.common.stats.Stats` has several static methods that are not
used in our codebase. This patch deletes the unused methods and reduces the
visibility of other static methods where possible.

Reviewed at https://reviews.apache.org/r/51264/

9 months agoMoving custom executors documentation to features, adding gorealis to tools
Renan DelValle [Thu, 18 Aug 2016 19:08:14 +0000 (14:08 -0500)] 
Moving custom executors documentation to features, adding gorealis to tools

Moved custom executor documentation from docs/operations/configuration.md since it grew out of
proportion.

Added an explanation of how a thrift object needs to be configured in a createJob call and
referenced using a custom Client for now.

Reviewed at https://reviews.apache.org/r/51192/

9 months agoAdd rollback functionality to the scheduler
Igor Morozov [Thu, 11 Aug 2016 20:56:48 +0000 (13:56 -0700)] 
Add rollback functionality to the scheduler

For active job updates in ROLLING_FORWARD, ROLL_BACK_PAUSED,
ROLL_BACK_AWAITING_PULSE, ROLL_FORWARD_PAUSED or ROLL_FORWARD_AWAITING_PULSE
state it is possible now to initiate a rollback by calling a corresponding API
function.  Rollback is also supported in aurora CLI tool via new command: aurora
update rollback CLUSTER/ROLE/ENV/NAME
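
The state precondition above can be expressed as a simple membership check. This is a sketch of the rule as stated in this message, not the scheduler's code; the function name is hypothetical.

```python
# Update states from which a rollback may be initiated, per the description above.
ROLLBACK_ALLOWED = frozenset([
    "ROLLING_FORWARD",
    "ROLL_BACK_PAUSED",
    "ROLL_BACK_AWAITING_PULSE",
    "ROLL_FORWARD_PAUSED",
    "ROLL_FORWARD_AWAITING_PULSE",
])


def can_rollback(update_status):
    return update_status in ROLLBACK_ALLOWED
```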

Bugs closed: AURORA-1721

Reviewed at https://reviews.apache.org/r/50168/

9 months agoAURORA-1656 Fix broken links in tier documentation
Mehrdad Nurolahzade [Wed, 10 Aug 2016 19:36:53 +0000 (12:36 -0700)] 
AURORA-1656 Fix broken links in tier documentation

Bugs closed: AURORA-1656

Reviewed at https://reviews.apache.org/r/50902/

9 months agoBump jetty dependency to the latest release.
Zameer Manji [Wed, 10 Aug 2016 18:02:56 +0000 (11:02 -0700)] 
Bump jetty dependency to the latest release.

A useful fix from the jetty-9.3.10.v20160621 release:
>  623 Add --gzip suffix to 304 responses with ETAGs

Without this fix adding ETAG support to the scheduler with gzipped requests and
responses is not possible.

Reviewed at https://reviews.apache.org/r/50937/

9 months agoRemove unnecessary guice container parameters.
Zameer Manji [Wed, 10 Aug 2016 17:53:47 +0000 (10:53 -0700)] 
Remove unnecessary guice container parameters.

I noticed these configuration parameters have no effect. Both the API and JAX-RS
endpoints like /vars return gzipped content.

Testing Done:
$ curl -I -X GET http://192.168.33.7:8081/vars -H 'Accept-Encoding: gzip, deflate'
HTTP/1.1 200 OK
Date: Mon, 08 Aug 2016 15:18:12 GMT
Content-Type: text/plain
Vary: Accept-Encoding, User-Agent
Content-Encoding: gzip
Transfer-Encoding: chunked
Server: Jetty(9.3.6.v20151106)

Reviewed at https://reviews.apache.org/r/50931/

9 months agoFix typo in `RELEASE-NOTES.md`.
Zameer Manji [Fri, 5 Aug 2016 18:30:58 +0000 (11:30 -0700)] 
Fix typo in `RELEASE-NOTES.md`.

9 months agoPopulate the source field of ExecutorInfo.
Zameer Manji [Fri, 5 Aug 2016 17:54:09 +0000 (10:54 -0700)] 
Populate the source field of ExecutorInfo.

b912e17 stopped populating the source field of the executor. For backwards
compatibility we should continue to populate this field and the `source` label.

Bugs closed: AURORA-1745

Reviewed at https://reviews.apache.org/r/50826/

9 months agoUpdate install docs for 0.15.0
Stephan Erb [Fri, 5 Aug 2016 17:04:09 +0000 (19:04 +0200)] 
Update install docs for 0.15.0

Mirrors the changes in https://github.com/apache/aurora-packaging/commit/2a9dd402b7b0fe75d994ed83de36f820619fd836

Reviewed at https://reviews.apache.org/r/50859/

9 months agoUse update_job instead of creating new config object when modifying. This avoids...
David McLaughlin [Thu, 4 Aug 2016 21:20:15 +0000 (14:20 -0700)] 
Use update_job instead of creating new config object when modifying. This avoids losing any state (e.g. metadata) attached to the config.

Reviewed at https://reviews.apache.org/r/50819/

9 months agoMultiple executor support in Scheduler
Renan DelValle [Thu, 4 Aug 2016 14:24:32 +0000 (09:24 -0500)] 
Multiple executor support in Scheduler

Adds support for using multiple executors in a single scheduler.

Reviewed at https://reviews.apache.org/r/50480/

9 months agoSupport TBinaryProtocol over HTTP
Zameer Manji [Wed, 3 Aug 2016 22:31:18 +0000 (15:31 -0700)] 
Support TBinaryProtocol over HTTP

This replaces the `TServlet` servlet from thrift with our own servlet which
dispatches thrift responses based on the content type of the request. This
enables a client to use either the thrift json protocol or the binary protocol
when communicating with the scheduler.

Without this patch the current behaviour is:
- Regardless of content type of the request, assume the request is json and
  return json with an `application/x-thrift` content type.

With this patch the behaviour becomes:
- A request with no content type header, or a type of `application/x-thrift` or
  `application/json` or `application/vnd.apache.thrift.json` is assumed to be
  JSON.
- A request with a content type header of `application/vnd.apache.thrift.binary`
  is assumed to be binary.
- A request with an `Accept` header of `application/vnd.apache.thrift.binary`
  will have a binary response.
- A request with any other `Accept` header will have a JSON response.
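
The negotiation rules listed above can be sketched as two small functions. This is an illustration in Python, not the servlet itself (which is Java); the function names are hypothetical, header parameters such as `charset` are ignored, and the behaviour for an unlisted request content type is an assumption because the message does not specify it.

```python
JSON_TYPES = frozenset([
    "application/x-thrift",
    "application/json",
    "application/vnd.apache.thrift.json",
])
BINARY_TYPE = "application/vnd.apache.thrift.binary"


def request_protocol(content_type):
    """Pick the protocol used to decode the request body."""
    if content_type is None or content_type in JSON_TYPES:
        return "json"
    if content_type == BINARY_TYPE:
        return "binary"
    # Assumption: reject anything the rules above do not mention.
    raise ValueError("unsupported content type: %r" % (content_type,))


def response_protocol(accept):
    """Pick the protocol used to encode the response."""
    return "binary" if accept == BINARY_TYPE else "json"
```

Keeping the request and response decisions separate mirrors the rules: the `Content-Type` header governs decoding, while the `Accept` header alone governs encoding.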

Bugs closed: AURORA-1743

Reviewed at https://reviews.apache.org/r/50685/

9 months agoIsolate the executor's filesystem from the task's.
Joshua Cohen [Wed, 3 Aug 2016 16:16:44 +0000 (11:16 -0500)] 
Isolate the executor's filesystem from the task's.

This changes the approach to launching tasks with filesystem images in the unified containerizer.
Instead of adding an `Image` to the `MesosContainer`, we instead add the task filesystem as a
`Volume` with an associated image. This image is mounted in the mesos directory under the `taskfs`
path. The executor, on start up does the following:

1. Creates user/group under the taskfs root.
2. Bind mounts the mesos directory under the taskfs root.
3. Uses mesos-containerizer's launch subcommand to pivot into the task fs and execute each process.

Reviewed at https://reviews.apache.org/r/47853/

9 months agoUpgrade to Mesos 1.0.0
Joshua Cohen [Mon, 1 Aug 2016 15:33:32 +0000 (10:33 -0500)] 
Upgrade to Mesos 1.0.0

I also took this as an opportunity to switch us from mesos.native to mesos.executor on the Python
side, meaning Docker containers will no longer require all mesos deps.

Release notes here: http://mesos.apache.org/blog/mesos-1-0-0-released/
Upgrade notes here: http://mesos.apache.org/documentation/latest/upgrades/

Reviewed at https://reviews.apache.org/r/50584/

9 months agoAURORA-1741 Added missing test cases
Mehrdad Nurolahzade [Fri, 29 Jul 2016 23:50:25 +0000 (18:50 -0500)] 
AURORA-1741 Added missing test cases

Bugs closed: AURORA-1741

Reviewed at https://reviews.apache.org/r/50617/

9 months agoAURORA-1656 Document tier concept
Mehrdad Nurolahzade [Fri, 29 Jul 2016 20:14:50 +0000 (13:14 -0700)] 
AURORA-1656 Document tier concept

Bugs closed: AURORA-1656

Reviewed at https://reviews.apache.org/r/50530/

9 months agoImprove `executorLost` error message by including the slave id.
Zameer Manji [Wed, 27 Jul 2016 18:33:58 +0000 (11:33 -0700)] 
Improve `executorLost` error message by including the slave id.

Reviewed at https://reviews.apache.org/r/50478/

9 months agoAURORA-1741 Fix pystachio binding bug introduced by AURORA-1710
Mehrdad Nurolahzade [Tue, 26 Jul 2016 03:02:25 +0000 (20:02 -0700)] 
AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

Bugs closed: AURORA-1741

Reviewed at https://reviews.apache.org/r/50432/

9 months agoAURORA-1710 Make 'tier' required and remove support for 'production' flag in Job...
Mehrdad Nurolahzade [Mon, 25 Jul 2016 16:30:28 +0000 (11:30 -0500)] 
AURORA-1710 Make 'tier' required and remove support for 'production' flag in Job configuration - CLI changes

Bugs closed: AURORA-1710

Reviewed at https://reviews.apache.org/r/49048/

10 months agoDisplay reservations and persistent volumes in /offers debug http endpoint
Mehrdad Nurolahzade [Sat, 23 Jul 2016 16:48:48 +0000 (18:48 +0200)] 
Display reservations and persistent volumes in /offers debug http endpoint

Bugs closed: AURORA-1736

Reviewed at https://reviews.apache.org/r/50052/

10 months agoUpgrade pants to 1.1.0-rc7.
John Sirois [Tue, 12 Jul 2016 15:00:02 +0000 (09:00 -0600)] 
Upgrade pants to 1.1.0-rc7.

This helps test the pants release candidate, paving the way for our upgrade
to the 1.1.0 release, and it now truly limits our backends to python-only.
This means, for example, that ivy is no longer bootstrapped to run python
tests.

Reviewed at https://reviews.apache.org/r/49872/

10 months agoUpgrade to gradle 2.14.
John Sirois [Mon, 11 Jul 2016 16:24:04 +0000 (10:24 -0600)] 
Upgrade to gradle 2.14.

This brings Gradle to 2.14, release notes here:
  https://docs.gradle.org/2.13/release-notes
  https://docs.gradle.org/2.14/release-notes

Since the Gradle daemon is no longer incubating it is enabled by
default for the project and the Vagrant provisioning is simplified
as a result.

Reviewed at https://reviews.apache.org/r/49899/

10 months agoUpdate virtualenv version to 15.0.2
Stephan Erb [Sun, 10 Jul 2016 22:26:33 +0000 (16:26 -0600)] 
Update virtualenv version to 15.0.2

Update virtualenv to 15.0.2. This includes updates to `pip==8.1.2`, `setuptools==21.2.1`, and `wheel==0.29.0`. Full changelog: https://virtualenv.pypa.io/en/stable/changes/

Additional changes to our scripts:

* abort the pants bootstrapping script if an error is encountered
* switch to download URLs that are working for the latest virtualenv version
* disable the recently added auto-updating of pip and setuptools so that we gain a reproducible build environment

Bugs closed: AURORA-1717

Reviewed at https://reviews.apache.org/r/49868/

10 months agoRevert "AURORA-1710 Make 'tier' required and remove support for 'production' flag...
Joshua Cohen [Fri, 8 Jul 2016 22:15:25 +0000 (17:15 -0500)] 
Revert "AURORA-1710 Make 'tier' required and remove support for 'production' flag in Job configuration - CLI changes"

This reverts commit 7701d218cd9c22cb4a2f107d28695d57e679b402.

10 months agoAURORA-1710 Make 'tier' required and remove support for 'production' flag in Job...
Mehrdad Nurolahzade [Fri, 8 Jul 2016 20:33:24 +0000 (15:33 -0500)] 
AURORA-1710 Make 'tier' required and remove support for 'production' flag in Job configuration - CLI changes

Bugs closed: AURORA-1710

Reviewed at https://reviews.apache.org/r/49048/

10 months agoUpdating release docs to remind about rc argument usage.
Maxim Khutornenko [Wed, 6 Jul 2016 19:34:28 +0000 (12:34 -0700)] 
Updating release docs to remind about rc argument usage.

Reviewed at https://reviews.apache.org/r/49642/

10 months agoFix thrift t_java_generator.cc patch.
John Sirois [Mon, 4 Jul 2016 19:37:50 +0000 (13:37 -0600)] 
Fix thrift t_java_generator.cc patch.

The initial patch inverted logic for emitting `else if` clauses.

Bugs closed: AURORA-1727

Reviewed at https://reviews.apache.org/r/49595/

10 months agoUpgrade to pants 1.1.0-pre6.
John Sirois [Mon, 4 Jul 2016 15:37:05 +0000 (09:37 -0600)] 
Upgrade to pants 1.1.0-pre6.

This picks up a fix for creating wheels for aurora components.

Release notes here:
  https://pypi.python.org/pypi/pantsbuild.pants/1.1.0-pre6

Bugs closed: AURORA-1620

Reviewed at https://reviews.apache.org/r/49593/

10 months agoClose `PathChildrenCache` before its framework.
John Sirois [Mon, 4 Jul 2016 14:13:33 +0000 (08:13 -0600)] 
Close `PathChildrenCache` before its framework.

Previously these lifecycles were modeled as independent when, in fact,
a `CuratorFramework`'s clients must be closed before it is closed to
prevent errors in the clients from attempting to use a closed
`CuratorFramework`.

The proof that closing was always safe already existed in
`CuratorServiceGroupMonitorTest::testExceptionalLifecycle`, but this
safety is now documented and more explicitly tested.
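
The ordering constraint described above (dependents close first, then the thing they depend on) can be illustrated with a small sketch. This is not the Aurora or Curator code: `Resource`, `run`, and the names "framework" and "cache" are hypothetical stand-ins, and Python's `contextlib.ExitStack` is used only because it runs cleanup callbacks in reverse registration order.

```python
from contextlib import ExitStack


class Resource:
    """Hypothetical closable resource that records when it is closed."""

    def __init__(self, name, log, parent=None):
        self.name = name
        self.log = log
        self.parent = parent  # the resource this one depends on, if any
        self.closed = False

    def close(self):
        # A dependent must be closed while its parent is still open.
        if self.parent is not None and self.parent.closed:
            raise RuntimeError(f"{self.name} closed after its parent")
        self.closed = True
        self.log.append(self.name)


def run(log):
    """Open a parent and a dependent, then close them in reverse order."""
    with ExitStack() as stack:
        framework = Resource("framework", log)
        stack.callback(framework.close)  # registered first, runs last
        cache = Resource("cache", log, parent=framework)
        stack.callback(cache.close)  # registered last, runs first
    return log
```

Registering each cleanup right after the resource is created ties the two lifecycles together, so the dependent cannot accidentally outlive its parent.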

Bugs closed: AURORA-1729

Reviewed at https://reviews.apache.org/r/49578/

10 months agoEnsure e2e key has its own authorized_keys line.
John Sirois [Sun, 3 Jul 2016 22:01:12 +0000 (16:01 -0600)] 
Ensure e2e key has its own authorized_keys line.

Previously it was assumed the existing `~/.ssh/authorized_keys` file
ended in a newline which need not be the case.
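
A minimal sketch of the idea, assuming the fix amounts to adding a separator only when the existing file lacks a trailing newline. The function name `append_key_line` and its parameters are hypothetical and are not taken from the Aurora scripts.

```python
import os


def append_key_line(path: str, public_key: str) -> None:
    """Append public_key to path so that it occupies a line of its own."""
    existing = b""
    if os.path.exists(path):
        with open(path, "rb") as f:
            existing = f.read()

    with open(path, "ab") as f:
        # Add a separator only when the file has content without a final newline.
        if existing and not existing.endswith(b"\n"):
            f.write(b"\n")
        f.write(public_key.strip().encode("utf-8") + b"\n")
```

Without the separator, the new key would be glued onto the end of the previous entry and both would likely be unusable.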

Bugs closed: AURORA-1728

Reviewed at https://reviews.apache.org/r/49577/

10 months agoUpdate install docs for Aurora 0.14.0
Stephan Erb [Sun, 3 Jul 2016 18:35:55 +0000 (20:35 +0200)] 
Update install docs for Aurora 0.14.0

Reviewed at https://reviews.apache.org/r/49339/

10 months agoPatch thrift to compile under modern gcc.
John Sirois [Fri, 1 Jul 2016 19:45:19 +0000 (13:45 -0600)] 
Patch thrift to compile under modern gcc.

This also patches the thrift compiler Makefile.in to eliminate all
generators not used by Aurora to both work around their modern gcc
compiler errors and speed up the thrift compiler build.

Bugs closed: AURORA-1727

Reviewed at https://reviews.apache.org/r/49528/

10 months agoReduce log level of finding a valid leader
Stephan Erb [Fri, 1 Jul 2016 19:39:46 +0000 (21:39 +0200)] 
Reduce log level of finding a valid leader

Since the introduction of the Curator-backend [1], we have been logging the currently active leader on each request. This happens even if the queried scheduler is also the leading one. By reducing the log level to debug, we can reduce the logging noise of Aurora significantly.

```
I0630 11:38:04.871 [qtp806277656-123, Slf4jRequestLog:60] 10.x.x.x - - [30/Jun/2016:11:38:04 +0000] "POST //master1.example.org/api HTTP/1.1" 307 0
I0630 11:38:04.879 [qtp806277656-129, LeaderRedirect:197] Found leader scheduler at [ServiceInstance(serviceEndpoint:Endpoint(host:master2.example.org, port:8081), additionalEndpoints:{http=Endpoint(host:master2.example.org, port:8081)}, status:ALIVE)]
I0630 11:38:04.879 [qtp806277656-129, LeaderRedirect:197] Found leader scheduler at [ServiceInstance(serviceEndpoint:Endpoint(host:master2.example.org, port:8081), additionalEndpoints:{http=Endpoint(host:master2.example.org, port:8081)}, status:ALIVE)]
I0630 11:38:04.880 [qtp806277656-129, LeaderRedirect:197] Found leader scheduler at [ServiceInstance(serviceEndpoint:Endpoint(host:master2.example.org, port:8081), additionalEndpoints:{http=Endpoint(host:master2.example.org, port:8081)}, status:ALIVE)]
```
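
The effect of such a change can be sketched with Python's standard `logging` module. This is only an illustration: the scheduler itself is Java, and the logger name `leader_redirect`, the function `find_leader`, and the dictionary layout are hypothetical.

```python
import logging

logger = logging.getLogger("leader_redirect")  # hypothetical logger name


def find_leader(instances):
    """Return the first ALIVE instance, logging the lookup at DEBUG level."""
    for instance in instances:
        if instance.get("status") == "ALIVE":
            # DEBUG instead of INFO: suppressed unless debug logging is enabled.
            logger.debug("Found leader scheduler at %s", instance)
            return instance
    return None
```

With the logger at the INFO level, the per-request message is dropped, while operators can still enable DEBUG to see it when diagnosing redirects.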

[1] https://github.com/apache/aurora/commit/103dae6871eaa76914ab7fe17adaa174e93f537a#diff-1a20d033e88f293107537e09636007ffR190

Reviewed at https://reviews.apache.org/r/49527/

10 months agoFix Process log configuration handling.
John Sirois [Fri, 1 Jul 2016 19:35:11 +0000 (13:35 -0600)] 
Fix Process log configuration handling.

Previously flagged configuration of Process logging mode would
blow up and claimed defaulting of the rotation policy did not
occur.

Bugs closed: AURORA-1724

Reviewed at https://reviews.apache.org/r/49399/

10 months agoIncrementing snapshot version to 0.16.0-SNAPSHOT.
Maxim Khutornenko [Fri, 1 Jul 2016 01:54:54 +0000 (18:54 -0700)] 
Incrementing snapshot version to 0.16.0-SNAPSHOT.

10 months agoUpdating CHANGELOG for 0.15.0 release.
Maxim Khutornenko [Fri, 1 Jul 2016 01:54:54 +0000 (18:54 -0700)] 
Updating CHANGELOG for 0.15.0 release.

10 months agoFixing e2e tests failing due to mesos-slave state.
Maxim Khutornenko [Fri, 1 Jul 2016 01:37:46 +0000 (18:37 -0700)] 
Fixing e2e tests failing due to mesos-slave state.

Reviewed at https://reviews.apache.org/r/49478/

10 months agoRevert "Updating CHANGELOG for 0.15.0 release."
Maxim Khutornenko [Thu, 30 Jun 2016 23:07:26 +0000 (16:07 -0700)] 
Revert "Updating CHANGELOG for 0.15.0 release."

This reverts commit 5de1803a247964bd7c4227562c33c4b9108c3c21.

10 months agoRevert "Incrementing snapshot version to 0.16.0-SNAPSHOT."
Maxim Khutornenko [Thu, 30 Jun 2016 23:06:57 +0000 (16:06 -0700)] 
Revert "Incrementing snapshot version to 0.16.0-SNAPSHOT."

This reverts commit 94e2eea541c3c6d408ac9f22492457060009c8d4.

10 months agoIncrementing snapshot version to 0.16.0-SNAPSHOT.
Maxim Khutornenko [Thu, 30 Jun 2016 18:28:14 +0000 (11:28 -0700)] 
Incrementing snapshot version to 0.16.0-SNAPSHOT.

10 months agoUpdating CHANGELOG for 0.15.0 release.
Maxim Khutornenko [Thu, 30 Jun 2016 18:28:14 +0000 (11:28 -0700)] 
Updating CHANGELOG for 0.15.0 release.