mesos.git
6 weeks agoADD: New frameworks and executor to the community list. master
Andreas Peters [Sun, 7 Aug 2022 12:53:50 +0000 (14:53 +0200)] 
ADD: New frameworks and executor to the community list.

7 weeks agoFixed random SlaveRecoveryTest.PingTimeoutDuringRecovery test failure. (#436)
cf-natali [Sun, 7 Aug 2022 10:29:21 +0000 (11:29 +0100)] 
Fixed random SlaveRecoveryTest.PingTimeoutDuringRecovery test failure. (#436)

This test would randomly fail with:
```
18:16:59 3: F0501 17:16:59.192818 19175 slave.cpp:1445] Check failed:
   state == DISCONNECTED || state == RUNNING || state == TERMINATING
RECOVERING
```

The cause was that the test re-starts the slave with the same PID, which
means that timers started by the previous slave process could fire while
the new slave process was running.

In this specific case, what happened is that the previous slave's ping
timer would fire in the middle of recovery of the second slave instance,
yielding this assertion.

Fixed by cancelling the `pingTimer` in the slave destructor.

Tested by running the test in a loop, while running a CPU-intensive
workload - `stress-ng --cpu $(nproc)0` in parallel.
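
As an illustration only, the idea of cancelling a timer in the destructor
can be sketched with standard C++. `TimerRegistry` and `Agent` are
hypothetical stand-ins, not libprocess or Mesos types:

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical minimal timer registry (not the libprocess Clock).
class TimerRegistry {
public:
  // Registers a callback and returns an id usable with cancel().
  int schedule(std::function<void()> callback) {
    timers_.emplace(nextId_, std::move(callback));
    return nextId_++;
  }

  void cancel(int id) { timers_.erase(id); }

  // Fires every pending timer and returns how many callbacks ran.
  int fireAll() {
    auto pending = std::move(timers_);
    timers_.clear();
    for (auto& entry : pending) entry.second();
    return static_cast<int>(pending.size());
  }

private:
  std::map<int, std::function<void()>> timers_;
  int nextId_ = 0;
};

// Hypothetical agent that owns a ping timer for its whole lifetime.
class Agent {
public:
  Agent(TimerRegistry& registry, int& pings) : registry_(registry) {
    pingTimer_ = registry_.schedule([&pings] { ++pings; });
  }

  // Cancelling here means a timer from a destroyed instance cannot fire
  // while a later instance is running.
  ~Agent() { registry_.cancel(pingTimer_); }

private:
  TimerRegistry& registry_;
  int pingTimer_;
};
```

With the destructor in place, destroying one `Agent` and creating another
leaves only the second instance's timer pending.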

7 weeks agoUpdated website's docker image Ruby version.
Andreas Peters [Sun, 7 Aug 2022 10:26:27 +0000 (12:26 +0200)] 
Updated website's docker image Ruby version.

To fix incompatibility with updated dependencies.

We will separately look into updating to a more recent Ubuntu base image.

2 months agoBump tzinfo from 1.2.5 to 1.2.10 in /site
dependabot[bot] [Fri, 22 Jul 2022 01:10:47 +0000 (01:10 +0000)] 
Bump tzinfo from 1.2.5 to 1.2.10 in /site

Bumps [tzinfo](https://github.com/tzinfo/tzinfo) from 1.2.5 to 1.2.10.
- [Release notes](https://github.com/tzinfo/tzinfo/releases)
- [Changelog](https://github.com/tzinfo/tzinfo/blob/master/CHANGES.md)
- [Commits](https://github.com/tzinfo/tzinfo/compare/v1.2.5...v1.2.10)

---
updated-dependencies:
- dependency-name: tzinfo
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
3 months agoADD: matrix slack bridge.
Andreas Peters [Tue, 7 Jun 2022 13:46:32 +0000 (15:46 +0200)] 
ADD: matrix slack bridge.

3 months agoRemoves Twitter embeds from website community page and older blog posts.
Dave Lester [Mon, 6 Jun 2022 03:00:12 +0000 (20:00 -0700)] 
Removes Twitter embeds from website community page and older blog posts.

4 months agoBump pyinstaller from 3.4 to 3.6 in /src/python/cli_new
dependabot[bot] [Mon, 23 May 2022 02:28:11 +0000 (02:28 +0000)] 
Bump pyinstaller from 3.4 to 3.6 in /src/python/cli_new

Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 3.4 to 3.6.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES-3.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v3.4...v3.6)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
4 months agoBump json from 2.2.0 to 2.6.2 in /site
dependabot[bot] [Mon, 23 May 2022 01:53:26 +0000 (01:53 +0000)] 
Bump json from 2.2.0 to 2.6.2 in /site

Bumps [json](https://github.com/flori/json) from 2.2.0 to 2.6.2.
- [Release notes](https://github.com/flori/json/releases)
- [Changelog](https://github.com/flori/json/blob/master/CHANGES.md)
- [Commits](https://github.com/flori/json/compare/v2.2.0...v2.6.2)

---
updated-dependencies:
- dependency-name: json
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
4 months agoBump pygments from 2.1.3 to 2.7.4 in /src/python/cli_new
dependabot[bot] [Mon, 23 May 2022 01:54:26 +0000 (01:54 +0000)] 
Bump pygments from 2.1.3 to 2.7.4 in /src/python/cli_new

Bumps [pygments](https://github.com/pygments/pygments) from 2.1.3 to 2.7.4.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.1.3...2.7.4)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
4 months agoBump nokogiri from 1.10.4 to 1.13.6 in /site
dependabot[bot] [Wed, 18 May 2022 22:02:11 +0000 (22:02 +0000)] 
Bump nokogiri from 1.10.4 to 1.13.6 in /site

Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.10.4 to 1.13.6.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.10.4...v1.13.6)

---
updated-dependencies:
- dependency-name: nokogiri
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
4 months agoFixed clang-tidy warnings due to capturing this in a deferred lambda.
Charles-Francois Natali [Fri, 29 Apr 2022 21:47:22 +0000 (22:47 +0100)] 
Fixed clang-tidy warnings due to capturing this in a deferred lambda.

Use `defer(self(), lambda)` instead to avoid the risk of use-after-free.

See on the mesos-tidy CI job:
```
/tmp/SRC/src/csi/v0_volume_manager.cpp:1078:13: warning: callback
capturing this should be dispatched/deferred to a specific PID
[mesos-this-capture]
      .then([=](const Map<string, string>& secrets) {
```

Together with the recent fixes merged, this fix should allow the
mesos-tidy CI job to be green again:
https://ci-builds.apache.org/job/Mesos/job/Mesos-Tidybot/

4 months agoRevert "Fixed random SlaveRecoveryTest.PingTimeoutDuringRecovery test failure."
cf-natali [Tue, 3 May 2022 18:35:49 +0000 (19:35 +0100)] 
Revert "Fixed random SlaveRecoveryTest.PingTimeoutDuringRecovery test failure."

This reverts commit 57088b11a328355b7f2a53a1b3fba9928a2fde73.

5 months agoFixed OversubscriptionTest.FixedResourceEstimator. (#434)
cf-natali [Sun, 1 May 2022 20:08:05 +0000 (21:08 +0100)] 
Fixed OversubscriptionTest.FixedResourceEstimator. (#434)

Depending on the recovery timing, the slave could send an
`UpdateSlaveMessage` message before the resource estimator is ready, so
if that's the case, wait for another update.

5 months agoFixed random SlaveRecoveryTest.PingTimeoutDuringRecovery test failure.
Charles-Francois Natali [Sun, 1 May 2022 18:10:44 +0000 (19:10 +0100)] 
Fixed random SlaveRecoveryTest.PingTimeoutDuringRecovery test failure.

This test would randomly fail with:
```
18:16:59 3: F0501 17:16:59.192818 19175 slave.cpp:1445] Check failed:
   state == DISCONNECTED || state == RUNNING || state == TERMINATING
RECOVERING
```

The cause was that the test re-starts the slave with the same PID, which
means that timers started by the previous slave process could fire while
the new slave process was running.

In this specific case, what happened is that the previous slave's ping
timer would fire in the middle of recovery of the second slave instance,
yielding this assertion.

Fixed by making sure to use `Clock::advance` and `Clock::settle` after
terminating the first instance to ensure that there are no pending
timers.

Tested by running the test in a loop, while running a CPU-intensive
workload - `stress-ng --cpu $(nproc)0` in parallel.

5 months agoUpdates paths to images for 2016 dev community status.
Dave Lester [Sat, 30 Apr 2022 05:34:21 +0000 (22:34 -0700)] 
Updates paths to images for 2016 dev community status.

5 months agoRemoves dynamic chart embed from 2016 blog post and replaces with static images.
Dave Lester [Sat, 30 Apr 2022 05:24:30 +0000 (22:24 -0700)] 
Removes dynamic chart embed from 2016 blog post and replaces with static images.

5 months agoUpdates css and js to be local vs relying on project-specific CDNs.
Dave Lester [Sat, 30 Apr 2022 05:05:44 +0000 (22:05 -0700)] 
Updates css and js to be local vs relying on project-specific CDNs.

5 months agoRemoves embeds from website: Google Analytics, social share buttons, and remote images.
Dave Lester [Sat, 30 Apr 2022 04:10:33 +0000 (21:10 -0700)] 
Removes embeds from website: Google Analytics, social share buttons, and remote images.

5 months agoFixed some clang-tidy errors: build re2.
Charles-Francois Natali [Wed, 27 Apr 2022 20:33:50 +0000 (21:33 +0100)] 
Fixed some clang-tidy errors: build re2.

5 months agoFixed some clang-tidy warnings: disable lint for self-assignment test.
Charles-Francois Natali [Tue, 26 Apr 2022 22:29:48 +0000 (23:29 +0100)] 
Fixed some clang-tidy warnings: disable lint for self-assignment test.

5 months agoFixed some clang-tidy warnings: use override.
Charles-Francois Natali [Tue, 26 Apr 2022 22:28:09 +0000 (23:28 +0100)] 
Fixed some clang-tidy warnings: use override.

5 months agoFixed mesos-tidy to actually log clang-tidy errors on failure. (#429)
cf-natali [Tue, 26 Apr 2022 22:21:28 +0000 (23:21 +0100)] 
Fixed mesos-tidy to actually log clang-tidy errors on failure. (#429)

Because it uses `set -e`, the `entrypoint.sh` script would exit -
silently - if `clang-tidy` exited with a non-zero exit status, without
showing the `clang-tidy` output.

5 months agoAdded support for limiting mesos-tidy parallelism. (#428)
cf-natali [Mon, 25 Apr 2022 20:48:24 +0000 (21:48 +0100)] 
Added support for limiting mesos-tidy parallelism. (#428)

Instead of setting the parallelism to the number of cores, allow it to be
overridden on an ad-hoc basis via the `JOBS` environment variable,
similar to mesos-build.

This is useful, for example, to avoid running out of memory (OOM), which I
suspect is what's happening on the Jenkins CI.

5 months agoFixed mesos-tidy build parallelism. (#427)
cf-natali [Mon, 25 Apr 2022 07:03:46 +0000 (08:03 +0100)] 
Fixed mesos-tidy build parallelism. (#427)

- Export `CMAKE_BUILD_PARALLEL_LEVEL`, otherwise it's not passed to
  child cmake processes.
- Do not pass `--parallel` to cmake processes: if `--parallel` is passed
  without an argument, cmake/make will interpret it by default as
  unbounded parallelism, despite `CMAKE_BUILD_PARALLEL_LEVEL` being set.

After this change, the build parallelism is correctly set to the number
of cores instead of unbounded.

5 months agoCopy .gitignore instead of creating a symlink in setup-dev.
Andreas Peters [Thu, 21 Apr 2022 19:05:17 +0000 (21:05 +0200)] 
Copy .gitignore instead of creating a symlink in setup-dev.

`.gitignore` symlinks are ignored by `git` version 2.32.0 and above.
See https://github.com/git/git/blob/master/Documentation/RelNotes/2.32.0.txt

5 months agoFixed a crash in Storage Local Resource ProviderProcess.
Charles-Francois Natali [Mon, 18 Apr 2022 19:11:30 +0000 (20:11 +0100)] 
Fixed a crash in Storage Local Resource ProviderProcess.

`StorageLocalResourceProviderProcess::connected` can crash on a check
that the current state is `DISCONNECTED` if the current state is
`READY`, which can happen if the periodic reconciliation runs after
disconnection.

It can be reproduced by running
`ContentType/AgentResourceProviderConfigApiTest.Add/0` in a loop,
preferably with some CPU-intensive workload in the background to affect
the timing.

Update the check to allow `READY` as well.

5 months agoEnabled squash option for github PR.
Charles-Francois Natali [Fri, 8 Apr 2022 06:27:00 +0000 (07:27 +0100)] 
Enabled squash option for github PR.

Squashing makes it easier to have a clean history and rewrite commit
messages without involving the original submitter, who might not be
aware of the project contribution guidelines.

6 months agoCHANGE: remove bintray from docs.
Andreas Peters [Fri, 25 Mar 2022 08:03:41 +0000 (09:03 +0100)] 
CHANGE: remove bintray from docs.

6 months agoADD: unofficial repository.
Andreas Peters [Wed, 23 Mar 2022 07:21:20 +0000 (08:21 +0100)] 
ADD: unofficial repository.

7 months agoAdded Capability.Type.QUOTA_V2 to v1 operator API.
Charles-Francois Natali [Thu, 17 Feb 2022 01:00:10 +0000 (09:00 +0800)] 
Added Capability.Type.QUOTA_V2 to v1 operator API.

Closes #MESOS-10235.

This closes #419

7 months agoADD: missing release notes, REMOVE: not working links.
Andreas Peters [Mon, 7 Feb 2022 10:06:28 +0000 (11:06 +0100)] 
ADD: missing release notes, REMOVE: not working links.

7 months agofix: Add missing comma
Marek Šuppa [Sat, 29 Jan 2022 00:23:09 +0000 (01:23 +0100)] 
fix: Add missing comma

* Add missing comma to `__all__`

8 months agoADD: task inspect function.
Andreas Peters [Tue, 30 Nov 2021 20:16:15 +0000 (21:16 +0100)] 
ADD: task inspect function.

UPDATE: code refactor.

FIX: revoke accidental changes.

10 months agoFix fetcher cache reduction
Thomas Langé [Fri, 26 Nov 2021 08:15:45 +0000 (09:15 +0100)] 
Fix fetcher cache reduction

When artifact size is smaller than expected, we want to reduce recorded
cache usage.

To do it, we actually compute the delta as an off_t (signed). If delta is
negative, we reclaim the extra space as it's not really used.

This attempt was failing because the computed delta is negative and is
passed as Bytes to releaseSpace, which is uint32_t (unsigned). In most
cases, casting a negative value to an unsigned integer just yields an
unrelated, very large value.

By passing the positive value to Bytes(), we have a safer cast.
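
A minimal sketch of the idea, assuming a hypothetical `ByteCount` wrapper
and `CacheUsage` struct in place of the real `Bytes` and fetcher cache
types (the 64-bit width here is also an assumption):

```cpp
#include <cstdint>

// Hypothetical stand-in for an unsigned byte count type.
struct ByteCount {
  uint64_t value;
};

// Hypothetical record of how much cache space is considered in use.
struct CacheUsage {
  uint64_t used = 0;
  void claimSpace(ByteCount bytes) { used += bytes.value; }
  void releaseSpace(ByteCount bytes) { used -= bytes.value; }
};

// Adjusts recorded usage from the expected size to the actual size.
inline void adjust(CacheUsage& cache, int64_t expected, int64_t actual) {
  const int64_t delta = actual - expected;
  if (delta < 0) {
    // Negate while the value is still signed, then convert: the unsigned
    // type only ever sees a non-negative number.
    cache.releaseSpace(ByteCount{static_cast<uint64_t>(-delta)});
  } else {
    cache.claimSpace(ByteCount{static_cast<uint64_t>(delta)});
  }
}
```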

10 months agoADD: framework plugin for the mesos-cli.
Andreas Peters [Thu, 11 Nov 2021 08:15:35 +0000 (09:15 +0100)] 
ADD: framework plugin for the mesos-cli.

11 months agoAdd a gc test case with negative duration scheduled
Hao Su [Fri, 15 Oct 2021 05:28:31 +0000 (22:28 -0700)] 
Add a gc test case with negative duration scheduled

11 months agoFix Timeout overflow issue caused by negative duration
Hao Su [Fri, 1 Oct 2021 16:15:45 +0000 (09:15 -0700)] 
Fix Timeout overflow issue caused by negative duration

11 months agoFixed an agent crash in case of duplicate task ID.
Charles-Francois Natali [Thu, 9 Sep 2021 21:12:42 +0000 (22:12 +0100)] 
Fixed an agent crash in case of duplicate task ID.

When using the command executor, if a task is started with the same ID
as a previous task for which the agent still has the executor, reject
the task instead of crashing.

Closes MESOS-9657.

11 months agoMESOS-10230: update jquery to version 3.6.0.
Andreas Peters [Mon, 4 Oct 2021 07:13:00 +0000 (09:13 +0200)] 
MESOS-10230:  update jquery to version 3.6.0.

ADD: updated jquery file.

11 months agoremove tasks limit and condition. add global functions.
Andreas Peters [Tue, 21 Sep 2021 10:03:36 +0000 (12:03 +0200)] 
remove tasks limit and condition. add global functions.

remove unneeded empty lines

FIX test script.

REMOVE: not needed newline.

FIX: Missing config.

MODIFY text length.

13 months agoAdd links to the CI jobs
Martin Tzvetanov Grigorov [Mon, 30 Aug 2021 12:10:02 +0000 (15:10 +0300)] 
Add links to the CI jobs

- active ones: Linux x86_64 and aarch64
- inactive: Windows
- none: Mac OS X

Make it clearer that 'PR' means Pull Request

13 months agoFixed cpplint errors.
Charles-Francois Natali [Mon, 9 Aug 2021 17:38:46 +0000 (18:38 +0100)] 
Fixed cpplint errors.

13 months agoFixed stout cpplint errors.
Charles-Francois Natali [Mon, 9 Aug 2021 18:34:29 +0000 (19:34 +0100)] 
Fixed stout cpplint errors.

13 months agoFixed NestedMesosContainerizerTest hangs on errors.
Charles-Francois Natali [Mon, 9 Aug 2021 14:19:09 +0000 (22:19 +0800)] 
Fixed NestedMesosContainerizerTest hangs on errors.

Those tests would use a named pipe to synchronize with the task being
started. The problem is that if the task fails to start, reading from
the pipe would block indefinitely, making the tests just hang.

We could update the code to use a read with a timeout, however it's a
bit fiddly and it's simpler to just use the presence of a regular file
as a barrier.

See https://issues.apache.org/jira/browse/MESOS-10226 for context.

Tested by @martin-g

This closes #402

14 months agoLinkedHashMap: fixed handling of self-assignment.
Charles-Francois Natali [Tue, 27 Jul 2021 12:39:21 +0000 (20:39 +0800)] 
LinkedHashMap: fixed handling of self-assignment.

Self-assigning a LinkedHashMap i.e. `map = map` would cause the map
to be cleared.
Found with clang-tidy.
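
A reduced sketch of the failure mode and the usual guard, using a
hypothetical `OrderedMap` rather than the real `LinkedHashMap`:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, much-reduced insertion-ordered map.
class OrderedMap {
public:
  OrderedMap() = default;
  OrderedMap(const OrderedMap& other) { copyFrom(other); }

  OrderedMap& operator=(const OrderedMap& other) {
    // Without this guard, `map = map` would clear the entries and then
    // copy from the (now empty) source, leaving the map empty.
    if (this != &other) {
      entries_.clear();
      index_.clear();
      copyFrom(other);
    }
    return *this;
  }

  void put(const std::string& key, int value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = value;
      return;
    }
    entries_.emplace_back(key, value);
    index_[key] = std::prev(entries_.end());
  }

  std::size_t size() const { return entries_.size(); }

private:
  using Entries = std::list<std::pair<std::string, int>>;

  // Re-inserts entries so that the index points into our own list.
  void copyFrom(const OrderedMap& other) {
    for (const auto& entry : other.entries_) {
      put(entry.first, entry.second);
    }
  }

  Entries entries_;
  std::unordered_map<std::string, Entries::iterator> index_;
};
```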

This closes #400

14 months agoUse override on overriding methods - found with clang-tidy.
Charles-Francois Natali [Sat, 24 Jul 2021 13:29:35 +0000 (14:29 +0100)] 
Use override on overriding methods - found with clang-tidy.

14 months agoFixed a bug where the cgroup task killer leaves the cgroup frozen.
Charles-Francois Natali [Wed, 21 Jul 2021 13:13:55 +0000 (21:13 +0800)] 
Fixed a bug where the cgroup task killer leaves the cgroup frozen.

This closes #388

14 months agoAdded Mesos authentication to the Mesos cli
Andreas Peters [Wed, 21 Jul 2021 13:05:39 +0000 (21:05 +0800)] 
Added Mesos authentication to the Mesos cli

This change:

- Add authentication against Mesos master and agent.
- Add option to skip SSL verification of the mesos-agent.
- Change the order of "task list" to get back more running states.

This closes #383

14 months agoUpdated documentation for Systemd's parameter `delegate`
Andreas Peters [Wed, 14 Jul 2021 11:45:45 +0000 (19:45 +0800)] 
Updated documentation for Systemd's parameter `delegate`

This closes #398

14 months agoAdded asf.yaml configuration for the new website handling.
Andreas Peters [Mon, 5 Jul 2021 12:53:53 +0000 (20:53 +0800)] 
Added asf.yaml configuration for the new website handling.

This patch will persist the .asf.yaml configuration for the new website
publishing mechanism (https://github.com/apache/mesos-site/pull/2).

This closes #399

15 months agoFixed a bug where timers wouldn't expire after `process::reinitialize`.
Charles-Francois Natali [Sat, 26 Jun 2021 18:04:33 +0000 (19:04 +0100)] 
Fixed a bug where timers wouldn't expire after `process::reinitialize`.

Pending `ticks` are used by `scheduleTick` to decide whether to schedule
an event loop tick when a new timer is scheduled - since we only need to
schedule the event loop tick if the new timer is supposed to expire
earlier than the current earliest timer.
Unfortunately `Clock::finalize` didn't clear `ticks`, which means that
the following could happen:
- schedule a timer T0 for expiration at time t0
- call `process::reinitialize`, which calls `Clock::finalize` but doesn't
  clear `ticks`
- schedule a new timer T1 for expiration at time t1 > t0: since
  `scheduleTick` would see that there was already the earlier pending
  tick for T0 in `ticks` with t0 < t1, it wouldn't actually schedule a
  tick of the event loop

Therefore new timers would never fire again.

This caused e.g.
`DockerContainerizerIPv6Test.ROOT_DOCKER_LaunchIPv6HostNetwork` test to
hang since it called `process::reinitialize` while having some active
timers - e.g. the reaper periodic timer.
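
The sequence described above can be reproduced with a heavily simplified
model. `ClockModel` is hypothetical and only mirrors the bookkeeping
described in this message, not the actual libprocess code:

```cpp
#include <set>

// Hypothetical model of the clock's tick bookkeeping.
struct ClockModel {
  // Expiration times for which an event loop tick is believed pending.
  std::set<int> ticks;

  // Returns true if an event loop tick was actually scheduled for `time`.
  bool scheduleTick(int time) {
    if (ticks.empty() || time < *ticks.begin()) {
      ticks.insert(time);
      return true;
    }
    // An earlier tick is assumed to be pending, so nothing is scheduled.
    return false;
  }

  // `clearTicks == false` models the buggy finalize that keeps stale ticks.
  void finalize(bool clearTicks) {
    if (clearTicks) {
      ticks.clear();
    }
  }
};
```

In this model, scheduling t0 = 10, finalizing without clearing, and then
scheduling t1 = 20 returns `false` for the second call, which matches the
"new timers never fire" symptom.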

Also added a test specifically for this bug.

15 months agoAdded Saad Ur Rahman to contributors list.
Saad Ur Rahman [Fri, 25 Jun 2021 14:02:53 +0000 (22:02 +0800)] 
Added Saad Ur Rahman to contributors list.

This closes #396

15 months agoFixed crashes on ARM64 due to bad interaction of libunwind with libgcc.
Charles-Francois Natali [Fri, 25 Jun 2021 13:15:58 +0000 (21:15 +0800)] 
Fixed crashes on ARM64 due to bad interaction of libunwind with libgcc.

Closes MESOS-10223.

@qianzhangxa

This closes #395

15 months agoFixed `ldcache.parse` to handle excess tail data in Ubuntu <ld.so.cache>
Saad Ur Rahman [Thu, 24 Jun 2021 12:35:22 +0000 (20:35 +0800)] 
Fixed `ldcache.parse` to handle excess tail data in Ubuntu <ld.so.cache>

**_[MESOS-10224](https://issues.apache.org/jira/browse/MESOS-10224)_**

There is excess data on the tail end of the `ld.so.cache` file on
`Ubuntu 21.04`. With this fix the `data` pointer check will not fail
if it falls short of the end of the cache's buffer.

This closes #394

15 months agoBackported fix for picojson -Wparentheses warning with recent GCC.
Charles-Francois Natali [Sun, 13 Jun 2021 13:48:53 +0000 (21:48 +0800)] 
Backported fix for picojson -Wparentheses warning with recent GCC.

This closes #392

15 months agoBackported a boost-mpl commit to ignore GCC's -Wparentheses.
Charles-Francois Natali [Tue, 8 Jun 2021 14:14:02 +0000 (22:14 +0800)] 
Backported a boost-mpl commit to ignore GCC's -Wparentheses.

In order to allow us to build with `-Werror`.
Since boost is quite a large dependency and the change is so small, it
felt much simpler and less risky than updating boost just to silence
this warning.

See https://github.com/boostorg/mpl/pull/34 for the upstream fix.

This together with https://github.com/apache/mesos/pull/392 allows
building with `-Werror` using this somewhat recent gcc:
```
gcc (Debian 10.2.1-6) 10.2.1 20210110
```

@asekretenko @qianzhangxa

This closes #393

15 months agoFixed parsing of `perf` output on some locales.
Charles-Francois Natali [Fri, 4 Jun 2021 14:17:52 +0000 (22:17 +0800)] 
Fixed parsing of `perf` output on some locales.

If the locale is such that `LC_NUMERIC` uses the comma ',' as decimal
separator, parsing won't work - because of unexpected number of fields
and floating points format - so make sure it's set to `C`.

Example:
```
[ RUN      ]
CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
../../src/tests/containerizer/cgroups_tests.cpp:1024: Failure
(statistics).failure(): Failed to parse perf sample: Failed to parse
perf sample line
'6376827291,,cycles,mesos_test,2011741096,100,00,3,GHz': Unexpected
number of fields (9)
[  FAILED  ]
CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest (2157
ms)
```

Standalone reproducer, using '/' as separator for readability:
```
root@thinkpad:~# LC_NUMERIC=fr_FR.UTF-8 perf stat --field-separator "/"
-- true
0,31/msec/task-clock/306721/100,00/0/CPUs utilized
0//context-switches/306721/100,00/0/K/sec
0//cpu-migrations/306721/100,00/0/K/sec
44//page-faults/306721/100,00/0/M/sec
788234//cycles/311478/100,00/2/GHz
538077//instructions/311478/100,00/0/insn per cycle
106749//branches/311478/100,00/348/M/sec
4556//branch-misses/311478/100,00/4/of all branches
```
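
To make the field-count problem concrete, here is a small sketch that
splits a sample line on commas. `splitFields` is a hypothetical helper, not
the Mesos parser, and the second sample line in the usage note is
hand-written by swapping the decimal comma for a period:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `line` on `separator`, keeping empty fields.
inline std::vector<std::string> splitFields(const std::string& line,
                                            char separator) {
  std::vector<std::string> fields;
  std::string field;
  std::istringstream stream(line);
  while (std::getline(stream, field, separator)) {
    fields.push_back(field);
  }
  // std::getline drops a trailing empty field, so add it back.
  if (!line.empty() && line.back() == separator) {
    fields.push_back("");
  }
  return fields;
}
```

Splitting the failing line from the log above yields 9 fields, because
`100,00` is cut in two. The same line written with `100.00` yields 8.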

This closes #391

15 months agoFixed a LevelDBStorage bug where positions would overflow.
Charles-Francois Natali [Sat, 15 May 2021 23:07:09 +0000 (00:07 +0100)] 
Fixed a LevelDBStorage bug where positions would overflow.

Positions were encoded with a width of 10, which limited the
maximum position to 9'999'999'999. And actually the code suffered
from overflow above INT32_MAX.
In order to be backward compatible, we still encode positions up to
9'999'999'999 using a width of 10, however for positions above
that, we switch to a width of 20 with the twist that we prepend 'A'
in order to preserve lexicographic ordering. Here's what it looks like:

0000000000
.......
9999999998
9999999999
A00000000010000000000
A00000000010000000001

The reason this works is because the only property which is required
by the encoding function is that it is strictly monotonically
increasing, i.e. if i < j, then encode(i) < encode(j) in lexicographic
order.
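
A minimal sketch of an encoding with this property. `encodePosition` is a
hypothetical name, the code is not the LevelDBStorage implementation, and
it relies on 'A' sorting after '9' in ASCII:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Encodes `position` so that lexicographic order matches numeric order.
inline std::string encodePosition(uint64_t position) {
  char buffer[32];
  if (position <= 9999999999ULL) {
    // Backward-compatible form: zero-padded to a width of 10.
    std::snprintf(buffer, sizeof(buffer), "%010llu",
                  static_cast<unsigned long long>(position));
  } else {
    // Larger positions: 'A' prefix followed by a width of 20.
    std::snprintf(buffer, sizeof(buffer), "A%020llu",
                  static_cast<unsigned long long>(position));
  }
  return buffer;
}
```

For example, `encodePosition(9999999999ULL)` gives `9999999999` and
`encodePosition(10000000000ULL)` gives `A00000000010000000000`, and the
former compares less than the latter as strings.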

Closes #10216.

16 months agoFixed parsing of ld.so.cache on new glibc.
Charles-Francois Natali [Mon, 31 May 2021 13:23:29 +0000 (21:23 +0800)] 
Fixed parsing of ld.so.cache on new glibc.

Since glibc 2.32, `ld.so.cache` now defaults to the "new" format, instead
of the "compat" format which was in use since glibc 2.2 (around 20 years
ago). It is now the default on e.g. Debian bullseye, and any recent Linux
distribution.

The code change adds support for the "new" format along with the existing
support for the "compat".
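
As a rough sketch, telling the two formats apart can be done by looking at
the leading magic string. The magic values below are assumptions based on
glibc's cache header definitions, and `detectFormat` is a hypothetical
helper rather than the code added by this change:

```cpp
#include <string>

enum class CacheFormat { Compat, New, Unknown };

// Classifies a cache buffer by its leading magic string.
inline CacheFormat detectFormat(const std::string& data) {
  // Assumed magic strings; check them against glibc before relying on them.
  static const std::string compatMagic = "ld.so-1.7.0";
  static const std::string newMagic = "glibc-ld.so.cache1.1";

  if (data.compare(0, newMagic.size(), newMagic) == 0) {
    return CacheFormat::New;
  }
  if (data.compare(0, compatMagic.size(), compatMagic) == 0) {
    return CacheFormat::Compat;
  }
  return CacheFormat::Unknown;
}
```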

Before:
```
root@thinkpad:/home/cf/src/mesos/build# ldconfig -c new
root@thinkpad:/home/cf/src/mesos/build# ./bin/mesos-tests.sh --gtest_filter=*Ld*
[...]
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from LdcacheTest
[ RUN      ] LdcacheTest.Parse
../../src/tests/ldcache_tests.cpp:43: Failure
cache: Invalid format
[  FAILED  ] LdcacheTest.Parse (0 ms)
[----------] 1 test from LdcacheTest (0 ms total)

[----------] 3 tests from Ldd
[ RUN      ] Ldd.BinSh
../../src/tests/ldd_tests.cpp:43: Failure
cache: Invalid format
[  FAILED  ] Ldd.BinSh (0 ms)
[ RUN      ] Ldd.EmptyCache
[       OK ] Ldd.EmptyCache (1 ms)
[ RUN      ] Ldd.MissingFile
../../src/tests/ldd_tests.cpp:77: Failure
cache: Invalid format
[  FAILED  ] Ldd.MissingFile (0 ms)
[----------] 3 tests from Ldd (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (8 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] LdcacheTest.Parse
[  FAILED  ] Ldd.BinSh
[  FAILED  ] Ldd.MissingFile

 3 FAILED TESTS
```

After:

```
root@thinkpad:/home/cf/src/mesos/build# ldconfig -c new
root@thinkpad:/home/cf/src/mesos/build# ./bin/mesos-tests.sh --gtest_filter=*Ld*
[...]
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from LdcacheTest
[ RUN      ] LdcacheTest.Parse
[       OK ] LdcacheTest.Parse (529 ms)
[----------] 1 test from LdcacheTest (529 ms total)

[----------] 3 tests from Ldd
[ RUN      ] Ldd.BinSh
[       OK ] Ldd.BinSh (3 ms)
[ RUN      ] Ldd.EmptyCache
[       OK ] Ldd.EmptyCache (0 ms)
[ RUN      ] Ldd.MissingFile
[       OK ] Ldd.MissingFile (0 ms)
[----------] 3 tests from Ldd (3 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (541 ms total)
[  PASSED  ] 4 tests.
```

This closes #384

16 months agoFixed compilation against Linux 5.9+ capability.h.
Charles-Francois Natali [Sun, 16 May 2021 19:07:03 +0000 (20:07 +0100)] 
Fixed compilation against Linux 5.9+ capability.h.

They should be defined without the "CAP_" prefix to avoid clashing with
the corresponding definitions in `linux/capability.h` - and be
consistent with other capabilities.

16 months agoadd Andreas Peters/AVENTER as contributor
Andreas Peters [Tue, 18 May 2021 06:07:22 +0000 (08:07 +0200)] 
add Andreas Peters/AVENTER as contributor

16 months agoRevert "Fixed read of uninitialized variables detected with UBSAN."
Andrei Sekretenko [Sun, 16 May 2021 07:53:25 +0000 (09:53 +0200)] 
Revert "Fixed read of uninitialized variables detected with UBSAN."

This reverts commit add9f1de771693884548095703ed2bd4bc6cfc16.

As it turns out, fixing this UB this way exposes some other issue
in containerizer tests - for example,
`MesosContainerizerTest.StatusWithContainerID` - which results
in failures to wait for cgroup destruction on some platforms
in the affected tests. Temporarily reverting this pending further
investigation. See https://github.com/apache/mesos/pull/385

16 months agoFixed read of uninitialized variables detected with UBSAN.
Charles-Francois Natali [Sat, 15 May 2021 11:39:18 +0000 (12:39 +0100)] 
Fixed read of uninitialized variables detected with UBSAN.

17 months agofix(docs): corrects common typos in project documentation
plan-do-break-fix [Sat, 24 Apr 2021 20:09:25 +0000 (15:09 -0500)] 
fix(docs): corrects common typos in project documentation

17 months agoFixed a bug preventing agent recovery when executor GC is interrupted.
Charles-Francois Natali [Sat, 30 Jan 2021 18:41:37 +0000 (18:41 +0000)] 
Fixed a bug preventing agent recovery when executor GC is interrupted.

If the agent is interrupted after garbage collecting the executor's
latest run meta directory but before garbage collecting the top-level
executor meta directory, the "latest" symlink will dangle, which would
cause the agent executor recovery to fail.
Instead, we can simply ignore a dangling "latest" symlink, since
it's always created after the latest run directory it points to, and
never deleted until the top-level executor meta directory is garbage
collected.
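
Detecting a dangling symlink can be sketched with `std::filesystem`.
`isDangling` is a hypothetical helper and is not how the agent recovery
code is written:

```cpp
#include <filesystem>
#include <system_error>

// Returns true if `link` is a symlink whose target does not exist.
inline bool isDangling(const std::filesystem::path& link) {
  std::error_code error;
  // is_symlink() inspects the link itself; exists() follows it.
  return std::filesystem::is_symlink(link, error) &&
         !std::filesystem::exists(link, error);
}
```

A recovery loop could skip an executor whose "latest" link satisfies this
check instead of treating it as an error.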

19 months agoFixed NNP isolator test on systems with POSIX-compliant /bin/sh.
Charles-Francois Natali [Sun, 31 Jan 2021 10:08:36 +0000 (10:08 +0000)] 
Fixed NNP isolator test on systems with POSIX-compliant /bin/sh.

The test used some non-POSIX features such as arrays when parsing
/proc/self/status, which breaks on systems where /bin/sh is
POSIX-compliant, e.g. on Debian which uses dash.

20 months agoFix flaky MasterTest.NonCheckpointingFrameworkAgentDisconnectionExecutorOnly.
Charles-Francois Natali [Thu, 28 Jan 2021 21:08:01 +0000 (16:08 -0500)] 
Fix flaky MasterTest.NonCheckpointingFrameworkAgentDisconnectionExecutorOnly.

After stopping the driver, we need to wait for the events to reach the
master before checking that the framework has been removed.

This closes #379

20 months agoAdded CAP_PERFMON, CAP_BPF and CAP_CHECKPOINT_RESTORE support.
Charles-Francois Natali [Sun, 24 Jan 2021 17:15:34 +0000 (17:15 +0000)] 
Added CAP_PERFMON, CAP_BPF and CAP_CHECKPOINT_RESTORE support.

Part of fix for #10203.

20 months agoFixed agent crash when kernel supports unknown capabilities.
Charles-Francois Natali [Sat, 23 Jan 2021 18:51:57 +0000 (18:51 +0000)] 
Fixed agent crash when kernel supports unknown capabilities.

When capabilities are enabled, the capabilities initialisation code
would check that /proc/sys/kernel/cap_last_cap is less than
MAX_CAPABILITY, i.e. that the kernel doesn't support any capability the
code doesn't expect.  However the error message attempted to format
cap_last_cap value as a Capability enum, which would crash.

Part of fix for #10203.

20 months agoCherry picked grpc patches fixing glibc 2.30 symbol conflict.
Omer Ozarslan [Sat, 16 Jan 2021 05:45:30 +0000 (23:45 -0600)] 
Cherry picked grpc patches fixing glibc 2.30 symbol conflict.

The function name gettid() causes a linking error with glibc 2.30. The
two upstream commits below are cherry-picked to fix this issue in the
bundle.

- https://github.com/grpc/grpc/commit/de6255941a5e1c2fb2d50e57f84e38c09f45023d
- https://github.com/grpc/grpc/commit/57586a1ca7f17b1916aed3dea4ff8de872dbf853

20 months agoAdded support for multiple event loop / IO threads for libev.
Benjamin Mahler [Tue, 12 Jan 2021 18:17:53 +0000 (13:17 -0500)] 
Added support for multiple event loop / IO threads for libev.

The current approach to I/O in libprocess, with a single thread
performing all of the I/O polling and I/O syscalls, cannot keep
up with the I/O load on massive scale mesos clusters (which use
libev rather than libevent).

This adds support via a LIBPROCESS_LIBEV_NUM_IO_THREADS env variable
for configuring the number of threads running libev event loops,
which allows users to spread the IO load across multiple threads.
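
Purely as an illustration of reading such a setting, the sketch below
parses the variable with a fallback. The helper names, the default of 1,
and the upper bound of 1024 are all assumptions, not libprocess behaviour:

```cpp
#include <cstdlib>

// Parses a thread count, returning `fallback` on missing or invalid input.
inline unsigned parseThreadCount(const char* value, unsigned fallback) {
  if (value == nullptr || *value == '\0') {
    return fallback;
  }
  char* end = nullptr;
  const unsigned long parsed = std::strtoul(value, &end, 10);
  // Reject trailing garbage, zero, and implausibly large values
  // (the latter also catches negative input wrapped by strtoul).
  if (*end != '\0' || parsed == 0 || parsed > 1024) {
    return fallback;
  }
  return static_cast<unsigned>(parsed);
}

// Reads the variable named in this change, assuming a default of 1.
inline unsigned ioThreadsFromEnvironment() {
  return parseThreadCount(std::getenv("LIBPROCESS_LIBEV_NUM_IO_THREADS"), 1);
}
```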

Review: https://reviews.apache.org/r/73136

20 months agoAvoid using the libev default loop.
Benjamin Mahler [Tue, 12 Jan 2021 19:05:26 +0000 (14:05 -0500)] 
Avoid using the libev default loop.

The default loop in libev supports child process events, which we
don't want and explicitly ignore. We can just use dynamic loops
instead which allows us to properly destroy the loop when libprocess
finalizes.

This also helps minimize the changes needed once we add support for
multiple loops, since those need to be dynamic loops as well.

Review: https://reviews.apache.org/r/73137

20 months agoRemoved unused io watcher queue from libev.cpp.
Benjamin Mahler [Tue, 5 Jan 2021 20:12:30 +0000 (15:12 -0500)] 
Removed unused io watcher queue from libev.cpp.

The TODO removed in this change was implemented long ago, and hence
the io watchers queue is no longer used.

Review: https://reviews.apache.org/r/73135

20 months agoFixed agent reregistration and marking as unreachable race.
Ilya Pronin [Tue, 12 Jan 2021 23:09:21 +0000 (18:09 -0500)] 
Fixed agent reregistration and marking as unreachable race.

During master failover if agent reregistration runs concurrently with
marking the agent as unreachable and finishes before the MarkUnreachable
operation is complete, the assertion that the agent is in the recovered
set in Master::_markUnreachable() doesn't hold. This is because, after
readmitting the agent, the master removes it from the recovered set in
Master::__reregisterSlave().

We can fix this by ignoring agent reregistration requests while a
marking unreachable operation is in progress, similarly to how we do it
for marking gone. Once the marking operation is complete, the agent will
be able to reregister as usual.

Review: https://reviews.apache.org/r/73131/
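
The fix described above can be pictured with a toy model. This is a sketch
under stated assumptions: `ToyMaster` and all of its members are hypothetical
names invented for this example and do not mirror the real `Master` class. It
only shows why ignoring reregistration during an in-flight marking operation
keeps the agent in the recovered set until that operation completes.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the master's bookkeeping.
class ToyMaster
{
public:
  void addRecovered(const std::string& agentId)
  {
    recovered.insert(agentId);
  }

  void startMarkingUnreachable(const std::string& agentId)
  {
    markingUnreachable.insert(agentId);
  }

  // Completes the marking operation. Returns true if the agent was
  // still in the recovered set, i.e. the invariant held.
  bool finishMarkingUnreachable(const std::string& agentId)
  {
    const bool stillRecovered = recovered.erase(agentId) > 0;
    markingUnreachable.erase(agentId);
    return stillRecovered;
  }

  // Returns false when the request is ignored because a marking
  // operation is in progress for this agent.
  bool reregister(const std::string& agentId)
  {
    if (markingUnreachable.count(agentId) > 0) {
      return false;
    }

    // Readmitting the agent removes it from the recovered set.
    recovered.erase(agentId);
    return true;
  }

  bool isRecovered(const std::string& agentId) const
  {
    return recovered.count(agentId) > 0;
  }

private:
  std::set<std::string> recovered;
  std::set<std::string> markingUnreachable;
};
```

Without the early return in `reregister`, a reregistration that lands between
`startMarkingUnreachable` and `finishMarkingUnreachable` would empty the
recovered set first, and the invariant check would fail.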

21 months agoUpdate ProcessTest.Remote
Abdul Qadeer [Fri, 4 Dec 2020 00:43:06 +0000 (16:43 -0800)] 
Update ProcessTest.Remote

21 months agoAdd recipient address in Host header field
Abdul Qadeer [Mon, 23 Nov 2020 22:30:55 +0000 (14:30 -0800)] 
Add recipient address in Host header field

22 months agoUpdated `building.md` for the 1.11.0 release.
Andrei Sekretenko [Tue, 24 Nov 2020 15:40:17 +0000 (16:40 +0100)] 
Updated `building.md` for the 1.11.0 release.

22 months agoMade the `latest_stable` in the releases list point to 1.11.0.
Andrei Sekretenko [Tue, 24 Nov 2020 15:36:16 +0000 (16:36 +0100)] 
Made the `latest_stable` in the releases list point to 1.11.0.

22 months agoLinked the 1.10.0 release blogpost to the release list.
Andrei Sekretenko [Tue, 24 Nov 2020 15:30:51 +0000 (16:30 +0100)] 
Linked the 1.10.0 release blogpost to the release list.

22 months agoAdded the `--offer_constraints_re2_max*` flags to the documentation.
Andrei Sekretenko [Mon, 16 Nov 2020 15:42:11 +0000 (16:42 +0100)] 
Added the `--offer_constraints_re2_max*` flags to the documentation.

22 months agoUpdated CHANGELOG for 1.11.0.
Andrei Sekretenko [Fri, 13 Nov 2020 21:21:52 +0000 (22:21 +0100)] 
Updated CHANGELOG for 1.11.0.

22 months agoAdd aventer to the company list
Andreas Peters [Sat, 7 Nov 2020 14:45:11 +0000 (15:45 +0100)] 
Add aventer to the company list

22 months agoAdd mishmash io to powered-by-mesos.md
ntsvetanov [Thu, 8 Oct 2020 12:14:36 +0000 (15:14 +0300)] 
Add mishmash io to powered-by-mesos.md

22 months agoDocumented setting offer constraints via the scheduler API.
Andrei Sekretenko [Tue, 3 Nov 2020 20:19:54 +0000 (21:19 +0100)] 
Documented setting offer constraints via the scheduler API.

Review: https://reviews.apache.org/r/73004

23 months agoAdded validation that offer constraints are set only for existing roles.
Andrei Sekretenko [Mon, 12 Oct 2020 20:06:51 +0000 (22:06 +0200)] 
Added validation that offer constraints are set only for existing roles.

This patch makes SUBSCRIBE/UPDATE_FRAMEWORK calls validate that
the framework does not specify offer constraints for roles to which
it is not going to be subscribed.

Review: https://reviews.apache.org/r/72956
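
The check described above amounts to a set difference between the roles that
carry offer constraints and the roles the framework subscribes to. The
following is a minimal sketch of that idea. The function name and the use of
plain string sets are assumptions made for this example, while the actual
validation operates on the Mesos protobuf messages.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the roles that have offer constraints
// but are not among the roles the framework subscribes to. An empty
// result means the constraints pass this check.
inline std::vector<std::string> rolesWithUnexpectedConstraints(
    const std::set<std::string>& subscribedRoles,
    const std::set<std::string>& constrainedRoles)
{
  std::vector<std::string> invalid;

  // std::set iterates in sorted order, as std::set_difference requires.
  std::set_difference(
      constrainedRoles.begin(),
      constrainedRoles.end(),
      subscribedRoles.begin(),
      subscribedRoles.end(),
      std::back_inserter(invalid));

  return invalid;
}
```

A caller would reject the SUBSCRIBE or UPDATE_FRAMEWORK call whenever the
returned list is non-empty.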

23 months agoConsolidated creation and validation of `allocator::Framework` options.
Andrei Sekretenko [Mon, 12 Oct 2020 18:47:01 +0000 (20:47 +0200)] 
Consolidated creation and validation of `allocator::Framework` options.

This merges three near-identical pieces of scattered code in SUBSCRIBE
and UPDATE_FRAMEWORK execution paths in the Master that validate
and construct parts of `allocator::FrameworkOptions` (the set of
suppressed roles and the offer constraints filter) into a single
function.

This is a prerequisite to adding validation of offer constraint roles.

Review: https://reviews.apache.org/r/72955

23 months agoMoved failover timeout validation to stateless FrameworkInfo validation.
Andrei Sekretenko [Wed, 14 Oct 2020 15:46:46 +0000 (17:46 +0200)] 
Moved failover timeout validation to stateless FrameworkInfo validation.

This turns the validation of the failover timeout in `FrameworkInfo`
into a part of `validation::framework::validate()` that performs
all the other validations that depend on `FrameworkInfo` only.

Review: https://reviews.apache.org/r/72964

23 months agoUpdated Postgres URL in CentOS 6 Dockerfile.
Benjamin Mahler [Mon, 26 Oct 2020 20:58:35 +0000 (16:58 -0400)] 
Updated Postgres URL in CentOS 6 Dockerfile.

The link was pointing to an rpm package that has since been
replaced on the upstream server.

23 months agoAdded autoreconf version logging to bootstrap script.
Benjamin Mahler [Mon, 26 Oct 2020 20:58:23 +0000 (16:58 -0400)] 
Added autoreconf version logging to bootstrap script.

23 months agoFixed javadoc error `type arguments not allowed here`.
Andrei Sekretenko [Wed, 21 Oct 2020 19:52:49 +0000 (21:52 +0200)] 
Fixed javadoc error `type arguments not allowed here`.

This fixes a javadoc build failure introduced by
c28fd3a93e0d9d9a868aec2380abd1dd338304ef that has been occurring
on platforms that use older versions of javadoc.

23 months agoAdded 1.10.0 release blog post.
Benjamin Mahler [Tue, 20 Oct 2020 19:06:43 +0000 (15:06 -0400)] 
Added 1.10.0 release blog post.

23 months agoFixed the tests warning messages.
Dong Zhu [Fri, 16 Oct 2020 17:31:54 +0000 (13:31 -0400)] 
Fixed the tests warning messages.

Removed unnecessary code that led to the following warning
messages while running tests:

...mesos/build/src/colors.sh: No such file or directory
...mesos/build/src/atexit.sh: No such file or directory

Review: https://reviews.apache.org/r/72709/

23 months agoUpdated Mesos version to 1.12.0.
Andrei Sekretenko [Thu, 15 Oct 2020 14:19:46 +0000 (16:19 +0200)] 
Updated Mesos version to 1.12.0.

23 months agoAdded suppressed roles to `allocator/offer_constraints_debug` endpoint.
Andrei Sekretenko [Mon, 12 Oct 2020 13:42:50 +0000 (15:42 +0200)] 
Added suppressed roles to `allocator/offer_constraints_debug` endpoint.

This simplifies debugging frameworks that use offer constraints.

For example, in case a framework is not receiving offers for a role,
getting a suppressed roles snapshot simultaneously with agent
filtering results helps to figure out whether the framework is
mis-specifying offer constraints or is just wrongly suppressing a role.

Review: https://reviews.apache.org/r/72954

23 months agoMade the offer constraints filter non-optional inside the allocator.
Andrei Sekretenko [Mon, 12 Oct 2020 13:42:50 +0000 (15:42 +0200)] 
Made the offer constraints filter non-optional inside the allocator.

Now that the master always constructs an offer constraints filter
for a framework (potentially a no-op one) and always passes this filter
into an allocator, storing the filter as an `Option` inside the
hierarchical allocator is no longer necessary.

Review: https://reviews.apache.org/r/72953

23 months agoIgnored the directory `/dev/nvidia-caps` when globbing Nvidia GPU devices.
Qian Zhang [Sat, 10 Oct 2020 07:04:57 +0000 (15:04 +0800)] 
Ignored the directory `/dev/nvidia-caps` when globbing Nvidia GPU devices.

The directory `/dev/nvidia-caps` was introduced in CUDA 11.0; we just
ignore it since we only care about the Nvidia GPU device files.

Review: https://reviews.apache.org/r/72945
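
To illustrate the filtering described above, here is a minimal sketch that
drops `/dev/nvidia-caps` from a list of already-globbed paths. The function
name and the exact-path comparison are assumptions made for this example. The
commit message does not say how the real code identifies the directory, so it
may well use a different test.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: keeps every globbed path except the
// `/dev/nvidia-caps` directory named in the commit message.
inline std::vector<std::string> gpuDeviceFiles(
    const std::vector<std::string>& globbed)
{
  std::vector<std::string> result;

  for (const std::string& path : globbed) {
    if (path == "/dev/nvidia-caps") {
      continue; // A directory, not a GPU device file.
    }

    result.push_back(path);
  }

  return result;
}
```

The input order is preserved, so callers that rely on the glob ordering see
the same sequence minus the ignored entry.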

23 months agoCorrected the example of the managed CSI plugin.
Qian Zhang [Tue, 13 Oct 2020 01:58:44 +0000 (09:58 +0800)] 
Corrected the example of the managed CSI plugin.

Review: https://reviews.apache.org/r/72846

23 months agoAdded doc for the `volume/csi` isolator.
Qian Zhang [Wed, 9 Sep 2020 02:26:52 +0000 (10:26 +0800)] 
Added doc for the `volume/csi` isolator.

Review: https://reviews.apache.org/r/72845

23 months agoRe-added the obsolete `updateFramework` signature into libmesos-java.so.
Andrei Sekretenko [Thu, 1 Oct 2020 12:06:59 +0000 (14:06 +0200)] 
Re-added the obsolete `updateFramework` signature into libmesos-java.so.

This patch converts the implementation of the obsolete 2-parameter
`SchedulerDriver.updateFramework(frameworkInfo, suppressedRoles)`
from a wrapper around the new signature back into a JNI method
that 930c7e98d17e71192dae1d49b4b2217cc2dbd8b2 attempted to remove.

This is needed to keep compatibility between older versions of
`mesos.jar` and newer versions of `libmesos-java.so`.

Review: https://reviews.apache.org/r/72922

2 years agoInferred CSI volume's `readonly` field from volume mode.
Qian Zhang [Sat, 19 Sep 2020 03:11:04 +0000 (11:11 +0800)] 
Inferred CSI volume's `readonly` field from volume mode.

Review: https://reviews.apache.org/r/72888