Deployed ded618c with MkDocs version: 1.5.3
cartalla committed May 8, 2024
1 parent bf6bf77 commit 0f412f5
Showing 17 changed files with 163 additions and 163 deletions.
14 changes: 7 additions & 7 deletions CONTRIBUTING/index.html
</div></div>
<div class="col-md-9" role="main">

<h1 id="contributing-guidelines">Contributing Guidelines</h1>
<p>Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
documentation, we greatly value feedback and contributions from our community.</p>
<p>Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
information to effectively respond to your bug report or contribution.</p>
<h2 id="reporting-bugsfeature-requests">Reporting Bugs/Feature Requests</h2>
<p>We welcome you to use the GitHub issue tracker to report bugs or suggest features.</p>
<p>When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:</p>
<li>Any modifications you've made relevant to the bug</li>
<li>Anything unusual about your environment or deployment</li>
</ul>
<h2 id="contributing-via-pull-requests">Contributing via Pull Requests</h2>
<p>Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:</p>
<ol>
<li>You are working against the latest source on the <em>main</em> branch.</li>
</ol>
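<p>A typical workflow for the steps above might look like the following; the repository URL and branch name are placeholders:</p>
<pre><code>git clone https://github.com/&lt;your-fork&gt;/&lt;repo&gt;.git
cd &lt;repo&gt;
git checkout -b my-fix        # work on a dedicated branch
# ... make focused changes ...
git commit -am "Describe your change"
git push origin my-fix        # then open a pull request from your fork
</code></pre>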
<p>GitHub provides additional documentation on <a href="https://help.github.com/articles/fork-a-repo/">forking a repository</a> and
<a href="https://help.github.com/articles/creating-a-pull-request/">creating a pull request</a>.</p>
<h2 id="finding-contributions-to-work-on">Finding contributions to work on</h2>
<p>Looking at the existing issues is a great way to find something to contribute to. Because our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.</p>
<h2 id="code-of-conduct">Code of Conduct</h2>
<p>This project has adopted the <a href="https://aws.github.io/code-of-conduct">Amazon Open Source Code of Conduct</a>.
For more information see the <a href="https://aws.github.io/code-of-conduct-faq">Code of Conduct FAQ</a> or contact
[email protected] with any additional questions or comments.</p>
<h2 id="security-issue-notifications">Security issue notifications</h2>
<p>If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our <a href="http://aws.amazon.com/security/vulnerability-reporting/">vulnerability reporting page</a>. Please do <strong>not</strong> create a public GitHub issue.</p>
<h2 id="licensing">Licensing</h2>
<p>See the <a href="LICENSE">LICENSE</a> file for our project's licensing. We will ask you to confirm the licensing of your contribution.</p></div>
</div>
</div>
176 changes: 88 additions & 88 deletions config/index.html

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions custom-amis/index.html
</div></div>
<div class="col-md-9" role="main">

<h1 id="custom-amis-for-parallelcluster">Custom AMIs for ParallelCluster</h1>
<p>ParallelCluster supports <a href="https://docs.aws.amazon.com/parallelcluster/latest/ug/building-custom-ami-v3.html">building custom ParallelCluster AMIs for the head and compute nodes</a>. You can specify a custom AMI for the entire cluster (head and compute nodes), or a custom AMI for just the compute nodes.
By default, ParallelCluster uses pre-built AMIs for the OS that you select.
The exception is Rocky 8 and 9, for which ParallelCluster does not provide pre-built AMIs.
There is currently a bug where the stack deletion will fail.
This doesn't mean that the AMI build failed.
Select the stack and delete it manually; it should then delete successfully.</p>
<h2 id="fpga-developer-ami">FPGA Developer AMI</h2>
<p>The build file with <strong>fpga</strong> in the name is based on the FPGA Developer AMI.
The FPGA Developer AMI includes the Xilinx Vivado tools, which can be used free of additional
charge when run on AWS EC2 instances to develop FPGA images that can be run on AWS F1 instances.</p>
<p>First, subscribe to the FPGA Developer AMI in the <a href="https://us-east-1.console.aws.amazon.com/marketplace/home?region=us-east-1#/landing">AWS Marketplace</a>.
There are two versions, one for <a href="https://aws.amazon.com/marketplace/pp/prodview-gimv3gqbpe57k?ref=cns_1clkPro">CentOS 7</a> and the other for <a href="https://aws.amazon.com/marketplace/pp/prodview-iehshpgi7hcjg?ref=cns_1clkPro">Amazon Linux 2</a>.</p>
<h2 id="deploy-or-update-the-cluster">Deploy or update the Cluster</h2>
<p>After the AMI is built, add it to the config and create or update your cluster to use the AMI.
You can set the AMI for the compute and head nodes using <strong>slurm/ParallelClusterConfig/Os/CustomAmi</strong> and for the compute nodes only using <strong>slurm/ParallelClusterConfig/ComputeNodeAmi</strong>.</p>
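<p>As a sketch, the relevant config fragment might look like the following; the AMI IDs are placeholders, and the exact nesting should be verified against the project's config schema:</p>
<pre><code>slurm:
  ParallelClusterConfig:
    Os:
      CustomAmi: ami-0123456789abcdef0    # head and compute nodes (placeholder)
    ComputeNodeAmi: ami-0abcdef123456789  # compute nodes only (placeholder)
</code></pre>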
<p><strong>Note</strong>: You cannot update the OS of the cluster or the AMI of the head node. If they need to change then you will need to create a new cluster.</p>
12 changes: 6 additions & 6 deletions debug/index.html
</div></div>
<div class="col-md-9" role="main">

<h1 id="debug">Debug</h1>
<p>For ParallelCluster and Slurm issues, refer to the official <a href="https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting-v3.html">AWS ParallelCluster Troubleshooting documentation</a>.</p>
<h2 id="slurm-head-node">Slurm Head Node</h2>
<p>If Slurm commands hang, then it's likely a problem with the Slurm controller.</p>
<p>Connect to the head node from the EC2 console using SSM Manager or ssh and switch to the root user.</p>
<p><code>sudo su</code></p>
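<p>As a starting point, check the controller daemon and its logs; the log locations below are typical but may vary with your cluster configuration:</p>
<pre><code>systemctl status slurmctld
tail -100 /var/log/slurmctld.log
grep -i error /var/log/cloud-init.log /var/log/cloud-init-output.log
</code></pre>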
<p>Then you can run slurmctld:</p>
<pre><code>slurmctld -D -vvvvv
</code></pre>
<h2 id="compute-nodes">Compute Nodes</h2>
<p>If there are problems with the compute nodes, connect to them using SSM Manager.</p>
<p>Check for cloud-init errors the same way as for the slurmctld instance.
The compute nodes do not run ansible at boot; their AMIs are preconfigured using ansible when the AMI is built.</p>
<p>Also check the <code>slurmd.log</code>.</p>
<p>Check that the slurm daemon is running.</p>
<p><code>systemctl status slurmd</code></p>
<h3 id="log-files">Log Files</h3>
<table>
<thead>
<tr>
</tr>
</tbody>
</table>
<h2 id="job-stuck-in-pending-state">Job Stuck in Pending State</h2>
<p>You can use scontrol to get detailed information about a job.</p>
<pre><code>scontrol show job *jobid*
</code></pre>
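<p>The <strong>Reason</strong> field in the scontrol output usually explains why the job is pending. A shorter summary can be printed with squeue, for example:</p>
<pre><code>squeue -j *jobid* -O jobid,statecompact,reasonlist
</code></pre>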
<h2 id="job-stuck-in-completing-state">Job Stuck in Completing State</h2>
<p>When a node starts, it reports its number of cores and free memory to the controller.
If the memory is less than the value in slurm_node.conf, then the controller will mark the node
as invalid.
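<p>To compare the memory a node reported against what Slurm expects, and to return a drained node to service, commands like these can help:</p>
<pre><code>scontrol show node *nodename* | grep -E 'RealMemory|FreeMem|State|Reason'
scontrol update NodeName=*nodename* State=RESUME
</code></pre>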
2 changes: 1 addition & 1 deletion delete-cluster/index.html
</div></div>
<div class="col-md-9" role="main">

<h1 id="delete-cluster">Delete Cluster</h1>
<p>To delete the cluster, all you need to do is delete the configuration CloudFormation stack.
This will delete the ParallelCluster cluster and all of the configuration resources.</p>
<p>If you specified RESEnvironmentName, then it will also deconfigure the creation of <code>users_groups.json</code> and the VDI
Expand Down
20 changes: 10 additions & 10 deletions deploy-parallel-cluster/index.html
</div></div>
<div class="col-md-9" role="main">

<h1 id="deploy-aws-parallelcluster">Deploy AWS ParallelCluster</h1>
<p>A ParallelCluster configuration will be generated and used to create a ParallelCluster Slurm cluster.
The first supported ParallelCluster version is 3.6.0.
Version 3.7.0 is the recommended minimum because it supports compute node weights proportional to instance type
cost, so that the least expensive instance types that meet job requirements are used.
The current latest version is 3.8.0.</p>
<h2 id="prerequisites">Prerequisites</h2>
<p>See the <a href="../deployment-prerequisites/">Deployment Prerequisites</a> page.</p>
<h3 id="create-parallelcluster-ui-optional-but-recommended">Create ParallelCluster UI (optional but recommended)</h3>
<p>It is highly recommended to create a ParallelCluster UI to manage your ParallelCluster clusters.
A different UI is required for each version of ParallelCluster that you are using.
The versions are listed in the <a href="https://docs.aws.amazon.com/parallelcluster/latest/ug/document_history.html">ParallelCluster Release Notes</a>.
The minimum required version is 3.6.0, which adds support for RHEL 8 and increases the number of allowed queues and compute resources.
The suggested version is at least 3.7.0 because it adds configurable compute node weights, which we use to prioritize the selection of
compute nodes by their cost.</p>
<p>The instructions are in the <a href="https://docs.aws.amazon.com/parallelcluster/latest/ug/install-pcui-v3.html">ParallelCluster User Guide</a>.</p>
<h3 id="create-parallelcluster-slurm-database">Create ParallelCluster Slurm Database</h3>
<p>The Slurm database is required for configuring Slurm accounts, users, groups, and fair-share scheduling.
If you need these and other features, then you will need to create a ParallelCluster Slurm database.
You do not need to create a new database for each cluster; multiple clusters can share the same database.
Follow the directions in this <a href="https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials_07_slurm-accounting-v3.html#slurm-accounting-db-stack-v3">ParallelCluster tutorial to configure Slurm accounting</a>.</p>
<h2 id="create-the-cluster">Create the Cluster</h2>
<p>To install the cluster, run the install script. You can override some parameters in the config file
with command line arguments; however, it is better to specify all of the parameters in the config file.</p>
<pre><code>./install.sh --config-file &lt;config-file&gt; --cdk-cmd create
</code></pre>
<p>This will create the ParallelCluster configuration file, store it in S3, and then use a Lambda function to create the cluster.</p>
<p>If you look in CloudFormation you will see 2 new stacks when deployment is finished.
The first is the configuration stack and the second is the cluster.</p>
<h2 id="create-users_groupsjson">Create users_groups.json</h2>
<p>Before you can use the cluster, you must configure the Linux users and groups for the head and compute nodes.
One way to do that would be to join the cluster to your domain.
But joining each compute node to a domain effectively creates a distributed denial of service (DDoS) attack on the domain controller
<p>Now the cluster is ready to be used by sshing into the head node or a login node, if you configured one.</p>
<p>If you configured extra file systems for the cluster that contain the users' home directories, then they should be able to ssh
in with their own ssh keys.</p>
<h2 id="configure-submission-hosts-to-use-the-cluster">Configure submission hosts to use the cluster</h2>
<p>ParallelCluster was built assuming that users would ssh into the head node or login nodes to execute Slurm commands.
This can be undesirable for a number of reasons.
First, users shouldn't be given ssh access to critical infrastructure like the cluster head node.
It also configures the modulefile that sets up the environment to use the slurm cluster.</p>
<p>The clusters have been configured so that a submission host can use more than one cluster by simply changing the modulefile that is loaded.</p>
<p>On the submission host, open a new shell, load the modulefile for your cluster, and you can access Slurm.</p>
<h2 id="customize-the-compute-node-ami">Customize the compute node AMI</h2>
<p>The easiest way to create a custom AMI is to find the default ParallelCluster AMI in the UI.
Create an instance using the AMI and make whatever customizations you require, such as installing packages and
configuring users and groups.</p>
ComputeNodeAmi: ami-0fdb972bda05d2932
</code></pre>
<p>Then update your aws-eda-slurm-cluster stack by running the install script again.</p>
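<p>Assuming the install script accepts the same arguments as the create command shown earlier, the update would look something like this (the <code>update</code> value for <code>--cdk-cmd</code> is an assumption based on the <code>create</code> example above):</p>
<pre><code>./install.sh --config-file &lt;config-file&gt; --cdk-cmd update
</code></pre>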
<h2 id="run-your-first-job">Run Your First Job<a class="headerlink" href="#run-your-first-job" title="Permanent link"></a></h2>
<h2 id="run-your-first-job">Run Your First Job</h2>
<p>Run the following command in a shell to configure your environment to use your Slurm cluster.</p>
<pre><code>module load {{ClusterName}}
</code></pre>
<p>To open an interactive shell on a Slurm node:</p>
<pre><code>srun --pty /bin/bash
</code></pre>
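<p>To submit a simple batch job and check its status (the options shown are illustrative):</p>
<pre><code>sbatch --job-name=hello --ntasks=1 --time=5 --wrap "hostname"
squeue
</code></pre>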
<h2 id="slurm-documentation">Slurm Documentation</h2>
<p><a href="https://slurm.schedmd.com">https://slurm.schedmd.com</a></p></div>
</div>
</div>