Knowledge gained from Projects

CATcher:

ARIF KHALID
MISRA ADITYA
NEREUS NG WEI BIN
NGUYEN KHOI NGUYEN

MarkBind:

EYO KAI WEN, KEVIN
LAM JIU FONG
WANG JINGTING
WANG YIWEN
XU SHUYAO

RepoSense:

ALVIS NG
GEORGE TAY QUAN YAO
JONAS ONG SI WEI
POON YIP HANG, RYAN

TEAMMATES:

CHING MING YUAN
DOMINIC BERZIN CHUA WAY GIN
TYE JIA JUN, MARQUES
XENOS FIORENZO ANONG
YEO DI SHENG
ZHU YUANXI

CATcher

ARIF KHALID

Angular

In order to work on CATcher and WATcher, I had to learn how to use Angular. With a background in react, it was a difficult transition due to the added checks and strict nature of Angular.
Below are a few of my learning points:

Components:
- Each component consists of 4 different files, each of them critical to know. Logic can be contained in either typescript of html component files and you initialise other components through the HTML rather than the typescript file
- Components also have a module file which is where its dependencies are stated, i.e., the other components, services, modules it depends on
Services:
- Each service is like a component but without anything to display. They perform functions that could be contained within components but are extracted out to increase modularity and reusability
- Like components, services can depend on other services and are often injected into components as dependencies
Modules:
- Modules are containers for a dedicated group of files consisting of components, services or other modules
- Each module conventionally contains all the code pertaining to a certain feature
- The root module thus contains all code in the code base, child modules under the root module contain more feature-specific code in a hierarchial structure
- Modules are critical to understand in order to understand the code base and create new features
RxJS
- While not exactly part of angular, it is important in learning angular as they are often if not always used in tandem
- RxJS is a library that allows reactive programming, i.e., the ability to subscribe to changes instead of polling for a change
- This makes it easier to compose asynchronous and cleaner, more optimized code using observer pattern
- Observers are a very useful tool that allows me to react to changes by subscribing to an event. This contributes to cleaner, more optimised and reusable code.
- Pipes allow you to consevutively call functions on the prior function's output, similar to function chaining. This allows us to have cleaner and reusable and more understandable code since you don't need to call functions separately and you can create functions out of a chain of other functions easily.
- Not to be confused with angular pipes which run via the "|" symbol in the html file, allowing you to transform data before it is displayed to the user.

I learned Angular through various Youtube tutorials, Udemy tutorials, reading the documentation and trying out different things through personal test projects venturing into Angular.

Youtube taught me basic fundamentals of Angular.
Udemy taught me more in depth and guided me through small personal projects.
The documentation gave me deeper understanding and insight into details not covered in tutorials

TypeScript

Angular uses TypeScript, so I needed to learn TypeScript. I had only a background in JavaScript while working with React and learning TypeScript had its own difficulties. Below are a few of my learning points:

What and Why TypeScript:
- TypeScript acts as a wrapper over JavaScript, compiling into JavaScript code behind the scenes when you build your project
- The reason people use TypeScript is because of the increased strictness where things have to be statically typed. This reduces the occurences of bugs and makes bugs easier to find when they do occur
- This makes TypeScript an extremely useful language to pick up and is used widely in industry
Types:
- As in the name, typescript has types and almost everything is required to by statically typed
- The "any" type bypassed this requirement but is generally regarded as a bad practice as you have made TypeScript into JavaScript
- You can define your own types and use those types, similar to a typedef in other languages. This is often how objects are passed in TypeScript

I learned TypeScript through Youtube tutorials

Youtube taught me the fundamentals as well as understanding the why and underlying implementation of typescript

Continuous Integration/Continuous Deployment (CI/CD)

As an area I have litte experience in, I wanted to dive into the CI/CD pipeline of CATcher and WATcher, gain an understanding of how it works and contribute to make it better. Below are a few of my learning points:

Automated testing
- With large projects like CATcher and WATcher, there are many areas that can and unavoidably will go wrong with many contributors editing different parts of the code base
- Manual testing is very time consuming when there are so many features to test, any one of which could have been broken by any changes to the code
- Human error might also cause us to miss certain bugs as we simply did not test for them
- Automated testing allows for pre-written tests that perform these checks quickly on a headless browser when making any changes, greatly reducing the occurence of uncaught bugs introduced
- Test case design must be comprehensive in positive and negative cases without testing every specific possible input, instead grouping inputs such as all invalid types given into one test case
Continuous deployment
- With mission-critical projects like CATcher, it is imperative to have automated deployment
- One reason is to maintain stability of the deployment, completely negating human errors such as forgetting any one step in deployment. The deployment made is done the same everytime through an automated process
- Another reason is to speed up development as developers will not need to go through the manual deployment on every release
Github Actions
- Github actions is a very useful CI/CD tool when the code is already hosted on github
- Compared to alternatives, it is much simpler to set up as it is one click away for every github repo, create a workflow yml file and thats it
- There are many pre-defined actions such as actions/checkout that you can use to simplify your dev-ops. In this case you don't need to write your own code to checkout your repository
Angular deployment
- Angular has a package that allows you to build directly into your github pages
- This simplifies the process further since you simply call this command through the github actions for an immediate deployment

I learned CI/CD through inspecting the code base, trying out different workflows in my own repos and youtube tutorials

The code base gave me a guideline as to the proper way and usage of workflows, along with the proper syntax of creating a workflow
Youtube gave me broader knowledge into creating my own workflows not specific to the CATcher project
Trying out creating my own workflows and contributing to WATcher workflows solidified my understanding and gave me confidence in what I learned

Code Quality

Code quality is always important but is especially so when there are so many people working on the same project. Since large portions of WATcher was copied from CATcher, WATcher was made overtly large with a great number of redundant code. It was very poor code quality and the importance of code quality was made clear.

Code Cleanliness
- Redundant code clutters the code base, making it especially hard to understand certain functionality since you have to sift through so much to find what you are looking for
- As a new developer, it created an unecessarily difficult experience getting a grasp of the code base
- Over reliance on comments also clutters the code base when code should be self-explanatory
- Over three levels of indentation should be avoided, at which point the code is made very hard to understand and inner indents should be refactored into separate functions
Code simplicity (KISS)
- There are many ways to do the same thing and it is always best to Keep It Simple
- Always use the simplest way to come to the same outcome, even if they use unecessary variables
- Variables and functions should be aptly named so they are understood readily such as a filteredData variable for storing data after it has been filtered
- Since code is read more than it is written, keeping it simple allows future developers, even yourself to understand the purpose and reason behind any piece of code
Documentation
- Documentation is important to help others understand parts of the code that are not immediately apparent
- However, it is important to not rely too heavily on documentation and wherever possible, code you write should be self-explanatory
- Instead of writing a one-liner that does everything, split logically linked portions into separate parts, using different functions or storing outputs in appropriately named variables
Following coding style
- Assuming you are not the originator of the project, you need to follow the coding style of the project as well
- Since there are always multiple ways of doing the same thing, it is often arguable which way is the best. When joining an already established project, it is critical to follow the coding style of your predecessors
- An example would be returning a complete object instead of a part of an object and appending to a newly created object in the parent function. Both accomplish the same thing and arguably are equally understandable

I learned about code quality through analysing the responses of seniors to my own pull requests as well as other's pull requests, supplementing my knowledge by reading articles on code quality both generally and specific to web development

Inspection of pull requests gave me understanding of what is good quality code and what is considered bad along with the reasoning behind those decisions
Articles online provided me with more general guidelines pertaining to code quality in large projects, helping fill in the gaps that I didnt encounter in PR reviews

Testing

Testing is another important part of any project as it reduces the occurrence of major errors and bugs throughout development. With little prior experience in testing, I sought to learn more about it and apply it in WATcher.

Jasmine
- A testing framework for javascript
- Clean and intuitive syntax
- Suite of functionality developed over many years
- I learned Jasmine through looking through test cases in CATcher and WATcher, along with reading its official documentation
  - describe(string, function) houses related specs labeled by string and defined as function
  - it(string, function) defines a spec labeled by string and defined as function
  - expect(any).toBeFalse defines an expectation of any. There are a large number of matchers for any possible comparison
  - beforeEach(function) defines a function to be called before each of the specs in this describe block
  - createSpyObj(string, string[]) creates a spy object that acts as a stub for classes that are depended on by what is being tested. Spies can track calls to it and all arguments
Test case design
- Boundary Value Analysis and equivalence partitioning
  - Boundary value analysis is a technique where tests are designed to include representatives of boundary values in a range
  - Equivalence partitioning is a technique where input data is partitioned into units of equivalent data for which tests can be written
  - These techniques allow for a smaller number of tests to be written, for essentially the same amount of coverage
    - This is because inputs which would fail/pass for the same reason, such as being an input of an invalid type, are grouped as a single or only a few test cases.
    - The alternative would be to create tests for each input type in this example, straining developer resources for not much benefit
- Testing for behaviour
  - A common mistake is to test for implementation rather than behaviour
  - This would result in failed test cases when implementation changes even though the resulting behaviour, what the user would experience, remains the same
  - Test cases should test for what the result is versus what the implementation is
  - An example would be testing whether a variable changes in component A correctly vs testing what other components receive from component A after the change
  - A developer might edit the implementation of component A so the variable no longer changes, however the accurate behaviour of emission to other components remains the same and the test cases should not fail
- Testing coverage
  - Test coverage is how much of the code has actually been ran through during testing
    - Function/method coverage : based on functions executed e.g., testing executed 90 out of 100 functions
    - Statement coverage : based on the number of lines of code executed e.g., testing executed 23k out of 25k LOC
    - Decision/branch coverage : based on the decision points exercised e.g., an if statement evaluated to both true and false with separate test cases during testing is considered 'covered'
    - Condition coverage : based on the boolean sub-expressions, each evaluated to both true and false with different test cases
  - A good future implementation would be to implement code coverage as a github action report when making pull requests to main
  - At the very least, all public functions of a class should be uniquely tested in order to verify behaviour seen by other components I learned about testing web applications through Nereus, reading Jasmine documentation, articles and youtube videos about testing and the CS2113 website
Nereus imparted knowledge of testing which helped me understand the core fundamentals, allowing me to more quickly pick up the technique as I learnt, especially the test case implementation
The Jasmine documentation gave me confidence in creating my own test cases for unique behaviour such as changing routes in testing
Youtube videos, articles and the CS2113 website helped me to understand and implement test case design techniques to create comprehensive and well designed test cases

MISRA ADITYA

Angular

Underpinning the development of CATcher and WATcher, it was of paramount importance to understand the nuances of the Angular framework. This presented a challenge, transitioning from ReactJs - a framework I was comfortable with. The structure of Angular, contrasted with React's flexibility, necessitated a deep and rigorous engagement with Angular's ecosystem.

Transitioning from ReactJs:
- I was initially struck by Angular's comprehensive framework, which unlike ReactJs's straightforward library-based approach, provides a full suite of development tools. Angular mandates a structured environment that rigorously applies TypeScript for static typing, modules for encapsulation, and injectors for dependency management, ensuring robust, scalable applications. This all-inclusive nature required adapting to a relatively complex development environment.
Angular Directives and DOM Manipulation:
- Directives are essentially markers that Angular allows us to attach to elements to influence their behaviour in a specific way.
- These constructs allow for direct DOM manipulation, a capability not native to React. I leveraged structural directives like *ngIf for conditional rendering, and attribute directives to modify behaviors of DOM elements dynamically. This exploration provided practical insights into complex DOM operations without full page reloads, facilitating rich, responsive user interactions.
Form Validation with Angular Validators:
- Linking back to the use of Angular Directives, their coupling with Validators makes for robustness and expressiveness, especially in the case of forms, to allow for proper feedback to be given to the user.
- I learned about the creation and use of custom validators as part of the @angular/forms library, which are a crucial aspect of an application that consists of Form-based components. Angular's form validation is highly robust, integrated deeply with its reactive and template-driven forms, facilitating complex form handling with built-in validators and custom validation functions. In contrast, React forms often require additional libraries for similar levels of validation robustness.
Software Maintenance:
- In a bid to keep the application and its dependencies up-to-date, constant upgrade to newer versions of the tech stack used, is crucial. This also falls in line with our goal to follow best software practices.
- My role extended to maintaining the application's health by upgrading Angular and its ecosystem. This task required a thorough understanding of semantic versioning, dependency conflicts, and the Angular update cycle. React, while flexible, typically requires third-party tools to manage similar tasks, leading to a more hands-on and sometimes fragmented maintenance experience.

Through contributions and an extensive understanding of the codebase, I have attained a certain degree of comfort with Angular as a frontend framework, and will further practice the use of Angular and its features in personal projects.

Docker Integration in WATcher Documentation

Incorporating Docker into the WATcher documentation development process was a strategic move to standardize and streamline the development environment. My involvement with setting up a Dev Container using Docker provided valuable insights into containerization and its impact on development workflows.

Understanding and Implementing Dev Containers:
- Dev Containers provide a consistent, isolated, and replicable development environment for all contributors, regardless of their local setup. This is crucial for eliminating differences observed in working on the development environment on different systems.
- I utilized Docker to encapsulate the build environment into a Docker Image, defined by a Dockerfile. This approach ensures that all dependencies and runtime environments are uniformly configured across different development setups.
Customization and Configuration:
- The use of the VSCode Dev Container as a base allowed for significant customization tailored to the specific needs of the WATcher documentation project. By parameterizing the build process (devcontainer.json), I was able to define and manage configurations such as environment variables, port forwarding, and startup commands efficiently.
- One of the key benefits of implementing Docker was significantly reducing the onboarding time for new developers. By providing a container that includes all necessary dependencies pre-installed and pre-configured, new team members could get up and running with minimal setup.

Working with Docker deepened my understanding of containerization technologies and their role in modern software development. It highlighted the importance of infrastructure as code (IaC) in automating and simplifying development processes. It reinforced best practices in DevOps, particularly in terms of environment standardization.

Linters

As part of maintaining development tools, I worked on migrating the project from using TSLint (which is now deprecated), to ESLint. This helped me understand the true role of linters, and how they are defined for a project.

Understanding Linters:
- As studied in CS2103T, linters are vital tools in modern software development, used for static code analysis to enforce coding standards and identify problematic patterns in the code. My experience with linters deepened during the migration, reinforcing their role in improving code quality, reducing bugs, and maintaining consistency across large codebases.
The Migration Process:
- The migration from TSLint to ESLint involved a strategic review and translation of linting rules to ensure continuity and adherence to our established coding practices. Despite the availability of automated tools like tslint-to-eslint, which attempts to convert configurations, many rules required manual adjustments to align with ESLint’s syntax and capabilities, as well as to not make too many disruptive changes to the codebase.
- This process was meticulous, involving:
  - Rule Assessment: Evaluating each TSLint rule and its impact on our codebase, determining essential rules, and mapping them to their ESLint counterparts.
  - Configuration Translation: Manually configuring ESLint rules where automated tools fell short, ensuring that our new linting setup maintained the integrity and intent of the original rules without compromising on code quality.
  - Testing and Adjustment: Rigorously testing the new ESLint configurations across our projects to identify any discrepancies and adjust rules to better fit our development practices and project specifics.
- ESLint provides comprehensive support for both JavaScript and TypeScript, offering a unified linting solution that reduces complexity and improves analysis accuracy. ESLint’s architecture allows for extensive customization and extension, enabling the integration of plugins that address specific needs such as React-specific linting rules or accessibility checks.

The migration to ESLint has not only streamlined the development environment but also enriched my understanding of effective coding practices.

Jasmine

While developing tests for the ErrorHandlingService and MilestoneService in WATcher, I gained significant insights into Jasmine's powerful features and how they can be leveraged to create thorough, reliable, and maintainable test suites.

Behavior-Driven Development Approach:
- Jasmine's BDD framework encourages a more descriptive and natural language style in writing tests, which aligns well with how software behavior is described in real-world scenarios.
Mocking and Spying:
- Jasmine's mocking and spying capabilities were instrumental in isolating the tests. By creating spy objects for dependencies like MatSnackBar and LoggingService, I could simulate their behavior and assert that they were being called correctly without actually triggering real side effects.
- A spy object was created for GithubService using Jasmine's createSpyObj method, which allowed us to simulate its fetchAllMilestones method without actual HTTP requests. This approach isolates the test environment from external dependencies.
Asynchronous Testing:
- The use of RxJS's of function to return observable sequences makes the method calls predictable and easily testable.
- Jasmine's asynchronous testing capability, demonstrated with the done callback, was crucial. It ensures that tests involving observables only complete after all asynchronous operations have been resolved, providing an accurate assessment of the method's behavior.
Conditional Behavior Testing:
- The handleError() method's conditional logic, which dictates different responses based on the error type, highlighted the importance of comprehensive test paths. I learned to utilize Jasmine's it blocks effectively to specify and isolate each logical branch within the method. This practice ensures that every potential execution path is tested, which is crucial for error handling logic where different types of errors can lead to different user experiences.

This exploration into Jasmine's capabilities not only enhanced my technical skills but also deepened my understanding of strategic test planning and execution in a behavior-driven development context. The experience emphasized the value of detailed, well-structured tests in maintaining high software quality and reliability.

UI/UX Design

A rather inconspicuous but significant learning point, while working on WATcher and CATcher, was UI and UX design. Since the main aim of these applications is to assist students, tutors, professors to understand, contextualise and identify key information with ease, several design decisions had to made from the point of view of how it would most benefit the users.

Some of these included:

#361 Make ItemsPerPage common for all card views
- Implementing a consistent ItemsPerPage filter across different views ensures that users have a predictable and stable interface, improving usability and reducing cognitive load.
#363 Remodel the design of the Filter bar
- Redesigning the filter bar to create a design that is both functionally effective and aesthetically pleasing, requiring a balance between form and function.
#307 Add tool tip for hidden users
- Adding tooltips is a critical aspect of UI design for enhancing user understanding without cluttering the interface, to ensure that they appear in contexts where users most need guidance.
#318 Add sorting by Status
- Understanding of the most logical ways users might want to organize data, enhancing the application's usability.
#337 Add icon for PRs without milestones
- The use of icons to convey information is a staple in UI design, providing visual cues that help users quickly grasp the status of items.

Working on these PRs likely provided a deep dive into the principles of user-centered design, focusing on enhancing the user's journey through intuitive layouts, actionable insights, and consistent behaviors across the application. The challenges often revolved around integrating new features without disrupting the existing user experience, requiring careful consideration of design continuity and user expectations.

NEREUS NG WEI BIN

Angular

Components

Components are the main building blocks for Angular. Each components consists of 3 files:

HTML: Defines the layout of the component's view.
CSS: Defines the component-specific styles.
Typescript: Implements the component's logic and behavior.

Refer to the Angular Documentation for guidelines on creating components.

Attribute Directive

Attribute directives can change the appearance or behavior of DOM elements and Angular components.

For detailed guidance, refer to the Angular Documentation. It provides guidelines on creating and applying attribute directive, covering user events handling and passing values to attribute directive.

In PR #1239, I implemented an attribute directive that listen to click event and will open error snackbar if the target link is an internal link.

`NgTemplateOutlet`

NgTemplateOutlet is a directive in Angular that allows for dynamic rendering of templates. It allows the template to be specified along with the context data that should be injected into it.

I utilized NgTemplateOutlet to dynamically render different headers for the card view component based on current grouping criteria. Refer to CardViewComponenet for implementation details.

Jasmine

Jasmine is a behavior-driven development framewrok specific for JavaScript unit testing.

I primarily learned how to use Jasmine from its documentation. I utilized it extensively while working on WATcher test case refactoring. Some relevant PRs include: PR #241, PR #244, PR #245, PR #246, PR #247

describe: Define a group of spec (suite)
it: Define a single spec.
expect: Create an expectation for a spec.
Class Spy: Mock functions (spies) that can be used to track function calls.

Asynchronous Testing with Observables

When dealing with asynchronous operations like observables, Jasmine provides support through the use of the done function. This allows for effective testing of asynchronous behavior by signaling when a test has completed its execution.

Here's an example from my pull request:

it('should throw error for URL without repo parameter', (done) => {
  const urlWithoutRepo = '/issuesViewer';

  phaseService.setupFromUrl(urlWithoutRepo).subscribe({
    error: (err) => {
      expect(err).toEqual(new Error(ErrorMessageService.invalidUrlMessage()));
      done(); // Signal that the test has completed
    }
  });
});

Resources: Angular — Unit Testing recipes (v2+)

Testing for Behavior

It's essential to test for behavior rather than implementation details. This principle was emphasized by a senior in my pull reqeust. By focusing on behavior, tests become more resilient to changes in the codebase and provide better documentation for how components and functions should be used.

Here's an example that illustrates the difference between testing behavior and implementation:

Context: changeRepositoryIfValid will call changeCurrentRepository if repository is valid.

// Test for behavior
it('should set current repository if repository is valid', async () => {
  githubServiceSpy.isRepositoryPresent.and.returnValue(of(true));

  await phaseService.changeRepositoryIfValid(WATCHER_REPO);

  expect(phaseService.currentRepo).toEqual(WATCHER_REPO);
});

// Test for implementation
it('should call changeRepository method if repository is valid', async () => {
  githubServiceSpy.isRepositoryPresent.and.returnValue(of(true));

  const changeCurrentRepositorySpy = spyOn(phaseService, 'changeCurrentRepository');

  await phaseService.changeRepositoryIfValid(WATCHER_REPO);

  expect(changeCurrentRepositorySpy).toHaveBeenCalledWith(WATCHER_REPO);
});

Design Pattern

Strategy Design Pattern

The Strategy design pattern allows for the selection of algorithms at runtime by defining a family of interchangeable algorithms and encapsulating each one. It enables flexibility and easy extension of functionality without modifying existing code.

I utilized the Strategy Design Pattern to implement a "Group by" feature that organizes issues / prs based on different criteria.

Implementation of group by feature :

Grouping Strategy Interface (GroupingStrategy): Defines a common interface for all supported grouping strategies.
Concrete Grouping Strategy: Each strategy groups the issues/prs based on different criterias.
- AssigneeGroupingStrategy
- MilestoneGroupingStrategy
Context (GroupingContextService): This service is used to apply the grouping strategies based on user selection.

NGUYEN KHOI NGUYEN

Angular

Of course, Angular is the framework used to run CATcher and WATcher, so learning how it works is an essential part of contributing to the projects. These projects are my first experience using Angular.

As I have experienced React.js and Alpine.js, with experience of working in frontend development during my internship, I expected to pick up Angular with ease. However, slightly different from my expectation, the OOP aspect of Angular makes it quite difficult to pick up.

There are a few interesting concepts that I picked up along the way:

Each class is decorated with @Component to mark it as an Angular component. This decorator determines a few important properties of the component, including the query selector, the HTML template and the stylesheets.
Class fields, if used in HTML templates, updates the rendered HTML template if its value is changed. In React.js, this is only possible with a state hook.
Dependencies of a component can be injected from root using factory methods, and does not have to be explicitly instantiated.

The knowledge of how a component is declared allows me to confidently create a new component in WATcher-PR#235, which was the component to show a list of users with 0 PRs and issues.

One interesting thing about Angular is that it provides a few methods that developers can make use of, to reduce the complexity of component class. This knowledge allows me to make WATcher-PR#230, where I directly modified the Angular model used in the HTML template.

RxJS

I initially had a lot of trouble trying to understand the operators in RxJS. Ultimately, I was able to understand how it works, and the differences between different operators on an Observable. I was able to see the similarities between different RxJS operators and Java stream methods.

Observable::pipe allows methods to modify the value within the Observable, notably with map and mergeMap.
Observable::subscribe listens for changes within the Observable.

The knowledge of RxJS operators allow me to modify the underlying processes of the Angular services, and created CATcher-PR#1234, where I set branch for image uploads to main.

One thing to note about RxJS operator is that, Observable pipes are treated as functions, in a sense that they are only called when there is a subscriber. If there are multiple pipes merged into one, each individual pipes are called when there is a subscriber. Consider this example:

Observable a = from(); // some declaration
Observable b = a.pipe(f);
Observable x = b.pipe(g);
Observable y = b.pipe(h);
Observable c = merge(x, y);
c.subscribe();

Notice that c is a merged Observable from pipes of f, g and f, h. So, g and h each are called once, but f is called twice! Imagine if f is a function making multiple API calls to Github.

This knowledge allows me to reduce Github API calls for issues. To get issues from a repository, one must make multiple API calls, each obtaining 100 issues. These API calls are contained within the function f. So instead of splitting the pipe, I refactored to merge g and h and continue the pipe from b:

Observable a = from(); // some declaration
Observable b = a.pipe(f);
Observable c = b.pipe(merge(g, h));
c.subscribe();

MarkBind

EYO KAI WEN, KEVIN

Summary of Key Contributions

Worked on adding Software Project Documentation template to MarkBind, allowing for users to have a starting point for using MarkBind in their project documentation.

Researched into possible integrations of Bun and Bootstrap v5.2 and v5.3 into MarkBind, to determine the value and feasibility of these integrations.

Worked on customizing list icons, such that icons for list items can be customized to apply to the current item only instead of default inheritance to future items.

Worked largely on DevOps side of MarkBind, utilizing GitHub Actions and workflows to handle automation of tasks. These tasks include checking for commit messages in PR descriptions, SEMVER impact labels, reminding adding of new contributors to contributor list.

Researched and implemented the use of DangerJS to automate checking of changes coupling of implementation files to test or documentation files, to ensure that any changes to the repository is properly documented and tested.

Researched into the implementation of automating unassigning of assigned users to issues after a certain period of inactivity.

Researched into common security practices for GitHub Actions, and implementing these practices into the MarkBind repository. These practices are also documented for future contributors to the project.

MarkBind codebase

Learned the underlying workings and how different parts of the codebase are linked together to provide MarkBind's functionalities. From the parser to the renderer, and the different plugins that can be used to extend MarkBind's capabilities. Learned how to implement new features, adding relevant test and documentation to ensure that the codebase is maintainable and modifiable.

GitHub Actions

Learned how GitHub Actions fits into the development workflow, and how to use it to automate tasks. I used the GitHub Actions documentation to learn about the different types of workflows, how to create and configure workflows, and how to use the different actions available.

Resource: GitHub Actions Documentation
Summary: GitHub Actions makes it easy to automate all your software workflows, now with world-class CI/CD. Build, test, and deploy your code right from GitHub. Make code reviews, branch management, and issue triaging work the way you want.
Resource: GitHub Actions Workflow Syntax
Summary: GitHub Actions uses YAML syntax to define the events, jobs, and steps that make up your workflow. You can create a custom workflow or use a starter workflow template to get started.
Resource: GITHUB_TOKEN
Summary: The GITHUB_TOKEN is a GitHub Actions token that is automatically generated by GitHub and used to authenticate in a workflow run. It is scoped to the repository that contains the workflow file, and can be used to perform read and write operations on the repository. It is automatically available to your workflow and does not need to be stored as a secret.

Learned yaml and bash for creation of workflows.

Resource: YAML Syntax
Summary: YAML is a human-readable data serialization standard that can be used in conjunction with all programming languages and is often used to write configuration files. It can also be used in workflows to define the structure of the workflow, including the events, jobs, and steps that make up the workflow.
Resource: Bash Scripting
Summary: Bash is a Unix shell and command language written by Brian Fox for the GNU Project as a free software replacement for the Bourne shell. It has been distributed widely as the shell for the GNU operating system and as a default shell on Linux and OS X.
Resource: Bash Parameter Expansion
Summary: Parameter expansion is a way to manipulate variables in Bash. It is a form of variable substitution that allows for the manipulation of variables in various ways.

Learned how to use other actions in workflows, such as the actions/checkout action to check out a repository onto the runner, allowing subsequent steps to execute operations on the checked-out repository.

Resource: GitHub Marketplace
Summary: The GitHub Marketplace is a collection of actions that can be used in your workflows. You can search for actions by category, language, or other criteria, and use them in your workflows to automate tasks.

Learned how to use DangerJS to aid with workflows.

Resource: DangerJS
Summary: Danger runs during your CI process, and helps with automating common code review chores. This provides another layer of automation over the code review process, ensuring that all changes are properly documented and tested.

When to create new workflows (outside of modifiability) Although keeping multiple jobs within the same workflow file is possible, sometimes it may be better not to. Jobs are run based on event triggers such as pull requests etc, but you must add it to the top. This meant that if you had multiple jobs in the same workflow file, they would all run when the event trigger was activated. If you wanted a trigger to only trigger a specific job, you would need to add a check to exclude all other jobs from that trigger.

Pull request trigger by default has the types opened, synchronize, and reopened.

Testing and debugging workflows This can be done locally with the help of Docker and act.

Benefits of local testing: Fast Feedback - Avoid commit/push every time you want to test out the changes.

Resource: Act Usage
Summary: Act reads in your GitHub Actions from .github/workflows/ and determines the set of actions that need to be run. It uses the Docker API to either pull or build the necessary images, as defined in your workflow files and finally determines the execution path based on the dependencies that were defined. Once it has the execution path, it then uses the Docker API to run containers for each action based on the images prepared earlier.
Resource: Docker Docs
Summary:

Steps (PR):

Download act and Docker.
Start up Docker daemon.
Create a JSON file with the appropriate PR file structure (can use python script to generate it).
Run act pull_request -j specific-job -e pr-event.json to run a specific job on the PR event environment.

Keywords

uses: Can be used to reference an action in the same repository, a public repository, or a published Docker container image. The uses keyword can also be used to reference an action in a private repository by specifying the repository using the repository keyword.

env: It is best to avoid having expressions ${{ }} in run portion of a step. Instead, env allows defining of variables that store the expression.

awk : Can be used to extract a section of body, from a line containing START to a line containing END (inclusive of full line). section=$(echo "$body" | awk '/START/,/END/')

Use of third-party actions

Resource: GitHub Actions Marketplace
Summary: The GitHub Actions Marketplace is a collection of actions that can be used in your workflows. You can search for actions by category, language, or other criteria, and use them in your workflows to automate tasks.

Useful actions:

Action	Description
`actions/checkout@v3`	Check out a repository onto the runner, allowing subsequent steps to execute operations on the checked-out repository, i.e. gaining access to the repository's code.
`actions/github-script@v7`	Run a script in the GitHub Actions environment using the GitHub CLI. Refer to here
`actions/setup-node@v3`	Set up a Node.js environment for use in workflows.
`actions/stale`	Close stale issues and pull requests.
`boundfoxstudios/action-unassign-contributor-after-days-of-inactivity`	Automatically unassign user from issues after a certain period of inactivity.

Extra information about how stale and unassign actions work in the context of MarkBind

The definition of inactivity for the GitHub action is any form of history to the issue, be it labeling, comments or references. The action works such that issues and PRs are treated and checked for inactivity separately. This means that any updates done to a PR regarding this issue, will not reset inactivity for the issue.

How unassign and stale actions work:

Stale action adds Stale label to issue or PR based on inactivity (default 60 days)
Unassign action routinely checks for this Stale label, then checks whether it's been a set amount of days after the Stale label has been added with no other activity (default 7 days)
For issues passing the check before, it un-assigns users and removes Stale label

Reference workflow of real-life example

Solution using unassign and stale actions

Add the Stale label after 6 days and ping a reminder, then have the unassign-contributor-after-days-of-inactivity run 1 day after. It can also only check on issues that are actually assigned to someone, so that theres no redundancy. Limitations:

Any changes in PR regarding issue will not reset inactivity of issue, meaning if discussion and updates are done on the PR instead of the issue, the issue risks being Stale and the user being unassigned despite them actively working on the PR.
It can ping a general reminder (without resetting the inactivity) but it cannot ping the user directly with @username in the reminder due to how the code is written. It is possible to separately ping the user in another comment but that will cause a reset in inactivity. This means slightly lower visibility for the reminder.

Improvements for limitation 1

Building on unassign action, which at some point it might be better off just building our own unassign action for better integration and control Check corresponding PR (requires more implementation) Add additional check before setting Stale label to check if corresponding PR has history. This can be done through checking the list of open PRs and their descriptions whether it mentions the issue. It can also be done through looking at the issue’s history, for PRs that mention it, then checking the history of those PRs. This should be quite manageable since the number of open PRs at any point of time is still relatively low for Markbind’s scale.

Check corresponding issue (requires more implementation) On any activity in PRs, check description to find issues linked to the PR, so activity on PR can be translated to activity on the issue as well by posting a comment on the corresponding issue or something of that nature. This might require checking for a specific issue that has the user assigned to avoid commenting on relevant but not directly linked issues, if the PR has multiple relevant issues. We can also only call it on commits instead of any activity, so as to avoid over-polluting the issue with comments.

Improvements for limitation 2

Ping after unassign Same as before, add Stale level after 6 days, but don't need to ping the user, wait until unassign 1 day after, then ping the user that they have been unassigned and if they are still working on it, ask them to reassign themselves. This would likely fit better with a longer timeline. This solves the visibility problem as it can directly ping the user as resetting the inactivity after the user has been unassigned wouldn't matter.

Implement our own stale action (requires more implementation) Implement a simplified version of stale action that now allows pinging of user before applying the Stale label.

pull_request_target

Due to security reasons, for permissions given to GITHUB_TOKEN, the maximum access for pull requests from public forked repositories is restricted to only read, so it is not possible to add labels since there is no write access. GitHub introduced the pull_request_target event that will bypass this restriction and give the token write access. Pros:

It allows labelling of PR
Increased security as base branch workflows can be trusted, protecting job from running modified and malicious workflows Cons:
It can only run on pull_request events and not pull_request_review events which means need to run on PR merge rather than on PR approved.
This event runs in the context of the base of the pull request, rather than in the context of the merge commit, as the pull_request event does. This could lead to security vulnerabilities if scripts run are not properly checked for malicious code. Can be aided by seeking approval before running the job, refer to change repo settings

Alternative implementations

Workaround Pros: this can allow for still triggering on PR approved Cons: immensely complicates the workflow

Personal Access Token (PAT) Create a PAT with the necessary permissions and add it to your repository's secrets. Then, modify the workflow to use this secret instead of the GITHUB_TOKEN. Pros: this can allow for still triggering on PR approved Cons: exposes your repository to risks if the forked code can access the token

GitHub Actions Security

Resource: Using pull_request_target
Resource: Security and Cheatsheet
Resource: Security Lab

Specific version tags When using third-party actions, pin the version with a commit hash rather than a tag to shield your workflow from potential supply-chain compromise, since tags can be adjusted to a malicious codebase state but commit hashes provide immutability. This can be done by going to the codebase for the specific tag version and looking for the latest commit of the version desired and copying the commit’s full SHA from the url link. Use: uses: someperson/post-issue@f054a8b5c1271c37293245628f1cae047eff08c9 Instead of: uses: someperson/post-issue@v7 Downside is that the updates must be done by updating the commit hash instead of it being done automatically through moving the tag to a new release. This can be solved by using tools like Dependabot or Renovatebot by adding a comment of the version used, enabling automated dependency updates. Tools like StepSecurity can also be used.

Minimally scoped credentials Every credential used in the workflow should have the minimum required permissions to execute the job. In particular, use the ‘permissions’ key to make sure the GITHUB_TOKEN is configured with the least privileges for each job. permissions can be restricted at the repo, workflow or job level. Environment variables, like ${{ secrets.GITHUB_TOKEN }}, should be limited by scope, and should be declared at the step level when possible.

Pull_request_target (must be used for write access if PR is from forked repo) Do not use actions/checkout with this as it can give write permission and secrets access to untrusted code. Any building step, script execution, or even action call could be used to compromise the entire repository. This can be fixed by adding code to ensure that the code being checked out belongs to the base branch, which would also be limiting since the code checked out is not up to date for the PR. This can be done using:

- uses: actions/checkout@v4
  with:
    ref: $

Triggers workflows based on the latest commit of the pull request's base branch. Even if workflow files are modified or deleted on feature branches, workflows on the default branch aren't affected so you can prevent malicious code from being executed in CI without code review. Another solution that allows pull_request_target with actions/checkout used on the PR branch, is to add an additional step of running workflow only on approval by trusted users, such that the trusted user has to check the changes in the code from the PR to ensure there is no malicious code before running the workflow.

Untrusted input Don't directly reference values you do not control, such as echo “${{github.event.pull_request.title}}”, since it can contain malicious code and lead to an injection attack. Instead use an action with arguments (recommended):

uses: fakeaction/printtitle@v3 
with: 
title: $

Or bind the value to an intermediate environment variable:

env: 
PR_TITLE: $
run: | 
echo “$PR_TITLE”

Use OIDC and respective Secrets Manager for access to cloud providers instead of using secrets. Use GitHub Secrets to store keys securely and reference in workflows using ${{}}. Can use GitGuardian Shield to help with scanning for security vulnerabilities.

Typescript

Typescript is a superset of JavaScript that adds static typing to the language. By manipulating variables and functions, Typescript can help catch errors before they occur.

Syntax	Name	Feature
`?`	Optional chaining operator	variable returns undefined if doesn't exist. Also used for optional function parameters or class properties
`??`	Nullish coalescing operator	returns the right-hand operand when the left-hand operand is null or undefined.
`!`	Non-null assertion operator	variable is not null or undefined, only used if you are sure that value will exist.

Process of upgrading dependencies and packages

MarkBind uses a monorepo structure, which means that multiple packages are contained in a single repository. The process of upgrading dependencies and packages in MarkBind involves the following steps:

Checking current versions: Check the current versions of the dependencies and packages in the project. This can be done by looking at the package.json file for each project. The command npm ls package_name will output which packages are using what versions.
Review changelog and documentation: Review the changelog and documentation for the dependencies and packages to see what changes have been made in the new versions.
Upgrade dependencies and packages: Update the relevant package.json file or the root one for dependencies across all packages, then run npm run setup

LAM JIU FONG

Node Package Manager (npm)

A default package manager for Node.js.

npm simplifies the process of installing, updating, and managing external libraries or modules in a Node.js project.
npm allows developers to define and run scripts in their project's package.json file, automating common development tasks.
npm allows developers to publish their own packages on the npm registry.

Aspects learnt:

npm CLI - A powerful tool to interact with npm. Learnt the usages of the basic commands like npm install, npm update, npm run <scripts> etc. and how they helped streamline the development process.
package.json - Learnt how to interpret different parts of the json file eg. "scripts", "dependencies" and how to manage them.
How to publish my own package to the public.
How to use .npmignore

Resources:

npm Docs - Documentation for the npm registry, website, and command-line interface

Stylelint

A CSS linter that helps enforce conventions and avoid errors.

Stylelint has over 100 built-in rules for modern CSS syntax and features, but it is customisable and supports plugins/configs.
It can fix problems automatically where possible.
It can extract embedded styles from HTML, Markdown and CSS-in-JS template literals.
It can also parse CSS-like languages like SCSS, Sass, Less and SugarSS.

Aspects learnt:

Configuring the linter using the stylelintrc.js file, a configuration object for Stylelint to suit our own needs.
Integrating Stylelint into our project.

Resources:

Stylelint offical Docs

Commander.js

A JavaScript library that provides a framework for building command-line interfaces (CLIs) in Node.js applications

Aspects learnt:

Define commands, options, and arguments using Commander.js for Markbind.

Resources:

Commander.js README

Github Actions

A CI/CD platform allowing developers to automate workflows directly within their GitHub repository.

It supports customised workflows using YAML files to automate tasks such as building, testing, and deploying code.

Aspects learnt:

Understanding how Github Actions works in a specific repository.
Interpreting .yml files in .github/workflow.

Chrome DevTools

A set of web developer tools built directly into the Chrome browser.

We can utilise it to diagnose problems and monitor our program's performance eg. time used for each file to load
We can see what is actually happening under the hood eg. which files are loaded before others

Aspects learnt:

How to check the attributes of each HTML component on the page.
How to change the behavior of the browser in terms of loading speed by utilising the Network section - disable cache and change network settings
How to monitor the behavior and performance of the browser by using the Performance insights section

Resources:

Most stack overflow articles will teach us how to interpret the output of Chrome DevTools, I realise it is easy to find such articles by searching "How to know xxxx", eg. How to know if lazy loading is working

Nunjucks

A template engine for Javascript. It provides a way to mix static content with dynamic data.

Aspect learnt:

I mostly learnt about nunjucks' API and learn to integrate it into our project.
Learnt how Nunjucks works under the hood, from configuring to parsing to rendering, I have developed a strong understanding on how to integrate Nunjucks to my own project.

Resources:

Nunjucks offiical API

Jest

A Javascript testing framework that focuses on simplicity when writing tests.

Aspect learnt:

Learnt to differentiate mocks and spies and their particular use cases.
Learnt how to use jest.mock, jest.fn to implement mocks and jest.spyOn to create spies.

Resources:

mdn web_docs

A website that documents web technologies for developers. The articles are written by developers that covers a lot of aspects related to the web.

Aspects learnt:

Learnt about some fundamentals about the web eg. how browser renders the files, how the HTML elements like <img> and <script> behaves, along with some common issues eg. lazy loading

Resources:

The website itself

WANG JINGTING

Vue and Jest/Vue Test Utils

While working with Vue components this semester, I've learned more about props and script in vue when working on the template for panels through adding a new prop isSeamless and writing new script for the panel component.

MarkBind uses Jest together with Vue Test Utils for its snapshot tests, which test Vue components against their expected snapshots. While updating the component, I wrote new tests to ensure that the Vue components are working as expected.

Resources

ESM/CJS interoperality

An interesting issue I've encountered this semester while researching on integrating a full search functionality is the issue of importing esm like pagefind into cjs modules. CommonJS uses the require('something') syntax for importing other modules and ESM uses the import {stuff} from './somewhere' syntax for importing.

Another crucial difference is that CJS imports are synchronous while ESM imports are asynchronous. As such, when importing ES modules into CJS, the normal require('pagefind') syntax would result in an error. Instead, you'll need to use await import('pagefind') to asynchronously import the module. This difference in imports is something that should be taken note of since we use both the ESM import syntax and CJS require syntax in various files in MarkBind.

Resources

Nunjucks

Nunjucks is a rich and powerful templating language for JavaScript. MarkBind supports Nunjucks for templating and I’ve used Nunjucks specifically to create a set of mappings of topics to their pages, and to write macros.

Aspects

macro

Nunjucks macro allows one to define reusable chunks of content. A great benefit of macro is the reduction of code duplication due to its ability to encapsulate chunks of code into templates and its ability to accept parameters so that the output can be customised based on the inputs provided.

set and import

While combining the syntax pages in this commit, I worked on a set that keeps track of the various syntax topics and their information. This was a good exercise to experience how to create a variable using set and import it in other files to access its values using import.

Resources

Nunjucks Templating Documentation

Bootstrap

MarkBind has Vue.js components built on the popular BootStrap framework. Much of Bootstrap's features are supported in and out of these components as well. While creating the portfolio template, I got to learn more about the various components and layouts of Bootstrap.

Aspects

grid

Bootstrap grid built with flexbox and is fully responsive. More specific aspects I've learned

When building grid layouts, all content goes in columns. The hierarchy of Bootstrap’s grid goes from container to row to column, which needs to be kept in mind while adding content.
There are 12 template columns available per row, which allows for different combinations of elements that span any number of columns. The number 12 is very important when customising the width for each column to prevent unintended layout changes as Bootstrap does column wrapping when more than 12 columns are placed in a single row.

Components offered by Bootstrap

Explored various components offered by Bootstrap, such as accordions, cards, carousels

Resources

Bootstrap docs

WANG YIWEN

Tool/Technology git

To sync a forked repository with the original repository after discarding all the changes in the forked repository:

git remote add upstream <URL_of_original_repository>
git fetch upstream
git checkout master
git reset --hard upstream/master
git push origin master --force

To append a new commit onto the already existing commit you can do the following:

git add .
git commit --amend
git push origin <branch_name> --force

If there is significant changes to file after renaming, git treat it as a new file and the history of the file is lost. So to preserve the history of the file, need to separate the renaming and the changes (and do rebase and merge).

Use squash merge to keep PR history concise.

r.patch, r.minor, r.major

Tool/Technology JavaScript/TypeScript

Use !! To change to boolean.

Use type && {key: value} to quick define type.

Use as to cast type.

use ! to assert non-null.

In functional programming, many methods are bundled in the style like Array.{method}.

Redux/Saga is a predictable state container for JavaScript apps. It is a state management tool, and is often used with React (not very relevant to markbind).Pinia often used for vue.

Vscode's "goto references" does not work well with javascript (mixed inside the typescript). As currently, some of the core packages are not migrated to typescript yet, the references are not recognized. So need to use "findin file" instead.

Possible to auto re-compiling the typescript file into javascript files when it is changed, and only recompile the changed files.

If given not enough parameters, javascript treat missing ones as undefined; if given extra paremeters, javascript will ignore;

Gain knowledge on vue component basics (eg:V-model, v-for, v-once, meaning of <Script setup lang=“ts”> , Reactive variablem ref VS shallowRef (triggerRef) and customRef, How to ref a div without get element by id...)

Gain knowledge on and how new line in vue could be treated as brace and leads to subtle bugs.

Gain knowledge on vue component life cycle knowledge and how "onMounted" can fix the mermaid plugin issue.

Introduced to the concept server side rendering, gain knowledge of the Vue hydration issue.

Tool/Technology Miscellaneous

Workflow vice, a good practice is to not immediately merge a pull request after it is reviewed. Instead, wait for a day or two to see if there are any other comments or suggestions.

For command line tool (like markbind), a good project structure is to have a cli and a core folder. The cli folder contains the code for the command line interface, while the core folder contains the core logic of the project. When building from source, need to npm link the cli folder.

More comfortable with using loggers to debug.

Jest as the testing framework (and debugger).

Jest can mock other tools (fs for example).

Jest can spyon other methods, so track is usage(eg: how many times is it called, what are the arguements the methods are called with)

Snapshot (Recursively comparing every folder and file in the expcted folder with the actual generated files) as a way to do the functional testing.

Cheerio to convert html string to dom, and locate elements in the dom.

Understand the concept of hoisting in JavaScript.

Npm is different from yarn in that it has a flat dependency tree, while yarn has a nested dependency tree. So yarn allows the reuse of the same package in the dependency tree, while npm does not.

Can use npm run to list all the runnable scripts.

Fix issues and simple bugs is the best way to gain familiarity with the codebase.

Understand the difference between inline markdown and non-inline markdown.

Can use comment like eslint-disable-next-line no-await-in-loop to disable eslint for the next line for a specific rule.

Better understand the workflow for frontend: markbind serve -d or npm run build:web to test frontend changes, the frontend markbind.min.js and markbind.min.css bundles are only updated during release.

Gain knowledge on debugging frontend with browser, and understand css inheritance.

XU SHUYAO

How MarkBind Works

MarkBind Rendering Flow

The rendering flow for creating a complete website from Markdown using MarkBind involves several key components and processes:

Site Initialization:

The rendering process starts with the initialization of a Site instance in Site/index.ts.
The Site constructor sets up the necessary paths, templates, and initializes various managers and processors.

Site Configuration:

The readSiteConfig method in Site/index.ts reads the site configuration file (site.json) using the SiteConfig class.
The site configuration includes settings such as base URL, page configuration, asset paths, and more.

Page Collection:

The collectAddressablePages method in Site/index.ts collects the addressable pages based on the site configuration.
It processes the pages and pagesExclude options to determine the valid pages.

Plugin Setup:

The PluginManager class in plugins/PluginManager.ts handles the initialization and management of plugins.
It collects the specified plugins from the site configuration and loads them using the Plugin class.

Asset Building:

The buildAssets method in Site/index.ts builds the necessary assets for the site.
It copies the specified assets from the site's root path to the output path.

Markdown Processing:

The generatePages method in Site/index.ts initiates the rendering process for each page.
It creates a Page instance for each page using the createPage method.
The Page class in Page/index.ts handles the rendering of individual pages.
It uses the process method of the NodeProcessor class to process the Markdown content.

Plugin Execution:

During the page rendering process, the PluginManager executes the relevant hooks for each plugin.
Plugins can define hooks such as beforeSiteGenerate, processNode, postRender, etc., to modify the page content or perform additional tasks.

Layout and Template Rendering:

The LayoutManager class in Layout/LayoutManager.ts handles the generation and combination of layouts with pages.
It uses the Layout class to process and render the layout files.
The rendered pages are then passed through the VariableProcessor to replace variables with their corresponding values.

Post-rendering Tasks:

After rendering, additional tasks such as writing site data and copying core web assets are performed.

Output Generation:

Finally, the rendered pages and assets are written to the output directory specified during site initialization.

Vue.js Integration in MarkBind

MarkBind integrates Vue.js for building user interfaces and enhancing the rendering process. Here's how Vue.js is used in MarkBind:

Server-Side Rendering (SSR): MarkBind utilizes Vue's server-side rendering capabilities to pre-render the pages on the server. The pageVueServerRenderer module in Page/PageVueServerRenderer.ts handles the compilation of Vue pages and the creation of render functions.
Client-Side Hydration: After the initial server-side rendering, the rendered HTML is sent to the client. On the client-side, Vue takes over and hydrates the pre-rendered HTML, making it interactive and reactive.
Vue Components: MarkBind defines various Vue components to encapsulate reusable UI elements and functionality. These components are used throughout the rendered pages to enhance the user experience.
Integration with MarkBind Plugins: MarkBind plugins can also utilize Vue.js to extend the functionality of the application. Plugins can define custom Vue components, directives, and hooks to interact with the rendering process.

Build Process and Asset Management

MarkBind follows a build process to generate the final website and manage the necessary assets:

Asset Building: The buildAssets method in Site/index.ts handles the building of assets for the site. It copies the specified assets from the site's root path to the output path.
Core Web Assets: MarkBind relies on core web assets such as CSS and JavaScript files. The copyCoreWebAsset method in Site/index.ts copies these assets from the @markbind/core-web package to the output directory.
Plugin Assets: Plugins can also provide their own assets, such as CSS and JavaScript files. The PluginManager handles the collection and copying of plugin assets to the output directory.
Icon Assets: MarkBind supports various icon libraries, including Font Awesome, Glyphicons, Octicons, and Material Icons. The copyFontAwesomeAsset, copyOcticonsAsset, and copyMaterialIconsAsset methods in Site/index.ts handle the copying of these icon assets to the output directory.
Bootstrap Theme: MarkBind allows customization of the Bootstrap theme used in the site. The copyBootstrapTheme method in Site/index.ts handles the copying of the selected Bootstrap theme to the output directory.

Libraries used in MarkBind

markdown-it

MarkBind uses markdown-it for rendering html from markdown files. markdown-it is a fast markdown parser and has very extensive plugins support and great extensibility.

Adding custom rules to `markdown-it` through adding a rule to `markdown-it`'s attribute

Adding custom rules to markdown-it can be done easily by adding a rule to the attribute. For example, if we want to add our rules for rendering fenced code blocks, we can do so by adding a rule to the markdown-it's attribute.

markdownIt.renderer.rules.fence = (tokens: Token[],
                                   idx: number, options: Options, env: any, slf: Renderer) => {}

Parameters

tokens (Token[]): An array of Token objects. Each token represents a segment of the parsed Markdown content. Tokens of particular interest for the fence rule include:
- token.type: The type of the token (e.g., 'fence', 'code', 'paragraph').
- token.info: Contains the language specified after the opening set of backticks, if any, plus additional options.
- token.content: The text content within the fenced code block.
idx (number): The index of the current fence token within the tokens array. This lets us find tokens before and after the fence if needed.
options (Options): This object contains global options passed to the Markdown-it parser. This could include settings specific to our setup.
env (any): An object containing environment variables and potentially additional data derived from the parsed Markdown. This can be useful for accessing context when defining rendering logic.
slf (Renderer): A reference to the Markdown-it Renderer object itself. This is primarily used when we need to call other rendering rules to process nested Markdown code within the fenced block.

Purpose of the fence renderer rule

The markdownIt.renderer.rules.fence function is responsible for taking a fence token (representing a fenced code block) and converting it into the appropriate HTML output. This could include syntax highlighting, if our setup supports it.

How it Works

Inside the function, we have access to all the information in the tokens, options, and the environment. We can craft custom logic to generate the desired HTML structure for our fenced code blocks. Here's a very basic example:

markdownIt.renderer.rules.fence = (tokens, idx, options, env, slf) => {
  const token = tokens[idx];
  const content = token.content;
  const language = token.info.trim(); // Language after the opening backticks

  return `<pre><code class="language-${language}">${content}</code></pre>`;
};

Cheerio

MarkBind uses Cheerio for parsing and manipulating the HTML structure of Markdown files after they have been processed by markdown-it. Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server.

Loading HTML into Cheerio

To use Cheerio, we first need to load HTML into it. This is done by passing the HTML string to the cheerio.load function.

const $ = cheerio.load('<h2 class="title">Hello world</h2>');

The $ variable now contains a Cheerio instance that wraps the parsed HTML, and can be used similarly to how we would use jQuery in the browser.

Selecting Elements

Cheerio uses CSS selectors to select elements, just like jQuery. Here are some examples:

// Select all h2 elements
$('h2');

// Select the element with id "main"
$('#main');

// Select all elements with class "text"
$('.text');

// Select all a tags within h2 elements
$('h2 a');

Manipulating Elements

Once we have selected elements, we can manipulate them in various ways. Some common methods include:

addClass(className): Adds the specified class to the selected elements.
removeClass(className): Removes the specified class from the selected elements.
attr(attributeName, value): Gets or sets the value of the specified attribute.
text(newText): Gets or sets the text content of the selected elements.
html(newHtml): Gets or sets the inner HTML of the selected elements.
append(content): Appends the specified content to the end of each selected element.
prepend(content): Prepends the specified content to the beginning of each selected element.
remove(): Removes the selected elements from the DOM.

Here's an example that demonstrates some of these methods:

// Add the class "highlight" to all h2 elements
$('h2').addClass('highlight');

// Set the text of the element with id "title"
$('#title').text('New Title');

// Append a span to each paragraph
$('p').append('<span>Some appended text</span>');

Rendering Back to HTML

After manipulating the parsed HTML with Cheerio, we can render it back to an HTML string using the html method.

$.html();
//=> '<h2 class="title highlight">New Title</h2><p>Some text<span>Some appended text</span></p>'

This is useful when we need to save the manipulated HTML back to a file or send it as a response in a web application.

Cheerio provides a simple and efficient way to parse and manipulate HTML structures in MarkBind plugins, enabling powerful transformations of the rendered Markdown content.

Vue.js (focusing on custom directives)

Vue.js is a progressive JavaScript framework for building user interfaces. It provides a declarative and component-based approach to UI development, making it easier to create and maintain complex applications.

Custom Directives in Vue

Vue allows we to extend the behavior of HTML elements or Vue components through custom directives. Custom directives provide a way to encapsulate and reuse DOM manipulation logic across your application.

Vue.directive('my-directive', {
  bind(el, binding, vnode) {
    // Directive initialization logic
  },
  inserted(el, binding, vnode) {
    // Logic to be executed when the directive is inserted into the DOM
  },
  update(el, binding, vnode, oldVnode) {
    // Logic to be executed when the directive's bound value changes
  },
  componentUpdated(el, binding, vnode, oldVnode) {
    // Logic to be executed after the containing component's VNode has updated
  },
  unbind(el, binding, vnode) {
    // Cleanup logic when the directive is unbound from the element
  }
})

Parameters

el (HTMLElement): The DOM element the directive is bound to. This allows we to perform direct DOM manipulations or access the element's properties.
binding (DirectiveBinding): An object containing the directive's binding information, including:
- binding.value: The value passed to the directive. It can be a primitive value, an object, or a function.
- binding.oldValue: The previous value of the directive, only available in the update and componentUpdated hooks.
- binding.arg: The argument passed to the directive, if any, denoted by a colon (e.g., v-my-directive:arg).
- binding.modifiers: An object containing any modifiers applied to the directive (e.g., v-my-directive.modifier).
vnode (VNode): The virtual node representing the bound element. It provides access to the Vue instance properties and methods.
oldVnode (VNode): The previous virtual node, only available in the update and componentUpdated hooks.

Directive Lifecycle Hooks

Custom directives have access to several lifecycle hooks that allow we to execute logic at different stages:

bind: Called once when the directive is first bound to the element. This is where we can perform any one-time setup or initialization.
inserted: Called when the bound element is inserted into the parent node. This is a good place to execute logic that relies on the element being in the DOM.
update: Called whenever the bound value of the directive changes. we can compare the current and old values and perform updates accordingly.
componentUpdated: Called after the containing component's VNode and the VNodes of its children have updated.
unbind: Called when the directive is unbound from the element. This is where we can perform any necessary cleanup or teardown logic.

Usage

To use a custom directive, we can attach it to an element or component using the v- prefix followed by the directive name. For example:

<div v-my-directive="value"></div>

In this case, my-directive is the name of the custom directive, and value is the value being passed to the directive.

Custom directives provide a powerful way to encapsulate and reuse DOM manipulation logic in Vue applications. They allow we to extend the behavior of elements and solve specific problems related to integrating external libraries or custom functionality.

Solving Issues with Third-Party Library Integration

When integrating third-party libraries into Vue components, we may encounter scenarios where the library's initialization script doesn't work as expected within the component's lifecycle. This can happen due to timing differences between the library's initialization and the component's rendering process.

One common issue is when a library relies on the presence of certain DOM elements during its initialization, but those elements are dynamically rendered by the Vue component and may not be available when the library's initialization script runs.

To solve this problem, we can leverage Vue's custom directives. By creating a custom directive that handles the library's initialization logic, we can ensure that the initialization happens at the appropriate time within the component's lifecycle.

Here's an example of how we can create a custom directive to initialize a third-party library on a specific element:

Vue.directive('my-library', {
  inserted(el) {
    // Initialize the library on the element
    myLibrary.init(el);
  },
  unbind(el) {
    // Cleanup the library when the directive is unbound
    myLibrary.destroy(el);
  }
})

In this example, the my-library directive is responsible for initializing the myLibrary on the bound element when it is inserted into the DOM. It also handles the cleanup process when the directive is unbound from the element.

To use this directive in a Vue component, we can simply attach it to the desired element:

<template>
  <div v-my-library>
    <!-- Element content -->
  </div>
</template>

By using the custom directive, we ensure that the library initialization happens at the appropriate time within the component's lifecycle, solving the issue of initialization timing.

Some more general web development knowledge

JavaScript Module Systems

CommonJS (CJS)

CommonJS is a module system used in Node.js and other JavaScript environments. It uses the require() function to import modules and module.exports or exports to export modules.

// Importing a module
const myModule = require('./myModule')

// Exporting a module
module.exports = {
  // Exported properties and methods
}

ECMAScript Modules (ESM)

ECMAScript Modules (ESM) is the standard module system introduced in JavaScript with ES6 (ECMAScript 2015). It uses the import and export keywords for importing and exporting modules.

// Importing a module
import myModule from './myModule'

// Exporting a module
export default {
  // Exported properties and methods
}

ESM provides benefits such as static analysis, tree shaking, and better performance compared to CJS. However, CJS is still widely used, especially in older codebases and libraries.

Differences between CJS and ESM

Syntax: CJS uses require() and module.exports, while ESM uses import and export.
Synchronous vs. Asynchronous: CJS imports are synchronous and blocking, while ESM imports are asynchronous and non-blocking.
Browser Support: ESM is supported natively in modern browsers, while CJS requires a bundler or transpiler for browser compatibility.
Interoperability: ESM and CJS modules can interoperate to some extent, but there are differences in how they handle default exports and named exports.

Node.js supports both CJS and ESM, and the module system used depends on the file extension (.mjs for ESM and .js for CJS) or the "type": "module" field in the package.json file.

HTML and DOM Rendering Order

When a web page is loaded, the browser follows a specific order to render the content:

Parsing HTML: The browser parses the HTML document and constructs the Document Object Model (DOM) tree.
Loading External Resources: The browser loads external resources such as CSS files, JavaScript files, and images referenced in the HTML.
Constructing the CSSOM: The browser parses the CSS and constructs the CSS Object Model (CSSOM) tree.
Executing JavaScript: The browser executes JavaScript code, which can manipulate the DOM and CSSOM.
Rendering: The browser combines the DOM and CSSOM to create the render tree, which represents the visual layout of the page. It then paints the pixels on the screen.

Content partially generated by GenAI

RepoSense

ALVIS NG

Vue.js and Typescript

Helped out with migration from JavaScript to TypeScript and currently pending a feature to compare all repos at once.

DevOps

Planning to focus on DevOps for the remainder of the semester.

GEORGE TAY QUAN YAO

Java

Java is used extensively in the backend for RepoSense, from the generation of the RepoSense report, to the different git commands required to clone repositories and analyse them. It was not difficult to pick up Java as I had some prior experience in Java in previous classes such as CS2030S, CS2040S and CS2103T, but the intricacies surrounding the different Java libraries was something that I was never properly exposed to and had to learn over time as I worked on the project.

Aspects Learned

Some of the aspects I have learnt regarding Java:

Learning to read Java source code and using the built-in IntelliJ Java Profiler to identify possible optimisations in the current codebase
- One particular aspect of Java that was the target of optimisation was Regex. I realised that Regex was used extensively in the codebase in different contexts, whether to split up strings or to find a matching string pattern in a given string. However, after identifying some parts of the code that were potentially buggy or slow (especially snippets whereby Regex operations are used in conjunction with iteration) using the IntelliJ Java Profiler and consulting the Java source code, as well as some online sources, I was able to identify a latent anti-pattern in the way Regex code was written in the code base
- This experience taught me that Regex code should never be mixed with iteration and that we should precompile Pattern objects for repeated use instead
Learning more about different code design patterns such as the Builder pattern and how to adapt it to our codebase to suit our own needs
- The Builder pattern was one of the patterns that was not taught in the previous modules that I have taken, but I was aware of it due to reading up on coding patterns prior me taking up this project. At first, I thought that the Builder pattern could be adopted as is into the codebase without consideration of our use cases, however, through the guidance of my mentors, I soon realised that not all parts of the Builder pattern was relevant to what we needed the pattern for, and that we need to adapt the pattern to our use, and not the other way around.

Resources Used

Java Source Code
- The source code for Java was used to verify the time and memory usage of the code given by the code profiler, by cross-referencing bottlenecks identified in the code profile with the source code to identify inefficient code in the codebase
Online Java resources such as baeldung.com
- The resources provided alternative ways of achieving a certain results, which may help to increase the efficiency of the code
Past modules taken that taught Java code
- CS2030S, CS2040S and CS2103T

Docker

Docker is something that I have always wanted to work with, especially so in combination with other container orchestration tools such as Kubernetes. I looked into the possibility of Docker being used to containerise RepoSense, enabling us to better test RepoSense, provide end users with a premade container with RepoSense's dependencies resolved for them, and a way to quickly deploy RepoSense to their favourite cloud providers (e.g. AWS ECS, etc.) for greater availability of RepoSense for their target users.

Aspects Learned

Some of the aspects I have learnt:

Learning Dockerfile syntax
- While Docker is not exactly new to me as I have previously worked with Docker products in a past project, I have long forgotten the different syntax required for a Dockerfile, as well as the different Docker commands that work with building and running containers
Docker integration with Github Actions
- It was also a challenge to integrate Docker with Github Actions for CI/CD purposes as it is not as easy to debug when things go wrong during the build and execution process. Every time changes are made, a new image needs to be created, tested and then deployed, all of which incurs precious Github Actions build time. Not to mention that the environment provided by Github Actions' Runners are distinctly different that my local environment (Github Runners run on x86 while my Mac environment runs on ARM64), making testing extremely difficult and time consuming

Resources Used

Docker references
- The references for the different Dockerfile syntax and Docker commands was referenced heavily during the creation of the different POC Dockerfiles
Stackoverflow and other online references
- Some more obscure ways of doing things in Docker had to reference the code which others have written, as it was not readily available in the references for Docker
ChatGPT
- ChatGPT was used to debug and identify potential reasons as to why my Dockerfile might be buggy and fails builds when they are pushed onto Github

JONAS ONG SI WEI

Vue + Pug

Summary

Vue is a frontend JavaScript framework to build web user interfaces, while Pug is a preprocessor that speeds up writing HTML. In RepoSense, Vue brings the interactivity in the generated reports and Pug makes it easier to write more concise and readable HTML.

Aspects Learnt

Having some experience in frontend libraries/frameworks like React, I did not have too steep a learning curve when learning Vue; however, I still took some time to get used to concepts and the syntax of Vue including state management, the idea of single-file components, conditional rendering, reactive UI elements, and much more. One particular aspect I enjoyed learning and implementing in Vue was the ease of declaring state in a component just within the data() function. This was, to me, contrasted with React where useState and useEffect are more complicated and tricky to use.

Resources Used

Official Vue and Pug documentation
Existing codebase and PRs
ChatGPT and GitHub Copilot for specific syntax

Cypress

Cypress is a testing framework that allows for end-to-end testing of web applications. I used it in RepoSense to write tests for the UI.

Aspects Learnt

Cypress was a new tool to me and I had to learn how to write tests using this tool as well as how to set up the test environment. Many Cypress commands are based on natural words like .then, .get, .to.deep, just to name a few, but the concepts of Cypress like asynchonicity, closures, and its inclusion of jQuery make it unfamiliar to me.

Resources Used

Official Cypress documentation
Existing codebase and PRs
ChatGPT and GitHub Copilot for specific syntax

...

StackOverflow, ChatGPT, Documentation

TEAMMATES

CHING MING YUAN

Data Migration

Data migration is critical aspect of software development and system maintenance, it involves moving data efficiently while maintaining data integrity, security, and consistency. Having the chance to be involve in data migration really opened my eyes to its general procedure. We were tasked with migrating NoSQL datastore entity to SQL postgresql.

Efficiency, the longer the script, the longer the application is down
Validation, what to do if validation fails?
Order of migration
Batch processing, batching expensive operation would result in higher efficiency
Recovery, able to recover after a crash or SIGKILL

End to End (E2E) testing

E2E tests are a type of software testing that evaluates the entire workflow of an application from start to finish, simulating real user interactions. The purpose of E2E testing is to ensure that all components of an application, including the user interface, backend services, databases, and external integrations, work together correctly to achieve the desired functionality. Here's an explanation of E2E tests and how they are conducted. As E2E tests are very expensive to run, it is crucial that we identify the important workflows and simulate the actions involved by interacting with the UI. You then assert the expected conditions are present after the interaction. Teammates uses Selenium to locate and interact with the elements in the UI. I have to admit, this is my first time doing tests for Frontend much less the whole application. It was cool to see the browser jump around and simulate the required action. I also saw the value in this as I managed to uncover many bugs that was not caught in earlier tests.

References:

Mockito

Mockito facilitates unit testing by mocking dependencies. Mock objects are used to simulated objects that mimic the behaviors of real objects in a controlled way, allowing developers to isolate and test specific components of their code without relying on actual dependencies or external systems. While I have written Stubs in CS2103T, this is my first time using a dedicated mocking library and it has changed my life. I also have used what I have learnt in many job interviews.

mock method to initialise the mock object
when/then for you to inject the controlled outcome
verify mainly to check number of invocations

References:

Hibernate

TEAMMATES uses Hibernate, an Object-Relational Mapper(ORM). ORM are widely used in software development today as it provides several benefit to developers. While I have used ORMs before, such as Prisma, it is my first time using Hibernate. ORMs simplifies database interactions by allowing developers to work with Java objects directly, abstracting away the complexities of SQL queries. Also, as the name suggest, it allows us to map Java Objects to database table and their relationship. Allowing for easier and seamless operations with the database table. I read up on some Hibernate basics:

JPA (Java Persistence API)
Criteria API to make database queries
Transactions
Batch processing to improve performance, especially in Data Migration
Lazy loading to improve performance
operations such as evict, persist, merge,

References:

Google Cloud Compute

I was required to deploy a staging environment for the course entity migration. It was my first time using GCP so I managed to gain familiarity with the vast tools that GCP offers. The guides provided by the seniors was just very descriptive and encouraged me to explore tweaking settings to better fit my use case.

References:

GCP docs

DOMINIC BERZIN CHUA WAY GIN

Hibernate

As part of the v9-migration, I had to familiarise myself with the Hibernate ORM. It is my first time using Hibernate ORM, as I was only familiar with the Eloquent ORM from Laravel, as well as the ORM from Django. ORMs are extremely beneficial as they essentially translate between data representations used in the application and those in the database. It also makes your code more readable as it simplifies complex queries and makes transitioning between various database engines seamless, should the need arise.

Aspects Learnt:

Learnt the fundamentals of Object-Relational Mapping (ORM), enabling the conversion of data between the database and object-oriented programming languages, in particular Java
Usage of persist and merge to insert or update an entity respectively
Learnt about Hibernate's internal caching mechanisms
Managing transactions

Resources

Solr

TEAMMATES uses Solr for full-text search, and is structured to function for both the datastore and SQL databases.

Aspects Learnt:

Gained proficiency in Solr's query syntax to perform powerful searches, including filtering, sorting, and faceting to retrieve relevant documents efficiently
Understood how Solr is integrated into the TEAMMATES backend for searching or indexing

Resources:

PostgreSQL

Having only used SQLite and MySQL in the past, I had to familiarise myself with PostgreSQL as it is the SQL database used in TEAMMATES.

Aspects Learnt:

Learnt about PostgreSQL's architecture, including its use of processes for client connections, MVCC (Multiversion Concurrency Control), and its write-ahead logging (WAL) for data integrity
Write-Ahead Logging (WAL) involves recording changes to a log before any changes are made to the actual database. This method is crucial for recovery after a crash, as it ensures that all committed transactions are saved.
MVCC allows multiple users to access the database concurrently without locking the data. This means readers don't block writers and vice-versa, leading to increased performance and lower waiting times during operations, which is a significant advantage over MySQL's more traditional locking mechanisms
Learnt about the differences between the 3 SQL database engines

Resources:

Differences between MySQL, SQLite and PostgreSQL

Angular

Having had no experience utilising Angular prior to working on TEAMMATES, I was introduced to several neat features that Angular has to offer.

Aspects Learnt:

Angular's component-based architecture makes it easy to build and maintain large applications. Each component is encapsulated with its own functionality and is responsible for a specific UI element. This modularity allowed me to quickly understand and contribute to the project, as I could focus on individual components without being overwhelmed by the entire codebase.
Angular's dependency injection system is a design pattern in which a class receives its dependencies from external sources rather than creating them itself. This approach simplifies the development of large applications by making it easier to manage and test components.
Angular offers the trackBy function, which I used in conjunction with the *ngFor directive to manage lists more efficiently. Normally, *ngFor can be inefficient because it re-renders the entire list when the data changes. However, by implementing trackBy, Angular can track each item's identity and only re-render items that have actually changed. This reduces the performance cost, especially in large lists where only a few items change.

Google Cloud

When deploying the staging environment for the ARF upgrade, I managed to work with and gain familiarity with the deployment workflow, as well as several GCP tools and the gcloud sdk.

Aspects Learnt

Navigating GCP and the services they have to offer
Setting up OAuth 2.0 Client and Gmail API credentials
Configuring up a VPC for communication between various services
Deployment using gcloud
Navigating server logs to investigate issues

Resources:

Guide created by mentors

Snapshot Testing

Snapshot testing with Jest is an effective strategy to ensure that user interfaces remain consistent despite code changes. It's important for developers to maintain updated snapshots and commit these changes as part of their regular development process.

Snapshot tests are particularly useful for detecting unexpected changes in the UI. By capturing the "snapshot" of an output, developers can compare the current component render against a stored version. If changes occur that aren't captured in a new snapshot, the test will fail, signaling the need for a review.

Unit Testing with Mockito

Mockito is a popular Java-based framework used primarily for unit testing. It allows developers to isolate the units of code they are testing, to focus solely on the component of software that is being tested.

Mockito allows developers to create mock implementations of dependencies for a particular class. This way, developers can isolate the behavior of the class itself without needing the actual dependencies to be active. By using mock objects instead of real ones, tests can be simplified as they don’t have to cater to the complexities of actual dependencies, such as database connections or external services. Mockito also provides tools to verify that certain behaviors happened during the test. For example, it can verify that a method was called with certain parameters or a certain number of times.

Resources:

Mockito Tutorials

E2E Testing

E2E Testing allows us to ensure that the application functions as expected from the perspective of the user. This type of testing simulates real user scenarios to validate the complete functionality of the application. Common tools for conducting E2E testing include Selenium, Playwright, and Cypress.

Throughout the semester, I had to migrate several E2E tests and also create some new ones as part of the ARF project, which exposed me to the Page Object Model, which allows for easier testing and maintenance. It enhances code reusability as the same Page Object Model can be reused across related test cases.

E2E Tests may be the most complicated type of test to write, as it involves multiple components of the application; testing it as a whole, rather than in isolated components. As such, pinpointing the sources of errors or failures can be difficult. E2E Tests can also be flaky at times, passing in one run, and failing on others, which could occur due to numerous reasons such as timing issues, concurrency problems or subtle bugs that occur under specific circumstances. However, it is still highly useful as it helps to identify issues in the interaction between integrated components, and also simulates real user scenarios.

Resources:

TYE JIA JUN, MARQUES

Datastore

Datastore is a NoSQL document database. While it provides scalability and performance advantages, it falls short when dealing with complex queries. While writing migration scripts, I read up on the following from the Datastore documentation:

property indexes and composite indexes
optimization on multiple field queries
cursor paging

References:

Datastore docs

Hibernate

TEAMMATES uses Hibernate, an Object-Relational Mapping framework which allows us to interact with the database without writing SQL commands. It abstracts these low-level database interactions, enabling developers to work with high-level objects and queries instead. I read up on some Hibernate basics:

JPA persistence context / Hibernate session
entity states: transient, persistent, detached
entity operations: persist, merge, evict
criteria queries

References:

Mockito

Mockito facilitates unit testing by reducing the setup needed to create and define behaviour of mocked objects. The provided mock, when/then and verify methods not only simplify the test writing process, but also enhance their readability and clarity for future developers.

References:

Docker

I was introduced to Docker during the onboarding process. I learnt about containers and the benefits of containerization, such as portability and isolation, and how they enable developers on different infrastructure to work in a consistent environment.

References:

XENOS FIORENZO ANONG

Hibernate

As part of the V9 migration, I had to rewrite the logic to query from the SQL database using Hibernate ORM API instead of from Datastore.

ORM

TEAMMATES' back-end code follows the Object-Oriented (OO) paradigm. The code works with objects. This allows easy mapping of objects in the problem domain (e.g. app user) to objects in code (e.g. User).

For the data to persist beyond a single session, it must be stored/persisted into a database. V9 uses PostgreSQL, a relational database management system (RDBMS) to store data.

It is difficult to translate data from a relational model to an object models, resulting in the object-relational impedance mismatch.

A Object/Relational Mapping (ORM) framework helps bridge the object-relational impedance mismatch, allowing us to work with data from an RDBMS in a familiar OO fashion.

Hibernate website: What is an ORM

JPA

Jakarta Persistence, formerly known as Java Persistence API (JPA) is an API for persistence and ORM. Hibernate implements this specification.

JPA specification

Criteria API

The Criteria API allows us to make database queries using objects in code rather than using query strings. The queries can be built based on certain criteria (e.g. matching field).

Using Join<X, Y>, we can navigate to related entities in a query, allowing us to access fields of a related class. For example, when querying a User, we can access its associated Account.

Jakarta EE tutorial: Using the Criteria API to Create Queries

Persistence Operations

Hibernate maintains a persistence context, which serves as a cache of objects. This context allows for in-code objects to be synced with the data in the database.

Using persist(), merge(), and remove(), we can create, update, and remove an object's data from the database. These methods schedule SQL statements according to the current state of the Java object.

clear() clears the cached state and stops syncing existing Java objects with their corresponding database data. flush() synchronises the cached state with the database state. When writing integration tests, I found it helpful to clear() and flush() the persistence contexts before every test, to ensure that operations done in one test do not affect the others in unexpected ways.

Hibernate 6.4 introduction docs: Operations on the persistence context

Mockito

To isolate units in unit testing, it is useful to create mocks or stubs of other components that are used by the unit.

Verify

We can create a mock of a class using mock(). We can then use this mocked object as we would a normal object (e.g. calling methods). Afterwards, we can verify several things, such as whether a particular method was called with particular parameters, and how many times a particular method call was performed.

Stub

If a method needs to return a value when called, the return value can be stubbed before the method of the mocked object is called. The same method can be stubbed with different outputs for different parameters. Exceptions can also be stubbed in a similar way.

Mockito website

Angular forms

As part of the instructor account request form (ARF) project, I had to create an Angular form.

Overview

Angular has 2 form types: template-driven, and reactive.

Template-driven forms have implicit data models which are determined by the form view itself. Template-driven forms are simpler to add, but are more complicated to test and scale.

Reactive forms require an explicitly-defined data model that is then bound to the view. The explicit definition of the model in reactive forms makes it easier to scale and test, particularly for more complex forms.

Angular forms overview

Accessibility

Standard HTML attributes may still need to be set on Angular form inputs to ensure accessibility. For instance, Angular's Required validator does not set the required attribute on the element, which is used by screen readers, so we need to set it also. Another example would be setting the aria-invalid attribute when validation fails.

To make inline validation messages accessible, use aria-describedby to make it clear which input the error is associated with.

Building accessible forms with Angular

Validation

Angular has some built-in validator functions that can be used to validate form inputs, and allows for custom validators to be created. Validators can be synchronous or asynchronous.

By default, all validators run when the input values change. When there are many validators, the form may lag if validation is done this frequently. To improve performance, the form or input's updateOn option can be set to submit or blur to only run the validators on submit or blur.

Validating form input

Git

git rebase can be used to keep branch commit history readable and remove clutter from frequent merge commits.

In particular, the --onto option allows the root to be changed, which is useful when rebasing onto a branch that has itself been modified or rebased.

Each Git commit has a committer date and an author date. When rebasing, the committer date is altered. To prevent this, use --committer-date-is-author-date.

More about rebasing with --onto

YEO DI SHENG

Hibernate

session.flush()

EntityManagers do not always immediate execute the underly SQL statement. One such example is when we create and persist a new entity, the createdAt timestamp is not updated in the entity object in our application until we call flush().

This is because by calling flush() we can ensure that all outstanding SQL statements are executed and that the persistence context and the db is synchronized.

Persistent entities

Persistent entities are entities that are known by the persistence provider, Hibernate in this case. An entity(object) can be made persistent by either saving or reading an object from a session. Any changes (e.g., calling a setter) made to persistent entities are automatically persisted into the database.

We can stop hibernate from tracking and automatically updating the entities by calling detach(Entity) or evict(Entity). This will result in the entity becoming detached. While detached, Hibernate will have no longer track the changes made to the entity. To save the changes to the database or make the entity persistent again, we can use merge(Entity).

References

While using the new SQL db, we often find ourselves needing to refer to another related entity for example FeedbackSessionLogs.setStudent(studentEntity). This would often require us to query the db for the object and then call the setter. This is inefficient especially if we already have information like the studentEntity's primary key.

Hibernate provides a getReference() method which returns a proxy to an entity, that only contains a primary key, and other information are lazily fetched. While creating the proxy, Hibernate does not query the db. Here is an article that goes through different scenarios using reference to see which operations would result in Hibernate performing a SELECT query and which does not. It also includes some information on cached entities in Hibernate.

It is important to note that, since Hibernate does not check that the entity actually exists in the db on creation of the proxy, the proxy might contain a primary key that does not exist in the db. The application should be designed to handle such scenarios when using references. Here is more information on the difference between getReference() and find().

Testing

Mockito

In unit testing, a single component is isolated and tested by replacing its dependencies with stubs/mocks. This allows us to test only the behaviour of the SUT.

Mockito provides multiple methods that help to verify the behaviour of the SUT and also determine how the mocked dependencies are supposed to behave.

verify() this method allows us to verify that a method of a mocked class is called. It can be combined with other methods like times(x) which allowsus to verify that the method is only called x times.
Argument matchers like anyInt(), anyString() and allows us to define a custom matcher using argThat(). These argument matchers can be used to ensure that the correct arguments are being passed to the other dependencies. This is useful if the method you are testing does not return a value useful for determining the correctness of the method.
when() and thenReturn() These are methods that allow us to define the behaviour of other dependencies that are not under test.

For e.g., when(mockLogic.someMethod(args)).thenReturn(value) makes it such that when the SUT invokes someMethod() with args from the mockLogic class, value is will be returned by someMethod(args).

GCP

Learnt about how the different features that are provided by GCP and other third parties come together to make Teammates possible.

Most of the information is from the Platform Guide in the teammates-ops repo.

Setting up OAuth 2.0 to allow users to login with their google credentials
Google cloud storage
Google cloud SQL
Debugging using logs from Google Cloud's logging service
Setting up a cron job
Using email sending services like Mailjet
Using DBeaver to insepct and manipulate the SQL database

ZHU YUANXI

End to end testing with Selenium

Purpose of e2e tests

E2E testing complements other form of tests by ensuring that the entire system works as intended and meets the user's requirements and expectations. E2E testing involves testing the application in a production-like environment or as close to it as possible. The complete application is tested from start to finish and it ensures that the application functions correctly from the user's perspective, including all the steps involved in completing a task or workflow.

How to write E2E tests

Page Object Pattern is used in TEAMMATES to facilitate UI changes. In this pattern, a class is created for each page . It helps separate the details of the interactions with the webpage with the rest of the test. Page Object provides an interface for the tests to interact with the page's UI without having to directly manipulate the HTML elements.

Resources:

Page Object Pattern: https://martinfowler.com/bliki/PageObject.html
Examples of e2e test with Selenium: https://medium.com/@iamfaisalkhatri/end-to-end-testing-using-selenium-webdriver-and-java-4ff8667716ca
Selenium documentation: https://www.selenium.dev/documentation/webdriver/actions_api/

Unit tests

Mocking

Mock objects can isolate the component being tested by replacing actual dependencies with mocked ones that simulates the behaviour of the real ones. In this way, the unit test can focus on testing the function of a single component without involving the entire system.

Using Mockito

We can use the Mockito framework in Junit tests. Use the mock() method to create a mocked object of some class. Once created, a mock will remember all interactions. Then we can selectively verify whatever interactions we are interested in.

We can verify the number of invocations of a method using verify(). For example:

verify(accountsDb, times(1)).getAccountByGoogleId(googleId);

We can force a method to return a specific value with stubbing. For example:

when(accountsDb.getAccountByGoogleId(googleId)).thenReturn(account);

Resources:

Mockito documentation: https://javadoc.io/doc/org.mockito/mockito-core/latest/org/mockito/Mockito.html

Snapshot testing with Jest

Snapshot tests are a very useful tool when we want to make sure the UI does not change unexpectedly. Hence, when changing the UI, we need to regenerate the snapshots and commit them.

Generated snapshots do not include platform specific or other non-deterministic data. We can use mock or spy on calls to the class constructor for ES6 class and all of its methods using Jest.

Resourses:

JestJs documentation: https://jestjs.io/docs/snapshot-testing

Hibernate ORM

Entity lifecycle
Hibernate can help manage objects and automatically propagate the changes to database. Hibernate entity lifecycle state explains how the entity is related to a persistence context.

Transient: a newly created object (such as created by calling the constructor) is not associated with a Hibernate session. To save it to the database, persist need to be called to
Persistent: entity has been associated with a database table row.Any change made to such entity is going to be detected and propagated to the database (during the Session flush-time)
Detached: A detached entity is not tracked anymore by any persistence context
Removed: An entity is in a deleted (removed) state if Session.delete(entity) has been called, and the Session has marked the entity for deletion.

JPA & Hibernate Annotations
Annotations are used to provide the metadata for the Object and Relational Table mapping directly in the Java source code. Annotations such as @Entity, @Column, @Table, and @OneToMany, can define the structure of the database schema.

They are also important for enhancing performance. For example, a join column if not specified with a lazy fetch strategy will cause unnecessary fetch if the data in the join column are not often needed. (Similar problems were presented during data migration some annotations had to be changed in the migration branch).

Resources:

Data Migration with Technology Integration

Optimization

Techniques such as batch processing can minimize overhead and maximize throughput during data transfer. Avoiding unnecessary joins and fetches are also important. For example, EntityManager.getReference can be used to lazily fetch the foreign key referenced object.
Dependency Preservation

topological ordering of the tables being migrated should be established and migrate entities in order.
Patchable Script Design

large-scale data migration needs to be patchable as it needs to be executed multiple times to minimise downtime.
Testing with Staging

the initial step involved setting up a Google Cloud SQL instance for testing to provide insights into the real-time performance and scalability.

Resources: