Fix: batch improvements and bug fixes (9 commits) #105

Open
MorsH14 wants to merge 12 commits into David-patrick-chuks:main from MorsH14:fix/batch-improvements-2024-06-21

MorsH14 (Contributor) commented Jun 21, 2026

This PR contains 9 commits of bug fixes and improvements made today:

  • Fix graceful shutdown only closing default account's IG client
  • Fix docker-compose network not created on fresh install
  • Add missing jest.config.js so npm test can process TypeScript files
  • Fix concurrent key rotation state in generateTrainingPrompt
  • Fix scrapeFollowers returning empty list when maxFollowers is omitted
  • Fix 3 bugs: cookie token extraction, filename injection, JSON body size limit
  • Fix 4 bugs: MONGODB_URI env check, log dir path, URL domain filter, triedKeys propagation
  • Fix 4 more bugs: download error handling, schema, missing dep, key rotation
  • Fix 10 bugs: auto-execution, auth, validation, data loss, and race conditions

MorsH14 and others added 9 commits June 21, 2026 10:38
…nditions

- youtubeURL.ts: add require.main guard to prevent auto-execution on import;
  also accept URLs from CLI args so the script is reusable
- WebsiteScraping.ts: same require.main guard + accept URLs from CLI args or
  TRAIN_WEBSITES env var instead of hardcoded French medical sites
- routes/api.ts: move /logout before requireAuth so expired tokens can still
  clear their own cookie; add targetAccount validation to GET and POST
  /scrape-followers (missing param caused navigation to instagram.com/undefined/)
- TrainWithAudio.ts: only delete local audio file after successful processing —
  the finally block was destroying the file even on upload failure (data loss)
- IgClient.ts: fix scrapeFollowers limit calculation so maxFollowers=0 returns
  all followers instead of empty array; add 'paid partnership with' to default
  ad markers to match .env.example
- IG-bot/index.ts: remove schedulePost from IInstagramClient interface and class
  — the implementation always threw, making the interface contract unsatisfiable
- api/agent/index.ts: remove the Express router that was defined but never
  mounted anywhere; keep only the getShouldExitInteractions/set exports
- Agent/index.ts + utils/index.ts: fix concurrent API key rotation — module-level
  shared Set caused one request's key exhaustion to poison other requests;
  triedKeys is now per-call and passed through recursive retries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
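
The `targetAccount` validation mentioned in the commit above could look roughly like this. It is a minimal sketch, not the code from routes/api.ts: the helper name `parseTargetAccount` and the username pattern (letters, digits, periods and underscores, at most 30 characters) are assumptions made for illustration.

```typescript
// Hypothetical helper: the name and the username pattern are assumptions,
// not taken from routes/api.ts.
const USERNAME_PATTERN = /^[A-Za-z0-9._]{1,30}$/;

/**
 * Returns a cleaned username, or null when the parameter is missing or
 * malformed. A route handler would answer 400 on null instead of letting
 * the scraper navigate to instagram.com/undefined/.
 */
function parseTargetAccount(raw: unknown): string | null {
  if (typeof raw !== "string") return null; // missing param, array or object
  const trimmed = raw.trim().replace(/^@/, ""); // tolerate a leading "@"
  return USERNAME_PATTERN.test(trimmed) ? trimmed : null;
}
```

Returning `null` rather than throwing keeps the decision about the HTTP status code in the route handler.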
…tation

- src/test/index.ts: check the err param in the download callback before
  attempting to upload media — previously a failed download still called
  twitterClient.v1.uploadMedia on a missing/incomplete file
- src/models/Tweet.ts: make imageUrl optional (required: false, default: null)
  to match saveTweetData which stores imageUrl || null; required: true caused
  silent Mongoose validation failures when imageUrl was absent
- package.json: add mime-types to dependencies — sample/Audio.ts imports it
  directly but it was only listed under @types/mime-types in devDependencies,
  causing a missing-module error at runtime
- src/utils/index.ts: propagate triedKeys Set through handleError recursive
  retries so 429 rotation stops correctly after exhausting all keys; previously
  each handleError call created a fresh Set, allowing infinite key cycling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
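
The download fix in src/test/index.ts amounts to checking the callback's error argument before uploading. The sketch below shows that pattern with injected functions. `downloadThenUpload`, `download`, `upload` and `onError` are hypothetical stand-ins for the real download helper and `twitterClient.v1.uploadMedia`, and the real upload is asynchronous.

```typescript
// Node-style callback: err is set on failure, filePath on success.
type DownloadCallback = (err: Error | null, filePath?: string) => void;

/**
 * Runs `download`, then uploads the file only when the download succeeded.
 * All three function parameters are injected stand-ins (assumptions).
 */
function downloadThenUpload(
  download: (cb: DownloadCallback) => void,
  upload: (filePath: string) => void,
  onError: (err: Error) => void
): void {
  download((err, filePath) => {
    if (err || !filePath) {
      // Bail out before touching a missing or incomplete file.
      onError(err || new Error("download finished without a file path"));
      return;
    }
    upload(filePath);
  });
}
```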
…riedKeys propagation

- check-env.js: only require MONGODB_URI when MONGODB_REQUIRED != 'false';
  previously blocked all non-MongoDB setups from passing the env check
- logger.ts: use process.cwd()/logs instead of __dirname/../logs; the old
  path resolved to build/logs/ after tsc while DailyRotateFile wrote to
  cwd/logs/, making mkdirSync create a directory Winston never used
- WebsiteScraping.ts: normalize baseUrl to end with '/' before startsWith
  check to prevent cross-domain crawl (e.g. example.com matching
  example.com.evil.com)
- Agent/index.ts: pass triedKeys to handleError so 503 retry paths do not
  reset key rotation state and re-exhaust already-tried keys

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
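
The cross-domain issue in WebsiteScraping.ts comes from doing a prefix check on a base URL without a trailing slash. A minimal sketch of the normalized check follows; the function names `normalizeBaseUrl` and `isInternalLink` are assumptions, not names from the repository.

```typescript
/** Ensures the URL ends with "/" so prefix checks stop at the host boundary. */
function normalizeBaseUrl(url: string): string {
  return url.endsWith("/") ? url : url + "/";
}

/**
 * True when `candidate` lives under `baseUrl`. Both sides are normalized so
 * that "https://example.com" does not match "https://example.com.evil.com".
 */
function isInternalLink(baseUrl: string, candidate: string): boolean {
  return normalizeBaseUrl(candidate).startsWith(normalizeBaseUrl(baseUrl));
}
```

With the unnormalized check, `"https://example.com.evil.com/docs".startsWith("https://example.com")` is true, which is the case the commit closes. Comparing parsed hostnames would be a stricter alternative to string prefixes.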
…ze limit

- secret/index.ts: replace manual cookie regex /token=([^;]+)/ with
  req.cookies.token; the regex matched substrings so a cookie named
  e.g. 'supertoken' appearing before 'token' in the header would return
  the wrong value (cookie-parser already runs before all routes)
- routes/api.ts: sanitize targetAccount before embedding in
  Content-Disposition filename; unsanitized user input could inject
  quotes or other characters that break the header attribute syntax
- app.ts: add 1mb limit to express.json() to prevent memory exhaustion
  from arbitrarily large JSON request bodies

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
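
The cookie bug is easy to reproduce in isolation. In the sketch below, `tokenViaRegex` applies the old regex to a raw `Cookie` header, and `parseCookies` is a simplified stand-in for what cookie-parser exposes as `req.cookies` (no decoding, no signed cookies). Both function names are assumptions for illustration.

```typescript
/** The old approach: an unanchored regex over the raw Cookie header. */
function tokenViaRegex(cookieHeader: string): string | undefined {
  const match = cookieHeader.match(/token=([^;]+)/);
  return match ? match[1] : undefined;
}

/**
 * Simplified stand-in for cookie parsing: split on ";", then on the first
 * "=", and key each value by the exact cookie name.
 */
function parseCookies(cookieHeader: string): Record<string, string> {
  const cookies: Record<string, string> = {};
  cookieHeader.split(";").forEach((pair) => {
    const eq = pair.indexOf("=");
    if (eq === -1) return; // skip malformed segments
    const name = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    if (name && !(name in cookies)) cookies[name] = value;
  });
  return cookies;
}
```

For the header `supertoken=abc; token=xyz`, the regex returns `abc` because `token=` also matches inside `supertoken=`, while the parsed lookup by exact name returns `xyz`.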
The inner loop guarded with `followers.length < maxFollowers`, but when
maxFollowers is undefined (unlimited mode), that becomes `x < undefined`
which is always false — so no followers were ever pushed.

Using the already-computed `limit` variable (Infinity when unlimited,
maxFollowers+4 when bounded) makes unlimited scraping work correctly and
also ensures the bounded path collects enough entries to survive the
slice(4) offset applied after the loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
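
The limit logic described above can be sketched as a pure function. `collectFollowers` and its `candidates` parameter are hypothetical stand-ins for the scraping loop in IgClient.ts; only the `limit` computation and the `slice(4)` offset come from the commit message.

```typescript
// Number of leading entries dropped after the loop (the slice(4) offset).
const SKIPPED_ENTRIES = 4;

/**
 * Pure stand-in for the collection loop: `candidates` replaces the usernames
 * read from the page. maxFollowers undefined or 0 means "unlimited".
 */
function collectFollowers(candidates: string[], maxFollowers?: number): string[] {
  const limit = maxFollowers ? maxFollowers + SKIPPED_ENTRIES : Infinity;
  const followers: string[] = [];
  for (const name of candidates) {
    // Guarding with `followers.length < maxFollowers` would evaluate
    // `n < undefined` in unlimited mode, which is always false.
    if (followers.length >= limit) break;
    followers.push(name);
  }
  return followers.slice(SKIPPED_ENTRIES);
}
```

In the bounded case the loop gathers `maxFollowers + 4` entries so that exactly `maxFollowers` remain after the offset is applied.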
…e.ts)

The module-level currentApiKeyIndex and triedApiKeys were shared across
all concurrent calls, so parallel training requests corrupted each
other's key rotation state — the same class of bug fixed earlier in
Agent/index.ts.

Replaced with caller-owned apiKeyIndex/triedKeys parameters (defaulting
to 0/empty set) passed through all recursive retry calls, making each
invocation fully isolated. Also removed unnecessary `await` on the
synchronous getYouTubeTranscriptSchema() call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
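
A minimal sketch of caller-owned rotation state, assuming a synchronous `attempt` callback that returns `null` when a key is rate limited. The real `generateTrainingPrompt` is asynchronous and calls the model API, so `generateWithRotation` and `Attempt` are hypothetical names used only to show how `apiKeyIndex` and `triedKeys` travel through the recursion.

```typescript
// Result of one attempt: the text on success, or null when the key is rate limited.
type Attempt = (apiKey: string) => string | null;

/**
 * Tries keys in order, starting at apiKeyIndex. triedKeys defaults to a fresh
 * Set per top-level call and is passed to every recursive retry, so
 * concurrent calls never share rotation state.
 */
function generateWithRotation(
  apiKeys: string[],
  attempt: Attempt,
  apiKeyIndex = 0,
  triedKeys: Set<string> = new Set<string>()
): string {
  const key = apiKeys[apiKeyIndex % apiKeys.length];
  // Stop once the rotation wraps around to a key this call already tried.
  if (key === undefined || triedKeys.has(key)) {
    throw new Error("all API keys exhausted");
  }
  triedKeys.add(key);
  const result = attempt(key);
  if (result !== null) return result;
  // Rate limited: move to the next key, passing the same Set along.
  return generateWithRotation(apiKeys, attempt, apiKeyIndex + 1, triedKeys);
}
```

Because the default `Set` is created when the function is called, one request exhausting every key does not affect the next request.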
Without a transform configuration, Jest would attempt to execute .ts
test files as plain JavaScript and fail immediately. The config wires
up ts-jest as the TypeScript transformer and restricts the test glob
to *.test.ts so that only proper test files are discovered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
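
The two settings described above might look like the following. This is a sketch written as a plain object rather than the actual file contents. In a real jest.config.js the object would be assigned to `module.exports`, and the exact patterns are assumptions.

```typescript
// Sketch of the options described above. The variable name is illustrative.
const jestConfig = {
  // Hand .ts files to ts-jest instead of running them as plain JavaScript.
  transform: { "^.+\\.ts$": "ts-jest" },
  // Only files ending in .test.ts are discovered as tests.
  testMatch: ["**/*.test.ts"],
};
```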
instagram-ai-network was declared external: true, which requires the
network to exist before docker compose up is run — otherwise Docker
errors with 'network declared as external, but could not be found'.
Changed to driver: bridge so Compose creates the network automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
closeIgClient('default') left Puppeteer browsers for all other
account keys open on SIGTERM/SIGINT. Added closeAllIgClients() that
iterates the igClients Map and closes every active client, then
updated the shutdown sequence to use it instead of the single-key call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
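
Closing every client in the map can be sketched as follows. `ClosableClient` and `closeAllClients` are hypothetical names, and the assumption is that each client exposes an async `close()` that shuts down its browser. The actual `closeAllIgClients()` may differ.

```typescript
// Minimal shape of a client that owns a browser; an assumption for this sketch.
interface ClosableClient {
  close(): Promise<void>;
}

/**
 * Closes every client in the map, not only the "default" key. A failing
 * close() does not block the others; the keys that failed are returned.
 */
async function closeAllClients(clients: Map<string, ClosableClient>): Promise<string[]> {
  const failed: string[] = [];
  const pending: Promise<void>[] = [];
  clients.forEach((client, key) => {
    pending.push(
      client.close().catch(() => {
        failed.push(key); // remember the key, keep closing the rest
      })
    );
  });
  await Promise.all(pending);
  clients.clear();
  return failed;
}
```

A shutdown handler for SIGTERM/SIGINT would await this call before closing the database connection and exiting.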
David-patrick-chuks (Owner) commented:

Thanks @MorsH14 — solid batch of fixes. A few are still worth landing on `main`:

  • `closeAllIgClients()` on shutdown
  • per-request `triedKeys` for Gemini rotation
  • logout before `requireAuth`
  • cookie token via `req.cookies.token`
  • scrape filename sanitization + `targetAccount` validation

Some overlap with other PRs: #103 (same `didRelogin` fix), #106 (body limits/CORS), #107 (input sanitization), #100 (session cookies — already partly on `main`). #101 / MongoDB / jest.config changes are superseded by recent `main` work.

Could you rebase and open a focused PR with just the items above? Happy to merge once CI is green.

MorsH14 and others added 3 commits June 22, 2026 16:14
- .env.example: keep TRAIN_WEBSITES= and add upstream rate limit env vars
- docker-compose.yml: add restart: unless-stopped, drop obsolete network section
- scripts/check-env.js: adopt upstream's DATABASE_URL check (MongoDB -> Postgres)
- src/services/index.ts: use closeDB() instead of mongoose.disconnect()
- src/utils/index.ts: take upstream's triedKeys propagation fix in handleError
- src/client/Instagram.ts: trivial comment whitespace
- src/client/IG-bot/IgClient.ts: keep our unlimited-followers fix (upstream's
  requestedCount=0 approach returns empty for unlimited requests)
- src/routes/api.ts: use sanitizeFilename() and scrapeLimiter from upstream
- src/models/Tweet.ts: accept upstream deletion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Run prettier --write . to fix formatting across all files
  (package-lock pins prettier@3.8.1; CI now uses the same version)
- Remove jest.config.js (our addition) — superseded by upstream's
  jest.config.cjs + jest.ts-transformer.cjs which already covers .test.ts files

All CI checks now green locally:
  lint: 0 errors (1 unused-var warning, not a failure)
  typecheck: clean
  format:check: all 143 files pass
  test: 116/116 pass across 15 suites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>