PowerShell Script for Files Deduplication

If you think you have a lot of duplicated files that consumes your hard drive storage space, this article is for you. Personally, I have a lot of video and pictures on my working hard drive and on a backup HDD. While working with photos and videos I can rename files, copy or move them from folders to folders. As a result, I end up with gigabytes and terabytes occupied with duplicated files. So I need a tool to a) find duplicated files and b) remove duplications. I tried to find good scripts but somehow I was not happy with what I found, so I wrote scripts myself.

Surely you can buy/try a 3-rd party toot with GUI, but if you are comfortable with PowerShell – consider the following.

References

Birds of Eurasia Best Pictures

There is a new site – Birds of Eurasia Best Pictures – with a lot of good pictures of birds from all of the Eurasian continent. It does not seem that pictures are professional, sooner shots are made by amature birdwatchers, but some pictures are really nice.

Pictures are updated around weekly. Here is an example:

Cinereous Vulture (Aegypius monachus)

Shoot in Mongolia by Oleg Belyalov

Another great picture:

Collared Pratincole
by Philippe Campeau,
Kyrgyzstan

Collared Pratincole by Philippe Campeau, Kyrgyzstan

References

Restricted SharePoint Search Deep Dive

Restricted SharePoint Search is a new Microsoft feature to mitigate sites oversharing issue when you are implementing Copilot. The feature is documented here, but still I have some questions, e.g.:

  • How about external data? Copilot can use external data to learn from via agents and connectors. But would Restricted SharePoint Search if implemented allow data from external connectors to be used in copilot?
  • “Users’ OneDrive files, chats, emails, calendars they have access to” – means own data for every single user or all shared OD data?
  • What exactly is “Files from their frequently visited SharePoint sites”? I mean, how frequently user needs to visit site for this?
  • What exactly means “Files that the users viewed, edited, or created.”
  • What about teams chat messages, e-mails, viva engage messages?
  • “Files that were shared directly with the users” – does that mean “individual files shared” or can include folders, libraries, sites?
  • If user is a member of a teams – would all team content included?
  • It says “Files…” but would site pages be included? Or list items? Or list items attachments? Pages is something that people use to create wiki to share knowledge.
  • How long it takes for Microsoft 365 to start restricting results after Restricted SharePoint Search is enabled

Here I’m going to answer the questions above.

So far I build a test scenario using my dev tenant that includes multiple collaborated users and content in the form of files, pages, list items and messages spreaded across multiple sites falling into different categories of Restricted SharePoint Search allowed content.

WIP (Work in progress)

References

SharePoint Governance

Governance in IT is establishing rules, policies, tools and practices that helps you manage and protect your enterprise resources. SharePoint governance (or wider – Collaboration governance) covers

  • resources ownership and lifecycle
  • users’ access to resources
  • compliance with your business standards
  • security of your data

References

Restricted SharePoint Search rationale

Restricted SharePoint Search is a new (2024) Microsoft 365 feature that should help Copilot and general search results be more relevant, especially in large Microsoft 365 environments.

The problem background

When you have a really big number of sites – it is very difficult to keep them all in a well-managed state, e.g. to have reasonable (minimal) permissions provided to each site. So the typical situation (unfortunately) is: we have a lot of overshared sites. There are also a lot of ownerless sites where permissions are not managed. We know that search is security-trimmed, i.e. a user can get search results from content he/she already has access to. But with overshared sites – users get results they should not be able to see. With regular search experience – a user can see with his own eyes the source of the content he/she gets results from – so user can understand that results are coming from sites user should not have access to (overshared sites). But when it comes to AI-based search (Copilot) – user is getting answers, but he/she do not always know the source of that data.

So the problem is – we want to ensure Copilot is trained on a proper set of data and results are curated to users needs and access permissions. So for Copilot we really need to exclude from search scope such sites we are not sure content is valid, accurate and properly secured. We do not want users to get garbage or exposed sensitive information as an authoritative answer from Copilot.

The solution

This is where Restricted SharePoint Search feature should help, as with this feature your can restrict organization-wide search (and Copilot) to a curated list of SharePoint sites – “allowed sites” – public sites that passed attestation and where permissions are checked and data governance policies are applied, and content user work with on daily basis – his/her own documents and content shared with user directly (check details on Microsoft’s How does Restricted SharePoint Search work) – e.g. content user is supposed to have access to normally.

Excluded from search scope would be sites shared with user indirectly, e.g. something that was shared with everyone.

The root cause

Interesting, that with this feature Microsoft is not solving the real issue, but hiding (concealing) the real issue and just making Microsoft 365 to look more secure.

The real problem (root cause) is over-sharing data. But Microsoft already sold us SharePoint (and then Microsoft 365). And now Microsoft is trying to sell us Copilot, so they “solved” the over-sharing issue with “let us limit search” solution instead of “let’s fix oversharing”.

Note 1: Restricted SharePoint Search feature is free – i.e. it is included in standard Microsoft 365 license. Do not be confused with site access restriction policy – feature that require SharePoint Premium license and allows to restrict access to some SharePoint sites with specific groups only.

Note 2: I know that Microsoft is trying to address over-sharing issue as part of their SharePoint Premium (SharePoint Advanced Management) package, e.g. with AI Insights and Data access governance insights – reports that can help prevent oversharing by detecting sites that contain potentially overshared or sensitive content. With Manage content lifecycle we’d decrease amount of “garbage” or outdated content.
But SharePoint Advanced Management is licensed separately, when Restricted SharePoint Search is free.

Note 3: I know that users are an even more real problem because they tend to simplify and share information.

References

New Microsoft Graph Connector service plan

Microsoft Graph connectors allow your organization to index third-party data into Microsoft Graph. Microsoft Graph connectors enable Microsoft 365 Copilot better as it has more information relevant to your organization to answer prompts.

According to Microsoft, Microsoft 365 will soon include a new service plan, Graph Connectors Search with Index, offering a 50 million item index limit per tenant at no cost. Rollout starts September 2024.

Microsoft Search and Intelligence - Data Sources. 
You can build and customize connections managed by your organization. These can index data from apps such as Salesforce, Oracle SQL and Azure DevOps. Connections listed as Search under Connected experiences count toward your search connection quota utilization.

Previously, to index third-party data into Microsoft Graph through Microsoft Graph connectors, you either needed to have a built-in entitlement through specific licenses (e.g., 500 items of index quota per Microsoft Copilot for Microsoft 365 license) or purchase add-on quota. With this change, the index quota per license entitlement is removed, as is add-on cost for additional quota. You now receive an entitlement of 50 million items for each tenant.

Each entity (or record) from the source data system that you add to Microsoft Graph can be considered an item which then shows up as a unique citation in Copilot’s responses, as a unique search result in Microsoft Search, etc. Depending on the type of data source, 1 item is –

  • 1 document (word, excel, ppt, pdf, etc.) in file share
  • 1 wiki page in Confluence
  • 1 webpage in a website
  • 1 ticket/issue in Jira

Total quota utilized is calculated in terms of total items stored in the index. Updates/changes to an item are not counted in any manner. There are no cost implications of updating an item multiple times. It still counts as 1 item only.

Applicable to subscriptions: Office 365 E1, Office 365 E3, Office 365 E5, Microsoft 365 E3, Microsoft 365 E5, Microsoft 365 F1, Microsoft 365 F3, Office 365 F3, Microsoft 365 Business Basic, Microsoft 365 Business Standard, Microsoft 365 Business Premium, Office 365 G1, Office 365 G3, Office 365 G5, Microsoft 365 G3, Microsoft 365 G5, Office 365 A3, Office 365 A5, Microsoft 365 A3, Microsoft 365 A5