Tuesday, December 5, 2017

Waiting on multiple C# .Net awaits

Introduction

Async and Await make a developer's life easy by removing the callback hell from asynchronous programming. But they can be equally harmful in the hands of copy-paste coders: mainly, those who don't know how things work underneath can use async and await in the wrong way. This post examines one such scenario and how to avoid it.

Let's consider the example below. There are 2 independent web service calls to be made, and once the results are available, some operation is done using the results from both async calls.

private static async Task<string> GetFirstTask(HttpClient client)
{
    Log(nameof(GetFirstTask));
    return await client.GetStringAsync("http://httpbin.org/drip?numbytes=3&duration=3&code=200");
}
private static async Task<string> GetSecondTask(HttpClient client)
{
    Log(nameof(GetSecondTask));
    return await client.GetStringAsync("http://httpbin.org/drip?numbytes=6&duration=6&code=200");
}
private void Process(string first, string second)
{
    Log($"{nameof(Process)} - Length of first is {first.Length} & second is {second.Length}");
}
private static void Log(string msg)
{
    Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId}, Time {DateTime.UtcNow.ToLongTimeString()}, Message {msg}");
}

The first 2 methods return generic Task<string>. The URLs use httpbin.org, a hosted service for testing purposes. The duration in the query string controls the delay, meaning the response arrives only after that duration; this avoids having to use Thread.Sleep(). Process() just displays its parameters.

The normal way

Below is the kind of code we most often see from developers new to async and await.

async internal Task TestNormal_TheBadMethod()
{
    HttpClient client = new HttpClient();
    string firstrequest = await GetFirstTask(client);
    string secondrequest = await GetSecondTask(client);

    Process(firstrequest, secondrequest);
}

The output might be something like below.

Thread 1, Time 8:47:00 PM, Message GetFirstTask
Thread 9, Time 8:47:02 PM, Message GetSecondTask
Thread 7, Time 8:47:07 PM, Message Process - Length of first is 3 & second is 6

Problem

The line where GetFirstTask() is called waits till the result is obtained, i.e., it waits 3 seconds for the response from the web service. The second task starts only after the first is completed. Clearly sequential.

await at method invocation

This is another way developers try.

async internal Task TestViaAwaitAtFunctionCall_StillBad()
{
    Log(nameof(TestViaAwaitAtFunctionCall_StillBad));
    HttpClient client = new HttpClient();
    Process(await GetFirstTask(client), await GetSecondTask(client));
}

Output will look as follows.

Thread 1, Time 8:49:22 PM, Message GetFirstTask
Thread 7, Time 8:49:25 PM, Message GetSecondTask
Thread 9, Time 8:49:30 PM, Message Process - Length of first is 3 & second is 6

Problem

In some other languages, an await keyword at the function invocation might make the calls parallel. But in C# it is still sequential: it waits for the first await to complete and only then processes the second.

Making it run parallel

So what is the solution? Both Tasks should be created before we wait for their results, so that the tasks run in parallel. When await is called, it simply returns the result if it is already available, or waits till the result is available. So the total time is the longest single wait, not the sum of all wait times. The code snippet below does it.

private async Task TestViaTasks_Good()
{
    Log(nameof(TestViaTasks_Good));
    HttpClient client = new HttpClient();
    Task<string> firstrequest = GetFirstTask(client);
    Task<string> secondrequest = GetSecondTask(client);
    Process(await firstrequest, await secondrequest);
}

The output looks like below.

Thread 1, Time 8:55:43 PM, Message GetFirstTask
Thread 1, Time 8:55:43 PM, Message GetSecondTask
Thread 8, Time 8:55:48 PM, Message Process - Length of first is 3 & second is 6

Here the Tasks are created before any awaits are placed on them, so they ran in parallel.
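An equivalent, arguably more readable, way is Task.WhenAll, which awaits both tasks in a single statement. Below is a sketch reusing the same helper methods from above (the method name TestViaWhenAll_Good is new here):

```csharp
private async Task TestViaWhenAll_Good()
{
    Log(nameof(TestViaWhenAll_Good));
    HttpClient client = new HttpClient();
    Task<string> firstrequest = GetFirstTask(client);
    Task<string> secondrequest = GetSecondTask(client);
    // Both tasks are already running; WhenAll completes when the slower one does.
    string[] results = await Task.WhenAll(firstrequest, secondrequest);
    Process(results[0], results[1]);
}
```

Task.WhenAll also aggregates exceptions from both tasks, which is handy when either call can fail.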

Will this work when the second call depends on the first call's result?

No. The second call cannot start without the result from the first call, so this has to be sequential.
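For completeness, a sketch of the dependent case; here GetSecondTaskUsing is a hypothetical helper that needs the first response as input, and the sequential awaits are exactly right:

```csharp
private async Task TestDependentCalls()
{
    HttpClient client = new HttpClient();
    string first = await GetFirstTask(client);
    // Hypothetical helper: the second call takes the first response as input,
    // so it cannot start until the first await completes.
    string second = await GetSecondTaskUsing(client, first);
    Process(first, second);
}
```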

More reading

https://stackoverflow.com/questions/33825436/when-do-multiple-awaits-make-sense
https://stackoverflow.com/questions/36976810/effects-of-using-multiple-awaits-in-the-same-method

Tuesday, November 21, 2017

Azure @ Enterprise - Architecture to secure PaaS

When an enterprise wants to use somebody else's computer (yes, cloud/Azure is essentially just somebody else's computer), they want to make sure it is as secure as their on-premise environment. At the least, the enterprise wants to delay the attack. Never think any security solution is final; security measures are only about 'when' the attack will happen.

Multi-tenancy and Enterprise

Cloud is all about multi-tenancy. We have to share with others, and enterprises always have doubts about that. When we share, will the other tenant steal our assets or attack us? Whatever assurance the cloud vendors give that the cloud is secure and nothing leaks between tenants, enterprises don't want to be just another tenant. They want their own space.

One way is to set up private cloud infrastructure inside the enterprise. But that has upfront investment and loses the glory of auto scaling. So it is not the best option unless security is the first concern.

Enter virtual networks

In order to give their own space to enterprises, or to anybody who doesn't want to be just another tenant, cloud providers started offering virtual networks. A virtual network ensures that our cloud resources are accessible only within the boundary of the vNet; for external communication, separate gateways/tunnels are needed. They also support seamless connection to the enterprise's on-premise network via VPN. That way an enterprise can just lift and shift its classic virtual machine based solutions to the cloud and slowly adopt cloud-native features.

In Amazon it is called a VPC (Virtual Private Cloud); in Azure it is simply a virtual network, or vNet for short.

vNet limits in Azure

Though Azure has vNets, not all platform services are vNet ready. For example, we can easily put a virtual machine into a vNet, but not a Storage Account. (The feature to limit Storage Account access to a vNet is in beta at the time of writing.) A platform service such as App Service can be hosted inside a vNet using a costly feature called App Service Environment.

Authenticating to platform services

Enterprises normally don't want to keep any secrets in application configuration. E.g., the password in a database connection string should never be kept in web.config in the case of an ASP.Net web application. Instead, the app runs under a specific IIS application pool, the app pool user is added to the database server, and the app uses Windows authentication. Or the authenticated user is delegated to the database directly; again, it depends on the nature of the application. We can then ask: what about the login password of the app pool user? Isn't one or two persons knowing it a problem in the enterprise?

In an enterprise, people change often. Still, if the identity is managed by people, it is easier to manage than keeping a password in config. There is a chance that someone in support may send the config file in a mail, keep it on a \\shared drive, etc. If it is an app pool user and the password is with the single team that installs the application, it is relatively safe. At least that is what enterprises believe.

Coming back to Azure: not all Azure resources/platform services (PaaS) support Windows/Active Directory based authentication. The application must present a password when it authenticates against the PaaS. So it forces apps to keep secrets such as passwords, and connection strings with passwords in them.

Enter KeyVault

Microsoft suggests using a PaaS called KeyVault to keep secrets. It has HSM (Hardware Security Module) backing and many more features. Then the question from the enterprise becomes "Does KeyVault support vNet?" The answer is no.
Should we keep KeyVault's connection string with a password in the application's config? Then what is the difference between keeping the database's connection string with a password v/s the KeyVault connection string with a password in the application's config?

So how do we protect KeyVault? If the enterprise has to be just another tenant in KeyVault, how do we ensure security? This post is about one way to protect the connection to KeyVault, which in turn protects the other resources.

Architecture

The key message is simple here:
In order to connect to KeyVault, use a certificate instead of a password in the application's config.
The rest is the circus we do to get everything working together. We can use an Azure Function to talk to KeyVault, and a vNet to protect that Function so that nobody from outside can access it. Only that Function needs to know the certificate, be registered as an Azure AD App, etc.
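As a sketch of the certificate idea (the client id and thumbprint values are hypothetical placeholders; the NuGet packages assumed are Microsoft.Azure.KeyVault and Microsoft.IdentityModel.Clients.ActiveDirectory, the ADAL library current at the time of writing), the KeyVault Function could authenticate roughly like this:

```csharp
using System.Security.Cryptography.X509Certificates;
using Microsoft.Azure.KeyVault;
using Microsoft.IdentityModel.Clients.ActiveDirectory;

public static class KeyVaultAccess
{
    // Hypothetical values; replace with the Azure AD App's client id
    // and the installed certificate's thumbprint.
    private const string ClientId = "<azure-ad-app-client-id>";
    private const string CertThumbprint = "<certificate-thumbprint>";

    public static KeyVaultClient CreateClient()
    {
        return new KeyVaultClient(async (authority, resource, scope) =>
        {
            // Load the certificate from the machine store; no password in config.
            using (var store = new X509Store(StoreName.My, StoreLocation.LocalMachine))
            {
                store.Open(OpenFlags.ReadOnly);
                X509Certificate2 cert = store.Certificates
                    .Find(X509FindType.FindByThumbprint, CertThumbprint, false)[0];
                var context = new AuthenticationContext(authority);
                var assertion = new ClientAssertionCertificate(ClientId, cert);
                AuthenticationResult result =
                    await context.AcquireTokenAsync(resource, assertion);
                return result.AccessToken;
            }
        });
    }
}
```

The certificate is installed once by the deployment team, so the config carries only non-secret identifiers.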

Diagram

This diagram shows how the pieces are put together.

How it works

KeyVault

Since not all Azure resources are compatible with vNet and Active Directory based authentication, we cannot avoid keeping passwords and secrets. It is advised to keep the secrets in KeyVault rather than in app.config or web.config.

This way the secrets stay with the one team that installs the application.

Azure AD App

In order to access KeyVault, we need an Azure Active Directory Application. KeyVault accepts the AD token for authentication.

KeyVault Function/Web App

This is the only component which knows how to connect to Azure KeyVault via Azure AD. It is called by the other components in the system whenever they need any secret. This must be the only component registered with the Azure AD App.

The alternative is to register all the components as Azure AD Apps and give each of them the certificate, so that they can access KeyVault directly.

ASE

An App Service Environment (ASE) is required to make sure the KeyVault Azure Function is accessible only within the boundary of the enterprise's private area, i.e., the vNet. This keeps the KeyVault Function secure without requiring security credentials in every HTTP request.

Tuesday, November 14, 2017

C# async and await with Thread static

This is a continuation of the post below about async and await. That post discusses how async and await evolved, along with the basics. The same sample context is used in this post as well, so better to read the post below before continuing.

http://joymonscode.blogspot.com/2015/06/c-async-and-await-programming-model.html

Async Await and calling thread behavior

There is no objection that async/await simplified reasoning about code. But it may cause trouble if we use it without understanding how it works. Let us see one example below.

public void Main()
{
    Console.WriteLine($"Main() - Thread Id - {Thread.CurrentThread.ManagedThreadId}");
    for (int counter = 1; counter < 5; counter++)
    {
        if (counter % 3 == 0)
        {
            // Fire and forget on purpose; the returned Task is not awaited here.
            WriteFactorialAsyncUsingAwait(counter);
        }
        else
        {
            Console.WriteLine(counter);
        }
    }
}
private async Task WriteFactorialAsyncUsingAwait(int facno)
{
    Console.WriteLine($"WriteFactorialAsyncUsingAwait() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Begin");
    // FindFactorialWithSimulatedDelay() is defined in the earlier post linked above.
    int result = await Task.Run(() => FindFactorialWithSimulatedDelay(facno));
    Console.WriteLine($"WriteFactorialAsyncUsingAwait() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Factorial of {facno} is {result}");
}


Guess what thread ids would be printed from WriteFactorialAsyncUsingAwait(). Will they be the same?

Those who say "the same" should be prepared to spend nights and weekends debugging, especially if something [ThreadStatic] is used before and after the await. Below goes the output.

Main() - Thread Id - 1
1
2
WriteFactorialAsyncUsingAwait() - Thread Id - 1 - Begin
4
WriteFactorialAsyncUsingAwait() - Thread Id - 3 - Factorial of 3 is 6

In this code, the continuation after the await executed on a separate thread, similar to its Task<>-based equivalent.
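A minimal sketch of how [ThreadStatic] state disappears across an await (in a console app, so there is no SynchronizationContext to marshal the continuation back to the original thread):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ThreadStaticDemo
{
    [ThreadStatic]
    private static string context; // one copy of this field per thread

    static async Task WorkAsync()
    {
        context = "important value";
        Console.WriteLine($"Before await: thread {Thread.CurrentThread.ManagedThreadId}, context = {context}");
        await Task.Delay(100);
        // The continuation may run on a different thread pool thread,
        // where the [ThreadStatic] field was never assigned.
        Console.WriteLine($"After await: thread {Thread.CurrentThread.ManagedThreadId}, context = {context ?? "<null>"}");
    }

    static void Main() => WorkAsync().GetAwaiter().GetResult();
}
```

When the continuation lands on a different thread, context reads back as null, which is exactly the class of bug that eats nights and weekends.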

Task.ContinueWith and calling thread behavior

Let's see its Task<>-based implementation.

public void Main()
{
    Console.WriteLine($"Main() - Thread Id - {Thread.CurrentThread.ManagedThreadId}");
    for (int counter = 1; counter < 5; counter++)
    {
        if (counter % 3 == 0)
        {
            WriteFactorialAsyncUsingTask(counter);
        }
        else
        {
            Console.WriteLine(counter);
        }
    }
    Console.ReadLine();
}
private void WriteFactorialAsyncUsingTask(int no)
{
    Console.WriteLine($"WriteFactorialAsyncUsingTask() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Begin");
    Task<int> task=Task.Run<int>(() =>
    {
        int result = FindFactorialWithSimulatedDelay(no);
        return result;
    });
    task.ContinueWith(new Action<Task<int>>((input) =>
    {
        Console.WriteLine($"WriteFactorialAsyncUsingTask() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Factorial of {no} is {input.Result}");
    }));
}

See the output; it works the same way as the async await model.

Main() - Thread Id - 1
1
2
WriteFactorialAsyncUsingTask() - Thread Id - 1 - Begin
4
WriteFactorialAsyncUsingTask() - Thread Id - 4 - Factorial of 3 is 6

Why is the consuming code written inside the ContinueWith() callback instead of reading the result from the Task.Result property? Because if it is not via ContinueWith(), execution waits on the task.Result line, so the outer loop cannot move to the next item. We lose all the benefits of Task then. Below goes the code using task.Result access; see how it blocks the outer loop from executing in parallel.



public void Main()
{
    Console.WriteLine($"Main() - Thread Id - {Thread.CurrentThread.ManagedThreadId}");
    for (int counter = 1; counter < 5; counter++)
    {
        if (counter % 3 == 0)
        {
            WriteFactorialAsyncUsingTask(counter);
        }
        else
        {
            Console.WriteLine(counter);
        }
    }
    Console.ReadLine();
}
private void WriteFactorialAsyncUsingTask(int no)
{
    Console.WriteLine($"WriteFactorialAsyncUsingTask() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Begin");
    Task<int> task=Task.Run<int>(() =>
    {
        int result = FindFactorialWithSimulatedDelay(no);
        return result;
    });
    Console.WriteLine($"WriteFactorialAsyncUsingTask() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - End - Task Result - {task.Result}");
}

The output below clearly shows that 4 is processed from the loop only after 3 is processed. We lost the parallelism. The thread ids are the same before and after.

Main() - Thread Id - 1
1
2
WriteFactorialAsyncUsingTask() - Thread Id - 1 - Begin
WriteFactorialAsyncUsingTask() - Thread Id - 1 - End - Task Result - 6
4

Moral of the story

Though async await seems easy to use, using it without understanding will take away our sleep and weekends.

Tuesday, November 7, 2017

Who is responsible for making APIs / apps secure?

If we are from a software engineering background, we immediately say it's the developers. Some enterprise people will go further and say it is the combined duty of infrastructure and development: infra has to set up proper VPN access, firewalls, etc. If we scope to public internet applications, we pretty much hear one word: it's the duty of developers, developers, developers. Maybe some will say it's architecture too. But mostly, if it's not enterprise level, application architecture is part of development.

There are some problems with leaving the duty of security to developers:

  • Developers always focus, or have to focus, on application features.
  • Developers are not experts in the security field. They may not be, and are not supposed to be, up to date with every security vulnerability found in the world.

There could be more problems we could think of. So what is the solution in the unsecured world of IT?

One simple answer is to free developers from the security aspect and give it to security experts. Hiring one security expert to look at every line of code produced is not a great idea either. So what to do? Is buying or renting a security product/service a viable option? It seems more viable than betting on developers securing the applications. Those products or services are often referred to as API Management Gateways.

What do these API Management Gateways do?

Suppose a developer leaves a SQL injection hole, it is missed in the testing stages, and it reaches production; these gateways are expected to block the SQL injection attack by inspecting the payload/traffic. Similarly, other attacks are also supposed to be handled by the gateway before they reach the application servers.

Below are some links covering the basics and listing players in the application/API security management market.

http://www.forumsys.com/product-solutions/api-security-management/
https://www.roguewave.com/products-services/akana/solutions/api-security
http://www.apiacademy.co/resources/api-management-lesson-201-api-security/

Most of the players are cloud ready. Even cloud providers such as Microsoft Azure have their own offerings to secure applications in Cloud.

Will these gateways reduce performance?

Nothing comes for free. If someone claims their gateway adds zero delay, they are wrong: somewhere, instructions have to execute to validate the traffic and take a decision. Vendors can make it fast enough that the delay is not visible from outside. To speed things up, they often use dedicated appliances or hardware instead of commodity servers.

Comparison

Some sites where different products are compared. They might not be up to date, but they are a good starting point.

https://www.itcentralstation.com/categories/api-management#top_rated
http://transform.ca.com/API-Management-Platform-Vendor-Comparison.html

Some players

  • https://www.okta.com
  • Layer from CA

Is this the silver bullet?

There is no silver bullet in software engineering or in science. Adopt whatever is best at the time and in the situation, and embrace change when new, better ways become available.

If the application is highly sensitive, betting on one gateway for protection is not a great solution, because that gateway might not keep up with our security requirements. For example, if a zero-day attack is found and the gateway vendor is not updating and releasing new versions within days, but our business needs protection within the next hour, we should probably let our developers take care of security. Maybe we will soon end up building another gateway, but it is worth doing.

Tuesday, October 31, 2017

Exposing Parquet file to SQL 2016 as well as Hadoop (Java/Scala)

This is just an architecture post explaining the possibility of a Parquet file exposed to a SQL Server 2016 database via PolyBase while other applications access it normally. The other applications can be anything, such as data analytics code running in a Hadoop cluster.

Mainly, this kind of integration is needed when we already have a transactional database such as SQL Server and we have to analyze the data. Either we can have scheduled data movement using ETL technologies, or we can use PolyBase to move data from an internal regular table to an external PolyBase table backed by a Parquet file. If the solution is in Azure, the Parquet file can be somewhere in a Storage Account. Once the data is there in Parquet format, the analytics algorithms can hit the same file. Parquet is mentioned here because of its familiarity in the analytics community.

Below goes such an architecture.
Since the architecture may change over time, a LucidChart diagram is embedded. Please comment if it is not working. Thanks to LucidChart for their freemium model.

Implementation details, such as code snippets, are good to share in a separate post.

Tuesday, October 24, 2017

Caution JavaScript Ahead - setTimeout() or setInterval() doesn't make JavaScript multi threaded

If someone was born into JavaScript, they know what is meant by "JavaScript is single threaded". Developers coming from other languages also know that JavaScript is single threaded. But some APIs in JavaScript lead them to think that JavaScript is multi threaded. Below is such a situation.

setTimeout(() => {
  console.log('Timeout hit and time is ' + new Date());
}, 1000);
console.log('setTimeout at ' + new Date());

There could be N number of reasons someone wants to execute code after a specified time: an initialization delay, rendering time, session timeout handling, etc. But is this going to execute the code exactly after 1 second (1000 ms)?

In any real application there will be more code than these 2 lines. Suppose someone else wrote the below code, which gets executed after we register the handler via setTimeout().

setTimeout(() => {
  console.log('Timeout hit and time is ' + new Date());
}, 1000);
console.log('setTimeout at ' + new Date());
problemMaker()

function problemMaker() {
  //Any sync AJAX call or some code which execute for long time.
  var url = 'https://httpbin.org/delay/5';
  var request = new XMLHttpRequest();
  request.open('GET', url, false);
  request.send();
  document.writeln(request.responseText.length); 
}

Does this ensure that the function gets executed after 1 second? Native JavaScript developers can immediately identify the issue; others may think it will work. Let's see a test run result in the console.

setTimeout at Tue Oct 24 2017 19:46:14 GMT-0400 (Eastern Daylight Time)
Timeout hit and time is Tue Oct 24 2017 19:46:29 GMT-0400 (Eastern Daylight Time)

Yes, JavaScript is single threaded, whatever we do with the setTimeout or setInterval functions. Better not to trust them on when they are going to execute. Code written like this may work on the development machine and fail in higher environments such as business testing, staging, or production: a highly inconsistent issue. Let's avoid saying "It works on my machine".

Sample code located at https://plnkr.co/edit/uexi2U

Tuesday, October 17, 2017

Running multiple instances of AzCopy.exe command

AzCopy.exe is really an amazing tool for data transfer. But if we run multiple instances of AzCopy, we may get the below error.

AzCopy Command - AzCopy /Source:c:\temp\source /Dest:https://<storage account>.blob.core.windows.net/test /DestSAS:"<SAS>" /pattern:"" /s

An error occurred while reading the restart journal from "C:\Users\<user name>\AppData\Local\Microsoft\Azure\AzCopy". Detailed error: The process cannot access the file 'C:\Users\<user name>\AppData\Local\Microsoft\Azure\AzCopy\AzCopyCheckpoint.jnl' because it is being used by another process.

The error is pretty clear: AzCopy keeps a journal file for its resume functionality, and if we don't specify the journal file location in the command, it uses the default location; when a second AzCopy instance starts, it cannot read the journal file.

The fix is to specify the location for the .jnl file via the /Z switch. The AzCopy command goes as follows:
AzCopy /Source:c:\temp\source /Dest:https://<storage account>.blob.core.windows.net/test /DestSAS:"<SAS>" /pattern:"" /s /Z:<unique folder for this AzCopy instance>

If we are running AzCopy from the command window, this is easy to find out. But if AzCopy is invoked from applications (PowerShell or .Net) in parallel, it is difficult, because we might have suppressed all the prompts using /Y. AzCopy has a /V: switch which redirects the logs to a file; that helps with troubleshooting.
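As a sketch of launching AzCopy from .Net in parallel (the install path and arguments are illustrative; the point is that each instance gets its own /Z journal folder and /V log file):

```csharp
using System;
using System.Diagnostics;
using System.IO;

class AzCopyRunner
{
    // Illustrative: start one AzCopy instance with a unique journal (/Z)
    // and log (/V) location so that parallel runs do not fight over the
    // default AzCopyCheckpoint.jnl file.
    static Process StartAzCopy(string source, string dest, string sas, string runId)
    {
        string workDir = Path.Combine(Path.GetTempPath(), "azcopy", runId);
        Directory.CreateDirectory(workDir);
        var info = new ProcessStartInfo
        {
            FileName = @"C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe",
            Arguments = $"/Source:{source} /Dest:{dest} /DestSAS:\"{sas}\" /S /Y " +
                        $"/Z:\"{workDir}\" /V:\"{Path.Combine(workDir, "azcopy.log")}\"",
            UseShellExecute = false
        };
        return Process.Start(info);
    }
}
```

Calling StartAzCopy with a distinct runId per transfer (e.g., a Guid) keeps the journals isolated, and each instance's log file is then available for troubleshooting.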